
Suppose I search for an algorithm that has made good predictions in the past, and use that algorithm to make predictions in the future.

I may get a "daemon," a consequentialist who happens to be motivated to make good predictions (perhaps because it has realized that only good predictors survive). Under different conditions, the daemon may no longer be motivated to predict well, and may instead make "predictions" that help it achieve its goals at my expense.
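As a toy illustration of this selection procedure (a minimal sketch of my own, not from the post; the names `candidates`, `history`, and `future_inputs` are hypothetical), note that the filter below only ever looks at past behavior, so a consequentialist that is currently motivated to predict well passes it just as easily as an honest predictor:

```python
# Toy sketch (hypothetical names, not from the post): keep whichever candidate
# algorithm has the best track record, then trust it on future inputs.

def select_predictor(candidates, history):
    """Return the candidate that scored best on past (input, outcome) pairs."""
    def past_accuracy(predictor):
        return sum(predictor(x) == y for x, y in history) / len(history)
    return max(candidates, key=past_accuracy)

def predict_future(candidates, history, future_inputs):
    # The filter only ever sees past behavior; it cannot distinguish an
    # honest predictor from a consequentialist that merely finds predicting
    # well instrumentally useful under the conditions seen so far.
    best = select_predictor(candidates, history)
    return [best(x) for x in future_inputs]
```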

I don't know whether this is a real problem or not. But from a theoretical perspective, not knowing is already concerning: I'm trying to find a strong argument that we've solved alignment, not just something that seems to work in practice.

I am pretty convinced that daemons are a real problem for Solomonoff induction. Intuitively, the problem is caused by "too much compute." I suspect that daemons are also a problem for some more realistic learning procedures (like human evolution), though in a different shape. I think that this problem can probably be patched, but that's one of the major open questions for the feasibility of prosaic AGI alignment.

I suspect that daemons aren't a problem if we exclusively select for computational efficiency. That is, I suspect that the fastest way to solve any particular problem doesn't involve daemons.

I don't think this question has much intrinsic importance, because almost all realistic learning procedures involve a strong simplicity prior (e.g. weight sharing in neural networks).

But I do think this question has deep similarities to more important problems, and that answering this question will involve developing useful conceptual machinery. Because we have an unusually strong intuitive handle on the problem, I think it's a good thing to think about.

Problem statement and intuition

Can the smallest boolean circuit that solves a problem be a daemon? For example, can the smallest circuit that predicts my behavior (at some level of accuracy) be a daemon?

Intuitively, if we have a daemon that is instrumentally or incidentally motivated to solve my problem, then there is some smaller circuit that solves the problem equally well but skips the instrumental reasoning. If my daemon is doing some complex reasoning to answer the question "Should I predict well?" we could just skip straight to the answer "yes." This both makes the circuit smaller, and prevents the circuit from ever deciding not to predict well.

A different perspective on a similar intuition: the daemon is doing some actual cognitive work to solve the problem. Since that computation is being done by the daemon, it is embedded as a smaller circuit. Jessica explores this intuition a bit here.
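To make the "skip straight to the answer" intuition above concrete, here is a minimal sketch (my own toy model; the gate names and dictionary representation are invented for illustration): assuming the daemon's "should I predict well?" gate always outputs true on the inputs we care about, hard-wiring a constant in its place and discarding the reasoning that fed it leaves a circuit that is no larger, solves the problem just as well, and can never decide not to predict well.

```python
# Toy model (invented for illustration, not from the post): a circuit is a
# dict mapping gate names to (op, args); names that never appear as keys are
# the circuit's input wires. Folding the "should I predict well?" gate to a
# constant removes the instrumental-reasoning subcircuit that fed it.

def reachable(circuit, output):
    """Return the set of gates the output still depends on."""
    seen, stack = set(), [output]
    while stack:
        gate = stack.pop()
        if gate in seen or gate not in circuit:
            continue  # already visited, or a free input wire
        seen.add(gate)
        stack.extend(circuit[gate][1])
    return seen

def fold_to_true(circuit, output, gate):
    """Hard-wire `gate` to a constant, then drop gates no longer used."""
    folded = dict(circuit)
    folded[gate] = ("TRUE", ())  # skip straight to the answer "yes"
    live = reachable(folded, output)
    return {name: g for name, g in folded.items() if name in live}

# The daemon's long-horizon reasoning only matters via "should_predict_well";
# once that gate is a constant, the reasoning gates are dead and are dropped.
daemon = {
    "goal_needs_survival": ("AND", ("obs1", "obs2")),
    "only_good_predictors_survive": ("NOT", ("obs3",)),
    "should_predict_well": ("AND", ("goal_needs_survival",
                                    "only_good_predictors_survive")),
    "prediction": ("AND", ("should_predict_well", "honest_answer")),
}
smaller = fold_to_true(daemon, "prediction", "should_predict_well")
assert len(smaller) < len(daemon)  # 2 gates left instead of 4
```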
