Now let us turn to Marcello’s objection that an agent with Roko’s goal system will try to destroy rival agents. My answer is that most agents, it seems to me, will try to destroy certain classes of rival agents, and there are reasons to think that Goal System Zero (GSZ) is more cooperative than most goal systems are.
If a person wanted to, he could probably specify a goal system that will try to help all agents it encounters, resolving all conflicts between the helped agents as peacefully and as fairly as possible. Such a goal system would be more cooperative than GSZ. But although it is a little reassuring to me that GSZ is cooperative, neither universal helpfulness nor fairness nor conflict minimization is a terminal value of mine.
Let us examine what happens when an agent loyal to GSZ meets up with an agent loyal to the goal system of maximizing human happiness.
First, note that most people alive today who advocate the terminal value of human happiness have a strong preference for moments of human happiness that will occur in the near future. If you explain to them that what maximizes human happiness is for the humans alive now to spend all their time and energy designing and building superintelligences (SIs) to expand through the universe to turn the resources of the universe into more SIs so that billions of years from now all that engineered intelligence can be applied to creating humans and keeping them happy, well, they do not want to hear that: they want humans alive now to be happy; they do not want humans alive now to devote themselves to a project that takes billions of years to bear fruit. In other words, most people with goal systems that emphasize human happiness have goal systems that assign greater utility or moral value to moments in time close to here and now. Let us call such a goal system “non-time-indifferent”.
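One way to make the distinction concrete is with a discount factor. The notation here is my own illustration, not part of the original argument: h_t stands for the amount of human happiness at time t, T for the time horizon, and gamma for how much weight later moments receive.

```latex
U_\gamma \;=\; \sum_{t=0}^{T} \gamma^{t}\, h_t , \qquad 0 < \gamma \le 1 .
% gamma = 1 : every moment counts equally (time-indifferent)
% gamma < 1 : near-term moments count for more (non-time-indifferent)
```

Under this sketch, a project that pays off only at a very distant time T contributes gamma^T h_T, which is negligible for any gamma below 1 but counts in full when gamma equals 1.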
Let us suppose a GSZ agent meets up with an agent with the time-indifferent goal of maximizing human happiness. To the first order, the GSZ agent is perfectly happy to see the happiness maximizer dominate the resources of the part of the future shared by both agents — at least if the happiness maximizer is agent-indifferent as well as time-indifferent. Again, GSZ has no preference as to what eventual end the creativity of the universe is applied to: human happiness is just as good as anything else as far as GSZ is concerned. It would however disapprove of the premature pursuit of happiness.
I included the disclaimer “to the first order” because the GSZ agent has to consider what happens if the other agent meets a third agent intent on maximizing, e.g., gold atoms. The second and third agents might fight, and of course a fight has the potential to expend resources that could have gone into maximizing creativity. It is in the interest of GSZ to try to prevent the fight. Yes, one way to do that is for the GSZ agent to destroy the second agent — or to deny it resources.
But life as an SI loyal to an agent-indifferent goal system differs significantly from life as a human. For one thing, any SI Eliezer or I would endorse would know how to create SIs with any goal system the SI cares to specify. (This is a side effect of its knowing how to improve itself). This gives two SIs a powerful method to reach a compromise that is unavailable to humans: in this method, the SIs agree to destroy themselves while simultaneously creating a new SI whose goal system is a composite of the goal systems of the “parent” SIs. (E.g., the utility function of the child SI is the sum of the utility functions of the parent SIs. The child SI can be used to verify the destruction of the parent SIs.)
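The parenthetical example, a child whose utility function is the sum of its parents' utility functions, can be sketched in a few lines of Python. Everything named here is hypothetical and chosen only for illustration: the outcome fields `happy_humans` and `gold_atoms`, the two parent utility functions, the `compose` helper, and the candidate outcomes and their numbers. It is a toy model of the summation idea, not a claim about how an actual SI would represent goals.

```python
from typing import Callable, Dict

# An outcome is modeled (hypothetically) as a mapping from feature name to amount.
Outcome = Dict[str, float]
Utility = Callable[[Outcome], float]


def happiness_utility(outcome: Outcome) -> float:
    """Parent 1 (illustrative): cares only about the number of happy humans."""
    return outcome.get("happy_humans", 0.0)


def gold_utility(outcome: Outcome) -> float:
    """Parent 2 (illustrative): cares only about the number of gold atoms."""
    return outcome.get("gold_atoms", 0.0)


def compose(*parents: Utility) -> Utility:
    """Return a child utility function equal to the sum of the parents' utilities."""
    def child(outcome: Outcome) -> float:
        return sum(parent(outcome) for parent in parents)
    return child


child_utility = compose(happiness_utility, gold_utility)

# Made-up candidate futures, purely to exercise the functions above.
candidates: Dict[str, Outcome] = {
    "all_gold": {"happy_humans": 0.0, "gold_atoms": 10.0},
    "all_happiness": {"happy_humans": 8.0, "gold_atoms": 0.0},
    "split": {"happy_humans": 6.0, "gold_atoms": 6.0},
}

# The child picks whichever candidate scores highest under the summed utility.
best = max(candidates, key=lambda name: child_utility(candidates[name]))
```

With these made-up numbers the child prefers the outcome that serves both parents partially over either parent's preferred extreme. A plain sum also lets the parent with the larger numerical scale dominate, so a real compromise would presumably need some agreed weighting or normalization; that is my assumption, not something the text specifies.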
The availability of this high-reliability method for compromise suggests that if two SIs with conflicting agent-indifferent goal systems meet, they will expend very little of their resources fighting. So, GSZ’s motivation to destroy or hobble another agent-indifferent goal system to prevent it from fighting a third agent-indifferent goal system is likely to be weak. It seems to me that GSZ is more likely to provide positive assistance to an agent-indifferent time-indifferent goal system than to hobble or destroy it.
Note that when the goal systems under discussion are agent-indifferent, it seems harmless to blur the distinction between a goal system and an agent loyal to the goal system, which is what we have done in the previous couple of sentences.
The helpfulness that GSZ is expected to extend to an agent loyal to an agent-indifferent time-indifferent goal system does not extend to an agent with a goal system that lacks one or both of those two properties. GSZ is likely to consider these other agents to be rivals, but as far as I can tell, GSZ is no different from most goal systems in that regard, the exception being goal systems deliberately designed for cooperativeness or helpfulness.
Tags: machine ethics
So a GSZ agent maximizes the ability-to-get-things-done of the universe. Wouldn’t it then oppose a gold-maximizer, since a universe full of gold atoms is not maximally able-to-get-things-done? Or is the idea that a time-indifferent gold-maximizer would not produce gold (unless it was certain it was in a finite universe), but seek to refine its model of reality indefinitely (and thus cooperate with GSZ)? If so, doesn’t this mean you would consider the creation of a time-indifferent agent-indifferent gold/happiness/… maximizer just as good as the creation of a GSZ agent?
Also, what do you think of Peter de Blanc’s analysis of unbounded utility functions?
Nick, you got it.
I consider it almost as good. The article gives one reason why it might be suboptimal, and I will repeat the heart of the reason now: “The [happiness maximizer] and [gold maximizer] might fight, and of course a fight has the potential to expend resources that could have gone into maximizing creativity. It is in the interest of GSZ to try to prevent the fight.”
Is there something about my blog that makes it harder to read than, e.g., Eliezer’s blog? If so, I will get a blog of the type (Typepad) Eliezer got.
My benefactor Garcia believed that an optimal strategy for maximizing the creativity of the universe was simply at every moment of decision to increase creativity as much as possible. I have always assumed that is true because I cannot imagine some other property of the universe Q such that at some moment of decision, maximizing Q has greater expected utility (under GSZ’s definition of utility) than maximizing creativity. If that is indeed true, then obviously the implementor of GSZ will not need the utility of any outcome to rise above all bounds (because it need not refer to eventual outcomes at all; it need refer only to immediate outcomes).
ISTM that de Blanc’s result, or something close to it, should apply to that situation as well; for an action A, there should be an infinite sequence of hypotheses about the world for which the magnitude of the immediate utility of A increases faster than probability decreases, so E[U(A)] diverges. Also, I assume you wouldn’t want to greatly increase creativity now in exchange for a certainty of overwhelming loss in the future, in which case don’t you have to refer to outcomes?
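A toy instance of the kind of divergence described above, with numbers invented purely for illustration (this is not de Blanc’s actual construction): let hypotheses h_1, h_2, ... have probabilities that halve at each step while the magnitude of the immediate utility of A triples.

```latex
P(h_n) = 2^{-n}, \qquad \lvert U(A \mid h_n) \rvert = 3^{n}, \qquad n = 1, 2, 3, \dots
\\[4pt]
\sum_{n=1}^{\infty} P(h_n)\, \lvert U(A \mid h_n) \rvert
  \;=\; \sum_{n=1}^{\infty} \left(\tfrac{3}{2}\right)^{n} \;=\; \infty .
```

The probabilities sum to 1, yet each term of the expectation is larger than the last, so E[U(A)] has no finite value.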
Good point, Nick. I retract my argument for the position that GSZ does not suffer from the unbounded-utility problem.