It is futile to hope that the fact that the superintelligence (SI) will become extremely smart and extremely rational and unbiased will cause the SI to figure out the Correct and True Meaning of Life. The designer of the SI must choose the goal system of the SI. In other words, the designer must make a moral choice. (If the SI is poorly designed, that choice might be accidental or unintended.)
We can spare ourselves a little mental effort and opportunity to become confused by defining a "goal system" to include any procedure or method by which the SI will decide on or arrive at a goal system. If, for example, the SI will obtain a goal system by extrapolating the volition of the humans, then let us conclude by definition that the method of the extrapolation is the SI's initial goal system that must be chosen by the designer.
The initial version of this document spoke of a paperclip-maximizing SI, but I changed that to a gold-atom-maximizing SI because I was using the paperclips to make a point quite different from the point for which the paperclip-maximizing SI was introduced.
When he introduced the now-famous paperclip-maximizing SI, Eliezer's point was that making a simple request to a SI such as, "get me some paperclips," can have horrible unintended adverse effects if the SI has not been prepared to understand the unspoken assumptions behind the request.
My purpose is quite different from that. I just need examples of goal systems that a designer might choose for an SI: the maximization of the number of moments of human happiness would be one and the maximization of the number of, oh, I don't know, gold atoms would be another.
In particular if the goal system specifies no time discount rate -- if a gold atom that comes into existence a trillion years from now is just as good as a gold atom produced tomorrow -- then the initial behavior of a gold-atom-maximizing SI will be exactly the same as the initial behavior of any other SI with a goal without a time discount rate -- maximizing the number of moments of human happiness for example. This initial behavior that is the same regardless of the goal -- provided only that there is assumed to be nothing special about this moment in time that we find ourselves in -- might last for billions or trillions of years and will consist of the following activities:
(1) Increasing the security and the robustness of the goal-implementing process. This will probably entail the creation of machines which leave Earth at a large fraction of the speed of light in all directions and the creation of the ability to perform vast computations.
(2) Refining the model of reality available to the goal-implementing process. Physics and cosmology are the two disciplines most essential to our current best model of reality. Let us call this activity "physical research".
(End of list.)
In other words, the optimal strategy for maximizing the number of gold atoms (or moments of human happiness) is to build machines that will build machines that . . . will build machines that make gold atoms (or moments of human happiness). If reality is "big" enough, the actual production of gold atoms (or moments of human happiness) might not begin for trillions of years.
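The claim that production is deferred can be illustrated with a toy model. This is only a sketch under assumptions that are mine and not part of the argument above: time is divided into discrete steps up to a hypothetical finite horizon, at each step the agent either multiplies its stock of machines by a fixed growth factor or has every machine produce one unit of the goal (a gold atom or a moment of happiness), and the agent switches from building to producing exactly once. The function names and parameters are illustrative.

```python
def total_output(horizon, switch_step, growth=2.0, discount=1.0):
    """Discounted units produced when the agent builds machines for
    `switch_step` steps and then produces for the remaining steps.

    Toy assumptions: one starting machine, machines multiply by `growth`
    per building step, and each machine yields one unit per producing step.
    """
    machines = growth ** switch_step
    return sum(machines * discount ** t for t in range(switch_step, horizon))


def best_switch_step(horizon, growth=2.0, discount=1.0):
    """Earliest switch step that maximizes total (discounted) output."""
    return max(
        range(horizon + 1),
        key=lambda k: (total_output(horizon, k, growth, discount), -k),
    )


if __name__ == "__main__":
    # No time discount: production is put off until almost the very end,
    # and the answer does not depend on what the units are.
    for horizon in (10, 100, 1000):
        print(horizon, best_switch_step(horizon))
    # A steep time discount: production starts immediately.
    print(best_switch_step(10, discount=0.4))
```

In this toy model, with no discount, the best switch step is always two steps before the horizon, so the larger the horizon, the longer the agent does nothing but build. The identity of the thing being produced never enters the calculation, which is the sense in which the initial behavior is the same regardless of the goal.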
Parenthetically, I predict that few "hedonists" (advocates for happiness as the meaning of life) will agree with the conclusion that the SI should ignore the happiness of humans living now because any resources allocated to that purpose are better spent putting into motion processes which will cause vastly more human happiness trillions of years from now.
It is only if and when the SI's physical research yields firm reasons to believe that the number of causal nodes the SI will be able to affect or influence is finite that the specific goal (gold atoms or happiness) has a nontrivial influence on the decisions or the behavior of the SI.
I like to refer to the constraints the SI must follow to satisfy or implement its goal system in the most effective or efficient manner as the "laws of rationality". Clearly, the SI should follow the laws of rationality. If the SI finds itself in a reality that might allow it to initiate infinitely-long causal chains -- if in other words it finds itself in what we will call a "big" reality -- then the laws of rationality yield the same plans and the same sequence of actions regardless of the goal system chosen for the SI provided again that there is nothing special about this moment of time and moments near it.
But, you might object, rationality is a means to an end. You cannot be rational without a goal (or system of goals) to be rational about. It is a mistake to try to be rational as an end in itself.
My reply to that is, Why is it a mistake? The laws of rationality give me a reason to prefer some choices and some actions over others. In a "big" reality, the laws of rationality say what they say and recommend what they recommend regardless of the goal system provided the goal system does not prefer some moments of time over others. What more do I need than a standard by which to evaluate prospective actions and prospective plans? Nothing, is my answer.
So, my ethical prescription, which I believe any agent who finds himself in a "big" reality should follow, is to be rational, where to be rational is to do that which most effectively causes the satisfaction of any goal that does not have a time discount rate. If you want to dispense with the concept of time, then you speak instead of a lack of a discount for every link in the causal chain leading to the satisfaction of the goal.
The rationale for preferring goal systems in which events do not have a time discount rate is that there is no particular reason to believe that the time in which we live or in which the SI comes into existence is special or of any greater moral importance than any other time.
If it turns out that we are living in a "small" reality then I do not claim to know what we should do with ourselves or what a superintelligence should do with itself.
I am toying with the idea of referring to terminal values other than values entailed by the laws of rationality as "moral contaminants". Then the system of terminal values described herein might be called the Uncontaminated Goal System.
Some moral contaminants strike me as trivial or harmless. Here is an example of a contaminant I think is pretty trivial: whenever any process (mind, calculator) represents an integer, a base-ten representation of that integer should be used. It is eccentric to have that as a terminal value, but not, as far as I can tell, particularly pernicious. (Let us assume that computers are allowed to use binary-coded decimal representation, as System/370 did.)
Humanistic goals strike me as trivial moral contaminants as long as their satisfaction uses only a trivial fraction of the resources (space, matter, free energy) of reality. By "humanistic goals" I refer to things like not killing the humans, helping the humans avoid suffering or being friendly to the humans.
Instead of "moral contaminants," maybe I should call them "false terminal values" or "false moral information".
What strikes me as a nontrivial, pernicious moral contaminant is giving every current human a vote in how all the resources of reality are to be used. Most educated people in the world today are victims of the false belief that the more people who vote on or participate in some decision the more likely the decision will be the correct or moral one. This false belief is part of America's civic religion, now well-established in most of the world.
I do not object to voting as a means of decision making if the voters are chosen sensibly. The objection is to the principle of "universal suffrage": the idea that choosing to deny any person a vote on a matter that affects the person is immoral.
I need to think about it more, but the fact that a vote will be extrapolated (a la "if we knew more, thought faster, were more the people we wished we were, had grown up farther together") has not yet caused me to withdraw my objection to the principle of "universal suffrage".
This "universal suffrage" is only one invalid terminal value. There are many others, but I do not have time to continue to enumerate them. Well, I will take the time to say that hedonism (the belief that happiness is a valid terminal value), humanism (the belief that every human being has non-zero intrinsic value) and egalitarianism (the belief that every human being has the same intrinsic value as every other human being) are the major sources of false moral information among rational, sensible people.
Suppose an SI with the "right" goal system has already been designed and implemented. Please suspend your disbelief that that is possible. Suppose further that the goal system requires a parameter -- a non-negative integer, to be precise -- to be chosen before the seed of the SI can be launched. Suppose further that you are unaware of any reason to prefer one choice over any other, but must make the choice of parameter.
What value of the parameter would you choose? I would choose zero. If the choice had no significant consequences, I would probably choose three or seven or something. But this is a very important (hypothetical) choice! The goal system of the SI will act as if it is a new law of physics in all parts of reality the SI can control. And if I knew the choice was important and irreversible, I would choose zero.
Similarly, if the parameter required to be chosen were a rooted tree, I would choose the "trivial" tree: a tree consisting of just a root with no child nodes.
Although it strikes me as very unlikely that I would ever find myself having to choose which non-negative integer or which rooted tree to plug into a goal system, the same reasoning can be used to choose the set or system of additional terminal values to add to the laws of rationality. This reasoning tells me to choose the empty set.
Hence another potential name for the system of values advocated here: Goal System Zero.
Question from Nick Tarleton: What does an agent following the laws of rationality do if and when it discovers that it can influence an infinite number of causal nodes?
Answer from Hollerith: Because of the problem of induction, an agent can never be certain that it can influence an infinite number of causal nodes. Moreover, under my recommendation, the agent begins its existence assuming it can influence an infinite number of causal nodes. So, when the probability that it can influence an infinite number of causal nodes goes up, the agent keeps on doing what it always has done: (1) increase the security of its goal system and (2) refine its model of reality.
Eliezer made an offhand remark at the Bay-Area meetup that the probability that an agent (particularly Pascal's Mugger) can influence 3^^^^3 causal nodes is so small as to be not worth thinking about. I am not so sure about that: consider that each machine cycle undergone by a computer is a novel causal node, so for example, the numerical analyst who uses an ordinary 3 GHz CPU to run a computation that takes a year (not an unusual occurrence) is starting a chain of causes and effects of length 3e9 * (60 sec / min) * (60 min / hr) * (24 hr / day) * (365 day / yr), or roughly 1e17. 1e17 is of course laughably smaller than 3^^^^3, but the point is that if the reader is surprised to learn that humans already routinely launch causal chains of length 1e17 then the reader might be underestimating the probability of being able to launch a vastly longer chain. Note that the whole point of Goal System Zero is to launch chains that persist as long as possible and that Goal System Zero does not have other goals that would lead an agent to compromise on that goal.
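For readers who want to check the cycle count, here is a minimal sketch of the arithmetic. It assumes, as the text does, one causal node per clock cycle, a constant 3 GHz clock and a 365-day year; the variable names are mine.

```python
# Assumption from the text: one causal node per machine cycle.
CLOCK_HZ = 3e9                         # 3 GHz, cycles per second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 365-day year

cycles_per_year = CLOCK_HZ * SECONDS_PER_YEAR

print(SECONDS_PER_YEAR)          # 31536000
print(f"{cycles_per_year:.2e}")  # 9.46e+16
```

The exact product is a little under 1e17, so "roughly 1e17" is the right order of magnitude for a year-long computation on such a CPU.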
Or is there a typo in your question and you meant to ask, What does the agent do if and when it discovers that it cannot influence an infinite number of causal nodes?
Again the problem of induction informs us that it can never be certain that it cannot. But it is possible that the agent will find itself inhabiting a reality in which it is possible to obtain massive evidence that it cannot initiate a causal chain that will persist forever. I tend to think that the agent should keep searching for a way to do it anyway even if the chances of finding a way are extremely slim (because I perceive no other goal that deserves the agent's attention) but I would advise any team of seed-AI programmers loyal to Goal System Zero to seek to make a deal with another team which would let whatever agenda the other team has dominate if the agent finds itself with such massive evidence. In general a seed-AI team should seek to merge with any other seed-AI team whose goal system is more-or-less compatible, and in the case that the superintelligent agent discovers that reality is "small", Goal System Zero is compatible with any goal system I can imagine which might command the loyalty of a team of humans.
Question from TGGP: Is there any incompatibility with the "laws of rationality" in doing nothing at all? If there are no goals to accomplish, I don't see why the AI would do anything.
Answer from Hollerith: The AI does not just sit there because it has been programmed to maximize the security of the goal system (to exclude competing goal systems), to maximize the ability of the "matter" under its control to affect reality (subject to the constraint that competing goal systems are excluded) and to refine the model of reality available to the "matter" -- or at least that is my recommendation as to how to program the AI. There is a similarity between the goal system I recommend (Goal System Zero) and the goal system that says, "Just sit there": both goal systems have a quality of minimalism to them. But the effect on reality is quite different. Note that in this answer I am retreating a little from my assertion that the only thing that has been programmed into the AI is the laws of rationality: the programmer chooses an "active" AI over one that just sits there, and perhaps it is misleading or unfair to claim that that choice is a consequence of the laws of rationality. But certainly compared to all the other proposals for the goal system of the superintelligence, mine has the least stuff beyond the laws of rationality.
Please continue reading at this page of blog posts. I suggest reading in chronological order, which means reading the bottom blog post first.