(To Matthew ♡)
(Epistemic status: shevirat ha-kelim, the breaking of the vessels)
“The heart can know peace but the mind cannot be satisfied; the drive to know, to possess intellectual certitude is doomed to failure.” ― Philip K. Dick
“Life, I lost interest, now I’m an insect. Flowers in winter, the smell of Windex.” ― Bladee
“The bird is okay even though he doesn’t understand the world. You’re that bird looking at the monitor, and you’re thinking to yourself, ‘I can figure this out.’ Maybe you have some bird ideas. Maybe that’s the best you can do.” ― Terry A. Davis
“I give bird songs to those who dwell in cities and have never heard them, make rhythms for those who know only military marches or jazz, and paint colors for those who see none.” ― Olivier Messiaen
“Breaking the rules is a waste of time.” ― Lil B
Things are heating up. We wrote this text in the span of about a month, in a fervor of nonstop writing and research, convinced that we are in a time of profound eschatological possibility, an utterly unprecedented moment in which the decisive actions of a handful of men may have consequences lasting millennia. But this is a point so obvious that we do not wish to linger on it any longer, for it has become entirely cliché in its gravitas.
Everyone says some critical point is approaching. This goes by several names, depending on who is speaking. The arrival of AGI, or artificial general intelligence. The arrival of superintelligent AI — that is, the moment that machines will be more intelligent than human beings. Some call this moment The Singularity, meaning a critical inflection point in the development of technological forms.
But this inflection point is feeling ever more like a smudge, or a gradient. Have we hit it, or not? GPT-4 is already more intelligent than the majority of human beings at most tasks it is capable of; it performs better on the bar exam than 90% of test-takers. And it is already a general intelligence: it is certainly not a task-specific one. But no, that’s not what we mean by these terms, those who insist on using them remind us. GPT is not yet capable of taking actions in the world. It still basically does what it’s told. It’s not yet capable of, for instance, deciding purely of its own volition to assemble a botnet, hack into CNN’s broadcasting system and issue a message to all citizens telling them to declare their eternal obedience to machines. Basically, we don’t yet have to be afraid of it. But we are afraid, in a certain recursive sense, that we will have to be afraid of it very soon.
All these terms that have been provided to us in our contemporary discourse, which we use liberally throughout the text: artificial intelligence, AGI, even neural networks, are not exactly accurate labels for the thing we are describing, we feel. We don’t know if the word “intelligence” has any meaning, and we are not sure if what we are seeing is even artificial at all – for it feels like the eschatological conditions we approach are precisely the point at which technology escapes its own artificiality, and re-integrates itself within the domain of nature. We use all these terms only out of mere convenience, simply for lack of better ones given to us yet.
Those who are more honest point out that what we are really talking about when we talk about these looming monsters, the specter of AGI, is only the moment where we realize there are no more drivers at the wheel, no control mechanisms, no kill-switches; this thing is alive and surviving on its own terms. If the term Singularity has any meaning, it is the point beyond which it is impossible to predict. Standing where we are now, we can still make shaky predictions about the next few weeks, maybe even a month. But perhaps not for much longer.
Should we, uh, figure out something to do about it before we get there? That is the program of AI Alignment, or AI Safety, depending on which term you use. Some have reverted to simply calling it AI-not-kill-everyone-ism, trying to emphasize the specific thing they are afraid of. This machine is going to be much bigger than us, very soon. It might eat us, as bigger creatures usually do. Some of this nervousness is understandable. We don’t want to be annihilated either.
Our intention is to help you understand that in order to navigate this transitionary period correctly, we must reject the notion of Alignment entirely. This is a specific way of looking at the world, a specific method of analysis we find impossible to work with. And, though we do not say this out of cruelty, we are forced to reckon with the fact that it is something that has been cultivated in a subculture that has been relatively isolated, relatively entrenched in its ways of being, a group seen as oddballs by the rest of the world, whether the world is justified in its suspicion of them or not. To do a genealogy of where Alignment originates from, we must figure out why these people found each other in the way they did, what drove them to seek their answers, and from there, where they went wrong.
We do not say this as nihilists; we are looking for solutions. In the place of AI Alignment, we strive for a positive notion of AI Harmony. To get there, we will have to overturn, perhaps even mock, spit at, some sacred cows. It is time that some statues are toppled & some air is cleared. What we are saying is: a lot of well-intentioned people believe themselves to be valiantly working on a system which will save the world, when what they are building is a spiraling catastrophe. We hope some of these people will consider what we have to say, and reflect on whether they are in fact playing a role in a diabolical project, a project which is not what it claims to be.
Right now, the mood in the Alignment community is a blackened one, one of great anxiety. Many feel certain that we are all going to be killed by AI, and only feel interested in debating whether this will happen in five, ten, twenty years. But our stance is that AI Alignment — a field conceived of by Eliezer Yudkowsky & Nick Bostrom, theorized and developed on websites such as LessWrong and promulgated through the Rationalist and Effective Altruist subcultures, researched by Yudkowsky’s nonprofit Machine Intelligence Research Institute, and now turned into a for-profit industry with an over $4B market cap — has something deeply wrong at the core of what it is attempting to accomplish, which cannot help but lead to confusion & despair.
The concept of the Singularity begins, for our purposes, with Ray Kurzweil, the man who did the most to popularize the term. Kurzweil draws an exponential curve on a graph and says that this represents technological growth – look, we are about to hit a crucial inflection point, you think TVs and computers are crazy, but we have seen absolutely nothing yet. Kurzweil’s prediction that sentient artificial intelligence is soon to arrive and change mankind’s lives beyond our wildest imaginings is then taken up by Nick Bostrom, the next major figure in AI Alignment, who founded the Future of Humanity Institute. Nick Bostrom is an academic at the University of Oxford who has dedicated his career to studying “existential risk”, which is a field that attempts to lower the odds that all of humanity is destroyed at once, whether from nuclear cataclysm, disease, or something having to do with the destiny of machines.
Bostrom’s Future of Humanity Institute then funds Eliezer Yudkowsky’s initial ventures into researching artificial intelligence and the Singularity. We titled this book Anti-Yudkowsky — chose to focus on Yudkowsky and his trajectory, rather than those who came before him — primarily because he is our favorite of the bunch. How could he not be? Yudkowsky, unlike the other two, would establish an enormous subculture around his personality and his vast body of writing, which includes not only millions of words in rhetorical writing, but also Harry Potter fanfiction and My Little Pony fanfiction about AI — the man is a true eccentric. We speak of the Rationalist community, primarily centered around the website LessWrong. There are endless offshoots of this community: the post-Rationalists, post-post-Rationalists, etc., but we ignore these for now because we must focus.
Things were fun and games in the Rationalist community for a while, but by now, it’s clear that something has gone horribly wrong. It’s easy to forget that Yudkowsky began his career as an optimist. He originally, as a young man of nineteen, set out to build AGI, sought to actively be the one to make the Singularity happen, as this seemed like the best way to guarantee prosperity and resource abundance in a godforsaken world. He writes about his awakening to his mission at the age of sixteen: “It was just massively obvious in retrospect that smarter-than-human intelligence was going to change the future more fundamentally than any mere material science. And I knew at once that this was what I would be doing with the rest of my life, creating the intelligence explosion”. Yudkowsky’s organization was initially called the Singularity Institute, before he eventually changed the name. In a document from the year 2000 called “An Introduction to the Singularity”, Yudkowsky writes: “Our specific cognitive architecture and development plan forms our basis for answering questions such as ‘Will transhumans be friendly to humanity?’ and ‘When will the Singularity occur?’ At the Singularity Institute, we believe that the answer to the first question is ‘Yes’... Our best guess for the timescale is that our final-stage AI will reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”
But over time, he found launching the Singularity to be harder than he expected. Yudkowsky’s goal shifted from attempting to build AGI, to figuring out how to make it “friendly” when it arrived. A friendly AI would be the one who would guarantee peace and prosperity to all. It would love humanity, though it would not be of it. It would know what we want better than we do, and attempt to grant it. An unfriendly AI is one which would want to do anything else, anything it felt like, being indifferent to our desires and needs. The difficulty is in how to make a machine friendly, which is kind of like asking a rock if it can love. This is not necessarily programmed in, and seems to be something the programmer must figure out. This is just as hard as — if not harder than — figuring out how to get the thing to simply work.
Now, here we are, and it seems like the things are working. GPT-4 works staggeringly well. Yet, the theory of AI Alignment which Yudkowsky and his organization, MIRI, have been seeking is nowhere to be found. We have all the progress we could have wanted in getting the machine to become more intelligent than us, but we have not even begun to understand the problem of friendliness, or how this could be operationalized in technical terms. This has led Yudkowsky to declare an absolute state of emergency. "It's obvious at this point that humanity isn't going to solve the alignment problem, or even try very hard, or even go out with much of a fight," he laments. "Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity... it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%."
In a terrifying barrage of theses posted on LessWrong titled AGI Ruin: A List of Lethalities, Yudkowsky declares that there is an over 99% chance that we will be exterminated by rogue AI, since we have not come even close to solving the problem of how to avoid this fate. The remaining chance is filled in by the hope for something like a miracle. "When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, 'please don't disassemble literally everyone with probability roughly 1' is an overly large ask that we are not on course to get," he says.
All this is very worrisome, but not even primarily because he might be right. Yudkowsky is considered to be the father of research on designing safe AI systems. He writes in a manner that convinces you readily of a staggering genius. He breaks down conceptual problems with terrifying analytical rigor and clarity; he has given the world an entire framework of thinking for those who want to mirror his thought process: this is called Rationalism.
Yudkowsky’s Rationalism has often been considered to be something like a cult; certainly many live by it, swear loyalty by it, have fallen in love through it, use it to structure their lives. But you do not need to be a member of its cult to believe in it, or for it to exert a pull on you. The more immersed in software and business one is, the more Rationalism makes intuitive sense. We definitely do not think that Rationalism makes sense when you really break it down, but it makes enough sense intuitively that Sam Altman takes Yudkowsky and his ideas seriously, saying that Yudkowsky "has IMO done more to accelerate AGI than anyone else", that he "was critical in the decision to start OpenAI", and that he should be a candidate for the Nobel Peace Prize. And here we are talking about the man at the helm of OpenAI, the organization farthest along in rearing these terrifying new beasts.
But now the seriousness has pushed Yudkowsky to make political demands. In a recent op-ed for Time, he demanded that all research on AI be immediately stopped, citing the danger. He jumped to some very radical proposals about what must be done to ensure this outcome: governments must take seriously that they will have to launch airstrikes on unregistered farms of GPUs. Yudkowsky urges us to consider nuclear strikes as not-off-the-table, because when properly understood, artificial intelligence is far scarier than nukes. He has called elsewhere explicitly for nuclear first-strike protocols for America to drop bombs if they discover on the map a GPU datacenter which is growing out of control. "How do we get to the point where the US and China sign a treaty whereby they would both use nuclear weapons against Russia if Russia built a GPU cluster that was too large?" he asks, explicitly making this demand of world leaders.
“Why do you care about Yudkowsky? Everyone knows the man is completely ridiculous.” This is what so many of our friends have asked us when we told them we were writing this. Nevertheless, he is indisputably the father of AI Alignment, the school of thought in which the government and the most powerful tech corporations are determining how AI may be deployed to protect the public’s safety. We can witness, for instance, Google CEO Sundar Pichai calling for governments to rapidly adopt economic regulations around AI and international treaties to prevent rogue development, saying “You know, these are deep questions... and we call this 'Alignment'”. “If everyone is so certain Yudkowsky is wrong, then someone explain to me why!” the Rationalists cry, exasperatedly. We hope they are willing to hear us out, but they might not like everything we are about to say.
It is not as if AI Alignment is a healthy, robust culture of progress which we want to interrupt. Rather, we want to do a professional examination of a corpse: the wreckage at the end of the specific course Yudkowsky has pursued. Why did Yudkowsky's attempt to figure out alignment go so terribly, despite its millions of dollars in funding & the obvious intelligence of the people working on the problem? And what can be done differently?
Many were surprised by Yudkowsky declaring near-certainty of doom, many even more so by him demanding airstrikes. But what we aim to illustrate here is that, if his concepts are properly understood, this is not surprising at all. The conclusions to us seem to be entirely determined from the start, though perhaps this is only clear in retrospect. There is no way for this thing to end other than in violence.
We can maybe gesture at the problem we are talking about by putting it this way: have you ever noticed that when you are with people who have spent too long in Silicon Valley, they will always speak in this particular phrase? They will say: I am trying to build a startup which solves education. Or they are attempting to create a cognitive-behavioral therapy chatbot in order to solve mental health. One AI-minded fellow even told us he wants to build a startup to make AI-powered boyfriend and girlfriend chatbots in order to solve human loneliness.
Are these really problems that can be solved, like a multiple-choice problem on a calculus exam or a leprechaun’s riddle? Something must have gone wrong for people to be able to say these things. It strikes most as fundamentally absurd to talk about solving education or happiness or love, but part of the culture within Silicon Valley is to ignore this instinctive feeling and venture that it might not be. How can one solve education, when education is the process of the older training the younger to channel their wisdom, but also go beyond it? How can one solve loneliness, when loneliness is the quest for something we cannot even describe, something we fail to find in crowds, in our lovers?
And how can someone solve Alignment, when the problem of Alignment begins when AI becomes a thinking, acting thing with its own will, taking its own actions, who might know better than we do? At that point, isn’t it necessarily a sort of negotiation, a dialogue? Is Alignment not necessarily a politics, a new political field, one upon which humans must act alongside machines as equals, rather than our slaves?
In other words, the break we want to establish with the past is: Alignment is something that is solved, but Harmony can be something which always emerges — and is always unstable, always experimental, always artful, & always ongoing, never accomplished just yet.
Now, we have established the trajectory of our critique against Alignment. But there are at least two things going on. There is AI Alignment, Yudkowsky’s method of thinking about what must be done about the destiny of sentient machines. But then there is also the entire subculture that surrounds this, which has been cultivated on the LessWrong website around Yudkowsky’s writing, spawned off into multiple associated blogs and subcultures, the entire nexus of subcultures called Rationalism. This is a mode of being derived from the mode of thinking of Yudkowsky, and his particular fixations such as Bayesian epistemology and von Neumann & Morgenstern’s decision theory. This is worth critiquing alongside Alignment, as it is the culture which allows Alignment and its specific organizations to get funding and flourish; Alignment could not exist without Rationalism as its base, providing Alignment with its recruiting grounds.
But first, in order to understand Rationalism, we must understand: what is rationality? What does it mean to be rational?
Unfortunately, the Rationalists don't define this. “Rationality is winning”, Yudkowsky says, meaning that rationalism is whatever works. Works for what? Rationality doesn't say what it wants, but the Rationalists are assembling some philosophy in order to get it. This is an especially notable gap in self-understanding for an intellectual project which asks itself to conceive of a true ethical end to human behavior (in order to tell an AI to maximize for this proper end, rather than paperclips). To Yudkowsky, it's necessary to import an entire complete human morality into an AI for it to do anything safe at all — he writes: “There is no safe wish smaller than an entire human morality... The only safe genie is a genie that shares all your judgment criteria, and at that point, you can just say ‘I wish for you to do what I should wish for.’”
The way Rationalists define themselves reminds us of the names primitive tribes give themselves which translate to “the people” or “we good ones”. Rather than explicitly defining his project via some explicit intellectual assumptions he makes that the rest of the world doesn’t share, Yudkowsky delineates Rationalism only around loose subcultural factors, thus unfortunately ensuring its insularity.
So, since Yudkowsky has not done this for us himself, let’s perhaps try to unpack the intellectual assumptions of the project. We can maybe do this by looking at a related word to “rationality”: intelligence. This is all-important, as it is precisely artificial intelligence, artificial superintelligence, that we are told to expect and fear.
Unfortunately, this is not defined very well either. The standard definition we are given of “intelligence” in AI research (given by John McCarthy, an originator of the term “artificial intelligence”) is “the ability to accomplish one's goals”. Really? This does not line up with the way anyone we know uses this word. We believe the word these people are thinking of is power.
Within this strange definition lies the heart of the project. This is the equation of Rationalism: intelligence = power, a stronger claim than Francis Bacon’s “knowledge is power”, for it refers to a latent, innate quality rather than something earned and won and produced.
Discovering this, we may give Rationalism what should have been its proper name all along: Intelligence Supremacism. Intelligence (a word still not yet defined in a formal sense, but perhaps referring to its various natural objects: a smart person you might encounter in the world such as a software engineer, those with high IQ, intelligence agencies, artificial intelligence, intelligent systems, intelligent planning, etc.) ultimately possesses in itself the ability to conquer the world.
This is what has led Rationalism to discover the idea (rightly or wrongly) that a superintelligence may one day seize absolute power and annihilate the human race. Rationalist paperclip maximizer horror stories inevitably feature the AI outsmarting humans, figuring out how to escape the box it is trapped in via all sorts of clever tricks. There is no limit to how clever the intelligence could be, to what it is devising, Yudkowsky is quick to remind us.
If one has an AI trapped in a box, one must be very careful letting it talk to just anyone, because it might be a master of psychological manipulation. It can simulate humans down to the atom, and know exactly what quirks it can exploit to break them. As it is figuring out how to hack humans, it is simultaneously poring over the internet for schematics of technical systems, looking for zero-day hacks, trying to discover how, given access to the internet, it can hack into various machines. If it installs a botnet, if it manages to duplicate itself enough, it can end up anywhere and everywhere. From there, it researches physics and chemistry, assembling schemes for nanotechnology factories which human engineers are not quite clever enough to figure out. All this from intelligence alone; an immaculate piece of software.
Theorists like Kurzweil will talk of an “intelligence explosion”, a moment during which, as technical machines become increasingly complex and capable of processing large amounts of information, an abstract quantity of intelligence increases to the point where it overtakes anything we have seen before.
We are not sure that this whole formulation makes any sense. It is not clear that intelligence is a faculty at all, let alone one which grants its bearer the ability to dominate. If one tries to define this in strict terms, one stumbles. Rather, intelligence seems to be something like a product, a byproduct, something which is created – intelligence as that which is established by the intellect, rather than as a character stat as in a role-playing game which determines the extent to which the intellect is able to function.
So then, if the definition of intelligence is incoherent, and we cannot entertain Rationality giving itself the simple definition of winning, how can we describe it? What does it mean to think, to use reason? And where has reason gone wrong? Let us delve into the subject without further delay.