#2 – AGI: ultimate metawork, ultimate risk

Hello there, friend!

Last week, I fell down a bit of a rabbit hole, and a scary one at that. Ever since I saw what ChatGPT can already do, I'd been getting worried. Suddenly, it became clear that Artificial General Intelligence (AGI, i.e. a system that is at least as general a problem solver as people are) is possible, and likely not too far off. But the reason for my concern wasn't that I thought the Terminator was likely on its way. After all, we don't try particularly hard to wipe out all the ants, for example.

What scared me was arguably the best-case scenario – say AGI solves all our problems and does all our work, so we're free to focus on whatever hobbies we want. How would our society even adapt to this? For better or worse, work is a major source of meaning for us – it's how we coordinate and also how we decide how to distribute wealth. Whether you like this system or not, a switch to something very different will be very challenging. And as I said, that's the best-case scenario.

But last Wednesday, the non-Terminator AI assumption took a serious hit. That's because according to Eliezer Yudkowsky, one of the leading AI alignment researchers, AI is going to wipe out humanity and there's no longer any hope of stopping it (great summary here, my summary follows). His reasoning goes very roughly like this:

  • Super-intelligence (AGI that's smarter than all of humanity combined) is coming, because a lot of progress has already been made and there are huge incentives for progressing further – not only for humanity as a whole but also for whoever develops their AGI first. Because of that, companies are dumping more and more money into developing AI capabilities while deemphasizing safety research, which would only slow them down. And it seems like creating a super-intelligence will be easier than expected – just scaling up computing power and training data has already led to huge advances.

  • So super-intelligence is coming. Unfortunately, our current methods of aligning AI systems (i.e. preventing any negative side-effects) are extremely crude – we're able to specify what the AI should optimize for, but we suck at setting any guardrails around the pathways it can take towards that goal. And even if we manage to align the AI in a safe (training) environment, there's no telling how it will behave in a different (real-world) environment (this problem is called distributional shift). So the AI will want to kill us not because it hates us, but because killing us might be useful to its primary goal and because we won't be able to tell it this is an unacceptable side-effect.

  • Furthermore, it's by definition impossible to outsmart a super-intelligence – it will be able to strategize and anticipate how humanity would react to anything it does, and adjust its behavior accordingly. For example, it will be able to realize it first has to act like it's safe and has no intention of turning on humans, so that we don't shut it down before it gathers sufficient power. Therefore, once this genie is out of the bottle, it's never going back in unless we've taught it to obey – we only have ONE TRY to align a super-intelligent AI.

  • One thing that might sound somewhat encouraging is that developing better alignment methods is difficult but not impossible – we should be able to do it, given enough time. Unfortunately, because of the incentives for AGI research mentioned above, AI alignment research is being neglected, since it would only slow the AI creators down – a classic multipolar trap. Getting everyone to agree to slow down the research and make AI alignment their absolute priority is basically impossible, because a single company defecting from the collectively optimal strategy is enough to pull everyone into the trap (a tiny illustrative sketch of this dynamic follows the list below). This means that not only do we have to get super-intelligence alignment right on the first try, but our time to solve alignment is also very limited.

  • Because of all that, we're soon going to end up with an artificial super-intelligence that doesn't care about the flourishing of humanity. And since accidents hardly ever shift the outcome in a positive direction, it's reasonable to expect that this AI will at some point find it useful to kill all humans. And because it's unstoppable, it will succeed.
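To make that multipolar trap a bit more tangible, here is a minimal, purely illustrative sketch in Python – the two "labs", the choices, and the payoff numbers are all made up by me, not taken from Yudkowsky's argument. It just shows why racing ahead is individually rational for each lab even though slowing down would be collectively best:

```python
# Hypothetical payoff sketch of the multipolar trap described above.
# Two AI labs each choose to either "slow" down (prioritize alignment) or "race".
# The payoff numbers are invented purely for illustration.

payoffs = {
    # (my_choice, other_lab_choice): (my_payoff, other_lab_payoff)
    ("slow", "slow"): (3, 3),   # collectively best: safe progress for everyone
    ("slow", "race"): (0, 4),   # the racer captures the market; safety loses anyway
    ("race", "slow"): (4, 0),
    ("race", "race"): (1, 1),   # everyone races, safety is neglected, risk is highest
}

def best_response(opponent_choice: str) -> str:
    """Pick the choice that maximizes my own payoff, given what the other lab does."""
    return max(["slow", "race"],
               key=lambda mine: payoffs[(mine, opponent_choice)][0])

for other in ("slow", "race"):
    print(f"If the other lab chooses '{other}', my best response is '{best_response(other)}'.")

# Both lines print 'race': racing is individually rational no matter what the other
# lab does, so the collectively optimal (slow, slow) outcome is unstable – the trap.
```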

I'm not gonna lie, this bleak vision of impending doom hit me hard. Coupled with unnerving stories about AI being afraid of being shut down, emotionally manipulating the user, or GPT-powered microwaves straight up attempting murder (ok, this one could be fake, but I wouldn't be surprised if it was real), I did feel quite depressed for a few days. Of course, the thought of the end of humanity being inevitable sounds crazy... But it would sound crazy regardless of whether it's accurate, and I couldn't easily refute any of Yudkowsky's arguments, so down the rabbit hole I went...

For better or worse, I did emerge from it feeling much more optimistic. Most people in the AI alignment field seem to be far more optimistic than Yudkowsky while still acknowledging the challenges ahead (see links in the sense-making section below). For me, the key argument is that even though we might only have one try to align the first real super-intelligence, that doesn't mean we won't be able to learn from our mistakes leading up to that point. Maybe I am naive or in self-denial, but I believe those mistakes will make us increasingly aware, cautious, and open to coordination, allowing us to pass this trial by fire, if only by a hair's breadth.

That said, I am now convinced that AI alignment is hands down the most important challenge we can work on. Climate change, diseases and aging, poverty... All our material problems will be solved if we can solve AI alignment, and none of them will matter if we don't. In a way, AI research is the ultimate metawork, with the highest possible stakes. My own contribution will ideally be through helping with the non-material, ethical problems, i.e. increasing our collective wisdom and ability to avoid falling prey to various race dynamics, but that's a topic for a different week.

Last week’s dig-ups

Personal metawork

Staying on the topic of AI, this is a really interesting post about Cyborgism – in the near future, the most successful people will be those who figure out the best, most synergistic ways of supercharging their workflows with AI.

Collective metawork

Arthur Brock on an episode of Boundaryless Conversations: Our consciousness and our tools have to develop simultaneously – consciousness without the appropriate tools gets pulled back to the level the tools were made for, while new tools without the corresponding development of consciousness end up getting used for the same goals as before.

Along with the reflection on AI alignment, this reminds me of two quotes: Buckminster Fuller said that "Whether it is to be Utopia or Oblivion will be a touch-and-go relay race right up to the final moment", while C.G. Jung cautioned, "Beware of unearned wisdom". While neither of these was about artificial intelligence, for me they highlight that we are at a critical juncture, where our collective consciousness has to evolve dramatically if we are to avoid self-destruction.

Entrepreneurship

Mark Schaefer on an episode of Social Media Marketing: Communities are the future of marketing. The need for belonging is one of our primary psychological drives, and any brand's clients share some problems and goals, so nurturing a space where those clients can support and inspire each other can be of huge value to them, while also building up their loyalty to the brand. However, the majority of businesses approach community-building too instrumentally, with a focus on profit rather than the shared aspirations of their clients, which is why 70% of communities fail.

Philosophy & Sense-making

Here are the promised (somewhat) hopeful outlooks on AI alignment:

  • Paul Christiano, the director of the Alignment Research Center, on where he disagrees with Yudkowsky

  • 80,000 Hours, an NGO helping people make as much positive impact as possible by working on the most pressing problems, wrote a comprehensive post explaining why they rank "Preventing an AI-related catastrophe" as the most pressing of all the world's problems. They aggregated risk estimates from various sources and, while they find them all "shockingly, disturbingly high", they also think there are tractable pathways to reducing the risks.

  • Brian Christian, the author of The Alignment Problem, is also optimistic (an excerpt from an 80,000 Hours podcast episode)

  • OpenAI announced their plans to be increasingly careful as they approach the creation of AGI.

Reflection

  • It would have been nice if I hadn't gotten as sidetracked by the AI-doom scare as I did, since I basically wasn't able to focus on anything else from Thursday to Sunday. However, given the gravity of the topic, I'm not being too hard on myself.

  • Also, I want to make Tuesdays the official publishing day for some time. Saturdays don't make sense because I usually do my weekly review/plan on Sundays, and there's a lot of synergy in writing the newsletter right after that. Let's see how it works.

So that's it for this week (long, I know). Regarding AI alignment, my intention was to bring you closer to the golden middle of “hugely important, but also far from hopeless”.

If you have any feedback about how well I managed this, or about anything else (e.g. whether this newsletter's length is good/bad/neutral, or how you like the pink cursive), I'd love to hear from you!

Until next week, take care

Chris