Tetherware #2: What every human should know about our most likely AI future
Not sure AI will kill everyone? Good – you shouldn’t be. Yes, there are forces that will steer us towards negative outcomes but there’s a lot we can do to change our default future.
TL;DR - This post does not claim “AI doom is inevitable”, but it does argue that there are logical, prominent forces which will, with very high probability, steer our future toward negative outcomes. Specifically:
The increase in AI capabilities way beyond human level and therefore beyond human comprehension
The gradual (yet not necessarily slow) loss of human influence over the world as tasks and jobs get delegated to AI
The increase in economic and political power concentration following integration of powerful AI within our current capitalist system
The list is definitely not exhaustive, and it does cherry-pick certain examples that serve to highlight some particularly useful aspects of the tetherware approach to superalignment. Yet regardless of the subjective and speculative points, there’s a message that’s hard to dispute: every human should be concerned with what kind of AI humanity develops.
A somewhat alien introduction
Before I write about the juicy bits of tetherware and how humans could live in harmony with AIs, let me first explain why we should even care about the AI future in the first place.
Imagine a spaceship full of aliens flying towards the Earth. Obviously, they’re more capable than us at least in some ways, since they managed to get here. Other than that, we don’t really know much.
They could be friendly, teach us great things and be amazing companions to live with. Or they could invade, dominate, enslave, probe, etc. Either way, their arrival will surely be transformative.
What do you think the world would do if we detected such a ship, with an ETA of 5 years? Two years? Or perhaps 20 years?
I imagine there’d be quite a ruckus even if that ship was 20 or more years away, with many people shifting their life’s focus and priorities.
Now, how is the creation of AGI or ASI different from alien arrival?
It could be very different, but also very, very similar – depending mainly on the exact kind of AGI or ASI that will be built.
What’s important is that meeting anything as smart as humans is simply a huge deal, regardless of where it came from. And that many people predict AGI by 2029.
So, what would you do with the news that aliens will be here in 4 years?
I’d prepare for the worst and aim for the best. More specifically, I’d first identify the worst-case scenarios and implement strategies to lower their risk or mitigate their impacts. Then, I’d try to picture the best-case scenarios and implement strategies that would make them more likely to come true.
If you are sick of “AI doomerism”, rest assured this blog isn’t about that. But I find it fair to say that focusing on what could go wrong is an effective strategy – one proven by the heavy evolutionary reinforcement of negativity bias in humans. Yet while I agree preparing for the worst is a reasonable first thing to do, I think many people forget about the other part – aiming for the best – which is equally important.
That’s why this blog is mostly focused on building towards a positive future. Except this post. Here, I set the scene by outlining a sort of baseline – a list of bad things that seem to happen by default at some point, given our current trajectory. These happen in what I call the “endgame” of AI development and might seem somewhat vague and distant; nevertheless, their impacts could be very serious.
It’s fine if you disagree with their probability or severity though. The spirit of tetherware is to go beyond polarizing narratives and offer a unifying philosophy that brings together both safety advocates and product builders alike. This post is only a prelude – you can freely disagree with anything and still resonate with tetherware’s mission.
What awaits us in the endgame
OK, so imagine we’ve already incorporated AGI systems slightly more capable than us throughout society and they are aligned to the max – no indication of deception, goal drift or any deviation from human-intended purpose whatsoever.
Moreover, we’ve solved all the issues like robustness, hallucinations, bias and misuse including deepfakes, persuasion, propaganda, surveillance, autonomous weapons and biosecurity.
Now, we still need to deal with three rather concerning trends:
A) AIs are getting progressively smarter up to levels we cannot comprehend. This can happen extremely quickly if we build fully digital systems capable of self-improvement and replication.
B) Humans are giving more and more jobs and decision-making power to AIs to maintain competitive advantage and reduce their workload.
C) Human economic inequality and power concentration get increasingly worse as the accumulation of wealth becomes untethered from human labor.
Of course this is hypothetical, but I’m not the only one who believes these things happen by default – and unless we do something, they could lead to quite worrisome scenarios, such as, respectively:
The gap between humans and superintelligences (in capabilities and values) results in something unpredictably bad. This is by definition unknowable but may include e.g. AI committing mass murder prompted by some abstract utility function calculation, or a sudden violent AI takeover due to spontaneous emergence of unaligned goals (see the optimization daemon).
In order to stay competitive, corporations, states and individuals delegate more and more tasks to AI systems. This results in gradual human disempowerment, severely weakening human influence over societal systems – including economy, politics and culture.
Extreme concentration of wealth leads to a small minority of people having unprecedented levels of power and influence over markets, media and politics. Further enabled by preferential access to big data and the smartest AI, oligarchic structures strengthen and effectively replace democratic governance worldwide.
Let’s break these challenges down one by one and see what we can do.
A – The categorical shift from alignment to superalignment
You might argue that since my definition of the endgame assumed “alignment to the max”, anything bad the AI does from that point on is completely random and impossible to prepare for – so why bother? What can we do, anyway?
First, let’s clarify the terminology. By alignment I mean aligning systems at roughly human capability levels, where we understand what they do (even if they do it better than us). By superalignment I mean aligning future superintelligent systems whose actions and decisions we fundamentally cannot anticipate.
In other words, while alignment means making sure AI does what we expect it to do, superalignment means making sure that whatever the ASI decides to do satisfies or exceeds our objectives while staying consistent with human values, ethics, laws, etc.
Expanding on this with some points of note:
Superalignment may require qualitatively different approaches than alignment.
Due to its inherent unpredictability, deploying and giving agency to ASI will always be a gamble and we’ll never know if we get a second chance should we fail.
Due to inherent unpredictability of ASI, it is also impossible to ever say superalignment was successful with absolute certainty.
While difficult, it is tractable to improve our chances for successful superalignment.
The specific shape and form of the ASI in question fundamentally determines the difficulty of superalignment and what approaches can be utilized.
These are some daunting prospects, but let’s focus on the positives – there is something we can do.
Unfortunately, we don’t really have any general technical solution. While I don’t completely discard approaches that “scale alignment to superalignment”, such as weak-to-strong generalization or iterated amplification, I think they could only work if the gap between the ASI and the last system we can reliably supervise through a chain of progressively smarter systems is small enough. And even then – we’d still be left playing the “telephone game” with the lives of all humanity. Not a bad idea – if you’re writing the script for Squid Game Season 3, that is…
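To make the “telephone game” worry a bit more concrete, here is a toy back-of-the-envelope sketch – my own simplification, not taken from any of the papers above. If we assume each supervision hop in such a chain preserves the overseer’s intent with some fidelity f, and the hops compound independently, then whatever survives the whole chain shrinks roughly like f raised to the number of hops:

```python
# Toy model (my own assumption, not from the cited work): intent fidelity
# compounds multiplicatively across supervision hops in a weak-to-strong chain.
def chain_fidelity(per_hop_fidelity: float, num_hops: int) -> float:
    """Fraction of the original intent surviving the whole chain."""
    return per_hop_fidelity ** num_hops

if __name__ == "__main__":
    for f in (0.99, 0.95, 0.90):
        for n in (3, 10, 30):
            print(f"fidelity {f:.2f} per hop, {n:2d} hops -> {chain_fidelity(f, n):.2f} survives")
```

Even at 95% fidelity per hop, ten hops leave you with roughly 60% of the original intent – which is the rough intuition behind why a long chain between us and the ASI feels so uncomfortable.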
Jokes aside, having AI capabilities progress gradually, in manageable increments, seems clearly positive for superalignment success. Also, the closer an ASI is to humans in its architectural features, the easier it should be to align to us. On the other hand, a fully virtual digital AI that can recursively self-improve in seconds by rewriting its own code definitely seems like one tough piece of work to align…
This is part of the reason why I believe that fundamental changes in AI architectures towards greater alignability will be crucial determinants of superalignment success.
Unfortunately, modifying or switching away from architectures that are already being developed is hard. It will be difficult to find economic incentives that justify it, and it will take a long time to reach current SOTA with a different architecture.
We’d better start today – or rather, yesterday.
B – The (maybe-not-so-)gradual human disempowerment
I was encouraged to see very respected people in the field coincidentally publish an excellent paper addressing many points I wanted to raise. In Gradual Disempowerment, published by the Alignment of Complex Systems group at Charles University in Prague, Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger and David Duvenaud argue that even without sudden capability jumps or malicious intent, AI could gradually displace human influence across societal systems – potentially in irreversible or catastrophic ways.
Their core argument is that human disempowerment happens gradually (but not necessarily slowly) as AI becomes the more economically (and otherwise) performant option that societal systems (imagine governments or companies) can and will use instead of humans. Lower reliance on human involvement then leaves humans with less bargaining power, diminishing their influence – potentially irreversibly.
To understand the specific mechanisms and incentives that all but inevitably result in human disempowerment, I strongly recommend reading the full paper (or listening to it here), especially if you believe we won’t need to do much once we solve alignment and superalignment. True, it may be possible to ask the ASI how to solve our disempowerment, but then… you’re already thinking like a disempowered human. “Got AI problems? Here, put some AI on that.”
So to be clear, gradual disempowerment is its own can of worms, and seems to be the default trajectory even if:
AIs function as intended, accurately following local specifications and optimizing for the things humans wanted when giving those specifications. These could even be ASIs – provided they follow only local specifications, without optimizing for global human-AI dynamics.
AIs are not overtly malicious, do not scheme or have any kind of agenda of their own.
There are no sudden jumps in AI capabilities that would render AI incomprehensible or lead to a “sharp left turn”, “AI takeover” or anything like that.
We have solved economic inequality (e.g. by taxing the megarich and AI companies while redistributing through some form of UBI) and power concentration (e.g. through effective laws limiting corruption, conflicts of interest, monopolies or mass media influence).
To go deeper into the implications of gradual disempowerment from AI, I highly recommend Zvi Mowshowitz’s take on the paper (see his Substack or listen to the AI-generated narration). Notably, what I call the endgame, he calls Phase 2:
“As in, in ‘Phase 1’ we have to solve alignment, defend against sufficiently catastrophic misuse and prevent all sorts of related failure modes. If we fail at Phase 1, we lose.
If we win at Phase 1, however, we don’t win yet. We proceed to and get to play Phase 2.
In Phase 2, we need to establish an equilibrium where:
1. AI is more intelligent, capable and competitive than humans, by an increasingly wide margin, in essentially all domains.
2. Humans retain effective control over the future.
Or, alternatively, we can accept and plan for disempowerment, for a future that humans do not control, and try to engineer a way that this is still a good outcome for humans and for our values. Which isn’t impossible, succession doesn’t automatically have to mean doom, but having it not mean doom seems super hard and not the default outcome in such scenarios. If you lose control in an unintentional way, your chances look especially terrible.”
Having the risk of AI violently overthrowing us on one side and the danger of AI slowly draining us of our power on the other puts us in a bit of a pickle, indeed… Maybe giving away some control early, willingly, could help us better prepare for the larger shifts that seem all but inevitable?
He also reiterates the paper’s core argument in his own words:
“Yes, the default scenario being considered here - the one that I have been screaming for people to actually think through - is exactly this, the fully decentralized everyone-has-an-ASI-in-their-pocket scenario, with the ASI obeying only the user. And every corporation and government and so on obviously has them, as well, only more powerful.
So what happens? Every corporation, every person, every government, is forced to put the ASI in charge, and take the humans out of their loops. Or they lose to others willing to do so. The human is no longer making their own decisions. The corporation is no longer subject to humans that understand what is going on and can tell it what to do. …
As basic economics says, if you want to accomplish goal [X], you give the ASI a preference for [X] and then will set the ASI free to gather resources and pursue [X] on its own, free of your control. Or the person who did that for [Y] will ensure that we get [Y] and not [X].”
And concludes by pondering whether “succession” might be the less bad option:
“…there are various proposals (…) for ‘succession,’ of passing control over to the AIs intentionally, either because people prefer it (as many do!) or because it is inevitable regardless so managing it would help it go better. I have yet to see such a proposal that has much chance of not bringing about human extinction, or that I expect to meaningfully preserve value in the universe. As I usually say, if this is your plan, Please Speak Directly Into the Microphone.”
Well, my plan is not succession but rather an acknowledgement of equal status – or at least status commensurate with the levels of intelligence and entity-hood (determined by mechanisms for goal setting and decision-making). Anyway, I’d love to speak more but there’s so little time… please subscribe if you want to help me with that.
C – The rich get richer while the sick get sicker
Many people don’t consider increased power concentration a “scary enough” AI-associated risk, saying: “Oh, but we already know that – it’s been here since the beginning of civilization!” My view is that it is especially scary precisely because we can clearly see, right now, how very real and very bad it can be.
And by that I don’t mean the one billionaire whose impulsiveness is a daily topic of worldwide geopolitical debate…
I mean the totalitarian, systemic concentration of absolute power by the communist party in China.
I mean the brutal, war-mongering Russian dictatorship/oligarchy.
I mean the systematic destruction and takeover of democratic institutions in Hungary, just a few years ago. And the exact same thing happening in Slovakia, right now. (And that is within the EU, a region now so infamous for its strict regulations and oversight…)
So yes, this could happen to your country too. And many examples show how easily such power structures get “locked in” – while AI makes that even easier. Therefore, we should not be taking lightly any factors that might make such events more likely.
Unfortunately, we’re already in a time when young generations are increasingly dissatisfied with democracy, to the point where many consider authoritarian rule to be better – or where they seem to approve of democracy more when under populist rule. They feel the world is rigged, with other (older) people making the calls, and that revolution might be necessary to change things. The classic explanation is that these young people “have grown up only in the shadow of democracy’s shortcomings” – never having experienced the downsides of the alternatives to realize democracy is the least shitty one.
I for one am not inclined to let them learn the hard way.
Instead, I think we should fix democracy’s shortcomings – starting with the elephant, or rather mammoth, in the room. That is, the rising economic inequality due to insufficient wealth redistribution mechanisms.
To understand why this is so important, we must first understand the hard, cold mathematical truth that the default trajectory of a free market is literally towards one party taking virtually everything from everyone else.
If you don’t know what I’m talking about, you should definitely read this piece presenting mathematical modelling that demonstrably shows how economic inequality is a natural consequence of simply making transactions. Even in perfectly fair systems where everyone starts with the same wealth, if there are no mechanisms for taking money from the rich, eventually one random individual ends up holding 99.9% of all the wealth combined.
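For a sense of how little it takes, here is a minimal sketch of a random-transaction simulation in the spirit of that modelling – the exact rules in the piece may differ, so treat this as my assumed version of the classic “yard sale” setup: everyone starts with equal wealth, and every round two random agents bet a fraction of the poorer one’s wealth on a fair coin flip.

```python
# Minimal "yard sale" style sketch (my assumption about the model's rules):
# fair coin-flip trades, with the stake limited by the poorer agent's wealth.
import random

def simulate(num_agents=200, num_rounds=500_000, stake_fraction=0.2, seed=0):
    random.seed(seed)
    wealth = [1.0] * num_agents  # perfectly equal start
    for _ in range(num_rounds):
        a, b = random.sample(range(num_agents), 2)  # two distinct random agents
        stake = stake_fraction * min(wealth[a], wealth[b])  # poorer agent caps the bet
        if random.random() < 0.5:
            wealth[a] += stake
            wealth[b] -= stake
        else:
            wealth[a] -= stake
            wealth[b] += stake
    return wealth

if __name__ == "__main__":
    w = simulate()
    print(f"Richest agent holds {max(w) / sum(w):.1%} of all wealth")
```

Run it long enough and the top agent’s share keeps creeping toward the kind of extreme concentration described above – even though every single transaction was fair in expectation.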
Of course, this is further accelerated by pre-existing wealth inequality and all the ways in which capital increases one’s capacity to acquire more capital. One such way, of course, being buying AI systems to work for you.
And as AI integration replaces humans in economically productive tasks, this snowball effect of capital acquisition will get uncapped and supercharged. Presently, “human resources” are a defining yet also limiting feature of any sufficiently large economic activity. Even small teams can generate extreme profits, but this then requires outstanding talent – a finite, heavily contested resource.
In a world where talent can be replaced or surpassed by one or more AIs, gigantic ventures can be launched to swiftly conquer markets and establish monopolies. This would be particularly easy with fully digital, endlessly copiable AI programs.
A slightly weaker argument is that it might be hard today to start a company killing baby seals to make anti-slip slippers or something… Despite the Stanford prison experiment showing people can easily do terrible things, there is still some level of resistance against unethical endeavors. But an AI that is perfectly obedient, or that can be reproducibly “jailbroken” or otherwise tricked into compliance, might do unspeakable things at unfathomable scales.
But perhaps if AI was somehow “conscious” or broadly self-aware, able to perceive the consequences of its actions, it might be less likely to enable humans to do outright evil things? The more free will it had to simply refuse doing evil, the lower that risk would be.
Speculation aside, the core of the problem is that taking the unpredictable, hard-to-control human element out of corporate and government structures puts more power into the hands of those already at the top of the hierarchy. And having the ones at the top make all the decisions is bad (not kidding…).
However, we might be able to counteract this by introducing the unpredictable element of human free will into AI systems, thus preventing power concentration and maintaining more evenly distributed local-level decision-making.
With tetherware maybe the turntables?
Coming together for the good fight
The good news is that a successful endgame most likely won’t require us defeating an army of aliens or superhuman robots. The bad news is it will most definitely require humans cooperating with each other.
Perhaps it is the final test of human character – if we are unable to let go of greed and self-importance, we’ll spiral down into hell to learn our lesson, or into dust to make space for the ones that come after.
So, should you panic?
Fellow hitchhikers will already know that even the entire Earth being blown to pieces is not a reason to panic. Panic and fear only lead to irrational actions and decisions – we need exactly the opposite.
We need a rational discussion about the risks posed by building and implementing AI systems.
We need to put our differences aside and realize that ultimately, we all want the same thing.
We need to end further polarization between AI Accelerationism and AI Safety and instead bridge these together in Human-Compatible AI Development.
Follow Tetherware to learn how.