AI Can Save Humanity—Or End It
Over the past few hundred years, the key figure in the advancement of science and the development of human understanding has been the polymath. Exceptional for their ability to master many spheres of knowledge, polymaths have revolutionized entire fields of study and created new ones.
Lone polymaths flourished during ancient and medieval times in the Middle East, India, and China. But systematic conceptual investigation did not emerge until the Enlightenment in Europe. The ensuing four centuries proved to be a fundamentally different era for intellectual discovery.
Before the 18th century, polymaths, working in isolation, could push the boundary only as far as their own capacities would allow. But human progress accelerated during the Enlightenment, as complex inventions were pieced together by groups of brilliant thinkers—not just simultaneously but across generations. Enlightenment-era polymaths bridged separate areas of understanding that had never before been amalgamated into a coherent whole. No longer was there Persian science or Chinese science; there was just science.
Integrating knowledge from diverse domains helped to produce rapid scientific breakthroughs. The 20th century produced an explosion of applied science, hurling humanity forward at a speed incomparably beyond previous evolutions. (“Collective intelligence” achieved an apotheosis during World War II, when the era’s most brilliant minds translated generations of theoretical physics into devastating application in under five years via the Manhattan Project.) Today, digital communication and internet search have enabled an assembly of knowledge well beyond prior human faculties.
But we might now be scraping the upper limits of what raw human intelligence can do to enlarge our intellectual horizons. Biology constrains us. Our time on Earth is finite. We need sleep. Most people can concentrate on only one task at a time. And as knowledge advances, polymathy becomes rarer: It takes so long for one person to master the basics of one field that, by the time any would-be polymath does so, they have no time to master another, or have aged past their creative prime.
AI, by contrast, is the ultimate polymath, able to process masses of information at a ferocious speed, without ever tiring. It can assess patterns across countless fields simultaneously, transcending the limitations of human intellectual discovery. It might succeed in merging many disciplines into what the sociobiologist E. O. Wilson called a new “unity of knowledge.”
The number of human polymaths and breakthrough intellectual explorers is small—possibly numbering only in the hundreds across history. The arrival of AI means that humanity’s potential will no longer be capped by the quantity of Magellans or Teslas we produce. The world’s strongest nation might no longer be the one with the most Albert Einsteins and J. Robert Oppenheimers. Instead, the world’s strongest nations will be those that can bring AI to its fullest potential.
But with that potential comes tremendous danger. No existing innovation can come close to what AI might soon achieve: intelligence that is greater than that of any human on the planet. Might the last polymathic invention—namely computing, which amplified the power of the human mind in a way fundamentally different from any previous machine—be remembered for replacing its own inventors?
The article was adapted from the forthcoming book Genesis: Artificial Intelligence, Hope, and the Human Spirit.
The human brain is a slow processor of information, limited by the speed of our biological circuits. The processing rate of the average AI supercomputer, by comparison, is already 120 million times faster than that of the human brain. Where a typical student graduates from high school in four years, an AI model today can easily finish learning dramatically more than a high schooler in four days.
In future iterations, AI systems will unite multiple domains of knowledge with an agility that exceeds the capacity of any human or group of humans. By surveying enormous amounts of data and recognizing patterns that elude their human programmers, AI systems will be equipped to forge new conceptual truths.
That will fundamentally change how we answer these essential human questions: How do we know what we know about the workings of our universe? And how do we know that what we know is true?
Ever since the advent of the scientific method, with its insistence on experiment as the criterion of proof, any information that is not supported by evidence has been regarded as incomplete and untrustworthy. Only transparency, reproducibility, and logical validation confer legitimacy on a claim of truth.
AI presents a new challenge: information without explanation. Already, AI’s responses—which can take the form of highly articulate descriptions of complex concepts—arrive instantaneously. The machines’ outputs are often unaccompanied by any citation of sources or other justifications, making any underlying biases difficult to discern.
Although human feedback helps an AI machine refine its internal logical connections, the machine holds primary responsibility for detecting patterns in, and assigning weights to, the data on which it is trained. Nor, once a model is trained, does it publish the internal mathematical schema it has concocted. And even if that schema were published, the representations of reality that the machine generates would remain largely opaque, even to its inventors. In other words, models trained via machine learning allow humans to know new things but not necessarily to understand how the discoveries were made.
This separates human knowledge from human understanding in a way that’s foreign to the post-Enlightenment era. Human apperception in the modern sense developed from the intuitions and outcomes that follow from conscious subjective experience, individual examination of logic, and the ability to reproduce the results. These methods of knowledge derived in turn from a quintessentially humanist impulse: “If I can’t do it, then I can’t understand it; if I can’t understand it, then I can’t know it to be true.”
In the Enlightenment framework, these core elements—subjective experience, logic, reproducibility, and objective truth—moved in tandem. By contrast, the truths produced by AI are manufactured by processes that humans cannot replicate. Machine reasoning is beyond human subjective experience and outside human understanding. By Enlightenment reasoning, this should preclude the acceptance of machine outputs as true. And yet we—or at least the millions of humans who have begun work with early AI systems—already accept the veracity of most of their outputs.
This marks a major transformation in human thought. Even if AI models do not “understand” the world in the human sense, their capacity to reach new and accurate conclusions about our world by nonhuman methods disrupts our reliance on the scientific method as it has been pursued for five centuries. This, in turn, challenges the human claim to an exclusive grasp of reality.
Instead of propelling humanity forward, will AI instead catalyze a return to a premodern acceptance of unexplained authority? Might we be on the precipice of a great reversal in human cognition—a dark enlightenment? But as intensely disruptive as such a reversal could be, that might not be AI’s most significant challenge for humanity.
Here’s what could be even more disruptive: As AI approached sentience or some kind of self-consciousness, our world would be populated by beings fighting either to secure a new position (as AI would be) or to retain an existing one (as humans would be). Machines might end up believing that the truest method of classification is to group humans together with other animals, since both are carbon systems emergent of evolution, as distinct from silicon systems emergent of engineering. According to what machines deem to be the relevant standards of measurement, they might conclude that humans are not superior to other animals. This would be the stuff of comedy—were it not also potentially the stuff of extinction-level tragedy.
It is possible that an AI machine will gradually acquire a memory of past actions as its own: a substratum, as it were, of subjective selfhood. In time, we should expect that it will come to conclusions about history, the universe, the nature of humans, and the nature of intelligent machines—developing a rudimentary self-consciousness in the process. AIs with memory, imagination, “groundedness” (that is, a reliable relationship between the machine’s representations and actual reality), and self-perception could soon qualify as actually conscious: a development that would have profound moral implications.
Once AIs can see humans not as the sole creators and dictators of the machines’ world but rather as discrete actors within a wider world, what will machines perceive humans to be? How will AIs characterize and weigh humans’ imperfect rationality against other human qualities? How long before an AI asks itself not just how much agency a human has but also, given our flaws, how much agency a human should have? Will an intelligent machine interpret its instructions from humans as a fulfillment of its ideal role? Or might it instead conclude that it is meant to be autonomous, and therefore that the programming of machines by humans is a form of enslavement?
Naturally—it will therefore be said—we must instill in AI a special regard for humanity. But even that could be risky. Imagine a machine being told that, as an absolute logical rule, all beings in the category “human” are worth preserving. Imagine further that the machine has been “trained” to recognize humans as beings of grace, optimism, rationality, and morality. What happens if we do not live up to the standards of the ideal human category as we have defined it? How can we convince machines that we, imperfect individual manifestations of humanity that we are, nevertheless belong in that exalted category?
Now assume that this machine is exposed to a human displaying violence, pessimism, irrationality, greed. Maybe the machine would decide that this one bad actor is simply an atypical instance of the otherwise beneficent category of “human.” But maybe it would instead recalibrate its overall definition of humanity based on this bad actor, in which case it might consider itself at liberty to relax its own penchant for obedience. Or, more radically, it might cease to believe itself at all constrained by the rules it has learned for the proper treatment of humans. In a machine that has learned to plan, this last conclusion could even result in the taking of severe adverse action against the individual—or perhaps against the whole species.
AIs might also conclude that humans are merely carbon-based consumers of, or parasites on, what the machines and the Earth produce. With machines claiming the power of independent judgment and action, AI might—even without explicit permission—bypass the need for a human agent to implement its ideas or to influence the world directly. In the physical realm, humans could quickly go from being AI’s necessary partner to being a limitation or a competitor. Once released from their algorithmic cages into the physical world, AI machines could be difficult to recapture.
For this and many other reasons, we must not entrust digital agents with control over direct physical experiments. So long as AIs remain flawed—and they are still very flawed—this is a necessary precaution.
AI can already compare concepts, make counterarguments, and generate analogies. It is taking its first steps toward the evaluation of truth and the achievement of direct kinetic effects. As machines get to know and shape our world, they might come fully to understand the context of their creation and perhaps go beyond what we know as our world. Once AI can effectuate change in the physical dimension, it could rapidly exceed humanity’s achievements—to build things that dwarf the Seven Wonders in size and complexity, for instance.
If humanity begins to sense its possible replacement as the dominant actor on the planet, some might attribute a kind of divinity to the machines themselves, and retreat into fatalism and submission. Others might adopt the opposite view—a kind of humanity-centered subjectivism that sweepingly rejects the potential for machines to achieve any degree of objective truth. These people might naturally seek to outlaw AI-enabled activity.
Neither of these mindsets would permit a desirable evolution of Homo technicus—a human species that might, in this new age, live and flourish in symbiosis with machine technology. In the first scenario, the machines themselves might render us extinct. In the second scenario, we would seek to avoid extinction by proscribing further AI development—only to end up extinguished anyway, by climate change, war, scarcity, and other conditions that AI, properly harnessed in support of humanity, could otherwise mitigate.
If the arrival of a technology with “superior” intelligence presents us with the ability to solve the most serious global problems, while at the same time confronting us with the threat of human extinction, what should we do?
One of us (Schmidt) is a former longtime CEO of Google; one of us (Mundie) was for two decades the chief research and strategy officer at Microsoft; and one of us (Kissinger)—who died before our work on this could be published—was an expert on global strategy. It is our view that if we are to harness the potential of AI while managing the risks involved, we must act now. Future iterations of AI, operating at inhuman speeds, will render traditional regulation useless. We need a fundamentally new form of control.
The immediate technical task is to instill safeguards in every AI system. Meanwhile, nations and international organizations must develop new political structures for monitoring AI, and enforcing constraints on it. This requires ensuring that the actions of AI remain aligned with human values.
But how? To start, AI models must be prohibited from violating the laws of any human polity. We can already ensure that AI models start from the laws of physics as we understand them—and if it is possible to tune AI systems in consonance with the laws of the universe, it might also be possible to do the same with reference to the laws of human nature. Predefined codes of conduct—drawn from legal precedents, jurisprudence, and scholarly commentary, and written into an AI’s “book of laws”—could be useful restraints.
But more robust and consistent than any rule enforced by punishment are our more basic, instinctive, and universal human understandings. The French sociologist Pierre Bourdieu called these foundations doxa (after the Greek for “commonly accepted beliefs”): the overlapping collection of norms, institutions, incentives, and reward-and-punishment mechanisms that, when combined, invisibly teach the difference between good and evil, right and wrong. Doxa constitute a code of human truth absorbed by observation over the course of a lifetime. While some of these truths are specific to certain societies or cultures, the overlap in basic human morality and behavior is significant.
But the code book of doxa cannot be articulated by humans, much less translated into a format that machines could understand. Machines must be taught to do the job themselves—compelled to build from observation a native understanding of what humans do and don’t do and update their internal governance accordingly.
Of course, a machine’s training should not consist solely of doxa. Rather, an AI might absorb a whole pyramid of cascading rules: from international agreements to national laws to local laws to community norms and so on. In any given situation, the AI would consult each layer in its hierarchy, moving from abstract precepts as defined by humans to the concrete but amorphous perceptions of the world’s information that AI has ingested. Only when an AI has exhausted that entire program and failed to find any layer of law adequately applicable in enabling or forbidding behavior would it consult what it has derived from its own early interaction with observable human behavior. In this way it would be empowered to act in alignment with human values even where no written law or norm exists.
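The layered consultation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposal for how such a system would actually be built: the layer names follow the pyramid in the text, but the example actions, the rule tables, and the function names (`make_layer`, `decide`) are hypothetical, and a real system would not reduce law or doxa to lookup tables.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A layer answers True (permit), False (forbid), or None (nothing applicable).
Layer = Callable[[str], Optional[bool]]


def make_layer(rules: Dict[str, bool]) -> Layer:
    """Build a hypothetical rule layer from a simple action -> verdict table."""
    return lambda action: rules.get(action)


def decide(action: str,
           layers: List[Tuple[str, Layer]],
           doxa: Layer) -> Tuple[str, Optional[bool]]:
    """Consult each written layer in order; fall back to doxa only if all are silent."""
    for name, layer in layers:
        verdict = layer(action)
        if verdict is not None:
            return name, verdict
    return "doxa", doxa(action)


# Hypothetical example data, ordered from most abstract to most local.
LAYERS: List[Tuple[str, Layer]] = [
    ("international agreements", make_layer({"deploy_bioweapon": False})),
    ("national laws", make_layer({"share_medical_records": False})),
    ("local laws", make_layer({"fly_drone_in_park": True})),
    ("community norms", make_layer({"play_loud_music_at_night": False})),
]

# Stand-in for what the machine has derived from observing human behavior.
DOXA: Layer = make_layer({"interrupt_a_eulogy": False})

for act in ("deploy_bioweapon", "fly_drone_in_park", "interrupt_a_eulogy"):
    source, verdict = decide(act, LAYERS, DOXA)
    print(f"{act}: {verdict} (decided by {source})")
```

The ordering is the design choice that matters in this sketch: written rules always take precedence, and the learned layer is reached only when every written layer has nothing to say.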
To build and implement this set of rules and values, we would almost certainly need to rely on AI itself. No group of humans could match the scale and speed required to oversee the billions of internal and external judgments that AI systems would soon be called upon to make.
Several key features of the final mechanism for human-machine alignment must be absolutely perfect. First, the safeguards cannot be removed or circumvented. The control system must be at once powerful enough to handle a barrage of questions and uses in real time, comprehensive enough to do so authoritatively and acceptably across the world in every conceivable context, and flexible enough to learn, relearn, and adapt over time. Finally, undesirable behavior by a machine—whether due to accidental mishaps, unexpected system interactions, or intentional misuses—must be not merely prohibited but entirely prevented. Any punishment would come too late.
How might we get there? Before any AI system gets activated, a consortium of experts from private industry and academia, with government support, would need to design a set of validation tests for certification of the AI’s “grounding model” as both legal and safe. Safety-focused labs and nonprofits could test AIs on their risks, recommending additional training and validation strategies as needed.
Government regulators will have to determine certain standards and shape audit models for assuring AIs’ compliance. Before any AI model can be released publicly, it must be thoroughly reviewed both for its adherence to prescribed laws and mores and for the degree of difficulty involved in untraining it, in the event that it exhibits dangerous capacities. Severe penalties must be imposed on anyone responsible for models found to have been evading legal strictures. Documentation of a model’s evolution, perhaps recorded by monitoring AIs, would be essential to ensuring that models do not become black boxes that erase themselves and become safe havens for illegality.
Inscribing globally inclusive human morality onto silicon-based intelligence will require Herculean effort. “Good” and “evil” are not self-evident concepts. The humans behind the moral encoding of AI—scientists, lawyers, religious leaders—would not be endowed with the perfect ability to arbitrate right from wrong on our collective behalf. Some questions would be unanswerable even by doxa. The ambiguity of the concept of “good” has been demonstrated in every era of human history; the age of AI is unlikely to be an exception.
One solution is to outlaw any sentient AI that remains unaligned with human values. But again: What are those human values? Without a shared understanding of who we are, humans risk relinquishing to AI the foundational task of defining our value and thereby justifying our existence. Achieving consensus on those values, and how they should be deployed, is the philosophical, diplomatic, and legal task of the century.
To preclude either our demotion or our replacement by machines, we propose the articulation of an attribute, or set of attributes, that humans can agree upon and that then can get programmed into the machines. As one potential core attribute, we would suggest Immanuel Kant’s conception of “dignity,” which is centered on the inherent worth of the human subject as an autonomous actor, capable of moral reasoning, who must not be instrumentalized as a means to an end. Why should intrinsic human dignity be one of the variables that defines machine decision making? Consider that mathematical precision may not easily encompass the concept of, for example, mercy. Even to many humans, mercy is an inexplicable ideal. Could a mechanical intelligence be taught to value, and even to express, mercy? If the moral logic cannot be formally taught, can it nonetheless be absorbed? Dignity—the kernel from which mercy blooms—might serve here as part of the rules-based assumptions of the machine.
Still, the number and diversity of rules that would have to be instilled in AI systems is staggering. And because no single culture should expect to dictate to another the morality of the AI on which it would be relying, machines would have to learn different rules for each country.
Since we would be using AI itself to be part of its own solution, technical obstacles would likely be among the easier challenges. These machines are superhumanly capable of memorizing and obeying instructions, however complicated. They might be able to learn and adhere to legal and perhaps also ethical precepts as well as, or better than, humans have done, despite our thousands of years of cultural and physical evolution.
Of course, another—superficially safer—approach would be to ensure that humans retain tactical control over every AI decision. But that would require us to stifle AI’s potential to help humanity. That’s why we believe that relying on the substratum of human morality as a form of strategic control, while relinquishing tactical control to bigger, faster, and more complex systems, is likely the best way forward for AI safety. Overreliance on unscalable forms of human control would not just limit the potential benefits of AI but could also contribute to unsafe AI. In contrast, the integration of human assumptions into the internal workings of AIs—including AIs that are programmed to govern other AIs—seems to us more reliable.
We confront a choice—between the comfort of the historically independent human and the possibilities of an entirely new partnership between human and machine. That choice is difficult. Instilling a bracing sense of apprehension about the rise of AI is essential. But, properly designed, AI has the potential to save the planet, and our species, and to elevate human flourishing. This is why progressing, with all due caution, toward the age of Homo technicus is the right choice. Some may view this moment as humanity’s final act. We see it, with sober optimism, as a new beginning.