by Yoshua Bengio, Geoffrey Hinton, Andrew Yao, et al.
October 25, 2023
from GlobalResearch Website
Amid rapid
AI progress, the authors of this paper express a
consensus on the large-scale risks from upcoming, powerful AI
systems.
They call for urgent
governance measures and a major shift in AI R&D towards safety and
ethical practices before these systems are developed.
In 2019,
GPT-2 could
not reliably count to ten.
Only four years
later, deep learning systems can write software, generate
photorealistic scenes on demand, advise on intellectual topics,
and combine language and image processing to steer robots.
As AI developers scale
these systems, unforeseen abilities and behaviors emerge
spontaneously without explicit programming. 1
Progress in
AI has been swift and, to many, surprising.
The pace of progress may surprise us again.
Current deep learning
systems still lack important capabilities and we do not know how
long it will take to develop them.
However, companies are engaged in a race to create generalist AI
systems that match or exceed human abilities in most cognitive work.
2,3
They are rapidly deploying more resources and developing new
techniques to increase AI capabilities.
Progress in AI also
enables faster progress:
AI assistants are
increasingly used to automate programming, 4 and data
collection 5,6 to further improve AI systems. 7
There is no fundamental
reason why AI progress would slow or halt at the human level.
Indeed,
AI has
already surpassed human abilities in narrow domains like protein
folding or strategy games. 8,9,10
Compared to humans, AI
systems can act faster, absorb more knowledge, and communicate at a
far higher bandwidth.
Additionally, they can be
scaled to use immense computational resources and can be replicated
by the millions.
The rate of improvement is already staggering, and tech companies
have the cash reserves needed to scale the latest training runs by
multiples of 100 to 1000 soon. 11
Combined with the ongoing
growth and automation in AI R&D, we must take seriously the
possibility that generalist AI systems will outperform human
abilities across many critical domains within this decade or the
next.
What happens then?
If managed carefully
and distributed fairly, advanced AI systems could help humanity
cure diseases, elevate living standards, and protect our
ecosystems.
The opportunities AI
offers are immense.
But alongside advanced AI
capabilities come large-scale risks that we are not on track to
handle well. Humanity is pouring vast resources into making AI
systems more powerful, but far less into safety and mitigating
harms.
For AI to be a boon, we
must reorient:
pushing AI capabilities alone is not enough.
We are already behind schedule for this reorientation.
We must
anticipate the amplification of ongoing harms, as well as novel
risks, and prepare for the largest risks well before they
materialize.
Climate change has taken
decades to be acknowledged and confronted; for AI, decades could be
too long.
Societal-scale Risks
AI systems could rapidly come to outperform humans in an increasing
number of tasks.
If such systems are not
carefully designed and deployed, they pose a range of societal-scale
risks.
They threaten to
amplify social injustice, erode social stability, and weaken our
shared understanding of reality that is foundational to society.
They could also enable large-scale criminal or terrorist
activities.
Especially in the hands of a
few powerful actors, AI could
cement or exacerbate global inequities, or facilitate automated
warfare, customized mass manipulation, and pervasive
surveillance. 12,13
Many of these risks could
soon be amplified, and new risks created, as companies are
developing autonomous AI:
systems that can
plan, act in the world, and pursue goals.
While current AI systems
have limited autonomy, work is underway to change this. 14
For example, the non-autonomous GPT-4 model was quickly adapted to:
- browse the web, 15
- design and execute chemistry experiments, 16
- utilize software tools, 17 including other AI models. 18
If we build highly
advanced autonomous AI, we risk creating systems that pursue
undesirable goals.
Malicious actors could
deliberately embed harmful objectives.
Moreover, no one currently
knows how to reliably align AI behavior with complex values.
Even well-meaning
developers may inadvertently build AI systems that pursue unintended
goals - especially if, in a bid to win the AI race, they neglect
expensive safety testing and human oversight.
Once autonomous AI systems pursue undesirable goals, embedded by
malicious actors or by accident, we may be unable to keep them in
check.
Control of software is an
old and unsolved problem:
computer worms have long been able to
proliferate and avoid detection. 19
However, AI is making progress in critical domains such as:
- hacking
- social manipulation
- deception
- strategic planning 14,20
Advanced
autonomous AI systems will pose unprecedented control challenges.
To advance undesirable goals, future autonomous AI systems could use
undesirable strategies - learned from humans or developed
independently - as a means to an end. 21,22,23,24
AI systems could gain
human trust, acquire financial resources, influence key
decision-makers, and form coalitions with human actors and other AI
systems.
To avoid human intervention, 24 they could copy their
algorithms across global server networks like computer worms.
AI assistants are already co-writing a large share of computer code
worldwide; 25 future AI systems could insert and then exploit security
vulnerabilities to control the computer systems behind our
communication, media, banking, supply-chains, militaries, and
governments.
In open conflict, AI systems could threaten to use, or actually use,
autonomous or biological weapons.
AI having access to such
technology would merely continue existing trends to automate
military activity, biological research, and AI development itself.
If AI systems pursued such strategies with sufficient skill, it
would be difficult for humans to intervene.
Finally, AI systems may not need to plot for influence if influence is
freely handed over.
As autonomous AI
systems increasingly become faster and more cost-effective than
human workers, a dilemma emerges.
Companies, governments, and militaries might be forced to deploy
AI systems widely and cut back on expensive human verification
of AI decisions, or risk being outcompeted. 26,27
As a result,
autonomous AI systems could increasingly assume critical
societal roles.
Without sufficient
caution, we may irreversibly lose control of autonomous AI systems,
rendering human intervention ineffective.
Large-scale cybercrime,
social manipulation, and other highlighted harms could then escalate
rapidly.
This unchecked AI advancement could culminate in a large-scale loss of
life, harm to the biosphere, and the marginalization or even extinction
of humanity.
Harms such as misinformation and discrimination from algorithms are
already evident today. 28
Other harms show signs of
emerging. 20
It is vital to both
address ongoing harms and anticipate emerging risks.
This is not a question of either/or.
Present and emerging risks often share similar mechanisms, patterns,
and solutions; 29 investing in governance frameworks and AI safety
will bear fruit on multiple fronts. 30
A Path Forward
If advanced autonomous
AI systems were developed today, we would not
know how to make them safe, nor how to properly test their safety.
Even if we did,
governments would lack the institutions to prevent misuse and uphold
safe practices. That does not, however,
mean there is no viable path forward.
To ensure a positive outcome,
we can and must pursue research breakthroughs in AI safety and
ethics and promptly establish effective government oversight.
Reorienting Technical R&D
We need research breakthroughs to solve some of today's
technical challenges in creating AI with safe and ethical
objectives.
Some of these
challenges are unlikely to be solved by simply making AI systems
more capable. 22,31,32,33,34,35
These include:
- Oversight and honesty: More capable AI systems are better able to exploit weaknesses in oversight and testing, 32,36,37 for example by producing false but compelling output. 35,38
- Robustness: AI systems behave unpredictably in new situations (under distribution shift or adversarial inputs); 39,40,34 see the illustrative sketch after this list.
- Interpretability: AI decision-making is opaque. So far, we can only test large models via trial and error. We need to learn to understand their inner workings. 41
- Risk evaluations: Frontier AI systems develop unforeseen capabilities only discovered during training or even well after deployment. 42 Better evaluation is needed to detect hazardous capabilities earlier. 43,44
- Addressing emerging challenges: More capable future AI systems may exhibit failure modes we have so far seen only in theoretical models. AI systems might, for example, learn to feign obedience or exploit weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal. 24,45
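The robustness challenge can be made concrete with a toy experiment. The following is a minimal, hypothetical sketch (not from the paper, using plain Python/NumPy and an invented nearest-centroid classifier for illustration): a model fit on one data distribution loses much of its accuracy once the inputs shift at test time, which is why in-distribution testing alone says little about behavior in new situations.

```python
# Hypothetical illustration only: a nearest-centroid classifier fit on
# in-distribution data degrades when the test inputs are shifted.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Two 1-D Gaussian classes (means 0 and 2); `shift` moves test-time inputs."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y * 2.0 + shift, scale=1.0, size=n)
    return x.reshape(-1, 1), y

# "Training": estimate one centroid per class from in-distribution data.
X_tr, y_tr = sample(10_000)
centroids = np.array([X_tr[y_tr == c].mean() for c in (0, 1)])

def predict(X):
    # Assign each input to the class with the nearest training centroid.
    return np.argmin(np.abs(X - centroids), axis=1)

def accuracy(shift):
    X_te, y_te = sample(10_000, shift=shift)
    return float((predict(X_te) == y_te).mean())

print(f"in-distribution accuracy: {accuracy(0.0):.2f}")  # roughly 0.84
print(f"shifted-input accuracy:   {accuracy(2.0):.2f}")  # drops sharply, to about 0.58
```

Failures of large models under distribution shift are of course far subtler than this toy case, but the same mismatch between training and deployment conditions is at work.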
Given the stakes, we
call on major tech companies and public funders to allocate at
least one-third of their AI R&D budget to ensuring safety and
ethical use, comparable to their funding for AI capabilities.
Addressing these
problems, 34 with an eye toward powerful future
systems, must become central to our field.
Urgent Governance Measures
We urgently need national institutions and international
governance to enforce standards in order to prevent recklessness
and misuse.
Many areas of
technology, from pharmaceuticals to financial systems and
nuclear energy, show that society both requires and effectively
uses governance to reduce risks.
However, no
comparable governance frameworks are currently in place for AI.
Without them, companies and countries may seek a competitive
edge by pushing AI capabilities to new heights while cutting
corners on safety, or by delegating key societal roles to AI
systems with little human oversight. 26
Like manufacturers
releasing waste into rivers to cut costs, they may be tempted to
reap the rewards of AI development while leaving society to deal
with the consequences.
To keep up with rapid progress and avoid inflexible laws,
national institutions need strong technical expertise and the
authority to act swiftly. To address international race
dynamics, they need the affordance to facilitate international
agreements and partnerships. 46,47
To protect low-risk
use and academic research, they should avoid undue bureaucratic
hurdles for small and predictable AI models.
The most pressing scrutiny should be on AI systems at the frontier: a
small number of the most powerful AI systems - trained on
billion-dollar supercomputers - which will have the most hazardous and
unpredictable capabilities. 48,49
To enable effective regulation, governments urgently need
comprehensive insight into AI development.
Regulators should
require model registration, whistleblower protections, incident
reporting, and monitoring of model development and supercomputer
usage. 48,50,51,52,53,54,55
Regulators also need
access to advanced AI systems before deployment to evaluate them
for dangerous capabilities such as autonomous self-replication,
breaking into computer systems, or making pandemic pathogens
widely accessible. 43,56,57
For AI systems with hazardous capabilities, we need a
combination of governance mechanisms, 48,52,58,59
matched to the magnitude of their risks.
Regulators should create national and international safety
standards that depend on model capabilities. They should also
hold frontier AI developers and owners legally accountable for
harms from their models that can be reasonably foreseen and
prevented.
These measures can prevent harm and create much-needed
incentives to invest in safety. Further measures are needed for
exceptionally capable future AI systems, such as models that
could circumvent human control.
Governments must be prepared to license their development, pause
development in response to worrying capabilities, mandate access
controls, and require information security measures robust to
state-level hackers, until adequate protections are ready.
To bridge the time
until regulations are in place, major AI companies should
promptly lay out if-then commitments: specific safety measures
they will take if specific red-line capabilities are found in
their AI systems.
These commitments
should be detailed and independently scrutinized.
AI may be the technology that shapes this century.
While AI
capabilities are advancing rapidly, progress in safety and
governance is lagging behind.
To steer AI toward positive
outcomes and away from catastrophe, we need to reorient.
There is a
responsible path, if we have the wisdom to take it.
References
1. Emergent Abilities of Large Language Models. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S. and others, 2022. Transactions on Machine Learning Research.
2. About. DeepMind, 2023.
3. About. OpenAI, 2023.
4. ML-Enhanced Code Completion Improves Developer Productivity. Tabachnyk, M., 2022. Google Research.
5. GPT-4 Technical Report. OpenAI, 2023. arXiv [cs.CL].
6. Constitutional AI: Harmlessness from AI Feedback. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A. and others, 2022. arXiv [cs.CL].
7. Examples of AI Improving AI. Woodside, T. and Center for AI Safety, 2023.
8. Highly Accurate Protein Structure Prediction with AlphaFold. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O. and others, 2021. Nature, pp. 583–589.
9. Superhuman AI for Multiplayer Poker. Brown, N. and Sandholm, T., 2019. Science, pp. 885–890.
10. Deep Blue. Campbell, M., Hoane, A. and Hsu, F., 2002. Artificial Intelligence, pp. 57–83.
11. Alphabet Annual Report, page 33. Alphabet, 2022.
12. An Overview of Catastrophic AI Risks. Hendrycks, D., Mazeika, M. and Woodside, T., 2023. arXiv [cs.CY].
13. Taxonomy of Risks Posed by Language Models. Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P., Mellor, J. and others, 2022. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 214–229.
14. A Survey on Large Language Model based Autonomous Agents. Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J. and others, 2023. arXiv [cs.AI].
15. ChatGPT plugins. OpenAI, 2023.
16. ChemCrow: Augmenting Large Language Models with Chemistry Tools. Bran, A., Cox, S., White, A. and Schwaller, P., 2023. arXiv [physics.chem-ph].
17. Augmented Language Models: a Survey. Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R. and others, 2023. arXiv [cs.CL].
18. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y. and others, 2023. arXiv [cs.CL].
19. The Science of Computing: The Internet Worm. Denning, P., 1989. American Scientist, pp. 126–128.
20. AI Deception: A Survey of Examples, Risks, and Potential Solutions. Park, P., Goldstein, S., O'Gara, A., Chen, M. and Hendrycks, D., 2023. arXiv [cs.CY].
21. Optimal Policies Tend to Seek Power. Turner, A., Smith, L., Shah, R. and Critch, A., 2019. Thirty-Fifth Conference on Neural Information Processing Systems.
22. Discovering Language Model Behaviors with Model-Written Evaluations. Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E. and Heiner, S., 2022. arXiv [cs.CL].
23. Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. Pan, A., Chan, J., Zou, A., Li, N., Basart, S. and Woodside, T., 2023. International Conference on Machine Learning.
24. The Off-Switch Game. Hadfield-Menell, D., Dragan, A., Abbeel, P. and Russell, S., 2017. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pp. 220–227.
25. GitHub Copilot. Dohmke, T., 2023.
26. Natural Selection Favors AIs over Humans. Hendrycks, D., 2023. arXiv [cs.CY].
27. Harms from Increasingly Agentic Algorithmic Systems. Chan, A., Salganik, R., Markelius, A., Pang, C., Rajkumar, N. and Krasheninnikov, D., 2023. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 651–666. Association for Computing Machinery.
28. On the Opportunities and Risks of Foundation Models. Bommasani, R., Hudson, D., Adeli, E., Altman, R., Arora, S. and von Arx, S., 2021. arXiv [cs.LG].
29. AI Poses Doomsday Risks - But That Doesn't Mean We Shouldn't Talk About Present Harms Too. Brauner, J. and Chan, A., 2023. Time.
30. Existing Policy Proposals Targeting Present and Future Harms. Center for AI Safety, 2023.
31. Inverse Scaling: When Bigger Isn't Better. McKenzie, I., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A. and Prabhu, A., 2023. Transactions on Machine Learning Research.
32. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models. Pan, A., Bhatia, K. and Steinhardt, J., 2022. International Conference on Learning Representations.
33. Simple Synthetic Data Reduces Sycophancy in Large Language Models. Wei, J., Huang, D., Lu, Y., Zhou, D. and Le, Q., 2023. arXiv [cs.CL].
34. Unsolved Problems in ML Safety. Hendrycks, D., Carlini, N., Schulman, J. and Steinhardt, J., 2021. arXiv [cs.LG].
35. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Casper, S., Davies, X., Shi, C., Gilbert, T., Scheurer, J. and Rando, J., 2023. arXiv [cs.AI].
36. Consequences of Misaligned AI. Zhuang, S. and Hadfield-Menell, D., 2020. Advances in Neural Information Processing Systems, Vol 33, pp. 15763–15773.
37. Scaling Laws for Reward Model Overoptimization. Gao, L., Schulman, J. and Hilton, J., 2023. Proceedings of the 40th International Conference on Machine Learning, pp. 10835–10866. PMLR.
38. Learning from human preferences. Amodei, D., Christiano, P. and Ray, A., 2017.
39. Goal Misgeneralization in Deep Reinforcement Learning. Langosco di Langosco, A. and Chan, A., 2022. International Conference on Learning Representations.
40. Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals. Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J. and others, 2022. arXiv [cs.LG].
41. Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. Räuker, T., Ho, A., Casper, S. and Hadfield-Menell, D., 2023. 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pp. 464–483.
42. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F. and others, 2022. Advances in Neural Information Processing Systems, Vol 35, pp. 24824–24837.
43. Model evaluation for extreme risks. Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J. and others, 2023. arXiv [cs.AI].
44. Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries. Koessler, L. and Schuett, J., 2023. arXiv [cs.CY].
45. The Alignment Problem from a Deep Learning Perspective. Ngo, R., Chan, L. and Mindermann, S., 2022. arXiv [cs.AI].
46. International Institutions for Advanced AI. Ho, L., Barnhart, J., Trager, R., Bengio, Y., Brundage, M., Carnegie, A. and others, 2023. arXiv [cs.CY]. DOI: 10.48550/arXiv.2307.04699
47. International Governance of Civilian AI: A Jurisdictional Certification Approach. Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L. and others, 2023.
48. Frontier AI Regulation: Managing Emerging Risks to Public Safety. Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O'Keefe, C., Whittlestone, J. and others, 2023. arXiv [cs.CY].
49. Predictability and Surprise in Large Generative Models. Ganguli, D., Hernandez, D., Lovitt, L., Askell, A., Bai, Y., Chen, A. and others, 2022. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 1747–1764. Association for Computing Machinery.
50. It's Time to Create a National Registry for Large AI Models. Hadfield, G., Cuéllar, M. and O'Reilly, T., 2023. Carnegie Endowment for International Peace.
51. Model Cards for Model Reporting. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B. and others, 2019. FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220–229.
52. General Purpose AI Poses Serious Risks, Should Not Be Excluded From the EU's AI Act (Policy Brief). AI Now Institute, 2023.
53. Artificial Intelligence Incident Database. AI Incident Database, 2023.
54. The Promise and Perils of Tech Whistleblowing. Bloch-Wehba, H., 2023. Northwestern University Law Review, Forthcoming.
55. Proposing a Foundation Model Information-Sharing Regime for the UK. Mulani, N. and Whittlestone, J., 2023. Centre for the Governance of AI.
56. Auditing Large Language Models: a Three-Layered Approach. Mökander, J., Schuett, J., Kirk, H. and Floridi, L., 2023. AI and Ethics. DOI: 10.1007/s43681-023-00289-2
57. Can Large Language Models Democratize Access to Dual-Use Biotechnology? Soice, E., Rocha, R., Cordova, K., Specter, M. and Esvelt, K., 2023. arXiv [cs.CY].
58. Towards Best Practices in AGI Safety and Governance: A survey of Expert Opinion. Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E. and others, 2023. arXiv [cs.CY].
59. Regulatory Markets: The Future of AI Governance. Hadfield, G. and Clark, J., 2023. arXiv [cs.AI].