By Cade Metz and Karen Weise (recovered through the Wayback Machine)
The widely used chatbot ChatGPT was designed to generate digital text.
But when a team of artificial intelligence researchers at the computer chip company Nvidia got their hands on the chatbot's underlying technology, they realized it could do a lot more.
Within weeks, they taught it to play Minecraft, one of the world's most popular video games.
Inside Minecraft's digital universe, it learned to swim, gather plants, hunt pigs, mine gold and build houses.
The project was an early sign that the world's leading artificial intelligence researchers are transforming chatbots into a new kind of autonomous system called an A.I. agent.
These agents can do more than chat. They can use software apps, websites and other online tools, including spreadsheets, online calendars, travel sites and more.
In time, many researchers say, the A.I. agents could become far more sophisticated, and could replace office workers, automating almost any white-collar job.
Nvidia's agent plays a game.
Similar agents can,
The idea is that these automated systems will eventually act as personal assistants able to handle a wide range of tasks across the internet.
From left, Anima Anandkumar, senior director of A.I. research at Nvidia, with Yuke Zhu and Jim Fan, both senior research scientists. Credit - Gabriela Hasbun for The New York Times
Today's agents are limited, and they can't exactly organize your life.
ChatGPT can search the travel site Expedia for flights to New York, but you still have to book the reservation on your own.
This technology, as researchers improve it, could make office workers and consumers more efficient. It could also change the nature of video games, providing a new wave of bots that gamers can play alongside and chat with.
Over the past several months, the technology has wowed hundreds of millions of people with the way it generates emails, writes speeches and riffs on almost any topic.
But its most important skill may be its knack for writing computer programs.
It can instantly generate a program that draws a unicorn or drops digital snow across your laptop screen.
Professional software developers can ask for code that they can fold into larger programs, including everything from social media apps to search engines. But that is only part of what this technology can do. It can also generate computer code that taps into other software apps and websites.
This is how Dr. Fan and other Nvidia researchers taught GPT-4 to play Minecraft.
People use software apps and websites by touching buttons, menus and other graphical widgets.
A.I. agents use apps and websites by accessing their application programming interfaces, or A.P.I.s - the underlying software code that lets them communicate with other online services.
If you ask an agent to upload a video to the internet, for instance, it could generate code that called an A.P.I. offered by YouTube.
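To make the idea concrete, here is a minimal sketch of how a chatbot's text output might be turned into an A.P.I. call. Everything in it is an assumption made for illustration: FakeVideoService is a hypothetical in-memory stand-in, not YouTube's actual A.P.I., and the JSON format for the request is invented for this example.

```python
import json


class FakeVideoService:
    """Hypothetical stand-in for a video site's API. Makes no network calls."""

    def __init__(self):
        self.videos = {}

    def upload(self, title: str, path: str) -> str:
        # Assign a simple sequential ID and remember the upload.
        video_id = f"vid-{len(self.videos) + 1}"
        self.videos[video_id] = {"title": title, "path": path}
        return video_id


def run_tool_call(service: FakeVideoService, call_json: str) -> str:
    """Execute one structured request of the kind a chatbot might emit."""
    call = json.loads(call_json)
    # Only functions listed here can be reached by the chatbot's output.
    allowed = {"upload": service.upload}
    name = call["name"]
    if name not in allowed:
        raise ValueError(f"unknown tool: {name}")
    return allowed[name](**call["arguments"])


# Text a chatbot might produce when asked to "upload my vacation video".
model_output = (
    '{"name": "upload", '
    '"arguments": {"title": "Vacation", "path": "vacation.mp4"}}'
)

service = FakeVideoService()
video_id = run_tool_call(service, model_output)
print(video_id)
```

The short list of allowed functions is a deliberate choice in this sketch: the chatbot can only reach what the program explicitly exposes to it.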
In theory, a chatbot can write code to access any A.P.I. on the internet. But today's chatbots are not yet adept enough to handle more than simple tasks.
And even if they were, letting them freely roam the internet would be an enormous security risk. So companies are starting small.
A few months after OpenAI unveiled ChatGPT, it quietly released a way for the chatbot to do more than generate text.
After installing various plug-ins - software that augments what the bot can do - you could ask it to search travel sites like Expedia for available flights, grab a map of your hometown from Google Earth or even transform a spreadsheet detailing your yearly spending into a multicolored bar chart.
Equipped with a plug-in called code interpreter, ChatGPT could not just write code but also run it.
This allowed the technology to instantly perform tasks it could not in the past, including editing spreadsheets and transforming still images into videos.
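A rough sketch of the difference between writing code and running it, under stated assumptions: the string generated_code stands in for text a chatbot might produce, and the tiny spreadsheet is made-up sample data. This is not how OpenAI's code interpreter is built, and Python's exec offers no isolation on its own, so a real system would need a proper sandbox.

```python
import csv
import io

# Assumed example of code a chatbot might write when asked to
# total spending by category. It is only text until it is executed.
generated_code = '''
def total_by_category(rows):
    totals = {}
    for row in rows:
        key = row["category"]
        totals[key] = totals.get(key, 0.0) + float(row["amount"])
    return totals
'''

# Made-up spreadsheet contents, in CSV form.
spreadsheet = "category,amount\nrent,1200\nfood,300\nfood,150\n"
rows = list(csv.DictReader(io.StringIO(spreadsheet)))

# Run the generated text in its own namespace, then call the function
# it defined. exec is NOT a security boundary.
namespace = {}
exec(generated_code, namespace)
totals = namespace["total_by_category"](rows)
print(totals)
```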
Google, Microsoft and other companies are exploring similar technologies.
Independent projects such as AutoGPT are trying to take this kind of thing several steps further.
The idea is to give the system a goal.
Then it will look for ways of reaching that goal by asking itself questions and connecting to other internet services.
Today, this does not work all that well. Systems like AutoGPT tend to get stuck in endless loops. But researchers like Dr. Fan are constantly refining this kind of technology in an effort to make it more useful and more reliable.
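The goal-seeking loop, and one simple guard against endless repetition, might look something like the sketch below. It is an illustration only, not AutoGPT's actual code: propose_step is a hypothetical stand-in for the chatbot, and here it is replaced by a scripted planner that deliberately repeats itself.

```python
def run_agent(goal, propose_step, max_steps=10):
    """Ask for one step at a time until done, repeating, or out of budget."""
    history = []
    seen = set()
    for _ in range(max_steps):
        step = propose_step(goal, history)
        if step == "DONE":
            return history, "finished"
        if step in seen:
            # The same step came back again: stop instead of looping forever.
            return history, "stopped: repeated step"
        seen.add(step)
        history.append(step)
    return history, "stopped: step limit"


def scripted_planner(goal, history):
    """Hypothetical planner that falls back to a step it already tried."""
    plan = ["search flights", "compare prices", "search flights"]
    return plan[len(history)] if len(history) < len(plan) else "DONE"


history, status = run_agent("find a cheap flight", scripted_planner)
print(history, status)
```

Capping the number of steps and refusing exact repeats are crude safeguards. They stop a runaway loop, but they do nothing to help the agent find a better plan.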
Other researchers are building a new kind of A.I. agent designed for using software tools.
In summer 2022, Dr. Clune was among a team of OpenAI researchers who built an agent that could use computer software much as a person would - mouse click by mouse click, keystroke by keystroke.
Dr. Clune and his colleagues fed the system hours of online videos that showed people playing Minecraft. By analyzing the way people used their mouse and keyboard to navigate through Minecraft's digital universe, the system learned to play the game on its own.
Other companies, including a start-up called Adept, are building similar agents that use websites like Wikipedia, Redfin and Craigslist and popular office apps from companies like Salesforce.
Dr. Clune argues that this kind of agent will eventually allow artificial intelligence to use a much broader range of software apps and websites.
He said everyone would have access to a digital assistant that could potentially do almost anything on the internet.
That could make life easier - but it could also replace countless jobs.