The stunning capabilities of ChatGPT, the chatbot from startup OpenAI, have triggered a surge of new interest and investment in artificial intelligence. But late last week, OpenAI’s CEO warned that the research strategy that birthed the bot is played out. It’s unclear exactly where future advances will come from.
In recent years, OpenAI has delivered a series of impressive advances in AI that works with language by taking existing machine-learning algorithms and scaling them up to previously unimagined size. GPT-4, the latest of those projects, was likely trained using trillions of words of text and many thousands of powerful computer chips. The process cost over $100 million.
But the company’s CEO, Sam Altman, says further progress will not come from making models bigger. “I think we’re at the end of the era where it’s going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We’ll make them better in other ways.”
Altman’s declaration suggests an unexpected twist in the race to develop and deploy new AI algorithms. Since OpenAI launched ChatGPT in November, Microsoft has used the underlying technology to add a chatbot to its Bing search engine, and Google has launched a rival chatbot called Bard. Many people have rushed to experiment with using the new breed of chatbot to help with work or personal tasks.
Meanwhile, numerous well-funded startups, including Anthropic, AI21, Cohere, and Character.AI, are throwing enormous resources into building ever larger algorithms in an effort to catch up with OpenAI’s technology. The initial version of ChatGPT was based on a slightly upgraded version of GPT-3, but users can now also access a version powered by the more capable GPT-4.
Altman’s statement suggests that GPT-4 could be the last major advance to emerge from OpenAI’s strategy of making the models bigger and feeding them more data. He did not say what kind of research strategies or techniques might take its place. In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size. Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.
Nick Frosst, a cofounder at Cohere who previously worked on AI at Google, says Altman’s feeling that going bigger will not work indefinitely rings true. He, too, believes that progress on transformers, the type of machine learning model at the heart of GPT-4 and its rivals, lies beyond scaling. “There are lots of ways of making transformers way, way better and more useful, and lots of them don’t involve adding parameters to the model,” he says. Frosst says that new AI model designs, or architectures, and further tuning based on human feedback are promising directions that many researchers are already exploring.
Each version of OpenAI’s influential family of language algorithms consists of an artificial neural network, software loosely inspired by the way neurons work together, which is trained to predict the words that should follow a given string of text.
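To make the idea of predicting the next word concrete, here is a minimal Python sketch. It is a toy, not OpenAI's method: instead of a neural network it simply counts which word most often follows another in a tiny made-up corpus. The function names `train_bigram` and `predict_next` and the sample text are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, prompt):
    """Return the most frequent follower of the prompt's last word, or None."""
    words = prompt.lower().split()
    if not words or words[-1] not in follows:
        return None
    return follows[words[-1]].most_common(1)[0][0]


# Hypothetical training text; a real model would see trillions of words.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "I saw the"))
```

A model like GPT-4 replaces the lookup table with a large neural network that takes the whole preceding text into account rather than only the last word, but the training objective described above is the same in spirit: guess what comes next.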