DeepMind, a London-based artificial intelligence research firm owned by Google parent Alphabet, has developed an A.I. algorithm that outperforms all existing comparable software across a wide range of language tasks, from reading comprehension to answering questions on many different subjects.
The program approaches human-level performance in select areas, such as a high school reading comprehension test. However, it falls well short of human abilities in others, such as common-sense reasoning and mathematical reasoning.

DeepMind underlined its ambition to play a larger role in natural language processing by releasing the new language model on Wednesday. The company is best known for developing an artificial intelligence system that beat the world’s best human player at the strategy game Go, a key milestone in computer science, and has recently made a breakthrough in applying A.I. to predict protein structures.
However, compared to competing laboratories such as OpenAI, a San Francisco-based A.I. research startup, and the A.I. research arms of Facebook, Microsoft, Alibaba, Baidu, and even its sister company Google, DeepMind has done significantly less work on natural language processing (NLP).
Each of these other firms has developed massive language A.I. systems. These systems, which are based on neural networks (a type of machine learning software loosely fashioned after the human brain), can have hundreds of millions, if not hundreds of billions, of adjustable parameters. They are trained on massive archives of books and text gathered from the Internet, and are known among A.I. experts as “ultra-large language models.”
The advantage of such ultra-large language models is that, once trained, they can perform a wide range of language tasks, including translation, question answering, and text writing, with little or no task-specific training. Furthermore, they routinely outperform smaller NLP A.I. software trained to be an expert at a single task.
According to the data published by DeepMind, its language model, Gopher, was significantly more accurate than these existing ultra-large language models on many tasks, particularly answering questions about specialized subjects like science and the humanities, and equal or nearly equal to them on others, such as logical reasoning and mathematics.
This was the case despite the fact that Gopher is smaller than several other ultra-large language models. Gopher has 280 billion tunable parameters, or variables. That puts it ahead of OpenAI’s GPT-3, which has 175 billion. However, it is smaller than the Megatron-Turing NLG model that Microsoft and Nvidia partnered on earlier this year, which has 530 billion parameters, as well as Google’s 1.6 trillion-parameter and Alibaba’s 10 trillion-parameter models.
Implications for business
Larger language models have already resulted in more fluent chatbots and digital assistants, more accurate translation software, better search engines, and systems that can summarize difficult documents. DeepMind, however, stated that Gopher will not be commercialized. “That’s not the focus right now,” said Koray Kavukcuoglu, DeepMind’s vice president of research.
DeepMind, along with OpenAI, is one of the few companies whose explicit goal is to build artificial general intelligence. Its researchers, however, said they do not believe that ever-larger language models alone will result in human-like intelligence.
“Even though we don’t believe that language or scale is the only way to go,” said Oriol Vinyals, a DeepMind researcher who worked on the company’s new language A.I. system, “we see it as part of a broader portfolio of approaches that we’re studying at DeepMind towards achieving our mission of solving intelligence to advance science and benefit humanity.”
Concerns about ethics
A.I. researchers and social scientists have raised ethical concerns about ultra-large language models as they become more commercialized. The models frequently learn racial, ethnic, and gender biases from the texts they are trained on, and they are so complex that it is difficult to detect and trace all of these biases before a system is deployed. OpenAI’s language A.I., GPT-3, for example, frequently connects Muslims with violent narratives and regurgitates gender stereotypes about professions.
Another major concern is that training and running such software consumes a great deal of electricity, which could exacerbate global warming. Some linguists and A.I. researchers have urged tech corporations to stop building ever-larger A.I. systems because of these risks and because, despite their scale, the systems still do not achieve human-level language understanding.
DeepMind attempted to inoculate itself against this criticism by publishing, alongside its Gopher research, a research article in which its own A.I. ethics team probed these concerns and attempted to identify ways to address them. “We believe that if done responsibly, language model research has the potential to uncover a spectrum of positive uses,” said Iason Gabriel, a DeepMind A.I. ethics researcher who worked on the article. “This concept of responsible innovation is important to us.”
The researchers divided the 21 ethical concerns they identified into six categories, ranging from malicious uses of the software to its potential to disseminate misinformation to its environmental consequences. However, the DeepMind ethics team said there is no one-size-fits-all solution to many of the problems that ultra-large language models cause. “One significant takeaway we’ve got is that unearthing this broad picture [of potential ethical difficulties] really does involve collaboration of all kinds of expertise,” said Laura Weidinger, another member of the DeepMind A.I. ethics team.
DeepMind also published separate research demonstrating that it is working to address some of the ethical harms ultra-large language models pose: a technique that could make training large language models more energy efficient, and that could make it easier for researchers to detect bias and toxic language and to verify information sources. The technique, dubbed the Retrieval-Enhanced Transformer, or Retro for short, gives the model access to a 2 trillion-word database that the software uses as a kind of external memory.
When given a human-written prompt, the system searches that database for the passage closest to the prompt, then for the next-closest text block, and uses those two passages to guide its response.
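As a concrete illustration of that retrieval step, here is a minimal Python sketch. The `embed` function and three-passage `database` are toy stand-ins for Retro’s trained text encoder and its 2 trillion-word store, and the real system feeds retrieved chunks into the transformer through attention rather than simple lookup, so treat this as a schematic, not DeepMind’s implementation.

```python
# Minimal sketch of nearest-neighbour retrieval in the spirit of Retro.
# embed() and database are hypothetical stand-ins, not DeepMind's code.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding (deterministic within a run); a real system uses a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)          # unit-normalize for cosine similarity

# Stand-in for Retro's ~2-trillion-word retrieval database.
database = [
    "Gopher is a 280-billion-parameter language model from DeepMind.",
    "Retrieval lets a model consult external text instead of memorizing it.",
    "Go is an ancient strategy board game.",
]
db_vectors = np.stack([embed(p) for p in database])

def retrieve(prompt: str, k: int = 2) -> list[str]:
    """Return the k passages whose embeddings are closest to the prompt."""
    scores = db_vectors @ embed(prompt)   # cosine similarity (vectors are unit length)
    top = np.argsort(scores)[::-1][:k]    # indices of the k best matches
    return [database[i] for i in top]

# The two retrieved passages would then condition the model's response.
print(retrieve("How many parameters does Gopher have?"))
```

The intuition behind the energy savings is that facts can live in the database rather than in the model’s parameters, so the model itself can be smaller and cheaper to train.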
Gopher — The new leader in language AI
Gopher, like GPT-3, is a dense, autoregressive transformer LLM: “dense” because every parameter is used to process each input (in contrast to sparse mixture-of-experts models), and “autoregressive” because it predicts each next word from all the words that came before it. With 280 billion parameters, only Nvidia’s MT-NLG (530B), which was made with Microsoft, surpasses it in size among dense models.
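As a rough illustration of what “autoregressive” means in practice, the sketch below generates one token at a time, each prediction conditioned on all of the tokens before it. The `next_token_logits` function is a hypothetical stand-in for the 280-billion-parameter transformer a real system would run.

```python
# Toy sketch of autoregressive (next-token) decoding, the scheme Gopher
# and GPT-3 share. next_token_logits() is a stand-in for a real transformer.
import numpy as np

VOCAB = ["the", "model", "predicts", "next", "token", "<eos>"]

def next_token_logits(context: list[str]) -> np.ndarray:
    """Hypothetical model call; a real LLM runs a transformer over the context."""
    rng = np.random.default_rng(len(context))      # toy determinism
    return rng.standard_normal(len(VOCAB))

def generate(prompt: list[str], max_new_tokens: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)             # condition on ALL prior tokens
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
        tok = VOCAB[int(np.argmax(probs))]             # greedy: pick most likely token
        if tok == "<eos>":                             # stop at end-of-sequence
            break
        tokens.append(tok)
    return tokens

print(generate(["the", "model"]))
```

Real systems usually sample from the probability distribution instead of always taking the most likely token, but the left-to-right conditioning structure is the same.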

The model was trained on MassiveText (10.5 TB), a mixture of sources including MassiveWeb, C4 (Common Crawl text), Wikipedia, GitHub, books, and news stories. Alongside Gopher, DeepMind built the Gopher family, a set of smaller models ranging in size from 44M to 7.1B parameters. All of the models were trained on the same 300B tokens (12.8 percent of MassiveText), so that the effect of model scale on capability could be studied in isolation.
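A quick back-of-envelope check, derived from the article’s own numbers rather than anything DeepMind reports directly: if the 300B training tokens are 12.8 percent of MassiveText, the full corpus holds roughly 2.3 trillion tokens, consistent in scale with the 2 trillion-word Retro database mentioned above.

```python
# Back-of-envelope check on the training-set figures quoted above.
tokens_used = 300e9           # tokens actually used for training
fraction_of_corpus = 0.128    # 12.8% of MassiveText
corpus_tokens = tokens_used / fraction_of_corpus
print(f"MassiveText ≈ {corpus_tokens / 1e12:.2f} trillion tokens")  # ≈ 2.34
```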
Performance of Gopher
Gopher’s performance was compared with that of SOTA models on 124 tasks across different fields, covering math, logic, reasoning, general knowledge, science, ethics, and reading comprehension. Gopher beat SOTA models such as GPT-3 and J1-Jumbo on 81% of those tasks.
By these measures, Gopher is the best language AI system to date, and DeepMind now stands as a leading contender in language AI as well.
Beyond the headline results, the researchers found patterns that recur across different tasks. On some tests, SOTA models still beat Gopher; on others, Gopher’s gains were huge. Consistent with the common observation that LLMs struggle with common sense and causal reasoning, Gopher did notably well on “knowledge-intensive” tasks while improving far less on reasoning-heavy ones.
Even though Gopher improves on most previous LLMs, it is still a long way from human-level performance on most tasks (and from supervised SOTA models trained specifically for the task at hand), including reading comprehension, common sense, logical reasoning, and language understanding. One anecdote worth noting: Gopher is good at fact-checking, and it did better than smaller models not because it had a better understanding of misinformation, but simply because it knew more facts than they did.