The ability of computers to perceive the world around them has improved dramatically over the past decade. Face recognition software automatically identifies people in photos. Smartphones transcribe spoken words into text. Self-driving cars recognize and avoid obstacles on the road.
Deep learning, an artificial intelligence technique, is at the heart of these breakthroughs. Deep learning is based on neural networks, a type of data structure loosely inspired by networks of biological neurons. Neural networks are organized into layers, with the outputs of one layer feeding into the inputs of the next.
Computer scientists have been experimenting with neural networks since the 1950s. But two big breakthroughs, one in 1986 and the other in 2012, laid the foundation for today's booming deep learning industry. The 2012 breakthrough, which launched the deep learning revolution, was the discovery that we can get dramatically better performance out of neural networks with many layers rather than just a few. That discovery was made possible by the growing amounts of data and computing power that had become available by 2012.
This feature offers a primer on neural networks. We’ll explain what neural networks are, how they work, and where they came from. And we’ll explore why—despite many decades of previous research—neural networks have only really come into their own since 2012.
This is the first in a multi-part series on machine learning—in future weeks we’ll take a closer look at the hardware powering machine learning, examine how neural networks have enabled the rise of deep fakes, and much more.
Neural networks date back to the 1950s

Neural networks are an old idea, at least by the standards of computer science. In 1957, Frank Rosenblatt of Cornell University published a report describing an early neural network concept called the perceptron. In 1958, with funding from the US Navy, he built a crude system that could analyze a 20-by-20-pixel image and recognize simple geometric shapes.
Rosenblatt’s main objective wasn’t to build a practical system for classifying images. Rather, he was trying to gain insights into the human brain by building computer systems organized in a brain-like way. But the concept garnered some over-the-top enthusiasm.
The New York Times reported that "the Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."
Each neuron in a neural network is essentially a mathematical function. It computes a weighted sum of its inputs; the greater an input's weight, the more that input influences the neuron's output. The weighted sum is then fed through an activation function, a non-linear function that allows neural networks to model complex, non-linear phenomena.
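To make that concrete, here is a minimal sketch of a single artificial neuron in Python. The sigmoid activation, the function name, and the example numbers are our own illustrative assumptions, not details of any particular network described in this article:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs passed through
    a non-linear activation function (a sigmoid in this sketch)."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-weighted_sum))  # sigmoid squashes the output into (0, 1)

# Three inputs; the second input has the largest weight, so it influences the output most
print(neuron(np.array([0.5, 0.8, 0.2]), np.array([0.1, 0.9, -0.3]), bias=0.05))
```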
The power of Rosenblatt’s early perceptron experiments—and of neural networks more generally—comes from their capacity to “learn” from examples. A neural network is trained by adjusting neuron input weights based on the network’s performance on example inputs. If the network classifies an image correctly, weights contributing to the correct answer are increased, while other weights are decreased. If the network misclassifies an image, the weights are adjusted in the opposite direction.
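Here is a minimal sketch of that idea in Python, using the classic perceptron learning rule. The function name, learning rate, and training data are our own illustrative choices, not details of Rosenblatt's system:

```python
import numpy as np

def train_perceptron(inputs, labels, epochs=50, lr=0.1):
    """Single-layer perceptron trained with the classic learning rule:
    nudge the weights up or down whenever the prediction is wrong."""
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.1, size=inputs.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, labels):
            prediction = 1 if x @ weights + bias > 0 else 0
            error = target - prediction      # 0 if correct, +1 or -1 if wrong
            weights += lr * error * x        # strengthen or weaken each input's weight
            bias += lr * error
    return weights, bias

# A linearly separable problem (logical AND) that a single-layer perceptron can learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([(1 if x @ w + b > 0 else 0) for x in X])  # [0, 0, 0, 1]
```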
Early neural networks could "learn" in a way that superficially mirrored the workings of the human nervous system. In the 1960s, the approach was all the rage. But, as computer scientists Marvin Minsky and Seymour Papert documented in their seminal 1969 book Perceptrons, these early neural networks had serious limitations.
Rosenblatt's early neural networks had only one or two trainable layers. Minsky and Papert showed that such simple networks are mathematically incapable of modeling complex real-world phenomena. For example, a single-layer perceptron cannot compute the simple XOR function, because XOR's two output classes are not linearly separable.
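You can see the limitation by reusing the train_perceptron sketch above on XOR (our own hypothetical demonstration, not Minsky and Papert's formal proof). Because no straight line separates XOR's classes, no choice of weights works, and training never converges:

```python
# XOR is not linearly separable, so a single-layer perceptron can never learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
w, b = train_perceptron(X, y, epochs=1000)
print([(1 if x @ w + b > 0 else 0) for x in X])  # never matches [0, 1, 1, 0]
```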
Deeper neural networks were more versatile in theory. But deeper networks would have strained the limited computing resources available at the time. More important, no one had developed an efficient technique for training deep neural networks. The simple hill-climbing algorithms used by the earliest networks didn't scale to larger ones.