What is computer vision?
Computer vision is a branch of artificial intelligence and machine learning that examines the technologies and techniques that enable computers to learn to perceive and interpret visual data in the real world.
Seeing the world is the easy part: all you need is a camera. Simply connecting a camera to a computer, however, is not enough. The difficult part is classifying and interpreting the objects in photographs and videos, along with their relationships and the context of what is happening. What we want computers to be able to do is explain what’s in a picture, video clip, or real-time video stream.
To put it another way, one of the main goals of this subject is to ensure that a machine can understand an image as well as, if not better than, a human. This is a difficult task, as you shall discover later.
How does computer vision work?
To make a machine recognize visual objects, it must be trained on hundreds of thousands of examples. Suppose you want it to distinguish between cars and bicycles. How would you describe this task to a human?
Normally, a bicycle has two wheels, while a car has four. Alternatively, a bicycle has pedals, whereas a car does not. Describing objects through hand-picked properties like these is known as feature engineering in machine learning.
However, as you may have already noticed, this method is far from perfect. Some bicycles have three or four wheels, and some cars have only two. Motorcycles and mopeds also exist, and they can be mistaken for bicycles. How will the algorithm classify those?
Misclassifications become more frequent as you build more complex systems (for example, facial recognition software). To characterize a person’s face, the ML engineer would have to take hundreds of measurements, such as the distance between the eyes, the distance between the eyes and the corners of the mouth, and so on.
Moreover, the accuracy of such a model would leave much to be desired: change the lighting, facial expression, or angle, and you have to start the measurements all over again.
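The wheel and pedal rules can be written out as a tiny hand-engineered classifier. This is only a sketch: the `Vehicle` fields and the sample vehicles are assumptions made up for illustration, not part of any real system. It shows how quickly fixed rules run into cases they did not anticipate.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Hand-measured features; the field names are illustrative assumptions."""
    name: str
    wheels: int
    has_pedals: bool


def classify(vehicle: Vehicle) -> str:
    """Apply the two hand-written rules: wheel count and presence of pedals."""
    if vehicle.wheels == 2 and vehicle.has_pedals:
        return "bicycle"
    if vehicle.wheels == 4 and not vehicle.has_pedals:
        return "car"
    # Anything the rules did not anticipate falls through.
    return "unknown"


samples = [
    Vehicle("city bike", wheels=2, has_pedals=True),
    Vehicle("sedan", wheels=4, has_pedals=False),
    Vehicle("motorcycle", wheels=2, has_pedals=False),
    Vehicle("moped with pedals", wheels=2, has_pedals=True),
    Vehicle("cargo tricycle", wheels=3, has_pedals=True),
]

for v in samples:
    print(f"{v.name}: {classify(v)}")
```

The moped is labeled a bicycle, while the motorcycle and the tricycle are not recognized at all. Every such exception would need another hand-written rule.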
Here are several common obstacles to solving computer vision problems.
Different lighting
For computer vision, it is very important to collect real-world data that shows objects under different kinds of lighting. A filter might make a ball look blue or yellow when it is in fact white. A red object under a red lamp becomes almost invisible.
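A rough way to see the effect of lighting in numbers is to scale each color channel of an object by the color of the light. The sketch below assumes a deliberately simplified multiplicative model. The specific RGB values and the `contrast` helper are made up for illustration.

```python
def apply_light(surface_rgb, light_rgb):
    """Scale each channel of the surface colour (0-255) by the light's
    intensity in that channel (0-255). A simplified model, not physics."""
    return tuple(s * l // 255 for s, l in zip(surface_rgb, light_rgb))


def contrast(rgb_a, rgb_b):
    """Sum of absolute per-channel differences between two colours."""
    return sum(abs(a - b) for a, b in zip(rgb_a, rgb_b))


# Illustrative colours (assumed values).
WHITE_BALL = (255, 255, 255)
RED_BALL = (200, 30, 30)
WHITE_WALL = (255, 255, 255)

WHITE_LIGHT = (255, 255, 255)
BLUE_FILTER = (80, 120, 255)
RED_LAMP = (255, 0, 0)

# A white ball seen through a blue filter no longer looks white.
print(apply_light(WHITE_BALL, BLUE_FILTER))  # (80, 120, 255)

# A red ball against a white wall: clear contrast under white light ...
print(contrast(apply_light(RED_BALL, WHITE_LIGHT),
               apply_light(WHITE_WALL, WHITE_LIGHT)))  # 505

# ... but under a red lamp the two become almost the same colour.
print(contrast(apply_light(RED_BALL, RED_LAMP),
               apply_light(WHITE_WALL, RED_LAMP)))  # 55
```

The pixel values the algorithm receives depend as much on the light as on the object, which is why training data needs to cover many lighting conditions.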
Noise
If an image contains a lot of noise, it is hard for a computer vision system to recognize objects in it. Noise means that individual pixels in the image appear brighter or darker than they should. For example, traffic cameras that detect violations on the road are much less effective when it is raining or snowing.
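To make "pixels brighter or darker than they should be" concrete, the sketch below corrupts a few pixels of a tiny grayscale image and then cleans them up with a 3x3 median filter. The median filter is one common denoising technique chosen here for illustration; the article does not prescribe it. The image size and the corrupted positions are assumptions.

```python
from statistics import median_low


def median_filter(image):
    """Replace every pixel with the median of its 3x3 neighbourhood.
    Neighbourhoods are clipped at the image borders."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            row.append(median_low(neighbours))
        result.append(row)
    return result


# A flat grey 5x5 image (assumed test data).
clean = [[100] * 5 for _ in range(5)]

# Corrupt three pixels: two too bright, one too dark.
noisy = [row[:] for row in clean]
noisy[1][1] = 255
noisy[3][2] = 0
noisy[0][4] = 255

restored = median_filter(noisy)
print(restored == clean)  # True
```

This works here because the outliers are isolated. With heavy noise, such as dense rain or snow, most of a neighbourhood may be corrupted and simple filtering stops helping.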
Unfamiliar angles
It’s important to have pictures of the object from several angles. Otherwise, a computer won’t be able to recognize it if the angle changes.
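When photographs from many angles are not available, one common partial workaround is to generate rotated copies of the images you do have. The sketch below does this for a tiny grid standing in for an image; `rotate90` and `augment` are hypothetical helper names. It only covers in-plane rotation in 90-degree steps, so it does not replace real photographs taken from different viewpoints.

```python
def rotate90(image):
    """Rotate a 2D grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the image at 0, 90, 180 and 270 degrees."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


# A small "arrow pointing up" pattern (assumed test data).
arrow = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

for variant in augment(arrow):
    for row in variant:
        print("".join("#" if pixel else "." for pixel in row))
    print()
```

Each variant can be added to the training set with the same label as the original.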

Overlapping
When there is more than one object in an image, the objects can overlap. As a result, some of their characteristics may remain hidden, which makes it even more difficult for the machine to recognize them.
Different types of objects
Things that belong to the same category may look totally different. For example, there are many types of lamps, but the algorithm must successfully recognize both a nightstand lamp and a ceiling lamp.

Fake similarity
Items from different categories can sometimes look alike. For example, you’ve probably come across people who look like celebrities in photos taken from a certain angle but not so much in real life. Misrecognition of this kind is common in CV. Samoyed puppies, for example, might be mistaken for small polar bears in some photographs.
It’s nearly impossible to consider all of these scenarios and use feature engineering to prevent them. As a result, deep artificial neural networks now dominate computer vision almost entirely.
Convolutional neural networks (CNNs) are very good at extracting features automatically, which saves engineers the time they would otherwise spend designing features by hand. VGG-16 and VGG-19 are two of the best-known CNN architectures. Deep learning does require a large number of training examples, but this is rarely a problem: each year, around 657 billion photos are uploaded to the internet!
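The basic operation behind this feature extraction is a small kernel sliding across the image. The sketch below applies one hand-picked 3x3 kernel that responds to vertical edges. In a real CNN the kernel values are learned from data rather than written by hand, and there are many kernels per layer. The image and the kernel here are assumptions chosen for illustration.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    element-wise products. Like CNN layers, this does not flip the kernel."""
    k_height, k_width = len(kernel), len(kernel[0])
    out_height = len(image) - k_height + 1
    out_width = len(image[0]) - k_width + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_height)
                for j in range(k_width)
            )
            for c in range(out_width)
        ]
        for r in range(out_height)
    ]


# Dark region on the left, bright region on the right (assumed test data).
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Responds strongly where brightness increases from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # [0, 27, 27, 0] for each of the two rows
```

The output is zero over the flat regions and large around the boundary between them, so the feature map marks where the edge is.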
Uses of computer vision

Interpreting digital images and videos comes in handy in many fields. Let us look at some of the use cases:
- Medical diagnosis. Image classification and pattern detection are commonly used in software that helps doctors diagnose severe conditions such as lung cancer. In one case, a group of researchers trained an AI system to evaluate CT scans of oncology patients. The system was 95% accurate, while humans were just 65% accurate.
- Factory management. It is important to detect manufacturing defects with maximum accuracy, but this is challenging because it often requires monitoring on a micro scale, for example when you need to check the threading of hundreds of thousands of screws. A computer vision system takes real-time data from cameras and applies ML algorithms to analyze the data streams, which makes low-quality items easy to find.
- Retail. Amazon was the first company to open a store that runs without any cashiers or checkout machines. Amazon Go is fitted with hundreds of computer vision cameras. These devices track the items customers put in their shopping carts. The cameras can also tell when a customer returns a product to the shelf, and the system then removes it from the virtual shopping cart. Customers are charged through the Amazon Go app, so there is no need to wait in line. The cameras also help prevent shoplifting and keep products from running out of stock.
- Security systems. Facial recognition is used in enterprises, schools, factories, and basically anywhere security is important. Schools in the United States apply facial recognition technology to identify sex offenders and other criminals and reduce potential threats. Such software can also recognize weapons to prevent acts of violence in schools. Meanwhile, some airlines use face recognition for passenger identification and check-in, saving time and reducing the cost of checking tickets.
- Animal conservation. Ecologists use computer vision to gather data about wildlife, including the movements of rare species and their patterns of behavior, without disturbing the animals. CV increases the efficiency and accuracy of image review for scientific discoveries.
- Self-driving vehicles. Using sensors and cameras, cars have learned to recognize bumpers, trees, poles, and parked vehicles around them. Computer vision enables them to move freely in their environment without human supervision.
Main problems in computer vision

Computer vision helps people in many different fields, and it still has plenty of room to develop. Here are some areas that have yet to be improved.
Scene understanding
CV has a knack for locating and identifying objects. However, it has trouble recognizing the context of a scene, especially if it’s not straightforward. Take a look at this illustration. (Don’t peek at the URL!) What do you suppose they’re up to?
You’ll notice right away that the children are wearing cardboard boxes on their heads. It isn’t some type of postmodern art that attempts to demonstrate the futility of school education. These children are observing a solar eclipse.
You might never grasp what’s going on if you don’t have this context. In the great majority of cases, artificial intelligence is still in that position. To fix the issue, we’d have to build artificial general intelligence (AI whose problem-solving capabilities are more or less comparable to those of a person and can be applied anywhere), but we’re still a long way off.
Privacy issues
Computer vision is closely tied to privacy, since governments in various countries are adopting face recognition systems in the name of national security. AI-powered cameras installed in the Moscow metro help catch criminals. Meanwhile, Chinese authorities profile Uyghur individuals (a Muslim ethnic minority) and single them out for tracking and incarceration. When facial recognition is everywhere, everything you do can be subject to policing and shaming. AI ethicists have yet to work out the consequences of omnipresent CV for public wellbeing.
Summing up
Computer vision is an innovative field that uses the latest machine learning technologies to build software systems that assist humans across different fields. From retail to wildlife conservation, smart algorithms solve the problems of image classification and pattern recognition, sometimes even better than humans.