Computer Vision

Posted by Eyituoyo Ogbemi on

[Image: computer vision example 1]



Consciously or subconsciously, we all experience and use computer vision in our daily lives; most of the time it’s an effortless activity we perform without thinking. Wikipedia defines computer vision as an interdisciplinary scientific field that deals with how computers can gain a high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can do.

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.


The Goal of Computer Vision

In the 1950s, researchers assumed they could recreate the processes that make up human intelligence in computers, giving rise to the field of Artificial Intelligence. Computer vision was one of the distinct areas of AI that researchers explored. It started with three specific goals: recreating the eye to see, the visual cortex to describe, and the rest of the brain to interpret and understand. Significant progress has been made in all three areas.

The eye to see:

Reinventing the eye for computers has had the most success over time. Over the last few decades, sensors and image processors have been created that not only match what the human eye can do but exceed it. With optically near-perfect lenses and nanometer-scale image sensors and processors, modern cameras are extremely precise, picking up the tiniest details and taking in more images per second than even the human eye can process.

However, despite these breakthroughs, cameras remained limited in scope and field of vision; until recently, even the best camera sensors couldn’t capture 3D images.

The visual cortex to describe:

It’s not enough to see the images; computers also have to process and interpret them. Computers can apply transformations to an image and thereby discover objects, edges, perspective, and movement, in not just one but multiple images. These processes involve a great deal of math and statistics and weren’t practical until recent advances in parallel computing powered by GPUs.
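One such transformation can be sketched in a few lines of plain Python: convolving a tiny grayscale image with a Sobel-style kernel to highlight vertical edges. The image and kernel below are illustrative inventions, not taken from any particular library.

```python
def convolve(image, kernel):
    """Apply a 3x3 kernel to every interior pixel of a 2D image."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

# A dark region (0) meeting a bright region (9): a vertical edge.
image = [[0, 0, 9, 9, 9]] * 5

# Sobel-style kernel: responds where brightness changes left-to-right.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edges = convolve(image, sobel_x)
```

The output is large exactly where the dark and bright regions meet and zero inside uniform regions, which is what "discovering edges" means at the pixel level. Real systems use the same idea at scale, with many kernels and hardware-accelerated convolution.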

Other parts of the brain to understand and interpret:

Even achieving a toddler’s level of intelligence has proven extremely complex. Researchers can put together a system that examines every aspect of an item, from every angle and in every detail, and it still won’t be able to tell what the object is or how it is used, let alone differentiate it from another. This is because, as humans, we barely understand how our own minds work, so getting computers to interpret images is far trickier than expected. This is not a dead end, though: while past efforts to build a system that can interpret and process all this information proved fruitless, a new generation of AI architectures has emerged in the past five years or so that is promising.


Computer Vision vs. Image Processing

Image processing is the art of creating a new image from an existing one, basically simplifying or enhancing the content in some way (think Photoshop). It primarily involves manipulating an image rather than understanding or interpreting it.

Computer vision systems, on the other hand, may require image processing to be applied to raw input: for instance, normalizing the photometric properties of an image, cropping its bounds, or removing digital noise. But image processing on its own never requires understanding or interpreting an image in any significant way.
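Two of the preprocessing steps just mentioned, photometric normalization and cropping, can be sketched directly on a nested-list "image" of pixel intensities. This is a minimal illustration of the idea, not a production routine.

```python
def normalize(image):
    """Stretch pixel intensities to span the full 0-255 range
    (a simple photometric normalization)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]

def crop(image, top, left, height, width):
    """Cut a rectangular region out of the image."""
    return [row[left:left + width] for row in image[top:top + height]]

dim = [[50, 60], [70, 80]]   # a dim, low-contrast image
bright = normalize(dim)      # intensities now span 0..255
```

Note that neither function "understands" the image: they transform pixels mechanically, which is exactly the distinction drawn above between image processing and computer vision.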


How Computer Vision Works

Think of a computer vision application as finding a task that requires human vision expertise and then deriving some pattern out of it. If such a task can be automated, we can develop a computer vision application from it.

When discussing computer vision applications, the following points should be at the top of your mind:

Adapt Existing Jobs and Look for Modifications: Looking through existing jobs for inspiration can help you derive a computer vision-based solution to a problem. For instance, shopping malls can use computer vision-based solutions to determine when an object is taken from a shelf and who has taken it. If an item is returned to the shelf, the system can also remove that item from the customer’s virtual basket.

Brainstorm: People can brainstorm with colleagues, friends, and family to gather data on problems and check to see if they can be solved using computer vision.

Research: There is no escaping research when you are looking for ideas. The research will not only help you get ideas for new applications but will also help you explore the market for already existing applications.


[Image: computer vision example 2]


The Evolution Of Computer Vision

Before the advent of deep learning, the tasks that computer vision could perform were very limited and required a lot of manual coding and effort by developers and human operators. For instance, if you wanted to perform facial recognition, you would have to perform the following steps:

  • Create a database: You had to capture individual images of all the subjects you wanted to track in a specific format.
  • Annotate images: Then for every individual image, you would have to enter several key data points, such as distance between the eyes, the width of the nose bridge, the distance between upper-lip and nose, and dozens of other measurements that define the unique characteristics of each person.
  • Capture new images: Next, you would have to capture new images, whether from photographs or video content, and then go through the measurement process again, marking the key points on the image. You also had to factor in the angle at which the image was taken.

After all this manual work, the application would finally be able to compare the measurements in the new image with the ones stored in its database and tell you whether it corresponded with any of the profiles it was tracking. In fact, there was very little automation involved and most of the work was being done manually. And the error margin was still large.
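The comparison step at the end of that manual pipeline boils down to measuring the distance between hand-entered feature vectors. A minimal sketch, with entirely hypothetical subjects and measurement values:

```python
import math

# Hypothetical hand-entered measurements per subject:
# (eye distance, nose-bridge width, lip-to-nose distance).
database = {
    "alice": (6.2, 1.8, 2.1),
    "bob":   (5.9, 2.2, 1.7),
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(measurements, db, threshold=0.5):
    """Return the closest stored profile, or None if nothing is near enough."""
    name, dist = min(((n, euclidean(measurements, m)) for n, m in db.items()),
                     key=lambda pair: pair[1])
    return name if dist <= threshold else None

print(match((6.1, 1.9, 2.0), database))   # prints "alice"
```

The fragility of the old approach is visible here: every number in `database` had to be measured by hand, and a slightly different camera angle shifts the measurements enough to push a true match past the threshold, hence the large error margin.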

Machine learning provided a different approach to solving these problems. With machine learning, developers no longer needed to manually code every single rule into their vision applications. Instead, they programmed “features,” smaller applications that could detect specific patterns in images. They then used a statistical learning algorithm such as linear regression, logistic regression, decision trees, or support vector machines (SVM) to detect patterns and classify images and detect objects in them.
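The "features plus statistical learner" idea can be sketched in plain Python. The two feature functions and the class centroids below are invented for illustration; a nearest-centroid rule stands in for the heavier learners (SVMs, decision trees) named above.

```python
def darkness(image):
    """Feature: fraction of pixels darker than mid-gray."""
    flat = [p for row in image for p in row]
    return sum(p < 128 for p in flat) / len(flat)

def edge_density(image):
    """Feature: fraction of horizontally adjacent pixel pairs that
    differ sharply in brightness."""
    jumps = total = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            total += 1
            jumps += abs(a - b) > 100
    return jumps / total

def extract(image):
    return (darkness(image), edge_density(image))

def nearest_centroid(features, centroids):
    """Classify by whichever class centroid is closest in feature space."""
    return min(centroids,
               key=lambda c: sum((f - g) ** 2
                                 for f, g in zip(features, centroids[c])))

# Hypothetical centroids, as if learned from labelled examples.
centroids = {"text_page": (0.1, 0.3), "night_scene": (0.9, 0.05)}

dark_image = [[10, 20], [30, 15]]
label = nearest_centroid(extract(dark_image), centroids)
```

The key shift from the manual era is that developers wrote the feature extractors but let a statistical rule, fit to labelled data, draw the decision boundary.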


Challenges Facing Computer Vision

Computer vision might seem easy, but research has shown that it is anything but. In the beginning, people assumed that all it entailed was connecting a camera to a computer, but decades of research and technological advancement have proved it far more complex. Computer vision remains unsolved, at least when compared to the capabilities of human vision, mainly because we don’t fully understand human vision yet.

Studying biological vision requires an in-depth understanding of the organs of perception, like the eyes, as well as of how the brain interprets the information they take in. A lot of progress has been made in recent times as new tricks and shortcuts have been invented, but there is still much more to be done. Another challenge is the sheer complexity of the visual world: a given object may be seen from any angle, in any lighting condition, and with any type of occlusion from other objects, and a true vision system must be able to handle this effectively infinite variety of scenes and still extract meaningful conclusions.

Applications Using Computer Vision


  • Object Classification: Determines what general category of object is in a particular photograph.
  • Object Identification: Determines which specific type of a given object is in a particular photograph.
  • Object Verification: Identifies and verifies a particular object in a photograph.
  • Object Detection: Specifies where the objects are in a photograph.
  • Object Landmark Detection: Specifies what the key points are for the object in a photograph.
  • Object Segmentation: Determines what pixels belong to the object in an image.
  • Object Recognition: Identifies what objects are in a photograph and where they are.
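These tasks are closely related; for example, an object-detection box can be derived directly from a segmentation mask. A minimal sketch, using an invented 0/1 mask:

```python
def bounding_box(mask):
    """Derive a detection box (top, left, bottom, right) from a
    segmentation mask of 0/1 pixels."""
    coords = [(y, x) for y, row in enumerate(mask)
                     for x, p in enumerate(row) if p]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))

# Segmentation answers "which pixels belong to the object"...
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]

# ...and detection's "where is the object" falls out of it.
box = bounding_box(mask)
```

This also shows why segmentation is considered the harder task: the mask carries strictly more information than the box computed from it.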

Real-life instances where computer vision can be seen in effect include:

  1. Amazon revealed 18 Amazon Go stores where shoppers can bypass checkout lines and pay for items automatically. Security cameras use computer vision to let employees know when something is taken off the shelves and whether it has been returned. The system then charges your Amazon Prime account once you finish taking items and filling up your virtual basket.
  2. In other retail stores, computer vision is used to improve security. Tracking each person inside the store at all times to make sure each individual pays for the items they have taken off the shelves.
  3. Countries all over the world are embedding Artificial Intelligence into military weapons, transportation, healthcare, simulation training, and other systems used on land, air, sea, and even space. This ensures precision in identifying, targeting, and locating objects as well as ensuring efficiency in the use of the system while at the same time, requiring less maintenance.
  4. In recent times, Artificial Intelligence uses computer vision to ensure cybersecurity and protect networks from unauthorized access. 
  5. Computer vision is used to identify cars that go over the speed limit, capture their plate numbers, and send them into the system, increasing efficiency in enforcing traffic rules and prosecuting offenders.
  6. Companies like Tesla and Google are building self-driving cars. These cars today have Adaptive or Dynamic Cruise Control, which can maintain a safe distance from other vehicles, curbs, and pedestrians.

Since 2012, three key interlocking factors have begun to come together, and the concepts of “context, attention, and intention” are slowly evolving into computer vision, a new branch of AI.
