Machine Perception

Posted by Eyituoyo Ogbemi on

Research in machine perception tackles the hard problems of understanding images, sounds, music, and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications such as content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially-labeled datasets using parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception that achieves state-of-the-art results on academic benchmarks and allows users to easily search through their large collection of Google Photos. The ability to mine meaningful information from multimedia is broadly applied throughout Google.


Human intelligence comes from a remarkable duality: we arrive at some conclusions through the perception of patterns, and at others through structured, rational reasoning. The two forms are distinct but complementary. Machine-based intelligence also comes in two forms. Deep learning-based Artificial Intelligence interprets patterns in data to arrive at conclusions, and hence mimics the perception-based intelligence of our brain, whereas standard instruction-by-instruction computing (as in a PC) mimics the rational intelligence of our brain.


I have always wondered why I can easily recall a song’s tune, but not the words. I can remember a face better than the name. I can detect the smell of a specific perfume, but not label it. I can identify the taste of a wine, but not describe it accurately. I can blindly feel the fabric and tell if it is silk, wool, or cotton, but I cannot exactly say why.

It seems as if I can store and recall complex patterns of sound, face, smell, taste, and feeling better than a rational or verbal description. Perception is our ability to see, hear, or become aware of something through the senses. Perception is derived as a single unified awareness from sensory processes. However, when I do logical thinking involving physics, mathematics, planning, calculations, accounts, or formulating strategy and tactics, there are very few patterns from the past. Does my brain deal with patterns differently than with logical situations?


Simplistic Model of Our Brain


If I were to model the brain based on my observations it would consist of two parts:


  1. Right: perception-based

  1. Left: rational-based

Our senses (taste, sight, touch, smell, and hearing) provide patterns to the right part of our brain to generate perceptions, whereas our logical interpretations engage the left part and generate a structured and rational understanding of a situation or a problem.


When we study physics or mathematics we are mostly using the rational part of the brain that is best suited to provide us a logical structure for the subject. However, when we are dealing with patterns created by our senses, we are using the perception part of the brain. Our five senses are the prime sources of patterns for creating perceptions. Since most situations are a mix of logic and patterns, we collaboratively use both rational and perception parts of the brain to arrive at conclusions and make decisions. Both parts, perception and rational, are integral sources of human intelligence.


There is another key component in decision making, possibly the most dominant one: emotions and feelings. I believe our emotions are remnants of perceptions locked in the brain by situations experienced in the past. If the pattern of sounds and sights of a situation generated a perception of fear in me, then the fear associated with that pattern is left behind as an emotion. Emotions come in various forms like fear, likes, dislikes, affinity, anger, envy, and love. Emotions are invoked by patterns generated by the senses and contribute to the overall perception, even in a new situation.


For example, if the smile and voice of a Chinese person created a pleasant perception in me, it leaves behind a positive emotion of liking. When I see another smiling Chinese person, the emotion left behind from the past influences my perception. How can I integrate emotions into the simplistic model of the brain? They are like an “inner” sense, contributing to and influencing the patterns from the other senses. We often neither realize nor can reconstruct which sense, pattern, or emotion contributed most to the overall perception. It is known that the food in a restaurant tastes better if the décor and the music match the type of food, because they influence the perception of the taste, while the same food can taste awful if the service is poor.


Both parts of the brain are simultaneously active in all situations. The right part may be busy generating perceptions based on patterns while, in the same situation, the rational part is busy constructing an interpretation based on some logical structure and coming up with a rational conclusion. Who wins, the right or the left brain? It depends on the situation. When I talk to the smiling Chinese person, my perceptions could be positive, but my rational brain may disagree with his arguments, leading to a major conflict. Who decides whether I continue to deal with him, the right or the left brain?


Let me illustrate the dynamics between the two parts of our brain using a trivial example of shopping. The other day I was passing by a sports store when a biking jersey caught my eye. I went in and checked it out. The color, the shape, the design, and the feel of the fabric instantly appealed to me. It fitted beautifully on the mannequin in the store. My emotions, triggered by something similar I bought last year, gave it a very positive perception. My right brain (the perception part) said: go for it. However, the rational side of my brain said: I already have something similar, and if I buy this one, I will never wear the previous one.


Moreover, it is quite pricey and not on sale. I should wait and watch for it to come on sale. But the right side insisted: it is so cool, and I must get it now, because it might be sold out soon. The left side said: I have so many jerseys that I have no space to keep them all, and there are better things to spend my money on, like better biking shoes. Eventually, my emotions cast a veto and made me buy the jersey. I am sure that everyone has been through similar situations, whether it is about a jersey, shoes, wine, a house, or even a partner.


How does all this apply to Artificial Intelligence?


Most AI systems today are based on Deep Learning, where learning happens through exposing the AI system to tens of thousands of illustrative examples. The Deep Learning method involves absorbing intricate details and subtle nuances in pictures, videos, or sounds into the parameters of the neural network of the AI system. See: “How AI machines learn — just like humans” (https://medium.com/@sharad.gandhi/how-humans-and-machines-learn-c48de5360527#.1l2g9vee0). After the training, the AI system is able to perceive the input data based on patterns in images, faces, objects, movements, or sounds fed into the system. The AI system’s decision-making is based on the perception of the input data patterns, so it behaves like the right side of the brain, which specializes in perceiving patterns.
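To make "absorbing patterns into parameters" a little more concrete, here is a minimal sketch in plain Python. It is not a deep network: it is a single artificial neuron (logistic regression) trained by gradient descent, chosen only because it fits in a few lines. The two features, the "ball" versus "box" labels, the toy examples, and the function names `train` and `confidence` are all made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def confidence(weights, bias, features):
    """Return the neuron's confidence (0..1) that the input is a 'ball'."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def train(examples, epochs=1000, learning_rate=0.5, seed=0):
    """Adjust the parameters a little after every labeled example."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # How far off was the current guess for this example?
            error = confidence(weights, bias, features) - label
            # Nudge each parameter in the direction that shrinks the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: (roundness, smoothness) -> 1 for "ball", 0 for "box".
examples = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.95], 1),
    ([0.2, 0.1], 0), ([0.3, 0.2], 0), ([0.1, 0.3], 0),
]

weights, bias = train(examples)
print("looks like a ball:", confidence(weights, bias, [0.85, 0.9]))
print("looks like a box: ", confidence(weights, bias, [0.15, 0.2]))
```

Nothing in the code states a rule such as "round things are balls." Whatever the neuron knows ends up in the numbers `weights` and `bias`, which is the small-scale version of what happens inside a deep neural network with millions of parameters.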



Interesting Observations and Conclusions


  1. AI deals with understanding patterns in the input data for a situation and deriving perceptions based on its deep learning about a specific topic. These perceptions are expressed as a “level of confidence” for the decisions to be taken for that situation. AI is, in effect, Artificial Perception. An AI machine mimics the perceptional ability of the human brain.

  1. The software in a standard computer (like a PC) is structured logic and analogous to the rational part of the brain.

  1. Interestingly, the intelligence of humans is usually associated with rational thinkers such as Newton and Einstein. However, Deep Learning AI is really about the perceptional skill of our brain.

  1. Human perceptional intelligence has an evolutionary history of millions of years and hence is much more profound than the rational abilities of Homo sapiens, which developed much more recently.

  1. Human perception-based decision-making is difficult to describe verbally in any detail because it is almost automatic and subconscious. In contrast, logic, by definition, can be described exactly.

  1. We resolve most situations through an intimate interconnection between the rational and perceptional parts of the brain. The internal networking between the two skills of our brain remains a mystery. The uniqueness of human judgment comes via this ability to simultaneously draw on both parts.

  1. Today’s (narrow) AI neural networks are typically focused on just one area of expertise. Interconnecting hundreds or thousands of neural networks specialized in different areas could result in a wider, general-purpose intelligence, similar to the interconnection of various specialized areas within our brain.

  1. We are in the early stages of interconnecting standard computing and AI in real systems to benefit from their complementary roles. This interconnection will make future AI systems much more diversely capable.

  1. Our brain has a vast number of dimensions, far exceeding what we are attempting to imitate with AI machines today. Human potential allows us to solve very complex, multidisciplinary problems; gives us creativity and the power to imagine things and situations that never existed; generates very powerful emotions and drives to achieve what seems impossible; and gives us an incredible sense of consciousness and self-awareness.

  1. Today’s AI systems, even with their narrow and limited scope, are still capable of revolutionary changes to how we live and work. They offer incredible opportunities for simplifying and personalizing the use of products and for offering radically new services. We are just at the very beginning of a massive change.
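The first, second, and eighth observations above (perception expressed as a level of confidence, standard software as structured logic, and the two working together) can be sketched in a few lines of Python, reusing the jersey story. This is only an illustration under made-up assumptions: the raw scores stand in for the output of a trained network, and the labels, the 0.8 threshold, the prices, and the function names `perceive` and `decide` are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into confidences that sum to 1."""
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def perceive(scores):
    """Perception step: pick the most likely label and its confidence."""
    confidences = softmax(scores)
    label = max(confidences, key=confidences.get)
    return label, confidences[label]


def decide(label, confidence, price, budget, min_confidence=0.8):
    """Rational step: explicit rules that can be read and explained."""
    if confidence < min_confidence:
        return "look again"
    if label != "jersey I like":
        return "skip"
    if price > budget:
        return "wait for a sale"
    return "buy"


# Hypothetical raw scores, standing in for the output of a trained network.
scores = {"jersey I like": 3.0, "jersey I dislike": 0.5}

label, conf = perceive(scores)
print(label, round(conf, 2), "->", decide(label, conf, price=120, budget=100))
```

The perception step cannot say why it likes the jersey; it only reports how sure it is. The rational step is the opposite: every branch can be read aloud and argued with. In this sketch, unlike in my shopping story, the rules get the last word.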


In summary, the two major decision-making skills of the human brain, perceptional (via patterns) and rational (via logic), are also reflected in computing-based decision making. Today’s AI, based on Deep Learning techniques, leads to perception-based decisions, whereas standard computing, as in a personal computer, relies on structured logic for rational decision making. Better and more balanced decisions come from combining the best of both flavors.

