When I first arrived in Silicon Valley for my internship, the entire environment looked new. It took me several weeks to get familiar with my new neighborhood. Interestingly, even with advanced GPS apps on my phone, the most effective way to learn a new environment was to walk the streets, memorize landmarks, and make different turns at intersections. The GPS app could give me smart directions, but I could not really learn the world just by staring at the map. Knowledge comes from practice, and we always learn something through active exploration. Nevertheless, most artificial intelligence techniques adopt data-intensive machine learning approaches. Algorithms are trained to find patterns by passively observing massive amounts of data, usually without generating any actions.
At Numenta, we are working on a next-generation machine intelligence algorithm that learns complex patterns through active exploration. The stream of sensory inputs is actively generated by executing a series of motor commands. We call this new learning paradigm sensorimotor learning and prediction, or SLAP. To understand how the algorithm works, let us first think about how our brain solves the same problem.
We know that all our remarkable cognitive abilities, such as object recognition, scene interpretation, reasoning, and prediction, start from data streams collected by our “sensors”, such as the retina at the back of the eye, tactile sensors under the skin, and auditory sensors in the cochlea. Believe it or not, most of the inputs to these sensors are actually generated by ourselves, rather than by changes in the external world. Our eyes are constantly moving, our sense of touch mostly arises from our own body movements, and the speech we generate is also picked up by our auditory nerves. After we learn a new environment, we are rarely surprised by the consequences of our own actions. I can now predict exactly what I will see after each turn on my way to work. This prediction is based on my current sensory input and the motor command I am going to execute. Moreover, despite dramatic changes in the input to my sensors, my internal perception is stable. These two aspects correspond to the two components of the algorithm. We call the prediction step “sensorimotor inference” and the process of building stable representations “temporal pooling”.
Jeff Hawkins described the basic ideas on the NuPIC mailing list. During my internship I implemented and worked on several SLAP experiments using synthetic datasets. In one experiment, we trained the SLAP algorithm to recognize a large number of synthetic images composed of “squares”. Each square is painted with a different color, and different images share the same set of colors. Two example images are shown below. A white diamond represents the portion of the image that lies on the fovea, and a black arrow represents the proposed motor command.
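To make the setup concrete, here is a minimal Python sketch of such a synthetic world. It is not the code from the experiments: the palette, the grid size, the four-direction motor commands, and the names `make_image` and `explore` are all assumptions chosen for illustration. It only shows how a stream of (sensation, motor command) pairs could be generated by simulated eye movements over an image of colored squares.

```python
import random

# Assumed shared palette: every image draws its squares from the same colors.
COLORS = ["red", "green", "blue", "yellow"]

# Assumed motor commands: move the fovea one square in a direction (row, col offsets).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_image(size, rng):
    """Return a size x size grid of color names drawn from the shared palette."""
    return [[rng.choice(COLORS) for _ in range(size)] for _ in range(size)]


def explore(image, steps, rng):
    """Simulate eye movements and return a list of (sensed color, motor command).

    Assumes the image is at least 2 x 2, so a legal move always exists.
    """
    size = len(image)
    row, col = rng.randrange(size), rng.randrange(size)
    stream = []
    for _ in range(steps):
        # Only propose moves that keep the fovea inside the image.
        legal = [
            name
            for name, (dr, dc) in MOVES.items()
            if 0 <= row + dr < size and 0 <= col + dc < size
        ]
        move = rng.choice(legal)
        # Record what the fovea sees now and the command about to be executed.
        stream.append((image[row][col], move))
        dr, dc = MOVES[move]
        row, col = row + dr, col + dc
    return stream


rng = random.Random(0)
image = make_image(4, rng)
for sensed, move in explore(image, 5, rng):
    print(sensed, move)
```

Because all images share one palette, a single sensed color says very little about which image is being viewed. The identity of the image only shows up in how sensations follow one another under particular movements.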
The algorithm is allowed to explore each image through simulated eye movements. At each step, the part of the image under the fovea and the motor command are fed to the algorithm. The first layer of the network learns to predict the next sensory input. The algorithm also relies on the reasonable assumption that if two things are close to each other in time (temporal proximity), they tend to originate from the same underlying cause, and thus should be grouped together. Neurons in the second layer “pool” over many neurons in the first layer and form a stable representation that is unique to each pattern. During learning, stable representations emerge despite changes to the input in the first layer. These stable representations indicate recognition of the larger image.
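The two layers described above can be imitated with a deliberately simple toy model. The sketch below is an assumption-laden stand-in, not the SLAP implementation: it uses plain dictionaries instead of neurons, two hypothetical hard-coded 3x3 images, and a made-up class name `ToySensorimotorModel`. For brevity it learns by visiting every legal move exhaustively rather than by sampled eye movements. The first table predicts the next sensation from the current sensation and motor command; the second binds every transition seen within one image to that image's label and votes over time.

```python
import random
from collections import defaultdict

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Two hypothetical 3x3 images built from the same colors (R, G, B).
IMAGES = {
    "image_a": ["RGB", "GBR", "BRG"],
    "image_b": ["RRG", "BGB", "GBR"],
}


def step(image, pos, move):
    """Return the fovea position after `move`, or None if it leaves the image."""
    r, c = pos[0] + MOVES[move][0], pos[1] + MOVES[move][1]
    return (r, c) if 0 <= r < len(image) and 0 <= c < len(image[0]) else None


class ToySensorimotorModel:
    def __init__(self):
        # "Layer 1": (sensation, motor) -> next sensations seen after it.
        self.transitions = defaultdict(set)
        # "Layer 2": (sensation, motor, next) -> image labels it occurred in.
        self.pooled = defaultdict(set)

    def learn(self, label, image):
        """Bind every legal transition of one image to that image's label."""
        for r in range(len(image)):
            for c in range(len(image[0])):
                for move in MOVES:
                    nxt = step(image, (r, c), move)
                    if nxt is None:
                        continue
                    s, s_next = image[r][c], image[nxt[0]][nxt[1]]
                    self.transitions[(s, move)].add(s_next)
                    self.pooled[(s, move, s_next)].add(label)

    def predict(self, sensation, move):
        """Return the set of sensations that may follow this sensation and move."""
        return self.transitions.get((sensation, move), set())

    def recognize(self, image, steps, rng):
        """Explore `image`; return the set of leading labels after each step."""
        votes = defaultdict(int)
        pos = (rng.randrange(len(image)), rng.randrange(len(image[0])))
        history = []
        for _ in range(steps):
            legal = [m for m in MOVES if step(image, pos, m) is not None]
            move = rng.choice(legal)
            nxt = step(image, pos, move)
            s, s_next = image[pos[0]][pos[1]], image[nxt[0]][nxt[1]]
            # Transitions close together in time vote for a common cause.
            for label in self.pooled.get((s, move, s_next), ()):
                votes[label] += 1
            best = max(votes.values(), default=0)
            history.append({l for l, v in votes.items() if v == best})
            pos = nxt
        return history


model = ToySensorimotorModel()
for label, image in IMAGES.items():
    model.learn(label, image)

history = model.recognize(IMAGES["image_a"], steps=10, rng=random.Random(0))
print(history[-1])
```

The sensed color changes at almost every step, yet the label of the image being explored never drops out of the leading set, because each of its transitions was bound to that label during learning. Whether the other label is ruled out depends on how many transitions the two images happen to share. The real system replaces these lookup tables with learned neural representations, which this sketch does not attempt to model.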
The figure below shows example output from a trained system while the eyes are moving around two different images (10 iterations for each image). At each step, the sensory input is changing drastically. However, the overall output is a stable and unique representation for each image.
This simple example illustrates a fundamental mechanism our brain uses to create stable representations from a changing world. The same mechanism can also be used for a large variety of problems where the sensor data is actively generated by the system, such as robot learning, vehicle control, and complex pattern recognition/detection problems.
Note: Some of my implementation code is now available in the nupic.research GitHub repository.