Vision4 Demo Guide
Overview
This program demonstrates some capabilities of Numenta's Hierarchical Temporal Memory (HTM) technology applied to visual object recognition.
The HTM network contained in this demo has been trained to recognize four types of objects: cell phones, sailboats, cows, and rubber ducks. The demo assumes a world in which every image falls into one of these four categories. When you run the demo, you select and modify images to be recognized by the HTM network, and it outputs a distribution across the four categories. None of the pre-loaded test images were used in training the HTM network.
What to Expect
We trained the HTM network on these four types of objects by presenting approximately 3500 labeled images, for example 720 pictures of cell phones and 1200 pictures of cows. Through this training, the network developed an invariant representation of the four types of objects, allowing it to generalize to novel images. All the training and test images are in grayscale; color is not used in this version of recognition. If you import a color image to test, it will be converted to grayscale.
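The guide does not say how the demo converts color images to grayscale. As a rough illustration only, the sketch below uses the common ITU-R BT.601 luma weights; the function name to_grayscale and the nested-list image layout are assumptions made for this example, not part of the demo.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples (each 0-255) to rows of gray values.

    Uses the ITU-R BT.601 luma weights. This is an assumed method for
    illustration; the demo's actual conversion is not documented here.
    """
    gray = []
    for row in pixels:
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row])
    return gray


# A tiny 2x2 color image: red, green, blue, and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(to_grayscale(image))
```

Because the three color channels collapse into one brightness value, two objects that differ only in color look the same to a grayscale recognizer.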
Recognizing four categories of objects may sound like a simple problem, but it isn't! There are a tremendous number of different images that you would recognize as a cow. In addition, the demo lets you add your own test images and create nearly endless variations of test images via scaling, rotation, occlusions, and other transformations.
The demo will not correctly recognize all images. Sometimes this is because the network has never seen a similar image. For example, the network was trained with pictures from the front and side of a cow, but not from the rear. If you show it a picture of the rear of a cow, it may not recognize it as a cow. In other cases, the reason that the network makes a mistake will be less clear.
This demo does not include an attention system, which would allow the HTM network to pick out an object of interest in a complex scene. To understand what the network is doing, imagine that an image is flashed in front of you and you must decide immediately what it is, without being able to focus on different parts of the image. For example, if you show the system a picture of a farm with a cow in the distance, it may not recognize the cow. It will try to recognize the entire image as a whole. Numenta is developing attention mechanisms to address this limitation; they will be included in future versions of NuPIC (Numenta Platform for Intelligent Computing). For a simple mechanism to approximate attention, you can try the "mask" feature, described below.
Finally, the demo always delivers its best guess for every image it sees. It does not have the ability to say "none". If you show it a picture of yourself, and it returns "cow", don't be offended!
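To make the "best guess" behavior concrete, here is a minimal sketch of picking the top category from a four-way distribution. The function name best_guess, the category order, and the example numbers are all hypothetical; the demo's internal representation is not described in this guide.

```python
# Category names from this guide; the ordering is an assumption.
CATEGORIES = ["cell phone", "sailboat", "cow", "rubber duck"]


def best_guess(distribution):
    """Return (category, score) for the highest-scoring category.

    There is no "none" option: some category always wins, even when
    the scores are nearly equal.
    """
    if len(distribution) != len(CATEGORIES):
        raise ValueError("expected one score per category")
    index = max(range(len(distribution)), key=lambda i: distribution[i])
    return CATEGORIES[index], distribution[index]


# Hypothetical scores for a picture that is none of the four objects.
print(best_guess([0.26, 0.24, 0.27, 0.23]))
```

In this made-up case the scores are almost flat, yet the result is still "cow". A near-uniform distribution is a hint that the network is unsure, even though it reports a winner.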
An HTM network is only as good as the training it has received. The Vision4 demo was trained on a moderate number of images. Have fun, but don't send bug reports complaining that your favorite rubber duck was misclassified as a cow! Still, we hope that you will be surprised at how well it does.
How to Use the Application
To help you get started, we have pre-loaded a set of test images. None of these images were used while training the HTM network, so they are all novel to the network. We purposely included some images that the HTM network will misclassify, so don't expect perfect performance on the pre-loaded images. We recommend that you start with the pre-loaded images. There are controls to rotate, translate in x and y, zoom in and out, add noise, blur, change brightness, and occlude parts of the image. Play with these controls to get a sense of how the system behaves under variation.
There is another control that allows you to isolate or "mask" part of an image. This feature crudely simulates an attention mechanism. Use the drawing tool to create a box around your object, and the system will focus on the information within the box. You can see how recognition accuracy changes when part of an image is isolated.
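One way to picture the mask is as blanking out everything outside the box before recognition. The sketch below does exactly that on a small grayscale image stored as nested lists. Whether the demo blanks, crops, or does something else is not stated in this guide, so the function mask_outside_box, its box format, and the background value are assumptions for illustration.

```python
def mask_outside_box(image, box, background=0):
    """Keep pixels inside box and replace all others with background.

    image is a list of rows of gray values. box is (left, top, right,
    bottom) in pixel coordinates, with right and bottom exclusive.
    This is a hypothetical helper, not the demo's implementation.
    """
    left, top, right, bottom = box
    masked = []
    for y, row in enumerate(image):
        masked.append([
            value if (top <= y < bottom and left <= x < right) else background
            for x, value in enumerate(row)
        ])
    return masked


# A 3x3 image; keep only the lower-right 2x2 region.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(mask_outside_box(image, (1, 1, 3, 3)))
```

Removing the surrounding scene this way leaves only the boxed object for the network to consider, which is the effect the mask control is meant to approximate.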
By default, the demo runs in "Continuous Recognition" mode, meaning that it will attempt to recognize an image after each change you make. If the demo runs too slowly on your system, uncheck the "Continuous Recognition" checkbox and press "Recognize" each time you want to recognize an image.
After experimenting with the bundled images, try some of your own. There are two ways to add your own images. First, you can drag individual picture files, or directories of picture files, directly onto the image pane of the demo. Second, you can use a webcam (either one built into your laptop or one attached via USB) to take pictures of objects. If a camera is available, the application will show a button that accepts input directly from the camera, so you don't need to take pictures and then import them (make sure to attach your camera before starting the application). Be creative! One fun activity is to draw pictures on a whiteboard and then see if they are recognized. A few images like this are included in the pre-loaded test image set.
There are menu items that allow you to save captured and modified images to files for later use and to otherwise manage your test images.
We hope that you enjoy the Numenta Vision4 Demo and are impressed by it.