1 Bitworm: Getting Started Example

This chapter gets you started with the Numenta Platform for Intelligent Computing (NuPIC) by explaining how you can run a simple example HTM Network and by briefly examining the example scripts.

See the Numenta website for hardware and software requirements and installation instructions.

The Bitworm Example

This chapter introduces a simple example called Bitworm. The example illustrates how you might structure your input and category input, how to run your HTM Network, and how to interpret the results. Bitworm is not intended to be a realistic problem; instead, it serves as a Hello World example to get you up and running with NuPIC.

What are Bitworms?

Bitworms are 16-bit vectors. There are solid bitworms, which consist of consecutive on-bits, and textured bitworms, which consist of alternating on/off bits. In each case, the part of the vector that's not a bitworm consists of off bits. Here are some examples:
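As a rough illustration of the two kinds of bitworm, the following sketch builds one of each in plain Python. The helper names make_bitworm and show are hypothetical and are not part of NuPIC or the example scripts; the start and length values are chosen only for illustration.

```python
def make_bitworm(kind, start, length, size=16):
    """Return a list of 0/1 values of length `size` containing one bitworm.

    kind   -- "solid" (consecutive on-bits) or "textured" (alternating on/off)
    start  -- index of the first bit of the bitworm
    length -- number of positions the bitworm occupies
    """
    if kind not in ("solid", "textured"):
        raise ValueError("kind must be 'solid' or 'textured'")
    if start < 0 or length < 1 or start + length > size:
        raise ValueError("bitworm does not fit in the vector")
    vector = [0] * size
    for offset in range(length):
        # Solid: every position is on. Textured: every other position is on.
        if kind == "solid" or offset % 2 == 0:
            vector[start + offset] = 1
    return vector


def show(vector):
    """Format a vector as space-separated values."""
    return " ".join(str(bit) for bit in vector)


solid = make_bitworm("solid", start=1, length=10)
textured = make_bitworm("textured", start=0, length=11)
print(show(solid))     # 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
print(show(textured))  # 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0
```

Everything outside the bitworm itself stays at 0, which is what "the part of the vector that's not a bitworm consists of off bits" means in practice.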

The Bitworm example program trains an HTM Network to model the world of bitworms. After the HTM Network has been trained, you can submit new data and the HTM Network uses the model of the bitworm world to discriminate between solid and textured bitworms.

Bitworm Example Components

The Bitworm example consists of the following files, discussed in more detail below:

RunOnce.py
Runs the other scripts in the example in the appropriate sequence. You can edit RunOnce.py to experiment with different training settings.
CreateNetwork.py
Creates the network, nodes, and links.
GenerateData.py
Generates training set data based on the settings in the RunOnce.py file.
TrainNetwork.py
Trains the HTM Network. At the end of training, each node in the HTM Network has a model of its world and can then use that information to perform inference. You can edit RunOnce.py to experiment with different settings.
RunInference.py
Performs inference using the information collected during training. When you run this script, it categorizes input data, either the training data or new data.
GenerateReport.py
Prints the coincidences for each group to a file. Grouping is an important part of the learning algorithm. You don't need to understand grouping or the learning algorithm for this simple example.
DisplayReport.py
Displays the groups discovered by the training run. This script is not called by RunOnce.py; you must call it separately.
ParameterExploration.py
Illustrates how you can explore node parameters.

Running the Example

This section explains how to run the example and how you can explore what the HTM Network does by changing the example configuration.

The example has been set up so you need to execute only one script.

On Microsoft Windows, open up a command prompt and type:

cd %NTA%\share\projects\bitworm
python RunOnce.py 

On OS X and Linux - assuming $HOME/nta is the location where you installed the software - type the following at the command line:

cd $HOME/nta/current/share/projects/bitworm  
python RunOnce.py

The example is set up so that you make all modifications in the RunOnce.py script and then rerun the script, which takes the example through each step again.

The script performs these tasks:

1. Generates a set of training data by calling GenerateData.py using the parameters set in RunOnce.py. The default is to generate temporally coherent data, that is, sequences of solid and textured bitworms of variable bitworm length. The minimum and maximum length are specified in the trainingMinLength/testMinLength and trainingMaxLength/testMaxLength parameters.

2. Creates the bitworm HTM Network, that is:

a. creates a Network instance (from the Python package nupic.network)

b. creates the specified nodes for the network

c. links the inputs and outputs of the nodes

d. saves the untrained HTM Network to a file that describes the HTM Network structure

3. Trains the HTM Network. The example proceeds as follows:

a. The script enables the sensor and bottom-level (level 1) node for training and calls the appropriate methods to perform training at level 1.

During training, the nodes learn, that is, they construct a model of their world.

b. After the bottom-level nodes have been trained, the script enables inference for the bottom-level node and training for the top-level node. While in inference mode, the bottom-level node sends the result of its inference computation to the next level (in this example, the top level). The top level performs learning with those input data.

4. When the top-level node receives input during training, it also receives category information. It assigns the data to categories as part of training.

5. RunOnce.py calls runInference to explore how the trained HTM Network handles new data. During inference, the trained HTM Network looks at each input bitworm and determines the probability that the bitworm belongs to one or the other category.

6. Finally, RunOnce.py calls GenerateReport, which prints the coincidences for each group to a file called report.txt.

Grouping is an important part of the learning algorithm. You don't need to understand grouping or the learning algorithm for this simple example. For an overview, see Inside a Learning Node: How Learning and Inference Happen on page 38 in Advanced NuPIC Programming. For more detailed information, see the white papers on the Numenta website at http://www.numenta.com/for-developers/education/algorithms.php.

Examining the Report

The GenerateReport script that is run as part of RunOnce generates a report file named report.txt that includes the following information:

   ------------------------------ 
    
   General network statistics: 
   Network has  5 nodes. 
   Node names are: 
       Sensor 
       CategorySensor 
       Level1 
       Level2 
       FileOutput 
 
 
   Node Level1 has 40 coincidences and 7 groups. 
   Node Level2 has 8 coincidences. 

   ------------------------------

 
   ------------------------------ 
 
   Performance statistics: 
   Comparing:  training_results.txt  with  training_categories.txt 
   Performance on training set:100.00%, 420 correct out of 420 vectors 
 
   Comparing:  test_results.txt  with  test_categories.txt 
   Performance on test set: 97.86%, 411 correct out of 420 vectors 
 
------------------------------ 
 

Note that the Bitworm example gets very good results because this is a toy problem: its assumptions match those of the current learning algorithm precisely. Achieving the same degree of success on more complex problems can be more challenging.
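The performance lines in the report are a simple ratio of matching categories to total vectors. The sketch below reproduces that arithmetic in plain Python. The score function and the made-up label lists are assumptions for illustration only; the actual format of the results and category files is not described here.

```python
def score(predicted, expected):
    """Return (percent_correct, num_correct, total) for two equal-length lists."""
    if len(predicted) != len(expected):
        raise ValueError("result and category lists differ in length")
    if not expected:
        raise ValueError("nothing to compare")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * correct / len(expected), correct, len(expected)


# Hypothetical data: 420 labels, 9 of which are predicted wrongly.
expected = [0, 1] * 210
predicted = list(expected)
for i in range(9):
    predicted[i] = 1 - predicted[i]

percent, correct, total = score(predicted, expected)
summary = "%.2f%%, %d correct out of %d vectors" % (percent, correct, total)
print(summary)  # 97.86%, 411 correct out of 420 vectors
```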

   Getting groups and coincidences from the node Level1 in network ' 
   trained_bitworm.xml 

 

   ====> Group =  0 
   1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0  
   0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0  
   0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0  
   0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0  
   0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0  
   0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1  
  
   ====> Group =  1 
   0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  
   1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0  
   0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0  
   0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0  
   0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0  
   0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0  
   0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1  

This display can be helpful in a simple program, such as bitworms. It's easy to see how clean the groups are. Each group contains a different kind of bitworm. For other programs, using Numenta Visualizer might be better. For a discussion of Numenta Visualizer, see Debugging Your HTM Application.

You can look at the GenerateReport.py file to see what Python calls you can use to retrieve information from the network. Comments in the file make it easy to understand your options.

Displaying the Report

You can run the DisplayReport.py script to see a visual representation of the groups, as follows:

Running the Example with Temporally Incoherent Data

You can change the useCoherentData parameter in RunOnce.py to generate solid and textured bitworms that are not presented in sequence, that is, that have no temporal relationship. Submitting those data to the trained HTM Network illustrates the importance of the temporal aspect of the training data.

To run the example with incoherent data:

1. In the RunOnce.py script, change the useCoherentData parameter to False.

2. Execute RunOnce.py again.

The example runs with data that include both solid and textured bitworms but don't present sequences of solid bitworms followed by sequences of textured bitworms.

3. Examine the report.txt file this run generated. You should see that the HTM system found it difficult to find the groups and to categorize the data.

Running the Example with Noisy Data

The data generation script allows you to change the data by introducing some noise and to observe the results. There are two types of noise:

   0    0    0  0     1  1    1    1    1     0  0     0     0  0       without noise
   -.1  0.1  0  1.01  1  1.1  .98  .98  1.05  0  0.05  0.09  0  0.07    with noise

   0  0  0  0  1  1  1  1  1  0  0  0  0  0  0  0    without bitflip
   0  0  0  1  0  1  1  1  0  0  0  0  0  0  1  0    with bitflip

In the bitworm example, you can introduce noise to the data and see how the noise affects recognition.
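The two kinds of noise can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code in GenerateData.py; the function names add_uniform_noise and flip_bits, the seed, and the 0.1 settings are assumptions chosen for the example.

```python
import random


def add_uniform_noise(vector, amount, rng):
    """Add noise drawn uniformly from [-amount, amount] to every element."""
    return [value + rng.uniform(-amount, amount) for value in vector]


def flip_bits(vector, probability, rng):
    """Flip each 0/1 element independently with the given probability."""
    return [1 - bit if rng.random() < probability else bit for bit in vector]


rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

noisy = add_uniform_noise(clean, 0.1, rng)
flipped = flip_bits(clean, 0.1, rng)

print(" ".join("%.2f" % value for value in noisy))
print(" ".join(str(bit) for bit in flipped))
```

Additive noise changes every value slightly, while a bit flip changes a few values completely. The two can be combined by applying one function to the output of the other.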

To run the example with noisy data:

1. In the RunOnce.py script, return the useCoherentData parameter to True.

2. Change additiveNoiseTraining to 0.1. This setting adds uniform random noise in the range [-0.1, 0.1] to the inputs. Note that once you start adding noise to the inputs, it becomes harder for the learning algorithm to detect temporal coherence.

3. Execute RunOnce.py.

You get a Python exception with the message "The current parameters generated 400 groups, which exceeds the maximum of 25 groups." This message means that the node did not have enough outputs for the number of groups that were found. Although the number of underlying causes has not changed, the noise makes it harder for the algorithm to create a compact set of groups based on temporal coherence.

4. Change the following two parameters in RunOnce.py and save the revised file:

5. Call RunOnce.py again. The script now runs without generating an exception.

6. Examine the report.txt file this run generated. You should see that the results are good; however, notice that the number of groups is fairly large compared to the original number. This is an indication that the learning algorithm found it more difficult to find the groups and categorize the data.

7. Set maxDistance to 0.2, rerun RunOnce.py, and examine report.txt. This time the HTM gets the same number of groups as it did before you added noise.

This example illustrates how a combination of parameters (maxDistance and maxGroups) affects whether the HTM Network works well or does not work at all. If you wish, you can experiment with some of the other RunOnce.py parameters. Here are some possibilities:

Understanding the Example Scripts

This section briefly discusses the Bitworm example scripts.

RunOnce.py: Your Entry Point to Bitworms

The RunOnce.py script runs the component scripts of the example in sequence. As a rule, you should always call RunOnce.py, not one of the component scripts.

RunOnce allows you to set the following parameters:

useCoherentData
When set to True (the default), the GenerateData.py script creates sequences of solid bitworms followed by sequences of textured bitworms.
When set to False, the GenerateData.py script mixes solid and textured bitworms randomly. In that case, the temporal element is missing from the data.
numSequencesPerBitwormType
Number of sequences for each bitworm type. For example, you could present ten sequences of textured bitworms and ten sequences of solid bitworms. The sequences are always separated by a row of zeros (0). GenerateData.py always generates the same number of sequences of each type.
sequenceLength
Length of each sequence (for example, 20 bitworm vectors, followed by one vector of zeros).
trainingMinLength 
trainingMaxLength 
testMinLength 
testMaxLength 
Minimum and maximum length of the generated bitworms.
inputSize
Size of the input vector. Defaults to 16.
additiveNoiseTraining
additiveNoiseTesting
Allows you to add noise in the range [-additiveNoise, additiveNoise] to each input element. Default is 0. You can add noise during testing, during training, or both. See Running the Example with Noisy Data.
bitFlipProbabilityTraining 
bitFlipProbabilityTesting 
Probability that a bit will be flipped from 0 to 1 or vice versa, that is, a 0 bit becomes 1 or a 1 bit becomes 0. Can be combined with additiveNoise. Default is 0. See Running the Example with Noisy Data.
maxGroups
Maximum number of groups that can be learned at level 1.
maxGroupSize
Specifies how large the groups in the temporal pooler can become.
maxDistance
Sets the maximum Euclidean distance at which two input vectors are considered the same during learning. See Affecting Learning Node Behavior With Node Parameters.
topNeighbors
If the topNeighbors value is big, wider grouping and bigger sets result. If the value is small, narrower grouping and smaller sets result. See Affecting Learning Node Behavior With Node Parameters on page 42 in Advanced NuPIC Programming.
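To make the maxDistance description more concrete, here is a minimal sketch of a Euclidean distance check in plain Python. It assumes that two vectors count as the same when their distance is at most maxDistance; whether the node uses exactly this comparison is not stated here, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors differ in length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def considered_same(a, b, max_distance):
    """Assumed rule: vectors match when their distance is at most max_distance."""
    return euclidean_distance(a, b) <= max_distance


clean = [0, 0, 1, 1, 1, 0, 0, 0]
# The same pattern with a small, hand-picked perturbation on some elements.
noisy = [0.05, -0.05, 0.95, 1.05, 1.0, 0.05, 0.0, -0.05]

distance = euclidean_distance(clean, noisy)
print(considered_same(clean, noisy, 0.0))  # False: any noise breaks an exact match
print(considered_same(clean, noisy, 0.2))  # True: the perturbation is within tolerance
```

With a maxDistance of 0, every slightly different noisy vector looks like a new pattern, which is one way to read the large group counts seen with noisy data. A larger tolerance lets noisy copies of a pattern be treated as that pattern.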

GenerateData.py: An Example of Data Generation

The GenerateData.py script generates a file with training data or testing data, plus an associated category file, using the parameter settings specified in RunOnce.py. Data are generated in sequences: for each sequence, the code generates a bitworm with a random length and position (within the current parameter constraints), and then slides the bitworm left or right until the sequence contains the number of vectors specified by sequenceLength. At the end of each sequence, GenerateData inserts a line of zeros to reset the node so that the node does not attempt to learn temporal correlation between two bitworm sequences.

The script includes methods to generate data in which no temporal correlation exists. Those methods are called when the useCoherentData parameter is set to False.
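The temporally coherent case can be sketched as follows. This is a simplified stand-in written for illustration, not the contents of GenerateData.py: all function names are hypothetical, the length limits of 5 and 8 are arbitrary, the edge handling is an assumption, and the category file is omitted.

```python
import random


def make_bitworm(kind, start, length, size):
    """Return a 0/1 list: solid = consecutive on-bits, textured = alternating."""
    vector = [0] * size
    for offset in range(length):
        if kind == "solid" or offset % 2 == 0:
            vector[start + offset] = 1
    return vector


def generate_sequence(kind, sequence_length, min_length, max_length, size, rng):
    """Slide one bitworm of random length and position, then append a zero row."""
    length = rng.randint(min_length, max_length)
    start = rng.randint(0, size - length)
    step = rng.choice((-1, 1))
    rows = []
    for _ in range(sequence_length):
        rows.append(make_bitworm(kind, start, length, size))
        # Reverse direction at either edge so the bitworm stays inside the vector.
        if not 0 <= start + step <= size - length:
            step = -step
        # If the bitworm fills the whole vector there is no room to move.
        if 0 <= start + step <= size - length:
            start += step
    rows.append([0] * size)  # reset row separating this sequence from the next
    return rows


def generate_data(sequences_per_type, sequence_length, min_length, max_length,
                  size=16, seed=0):
    """Return all rows: sequences of solid bitworms followed by textured ones."""
    rng = random.Random(seed)
    rows = []
    for kind in ("solid", "textured"):
        for _ in range(sequences_per_type):
            rows.extend(generate_sequence(kind, sequence_length, min_length,
                                          max_length, size, rng))
    return rows


data = generate_data(sequences_per_type=10, sequence_length=20,
                     min_length=5, max_length=8)
print(len(data))  # 420 = 2 types * 10 sequences * (20 vectors + 1 zero row)
```

With ten sequences per type and a sequence length of 20, the sketch produces 420 rows, the same vector count that appears in the sample report, although the actual defaults in RunOnce.py are not stated here.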

Several aspects of this data setup are interesting:

CreateNetwork.py: Creating the Untrained HTM Network File

The CreateNetwork.py script creates the Network and Node instances, links the nodes, and saves the complete untrained network to an XML file. During training and testing, the Numenta Runtime Engine (NRE) can then load the network file and use the information about the network structure to process the data. See Constructing an HTM Network for a detailed discussion of network creation.

Node Hierarchy

Figure 1 shows the hierarchy of nodes in the bitworm example. This is the simplest possible HTM hierarchy.

Figure 1 Nodes in the Bitworm Example

To create this hierarchy, the script goes through these steps:

1. Creates the Network instance, specifying three parameters. See Affecting Learning Node Behavior With Node Parameters on page 42 in Advanced NuPIC Programming for some background information about the parameters.

   def createNetwork(untrainedNetwork,
                     inputSize = 16,
                     maxDistance = 0.0,
                     topNeighbors = 3,
                     maxGroups = 8):

2. Uses the CreateNode() function to specify each node and its parameters. The bitworm network has a data sensor and a category sensor, one bottom-level node, one top-level node, and one effector.

The level 1 learning node is an instance of Zeta1Node, and the level 2 node is an instance of Zeta1TopNode. Those two classes encapsulate the learning algorithms used by Numenta HTM Networks: Zeta1TopNode expects to get both domain data and category data, while Zeta1Node expects domain data only.

3. Adds each node to the network. For example:

   sensor = CreateNode("VectorFileSensor", 
       phase=0, 
       dataOut=inputSize)
   net.addElement("Sensor", sensor) 
 

The phase determines when the node is scheduled. You must specify a phase for each node. In many cases, using a phase that corresponds to the level is appropriate. See Scheduling Node Processing on page 79 in Advanced NuPIC Programming for more information on scheduling.

4. Links the nodes using the Network.link() method. The method expects as input a node and its output and a second node and its input. See Node Inputs, Node Outputs and Links on page 34 in Advanced NuPIC Programming for more information. For example, to link the sensor to the bottom node:

   net.link("Sensor", "dataOut", "Level1", "bottomUpIn") 
 

This call links the Sensor's dataOut output to the Level1 node's bottomUpIn input.

5. Saves the network to an XML file.

   net.writeXML(untrainedNetwork) 

TrainNetwork.py: Training of the HTM Network

During training, each node in the HTM Network builds a model of its world using the available input data.

The TrainNetwork.py script performs training as follows:

1. Creates a RuntimeNetwork object that contains information about the session and data files to be used. The training file is the data file we created earlier.

   runtimeNet = CreateRuntimeNetwork(untrainedNetwork, 
       files=[trainingFile, trainingCategories]) 

2. Loads the training and category data for each sensor.

   sensor = runtimeNet.getElement("Sensor") 
   categorySensor = runtimeNet.getElement("CategorySensor") 
 
   sensor.execute("loadFile", trainingFile) 
   categorySensor.execute("loadFile", trainingCategories) 
 

These commands extract the RuntimeNode for each sensor from the RuntimeNetwork, and then send a loadFile command to each RuntimeNode.

3. Runs the Sensor and Level1 nodes to train level 1. That means level 1 is progressively building a model of its world, which will later be used by the next level. Before starting the run, learning is automatically turned on for Level1. After the Level1 run is complete, the system turns off learning and turns on inference for that level.

   runtimeNet.run(Zeta1Train("Level1", numVectors), ["Sensor", "Level1"]) 
 

Zeta1Train is a special run policy for training Zeta1Nodes and Zeta1TopNodes. See Advanced NuPIC Programming for more information.

4. Resets the sensors before training Level2. The commands use the RuntimeNode instances extracted in Step 2 above.

   sensor.setParameter("position", "0") 

   categorySensor.setParameter("position", "0")

5. Runs Level2.

   runtimeNet.run(Zeta1Train("Level2", numVectors), exclusion=["FileOutput"]) 
 

In Step 3, inference was turned on for level 1. While in inference mode, the Level1 node sends its output to the top-level node.

6. Finally, saves the trained HTM Network file and cleans up the bundle of temporary files used by the NRE. Session Bundles on page 55 in Advanced NuPIC Programming discusses bundles in some detail.

   runtimeNet.writeXML(trainedNetwork) 

   runtimeNet.cleanupBundleWhenDone()

When the script completes, the NRE stops.

RunInference.py: Running the Trained Network with New Data

The RunInference script allows you to submit new data to the trained HTM Network. This script loads a trained HTM Network, and runs the entire network in inference mode using a given data file. You can run the script on the original training data or on new test data. No category file is specified during inference.

The script proceeds as follows:

1. Creates a runtime network and adds data files.

   runtimeNet = CreateRuntimeNetwork(trainedNetwork, 
      files=[testFile]) 
   runtimeNet.getElement('Sensor').execute('loadFile', testFile) 

2. Sets up the effector output so it's stored inside the bundle.

   fileOutputEffector = runtimeNet.getElement('FileOutput') 
   fileOutputEffector.execute("setFile", resultsFile) 
   fileOutputEffector.execute("echo",
       "Numbers show Level 2 node output followed by the sensor output")
 

At this point, each node is in inference mode because the trained HTM Network was saved in that state.

3. Runs through the patterns in the data file. During inference, categories are not known, so the CategorySensor is disabled.

   runtimeNet.run(numVectors, exclusion=["CategorySensor"]) 
 

Note that when you submit new data to a trained network, Zeta1Train is not used because no enabling and disabling of learning is necessary. All learning nodes have been trained and are in inference mode.

4. Retrieves the results file and cleans up. The flushFile command ensures that the inference output file has been fully written to disk and is not partially stored in an operating system file cache.

   fileOutputEffector.execute("flushFile") 
   runtimeNet.getFiles(resultsFile) 
   runtimeNet.cleanupBundleWhenDone() 
 

In most cases, you don't need to be concerned about details regarding bundles. Just retrieve any files you'd like to keep, and then call cleanupBundleWhenDone(), as in the code fragment above.
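The reason for the flushFile step has a direct analogue in ordinary Python file handling, where written data can sit in a buffer before it reaches the disk. The sketch below shows that analogue using only the standard library. It is not NuPIC code, and the write_and_sync helper and the file contents are hypothetical.

```python
import os
import tempfile


def write_and_sync(path, lines):
    """Write lines to path and ask the OS to commit them to disk."""
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
        handle.flush()             # push Python's buffer to the operating system
        os.fsync(handle.fileno())  # ask the OS to write its cache to disk


results_path = os.path.join(tempfile.mkdtemp(), "results.txt")
write_and_sync(results_path, ["0 1", "1 0"])
with open(results_path) as handle:
    print(handle.read().splitlines())
```

Reading or copying an output file before it has been flushed can yield a truncated file, which is why the script flushes before calling getFiles.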


Numenta
www.Numenta.com