

1 Bitworm: Getting Started Example

This chapter gets you started with the Numenta Platform for Intelligent Computing (NuPIC) by explaining how you can run a simple example HTM Network and by examining the example scripts.

See the Numenta website for hardware and software requirements and installation instructions.


The Bitworm Example

This chapter introduces a simple example called Bitworm. The example illustrates how you might structure your input and category input, how to run your HTM Network, and how to interpret the results. Bitworm is not intended to be a realistic problem; instead, it serves as a Hello World example to get you up and running with NuPIC.

What are Bitworms?

Bitworms are 16-bit vectors. There are solid bitworms, which consist of consecutive on-bits, and textured bitworms, which consist of alternating on/off bits. In each case, the part of the vector that's not a bitworm consists of off bits. Here are some examples:

Figure 1 Solid and Textured Bitworms
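
The two kinds of bitworm described above can be sketched in a few lines of Python. This is an illustration only, not the code in GenerateData.py; the function names and the start/length arguments are assumptions made for this sketch.

```python
def solid_bitworm(start, length, size=16):
    """Return a vector with `length` consecutive on-bits beginning at `start`."""
    if start < 0 or length < 0 or start + length > size:
        raise ValueError("bitworm does not fit in the vector")
    return [1 if start <= i < start + length else 0 for i in range(size)]


def textured_bitworm(start, length, size=16):
    """Return a vector whose bitworm alternates on/off bits, beginning with an on-bit."""
    if start < 0 or length < 0 or start + length > size:
        raise ValueError("bitworm does not fit in the vector")
    return [
        1 if start <= i < start + length and (i - start) % 2 == 0 else 0
        for i in range(size)
    ]


# Everything outside the bitworm stays off.
print(" ".join(str(bit) for bit in solid_bitworm(1, 10)))
print(" ".join(str(bit) for bit in textured_bitworm(0, 11)))
```

The first call produces a solid bitworm of length 10 starting at position 1; the second produces a textured bitworm of length 11 starting at position 0.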

The Bitworm example program trains an HTM Network to model the world of bitworms. After the HTM Network has been trained, you can submit new data and the HTM Network uses the model of the bitworm world to discriminate between solid and textured bitworms.

Bitworm Example Components

The Bitworm example consists of the following files, discussed in more detail below:

RunOnce.py: Creates the HTM structure, trains and saves the HTM Network, then runs the network with new data. You can edit RunOnce.py to experiment with different training settings.
GenerateData.py: Generates training set data based on the settings in the RunOnce.py file.
GenerateReport.py: For each group, prints the coincidences to a file. Grouping is an important part of the learning algorithm. You don't need to understand grouping or the learning algorithm for this simple example.
DisplayReport.py: Displays the groups discovered by the training run. You must call this script explicitly; RunOnce.py does not call it.
ParameterExploration.py: Illustrates how you can explore node parameters.
Cleanup.py: Removes all generated files.

Running the Example

This section explains how to run the example and how to explore what the HTM Network does by changing the example configuration. The example has been set up so you need to execute only one script.

To run the Bitworm example:

On Microsoft Windows, open a command prompt and type:

cd %NTA%\share\projects\bitworm 
python RunOnce.py 

On OS X and Linux - assuming $HOME/nta is the location where you installed the software - type the following at the command line:

cd $HOME/nta/current/share/projects/bitworm  
python RunOnce.py  

The example is set up so you always make modifications to the RunOnce.py script, then rerun RunOnce.py.

The script performs these tasks:

1. Calls GenerateData.py with the parameters set in RunOnce.py to generate a set of training data. The default is to generate temporally coherent data, that is, sequences of solid and textured bitworms of variable bitworm length. The minimum and maximum length are specified in the trainingMinLength/testMinLength and trainingMaxLength/testMaxLength parameters.

2. Creates the bitworm HTM Network, and calls helper functions to add nodes. The nodes are linked automatically.

3. Trains the HTM Network by calling the TrainBasicNetwork() function.

During training, the nodes learn, that is, they construct a model of their world. Training proceeds level by level, starting at the bottom.

4. When the top-level node receives input during training, it also receives category information. It groups the data based on the categories.

5. RunOnce.py calls RunBasicNetwork() to explore how the trained HTM Network handles new data. The trained HTM Network looks at each input bitworm and determines the probability that the bitworm belongs to one or the other category. This process of categorizing data based on previous training is called inference.

6. Finally, RunOnce.py calls GenerateReport.py, which prints the coincidences for each group to a file called report.txt.

Examining the Report

The GenerateReport.py script that is run as part of RunOnce generates a report file named report.txt that includes the following information:

   General network statistics: 
   Network has  5 nodes. 
   Node names are: 
       Sensor 
       CategorySensor 
       Level1 
       Level2 
       FileOutput 
 
 
   Node Level1 has 40 coincidences and 7 groups. 
   Node Level2 has 8 coincidences. 
   Performance statistics: 
   Comparing:  training_results.txt  with  training_categories.txt 
   Performance on training set:100.00%, 420 correct out of 420 vectors 
 
   Comparing:  test_results.txt  with  test_categories.txt 
   Performance on test set: 97.86%, 411 correct out of 420 vectors 
 

Note that the Bitworm example gets very good results because it is a toy problem whose assumptions precisely match those of the current learning algorithm. Achieving the same degree of success for more complex problems can be more challenging.
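
The percentages in the performance statistics are the share of vectors whose inferred category matches the expected category. The sketch below reproduces that arithmetic for the test-set line (411 correct out of 420). It works on in-memory lists because the layout of the results and category files is not described here; the function name, the category values, and the 210/210 split are assumptions.

```python
def score(predicted, expected):
    """Return (correct, total, percent) for two equal-length category lists."""
    if len(predicted) != len(expected):
        raise ValueError("result and category lists differ in length")
    if not expected:
        raise ValueError("no vectors to score")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    total = len(expected)
    return correct, total, 100.0 * correct / total


# Hypothetical data: 420 vectors in two categories, 9 of them misclassified.
expected = [0] * 210 + [1] * 210
predicted = list(expected)
for i in range(9):
    predicted[i] = 1 - predicted[i]

correct, total, percent = score(predicted, expected)
print("Performance: %.2f%%, %d correct out of %d vectors" % (percent, correct, total))
# prints: Performance: 97.86%, 411 correct out of 420 vectors
```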

   Getting groups and coincidences from the node Level1 in network 'trained_bitworm.xml'

 

   ====> Group =  0 
   1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0  
   0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0  
   0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0  
   0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0  
   0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0  
   0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1  
  
   ====> Group =  1 
   0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0  
   1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0  
   0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0  
   0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0  
   0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0  
   0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0  
   0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1  

This display can be helpful for a simple program such as Bitworm, where it's easy to see how clean the groups are: each group contains a different kind of bitworm. For other programs, using Numenta Visualizer or other tools to explore the grouping might be better than looking at a file. See Using HTM Network Visualizer and Plotting and GUI Packages Bundled with NuPIC.
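
One way to check how clean the groups are without reading every row is to classify each coincidence as solid or textured and confirm that a group contains only one kind. The sketch below does this for a few rows copied from the report above. The classify() helper is hypothetical and is not part of GenerateReport.py.

```python
def classify(row):
    """Label a space-separated bit row as 'solid', 'textured', 'empty', or 'other'."""
    bits = [int(b) for b in row.split()]
    on = [i for i, b in enumerate(bits) if b == 1]
    if not on:
        return "empty"
    # Look only at the span from the first on-bit to the last on-bit.
    worm = bits[on[0]:on[-1] + 1]
    if all(worm):
        return "solid"  # note: a single on-bit is also reported as solid
    if all(b == (i % 2 == 0) for i, b in enumerate(worm)):
        return "textured"
    return "other"


group0 = [
    "1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0",
    "0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0",
    "0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0",
]
group1 = [
    "0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0",
    "1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0",
    "0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0",
]

for name, rows in (("Group 0", group0), ("Group 1", group1)):
    kinds = sorted({classify(row) for row in rows})
    print(name, "contains:", ", ".join(kinds))
```

A clean group reports a single kind; a group that mixes kinds, or that contains rows labeled "other", suggests the grouping was less successful.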

You can look at the GenerateReport.py file to see what Python calls you can use to retrieve information from the network. Comments in the file make it easy to understand your options.

Displaying the Report

You can run DisplayReport.py to see a visual representation of the groups, for example:

Running the Example with Temporally Incoherent Data

You can change the useCoherentData parameter in RunOnce.py to generate solid and textured bitworms that are not presented in sequence, that is, that have no temporal relationship, as shown in Figure 2. Submitting those data to the trained HTM Network illustrates the importance of the temporal aspect of the training data.

Figure 2 Bitworms Presented Without Temporal Coherence

To run the example with incoherent data:

1. In the RunOnce.py script, change the useCoherentData parameter to False.

2. Execute RunOnce.py again.

The example runs with data that include both solid and textured bitworms but don't present sequences of solid bitworms followed by sequences of textured bitworms.

3. Examine the report.txt file this run generated. You should see that the HTM system found it difficult to find the groups and to categorize the data.
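
To get a feel for what temporal coherence means in the data, the sketch below measures how often two consecutive vectors belong to the same category, first for categories presented in sequences and then for the same categories in shuffled order. The measure and the sequence sizes are assumptions chosen for illustration; this is not how NuPIC evaluates coherence.

```python
import random


def same_category_fraction(categories):
    """Fraction of consecutive pairs that share a category."""
    pairs = list(zip(categories, categories[1:]))
    if not pairs:
        raise ValueError("need at least two category labels")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


# Coherent: runs of 20 solid (0) vectors followed by runs of 20 textured (1) vectors.
coherent = [0] * 20 + [1] * 20 + [0] * 20 + [1] * 20

# Incoherent: the same labels in random order (fixed seed for repeatability).
incoherent = list(coherent)
random.Random(0).shuffle(incoherent)

print("coherent:   %.2f" % same_category_fraction(coherent))
print("incoherent: %.2f" % same_category_fraction(incoherent))
```

In the coherent case nearly every vector is followed by one of the same category, which is the regularity a temporal learning algorithm can exploit. After shuffling, that regularity largely disappears.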

Running the Example with Noisy Data

The data generation script allows you to change the data by introducing some noise and to observe the results. There are two types of noise:

   without noise:    0 0 0 0 1 1 1 1 1 0 0 0 0 0
   with noise:       -.1 0.1 0 1.01 1 1.1 .98 .98 1.05 0 0.05 0.09 0 0.07

   without bitflip:  0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
   with bitflip:     0 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0

In the bitworm example, you can introduce noise to the data and see how the noise affects recognition.
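
The two kinds of noise can be sketched as follows. This is not the GenerateData.py implementation; the function names are hypothetical, and the sketch assumes additive noise is drawn uniformly from [-amplitude, amplitude] and that each bit is flipped independently with the given probability.

```python
import random


def add_uniform_noise(vector, amplitude, rng):
    """Add independent uniform noise in [-amplitude, amplitude] to every element."""
    return [v + rng.uniform(-amplitude, amplitude) for v in vector]


def flip_bits(vector, probability, rng):
    """Flip each bit (0 becomes 1, 1 becomes 0) independently with the given probability."""
    return [1 - v if rng.random() < probability else v for v in vector]


rng = random.Random(42)  # fixed seed so repeated runs give the same vectors
clean = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

noisy = add_uniform_noise(clean, 0.1, rng)
flipped = flip_bits(clean, 0.1, rng)

print(" ".join("%.2f" % v for v in noisy))
print(" ".join(str(v) for v in flipped))
```

Additive noise leaves every value close to its original 0 or 1, whereas a bit flip changes a value completely, so the two kinds of noise stress the learning algorithm in different ways.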

To run the example with noisy data:

1. In the RunOnce.py script, return the useCoherentData parameter to True.

2. Change additiveNoiseTraining to 0.1. This setting adds uniform random noise in the range [-0.1, 0.1] to the inputs. Once you start adding noise to the inputs, it becomes harder for the learning algorithm to detect temporal coherence.

3. Execute RunOnce.py.

You get a Python exception with the message "The current parameters generated 400 groups, which exceeds the maximum of 25 groups." This message means you didn't have enough outputs for the number of groups that were found. Although the number of underlying causes has not changed, the noise makes it harder for the algorithm to create a compact set of groups based on temporal coherence.

4. In RunOnce.py set maxDistance to 0.1 and save the revised file.

The maxDistance parameter sets the maximum Euclidean distance at which two input vectors are considered the same during learning. The default for this parameter is 0, so a change usually means better performance if some noise is present. See Affecting Learning Node Behavior With Node Parameters in Advanced NuPIC Programming.

5. Call RunOnce.py again. The script now runs without generating an exception.

6. Examine the report.txt file this run generated. You should see that the results are good; however, notice that the number of groups is fairly large compared to the original number. This is an indication that the learning algorithm found it more difficult to find the groups and categorize the data.

7. Set maxDistance to 0.2, rerun RunOnce.py, and examine report.txt. This time the HTM gets the same number of groups as it did before you added noise.

This example illustrates how a combination of parameters (maxDistance and maxGroups) affects whether the HTM Network works well or does not work at all. If you wish, you can experiment with some of the other RunOnce.py parameters. See Table 2: RunOnce.py parameters.
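
The effect of maxDistance can be illustrated with a small sketch that counts how many distinct vectors remain when any two vectors within a given Euclidean distance are treated as the same. The greedy matching below is an assumption made for illustration; the actual learning algorithm inside the node is not described in this chapter.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_distinct(vectors, max_distance):
    """Count vectors that are farther than max_distance from every stored vector."""
    stored = []
    for v in vectors:
        if not any(euclidean(v, s) <= max_distance for s in stored):
            stored.append(v)
    return len(stored)


# One clean pattern and two noisy copies of it (hypothetical values).
vectors = [
    [0.0, 1.0, 1.0, 0.0],
    [0.03, 0.98, 1.02, -0.01],
    [-0.02, 1.04, 0.97, 0.03],
]

print(count_distinct(vectors, 0.0))  # every noisy copy counts as new
print(count_distinct(vectors, 0.1))  # noisy copies merge with the clean pattern
```

With a distance of 0, every noisy copy looks like a new pattern, which is consistent with the large number of groups reported in step 3. A small positive distance lets the copies collapse back into one pattern.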

Understanding the Example Scripts

This section briefly discusses the Bitworm example scripts.

RunOnce.py: Your Entry Point to Bitworms

The RunOnce.py script runs the component scripts of the example in sequence. As a rule, you should always call RunOnce.py, not one of the component scripts.

RunOnce allows you to set the following parameters:

Table 2: RunOnce.py parameters.
useCoherentData: When set to True (the default), the GenerateData.py script creates sequences of solid bitworms followed by sequences of textured bitworms. When set to False, the GenerateData.py script mixes solid and textured bitworms randomly. In that case, the temporal element is missing from the data.
numSequencesPerBitwormType: Number of sequences for each bitworm type. For example, you could present ten sequences of textured bitworms and ten sequences of solid bitworms. The sequences are always separated by a row of zeros (0). GenerateData.py always generates the same number of sequences of each type.
sequenceLength: Length of each sequence (e.g., 20 bitworm vectors, followed by one vector of zeros).
trainingMinLength, trainingMaxLength, testMinLength, testMaxLength: Minimum and maximum length of the generated bitworms.
inputSize: Size of the input vector. Defaults to 16.
bitFlipProbabilityTraining, bitFlipProbabilityTesting: Probability that a bit will be flipped, that is, a 0 bit becomes 1 or a 1 bit becomes 0. Can be combined with additiveNoise. Default is 0. See Running the Example with Noisy Data.
maxGroups: Maximum number of groups that can be learned at level 1.
maxGroupSize: Specifies how large the groups in the temporal pooler can become.
maxDistance: Sets the maximum Euclidean distance at which two input vectors are considered the same during learning. See Affecting Learning Node Behavior With Node Parameters in Advanced NuPIC Programming.

GenerateData.py: An Example of Data Generation

The GenerateData.py script generates a file with training data or testing data, plus an associated category file, using the parameter settings specified in RunOnce.py. Data are generated in sequences: for each sequence, the code creates a bitworm with a random length and position (within the current parameter constraints) and then slides the bitworm left or right until the sequence contains the number of vectors specified by sequenceLength. At the end of each sequence, GenerateData.py inserts a line of zeros to reset the node so that the node does not attempt to learn temporal correlation between two bitworm sequences.
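
A minimal sketch of this kind of sequence generation follows. It is not the code in GenerateData.py: the function names are hypothetical, and the choice to reverse direction when the bitworm reaches the edge of the vector is an assumption, since the text says only that the bitworm slides left or right.

```python
import random


def make_bitworm(start, length, textured, size):
    """Build one vector; a textured bitworm keeps only every other bit on."""
    return [
        1 if start <= i < start + length and (not textured or (i - start) % 2 == 0) else 0
        for i in range(size)
    ]


def generate_sequence(sequence_length, min_length, max_length, textured, rng, size=16):
    """Return sequence_length bitworm vectors followed by one row of zeros."""
    length = rng.randint(min_length, max_length)  # random bitworm length
    start = rng.randint(0, size - length)         # random starting position
    step = rng.choice([-1, 1])                    # slide left or right
    rows = []
    for _ in range(sequence_length):
        rows.append(make_bitworm(start, length, textured, size))
        nxt = start + step
        if not 0 <= nxt <= size - length:         # assumed: bounce off the edge
            step = -step
            nxt = start + step
        if 0 <= nxt <= size - length:
            start = nxt
    rows.append([0] * size)                       # separator row that resets the node
    return rows


rng = random.Random(7)
for row in generate_sequence(5, 4, 8, textured=False, rng=rng):
    print(" ".join(str(bit) for bit in row))
```

Because consecutive rows differ only by a one-position shift, vectors that are close in time belong to the same bitworm, which is the temporal relationship the learning algorithm relies on.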

The script includes methods to generate data in which no temporal correlation exists. Those methods are called when the useCoherentData parameter is set to False.

Several aspects of this data setup are interesting:

RunOnce.py: Creating, Training, and Using the Trained Network

RunOnce.py performs the following tasks, discussed in this section:

Creating the Untrained HTM Network File

RunOnce.py creates the Network and adds the nodes using helper functions. See Constructing an HTM Network.

Figure 3 shows the hierarchy of nodes in the bitworm example. This is the simplest possible HTM hierarchy.

Figure 3 Nodes in the Bitworm Example

To create this hierarchy, the script goes through these steps:

1. Creates the Network instance.

   bitNet = Network() 

2. Uses the AddSensor(), AddZeta1Level(), and AddClassifierNode() functions to specify each level in turn.

      AddSensor(bitNet, featureVectorLength = inputSize) 
  • AddZeta1Level() adds a Zeta1Node node, which uses the older learning algorithm. Other examples use AddLevel() instead to add spatial and temporal poolers that implement the new learning algorithm.
      AddZeta1Level(bitNet, numNodes = 1)
  • AddClassifierNode() adds a Zeta1TopNode by default and specifies the number of categories.
      AddClassifierNode(bitNet, numCategories = 2)

The bitworm network has a data sensor and a category sensor, one bottom-level node, one top-level node, and one effector.

3. Uses the Network accessor functions to set parameters, for example:

   bitNet['level1'].setParameter('maxDistance',maxDistance)

   bitNet['level1'].setParameter('transitionMemory', transitionMemory)

Training the HTM Network

During training, each node in the HTM Network builds a model of its world using the available input data.

RunOnce.py performs training using the TrainBasicNetwork() function. This function trains the network created earlier and returns a RunTimeNetwork object, which contains the trained network. TrainBasicNetwork() uses the training files generated by GenerateData.py (see GenerateData.py: An Example of Data Generation).

bitNet = TrainBasicNetwork( 
      bitNet, 
      dataFiles = [trainingFile], 
      categoryFiles = [trainingCategories]) 

Running the Trained Network with New Data

RunOnce.py includes the RunBasicNetwork() function, which runs the trained HTM Network in inference mode using a given data file. You can invoke the function with the original data or with the new data.

RunOnce.py tests the network first with the training data.

accuracy = RunBasicNetwork( 
      bitNet, 
      dataFiles     = [trainingFile], 
      categoryFiles = [trainingCategories], 
      resultsFile   = trainingResults) 
print "Training set accuracy with HTM = ", accuracy*100.0 

RunOnce.py also submits new data to the trained HTM Network and judges how well the network learned the categories.

accuracy = RunBasicNetwork( 
      bitNet, 
      dataFiles     = [testFile], 
      categoryFiles = [testCategories], 
      resultsFile   = testResults) 
print "Test set accuracy with HTM = ", accuracy*100.0 

Numenta
www.Numenta.com