2 Understanding HTM Development: Waves Example

This chapter explores the sequence of tasks involved in developing an HTM application. The chapter uses the Waves example to illustrate two aspects of each task:

The example in this chapter is slightly more complex than Bitworm. It uses the RuntimeNetwork API directly instead of just using the helper functions in Bitworm.

Topics

Development Overview

The process of creating an HTM application differs significantly from a traditional software engineering process. HTM Networks learn by building a statistical model based on a sequence of input data. The application developer's task is not to specify an algorithm but to come up with a suitable representation of the data and to find the optimal parameter settings for the problem at hand. The quality of the final network depends on many factors, such as the network configuration (does the hierarchy have two levels or three?), node parameters (is maxDistance 0 or 0.2?), the training data (are there sufficient sequences?), and so on.

The HTM development process is iterative at a very high level. You don't just write the program, find problems, rewrite and rerun. Instead, each task in the process might influence other tasks earlier or later in the process. During HTM Network design, you might need to refine the problem definition. During testing and analysis, you might find that the data representation is less than optimal.

Pay particular attention to the early stages of the process. Unless your data are represented in a way that's easy to understand for the HTM (and often for humans as well), the results will most likely not be satisfactory.

Figure 4 Numenta Development Process

This chapter walks through the development process using the Waves example as the context.

Defining the Problem

Finding a problem that's well suited to an HTM Network and formulating that problem in a fashion that yields good results are important parts of the HTM development cycle. Many developers redefine the problem as they see the results of running early prototypes of their HTM Network.

See the white paper Problems that Fit HTMs for a discussion of problem definition. While the paper's focus is on what problems fit (and don't fit) HTMs, it implicitly discusses problem definition, for example, by contrasting problems that have temporal elements with problems that don't have them.

Problem Definition in the Waves Example

The Waves example generates data that represent temperature readings at fixed points in a moving stream. The example creates an HTM Network and trains the network with a set of input data. After training, the example submits new data to the HTM Network, and the network classifies those data.

This example assumes that 32 temperature monitors have been placed at fixed locations along the length of a stream. As time progresses, the monitors later in the stream see what the earlier monitors had seen.

This example assumes that the heat does not diffuse, and that hot and cold points move down the stream unaltered.

Figure 5 Waves Example

In the example, each hot spot is modeled as a Gaussian curve with positive amplitude; each cold spot is modeled as a Gaussian curve with negative amplitude. The state of the river is characterized as the combination of hot and cold spots.
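
As a hedged sketch of this model (the symbols are illustrative and do not come from the example code), the temperature seen at position x and time t can be written as a sum of Gaussians that share one width and drift at one speed:

```latex
T(x, t) = b + \sum_{i=1}^{n} a_i \exp\!\left( -\frac{(x - c_i - v t)^2}{2\sigma^2} \right)
```

Here a_i > 0 describes a hot spot and a_i < 0 a cold spot, c_i is the starting center of spot i, sigma is the common width, v is the drift speed, and b is a constant baseline. Because nothing in the expression changes shape over time, it matches the assumption that hot and cold points move down the stream unaltered.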

The example uses these four state categories:

The warm and cold spots could be anywhere in the river. The goal is to have an HTM application that examines incoming data and recognizes whether the river is in one of the four categories.

The following illustration shows Category 0 (one warm spot).

Figure 6 River in Category 0 State (One Warm Spot)


Note that the graph was generated as a by-product of calling RunOnce.py. After the RunOnce.py script completes, you can find the graphs in a subdirectory called visuals. As you change the data generation parameters, you can examine the corresponding updated graphs.

Input to the Temperature Monitors

The input vectors have both a spatial and a temporal element. Having both elements is critical for a problem well suited for an HTM Network.

Notes on Problem Definition

Take special care with problem definition. If the problem definition is not thought out carefully, the HTM algorithms might be unable to work with your problem.

Representing Data

You must represent the data to match the sensor node, which receives input for the HTM Network. You need to prepare multiple datasets: training data, category data (if available), and testing data. Here are a few points to consider:

   sensor.execute('loadFile', trainingFile, 3) 
 

If your data source does not produce vectors as output, manipulating the data so that they fit the available sensors is usually the best solution. Developers with source licenses might consider creating a custom sensor using the node plug-in API.

Data in the Waves Example Programs

The WavesData.py script generates data that are a sum of moving Gaussian curves plus a vertical offset. By default, each data time slice consists of 32 data points. There are four data categories; they differ in the numbers and signs of the curves. All curves have identical widths. Some curves have negative weight, some positive weight, and the number of curves ranges from 1 to 3. The centers of the curves are offset from each other by a little bit. The curves drift together at the same rate, which is 0.5 by default. The script saves the data in .txt format for easy submission to VectorFileSensor.
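
The description above can be sketched in a few lines of plain Python. This is not the WavesData.py implementation: the function name `river_snapshot`, the width of 2.0, and the chosen amplitudes and centers are assumptions made for illustration. Only the 32 data points per time slice and the drift rate of 0.5 come from the text, and noise is left out.

```python
import math

def river_snapshot(amplitudes, centers, t, num_monitors=32, width=2.0,
                   drift=0.5, offset=0.0):
    """Return one time slice: a sum of Gaussians sampled at each monitor."""
    values = []
    for x in range(num_monitors):
        total = offset
        for amp, center in zip(amplitudes, centers):
            # Every curve moves downstream by `drift` positions per time step.
            shifted = center + drift * t
            total += amp * math.exp(-((x - shifted) ** 2) / (2.0 * width ** 2))
        values.append(total)
    return values

# One warm spot and one cold spot (hypothetical values), three time steps.
sequence = [river_snapshot([1.0, -1.0], [8.0, 16.0], t) for t in range(3)]
for row in sequence:
    print(" ".join("%.2f" % v for v in row[:12]))
```

Each row is one input vector (the spatial element), and the shift of the peak from row to row supplies the temporal element.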

The goal is to have the HTM Network look at a set of data - representing the river in a certain state - and to determine whether the river is in one of the predefined four states. This categorization is trivial for a human to do, because it is easy to see how many humps are facing up and facing down, so it might also work well for the HTM system.

The Waves example data generator allows you to add two types of noise:

The data generator also allows you to set the amplitudes of each Gaussian curve with an amplitudes parameter. Changing this parameter affects the height and sign of each Gaussian curve. If you change the centers parameter, the Gaussian curves have different locations. Other parameters affect data generation as well.

Here's the code fragment from the RunOnce.py script. The script creates one set of data for training the HTM Network and one set for testing. For each set, you specify the number of temperature monitors and the thermal noise (which differs between training and testing).

print "\n====== GENERATING DATA ===========" 
trainData = WavesData() 
trainData['prefix']='train_' 
trainData['sensorDims'] = numTempMonitors 
trainData['thermalNoise'] = trainThermalNoise 
trainData['spatialNoise'] = trainSpatialNoise 
trainData.createData() 
   
testData = WavesData() 
testData['prefix']='test_' 
testData['randomSeed'] += 1 
testData['numSeqPerCat'] = 4 
testData['sensorDims'] = numTempMonitors 
testData['thermalNoise'] = testThermalNoise 
testData['spatialNoise'] = testSpatialNoise 
testData.createData() 

RunOnce.py includes a visualize.Data() method that allows you to view the data for any parameter combinations you choose.

Data Representation Questions

Assembling the data for training and testing the HTM requires organizing data correctly and ensuring that the data have the required spatial and temporal characteristics.

Consider the following questions when you prepare your data:

Designing and Creating the HTM Network Structure

Designing, configuring, and running the HTM Network are iterative steps. You usually start with a design that includes a sensor, a category sensor, and an effector and decide on the number of learning node levels. You then decide how many nodes you want at each level, and on any node parameters.

As part of analysis and testing, it might make sense to experiment with a different number of levels or to change other characteristics of the HTM Network.

Network Creation with the Network API and Helper Functions

You can use the AddSensor(), AddLevel() and AddClassifierNode() helper functions to create most basic networks. The functions create an HTM Network with these elements:

If you call...         Helper functions use these nodes by default...
AddSensor()            VectorFileSensor
AddLevel()             SpatialPoolerNode / TemporalPoolerNode pairs
AddClassifierNode()    Zeta1TopNode

Using the helper function makes sense if the HTM Network structure is simple.

Figure 7 Structure of Network Created with Helper Functions

The helper functions support the following network structure:

Network Creation with Node Constructors

When the default node type is not appropriate, for example, when you're using a non-default learning algorithm, you can use node constructors. For example, if you want to use a non-default classifier node, you first create that node using a node constructor, then add the node to the network using AddClassifierNode().

knnNode = CreateNode('py.KNNClassifierNode') 
AddClassifierNode(myNet, numCategories = 4, nodeInstance = knnNode) 

The Waves example does not use node constructors because the helper functions are appropriate for the desired structure.

The net_construction folder includes examples for many different types of HTM Network creation.

Network Creation in the Waves Example

The Waves example creates the HTM Network structure as follows:

1. Create the network.

   net = Network() 

2. Add a sensor to the network, setting the vector length for the sensor to the number of temperature monitors.

   AddSensor(net, featureVectorLength = numTempMonitors) 

3. Add each level, setting its size (number of nodes), requested coincidences, and requested groups. The example adds levels using a loop.

   for level,size in enumerate(levelSize): 
      AddLevel(net, numNodes = size, 
               requestedCoincidences = coincidences[level], 
               requestedGroups = groups[level]) 

4. Set parameters for experimentation. To have your HTM Network run well, you usually have to experiment with parameters.

   levelSpatialName, levelTemporalName = GetLevelNames(net, level+1) 
   net[levelSpatialName].maxDistance = maxDistance[level]
   net[levelTemporalName].transitionMemory = transitionMemory[level]

The example uses the parameters most likely to affect your HTM Network. See Affecting Learning Node Behavior With Node Parameters in Advanced NuPIC Programming for a complete list.

When you add a level, you add two regions: one region of spatial pooler nodes and one region of temporal pooler nodes. Pairs of nodes are connected by SimpleLink.

5. Finally, add a classifier node to the HTM Network.

   AddClassifierNode(net, numCategories = trainData['numCat']) 
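
The loop in step 3 relies on parallel lists that hold one entry per level. The following sketch shows only that bookkeeping and makes no NuPIC calls. The helper `check_level_config` and the numbers passed to it are hypothetical and are not taken from RunOnce.py.

```python
def check_level_config(level_size, coincidences, groups):
    """Pair up per-level settings and fail early if the lists disagree."""
    if not (len(level_size) == len(coincidences) == len(groups)):
        raise ValueError("every per-level list needs one entry per level")
    plan = []
    for level, size in enumerate(level_size):
        plan.append({"level": level + 1,  # the text numbers levels from 1
                     "numNodes": size,
                     "requestedCoincidences": coincidences[level],
                     "requestedGroups": groups[level]})
    return plan

# Hypothetical two-level hierarchy: 8 nodes feeding 1 node.
for entry in check_level_config([8, 1], [64, 128], [16, 8]):
    print(entry)
```

Checking the list lengths before building the network is a design choice that turns a mismatched configuration into an immediate error rather than an IndexError partway through the loop.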

Network Design Questions

When designing your HTM application, consider the following questions.

There are no easy answers to these questions. In many cases, you need to experiment with the available options. For example, you might run your HTM with a different number of hierarchy levels, then compare the results. The Numenta website includes discussions of those topics.

Running the Network to Perform Learning and Inference

During training, an HTM Network performs learning and inference, one level at a time. The RuntimeNetwork API and the RunBasicNetwork() helper function turn learning and inference on and off automatically.

This section gives only a brief introduction to the topic.

When you construct your HTM Network using AddLevel, the tool adds a region of SpatialPoolerNode instances and a region of TemporalPoolerNode instances, connected by single links. During learning, the learning algorithms work on that input on a per-level basis.

Here's a simple view of this process:

1. First, the system performs learning at the lowest level (Level 1). Level 1 nodes are in learning mode, and all nodes above Level 1 are disabled.

2. Next, the system performs inference at the lowest level (Level 1) and learning at the next level (Level 2).

The Level 2 SpatialPoolerNode and TemporalPoolerNode are in learning mode. While Level 2 is being trained, Level 1 must be in inference mode, that is, its grouped data must be available to Level 2.

3. The system performs inference at Level 2 and learning at the next level (Level 3).

4. The system continues performing first learning, then inference at each level until it reaches the classifier node. The classifier uses the data from the category sensor to assign the data from the previous level to categories. If there is no category sensor, the classifier groups according to how the data are structured.

5. At the end of the run, each node is trained and in inference mode, ready to process more data.
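
The steps above amount to a schedule in which exactly one level learns at a time. The sketch below prints that schedule in plain Python. It is an illustration of the ordering only, not the RuntimeNetwork API, and the function name `training_schedule` is an assumption. The classifier node would be one more phase at the top.

```python
def training_schedule(num_levels):
    """Yield, for each training phase, the mode of every level."""
    for learning_level in range(1, num_levels + 1):
        modes = {}
        for level in range(1, num_levels + 1):
            if level < learning_level:
                modes[level] = "inference"  # already trained, feeds the level above
            elif level == learning_level:
                modes[level] = "learning"
            else:
                modes[level] = "disabled"   # not reached yet
        yield learning_level, modes

for phase, modes in training_schedule(3):
    print(phase, [modes[level] for level in (1, 2, 3)])
```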

Training in the Waves Example

The Waves example allows you to perform training and testing in one of two ways:

   trainedNet = TrainBasicNetwork(net, 
                dataFiles     = ["train_sensor.txt"], 
                categoryFiles = ["train_category.txt"]) 
   trainedNet.save("trained_waves.xml") 
 

After the network has been trained, it is called with new data to test the HTM Network. RunBasicNetwork() returns the accuracy of the trained network. Accuracy is the percentage of input data that were classified correctly by the HTM Network. To receive accuracy information, you must supply both test data and category data (or accuracy is None).

   accuracy = RunBasicNetwork(trainedNet,
                           dataFiles = ["train_sensor.txt"],
                           categoryFiles = ["train_category.txt"])
   print "Accuracy of trained network on original training data is ", accuracy
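
To make the accuracy definition concrete, here is a minimal sketch of the calculation as described in the text. It is not the internals of RunBasicNetwork(), and the function name `classification_accuracy` is hypothetical.

```python
def classification_accuracy(predicted, expected=None):
    """Percentage of inputs whose predicted category matches the expected one."""
    if not expected:
        return None  # no category data, so no accuracy can be reported
    if len(predicted) != len(expected):
        raise ValueError("need one expected category per prediction")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * correct / len(expected)

print(classification_accuracy([0, 1, 2, 3], [0, 1, 2, 0]))  # 3 of 4 match
print(classification_accuracy([0, 1, 2, 3]))                # no categories
```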

Training Questions

Before you start training runs, asking the following questions can be useful:

Troubleshooting, Testing, and Analyzing Your HTM System

When you first attempt to run your HTM Network, you might have to do some troubleshooting if things don't work at all. Once the HTM Network runs satisfactorily on the training data set, you might wish to improve accuracy, speed, or both.

A number of tools and scripts allow you to view results and explore how the HTM Network arrived at the result.

   node = trainedNet(1_Temporal[0]) 
   print node.requestedGroupCount 
   node.requestedGroupCount = 3 
   node.execute('<command>') 
   nodeHelp('TemporalPoolerNode') 

It is often useful to examine intermediate results or to experiment with different parameter settings to understand how your HTM application processed input data. Based on that information, you can change how you enter data, change nodes and their parameters, or modify other characteristics of your HTM Network. See Debugging Your HTM Application.

Testing and Analysis in the Waves Example

The Waves example generates visualizations of the coincidence matrix and group structure when it's invoked. The visualization code generates a file and is commented out by default. The code looks like this:

if length == 'normal': 
   print "Visualizing trained network..."
   vis = Visualizer(networkFilename="trained_waves.xml",
                    dataType='bar_graph')
   vis.visualizeNetwork(openBrowser=False)

To test Waves and view results:

1. First, run the trained HTM network using the existing input data. View the results by looking at the visualizations that are generated.

2. Modify the RunOnce.py file to change the input data or parameters. For example, try setting the testThermalNoise variable to a higher value. You can also change the maxDistance or requestedGroupCount parameters, which you can specify on a per-level basis in RunOnce.py.

3. Run the HTM network with the new configuration and examine the visualizations.

You can also use NodeInspector to interactively look at results.

Using RunExperiment.py to Explore Results of Changes

The RunExperiment.py script in the Waves/simplehtm directory illustrates how you can prepare different experiment runs to test the results of manipulating different aspects of the HTM's world. The available experiments include:

Experiment            Description
DifferentTestData     Varies test data offsets and Gaussian amplitudes while measuring accuracy.
MaxD                  Varies the training data thermal noise and network maxDistance parameters while measuring accuracy.
NodeTypeTestNoise     Tests different inference modes (dot, product, productNonUniform, productNonUniformNonViterbi) against the thermal noise parameter in the testing data.
NodeTypeTrainNoise    Tests different inference modes (dot, product, productNonUniform, productNonUniformNonViterbi) against the thermal noise parameter in the training data.
TestingNoise          Varies the test data thermal noise while measuring accuracy.
TrainingData          Varies the amount of training data, Gaussian spatial noise of the training data, and Gaussian locations of the test data while measuring accuracy.
WhereItFails          Reports which test data points are correctly and incorrectly classified when using the default parameters.
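
Experiments of this kind share one pattern: run the same train-and-test cycle once per combination of parameter values and record the accuracy. The sketch below shows that pattern only. It is not the code in RunExperiment.py, the `sweep` helper is hypothetical, and `fake_trial` returns a made-up score where a real experiment would train and test a network.

```python
import itertools

def sweep(run_trial, **param_values):
    """Call run_trial once per combination of the given parameter values."""
    names = sorted(param_values)
    results = []
    for combo in itertools.product(*(param_values[n] for n in names)):
        settings = dict(zip(names, combo))
        results.append((settings, run_trial(**settings)))
    return results

# Stand-in for a real train-and-test run; the score is invented, not measured.
def fake_trial(maxDistance, testThermalNoise):
    return round(100.0 - 50.0 * testThermalNoise - 10.0 * maxDistance, 1)

for settings, score in sweep(fake_trial, maxDistance=[0.0, 0.2],
                             testThermalNoise=[0.0, 0.1]):
    print(settings, score)
```

Keeping the trial function separate from the sweep makes it possible to vary one parameter at a time or several together without changing the loop.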

Testing and Analysis Questions

Tuning your HTM Network is the biggest part of the development process.

Remember that you might have to revisit data generation or submission, or even problem definition to arrive at a useful HTM Network design and implementation.

Summary and Guide to Documentation

The process from problem definition to HTM deployment is not linear. At each stage, you might need to return to an earlier stage for redefinition or refinement.

For example, when analyzing the data, you might discover that the data gathering or data representation approach needs to change. When designing the HTM Network, you might find that the problem definition needs to be refined. Each stage is directly affected by other stages. You can deploy the HTM Network only when you're satisfied that it generates satisfactory results for new data; achieving good results for one set of data is not enough.

The following table summarizes design and implementation tasks and lists relevant documentation for each task. See How to Access NuPIC Built-in Help for information on getting help while working with the APIs.

Table 3: Tasks Overview and Documentation
 
Task
Documentation (Concepts)
Documentation (Task-based)

Numenta website.
Numenta website.
Look at example code for examples of problem definition.
Numenta website.
Numenta website.
Look at example code for examples of data representation (Pictures example) and data generation (all examples).
Numenta website.
Constructing an HTM Network. Python online help.
The Numenta Node Algorithms Guide
Numenta website.
 
For information about the learning algorithm, see the Numenta Node Algorithms Guide. This white paper helps you understand how an HTM Network that uses Numenta learning nodes analyzes your data and how different node parameters affect the learning behavior. This, in turn, might help you understand how to improve your application's effectiveness.

Numenta
www.Numenta.com