Geoffrey Hinton recently published a paper “How to Represent Part-Whole Hierarchies in a Neural Network” and presented a new theory called GLOM.
We’ve been getting a lot of questions lately about the differences between Hinton’s GLOM model and Numenta’s Thousand Brains Theory. In this blog, I will outline the commonalities and main differences between the two models at a high level. Those who want more details can watch the presentation by our researcher Marcus Lewis, in which he discusses the GLOM model through the lens of the Thousand Brains Theory.
What is the Thousand Brains Theory of Intelligence?
The Thousand Brains Theory is a sensorimotor theory that models the common circuit in the neocortex and suggests a new way of thinking about how our brain works. Vernon Mountcastle was the first to propose that every cortical column in the neocortex performs the same computation in every region and at every level of the hierarchy; only the input differs. Building on Mountcastle’s proposal, the Thousand Brains Theory holds that rather than building one model of an object, the brain builds thousands of models of the object in parallel.
Each column builds its model from a different sensory input, such as a different finger on your hand. The columns then vote together to reach a single interpretation of what they are sensing. The consensus vote is what we perceive.
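The voting idea can be sketched in a few lines of Python. This is a toy illustration only, not Numenta's implementation: the `MODELS` table, the feature names, and the functions `column_candidates` and `vote` are all hypothetical, and for simplicity every column shares one table of objects, whereas in the theory each column learns its own models.

```python
from collections import Counter

# Hypothetical toy models: each object is just a set of features.
# In the theory each column learns its own models; here all columns
# share one table to keep the sketch short.
MODELS = {
    "cup": {"curved surface", "handle", "rim"},
    "bowl": {"curved surface", "rim"},
    "pen": {"clip", "tip"},
}


def column_candidates(models, sensed_feature):
    """Return the objects whose model contains the feature one column senses."""
    return {name for name, features in models.items() if sensed_feature in features}


def vote(candidate_sets):
    """Tally candidates across columns and return the objects with the most votes."""
    tally = Counter(obj for candidates in candidate_sets for obj in candidates)
    if not tally:
        return set()
    top = max(tally.values())
    return {obj for obj, count in tally.items() if count == top}


# Three columns (say, three fingers), each sensing a different feature.
sensations = ["curved surface", "rim", "handle"]
candidates = [column_candidates(MODELS, s) for s in sensations]
consensus = vote(candidates)
```

Here two of the columns cannot tell a cup from a bowl on their own, but the column touching the handle can, so the combined vote settles on the cup.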
The key to the theory is the notion that every cortical column in the neocortex learns models of complete objects through movement. We learn a model of the world by observing how our sensory inputs change as we move. For example, as our eyes saccade, the cortical columns in our brain are constantly making predictions as to what the new sensation will be.
The theory also proposes that the neocortex primarily processes reference frames created by every cortical column. Based on reference frames, columns can associate sensory input with the relative positions and structures of objects (e.g. when you touch a cup, each sensation you receive is processed relative to its location on the cup – reference frames allow that to happen). For more technical details, click here.
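As a rough illustration of learning through movement within a reference frame, the sketch below stores a toy object model as features at locations in the object's own coordinate system, then uses a movement to predict the next sensation. The `CUP` model, its 2D integer coordinates, and the `predict` function are assumptions made for this example; they are not how cortical columns or Numenta's code represent locations.

```python
# Hypothetical object model: features stored at locations in the
# object's own reference frame (toy 2D integer coordinates).
CUP = {
    (0, 0): "base",
    (1, 1): "handle",
    (0, 2): "rim",
}


def predict(model, location, movement):
    """Apply a movement to the current location and predict the feature there.

    Returns the new location and the expected feature, or None when the
    model has learned nothing at that location.
    """
    new_location = (location[0] + movement[0], location[1] + movement[1])
    return new_location, model.get(new_location)


# A finger resting on the base moves up and to the side.
location, expected = predict(CUP, (0, 0), (1, 1))

# A second movement from the handle toward the top of the cup.
next_location, next_expected = predict(CUP, location, (-1, 1))
```

Because the prediction is keyed on location relative to the object rather than on the raw sensory input, the same model applies wherever the cup happens to be. A mismatch between the predicted and the actual sensation is the signal that the column's current interpretation is wrong.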
What is the GLOM model and how is it similar?
The GLOM model builds on Hinton’s earlier work on Capsules. Back in 2017, our VP of Research, Subutai Ahmad, shared his thoughts on how Hinton’s capsule theory compares to Numenta’s HTM Sensorimotor Theory. You can read the blog here.
GLOM is a computer vision model that suggests a new approach to improving AI visual scene understanding. Similar to the Thousand Brains Theory, the GLOM architecture consists of a large number of structurally similar columns.
The GLOM model proposes that each column consists of five different levels of representation of an object, associated with a specific location, at varying levels of abstraction (e.g. when looking at a cup, the bottom level of the column creates a representation of a curved edge, the level above represents a cup handle, and so on). Hinton proposes that over time, the representations at each level should settle down and vote to produce distinct islands of nearly identical representations. Ultimately, every column modeling the object would resolve into a coherent representation of the object at its top level (e.g. coffee cup).
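To give a feel for how islands of nearly identical representations could form, here is a minimal sketch of one level of such a system. It keeps only the lateral, attention-weighted averaging between columns and the influence of each column's previous state, and it leaves out the bottom-up and top-down contributions entirely. The function names, the temperature, the 50/50 blend, and the choice to let every column attend to every other column (rather than only nearby ones) are simplifying assumptions, not details taken from Hinton's paper.

```python
import math


def attention_step(vectors, temperature=0.1):
    """One lateral update for a single level.

    Each column's vector is blended with an attention-weighted average of
    all columns' vectors, where the weights are a softmax over dot products.
    """
    updated = []
    for v in vectors:
        scores = [sum(a * b for a, b in zip(v, u)) / temperature for u in vectors]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        neighbor_avg = [
            sum(w * u[i] for w, u in zip(weights, vectors)) / total
            for i in range(len(v))
        ]
        # Keep half of the previous state, take half from the neighbors.
        updated.append([0.5 * x + 0.5 * y for x, y in zip(v, neighbor_avg)])
    return updated


def settle(vectors, steps=20):
    """Repeat the lateral update a fixed number of times."""
    for _ in range(steps):
        vectors = attention_step(vectors)
    return vectors


# Four columns at one level: two look at one part, two at another.
columns = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
settled = settle(columns)

within_island = math.dist(settled[0], settled[1])
between_islands = math.dist(settled[0], settled[2])
```

After settling, the first two columns hold almost the same vector, as do the last two, while the two groups stay far apart. Similar vectors attract each other strongly and dissimilar ones barely interact, which is the intuition behind the islands.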
Since Hinton uses cognitive science to inform his research and the Thousand Brains Theory is a model of the neocortex, it is not surprising that Hinton has arrived at some of the same structures that underpin the Thousand Brains Theory.
Key similarities to note:
- Both organize into many structurally similar columns and layers
- Every column associates the sensory input with a specific location
- Many columns learn representations of the same objects
- Every column learns to represent entire objects
- Lateral connectivity is used for localized sharing of object representations between neighboring columns
- Every column uses bottom-up, top-down, and lateral sources to vote for the representation of the object perceived
What are the differences between the GLOM model and the Thousand Brains Theory?
A key concept of the Thousand Brains Theory is that the neocortex learns through movement. When we move, our neocortex learns thousands of models of the world around us. GLOM is primarily concerned with processing a “single fixation of a time-varying image.” It treats vision as a sequence of frames, so a static image is treated as a sequence of identical “fixated” frames. But vision is not stationary; it is an interactive process. We do not look at the world with a fixed gaze. Instead, our eyes make quick saccadic movements about three times a second, and with every saccade the visual input to the brain changes. When we move our bodies, our visual inputs change as well. More generally, Hinton’s paper doesn’t address the sensorimotor problem, a core component of the Thousand Brains Theory. Hinton will likely need to incorporate movement in order to successfully train GLOM to recognize objects from novel viewing angles and to explain how the brain processes vision.
The Thousand Brains Theory has a very nontraditional notion of hierarchy, while GLOM assumes the traditional hierarchy of feature detectors within each column. In GLOM, a single column spans the entire hierarchy, as each level in the column represents a level of abstraction (i.e. a level of the hierarchy). To learn a model of an object, information is passed both bottom-up and top-down within the same column. In contrast, each individual column in the Thousand Brains Theory, as in the neocortex, is situated at one level of the hierarchy. Cortical columns work together across the hierarchy, across different regions in the brain, and across sensory modalities.
Mapping to the Neocortex
The biggest difference between the GLOM model and the Thousand Brains Theory is that GLOM is inspired by the brain but does not map to specific cortical circuitry or experimental neuroscience. Hinton frames this as a computer vision problem as he uses cognitive science to tackle the question “How can we make computers intelligent?” In the paper, he notes that GLOM is biologically inspired but “it has several features that appear to make it very implausible as a biological model.” Hinton offers some mathematical descriptions for parts of GLOM, but not a biological circuit.
At Numenta, we frame our research as an intelligence problem and address the question “Why are we intelligent?” Our goal is to understand the function and operation of the brain and apply those core principles to today’s machine learning systems. Unlike GLOM, the Thousand Brains Theory is not just biologically plausible, it is biologically constrained. It models many of the anatomical and physiological details of the common repeating circuit in the neocortex that underlies all intelligence.
It will be interesting to see how GLOM develops over the coming years. It is exciting to see researchers going beyond the simple structures that underlie most deep learning research and thinking outside the box. GLOM offers an innovative way to process and represent visual information in neural networks. Despite the similarities at first glance, our approach with the Thousand Brains Theory is, in the end, quite different. The brain is currently the only truly intelligent computing machine, with an unparalleled ability to learn and adapt. To create truly intelligent systems, we should first understand how the brain works.