Brains@Bay – Sensorimotor Learning in AI

In this Brains@Bay meetup, we focused on how sensorimotor learning (i.e., learning through interaction with the environment, with a closed loop between action and perception) can lead to more flexible and robust machine learning systems.

Speaker Lineup:
➤ Richard Sutton, DeepMind and University of Alberta
➤ Clément Moulin-Frier, Flowers Laboratory
➤ Viviane Clay, Numenta and University of Osnabrück

The talks were followed by a discussion panel and Q&A.

Meetup link:

Brains@Bay Meetups focus on how neuroscience can inspire us to create improved artificial intelligence and machine learning algorithms. Find more details here.


Richard Sutton, The Increasing Role of Sensorimotor Experience in Artificial Intelligence

Abstract: We receive information about the world through our sensors and influence the world through our effectors. Such low-level data has gradually come to play a greater role in AI during its 70-year history. I see this as occurring in four steps, two of which are mostly past and two of which are in progress or yet to come. The first step was to view AI as the design of agents which interact with the world and thereby have sensorimotor experience; this viewpoint became prominent in the 1980s and 1990s. The second step was to view the goal of intelligence in terms of experience, as in the reward signal of optimal control and reinforcement learning. The reward formulation of goals is now widely used but rarely loved. Many would prefer to express goals in non-experiential terms, such as reaching a destination or benefiting humanity, but settle for reward because, as an experiential signal, reward is directly available to the agent without human assistance or interpretation. This is the pattern that we see in all four steps. Initially a non-experiential approach seems more intuitive, is preferred and tried, but ultimately proves a limitation on scaling; the experiential approach is more suited to learning and scaling with computational resources. The third step in the increasing role of experience in AI concerns the agent’s representation of the world’s state. Classically, the state of the world is represented in objective terms external to the agent, such as “the grass is wet” and “the car is ten meters in front of me”, or with probability distributions over world states such as in POMDPs and other Bayesian approaches. Alternatively, the state of the world can be represented experientially in terms of summaries of past experience (e.g., the last four Atari video frames input to DQN) or predictions of future experience (e.g., successor representations). The fourth step is potentially the biggest: world knowledge. 
Classically, world knowledge has always been expressed in terms far from experience, and this has limited its ability to be learned and maintained. Today we are seeing more calls for knowledge to be predictive and grounded in experience. After reviewing the history and prospects of the four steps, I propose a minimal architecture for an intelligent agent that is entirely grounded in experience.
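The abstract names two kinds of experiential state:

- a summary of past experience, such as the last four Atari frames input to DQN;
- a prediction of future experience, such as a successor representation.

Below is a minimal sketch of each. It assumes discrete observations and a small tabular state space. The class `FrameStack`, the function `td_update_sr`, and the toy three-state cycle are hypothetical illustrations, not code from the talk or from DQN. Padding the stack at reset by repeating the first observation is one common convention, assumed here.

```python
from collections import deque


class FrameStack:
    """Experiential state as a summary of the past: the k most recent observations."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest frame drops out automatically

    def reset(self, first_obs):
        # Assumed convention: fill the buffer with copies of the first observation.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return self.state()

    def step(self, obs):
        self.frames.append(obs)
        return self.state()

    def state(self):
        return tuple(self.frames)


def td_update_sr(M, s, s_next, alpha=0.5, gamma=0.9):
    """Experiential state as a prediction of the future: one TD(0) update of a
    tabular successor representation, where M[s][j] estimates the expected
    discounted number of future visits to state j when starting from state s."""
    for j in range(len(M)):
        indicator = 1.0 if j == s else 0.0
        target = indicator + gamma * M[s_next][j]
        M[s][j] += alpha * (target - M[s][j])


# Usage: a deterministic cycle 0 -> 1 -> 2 -> 0 (hypothetical toy environment).
M = [[0.0] * 3 for _ in range(3)]
for _ in range(1000):
    for s, s_next in [(0, 1), (1, 2), (2, 0)]:
        td_update_sr(M, s, s_next)
print([round(v, 3) for v in M[0]])
```

On this cycle, `M[0][0]` converges to 1/(1 - 0.9³) ≈ 3.69. That is the discounted count of returns to state 0, including the current visit. Both representations are built only from the agent's own observation stream.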

Clément Moulin-Frier, Open-ended Skill Acquisition in Humans and Machines: An Evolutionary and Developmental Perspective

Abstract: In this talk, I will propose a conceptual framework sketching a path toward open-ended skill acquisition through the coupling of environmental, morphological, sensorimotor, cognitive, developmental, social, cultural and evolutionary mechanisms. I will illustrate parts of this framework through computational experiments highlighting the key role of intrinsically motivated exploration in the generation of behavioral regularity and diversity. Firstly, I will show how some forms of language can self-organize out of generic exploration mechanisms without any functional pressure to communicate. Secondly, we will see how language — once invented — can be recruited as a cognitive tool that enables compositional imagination and bootstraps open-ended cultural innovation.

Viviane Clay, The Effect of Sensorimotor Learning on the Learned Representations in Deep Neural Networks

Abstract: Most current deep neural networks learn from a static dataset without active interaction with the world. We take a look at how learning through a closed loop between action and perception affects the representations learned in a DNN. We demonstrate that these representations differ significantly from those of DNNs trained with supervised or unsupervised learning on a static dataset without interaction. The interaction-based representations are much sparser and encode meaningful content efficiently. Even an agent that learned without any external supervision, purely through curious interaction with the world, acquires encodings of the high-dimensional visual input that enable it to recognize objects from only a handful of labeled examples. Our results highlight the capabilities that emerge from letting DNNs learn in a way more similar to biological brains, through sensorimotor interaction with the world.
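The abstract makes two claims: the representations are sparser, and objects can be recognized from a handful of labeled examples. Two simple evaluation sketches make these claims concrete. The abstract does not say which sparsity metric or few-shot protocol was used, so both choices here are assumptions:

- sparsity is measured as the fraction of activations that are (near) zero;
- few-shot recognition uses a nearest-centroid classifier over frozen encodings.

The function names, labels, and toy encodings are hypothetical.

```python
import math


def fraction_inactive(activations, eps=0.0):
    """Assumed sparsity measure: share of activation values with |value| <= eps."""
    total = sum(len(a) for a in activations)
    inactive = sum(1 for a in activations for v in a if abs(v) <= eps)
    return inactive / total


def class_centroids(encodings, labels):
    """Average the few labeled encodings available for each class."""
    sums, counts = {}, {}
    for z, y in zip(encodings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(z)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], z)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, z):
    """Assign an encoding to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], z))


# Usage with made-up encodings standing in for a hypothetical frozen encoder.
support = [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8], [2.0, 0.0, 0.0], [1.8, 0.0, 0.1]]
labels = ["cube", "cube", "sphere", "sphere"]
centroids = class_centroids(support, labels)
print(fraction_inactive(support))
print(predict(centroids, [0.1, 0.1, 0.9]))
```

Nearest-centroid is used here only because it needs no training beyond averaging. Any few-shot classifier on top of frozen encodings would illustrate the same idea.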

Follow-up Q&A with Viviane

Q: You started your presentation talking about adversarial vulnerability. Did you investigate adversarial vulnerability for the networks you looked at? To all presenters: we might encounter tricky, if not adversarial, conditions in the environment. This could mean learning through costly mistakes, like crashing a robot. How might we go about learning to navigate adversarial terrain without such costly mistakes?

A: Very good question. This is on my very long list of things I would like to look at, but I haven’t tested it yet. I did test how robust the agent is against noise and against lighting conditions outside of the training distribution (e.g., making the light red or purple), and it still seems to work quite well. The results are in the appendix of this publication. As far as I know, reinforcement learning agents are also vulnerable to certain types of adversarial attacks, which in practice is of course dangerous. I would speculate that if the AI learns better models of the world, grounded in experience rather than in statistical correlations within a static dataset, these models should be less vulnerable. At the least, perturbations that trick these systems would likely also be more visible to humans. However, I have not systematically investigated this yet.

Q: What is the purpose of the reconstruction module?

A: The reconstruction module belongs to one of the control conditions: the autoencoder. I compare a neural network trained through interaction with a supervised non-interactive condition (a classifier) and an unsupervised non-interactive condition (an autoencoder). The autoencoder’s learning objective is to take the input image, compress it down to a representation of size 256, and use this small representation to reconstruct the input image. The second part of this network is called the decoder, and I assume it is the reconstruction module you are asking about.
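The autoencoder control condition described above has two parts:

- an encoder that compresses the input into a small code;
- a decoder (the reconstruction module) that maps the code back into the input space.

The two are trained jointly to minimize reconstruction error. This is a minimal pure-Python sketch under simplifying assumptions: a linear encoder and decoder, toy 8-dimensional inputs, and a 2-dimensional code. The talk instead uses a 256-dimensional representation of images. The class `LinearAutoencoder` and the helper functions are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LinearAutoencoder:
    def __init__(self, input_dim, code_dim, seed=0):
        rng = random.Random(seed)
        # Encoder: code_dim x input_dim; decoder: input_dim x code_dim.
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)] for _ in range(code_dim)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)] for _ in range(input_dim)]

    def encode(self, x):
        return matvec(self.enc, x)

    def decode(self, z):
        # The "reconstruction module": maps the small code back to input space.
        return matvec(self.dec, z)

    def loss(self, x):
        """Mean squared reconstruction error for one input."""
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

    def sgd_step(self, x, lr):
        """One gradient-descent step on loss(x) for both encoder and decoder."""
        z = self.encode(x)
        x_hat = self.decode(z)
        g_out = [2 * (a - b) / len(x) for a, b in zip(x_hat, x)]  # dL/dx_hat
        g_z = [sum(g_out[i] * self.dec[i][j] for i in range(len(x)))  # dL/dz
               for j in range(len(z))]
        for i in range(len(x)):
            for j in range(len(z)):
                self.dec[i][j] -= lr * g_out[i] * z[j]
        for j in range(len(z)):
            for k in range(len(x)):
                self.enc[j][k] -= lr * g_z[j] * x[k]


def make_data(n, seed=1):
    """Toy inputs lying in a 2-D subspace of R^8, so a 2-D code suffices."""
    rng = random.Random(seed)
    u = [1, 1, 1, 1, 0, 0, 0, 0]
    v = [0, 0, 0, 0, 1, -1, 1, -1]
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append([a * ui + b * vi for ui, vi in zip(u, v)])
    return data


def mean_loss(model, data):
    return sum(model.loss(x) for x in data) / len(data)


data = make_data(100)
model = LinearAutoencoder(input_dim=8, code_dim=2)
before = mean_loss(model, data)
for _ in range(100):
    for x in data:
        model.sgd_step(x, lr=0.05)
after = mean_loss(model, data)
print(f"reconstruction MSE: {before:.4f} -> {after:.4f}")
```

The toy data lie in a 2-D subspace, so a 2-D linear code can in principle reconstruct them exactly. With real images, the small bottleneck forces a lossy compression, and the encoder must keep whatever information best supports reconstruction.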

Q: Did you consider comparing the interaction-based representations with representations from self-supervised methods, for example Siamese networks?

A: As a self-supervised method, I currently use a simple autoencoder. It did not require sorting images into categories, and it let me encode the images with exactly the same network structure I use in the agent. However, I am currently working on comparisons with self-supervised methods that are stronger than the autoencoder.


Charmaine Lai, Viviane Clay, Subutai Ahmad and Jeff Hawkins • Brains@Bay