This is the project page for the paper Visual Sequence Learning in Hierarchical Prediction Networks and Primate Visual Cortex, published at NeurIPS 2019. In this paper, we developed a hierarchical network model to understand the spatiotemporal sequence learning effects observed in the primate visual cortex. The model is a hierarchical recurrent neural network that learns to predict video sequences, using the incoming video signals as teaching signals. It performs fast feedforward analysis using a deep convolutional neural network with sparse convolution, and feedback synthesis using a stack of LSTM modules. The network learns by minimizing its prediction errors of the incoming signals at each level of the feature hierarchy. We found that recurrent feedback in this network leads to the development of semantic clusters of global movement patterns in the population codes of the units at the lower levels of the hierarchy. These codes facilitate learning the relationships among movement patterns, yielding state-of-the-art performance in long-range video sequence prediction on benchmark datasets. The model automatically exhibits the neurophysiological correlates of visual sequence memories that we observed in the early visual cortex of awake monkeys, suggesting that the principle of self-supervised prediction learning might be relevant to understanding the cortical mechanisms of representational learning.
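
The learning objective described above, minimizing prediction error at every level of the feature hierarchy, can be sketched in a few lines. This is only an illustration in plain Python, not the paper's implementation: the function names, the choice of mean squared error, and the per-level weights are all assumptions made for the example.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def level_error(predicted: Vector, actual: Vector) -> float:
    """Mean squared error between one level's predicted and observed features."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def hierarchical_prediction_loss(
    predictions: List[Vector],
    targets: List[Vector],
    weights: Optional[List[float]] = None,
) -> float:
    """Weighted sum of per-level prediction errors.

    predictions[k] stands for the top-down prediction of level k's features
    for the next frame; targets[k] stands for the features actually computed
    from the incoming frame by the feedforward pass.
    """
    if len(predictions) != len(targets):
        raise ValueError("need one prediction per level")
    if weights is None:
        # Assumption: every level contributes equally to the loss.
        weights = [1.0] * len(targets)
    return sum(
        w * level_error(p, t) for w, p, t in zip(weights, predictions, targets)
    )


# Toy two-level hierarchy (made-up numbers, for illustration only).
toy_predictions = [[0.0, 1.0], [0.5, 0.5, 0.5]]
toy_targets = [[0.0, 0.0], [0.5, 0.5, 0.5]]
toy_loss = hierarchical_prediction_loss(toy_predictions, toy_targets)
```

In this toy case only the lowest level is mispredicted, so the loss comes entirely from that level. In a trained network the same quantity would be backpropagated through both the feedforward and the recurrent feedback pathways.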


Results on Moving-MNIST and KTH datasets.



Familiarity suppression

We performed a video learning neurophysiological experiment on V2 neurons in two awake behaving monkeys with Gray-Matter semi-chronic multielectrode arrays (SC32 and SC96) implanted over their V1 operculum. Six experiments were carried out. Each lasted over 7 days, with daily recording sessions. In each daily session, we presented a set of 20 movie clips, each 800 ms long, to the monkey 20-25 times a day, so that over time this set of movies became familiar (and predictable) to the monkey. This set is called the Predicted set or Familiar set. Every day, we also tested another set of 20 movie clips that changed daily. These sets are called the Unpredicted sets or Novel sets. Both sets of movies (8° in diameter) were presented daily, one clip per trial, at the same location on the computer monitor relative to the red spot the monkeys fixated on during each trial. With this experimental paradigm, we can compute and compare the daily temporal responses (the peri-stimulus time histogram, or PSTH) of all the neurons across all the movies in the Predicted set and in the Unpredicted set, to monitor the development of sensitivity to the memory of the familiar or predicted movies.
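
The PSTH comparison described above can be sketched as follows. This is a minimal illustration in plain Python, not the analysis code used in the study: the function names, the bin width, the toy spike times, and the normalized suppression index are all assumptions introduced for the example.

```python
from typing import List, Sequence


def psth(
    trials: Sequence[Sequence[float]],
    window_ms: float = 800.0,
    bin_ms: float = 10.0,
) -> List[float]:
    """Trial-averaged firing rate (spikes/s) in each time bin.

    trials holds one sequence of spike times per trial, in milliseconds
    relative to stimulus onset. Spikes outside [0, window_ms) are ignored.
    """
    if not trials:
        raise ValueError("need at least one trial")
    n_bins = int(round(window_ms / bin_ms))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if 0.0 <= t < window_ms:
                counts[min(int(t // bin_ms), n_bins - 1)] += 1
    # Convert summed counts to spikes per second, averaged over trials.
    scale = 1000.0 / (bin_ms * len(trials))
    return [c * scale for c in counts]


def mean_rate(rates: Sequence[float]) -> float:
    """Average firing rate across all bins of a PSTH."""
    return sum(rates) / len(rates)


def suppression_index(familiar: Sequence[float], novel: Sequence[float]) -> float:
    """Hypothetical index: (novel - familiar) / (novel + familiar) on mean rates.

    Positive values mean a weaker response to the familiar set.
    """
    f, n = mean_rate(familiar), mean_rate(novel)
    if f + n == 0:
        raise ValueError("both responses are zero; index is undefined")
    return (n - f) / (n + f)


# Toy spike times in ms (made up for illustration, not recorded data).
familiar_trials = [[100.0, 450.0], [120.0]]
novel_trials = [[100.0, 150.0, 450.0], [120.0, 460.0, 700.0]]

familiar_psth = psth(familiar_trials, window_ms=800.0, bin_ms=100.0)
novel_psth = psth(novel_trials, window_ms=800.0, bin_ms=100.0)
index = suppression_index(familiar_psth, novel_psth)
```

Computing such an index for each recording day would give one way to track how the response difference between the Predicted and Unpredicted sets develops over the course of an experiment.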

Related Projects

Neural Correlate of Visual Familiarity in Macaque Area V2
Ge Huang, Suchitra Ramachandran, Tai Sing Lee and Carl R. Olson
Journal of Neuroscience 17 October 2018, 38 (42) 8967-8975; DOI:

Convolutional neural network models of V1 responses to complex patterns
Yimeng Zhang, Tai Sing Lee, Ming Li, Fang Liu, and Shiming Tang
Journal of Computational Neuroscience doi: 10.1007/s10827-018-0687-7

Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
William Lotter, Gabriel Kreiman, David Cox
In ICLR 2017