Attentional tracking with deep learning

Learning where to attend with deep architectures for image tracking
M. Denil, L. Bazzani, H. Larochelle, and N. de Freitas
Neural Computation, 2012
RBM tracker code / video / bibtex
@article{denil2012learning,
  title = {Learning where to Attend with Deep Architectures
           for Image Tracking},
  author = {Denil, M. and Bazzani, L. and Larochelle, H.
            and {de Freitas}, N.},
  journal = {Neural Computation},
  year = {2012}
}

Learning attentional policies for object tracking and recognition in video with deep networks
L. Bazzani, N. de Freitas, H. Larochelle, V. Murino, J-A Ting
The 28th International Conference on Machine Learning (ICML), 2011
RBMtrack code / presentation / recorded talk / video / bibtex
@inproceedings{bazzani2011learning,
  title = {Learning attentional policies for object tracking and
           recognition in video with deep networks},
  author = {Bazzani, L. and de Freitas, N. and Larochelle, H. and
            Murino, V. and Ting, J-A},
  booktitle = {Proceedings of the 28th International Conference on
           Machine Learning (ICML-11)},
  year = {2011},
  address = {New York, NY, USA},
  editor = {Lise Getoor and Tobias Scheffer},
  month = {June},
  pages = {937--944},
  publisher = {ACM},
  series = {ICML '11}
}
Please cite one of these papers if you use the code and dataset below.


In these papers, we propose a novel attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of the human perceptual system, the model consists of two interacting pathways: ventral and dorsal. The ventral pathway models object appearance and classification using deep (factored) restricted Boltzmann machines. At each point in time, the observations consist of retinal images, with resolution decaying toward the periphery of the gaze. The dorsal pathway models the location, orientation, scale, and speed of the attended object. The posterior distribution over these states is estimated with particle filtering. Deeper in the dorsal pathway, an attentional mechanism learns to control gazes so as to minimize tracking uncertainty. The approach is modular (each module can easily be replaced with a more sophisticated algorithm), straightforward to implement, computationally efficient, and works well on simple video sequences.
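To make the dorsal pathway's state estimation concrete, the following is a minimal, self-contained sketch of one particle-filter step over a 2-D object location. All names and parameters here are illustrative assumptions: the actual tracker also models orientation, scale, and speed, and replaces the toy Gaussian likelihood below with the appearance likelihood supplied by the ventral pathway's RBM.

```python
import numpy as np

def particle_filter_step(particles, weights, observation,
                         motion_std=1.0, obs_std=1.0, rng=None):
    """One predict/update/resample step (hypothetical, simplified model).

    particles   : (n, 2) array of candidate (x, y) locations
    weights     : (n,) normalized particle weights
    observation : (2,) noisy measurement of the object location
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(particles)
    # Predict: random-walk motion model over location.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: reweight by a toy Gaussian observation likelihood.
    d2 = np.sum((particles - observation) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / obs_std ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# The posterior mean serves as the tracked-location estimate.
n = 500
particles = np.zeros((n, 2))
weights = np.full(n, 1.0 / n)
for obs in ([1.0, 1.0], [2.0, 2.0], [3.0, 3.0]):
    particles, weights = particle_filter_step(particles, weights,
                                              np.asarray(obs))
estimate = weights @ particles
```

In the papers, the gaze controller sits on top of this filter: it picks the next fixation so as to reduce the posterior uncertainty that the particle set represents.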

Download Attentional Tracker

(link to github)

Download Synthetic dataset

See the instructions in the file.

Some cool videos
