This webpage is still under construction. Its original aim is to provide a means for new members of the laboratory of natural intelligence, who may be unfamiliar with the topic, to get a quick overview of theoretical vision and vision models. Other researchers and students, whether from a theoretical or an experimental background, are welcome to use the material. Comments to improve this tutorial, or contributions to add to it, are very welcome; please send them to Dr. Li Zhaoping by email.
Contents:
Early Visual Sampling and Coding.
A V1 model of intra-cortical interactions for saliency and pre-attentive segmentation
Why do the retinal ganglion cells have center-surround receptive fields? Why are they single-opponent in coding color, e.g., with red-center-green-surround receptive fields? How should we understand the ocularity of the neurons in the primary visual cortex? What might have caused the orientation selectivity of the cortical receptive fields? How do we understand adaptation of the receptive fields to the visual environment, such as to the average lightness or color composition? It has been proposed by Barlow and others that the goal of early visual coding is efficient coding, such that the maximum amount of information is transmitted or represented using a minimum amount of resources, such as the number of neurons or their overall activities. For a quick review of this, read Optimal Sensory Encoding or postscript version (by L. Zhaoping, in The Handbook of Brain Theory and Neural Networks, pages 815-819, Second Edition, Michael A. Arbib, Editor, MIT Press 2002). Click here for lecture notes from a lecture on this topic.
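As a toy illustration of the center-surround idea, a difference-of-Gaussians filter (a standard caricature of a retinal ganglion cell receptive field; the sizes below are arbitrary, not fitted to data) can be balanced so that it gives zero response to uniform input, the most redundant component of natural signals:

```python
import numpy as np

def dog_filter(size=21, sigma_c=1.0, sigma_s=3.0):
    """Difference-of-Gaussians receptive field: narrow excitatory center
    minus broad inhibitory surround. Parameters are illustrative only."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    center = np.exp(-r2 / (2 * sigma_c ** 2))
    surround = np.exp(-r2 / (2 * sigma_s ** 2))
    # Balance the two lobes so that the response to uniform input
    # (the most redundant signal component) is exactly zero.
    surround *= center.sum() / surround.sum()
    return center - surround

rf = dog_filter()
uniform_response = (rf * np.ones_like(rf)).sum()
print(abs(uniform_response) < 1e-9)  # True: uniform input is filtered out
```

Such a filter passes local contrast differences while discarding the shared local mean, which is one intuitive way to see how center-surround organization reduces redundancy.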
Are the sampling arrangement of the receptors on the retina, or the cone sensitivities, determined by information-theoretic considerations? See "The distribution of visual objects on the retina: connecting eye movements and cone distributions" by Lewis, Garcia, and Zhaoping (2003), and "Are cone sensitivities determined by natural color statistics?" by Lewis A. and Zhaoping L. (2006).
It has been proposed that early visual processing, such as that in the retina and the lateral geniculate nucleus, aims to remove correlations in the visual inputs, such as correlations between two pixels or between the signals in two cones. Such correlations cause information redundancy in the visual inputs, and removing this redundancy can greatly increase coding efficiency. This proposal has led to an understanding of the retinal receptive fields; see the following papers.
For more specific answers on specific questions:
On retinal coding and related:
- Atick, J. J. (1992). Could information theory provide an ecological theory of sensory processing? Network, Computation in Neural System 3:213-251.
On retinal and cortical color coding:
- Atick, J.J., Li, Zhaoping and Redlich A. N. (1992) Understanding retinal color coding from first principles. Neural Computation 4 559-572.[abstract]
- J.J. Atick, Zhaoping Li, and A.N. Redlich (1992) What does post-adaptation color appearance reveal about cortical color representation? Vision Research Vol.33 p. 123-129 (1993) [abstract]
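The decorrelation idea underlying these papers can be sketched numerically: given the second-order covariance of the inputs, a whitening transform removes all pairwise correlations at the output. A minimal sketch, using a toy 1-D covariance (not actual natural-image statistics):

```python
import numpy as np

# Toy "natural input": 1-D signals whose nearby samples are correlated,
# with covariance C_ij = rho^|i-j| (an illustrative stand-in for
# pixel-to-pixel correlations in images).
n, rho = 8, 0.8
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Whitening transform W = C^(-1/2): a linear code that removes all
# second-order correlations, the operation the efficient-coding account
# attributes (in smoothed, noise-regularized form) to retinal processing.
evals, evecs = np.linalg.eigh(C)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

C_out = W @ C @ W.T  # covariance of the transformed signals
print(np.allclose(C_out, np.eye(n)))  # True: outputs are decorrelated
```

In the actual theory (e.g., Atick 1992), input noise means pure whitening is only optimal at high signal-to-noise ratio; the retinal filter smooths as well as whitens.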
Alternatively, it is viewed that, by removing the input correlations, the retina achieves maximum information extraction given the limited information capacity of the optic nerve. This view is termed Infomax (Linsker 1989). Although this approach explains the receptive field properties of the majority of retinal ganglion cells, termed X cells in cats or P cells in monkeys, it does not account for another class of retinal ganglion cells, termed Y cells (in cats) or M cells (in monkeys). The Y (M) cells are more transient in their responses and have larger receptive fields. It has been proposed that they serve to extract the minimum necessary information as fast as possible (Infofast). See:
Zhaoping Li Different retinal ganglion cells have different functional goals. International Journal of Neural Systems, Vol. 3, No. 3 (1992) 237-248. [abstract]
Yet another visual coding scheme is a multiscale, wavelet-like coding with orientation-selective receptive fields, resembling that in the primary visual cortex. It can be shown that this coding removes as much of the information redundancy arising from second-order correlations as the retinal coding by center-surround receptive fields does. See the following papers:
On multiscale coding, orientation selectivity, and double opponency color coding of cortical cells:
Zhaoping Li and J.J. Atick (1994) Towards a theory of striate cortex or postscript version Neural Computation Vol.6, p. 127-146. [abstract]
On stereo coding, ocularity, disparity selectivity in cortical cells:
- Zhaoping Li and J.J. Atick (1994) Efficient stereo coding in the multiscale representation or postscript version Network Computations in neural systems Vol.5 157-174. [abstract]
- Danmei Chen and Zhaoping Li (1997) A psychophysical experiment to test the efficient stereo coding theory or postscript version Theoretical aspects of neural computation K.M. Wong, I. King, and D.Y. Yeung (eds), p225-235, Springer-verlag January 1998
- Zhaoping Li (1995) Understanding ocular dominance development from binocular input statistics or postscript version The neurobiology of computation (Proceeding of computational neuroscience conference 1994) p. 397-402. Ed. J. Bower, Kluwer Academic Publishers, 1995. [abstract]
On motion coding, directional selectivity in cortical cells:
- Zhaoping Li (1996) A theory of the visual motion coding in the primary visual cortex or postscript version Neural Computation vol. 8, no.4, p705-30, May, 1996 [abstract]
There are also redundancies in natural images not accounted for by second-order (pairwise) correlations. In particular, the signals in three neighboring pixels of an image can be correlated in a way that cannot be accounted for by the second-order correlations between two pixels. Hence, there have been proposals that the visual coding in the primary visual cortex aims to remove the redundancy in the third- and higher-order correlations. See this website for papers on this point of view. However, one can measure the amount of additional redundancy contained in the third-order statistics of natural scenes. It has been shown that while the redundancy from second-order statistics contributes roughly 50% of the single-pixel information (measured in bits), the third-order statistics contribute only about 6%. See this paper:
On the statistics of natural images:
Petrov Y. and L. Zhaoping Local correlations, information redundancy, and the sufficient pixel depth in natural images. Journal of Optical Society of America A. Vol. 20. No. 1, p56-66 2003 [abstract]
Hence, it would be surprising if the brain devoted a huge visual area (about 12% of the neocortex of a monkey) just to improve coding efficiency by such a small fraction. The visual representation in the cortex is likely to serve other computational needs.
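To get a feel for how large second-order redundancy is, one can use the Gaussian approximation, under which the mutual information between two pixels with correlation coefficient rho is -0.5*log2(1 - rho^2) bits. This is a textbook identity, not a result from the paper above; neighboring pixels in natural images typically have rho around 0.9:

```python
import numpy as np

def pairwise_redundancy_bits(rho):
    """Mutual information (bits) shared by two jointly Gaussian pixels
    with correlation coefficient rho: I = -0.5 * log2(1 - rho^2)."""
    return -0.5 * np.log2(1.0 - rho ** 2)

# Redundancy grows steeply as the correlation approaches 1.
for rho in (0.5, 0.9, 0.99):
    print(rho, round(float(pairwise_redundancy_bits(rho)), 2))
```

At rho = 0.9 each pixel shares over a bit of information with its neighbor, which conveys why removing pairwise correlations alone already eliminates a large fraction of the redundancy.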
Saliency Map: See a lecture on the saliency map in primary visual cortex.
An opinion article on the saliency map in primary visual cortex: Zhaoping Li (2002) A saliency map in primary visual cortex. Trends in Cognitive Sciences Vol 6, No. 1, Jan. 2002, pages 9-16 [abstract]
The role of the saliency map in visual search: Zhaoping Li (1999) Contextual influences in V1 as a basis for pop out and asymmetry in visual search. Proceedings of the National Academy of Sciences, USA, Volume 96, 1999, pages 10530-10535 [abstract]
Contrasting with previous works on saliency maps (by Koch, Ullman, Itti, Wolfe, etc.):
(1) This work specifies V1 as the location of the saliency map; previous works never specified where the saliency map resides in the brain, and implicitly or explicitly assumed that it should be beyond V1.
(2) In previous works, separate feature maps are needed before the master saliency map, which somehow combines the outcomes of the separate feature maps. The hypothesis of a saliency map in V1 requires neither separate feature maps nor any subsequent combination of them.
Testable Predictions from the V1 Saliency Map theory:
(1) Even though humans typically cannot tell the eye of origin (encoded mainly in V1) of visual inputs, a unique eye of origin in visual inputs should attract attention automatically, even without awareness. Tested and confirmed in 2007; see Zhaoping 2008.
(2) Since no separate feature maps, or any combination of them, are required to make a saliency map, there should be interference by irrelevant features in visual search or segmentation tasks that rely heavily on bottom-up saliency (e.g., in reaction-time conditions). Tested and confirmed by Zhaoping and May 2007.
(3) Due to the distinguishing properties of neural receptive fields in V1, the theory predicts that the reaction time (RT) to find a color-motion double-feature singleton is equal to the smaller of the two RTs to find the corresponding single-feature (color or motion) singletons, and that the RT to find a color-orientation or orientation-motion double-feature singleton should be smaller than the smaller of the two RTs to find the corresponding single-feature singletons. Tested and confirmed in Koene and Zhaoping 2007.
(4) fMRI and ERP (C1 component) evidence of the saliency map in V1 in Zhang, Zhaoping, Zhou, and Fang 2012.
See a full list of the predictions and their experimental tests.
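Prediction (3) above rests on a race between the V1 cell groups tuned to the different features: when separate groups signal color and motion, the double-feature singleton is found as soon as the faster of the two single-feature signals arrives. A minimal simulation of such a race (the RT distributions below are hypothetical and chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical single-feature RT distributions (arbitrary parameters).
rt_color = rng.gamma(shape=4.0, scale=120.0, size=n)   # milliseconds
rt_motion = rng.gamma(shape=4.0, scale=100.0, size=n)  # milliseconds

# Race model: on each trial, the double-feature singleton is detected
# by whichever feature-specific signal wins the race.
rt_double = np.minimum(rt_color, rt_motion)

# The racing min is never slower than either single-feature RT, and on
# average it is faster than the faster of the two means.
print(rt_double.mean() < min(rt_color.mean(), rt_motion.mean()))  # True
```

For color-orientation or orientation-motion pairs, V1 also contains conjunctively tuned cells, which is why the theory predicts RTs even faster than this race baseline for those pairs.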
Relative contributions to attentional guidance from mechanisms within and beyond V1
Neural correlates of bottom-up saliency signals have been observed in higher brain areas such as LIP (the lateral intraparietal cortex) and V4. These signals may be relayed from V1 rather than created in these higher areas, which perhaps combine the bottom-up saliency signal with top-down, task-related signals for visual processing. Further investigations are needed to find out the relative contributions of V1 and brain areas beyond V1 to attentional guidance. For one such investigation, see Zhaoping et al. (2009), which examines the contributions from V1 and extrastriate cortex and suggests that the contributions from V1 dominate in the first few hundred milliseconds after stimulus onset.
A V1 model demonstrating how the intracortical interactions lead to the desired/proposed pre-attentive visual computation and saliency map.
- For readers more interested in analytical and mathematical details, and the relations of this work with other modeling and computer vision literature, see Zhaoping Li (1999) Visual segmentation by contextual influences via intracortical interactions in primary visual cortex (Postscript) Or PDF version In Network: Computation in Neural Systems Volume 10, Number 2, May 1999. Page 187-212 [abstract] Correction: In Oct. 2000, Dr. Yury Petrov helped me to discover a typo on page 209 (page 23 of the manuscript), 9th line from the bottom of the page. "d >= 10" printed in the published version should be "d/cos( beta/4) >= 10" instead. This is an error in a model parameter, and this error should lead to quantitative changes in model performance compared to those presented in this paper. Also, Delta (theta) in the expression for W means |Delta (theta)|.
- For readers more interested in relations of this work with physiological/psychophysical data and experiments, (and less mathematical analysis), see Zhaoping Li (2000) Pre-attentive segmentation in the primary visual cortex (postscript) Or PDF version Published in Spatial Vision , Volume 13, Number 1, p. 25-50, (2000). [abstract]
- To design and study the nonlinear dynamics in the model: Zhaoping Li (2001) Computational design and nonlinear dynamics of a recurrent network model of the primary visual cortex or PDF version Neural Computation 13/8, p.1749-1780, 2001 [abstract]
On specific manifestations or phenomena of the pre-attentive computation in V1
- Contour enhancement: Zhaoping Li (1998) A neural model of contour integration in the primary visual cortex (postscript) Or PDF version Neural Computation. 10. 903-940, 1998 [abstract]
- "Figure-ground" phenomena in V1: Zhaoping Li (1999) Can V1 mechanisms account for figure-ground and medial axis effects? or PDF version ([abstract]), or see V1 mechanisms and some figure-ground and border effects. In Journal of Physiology (Abstract) (Zhaoping 2003).
- Contextual surround properties of V1 receptive fields: Zhaoping Li and John Hertz (2000) Multiple zones of contextual surround for V1 receptive fields (postscript) Or PDF version Annual Meetings of Society for Neuroscience, 2000. Abstract # 211.10 [abstract]
- On conjunction searches and double feature searches in psychophysics: Li Zhaoping (2002) Understand conjunction and double feature searches by a saliency map in primary visual cortex. Second annual meeting of the Vision Sciences Society, May 10-15th, 2002, Sarasota, Florida, USA. [abstract]
- On interaction between feature dimensions (such as color and orientation) for visual search and segmentation: Li Zhaoping (2002) A saliency map model explains the effects of random variations along irrelevant dimensions in texture segregation and visual search. Annual Meeting of Society for Neuroscience, Orlando Florida, USA, Nov. 2-7, 2002.[abstract]
Most models of early stereo vision focus on stereo correspondence, i.e., matching the monocular images in the left and right eyes. However, such models omit the problem of stereo segmentation, such as detecting and highlighting depth boundaries or enabling a lone depth target to pop out of a background composed of objects at a different depth. Mechanistically, the neural interactions required for stereo correspondence seem to be the opposite of those required for stereo segmentation (to highlight a depth edge or depth pop-out). Here I introduce the first model that addresses both correspondence and segmentation. See Li Zhaoping (2002) Pre-attentive segmentation and correspondence in stereo, Philosophical Transactions of The Royal Society, Biological Sciences, Vol. 357, Number 1428, pages 1877-1883 [abstract]
V2 neuron's tuning to surface border ownership via intra-cortical interactions
V2 neurons tuned to oriented contours or surface borders have been observed to respond differently depending on which of the two surfaces on either side of the border is perceived to own the border. This is called border ownership, and it is important for surface perception since the border is usually perceived to be owned by the occluding surface. Which mechanisms are responsible for generating the tuning to border ownership (BOWN)? Is it top-down feedback from higher visual areas? This paper: Zhaoping L. (2005) Border Ownership from Intracortical Interactions in Visual Area V2, in NEURON, Vol. 47, 143-153, suggests that BOWN tuning could arise from intra-cortical interactions.
The work in this lab on visual attention is still at an early stage, and can be seen by clicking here.