Information on CVPR 2022 tutorial: A post-Marrian computational overview of how biological (human) vision works

This website is being updated; more videos and materials will be added as they become available.

Marrian: vision in 3 stages: primal sketch, 2.5D sketch, and 3D model (a framework by David Marr, as in his 1982 book Vision)

Post-Marrian: vision in 3 stages: encoding, selection, and decoding (a selection-centered framework for vision)

Why: e.g., would you like to know why human vision is not as easily fooled as CNN vision? Why is the attentional bottleneck a key to understanding human vision?

Where: online and on site at CVPR 2022, a premier annual computer vision event, in New Orleans, USA.

When: full day on June 19, 2022 (exact schedule for each lecture to be announced)

Lecturer: Li Zhaoping at Natural Intelligence Labs, Max Planck Institute for Biological Cybernetics, email: li.zhaoping AT tuebingen.mpg.de

Community before the tutorial day: If you are thinking of participating, we encourage you to email the lecturer with your name, email address, and a brief statement about yourself (e.g., your interest in this tutorial and your research interests). You can then be included in a pre-tutorial communication group (via Slack and/or occasional emails) for tutorial preparation, e.g., to get support for your learning, answers to your questions, and ideas to exchange and discuss.

Target audience: Anybody (students, postdocs, and faculty members) from the CVPR community. This tutorial is designed to allow people coming from the physical sciences (physics, engineering, computer science) to understand biological vision, without prior knowledge of technical jargon from neuroscience.

Tutorial format: There will be 5 lectures. The later lectures, particularly the last one, will include discussions with machine vision participants on the interaction between biological and machine vision research. You will digest the content better, and be able to participate more in the discussions, if you preview at least some of the video lectures recommended in the content below. Each tutorial lecture can be seen as a (quite compressed) summary or set of highlights of the video lectures (sometimes with other preparation materials), plus discussions. It is nevertheless feasible to get an overview of the main issues by attending the lectures without previewing any videos. Some parts of the tutorial content could be adjusted (to a limited extent) based on the interests of the audience (to give your input, please email the lecturer).

Part of the tutorial material will be from "Understanding Vision: theory, models, and data" (Oxford University Press, 2014), a comprehensive and contemporary textbook on computational vision, and another part will be from recent literature. The last part of the tutorial will include a discussion on the relationship between biological vision and machine vision.

Tutorial content:

(1) Computational overview of, and a very brief summary of neural and perceptual facts about, biological vision

Introduction of a post-Marrian framework of vision as composed of three stages: encoding, selection, and decoding. This framework gives center stage to attentional selection, i.e., the choice of only a tiny fraction of the massive input data for deeper processing by the brain. This defines vision in a fundamentally different way from that in most computational vision and machine vision approaches. This framework links data or facts from psychology (on visual perception/illusions) with those from neuroscience (neurons and circuits). A summary of these facts will be given. Much of this lecture will draw from materials in chapters 1 and 2 of "Understanding Vision: theory, models, and data".

Get a quick glimpse from this 3.5-minute video, the figures of chapter 1, and the figures of chapter 2.

A quick introduction to what is known experimentally about biological vision is in this growing playlist of shorter video clips (about 10 minutes each).

(2) Understanding early visual processing mechanisms by the principle of efficient encoding.

This lecture concerns the brain mechanisms for encoding, the first of the three stages of vision, before the attentional bottleneck and visual inference (decoding). These mechanisms are manifested in neural receptive fields (like linear or nonlinear filters in engineering) in the retina and the primary visual cortex (V1) in the brain. It is based on chapter 3 (see its figures) of "Understanding Vision: theory, models, and data". Efficient coding will be framed as an optimization between information (bits) extracted and neural coding cost (e.g., dynamic range of neurons). The efficient coding principle provides testable predictions as to how neural mechanisms can be shaped by statistical properties of the visual environment, and how visual adaptation changes our perception in psychology experiments. Preview materials include:

this video lecture (51 minutes) for a very brief overview: Introduction to Efficient Coding.

In more detail: a playlist of short video lecture clips (about 10 minutes each).

Also, Problems with understanding V1 by efficient coding principles (15:20 minutes), based on chapter 4 of "Understanding Vision: theory, models, and data".
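
As a toy illustration of the trade-off between information extracted and coding cost described for lecture (2), the sketch below scores a single Gaussian channel by its output power (a stand-in for neural cost) minus a weighted information term, and finds the best gain by grid search. The form of the objective, the function names, and all parameter values are assumptions made for this illustration; they are not taken from the tutorial materials.

```python
import math

def information_bits(gain, signal_var, noise_in, noise_out):
    """Bits conveyed about a Gaussian signal when
    output = gain * (signal + input noise) + output noise."""
    total = gain**2 * (signal_var + noise_in) + noise_out
    noise = gain**2 * noise_in + noise_out
    return 0.5 * math.log2(total / noise)

def objective(gain, signal_var, noise_in, noise_out, lam):
    """Assumed quantity to minimise: output power minus lam * information."""
    cost = gain**2 * (signal_var + noise_in)
    return cost - lam * information_bits(gain, signal_var, noise_in, noise_out)

def best_gain(signal_var, noise_in=0.1, noise_out=1.0, lam=4.0,
              steps=400, max_gain=4.0):
    """Grid search for the gain that minimises the assumed objective."""
    gains = [max_gain * k / steps for k in range(steps + 1)]
    return min(gains, key=lambda g: objective(g, signal_var, noise_in, noise_out, lam))

def decorrelate_pair(variance, r):
    """Variances of the normalised sum and difference of two inputs that
    have equal variance and correlation coefficient r."""
    return variance * (1 + r), variance * (1 - r)

if __name__ == "__main__":
    # Two correlated inputs (hypothetical numbers) become two uncorrelated
    # channels, each of which then receives its own gain.
    for channel_var in decorrelate_pair(1.0, 0.8):
        print(channel_var, best_gain(channel_var))
```

Under these assumptions, a channel that carries no signal is assigned zero gain, since any gain would add cost without adding information. How the gain should vary with signal strength depends on the assumed noise levels and on the weight lam.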

(3) The attentional bottleneck and how it fundamentally shapes visual algorithms, behaviour, and perception

Limited brain resources for information processing force the brain to select only a tiny fraction of visual input data for deeper processing, for instance to recognize objects. Selection is the second of the three stages for vision. To many, it is counter-intuitive that this selection logically precedes the stage for visual recognition, but this order fundamentally shapes the framework for vision. This lecture briefly presents some known neural mechanisms (e.g., the neural circuit in V1) and the corresponding visual behaviour (e.g., eye movements, attentional blindness, visual awareness), and makes apparent what is yet to be understood. Preview materials include:

an hour-long lecture, Introduction to visual attention and visual salience,

chapter 5 of "Understanding Vision: theory, models, and data".

In more detail, here is a growing playlist of shorter video clips (about 10 minutes each).
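
To make the idea of selecting a tiny fraction of the input concrete, here is a minimal toy sketch for lecture (3): each bar in a one-dimensional texture gets a response that is reduced when its neighbours share its orientation, and the location with the highest response is selected. The suppression rule, the function names, and the parameter values are hypothetical simplifications chosen for illustration; this is not the neural circuit model presented in the lecture.

```python
def v1_like_responses(orientations, radius=2, suppression=0.6):
    """Toy responses: each item is suppressed in proportion to the
    fraction of nearby items that share its orientation."""
    n = len(orientations)
    responses = []
    for i, ori in enumerate(orientations):
        neighbours = [orientations[j]
                      for j in range(max(0, i - radius), min(n, i + radius + 1))
                      if j != i]
        same = sum(1 for o in neighbours if o == ori)
        fraction_same = same / len(neighbours) if neighbours else 0.0
        responses.append(1.0 - suppression * fraction_same)
    return responses

def select_location(responses):
    """Winner-take-all: index of the largest response (first one on ties)."""
    return max(range(len(responses)), key=responses.__getitem__)

if __name__ == "__main__":
    # A texture of bars at 0 degrees with one bar at 90 degrees (index 7).
    bars = [0] * 7 + [90] + [0] * 7
    print(select_location(v1_like_responses(bars)))
```

In this toy, the uniquely oriented bar escapes suppression and wins the selection, which loosely mirrors how a feature singleton attracts attention in a uniform background.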

(4) Visual decoding (recognition) and visual understanding by feedforward and feedback processes along the visual hierarchy

This lecture concerns the third of the three stages in vision, i.e., after the attentional bottleneck. The lecture will start from examples of inference of visual object features from neural signals in the retina and the cortex using, e.g., maximum-likelihood and Bayesian inference. These will be linked with neural and behavioural data, and with ideal observer approaches in vision science. Next, the attentional bottleneck motivates visual decoding by the feedforward and feedback processes along the visual hierarchy. This will provide insights into visual illusions, the difference between central and peripheral vision (which is more than mere visual acuity), eye movements, and visual understanding. This visual decoding process is similar to, and different from, visual recognition by deep neural networks (DNNs). Current-day DNNs, which are vulnerable to adversarial attacks, imitate human peripheral vision. Preview materials include:

Introduction to visual decoding (part 1), a 25-minute lecture on the definitions and neural/behavioural observations of visual decoding,

Introduction to visual decoding (part 2), a 51-minute lecture on algorithms and examples of visual decoding.

A more detailed playlist of short videos will be posted here if made available in time; otherwise, the two videos above, combined with the paper below, are adequate.

A recent (2019) paper on the new framework for understanding vision; this paper contains some essential content not in the book.

Figures of chapter 6 of "Understanding Vision: theory, models, and data".
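
The maximum-likelihood and Bayesian inference mentioned for lecture (4) can be sketched with a toy population code: a stimulus feature (here an orientation in degrees) is decoded from the spike counts of a few neurons with bell-shaped tuning curves and Poisson variability. The tuning-curve shape, the prior, the function names, and all numbers are assumptions for illustration only, not the examples used in the lecture.

```python
import math

def tuning(pref, s, peak=20.0, width=15.0, baseline=1.0):
    """Assumed mean spike count of a neuron preferring `pref` for stimulus s."""
    return baseline + peak * math.exp(-0.5 * ((s - pref) / width) ** 2)

def log_likelihood(counts, prefs, s):
    """Poisson log likelihood of the counts given s, up to a constant
    that does not depend on s."""
    return sum(r * math.log(tuning(p, s)) - tuning(p, s)
               for r, p in zip(counts, prefs))

def decode(counts, prefs, candidates, log_prior=None):
    """Maximum-likelihood estimate, or the maximum a posteriori (Bayesian)
    estimate when a log prior over the stimulus is supplied."""
    def score(s):
        prior_term = log_prior(s) if log_prior is not None else 0.0
        return log_likelihood(counts, prefs, s) + prior_term
    return max(candidates, key=score)

if __name__ == "__main__":
    prefs = list(range(0, 91, 10))       # preferred orientations
    candidates = list(range(0, 91))      # hypotheses to compare
    # Noise-free mean responses to a 40-degree stimulus; real counts would
    # be integers with Poisson noise.
    counts = [tuning(p, 40) for p in prefs]
    ml_estimate = decode(counts, prefs, candidates)
    # Hypothetical prior favouring orientations near 0 degrees.
    map_estimate = decode(counts, prefs, candidates,
                          log_prior=lambda s: -0.5 * (s / 5.0) ** 2)
    print(ml_estimate, map_estimate)
```

With noise-free responses, the maximum-likelihood estimate recovers the stimulus, while the Bayesian estimate is pulled from it towards the values favoured by the prior.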

(5) Relationship between vision by machines and vision by brains.

This lecture will dive deeper into the similarities and differences between the two visual systems, and how the two research communities could draw inspiration from each other.

Preparation materials include: a short article giving my view; more will be added when available.

Related materials:

Video lectures from the book and related material.

Some video seminars and lectures by Li Zhaoping.