Visual perception, learning, and computation
The question that interests us most is: how does our visually perceived world
differ from the physical world? Obviously our visual representation of the world
is not a replica, but reflects our unique evolutionary and ecological needs. We
selectively amplify certain details and ignore others, and increase our
sensitivity to those deemed important via practice (perceptual learning). We
organize these important visual details into categories (e.g., objects) and
encode them into memory in specific ways so that we can recognize objects
effortlessly (object recognition). These organized categories, in turn, impose
on our senses so that we perceive the world in a regular, coherent, and stable
manner (perceptual organization). Indeed, the nature of our memory
representations is one of the most important questions in brain science, and it is
this question that has been our main research interest.
Our approach to studying visual perception integrates human perceptual
experiments, fMRI, and information-theoretic analysis. The theoretical analysis
is at the competence level and is based primarily on mathematical statistics.
That is, it asks: given the visual stimulus, what is theoretically the best that
any visual system, human or computer, can do? Formally, this is called the
"ideal observer" approach.
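To make the ideal-observer idea concrete, here is a minimal sketch for a toy task that is not one of our actual experiments. It assumes two known stimulus templates, equal prior probabilities, and independent Gaussian noise added to every stimulus element. Under those assumptions, the best possible strategy is to choose the template closest to the noisy stimulus, and the resulting accuracy can be computed in closed form. All names, templates, and parameter values below are hypothetical.

```python
import math
import random


def ideal_observer_choice(stimulus, templates):
    """Pick the template with maximum likelihood.

    With equal priors and i.i.d. Gaussian noise, maximum likelihood is
    equivalent to minimum squared distance between stimulus and template.
    """
    def squared_distance(template):
        return sum((s - t) ** 2 for s, t in zip(stimulus, template))

    return min(range(len(templates)), key=lambda i: squared_distance(templates[i]))


def predicted_accuracy(template_a, template_b, sigma):
    """Closed-form accuracy of the ideal observer: Phi(d'/2)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(template_a, template_b)))
    d_prime = distance / sigma
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), evaluated at x = d'/2.
    return 0.5 * (1.0 + math.erf(d_prime / (2.0 * math.sqrt(2.0))))


def simulate_accuracy(template_a, template_b, sigma, n_trials=5000, seed=0):
    """Monte Carlo estimate of the ideal observer's proportion correct."""
    rng = random.Random(seed)
    templates = [template_a, template_b]
    n_correct = 0
    for _ in range(n_trials):
        true_index = rng.randrange(2)
        stimulus = [x + rng.gauss(0.0, sigma) for x in templates[true_index]]
        if ideal_observer_choice(stimulus, templates) == true_index:
            n_correct += 1
    return n_correct / n_trials


# Hypothetical 16-element templates and noise level, chosen only for illustration.
template_a = [0.0] * 16
template_b = [0.5] * 16
sigma = 1.0

print("predicted:", round(predicted_accuracy(template_a, template_b, sigma), 4))
print("simulated:", round(simulate_accuracy(template_a, template_b, sigma), 4))
```

Human performance on the same stimuli can then be compared against this upper bound. A human observer who exceeds the bound computed for a restricted class of models must be using information that the class does not have.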
(Simplified) example projects
Visual perceptual learning and automaticity
One of the most celebrated psychological theories is the automaticity theory of Shiffrin and Schneider (1977). When targets are consistently targets and distractors are consistently distractors, searching for a target among distractors becomes increasingly easy and requires less attention. A common example of automaticity is learning to drive. It has also been commonly assumed that automaticity applies to visual perceptual learning, i.e., learning to tell apart subtle differences in simple stimuli. After all, "targets" and "distractors" (or signal and noise) are always consistent in visual perceptual learning.
We have evidence, however, that visual perceptual learning never leads to automaticity. It is true that this learning makes one see better and more clearly, but the task hardly requires less attention afterward, or at least not much less. While the issue is controversial, it is extremely important to understand what distinguishes visual perceptual learning from higher-level visual learning.
In practice, without understanding this problem, it is nearly impossible to study visual perceptual learning with fMRI, since attention is always an important issue.
Visual perceptual learning without MT -- behavioral and fMRI
We created a motion stimulus that functionally suppresses the middle temporal visual area (MT/V5). We found that, without MT, human subjects could still learn motion direction discrimination, but only when the signal-to-noise ratio of the stimulus was sufficiently high. Otherwise, subjects never improved, even though their performance was well above chance and feedback was provided (pdf).
We are currently using fMRI to investigate whether, when learning takes place, MT overcomes the special stimulus and becomes active again, or whether MT remains inactive while learning is taken over by another area, e.g., V1.
We are also investigating whether perceptual learning of motion discrimination is possible without V1, using another special stimulus.
Explaining a stereokinetic illusion
In 1924, the Italian scientist Musatti discovered that an ellipse rotating in the image plane gives rise to a vivid perception of a wobbling circular disk (like a satellite dish) (demo). In principle, there are many possible objects (e.g., many ellipses of different aspect ratios) and many possible motions that can give rise to a movie of a rotating ellipse, but the visual system chooses only one object and one motion. We found a precise mathematical solution that minimizes the total motion energy. We further verified experimentally that this solution was indeed extremely close to what the human visual system "chooses" to perceive. Intuitively, this solution corresponds to a bird's eye view of a coin rolling circularly on a tabletop without sliding.
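The ambiguity mentioned above can be illustrated with a small geometric sketch. It is not our minimum-motion-energy solution. It only shows, under orthographic projection, that planar ellipses with different 3-D aspect ratios produce the same image ellipse when each is slanted by the right amount about its major axis. Restricting the slant axis to the major axis is a simplifying assumption, and all function names and numerical values are hypothetical.

```python
import math


def project_slanted_ellipse(semi_major, semi_minor_3d, slant, image_rotation, n_points=8):
    """Orthographic image of a planar ellipse slanted about its major axis.

    The ellipse is first slanted in depth by `slant` (radians) about its
    major axis, then rotated by `image_rotation` (radians) in the image plane.
    """
    cos_r, sin_r = math.cos(image_rotation), math.sin(image_rotation)
    points = []
    for k in range(n_points):
        phi = 2.0 * math.pi * k / n_points
        x = semi_major * math.cos(phi)
        y = semi_minor_3d * math.sin(phi)
        # Slanting sends y to (y*cos(slant), y*sin(slant)); the depth
        # component is discarded by orthographic projection.
        y_image = y * math.cos(slant)
        points.append((cos_r * x - sin_r * y_image, sin_r * x + cos_r * y_image))
    return points


def slant_for_candidate(image_semi_minor, semi_minor_3d):
    """Slant that makes a 3-D minor semi-axis project to the image minor semi-axis."""
    return math.acos(image_semi_minor / semi_minor_3d)


# Hypothetical image ellipse with semi-axes 2 and 1, rotated by 30 degrees.
image_major, image_minor = 2.0, 1.0
rotation = math.radians(30.0)

# Candidate 3-D objects: a flat ellipse, a rounder ellipse, and a circular disk.
for semi_minor_3d in (1.0, 1.5, 2.0):
    slant = slant_for_candidate(image_minor, semi_minor_3d)
    first_point = project_slanted_ellipse(image_major, semi_minor_3d, slant, rotation)[1]
    print(semi_minor_3d, round(math.degrees(slant), 1), first_point)
```

Every candidate yields the same image points, so the image alone cannot determine which object is present. The visual system needs an additional constraint to settle on one interpretation.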
Top-down influences in visual discrimination
We presented, in stereo, a static frame of a point-light human figure that an inexperienced subject could not recognize. We manipulated the 3-D lengths of the forearms (whose dots were colored red and green, respectively, while the remaining dots were all blue), asked the subject whether the red or the green 3-D distance was longer, and measured discrimination ability. We then showed the subject an animation of the walking human figure so that the figure could be recognized (demo). When we measured the subject's discrimination again, we found that it had gotten worse. This demonstrates that, after realizing that the red and green lengths were those of the forearms of a human, the subject expected the forearms to be equal in length and therefore had a harder time telling which forearm was longer.
Understanding internal memory representations in object recognition
It is believed by many, in human vision and computer vision alike, that the internal shape representation is 2-D, i.e., view-based or image-based. Others believe that one can never find out whether the internal shape representation in memory is 2-D or 3-D. We have constructed a "2-D ideal observer" that achieves the best possible performance for any 2-D-based object recognition, and demonstrated that human observers outperformed this "2-D ideal observer" (pdf). Hence, we argue that the internal shape representation is unlikely to be 2-D.
Almost all studies in object recognition have found that the more similar a query image is to a studied image, the better recognition performance will be. We have recently demonstrated the contrary. Although the term "similarity" is ill-defined, it is reasonable to assume that two identical images are more similar to each other than two different images. Our results demonstrated that a query image that is different from the studied image, yet more structured, gives rise to better recognition performance than a query image that is identical to the studied image. These results contradict all current image-based or view-based theories in object recognition.