Shape Contour Perception
In natural vision, an object is often partially occluded by another object. Yet, we often seem to have a fairly good idea of what the hidden part looks like and, in particular, what the boundary contours look like. This is called perceptual completion. The special case of completion behind occlusion is known as amodal completion. Naturally, then, the term modal completion refers to perceptual completion without occlusion. Modal completion occurs when part of an occluding object shares the same color as the background it occludes, such that there is no optical information about the contour boundary of the occluder. Our brain nevertheless perceives this boundary. In fact, our brain enhances the perceived difference between the occluding and the occluded, even though there is no physical difference. The celebrated Kanizsa square illusion illustrates all the points above (see Figure 1). By the way, when the occluding object (or animal) in the foreground is too well camouflaged, modal completion will fail.
Figure 1. Illustration of the Kanizsa square illusion. People often perceive a black square sitting on top of four white disks. The perceived disk counter behind the occluding square is called amodal perception, whereas the perceived boundary of the square against the black background is called modal perception. There is also an alternative percept as follows. A black square is lying on white background. A piece of black paper with four holes is occluding the square, making only the square’s four corners visible. In this alternative, the circular arc of each hole against the black square is modally perceived, whereas the occluded boundary of the square is amodally perceived. In our experiment, we use stereovision to generate one percept and, by switching the left- and right-eye images, to generate the alternative.
How does the visual brain accomplish perceptual completion? This is one of the important questions in visual perception, because unraveling this question has great significance in unraveling the rest of the shape perception problems in vision. Our work is related to two of our colleagues in the same department. Phil Kellman and colleagues have an elegant theory, termed the identity hypothesis, which postulates that modal and amodal completion share the same underlying mechanism. Accordingly, these completions should give rise to the same behavioral results when tested. Given that perceptual completion is such an important problem, there are naturally several other theories in the field. In fact, the debate is fierce.
Our first step in the hypothesis testing is to try to figure out a rigorous experimental design. In the past, people added an arc contour to close the “mouth” of each Kanizsa inducer (or pacman), which supposedly changes the completion from modal to amodal in nature. But this raises a question: If you obtain different results between modal and amodal completion, how do you know that the difference is due to different processing in the brain rather than differences in the stimuli? (If you are aware of the ideal observer approach in vision, you will know that this is actually a very serious problem.) Our solution is to use stereovision. By switching the left- and right-eye images, modal and amodal percepts switch back and forth. In this sense, the stimulus difference is really minimal (Figure 2).
Figure 2. A “thin” shape created by Ringach and Shapley (1996). When the entire image is rotated by 90 degrees, the shape becomes “fat.”
The task we use is the thin/fat discrimination invented by UCLA’s Dario Ringach and colleagues (Figure 2). By slightly and elegantly modifying the Kanizsa square, Ringach made it possible to have an objective task of discrimination. The two stimuli differ only by a 90° rotation and therefore the stimulus change is also minimal. In a series of nine experiments, we demonstrated that modal discrimination was reliably better than amodal discrimination. We further demonstrated that prior results that found no difference between modal and amodal completions were likely due to perceptual learning. For participants without extensive training, this modal-amodal difference was robust.
Does this difference make sense? We think so. Our interpretation is that estimating an occluded (or amodal) boundary contour does not need to be as accurate as an unoccluded (or modal) contour that one may more directly interact with (e.g., grasping, reaching, or avoiding). This result does not necessarily mean that the identity hypothesis is wrong. On the contrary, the identity hypothesis has much theoretical appeal. Part of this appeal is its simplicity and universality. Given that both modal and amodal completions share the same and visible contour beginnings and endings, it is natural to hypothesize that completing a contour either behind an occluder or in front of some background is the same process. Our result implies that modal and amodal completions do not have to be identical in the entire process, but they may well be identical in the core process.
More recently, we are applying the classification image technique with the thin/fat shape discrimination to figure out whether the completion process occurs before or after binocular fusion. The idea of the classification image is to add pixel luminance noise to an image, and see how the location and pixel value of the noise influence a participant’s behavioral response, e.g., in thin/fat discrimination. The details of the study are rather technical, but giving a few examples of specific hypothesis should be helpful.
If the completion takes place monocularly, before the binocular fusion, then the following consequences are expected:
(1) Adding noise to the Kanizsa shape plane or the inducer plane should make little difference.
(2) Binocular disparity should not matter. That is to say, if the two planes merge into a single plane when viewed binocularly, no difference should occur for the modal case. This is because, when the two planes merge, an additional arc contour is necessary to close the “mouth” of each inducer in order to create the amodal case. Therefore, any difference in the amodal case when comparing two-dimensional (2-D) and 3-D displays could be due to the stimulus difference. However, in the modal case, the difference between 2-D and 3-D displays is only the disparity. There is otherwise no qualitative difference, so the results should be similar if completion is monocular.
(3) The resultant classification image should show two vertical parallel lines corresponding to each of the vertical contour of the Kanizsa shape. The distance should be the same as the binocular disparity. On the other hand, if the completion takes place after binocular fusion, there should only one vertical line in the classification image corresponding to the Kanizsa vertical contour. The location of this line in the classification image should align with the vertical contour of the Kanizsa shape in the dominant-eye-image.
These are testable hypotheses, and the results will provide us with a clearer picture of the mechanism of visual perceptual completion.