The nature of internal representations is arguably the “holy grail” of cognitive psychology. We are interested in their visual perceptual aspects, focusing on how three-dimensional geometric shapes are represented in the brain. Specifically, we ask the question:
To what extent is the visual system capable of constructing an object representation from an optical stimulus that maximizes the chance that it will be successfully recognized when encountered again?
For example, after you meet someone for the first time, how does the brain memorize the face so that you can recognize the person next time? The person’s hairstyle, makeup, the viewing angle, and almost certainly the lighting conditions will be different, but these usually do not deter us from recognizing the person. Why? How does the brain remember the face? What is the nature of the remembered information?
This question of the nature of internal shape representations in visual memory has been a topic of prolonged, and at times heated, debate in visual cognition. One school of thought postulates that the representation of an object is appearance based, or view based, or image based – somewhat akin to the exemplar theory in the concept and categorization literature. This view has experimental support: an object image that is identical to a previously seen image is almost always recognized better than one that differs from it in some way.
What we have been doing can be described in two ways. First, we aim to show that sometimes an identical image is actually harder to recognize than the same object presented in a slightly different manner. Second, and more importantly, we believe that perception is a process of abstraction, not just a process of recording snapshots of objects or faces. This is because, as mentioned earlier, an object is far more likely to look different than identical the next time it is encountered. It therefore makes intuitive sense for the visual system to represent the more stable and intrinsic aspects of a shape. For example, if you see a photograph with some paint splashed on it, you would likely ignore the paint in your memory representation of the photograph.
So how do we demonstrate this intuition in the lab? The key idea is that the original image presented is impoverished. In the case of the photograph with splashes of paint, we would call the photograph “partially occluded”. The occlusion gives the visual system an opportunity to work harder to code the relevant information into memory. The first class of objects we used is faces, since the visual system has strong expectations of what an occluded patch of a face should look like based on the visible portion of the face (Kersten, 1987). During testing, either the same face image or an image of the same face with less occlusion is presented. Human subjects showed better recognition accuracy and discrimination sensitivity when the test image was different (less occluded) than when it was identical to the original image. We have since shown that the same holds for natural scenes and Chinese characters.
Interestingly, it is not the case that the less occlusion, the better. Recognition improved with reduced occlusion (imagine a shrinking paint droplet), but only to a certain degree. This actually makes a lot of sense, because the visual system can better infer (or “guess”) the occluded information when it is closer to the occlusion boundary – that is, near non-occluded information – than otherwise. Consequently, the visual system is “wise” to keep this uncertainty in check and not to represent a region deep inside the occlusion with a wrong guess. This line of reasoning can be, and has been, formulated in terms of statistical inference. That is to say, not only can we make sense of the results, we can also quantify the degree of abstraction that perception achieves. This gives us a better idea of how precisely the shape of an object is represented by the visual system and, to a certain extent, why.
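The intuition that guesses degrade with depth into the occluded region can be sketched in a toy model. The code below is purely illustrative (not our actual model): it fills an occluded span of a one-dimensional image row by interpolating from the visible boundary values, and attaches to each guess an uncertainty that grows with its distance from the nearest visible pixel.

```python
def fill_occlusion(row, occluded):
    """Linearly interpolate occluded pixels from the visible boundary values,
    attaching an uncertainty that grows with distance from the boundary."""
    left, right = min(occluded) - 1, max(occluded) + 1  # nearest visible pixels
    span = right - left
    filled = {i: v for i, v in enumerate(row) if v is not None}
    uncertainty = {}
    for i in occluded:
        t = (i - left) / span
        filled[i] = (1 - t) * row[left] + t * row[right]  # interpolated guess
        uncertainty[i] = min(i - left, right - i)         # distance to boundary
    return filled, uncertainty

# A row of brightness values with an occluded (None) stretch in the middle.
row = [10, 12, 14, None, None, None, None, None, 24, 26]
occluded = [i for i, v in enumerate(row) if v is None]
filled, unc = fill_occlusion(row, occluded)

# Guesses near the boundary carry low uncertainty and can be trusted;
# the deepest occluded pixel carries the highest uncertainty, so a "wise"
# system would refrain from committing it to memory.
for i in occluded:
    print(i, round(filled[i], 2), unc[i])
```

A fuller treatment would replace the distance heuristic with a proper posterior variance, but the qualitative pattern is the same: confidence in the filled-in content falls off away from the occlusion boundary.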
We are currently studying other aspects of perceptual abstraction in a number of situations, for example, when a face is distorted and when an image is blurred or low in contrast.
By a meaningful image, we simply mean one that is not random. For example, if a random image is generated such that a pixel’s color and brightness are completely independent of those of its neighboring pixels, then there is absolutely no way that anyone can predict what an occluded pixel should look like. In a more meaningful photo of objects and scenes, things are more predictable in the sense that a pixel’s color and brightness can be reasonably predicted from those of its neighboring pixels. In fact, the closer the neighboring pixels are, the better the prediction will be.
The human eye is doing something very similar all the time. For example, we all have a blind spot on the retina where there are no photoreceptors, because the nerve fibers running in front of the photoreceptors must all pass through this spot to exit the eye. There are also blood vessels in front of the retina, yet we never see them: the visual system fills in these regions from their surroundings.
Our research goes beyond this. We hypothesize that our visual memory, or the way our mental image is formed, goes through a similar process. We have also made the following counter-intuitive prediction: after a partially occluded image is seen, participants will insist that they have actually seen a cleaner version of the image, and will recognize the cleaner version better than the original image. This is what we actually found.
Guessing what is behind occlusion, or more generally, inferring what is not directly available from the senses, necessarily carries a certain risk: the visual system may overreach and make a large mistake. How, then, should one balance sensible guessing against faithfulness to the sensory data? This is one active research area in our lab.
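One common way to formalize this balance (a standard Gaussian cue-combination sketch, not necessarily the model our lab uses) is to treat the percept as a precision-weighted average of a prior expectation and the sensory measurement. When the data are reliable the percept follows the data; when they are noisy, as under heavy occlusion, the percept falls back on the prior:

```python
def combine(prior_mean, prior_var, obs_mean, obs_var):
    """Posterior of a Gaussian prior combined with a Gaussian observation:
    a precision-weighted average of expectation and sensory data."""
    w_prior = 1.0 / prior_var
    w_obs = 1.0 / obs_var
    mean = (w_prior * prior_mean + w_obs * obs_mean) / (w_prior + w_obs)
    var = 1.0 / (w_prior + w_obs)
    return mean, var

# Prior: the feature probably takes a typical value (0.0).
# The observation says 2.0.

# Clear view (low sensory noise): the percept stays close to the data.
print(combine(0.0, 1.0, 2.0, 0.1))   # mean near 2.0

# Heavy occlusion (high sensory noise): the percept is pulled to the prior.
print(combine(0.0, 1.0, 2.0, 10.0))  # mean near 0.0
```

The posterior variance also shrinks as either source becomes more reliable, which echoes the earlier point: the system can quantify how much to trust a filled-in guess, and decline to commit when that trust is low.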