Researchers at MIT’s McGovern Institute for Brain Research have developed a new mathematical model that describes how the human brain visually identifies objects. The model accurately predicts human performance on certain visual-perception tasks, which suggests that it is a good indication of what actually happens in the brain, and it could also help improve computer object-recognition systems.
The model was designed to reflect neurological evidence that in the primate brain, object identification (deciding what an object is) and object location (deciding where it is) are handled separately. Although ‘what’ and ‘where’ are processed in two separate parts of the brain, they are integrated during perception to analyze the image, the researchers say.
The mechanism of integration, the researchers argue, is attention. According to their model, when the brain is confronted by a scene containing a number of different objects, it can’t keep track of all of them at once. Instead, it creates a rough map of the scene that simply identifies some regions as being more visually interesting than others. If it is then called upon to determine whether the scene contains an object of a particular type, it begins by searching (that is, turning its attention toward) the regions of greatest interest.
The researchers implemented the model in software and tested its predictions against data from experiments with human subjects. The subjects were asked first to simply look at a street scene depicted on a computer screen, then to count the cars in the scene, and then to count the pedestrians, while an eye-tracking system recorded their eye movements. The software predicted with great accuracy which regions of the image the subjects would attend to during each task.
The software’s analysis of an image begins with the identification of interesting features: rudimentary shapes common to a wide variety of images. It then creates a map that depicts which features are found in which parts of the image. Thereafter, shape information and location information are processed separately, as they are in the brain.
On the shape side, the software creates a list of all the interesting features in the feature map and, from that, another list of all the objects that contain those features. It doesn’t record any information about where or how frequently the features occur.
At the same time, the software creates a spatial map of the image that indicates where interesting features are to be found, but not what sorts of features they are. It does, however, interpret the ‘interestingness’ of the features probabilistically. If a feature occurs more than once, its interestingness is spread out across all the locations at which it occurs. If another feature occurs at only one location, its interestingness is concentrated at that one location.
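As a rough illustration, and not the researchers’ actual software, the separation of shape and location information described above might be sketched in Python as follows. The input format, the function names, the choice to give every feature one equal unit of interest, and the final normalization step are all assumptions made for this example.

```python
from collections import defaultdict


def split_what_and_where(feature_map):
    """Split a feature map into a 'what' list and a 'where' map.

    feature_map is a hypothetical input format: a dict that maps a
    (row, col) image location to the feature names detected there.
    """
    locations_by_feature = defaultdict(list)
    for location, names in feature_map.items():
        for name in set(names):  # ignore repeats within a single location
            locations_by_feature[name].append(location)

    # 'What' stream: which features are present, with no record of
    # where or how often they occur.
    features_present = set(locations_by_feature)

    # 'Where' stream: each feature gets one unit of interest (an assumption),
    # divided evenly among the locations at which it occurs.
    interest = defaultdict(float)
    for locations in locations_by_feature.values():
        share = 1.0 / len(locations)
        for location in locations:
            interest[location] += share

    # Normalize so the map reads as a probability distribution over locations.
    total = sum(interest.values())
    where_map = {loc: value / total for loc, value in interest.items()}
    return features_present, where_map


def attention_order(where_map):
    """Return locations sorted from most to least interesting."""
    return sorted(where_map, key=where_map.get, reverse=True)


# Toy scene: "edge" occurs at two locations, "corner" at only one.
scene = {(0, 0): ["edge"], (0, 1): ["edge"], (1, 1): ["corner"]}
features, where_map = split_what_and_where(scene)
order = attention_order(where_map)  # (1, 1) comes first
```

In this toy scene, the lone corner ends up with half of the total interest and each of the two edge locations gets a quarter, so a search that starts with the regions of greatest interest would begin at the corner.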
More information:
http://web.mit.edu/newsoffice/2010/people-images-0607.html