19 May 2010

Rudimentary Computer Vision

A conventional object recognition system, when trying to discern a particular type of object in a digital image, will generally begin by looking for the object's salient features. A system built to recognize faces, for instance, might look for things resembling eyes, noses and mouths and then determine whether they have the right spatial relationships with each other. The design of such systems, however, usually requires human intuition: A programmer decides which parts of the objects are the right ones to key in on. That means that for each new object added to the system's repertoire, the programmer has to start from scratch, determining which of the object's parts are the most important. It also means that a system designed to recognize millions of different types of objects would become unmanageably large. Each object would have its own, unique set of three or four parts, but the parts would look different from different perspectives, and cataloguing all those perspectives would take an enormous amount of computer memory.
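
To make the hand-designed approach concrete, here is a minimal sketch of such a face check. It assumes some earlier detector has already located the parts; the part names, the (x, y) pixel convention with y growing downward, and the tolerance value are all illustrative choices, not taken from any particular system.

```python
def looks_like_face(parts, tol=0.25):
    """Check hand-picked spatial relationships between detected parts.

    `parts` maps a part name to its (x, y) centre in image coordinates,
    where y grows downward. Names and thresholds are hypothetical.
    """
    required = ("left_eye", "right_eye", "nose", "mouth")
    if any(name not in parts for name in required):
        return False

    (lx, ly), (rx, ry) = parts["left_eye"], parts["right_eye"]
    nx, ny = parts["nose"]
    mx, my = parts["mouth"]

    eye_gap = rx - lx
    if eye_gap <= 0:
        return False

    # Eyes roughly level, nose below and between them, mouth below the nose.
    eyes_level = abs(ly - ry) <= tol * eye_gap
    nose_between = lx < nx < rx and ny > max(ly, ry)
    mouth_below = my > ny and lx < mx < rx
    return eyes_level and nose_between and mouth_below


candidate = {"left_eye": (30, 40), "right_eye": (70, 40),
             "nose": (50, 60), "mouth": (50, 80)}
print(looks_like_face(candidate))
```

Every rule in this function was chosen by the programmer, which is exactly the cost described above: a new object type would need a new function with its own parts and its own rules.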

Researchers have developed an approach that solves both of these problems at once. Like most object-recognition systems, their system learns to recognize new objects by being ‘trained’ with digital images of labeled objects. But it doesn't need to know in advance which of the objects' features it should look for. For each labeled object, it first identifies the smallest features it can -- often just short line segments. Then it looks for instances in which these low-level features are connected to each other, forming slightly more sophisticated shapes. Then it looks for instances in which these more sophisticated shapes are connected to each other, and so on, until it has assembled a hierarchical catalogue of increasingly complex parts whose top layer is a model of the whole object. Once the system has assembled its catalogue from the bottom up, it goes through it from the top down, winnowing out all the redundancies. Even though the hierarchical approach adds new layers of information about digitally depicted objects, it ends up saving memory because different objects can share parts.
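
The memory argument can be illustrated with a toy sketch. This is not the researchers' algorithm: it assumes each object is just an ordered list of primitive segment labels, joins neighbouring parts in pairs to form each higher layer, and uses a plain set as a stand-in for the redundancy-removal step. The function names and the sample objects are hypothetical.

```python
def build_layers(segments):
    """Build one object's part hierarchy from the bottom up.

    Layer 0 holds the primitive segments. Each higher layer joins
    neighbouring parts in pairs until a single whole-object part remains.
    """
    layer = list(segments)
    layers = [layer]
    while len(layer) > 1:
        joined = [(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:  # an odd part out is carried up unchanged
            joined.append(layer[-1])
        layers.append(joined)
        layer = joined
    return layers


def catalogue_parts(objects):
    """Collect every distinct part across all objects into one shared set."""
    catalogue = set()
    for segments in objects.values():
        for layer in build_layers(segments):
            catalogue.update(layer)  # the set keeps one copy of each part
    return catalogue


# Hypothetical training data: "h" and "v" are horizontal and vertical segments.
objects = {
    "square": ["h", "v", "h", "v"],
    "window": ["h", "v"] * 4,
}

shared = catalogue_parts(objects)
separate = sum(len(catalogue_parts({name: segs})) for name, segs in objects.items())
print(len(shared), "parts when shared,", separate, "when stored per object")
```

In this toy case the shared catalogue holds 5 distinct parts, against 9 if each object kept its own copy, because the "window" reuses the segments, the corner, and the whole "square" as building blocks.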

More information:

http://www.sciencedaily.com/releases/2010/05/100511104633.htm