16 March 2015

Visual Turing Test

Computers are able to recognize objects in photographs and other images, but how well can they understand the relationships or implied activities between those objects? Researchers from Brown and Johns Hopkins universities have devised a new way to evaluate how well computers can divine that kind of information from images. The team describes its system as a "visual Turing test," after the legendary computer scientist Alan Turing's test of the extent to which a computer can display human-like intelligence. Traditional computer vision benchmarks tend to measure how well an algorithm detects objects within an image, or how well a system identifies an image's global attributes.

Recognizing that an image depicts two people walking together and having a conversation requires a much deeper understanding than simply detecting two people in it. Likewise, describing an image as depicting a person entering a building reflects a richer understanding than saying it contains a person and a building. The new system is designed to test for this kind of contextual understanding of photos. It works by generating a sequence of yes-or-no questions about an image, which are posed one at a time to the system being tested. Each question probes more deeply than the last and depends on the answers to the questions that came before it. The first version of the test was built around a set of photos depicting urban street scenes, but the researchers say the concept could conceivably be expanded to all kinds of photos.
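To make the sequential format concrete, here is a minimal Python sketch of a question-and-answer loop in which each new question depends on the previous answer. It is not the researchers' implementation: the question wording, the image regions, and the `FOLLOW_UPS` table that picks the next question are all hypothetical, and the system being tested is stood in for by any function that answers yes (`True`) or no (`False`).

```python
from typing import Callable, Dict, List, Optional, Tuple

# One (question, answer) pair from the system under test.
Turn = Tuple[str, bool]

# Hypothetical opening question and branching table: the next question is
# looked up from the most recent question together with the answer it got.
FIRST_QUESTION = "Is there a person in region A?"
FOLLOW_UPS: Dict[Turn, str] = {
    ("Is there a person in region A?", True): "Is there a person in region B?",
    ("Is there a person in region A?", False): "Is there a vehicle in region A?",
    ("Is there a person in region B?", True): "Are the two people walking together?",
    ("Are the two people walking together?", True): "Are they talking to each other?",
}


def next_question(history: List[Turn]) -> Optional[str]:
    """Return the next question given all earlier turns, or None to stop."""
    if not history:
        return FIRST_QUESTION
    return FOLLOW_UPS.get(history[-1])


def run_test(answer: Callable[[str], bool], max_questions: int = 10) -> List[Turn]:
    """Pose questions one at a time, feeding each answer back into the generator."""
    history: List[Turn] = []
    for _ in range(max_questions):
        question = next_question(history)
        if question is None:
            break
        history.append((question, answer(question)))
    return history


if __name__ == "__main__":
    # A stand-in "vision system" that answers yes to everything.
    for question, reply in run_test(lambda q: True):
        print(question, "->", "yes" if reply else "no")
```

With a system that always answers yes, the loop walks the deepest branch and asks four questions, ending with whether the two people are talking; a system that always answers no is steered to the vehicle question instead and stops after two. A real test would draw its questions from annotated images rather than a fixed table.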

More information: