Computers can recognize objects in photographs and other images, but how well can they understand the relationships or implied activities between those objects? Researchers from Brown and Johns Hopkins universities have devised a way to evaluate how well computers divine that kind of information from images. The team describes its system as a "visual Turing test," after computer scientist Alan Turing's famous test of whether a machine displays human-like intelligence. Traditional computer vision benchmarks tend to measure how well an algorithm detects objects within an image or how well a system identifies an image's global attributes.
Describing an image as depicting a person entering a building is a richer understanding than saying it contains a person and a building; recognizing that the image shows two people walking together and having a conversation is deeper still. The new system is designed to test for that kind of contextual understanding. It works by generating a string of yes-or-no questions about an image, which are posed sequentially to the system being tested; each question is progressively more in-depth and depends on the answers to the questions that came before. The first version of the test was generated from a set of photos depicting urban street scenes, but the concept could conceivably be extended to all kinds of photos, the researchers say.
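The questioning loop described above, where each follow-up question depends on earlier answers, can be sketched as a traversal of a question tree. The tree below, the question wording, and the mock answering system are all illustrative assumptions, not the researchers' actual implementation:

```python
# Hypothetical sketch of an adaptive yes/no questioning loop.
# The question tree, its wording, and the mock answerer below are
# illustrative inventions, not the researchers' actual system.

class QuestionNode:
    def __init__(self, question, if_yes=None, if_no=None):
        self.question = question
        self.if_yes = if_yes  # follow-up asked when the answer is "yes"
        self.if_no = if_no    # follow-up asked when the answer is "no"

def interrogate(root, answer_fn):
    """Pose questions sequentially; each follow-up depends on prior answers."""
    transcript = []
    node = root
    while node is not None:
        ans = answer_fn(node.question)  # True = "yes", False = "no"
        transcript.append((node.question, ans))
        node = node.if_yes if ans else node.if_no
    return transcript

# A tiny, hand-built question tree for an urban street scene.
tree = QuestionNode(
    "Is there a person in region A?",
    if_yes=QuestionNode(
        "Is the person interacting with another person?",
        if_yes=QuestionNode("Are the two people walking together?"),
        if_no=QuestionNode("Is the person entering a building?"),
    ),
    if_no=QuestionNode("Is there a vehicle in region A?"),
)

# Mock vision system standing in for the model under test:
# it simply answers "yes" to every question.
log = interrogate(tree, lambda q: True)
for question, answer in log:
    print(f"{question} -> {'yes' if answer else 'no'}")
```

A system that answers "yes" throughout is led from object detection ("Is there a person?") toward progressively deeper questions about interactions, which is the gist of the adaptive design: later questions probe understanding that flat object-detection benchmarks never reach.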