Computers can
recognize objects in photographs and other images, but how well can
they understand the relationships or implied activities between those objects?
Researchers from Brown and Johns Hopkins universities have devised a
new way to evaluate how well computers can divine such information from
images. The team describes its system as a "visual Turing test,"
after the legendary computer scientist Alan Turing's test of the extent to
which computers display human-like intelligence. Traditional computer vision
benchmarks tend to measure an algorithm's performance in detecting objects
within an image, or how well a system identifies an image's global attributes.
Recognizing that an image depicts two people walking together and having a
conversation, for example, requires a much deeper understanding. Likewise,
describing an image as showing a person entering a building is a richer
interpretation than merely noting that it contains a person and a building.
The system is designed to test for such a contextual
understanding of photos. It works by generating a series of yes-or-no questions
about an image, which are posed to the system under test one at a time. Each
question probes deeper than the last and is chosen based on the answers to the
questions that came before. The first version of the test was generated from a
set of photos depicting urban street scenes. But the concept could conceivably
be expanded to all kinds of photos, the researchers say.
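The adaptive questioning loop described above can be sketched as follows. This is a minimal illustration, not the researchers' actual system: the question tree, the example questions, and the `answer` callback interface are all assumptions made for the sake of the sketch.

```python
# Toy sketch of a sequential yes/no questioning loop: each follow-up
# question depends on the answers collected so far. The hand-built
# question tree below is an illustrative stand-in for the real system's
# automatically generated questions.

# Each node is (question, subtree_if_yes, subtree_if_no); None ends a branch.
QUESTION_TREE = (
    "Is there a person in the region?",
    (
        "Is the person walking?",
        ("Is the person walking with another person?", None, None),
        ("Is the person entering a building?", None, None),
    ),
    ("Is there a vehicle in the region?", None, None),
)

def run_visual_turing_test(tree, answer):
    """Pose questions one at a time; each follow-up depends on prior answers.

    `answer` stands in for the vision system under test: a callable that
    maps a question string to True (yes) or False (no).
    """
    transcript = []
    node = tree
    while node is not None:
        question, yes_branch, no_branch = node
        reply = answer(question)
        transcript.append((question, reply))
        node = yes_branch if reply else no_branch
    return transcript

# Example: a mock "system" that answers yes whenever a question
# mentions a person, driving the loop down the person branch.
history = run_visual_turing_test(QUESTION_TREE, lambda q: "person" in q)
for question, reply in history:
    print(f"{question} -> {'yes' if reply else 'no'}")
```

The tree structure makes the key property of the test concrete: a "no" to the first question leads to an entirely different line of questioning than a "yes," so the transcript itself reflects how deeply the system's understanding was probed.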