Suppose you’re trying to navigate
an unfamiliar section of a big city, and you’re using a particular cluster of
skyscrapers as a reference point. Traffic and one-way streets force you to take
some odd turns, and for a while you lose sight of your landmarks. When they
reappear, in order to use them for navigation, you have to be able to identify
them as the same buildings you were tracking before, and to work out your
orientation relative to them. That type of re-identification is second nature
for humans, but it’s difficult for computers. MIT researchers have developed a new
algorithm that could make it much easier, by identifying the major orientations
in 3D scenes. The same algorithm could also simplify the problem of scene
understanding, one of the central challenges in computer vision research.
The algorithm is primarily
intended to aid robots navigating unfamiliar buildings, not motorists
navigating unfamiliar cities, but the principle is the same. It works by
identifying the dominant orientations in a given scene, which it represents as
sets of axes, called ‘Manhattan frames’, embedded in a sphere. As a robot
moved, it would, in effect, observe the sphere rotating in the opposite
direction, and could gauge its orientation relative to the axes. Whenever it
wanted to reorient itself, it would know which of its landmarks’ faces should
be toward it, making them much easier to identify. As it turns out, the same
algorithm also drastically simplifies the problem of plane segmentation, or
deciding which elements of a visual scene lie in which planes, at what depth.
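
To make the idea concrete, here is a minimal, hypothetical sketch of how dominant orientations might be estimated from a depth sensor's surface normals; it is not the researchers' actual algorithm, only an illustration of the "Manhattan frame" concept, using a simple alternation between assigning normals to axes and re-fitting the rotation.

```python
# Hypothetical sketch: fit three dominant orthogonal axes (a "Manhattan frame")
# to unit surface normals by alternating (1) nearest-signed-axis assignment and
# (2) an orthogonal Procrustes update. Illustration only, not the MIT method.
import numpy as np

def fit_manhattan_frame(normals, iters=10):
    """Return (R, labels): R's columns are the estimated dominant axes,
    labels[i] in {0, 1, 2} is the axis each normal is attributed to."""
    R = np.eye(3)  # initial guess: axes aligned with the camera
    for _ in range(iters):
        proj = normals @ R                      # dot products with current axes
        labels = np.abs(proj).argmax(axis=1)    # closest axis (ignoring sign)
        signs = np.sign(proj[np.arange(len(normals)), labels])

        # Signed one-hot targets: the axis direction each normal "should" match.
        targets = np.zeros_like(normals)
        targets[np.arange(len(normals)), labels] = signs

        # Orthogonal Procrustes: rotation best mapping targets onto normals.
        U, _, Vt = np.linalg.svd(normals.T @ targets)
        R = U @ np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt
    return R, labels

# Toy example: normals from three perpendicular walls seen by a tilted camera.
rng = np.random.default_rng(0)
true_R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # an arbitrary true frame
picks = rng.integers(0, 3, size=500)
normals = true_R[:, picks].T * rng.choice([-1.0, 1.0], size=(500, 1))
normals += 0.05 * rng.normal(size=normals.shape)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

R_est, labels = fit_manhattan_frame(normals)
```

In this toy version the per-normal labels double as a coarse plane segmentation, attributing each observed surface patch to one of the three dominant orientations, which is essentially the by-product described above; a real system would be far more robust to noise, clutter, and initialization.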