Researchers combined two
different ways of setting goals for robots into a single process, which
performed better than either of its parts alone in both simulations and
real-world experiments. The team's new system for providing robots with
instructions, known as reward functions, combines demonstrations, in which humans
show the robot what to do, and user preference surveys, in which people answer
questions about how they want the robot to behave. They developed a way of
producing multiple questions at once, which could be answered in quick
succession by one person or distributed among several people. This update made
the process 15 to 50 times faster than producing questions one by one.

The new combination system begins
with a person demonstrating a behavior to the robot. That can give autonomous
robots a lot of information, but the robot often struggles to determine what
parts of the demonstration are important. People also don't always want a robot
to behave just like the human who trained it. The system therefore follows the
demonstration with preference surveys. For this study, the group used
the slower single-question method for those surveys, but they plan to integrate
multiple-question surveys in later work. In tests, the team found that combining demonstrations
and surveys was faster than relying on preference surveys alone. When compared with
demonstrations alone, about 80 percent of people preferred how the robot
behaved when trained with the combined system.
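
To make the idea of combining a demonstration with survey answers concrete, the following is a minimal sketch in Python, not the researchers' actual method. It assumes a reward function that is a weighted sum of two made-up trajectory features (speed and distance from obstacles), a small hand-picked set of candidate weightings, and a standard softmax-style model of human choices. All names and numbers here are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: each trajectory is summarized by two features,
# (speed, distance_from_obstacles). These values are illustrative only.
TRAJECTORIES = {
    "fast_close": (1.0, 0.1),
    "fast_far": (0.9, 0.8),
    "slow_far": (0.2, 1.0),
}

# Hypothetical candidate reward weightings the robot chooses among:
# speed only, a balance of both, and safety only.
CANDIDATE_WEIGHTS = [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0)]


def reward(weights, features):
    """Linear reward: weighted sum of the trajectory's features."""
    return sum(w * f for w, f in zip(weights, features))


def normalize(belief):
    """Rescale the belief so its entries sum to 1."""
    total = sum(belief)
    return [b / total for b in belief]


def update_from_demonstration(belief, demo_name, beta=5.0):
    """Favor weightings under which the demonstrated trajectory scores well
    relative to the other available trajectories (assumed softmax model)."""
    new_belief = []
    for prior, w in zip(belief, CANDIDATE_WEIGHTS):
        scores = {n: math.exp(beta * reward(w, f)) for n, f in TRAJECTORIES.items()}
        likelihood = scores[demo_name] / sum(scores.values())
        new_belief.append(prior * likelihood)
    return normalize(new_belief)


def update_from_preference(belief, preferred, rejected, beta=5.0):
    """Favor weightings that agree with one survey answer
    (assumed logistic model of choosing between two trajectories)."""
    new_belief = []
    for prior, w in zip(belief, CANDIDATE_WEIGHTS):
        diff = reward(w, TRAJECTORIES[preferred]) - reward(w, TRAJECTORIES[rejected])
        likelihood = 1.0 / (1.0 + math.exp(-beta * diff))
        new_belief.append(prior * likelihood)
    return normalize(new_belief)


# Start with no opinion, learn from one demonstration, then one survey answer.
belief = normalize([1.0] * len(CANDIDATE_WEIGHTS))
belief_after_demo = update_from_demonstration(belief, "fast_far")
belief = update_from_preference(belief_after_demo, preferred="slow_far", rejected="fast_close")
best_weights = CANDIDATE_WEIGHTS[belief.index(max(belief))]
```

In this toy setup the demonstration gives the robot a strong starting guess, and the single survey answer then shifts belief away from the speed-only weighting, which loosely mirrors the demonstrate-then-ask order described above.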