A New Contender in Trial and Error: The Robot That Learns Like You and Me

Robots that can learn? Yes, they can, at least with these new algorithms. A UC Berkeley research team has created algorithms that allow robots to use trial and error to learn motor tasks, a process much closer to the way humans think and learn, and a major milestone in artificial intelligence.

BRETT, the Berkeley Robot for the Elimination of Tedious Tasks, is living up to his name as he learns to screw a cap onto a bottle, hang clothes on a rack, and assemble toys. This Willow Garage Personal Robot 2 (PR2) can do these tasks without detailed pre-programming because he can use trial and error to figure them out.

"What we're reporting on here is a new approach to empowering a robot to learn," said Professor Pieter Abbeel of UC Berkeley's Department of Electrical Engineering and Computer Sciences. "The key is that when a robot is faced with something new, we won't have to reprogram it. The exact same software, which encodes how the robot can learn, was used to allow the robot to learn all the different tasks we gave it."

The researchers will present their work this week at the International Conference on Robotics and Automation (ICRA) in Seattle. BRETT and his team work out of UC Berkeley's Center for Information Technology Research in the Interest of Society (CITRIS) as part of a new People and Robots Initiative. The initiative draws on the resources of multiple campuses and a variety of disciplines, letting human needs shape advances in automation, robotics, and artificial intelligence (AI).

"Most robotic applications are in controlled environments where objects are in predictable positions," UC Berkeley faculty member and director of the Berkeley Vision and Learning Center, Trevor Darrell says. "The challenge of putting robots into real-life settings, like homes or offices, is that those environments are constantly changing. The robot must be able to perceive and adapt to its surroundings."

The Berkeley team took their inspiration not from conventional approaches to robotics, which have proven less than practical, but from the neural circuitry of the highly successful human brain. Instead of pre-programming responses to what is ultimately a limitless and unpredictable series of challenges in navigating 3D environments, the team works with what is called deep learning.

"For all our versatility, humans are not born with a repertoire of behaviors that can be deployed like a Swiss army knife, and we do not need to be programmed," team postdoctoral researcher, Sergey Levine says. "Instead, we learn new skills over the course of our life from experience and from other humans. This learning process is so deeply rooted in our nervous system, that we cannot even communicate to another person precisely how the resulting skill should be executed. We can at best hope to offer pointers and guidance as they learn it on their own."

Deep learning programs build layered networks of artificial neurons, called "neural nets," that take in raw sensory input such as image pixels or sound waves and process it. In this way the robot can categorize the data it receives and recognize patterns. This kind of AI research is already benefiting us every day; speech-to-text programs and even Google Street View make use of advances in vision and speech recognition made possible by deep learning technology.
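To make the idea concrete, here is a minimal sketch, not the Berkeley team's code, of how a tiny two-layer neural net could turn raw pixel values into category scores. The layer sizes, random weights, and input are illustrative stand-ins:

```python
import numpy as np

# Minimal two-layer "neural net" sketch: raw input (e.g., flattened image
# pixels) is transformed through a hidden layer into scores for a few
# categories. All dimensions and data here are toy values, not BRETT's.

rng = np.random.default_rng(0)

n_pixels, n_hidden, n_classes = 64, 32, 3          # illustrative sizes
W1 = rng.normal(0, 0.1, (n_hidden, n_pixels))      # first layer weights
W2 = rng.normal(0, 0.1, (n_classes, n_hidden))     # second layer weights

def forward(pixels):
    """Map raw pixel values to a probability over categories."""
    h = np.maximum(0, W1 @ pixels)                 # ReLU hidden layer
    scores = W2 @ h                                # raw class scores
    exp = np.exp(scores - scores.max())            # softmax, numerically stable
    return exp / exp.sum()

image = rng.random(n_pixels)                       # stand-in for a camera frame
print(forward(image))                              # three category probabilities
```

In a real system the weights would be trained on labeled examples rather than left random; the point of the sketch is only the pipeline from raw input to recognized category.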

Learning motor tasks, however, goes far beyond the comparatively passive work of recognizing sounds and images, which is what makes applying deep learning to them such a challenge.

"Moving about in an unstructured 3D environment is a whole different ballgame," research team member and Ph.D. student, Chelsea Finn says. "There are no labeled directions, no examples of how to solve the problem in advance. There are no examples of the correct solution like one would have in speech and vision recognition programs."

The team gave BRETT a sequence of simple motor tasks; for example, BRETT was charged with using shape-sorting blocks and stacking Legos. The new algorithm the team created to control the learning process for BRETT has a scoring reward function built in. As BRETT works within his environment, the algorithm gives him real-time feedback based on whether or not his movements get him closer to completing his task. The scores then feed back through his neural network so he can "remember" what works and what doesn't.
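As a rough illustration of that feedback loop, here is a schematic sketch in a toy 2-D world with a hypothetical `reward` function. It uses simple random-search hill climbing rather than the Berkeley team's actual algorithm, but it shows how trial scores can steer the "remembering" of what works:

```python
import numpy as np

# Schematic trial-and-error loop in the spirit described above, not the
# Berkeley team's method. The goal position and reward are hypothetical
# stand-ins for BRETT's sensors and scoring function.

rng = np.random.default_rng(1)

goal = np.array([0.5, -0.2])    # target gripper position (toy 2-D world)
theta = np.zeros(2)             # learned parameters: here, just an action

def reward(position):
    """Higher score the closer a movement brings us to the goal."""
    return -np.linalg.norm(position - goal)

for trial in range(200):
    noise = rng.normal(0, 0.05, size=2)   # explore: perturb the action
    position = theta + noise              # execute the perturbed action
    # Feedback step: keep perturbations that score better than the
    # current action, i.e. "remember" what works and discard what doesn't.
    if reward(position) > reward(theta):
        theta += 0.5 * noise

print(theta)   # ends up near `goal` after enough trials
```

In BRETT's case the feedback instead updates the weights of a neural network driving his motors, but the cycle is the same: try a movement, score it, and keep what scored well.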

Using this strategy, BRETT can complete a typical task within about 10 minutes if he has the coordinates for the start and end of the job. If he doesn't and has to get the starting point from his environment, it takes him about three hours. Abbeel predicts major improvements as BRETT's data processing capacity expands and improves.

"With more data, you can start learning more complex things," Abbeel says. "We still have a long way to go before our robots can learn to clean a house or sort laundry, but our initial results indicate that these kinds of deep learning techniques can have a transformative effect in terms of enabling robots to learn complex tasks entirely from scratch. In the next five to 10 years, we may see significant advances in robot learning capabilities through this line of work."
