Robot learning by demonstration

Robot Learning from Demonstration (LfD) or Robot Programming by Demonstration (PbD) (also known as Imitation Learning and Apprenticeship Learning) is a paradigm for enabling robots to autonomously perform new tasks. Rather than requiring users to analytically decompose and manually program a desired behavior, work in LfD - PbD takes the view that an appropriate robot controller can be derived from observations of a human's own performance thereof. The aim is for robot capabilities to be more easily extended and adapted to novel situations, even by users without programming ability.

1 Overview
- 1.1 Principle
- 1.2 Historical Context
2 Key Issues in Programming by Demonstration / Learning from Demonstration
3 Ways to Solve LfD - PbD
- 3.1 Low level learning of individual motions
  - 3.1.1 Teaching Force-Control Tasks
- 3.2 Learning high-level action composition
4 Imitation Learning combined with Other Learning Techniques
5 Further Reading
6 References

Overview

Principle

The main principle of robot LfD-PbD is that end-users can teach robots new tasks without programming. Consider, for example, a domestic service robot that an owner wishes to have prepare orange juice for breakfast, see Figure 1. The task itself may involve multiple subtasks, such as juicing the orange, throwing the rest of the orange in the trash and pouring the liquid in a cup. Furthermore, each time this task is performed, the robot will need to contend with changes such as displacement of the items' locations.

In a traditional programming scenario, a human programmer would have to reason in advance and code a robot controller that is capable of responding to any situation the robot may face, no matter how unlikely. This process may involve breaking down the task into hundreds of different steps, and thoroughly testing each step. If errors or new circumstances arise after the robot is deployed, the entire costly process may need to be repeated, and the robot recalled or taken out of service while it is fixed.

In contrast, LfD - PbD allows the end-user to 'program' the robot simply by showing it how to perform the task - no coding required. Then, when failures occur, the end-user needs only to provide more demonstrations, rather than calling for professional help. LfD - PbD hence seeks to endow robots with the ability to learn what it means to perform a task by generalizing from observing several demonstrations, see Figure 1. LfD - PbD is NOT a record and replay technique. Learning and generalizing is core to LfD - PbD.

Figure 1: The teacher performs several demonstrations of the same task, changing the location of each item in between to allow the robot to generalize correctly. From observing these changes the robot can infer that the relative positions of the objects matter, but that their absolute positions do not.

Figure 2: After learning, the robot successfully reproduces the task even when all objects are in novel positions.

Historical Context

Robot Learning from Demonstration started in the 1980s. Then, and still to a large extent now, robots had to be tediously hand programmed for every task they performed. LfD - PbD seeks to minimize, or even eliminate, this difficult step by letting users train their robot to fit their needs. The expectation is that the methods of LfD - PbD, being user-friendly, will allow robots to be utilized to a greater extent in day-to-day interactions with non-specialist humans. Furthermore, by utilizing expert knowledge from the user, in the form of demonstrations, the actual learning should be fast compared to current trial-and-error learning, particularly in high dimensional spaces (hence addressing part of the well-known curse of dimensionality).

Research on LfD - PbD has grown steadily in importance since the 1980s and several surveys have been published in recent years. The vast majority of work on LfD - PbD follows an engineering/machine learning approach. Surveys of works in this area include (Argall et al 2010; Billard et al 2013; Schaal et al 2003). At the core, however, LfD - PbD is inspired by the way humans learn from being guided by experts, from infancy through adulthood. A large body of work on LfD - PbD therefore takes inspiration from concepts in psychology and biology. Some of these works pursue a computational neuroscience approach and use neural modeling. Others pursue a more cognitive science approach and build conceptual models of imitation learning in animals. Surveys in this area can be found in (Oztop et al. 2006; Dautenhahn and Nehaniv 2002; Billard 2002; Breazeal and Scassellati 2002).

Key Issues in Programming by Demonstration / Learning from Demonstration

Nehaniv & Dautenhahn (2001) phrased the problems faced by LfD - PbD in a set of key questions: What to imitate? How to imitate? When to imitate? Whom to imitate? To date, only the first two questions have really been addressed in LfD - PbD.

What to imitate and Evaluation Metric

What to imitate relates to the problem of determining which aspects of the demonstration should be imitated. For a given task, certain observable or affectable properties of the environment may be irrelevant and safely ignored. Key to determining what is and is not important is understanding the metric by which the robot's behavior is being evaluated.

In Figure 3, robots are trained to sort boxes by size, but not by color. Another way of saying this is that the metric used to determine if the robots have successfully performed the desired task involves only the size of the boxes, but not their color. Thus, the robots learn to ignore color in their efforts. Teaching what is and is not important can be done in multiple ways. The simplest approach is to take a statistical perspective and deem as relevant the parts (dimension, region of input space, etc) of the data which are consistent across all demonstrations (Calinon et al, 2007). If the dimension of the data is too high, such an approach may require too many demonstrations to gather enough statistics. An alternative is then to have the teacher help the robot determine what is relevant by pointing out parts of the task that are most important (Lockerd and Breazeal, 2004).
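
As a rough illustration of the statistical perspective, the sketch below flags as relevant the dimensions whose variance across demonstrations is small. The feature names, the data and the threshold are hypothetical and chosen only to echo the box-sorting example; this is a simplification and not the method of Calinon et al. (2007), which relies on richer probabilistic models.

```python
from statistics import pvariance


def relevant_dimensions(demos, threshold):
    """Return (indices, variances): indices of the dimensions whose variance
    across demonstrations is below `threshold`, i.e. the consistent ones."""
    n_dims = len(demos[0])
    variances = [pvariance([demo[i] for demo in demos]) for i in range(n_dims)]
    consistent = [i for i, var in enumerate(variances) if var < threshold]
    return consistent, variances


# Hypothetical data: for each demonstration, the normalized (size, hue) of the
# box that the teacher placed in the "large" bin.
demos = [
    (0.90, 0.10),
    (0.88, 0.55),
    (0.92, 0.95),
]

# The threshold is an arbitrary assumption; in practice it must be tuned.
relevant, variances = relevant_dimensions(demos, threshold=0.01)
print("relevant dimensions:", relevant)
```

With these made-up numbers, size is consistent across demonstrations while hue varies freely, so only the size dimension is retained. As noted above, with many dimensions and few demonstrations such variance estimates become unreliable.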

The issue of what to imitate takes its root in developmental psychology. A fundamental step in child development occurs when children acquire the ability to perform discriminative imitation (Gergely et al. 2002, Carpenter et al. 2002); that is, they move from imitating everything (over-imitation) to imitating only the goal of the actions (goal-directed imitation). In (Lopez et al 2009) a computational model of these different forms of imitation is proposed. This paper also discusses the notion of affordance learning in LfD - PbD. In manipulation tasks, actions result in an effect on the object manipulated. Hence, rather than tracking the action per se, one may track the effect this action has on the object and learn a task description in this light.

Figure 3: Photo from GA Tech, Maka Cakmak

How to Imitate and the Notion of Correspondence

How to imitate consists in determining how the robot will actually perform the learned behaviors to maximize the metric found when solving the What to Imitate problem. Often, a robot cannot act exactly the same way as a human does, due to differences in physical embodiment. For example, if the demonstrator uses a foot to move an object, is it acceptable for a wheeled robot to bump it, or should it use a gripper instead? If the metric does not have appendage-specific terms, it may not matter.

This issue is closely related to that of the Correspondence Problem (Nehaniv 2007). Robots and humans, while inhabiting the same space and interacting with the same objects, and perhaps even superficially similar, still perceive and interact with the world in fundamentally different ways. To evaluate the similarity between the human and robot behaviors, we must first deal with the fact that the human and the robot may occupy different state spaces, of perhaps different dimensions. We identify two different ways in which states of demonstrator and imitator can be said to correspond, and give brief examples:

1: Perceptual equivalence: Due to differences between human and robot sensory capabilities, the same scene may appear very different to each. For instance, while a human may identify humans and gestures from light, a robot may use depth measurements to observe the same scene, see Figure 4. Another point of comparison is tactile sensing. Most tactile sensors allow robots to perceive contact, but do not offer information about temperature, in contrast to the human skin. Moreover, the low resolution of the robots' tactile sensors does not allow robots to discriminate across the variety of existing textures, while human skin does. As the same data may therefore not be available to both humans and robots, successfully teaching a robot may require a good understanding of the robot's sensors and their limitations. LfD - PbD explores the limits of these perceptual equivalences, by building interfaces that either automatically correct for or make explicit these differences.

Figure 4: Photo from ETH Zurich, Mario Frank

2: Physical equivalence: Due to differences between human and robot embodiments, humans and robots may perform different actions to accomplish the same physical effect. For instance, even when performing the same task (football), humans and robots may interact with the environment in different ways, see Figure 5. Here the humans run and kick, while the robots roll and bump.

Solving this discrepancy in motor capabilities is akin to solving the How to imitate problem and is the focus of much work in LfD - PbD. For example, a robot may compute a path (in Cartesian space) for its end-effector that is close to the path followed by the human, while relying on inverse kinematics to find the appropriate joint displacements to deal with the fact that the two bodies are different. In the football example above, this would require the robot to determine a path for its center of mass which corresponds to the path followed by the human's right foot when projected on the ground. Clearly, this equivalence is very task-dependent. Recent solutions to this problem for hand motion and body motion can be found in (Giosio et al 2012; Shon et al. 2006).
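
To make the inverse kinematics idea concrete, the following minimal sketch maps a demonstrated Cartesian path onto the joints of a planar two-link arm. The arm, its link lengths and the path are hypothetical; a real robot has more joints and typically requires a numerical solver, so this is only meant to show how a path recorded in task space can be reproduced by a body that differs from the demonstrator's.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Analytic inverse kinematics of a planar two-link arm.
    Returns the solution (q1, q2), in radians, with a non-negative elbow angle."""
    cos_q2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_q2) > 1.0:
        raise ValueError("target is outside the arm's reachable workspace")
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def two_link_fk(q1, q2, l1, l2):
    """Forward kinematics: end-effector position for joint angles (q1, q2)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


# Hypothetical demonstrated hand path (metres) and robot link lengths.
human_path = [(0.5, 0.2), (0.6, 0.3), (0.7, 0.4)]
L1, L2 = 0.5, 0.5

joint_path = [two_link_ik(x, y, L1, L2) for x, y in human_path]
reproduced = [two_link_fk(q1, q2, L1, L2) for q1, q2 in joint_path]
```

The joint angles differ from whatever the human arm did, yet the end-effector passes through the same Cartesian points, which is the sense of correspondence used in this paragraph.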

Figure 5: Photo from Robocup 2007, Rob Felt

Taken together, these two equivalences deal with discrepancies in how robots and humans are embodied. Perceptual equivalence deals with the manner in which the agents perceive the world, and makes sure that the information necessary to perform the task is available to both. Physical equivalence deals with the manner in which agents affect and interact with the world, and makes sure that the task is actually performable by both.

Interfaces for Demonstration

The interface used to provide demonstrations plays a key role in the way the information is gathered and transmitted. We distinguish three major trends:

A) Directly recording human motions. When interested solely in the kinematics of the motions, one may use any of various existing motion tracking systems, based on vision, exoskeletons or other wearable motion sensors. Figure 6 shows an example of full body motion tracking during walking using vision. The motion of the human body is first extracted from the background using a model of the human body, and is then mapped to an avatar and the humanoid robot DB at ATR, Kyoto, Japan.

These external means of tracking human motion return precise measurements of the angular displacement of the joints. They have been used in various works for LfD - PbD of full body motion [Kulic et al. 2008; Ude et al. 2004; Kim et al. 2009]. These methods are advantageous in that they allow the human to move freely, but require good solutions to the correspondence problem. Typically, this is accomplished by an explicit mapping between human and robot joints, but can be quite difficult if the robot (e.g., a hexapod) differs greatly from the human.

B) Kinesthetic teaching, where the robot is physically guided through the task by the human. With this approach, no explicit physical correspondence is needed, as the user demonstrates the skill with the robot's own body. It also provides a natural teaching interface to correct a skill reproduced by the robot, as seen in Figure 1 and Figure 8. In the latter, skin technology is used to learn how touch contacts relate to the task at hand, raising issues of how to differentiate between touches that are part of the task, and those that are part of the teaching (Sauser et al. 2011).

One main drawback of kinesthetic teaching is that the human must often use more of their own degrees of freedom to move the robot than the number of degrees of freedom they are trying to control. This issue is seen in Figure 8 when the human must use both hands to move a few robot fingers. Similarly, in Figure 1, the teacher needs to use two arms to move one. Typically tasks that would require synchronization between multiple limbs are difficult to teach kinesthetically. A possibility is to proceed incrementally, teaching first the task for the right hand and then, while the robot replays the motion with its right hand, teach the motion of the left hand, but this process is often cumbersome.

C) Immersive teleoperation scenarios, where a human operator is limited to using the robot's own sensors and effectors to perform the task. Going further than kinesthetic teaching, which limits the user to the robot's own body, immersive teleoperation seeks to also limit the user's perception to that of the robot. The teleoperation itself may be done using joysticks or other remote control devices, including haptic devices. The latter have the advantage that they allow the teacher to teach tasks that require precise control of forces, while joysticks would only provide kinematic information (position, speed).

Teleoperation is advantageous in that it not only entirely solves the correspondence problem, but also allows for the training of robots from a distance. As the teacher no longer needs to be near the robot, it is well suited for teaching navigation and locomotion patterns. For instance, in (Peternel and Babic 2013) and (Babic et al. 2011), a humanoid robot is taught balancing techniques from human demonstrations. A haptic interface attached to the torso of the demonstrator was designed to transmit the perturbations induced on the robot and allow the teacher to adapt appropriately. The motions of the demonstrator were immediately re-transcribed into similar robot motions and used for training a model of motion conditioned on perceived forces.

Teleoperation is, however, more frequently used solely to transmit the kinematics of motion. In [Abbeel et al. 2010], acrobatic trajectories of a helicopter are learned by recording the tilt and pan motion of the helicopter when teleoperated by an expert pilot. In [Grollman & Jenkins 07], a robot dog is taught to play soccer with a human guiding it via a joystick. The main disadvantage of teleoperation is that the teacher often needs training to learn to use the remote control device. Additionally, for high-degree-of-freedom robots, the teleoperation interface can be highly complex, such as a complete exoskeleton.

Each teaching interface has its pros and cons and may be appropriate for different tasks. Some work has begun to investigate how these interfaces could be used in conjunction to exploit complementary information provided by each modality separately, see, e.g. (Sauser et al 2011).

Figure 6: Full Body Imitation from Vision (Ude et al 2004)

Figure 7: Full Body Imitation with Balance Control (Oztop & Colleagues), see [1]

Figure 8: Kinesthetic Demonstration (Sauser et al 2011) Long version of the movie available at [2]

Figure 9: Teleoperation via haptic device (Evrard et al 2009)

Ways to Solve LfD - PbD

Current approaches to encoding skills through LfD - PbD can be broadly divided between two trends: a low-level representation of the skill, taking the form of a non-linear mapping between sensory and motor information, and a high-level representation of the skill that decomposes the skill into a sequence of action-perception units.

Low level learning of individual motions

Individual motions/actions (e.g. just juicing the orange, or trashing it, or pouring the liquid in the cup from Figure 1) could be taught separately instead of all at once. The human teacher would then provide one or more examples of each sub-motion apart from the others. If learning proceeds from the observation of a single instance of the motion/action, one calls this one-shot learning (Wu and Demiris 2010). Examples can be found in (Nakanishi et al 2004) for learning locomotion patterns. Different from simple record and play, here the controller is provided with prior knowledge in the form of primitive motion patterns and learns parameters for these patterns from the demonstration.
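
The difference from record and play can be sketched as follows: the learner is handed a primitive with a few free parameters and estimates them from a single demonstration. The primitive used here, a single sinusoid with a known period, is a hypothetical stand-in chosen for brevity and is not the representation used by Nakanishi et al. (2004). The fit assumes the demonstration is sampled uniformly over exactly one period.

```python
import math


def fit_rhythmic_primitive(samples):
    """Fit y(k) = amplitude * sin(2*pi*k/n + phase) + offset to n >= 3 samples
    taken uniformly over exactly one period. Returns (amplitude, phase, offset)."""
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples per period")
    offset = sum(samples) / n
    # Projection onto the sine and cosine basis functions.
    a = 2.0 / n * sum(y * math.sin(2.0 * math.pi * k / n) for k, y in enumerate(samples))
    b = 2.0 / n * sum(y * math.cos(2.0 * math.pi * k / n) for k, y in enumerate(samples))
    return math.hypot(a, b), math.atan2(b, a), offset


# Hypothetical single demonstration of a joint angle over one gait cycle.
N_SAMPLES = 50
demonstration = [
    0.4 * math.sin(2.0 * math.pi * k / N_SAMPLES + 0.3) + 0.1
    for k in range(N_SAMPLES)
]

amplitude, phase, offset = fit_rhythmic_primitive(demonstration)
```

Once the three parameters are known, the motion can be regenerated at a different speed or scaled in amplitude, which a verbatim replay of the recorded samples would not allow.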

Multi-shot learning can be performed in batch after recording several demonstrations, or incrementally as new demonstrations are performed (e.g. Lee and Ott 2011). Learning generally performs inference from statistical analysis of the data across demonstrations, where the signals are modeled via a probability density function, and analyzed with various non-linear regression techniques stemming from machine learning. Popular methods include Gaussian Processes, Gaussian Mixture Models and Support Vector Machines.
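
The distinction between batch and incremental learning can be sketched with a deliberately crude model: an independent Gaussian (mean and variance) at each time step of time-aligned demonstrations. This is far simpler than the regression techniques listed above, and the class name and data are hypothetical. The incremental update uses Welford's running-moment formulas, so each new demonstration is absorbed without revisiting the old ones.

```python
from statistics import mean, pvariance


class IncrementalTrajectoryModel:
    """Per-time-step running mean and variance over time-aligned demonstrations."""

    def __init__(self, n_steps):
        self.count = 0
        self.mean = [0.0] * n_steps
        self._m2 = [0.0] * n_steps  # running sum of squared deviations

    def add_demonstration(self, demo):
        """Absorb one demonstration (a sequence of n_steps values)."""
        self.count += 1
        for t, x in enumerate(demo):
            delta = x - self.mean[t]
            self.mean[t] += delta / self.count
            self._m2[t] += delta * (x - self.mean[t])

    def variance(self):
        """Population variance at each time step."""
        if self.count == 0:
            raise ValueError("no demonstrations yet")
        return [m2 / self.count for m2 in self._m2]


# Hypothetical demonstrations of one coordinate at three time steps.
demos = [[0.0, 0.5, 1.0], [0.1, 0.6, 1.0], [-0.1, 0.4, 1.0]]

# Batch: all demonstrations are available at once.
batch_mean = [mean(column) for column in zip(*demos)]
batch_var = [pvariance(column) for column in zip(*demos)]

# Incremental: demonstrations arrive one at a time.
model = IncrementalTrajectoryModel(n_steps=3)
for demo in demos:
    model.add_demonstration(demo)
```

Both routes yield the same statistics. Here the variance is zero at the last time step, where every demonstration ends at the same value, which hints at how variability across demonstrations can indicate which parts of a motion are constrained.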

Teaching Force-Control Tasks

While most LfD - PbD work to date has focused on learning kinematic motions of end-effectors or other joints, more recent work has investigated extracting force-based signals from human demonstration (Calinon et al. 09; Kormushev et al 2011, Rozo et al 2011). Transmitting information about force is difficult for humans and for robots alike, since force can be sensed only when performing the task ourselves. Current efforts therefore seek to decouple the teaching of kinematics and force, as in Figure 10, or develop methods by which one may “embody” the robot and, by so doing, allow human and robot to perceive simultaneously the forces applied when performing the task. This line of work is fueled by recent advances in the design of haptic devices and tactile sensing, and by the development of torque and variable impedance actuated systems to teach force-control tasks through human demonstration.

Figure 10: Example of teaching force-based tasks (Kronander and Billard, 2012).

Learning high-level action composition

Learning complex tasks, composed of a combination and juxtaposition of individual motions, is the ultimate goal of LfD - PbD. A common approach is to first learn models of all of the individual motions, using demonstrations of each of these actions individually (Daniel et al. 2012, Mangin and Oudeyer 2011), and then learn the right sequencing/combination in a second stage either by observing a human performing the whole task (Dillmann 2004; Skoglund et al. 2007) or through reinforcement learning (Mülling et al. 2013). However, this approach assumes that there is a known set of all necessary primitive actions. For specific tasks this may be true, but to date there does not exist a database of general purpose primitive actions, and it is unclear if the variability of human motion may really be reduced to a finite list.

An alternative is to watch the human perform the complete task and to automatically segment the task to extract the primitive actions (which may then become task-dependent), see e.g. (Kulic et al 2012). The main advantage is that both the primitive actions and the way they should be combined are learned in one pass. One issue that arises is that the number of primitive tasks is often unknown, and there could be multiple possible segmentations which must be considered (Grollman and Jenkins 2010).

Figure 11 shows an example of a complex task composed of compound actions - a robot loading dishes into a dishwasher. In an illustration of the first approach, the robot is given a set of known (pre-programmed or learned previously) behaviors, such as pick up cup, move toward dishwasher, open dishwasher, etc, and must learn the correct sequence of actions to perform. The whole sequence itself is either induced through human-request via speech processing or learned through observation of the task completed by a human demonstrator (Asfour et al. 2008). Other examples of high-level learning include learning sequences of known behavior for navigation through imitation of a more knowledgeable robot or human (Hayes and Demiris, 2006; Gaussier et al. 98, Nicolescu and Mataric 2003); and learning and sequencing of primitive motions for full body motion in humanoid robots (Billard, 2000; Ito and Tani 2004; Kulic et al. 2008).

Figure 11: (Asfour et al. 2008)

Imitation Learning combined with Other Learning Techniques

The majority of work in LfD - PbD focuses solely on learning from demonstration data. There is however a growing body of work that looks at ways in which LfD - PbD can be combined with other learning techniques. One group of work investigates how to combine imitation learning with reinforcement learning, a method by which the robot learns through trial and error, so as to maximize a reward. Other works take inspiration from the way humans teach each other and introduce interactive and bidirectional teaching scenarios whereby the robot becomes an active partner during the teaching. We briefly review the main principles underlying each of these areas below:

Imitation Learning and Reinforcement Learning

A major limitation of imitation learning is that the robot can only become as good as the human's demonstrations. There is no additional information for improving the learnt behavior. Reinforcement learning, in contrast, allows the robot to discover new control policies through free exploration of the state-action space, but often takes a long time to converge. Approaches that combine the two aim at exploiting the strength of both to overcome their respective drawbacks. Particularly, demonstrations are used to initiate and guide the exploration done during reinforcement learning, reducing the time to find an improved control policy, which may depart from the demonstrated behavior.
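
The following toy sketch illustrates the idea of starting exploration from a demonstration. The task (throwing a ball at a 45 degree angle so that it lands at a target distance), the reward and the numbers are all hypothetical, and the search is a plain stochastic hill climb standing in for a proper reinforcement learning algorithm. The demonstrated release speed falls short of the target; exploration around it finds a better one.

```python
import random

GRAVITY = 9.81  # m/s^2


def landing_distance(speed):
    """Range of a projectile launched at 45 degrees from ground level."""
    return speed ** 2 / GRAVITY


def reward(speed, target):
    """Known reward: negative distance between landing point and target."""
    return -abs(landing_distance(speed) - target)


def improve_by_exploration(initial_speed, target, n_trials, noise, seed):
    """Stochastic hill climbing that starts from the demonstrated parameter."""
    rng = random.Random(seed)
    best_speed = initial_speed
    best_reward = reward(best_speed, target)
    for _ in range(n_trials):
        candidate = best_speed + rng.gauss(0.0, noise)
        candidate_reward = reward(candidate, target)
        if candidate_reward > best_reward:  # keep only improvements
            best_speed, best_reward = candidate, candidate_reward
    return best_speed, best_reward


DEMO_SPEED = 3.0  # hypothetical, imperfect human demonstration (m/s)
TARGET = 1.5      # desired landing distance (m)

learned_speed, learned_reward = improve_by_exploration(
    DEMO_SPEED, TARGET, n_trials=200, noise=0.1, seed=0
)
```

Because the search begins near a sensible solution, small perturbations suffice, whereas exploration from an arbitrary starting point would need to cover a much larger range of speeds.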

Demonstrations can be used in different ways to bootstrap RL. They may be used as initial roll-outs from which an initial estimate of the policy is computed (Kober and Peters, 2010; Kormushev et al, 2010; Jetchev and Toussaint 2013), or to generate an initial set of primitives (Bentivegna et al. 2004; Kormushev et al 2010; Mülling et al., 2013). In the latter case, RL is then used to learn how to select across these primitives. Demonstrations can also be used to limit the search space covered by RL (Peters, Vijayakumar & Schaal, 2003; Guenter et al. 2007), or to estimate the reward function (Ziebart et al., 2008; Abbeel et al. 2010). Finally, RL and imitation learning can be used in conjunction at run time, by letting the demonstrator take over part of the control during one trial (Ross et al. 2011).

Figure 12 and Figure 13 show two examples of techniques that use Reinforcement Learning in conjunction with LfD - PbD to improve the robot's performance beyond that of a demonstrator, with respect to a known reward function. See also Reward-based LfD examples.

Figure 12: Teaching a robot how to play “ball in cup”. Start with a few correct human demonstrations and then let the robot learn the rest through trial and error (Kober and Peters, 2011); longer video at [3]

Figure 13: Teaching a robot how to flip pancakes (Kormushev et al 2010). Longer version of the video, see [4]

Figure 14: Learning from failure how to throw a ball in a basket using a catapult. Learning is initialized with two incorrect demonstrations. Learning proceeds through guided exploration around the demonstrations (Grollman and Billard, 2011). Longer version of the video, see [5]

Inverse Reinforcement Learning/Learning the Cost Function

Typically, works that combine imitation learning with reinforcement learning assume a known reward to guide the exploration. In contrast, Inverse Reinforcement Learning (IRL) offers a framework to automatically determine the reward and discover the optimal control policy (Abbeel and Ng 2004). When using human demonstrations to guide learning, IRL jointly solves the What to imitate and How to imitate problems, see examples of Inverse Reinforcement Learning. While the original approach assumes a Markov world (i.e. discrete state action space), alternative approaches derive a cost function in a continuous space (Ratliff et al 2006, 2009), and include extensions of IRL for continuous state-action space (Howard et al. 2013). Note that these works are closely related to inverse optimal control, a large area of research in control theory.
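
A central quantity in feature-based IRL is the discounted feature expectation of a set of trajectories. The sketch below computes it for an expert and for a learner, and takes the normalized difference as a first guess of linear reward weights. This is only one step, loosely in the spirit of feature matching; it is not a complete IRL algorithm, and the features, trajectories and discount factor are hypothetical.

```python
import math


def feature_expectations(trajectories, gamma):
    """Average over trajectories of the discounted sum of feature vectors."""
    n_features = len(trajectories[0][0])
    mu = [0.0] * n_features
    for trajectory in trajectories:
        for t, features in enumerate(trajectory):
            for i, value in enumerate(features):
                mu[i] += (gamma ** t) * value
    return [m / len(trajectories) for m in mu]


def reward_weights(mu_expert, mu_learner):
    """Unit vector pointing from the learner's feature expectations to the expert's."""
    diff = [e - l for e, l in zip(mu_expert, mu_learner)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff] if norm > 0.0 else diff


# Hypothetical binary features per time step: (near_goal, in_collision).
expert_trajectories = [[(0, 0), (1, 0), (1, 0)]]
learner_trajectories = [[(0, 0), (0, 1), (1, 0)]]

mu_expert = feature_expectations(expert_trajectories, gamma=0.9)
mu_learner = feature_expectations(learner_trajectories, gamma=0.9)
weights = reward_weights(mu_expert, mu_learner)
```

In this made-up case the resulting weights reward being near the goal and penalize collisions, features on which the expert and the learner differ. A full method would alternate between such a reward estimate and re-optimizing the learner's policy.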

Underlying all IRL works is the assumption of a consistent reward function. When demonstrations are provided by multiple experts, this assumes that all experts optimize the same objectives. This is constraining and does not exploit the variability of ways in which humans may solve the same task. Recent IRL works consider multiple experts and identify multiple different reward functions (Choi and Kim 2012; Tanwani and Billard 2013). This allows the robots to learn multiple (albeit suboptimal) ways to perform the same task. The hope is that this multiplicity of policies will make the controller more robust, offering alternative ways to complete the task, when the context no longer allows the robot to perform the task in the optimal way.

In all previous examples, there is a reliance on successful demonstrations of the desired task by the human. LfD - PbD techniques assume that all the demonstrations are good demonstrations, and researchers generally discard data that are a poor proxy of what would be deemed good behavior. Recent work has begun to investigate the possibility that demonstrations corresponding to failed attempts at performing the task may be useful for learning (Grollman and Billard, 2011; Ray et al. 2013). In this case, LfD - PbD expands to learn both what to imitate and what not to imitate. This work offers an interesting alternative to approaches that combine imitation learning and reinforcement learning, in that no reward needs to be explicitly determined, see Figure 14, see also Learning from Failure.

LfD - PbD and Human-Robot Interaction

As LfD - PbD necessarily deals both with humans and robots, it overlaps heavily with the field of Human Robot Interaction (HRI). In addition to the learning algorithms themselves, many human-centric issues are researched as part of LfD - PbD. Generally, the focus is on how to better elicit and utilize the demonstrations (see [Goodrich & Schultz 07, Fong et al 03, Breazeal & Scassellati 02] for surveys).

New lines of research seek to make the teaching/learning process more interactive. Robots can become more active partners by indicating which segments of the demonstrations were unclear, or what portions of the task are modeled poorly (Grollman & Jenkins 07; Shon et al. 2007). Teachers may then in turn refine the robot’s knowledge by providing complementary information. This supplementary information may consist of additional rounds of demonstrations of the complete task (Chernova and Veloso 2009; Thomaz and Breazeal 2008), or may be limited to subparts of the task (Argall et al 2010, Calinon and Billard 2007). The information can also be conveyed through task-specific features, such as a list of way-points (Silver et al. 2012; Akgun et al 2012). The robot is then left free to interpolate a trajectory using these keypoints.
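
As a minimal sketch of the last point, the function below turns a short list of taught way-points into a denser trajectory by linear interpolation. Linear interpolation is an assumption made here for brevity; the cited works may use smoother schemes, and the way-points are hypothetical.

```python
def interpolate_keyframes(keyframes, steps_per_segment):
    """Linearly interpolate between consecutive keyframes.
    Each keyframe is a tuple of coordinates; the result includes every keyframe."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    trajectory = []
    for start, end in zip(keyframes, keyframes[1:]):
        for k in range(steps_per_segment):
            alpha = k / steps_per_segment
            trajectory.append(tuple(s + alpha * (e - s) for s, e in zip(start, end)))
    trajectory.append(tuple(keyframes[-1]))
    return trajectory


# Hypothetical way-points provided by the teacher (x, y in metres).
keyframes = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
trajectory = interpolate_keyframes(keyframes, steps_per_segment=4)
```

The teacher specifies only three poses, and the robot fills in the intermediate points, which is what leaves it free to choose how to move between them.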

Ongoing work in this area focuses on techniques whereby the user and robot can work more closely together to improve the robot's policy. Areas of interest include endowing the robot with a sense of confidence in its abilities, so it can ask for help, and allowing the user to address particular subportions of the overall task, see examples of Interactive Learning.
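
One simple way to give a robot a sense of confidence is to measure how far the current state is from anything it has been shown. In the sketch below, the policy is a nearest-neighbour lookup over demonstrated state-action pairs, and the robot asks for help when the nearest demonstrated state is farther than a threshold. The states, actions and threshold are hypothetical, and published confidence-based methods use more principled confidence estimates.

```python
import math


def nearest_demonstration(state, demonstrations):
    """Return the (state, action) pair whose state is closest to `state`."""
    return min(demonstrations, key=lambda pair: math.dist(state, pair[0]))


def act_or_ask(state, demonstrations, distance_threshold):
    """Return the demonstrated action if the robot is confident, else a help request."""
    demo_state, demo_action = nearest_demonstration(state, demonstrations)
    if math.dist(state, demo_state) > distance_threshold:
        return "ask_for_demonstration"
    return demo_action


# Hypothetical demonstrated (state, action) pairs.
demonstrations = [
    ((0.0, 0.0), "grasp"),
    ((1.0, 0.0), "pour"),
]

familiar = act_or_ask((0.1, 0.0), demonstrations, distance_threshold=0.3)
unfamiliar = act_or_ask((0.5, 2.0), demonstrations, distance_threshold=0.3)
```

Any demonstration given in response can simply be appended to the list, so the region in which the robot acts autonomously grows with each interaction.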

However, the design of such incremental teaching methods implies a need for machine learning techniques that enable the incorporation of new data in a robust, and generally speedy, manner (Silver et al. 2012) and deal with ambiguous data. It also opens the door to the design of other human-robot interfacing systems, including the use of speech to permit informed dialogues between humans and robots (Akgun et al 2012; Rybski et al. 2007), as seen in Figure 15. Here the robot asks for help during or after teaching, verifying that its understanding of the task is correct (Cakmak and Thomaz 2012). Any changes provoked by these questions are immediately put into effect.

Figure 15: Active learning on the part of the robot. When unsure as to what to do, the robot can request additional guidance from the human (Cakmak and Thomaz 2012), see [6].

Limitations and Open Questions

Research in LfD-PbD is progressing rapidly, pushing back limits and posing new questions all the time. As such, any list of limitations and open questions is bound to be incomplete and out of date. However, there are a few long-standing limitations and open questions that bear further attention.

Generally, work in LfD-PbD assumes a fixed, given form for the robot's control policy, and learns appropriate parameters. To date, there are several different forms of policies in common usage, and there is no clear correct (or dominant) technique. Furthermore, it is possible that a system could be provided with multiple possible representations of controllers and select which is most appropriate.

The combination of reinforcement learning and imitation learning has been shown to be effective in addressing the acquisition of skills that require fine tuning of the robot's dynamics. Likewise, more interactive learning techniques have proven successful in allowing for collaborative improvement of the learnt policy by switching between human-guided and robot-initiated learning. However, there do not yet exist protocols to determine when it is best to switch between the various learning modes available. The answer may in fact be task-dependent.

In work to date, teaching is usually done by a single teacher, or teachers with an explicit concept of the task to teach. More work needs to be done to address issues related to conflicting demonstrations across teachers with different styles. Similarly, teachers are usually human beings, but could instead be an arbitrary expert agent. This agent could be a more knowledgeable robot or a computer simulation. Early work in this direction was done in the 1990s (Hayes and Demiris 1994; Gaussier et al. 1998).

Experiments in LfD - PbD have mostly focused on a single task (or set of closely related tasks) and each experiment starts with a tabula rasa. As learning of complex tasks progresses, means to store and reuse prior knowledge at a large scale will have to be devised. Learning stages, akin perhaps to those found in child development, may be required. There will need to be a formalism to allow the robot to select information, reduce redundant information, select features, and efficiently store new data.

References

Abbeel, P. & Ng, A. (2004), Apprenticeship Learning via Inverse Reinforcement Learning, International Conference on Machine Learning, ICML04.
Abbeel, P., Coates, A. and Ng, A. (2010), Autonomous Helicopter Aerobatics through Apprenticeship Learning, The International Journal of Robotics Research, 29(13), p. 1608-1639.
Akgun, B., Cakmak, M., Jiang, K. and Thomaz, A.L. (2012). Keyframe-based learning from demonstration. International Journal of Social Robotics.
Argall (2010). Tactile Guidance for Policy Adaptation. Foundations and Trends in Robotics. 1(2): 79-133. doi:10.1561/2300000012.
Argall, B, Chernova, S., Veloso, M. and Browning, B. (2010). A Survey of Robot Learning from Demonstration. Robotics and Autonomous Systems, 57:5, p. 469–483.
Asfour, T. et al. (2008). Toward humanoid manipulation in human-centred environments. Robotics and Autonomous Systems. 56(1): 54-65.
Babic J, Hale JG, Oztop E (2011) Human sensorimotor learning for humanoid robot skill synthesis. Adaptive Behavior. vol. 19 no. 4, p. 250-263.
Bentivegna, D., Atkeson, C. and Cheng, G. (2004). Learning tasks from observation and practice, Robotics and Autonomous Systems. Vol. 47:2-3, P. 163–169.
Billard, A, Calinon, S, and Dillmann, R. (2013). Learning from Human Demonstration. Handbook of Robotics: MIT Press.
Billard, Aude G.; Calinon, Sylvain and Guenter, Florent (2006). Discriminative and adaptive imitation in uni-manual and bi-manual tasks. Robotics and Autonomous Systems. 54(5): 370-384.
Billard, A. (2002). Imitation. Handbook of Brain Theory and Neural Networks: MIT Press.
Billard, A. (2000) Learning motor skills by imitation: a biologically inspired robotic model. Cybernetics & Systems, 32, 1-2, 155-193
Breazeal (2002). Robots that imitate humans. Trends in Cognitive Sciences. 6(11): 481-487. doi:10.1016/s1364-6613(02)02016-8.
Byrne, Richard W. (2002). Imitation of novel complex actions: What does the evidence from animals mean? In Advances in the Study of Behavior, Elsevier.
Call, J., & Carpenter, M. (2002). Three sources of information in social learning. Kerstin Dautenhahn and Chrystopher L. Nehaniv (Eds.) In Imitation in animals and artifacts. Cambridge, MA: MIT Press, 211–218.
Cakmak and Thomaz (2012), Designing Robot Learners that Ask Good Questions, Proceedings of the ACM-IEEE Int. Conf. on Human-Robot Interaction.
Calinon, S. and Billard, A. (2007) Active Teaching in Robot Programming by Demonstration. in Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Jeju, Korea, pp 702-707.
Calinon, S., Guenter, F. and Billard, A. (2007) On Learning, Representing and Generalizing a Task in a Humanoid Robot. IEEE Transactions on Systems, Man and Cybernetics, 37:2. Part B. Special issue on robot learning by observation, demonstration and imitation.
Calinon, S., Evrard, P., Gribovskaya, E., Billard, A. and Kheddar, A. (2009) Learning collaborative manipulation tasks by demonstration using a haptic interface. Proceedings of the International Conference on Advanced Robotics (ICAR), 2009.
Carpenter, Malinda; Call, Josep and Tomasello, Michael (2002). Understanding "Prior Intentions" Enables Two-Year-Olds to Imitatively Learn a Complex Task. Child Development. 73(5): 1431-1441.
Chernova, S. and Veloso, M. (2009), Interactive Policy Learning through Confidence-Based Autonomy. Journal of Artificial Intelligence Research. Vol. 34.
Choi, J. and Kim, K. (2012), Nonparametric Bayesian inverse reinforcement learning for multiple reward functions, Advances in Neural Information Processing Systems 25, NIPS'12.
Daniel, C., Neumann, G. and Peters, J. (2012), Learning concurrent motor skills in versatile solution spaces, In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'2012), p. 3591-3597.
Dautenhahn, K. and Nehaniv, C. (2002). Imitation in Animals and Artifacts, MIT Press.
Dillmann, R. (2004), Teaching and learning of robot tasks via observation of human performance, Robotics and Autonomous Systems, Volume 47, Issues 2-3, Pages 109-116.
Evrard, P., Gribovskaya, E., Calinon, S., Billard, A. and Kheddar, A. (2009) Teaching Physical Collaborative Tasks: Object-Lifting Case Study with a Humanoid. Proceedings of IEEE International Conference on Humanoid Robots, 2009.
Fong, T., Nourbakhsh, I. and Dautenhahn, K. (2003), A survey of socially interactive robots, Robotics and Autonomous Systems, Volume 42, Issues 3–4, P. 143–166.
Gaussier, P. et al. (1998), From perception–action loops to imitation processes: A bottom-up approach of learning by imitation, Applied Artificial Intelligence Journal, 12(7–8).
Gergely, G, Bekkering, H. and Király, I. (2002), Rational imitation in preverbal infants, Nature, 415(6873) p. 755-755.
Gioioso, G., Salvietti, G., Malvezzi, M. and Prattichizzo, D. (2012), An Object-Based Approach to Map Human Hand Synergies onto Robotic Hands with Dissimilar Kinematics, in the Proceedings of Robotics: Science and Systems (RSS).
Goodrich, M and Schultz, A (2007), Human-robot interaction: a survey, Foundations and Trends in Human-Computer Interaction, Vol 1, issue 3.
Guenter, F., Hersch, M., Calinon, S. and Billard, A. (2007) Reinforcement Learning for Imitating Constrained Reaching Movements. RSJ Advanced Robotics, Vol. 21, No. 13, pp. 1521-1544.
Grollman, D.H and Billard, A. (2011) Donut as I do: Learning from Failed Demonstrations. In Proceedings of IEEE International Conference on Robotics and Automation.
Grollman, D and Jenkins, O.C (2010), Incremental learning of subtasks from unsegmented demonstration, In International Conference on Intelligent Robots and Systems, Taipei, Taiwan, October 2010.
Grollman, D and Jenkins, O.C (2007), Dogged Learning for Robots, In Proceedings of the IEEE International Conference on Robotics and Automation, Roma, Italy, 10-14 April 2007.
Hayes, G. and Demiris, Y. (1994), A Robot Controller Using Learning by Imitation, The 2nd International Symposium on Intelligent Robotic Systems.
Ito, M. (2004). On-line Imitative Interaction with a Humanoid Robot Using a Dynamic Neural Network Model of a Mirror System. Adaptive Behavior. 12(2): 93-115. doi:10.1177/105971230401200202.
Jetchev, N. and Toussaint, M. (2013), Fast Motion Planning from Experience: Trajectory Prediction for Speeding up Movement Generation. Autonomous Robots. Vol.34:1-2, p. 111-127.
Kim, S., Kim, C., You, B. and Oh, S (2009), Stable Whole-body Motion Generation for Humanoid robots to Imitate Human Motions. Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS).
Kober, J.; Peters, J. (2011). Policy Search for Motor Primitives in Robotics, Machine Learning, 84, 1-2, pp.171-203.
Kormushev, P., Calinon, S. and Caldwell, D. (2011), Imitation Learning of Positional and Force Skills Demonstrated via Kinesthetic Teaching and Haptic Input, Advanced Robotics, pp. 1–20.
Kormushev, P., Calinon, S. and Caldwell, D. (2010), Robot motor skill coordination with EM-based reinforcement learning, In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Kronander, K. and Billard, A. (2012), Online Learning of Varying Stiffness Through Physical Human-Robot Interaction, IEEE Int. Conf. on Robotics and Automation (ICRA).
Kruger, Volker; Herzog, Dennis; Baby, Sanmohan; Ude, Aleš and Kragic, Danica (2010). Learning Actions from Observations. IEEE Robotics & Automation Magazine. 17(2): 30-43. doi:10.1109/mra.2010.936961.
Kulic, D.; Ott, C.; Lee, D.; Ishikawa, J. and Nakamura, Y. (2012). Incremental learning of full body motion primitives and their sequencing through human motion observation. The International Journal of Robotics Research. 31(3): 330-345. doi:10.1177/0278364911426178.
Kulic, D.; Takano, W. and Nakamura, Y. (2008). Incremental Learning, Clustering and Hierarchy Formation of Whole Body Motion Patterns using Adaptive Hidden Markov Chains. The International Journal of Robotics Research. 27(7): 761-784.
Lee, D. and Ott, C. (2011), Incremental Kinesthetic Teaching of Motion Primitives Using the Motion Refinement Tube, Autonomous Robots, 31, no. 2, 115-131.
Lopes, M, Melo, F.S, Kenward, B. and Santos-Victor, J. (2009), A Computational Model of Social-Learning Mechanisms, Adaptive Behaviour, 467(17).
Lockerd, A. and Breazeal, C. (2004), Tutelage and Socially Guided Robot Learning, IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS.
Mangin, O and Oudeyer, P-Y (2011), Unsupervised learning of simultaneous motor primitives through imitation, IEEE Int. Conf. on Developmental Learning.
Mülling, K., Kober, J., Krömer, O., Peters, J. (2013). Learning to Select and Generalize Striking Movements in Robot Table Tennis, International Journal of Robotics Research, 32(3), pp. 280–298.
Nehaniv, C.L. (2007), Nine Billion Correspondence Problems. In C. L. Nehaniv & K. Dautenhahn (Eds.), Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, Cambridge University Press, 2007.
Nehaniv, C.L. & Dautenhahn, K. (2011), Like Me? - Measures of Correspondence and Imitation, Cybernetics and Systems, pp. 11-51.
Nakanishi, J.; Morimoto, J.; Endo, G.; Cheng, G.; Schaal, S.; Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion, Robotics and Autonomous Systems, 47, 2-3, pp.79-91.
Nicolescu, M. N and Matarić, M.J (2003), Methods for robot task learning: Demonstrations, generalization and practice, in: Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS'03.
Oztop, Erhan; Kawato, Mitsuo and Arbib, Michael (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks. 19(3): 254-271. doi:10.1016/j.neunet.2006.02.002.
Peternel, L. and Babic, J. (2013), Humanoid Robot Posture-Control Learning in Real-Time Based on Human Sensorimotor Learning Ability, IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6-10, 2013.
Peters, J, Vijayakumar, S. and Schaal, S. (2003), Reinforcement Learning for Humanoid Robotics, In Proceedings of the IEEE International Conference on Humanoid Robots.
Ratliff, N, Bagnell, A. J. and Zinkevich, M. (2006), Maximum Margin Planning, International Conference on Machine Learning, ICML'06.
Ratliff, N, Ziebart, B., Peterson, K., Bagnell, J.B and Hebert, H. (2009), Inverse Optimal Heuristic Control for Imitation Learning, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics.
Rai, Akshara, de Chambrier, Guillaume and Billard, A. (2013) Learning from Failed Demonstrations in Unreliable Systems. IEEE-RAS International Conference on Humanoid Robots.
Ross, S, Gordon, G. and Bagnell, J.A. (2011), A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS11).
Rozo, L., Jimenez, P. and Torras, C. (2011), Robot Learning from Demonstration of Force-based Tasks with Multiple Solution Trajectories, in 15th International Conference on Advanced Robotics, ICAR'11.
Rybski, P., Yoon, K., Stolarz, J. and Veloso, M. (2007), Interactive robot task training through dialog and demonstration. IEEE-ACM International Conference on Human-Robot Interaction, HRI'07.
Sauser, E., Argall, B. D., Metta, G. and Billard, A. (2011) Iterative Learning of Grasp Adaptation through Human Corrections. Robotics and Autonomous Systems, Volume 60, Issue 1, January 2012, Pages 55–71.
Schaal, S., Ijspeert, A. and Billard, A. (2003). Computational approaches to motor learning by imitation, Philosophical Transactions: Biological Sciences (The Royal Society).
Silver, D, Bagnell, A. and Stentz, A, (2012), Active Learning from Demonstration for Robust Autonomous Navigation, IEEE Conference on Robotics and Automation, ICRA'12.
Shon, A. and Grochow, K. and Hertzmann, A. and Rao, R. (2006) Learning Shared Latent Structure for Image Synthesis and Robotic Imitation. Advances in Neural Information Processing Systems (NIPS), p.1233-1240.
Shon, A. P., Verma, D. and Rao, R.P.N. (2007), Active imitation learning, The 22nd Conference on Artificial Intelligence, AAAI'07.
Skoglund, A., Iliev, B., Kadmiry, B. and Palm, R. (2007), Programming by Demonstration of Pick-and-Place Tasks for Industrial Manipulators using Task Primitives. International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007.
Tanwani, A. K. and Billard, A. (2013) Transfer in Inverse Reinforcement Learning for Multiple Strategies. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
Thomaz (2008). Teachable robots: Understanding human teaching behavior to build more effective robot learners. Artificial Intelligence. 172(6-7): 716-737.
Tomasello, Michael; Kruger, Ann Cale and Ratner, Hilary Horn (1993). Cultural learning. Behavioral and Brain Sciences. 16(03): 495. doi:10.1017/s0140525x0003123x.
Ude, A, Atkeson, C.G and Riley, M (2004), Programming full-body movements for humanoid robots by observation, Robotics and Autonomous Systems, 47:2–3, p. 93–108.
Wu, Y and Demiris, Y (2010), Towards One Shot Learning by imitation for humanoid robots, IEEE-RAS Int. Conf. on Robotics and Automation (ICRA).
Ziebart, B. D., Maas, A., Bagnell, A. and Dey, A.K. (2008), Maximum Entropy Inverse Reinforcement Learning, Proceedings of the AAAI Conference on Artificial Intelligence, 2008.
