Robot learning by demonstration
Robot Learning from Demonstration (RLfD) or Robot Programming by Demonstration (RPbD) (also known as imitation learning and apprenticeship learning) is a paradigm for enabling robots to autonomously perform new tasks. Rather than requiring users to analytically decompose and manually program a desired behavior, work in RLfD/RPbD takes the view that an appropriate robot controller can be derived from observations of a human performing the task. The aim is for robot capabilities to be more easily extended and adapted to novel situations, even by users without programming ability.
Consider a household robot capable of performing manipulation tasks. One task that an end-user may desire the robot to perform is to prepare a meal, such as orange juice for breakfast, see Figure 1. Doing so may involve multiple subtasks, such as juicing the orange, throwing the rest of the orange into the trash and pouring the liquid into a cup. Further, every time this meal is prepared, the robot will need to adapt its motion to changes in the location and type of the objects manipulated (cup, juicer).
In a traditional programming scenario, a human programmer would have to code a robot controller that is capable of responding to any situation the robot may face. The overall task may need to be broken down into tens or hundreds of smaller steps, each one tested for robustness before the robot leaves the factory. If and when failures occur in the field, highly-skilled technicians must be dispatched to update the system for the new circumstances. Instead, RLfD/RPbD allows the end-user to 'program' the robot simply by showing it how to perform the task - no coding required. Then, when failures occur, the end-user need only provide more demonstrations, rather than calling for professional help. RLfD/RPbD hence seeks to endow robots with the ability to learn what it means to perform a task by generalizing from observing several demonstrations, see Figure 1. RLfD/RPbD is NOT a record-and-play technique. Learning, and hence generalizing, is core to RLfD/RPbD.
Robot Learning from Demonstration started in the 1980s. Then, and still to a large extent now, robots had to be explicitly and tediously hand programmed for each task they had to perform. RLfD/RPbD sought to minimize, or even eliminate, this difficult step.
The promises of RLfD/RPbD are thus multiple. On the one hand, one hopes that it will make learning faster, in contrast to tedious trial-and-error learning, particularly in high-dimensional spaces (hence addressing part of the well-known curse of dimensionality). On the other hand, one expects that the methods, being user-friendly, will allow robots to be used to a greater extent in day-to-day interactions with non-specialist humans.
Research on RLfD/RPbD has grown steadily in importance since the 1980s. It has become a central topic of robotics that spans general research areas such as human-robot interaction, machine learning, machine vision and motor control, and entire sessions and tutorials are devoted to RLfD/RPbD at all major robotics conferences.
Next, we outline key questions in RLfD/RPbD and contemporary approaches to solving them, which we illustrate with various short movies. A list of recent surveys of the field is provided at the bottom of this article.
Key Issues in Programming by Demonstration / Learning from Demonstration
Nehaniv & Dautenhahn (2001) phrased the problems faced by RLfD/RPbD as a set of key questions: "What to imitate? How to imitate? When to imitate? Whom to imitate?". To date, only the first two questions have really been addressed in RLfD/RPbD.
What to imitate and Evaluation Metric
What to imitate relates to the problem of determining which aspects of the demonstration should be imitated. For a given task, certain observable or affectable properties may be irrelevant and safely ignored.
For instance, if the demonstrator always approaches a location from the north, is it necessary for the robot to do the same? Answering this question strongly influences whether or not a derived robot controller is a successful imitation - a robot that approaches from the south is appropriately trained if direction is unimportant, but needs further education if it is. This issue is related to questions of signal versus noise, and is answered by determining the metric by which the resulting behavior is evaluated. Figure 3 shows an example in which color, while perceptible to the robots, is ignored in this ordering task. This issue can be addressed in different ways. The simplest approach is to take a statistical perspective and deem as relevant the parts (dimensions, regions of input space) of the data that are consistently measured across all demonstration instances. If the dimension of the data is too high, such an approach may require too many demonstrations to gather sufficient statistics. An alternative is then to have the teacher help the robot determine what is relevant, by pointing out the parts of the task that are most important.
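As an illustration of this statistical perspective, the following minimal sketch (with synthetic data and an assumed inverse-variance weighting, both illustrative) deems a dimension relevant when it is reproduced consistently across demonstrations:

```python
import numpy as np

# Hypothetical data: N demonstrations resampled to T time steps, each
# observed in D dimensions (e.g. x, y, z position, object color, ...).
# demos has shape (N, T, D).
def relevance_weights(demos, eps=1e-6):
    """Weight each dimension by its consistency across demonstrations.

    Dimensions that vary little between demonstrations at each time step
    are treated as task-relevant; highly variable ones as ignorable.
    """
    var_across_demos = demos.var(axis=0)       # (T, D) variance at each step
    mean_var = var_across_demos.mean(axis=0)   # (D,) averaged over time
    weights = 1.0 / (mean_var + eps)           # low variance -> high weight
    return weights / weights.sum()             # normalize to sum to 1

# Example: 5 demos, 100 steps, 3 dims; dim 2 is random, hence irrelevant.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, np.pi, 100))
demos = np.stack([
    np.column_stack([base + 0.01 * rng.standard_normal(100),     # consistent
                     base**2 + 0.01 * rng.standard_normal(100),  # consistent
                     rng.uniform(-1, 1, 100)])                   # inconsistent
    for _ in range(5)])
print(relevance_weights(demos))  # first two dimensions dominate
```

Such weights can then enter the imitation metric directly; richer schemes additionally weight individual time steps and correlations between dimensions.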
What to imitate removes from consideration details that, while perceptible/performable, do not matter for the task. It thus contributes to determining the metric by which the robot's reproduction can be measured.
How to Imitate and the Notion of Correspondence
How to imitate consists of determining how the robot will perform those parts of the demonstration that should be imitated. For example, if the demonstrator uses a foot to move an object, is it acceptable for a wheeled robot to bump it, or should it use a gripper instead? Given that robots and humans may have different embodiments, this issue is closely related to the "Correspondence Problem" (Nehaniv 2007). The evaluation metric addresses this issue in conjunction with the task-mapping projectors.
Robots and humans, while inhabiting the same space, interacting with the same objects, and perhaps even being superficially similar, still perceive and interact with the world in fundamentally different ways. To evaluate the similarity between the human and robot behaviors, we must first deal with the fact that the human and the robot may occupy different state spaces, possibly of different dimensions. We identify two ways in which states of demonstrator and imitator can be said to correspond, and give brief examples:
1: Perceptual equivalence: Due to differences between human and robot sensory capabilities, the same scene may appear very different to each. For instance, while a human may identify people and gestures from light, a robot may use depth measurements to observe the same scene, see Figure 4. Hence, teaching robots requires a good understanding of the physics of the robot's sensors. Often, teachers will design graphical interfaces that allow them to visualize this information in an intuitive (for the human) way.
Tactile sensing is another area where robotic and human sensors differ significantly. Most tactile sensors allow robots to perceive contact, but offer no information about temperature, in contrast to human skin. Moreover, the low resolution of robots' tactile sensors does not allow them to discriminate among the wide variety of existing textures, whereas human skin can. RLfD/RPbD explores the limits of these perceptual equivalences.
2: Physical equivalence: Due to differences between human and robot embodiments, humans and robots may perform different actions to accomplish the same physical effect. For instance, even when performing the same task (football), humans and robots may interact with the environment in different ways, see Figure 5. The humans run and kick, while the robots roll and bump.
Resolving this discrepancy in motor capabilities is akin to solving the How to imitate problem, and RLfD/RPbD develops ways to do so. Typically, the robot may compute a path (in Cartesian space) for its end-effector that is close to the path followed by the human's hand, while relying on inverse kinematics to find the appropriate joint displacements. In the football example above, this would require the robot to determine a path for its center of mass that corresponds to the path followed by the human's right foot when projected on the ground. Clearly, this equivalence is very task-dependent. Recent solutions to this problem for hand motion and body motion can be found in (Gioioso et al. 2012; Shon et al. 2006).
We can think of the perceptual equivalence as dealing with the manner in which the agents perceive the world, ensuring that the information necessary to perform the task is available to both. Physical equivalence deals with the manner in which agents affect and interact with the world, ensuring that the task is actually performable by both.
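The sketch below illustrates this typical pipeline on an assumed planar two-link arm (link lengths, gains and the demonstrated path are all illustrative): a demonstrated Cartesian hand path is tracked by the robot's end-effector using damped least-squares inverse kinematics.

```python
import numpy as np

# Minimal correspondence sketch: follow a demonstrated Cartesian hand
# path with the end-effector of an assumed planar 2-link arm, using
# damped least-squares inverse kinematics.
L1, L2 = 1.0, 0.8  # assumed link lengths

def fk(q):
    """Forward kinematics: joint angles -> end-effector position."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0]+q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0]+q[1])])

def jacobian(q):
    return np.array([[-L1*np.sin(q[0]) - L2*np.sin(q[0]+q[1]), -L2*np.sin(q[0]+q[1])],
                     [ L1*np.cos(q[0]) + L2*np.cos(q[0]+q[1]),  L2*np.cos(q[0]+q[1])]])

def follow_path(path, q0, damping=0.1, steps_per_point=20):
    """Track each demonstrated Cartesian point with damped least-squares IK."""
    q, joint_traj = np.array(q0, float), []
    for target in path:
        for _ in range(steps_per_point):
            err = target - fk(q)
            J = jacobian(q)
            # Damped pseudo-inverse keeps the update stable near singularities.
            dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
            q = q + dq
        joint_traj.append(q.copy())
    return np.array(joint_traj)

# Demonstrated hand path (e.g. from a motion tracker), scaled to lie in
# the robot's workspace.
t = np.linspace(0, np.pi, 50)
hand_path = np.column_stack([1.2 + 0.3*np.cos(t), 0.4 + 0.3*np.sin(t)])
print(follow_path(hand_path, q0=[0.5, 0.5])[-1])  # final joint configuration
```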
Interfaces for Demonstration
The interface used to provide demonstrations plays a key role in how information is gathered and transmitted. We can distinguish three major trends:
A) One may directly record human motions. If one is interested solely in the kinematics of the motion, one may use any of the various existing motion tracking systems, whether based on vision, exoskeletons or other types of wearable motion sensors. Figure 6 shows an example of full-body motion tracking during walking using vision. The motion of the human body is first extracted from the background using a model of the human body. This model is subsequently mapped to an avatar and then to the humanoid robot DB at ATR, Kyoto, Japan.
These external means of tracking human motion return precise measurements of the angular displacement of the joints. They have been used in various works for RLfD/RPbD of full-body motion [Kulic et al. 2008; Ude et al. 2004; Kim et al. 2009]. These methods are advantageous in that they allow the human to move freely. They do, however, require solutions to the correspondence problem, i.e. the problem of how to transfer motion from human to robot when the two differ in the kinematics and dynamics of their bodies. This is typically addressed by mapping the visually tracked joint motions to a model of the human body that closely matches that of the robot. Such a mapping is particularly difficult to perform when the walking machine (e.g. a hexapod) differs substantially from the human body. This problem of mapping actions across two dissimilar bodies was evoked earlier and is precisely the correspondence problem.
B) Second, there are techniques such as kinesthetic teaching, where the robot is physically guided through the task by the human. This approach simplifies the correspondence problem by letting the user demonstrate the skill in the robot's environment with the robot's own capabilities. It also provides a natural teaching interface for correcting a skill reproduced by the robot. Recent advances in skin technology offer the possibility to teach robots how to exploit touch contact on objects (see the figure on the right-hand side of this part of the text). By exploiting the compliance of the iCub robot's fingers, the teacher can show the robot how to adapt the posture of the fingers in response to changes in tactile sensing at the fingertips (Sauser et al. 2011). Another example of kinesthetic teaching is shown in Figure 1.
One main drawback of kinesthetic teaching is that the human must often use more degrees of freedom to move the robot than the number of degrees of freedom being moved on the robot. This is visible in the example in the figure on the right-hand side: to move the fingers of one hand of the robot, the teacher must use both of his own hands. Similarly, when moving the arm of the robot in Figure 1, the teacher needs to use his two arms. This limits the type of tasks that can be taught through kinesthetic teaching. Typically, tasks that require moving the two hands simultaneously cannot be taught this way. One could proceed incrementally, teaching first the motion of the right hand and then, while the robot replays the motion with its right hand, teaching the motion of the left hand. This may, however, prove cumbersome. The external trackers reviewed in A above are more amenable to teaching coordinated motions between several limbs.
C) Lastly, there are immersive teleoperation scenarios, where a human operator is limited to using the robot's own sensors and effectors to perform the task. Teleoperation may be done using simple joysticks or other remote control devices, including haptic devices. The latter have the advantage that they allow the teacher to teach tasks requiring precise control of forces, whereas joysticks provide only kinematic information (position, speed).
Teleoperation is advantageous over external motion tracking systems in that it entirely solves the correspondence problem, as the system records perception and action directly from the robot's standpoint. It is advantageous over kinesthetic training in that it allows one to train robots from a distance and is hence particularly suited for teaching navigation and locomotion patterns; the teacher no longer needs to be in the same room as the robot. For instance, in [Coates et al. 2008], acrobatic trajectories of a helicopter are learned by recording the motion of the helicopter when teleoperated by an expert pilot. In [Grollman & Jenkins 07], a robot dog is taught to play soccer by a human guiding it via a joystick.
The disadvantage is that the teacher often needs training to learn to use the remote control device. Teleoperation using a simple joystick allows one to guide only a subset of the degrees of freedom. To control all degrees of freedom, complex exoskeleton-type devices must be used, which are cumbersome.
Each teaching interface has its pros and cons. It is hence interesting to investigate how these interfaces could be used in conjunction, to exploit the complementary information provided by each modality, see, e.g., (Sauser et al. 2011).
Ways to Solve RLfD/RPbD
Current approaches to encoding skills through RLfD/RPbD can be broadly divided into two trends: a low-level representation of the skill, taking the form of a non-linear mapping between sensory and motor information, and a high-level representation of the skill that decomposes it into a sequence of action-perception units.
While the majority of work in RLfD/RPbD uses solely the demonstrations for learning, a growing body of work looks at ways in which RLfD/RPbD can be combined with other learning techniques. One group of works investigates how to combine imitation learning with reinforcement learning, a method by which the robot learns through trial and error so as to maximize a reward. Other works take inspiration from the way humans teach each other and introduce interactive, bidirectional teaching scenarios whereby the robot becomes an active partner during teaching. We briefly review the main principle underlying each of these areas below:
Learning individual motions
Individual motions/actions (e.g., juicing the orange, trashing it, and pouring the liquid into the cup in the example shown in Figure 1) can be taught separately rather than all at once. The human teacher then provides one or more examples of each sub-motion. If learning proceeds from the observation of a single instance of the motion/action, this is called one-shot learning (Wu and Demiris 2010). Examples can be found in (Nakanishi et al. 2004) for learning locomotion patterns. To ensure that this is not akin to simple record and play, the controller is provided with prior knowledge in the form of primitive motion patterns. Learning then consists of instantiating the parameters modulating these motion patterns.
Teaching can also proceed in batch, after recording several demonstrations, or incrementally, by recursively adding more information trial by trial (e.g., Lee and Ott 2011). When learning in batch mode, learning uses all examples and draws inferences by comparing the individual demonstrations. Inference usually consists of a statistical analysis, where the demonstration signals are modeled via a probability density function, exploiting various non-linear regression techniques stemming from machine learning. Popular methods include Gaussian Processes, Gaussian Mixture Models and Support Vector Machines; see this page for examples of these works.
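One common form of such a parameterized primitive motion pattern is the dynamic movement primitive. The sketch below is a simplified one-dimensional variant (gains, basis functions and the omission of the usual goal scaling are all simplifying assumptions): the modulating parameters are fitted from a single demonstration, after which the motion can be re-instantiated for a new goal.

```python
import numpy as np

# Sketch of one-shot learning with a (simplified) dynamic movement
# primitive: a stable point attractor modulated by a learned forcing term.
def learn_dmp(y_demo, dt, n_basis=20, alpha=25.0, beta=6.25, alpha_x=4.0):
    """Fit forcing-term weights from a single demonstrated trajectory."""
    T = len(y_demo)
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    g, y0 = y_demo[-1], y_demo[0]
    x = np.exp(-alpha_x * dt * np.arange(T))            # decaying phase variable
    # Forcing term that would reproduce the demonstration exactly.
    f_target = ydd - alpha * (beta * (g - y_demo) - yd)
    centers = np.exp(-alpha_x * np.linspace(0, 1, n_basis))
    widths = n_basis**1.5 / centers
    psi = np.exp(-widths * (x[:, None] - centers)**2)   # (T, n_basis) basis
    # Weighted regression of the forcing term on the phase.
    features = psi * x[:, None] / psi.sum(axis=1, keepdims=True)
    w = np.linalg.lstsq(features, f_target, rcond=None)[0]
    return w, (y0, g, alpha, beta, alpha_x, centers, widths)

def rollout(w, params, dt, T, new_goal=None):
    """Reproduce the primitive, optionally re-instantiated for a new goal."""
    y0, g, alpha, beta, alpha_x, centers, widths = params
    g = new_goal if new_goal is not None else g
    y, yd, x, traj = y0, 0.0, 1.0, []
    for _ in range(T):
        psi = np.exp(-widths * (x - centers)**2)
        f = (psi * x / psi.sum()) @ w       # forcing term vanishes as x -> 0
        ydd = alpha * (beta * (g - y) - yd) + f
        yd += ydd * dt
        y += yd * dt
        x += -alpha_x * x * dt
        traj.append(y)
    return np.array(traj)

# One demonstration of a reaching motion; replay it toward a new goal.
dt, T = 0.01, 200
demo = np.sin(np.linspace(0, np.pi/2, T))
w, params = learn_dmp(demo, dt)
print(rollout(w, params, dt, T, new_goal=1.5)[-1])  # ends near the new goal
```

Because the attractor dynamics guarantee convergence to the goal, the learned parameters modulate the shape of the motion rather than prescribing it point by point, which is what distinguishes this from record and play.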
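As a minimal sketch of such batch statistical learning, the following fits a Gaussian Mixture Model over (time, position) samples pooled from several noisy demonstrations, and retrieves a generalized trajectory through Gaussian Mixture Regression (the component count and the synthetic demonstrations are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Batch learning sketch: model several demonstrations as a joint density
# over (time, position), then regress position on time (GMR) to obtain a
# smooth reference trajectory for reproduction.
rng = np.random.default_rng(0)
T = 100
t = np.linspace(0, 1, T)
demos = [np.sin(2*np.pi*t) + 0.05*rng.standard_normal(T) for _ in range(5)]

# Stack all demonstrations as (time, position) samples.
data = np.column_stack([np.tile(t, 5), np.concatenate(demos)])

gmm = GaussianMixture(n_components=6, covariance_type='full',
                      random_state=0).fit(data)

def gmr(gmm, t_query):
    """Condition the joint GMM on time to predict the expected position."""
    preds = []
    for tq in t_query:
        h, mu = [], []
        for k in range(gmm.n_components):
            m, S = gmm.means_[k], gmm.covariances_[k]
            # Responsibility of component k for this time step (the
            # Gaussian normalization constant cancels after normalizing h).
            lik = np.exp(-0.5*(tq - m[0])**2 / S[0, 0]) / np.sqrt(S[0, 0])
            h.append(gmm.weights_[k] * lik)
            # Conditional mean of position given time for component k.
            mu.append(m[1] + S[1, 0] / S[0, 0] * (tq - m[0]))
        h = np.array(h) / np.sum(h)
        preds.append(h @ np.array(mu))
    return np.array(preds)

reference = gmr(gmm, t)   # generalized trajectory for reproduction
print(reference[:5])
```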
Teaching Force-Control Tasks
While most RLfD/RPbD work to date has focused on learning the kinematics of motions, by recording the position of the end-effector and/or the positions of the robot's joints, a few recent works have investigated the transmission of force-based signals through human demonstration (Calinon et al. 2009; Kormushev et al. 2011; Rozo et al. 2011). Transmitting information about force is difficult for humans and robots alike: force can be sensed only when performing the task oneself. Current efforts hence seek to develop methods by which one may "embody" the robot and, by so doing, allow human and robot to perceive simultaneously the forces applied when performing the task. An exciting new line of research therefore leverages recent advances in the design of haptic devices and tactile sensing, and in the development of torque-controlled and variable-impedance actuated systems, to teach force-control tasks through human demonstration.
An example of such work is shown in the figure on the right-hand side of the text. The Barrett WAM 7-DOF robot is taught how to adapt its stiffness by having the teacher shake the robot: the stiffness is decreased along the eigendirections of the perturbation, inversely proportionally to the corresponding eigenvalues (Kronander and Billard, 2012).
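A minimal sketch of this idea is given below, with assumed stiffness bounds and synthetic perturbation data: the covariance of the recorded perturbations is eigendecomposed, and the stiffness is lowered along the strongly perturbed directions.

```python
import numpy as np

# Sketch of stiffness adaptation from teacher perturbations: record the
# positional displacements the teacher applies while shaking the robot,
# and lower the stiffness along directions of large perturbation.
def adapt_stiffness(perturbations, k_max=500.0, k_min=50.0):
    """Map perturbation statistics to a Cartesian stiffness matrix.

    perturbations: (N, 3) array of end-effector displacements recorded
    while the teacher shakes the robot. Stiffness along each
    eigendirection of their covariance is set inversely to the
    corresponding eigenvalue (large perturbation -> compliant).
    """
    cov = np.cov(perturbations.T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Normalize eigenvalues, then interpolate between stiff and compliant.
    s = eigvals / eigvals.max()
    k_diag = k_min + (k_max - k_min) * (1.0 - s)
    # Rebuild the full stiffness matrix in the original frame.
    return eigvecs @ np.diag(k_diag) @ eigvecs.T

# Example: the teacher shakes the end-effector mostly along the x axis.
rng = np.random.default_rng(1)
shakes = rng.standard_normal((200, 3)) * np.array([0.05, 0.005, 0.005])
K = adapt_stiffness(shakes)
print(np.round(K, 1))  # compliant (low) along x, stiff along y and z
```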
Learning Compound Actions
Learning complex tasks, composed of a combination and juxtaposition of individual motions, is the ultimate goal of RLfD/RPbD.
There are two major ways to proceed in learning such complex tasks. One may first learn models of all the individual motions, using demonstrations of each of these actions in isolation. In a second stage, one may learn the right sequencing and combination of these actions by observing a human performing the whole task (Dillmann 2004; Skoglund et al. 2007). This approach, however, assumes that one can list all the necessary individual actions, the so-called primitive actions. To date, no database of such primitive actions exists, and one may wonder whether the variability of human motion can really be reduced to a finite list of possible motions.
The alternative is to watch the human perform the complete task and to automatically segment the task to extract the primitive actions (which may then become task-dependent), see e.g. (Kulic et al. 2012). This has the advantage of learning, in one pass, both the primitive actions and the way they should be combined.
The figure on the right-hand side of the text shows an example of such a complex task composed of compound actions, where the robot loads dirty dishes into a dishwasher, a typical kitchen-cleaning task. Learning proceeded according to the first approach highlighted above. Given a set of known (pre-programmed or previously learned) behaviors, such as pick up cup, move toward dishwasher, open dishwasher, etc., the robot must learn the correct sequence of actions to perform. In this example, the robot Armar is told to load a dishwasher; the complete action is produced by sequencing a set of primitive actions, several of which were previously taught through RLfD/RPbD (Asfour et al. 2008). Each skill was first learned separately, and sequences of these skills were either induced through human requests via speech processing or learned through observation of the task completed by a human demonstrator.
Other examples include learning the sequencing of known behaviors for navigation through imitation of a more knowledgeable robot or human (Demiris and Hayes, 2006; Gaussier et al. 1998; Nicolescu and Mataric 2003), and learning and sequencing primitive motions for full-body motion in humanoid robots (Billard, 2000; Ito and Tani 2004; Kulic et al. 2008).
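Many segmentation schemes exist; the sketch below illustrates one simple, commonly used heuristic (the thresholds and synthetic trajectory are assumptions): the demonstrated trajectory is cut where the end-effector speed drops near zero, so that each segment becomes a candidate primitive action.

```python
import numpy as np

# Minimal segmentation heuristic: cut a demonstrated trajectory at
# points where the end-effector speed drops near zero, treating each
# resulting segment as a candidate primitive action.
def segment_by_velocity(positions, dt, speed_thresh=0.05, min_len=10):
    """Return index ranges of candidate primitive segments."""
    vel = np.gradient(positions, dt, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    moving = speed > speed_thresh
    segments, start = [], None
    for i, m in enumerate(moving):
        if m and start is None:
            start = i                         # motion begins
        elif not m and start is not None:
            if i - start >= min_len:
                segments.append((start, i))   # motion ends: close segment
            start = None
    if start is not None and len(moving) - start >= min_len:
        segments.append((start, len(moving)))
    return segments

# Example: two reaching motions separated by a pause.
dt = 0.01
t = np.linspace(0, 1, 100)
x = t**2 * (3 - 2*t)                          # smooth monotonic reach
reach = np.column_stack([x, np.zeros(100)])
pause = np.tile(reach[-1], (50, 1))
traj = np.vstack([reach, pause, reach + reach[-1]])
print(segment_by_velocity(traj, dt))          # two segments around the pause
```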
Imitation Learning and Reinforcement Learning
Imitation learning is limited in that the robot learns only from what has been demonstrated. Reinforcement learning, in contrast, allows the robot to discover new control policies through free exploration of the state-action space. Approaches that combine imitation learning and reinforcement learning aim to exploit the strengths of both and overcome their respective drawbacks. Demonstrations are used to guide the exploration done in reinforcement learning, hence reducing the time needed to find an adequate control policy, while still allowing the robot to depart from the demonstrated behavior. Figure 8 and Figure 9 show two examples of techniques that use reinforcement learning in conjunction with RLfD/RPbD to improve the robot's performance beyond that of the demonstrator, with respect to a known reward function. See also Reward-based LfD examples.
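The sketch below illustrates this principle on a toy one-dimensional task (the task, linear policy class and quadratic reward are all assumptions): a policy is first fitted to demonstrated state-action pairs, then refined by local stochastic search on the known reward, so exploration starts near a working solution instead of from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_reward(w, episodes=20, horizon=50):
    """Assumed toy task: drive a 1-D state to zero; reward is negative cost."""
    total = 0.0
    for _ in range(episodes):
        s = rng.uniform(-1, 1)
        for _ in range(horizon):
            a = w[0]*s + w[1]
            s = s + 0.1*a
            total -= s**2 + 0.01*a**2
    return total / episodes

# Step 1: imitation -- least-squares fit to demonstrated (state, action)
# pairs, here produced by a decent but imperfect hypothetical teacher.
demo_s = rng.uniform(-1, 1, 100)
demo_a = -1.5*demo_s + 0.1*rng.standard_normal(100)
X = np.column_stack([demo_s, np.ones_like(demo_s)])
w = np.linalg.lstsq(X, demo_a, rcond=None)[0]

# Step 2: reinforcement -- stochastic hill climbing around the imitated
# policy, allowing the robot to depart from the demonstrated behavior.
best = rollout_reward(w)
for _ in range(200):
    cand = w + 0.05*rng.standard_normal(2)
    r = rollout_reward(cand)
    if r > best:
        w, best = cand, r
print(w, best)  # refined policy typically outperforms the raw imitation
```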
- Inverse Reinforcement Learning/Learning the Cost Function
While most works that combine imitation learning with reinforcement learning assume a known reward to guide exploration, Inverse Reinforcement Learning (IRL) offers a framework to automatically determine both the reward and the optimal control policy (Abbeel and Ng 2004). When using human demonstrations to guide learning, IRL jointly solves the What to imitate and How to imitate problems, see examples of Inverse Reinforcement Learning. Other approaches to estimating the reward or cost function automatically have been proposed, see, for instance, the maximum margin planning technique [Ratliff et al. 2006] and the automatic extraction of constraints [Billard et al. 2006].
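The toy sketch below conveys the spirit of such feature-expectation matching on an assumed five-state chain MDP; the one-hot features, the teacher and the crude reward update are illustrative stand-ins, not the actual projection algorithm of Abbeel and Ng.

```python
import numpy as np

n_states, gamma = 5, 0.9
phi = np.eye(n_states)                         # one-hot state features
idx = np.arange(n_states)
left, right = np.maximum(idx - 1, 0), np.minimum(idx + 1, n_states - 1)

def solve_mdp(w, iters=200):
    """Value iteration for the chain MDP under the reward w . phi(s)."""
    r, V = phi @ w, np.zeros(n_states)
    for _ in range(iters):
        V = r + gamma * np.maximum(V[left], V[right])
    return (V[right] > V[left]).astype(int)    # 1 = move right

def feature_expectations(policy, start=0, horizon=30):
    """Discounted feature counts accumulated when following a policy."""
    s, mu = start, np.zeros(n_states)
    for t in range(horizon):
        mu += gamma**t * phi[s]
        s = right[s] if policy[s] else left[s]
    return mu

# Demonstrations: the teacher always walks toward the rightmost state.
mu_expert = feature_expectations(np.ones(n_states, dtype=int))

# Iteratively reweight the reward toward the features the learner
# under-visits relative to the expert (a crude stand-in for the actual
# projection step of the apprenticeship-learning algorithm).
w = np.zeros(n_states)
for _ in range(20):
    policy = solve_mdp(w)
    diff = mu_expert - feature_expectations(policy)
    if np.linalg.norm(diff) < 1e-6:
        break                                  # learner matches the expert
    w = diff
print("recovered reward weights:", np.round(w, 2))
print("recovered policy (1 = right):", policy)
```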
- Learning from Failed Demonstrations
The vast majority of work on RLfD/RPbD relies on successful demonstrations of the desired task by the human. It hence assumes that all demonstrations are good demonstrations and discards those that are poor proxies of what would be deemed a good demonstration. Recent work has also investigated the possibility that demonstrations may instead be failed attempts at performing the task (Grollman and Billard, 2011). In this case, RLfD/RPbD focuses on learning what to and what not to imitate. It offers an interesting alternative to approaches that combine imitation learning and reinforcement learning, in that no reward needs to be explicitly determined; see also Learning from Failure.
RLfD/RPbD and Human-Robot Interaction
Since RLfD/RPbD necessarily deals with both humans and robots, it overlaps heavily with the field of Human-Robot Interaction (HRI). In addition to the learning algorithms themselves, many human-centric issues are researched as part of RLfD/RPbD, mostly focused on how to better elicit and utilize the demonstrations, see [Goodrich & Schultz 07, Fong et al 03, Breazeal & Scassellati 02] for surveys.
New lines of research in this area seek to give a more active role to the teacher in a bidirectional teaching process. Robots become more active partners and can indicate which portion of the demonstration was unclear. Teachers may in turn refine the robot's knowledge by providing complementary information where the robot is performing poorly (see, e.g., Chernova and Veloso 2009). The design of such incremental teaching methods calls for machine learning techniques that enable the incorporation of new data in a robust manner. It also opens the door to the design of other human-robot interfacing systems, including the use of speech to permit informed dialogues between humans and robots.
An example of such bidirectional teaching is given in the movie on the right: the robot asks for help during or after teaching, verifying that its understanding of the task is correct (Cakmak and Thomaz 2012).
Work in this area focuses on techniques whereby the user and robot can work more closely together to improve the robot's policy. Areas of interest include endowing the robot with a sense of confidence in its abilities, so it can ask for help, and allowing the user to address particular subportions of the overall task, see examples of Interactive Learning.
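The sketch below illustrates the confidence idea in its simplest form (the nearest-neighbor confidence measure, the threshold and the teacher policy are all assumptions, loosely in the spirit of confidence-based autonomy): the robot acts on its own in familiar states and queries the teacher otherwise.

```python
import numpy as np

# Sketch of confidence-based interactive teaching: the robot executes
# its learned policy when confident, and requests a demonstration from
# the teacher when the current state is unfamiliar.
class ConfidentLearner:
    def __init__(self, threshold=0.3):
        self.states, self.actions = [], []
        self.threshold = threshold   # max distance at which we trust ourselves

    def act_or_ask(self, state, teacher):
        """Return an action, querying the teacher on unfamiliar states."""
        if self.states:
            d = np.linalg.norm(np.array(self.states) - state, axis=1)
            if d.min() < self.threshold:
                return self.actions[int(d.argmin())]   # confident: act
        action = teacher(state)                        # unsure: ask for help
        self.states.append(state)
        self.actions.append(action)
        return action

# Assumed teacher policy on a 2-D state space.
teacher = lambda s: 'left' if s[0] < 0 else 'right'

rng = np.random.default_rng(0)
learner = ConfidentLearner()
queries = 0
for _ in range(200):
    s = rng.uniform(-1, 1, 2)
    before = len(learner.states)
    learner.act_or_ask(s, teacher)
    queries += len(learner.states) - before
print(f"teacher queried {queries} times out of 200 steps")  # queries taper off
```

As the learner's experience covers more of the state space, teacher queries become rarer, which is the behavior such interactive schemes aim for.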
Limitations and Open Questions
Work in RLfD/RPbD, as any work in robot learning, makes a number of assumptions. These relate to the choice of data representation, of model, and of learning method and procedure. Generally, the form of the robot's control policy is fixed, and learning focuses on determining appropriate parameters (even for nonparametric methods). Instead, a system could be provided with multiple possible controller representations and select which is most appropriate. Additionally, the data collection process can include varying amounts of interaction with the human. Progress on these issues will significantly advance research on RLfD/RPbD and on robot learning at large.
In an additional page, we have gathered a list of current work. For ease of viewing, we divide the field into several broad areas, placing work according to the main focus of the research. For each program of research we provide a reference and succinctly describe the choices for the metric and the correspondence function. We further provide some notes on the model used for learning and the update method. This page is by no means complete, and we invite other researchers to submit synopses of their own (or others') work.
See also the Formalism for Learning from Demonstration page, where we provide some formalism to describe the learning process in RLfD/RPbD. This formalism is instantiated in the examples of current work we list in the companion page.
Surveys of the field
The vast majority of work on RLfD/RPbD approaches the problem from a machine learning perspective. Surveys of work in this area can be found in:
- B.D. Argall, S. Chernova, M. Veloso and B. Browning (2009). A Survey of Robot Learning from Demonstration. Robotics and Autonomous Systems.
- A. Billard, S. Calinon, R. Dillmann and S. Schaal (2008). Robot Programming by Demonstration. Springer Handbook of Robotics. Springer.
- S. Schaal, A. Ijspeert and A. Billard (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society B: Biological Sciences.
- S. Schaal (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences.
RLfD/RPbD is at its core inspired by the way humans learn from being guided by experts, from infancy through adulthood. A large body of work on RLfD/RPbD hence takes inspiration from concepts in psychology and biology. Some of these works pursue a computational neuroscience approach and use neural modeling. Others pursue a more cognitive science approach and build conceptual models of imitation learning in animals. Surveys of work in this area can be found in:
- E. Oztop and M. Kawato (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks.
- K. Dautenhahn and C. Nehaniv (2002). Imitation in Animals and Artifacts. MIT Press.
- A. Billard (2002). Imitation. Handbook of Brain Theory and Neural Networks. MIT Press.
- C. Breazeal and B. Scassellati (2002). Robots that imitate humans. Trends in Cognitive Science.
- Abbeel & Ng, "Apprenticeship Learning via Inverse Reinforcement Learning." International Conference on Machine Learning, 2004.
- Asfour, T., Azad, P., Vahrenkamp, N., Regenstein, K., Bierbaum, A., Welke, K., Schröder, J. & Dillmann, R. (2007), Toward humanoid manipulation in human-centred environments, Robotics and Autonomous Systems, Vol. 56, pp. 54-65
- Billard, A. (2000) Learning motor skills by imitation: a biologically inspired robotic model. Cybernetics & Systems, 32, 1-2, 155-193
- Billard, A., Calinon, S. and Guenter, F. (2006) Discriminative and Adaptive Imitation in Uni-Manual and Bi-Manual Tasks. Robotics and Autonomous Systems, 54:5.
- Billard, A., Calinon, S., Dillmann, R. and Schaal, S. (2008). Robot Programming by Demonstration (Review). Springer Handbook of Robotics, Chapter 59.
- Breazeal, C. and Scassellati, B. (2002). Robots that imitate humans. Trends in Cognitive Science, Vol. 6, Issue 11, pp. 481-487.
- Cakmak, M. and Thomaz, A.L. (2012). Designing Robot Learners that Ask Good Questions. Proceedings of the ACM/IEEE Int. Conf. on Human-Robot Interaction.
- Calinon, S., Evrard, P., Gribovskaya, E., Billard, A. and Kheddar, A. (2009) Learning collaborative manipulation tasks by demonstration using a haptic interface. Proceedings of the International Conference on Advanced Robotics (ICAR), 2009.
- Chernova, S. and Veloso, M. Interactive Policy Learning through Confidence-Based Autonomy. Journal of Artificial Intelligence Research. Vol. 34, 2009.
- Coates, A., Abbeel, P. and Ng, A.Y. (2008). Learning for control from multiple demonstrations. In Proc. 25th Intl. Conf. on Machine Learning (ICML 2008), pp. 144-151.
- Wu, Y. and Demiris, Y. (2010). Towards One Shot Learning by imitation for humanoid robots. IEEE Int. Conf. on Robotics and Automation (ICRA).
- Dillmann, R. (2004). Teaching and learning of robot tasks via observation of human performance. Robotics and Autonomous Systems, Volume 47, Issues 2-3, pp. 109-116.
- Evrard, P., Gribovskaya, E., Calinon, S., Billard, A. and Kheddar, A. (2009) Teaching Physical Collaborative Tasks: Object-Lifting Case Study with a Humanoid. Proceedings of IEEE International Conference on Humanoid Robots, 2009.
- Fong, T., Nourbakhsh, I. and Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and Autonomous Systems, Volume 42, Issues 3-4, pp. 143-166.
- Gaussier, P. et al. (1998). From perception-action loops to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence Journal, 12(7-8).
- Ito, M. and Tani, J. (2004). On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system. Adaptive Behavior, 12(2), pp. 93-115.
- Gioioso, G., Salvietti, G., Malvezzi, M. and Prattichizzo, D. (2012). An Object-Based Approach to Map Human Hand Synergies onto Robotic Hands with Dissimilar Kinematics. Proceedings of Robotics: Science and Systems (RSS).
- Goodrich, M and Schultz, A (2007), Human-robot interaction: a survey, Foundations and Trends in Human-Computer Interaction, Vol 1, issue 3.
- Grollman, D. and Jenkins, O.C. (2010). Incremental learning of subtasks from unsegmented demonstration. In International Conference on Intelligent Robots and Systems, Taipei, Taiwan.
- Kim, S., Kim, C., You, B. and Oh, S (2009) Stable Whole-body Motion Generation for Humanoid robots to Imitate Human Motions. Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS).
- Kormushev, P, Calinon, S, and D. Caldwell, “Imitation Learning of Positional and Force Skills Demonstrated via Kinesthetic Teaching and Haptic Input,” Advanced Robotics, pp. 1–20, 2011.
- Kronander, K. and Billard, A. (2012). Online Learning of Varying Stiffness Through Physical Human-Robot Interaction. IEEE Int. Conf. on Robotics and Automation (ICRA).
- Kruger, V., Herzog, D., Baby, S., Ude, A. and Kragic, D. (2010). Learning actions from observations. Robotics and Automation Magazine, 17(2), pp. 30-43.
- Kulić, D, Ott, C, Lee, C, Ishikawa, J. and Nakamura, Y. Incremental learning of full body motion primitives and their sequencing through human motion observation, International Journal of Robotics Research, Vol. 31, No. 3, pp. 330 - 345, 2012
- Kulic, D, Takano, W and Nakamura, Y, “Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains,” Int. J. Robot. Res., vol. 27, no. 7, pp. 761–784, 2008.
- Lee, D. and Ott, C. Incremental Kinesthetic Teaching of Motion Primitives Using the Motion Refinement Tube , Autonomous Robots , 31 (2011) , no. 2 , 115-131.
- Nehaniv, C.L., "Nine Billion Correspondence Problems". In C. L. Nehaniv & K. Dautenhahn (Eds.), Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, Cambridge University Press, 2007.
- Nehaniv, C.L. & Dautenhah, D., "Like Me? - Measures of Correspondence and Imitation," Cybernetics and Systems, Jan 2011 pp. 11-51 
- Nakanishi, J.;Morimoto, J.;Endo, G.;Cheng, G.;Schaal, S.;Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion, Robotics and Autonomous Systems, 47, 2-3, pp.79-91.
- Nicolescu, M. N and Matarić, M.J, Methods for robot task learning: Demonstrations, generalization and practice, in: Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS’03, 2003.
- Rozo, L, Jimenez, P. and Torras, C. “Robot Learning from Demonstration of Force-based Tasks with Multiple Solution Trajectories,” in 15th International Conference on Advanced Robotics (ICAR), 2011, pp. 124–129.
- Ratliff, N, Bagnell, A. J. and Zinkevich, M., Maximum Margin Planning, International Conference on Machine Learning, July, 2006.
- Sauser, E., Argall, B.D., Metta, G. and Billard, A. (2011). Iterative Learning of Grasp Adaptation through Human Corrections. Robotics and Autonomous Systems.
- Shon, A., Grochow, K., Hertzmann, A. and Rao, R. (2006). Learning Shared Latent Structure for Image Synthesis and Robotic Imitation. Advances in Neural Information Processing Systems (NIPS), pp. 1233-1240.
- Skoglund, A., Iliev, B., Kadmiry, B. and Palm, R. Programming by Demonstration of Pick-and-Place Tasks for Industrial Manipulators using Task Primitives. International Symposium on Computational Intelligence in Robotics and Automation, 2007. CIRA 2007.
- Tani, J., Ito, M. and Sugita, Y. (2004). Self-organization of distributed represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks, 17(8-9), pp. 1273-1289.
- Ude, A, Atkeson, C.G and Riley, M "Programming full-body movements for humanoid robots by observation", Robotics and Autonomous Systems, vol. 47, pp. 93-108, 2004