Talk:Robot learning by demonstration


    Comments from Reviewer A:

    This article presents a brief overview of the field of robot learning by demonstration. Unfortunately, the article in its current form is too vague and omits far too much detail to be of much use to the intended audience, a junior student at the graduate level. In order to be useful for newcomers to the field, the article should provide more detail and, most importantly, a much more detailed list of references for further reading. These concerns are detailed below:

    - This paper has very few references to existing work. In order to be useful as an introductory paper, references to further reading should be provided. In particular, references are needed when discussing different solutions to the correspondence problem, providing examples of well known batch RL and incremental algorithms, discussing different metrics, discussing the open problems etc.

    - Only a single reference is provided in the text, and all the references are relegated to the current work subpage. I think it would be helpful if references were provided in the text itself, rather than at the end in the current work subpages.

    - In addition, the current work mostly consists of very recent papers, which is appropriate, but there are no references to key early works, which influenced much of the later development. The manuscript should contain an overview of the key early works, with appropriate references provided.

    - The learning section is lacking in detail. I think it would be useful to provide some additional detail (or at least provide references), to the key techniques, such as DMPs, HMMs, and RL approaches. It would be helpful if the reader had at least a basic idea of how these learning systems are implemented, with references to the original works for further reading.

    - In addition, it would be helpful if the authors could include some illustrative figures, showing for example demonstrated and learned trajectories via the various methods. The current illustrations in the paper seem very high level, and not as useful.

    - The authors define robot learning as "finding a controller that satisfies some constraints". I think this definition is vague and possibly too narrow, and captures only a component of the robot learning literature, primarily the learning of individual motion primitives. Robot learning has also been formulated as modeling the task/behavior, as well as assuming that the motion primitives are already known but that the goal or plan must be learned. It would be helpful to overview these alternate formulations in an overview article such as this.

    - The definition of the open problems seems narrowly focused and somewhat arbitrarily chosen. What about the selection of the metric, or the selection of the appropriate level of learning (learning to imitate the trajectory vs. learning the goal)? Do the authors believe these are solved problems? Also, interactive learning is grouped together with controller representation as a single open problem of meta learning. It is not clear how these are related. What about generalization?

    - What is meant by "instantiating the policy" in the problem overview? This might not be obvious to the novice reader, and should be explained.

    - Figure 1 seems very garish and not particularly illustrative. Many of the slides are too busy and filled with various objects, so that it is difficult to understand what is being conveyed before the animation advances to the next slide. The animation style seems very simplistic and more suited to children rather than the intended audience.

    - I am not sure how Figure 3 illustrates the correspondences between the human and robot state spaces.

    Comments from Reviewer B:

    This article describes robot learning from demonstration within a very general framework. In many ways, I think employing this very general perspective makes it difficult to understand the core concepts. My general suggestion is to ground the description in existing work (with appropriate references to application) and to simplify the presentation by assuming the common assumptions of those works. For instance, assuming that demonstrations are obtained from teleoperation greatly simplifies the main ideas and leaves the discussion of correspondences as an extension.

    My specific comments and questions are below.

    Are there stronger justifications for the RLfD approach in the introduction than being biologically inspired and more 'intuitive' than explicit programming? It seems in many applications the efficiency of learning and generalization are what make it a preferred approach.

    The "Background and Motivation" section would benefit by:

    • first describing the general goal of obtaining a policy that attains certain objectives;
    • then describing teleoperation / explicitly programming the policy / reinforcement learning and the limitations of those techniques
    • then describing in detail some of the applications of learning by demonstration since the 1980s, with references to the appropriate literature.

    In the "Problem Overview" section, it's not clear that the correspondence problem and the distinction between batch and incremental learning are the most central aspects of RLfD to discuss. Also, why are "Open problems" introduced here before describing existing techniques?

    Much of the early work in this area employed direct policy imitation (i.e., a loss function on the person and robot's controls from the same state). Discussing this general approach and the specific techniques for estimating the policy would be useful (and then contrasting to more recent methods that match less-specific properties of behavior).

    I don't understand the distinction between the "Batch Learning" and "Self-Improvement Learning" settings. Is the only difference whether there is a closed-form solution for determining the optimal policy from M? Are evaluation samples from simulation or invoked on the robot? Perhaps the assumptions of known state dynamics should be stated explicitly for each setting.

    Additional reviewer comments and responses

    Article “Robot Learning by Demonstration” by Aude Billard and Dan Grollman. Revision June 21 2013.

    We thank the reviewer for the constructive suggestions that helped to improve this Scholarpedia article. We respond to each comment in turn below. In italics, we copy the reviewer’s comments. Our responses appear in bold.

    In terms of style, I think the article is too narrative in many places. A more straight-to-the-point approach would make it clearer.

    We have gone through the entire text and removed unnecessary verbiage. We also tried to condense and cut the text where appropriate.

    The inline links do not always work well.

    We tested all inline links and checked that they are all functional. Perhaps it depends on the browser. We tested with Firefox 21.0 and Chrome.

    In an encyclopedia-like article (not a review article), if there is more than one paper that can be cited, the oldest one should be preferred. For instance, when referring to body tracking, why not cite the papers from Schaal's group in the early 2000s instead of [Kulic et al. 2008; Ude et al. 2004; Kim et al. 2009]? Unless presenting a revolutionary approach or robot, don't refer to specific robots or research groups in the main text, only in the images/references.

    These citations relate to the following sentence: “These external means of tracking human motion return precise measurement of the angular displacement of the joints. They have been used in various works for RLfD/RPbD of full body motion [Kulic et al. 2008; Ude et al. 2004; Kim et al. 2009]” and to the accompanying video showing the mapping from an image of a body motion tracked by a camera to the reconstructed kinematic motion of the full body and its reproduction on a humanoid robot.

    The emphasis is, hence, on full body motion tracking using vision and transfer to robot. We were unable to locate a paper by Schaal in 2000 performing full body motion tracking from cameras.

    In terms of style I would remove the permanent reference to RLfD/RPbD. Just go with LfD as the title of the article.

    We would like to keep both acronyms, even though this is indeed heavier notation. PbD was the starting point of the field and encompasses a large literature. Abandoning the PbD acronym in favor of LfD, we fear, may lead the community to ignore this early body of work. To lighten the notation, we dropped the R from both acronyms. It now reads LfD – PbD.

    Historical Context

    In this historical context, please include direct links to the review/concept papers (and remove them from the specialized section later on).

    The list of survey papers is now included in the historical context section.

    Besides the ones included I would also include Kunyioshi'96, Dillman'04 and Kober'10.

    These three pieces of work are not surveys of the field, but surveys of each of these authors’ own work. It does not seem appropriate to cite them here. The section on historical context deliberately does not refer to any particular piece of work in LfD, as it would be impossible to give fair coverage of the literature given the size limit of the Scholarpedia article. To compensate for this, we refer to a list of surveys of the literature.

    The following sentence is probably redundant:

    "It has become a central topic of robotics that spans across general research areas such as human-robot interaction, machine learning, machine vision and motor control, and entire sessions and tutorials are devoted to RLfD/RPbD in all major robotics conferences."

    This sentence has been removed.

    The "Key Issues in ..." section is very interesting, and proper references to the papers in biology that introduced all these concepts would be important. From the computational sciences perspective, I would probably agree that papers from that group are the clearest ones.

    For instance:

    Byrne, R. W. (2002). Imitation of novel complex actions: What does the evidence from animals mean? Advances in the Study of Behavior, 31, 77– 105.

    Call, J., & Carpenter, M. (2002). Three sources of information in social learning. Kerstin Dautenhahn and Chrystopher L. Nehaniv (Eds.) In Imitation in animals and artifacts. Cambridge, MA: MIT Press, 211– 218.

    Tomasello, M., Kruger, A. C., & Ratner, H. H. (1993). Cultural learning. Behavioral and Brain Sciences, 16(3), 495–511.

    These references have been added in the section “Further Reading”.

    A paper providing links between the fields could include:

    A Computational Model of Social-Learning Mechanisms, Manuel Lopes, Francisco S. Melo, Ben Kenward and José Santos-Victor. Adaptive Behaviour, 467(17), 2009.

    besides the papers already cited that refer to mirror neurons. These papers will also inform how different constraints and contexts change the agent's decision on what/how to imitate.

    A reference to this paper was added.

    How to Imitate and the Notion of Correspondence

    It would also be helpful to include a subsection covering object-mediated imitation, including research on affordances. Here the consequences on objects are transferred, not the motions directly.

    The notion of affordances is now explicitly covered in the Section on What to Imitate.

    2: Physical equivalence

    This section refers to approaches that have to map between different bodies. In many cases this requires an explicit step to make the motion function on the new body. I would also include the paper that ensures stability after imitation:

    Khansari Zadeh, S. M. and Billard, A. (2011) Learning Stable Non-Linear Dynamical Systems with Gaussian Mixture Models. IEEE Transaction on Robotics, vol. 27, num 5, p. 943-957.

    Ensuring stability is a property of motion controllers that is not immediately related to discrepancies across bodies (the human CNS may also use a stable controller to plan trajectories).

    Interfaces for Demonstration

    This is a very important section; it might even cover more than just the interface and also include the different modalities and types of demonstrations. See the comment below on Learning from Failed Demonstrations.

    The text repeats information on body correspondence. Remove or improve previous section on the topic.

    We have removed the last sentence of the first paragraph that was indeed repetitive.

    Kinesthetic teaching is in many respects a physical remote control. Early works on industrial robots already did this, albeit in a very primitive form from the learning point of view.

    The section Ways to Solve RLfD/RPbD could be reorganized. It starts by saying that there are two trends and then divides into more than five.

    We agree with the reviewer that this division was somewhat unclear. We have now changed the structure. The section “Ways to Solve Lfd – PbD” encompasses now solely the first two subsections on “Learning Individual Motion” and “Learning Compound Motion”. We have created a new Section entitled “Imitation Learning combined with other Learning Techniques”. This new section regroups approaches combining PbD/LfD with RL.

    The sub-sections Learning Individual Motions and Teaching Force-Control Tasks both refer to learning individual motions.

    The section Learning Compound Actions should take into account works that automatically learn multiple actions from raw data. See work from Emily Fox, Jan Peters and others.

    We have added a reference to Daniel et al. 2012 and Magin & Oudeyer 2011 on learning and combining multiple primitives. The work of E. Fox was, to our knowledge, never applied to learning from demonstration and hence referring to this piece of work would seem somewhat out of context.

    The section Imitation Learning and Reinforcement Learning is very weak and does not include any references. As this line of work is almost as old as the others, the presentation must be more balanced. Works from Peters, Bagnell, Toussaint and others must be included. This approach considers initializing a controller from observed trajectories and then improving it.

    The reviewer is mistaken. There were already 3 references to RL methods applied to imitation learning and two videos devoted to this topic (including one on Peters’ work). Moreover, 3 more references to work on RL are given in the Current Work page.

    We have expanded this section to cover additional works and to provide more details on what information is extracted from the demonstrations to bootstrap RL.

    We are surprised by the comment that this section is “almost as old as the others”. The 3 papers we were citing appeared between 2009 and 2012. We also have difficulty relating this criticism to the reviewer’s earlier request to cite a paper dating back to 1998 and the remark that “if there is more than one paper that can be cited, the oldest should be preferred”.

    The section on inverse reinforcement learning is very weak. More examples of applications and algorithms are necessary. The importance of this approach includes better generalization and interpretable results. It has already provided links to biology (Lopes et al.) and produced results on highly complex, high-dimensional problems (taxi driver data, Bagnell's group; and helicopters, Ng's group). It has already seen several reductions to supervised learning and theoretical results on sample complexity (Bagnell's group). Main limitations include the assumption of being able to solve the direct problem and knowledge of the model of the environment (although some recent results circumvent this).

    We have expanded this section.

    The section on Learning from Failed Demonstrations seems to be at a different level than the other ones. It would be more helpful to include a section that makes explicit what information is used for learning. Most of the time it is a trajectory, but there are now papers using failed demonstrations, online corrections (force and otherwise), queries, key-points, and many others.

    Works including on-line corrections and active learning were partly covered in the HRI section. We have expanded this section to specify the different forms that these data can take.

    RLfD/RPbD and Human-Robot Interaction

    This is a new topic, and it could perhaps be included in a section gathering all new topics. For instance, the following could be listed: active learning; simultaneously learning and exploring; learning from ambiguous feedback. Papers from Roy, Knox, and Lopes could be included. Initial papers from Thomaz and Breazeal would provide a better introduction to the topic.

    We have added a reference to Thomaz and Breazeal 2008.

    The list of Limitations and Open Questions is too simple. There are other open questions, such as reductions between the different problems, feature selection, integration with other learning algorithms, use of biased demonstrations, and automatic decomposition of skills.

    This section was very brief indeed and we have now expanded the discussion to cover the issues raised by the reviewers as well as other open questions discussed in the literature. Note that we were initially reluctant to have a section on open issues, as this is typically a section that will immediately become obsolete.
