Talk:Robot learning by demonstration

From Scholarpedia

Comments from Reviewer A:

This article presents a brief overview of the field of robot learning by demonstration. Unfortunately, the article in its current form is too vague and omits far too much detail to be of much use to the intended audience, a junior student at the graduate level. To be useful for newcomers to the field, the article should provide more detail and, most importantly, a much more extensive list of references for further reading. These concerns are detailed below:

- This paper has very few references to existing work. To be useful as an introductory paper, it should provide references to further reading. In particular, references are needed when discussing different solutions to the correspondence problem, when giving examples of well-known batch RL and incremental algorithms, and when discussing the different metrics, the open problems, etc.

- Only a single reference is provided in the text, and all the other references are relegated to the current work subpages. I think it would be helpful if references were provided in the text itself, rather than at the end in those subpages.

- In addition, the current work mostly consists of very recent papers, which is appropriate, but there are no references to the key early works that influenced much of the later development. The manuscript should contain an overview of these key early works, with appropriate references.

- The learning section is lacking in detail. I think it would be useful to provide some additional detail on the key techniques (or at least references to them), such as DMPs, HMMs, and RL approaches. It would be helpful if the reader had at least a basic idea of how these learning systems are implemented, with references to the original works for further reading.

- In addition, it would be helpful if the authors could include some illustrative figures, showing for example demonstrated and learned trajectories via the various methods. The current illustrations in the paper seem very high level, and not as useful.

- The authors define robot learning as "finding a controller that satisfies some constraints". I think this definition is vague and possibly too narrow, and captures only a component of the robot learning literature, primarily the learning of individual motion primitives. Robot learning has also been formulated as modeling the task/behavior, as well as assuming that the motion primitives are already known but that the goal or plan must be learned. It would be helpful to overview these alternate formulations in an overview article such as this.

- The definition of the open problems seems narrowly focused and somewhat arbitrarily chosen. What about the selection of the metric, or the selection of the appropriate level of learning (learning to imitate the trajectory vs. learning the goal)? Do the authors believe these are solved problems? Also, interactive learning is grouped together with controller representation as a single open problem of meta learning. It is not clear how these are related. What about generalization?

- What is meant by "instantiating the policy" in the problem overview? This might not be obvious to the novice reader, and should be explained.

- Figure 1 seems very garish and not particularly illustrative. Many of the slides are too busy and filled with various objects, so that it is difficult to understand what is being conveyed before the animation advances to the next slide. The animation style seems very simplistic and more suited to children than to the intended audience.

- I am not sure how Figure 3 illustrates the correspondences between the human and robot state spaces.




Comments from Reviewer B:

This article describes robot learning from demonstration within a very general framework. In many ways, I think this very general perspective makes it difficult to understand the core concepts. My general suggestion is to ground the description in existing work (with appropriate references to applications) and to simplify the presentation by adopting the common assumptions of those works. For instance, assuming that demonstrations are obtained from teleoperation greatly simplifies the main ideas and leaves the discussion of correspondences as an extension.

My specific comments and questions are below.

Are there stronger justifications for the RLfD approach in the introduction than being biologically inspired and more 'intuitive' than explicit programming? It seems that in many applications the efficiency of learning and generalization is what makes it a preferred approach.

The "Background and Motivation" section would benefit by:

  • first describing the general goal of obtaining a policy that attains certain objectives;
  • then describing teleoperation / explicitly programming the policy / reinforcement learning and the limitations of those techniques;
  • then describing in detail some of the applications of learning by demonstration since the 1980s, with references to the appropriate literature.

In the "Problem Overview" section, it's not clear that the correspondence problem and the distinction between batch and incremental learning are the most central aspects of RLfD to discuss. Also, why are "Open problems" introduced here before describing existing techniques?

Much of the early work in this area employed direct policy imitation (i.e., a loss function on the person's and the robot's controls from the same state). Discussing this general approach and the specific techniques for estimating the policy would be useful, followed by a contrast with more recent methods that match less-specific properties of behavior.

I don't understand the distinction between the "Batch Learning" and "Self-Improvement Learning" settings. Is the only difference whether there is a closed-form solution for determining the optimal policy from M? Are evaluation samples from simulation or invoked on the robot? Perhaps the assumptions of known state dynamics should be stated explicitly for each setting.
