Curriculum goal masking for continuous deep reinforcement learning
Publication Type
Conference Paper
Date Issued
2019-08
Language
English
Author(s)
Start Page
183
End Page
188
Article Number
8850721
Citation
19th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob 2019)
Deep reinforcement learning has recently focused on problems where policy or value functions are based on universal value function approximators (UVFAs), which render them independent of specific goals. There is evidence that the sampling of goals strongly affects learning performance, and the problem of optimizing goal sampling is frequently tackled with intrinsic-motivation methods. However, general mechanisms that focus on goal sampling in the context of UVFA-based deep reinforcement learning are lacking. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimate to realize curriculum learning. Our results indicate that focusing on goals of medium difficulty is appropriate for deep deterministic policy gradient (DDPG) methods, while an 'aim for the stars and reach the moon' strategy, where difficult goals are sampled much more often than simple goals, yields the best learning performance when DDPG is combined with hindsight experience replay (HER).
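The two sampling strategies described in the abstract can be illustrated with a small sketch. This is not the paper's implementation; the function `sample_goal`, the use of empirical success rates as a difficulty proxy, and the softmax weighting are all illustrative assumptions. It shows only the general idea: bias goal sampling toward medium-difficulty goals (suggested for plain DDPG) or toward hard goals (the 'aim for the stars' strategy, suggested for DDPG + HER).

```python
import numpy as np

def sample_goal(goals, success_rates, mode="medium", temperature=0.2):
    """Sample a training goal, weighted by estimated difficulty.

    Illustrative sketch (not the paper's method): success_rates[i] is an
    empirical success probability for goals[i], and difficulty is taken
    as 1 - success_rate.  'medium' peaks the sampling weight at
    difficulty 0.5; 'hard' weights goals proportionally to difficulty.
    """
    difficulty = 1.0 - np.asarray(success_rates, dtype=float)
    if mode == "medium":
        # Highest score at difficulty 0.5, falling off toward 0 and 1.
        score = 1.0 - 2.0 * np.abs(difficulty - 0.5)
    elif mode == "hard":
        # Monotonically favor more difficult goals.
        score = difficulty
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Softmax over scores; lower temperature sharpens the preference.
    logits = score / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = np.random.choice(len(goals), p=probs)
    return goals[idx]
```

In practice, such a sampler would sit in the training loop, with success rates updated from rollouts so that the difficulty estimates (and hence the curriculum) evolve as the agent improves.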