Goal, mistake and success learning through resimulation
Summary
In this thesis, we improve reinforcement learning through curriculum learning. We pioneer a new approach to curriculum learning based on resimulation. We formulate two approaches to resimulation: Goal-Based Resimulation (GBR), where we resimulate after changing the goal, and Initial-State-Based Resimulation (ISBR), where we resimulate after changing the initial state.
We construct one GBR method, G, in which the goal is set to the last state of the resimulated episode (sketched after this summary). G is shown to enable solving tasks that cannot be solved using Proximal Policy Optimization (PPO) [Schulman et al., 2017], an Intrinsic Curiosity Module (ICM) [Pathak et al., 2017], or Hindsight Experience Replay (HER) [Andrychowicz et al., 2017]. We construct two ISBR methods, S+ and S−. Both methods process the advantage estimates to detect swing events: periods of high-amplitude advantage estimates. S+ and S− then resimulate successes and mistakes, respectively, by setting the initial states to the states at the start of swing events (also sketched after this summary). All methods are tested on two tasks that differ only in their level of sparsity, and at three reward ratios that control the extent to which the ICM is used. Performance is measured by solve rate and learning speed. We find that
• G enables solving the proposed tasks.
• S+ improves the solve rate, but only on sufficiently sparse tasks and when using the ICM.
• S− improves both the solve rate and the learning speed.
• G, S+, and S− can be used in combination to create a stronger combined algorithm.
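The first sketch below illustrates one plausible reading of the G method: resimulate an episode toward some candidate goal, then relabel the episode's goal as the last state actually reached. The environment interface (`env.reset_to`, Gym-style `step`), the goal-conditioned `policy.act` call, and the equality-based success reward are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

def resimulate_with_achieved_goal(env, policy, start_state, candidate_goal, horizon):
    """Minimal sketch of G: resimulate an episode, then set the goal to the
    last state of the resimulated episode, so the trajectory becomes a
    successful one under the relabeled goal."""
    env.reset_to(start_state)                     # hypothetical reset-to-state hook
    states, actions = [start_state], []
    state, goal = start_state, candidate_goal
    for _ in range(horizon):
        action = policy.act(state, goal)          # assumed goal-conditioned policy
        state, _, done, _ = env.step(action)      # assumed Gym-style environment
        states.append(state)
        actions.append(action)
        if done:
            break
    goal = states[-1]                             # G: last state of the resimulated episode
    # Recompute the sparse success reward against the relabeled goal.
    rewards = [float(np.array_equal(s, goal)) for s in states[1:]]
    return states, actions, rewards, goal
```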
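The second sketch shows how S+ and S− might detect swing events from per-step advantage estimates and where resimulation would restart. The threshold value, the `episode` record, and the `env.reset_to` hook are illustrative assumptions.

```python
import numpy as np

def swing_event_starts(advantages, threshold, sign=+1):
    """Return the indices at which swing events begin: the first steps of
    contiguous stretches of high-amplitude advantage estimates.
    sign=+1 targets successes (S+), sign=-1 targets mistakes (S-)."""
    adv = sign * np.asarray(advantages, dtype=float)
    in_swing = adv > threshold                    # steps with large (signed) advantage
    return [t for t, flag in enumerate(in_swing)
            if flag and (t == 0 or not in_swing[t - 1])]

# Usage (names are illustrative): resimulate from the states where swings begin.
# for t in swing_event_starts(episode.advantages, threshold=1.0, sign=-1):   # S- (mistakes)
#     env.reset_to(episode.states[t])             # hypothetical reset-to-state hook
#     ...                                         # resimulate from this initial state
```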