Goal, mistake and success learning through resimulation
Summary
In this thesis, we improve reinforcement learning through curriculum learning. We pioneer a new approach to curriculum learning based on resimulation. We formulate two approaches to resimulation: Goal-Based Resimulation (GBR), where we resimulate after changing the goal, and Initial-State-Based Resimulation (ISBR), where we resimulate after changing the initial state.
We construct one GBR method, G, in which the goal is set to the last state of the resimulated episode (sketched after this summary). G is shown to enable solving tasks that cannot be solved using Proximal Policy Optimization (PPO) [Schulman et al., 2017], an Intrinsic Curiosity Module (ICM) [Pathak et al., 2017], or Hindsight Experience Replay (HER) [Andrychowicz et al., 2017]. We construct two ISBR methods, S+ and S−. Both methods process the advantage estimates to detect swing events: periods of high-amplitude advantage estimates. S+ and S− then resimulate successes and mistakes, respectively, by setting the initial states to the states at the start of swing events (also sketched after this summary). All methods are tested on two tasks that differ only in their level of sparsity, and at three reward ratios that control the extent to which the ICM is used. Performance is measured by solve rate and learning speed. We find that
• G enables solving the proposed tasks.
• S+ improves the solve rate, but only on sufficiently sparse tasks and when using the ICM.
• S− improves both the solve rate and the learning speed.
• G, S+, and S− can be used in combination to create a stronger combined algorithm.
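The first sketch below illustrates one plausible reading of the G method: resimulate an episode toward some candidate goal, then relabel the episode's goal as the last state actually reached. The environment interface (`env.reset_to`, Gym-style `step`), the goal-conditioned `policy.act` call, and the equality-based success reward are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

def resimulate_with_achieved_goal(env, policy, start_state, candidate_goal, horizon):
    """Minimal sketch of G: resimulate an episode, then set the goal to the
    last state of the resimulated episode, so the trajectory becomes a
    successful one under the relabeled goal."""
    env.reset_to(start_state)                     # hypothetical reset-to-state hook
    states, actions = [start_state], []
    state, goal = start_state, candidate_goal
    for _ in range(horizon):
        action = policy.act(state, goal)          # assumed goal-conditioned policy
        state, _, done, _ = env.step(action)      # assumed Gym-style environment
        states.append(state)
        actions.append(action)
        if done:
            break
    goal = states[-1]                             # G: last state of the resimulated episode
    # Recompute the sparse success reward against the relabeled goal.
    rewards = [float(np.array_equal(s, goal)) for s in states[1:]]
    return states, actions, rewards, goal
```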
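The second sketch shows how S+ and S− might detect swing events from per-step advantage estimates and where resimulation would restart. The threshold value, the `episode` record, and the `env.reset_to` hook are illustrative assumptions.

```python
import numpy as np

def swing_event_starts(advantages, threshold, sign=+1):
    """Return the indices at which swing events begin: the first steps of
    contiguous stretches of high-amplitude advantage estimates.
    sign=+1 targets successes (S+), sign=-1 targets mistakes (S-)."""
    adv = sign * np.asarray(advantages, dtype=float)
    in_swing = adv > threshold                    # steps with large (signed) advantage
    return [t for t, flag in enumerate(in_swing)
            if flag and (t == 0 or not in_swing[t - 1])]

# Usage (names are illustrative): resimulate from the states where swings begin.
# for t in swing_event_starts(episode.advantages, threshold=1.0, sign=-1):   # S- (mistakes)
#     env.reset_to(episode.states[t])             # hypothetical reset-to-state hook
#     ...                                         # resimulate from this initial state
```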