Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Wang, Shihan
dc.contributor.author: Scholte, Niels
dc.date.accessioned: 2022-02-24T00:00:27Z
dc.date.available: 2022-02-24T00:00:27Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/520
dc.description.abstract: In this thesis, we improve reinforcement learning through curriculum learning, pioneering a new approach to curriculum learning based on resimulation. We formulate two approaches to resimulation: Goal-Based Resimulation (GBR), where we resimulate after changing the goal, and Initial-State-Based Resimulation (ISBR), where we resimulate after changing the initial state. We construct one GBR method, G, in which the goal is set to the last state of the resimulated episode. G is shown to enable solving tasks that are solvable neither by Proximal Policy Optimization (PPO) [Schulman et al., 2017], nor by an Intrinsic Curiosity Module (ICM) [Pathak et al., 2017], nor by Hindsight Experience Replay (HER) [Andrychowicz et al., 2017]. We construct two ISBR methods, S+ and S−. Both methods process the advantage estimates to detect swing events: periods with high-amplitude advantage estimates. S+ and S− then resimulate successes and mistakes, respectively, by setting the initial states to the states at the start of swing events. All methods are tested on two tasks that differ only in their level of sparsity, and at three reward ratios controlling the extent to which the ICM is used. Performance is measured by solve rate and learning speed. We find that:
• G enables solving the proposed tasks.
• S+ improves the solve rate, but only on sufficiently sparse tasks and when using the ICM.
• S− improves both the solve rate and the learning speed.
• G, S+ and S− can be used in unison to create a better combined algorithm.
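As a rough illustration of the G method described in the abstract (run an episode, relabel the goal as the episode's final state, then resimulate so the trajectory becomes a guaranteed success), here is a minimal sketch. The toy chain environment, the function names, and the binary reward are illustrative assumptions, not the thesis's actual implementation:

```python
import random

# Toy 1-D chain environment: the agent moves left or right;
# an episode is a fixed-length rollout from a start state.
def rollout(policy, start=0, length=10):
    state, traj = start, [start]
    for _ in range(length):
        state += policy(state)          # action is -1 or +1
        traj.append(state)
    return traj

def goal_based_resimulation(policy, start=0, length=10):
    # 1. Simulate an episode under the current policy.
    traj = rollout(policy, start, length)
    # 2. Set the goal to the last state actually reached (the "G" idea):
    #    for that goal the episode is a guaranteed success, so the learner
    #    receives a reward signal even when the task reward is sparse.
    goal = traj[-1]
    # 3. Resimulate: replay the trajectory labelled with the achieved goal,
    #    emitting (state, goal, reward) training tuples.
    return [(s, goal, 1.0 if s == goal else 0.0) for s in traj]

random.seed(0)
experience = goal_based_resimulation(lambda s: random.choice([-1, 1]))
print(experience[-1])   # the final tuple always carries reward 1.0
```

The design point this sketch captures is why G helps on sparse tasks: regardless of where the random policy ends up, relabeling makes at least one transition per episode rewarding.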
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: In this thesis, we improve reinforcement learning through curriculum learning, pioneering a new approach to curriculum learning based on resimulation. We formulate two approaches to resimulation: Goal-Based Resimulation (GBR), where we resimulate after changing the goal, and Initial-State-Based Resimulation (ISBR), where we resimulate after changing the initial state.
dc.title: Goal, mistake and success learning through resimulation
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Reinforcement learning; Curriculum learning; Resimulation; Machine learning; RL; CL; Goals
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 2151

