        Goal, mistake and success learning through resimulation

        View/Open
        ThesisAI.pdf (1.506 MB)
        Publication date
        2022
        Author
        Scholte, Niels
        Summary
        In this thesis, we improve reinforcement learning through curriculum learning, pioneering a new approach to curriculum learning based on resimulation. We formulate two approaches to resimulation: Goal-Based Resimulation (GBR), where we resimulate after changing the goal, and Initial-State-Based Resimulation (ISBR), where we resimulate after changing the initial state. We construct one GBR method, G, in which the goal is set to the last state of the resimulated episode. G is shown to enable solving tasks that are solvable neither by Proximal Policy Optimization (PPO) [Schulman et al., 2017], nor with an Intrinsic Curiosity Module (ICM) [Pathak et al., 2017], nor with Hindsight Experience Replay (HER) [Andrychowicz et al., 2017]. We construct two ISBR methods, S+ and S−. Both process the advantage estimates to detect swing events: periods of high-amplitude advantage estimates. S+ and S− then resimulate successes and mistakes, respectively, by setting the initial states to the states at the start of swing events. All methods are tested on two tasks that differ only in their level of sparsity, and at three reward ratios controlling the extent to which the ICM is used. Performance is measured by the solve rate and the learning speed. We find that:
        • G enables solving the proposed tasks.
        • S+ improves the solve rate, but only on sufficiently sparse tasks and when using the ICM.
        • S− improves both the solve rate and the learning speed.
        • G, S+ and S− can be used in unison to create a better combined algorithm.
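
        The G step lends itself to a short illustration. Below is a minimal Python sketch of one plausible reading of G, on a toy deterministic chain environment: record an episode, set the goal to the last state it reached, and resimulate the same actions so the stored episode ends in success. Everything here (ChainEnv, Step, the sparse reward form) is an illustrative assumption, not the thesis implementation.

        import random
        from dataclasses import dataclass
        from typing import List, Tuple

        @dataclass
        class Step:
            state: int
            action: int
            reward: float
            next_state: int

        class ChainEnv:
            """Toy 1-D chain with a sparse goal-reaching reward (assumed setting)."""
            def __init__(self, size: int = 10):
                self.size = size

            def episode(self, policy, goal: int, horizon: int = 20) -> List[Step]:
                s, steps = 0, []
                for _ in range(horizon):
                    a = policy(s, goal)                      # action in {-1, +1}
                    s2 = max(0, min(self.size - 1, s + a))   # deterministic transition
                    r = 1.0 if s2 == goal else 0.0           # sparse goal-reaching reward
                    steps.append(Step(s, a, r, s2))
                    if r == 1.0:
                        break
                    s = s2
                return steps

        def resimulate_with_final_goal(env: ChainEnv, episode: List[Step]) -> Tuple[List[Step], int]:
            """G sketch: replay the recorded actions with the goal set to the last
            state the original episode reached, so the resimulated episode is a
            guaranteed success under the sparse reward."""
            new_goal = episode[-1].next_state
            s, out = 0, []
            for step in episode:
                s2 = max(0, min(env.size - 1, s + step.action))
                out.append(Step(s, step.action, 1.0 if s2 == new_goal else 0.0, s2))
                s = s2
            return out, new_goal

        env = ChainEnv()
        ep = env.episode(lambda s, g: random.choice((-1, 1)), goal=env.size - 1)
        resim, new_goal = resimulate_with_final_goal(env, ep)
        assert resim[-1].reward == 1.0   # the resimulated episode ends at its own goal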
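
        Similarly, the swing-event detection behind S+ and S− can be sketched as a simple scan over per-step advantage estimates: contiguous stretches whose magnitude exceeds a threshold are flagged, and the state at the start of each stretch becomes a candidate initial state for resimulation. The thresholding scheme below is an assumption for illustration; the thesis's actual processing of the advantage estimates may differ.

        from typing import List

        def swing_event_starts(advantages: List[float],
                               threshold: float = 1.0,
                               positive: bool = True) -> List[int]:
            """Return indices where high-amplitude advantage stretches begin.
            positive=True flags successes (S+); positive=False flags mistakes (S−)."""
            starts, in_event = [], False
            for i, a in enumerate(advantages):
                hit = a > threshold if positive else a < -threshold
                if hit and not in_event:
                    starts.append(i)     # the state at index i starts a swing event
                in_event = hit
            return starts

        # Example: a positive swing at steps 3-4 and a negative swing at step 7.
        adv = [0.1, 0.0, 0.2, 2.5, 1.8, 0.1, -0.2, -3.0, -0.1]
        print(swing_event_starts(adv, positive=True))    # [3] -> S+ resimulates from state 3
        print(swing_event_starts(adv, positive=False))   # [7] -> S− resimulates from state 7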
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/520
        Collections
        • Theses