        Task-oriented Dialog Policy Learning via Deep Reinforcement Learning and Automatic Graph Neural Network Curriculum Learning

        Master_Thesis_Final.pdf (2.796Mb)
        Publication date
        2024
        Author
        Hanneman, Koen
        Summary
        In a task-oriented dialog system, a core component is the dialog policy, which selects the response action and guides the system toward completing the task. Optimizing such a policy is often formulated as a reinforcement learning (RL) problem. However, because human conversations are subjective and open-ended, the complexity of dialogs varies greatly, which reduces the training efficiency of RL-based methods. A proven approach to this problem is curriculum learning (CL), which breaks down complex problems and improves learning efficiency by presenting a sequence of learning steps of increasing difficulty, similar to human learning. Existing models, however, build this sequence by ordering tasks by complexity alone, without taking task similarity into account. In this thesis, we propose a method that reduces the distance between similar tasks in a curriculum, which is hypothesised to increase training efficiency. To this end, we introduce a curriculum learning model in which a sequence of similar tasks is generated offline via a graph neural network (GNN) and the low-level dialog policy is transferred in each iteration of the curriculum. The model is then compared, on the MultiWOZ dataset, against dialog policy learning without a curriculum, and is found to outperform this baseline in specific scenarios.
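The idea of ordering tasks by both complexity and similarity can be illustrated with a small sketch. This is not the thesis's method: the summary describes a GNN that generates the task sequence offline, whereas the code below swaps in a simple greedy nearest-neighbour ordering over hand-made feature vectors. The function `order_curriculum`, the `similarity_weight` parameter, the task names, and all numeric values are hypothetical and chosen only for illustration.

```python
from math import dist


def order_curriculum(tasks, similarity_weight=1.0):
    """Greedy curriculum ordering (illustrative assumption, not the thesis's GNN).

    tasks maps a task name to (complexity, embedding). The curriculum starts
    with the least complex task, then repeatedly picks the remaining task with
    the lowest cost, where
        cost = complexity + similarity_weight * distance to the previous task.
    """
    remaining = dict(tasks)
    current = min(remaining, key=lambda name: remaining[name][0])
    order = [current]
    _, current_emb = remaining.pop(current)

    while remaining:
        nxt = min(
            remaining,
            key=lambda name: remaining[name][0]
            + similarity_weight * dist(remaining[name][1], current_emb),
        )
        order.append(nxt)
        _, current_emb = remaining.pop(nxt)
    return order


# Hypothetical toy tasks: (complexity score, 2-D task embedding).
tasks = {
    "restaurant_single": (1.0, (0.0, 0.0)),
    "hotel_single": (1.2, (0.0, 1.0)),
    "train_single": (1.1, (5.0, 0.0)),
    "restaurant_hotel": (2.0, (0.0, 0.5)),
}

# Complexity only: the dissimilar "train_single" task comes second.
print(order_curriculum(tasks, similarity_weight=0.0))
# ['restaurant_single', 'train_single', 'hotel_single', 'restaurant_hotel']

# Complexity plus similarity: similar tasks are placed next to each other.
print(order_curriculum(tasks, similarity_weight=1.0))
# ['restaurant_single', 'hotel_single', 'restaurant_hotel', 'train_single']
```

With `similarity_weight=0.0` the sketch reduces to a pure complexity ordering, which corresponds to the baseline behaviour the summary attributes to existing curricula. A positive weight pulls similar tasks together, which is the property the thesis hypothesises to help when a policy is transferred from one curriculum step to the next.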
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46228
        Collections
        • Theses

        Related items

        Showing items related by title, author, creator and subject.

        • 45 Anomaly detection with similarity graphs and active learning: Building and storing static and dynamic similarity graphs with the help of a vector database

          Kragting, Sebastiaan (2022)
          Fraudulent credit card transactions are a major problem for financial institutions, and the problem continues to grow alongside digital transformation. A conventional view states that fraudulent transactions are anomalies. A novel view ...
        • Greedy causal structure learning of maximal arid graphs 

          Jans, Sebastiaan (2024)
          Causal knowledge is often modelled in directed acyclic graphs (DAGs) where a directed edge between variables, like A → B, indicates that one (A) influences another (B). Many algorithms attempt the difficult task of finding ...
        • Reinforcement Learning and surrogate reward functions based on graph Laplacians 

          Smit, Iris (2022)
          Reinforcement learning is an upcoming area in machine learning with many applications. This thesis covers the basics of reinforcement learning: reward functions, value and policy iterations, and their algorithms. A value ...