        Model's Self-adaptability for prescriptive process monitoring in drifting environment: A comparative study between Reinforcement Learning techniques

        View/Open
        thesisfinal.pdf (12.67 MB)
        Publication date
        2025
        Author
        Qu, Weiting
        Summary
        This study investigates the self-adaptive capabilities of various reinforcement learning (RL) models, including Q-learning, Deep Q-Networks (DQN), Action Pickup RL, Proximal Policy Optimization (PPO), and Meta-RL, in managing dynamic process environments. In contrast to conventional studies on concept drift, which primarily focus on changes in transition probabilities, this work also considers variations in the action space, broadening the understanding of RL adaptability.

        To this end, multiple types of drift are synthesized within Markov Decision Processes (MDPs) that model process events, and model performance is evaluated using metrics such as average reward, reward recovery rate, and recovery speed. The objective is to identify effective RL approaches for adapting to concept drift in process mining tasks.

        Several key findings emerged from the experiments. For small-scale MDPs (defined by single-digit numbers of states and actions), Q-learning, despite its simplicity, achieves higher episodic rewards under transition probability drift compared to more complex neural-network-based models. In larger MDPs, where states and actions are on the order of tens, DQN and MetaDQN (which integrates meta-learning) maintain strong performance after drift events, achieving higher episodic rewards than traditional methods. Under action space drift, meta-learning-based agents demonstrate superior adaptability and faster recovery across both small and large MDPs, highlighting their self-adaptability in action-changing environments.

        These findings suggest that, while baseline RL models should be selected based on the size and complexity of the task environment, meta-learning-based agents offer a robust and scalable solution for handling action space drift, making them especially suitable for dynamic and structurally evolving processes.
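
        The following is a minimal illustrative sketch, not taken from the thesis, of the kind of setup the summary describes: tabular Q-learning on a toy MDP whose transition probabilities drift partway through training, with episodic returns printed so recovery after the drift can be inspected. All state and action counts, drift magnitudes, reward values, and hyperparameters below are assumptions chosen for illustration only.

# Sketch only: toy MDP with a mid-training transition-probability drift,
# learned with tabular Q-learning. Numbers are illustrative assumptions.
import random

N_STATES, N_ACTIONS = 5, 3          # "small-scale" MDP: single-digit states/actions
GOAL = N_STATES - 1

def make_transitions(shift=0.0):
    """Return P[(s, a)] = [(next_state, prob), (fallback_state, prob)].
    `shift` lowers the chance of moving toward the goal, i.e. the drift."""
    P = {}
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            fwd = max(min(0.9 - shift, 0.95), 0.05)
            P[(s, a)] = [(min(s + 1, GOAL), fwd), (max(s - 1, 0), 1.0 - fwd)]
    return P

def step(P, s, a):
    """Sample the next state and reward under the current transition table."""
    nxt, p = P[(s, a)][0]
    s2 = nxt if random.random() < p else P[(s, a)][1][0]
    r = 1.0 if s2 == GOAL else -0.01
    return s2, r, s2 == GOAL

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.95, 0.1
P = make_transitions()

for episode in range(2000):
    if episode == 1000:                      # inject transition-probability drift
        P = make_transitions(shift=0.5)
    s, total = 0, 0.0
    for _ in range(50):
        # epsilon-greedy action selection over the current Q-table
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r, done = step(P, s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s, total = s2, total + r
        if done:
            break
    if episode % 200 == 0:
        print(f"episode {episode:4d}  return {total:6.2f}")

        Metrics of the kind the study reports, such as reward recovery rate and recovery speed, could then be derived from an episodic-return trace like this one by comparing post-drift returns against the pre-drift baseline and counting the episodes needed to regain it.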
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/50118
        Collections
        • Theses