Model Self-Adaptability for Prescriptive Process Monitoring in Drifting Environments: A Comparative Study of Reinforcement Learning Techniques
Summary
This study investigates the self-adaptive capabilities of various reinforcement learning (RL) models, including Q-learning, Deep Q-Networks (DQN), Action Pickup RL, Proximal Policy Optimization (PPO), and Meta-RL, in managing dynamic process environments. In contrast to conventional studies on concept drift, which primarily focus on changes in transition probabilities, this work also considers variations in the action space, broadening the understanding of RL adaptability.
To this end, multiple types of drift are synthesized within Markov Decision Processes (MDPs) that model process events, and model performance is evaluated using metrics such as average reward, reward recovery rate, and recovery speed. The objective is to identify effective RL approaches for adapting to concept drift in process mining tasks.
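As an illustration of this setup, the following minimal sketch shows one way such drift could be injected into a toy tabular MDP and how the two post-drift metrics could be computed. The function names (`random_mdp`, `apply_drift`, `recovery_metrics`), the Dirichlet-sampled transitions, and the 95% recovery threshold are illustrative assumptions, not the exact experimental configuration used in the study.

```python
# Minimal sketch (assumed setup, not the study's exact implementation):
# a toy tabular MDP in which drift is injected either by re-sampling the
# transition probabilities or by changing the action space, plus the two
# post-drift metrics mentioned above.
import numpy as np

rng = np.random.default_rng(0)

def random_mdp(n_states, n_actions):
    """Sample transition probabilities P[s, a, s'] and rewards R[s, a]."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

def apply_drift(P, R, kind):
    """Inject drift: re-sample transitions, or extend the action space."""
    n_states, n_actions, _ = P.shape
    if kind == "transition":            # transition-probability drift
        P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    elif kind == "action_space":        # action-space drift (here: one added action)
        P_new, R_new = random_mdp(n_states, n_actions + 1)
        P_new[:, :n_actions], R_new[:, :n_actions] = P, R
        P, R = P_new, R_new
    return P, R

def recovery_metrics(episode_rewards, drift_episode, threshold=0.95):
    """Reward recovery rate: post-drift mean reward relative to the pre-drift mean.
    Recovery speed: episodes needed to regain `threshold` of the pre-drift mean."""
    pre = np.mean(episode_rewards[:drift_episode])
    post = np.asarray(episode_rewards[drift_episode:])
    recovery_rate = post.mean() / pre
    recovered = np.nonzero(post >= threshold * pre)[0]
    recovery_speed = int(recovered[0]) if recovered.size else None
    return recovery_rate, recovery_speed
```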
Several key findings emerged from the experiments. For small-scale MDPs, defined by single-digit numbers of states and actions, Q-learning, despite its simplicity, achieves higher episodic rewards under transition probability drift than more complex neural network-based models. In larger MDPs, where the numbers of states and actions are on the order of tens, DQN and MetaDQN (which integrates meta-learning) maintain strong performance after drift events, achieving higher episodic rewards than traditional methods. Under action space drift, meta-learning-based agents demonstrate superior adaptability and faster recovery across both small and large MDPs, highlighting their self-adaptability in environments where the action space changes. These findings suggest that while baseline RL models should be selected based on the size and complexity of the task environment, meta-learning-based agents offer a robust and scalable solution for handling action space drift, making them especially suitable for dynamic and structurally evolving processes.