
        Pure Past Action Masking for Safe Multi Agent Reinforcement Learning

        thesis_Luchi.pdf (672.0Kb)
        Publication date
        2025
        Author
        Luchi, Leopoldo
        Summary
        Reinforcement Learning (RL) is a machine learning paradigm for solving sequential decision-making problems in non-deterministic environments. However, vanilla RL algorithms, which rely on trial-and-error learning, can exhibit unsafe behaviour, since an unsafe action may be executed repeatedly during the exploration phase before the agent learns to avoid it. In Multi-Agent Reinforcement Learning (MARL), ensuring safety is further complicated by the need for coordination among multiple agents. In this thesis, we study how temporal logics, such as Linear Temporal Logic (LTL), can be used to design safer algorithms, and how safe algorithms can be extended to the multi-agent setting. We introduce Multi-Agent Pure Past Action Masking, a novel approach for provably safe MARL that leverages Pure Past Linear Temporal Logic (PPLTL) to specify and enforce non-Markovian safety constraints. Our contribution is twofold: first, we synthesise a centralised mask that uses PPLTL formulas to define the safe joint actions; second, we propose a decomposition algorithm that enables decentralised, communication-free execution by the individual agents. Finally, we formally prove that the individual masks generated by the decomposition algorithm preserve the safety guarantees of the centralised mask, and we further validate these results with an experimental evaluation.
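The summary does not spell out how the masks are defined or how the decomposition works, so the following is only a minimal Python sketch under stated assumptions, not the thesis's method. It assumes that formulas are nested tuples using the past operators Y ("yesterday") and S ("since"), that a joint action may carry a guard formula which must hold on the history observed so far, and that all agents observe the same atomic propositions. The names `PastMonitor`, `centralised_mask` and `decompose`, the corridor example, and the greedy search are all hypothetical. The point being illustrated is that pure-past formulas can be checked incrementally with bounded memory, and that per-agent masks whose Cartesian product lies inside the centralised mask let agents act independently without losing safety.

```python
from itertools import product

# Hypothetical formula encoding (nested tuples), pure-past operators only:
#   ("atom", p) | ("not", f) | ("and", f, g)
#   ("Y", f)     -- "yesterday": f held at the previous step
#   ("S", f, g)  -- "f since g": g held at some step so far, f at every step after it

def children(f):
    return () if f[0] == "atom" else f[1:]

def subformulas(phi):
    """Distinct subformulas of phi, children listed before their parents."""
    out = []
    def visit(f):
        for c in children(f):
            visit(c)
        if f not in out:
            out.append(f)
    visit(phi)
    return out

class PastMonitor:
    """Evaluates a pure-past formula incrementally along a trace.
    Only the previous step's subformula values are stored, so memory
    does not grow with the length of the history."""
    def __init__(self, phi):
        self.phi = phi
        self.subs = subformulas(phi)
        self.prev = None  # subformula -> bool at the previous step
    def step(self, labels):
        """Consume the set of atoms true now; return whether phi holds now."""
        now = {}
        for f in self.subs:
            op = f[0]
            if op == "atom":
                now[f] = f[1] in labels
            elif op == "not":
                now[f] = not now[f[1]]
            elif op == "and":
                now[f] = now[f[1]] and now[f[2]]
            elif op == "Y":
                now[f] = self.prev is not None and self.prev[f[1]]
            elif op == "S":
                now[f] = now[f[2]] or (
                    now[f[1]] and self.prev is not None and self.prev[f])
            else:
                raise ValueError(f"unknown operator {op!r}")
        self.prev = now
        return now[self.phi]
    def holds(self):
        """Value of phi on the history consumed so far (False if empty)."""
        return self.prev is not None and self.prev[self.phi]

def centralised_mask(joint_actions, guards):
    """Joint actions whose guard holds now; unguarded ones are always allowed."""
    return {ja for ja in joint_actions if ja not in guards or guards[ja].holds()}

def decompose(safe, action_sets):
    """Greedily pick per-agent masks M_1..M_n with M_1 x ... x M_n inside safe.
    Deterministic, so agents that observe the same history compute the
    same masks without exchanging messages."""
    if not safe:
        return None
    masks = [{a} for a in min(safe)]  # seed with one safe joint action
    for i, actions in enumerate(action_sets):
        for a in sorted(actions):
            if a in masks[i]:
                continue
            trial = masks[:i] + [masks[i] | {a}] + masks[i + 1:]
            if all(ja in safe for ja in product(*trial)):
                masks[i].add(a)
    return masks

actions = [("go", "wait"), ("go", "wait")]  # agent 1, agent 2
joint = set(product(*actions))
# Hypothetical guard: both agents may "go" together only if the corridor
# has stayed clear since the last reset.
guards = {("go", "go"): PastMonitor(("S", ("atom", "clear"), ("atom", "reset")))}

for labels in [{"reset", "clear"}, {"clear"}, set()]:
    for monitor in guards.values():
        monitor.step(labels)

safe = centralised_mask(joint, guards)
masks = decompose(safe, actions)
print(sorted(safe))                # [('go', 'wait'), ('wait', 'go'), ('wait', 'wait')]
print([sorted(m) for m in masks])  # [['go', 'wait'], ['wait']]
```

In this sketch, the product condition is what makes decentralised execution safe: whatever each agent picks from its own mask, the resulting joint action belongs to the centralised mask. The greedy search may return smaller masks than necessary. Finding the largest such product set is in general a harder combinatorial search, which this sketch does not attempt.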
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/48624
        Collections
        • Theses