        r-PLBP: Temporal Logic for Reasoning about Safety and Rewards of Bounded Policies under Uncertainty

        Thumbnail
        View/Open
        Master_Thesis_final.pdf (1.532Mb)
        Publication date
        2023
        Author
        Lutz, Sterre
        Summary
        Temporal logics are used to reason about how environments change over time. When an environment involves uncertainty, agents that can make decisions, or opportunities to gain rewards, reasoning about these factors requires a temporal logic with the vocabulary to express them. We propose rPLBP, a reward-based extension of the Probabilistic Logic of Bounded Policies (PLBP) that can reason about the rewards gained by an agent in a probabilistic environment. The logic includes expressions for the reward accumulated along a finite path and for the expected reward under a finite strategy (a kind of playbook describing which actions the agent will take over a finite number of future steps). We show that the model checking problem for rPLBP is PSPACE-complete and prove that its satisfiability problem is in 2EXPSPACE. We also explore applications of rPLBP in safe reinforcement learning. In shielded learning, safety constraints are used to block unsafe actions during an agent's learning phase. Unlike the temporal logics usually used for such constraints (e.g. Linear Temporal Logic and Probabilistic Computation Tree Logic), rPLBP can express the extent to which safety guidelines may be relaxed based on expected reward.
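        To make the notions of "expected reward under a finite strategy" and "relaxing a safety guideline based on expected reward" concrete, here is a minimal Python sketch. It is not the syntax or semantics of rPLBP and is not taken from the thesis. The toy environment, the reward values, the representation of a strategy as one state-to-action map per step, and the names `expected_reward`, `crash_probability` and `shield_allows` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Assumed representation: one state -> action map per future step.
Strategy = List[Dict[State, Action]]

# Hypothetical toy environment: (state, action) -> [(next_state, probability)].
TRANSITIONS: Dict[Tuple[State, Action], List[Tuple[State, float]]] = {
    ("start", "safe"): [("start", 1.0)],
    ("start", "risky"): [("goal", 0.8), ("crash", 0.2)],
    ("goal", "safe"): [("goal", 1.0)],
    ("goal", "risky"): [("goal", 1.0)],
    ("crash", "safe"): [("crash", 1.0)],
    ("crash", "risky"): [("crash", 1.0)],
}

# Hypothetical reward collected each time a state is entered.
REWARD: Dict[State, float] = {"start": 0.0, "goal": 10.0, "crash": -20.0}


def expected_reward(state: State, strategy: Strategy, step: int = 0) -> float:
    """Expected reward accumulated while following the finite strategy."""
    if step == len(strategy):
        return 0.0
    action = strategy[step][state]
    return sum(
        p * (REWARD[nxt] + expected_reward(nxt, strategy, step + 1))
        for nxt, p in TRANSITIONS[(state, action)]
    )


def crash_probability(state: State, strategy: Strategy, step: int = 0) -> float:
    """Probability of entering 'crash' within the strategy's horizon."""
    if state == "crash":
        return 1.0
    if step == len(strategy):
        return 0.0
    action = strategy[step][state]
    return sum(
        p * crash_probability(nxt, strategy, step + 1)
        for nxt, p in TRANSITIONS[(state, action)]
    )


def shield_allows(state: State, strategy: Strategy, strict: float = 0.05,
                  relaxed: float = 0.25, min_reward: float = 5.0) -> bool:
    """Illustrative shield: the strict risk bound may be relaxed
    only when the expected reward is high enough (thresholds are made up)."""
    risk = crash_probability(state, strategy)
    if risk <= strict:
        return True
    return risk <= relaxed and expected_reward(state, strategy) >= min_reward


ALL_SAFE: Strategy = [{s: "safe" for s in REWARD} for _ in range(2)]
RISKY_THEN_SAFE: Strategy = [
    {"start": "risky", "goal": "safe", "crash": "safe"},
    {s: "safe" for s in REWARD},
]

print(expected_reward("start", RISKY_THEN_SAFE))
print(crash_probability("start", RISKY_THEN_SAFE))
print(shield_allows("start", RISKY_THEN_SAFE))
```

        In this toy setting the risky two-step strategy exceeds the strict risk bound but is still permitted because its expected reward clears the assumed minimum; raising `min_reward` makes the same shield reject it.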
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/45332
        Collections
        • Theses