Show simple item record

dc.rights.licenseCC-BY-NC-ND
dc.contributor.advisorDoder, Dragan
dc.contributor.authorLutz, Sterre
dc.date.accessioned2023-10-05T23:00:50Z
dc.date.available2023-10-05T23:00:50Z
dc.date.issued2023
dc.identifier.urihttps://studenttheses.uu.nl/handle/20.500.12932/45332
dc.description.abstractTemporal logics can be used to reason about how environments change over time. When environments have an element of uncertainty, agents with the power to make decisions, or an opportunity for gaining rewards, reasoning about such factors requires a temporal logic with the vocabulary to address them. We propose rPLBP, a reward-based extension of Probabilistic Logic of Bounded Policies that can reason about the rewards gained by an agent in a probabilistic environment. This logic includes expressions to reason about rewards accumulated along a finite path, as well as the expected reward given a finite strategy (a sort of playbook describing which actions the agent will take for a finite number of future steps). We show that the model checking problem for rPLBP is PSPACE-complete. Moreover, we prove that the satisfiability problem for rPLBP is in 2EXPSPACE. We explore applications of rPLBP in the field of safe reinforcement learning. In shielded learning, safety constraints are used to block unsafe actions during the learning phase of an agent. Unlike the temporal logics usually used for such constraints -- e.g. Linear Temporal Logic and Probabilistic Computation Tree Logic -- the logic rPLBP allows us to express to what extent safety guidelines may be relaxed based on expected reward.
dc.description.sponsorshipUtrecht University
dc.language.isoEN
dc.subjectA new temporal logic for reasoning about rewards and safety of an agent in a probabilistic environment (MDP), including proof of decidable model checking and satisfiability problem.
dc.titler-PLBP: Temporal Logic for Reasoning about Safety and Rewards of Bounded Policies under Uncertainty
dc.type.contentMaster Thesis
dc.rights.accessrightsOpen Access
dc.subject.keywordstemporal logic; bounded policies; agents; reinforcement learning; shielded reinforcement learning
dc.subject.courseuuArtificial Intelligence
dc.thesis.id25086


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record