r-PLBP: Temporal Logic for Reasoning about Safety and Rewards of Bounded Policies under Uncertainty

Lutz, Sterre

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Doder, Dragan
dc.contributor.author	Lutz, Sterre
dc.date.accessioned	2023-10-05T23:00:50Z
dc.date.available	2023-10-05T23:00:50Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/45332
dc.description.abstract	Temporal logics can be used to reason about how environments change over time. When an environment involves uncertainty, agents with the power to make decisions, or opportunities for gaining rewards, reasoning about these factors requires a temporal logic with the vocabulary to address them. We propose rPLBP, a reward-based extension of Probabilistic Logic of Bounded Policies that can reason about the rewards gained by an agent in a probabilistic environment. This logic includes expressions to reason about the reward accumulated along a finite path, as well as the expected reward given a finite strategy (a playbook describing which actions the agent will take for a finite number of future steps). We show that the model checking problem for rPLBP is PSPACE-complete. Moreover, we prove that the satisfiability problem for rPLBP is in 2EXPSPACE. We explore applications of rPLBP in the field of safe reinforcement learning. In shielded learning, safety constraints are used to block unsafe actions during the learning phase of an agent. Unlike the temporal logics usually used for such constraints (e.g. Linear Temporal Logic and Probabilistic Computation Tree Logic), the logic rPLBP allows us to express to what extent safety guidelines may be relaxed based on expected reward.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	A new temporal logic for reasoning about the rewards and safety of an agent in a probabilistic environment (MDP), including proofs that its model checking and satisfiability problems are decidable.
dc.title	r-PLBP: Temporal Logic for Reasoning about Safety and Rewards of Bounded Policies under Uncertainty
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	temporal logic; bounded policies; agents; reinforcement learning; shielded reinforcement learning
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	25086

Files in this item

Name: Master_Thesis_final.pdf
Size: 1.532Mb
Format: PDF

This item appears in the following Collection(s)

Theses