rPLBP: Temporal Logic for Reasoning about Safety and Rewards of Bounded Policies under Uncertainty
Summary
Temporal logics can be used to reason about how environments change over time. When an environment involves uncertainty, agents with the power to make decisions, or opportunities to gain rewards, reasoning about these factors requires a temporal logic with the vocabulary to express them.
We propose rPLBP, a reward-based extension of the Probabilistic Logic of Bounded Policies (PLBP) that can reason about the rewards gained by an agent in a probabilistic environment. The logic includes expressions for the reward accumulated along a finite path, as well as for the expected reward under a finite strategy (a kind of playbook prescribing the actions the agent will take over a bounded number of future steps). We show that the model checking problem for rPLBP is PSPACE-complete, and we prove that its satisfiability problem is in 2EXPSPACE.
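To make these two quantities concrete, the following is a rough, illustrative sketch (not taken from the paper, and using made-up states, actions, and numbers) of how the reward along a finite path and the expected reward of a finite strategy are computed for a small Markov decision process.

    # Hypothetical two-state MDP: transition probabilities P[s][a] = {s': prob}
    # and rewards R[s][a]; all values are invented example data.
    P = {
        "low":  {"wait": {"low": 1.0},  "risky": {"high": 0.6, "low": 0.4}},
        "high": {"wait": {"high": 1.0}, "risky": {"high": 0.5, "low": 0.5}},
    }
    R = {
        "low":  {"wait": 0.0, "risky": 1.0},
        "high": {"wait": 2.0, "risky": 3.0},
    }

    def path_reward(path):
        # Reward accumulated along a finite path [(s0, a0), (s1, a1), ...].
        return sum(R[s][a] for s, a in path)

    def expected_reward(state, policy, step=0):
        # Expected cumulative reward of a finite strategy: policy[t][s] is the
        # action the agent commits to in state s at step t, for a bounded horizon.
        if step == len(policy):
            return 0.0
        a = policy[step][state]
        return R[state][a] + sum(
            p * expected_reward(nxt, policy, step + 1)
            for nxt, p in P[state][a].items()
        )

    # A two-step "playbook": play "risky" first, then "wait".
    policy = [
        {"low": "risky", "high": "risky"},
        {"low": "wait",  "high": "wait"},
    ]
    print(path_reward([("low", "risky"), ("high", "wait")]))  # 1.0 + 2.0 = 3.0
    print(expected_reward("low", policy))                     # 1.0 + 0.6*2.0 = 2.2

The paper's formulas quantify over such bounded strategies symbolically; the code above only evaluates the underlying semantic quantities for one fixed strategy.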
We explore applications of rPLBP in the field of safe reinforcement learning. In shielded learning, safety constraints are used to block unsafe actions during the learning phase of an agent. Unlike the temporal logics usually used for such constraints, e.g. Linear Temporal Logic and Probabilistic Computation Tree Logic, rPLBP allows us to express to what extent safety guidelines may be relaxed based on expected reward.
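As a minimal sketch of that shielding idea, under assumptions not taken from the paper: the hypothetical helpers unsafe_prob and expected_reward stand in for quantities a model checker would supply, and the shield vetoes an action when its risk exceeds a threshold that is relaxed, up to a cap, when the action's expected reward is high. This is only meant to illustrate the kind of reward-dependent relaxation the summary above attributes to rPLBP.

    # Illustrative shield, not the paper's construction: filter candidate
    # actions by a safety threshold that expected reward can relax slightly.
    def shield(state, candidate_actions, unsafe_prob, expected_reward,
               base_threshold=0.05, relax_per_reward=0.01, max_threshold=0.2):
        allowed = []
        for a in candidate_actions:
            # Higher expected reward buys a bounded relaxation of the safety bound.
            threshold = min(max_threshold,
                            base_threshold + relax_per_reward * expected_reward(state, a))
            if unsafe_prob(state, a) <= threshold:
                allowed.append(a)
        return allowed

    # Example usage with toy estimates (made-up numbers).
    probs   = {("s0", "risky"): 0.08, ("s0", "wait"): 0.0}
    rewards = {("s0", "risky"): 5.0,  ("s0", "wait"): 1.0}
    print(shield("s0", ["risky", "wait"],
                 unsafe_prob=lambda s, a: probs[(s, a)],
                 expected_reward=lambda s, a: rewards[(s, a)]))
    # ['risky', 'wait']: "risky" passes because its 0.08 risk is within the
    # relaxed bound 0.05 + 0.01 * 5.0 = 0.10.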