Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Vreeswijk, G.A.W.
dc.contributor.author: Heemskerk, H.C.
dc.date.accessioned: 2020-10-30T19:00:18Z
dc.date.available: 2020-10-30T19:00:18Z
dc.date.issued: 2020
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/38059
dc.description.abstract: In Multi-Agent Reinforcement Learning (MARL), social dilemma environments make cooperation hard to learn. It is even harder for decentralized models, where agents do not share model components. Intrinsic rewards have only been partially explored as a solution to this problem, and training still requires a large number of samples and thus a long time. In an attempt to speed up this process, we propose a combination of the two main categories of intrinsic rewards: curiosity and empowerment. We perform experiments in the Cleanup and Harvest social dilemma environments with several types of models, both with and without intrinsic motivation. We find no conclusive evidence that intrinsic motivation significantly alters experiment outcomes when using the PPO algorithm. We also find that PPO is unable to succeed in the Harvest environment. However, both of these findings are shown only for the case without hyperparameter tuning.
dc.description.sponsorship: Utrecht University
dc.format.extent: 4202632
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.title: Social curiosity in deep multi-agent reinforcement learning
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: reinforcement learning, multi-agent, multi-agent reinforcement learning, policy gradient, PPO, A3C, actor-critic, social dilemmas, sequential social dilemmas, tragedy of the commons, commons dilemma, intrinsic reward, intrinsic motivation, empowerment, curiosity, social curiosity module
dc.subject.courseuu: Computing Science
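
To illustrate the idea described in the abstract, the sketch below shows one way curiosity- and empowerment-style intrinsic rewards could be folded into the reward an agent hands to a learner such as PPO. This is a minimal sketch under assumed conventions, not the thesis's social curiosity module: the weighting scheme, the forward-model prediction error used for curiosity, and the crude empowerment proxy are all illustrative placeholders.

# Minimal sketch (assumed, not the thesis implementation) of combining an
# extrinsic reward with curiosity- and empowerment-style intrinsic bonuses
# before a policy-gradient (e.g. PPO) update.
import numpy as np

rng = np.random.default_rng(0)

def curiosity_bonus(pred_next_state, true_next_state):
    # Curiosity as forward-model prediction error (ICM-style idea).
    return float(np.mean((pred_next_state - true_next_state) ** 2))

def empowerment_bonus(action_logits_given_transition):
    # Crude empowerment proxy: how identifiable the agent's own action is
    # from the observed transition (higher = more influence on the world).
    p = np.exp(action_logits_given_transition)
    p /= p.sum()
    entropy = -np.sum(p * np.log(p + 1e-8))
    return float(np.log(len(p)) - entropy)  # max possible entropy minus entropy

def combined_reward(r_ext, s_pred, s_next, inv_logits,
                    w_curiosity=0.1, w_empowerment=0.1):
    # Weighted sum handed to the learner in place of the raw environment reward.
    return (r_ext
            + w_curiosity * curiosity_bonus(s_pred, s_next)
            + w_empowerment * empowerment_bonus(inv_logits))

# Toy usage: one transition for one agent (all values are made up).
r_ext = 1.0                                       # extrinsic environment reward
s_next = rng.normal(size=8)                       # observed next-state features
s_pred = s_next + rng.normal(scale=0.3, size=8)   # forward-model prediction
inv_logits = rng.normal(size=4)                   # inverse-model logits over 4 actions
print(combined_reward(r_ext, s_pred, s_next, inv_logits))

In a decentralized setting each agent would maintain its own forward and inverse models and compute this reward locally; the weights w_curiosity and w_empowerment are hypothetical hyperparameters chosen here only for illustration.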

