Multi-agent learning tournaments
Summary
In this thesis the methods used in previous multi-agent learning tournaments are compared. The goal of the comparison is to provide insight into why different methods are used and into the impact of small but important design choices, such as normalizing rewards between games to avoid misinterpretation of the results. Additional attention is paid to the fairness of the methods. After the analysis, a sample tournament is played to ensure that practical problems are encountered as well. Some of the settings do not have an optimal value; in these cases the options are described and we explain the criteria we used to make a choice.
The resulting methodology is applied in a small tournament to demonstrate its use. The tournament is run in a modular framework, which is published along with this thesis. The framework contains a parameter tuner for the algorithms, something not seen in previous research. To gain insight into N-player games, some of the algorithms used in this thesis have been slightly modified, which led to a new version of Bully. The tournament is analyzed with a set of statistical analysis techniques and plots, which are also published. The metrics give different winners, and to our surprise Markov earned the highest average reward.