Training a Back-Propagation Network with Temporal Difference Learning and a Database for the Board Game Pente
Summary
This paper describes how a back-propagation network can learn to play Pente from database games using temporal difference learning. The relevant aspects of temporal difference learning and neural networks are described, and the method by which a neural network can learn the game of Pente is explained. Experiments are described and analyzed to show what the network learned and what kind of playing behavior the training strategy produced. The best performance after training on a single game was reached with a learning rate of 0.001. After learning 300,000 games from the database, the network did not perform well. A third experiment with 4,000 self-play games showed that the network placed its moves toward the middle of the board rather than the sides, which suggests some basic form of learned play. The paper first covers the aspects of temporal difference learning in neural networks and the choices made in this project during testing. The game of Pente is then explained, and the approach by which a neural network can learn it is justified. Finally, the experiments of this project are described, the results are evaluated, and the conclusions are discussed.
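To make the combination of temporal difference learning and back-propagation concrete, the following is a minimal sketch (not the paper's own code) of a TD(0)-style weight update for a small feed-forward value network. The board encoding, network width, and all names are illustrative assumptions; only the learning rate of 0.001 comes from the reported experiments.

```python
import numpy as np

BOARD_CELLS = 19 * 19      # assumed Pente board size, flattened to a feature vector
HIDDEN = 40                # assumed hidden-layer width
ALPHA = 0.001              # learning rate reported as best in the experiments
GAMMA = 1.0                # undiscounted episodic game

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (HIDDEN, BOARD_CELLS))
W2 = rng.normal(0.0, 0.1, (1, HIDDEN))

def value(board_vec):
    """Forward pass: board features -> sigmoid hidden layer -> scalar value estimate."""
    h = 1.0 / (1.0 + np.exp(-W1 @ board_vec))
    v = 1.0 / (1.0 + np.exp(-W2 @ h))
    return v[0], h

def td_update(board_t, board_t1, reward):
    """One TD(0) step: nudge V(s_t) toward reward + gamma * V(s_{t+1}) via backprop."""
    global W1, W2
    v_t, h_t = value(board_t)
    v_t1, _ = value(board_t1)
    td_error = reward + GAMMA * v_t1 - v_t
    # Error at the sigmoid output unit.
    delta2 = td_error * v_t * (1.0 - v_t)
    # Error propagated back to the hidden layer (using the pre-update weights).
    delta1 = delta2 * W2.flatten() * h_t * (1.0 - h_t)
    # Gradient-style weight updates scaled by the TD error.
    W2 += ALPHA * delta2 * h_t[np.newaxis, :]
    W1 += ALPHA * np.outer(delta1, board_t)
```

In a database-game setting, successive positions of a recorded game would be fed to `td_update` in order, with a nonzero reward only at the terminal position; in self-play, the positions come from the network's own move choices instead.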