Teaching neural networks to play the piano
Summary
Expressive accentuation in music is hard to predict formally from a musical score alone. In this thesis, we propose methods to predict expressive parameters of piano music: dynamics (loudness), tempo and timing. Two different models are applied: one parses the music note by note, the other beat by beat. Unsupervised feature learning with sparse restricted Boltzmann machines (RBMs) is first applied to more than 6000 musical pieces to find recurring patterns (features) in the input score; a supervised learning approach then relates these features to the expressive parameters. For dynamics, the system achieves a better R^2 score than the current state of the art on this dataset. The system also exhibits characteristics indicating that it has the capacity to learn patterns specific to certain genres or musical styles. For tempo and timing, the model has almost no predictive value. Several suggestions for developing the system further are made.
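
For readers unfamiliar with the metric, the following is a minimal sketch of how an R^2 score can be computed. It assumes that the score reported above is the standard coefficient of determination, 1 - SS_res / SS_tot; the summary does not state the exact definition used. The function name and the example values are illustrative and are not taken from the thesis.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination, assuming the standard definition:
    R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)

    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical per-note loudness targets and two sets of predictions.
targets = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]   # reproduces the targets exactly
constant = [2.0, 2.0, 2.0]  # always predicts the mean of the targets

print(r2_score(targets, perfect))
print(r2_score(targets, constant))
```

Under this definition, a score of 1 means the predictions match the targets exactly, a score of 0 means the model does no better than always predicting the mean, and negative scores mean it does worse than that baseline. A model with "almost no predictive value" would therefore score close to 0.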