Using Discovered and Annotated Patterns as Compression Method for determining Similarity between Folk Songs
Summary
Repetition is a fundamental concept in music. Previous research on the classification of folk songs has shown that repeated patterns are important for classification into tune classes. These classes are formed based on the similarity between the songs. In this thesis, we use annotated and automatically discovered patterns as a compression method for determining the similarity between folk songs. When compressing songs using repeated patterns, we expect that only information characteristic of the classes is preserved in each song, and that the classification accuracy will therefore be high. We also analyze whether this method can be used to evaluate Pattern Discovery algorithms.
We discuss a number of state-of-the-art Pattern Discovery algorithms and different measures for determining the similarity between both patterns and entire songs. We propose an experimentation framework that uses these algorithms as a compression method in a classification task, in order to measure the coverage (the ratio of notes that remain after compression) and the classification accuracy. The framework consists of a compression step that reconstructs a song using the discovered patterns and a sequence alignment step for calculating melodic similarity. The experiments are performed on MTC-ANN, a set of Dutch folk songs, and on a set of Irish folk songs.
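To make the two framework steps concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes a song is a list of MIDI pitch numbers, that a pattern occurrence is a hypothetical (start, length) pair, that compression keeps only the notes covered by at least one occurrence, and that melodic similarity is a plain global alignment score. The function names and the match, mismatch, and gap scores are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: a pattern occurrence is (start index, length).
Occurrence = Tuple[int, int]


def compress(song: List[int], occurrences: List[Occurrence]) -> List[int]:
    """Keep only the notes covered by at least one pattern occurrence."""
    keep = [False] * len(song)
    for start, length in occurrences:
        # Clip each occurrence to the bounds of the song.
        for i in range(max(start, 0), min(start + length, len(song))):
            keep[i] = True
    return [note for note, kept in zip(song, keep) if kept]


def coverage(song: List[int], occurrences: List[Occurrence]) -> float:
    """Ratio of notes that remain after compression."""
    if not song:
        return 0.0
    return len(compress(song, occurrences)) / len(song)


def alignment_score(a: List[int], b: List[int], match: float = 1.0,
                    mismatch: float = -1.0, gap: float = -0.5) -> float:
    """Global alignment score (Needleman-Wunsch) with assumed scoring values."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                            prev[j] + gap,      # gap in b
                            curr[j - 1] + gap)) # gap in a
        prev = curr
    return prev[-1]


song = [60, 62, 64, 60, 62, 64, 67, 65]
occurrences = [(0, 3), (3, 3)]  # the motif 60-62-64 occurs twice
compressed = compress(song, occurrences)
```

In this toy example the compressed song contains six of the eight notes, so the coverage is 0.75. Two compressed songs can then be compared with `alignment_score`, and the resulting pairwise scores can feed whichever classifier the experiment uses.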
Our results show that classification of uncompressed songs already results in a high classification accuracy. Compression using the Annotated Motifs from MTC-ANN leads to a low coverage and a high accuracy. None of the Pattern Discovery algorithms achieves a result comparable to the Annotated Motifs, while some naive compression approaches even outperform the Pattern Discovery algorithms.
This leads to the conclusion that annotated motifs can successfully be used for compression, but that the automatic discovery of these patterns does not lead to satisfactory results. Moreover, we cannot conclude which of the Pattern Discovery algorithms performs "best". Our framework does, however, leave open the specific choices one might want to make in a given context in order to compare the Pattern Discovery algorithms unambiguously: depending on the context, the desired outcome for coverage and classification accuracy can differ. Furthermore, our method can be used to suggest improvements to the output of Pattern Discovery algorithms.