The Descriptive Power of Chords: Music or Noise?
Summary
In the field of music information retrieval (MIR), several features can be extracted from audio and then used for tasks such as query-by-humming or chord extraction. However, chord extraction can also be applied to non-harmonic (non-musical) audio, although this generally results in meaningless chord sequences.
We are interested in distinguishing harmonic from non-harmonic audio, based on the extracted chords only. This may benefit services such as Chordify, which automatically extract chords from musical audio so that musicians can play along with a song. Because these services want to show only meaningful chords to their users, being able to filter out chords extracted from non-harmonic audio is useful to them. We divide the audio into two groups: h-music, which is music that follows the rules of Western harmony, and non-h-music, which is everything that is not h-music (non-musical audio, but also atonal and percussive music).
In this thesis, we study three novel tasks, each applied to chord sequences only: 1) we classify an extracted chord sequence as either h-music or non-h-music, 2) we segment an extracted chord sequence into parts of h-music and non-h-music with a novel segmentation algorithm, and 3) we assign a quality score to an extracted chord sequence; this score indicates the degree to which a Chordify user finds the chord sequence acceptable. We construct a framework that performs these tasks using two different models that can describe a chord sequence or a set of chord sequences: a language model, which we use to create a probability distribution over words, and a chord histogram, which stores the relative distribution of the chords in a song.
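To make the two models concrete, the following is a minimal sketch and not the thesis implementation. It assumes chords are plain string labels that play the role of words, and it uses an add-one smoothed bigram model as the language model; the n-gram order, the smoothing method, and the names `chord_histogram` and `BigramChordModel` are illustrative assumptions.

```python
from collections import Counter
import math


def chord_histogram(chords):
    """Return the relative frequency of each chord label in one sequence."""
    counts = Counter(chords)
    total = sum(counts.values())
    return {chord: n / total for chord, n in counts.items()}


class BigramChordModel:
    """Add-one smoothed bigram model over chord labels (assumed design)."""

    def __init__(self, sequences):
        self.vocab = {chord for seq in sequences for chord in seq}
        self.context_counts = Counter()  # how often a chord has a successor
        self.bigram_counts = Counter()   # how often (prev, cur) occurs
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        # One extra outcome reserves probability mass for unseen chords.
        outcomes = len(self.vocab) + 1
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + outcomes
        )

    def avg_log_prob(self, seq):
        """Average log-probability per chord transition in the sequence."""
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return 0.0
        return sum(math.log(self.prob(p, c)) for p, c in pairs) / len(pairs)
```

Under these assumptions, one way to use the language model for the classification task would be to train one model on h-music sequences and one on non-h-music sequences, then assign a new sequence to whichever model gives it the higher average log-probability.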
With our framework, we are able to accurately distinguish h-music from non-h-music. The choice of chord extraction algorithm also affects performance: the Chordino algorithm appears to give better results than Chordify's own chord extraction algorithm. For the segmentation task, we are able to predict relatively well, within a margin of tens of seconds, the point in time at which an audio file switches from h-music to non-h-music and vice versa. Predicting the quality of a chord sequence proves to be more difficult, as our predictor requires more data than we currently have.
Additionally, we have constructed our own data set, consisting of several hundred Creative Commons non-h-music audio files and radio podcasts, which we have used for our experiments. This data set is publicly available.