Multiple fundamental frequency estimation and instrument recognition using non-negative matrix factorization
Summary
Non-negative matrix factorization (NMF) is a model-based technique, where a model is built up to reconstruct a
musical piece. This model is generated by linearly combining a limited set of known note structures at each point
in time. This technique can help us to achieve automatic transcription, where note information is extracted from
a musical recording.
For example, if a C and D note are played in the original musical piece, we hope that the matrix factorization
process uses the structure of a C and D note to reconstruct this part of the musical piece. If this is indeed the case,
because we know that these structures belong to certain notes, we can tell which notes were played in the musical
piece - a process known as automatic transcription.
In this project, we examine the performance of NMF for the tasks of estimating the frequencies that are active in
a musical piece, which is closely related to the pitch values of notes that are played. We ?nd that it is trivial for
several NMF variations to obtain high recall, where correct notes are detected and returned.
This result does however come at the cost of some noise in the returned values, therefore we consider several postprocessing
steps to reduce the number of incorrect frequencies that are returned. We ?nd that this task is complex
and deserves more attention in future research.
The same NMF technique was also applied to the task of instrument recognition, where we attempt to derive the
instrument that was used to play a recorded note. Algorithms designed for this task are still challenged by recordings
of musical pieces where several instruments and notes are sounding at the same time.
We believe that NMF may be able to extract notes from such a recording that are of high enough quality to be able
to derive the instrument from these extracted notes. Results in our experiments were inconclusive but promising.
Combining the results of music information retrieval tasks such as those we have discussed above, and involving
more temporal information by thinking in notes instead of individual timesteps, should lead to improved overall
performance for all of the tasks that are involved.