Automatically Predicting Popularity of Music Tracks Based on Lyrics - English

Somme, S. van der

Publication date

2021

Summary

Hit song science is a field of machine learning that focuses on predicting the popularity of unreleased tracks. This research examines how different classifiers perform at this task and which features help predict popularity. A dataset was built by combining data from the LFM-1b dataset, data retrieved through the Spotify developer API and a lyrics dataset created in a study by Markus Schedl on music information retrieval. After filtering and preprocessing this dataset, four groups of features were extracted: sentiment analysis features, an explicit feature retrieved from Spotify, a repetitiveness feature and tf-idf features. Three models, a K-Nearest Neighbors classifier, a Naive Bayes classifier and a Support Vector Machine (SVM), were then trained on different combinations of these features. Of these classifiers, the SVM performed best when trained on a feature set containing only tf-idf features, obtaining an F1-score of 0.60. However, none of the models was able to predict the popularity of highly popular tracks, due to the highly unbalanced feature set.
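To make two of the lyric features named in the summary more concrete, the following is a minimal Python sketch, not the thesis's implementation. The tf-idf weighting uses the plain tf * log(N / df) form. The repetitiveness definition (one minus the zlib compression ratio) is an assumption chosen for illustration, since the summary does not say how repetitiveness was measured. The function names `tokenize`, `tfidf` and `repetitiveness` are hypothetical.

```python
import math
import zlib
from collections import Counter


def tokenize(lyrics: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in lyrics.lower().split())
    return [t for t in tokens if t]


def tfidf(corpus: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    docs = [tokenize(text) for text in corpus]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def repetitiveness(lyrics: str) -> float:
    """Assumed definition: 1 - compressed size / raw size (higher = more repetitive)."""
    raw = lyrics.encode("utf-8")
    if not raw:
        return 0.0
    return 1.0 - len(zlib.compress(raw)) / len(raw)


if __name__ == "__main__":
    toy_lyrics = ["la la la hello", "hello darkness my old friend"]
    for text, vector in zip(toy_lyrics, tfidf(toy_lyrics)):
        print(text, vector, round(repetitiveness(text), 3))
```

Under this weighting a term that occurs in every document gets weight zero, so only distinguishing vocabulary contributes to the feature vector. On very short strings the compression-based score can be negative because of zlib's fixed overhead, so it is only meaningful on full-length lyrics.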

URI

https://studenttheses.uu.nl/handle/20.500.12932/38767

Collections

Theses