Automatically Predicting Popularity of Music Tracks Based on Lyrics - English
Summary
Hit song science is a machine learning field that focuses on predicting the popularity of unreleased tracks. This research examines how different classifiers perform at this task and which features help predict track popularity. A dataset was built by combining data from the LFM-1b dataset, data retrieved through the Spotify developer API and a lyrics dataset created in a study by Markus Schedl on music information retrieval. After filtering and preprocessing this dataset, four groups of features were extracted: sentiment analysis features, an explicit-content feature retrieved from Spotify, a repetitiveness feature and tf-idf features. Three models, a K-Nearest Neighbors classifier, a Naive Bayes classifier and a Support Vector Machine (SVM), were then trained on different combinations of these features. Of these classifiers, the SVM performed best when trained on a feature set containing only tf-idf features, obtaining an F1-score of 0.60. Unfortunately, none of the models was able to identify highly popular tracks, owing to the highly imbalanced dataset.
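The effect of class imbalance on the reported metric can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not results from this study, and the support-weighted averaging is an assumption, since the summary does not state how the F1-score was averaged. The sketch shows that a classifier which never predicts the popular class can still obtain a high overall F1-score while its F1-score on the popular class is zero.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by class support in y_true."""
    per_class = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(per_class[c] * n for c, n in support.items()) / len(y_true)


# Hypothetical toy data: 90 unpopular tracks and 10 highly popular ones.
y_true = ["low"] * 90 + ["high"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["low"] * 100

per_class = f1_per_class(y_true, y_pred)
overall = weighted_f1(y_true, y_pred)

print(per_class)
print(round(overall, 3))
```

Reporting the per-class scores next to the aggregate makes this failure mode visible, which a single averaged F1-score can hide.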