Predicting diversity in subtitles of NPO shows
Summary
The value of diversity, in terms of representation of people, has recently come to the forefront for public broadcasters, including the Dutch NPO. The NPO measures diversity through a questionnaire, which asks people to what extent they see or hear people from different population groups in an episode. This thesis aims to predict this ‘diversity score’ using TF, TF-IDF and LDA, to gain insight into the predictive capacity of words and topics for diversity in media content. Both words and topics are found that predict this measure of diversity: the diversity score can be predicted with explained variances between 8% and 49.7%, depending on the dataset.