Leading narrative classification
Summary
The transformation of media consumption through social platforms and
short-form video content has created an urgent need for computational
tools capable of automatically identifying leading narratives in news text.
While recent advances in computational narrative analysis have moved
beyond traditional frame analysis to recognize narrative elements such
as character roles and conflicts, significant gaps remain in understanding
how narrative classification models perform across different topical
domains. This thesis investigates whether transformer-based models,
specifically BERT, can effectively classify leading narratives in news
media, and it examines performance differences between single-domain and
mixed-domain training configurations.
Using a specialized dataset from SemEval-2025 containing 910 English
news articles annotated with narrative labels from the climate change and
Russia-Ukraine war domains, this research compares Support Vector
Machine baselines with BERT-based approaches. The experimental
framework includes binary topic classification, multiclass narrative
classification, binary classification of individual narrative labels,
experiments with a reduced label count, and mixed-domain label merging
to evaluate cross-domain generalization.
The results demonstrate that BERT successfully distinguishes between
broad topical domains, achieving 95% accuracy when classifying articles
as climate change or Russia-Ukraine war. However, fine-grained leading
narrative classification remains challenging, with macro-averaged
precision typically below 0.21 even in the best-performing configurations.
Domain-specific training consistently outperforms mixed-domain training,
suggesting that narrative patterns may be more context-specific than
initially hypothesized. Class imbalance emerges as a fundamental
challenge: the predominant Other label (40.9% of instances) causes
persistent classification difficulties across all experimental
configurations.
Binary classification experiments reveal that individual narrative labels
differ in how learnable they are. Some reach moderate F1 scores (e.g.,
Criticism Policies at 0.412), while others show severe precision-recall
imbalances: Overpraising West achieves perfect recall but extremely low
precision (0.021) despite a high AUC-ROC (0.911), suggesting that the
model ranks positive articles reasonably well but that its decision
threshold labels far too many articles as positive.
The findings indicate that while transformer-based models show promise
for certain aspects of narrative classification, the task remains more
computationally challenging than theoretical frameworks suggest. Effective
narrative classification systems will likely require domain-specific
approaches, sophisticated data balancing techniques, and methodological
developments beyond current capabilities. The research contributes
empirical insights into computational narrative analysis while establishing
realistic expectations for automated tools that aim to capture how
narratives shape public discourse across increasingly complex media
landscapes.