Leading narrative classification

Restum, Ali

Bert_for_leading_narrative_classification.pdf (3.127Mb)

Publication date

2025

Author

Restum, Ali

Summary

The transformation of media consumption through social platforms and short-form video content has created an urgent need for computational tools that can automatically identify leading narratives in news text. Recent work in computational narrative analysis has moved beyond traditional frame analysis to recognize narrative elements such as character roles and conflicts. However, little is known about how these models perform across different topical domains. This thesis investigates whether transformer-based models, specifically BERT, can effectively classify leading narratives in news media. It also examines performance differences between single-domain and mixed-domain training configurations. The research uses a specialized SemEval-2025 dataset of 910 English news articles annotated with narrative labels across two domains, climate change and the Russia-Ukraine war, and compares Support Vector Machine (SVM) baselines with BERT-based approaches. The experimental framework comprises five parts: binary topic classification, multiclass narrative classification, binary classification of individual narrative labels, experiments with a reduced label count, and mixed-domain label merging to evaluate cross-domain generalization.
The results show that BERT successfully distinguishes between the broad topical domains, reaching 95% accuracy on climate change versus Russia-Ukraine war classification. Fine-grained leading narrative classification, however, remains difficult. Macro-averaged precision typically stays below 0.21, even in the best-performing configurations. Domain-specific training consistently outperforms mixed-domain training, which suggests that narrative patterns may be more context-specific than initially hypothesized. Class imbalance emerges as a fundamental challenge. The predominant Other label accounts for 40.9% of instances and causes persistent classification difficulties across all experimental configurations.
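Macro-averaged precision weights every label equally, regardless of how often it occurs. This makes it sensitive to the kind of imbalance described for the Other label. The minimal Python sketch below illustrates the effect and is not the thesis's evaluation code. It uses a toy label set: `Narrative_A` and `Narrative_B` are hypothetical placeholders, not labels from the dataset. The predictor is degenerate and always outputs the majority label. Labels that are never predicted get a precision of 0.0 here, which is an assumed convention.

```python
def per_class_precision(y_true, y_pred, labels):
    """Precision = TP / (TP + FP) for each label; 0.0 if the label is never predicted."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
        predicted = sum(1 for p in y_pred if p == label)
        scores[label] = tp / predicted if predicted else 0.0
    return scores


def macro_precision(y_true, y_pred, labels):
    # Unweighted mean over labels: a rare label counts as much as the majority label.
    scores = per_class_precision(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: "Other" covers 40% of instances, as a dominant label might.
labels = ["Other", "Narrative_A", "Narrative_B"]
y_true = ["Other"] * 4 + ["Narrative_A"] * 3 + ["Narrative_B"] * 3
y_pred = ["Other"] * 10  # degenerate model that always predicts the majority label

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")                      # 0.400
print(f"macro precision = {macro_precision(y_true, y_pred, labels):.3f}")  # 0.133
```

The majority label alone contributes a precision of 0.4, but the two rare labels contribute 0. The macro average therefore falls to 0.4 / 3, which is lower than the accuracy of the same predictions.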
Binary classification experiments reveal that individual narrative labels vary in how learnable they are. Some reach moderate F1-scores (e.g., Criticism Policies at 0.412). Others show severe precision-recall imbalances: Overpraising West, for example, achieves perfect recall but extremely low precision (0.021) despite a high AUC-ROC (0.911). The findings indicate that transformer-based models show promise for certain aspects of narrative classification. However, the task remains more computationally challenging than theoretical frameworks suggest. Effective narrative classification systems will likely require domain-specific approaches, sophisticated data balancing techniques, and methodological developments beyond current capabilities. The research contributes empirical insights into computational narrative analysis. It also establishes realistic expectations for automated tools that aim to understand how narratives shape public discourse in increasingly complex media landscapes.
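Perfect recall, very low precision and a high AUC-ROC can occur together without contradiction. AUC-ROC measures how well a model's scores rank positive documents above negative ones across all thresholds. Precision and recall, by contrast, are measured at a single decision threshold. The stdlib-only Python sketch below uses invented scores for a hypothetical rare label with 2 positives in 100 documents. These values are illustrative and are not taken from the thesis, and the thresholds 0.5 and 0.93 are arbitrary choices. AUC-ROC is computed in its pairwise (Mann-Whitney) form, with ties counted as half.

```python
def precision_recall_at(scores, labels, threshold):
    # labels: 1 = document carries the narrative label, 0 = it does not
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc_roc(scores, labels):
    # Probability that a random positive outscores a random negative (ties count 0.5).
    # This depends only on the ranking of scores, not on any decision threshold.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical rare label: 2 positives among 100 documents, invented scores.
scores = [0.95, 0.85] + [0.90] * 10 + [0.60] * 88
labels = [1, 1] + [0] * 98

for t in (0.5, 0.93):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.3f} recall={r:.3f}")
print(f"AUC-ROC={auc_roc(scores, labels):.3f}")
```

At the 0.5 threshold, every document is flagged as positive. This gives recall 1.000 but precision 0.020, even though the AUC-ROC is 0.949. Raising the threshold to 0.93 reverses the trade-off, giving precision 1.000 and recall 0.500. For rare labels, then, the choice of decision threshold can matter as much as how well the model ranks documents.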

URI

https://studenttheses.uu.nl/handle/20.500.12932/50491

Collections

Theses