Exploring the Thematic Coherence of Fake and LLM-generated news: A Topic Modeling Approach
Summary
Coherence is a fundamental property of well-written texts. As fake news continues to proliferate online and LLM-generated text becomes more prevalent, the trustworthiness of news articles is under increasing pressure. Robust tools for analyzing and understanding the language used in such texts are therefore increasingly important. This research explores the thematic coherence of such texts, that is, the ability of a text to stay focused on its core theme(s). Existing coherence models largely focus on local coherence or rely on opaque global coherence models, and their application to determining the veracity and origin of news articles is limited. As its first research contribution, this study develops a novel, interpretable method for modeling thematic coherence. The method leverages topic modeling and divergence metrics to assess the alignment of the themes discussed in news articles. It is evaluated on the traditional sentence ordering task, which highlights the limited effectiveness of that task in capturing thematic coherence. To overcome this limitation, a new evaluation task is proposed as the second research contribution, and the findings demonstrate that the proposed method can effectively distinguish thematically coherent from incoherent articles. When applied to detecting human-written fake news, the method shows significant differences between the thematic coherence of real and fake news articles and yields modest performance in standalone classification. For LLM-generated news articles, the method reveals slight thematic differences compared to human-written articles, with limited effectiveness in distinguishing between the two. These insights into the thematic coherence of fake and LLM-generated news constitute the third research contribution of this work.
Beyond predictive performance, explanations are constructed for the decisions of the proposed method, to bridge the gap between model behavior and user understanding and acceptance. The overall findings highlight the potential of thematic coherence modeling to further advance automated text assessment and detection tools.
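The summary does not state which topic model or divergence metric the thesis uses. As a minimal illustrative sketch only, assume each segment of an article has already been mapped to a topic distribution by some topic model, and assume Jensen-Shannon divergence as the metric. The function names `js_divergence` and `thematic_coherence`, the scoring rule, and the example distributions are all hypothetical and are not taken from the thesis.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, range 0..1) between two
    discrete probability distributions of equal length."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def thematic_coherence(segment_topics):
    """Hypothetical score: 1 minus the mean pairwise JS divergence
    between the topic distributions of an article's segments."""
    pairs = list(combinations(segment_topics, 2))
    if not pairs:
        return 1.0  # a single segment cannot drift from itself
    mean_div = sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
    return 1.0 - mean_div


# Assumed topic distributions over three topics, one row per segment.
focused = [[0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10]]
drifting = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

print(thematic_coherence(focused) > thematic_coherence(drifting))
```

Under these assumptions, an article whose segments share similar topic distributions scores close to 1, while an article whose segments each concentrate on a different topic scores lower. A per-pair divergence of this kind is also easy to inspect, which fits the interpretability goal described above.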
Related items
Showing items related by title, author, creator and subject.
- "Stupid Game: The Copy Version": A thematic discourse analysis examining the articulation of dislike by anti-fans that positions the Netflix reality series Squid Game: The Challenge as a bad object.
Labes, Lisa (2024) This thesis examines how anti-fans articulate their dislike towards the Netflix reality series Squid Game: The Challenge (2023) utilising Jonathan Gray’s concept of bad object anti-fandom (2019); positioning itself within ...
- Redefining Fatherhood: Balancing Traditional Roles and Modern Expectations in the U.K. and Spain
Lonsing, Anabel (2024) Background: In contemporary Western societies, the role of fatherhood is undergoing significant transformation, moving away from the traditional breadwinner model towards a more nurturing and emotionally involved model ...
- Marketing Dutch Translations of Anglophone Young Adult Literature
Castro Thijssen, Amanda (2022) This thesis explores the marketing strategies that are used by Dutch publishers of Young Adult Literature (YAL) originating in English-language countries. It does so through qualitative research, with information extracted ...