dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Garcia Bernardo, Javier
dc.contributor.author: Schmidt, Eveline
dc.date.accessioned: 2022-09-09T00:02:51Z
dc.date.available: 2022-09-09T00:02:51Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42411
dc.description.abstract: Researchers experience a lot of pressure to get published and cited, as their careers often depend on it. This pressure can result in various forms of misconduct. Fraud in academic research is an important problem that should be tackled. Text classification is one way to detect fraudulent papers. This project shows that a Logistic Regression classifier can distinguish retracted papers from non-retracted papers based on their text. This is only possible for papers within the same topic and journal that the classifier was trained on. The results are not generalisable to more general papers or other topics. Literature suggests there are linguistic markers for deceptive language. This project analyses five features: quantity of lexicon, readability, complexity, lexical diversity and number of references. The quantity of lexicon, complexity and lexical diversity showed significant differences between retracted and non-retracted papers. Including these five linguistic features did not, however, improve the performance of the classification model.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: A study that builds an NLP classifier to distinguish retracted papers from non-retracted papers based on text and linguistic features.
dc.title: Can linguistic features unmask fraudulent research? A study that builds an NLP classifier to distinguish retracted papers from non-retracted papers based on text and linguistic features.
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: nlp;natural language processing;fraudulent research;text classification
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 8876

