        Can linguistic features unmask fraudulent research? A study that builds an NLP classifier to distinguish retracted papers from non-retracted papers based on text and linguistic features.

        View/Open
        ADS Thesis Eveline Schmidt publ.pdf (791.1Kb)
        Publication date
        2022
        Author
        Schmidt, Eveline
        Summary
        Researchers experience a lot of pressure to get published and cited, as their careers often depend on it. This pressure can result in various forms of misconduct. Fraud in academic research is an important problem that should be tackled. Text classification is one way to detect fraudulent papers. This project shows that a Logistic Regression classifier can distinguish retracted papers from non-retracted papers based on their text. This is only possible for papers on the same topic and from the same journal as those the classifier was trained on; the results do not generalise to broader sets of papers or to other topics. The literature suggests that there are linguistic markers of deceptive language. This project analyses five such features: quantity of lexicon, readability, complexity, lexical diversity and number of references. Quantity of lexicon, complexity and lexical diversity showed significant differences between retracted and non-retracted papers. Including these five linguistic features did not, however, improve the performance of the classification model.
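
To make the linguistic features named in the summary more concrete, the sketch below computes simple stand-ins for some of them from raw text. The summary does not give the thesis's exact feature definitions, so everything here is an assumption for illustration: word count stands in for quantity of lexicon, type-token ratio for lexical diversity, and mean sentence and word length serve as crude proxies for complexity. Readability is left out because it normally requires syllable counting. The function names `tokenize` and `linguistic_features` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional inner apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def linguistic_features(text, n_references=0):
    """Compute illustrative per-document features (assumed definitions, not the thesis's)."""
    words = tokenize(text)
    # Treat runs of ., ! or ? as sentence boundaries and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        # Proxy for "quantity of lexicon": total number of word tokens.
        "word_count": n_words,
        # Proxy for "lexical diversity": unique tokens divided by all tokens.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # Crude proxies for "complexity".
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "mean_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # The reference count has to come from the paper's metadata or bibliography.
        "n_references": n_references,
    }


if __name__ == "__main__":
    print(linguistic_features("The cat sat. The cat ran!", n_references=2))
```

Features like these could be appended to a text representation before fitting a classifier. According to the summary, doing so did not improve the model's performance in this project.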
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/42411
        Collections
        • Theses