dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Garcia Bernardo, Javier | |
dc.contributor.author | Lindenmeyer, Arleen | |
dc.date.accessioned | 2022-09-09T00:03:19Z | |
dc.date.available | 2022-09-09T00:03:19Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/42424 | |
dc.description.abstract | To retain and raise trust in science, it is essential to correct misinformation promptly and, better still, to
prevent the publication of incorrect information in the first place. Taking a technical approach, this study
attempts to address this critical issue of misinformation and trust in science by building models that
classify published scientific articles as retracted or non-retracted. These classifiers could be used
by institutions to detect papers containing misinformation before they are published. Further, this study
highlights the advantage of differentiating between scientific articles that have been retracted due to
error and scientific articles that have been retracted due to misconduct. With this distinction, a Logistic
Regression classifier achieved a weighted F1 test score of 0.75 and an external validation
score of 0.67. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | Text classification of non-retracted and retracted (due to error/misconduct) scientific articles, using NLP. Comparison of Naive Bayes, Support Vector Machine and Logistic Regression. | |
dc.title | Can non-retracted published research articles be differentiated from research articles that are retracted due to error and misconduct? | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | Retraction, scientific articles, text classification, NLP | |
dc.subject.courseuu | Applied Data Science | |
dc.thesis.id | 8923 | |