dc.description.abstract | This paper introduces MENCOD (Multi-modal ENsemble Citation Outlier
Detector), a novel approach for identifying outliers in academic literature screening.
In this context, an outlier refers to a relevant paper that was not retrieved before the
stopping rule of an active learning pipeline was triggered—typically because it was
ranked much lower than other relevant papers. MENCOD addresses this with a
two-phase process: after the stopping rule is triggered, a second model is trained on
additional information not exploited in the first phase, such as citation networks and
metadata. The method combines multiple Local Outlier Factor (LOF)-based models
and an isolation forest, leveraging both structural and semantic features. Semantic
similarity is computed using SPECTER2 embeddings and cosine similarity.
Evaluated on three datasets from the Synergy project (Hall, Jeyaraman, and
Appenzeller), MENCOD consistently reprioritized missed relevant papers more
effectively than the baseline active learning approach. The respective improvements were
86.5%, 29.8%, and 75.7%, amounting to thousands of documents that
no longer require manual screening. Although still a conceptual prototype,
MENCOD shows strong potential for enhancing the recall of relevant literature in
large-scale screening tasks. | |
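
As a rough, self-contained sketch of the kind of ensemble the abstract describes (not MENCOD itself), the following Python snippet combines a Local Outlier Factor model and an Isolation Forest over a mixed feature matrix and rank-averages their scores. It assumes scikit-learn and substitutes random vectors for the SPECTER2 embeddings and citation-network features; the choice of 20 "already-labelled relevant" papers, the centroid-based semantic feature, and the rank-average combination are illustrative assumptions, not the published method.

# Illustrative ensemble outlier scorer in the spirit of the abstract.
# Assumes scikit-learn; embeddings and citation features are mocked with
# random data purely for demonstration.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Placeholder document embeddings (in the described system these would be
# SPECTER2 vectors for each candidate paper).
embeddings = rng.normal(size=(500, 768))

# Hypothetical semantic feature: cosine similarity of each candidate to the
# centroid of papers already labelled relevant (here, arbitrarily the first 20).
relevant_centroid = embeddings[:20].mean(axis=0, keepdims=True)
semantic_sim = cosine_similarity(embeddings, relevant_centroid)

# Structural features would come from the citation network; random stand-ins here.
structural = rng.normal(size=(500, 4))
features = np.hstack([semantic_sim, structural])

# Local Outlier Factor: larger (negated) factor means more outlying.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(features)
lof_scores = -lof.negative_outlier_factor_

# Isolation Forest: score_samples is higher for inliers, so negate it.
iso = IsolationForest(random_state=0).fit(features)
iso_scores = -iso.score_samples(features)

def to_ranks(scores):
    # Normalize scores to [0, 1] by rank so the two detectors are comparable.
    return np.argsort(np.argsort(scores)) / (len(scores) - 1)

# Simple rank-average ensemble; candidates with the highest combined score
# would be surfaced for a second round of screening.
ensemble_score = (to_ranks(lof_scores) + to_ranks(iso_scores)) / 2
top_candidates = np.argsort(-ensemble_score)[:10]
print("Top candidate indices for re-screening:", top_candidates)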