        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Introducing MENCOD: Multi-modal ENsemble Citation Outlier Detector

        ADS Thesis - Marco van Angeren.pdf (606.4Kb)
        Publication date
        2025
        Author
        Angeren, Marco van
        Summary
        This paper introduces MENCOD (Multi-modal ENsemble Citation Outlier Detector), a novel approach for identifying outliers in academic literature screening. In this context, an outlier is a relevant paper that was not retrieved before the stopping rule of an active learning pipeline was triggered, typically because it was ranked much lower than the other relevant papers. MENCOD addresses this with a two-phase process: after stopping, a new model is trained on additional information not exploited in the first phase, such as citation networks and metadata. The method combines multiple Local Outlier Factor (LOF)-based models and an isolation forest, leveraging both structural and semantic features. Semantic similarity is computed as the cosine similarity between SPECTER2 embeddings. Evaluated on three datasets from the Synergy project (Hall, Jeyaraman, and Appenzeller), MENCOD consistently reprioritized missed relevant papers more effectively than the baseline active learning approach, with improvements of 86.5%, 29.8%, and 75.7%, respectively, amounting to thousands of documents that no longer require manual screening. Although still a conceptual prototype, MENCOD shows strong potential for improving the recall of relevant literature in large-scale screening tasks.
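        The summary names three generic building blocks: cosine similarity between document embeddings, Local Outlier Factor (LOF) scoring, and combining several detectors into one ensemble score. The plain-Python sketch below illustrates only those building blocks, under stated assumptions. The toy 3-dimensional vectors are hypothetical stand-ins for SPECTER2 embeddings. The neighborhood sizes `ks` and the rule of averaging min-max-normalized LOF scores are illustrative choices, not MENCOD's actual configuration. The isolation forest member, citation-network features, and metadata are omitted. This is not the thesis implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance_matrix(vectors):
    """Pairwise dissimilarity 1 - cosine similarity between all vectors."""
    n = len(vectors)
    return [[0.0 if i == j else 1.0 - cosine_similarity(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]


def lof_scores(dist, k):
    """Local Outlier Factor of every point, given a precomputed distance matrix.

    Uses exactly the k nearest neighbors of each point (ties broken by index).
    Scores near 1 indicate an inlier; larger scores mean the point lies in a
    sparser region than its neighbors.
    """
    n = len(dist)
    neighbors, k_distance = [], []
    for i in range(n):
        nearest = sorted((dist[i][j], j) for j in range(n) if j != i)[:k]
        neighbors.append([j for _, j in nearest])
        k_distance.append(nearest[-1][0])  # distance to the k-th neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / k + 1e-10))  # epsilon guards duplicates
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def ensemble_scores(vectors, ks=(2, 3, 4)):
    """Average of min-max-normalized LOF scores over several neighborhood sizes.

    The choice of ks and the averaging rule are hypothetical illustrations.
    """
    dist = cosine_distance_matrix(vectors)
    combined = [0.0] * len(vectors)
    for k in ks:
        scores = lof_scores(dist, k)
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid dividing by zero if all scores match
        for i, s in enumerate(scores):
            combined[i] += (s - lo) / span / len(ks)
    return combined


# Hypothetical toy vectors: five documents pointing in a similar direction
# and one semantically distant document (index 5).
vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.8, 0.1, 0.1],
    [0.95, 0.15, 0.05],
    [0.0, 0.1, 1.0],
]
scores = ensemble_scores(vectors)
# Document indices ordered from most to least outlying.
ranking = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

        Normalizing each LOF run before averaging keeps one neighborhood size from dominating the ensemble, since raw LOF values are unbounded above; other combination rules (rank averaging, maximum) are equally plausible.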
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49838
        Collections
        • Theses