dc.description.abstract | This paper introduces MENCOD (Multi-modal ENsemble Citation Outlier
Detector), a novel approach for identifying outliers in academic literature screening.
In this context, an outlier refers to a relevant paper that was not retrieved before the
stopping rule of an active learning pipeline was triggered—typically because it was
ranked much lower than other relevant papers. MENCOD addresses this with a
two-phase process: after the stopping rule is triggered, a second model is trained on
additional information not exploited in the first phase, such as citation networks and
metadata. The method combines multiple Local Outlier Factor (LOF)-based models
and an isolation forest, leveraging both structural and semantic features. Semantic
similarity is computed using SPECTER2 embeddings and cosine similarity.
Evaluated on three datasets from the Synergy project (Hall, Jeyaraman, and
Appenzeller), MENCOD consistently reprioritized missed relevant papers more
effectively than the baseline active learning approach. The respective improvements were
86.5%, 29.8%, and 75.7%, amounting to thousands of documents that
no longer require manual screening. Although still a conceptual prototype,
MENCOD shows strong potential for enhancing the recall of relevant literature in
large-scale screening tasks. | |
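
As a rough, self-contained sketch of the kind of ensemble the abstract describes (not MENCOD itself), the following Python snippet combines a Local Outlier Factor model and an Isolation Forest over a mixed feature matrix and rank-averages their scores. It assumes scikit-learn and substitutes random vectors for the SPECTER2 embeddings and citation-network features; the choice of 20 "already-labelled relevant" papers, the centroid-based semantic feature, and the rank-average combination are illustrative assumptions, not the published method.

# Illustrative ensemble outlier scorer in the spirit of the abstract.
# Assumes scikit-learn; embeddings and citation features are mocked with
# random data purely for demonstration.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Placeholder document embeddings (in the described system these would be
# SPECTER2 vectors for each candidate paper).
embeddings = rng.normal(size=(500, 768))

# Hypothetical semantic feature: cosine similarity of each candidate to the
# centroid of papers already labelled relevant (here, arbitrarily the first 20).
relevant_centroid = embeddings[:20].mean(axis=0, keepdims=True)
semantic_sim = cosine_similarity(embeddings, relevant_centroid)

# Structural features would come from the citation network; random stand-ins here.
structural = rng.normal(size=(500, 4))
features = np.hstack([semantic_sim, structural])

# Local Outlier Factor: larger (negated) factor means more outlying.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(features)
lof_scores = -lof.negative_outlier_factor_

# Isolation Forest: score_samples is higher for inliers, so negate it.
iso = IsolationForest(random_state=0).fit(features)
iso_scores = -iso.score_samples(features)

def to_ranks(scores):
    # Normalize scores to [0, 1] by rank so the two detectors are comparable.
    return np.argsort(np.argsort(scores)) / (len(scores) - 1)

# Simple rank-average ensemble; candidates with the highest combined score
# would be surfaced for a second round of screening.
ensemble_score = (to_ranks(lof_scores) + to_ranks(iso_scores)) / 2
top_candidates = np.argsort(-ensemble_score)[:10]
print("Top candidate indices for re-screening:", top_candidates)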