dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Feelders, Ad
dc.contributor.author	Öz, Ercan
dc.date.accessioned	2023-07-22T00:01:54Z
dc.date.available	2023-07-22T00:01:54Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/44266
dc.description.abstract	Finding all documents relevant to a specific information need in a potentially large collection is essential not only for researchers who must sift through thousands of studies to determine which are relevant for a meta-analysis, but also for clinicians, policy-makers, journalists, and the general public. Technology Assisted Review (TAR) combines machine learning algorithms with human feedback to find all relevant documents, aiming for complete recall at minimal cost. This study investigates methods to enhance the performance of TAR. Labeled data is often scarce because labeling is costly in time and resources, and a lack of labeled data can limit a model's capacity to generalize. Semi-supervised learning (SSL) techniques, which use unlabeled data to improve model performance, were examined to address this limitation. This thesis studies various SSL techniques for binary classification and evaluates their contributions to the TAR process. We compared the performance of five semi-supervised classifiers within TAR against their supervised equivalents. The findings show that the semi-supervised Multinomial Naive Bayes classifier with many-to-one correspondence via sub-topics repeatedly improved performance over its supervised counterpart, particularly on the two datasets with the lowest percentage of relevant documents. Combining AutoTAR with semi-supervised Multinomial Naive Bayes and sub-topics also yielded significant improvements on some datasets compared to the supervised AutoTAR model. In contrast, label spreading and Support Vector Machines with self-training less frequently outperformed their supervised counterparts. Although the semi-supervised models did not consistently outperform their supervised counterparts, this research demonstrates their potential for improved performance, most notably with the semi-supervised Multinomial Naive Bayes model with many-to-one correspondence.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Finding all documents relevant to a specific information need in a potentially large collection of documents is essential for many researchers. Technology Assisted Review (TAR) incorporates machine learning algorithms and human feedback to find all relevant documents to achieve complete recall at a minimal cost. This thesis studies various semi-supervised learning (SSL) techniques for binary classification and evaluates their contributions to the TAR process.
dc.title	Semi-supervised learning for Technology Assisted Review
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Technology Assisted Review; Semi-supervised learning; Multinomial Naive Bayes; Label spreading; Support Vector Machine; Work Saved over Sampling; Self-training; Active learning
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	19858
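
The abstract names several of the SSL techniques the thesis compares (self-training, label spreading, semi-supervised Multinomial Naive Bayes). The snippet below is a minimal illustrative sketch of one such setup in scikit-learn: self-training a Multinomial Naive Bayes relevance classifier on a partially labeled document collection, with unlabeled documents marked by -1. The documents, labels, and threshold are invented for illustration; this is not the thesis's code and does not implement the many-to-one sub-topic scheme or AutoTAR.

# Minimal sketch (hypothetical data, not the thesis's implementation):
# self-training a Multinomial Naive Bayes classifier on partially labeled text.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.semi_supervised import SelfTrainingClassifier

docs = [
    "randomised trial of drug A for hypertension",   # labeled relevant
    "adverse events of drug A in a cohort study",    # labeled relevant
    "editorial on hospital management policy",       # labeled not relevant
    "interview about science journalism careers",    # labeled not relevant
    "drug A dosing and blood pressure outcomes",     # unlabeled
    "opinion piece on research funding",             # unlabeled
]
# scikit-learn's semi-supervised estimators mark unlabeled samples with -1.
y = np.array([1, 1, 0, 0, -1, -1])

X = CountVectorizer().fit_transform(docs)  # bag-of-words term counts

# Self-training: repeatedly refit the base classifier, adding unlabeled
# documents whose predicted class probability exceeds the threshold.
model = SelfTrainingClassifier(MultinomialNB(), threshold=0.6)
model.fit(X, y)

print(model.predict(X[4:]))  # predicted relevance of the unlabeled documents

In a TAR setting the labeled pool would grow as the reviewer screens documents suggested by the model, so a loop of this kind would be refit after each round of human feedback.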

