dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Zervanou, Kalliopi
dc.contributor.author: Stuffers, Tristan
dc.date.accessioned: 2025-08-28T00:02:20Z
dc.date.available: 2025-08-28T00:02:20Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/50040
dc.description.abstract: Each organization in the Netherlands that withholds taxes is legally required to file tax returns with the Dutch Tax Agency, which in turn sends these data to the Employee Insurance Agency (UWV). A tax return must comply with a set of rules. The technical specifications, intended for payroll software developers, are contained in one document. Examples and explanations, intended for payroll administrators, are contained in another. When analysts or legal experts need to trace an informative rule back to a technical rule, they must locate the relevant paragraphs manually, as there is no tool to assist their search. We present a framework to retrieve relevant paragraphs automatically. Of the two sparse baselines, TF–IDF scored lower than BM25 on all metrics, including Recall@10. BM25 reached Recall@10 = 0.55 on our manually annotated test set. A BERT bi-encoder, applied as a reranker over BM25 results, achieved Recall@10 = 0.42. After domain-adaptive and denoising pre-training, the same bi-encoder reached Recall@10 = 0.44, yet both scores still trail the BM25 baseline of 0.55.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: We consider which information retrieval and semantic textual similarity techniques can be used to find relevant passages in the handbook for payroll administration. As input, we use a set of rules from the software specifications, and we quantify the outcome against a manually annotated set.
dc.title: Connecting Software Specifications and Payroll Rules: Evaluating Sparse and BERT-Based Retrieval
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Information Retrieval; Cross-document information linking; Government regulations text mining; Hybrid retrieval; Transformers; Semantic Textual Similarity
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 52722
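
The abstract reports Recall@10 without stating how it is computed. The following is a minimal sketch, assuming the common definition (the share of a query's relevant paragraphs that appear among the top k retrieved results, averaged over queries). The thesis may define the metric differently; the function names and paragraph IDs below are hypothetical and are not taken from the thesis.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant paragraphs that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant paragraph is required")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical example: two specification rules used as queries.
runs = [
    (["p3", "p7", "p1"], {"p1", "p9"}),  # one of two relevant paragraphs retrieved
    (["p4", "p2"], {"p2"}),              # the only relevant paragraph retrieved
]
score = mean_recall_at_k(runs, k=10)
```

Under this definition, a score of 0.55 would mean that, on average, a little over half of the annotated relevant paragraphs for a rule appear in the first ten results.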

