
dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Leeuwenberg, A.M.
dc.contributor.author	Scheeres, Matthew
dc.date.accessioned	2025-01-02T01:01:46Z
dc.date.available	2025-01-02T01:01:46Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/48328
dc.description.abstract	Early identification of patients at risk of diseases such as pneumonia is partly enabled through structured reporting of disease symptoms in Electronic Health Records (EHRs). However, this structured data is not always complete. Automated extraction of symptoms from the unstructured text present in EHRs makes these records more exact and complete, resulting in more precise diagnoses. This report assesses the performance of Large Language Models (LLMs) in extracting symptoms of lower respiratory tract infections (LRTI) from free-text sections of Dutch EHRs. The investigation involves the informed selection and comparison of promising LLMs, considering factors such as local applicability, language compatibility, and model architecture. A search for relevant models is first performed, after which RobBERT and MedRoBERTa.nl are selected and evaluated across varying numbers of training samples. Both models are trained as direct classifiers and, separately, fine-tuned for few-shot prompt-based classification, to explore how the efficacy of each model type relates to the number of training (or multi-shot) samples provided. By employing a structured methodology and leveraging the capabilities of LLMs, the investigation seeks insights into the optimal utilisation of LLMs for effective symptom extraction in the context of Dutch EHR data. To increase generalisability, multiple target variables are extracted from the free-text samples (fever, cough, and shortness of breath). Classification performance is measured systematically using metrics such as precision, recall, and F1-score. While MedRoBERTa.nl as a direct classifier achieved F1-scores of up to 0.88, with RobBERT closely following, the prompt-based models underperformed, suggesting limitations in their current design for this task.
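
To illustrate the direct-classification setup and the evaluation metrics named in the abstract, the following is a minimal sketch using the Hugging Face transformers library. The Hub model identifiers (CLTL/MedRoBERTa.nl, pdelobelle/robbert-v2-dutch-base), the binary present/absent labelling per symptom, and the example sentence are assumptions for illustration only and are not taken from the thesis record.

# Minimal sketch: a Dutch clinical LLM as a direct (sequence) classifier for one
# symptom, plus precision/recall/F1 as named in the abstract. Model IDs, labels,
# and example data are assumptions, not details from the thesis.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import precision_recall_fscore_support

MODEL_ID = "CLTL/MedRoBERTa.nl"  # assumed Hub ID; RobBERT would be e.g. "pdelobelle/robbert-v2-dutch-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels=2: symptom absent (0) / present (1). The classification head is
# freshly initialised here and would still need fine-tuning on labelled EHR text.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.eval()

# Hypothetical free-text fragment ("Patient reports coughing and shortness of breath.")
text = "Patiënt meldt hoesten en kortademigheid."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = int(torch.argmax(logits, dim=-1))  # 0 = absent, 1 = present

# Evaluation over a labelled test set (dummy labels/predictions for illustration).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")

In this setup one such binary classifier would be trained per target symptom (fever, cough, shortness of breath); the prompt-based few-shot variant described in the abstract would instead format labelled examples into the model input rather than using a task-specific classification head.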
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	A Comparative Study of Large Language Model Applications in Dutch Electronic Health Records for Symptom Identification
dc.title	A Comparative Study of Large Language Model Applications in Dutch Electronic Health Records for Symptom Identification
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Large Language Models; Electronic Health Records; NLP Applications; Multi-shot Classification; Symptom Extraction; Disease Prediction; Few-shot Learning; Clinical Text Analysis
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	36134

