        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        A Comparative Study of Large Language Model Applications in Dutch Electronic Health Records for Symptom Identification

        View/Open
        MSc AI Thesis - Matthew Scheeres (Final Version) (1).pdf (2.265Mb)
        Publication date
        2025
        Author
        Scheeres, Matthew
        Summary
        Early identification of patients at risk of diseases such as pneumonia depends in part on structured reporting of symptoms in Electronic Health Records (EHRs). However, this structured data is not always complete. Automatically extracting symptoms from the unstructured text in EHRs can make these records more exact and complete, which supports more precise diagnoses. This report assesses the performance of Large Language Models (LLMs) in extracting symptoms of lower respiratory tract infections (LRTI) from free-text sections of Dutch EHRs. The investigation involves an informed selection and comparison of promising LLMs, considering factors such as local applicability, language compatibility, and model architecture. A search for relevant models is performed first, after which RobBERT and MedRoBERTa.nl are selected and evaluated with varying numbers of training samples. Both models are trained as direct classifiers and, separately, fine-tuned for few-shot prompt-based classification, in order to explore how the efficacy of each model type relates to the training (or multi-shot) samples provided. Through this structured methodology, the investigation seeks insight into how LLMs can best be used for symptom extraction from Dutch EHR data. To increase generalisability, multiple target variables (fever, cough, and shortness of breath) are extracted from the free-text samples. Classification performance is measured systematically using metrics such as precision, recall, and F1-score. The MedRoBERTa.nl direct classifier achieved F1-scores of up to 0.88, with RobBERT following closely, whereas the prompt-based models underperformed, suggesting limitations in their current design for this task.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/48328
        Collections
        • Theses