A Comparative Study of Large Language Model Applications in Dutch Electronic Health Records for Symptom Identification
Summary
Early identification of patients at risk of diseases such as pneumonia is partly enabled by the structured reporting of disease symptoms in Electronic Health Records (EHRs). However, this structured data is not always complete. Automated extraction of symptoms from the unstructured text in EHRs can make these records more accurate and complete, enabling more precise diagnoses. This report assesses the performance of Large Language Models (LLMs) in extracting symptoms of lower respiratory tract infection (LRTI) from free-text sections of Dutch EHRs. The investigation involves the informed selection and comparison of promising LLMs, considering factors such as local applicability, language compatibility, and model architecture. A search for relevant models is first performed, after which RobBERT and MedRoBERTa.nl are selected and evaluated across varying numbers of training samples. Both models are trained as direct classifiers and, separately, fine-tuned for few-shot prompt-based classification, with the goal of exploring how the efficacy of each model type depends on the number of training (or few-shot) samples provided. By employing a structured methodology and leveraging the capabilities of LLMs, the investigation seeks insight into the optimal use of LLMs for effective symptom extraction from Dutch EHR data. To increase generalisability, multiple target variables (fever, cough, and shortness of breath) are extracted from the free-text samples. Classification performance is measured systematically by calculating metrics such as precision, recall, and F1-score. While MedRoBERTa.nl used as a direct classifier achieved F1-scores of up to 0.88, with RobBERT following closely, the prompt-based models underperformed, suggesting limitations in their current design for this task.
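
To make the first modelling paradigm concrete, the sketch below shows how a Dutch BERT-style model can be used as a direct classifier with the Hugging Face transformers library. The checkpoint name, example sentence, and label convention are illustrative assumptions, not the report's actual configuration.

    # Minimal sketch: a Dutch medical language model as a direct (sequence) classifier.
    # Checkpoint name and example text are assumptions for illustration only.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    checkpoint = "CLTL/MedRoBERTa.nl"  # public checkpoint; RobBERT would be used analogously
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2  # binary label: symptom present / absent
    )
    # Note: the classification head is freshly initialised here; in practice it is
    # first fine-tuned on labelled EHR samples (e.g. with the transformers Trainer).

    # A fictitious free-text fragment: "Patient reports coughing and fever for three days."
    text = "Patiënt meldt hoesten en koorts sinds drie dagen."
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits
    prediction = logits.argmax(dim=-1).item()  # e.g. 1 = symptom present, 0 = absent
    print(prediction)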
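The prompt-based alternative reuses the model's pretrained masked-language-modelling head instead of a new classification head: the record is wrapped in a cloze-style template, and the model's preference between two verbalizer words is read off at the mask position. The template and verbalizers below ("ja"/"nee") are assumptions for illustration; the report's actual prompts may differ.

    # Minimal sketch of prompt-based (cloze-style) classification with RobBERT.
    # Template, verbalizer words, and checkpoint are illustrative assumptions.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    checkpoint = "pdelobelle/robbert-v2-dutch-base"  # public RobBERT checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForMaskedLM.from_pretrained(checkpoint)

    text = "Patiënt meldt hoesten en koorts sinds drie dagen."
    # Assumed template: "<record> Does the patient have a fever? <mask>."
    prompt = f"{text} Heeft de patiënt koorts? {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Compare the scores of the two verbalizer words (first subtoken each)
    # at the mask position.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    yes_id = tokenizer.encode(" ja", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" nee", add_special_tokens=False)[0]
    label = "fever" if logits[0, mask_pos, yes_id] > logits[0, mask_pos, no_id] else "no fever"
    print(label)

In the few-shot setting, a handful of labelled examples is used to fine-tune the model on such templates before reading off the verbalizer scores.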
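For reference, the evaluation metrics mentioned above are defined per symptom label in terms of true positives (TP), false positives (FP), and false negatives (FN):

    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)
    F1        = 2 * (Precision * Recall) / (Precision + Recall)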