
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Schuit, E.
dc.contributor.author: Brink, Jasper van den
dc.date.accessioned: 2025-02-13T00:01:11Z
dc.date.available: 2025-02-13T00:01:11Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48498
dc.description.abstract: Background: Electronic health records (EHRs) provide valuable patient data for research, and natural language processing (NLP) helps convert this unstructured data into analyzable formats. However, values that are not recorded in the free text remain a problem and can introduce bias. This paper reviews how current epidemiological studies handle and report these unrecorded values in NLP-extracted data from EHRs. Methods: Based on a previous review article, a total of 30 recent observational studies using NLP techniques to extract data from EHRs were included. Data were extracted on the intended usage of NLP, the relevant text-extracted variables, the role of each variable in the analysis (e.g., determinant, outcome), and the variables directly related to the text-extracted variable in the studies’ analyses. Explicit mentions were collected of how studies referred to textual variables, how they addressed negated and unrecorded values, and which limitations of NLP techniques they reported. Results: Fourteen of the 30 studies used both text and structured data, while the other 16 used text only for their text-extracted variable. The purpose of the NLP techniques varied, for example creating new variables or identifying additional cases. Reporting of text-extracted variables differed: 11 variables were referred to as text-extracted variables in the analysis, while the remaining 21 were referred to as the actual variable itself. Only six studies reported handling of negated values from the EHRs, and only six studies explicitly considered unrecorded values. Finally, 25 studies acknowledged limitations related to NLP techniques, reporting challenges in accuracy and data quality. Conclusion: We highlight the significant variability in the reporting and handling of text-extracted variables from EHRs, particularly regarding unrecorded values, and emphasize the need for standardized guidelines to improve consistency and accuracy in NLP-assisted epidemiological research.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: A literature review of 30 recent epidemiological studies that use automatic text-extraction techniques. Their reporting practices for text-extracted variables and, most notably, unreported values were analyzed and discussed.
dc.title: Reporting and handling of unrecorded values in automatically text-extracted study variables from electronic health records in epidemiological studies: A Literature Review
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Epidemiology
dc.thesis.id: 36757

