        •   Utrecht University Student Theses Repository Home


        Reporting and handling of unrecorded values in automatically text-extracted study variables from electronic health records in epidemiological studies: A Literature Review

        View/Open
        FINAL_WritingAssignment_JaspervdBrink.pdf (434.9Kb)
        Publication date
        2025
        Author
        Brink, Jasper van den
        Summary
        Background: Electronic health records (EHRs) provide valuable patient data for research, and natural language processing (NLP) helps convert their unstructured free text into analyzable formats. Values that are not recorded in the free text remain a problem, however, and can introduce bias. This paper reviews how current epidemiological studies handle and report unrecorded values in NLP-extracted data from EHRs.
        Methods: Based on a previous review article, 30 recent observational studies that used NLP techniques to extract data from EHRs were included. Data were extracted on the intended use of NLP, the relevant text-extracted variables, the role of each variable in the analysis (e.g. determinant, outcome), and the variables directly related to the text-extracted variable in the studies' analyses. Explicit mentions were collected of how studies referred to text-extracted variables, whether they addressed negated and unrecorded values, and which limitations of NLP techniques they reported.
        Results: Fourteen of the 30 studies used both text and structured data for their text-extracted variable, while the other 16 used text only. The purpose of the NLP techniques varied, for example creating new variables or identifying additional cases. Reporting of text-extracted variables also differed: 11 variables were referred to as text-extracted variables in the analysis, while the remaining 21 were referred to as the actual variables themselves. Only six studies reported how they handled negated values from the EHRs, and only six explicitly considered unrecorded values. Finally, 25 studies acknowledged limitations related to NLP techniques, reporting challenges in accuracy and data quality.
        Conclusion: We highlight the significant variability in the reporting and handling of text-extracted variables from EHRs, particularly regarding unrecorded values, and emphasize the need for standardized guidelines to improve consistency and accuracy in NLP-assisted epidemiological research.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/48498
        Collections
        • Theses