Text mining of clinical outcomes for medical research: how accurate should it be?

Grotenhuis, Zwierd

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Leeuwenberg, A.M.
dc.contributor.author	Grotenhuis, Zwierd
dc.date.accessioned	2023-04-01T00:00:49Z
dc.date.available	2023-04-01T00:00:49Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/43741
dc.description.abstract	In medicine, clinical prediction models are often developed to estimate a patient's future risk of a certain health outcome (e.g., in-hospital mortality). Developing these models requires historical structured data about patient characteristics and the relevant health outcomes. Sometimes the health outcome to be predicted was not recorded in structured data but can be extracted from textual notes by text mining. If a text mining model is developed to extract outcome variables from clinical notes, that model can be used to generate the training data for the prediction model. Contemporary research often applies text mining, but the impact of text mining quality on prediction model performance in this setting remains unclear. We performed a simulation study that charted this relationship in a case study of in-hospital mortality prediction in ICUs. We created a logistic regression and a neural network prediction model and trained them on data extracted by multiple text mining models covering a wide range of performance. We varied the performance of the text mining models by changing the size of the training data used to develop them and by shifting the decision boundary. We found that such an analysis can be used to determine whether the text mining model performs well enough, or whether more data might be needed to train it. We also concluded that shifting the decision boundary of the text mining model can be a viable way to increase prediction model performance, especially when little training data is available. The knowledge gained in this project may be used to create better-performing prediction models from text-mined outcomes when training data is limited.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Research into the use of text mining in clinical prediction models, specifically into how the performance of the text mining model affects the performance of the clinical prediction model.
dc.title	Text mining of clinical outcomes for medical research: how accurate should it be?
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	text mining; medicine; machine learning; NLP; prognosis; in-hospital mortality
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	11859
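
The abstract mentions shifting the decision boundary of the text mining model as one way to vary its performance. The sketch below is not taken from the thesis; it is a minimal, self-contained illustration of how moving such a boundary trades sensitivity against specificity in the extracted outcome labels, which would then serve as training labels for a prediction model. The score distributions, the 10% outcome prevalence, and the names `simulate_scores`, `extract_labels`, and `sensitivity_specificity` are all assumptions made for illustration.

```python
import random


def simulate_scores(n, prevalence, seed=0):
    """Return (true_labels, scores) for a hypothetical text mining model."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        # Assumed score distributions: positives centred at 0.7, negatives at 0.3.
        s = rng.gauss(0.7 if y else 0.3, 0.15)
        labels.append(y)
        scores.append(min(1.0, max(0.0, s)))  # clip to [0, 1]
    return labels, scores


def extract_labels(scores, threshold):
    """Turn model scores into binary outcome labels at a decision boundary."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(true_labels, extracted):
    """Compare extracted labels with the true outcome labels."""
    pairs = list(zip(true_labels, extracted))
    tp = sum(1 for t, e in pairs if t == 1 and e == 1)
    fn = sum(1 for t, e in pairs if t == 1 and e == 0)
    tn = sum(1 for t, e in pairs if t == 0 and e == 0)
    fp = sum(1 for t, e in pairs if t == 0 and e == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Assumed setting: 2000 notes, 10% outcome prevalence (illustrative only).
labels, scores = simulate_scores(n=2000, prevalence=0.1)

results = {}
for threshold in (0.3, 0.5, 0.7):
    results[threshold] = sensitivity_specificity(
        labels, extract_labels(scores, threshold)
    )
    sens, spec = results[threshold]
    print(f"threshold={threshold:.1f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

Lowering the threshold can only add positive labels, so sensitivity never decreases and specificity never increases. Which operating point yields the best downstream prediction model is the empirical question the thesis studies; this sketch does not answer it.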

Files in this item

Name: Master_Thesis_Zwierd_19-10.pdf
Size: 1.325Mb
Format: PDF

This item appears in the following Collection(s)

Theses
