Ways to deal with imbalanced data sets for machine-learning using the identification of potential new risk factors for aneurysmal subarachnoid hemorrhage from the UK Biobank as an example.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Ruigrok, Ynte
dc.contributor.author	Edwards, Laurens
dc.date.accessioned	2022-02-17T00:00:27Z
dc.date.available	2022-02-17T00:00:27Z
dc.date.issued	2022
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/499
dc.description.abstract	Imbalanced data, i.e. a data set in which one class (the minority class) is heavily underrepresented, often causes difficulties for machine learning algorithms. A pipeline was built to preprocess the data and apply machine learning algorithms designed specifically for imbalanced data sets. Several metrics of model performance were compared (AUROC, AUPRC, precision, recall, accuracy, F1 and F-beta). The pipeline was applied to the UK Biobank, a large-scale prospective cohort study, which allowed a hypothesis-free identification of potential new risk factors for aneurysmal subarachnoid hemorrhage (aSAH).
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Using machine learning algorithms to find new potential risk factors for subarachnoid hemorrhage.
dc.title	Ways to deal with imbalanced data sets for machine-learning using the identification of potential new risk factors for aneurysmal subarachnoid hemorrhage from the UK Biobank as an example.
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	subarachnoid hemorrhage, stroke, risk factors, aneurysm, machine learning, imbalanced
dc.subject.courseuu	Bioinformatics and Biocomplexity
dc.thesis.id	2311