Ways to deal with imbalanced data sets for machine-learning using the identification of potential new risk factors for aneurysmal subarachnoid hemorrhage from the UK Biobank as an example.

Edwards, Laurens

Publication date

2022

Summary

Imbalanced data, in which one class (the minority class) occurs far less often than the others in a data set, often poses difficulties for machine learning algorithms. A pipeline was built to preprocess the data and apply machine learning algorithms designed specifically for imbalanced data sets. Several metrics of model performance were considered (AUROC, AUPRC, precision, recall, accuracy, F1 and F-beta). The pipeline was applied to the UK Biobank, a large-scale prospective cohort study, which allowed hypothesis-free identification of new risk factors for aneurysmal subarachnoid hemorrhage (aSAH).
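To illustrate why several metrics are reported rather than accuracy alone, the following is a minimal sketch in plain Python, not the pipeline from the thesis. The toy labels (95 negatives, 5 positives), the two hypothetical classifiers, the helper names and the choice of beta = 2 are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def imbalance_metrics(y_true, y_pred, beta=2.0):
    """Return accuracy, precision, recall, F1 and F-beta as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f_beta(precision, recall, beta=1.0),
        "f_beta": f_beta(precision, recall, beta=beta),
    }


# Toy data (assumed): 95 majority-class cases followed by 5 minority-class cases.
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts the majority class.
always_negative = [0] * 100

# Hypothetical classifier B: 6 false alarms, finds 4 of the 5 minority cases.
some_model = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

for name, pred in [("always_negative", always_negative), ("some_model", some_model)]:
    print(name, imbalance_metrics(y_true, pred, beta=2.0))
```

In this toy setup the classifier that never predicts the minority class has the higher accuracy, yet its recall and F-scores are zero, which is the kind of effect that motivates reporting precision, recall and F-beta alongside accuracy on imbalanced data.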

URI

https://studenttheses.uu.nl/handle/20.500.12932/499

Collections

Theses