
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Vink, Gerko
dc.contributor.author: Dvinskis, Elviss
dc.date.accessioned: 2022-09-09T01:02:36Z
dc.date.available: 2022-09-09T01:02:36Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42449
dc.description.abstract: Missing data frequently complicate data analysis. Multiple imputation is a well-known and robust technique for addressing missing data. In R, multiple imputation is commonly implemented through the mice package, which uses the MICE algorithm. However, no such standard choice is yet established for Python. This study examines four imputation methods implemented in Python to assess whether they can yield unbiased and confidence-valid estimates. A model-based simulation study is carried out to evaluate the performance of KNNImputer, IterativeImputer, miceforest and MIDASpy. The results demonstrate that while IterativeImputer can, under certain conditions, show performance comparable to the conventional R imputation method mice, the other methods (KNNImputer, miceforest and MIDASpy) underperform under most conditions specified in this simulation study. This study suggests that it would be unwise to recommend these Python approaches as a general imputation strategy without a detailed understanding of each method’s proper application settings and fine-tuning.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Evaluation of different imputation methods for missing data.
dc.title: Can a 𝙿𝚢𝚝𝚑𝚘𝚗 (package) do what 𝚖𝚒𝚌𝚎 can?
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: missing data; mice; MIDAS; KNNImputer; IterativeImputer; miceforest; miceRanger
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 8849
