dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Vink, Gerko | |
dc.contributor.author | Dvinskis, Elviss | |
dc.date.accessioned | 2022-09-09T01:02:36Z | |
dc.date.available | 2022-09-09T01:02:36Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/42449 | |
dc.description.abstract | Missing data frequently complicate data analysis. Multiple imputation is a well-known and robust technique for addressing missing data. In R, multiple imputation is commonly implemented through the mice package, which uses the MICE algorithm. However, no such standard choice is yet established for Python. This study examines four imputation methods implemented in Python to assess whether they can yield unbiased and confidence-valid estimates. A model-based simulation study is carried out to evaluate the performance of KNNImputer, IterativeImputer, miceforest and MIDASpy. The results demonstrate that while IterativeImputer can, under certain conditions, perform comparably to the conventional R imputation method mice, the other methods (KNNImputer, miceforest and MIDASpy) underperform under most conditions specified in this simulation study. This study suggests that it would be unwise to recommend these Python approaches as a general imputation strategy without a detailed understanding of each method’s proper application settings and fine-tuning. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | Evaluation of different imputation methods for missing data. | |
dc.title | Can a 𝙿𝚢𝚝𝚑𝚘𝚗 (package) do what 𝚖𝚒𝚌𝚎 can? | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | missing data; mice; MIDAS; KNNImputer; IterativeImputer; miceforest; miceRanger | |
dc.subject.courseuu | Applied Data Science | |
dc.thesis.id | 8849 | |