dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Cruyff, Maarten
dc.contributor.author: Yan, Jiajian
dc.date.accessioned: 2024-07-26T00:02:09Z
dc.date.available: 2024-07-26T00:02:09Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46957
dc.description.abstract: Multiple System Estimation (MSE) is a key statistical method for estimating population size in fields such as human rights, ecology, and epidemiology, particularly when a complete count of the population is infeasible. Traditional methods such as log-linear analysis often face computational challenges, because the number of candidate models grows exponentially as registers and covariates are added. To address this, grouped lasso has been explored as a regularization technique that streamlines model selection by penalizing less influential groups of coefficients, shrinking entire groups towards zero. This study evaluated various grouped lasso models through simulations, varying parameters such as the number of registers, the number of covariates, and the sample population. Three contrast methods (treatment, sum, and polynomial) were used, and the resulting models were compared against traditional log-linear regression with model selection by AIC and BIC. The results indicated that, with the optimal lambda values, grouped lasso methods consistently produced medians below the true population size, with narrow interquartile ranges and few outliers, demonstrating high precision but low accuracy. AIC/BIC-based model selection showed high variation and outliers, but significantly higher precision once the outliers were removed. The results suggest further exploration of grouped lasso on simulated datasets with other parameter combinations, as well as of other regularization-based methods such as ridge and elastic net. This thesis project was completed in collaboration with group member Yanwen Zhang (Student Number 9087605), who investigated the effect of a higher sample population percentage and a higher number of registers; both theses used a common baseline dataset.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Grouped Lasso Regression in Multiple System Estimation: examining the effect of different sample populations and covariates on the results of different grouped lasso models, using different contrast methods, standardization, and factor/interaction grouping.
dc.title: Grouped Lasso Regression in Multiple System Estimation
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Lasso regression; Grouped lasso regression; Multiple system estimation
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 35071
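The abstract describes fitting log-linear MSE models with a grouped lasso penalty instead of searching over models with AIC/BIC. The following is a minimal sketch of that general technique, not the thesis's own code. It makes several assumptions. There are three registers A, B, C and one hypothetical three-level covariate X. Coding is treatment (dummy) coding only; sum or polynomial contrasts would change the covariate columns. Main effects are left unpenalized. Each two-way interaction is one penalty group, weighted by the square root of the group size. The model is fitted by proximal gradient descent with backtracking. The missing "never recorded" cell in each covariate stratum is extrapolated as exp(beta0 + gamma_x) and added to the observed total. All function names and simulation parameters are hypothetical.

```python
import numpy as np

# Penalized groups (column indices into the design matrix). Columns 0-5
# (intercept, main effects A/B/C, covariate dummies X1/X2) are left
# unpenalized: an assumption of this sketch.
GROUPS = [[6], [7], [8], [9, 10], [11, 12], [13, 14]]


def simulate_counts(n_pop=2000, seed=0):
    """Hypothetical simulation: 3 registers, one 3-level covariate.
    Returns a (7, 3) array of counts: rows = histories 001..111, cols = level."""
    rng = np.random.default_rng(seed)
    level = rng.integers(0, 3, size=n_pop)
    base_p = np.array([0.3, 0.25, 0.2])            # hypothetical capture probabilities
    shift = np.array([0.0, 0.1, -0.1])[level]      # hypothetical covariate effect
    p = np.clip(base_p[None, :] + shift[:, None], 0.01, 0.99)
    caught = (rng.random((n_pop, 3)) < p).astype(int)
    hist = caught @ np.array([4, 2, 1])            # capture history as integer 0..7
    full = np.bincount(hist * 3 + level, minlength=24).reshape(8, 3)
    return full[1:]                                # history 000 is never observed


def design_matrix():
    """Treatment-coded log-linear design, one row per observed (history, level) cell."""
    rows = []
    for h in range(1, 8):
        a, b, c = (h >> 2) & 1, (h >> 1) & 1, h & 1
        for x in range(3):
            x1, x2 = float(x == 1), float(x == 2)  # level 0 is the reference
            rows.append([1, a, b, c, x1, x2,       # intercept and main effects
                         a * b, a * c, b * c,      # register-by-register interactions
                         a * x1, a * x2, b * x1, b * x2, c * x1, c * x2])
    return np.array(rows, dtype=float)


def poisson_loss(beta, X, y):
    """Mean Poisson negative log-likelihood (the log(y!) constant is dropped)."""
    eta = X @ beta
    with np.errstate(over="ignore"):               # overflow gives inf, rejected by backtracking
        return np.mean(np.exp(eta) - y * eta)


def group_soft_threshold(v, step, lam):
    """Proximal operator of lam * sum_g sqrt(|g|) * ||beta_g||_2: shrinks or zeroes whole groups."""
    out = v.copy()
    for g in GROUPS:
        norm = np.linalg.norm(v[g])
        scale = max(0.0, 1.0 - step * lam * np.sqrt(len(g)) / norm) if norm > 0 else 0.0
        out[g] = scale * v[g]
    return out


def fit_group_lasso(X, y, lam, max_iter=5000, tol=1e-8):
    """Proximal gradient descent with backtracking line search."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                     # start from a constant-rate model
    step = 1.0
    for _ in range(max_iter):
        with np.errstate(over="ignore"):
            mu = np.exp(X @ beta)
        grad = X.T @ (mu - y) / len(y)
        f_old = poisson_loss(beta, X, y)
        while True:                                # halve the step until the majorization holds
            z = group_soft_threshold(beta - step * grad, step, lam)
            d = z - beta
            if poisson_loss(z, X, y) <= f_old + grad @ d + d @ d / (2 * step):
                break
            step *= 0.5
        beta = z
        if np.linalg.norm(d) < tol:
            break
    return beta


def estimate_population(beta, y):
    """Observed total plus the fitted 000 cell of each covariate level.
    For history 000 all register terms are 0, so log mu = beta0 + gamma_x."""
    missing = np.exp(beta[0] + np.array([0.0, beta[4], beta[5]]))
    return y.sum() + missing.sum()


if __name__ == "__main__":
    y = simulate_counts().ravel()                  # row order matches design_matrix()
    X = design_matrix()
    for lam in (0.0, 0.1, 1.0, 1e6):
        beta = fit_group_lasso(X, y, lam)
        print(f"lambda={lam:g}  N_hat={estimate_population(beta, y):.1f}")
```

Increasing lambda sets whole interaction groups to zero at once. The two dummy columns of a register-by-covariate interaction enter or leave the model together, which is the point of grouping. A very large lambda reduces the fit to register and covariate main effects only. In practice lambda would be tuned, for example by cross-validation, which this sketch does not do.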