        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Grouped Lasso Regression in Multiple System Estimation

        View/Open
        Thesis_final_Jiajian_Yan.pdf (536.7Kb)
        Publication date
        2024
        Author
        Yan, Jiajian
        Summary
        Multiple System Estimation (MSE) is a statistical method for estimating population sizes in fields such as human rights, ecology, and epidemiology, particularly when a complete population count is infeasible. Traditional methods, such as log-linear analysis, often face computational challenges because the number of candidate models grows exponentially as registers and covariates are added. To address this, the grouped lasso has been explored as a regularization technique that streamlines model selection by penalizing less impactful coefficients. This study evaluated various grouped lasso models through simulations, varying parameters such as the number of registers, the number of covariates, and the population samples. Three contrast methods (treatment, sum, and polynomial) were used and compared against traditional log-linear regression with model selection based on the AIC and BIC criteria. The results indicated that, at the optimal lambda values, grouped lasso methods consistently produced medians that were low relative to the true population, with narrow interquartile ranges and minimal outliers, demonstrating high precision but low accuracy. AIC/BIC-based model selection showed high variation and outliers, but significantly higher precision once the outliers were removed. The results suggest further possibilities for exploring the grouped lasso on more simulated datasets with different parameter combinations, as well as for using other regularization-based methods such as ridge and elastic net. This thesis project was completed in collaboration with group member Yanwen Zhang (Student Number 9087605), who investigated the effect of a higher sample population percentage and a higher number of registers; a common baseline dataset was used across the two theses.
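        The summary does not say how the grouped lasso penalty was implemented, so the following is only a minimal sketch of the general idea, not the thesis's code. It shows the standard block soft-thresholding step that a group lasso solver applies to each group of coefficients. The function name, the example term names, the coefficient values, and the square-root group-size weight are all assumptions made for illustration. In an MSE setting, a "group" would plausibly be the set of dummy columns that one log-linear term produces under a given contrast coding.

```python
import math


def group_soft_threshold(beta_group, lam):
    """Block soft-thresholding for one coefficient group.

    This is the proximal step for the penalty lam * sqrt(p) * ||beta_group||_2,
    where p is the group size. The sqrt(p) weight is a common convention and
    is assumed here, not taken from the thesis.
    """
    p = len(beta_group)
    norm = math.sqrt(sum(b * b for b in beta_group))
    threshold = lam * math.sqrt(p)
    if norm <= threshold:
        # The whole group is dropped together: the term leaves the model.
        return [0.0] * p
    # Otherwise every coefficient in the group shrinks by the same factor.
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_group]


# Hypothetical coefficient groups, one per log-linear term (illustrative values).
groups = {
    "A:B": [0.05, -0.02],               # weak register-by-register interaction
    "A:covariate": [0.9, -0.7, 0.4],    # stronger register-by-covariate term
}

shrunk = {
    name: group_soft_threshold(coefs, lam=0.1)
    for name, coefs in groups.items()
}

for name, coefs in shrunk.items():
    print(name, coefs)
```

        With these illustrative values, the weak interaction group is set to zero as a whole, while the stronger group is kept and shrunk toward zero. This all-or-nothing behavior per group is what lets the penalty act as a model selection device over whole terms rather than over individual dummy coefficients.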
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46957
        Collections
        • Theses