Grouped Lasso Regression in Multiple System Estimation
Summary
Multiple System Estimation (MSE) is a crucial statistical method used for population estimation in
fields such as human rights, ecology, and epidemiology, particularly when a complete population count is
infeasible. Traditional methods, such as log-linear analysis, often face computational challenges because
the number of candidate models grows exponentially with additional registers and covariates. To address this,
grouped lasso has been explored as a regularization technique that streamlines model selection
by shrinking groups of less impactful coefficients toward zero. This study evaluated various grouped lasso models through
simulations, varying parameters such as the number of registers, covariates, and population samples. Three
contrast methods (treatment, sum, and polynomial) were compared against traditional log-linear
regression with model selection by the AIC and BIC criteria. Results indicated that, with the optimal lambda
values, grouped lasso methods consistently produced medians below the true population size, with
narrow interquartile ranges and minimal outliers, demonstrating high precision but low accuracy.
AIC/BIC-based model selection showed high variation and many outliers, but significantly higher precision once
outliers were removed. The results suggest scope for further exploration of grouped lasso
on more simulated datasets with different parameter combinations, as well as of other regularization-based
methods such as ridge and elastic net. This thesis project was completed in collaboration with group
member Yanwen Zhang (Student Number 9087605), who investigated the effect of a higher sampled population
percentage and a higher number of registers; a common baseline dataset was used across the
two theses.
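
For reference, the grouped lasso penalty mentioned in this summary can be sketched as follows. This is a generic formulation for a Poisson log-linear model, not necessarily the exact objective used in the thesis. All symbols (n_c, mu_c, beta_g, p_g, G) are notation assumed here for illustration.

```latex
% Hedged sketch: the notation below is assumed, not taken from the thesis.
% n_c      : observed count in cell c of the register-by-covariate table
% \mu_c    : expected count, with \log \mu_c(\beta) = x_c^\top \beta
% \beta_g  : coefficients belonging to group g (e.g. one main effect or interaction)
% p_g      : number of coefficients in group g;  \lambda \ge 0 : penalty weight
\hat{\beta}(\lambda)
  = \arg\min_{\beta}
    \Big\{ -\sum_{c} \big( n_c \log \mu_c(\beta) - \mu_c(\beta) \big)
           + \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta_g \rVert_2 \Big\}
```

Because the penalty uses the unsquared Euclidean norm of each group, all coefficients of a term enter or leave the model together, and larger values of lambda remove more groups.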