Multiple System Estimation Using Grouped Lasso: Examining the Effect of Sample Population and Registers
Summary
This study evaluates the efficacy of Grouped Lasso regression models in Multiple System Estimation (MSE) by simulating data under a range of parameter settings. Three sets of datasets were generated, starting from a baseline configuration of three registers and two covariates with three levels each. The population size was fixed at 1000, with a 25% sampling rate and standardization. To explore the impact of the share of the population observed in the sample, a second dataset used a 50% sampling rate. To evaluate the effect of the number of registers, a third dataset increased the number of registers to four while keeping the same covariates and population size as the baseline. Each dataset was analyzed under four setups to compare standardization with non-standardization, and interaction grouping with factor grouping. The results showed that the AIC/BIC methods generally outperformed the lasso methods, although AIC often produced substantial outliers. Models fitted at the optimal lambda consistently outperformed those fitted at the optimal lambda plus one standard deviation. Although the three contrasts were less effective at reaching the target median, they exhibited much narrower interquartile ranges than AIC/BIC, indicating more consistent precision.
This thesis project was completed in collaboration with group member Jiajian Yan (student number 6763294), who examines the effect of a lower percentage of the population in the sample and a higher number of covariates, while keeping the baseline dataset the same to ensure comparability.
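To make the baseline configuration concrete, the following Python sketch simulates one dataset with a population of 1000, three registers, and two covariates with three levels each. It is an illustration only and not the code used in this study. Several choices are assumptions: the registers are taken to be independent with equal inclusion probabilities, the covariate levels are drawn uniformly, and the "sampling rate" is interpreted as the expected share of the population appearing in at least one register. The function names `simulate_registers` and `observed_table` are hypothetical.

```python
import numpy as np

def simulate_registers(pop_size=1000, n_registers=3, n_levels=(3, 3),
                       coverage=0.25, seed=0):
    """Simulate register membership for a closed population.

    Assumption: registers are independent with equal inclusion probability,
    chosen so that the expected share of people seen in at least one
    register equals `coverage`.
    """
    rng = np.random.default_rng(seed)
    # Solve 1 - (1 - p)^k = coverage for the per-register probability p.
    p = 1.0 - (1.0 - coverage) ** (1.0 / n_registers)
    # One categorical covariate per entry of n_levels, levels drawn uniformly.
    covariates = np.column_stack(
        [rng.integers(0, levels, size=pop_size) for levels in n_levels]
    )
    # captures[i, r] is 1 if person i appears in register r, else 0.
    captures = (rng.random((pop_size, n_registers)) < p).astype(int)
    return covariates, captures

def observed_table(covariates, captures, n_levels=(3, 3)):
    """Cross-classify observed individuals by capture history and covariates."""
    n_registers = captures.shape[1]
    # Only people seen in at least one register enter the observed table.
    seen = captures.any(axis=1)
    shape = (2,) * n_registers + tuple(n_levels)
    table = np.zeros(shape, dtype=int)
    cells = np.column_stack([captures[seen], covariates[seen]])
    # Unbuffered add so repeated cell indices are each counted.
    np.add.at(table, tuple(cells.T), 1)
    return table

covariates, captures = simulate_registers()
table = observed_table(covariates, captures)
n_observed = int(table.sum())
```

The resulting array has shape `(2, 2, 2, 3, 3)`. The cells at `table[0, 0, 0]` are structurally empty, because individuals missed by every register are never observed; the count in those cells is what MSE aims to estimate. Calling `simulate_registers(coverage=0.5)` or `simulate_registers(n_registers=4)` would give rough analogues of the second and third dataset configurations under the same assumptions.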