Project title: Estimating Response Models in the Wild with Algorithmic Modeling and Multiverse Analysis
Summary
This project aimed to identify variables that consistently predict item non-response among Scottish adolescents using multiverse modeling. To implement multiverse modeling, four different
state-of-the-art algorithms were employed, along with various tuned versions of each. The project
focused on individual variable importance, although model performance was essential as
well.
The results indicated that all algorithms (except for the Random Forest) performed reasonably well after tuning, depending on the target variable. The Random Forest models exhibited
poor performance across all target variables and were excluded from the overall variable importance analysis. Three significant findings emerged from the study. Firstly, variables from a
mental health questionnaire showed associations with missingness in multiple dependent variables. Specifically, variables related to emotions, fear, prosocial behavior, and hyperactivity
were important for multiple targets. Secondly, missingness on the questionnaire itself was associated with variables related to alcohol and drug use. Lastly, missingness in the variable
concerning parental supervision was strongly associated with whether the adolescent was likely
to talk to their father/carer if they had concerns.
The project had a few limitations, including some technical shortcomings. The hyperparameter optimisation process could have been more comprehensive, and the preprocessing steps
were considered too harsh in retrospect. Recommendations for future research included different options to account for class imbalance, which is ever-present in survey data. Ethical
concerns about interpreting machine learning results, and what research on response
mechanisms should emphasise, were discussed as well.