Project title: Estimating Response Models in the Wild with Algorithmic Modeling and Multiverse Analysis
Summary
Missing data often poses challenges for researchers and practitioners. The Missing at Random (MAR) condition is usually adopted as an informed assumption, because no comprehensive registry of nonresponse predictors exists. This paper investigates missingness in the Human Freedom Index using multiverse analysis and algorithmic modeling with Random Forest, XGBoost, LightGBM, and Neural Networks. LightGBM and XGBoost perform best overall, achieving high Macro F1 and Matthews Correlation Coefficient (MCC) scores. The most important predictors in these models are year, pf_movement, ef_gender, and the spatial x- and y-coordinates, pointing to geographical, societal, and temporal influences on missingness. The study underscores the importance of understanding missingness mechanisms in global datasets and encourages similar research in other contexts.
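Macro F1 and MCC are natural choices when a response model predicts a binary missingness indicator, since missing cells are typically the minority class and plain accuracy can look good while ignoring them. The following is a minimal pure-Python sketch of both metrics, assuming labels coded 1 = value missing and 0 = value observed. The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for a binary indicator where 1 = missing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(tp, fp, fn):
    """F1 for one class, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for classes 1 and 0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    f1_missing = f1(tp, fp, fn)
    # Treating class 0 as positive swaps roles: its TP = tn, FP = fn, FN = fp.
    f1_observed = f1(tn, fn, fp)
    return (f1_missing + f1_observed) / 2


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; returns 0.0 if any margin is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: 8 cells, 3 of them truly missing.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
print(round(mcc(y_true, y_pred), 4))       # 0.4667
```

Both metrics give the minority (missing) class the same weight as the majority class, which is why they are more informative than accuracy for imbalanced missingness indicators.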