dc.description.abstract | This research aimed to select, tune, and interpret a supervised Machine Learning (ML) model capable of correctly predicting the number of attributes, i.e., assessing dimensionality, in Cognitive Diagnosis Models (CDMs). These objectives were achieved by benchmarking various supervised ML algorithms, tuning the best-performing model, and applying interpretable ML in the form of counterfactual explanations. A large-scale simulated dataset of 607,579 observations and 946 predictors was used in this research. Feature selection reduced the number of predictors to 142, after which the analysis was performed. An ensemble model combining random forest and XGBoost outperformed the other supervised ML models, with a validation accuracy of 56.0%. Hyperparameter tuning using Model-Based Optimisation did not further increase the model's accuracy. The final model evaluation on unseen test data achieved an accuracy of 56.1%. Generated counterfactuals revealed relevant predictors influencing model predictions and showed that, on average, 26 predictors needed to be altered to correct misclassifications. Despite limitations in model performance, the chosen model still provided a meaningful improvement over the 11% baseline accuracy, and the counterfactuals offered insight into the complexity of dimensionality assessment in CDMs. | |