Machine-Learning-Based Dimensionality Assessment for Cognitive Diagnosis Models
Summary
This thesis examines how supervised machine learning algorithms can be utilised to predict
the number of hidden attributes in simulated data derived from cognitive diagnosis model
(CDM) structures. Accurate dimensionality estimation is essential for CDMs, but current
methods often rely on expert judgement or stringent assumptions. To address this problem, a
dataset of more than 600,000 simulations was generated, each under a different set of psychometric
conditions and summarised by a set of structural and statistical features. Several machine
learning models, including Random Forest, XGBoost, and a Multi-Layer Perceptron, were
trained to identify the actual number of attributes used to generate the data.
The experimental analysis involved feature engineering, hyperparameter optimisation,
ensemble learning, and cost-sensitive training. Evaluation was based on macro-averaged
F1 scores, ROC AUC, and error distance metrics. The results show that all models
outperformed a simple baseline. The MLP model achieved the best performance when combined
with a distance-aware loss function, yielding an F1 score of 0.59 and an AUC of 0.93.
Cost-sensitive learning helped reduce the average size of the errors made when
misclassifications occurred.
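To make the idea of a distance-aware loss and an error distance metric concrete, the following is a minimal sketch in Python. It is not the formulation used in the thesis, which this summary does not specify. It assumes one common variant: standard cross-entropy plus a penalty equal to the expected absolute distance between the predicted and true class index. The function names and the weighting parameter alpha are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_aware_loss(logits, true_idx, alpha=1.0):
    """Cross-entropy plus an expected-distance penalty (assumed form).

    Classes are treated as ordered attribute counts, so probability mass
    placed far from the true class costs more than mass placed nearby.
    alpha is a hypothetical weight; alpha=0 gives plain cross-entropy.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_idx])
    expected_distance = sum(p * abs(k - true_idx) for k, p in enumerate(probs))
    return cross_entropy + alpha * expected_distance


def mean_error_distance(y_true, y_pred):
    """Average absolute gap between true and predicted attribute counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Same probability on the true class (index 0), but the first prediction
# concentrates its mass two classes away and the second only one class away.
far_miss = distance_aware_loss([0.0, 0.0, 5.0], true_idx=0)
near_miss = distance_aware_loss([0.0, 5.0, 0.0], true_idx=0)
print(far_miss > near_miss)
```

Under this assumed form, two predictions with identical cross-entropy are separated by how far their mass sits from the true number of attributes, which is the behaviour a cost-sensitive objective for ordered classes is meant to encourage.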
These results demonstrate that supervised machine learning can aid in dimensionality
estimation for cognitive diagnosis modelling. The method is a scalable and data-driven
alternative to traditional psychometric methods, showing considerable promise for use in
educational and psychological assessment settings.