Constructing and Predicting School Advice for Academic Achievement: A Comparison of Item Response Theory and Machine Learning Techniques
Summary
In contemporary education, tests can be used to estimate students’ abilities and thereby indicate whether their school type suits them. However, tests are usually administered for each content area separately, which makes it difficult to combine the results into a single school advice. To this end, in the context of a student monitoring system, we study a series of tests that measure progress for the purpose of predicting school advice, and we compare domain-specific and domain-agnostic methods both quantitatively and in terms of their explanations.

First, we describe current approaches for measuring progress from educational tests and examine which methods are suitable for combining content areas and predicting school advice. As the domain-specific method, an item response theory (IRT) model is calibrated, from which an ability score is extracted and subsequently used as input to a multinomial log-linear regression model. As domain-agnostic methods, we train a random forest (RF) and a shallow neural network (NN), applying case weighting to give extra attention to students who switched between school types. We compare the predictive performance, computational feasibility, and explainability of the models. Over all students, RFs provided the most accurate predictions, followed by NNs and IRT respectively. For students who switched school type, IRT performed best, followed by NNs and RFs; case weighting proved to provide a major improvement for this group. Furthermore, all models were found to be computationally feasible. Lastly, IRT was found to be much easier to explain than the other models, while RFs are somewhat more interpretable than NNs.

In practice, concept drift would occur because the recommendations made by the model lead to school advice that differs from the advice that would have been given without the model; this positive feedback loop seems inevitable, but it can be diminished by using the model solely for decision support. Ethical issues for the use of ML models revolve around differences in importance between content areas and whether such differences are fair. Moreover, legal considerations are posed by the GDPR’s ‘right to an explanation’. Future research includes exploring different class aggregations and other models, inspecting confounding variables, and testing generalisability.
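To make the modelling pipeline concrete, the sketch below illustrates the three-model comparison in Python with scikit-learn. It is not the authors’ implementation: the data, the stand-in ability score, the number of content areas, and the weight of 5 for switchers are all hypothetical placeholders, and the IRT calibration step is stubbed out rather than estimated from item responses.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical data: per-student scores for four content areas, an IRT
# ability estimate (theta), school-advice labels, and a flag marking
# students who switched school type after the initial advice.
n = 1000
scores = rng.normal(size=(n, 4))            # domain-agnostic features
theta = scores.mean(axis=1, keepdims=True)  # placeholder for a calibrated IRT ability score
advice = rng.integers(0, 3, size=n)         # e.g. three school-type classes
switched = rng.random(n) < 0.1

# Case weighting: up-weight switchers so this minority group influences
# the fit more than its share of the data would.
w = np.where(switched, 5.0, 1.0)

# Domain-specific route: multinomial logistic regression on the ability score.
irt_lr = LogisticRegression(max_iter=1000).fit(theta, advice, sample_weight=w)

# Domain-agnostic routes: an RF and a shallow NN on the raw scores.
rf = RandomForestClassifier(n_estimators=200).fit(scores, advice, sample_weight=w)
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(scores, advice)
# Note: MLPClassifier.fit takes no sample_weight argument; in practice,
# switchers could be oversampled instead to mimic case weighting.

The same held-out students can then be scored by all three fitted models, which is the basis for comparing overall accuracy against accuracy on the switcher subgroup as described above.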