Identifying at-risk students across different stages of distance learning courses and identifying their most relevant predictors at each stage.
MOOCs have become commonplace in distance learning over the past decade, but they suffer from high student dropout rates. An early-warning system that identifies at-risk students could reduce the withdrawal rate. Many prior studies have investigated student dropout prediction models for MOOCs, but most of them used a single course or a non-dynamic dataset. Models trained on such small, specific datasets tend to overfit and generalize poorly to dynamic real-world data. This paper contributes to the body of research by investigating the dropout-prediction performance of the XGBoost algorithm on a diverse and dynamic dataset, and by evaluating the most important predictors at each stage of a course. The analysis uses the OULAD dataset, which is pre-processed into three main predictor groups: demographics, assessment data, and log data (VLE interaction data). The data are split into eleven intervals, with the available data becoming richer at each successive interval. Two analyses are performed. In the first, the withdrawal data are retained; this yields a prediction accuracy between 0.76 and 0.86, with log and assessment data as the most important predictors. In the second, the dataset is made dynamic: the data of students who dropped out are removed after their withdrawal from a course. Here the model's performance does not exceed the accuracy threshold, which suggests that XGBoost is not able to predict dropout on a dynamic dataset.
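The per-interval setup described above can be sketched as follows. All data and feature names here are synthetic and purely illustrative, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so the sketch runs without the xgboost package; the real study uses the OULAD data and the XGBoost implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Three hypothetical predictor groups, mirroring the paper's setup:
# demographics, assessment scores, and VLE interaction (click) counts.
demographics = rng.integers(0, 5, size=(n, 2)).astype(float)
assessments = rng.uniform(0, 100, size=(n, 3))
vle_clicks = rng.poisson(20, size=(n, 3)).astype(float)
X = np.hstack([demographics, assessments, vle_clicks])

# Synthetic dropout label, loosely driven by low scores and low activity.
y = ((assessments.mean(axis=1) + vle_clicks.mean(axis=1)) < 60).astype(int)

# In the study this train/evaluate step is repeated once per interval,
# with richer features at each later interval; one interval is shown here.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0)  # stand-in for XGBClassifier
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
importances = model.feature_importances_  # inspected per interval in the study
print(f"accuracy={accuracy:.2f}, top feature index={importances.argmax()}")
```

For the dynamic variant of the analysis, the rows of withdrawn students would additionally be dropped from every interval after their withdrawal date before refitting the model.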