Predicting train journeys from smart card data: a real-world application of the sequence prediction problem
Summary
This study aims to predict the next journey of train travelers from
smart card data. After the raw data are preprocessed into features that
describe journeys, the problem is framed as an instance of sequence
prediction. Domain
modelling issues such as the choice of alphabet, representation of time and
the definition of a sequence are discussed. A base alphabet is constructed,
and closed frequent pattern mining is proposed as a method of algorithmi-
cally extending it. The resulting data encodings are tested against a range
of established sequence prediction algorithms. The results show that the
All-Kth-Order-Markov algorithm outperforms the other algorithms by a
clear margin. For the pattern-based encodings, the results are
inconclusive.
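
To make the sequence prediction framing concrete, the following is a minimal sketch of an All-Kth-Order-Markov style predictor. It is an illustration under stated assumptions, not the implementation evaluated in this study: the class name, the `max_order` parameter and the journey symbols (`H>W` for a home-to-work journey, and so on) are hypothetical. The model counts which symbol follows every context of length 1 up to `max_order`, and predicts with the longest context it has seen before, backing off to shorter contexts otherwise.

```python
from collections import Counter, defaultdict


class AllKthOrderMarkov:
    """Illustrative predictor: Markov models of order 1..max_order, with back-off."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # Maps a context (tuple of symbols) to a Counter of the symbols that followed it.
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        for seq in sequences:
            for i in range(1, len(seq)):
                # Record seq[i] as the successor of each context that ends at i - 1.
                for k in range(1, self.max_order + 1):
                    if i - k < 0:
                        break
                    self.counts[tuple(seq[i - k:i])][seq[i]] += 1
        return self

    def predict(self, prefix):
        # Try the longest usable context first, then back off to shorter ones.
        for k in range(min(self.max_order, len(prefix)), 0, -1):
            context = tuple(prefix[-k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # no context of any order has been seen


# Hypothetical alphabet: one symbol per journey (origin>destination).
sequences = [
    ["H>W", "W>H", "H>W", "W>H", "H>W", "W>G", "G>H"],
    ["H>W", "W>H", "H>W", "W>G", "G>H"],
]
model = AllKthOrderMarkov(max_order=3).fit(sequences)

print(model.predict(["H>W"]))         # order-1 context only
print(model.predict(["W>H", "H>W"]))  # order-2 context is available
print(model.predict(["G>H", "H>W"]))  # unseen order-2 context, backs off to order 1
```

In this toy data the order-1 context `H>W` is most often followed by `W>H`, whereas the order-2 context `W>H, H>W` is most often followed by `W>G`, which shows how a longer context can change the prediction. The choice of alphabet determines what such a symbol stands for, which is why the encoding questions discussed above matter to the predictor.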