Predicting traveler demand using explainable models and feature engineering
Summary
This study explores the use of multi-output forecasting models to predict Origin-Destination (OD) time series, which represent the travel behavior of passengers using the railway services of Nederlandse Spoorwegen (NS). DeepAR serves as the comparative baseline.
We present an intelligible and effective handcrafted feature set. Using this feature set, we find that multi-output LightGBM models significantly outperform the DeepAR forecasting model. When these models are coupled with TreeSHAP, the generated explanations are understandable and relatively sparse. Furthermore, we propose an interpretable Linear Regression (LR) model with L1 regularization that matches DeepAR's forecasting performance while using, on average, only 12 nonzero coefficients. Finally, the study establishes that compressing the past time series history with beta-VAE or SVD is less effective than using the uncompressed feature vector.
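To make the idea of a sparse, multi-output L1-regularized linear model concrete, the following is a minimal sketch and not the study's implementation. It assumes plain lagged values of a single OD series as features, standing in for the handcrafted feature set, which is not described here. It also assumes a direct multi-output strategy, meaning one independent L1 model per forecast step, fitted with a hand-written coordinate-descent Lasso in NumPy. All function names (`make_lag_features`, `lasso_cd`, `fit_multi_output_lasso`, `predict`) are hypothetical.

```python
import numpy as np


def make_lag_features(series, n_lags, horizon):
    """Build (X, Y): each row of X holds n_lags past values (hypothetical
    stand-in for handcrafted features); each row of Y holds the next
    `horizon` values, i.e. the multi-output target."""
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        Y.append(series[t:t + horizon])
    return np.asarray(X, dtype=float), np.asarray(Y, dtype=float)


def soft_threshold(z, gamma):
    """Proximal operator of the L1 norm: shrinks z toward zero by gamma."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=1000):
    """Coordinate descent for min_w (1/2n)||y - Xw - b||^2 + alpha*||w||_1.
    The intercept b is unpenalized and handled by centering X and y."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    residual = yc.copy()  # residual for w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no signal
            residual += Xc[:, j] * w[j]  # remove feature j's contribution
            rho = Xc[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= Xc[:, j] * w[j]  # add back the updated contribution
    b = y_mean - x_mean @ w
    return w, b


def fit_multi_output_lasso(X, Y, alpha, n_iter=1000):
    """Direct multi-output strategy: one independent L1 model per step."""
    fits = [lasso_cd(X, Y[:, h], alpha, n_iter) for h in range(Y.shape[1])]
    W = np.column_stack([w for w, _ in fits])  # shape (n_features, horizon)
    b = np.array([b for _, b in fits])  # shape (horizon,)
    return W, b


def predict(W, b, X):
    return X @ W + b
```

A possible usage on a toy weekly pattern (synthetic data, not NS data) shows how sparsity can be read off directly as the number of nonzero coefficients per output:

```python
t = np.arange(200)
series = 10 + 5 * np.sin(2 * np.pi * t / 7)
X, Y = make_lag_features(series, n_lags=14, horizon=3)
W, b = fit_multi_output_lasso(X, Y, alpha=0.05)
avg_nonzero = np.mean((W != 0).sum(axis=0))
```

Larger `alpha` values drive more coefficients exactly to zero, trading accuracy for a smaller set of features a reader has to inspect.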