Predicting the amount of excess hand luggage on an aircraft using machine learning
Summary
An excess of hand luggage is a growing problem for airlines, causing boarding delays and passenger dissatisfaction. This thesis investigates the factors that cause excess hand luggage and presents a machine learning model that predicts the amount of excess hand luggage. Several models, including Linear Regression, Decision Tree Regression, Random Forest Regression and XGBoost, have been tested on flight data from KLM Royal Dutch Airlines. These models are compared with the heuristic currently used by KLM.
The results indicate that all regression models substantially outperform the heuristic. The best-performing model is the Random Forest Regressor, which achieves an R² of 0.83 on the intercontinental flight data. To make the predictions more useful for gate agents, a variation of the model estimates how much hand luggage should be collected at each collection point. This addition reduced the Mean Absolute Error (MAE) of the gate collection from 4.58 to 1.65. For KLM, this means that planning and task assignment become easier, because the models indicate how much work needs to be distributed among the gate agents.
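The two metrics reported above can be made concrete with a short sketch. It assumes the standard definitions (MAE as the mean absolute difference between observed and predicted values, and R² as 1 − SS_res / SS_tot, as in scikit-learn's `mean_absolute_error` and `r2_score`). The summary does not state the formulas used, and the example counts below are hypothetical, not KLM data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted counts."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed vs. predicted excess-luggage counts for four flights.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
print(mean_absolute_error(observed, predicted))  # → 2.0
print(round(r2_score(observed, predicted), 3))   # → 0.964
```

An MAE of 2.0 means that the prediction is, on average, two pieces away from the observed count. An R² close to 1 means that the model explains most of the variation between flights, whereas a model that always predicts the mean scores 0.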
Despite these promising results, some challenges remain. The first is the lack of a feedback loop from the cabin that could be used to correct the model: currently, the target variable is the number of hand luggage pieces collected at the outstations or by check-in or gate agents. Another challenge is model drift: although the models still perform much better than the heuristic, their performance decreases on data from the following year. Future research could explore quantile models to better manage prediction uncertainty.
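The summary does not specify which quantile models are meant. One common approach is quantile regression, in which a model is trained to minimize the pinball (quantile) loss rather than the squared error. The sketch below implements only that loss, as an assumption about the direction suggested. scikit-learn offers a ready-made `mean_pinball_loss` metric and a `GradientBoostingRegressor(loss="quantile", alpha=...)` estimator built on the same idea.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile q, with 0 < q < 1."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-prediction (diff > 0) is weighted by q,
        # over-prediction (diff < 0) by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# Hypothetical example: for the 0.9 quantile, predicting 2 pieces too few
# costs far more than predicting 2 pieces too many.
print(round(pinball_loss([10], [8], 0.9), 2))   # → 1.8
print(round(pinball_loss([10], [12], 0.9), 2))  # → 0.2
```

Because under-prediction is penalized more heavily for high quantiles, a model trained on this loss for q = 0.9 learns an upper estimate rather than an average. A planner could use that estimate to staff the gate conservatively when the prediction is uncertain.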