TransRV: Regressive Contrastive Learning of Tabular Transformer for Lease Vehicle Residual Value Prediction
Summary
Accurate residual value (RV) prediction is essential for car manufacturers and lessors, not only to ensure profitability and maintain competitive market positions, but also to help business analysts uncover market trends. Modeling techniques in this industry have largely remained at the non-deep-learning stage. Recent advances in deep learning for tabular data have introduced innovative approaches, yet there remains an unmet need to handle tables with varying structures effectively. Transfer learning, particularly deep transfer learning augmented by contrastive learning, presents a promising solution: it enables models to generalize across datasets that share both common and unique features related to user information, vehicle condition, and macroeconomic parameters, thereby reducing the need for manual intervention. This thesis focuses on modern deep transferable tabular learning techniques and proposes an improved contrastive learning method for pre-training, Vertical Partition Contrastive Loss for Regression (VPCLR). Taking the cross-market residual value prediction task provided by Athlon Car Lease International B.V. as the research case, we developed two models that apply VPCLR in a transformer-based deep tabular architecture: TransRVs, pretrained with VPCLR on an eight-country dataset of quotable vehicle attributes; and TransRVn, which further incorporates transfer learning on the Netherlands dataset to assimilate knowledge from varied distributions. Both models attain state-of-the-art RV prediction accuracy with a reduced feature set, achieving MAEs of 0.0322 and 0.0287, respectively, within a small hyperparameter search space. These results compare favorably with XGBoost, CatBoost, Random Forest, and a vanilla DNN, whose hyperparameters were searched over a larger space for more iterations.
Furthermore, to enhance the interpretability of the model and provide actionable business insights for stakeholders, the transformer-based model’s predictions are explained through feature importance rankings computed with the SHAP explainer. The SHAP-based feature importance analysis of TransRVs and TransRVn confirms that, while core leasing-data dynamics are global, regional nuances are effectively captured by our column-aware contrastive learning design.