On the Predictability of Energy Labels in Dutch Commercial Real Estate: A Machine Learning Approach
Summary
The real estate sector is under increasing pressure to enhance sustainability, driven by regulatory demands, market expectations, and environmental imperatives. This thesis evaluates the predictability of energy labels in Dutch commercial real estate using machine learning models. We apply regression and classification techniques, including Random Forest, XGBoost, and Deep Neural Networks, to forecast energy label indexes and the likelihood of energy label upgrades. Using a dataset of 66,000 commercial properties, covering residential, office, and retail properties, we compare model performance with metrics such as Mean Absolute Percentage Error (MAPE), R-squared, and Area Under the Curve (AUC). Our findings indicate that ensemble methods, particularly Random Forest and XGBoost, outperform traditional linear models in predicting energy labels. For predicting the energy label index, Random Forest achieved a mean R-squared of 0.69, a mean MAPE of 5.3%, and a Root Mean Square Error (RMSE) of 39.8. In predicting energy label upgrades, XGBoost achieved an AUC of 0.95, with Random Forest performing comparably. SHAP (SHapley Additive exPlanations) analysis identified the most influential variables, including previous energy labels, property market value, and planned investments. Lastly, we employ Monte Carlo simulations to forecast the adoption of energy label A within the real estate portfolio of a large Dutch bank, estimating that approximately 74% of the portfolio will achieve this label by 2030. The results underscore the potential of data science to drive substantial advancements in energy efficiency within the commercial real estate sector.
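
For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of their standard textbook definitions. The function names and the toy values are illustrative assumptions; they are not the implementation or the data used in this thesis.

```python
import math

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (assumes non-zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
index_true = [100, 200, 300, 400]   # "true" energy label index values (made up)
index_pred = [110, 190, 300, 380]   # model predictions (made up)
upgraded = [1, 0, 1, 0]             # 1 = label upgraded, 0 = not upgraded (made up)
upgrade_score = [0.9, 0.2, 0.4, 0.6]  # predicted upgrade probabilities (made up)

print(mape(index_true, index_pred))
print(rmse(index_true, index_pred))
print(r_squared(index_true, index_pred))
print(auc(upgraded, upgrade_score))
```

MAPE and RMSE measure the size of the regression error (relative and absolute, respectively), R-squared measures the share of variance explained, and AUC measures how well the classifier ranks upgraded properties above non-upgraded ones.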
