Improving Accuracy in Real Estate Valuation: A Study on Data Imputation and Prediction Models
Summary
Abstract
This study aims to help real estate evaluators avoid serious errors caused by carelessness or subjective bias during the evaluation process. To that end, we use real Dutch real estate valuation data provided by KATE Innovations and build a data prediction framework with three key components: data imputation, feature selection, and data prediction. Within this framework, we develop a data imputation method called “Bucket4Imp” and an improved version of the Fast Correlation-Based Filter (FCBF), which serves as an optional feature selection preprocessing step. For both data imputation and data prediction, we adopt a bucket-style ensemble that incorporates three models: K-Nearest Neighbors, Decision Trees, and Random Forest. With these models and module configurations, the framework addresses classification and regression problems involving high-dimensional data with missing values.
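The bucket-style idea described above can be illustrated with a minimal sketch. It is not the authors' Bucket4Imp implementation: it assumes scikit-learn is available, assumes the bucket selects whichever of the three models has the highest cross-validated accuracy on the observed rows, and uses a hypothetical function name (`bucket_impute`), a hypothetical boolean mask for missing entries, and synthetic data in place of the KATE Innovations dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def bucket_impute(X, y, missing_mask, cv=3, random_state=0):
    """Fill missing entries of a categorical column with the best of three models.

    X            : complete feature matrix, shape (n_samples, n_features)
    y            : column to impute; values where missing_mask is True are ignored
    missing_mask : boolean array marking the rows whose y value is missing
    (All names here are illustrative assumptions, not taken from the study.)
    """
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "tree": DecisionTreeClassifier(random_state=random_state),
        "forest": RandomForestClassifier(n_estimators=50, random_state=random_state),
    }
    X_known, y_known = X[~missing_mask], y[~missing_mask]

    # Score every candidate on the rows where the value is observed.
    scores = {
        name: cross_val_score(model, X_known, y_known, cv=cv).mean()
        for name, model in candidates.items()
    }

    # Keep the best-scoring model and use it to predict the missing entries.
    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X_known, y_known)

    y_filled = y.copy()
    y_filled[missing_mask] = best_model.predict(X[missing_mask])
    return y_filled, best_name, scores


# Synthetic demonstration: a binary column with roughly 20% of values hidden.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask = rng.random(200) < 0.2

y_filled, best_name, scores = bucket_impute(X, y, mask)
print("selected model:", best_name)
```

For a numeric column, the same selection loop would apply with the regressor counterparts of the three models and a regression score, which is how a sketch like this would cover both problem types.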
Experiments on diverse datasets show that our data imputation preprocessing method significantly improves predictive accuracy. Our improved feature selection technique also contributes to better predictive performance; however, the decision to apply feature selection should depend on the characteristics of the feature set. The study also compares the machine learning-based Bucket4Imp method with the statistics-based Mode method. The results show that Bucket4Imp performs significantly better than the statistical approach, particularly in multi-class scenarios. Finally, the findings highlight the difficulty of predicting multi-class features, particularly when the class distribution is even and the number of class labels is large.
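To make the Mode baseline concrete, the following sketch shows mode imputation and the accuracy it can be expected to reach. It assumes that missing values follow the same distribution as observed ones, in which case the baseline's accuracy equals the share of the most frequent class. The function names and toy data are hypothetical and do not come from the study.

```python
from collections import Counter


def mode_impute(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode_value, _ = Counter(observed).most_common(1)[0]
    return [mode_value if v is None else v for v in values]


def mode_baseline_accuracy(values):
    """Share of observed values equal to the mode.

    This is the accuracy a mode imputer would reach if the missing
    values were distributed like the observed ones (an assumption).
    """
    observed = [v for v in values if v is not None]
    _, count = Counter(observed).most_common(1)[0]
    return count / len(observed)


skewed = ["A"] * 8 + ["B"] * 2            # one dominant class
balanced = ["A", "B", "C", "D", "E"] * 2  # five evenly distributed classes

print(mode_impute(["A", None, "A", "B"]))  # ['A', 'A', 'A', 'B']
print(mode_baseline_accuracy(skewed))      # 0.8
print(mode_baseline_accuracy(balanced))    # 0.2
```

With k evenly distributed classes the baseline falls to 1/k, which illustrates why a purely statistical approach offers little in multi-class settings with even class distributions and many labels.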