        Utrecht University Student Theses Repository

        Improving Accuracy in Real Estate Valuation: A Study on Data Imputation and Prediction Models

        File
        Thesis_Yuxuan Hou_Improving Accuracy in Real Estate Valuation.pdf (1.829Mb)
        Publication date
        2024
        Author
        Hou, Yuxuan
        Summary
        In this study, we aim to help real estate evaluators avoid serious errors in the valuation process caused by carelessness or subjective bias. To this end, we use real Dutch real estate valuation data provided by KATE Innovations and build a prediction framework with three key components: data imputation, feature selection, and data prediction. Within this framework, we develop a data imputation method called “Bucket4Imp” and an improved version of the Fast Correlation-Based Filter (FCBF) as an optional feature selection preprocessing step. For data imputation and data prediction, we adopt a bucket-like ensemble model that incorporates three models: K-Nearest Neighbors, Decision Trees, and Random Forest. With these model and module configurations, the framework addresses classification and regression problems involving high-dimensional and missing data. Experiments on diverse datasets show that our data imputation preprocessing method significantly improves predictive accuracy. Our improved feature selection technique also improves predictive performance; however, the decision to employ feature selection should be based on the characteristics of the feature set. The study also compares the machine learning-based Bucket4Imp method with the statistics-based Mode method. The results show that Bucket4Imp performs significantly better than the statistical approach, particularly in multi-class scenarios. Moreover, the findings highlight the challenges of predicting multi-class features, particularly when class distributions are even and the number of class labels is large.
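
The summary does not spell out how Bucket4Imp works internally, so the following is only a minimal sketch of the general "bucket of models" idea it names, not the thesis's actual implementation. It assumes scikit-learn is available, that missing labels in the target column are marked with -1, and that the best of the three candidate models is picked by cross-validated accuracy on the observed rows. The function names `bucket_impute` and `mode_impute`, the synthetic data, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def bucket_impute(X, y, random_state=0):
    """Fill missing labels in y (marked -1) using the best of three models.

    Hypothetical helper: X is a complete feature matrix, y holds integer
    class labels. Returns the filled labels and the selected model's name.
    """
    observed = y != -1
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "tree": DecisionTreeClassifier(random_state=random_state),
        "forest": RandomForestClassifier(n_estimators=50,
                                         random_state=random_state),
    }
    # Score every candidate by 3-fold cross-validated accuracy on the
    # rows where the label is observed (assumed selection rule).
    scores = {
        name: cross_val_score(model, X[observed], y[observed], cv=3).mean()
        for name, model in candidates.items()
    }
    best = max(scores, key=scores.get)
    # Refit the winner on all observed rows and predict the missing ones.
    model = candidates[best].fit(X[observed], y[observed])
    filled = y.copy()
    if (~observed).any():
        filled[~observed] = model.predict(X[~observed])
    return filled, best


def mode_impute(y):
    """Statistical baseline: fill missing labels with the most frequent one."""
    observed = y != -1
    values, counts = np.unique(y[observed], return_counts=True)
    filled = y.copy()
    filled[~observed] = values[np.argmax(counts)]
    return filled


# Synthetic three-class example (made-up data, not the KATE Innovations set).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
truth = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)
missing = rng.random(300) < 0.2
y = truth.copy()
y[missing] = -1

filled, best = bucket_impute(X, y)
baseline = mode_impute(y)
bucket_acc = (filled[missing] == truth[missing]).mean()
mode_acc = (baseline[missing] == truth[missing]).mean()
print(f"selected model: {best}")
print(f"bucket accuracy: {bucket_acc:.2f}, mode accuracy: {mode_acc:.2f}")
```

Comparing `bucket_acc` with `mode_acc` on the held-out entries mirrors, in miniature, the comparison between a learned imputer and the Mode baseline that the summary describes. The numbers from this toy data say nothing about the results reported in the thesis.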
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/45760
        Collections
        • Theses