The estimation of model performance on unseen data

Essaijan, Alex

Thesis_MSc Applied Data Science - Confidence Based Performance Estimation - Alex Essaijan.pdf (516.6Kb)

Publication date

2023

Summary

Machine learning models are typically evaluated by training them on one dataset and measuring their performance on a separate test set. Assessing their performance in real-world environments is harder, especially when labeled data is scarce. This study focuses on estimating the performance of machine learning classifiers in financial audits, specifically on unseen accounting data. Using the Confidence-Based Performance Estimation (CBPE) methodology, performance metrics can be estimated accurately from the model's predicted labels and predicted probabilities. These estimates rely on three assumptions: there is no concept drift, the model is well calibrated, and it performs consistently across all classes. The findings have practical implications for auditors, offering insight into the feasibility and usability of integrating machine learning models into audit procedures and enabling informed decisions about adopting them. The research also contributes to the field by emphasizing the importance of accounting for discrepancies between classes and by promoting a data-driven approach that improves sampling beyond traditional random sampling. Future research could address multiclass calibration, class imbalance, threshold selection methods, and real-time monitoring of model performance, all of which would make machine learning models more robust and more applicable in production settings.
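
The idea of estimating performance from predicted labels and probabilities can be illustrated with a minimal sketch. It is not taken from the thesis. It assumes a binary classifier whose scores are calibrated probabilities of the positive class, and the function name `estimate_performance` and the default threshold of 0.5 are hypothetical choices made for this illustration. Each prediction contributes its probability of being correct to an expected confusion matrix, from which the usual metrics follow without any ground-truth labels.

```python
from typing import Dict, Sequence


def estimate_performance(probs: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Estimate binary-classification metrics without ground-truth labels.

    Illustrative sketch only. Assumes `probs` are well-calibrated
    probabilities of the positive class and that there is no concept drift.
    """
    if not probs:
        raise ValueError("probs must be non-empty")

    # Expected confusion-matrix cells, accumulated as fractional counts.
    tp = fp = fn = tn = 0.0
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        if p >= threshold:
            # Predicted positive: correct with probability p.
            tp += p
            fp += 1.0 - p
        else:
            # Predicted negative: correct with probability 1 - p.
            fn += p
            tn += 1.0 - p

    n = len(probs)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical calibrated scores for four unlabeled records.
    print(estimate_performance([0.9, 0.8, 0.2, 0.1]))
```

For the four hypothetical scores above, the estimated accuracy is about 0.85. The estimate is only as good as the calibration: if the scores are over- or under-confident, or if the data has drifted, the expected confusion matrix no longer reflects the true error rates.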

URI

https://studenttheses.uu.nl/handle/20.500.12932/44950

Collections

Theses