Estimating model performance on unseen data
Summary
In machine learning, models are typically evaluated by training them on a specific dataset and measuring their performance on a separate, labeled test set. Assessing performance in real-world environments is more challenging, however, especially when labeled data are scarce. This study focuses on estimating the performance of machine learning classifiers in financial audits, specifically on unseen accounting data. By employing the Confidence-Based Probability Estimation methodology, performance metrics can be estimated accurately from the model's predicted labels and predicted probabilities alone. These estimates hold under the assumptions that there is no concept drift, that the model is well calibrated, and that it performs consistently across all classes. The findings have practical implications for auditors, offering insight into the feasibility and usability of integrating machine learning models into audit procedures and enabling informed decisions about adopting such models. Furthermore, this research contributes to the field by emphasizing the importance of accounting for discrepancies between classes and by promoting a data-driven approach to sampling that goes beyond traditional random sampling. Future research should address challenges such as multiclass calibration, class imbalance, threshold selection methods, and real-time monitoring of model performance; these directions would strengthen the robustness and applicability of machine learning models in production settings.
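
The core mechanism behind such probability-based estimates can be illustrated with a small sketch. The snippet below is a hypothetical, minimal example rather than the implementation used in this study: it assumes a binary classifier with well-calibrated probabilities and no concept drift, treats each calibrated score as the probability that the corresponding record is truly positive, accumulates an expected confusion matrix from the scores alone, and derives expected accuracy, precision, and recall without any ground-truth labels. The function name, threshold, and example scores are illustrative.

import numpy as np

def estimate_binary_metrics(calibrated_probs, threshold=0.5):
    """Estimate accuracy, precision, and recall without ground-truth labels.

    calibrated_probs: calibrated P(y=1 | x) for unseen records.
    Assumes well-calibrated probabilities and no concept drift.
    """
    p = np.asarray(calibrated_probs, dtype=float)
    predicted_positive = p >= threshold

    # Expected confusion-matrix cells: each prediction contributes its
    # probability of being correct or incorrect to the relevant cell.
    tp = p[predicted_positive].sum()            # predicted 1, truly 1
    fp = (1.0 - p[predicted_positive]).sum()    # predicted 1, truly 0
    fn = p[~predicted_positive].sum()           # predicted 0, truly 1
    tn = (1.0 - p[~predicted_positive]).sum()   # predicted 0, truly 0

    accuracy = (tp + tn) / len(p)
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Example: scores from a calibrated classifier on unlabeled accounting records.
scores = np.array([0.97, 0.91, 0.62, 0.40, 0.15, 0.08, 0.85, 0.33])
print(estimate_binary_metrics(scores))

The threshold argument mirrors the threshold selection question raised above: recomputing the expected confusion matrix at different thresholds shows how the estimated metrics would shift before any labels become available.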