A Data-Driven Decision Model for Machine Learning Model Selection
Summary
Context: Machine learning models are readily accessible and extensively utilized due to their practical
utility in predictive modeling tasks. Despite the consistent performance of individual models, selecting
the appropriate model for a specific applied machine learning problem remains a significant challenge for
research modelers. Various features, such as model trainability and stakeholder comprehensibility, must
be considered when applying these models. These considerations can critically influence the long-term
viability of a machine learning model.
Method: To address this challenge, we present a meta-model for the decision-making process in the
context of machine learning model selection. The creation of this decision model adopts a systematic
research approach, combining systematic literature review, expert interviews, case studies, and design
science to investigate machine learning model selection approaches. The systematic literature review
enables us to gather and analyze relevant information from existing literature. The expert interviews
let us critically assess the collected data. The case studies help us assess the practical applicability
of our findings. Design science provides the framework for finalizing the decision model.
Results: Our study analyzed 43 common models across 72 common features. We provide a comprehensive
taxonomy of machine learning paradigms, approaches, and domains, along with insights
into potential model combinations, trends in model selection, evaluation measures, and frequently used
datasets for training and evaluating these models. The collected data was incorporated into a decision
model, further developed through expert interview feedback. Finally, the decision model was practically
evaluated through eight case studies.
Contribution: Our study presents a data-driven decision model that could aid research modelers in
machine learning model selection. We highlight the importance of further developing the decision model
to improve its accuracy and scope beyond its current state.
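As a rough illustration of how a feature-based decision model of this kind might rank candidates, the sketch below filters models by required features and scores the rest by how many preferred features they cover. The model names, feature names, and scoring rule are hypothetical and are not taken from the study's data or method.

```python
# Hypothetical model-by-feature mapping. The study covers 43 models and
# 72 features; these three entries are illustrative placeholders only.
MODEL_FEATURES = {
    "decision_tree": {"interpretable", "handles_missing_values", "fast_training"},
    "random_forest": {"handles_missing_values", "robust_to_outliers"},
    "neural_network": {"handles_high_dimensional_data", "online_learning"},
}


def rank_models(required, preferred, model_features=MODEL_FEATURES):
    """Return (model, score) pairs for models that support every required feature.

    The score is the share of preferred features a model covers. This is an
    assumed scoring rule chosen for simplicity, not the one used in the study.
    """
    ranked = []
    for name, features in model_features.items():
        # Discard any model that lacks a required feature.
        if not required <= features:
            continue
        score = len(preferred & features) / len(preferred) if preferred else 1.0
        ranked.append((name, score))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


result = rank_models(
    required={"handles_missing_values"},
    preferred={"interpretable", "robust_to_outliers"},
)
```

Here `neural_network` is excluded because it lacks the required feature, and the two remaining models tie because each covers one of the two preferred features.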