Identifying Caring Communities Within Dutch Chamber of Commerce Data: A Classifier Comparison
Summary
This paper aimed to determine the most effective classifier for identifying registered 'caring
communities' using data from the Dutch Chamber of Commerce. I optimized and assessed the
performance of four classifiers: Logistic Regression (LR), Support Vector Machine (SVM),
Random Forest (RF), and Gradient Boosting Decision Tree (GBDT). The results show that LR
consistently outperformed the other models on both the 2022 and 2023 test sets, with the
strongest results on all evaluation metrics. GBDT was competitive, while SVM and RF were
less effective. Despite LR's strengths, its recall and the quality of the underlying data must
improve before caring communities can be identified reliably. Without these improvements,
the classifier is likely to miss some caring communities and so underestimate their total
number, giving an incomplete picture of their prevalence.
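
The link between recall and undercounting can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not results from this paper, and the helper names (confusion_counts, recall, precision) are introduced here for illustration only. The sketch shows that when recall is below 1, the number of registrations flagged as caring communities can fall short of the true number even when precision is high.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def recall(tp, fn):
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of predicted positives that are actual positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: 20 registrations, of which 10 are caring communities (label 1).
y_true = [1] * 10 + [0] * 10
# Hypothetical predictions: 6 of the 10 caring communities are found, 1 false alarm.
y_pred = [1] * 6 + [0] * 4 + [1] * 1 + [0] * 9

tp, fp, fn = confusion_counts(y_true, y_pred)
actual_total = tp + fn      # caring communities that really exist in the toy data
predicted_total = tp + fp   # caring communities the classifier reports

print(f"recall={recall(tp, fn):.2f}, precision={precision(tp, fp):.2f}")
print(f"actual={actual_total}, predicted={predicted_total}, "
      f"missed={actual_total - predicted_total}")
```

In this toy case the classifier reports fewer caring communities than actually exist because the missed cases (false negatives) outnumber the false alarms (false positives), which is the mechanism behind the underestimate described above.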