Comparing Random Forest, Logistic Regression, and Heterogeneous Graph Neural Networks: Classifying Money Laundering in High-Liquidity Sectors
Summary
Money laundering is the process by which organizations or individuals disguise the criminal origins of assets so that they appear legitimate. Modern money laundering schemes tend to form sophisticated criminal networks in which various entities and individuals play different roles, which makes detection and prevention with traditional methods, such as rule-based approaches, more challenging. This study applies and compares two machine learning methods (Random Forest and Logistic Regression) and one deep learning method (a Heterogeneous Graph Neural Network) to classify companies suspected of money laundering in high-liquidity sectors. The results indicate that the Heterogeneous Graph Neural Network outperforms the other models, achieving higher recall and AUC-ROC. By comparing network metrics against the confusion matrix, the study identifies common characteristics of suspicious companies. Companies that connect with many other firms, act as key intermediaries in the network, form a distinct community, and maintain close connections with one another are potentially engaged in illegal activity. These results provide a foundation for building robust anti-money laundering systems in the future. However, further research should focus on data imbalance and on gray data (cases of unconfirmed money laundering) to improve the accuracy of the algorithms.
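To make the two evaluation metrics named above concrete, the following is a minimal sketch of how recall and AUC-ROC can be computed from model outputs. It is an illustration only, not the study's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
def recall(y_true, y_pred):
    """Share of truly suspicious companies (label 1) that the model flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 suspicious companies among 10 (imbalanced classes).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.25, 0.35, 0.40, 0.90]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print("accuracy:", accuracy)
print("recall:", recall(y_true, y_pred))
print("AUC-ROC:", auc_roc(y_true, scores))
```

On this toy data the accuracy looks reasonable while half of the suspicious companies are missed, which illustrates why recall and a threshold-independent measure such as AUC-ROC are informative when the positive class is rare.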