Clustering soccer players: investigating unsupervised learning on player positions
Summary
In this study, we investigate two unsupervised clustering methods: K-means and Expectation Maximization (EM). We train both methods on match data from the Spanish league La Liga, covering matches from 2004 to 2019. We label the clusters produced by both methods with soccer player positions and use Principal Component Analysis (PCA) to visualize the correlation between player positions. In these visualizations, we use 4 and 11 clusters, which correspond to player positions on the field. To evaluate K-means and EM, we use purity and the silhouette score. The results show that K-means clusters the data better than EM. Using the feature selection methods Laplacian score and correlation mean, we increase the performance of K-means by 37%. A cluster count of 8 gives the best separability, which suggests that there are 8 different types of soccer players on the field during a match.
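As a small illustration of the purity measure mentioned above, the sketch below uses the common textbook definition: each cluster is assigned its most frequent true label, and purity is the fraction of samples that match their cluster's majority label. This summary does not spell out the exact formula used in the study, so the definition is an assumption. The function name `purity`, the toy cluster assignments, and the position codes are hypothetical and are not taken from the La Liga data.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of samples that carry the majority label of their cluster.

    Assumes the standard definition of purity; the study may use a variant.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Sum the size of the majority label over all clusters.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 players, 3 clusters, 4 position groups.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
positions = ["GK", "GK", "DF", "DF", "DF", "MF", "FW", "FW"]
score = purity(clusters, positions)
print(f"purity = {score:.2f}")
```

Purity can only stay the same or rise as clusters are split further, and it reaches 1.0 when every sample is its own cluster. A label-free measure such as the silhouette score is therefore a useful complement when comparing different cluster counts.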