Binary Classification on a Highly Imbalanced Dataset
Summary
Credit card fraud is a growing form of crime. Data-driven detection of fraudulent transactions can be viewed as a binary classification problem in which the two outcome classes are highly imbalanced.
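The following is a minimal, hypothetical illustration of why such imbalance is problematic. It is not taken from this work's dataset. The labels are synthetic, and only 2 of 1,000 transactions (0.2%) are fraudulent. A trivial classifier that always predicts "legitimate" reaches near-perfect accuracy while detecting no fraud at all. The helper names `accuracy` and `recall` are defined here for illustration only.

```python
# Hypothetical synthetic labels: 2 fraud cases among 1,000 transactions (0.2%).
# These counts are invented for illustration, not drawn from any real dataset.
labels = [1] * 2 + [0] * 998  # 1 = fraud, 0 = legitimate

# A trivial "classifier" that always predicts the majority class (legitimate).
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    # Fraction of all transactions that were classified correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of actual positives (fraud cases) that were detected.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0


print(f"accuracy = {accuracy(labels, predictions):.3f}")  # accuracy = 0.998
print(f"recall   = {recall(labels, predictions):.3f}")    # recall   = 0.000
```

Accuracy alone therefore says little about fraud detection on imbalanced data. Class-sensitive measures such as recall on the minority class expose the failure.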
To overcome the difficulties that arise from this imbalance, multiple solutions are described and explored. Furthermore, a novel method based on subgroup discovery is introduced, accompanied by statistical arguments. Finally, all methods are empirically tested on a real-world credit card transaction dataset.