Fairness in Machine Learning: Ensuring Fairness in Datasets for Classification Problems
Summary
Machine learning (ML) algorithms are used in a wide range of applications that affect society, directly or indirectly, in daily life. They are preferred for many tasks that require complex computations over massive volumes of data because they perform better than humans on such tasks. Moreover, people have subjective opinions and points of view, which can bias their decisions. Unfortunately, ML algorithms are not always objective either. Using them in decision-making systems and other services may cause serious discrimination against some groups of people in society. One of the most significant reasons behind biased predictions for different demographic groups is the imbalanced representation of demographic subgroups in the population. This master's thesis proposes COSCFair (Clustering and OverSampling for Fair Classification), a framework to ensure fairness among all the subgroups that exist in a dataset without changing the original class labels of the samples. COSCFair consists of clustering, oversampling, and classification components, where the classification component takes the outcomes of the clustering algorithm into account and predicts class labels with an ensemble technique. Experimental results on several datasets that are widely used as benchmarks for evaluating algorithmic fairness show that the COSCFair framework yields consistent improvements in fairness while causing a minimal loss in predictive performance compared to a set of baseline methods.
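To make the idea of balancing subgroup representation without altering class labels concrete, the following is a minimal sketch, not the COSCFair implementation. It assumes a subgroup is defined by a (protected group, class label) pair and uses plain random duplication as the oversampling step. The summary does not name the oversampling algorithm, and the clustering and ensemble classification components are omitted here. The function name `oversample_subgroups` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_subgroups(samples, seed=0):
    """Randomly duplicate samples so every (group, label) subgroup is equally large.

    `samples` is a list of (features, group, label) tuples. Original samples are
    kept and duplicates are copied verbatim, so no class label is ever changed.
    Assumption: random duplication stands in for the oversampling step.
    """
    rng = random.Random(seed)

    # Bucket samples by subgroup, i.e. by (protected group, class label).
    buckets = defaultdict(list)
    for sample in samples:
        _, group, label = sample
        buckets[(group, label)].append(sample)

    # Every subgroup is grown to the size of the largest one.
    target = max(len(members) for members in buckets.values())

    balanced = []
    for key in sorted(buckets):
        members = buckets[key]
        balanced.extend(members)
        # Draw the missing samples with replacement from the same subgroup.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical toy dataset: group "B" with label 1 is heavily under-represented.
toy = (
    [((i,), "A", 1) for i in range(6)]
    + [((i,), "A", 0) for i in range(4)]
    + [((i,), "B", 1) for i in range(1)]
    + [((i,), "B", 0) for i in range(3)]
)

balanced = oversample_subgroups(toy)
counts = Counter((group, label) for _, group, label in balanced)
print(counts)
```

After the call, all four subgroups in the toy data contain the same number of samples, and every sample in the output also occurs, with its original label, in the input.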