Mitigating racial bias across the AI life cycle in precision medicine using data science techniques
Summary
Precision medicine aims to tailor healthcare by incorporating data on a patient's genomics, medical history, and lifestyle to develop personalized disease risk assessments and treatments. AI has become increasingly important within precision medicine because of its ability to process large datasets and identify patterns within them. These AI models are usually trained on Electronic Health Records (EHRs) and omics data. However, the data banks and databases from which these data are sourced are often not racially diverse, and most of the data usually come from individuals of European ancestry. In addition, the data often contain unfair correlations between race and outcomes. Using these datasets as training data can lead to racial bias in the AI algorithm, which influences its decision making and thereby further exacerbates health disparities among marginalized racial groups. Over the past decades, racial bias in AI for healthcare and precision medicine has received increasing attention, and various identification and mitigation methods have been proposed to combat it. This review covers the steps of the AI life cycle (data collection, data processing, model development, model evaluation, and model implementation) and discusses how racial bias can arise at each step, as well as the metrics and strategies that have been proposed to identify and mitigate it. It focuses specifically on data science mitigation strategies, to aid developers in creating equitable and fair AI-driven precision medicine tools.