Improving Data Quality: A Review on Data-Centric AI and AI-Actionable Data

Garcia Mondejar, Marta

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Melquiond, A.S.J.
dc.contributor.author	Garcia Mondejar, Marta
dc.date.accessioned	2025-08-29T00:01:49Z
dc.date.available	2025-08-29T00:01:49Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/50108
dc.description.abstract	Artificial Intelligence (AI) has shown remarkable potential in recent years, particularly in fields such as cancer diagnostics. AI systems are increasingly used for tasks such as tumor identification, cancer type classification, and prediction of patient outcomes. Despite this promise, they often face limitations when applied in real-world clinical settings. Performance frequently suffers from poor generalization to new environments, low-quality training data, and the underrepresentation of diverse patient groups and cancer types. This literature review explores a new paradigm known as Data-Centric AI (DCAI), which shifts the focus from optimizing model architectures to improving the quality of training data. After outlining the current challenges in cancer detection AI (e.g., data bias, label inconsistency, limited institutional collaboration), we explore three key areas where DCAI techniques are being applied: (1) representation and diversity, (2) label quality and data preprocessing, and (3) accessibility, generalizability, and collaboration. We analyze recent studies that apply DCAI techniques, such as synthetic data generation, semi-supervised labeling, and federated learning, to address challenges in these areas. The review concludes by highlighting the crucial role of data quality in building robust AI models that generalize well across multiple clinical settings and in realizing the full potential of AI in oncology.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	This thesis reviews Data-Centric AI (DCAI) approaches to improve cancer diagnostics. It highlights how data quality, not just model design, is key to robust AI. Challenges like data bias, label issues, and poor generalizability are addressed through methods such as synthetic data, semi-supervised labeling, and federated learning. DCAI offers a path to more reliable AI in clinical oncology.
dc.title	Improving Data Quality: A Review on Data-Centric AI and AI-Actionable Data
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.courseuu	Bioinformatics and Biocomplexity
dc.thesis.id	53071

Files in this item

Name: LiteratureReview_MartaGarciaMo ...
Size: 403.1Kb
Format: PDF

This item appears in the following Collection(s)

Theses
