Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Melquiond, A.S.J.
dc.contributor.author: Garcia Mondejar, Marta
dc.date.accessioned: 2025-08-29T00:01:49Z
dc.date.available: 2025-08-29T00:01:49Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/50108
dc.description.abstract: Artificial Intelligence (AI) has shown remarkable potential in recent years, particularly in fields such as cancer diagnostics. AI systems are increasingly used for tasks such as tumor identification, cancer type classification, and prediction of patient outcomes. Despite this potential, they often face limitations when applied in real-world clinical settings: performance suffers from poor generalization to new environments, low-quality training data, and the underrepresentation of diverse patient groups and cancer types. This literature review explores a new paradigm known as Data-Centric AI (DCAI), which shifts the focus from optimizing model architectures to improving the quality of training data. After outlining the current challenges in cancer detection AI (e.g., data bias, label inconsistency, limited institutional collaboration), we explore three key areas where DCAI techniques are being applied: (1) representation and diversity, (2) label quality and data preprocessing, and (3) accessibility, generalizability, and collaboration. We analyze recent studies that apply DCAI techniques, such as synthetic data generation, semi-supervised labeling, and federated learning, to address challenges in these areas. The review concludes by highlighting the crucial role of data quality in building robust AI models that generalize well across multiple clinical settings and in realizing the full potential of AI in oncology.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis reviews Data-Centric AI (DCAI) approaches to improving cancer diagnostics. It highlights how data quality, not just model design, is key to robust AI. Challenges such as data bias, label issues, and poor generalizability are addressed through methods such as synthetic data, semi-supervised labeling, and federated learning. DCAI offers a path to more reliable AI in clinical oncology.
dc.title: Improving Data Quality: A Review on Data-Centric AI and AI-Actionable Data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Bioinformatics and Biocomplexity
dc.thesis.id: 53071

