        Utrecht University Student Theses Repository


        Improving Data Quality: A Review on Data-Centric AI and AI-Actionable Data

        LiteratureReview_MartaGarciaMondejar.pdf (403.1Kb)
        Publication date
        2025
        Author
        Garcia Mondejar, Marta
        Summary
        Artificial Intelligence (AI) has shown remarkable potential in recent years, particularly in fields such as cancer diagnostics, where AI systems are increasingly used for tasks such as tumor identification, cancer type classification, and prediction of patient outcomes. Despite this potential, these systems often face limitations in real-world clinical settings: performance suffers from poor generalization to new environments, low-quality training data, and the underrepresentation of diverse patient groups and cancer types. This literature review explores a new paradigm known as Data-Centric AI (DCAI), which shifts the focus from optimizing model architectures to improving the quality of training data. After outlining the current challenges in AI for cancer detection (e.g., data bias, label inconsistency, limited institutional collaboration), we examine three key areas where DCAI techniques are being applied: (1) representation and diversity, (2) label quality and data preprocessing, and (3) accessibility, generalizability, and collaboration. We analyze recent studies that apply DCAI techniques such as synthetic data generation, semi-supervised labeling, and federated learning to address challenges in these areas. The review concludes by highlighting the crucial role of data quality in building robust AI models that generalize well across multiple clinical settings and in realizing the full potential of AI in oncology.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/50108
        Collections
        • Theses