        •   Utrecht University Student Theses Repository Home
        Hidden uncertainty in data analysis: Understanding sources of variability in many-analyst projects

        thesis_2394421_Knol.pdf (632.4Kb)
        Publication date
        2025
        Author
        Knol, Sara
        Summary
        This study examines (1) how analytical decisions contribute to variability in many-analyst studies and (2) whether specific decisions can be identified as key drivers. Several models of varying complexity were trained and validated on a synthetic multiverse dataset and then tested for generalization on the many-analyst dataset from Breznau et al. (2022). While non-linear models performed well on the multiverse dataset (XGBoost R² = 0.96), none generalized to the many-analyst dataset (R² ≈ 0.0), possibly because of noise or because key decisions were absent from the synthetic data. SHAP values and feature importance indicated that choices about variables, especially the type of independent variables, were the most impactful. Although the current models failed to explain variance in many-analyst settings, the findings suggest that efforts to explain variability in many-analyst projects should employ complex models that capture non-linear relationships and should emphasize the choice of variables.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49822
        Collections
        • Theses