        •   Utrecht University Student Theses Repository Home
        Mapping Research Software Landscapes through Exploratory Studies of GitHub Data

        Master_thesis.pdf (2.279Mb)
        Publication date
        2022
        Author
        Quach, Keven
        Summary
        Research software enables data processing and plays a vital role in academia and industry. As such, it is essential that research software be findable, accessible, interoperable, and reusable (FAIR). However, what the research software landscape looks like is unknown. We therefore aim to understand this landscape better and to use that understanding to infer actionable recommendations for Research Software Engineer (RSE) practice. This study provides insights into the research software landscape at Utrecht University through an exploratory analysis that also considers the different scientific domains. We achieve this by collecting GitHub data and analyzing repository FAIRness and characteristics through heatmaps, histograms, statistical tables, and tests. Our method retrieved 176 users with 1521 repositories, of which 823 are considered research software. Others can adopt the proposed method to gain insights into their own organization, as it is designed to be reproducible and reusable. The analysis showed significant differences between faculties, both in repository characteristics and in how the application of FAIR variables can be supported. Among other things, our results showed that Geosciences has the highest percentage of unlicensed repositories, at 57%. Social Sciences is an outlier in language usage: it is the only faculty that primarily uses R, while the other faculties primarily use Python. We developed a first classification model that identifies research software with 70% accuracy and can be used for future labelling tasks. Our recommendations include expanding the R café, creating FAIR reference documents, featuring and highlighting high-impact and FAIR research software, and creating yearly reports. We conclude that our labelled GitHub dataset allows us to infer actionable recommendations on RSE practice.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/43162
        Collections
        • Theses