dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Lamprecht, Anna-Lena
dc.contributor.author: Quach, Keven
dc.date.accessioned: 2022-11-08T00:00:40Z
dc.date.available: 2022-11-08T00:00:40Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43162
dc.description.abstract: Research software enables data processing and plays a vital role in academia and industry. As such, it is essential to have findable, accessible, interoperable, and reusable (FAIR) research software. However, what precisely the landscape of research software looks like is unknown. Thus, we would like to understand the research software landscape better and utilize this information to infer actionable recommendations for the Research Software Engineer (RSE) practice. This study provides insights into the research software landscape at Utrecht University through an exploratory analysis while also considering the different scientific domains. We achieve this by collecting GitHub data and analyzing repository FAIRness and characteristics through heatmaps, histograms, statistical tables, and tests. Our method retrieved 176 users with 1521 repositories, of which 823 are considered research software. Others can adopt the proposed method to gain insights into their specific organization, as it is designed to be reproducible and reusable. The analysis showed significant differences between faculty characteristics and how to support the application of FAIR variables. Among other things, our results showed that Geosciences have the highest percentage of unlicensed repositories with 57%. Also, Social Sciences are an outlier in language usage, as they are the only faculty to primarily use R, while other faculties primarily use Python. A first classification model is developed that achieves 70% accuracy in identifying research software that can be used for future labelling tasks. Our recommendations include expanding the R café, creating FAIR reference documents, featuring and highlighting high impact and FAIR research software, and creating yearly reports. We conclude that our labelled GitHub dataset allows us to infer actionable recommendations on RSE practice.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Exploratory data analysis of GitHub data in the context of FAIRness and research domains
dc.title: Mapping Research Software Landscapes through Exploratory Studies of GitHub Data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: exploratory data analysis; FAIR; research software; GitHub
dc.subject.courseuu: Business Informatics
dc.thesis.id: 11858