Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Reijers, Hajo
dc.contributor.author: Boer, Joris de
dc.date.accessioned: 2025-03-25T00:01:24Z
dc.date.available: 2025-03-25T00:01:24Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48661
dc.description.abstract: This research investigates how organizations can improve their business value when handling unstructured data. While they often manage their product in a structured form, unstructured forms are disregarded. Unstructured data is an untapped resource that can make up 80% of an organization's data. It contains a great deal of knowledge and is at risk of being forgotten or ignored, forcing organizations to investigate and document the same matters again and again. First, a structured literature review was performed to understand the background. The literature provided insights into organizations and their lack of a standardized way of ensuring the quality of their unstructured data. When the focus is put on interpretability, relevancy, and accuracy in an iterative manner, organizations are bound to improve that quality. In the context of this research, it is recommended to employ data curation teams that ensure the quality of metadata in order to improve the accessibility, sharing, and management of data. For managing data, the literature suggests applying domain-specific methods to provide structure. To determine the impact of processing techniques on data quality, differently processed datasets were classified and the resulting quality metrics were compared. Three methods were investigated: two with a different order of processing techniques (the methods of Barbantan and Lim) and one with a different set of steps altogether (the method of Sanchez-Segura). The results of the comparison show an overall lack of significant differences, indicating that the implemented processing techniques are not the sole reason for differences in quality metrics. Slight improvements in accuracy and precision were observed with the RF and SVM classifiers in the similarly structured methods, but large variations were found in recall and F1-scores for the NB and DT classifiers. Further research is necessary to gain a full understanding of the potential impact of different processing techniques.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The research discusses unstructured data and its presence within organizations. It describes the characteristics of this data and how it is handled with domain-specific methods. Different methods that process unstructured data in a specific way were investigated. Their processing steps were applied to different datasets, from which quality metrics were gathered. These metrics were then compared to test the effect of the processing steps on unstructured data.
dc.title: Gaining Business Value from Unstructured Data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Business Value, Data Quality, Information Retrieval, Knowledge Retention, Metadata, Method Comparison, Natural Language Processing Techniques, Text Mining Techniques, Unstructured Data
dc.subject.courseuu: Business Informatics
dc.thesis.id: 44512

