
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Velegrakis, Ioannis
dc.contributor.author: Mahhov, Peter
dc.date.accessioned: 2025-02-06T00:01:35Z
dc.date.available: 2025-02-06T00:01:35Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48465
dc.description.abstract: We have created Row Merge, a new way to save storage space in large datasets while minimizing the loss of information. In recent years, the amount of data collected in all aspects of life has grown beyond our ability not only to process it, but even to store it. Existing data reduction methods mainly revolve around either calculating summary statistics or deleting less important components, both of which permanently erase attribute relationships. Our approach merges similar rows by replacing differing cells with null values, which preserves, as uncertain, information that would have been lost had the tuples been deleted. A database reduced in this manner will never give false negative answers to queries compared to the original. However, Row Merge does introduce false positive query answers. In this work, we have developed the theory behind the informational value of incomplete databases and designed several data reduction algorithms using this principle. We have created quantitative metrics to evaluate the amount of information in an incomplete table without needing the original, and evaluated the quality and performance of several novel approaches on both real and synthetic datasets. We have identified the ideal use cases for each new algorithm and shown that Row Merge surpasses deletion in preserving information after data reduction in all the real-world datasets we tested.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This work introduces Row Merge, a novel data reduction method designed to save storage space in structured datasets while minimizing information loss. By merging similar rows and replacing differing cells with null values, it preserves uncertain information that would be lost through deletion. The research establishes quantitative metrics to evaluate information retention in incomplete databases, creates new reduction algorithms, and tests their effectiveness on real and synthetic datasets.
dc.title: Row Merge: Data reduction through expansion into possible worlds for data sustainability
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Incomplete databases; Data reduction; Information preservation; Algorithm design
dc.subject.courseuu: Computing Science
dc.thesis.id: 42739
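
The merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `merge_rows` and `possibly_matches`, the sample rows, and the choice to treat a null cell as "may match any value" in an equality query are all assumptions made for illustration. The record does not specify the actual algorithms or query semantics.

```python
from typing import Optional, Tuple

Row = Tuple[Optional[str], ...]


def merge_rows(a: Row, b: Row) -> Row:
    """Hypothetical merge: keep cells that agree, replace differing cells with None."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    return tuple(x if x == y else None for x, y in zip(a, b))


def possibly_matches(row: Row, query: Row) -> bool:
    """Assumed query semantics: None in the query means 'no condition on this column';
    None in the row means 'unknown', so it may match any requested value."""
    return all(q is None or cell is None or cell == q for cell, q in zip(row, query))


# Made-up sample data: two similar rows that differ only in the first column.
original = [("Alice", "Utrecht", "NL"), ("Bob", "Utrecht", "NL")]
merged = merge_rows(original[0], original[1])

# No false negative: a query that matched an original row still matches the merged row.
alice_query = ("Alice", None, None)
no_false_negative = possibly_matches(merged, alice_query)

# False positive: a query that matched no original row can match the merged row.
carol_query = ("Carol", None, None)
false_positive = possibly_matches(merged, carol_query) and not any(
    possibly_matches(r, carol_query) for r in original
)
```

Under these assumptions the two rows collapse into one row with a null in the first column. Any name query now possibly matches it, which is where the false positives come from, while a query on a retained column with a different value still fails.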

