dc.rights.license CC-BY-NC-ND
dc.contributor.advisor Rico Cuevas, Ramon
dc.contributor.author Veips, Filips
dc.date.accessioned 2024-09-16T23:02:06Z
dc.date.available 2024-09-16T23:02:06Z
dc.date.issued 2024
dc.identifier.uri https://studenttheses.uu.nl/handle/20.500.12932/47779
dc.description.abstract Entity matching is an essential area of study for working with data. Matching entities to one another effectively can significantly increase the value obtained from the data. The main problems in entity matching come from two sources: flaws in the data and the effectiveness of the matching itself. This research addresses these problems in order to arrive at an effective entity resolution algorithm capable of dealing with real-world data. We constructed four entity matching models: the probabilistic model SPLINK, and the machine learning models logistic regression, support vector machines and a BERT-based transformer. All models were applied to the same data, which was preprocessed accordingly. The SPLINK model showed the best result and can be used in similar tasks in the future. However, the performance of the other models is also promising, and their use can be viable.
dc.description.sponsorship Utrecht University
dc.language.iso EN
dc.subject How can advancements in the data science space such as vectorization, graph traversals and probabilistic record linkage, enhance entity resolution in multi-source data environments?
dc.title How can advancements in the data science space such as vectorization, graph traversals and probabilistic record linkage, enhance entity resolution in multi-source data environments?
dc.type.content Master Thesis
dc.rights.accessrights Open Access
dc.subject.keywords Entity resolution; entity matching; record linkage; machine learning
dc.subject.courseuu Applied Data Science
dc.thesis.id 39402