dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Rico Cuevas, Ramon | |
dc.contributor.author | Veips, Filips | |
dc.date.accessioned | 2024-09-16T23:02:06Z | |
dc.date.available | 2024-09-16T23:02:06Z | |
dc.date.issued | 2024 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/47779 | |
dc.description.abstract | Entity matching is an essential field of study for working with data. Effectively matching entities to one another can significantly increase the value obtained from the data. The main problems in entity matching come from two sources: flaws in the data and the effectiveness of the matching. This research works through these problems in order to propose an effective entity resolution algorithm capable of dealing with real-world data. We constructed four entity matching models: the probabilistic model SPLINK and the machine learning models logistic regression, support vector machines, and a BERT-based transformer. All models were applied to the same data, which was preprocessed accordingly. The SPLINK model showed the best result and can be used in similar tasks in the future. However, the other models also performed promisingly, and their use can be viable. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | How can advancements in the data science space such as vectorization, graph traversals and probabilistic record linkage, enhance entity resolution in multi-source data environments? | |
dc.title | How can advancements in the data science space such as vectorization, graph traversals and probabilistic record linkage, enhance entity resolution in multi-source data environments? | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | Entity resolution; entity matching; record linkage; machine learning | |
dc.subject.courseuu | Applied Data Science | |
dc.thesis.id | 39402 | |