Scalable and Reuse-Oriented Data Integration: A Distributed Semi-Automatic Approach

Dabroek, M.G.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Werf, J.M.E.M. van der
dc.contributor.author	Dabroek, M.G.
dc.date.accessioned	2016-11-17T18:00:36Z
dc.date.available	2016-11-17T18:00:36Z
dc.date.issued	2016
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/24806
dc.description.abstract	In the current information age, it is crucial for an organisation to integrate all of its available source systems to provide deep insights, adhere to regulations, or gain a competitive edge. However, data integration often proves to be a tedious and costly process. In this study we aim to answer the question: can we formulate and construct a semi-automatic distributed system to enable scalable and reuse-oriented data integration? To aid the process of data integration we propose a system that takes advantage of previously provided associations between source schemas. To provide a common language between disparate sources, we introduce an ontology into our proposed solution. We attempt to semi-automatically match attributes from the schemas of source systems to entities of the ontology by utilising a self-learning aspect and a feedback loop. We attempt to achieve this by applying a dual approach, using both semantic and structural aspects of the source and the ontology. We constructed a proof-of-concept and performed a user acceptance study to evaluate our approach and validate our solution. Our contribution is two-fold: we distribute our data integration system, thereby contributing to the scalability of the system, and we reuse previously obtained results to enable semi-automatic matching. We performed both quantitative and qualitative analyses to evaluate the accuracy and feasibility of our system. The outcome suggests that our solution has merit and shows that end-users have positive expectations towards its use and performance.
dc.description.sponsorship	Utrecht University
dc.format.extent	3600810
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Scalable and Reuse-Oriented Data Integration: A Distributed Semi-Automatic Approach
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	data integration; semantic integration; scalable systems; reuse; reference architecture; schema matching; ontology; Elasticsearch; UTAUT
dc.subject.courseuu	Business Informatics

Files in this item

Name: 20161115-mgdabroek-thesis+paper.pdf
Size: 3.434Mb
Format: PDF

This item appears in the following Collection(s)

Theses
