dc.description.abstract | In the current information age, it is crucial for an organisation to integrate all of its available source systems to gain deep insights, adhere to regulations, or obtain a competitive edge. However, data integration often proves to be a tedious and costly process. In this study, we aim to answer the question: can we formulate and construct a semi-automatic distributed system to enable scalable and reuse-oriented data integration?
To aid data integration, we propose a system that takes advantage of previously provided associations between source schemas. To provide a common language between disparate sources, we introduce an ontology into our proposed solution. We attempt to semi-automatically match attributes from the schemas of source systems to entities of the ontology by means of a self-learning component and a feedback loop. To this end, we apply a dual approach that uses both semantic and structural aspects of the source and the ontology.
We constructed a proof-of-concept and performed a user acceptance study to evaluate our approach and validate our solution. Our contribution is two-fold: we distribute our data integration system, thereby contributing to the scalability of the system, and we reuse previously obtained results to enable semi-automatic matching.
We performed both quantitative and qualitative analyses to evaluate the accuracy and feasibility of our system. The outcome suggests that our solution has merit and shows that end-users have positive expectations of its use and performance. | |