View Item 
        •   Utrecht University Student Theses Repository Home
        Scalable and Reuse-Oriented Data Integration: A Distributed Semi-Automatic Approach

        20161115-mgdabroek-thesis+paper.pdf (3.434Mb)
        Publication date
        2016
        Author
        Dabroek, M.G.
        Summary
        In the current information age, it is crucial for an organisation to integrate all of its available source systems to provide deep insights, adhere to regulations, or gain a competitive edge. However, data integration often proves to be a tedious and costly process. In this study, we aim to answer the question: can we formulate and construct a semi-automatic distributed system to enable scalable and reuse-oriented data integration? To aid the process of data integration, we propose a system that takes advantage of previously provided associations between source schemas. To provide a common language between disparate sources, we introduce an ontology into our proposed solution. We attempt to semi-automatically match attributes from the schemas of source systems to entities of the ontology by utilising a self-learning aspect and a feedback loop. We do so by applying a dual approach that uses both semantic and structural aspects of the source and the ontology. We constructed a proof-of-concept and performed a user acceptance study to evaluate our approach and validate our solution. Our contribution is two-fold: we distribute our data integration system, thereby contributing to its scalability, and we reuse previously obtained results to enable semi-automatic matching. We performed both quantitative and qualitative analyses to evaluate the accuracy and feasibility of our system. The outcome suggests that our solution has merit and shows that end-users have positive expectations of its use and performance.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/24806
        Collections
        • Theses