Applying Semantic Integration to Improve Data Quality
Summary
This research project was initiated to compose the mobility carbon footprint for the Netherlands. This footprint describes the environmental impact of all travel movements in the Netherlands, expressed as emissions of carbon dioxide (CO2). No single available data source provides insight into both the number of movements and the locations where these movements are made, so this information has to be gathered from multiple data sources. The purpose of this research is to improve the data quality of the two relevant datasets, Mezuro and OViN, by applying semantic integration techniques. The resulting integrated dataset can provide information that neither data source can provide on its own.
The proposed research method consists of four steps. First, the data quality of each of the two individual datasets is assessed. Second, semantic matches between the datasets are identified using a shared ontology. Third, these semantic matches are used to integrate the individual datasets into a single dataset with a higher data quality than the originating datasets. Lastly, the data quality of the integrated dataset is assessed to determine whether data quality has actually increased.
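The four steps of the method can be illustrated with a minimal Python sketch. It is not the implementation used in this research: the source names, field names, ontology concepts, and sample records are all hypothetical, and completeness (the share of non-missing values over the shared concepts) stands in for the broader data quality assessment. Each source field is mapped to a concept in a shared ontology; fields mapped to the same concept form a semantic match, and records that share the same "Area" value are merged.

```python
# Hypothetical shared ontology: each source field maps to a shared concept.
ONTOLOGY = {
    "source_a": {"area_code": "Area", "trips": "TripCount"},
    "source_b": {"region": "Area", "avg_distance_km": "TripDistance", "mode": "TravelMode"},
}
CONCEPTS = ["Area", "TripCount", "TripDistance", "TravelMode"]


def semantic_matches(ontology, a, b):
    """Return (field_in_a, field_in_b, concept) for fields mapped to the same concept."""
    return [
        (fa, fb, ca)
        for fa, ca in ontology[a].items()
        for fb, cb in ontology[b].items()
        if ca == cb
    ]


def to_concepts(record, mapping):
    """Rename a record's fields to shared concepts, dropping unmapped fields."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def completeness(records):
    """Share of (record, concept) cells that hold a non-None value."""
    cells = [r.get(c) for r in records for c in CONCEPTS]
    return sum(v is not None for v in cells) / len(cells) if cells else 0.0


def integrate(recs_a, recs_b, key="Area"):
    """Merge records sharing the same key concept; the first non-None value wins."""
    merged = {}
    for rec in recs_a + recs_b:
        target = merged.setdefault(rec[key], {})
        for concept, value in rec.items():
            if target.get(concept) is None:
                target[concept] = value
    return list(merged.values())


# Hypothetical sample records, one per source, describing the same area.
recs_a = [to_concepts({"area_code": "X1", "trips": 120}, ONTOLOGY["source_a"])]
recs_b = [to_concepts({"region": "X1", "avg_distance_km": 8.5, "mode": "car"},
                      ONTOLOGY["source_b"])]
integrated = integrate(recs_a, recs_b)

print(semantic_matches(ONTOLOGY, "source_a", "source_b"))
print(completeness(recs_a), completeness(recs_b), completeness(integrated))
```

In this toy case each source alone fills only part of the shared concepts (completeness 0.5 and 0.75), while the integrated record fills all of them (1.0), which mirrors the before/after comparison in the last step of the method.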
The semantic integration results in an integrated dataset that is used to compose the mobility carbon footprint for areas in the Netherlands. This carbon footprint provides insight into the mobility of all Mezuro areas in the Netherlands. Because the Mezuro and OViN data have been integrated, it is now possible to obtain detailed information on CO2 emissions from mobility (for example, which areas have the highest emissions).
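This summary does not state how emissions are calculated from the integrated data. As a hedged illustration only, the Python sketch below assumes that emissions per area are estimated as number of trips × trip distance × a per-mode emission factor, and then ranks areas to find the highest emitters. The record fields, area codes, and emission factor values are hypothetical placeholders, not figures from this research.

```python
from collections import defaultdict

# Placeholder emission factors in grams CO2 per kilometre travelled.
# These values are illustrative only and are NOT official figures.
EMISSION_FACTOR_G_PER_KM = {"car": 100.0, "train": 10.0, "bicycle": 0.0}


def footprint_per_area(trips):
    """Sum estimated CO2 in kg per area: trip_count * distance_km * factor / 1000."""
    totals = defaultdict(float)
    for t in trips:
        factor = EMISSION_FACTOR_G_PER_KM[t["mode"]]
        totals[t["area"]] += t["trip_count"] * t["distance_km"] * factor / 1000.0
    return dict(totals)


def highest_emitting(totals, n=1):
    """Return the n areas with the largest estimated emissions."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Hypothetical integrated records: trips per area, mode, and average distance.
trips = [
    {"area": "X1", "mode": "car", "distance_km": 10.0, "trip_count": 50},
    {"area": "X1", "mode": "bicycle", "distance_km": 3.0, "trip_count": 40},
    {"area": "X2", "mode": "train", "distance_km": 30.0, "trip_count": 100},
]
totals = footprint_per_area(trips)
print(totals, highest_emitting(totals))
```

The ranking step is what makes a question such as "which areas emit the most" answerable; it only becomes possible once trip counts, distances, modes, and locations are available in one dataset.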