Reusability of volunteered geographic information supported by Semantic Web technologies: a case study for environmental applications.
Summary
New infrastructures, technologies and standards contribute to an internet that is more complex, dynamic and diverse than ever. It has never been easier to contribute to the growing networks of websites and (social media) platforms. Geographical information is present all over the internet, sometimes explicit but often implicit. To describe this, the term volunteered geographic information (VGI) was popularised in the academic community by Michael Goodchild a decade ago.
The amount of VGI keeps growing, and it is therefore timely to consider how the reusability of this data can be maintained in the future. Several techniques already in place on the internet allow the reuse of data (e.g. web APIs, download services, and web scraping). Besides these current technologies, there are so-called Semantic Web technologies that can aid the reusability of VGI. Semantic Web technologies strive to create a web of data rather than a web of documents. They comprise a data standard (RDF), data structures (OWL) and a query language (SPARQL) that together enable the development of this web of data.
The goal of this thesis is to develop a method in which Semantic Web technologies are used to improve the reusability of VGI. This entails gathering data from multiple (VGI) sources and creating proofs of concept on the basis of use cases. These use cases are exemplary cases of how VGI could be reused by means of Semantic Web technologies. The use cases draw on five VGI systems and one authoritative data source in the environmental domain. The metadata and the data from these systems are extracted, and reuse is attempted at both levels.
A domain ontology was developed to aid the reusability of VGI. Where possible, existing ontologies were applied; however, many features and attributes were not readily available in existing ontologies. This VGI ontology is published online.
Chapter 4 delineates a general method for the reuse of VGI by employing Semantic Web technologies. It consists of four sequential steps, namely:
1. Gather metadata,
2. Gather data,
3. Model the (meta)data in RDF and
4. Upload and query the (meta)data.
This method is applied in Chapter 5 to the selected data sources as a proof of concept. Within the environmental domain, three use cases are developed: trash, weather and air quality.
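The four steps of the method can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the namespace, source identifiers, attribute names and values are all hypothetical, the gathering steps are replaced by hard-coded records, and a plain in-memory triple-pattern match stands in for a triple store queried with SPARQL. It only shows how metadata and data from several sources end up in one subject-predicate-object structure that can be queried uniformly.

```python
# Hypothetical namespace; every identifier and value below is illustrative only.
EX = "http://example.org/vgi/"


def to_triples(source_id, metadata, observations):
    """Step 3: model one source's metadata and data as (subject, predicate, object) triples."""
    source = EX + "source/" + source_id
    # Metadata becomes statements about the source itself.
    triples = [(source, EX + key, str(value)) for key, value in metadata.items()]
    for i, obs in enumerate(observations):
        node = f"{source}/obs/{i}"
        # Each observation is linked back to the source it came from.
        triples.append((node, EX + "fromSource", source))
        for key, value in obs.items():
            triples.append((node, EX + key, str(value)))
    return triples


def match(store, s=None, p=None, o=None):
    """Step 4 (stand-in for SPARQL): return triples matching a pattern; None is a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Steps 1 and 2: metadata and data, hard-coded here instead of gathered from real systems.
store = []
store += to_triples("a", {"theme": "air quality", "licence": "open"},
                    [{"pm10": 21.5}, {"pm10": 18.0}])
store += to_triples("b", {"theme": "weather", "licence": "open"},
                    [{"temperature": 12.3}])

# Exploratory metadata query across sources: which sources carry an open licence?
open_sources = sorted(t[0] for t in match(store, p=EX + "licence", o="open"))
```

In a real setting the triples would be serialised as RDF, typed against an ontology and uploaded to a triple store, where the last query would be written in SPARQL rather than as a Python pattern match.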
In conclusion, this thesis has found that combining metadata from multiple sources yielded the most positive results. Semantic Web technologies provide a structure for previously unstructured metadata, which can be used for exploratory queries to discover the intricacies of a system. At the data level, reuse is more difficult because of the data quality of VGI and semantic gaps between the data collection and processing methods. Semantic Web technologies provide additional structure and information about data; however, not every detail and interpretation can be modelled.