
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Sheider, Simon
dc.contributor.author: Sajjadianjaghargh, B.
dc.date.accessioned: 2021-08-23T18:00:26Z
dc.date.available: 2021-08-23T18:00:26Z
dc.date.issued: 2021
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/41037
dc.description.abstract: The volume of geodata available on Spatial Data Infrastructures (SDIs) continues to grow, and this abundance makes geodata increasingly hard to discover and access in distributed environments. It is difficult for end users to find relevant content offered by different data providers. The problem becomes more challenging when search engines must support natural language, since the effectiveness of search and the findability of datasets depend on the search techniques and the clarity of the queries. The keywords used by users often differ from the keywords recorded in metadata, even though the submitted keywords may be semantically related to the metadata content. Therefore, Natural Language Processing (NLP) techniques can be employed in conjunction with search engine technology to help users with language limitations and users from specific domains by capturing the semantic and linguistic content of metadata. When a query performs poorly, the business logic behind the search engine reformulates and enriches it based on synonyms and relations gathered from online data resources, which affects the recall and precision of geodata retrieval. This approach is a common technique and has been implemented in open-domain search engines such as Google and in location-based services. However, spatial search and NLP techniques on current Catalogue Services (CSs) are ongoing research topics and still require much work before users can take full advantage of existing open government datasets. To address the limitations of search on current SDIs and to bridge the gap between users’ minds and the content documented in metadata, this research examines query expansion using WordNet and the Google Translate API to generate more semantic keywords. We propose a corpus-based methodology for query keyword extraction.
The corpus is gathered from real users’ questions in natural language. The extracted keywords are then enriched using the WordNet and Google Translate APIs. The set of query keywords is evaluated against a manual gold standard and a baseline. Our empirical evaluation of these tasks is carried out by developing three retrieval platforms. Our study shows that the semantic keywords resulting from the local multilingual WordNet platform help users by reformulating good alternative queries. This approach also improves the precision and recall of geodataset retrieval by 1% and 22%, respectively.
dc.description.sponsorship: Utrecht University
dc.format.extent: 1795355
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Geodata source retrieval in PDOK by multilingual/semantic query expansion
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Spatial search, Metadata, NLP, WordNet
dc.subject.courseuu: Geographical Information Management and Applications (GIMA)

