Geodata source retrieval in PDOK by multilingual/semantic query expansion

Sajjadianjaghargh, B.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Scheider, Simon
dc.contributor.author	Sajjadianjaghargh, B.
dc.date.accessioned	2021-08-23T18:00:26Z
dc.date.available	2021-08-23T18:00:26Z
dc.date.issued	2021
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/41037
dc.description.abstract	The volume of geodata available on Spatial Data Infrastructures (SDIs) continues to grow, and this abundance makes geodata increasingly difficult to discover and access in distributed environments. End users find it hard to locate relevant content published by different data providers. The problem becomes more challenging when search engines must support natural language, since the effectiveness of search and the findability of datasets depend on the search techniques used and on the clarity of the queries. The keywords entered by users often differ from the keywords recorded in metadata, even though they may be semantically related to the metadata content. Natural Language Processing (NLP) techniques can therefore be combined with search engine technology to capture the semantic and linguistic content of metadata, helping users who face language barriers or who work in specific domains. When a query performs poorly, the business logic behind the search engine reformulates and enriches it using synonyms and relations gathered from online data resources, which affects the recall and precision of geodata retrieval. This is a common technique that has been implemented in open-domain search engines such as Google and in location-based services. However, spatial search and NLP techniques on current Catalogue Services (CSs) remain ongoing research topics and still require much work before users can take full advantage of existing open government datasets. To address the limitations of search on current SDIs and to bridge the gap between what users have in mind and the content documented in metadata, this research examines query expansion using WordNet and the Google Translate API to generate additional semantic keywords. We propose a corpus-based methodology for query keyword extraction.
The corpus is gathered from real users’ questions in natural language. The extracted keywords are then enriched using the WordNet and Google Translate APIs. The resulting set of query keywords is evaluated against a manual gold standard and a baseline. The empirical evaluation is carried out by developing three retrieval platforms. Our study shows that the semantic keywords produced by the local multilingual WordNet platform help users by suggesting good alternative queries. This approach also improves the precision and recall of geodataset retrieval by 1% and 22%, respectively.
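The pipeline the abstract outlines (take query keywords, enrich them with synonyms and translations, retrieve matching datasets, then score the result against a gold standard) can be sketched as a toy example. This is not the thesis implementation: the `SYNONYMS` and `TRANSLATIONS` tables are hypothetical stand-ins for WordNet and the Google Translate API, the Dutch translations assume metadata keywords recorded in Dutch, the catalogue and gold standard are invented, and retrieval is simple keyword overlap rather than a real catalogue search.

```python
"""Toy sketch of semantic/multilingual query expansion and its evaluation.

SYNONYMS and TRANSLATIONS are hypothetical stand-ins for WordNet and the
Google Translate API; no real API is called here.
"""
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for WordNet synonym lookups.
SYNONYMS: Dict[str, List[str]] = {
    "river": ["stream", "waterway"],
    "building": ["construction", "edifice"],
}

# Hypothetical stand-in for English -> Dutch translations
# (assumes the catalogue's metadata keywords are in Dutch).
TRANSLATIONS: Dict[str, List[str]] = {
    "river": ["rivier"],
    "building": ["gebouw"],
}


def expand_query(keywords: List[str]) -> Set[str]:
    """Return the lower-cased keywords plus their synonyms and translations."""
    expanded: Set[str] = set()
    for kw in keywords:
        kw = kw.lower()
        expanded.add(kw)
        expanded.update(SYNONYMS.get(kw, []))
        expanded.update(TRANSLATIONS.get(kw, []))
    return expanded


def retrieve(query_terms: Set[str], catalogue: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of datasets whose metadata keywords share a term with the query."""
    return {ds for ds, kws in catalogue.items() if query_terms & kws}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 if empty)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical catalogue: dataset id -> metadata keywords.
    catalogue = {
        "ds1": {"rivier", "water"},
        "ds2": {"gebouw", "bag"},
        "ds3": {"river", "flood"},
        "ds4": {"road", "weg"},
    }
    gold = {"ds1", "ds3"}  # hypothetical relevance judgements for "river"
    baseline = retrieve({"river"}, catalogue)
    expanded = retrieve(expand_query(["river"]), catalogue)
    print("baseline", precision_recall(baseline, gold))
    print("expanded", precision_recall(expanded, gold))
```

On this invented data the baseline query "river" finds only `ds3` (precision 1.0, recall 0.5), while the expanded query also matches the Dutch-tagged `ds1` (precision 1.0, recall 1.0). The sketch shows only the direction of the effect, namely that expansion mainly raises recall; it says nothing about the magnitudes reported in the thesis.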
dc.description.sponsorship	Utrecht University
dc.format.extent	1795355
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Geodata source retrieval in PDOK by multilingual/semantic query expansion
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Spatial search, Metadata, NLP, WordNet
dc.subject.courseuu	Geographical Information Management and Applications (GIMA)

Files in this item

Name: MaryamSajjadian_Thesis_Final.pdf
Size: 1.712Mb
Format: PDF


This item appears in the following Collection(s)

Theses
