Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia

Rijen, Diederik van

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Meijers, Evert
dc.contributor.author	Rijen, Diederik van
dc.date.accessioned	2022-09-09T00:02:55Z
dc.date.available	2022-09-09T00:02:55Z
dc.date.issued	2022
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/42413
dc.description.abstract	This study examines three approaches to document classification for classifying the type of relationship between European cities: LDA topic modeling, word embedding classification, and word frequency representation. The first method provides a distribution over topics, the second provides a hard classification, and the last uses word frequency metrics to represent a document by its most relevant words. LDA topic modeling and word embedding classification produced very similar results on a dataset of 311,000 paragraphs, indicating a considerable level of accuracy and showing that both can be used for classification.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia
dc.title	Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	European cities, network, toponym co-occurrence, classification, topic modeling, word embedding.
dc.subject.courseuu	Applied Data Science
dc.thesis.id	8880

Files in this item

Name: THESIS__CITYNET____APPLIED_DAT ...
Size: 2.016Mb
Format: PDF

This item appears in the following Collection(s)

Theses
