dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Meijers, Evert | |
dc.contributor.author | Rijen, Diederik van | |
dc.date.accessioned | 2022-09-09T00:02:55Z | |
dc.date.available | 2022-09-09T00:02:55Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/42413 | |
dc.description.abstract | This study examines three approaches to document classification for classifying the type of relationship between European cities: LDA topic modeling, word embedding classification, and word frequency representation. The first method provides a distribution of topics, the second provides a hard classification, and the third uses word frequency metrics to represent a document by its most relevant words. LDA topic modeling and word embedding classification produced very similar results on a dataset of 311,000 paragraphs, indicating a substantial level of accuracy and showing that both could be used for classification. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia | |
dc.title | Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | European cities, network, toponym co-occurrence, classification, topic modeling, word embedding | |
dc.subject.courseuu | Applied Data Science | |
dc.thesis.id | 8880 | |