dc.rights.license           CC-BY-NC-ND
dc.contributor.advisor      Meijers, Evert
dc.contributor.author       Rijen, Diederik van
dc.date.accessioned         2022-09-09T00:02:55Z
dc.date.available           2022-09-09T00:02:55Z
dc.date.issued              2022
dc.identifier.uri           https://studenttheses.uu.nl/handle/20.500.12932/42413
dc.description.abstract     This study examines three approaches to document classification for classifying the type of relationship between European cities: LDA topic modeling, word embedding classification and word frequency representation. The first method produces a distribution over topics for each document, the second assigns each document to a single class (hard classification), and the third uses word frequency metrics to represent a document by its most relevant words. On a dataset of 311,000 paragraphs, LDA topic modeling and word embedding classification produced very similar results, indicating a considerable level of accuracy and showing that both methods can be used for classification.
dc.description.sponsorship  Utrecht University
dc.language.iso             EN
dc.subject                  Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia
dc.title                    Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia
dc.type.content             Master Thesis
dc.rights.accessrights      Open Access
dc.subject.keywords         European cities, network, toponym co-occurrence, classification, topic modeling, word embedding.
dc.subject.courseuu         Applied Data Science
dc.thesis.id                8880
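The abstract says only that the third method represents a document by its most relevant words using "word frequency metrics"; it does not name the metric. As a minimal sketch, assuming TF-IDF (term frequency multiplied by the log of inverse document frequency) as that metric and assuming paragraphs are already tokenised, the Python below picks the top words per paragraph. The function `tfidf_top_words`, its parameter `k` and the toy paragraphs are hypothetical illustrations, not the thesis code or data.

```python
import math
from collections import Counter


def tfidf_top_words(docs, k=3):
    """Return the k highest-scoring words per document under TF-IDF.

    Assumption: TF-IDF stands in for the unnamed "word frequency metric".
    docs is a list of token lists; tokenisation happens upstream.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency (share of the document) times inverse document frequency.
        scores = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for deterministic output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


# Hypothetical toy paragraphs mentioning pairs of cities.
paragraphs = [
    "rotterdam antwerp port shipping port".split(),
    "paris berlin train rail link".split(),
    "rotterdam berlin rail freight".split(),
]
print(tfidf_top_words(paragraphs, k=3))
```

Words spread across several paragraphs (here "rotterdam", "berlin" and "rail") receive a lower IDF, so rarer, more distinctive words such as "port" or "freight" rank as the most relevant for their paragraph.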
