        •   Utrecht University Student Theses Repository Home
        Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia

        THESIS__CITYNET____APPLIED_DATA_SCIENCE_FINAL_V3.pdf (2.016Mb)
        Publication date
        2022
        Author
        Rijen, Diederik van
        Summary
        This study examines three approaches to document classification for classifying the type of relationship between European cities: LDA topic modeling, word embedding classification, and word frequency representation. The first method yields a distribution over topics, the second yields a hard classification, and the third uses word frequency metrics to represent a document by its most relevant words. LDA topic modeling and word embedding classification produced very similar results on a dataset of 311,000 paragraphs, indicating a substantial level of accuracy and showing that both could be used for classification.
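
As a rough illustration of the third approach, representing a document by its most relevant words, the sketch below scores words with TF-IDF and keeps the top few per paragraph. The summary only mentions "word frequency metrics", so the choice of TF-IDF, the crude tokenizer, the function names `tokenize` and `top_words`, and the three toy paragraphs are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def top_words(docs, k=3):
    """Return, for each document, its k highest-scoring TF-IDF words."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word appears in.
    df = Counter(w for toks in tokenized for w in set(toks))
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency times inverse document frequency.
        scores = {
            w: (count / len(toks)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for a stable order.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


# Toy paragraphs invented for illustration; not data from the thesis.
docs = [
    "Football fans travelled from Amsterdam to London for the football match.",
    "A train line links Amsterdam and Paris.",
    "London and Paris signed a trade agreement.",
]

for doc, words in zip(docs, top_words(docs)):
    print(words, "<-", doc)
```

Words that occur in many paragraphs receive a low score, so the surviving words are the ones that distinguish a paragraph from the rest. A representation like this could then be inspected by hand or passed to a classifier.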
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/42413
        Collections
        • Theses