Classifying and labeling the relationships between cities with high levels of co-occurrence on the English Wikipedia
Summary
This study examines three approaches to document classification
for classifying the type of relationship between European
cities: LDA topic modeling, word embedding classification, and word frequency representation. The first method provides a distribution of topics,
the second provides a hard classification, and the last uses word frequency metrics to represent a document by its most relevant words. LDA
topic modeling and word embedding classification produced very similar results on a dataset of 311,000 paragraphs, which suggests that both are reasonably accurate
and that either could be used for classification.