Comparing Topological Communities and Communities of Interest Using Topic Modeling
Summary
In this thesis I propose repurposing Latent Dirichlet Allocation (LDA), a topic modeling algorithm, to discover communities of interest. To test this approach, I apply it to the social news and entertainment website reddit. I then use it to compare communities of interest with topological communities, that is, communities discovered from the topology of a social graph. I apply both methods to the Enron email corpus and compare the resulting communities using cluster evaluation methods.
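The summary does not name the cluster evaluation measures used. As an illustration only, suppose each person in the Enron corpus is assigned to exactly one community by each method (for LDA, for example, its most probable topic). Normalized mutual information (NMI) is one common way to score how closely two such partitions agree. The function name `normalized_mutual_info` is hypothetical, and this is a minimal sketch under those assumptions, not the thesis's implementation.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_a, labels_b):
    """Hypothetical helper: NMI between two hard partitions of the same items.

    labels_a[i] and labels_b[i] are the community labels that the two methods
    (e.g. topological vs. LDA-based) assign to item i. The result is normalized
    by the arithmetic mean of the two entropies, one of several common choices.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))  # contingency table

    # Mutual information I(A;B) computed from the contingency table.
    mi = 0.0
    for (a, b), n_ab in joint.items():
        mi += (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both methods put every item in a single community
    return 2.0 * mi / (h_a + h_b)


# Usage: identical groupings score 1.0 even when the label names differ.
topological = ["t1", "t1", "t1", "t2", "t2", "t2"]
interest = ["i1", "i1", "i1", "i2", "i2", "i2"]
print(normalized_mutual_info(topological, interest))
```

Because NMI depends only on which items are grouped together, not on the label names, it can compare partitions whose communities have no natural correspondence, such as graph clusters and LDA topics.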