Sentiment analysis of user-generated geographical content produced by locals and tourists across multiple geo-social media platforms: a case study in London, UK
Summary
Recent advances in information technology have increased the availability of geographical information produced by users of geo-social media platforms. Researchers have used this user-generated geographical content in various contexts, yet a deeper understanding is needed of how the content produced by users local to a given region differs from that produced by users who are merely tourists there, especially across more than one geo-social media platform.
This thesis explores the content produced by tourists and locals in a case study set in London, UK. Data covering February 2018 to January 2019 is retrieved from Twitter and TripAdvisor. For Twitter, only tweets carrying a Place of Interest (POI) place tag are used, so that their geographical information conceptually matches that retrieved from TripAdvisor. Users are classified as tourists or locals by combining the n-days approach with the location field of their user profiles. The sentiment of tweets and reviews is analyzed using the Valence Aware Dictionary and sEntiment Reasoner (VADER). Finally, the volume and sentiment of reviews and tweets produced by tourists and locals are analyzed across time and space. Although the data from the two sources can be integrated, largely because the location information from both platforms is conceptually alike, the merged data is of little interpretive value.
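The two processing steps named above, classifying users and scoring sentiment, can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the 30-day threshold, the hypothetical `User` fields, the `LOCAL_KEYWORDS` list and the rule for combining the profile location field with the n-days test (trust a non-empty location field first, fall back to the n-days rule otherwise) are all illustrative assumptions. The `normalize` function mirrors the normalization VADER applies to the summed valence of lexicon matches (alpha = 15) to obtain a compound score in [-1, 1]; in practice the score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package.

```python
import math
from dataclasses import dataclass
from datetime import date

# Hypothetical threshold: users whose posts in the study area span fewer
# than N_DAYS days (first to last post) are treated as tourists.
N_DAYS = 30
# Hypothetical keywords marking a profile location field as local.
LOCAL_KEYWORDS = ("london",)


@dataclass
class User:
    location_field: str      # free-text location from the user profile
    post_dates: list[date]   # dates of the user's posts inside the study area


def n_days_label(post_dates: list[date], n_days: int = N_DAYS) -> str:
    """n-days rule: a short activity span in the area suggests a tourist."""
    span = (max(post_dates) - min(post_dates)).days
    return "tourist" if span < n_days else "local"


def classify(user: User) -> str:
    """Assumed combination: use a non-empty location field first and
    fall back to the n-days rule when the field is empty."""
    field = user.location_field.strip().lower()
    if field:
        return "local" if any(k in field for k in LOCAL_KEYWORDS) else "tourist"
    return n_days_label(user.post_dates)


def normalize(score: float, alpha: float = 15) -> float:
    """VADER-style normalization of a summed valence into [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))
```

Under these assumptions, a user with an empty location field whose posts in London span only four days is labelled a tourist, while one posting from March to September is labelled a local. The fallback order is a design choice: the profile field is treated as the more direct signal, and the n-days rule covers users who leave it blank.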
The results are therefore analyzed separately for each platform. They show small but significant differences between locals and tourists on both Twitter and TripAdvisor. These include (1) higher activity among tourists than among locals during the early summer months; (2) a stronger geographical central tendency in the places visited by tourists than in those visited by locals; (3) more negative sentiment among tourists in the mid/late summer months; and (4) slightly lower overall compound sentiment among tourists than among locals. Further differences are observed on only one of the two platforms, such as lower sentiment among local users on weekends on Twitter, and more significant differences between locals' and tourists' sentiment at POIs closer to the center of the study area.
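Finding (2) and the POI-distance effect both depend on how far places lie from the center of the study area. One way to measure this, assumed here rather than taken from the thesis, is the great-circle (haversine) distance from each POI to a reference point, summarized per user group by its median. The `CENTRE` coordinates (an approximate central-London point) and the use of the median are illustrative assumptions.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0
# Assumed reference center of the study area (approximate central London).
CENTRE = (51.5074, -0.1278)


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_distance_to_centre(poi_coords, centre=CENTRE) -> float:
    """Median distance (km) of visited POIs from the center; a smaller value
    indicates a stronger central tendency."""
    return median(haversine_km(p, centre) for p in poi_coords)
```

Comparing `median_distance_to_centre` for the POIs visited by tourists with that for locals gives a single number per group; the median is used here because it is less sensitive than the mean to a few distant POIs.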
It is concluded that place-tagged tweets present a promising alternative to geo-tagged tweets, but that the places they reference require more thorough investigation. The same holds for the classification of tourists and locals: the combined approach proved reliable, but further research into other possible combinations of classification approaches is needed.