Assessing the spatial context of sentiments in geo-social media
Summary
Social media platforms now enable millions of users worldwide to easily
publish their content on the internet. As the amount of
data created by social media users is steadily growing on a daily basis, many
researchers are investigating ways to exploit this vast body of data and
to derive useful information from it. The sheer number of social media records
available requires automated solutions for data handling and text processing in
order to derive the desired information and to conduct research on it.
This research aims in particular at assessing sentiments and their spatial context
using data from the platforms Twitter and Flickr gathered throughout the year
2018. Textual content from these platforms is assessed and classified according
to the sentiments it contains. Furthermore, options to derive accurate location
information from these social media records are reviewed and employed for data
originating in the Greater London area. Particular attention is paid to suitable
techniques for creating data samples that are representative of the respective
population, with minimized bias towards certain user groups. To this end, average
sentiment scores are aggregated at different granularities of administrative area
types. The suitability of different output granularities for this purpose is also
subject to investigation. Finally, spatial patterns in sentiments are assessed,
and it is examined whether correlations between average sentiments and
socio-economic indicators are detectable. The paper investigates to what degree
sentiments on social media can serve as a socio-economic indicator.
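The aggregation step described above can be illustrated with a minimal sketch. It assumes one possible way of limiting bias from very active users: scores are first averaged per user within each area, and the user means are then averaged per area. The record layout `(user_id, area_id, score)`, the function name and the sample values are hypothetical and are not taken from the research itself.

```python
from collections import defaultdict
from statistics import mean


def area_sentiment(records):
    """Average sentiment per area, giving each user equal weight.

    records: iterable of (user_id, area_id, score) tuples (hypothetical layout).
    """
    # Step 1: collect all scores of one user within one area.
    per_user = defaultdict(list)
    for user_id, area_id, score in records:
        per_user[(area_id, user_id)].append(score)

    # Step 2: reduce each user to a single mean score per area.
    per_area = defaultdict(list)
    for (area_id, _user_id), scores in per_user.items():
        per_area[area_id].append(mean(scores))

    # Step 3: average the user means, so prolific users do not dominate.
    return {area: mean(user_means) for area, user_means in per_area.items()}


# Illustrative records only: user u1 posts three times, user u2 once.
records = [
    ("u1", "Camden", 2), ("u1", "Camden", 2), ("u1", "Camden", 2),
    ("u2", "Camden", -2),
    ("u3", "Hackney", 1),
]
result = area_sentiment(records)
```

With a plain average over all records, the three posts by `u1` would dominate the Camden score; with the two-step average, both users contribute equally.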
The methods used in this research comprise the handling and processing of large
data volumes within SQL databases, automated text processing (with sentiment
analysis tools) and various approaches to spatial analysis. The nature of spatial
patterns is examined by means of a Global Moran's I assessment and an optimized
hot spot analysis. Socio-economic indicators are reviewed with regard to their
Pearson correlation with the derived sentiment scores. Analysis results are
discussed and interpreted on the basis of these findings and a series of reviewed
small-scale examples within the city area.
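To make the two statistics named above concrete, the following minimal sketch computes Global Moran's I and Pearson's r from their standard textbook definitions. It is not the software used in this research; the binary adjacency weights, the four-area chain layout and all values are illustrative assumptions.

```python
from math import sqrt


def morans_i(values, weights):
    """Global Moran's I for area values and a spatial weights matrix.

    weights[i][j] > 0 marks areas i and j as neighbours (assumed binary here).
    """
    n = len(values)
    mean_v = sum(values) / n
    dev = [v - mean_v for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring areas.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Four hypothetical areas in a row; each area neighbours the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, -1, -1], chain)    # similar values adjoin
alternating = morans_i([1, -1, 1, -1], chain)  # dissimilar values adjoin
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly linear relation
```

Positive values of Moran's I indicate that similar sentiment scores cluster in neighbouring areas, while negative values indicate that neighbouring areas tend to differ.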
While different approaches to improved data sampling are successfully
developed and employed within this research, the results suggest that further
investigation of this topic is needed in order to reach an adequate
representation of a population within a data sample. Twitter data is clearly
identified as having greater potential for the purpose of this research
than Flickr data. Regarding automated sentiment analysis, the tool SentiStrength
is identified as ideal for the purpose of this research in a comparative
study of available instruments. It is employed to classify sentiments for a large
number of social media records, which are subsequently aggregated to average
values at different output granularities. While spatial patterns within social media
sentiments are clearly detected, correlations between those sentiments and other
socio-economic indicators are only weakly traceable, with Pearson's r correlation
coefficients reaching slightly above +0.3. Finally, with the aim of strengthening the
suitability of social media content as a socio-economic indicator, suggestions for
further improvements in data sampling and analysis methods are given.