dc.rights.license           CC-BY-NC-ND
dc.contributor.advisor      Feelders, A.J.
dc.contributor.advisor      Siebes, A.P.J.M.
dc.contributor.author       Groot, R. de
dc.date.accessioned         2012-08-24T17:01:20Z
dc.date.available           2012-08-24
dc.date.available           2012-08-24T17:01:20Z
dc.date.issued              2012
dc.identifier.uri           https://studenttheses.uu.nl/handle/20.500.12932/15123
dc.description.abstract     The goal of this master's thesis is to classify the sentiment of short Twitter messages using data mining techniques. Twitter messages, or tweets, are limited to 140 characters. This limit makes it harder for people to express their sentiment and, as a consequence, also makes the sentiment harder to classify. Sentiment can be of two types: emotions and opinions. This research focuses solely on the sentiment of opinions. These opinions are divided into three classes: positive, neutral and negative, and an algorithm assigns each tweet to one of these classes. Well-known supervised learning algorithms such as support vector machines and naive Bayes are used to build a prediction model. Before the model can be built, the data has to be pre-processed from text into a fixed-length feature vector. The features consist of sentiment words and frequently occurring words that are predictive of the sentiment. The learned model is then applied to a test set to validate it. The data considered in this research comes from two datasets: one from Sananalytics and a self-built Twitter dataset. When the models were applied and tested in-sample, the results were quite acceptable. Out-of-sample, however, with cross-validation, the results were disappointing. The sparsity of the dataset seems to be the cause: the features in the training set do not cover the data in the test set well.
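The abstract describes a pipeline in which each tweet becomes a fixed-length feature vector (sentiment words plus frequent words) and a supervised classifier assigns one of three classes. The following is only a minimal sketch of that idea in standard-library Python, not the thesis's implementation. The lexicon `SENTIMENT_WORDS`, the `top_k` cutoff, the binary (presence/absence) features, the choice of the Bernoulli variant of naive Bayes with add-one smoothing, and the toy tweets are all illustrative assumptions. The sketch does not cover the support vector machines or the Sananalytics data.

```python
import math
import re
from collections import Counter

# Hypothetical mini sentiment lexicon (assumption; the thesis's word lists are not given here).
SENTIMENT_WORDS = {"good", "great", "love", "bad", "awful", "hate"}


def tokenize(tweet: str) -> list[str]:
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def build_vocabulary(tweets: list[str], top_k: int = 20) -> list[str]:
    """Fixed feature list: lexicon words plus the top_k most frequent training words."""
    counts = Counter(tok for t in tweets for tok in tokenize(t))
    frequent = [w for w, _ in counts.most_common(top_k)]
    return sorted(SENTIMENT_WORDS | set(frequent))


def to_vector(tweet: str, vocab: list[str]) -> list[int]:
    """Binary fixed-length vector: 1 if the feature word occurs in the tweet, else 0."""
    tokens = set(tokenize(tweet))
    return [1 if w in tokens else 0 for w in vocab]


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X: list[list[int]], y: list[str]) -> "BernoulliNaiveBayes":
        n_features = len(X[0])
        self.classes = sorted(set(y))
        self.log_prior: dict[str, float] = {}
        self.p_feature: dict[str, list[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed P(feature_j = 1 | class c); never exactly 0 or 1.
            self.p_feature[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict(self, x: list[int]) -> str:
        """Return the class with the highest log posterior for vector x."""
        def score(c: str) -> float:
            s = self.log_prior[c]
            for xj, p in zip(x, self.p_feature[c]):
                s += math.log(p if xj else 1.0 - p)
            return s
        return max(self.classes, key=score)


# Usage on made-up toy tweets (assumption, not data from the thesis).
train = [
    ("I love this phone, it is great", "positive"),
    ("great service, love it", "positive"),
    ("the train leaves at noon", "neutral"),
    ("meeting moved to monday", "neutral"),
    ("awful update, I hate it", "negative"),
    ("bad battery and awful screen", "negative"),
]
texts, labels = zip(*train)
vocab = build_vocabulary(list(texts), top_k=10)
model = BernoulliNaiveBayes().fit([to_vector(t, vocab) for t in texts], list(labels))
print(model.predict(to_vector("I love it", vocab)))  # prints: positive
```

The sketch also hints at the sparsity problem the abstract reports: a test tweet such as "meeting at noon" shares no words with the fixed vocabulary, so it maps to an all-zero vector, and its prediction depends only on which class is least likely to contain any feature word rather than on its own content.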
dc.description.sponsorship  Utrecht University
dc.language.iso             en_US
dc.title                    Data Mining for Tweet Sentiment Classification
dc.type.content             Master Thesis
dc.rights.accessrights      Open Access
dc.subject.keywords         sentiment analysis, data analysis, text mining, classification, big data, social media, twitter, complex event processing
dc.subject.courseuu         Computing Science