Twitter sentiment analysis using LSS

Data

The data for this experiment comprise 500K English Twitter posts that link to the Sputnik News website and were published between 2017-07-09 and 2017-10-30. The methodological challenge here is to estimate the sentiment of words in very short social media texts.

Sentiment analysis

Sentiment seed words

The seed words are generic English sentiment words that have also been used to analyze Russian news agencies’ coverage of the Ukraine crisis.

seed <- c('good', 'nice', 'excellent', 'positive', 'fortunate', 'correct', 'superior',
          'bad', 'nasty', 'poor', 'negative', 'unfortunate', 'wrong', 'inferior')

Sentiment estimation

The sentiment of each word in the corpus, excluding tags and user names, is estimated from its cosine similarity to the seed words in an SVD-reduced semantic space. The following table shows the estimated sentiment scores:

DT::datatable(data.frame(word = names(syno3_norm), score = unname(syno3_norm)))
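The estimation step described above can be illustrated with a minimal base-R sketch. It is not the code that produced `syno3_norm`: the toy matrix `dfm`, the polarity vector `seed_weight`, and the choice of `k = 2` dimensions are all assumptions made for illustration.

```r
# Toy document-feature matrix: rows are posts, columns are word counts.
# All values are made up for illustration.
dfm <- matrix(
  c(2, 1, 0, 0, 1, 0,
    1, 2, 0, 0, 1, 0,
    0, 0, 2, 1, 0, 1,
    0, 0, 1, 2, 0, 1,
    1, 1, 0, 0, 2, 0,
    0, 0, 1, 1, 0, 2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("post", 1:6),
                  c("good", "nice", "bad", "poor", "peace", "war"))
)

# Assumed seed polarities: +1 for positive seeds, -1 for negative seeds
seed_weight <- c(good = 1, nice = 1, bad = -1, poor = -1)

# Truncated SVD: each row of `word_vec` is a word in the reduced space
k <- 2
word_vec <- svd(dfm, nu = 0, nv = k)$v
rownames(word_vec) <- colnames(dfm)

# Cosine similarity of every word to every seed word
unit <- word_vec / sqrt(rowSums(word_vec^2))
sim <- unit %*% t(unit[names(seed_weight), , drop = FALSE])

# Word score: seed-weighted mean of the similarities
score <- drop(sim %*% seed_weight) / length(seed_weight)
sort(score, decreasing = TRUE)
```

In this toy matrix, 'peace' occurs only alongside the positive seeds and 'war' only alongside the negative ones, so they receive a positive and a negative score respectively, although neither is a seed word.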

Experimental results

In this example, Twitter posts are classified by their hashtags as being about Russia or the United States.

data_ru <- subset(data, russia)
data_us <- subset(data, america)
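The logical columns `russia` and `america` used above are not constructed in this section. One way they might be derived from hashtags is sketched below; the `text` column, the example posts, and the hashtag lists are hypothetical and not taken from the original data.

```r
# Hypothetical posts; the `text` column and the hashtag lists are assumptions
data <- data.frame(
  text = c("Talks resume in Moscow #Russia",
           "New sanctions announced #USA #Trump",
           "Summit ends without deal #Russia #US",
           "Weather update #London"),
  stringsAsFactors = FALSE
)

# Flag a post when it contains any hashtag from a country-specific list
data$russia <- grepl("#(russia|moscow|putin)\\b", data$text,
                     ignore.case = TRUE, perl = TRUE)
data$america <- grepl("#(usa|us|america|trump)\\b", data$text,
                      ignore.case = TRUE, perl = TRUE)

data_ru <- subset(data, russia)
data_us <- subset(data, america)
```

With flags built this way, a post that carries hashtags for both countries falls into both subsets, and a post with neither is dropped from the comparison.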

Distributions

The average sentiment score for Russia is 0.25 points higher than that for the United States, suggesting that posts about Russia tend to be more positive.

mean(data_ru$sentiment) - mean(data_us$sentiment)

## [1] 0.2507293

The standard deviation of the scores is slightly greater for Russia (0.411) than for the United States (0.404), suggesting that the content of posts about Russia is marginally more diverse.

sd(data_ru$sentiment)

## [1] 0.4114745

sd(data_us$sentiment)

## [1] 0.4037266

# Empty canvas for the two kernel density curves
plot(NULL, ylim = c(0, 1.5), xlim = c(-2, 2), xlab = 'Sentiment', ylab = 'Density')
legend('topleft', lty = 1, legend = c('Russia', 'US'), col = c('red', 'black'))
lines(density(data_us$sentiment), col = 'black')
abline(v = mean(data_us$sentiment), lty = 2)  # dashed line marks the US mean
lines(density(data_ru$sentiment), col = 'red')
abline(v = mean(data_ru$sentiment), lty = 2, col = 'red')  # dashed line marks the Russia mean

Trends

Posts about Russia are consistently positive, while posts about the United States gradually became more negative.

plot(data_ru$time, data_ru$sentiment, pch = 19, col = adjustcolor('red', 0.1),
     ylim = c(-1, 1), main = 'Russia')
abline(h = 0)
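# Evaluate the LOESS fit on a time grid and draw it against that same grid
time_ru <- seq(0, max(data_ru$time))
lines(time_ru, predict(loess(sentiment ~ time, data_ru, span = 0.1),
                       newdata = data.frame(time = time_ru)), col = 'red')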

plot(data_us$time, data_us$sentiment, pch = 19, col = adjustcolor('black', 0.1),
     ylim = c(-1, 1), main = 'US')
abline(h = 0)
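# Evaluate the LOESS fit on a time grid and draw it against that same grid
time_us <- seq(0, max(data_us$time))
lines(time_us, predict(loess(sentiment ~ time, data_us, span = 0.1),
                       newdata = data.frame(time = time_us)), col = 'black')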