The data for this experiment comprise 500K English Twitter posts that link to the Sputnik News website and were published between 2017-07-09 and 2017-10-30. The methodological challenge here is to estimate the sentiment of words in very short social media texts.
The seed words are generic English sentiment words that have also been used to analyze Russian news agencies’ coverage of the Ukraine crisis.
seed <- c('good', 'nice', 'excellent', 'positive', 'fortunate', 'correct', 'superior',
'bad', 'nasty', 'poor', 'negative', 'unfortunate', 'wrong', 'inferior')
The sentiment of words in the corpus, excluding hashtags and user names, is estimated based on their cosine similarity to the seed words in an SVD-reduced semantic space. The following table shows the estimated sentiment scores:
DT::datatable(data.frame(word = names(syno3_norm), score = unname(syno3_norm)))
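The idea of scoring words by cosine similarity to seed words in an SVD-reduced space can be illustrated with a minimal base R sketch. It is not the code that produced `syno3_norm`: the toy documents, the two-word seed set `seed_toy`, the number of dimensions `k`, and the helper `cosine()` are all assumptions made for illustration.

```r
# Toy corpus (hypothetical): each element is the set of words in one document
words <- c('good', 'bad', 'peace', 'war', 'deal')
docs <- list(
  c('good', 'peace', 'deal'),
  c('good', 'peace'),
  c('bad', 'war'),
  c('bad', 'war', 'deal'),
  c('good', 'deal', 'peace'),
  c('bad', 'war')
)

# Binary document-feature matrix: rows are documents, columns are words
dfm <- t(sapply(docs, function(d) as.numeric(words %in% d)))
colnames(dfm) <- words

# Truncated SVD: keep k dimensions, so each column of `vecs` is a word vector
k <- 2
dec <- svd(dfm, nu = 0, nv = k)
vecs <- t(dec$v %*% diag(dec$d[1:k]))
colnames(vecs) <- words

# Cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Assumed toy seed words with polarity weights (+1 positive, -1 negative)
seed_toy <- c(good = 1, bad = -1)

# Score of a word: polarity-weighted mean of its similarity to the seed words
score <- sapply(words, function(w) {
  sims <- sapply(names(seed_toy), function(s) cosine(vecs[, w], vecs[, s]))
  mean(sims * seed_toy)
})
round(score, 2)
```

In this toy corpus, 'peace' always occurs with 'good' and 'war' always occurs with 'bad', so each inherits the score of the seed word it co-occurs with.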
Based on their hashtags, Twitter posts are classified as being about Russia or the United States in this example.
data_ru <- subset(data, russia)
data_us <- subset(data, america)
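How the logical columns `russia` and `america` were built is not shown. One way such flags could be derived from hashtags is sketched below; the data frame `toy`, its `text` column, and the hashtag patterns are hypothetical and are not the ones used in the analysis.

```r
# Hypothetical posts; the real `data` object and its columns are not shown here
toy <- data.frame(
  text = c('Talks resume in #Moscow #Russia',
           'New sanctions announced #USA #Trump',
           'Weather update #London',
           '#Russia and #USA hold summit'),
  stringsAsFactors = FALSE
)

# Assumed hashtag patterns; \\b keeps '#us' from matching longer tags such as '#using'
toy$russia  <- grepl('#(russia|moscow|putin)\\b', toy$text,
                     ignore.case = TRUE, perl = TRUE)
toy$america <- grepl('#(usa|us|america|trump)\\b', toy$text,
                     ignore.case = TRUE, perl = TRUE)

# Same subsetting step as above, applied to the toy data
toy_ru <- subset(toy, russia)
toy_us <- subset(toy, america)
```

With independent flags like these, a post that carries hashtags for both countries falls into both subsets, which is worth keeping in mind when comparing the two groups.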
The average sentiment score for Russia is 0.25 points higher than that for the United States, suggesting that posts on Russia tend to be more positive.
mean(data_ru$sentiment) - mean(data_us$sentiment)
## [1] 0.2507293
The spread of scores is slightly greater for Russia than for the United States (standard deviations of 0.41 and 0.40), suggesting that the content of posts on Russia is slightly more diverse.
sd(data_ru$sentiment)
## [1] 0.4114745
sd(data_us$sentiment)
## [1] 0.4037266
plot(NULL, ylim = c(0, 1.5), xlim = c(-2, 2), xlab = 'Sentiment', ylab = 'Density')
legend('topleft', lty = 1, legend = c('Russia', 'US'), col = c('red', 'black'))
lines(density(data_us$sentiment), col = 'black')
abline(v = mean(data_us$sentiment), lty = 2)
lines(density(data_ru$sentiment), col = 'red')
abline(v = mean(data_ru$sentiment), lty = 2, col = 'red')
Posts about Russia are consistently positive, while posts about the United States gradually became more negative.
plot(data_ru$time, data_ru$sentiment, pch = 19, col = adjustcolor('red', 0.1),
ylim = c(-1, 1), main = 'Russia')
abline(h = 0)
# Draw the smoothed trend against the time grid it was predicted on
time_ru <- seq(0, max(data_ru$time))
lines(time_ru, predict(loess(sentiment ~ time, data_ru, span = 0.1),
                       newdata = data.frame(time = time_ru)), col = 'red')
plot(data_us$time, data_us$sentiment, pch = 19, col = adjustcolor('black', 0.1),
ylim = c(-1, 1), main = 'US')
abline(h = 0)
# Draw the smoothed trend against the time grid it was predicted on
time_us <- seq(0, max(data_us$time))
lines(time_us, predict(loess(sentiment ~ time, data_us, span = 0.1),
                       newdata = data.frame(time = time_us)), col = 'black')
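The smoothing step in these plots can be tried in isolation on synthetic data. The sketch below assumes that `time` is a numeric index starting at 0. The simulated downward trend, the noise level, and the wider `span = 0.5` are arbitrary choices for a small sample, not values from the analysis.

```r
# Synthetic scores with a gradual downward trend (all values are made up)
set.seed(42)
toy <- data.frame(time = 0:99)
toy$sentiment <- 0.3 - 0.004 * toy$time + rnorm(100, sd = 0.2)

# Fit a local regression; a larger span gives a smoother curve
fit <- loess(sentiment ~ time, toy, span = 0.5)

# Predict on a regular time grid and keep the grid next to the fitted values
grid <- data.frame(time = seq(0, max(toy$time)))
grid$smooth <- predict(fit, newdata = grid)

# Plot raw scores and the smoothed trend on the same time axis
plot(toy$time, toy$sentiment, pch = 19, col = adjustcolor('black', 0.3),
     ylim = c(-1, 1), xlab = 'Time', ylab = 'Sentiment')
abline(h = 0)
lines(grid$time, grid$smooth)
```

Storing the grid and the predictions in one data frame makes it straightforward to pass both x and y to `lines()`, so the curve is drawn on the same time scale as the points.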