Twitter Sentiment Comparison

Introduction

One of my greatest joys in life is perusing Twitter. While I like to think I have curated my feed to tune out much of the noise of the mob, Twitter is undoubtedly the closest thing we have to a quantifiable global town square. Though I am not a judge in Delaware, I have immense curiosity regarding Twitter. Is Twitter the equivalent of an instant poll of public sentiment? How prevalent are bots on Twitter? How balanced is the Twitter user base in terms of political ideology? Is Twitter a nest of negativity? These are some of the questions I have often had about the platform.

My interest in data analysis has mostly been focused on numbers, so I thought it might be fun to dive into sentiment analysis, and Twitter seemed like a great target given my curiosity. I had three goals for this project in addition to my Twitter curiosities:

examine text analysis mechanisms and methods
evaluate the efficacy of sentiment analysis
interact with a large-scale API

library(rtweet)
library(dplyr)
library(tidyr)
library(tidytext)
library(ggplot2)
library(purrr)

Twitter API

Connecting to the Twitter API is pretty simple, assuming you have an active account and do not mind accessing data via your account. Other methods include App Authentication and Bot Authentication, which are recommended if doing heavy lifting. Regardless, you will have to sign up for a Twitter developer account located here. Directions from cran.r-project.org are quite helpful as well. Below is the simple authentication code leveraging a personal account.

auth_setup_default()

## Using default authentication available.
## Reading auth from '/Users/worthsmacbookair/Library/Preferences/org.R-project.R/R/rtweet/default.rds'

Sampling Data

While the search_tweets function is fairly straightforward, the nuances of Twitter can make pulling data for sentiment analysis tricky from a design perspective. There is an incredible number of potential confounders.

Here I decided to pull only 500 tweets for each subject of interest. The code used below should work for any n < 18,000. Any larger pull should set retryonratelimit = TRUE rather than leaving the argument at its default. Here we used “mixed” type tweets, which combine popular and recent tweets, as “popular” tweets alone did not provide a large enough sample size. The query also excludes retweets, and the language is set to English. Given their relevance, generalized tendency to polarize, and uniqueness of text, I chose Biden and Putin as guinea pigs for this first run. If an error pops up when running a query, just try rerunning the code, as this is usually temporary. I found the errors were more prevalent as n increased. After pulling the raw data from Twitter, the full_text column was selected, as it is the variable of interest for this sentiment analysis.

#Pulling A
A_en <- search_tweets("Putin", n=500, type = "mixed", include_rts = FALSE, 
                          lang = "en")

tweets.A = A_en %>% select(full_text)

head(tweets.A)

## # A tibble: 6 × 1
##   full_text                                                                     
##   <chr>                                                                         
## 1 "⚡️Russian Duma officials fined after calling for Putin's removal. \n\nRussia…
## 2 "Again, when faced w/ resolve, resistance, determined defense &amp; force he …
## 3 "A second municipal council in Moscow’s Lomonosovsky district voted on a simi…
## 4 "@spartyflyboy @secretsqrl123 what do you think will happen when Putin faces …
## 5 "@TimOBrien Am seeing Russian equipment losses that are absolutely staggering…
## 6 "@EwanMacKenna What a pathetic statement. Sad that Putin humiliated and prove…

#Pulling B
B_en <- search_tweets("Biden", n=500, type = "mixed", 
                              include_rts = FALSE, lang = "en")

tweets.B = B_en %>% select(full_text)
head(tweets.B)

## # A tibble: 6 × 1
##   full_text                                                                     
##   <chr>                                                                         
## 1 "BREAKING NOW: Reports coming in that AT LEAST 35 TRUMP ALLIES had their HOME…
## 2 "I live in a country whose national media aired a King Charles speech LIVE bu…
## 3 "Joe Biden jets away on yet another weekend vacation in Delaware.\n\nBiden ha…
## 4 "Speaking of division; VP Harris actually makes me grateful for President Bid…
## 5 "@chefbabyd @ThisisPstrange @NFL_Memes @TheAmitie It’s 9-11 today too and you…
## 6 "#trump loves #america , he’s proud of his country and has always supported i…
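The advice above to simply rerun a failed query can be automated. The sketch below is a generic retry wrapper in base R; with_retry, its arguments, and the flaky_pull stand-in are hypothetical names introduced here for illustration and are not part of rtweet. The stand-in fails twice before succeeding so the sketch runs without API access.

```r
# Hypothetical helper: call f() up to `attempts` times, pausing between tries.
with_retry <- function(f, attempts = 3, wait_seconds = 0) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)            # success: hand back the value
    }
    last_error <- result
    Sys.sleep(wait_seconds)     # give a temporary failure time to clear
  }
  stop("all ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Stand-in for an API call that fails twice before succeeding.
calls <- 0
flaky_pull <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary error")
  "tweets"
}

pulled <- with_retry(flaky_pull, attempts = 5)
```

In practice the function passed in would wrap the real query, for example function() search_tweets("Putin", n = 500, type = "mixed", include_rts = FALSE, lang = "en"), with a wait of a few seconds between attempts.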

Creating Function

Pre-Processing Data

The trade-offs in dealing with Twitter data for sentiment analysis are immense. The platform is an incredible glimpse into the real-time sentiment of a large, diverse portion of the population. However, the data can be full of trip wires. To reduce the associated risks, the data needs a good bit of pre-processing. Below is a function which accomplishes a reasonable portion of this.

Sentiment Tools

Before enumerating the steps carried out here, it is important to describe the “bing” sentiment method used. The method was chosen because its binary nature suits the simple comparison being attempted here. The tidytext package offers three general-purpose lexicons, and all are based on unigrams, that is, the analysis of single words:

bing is the simplest categorizing words in a binary fashion as either positive or negative
nrc broadens the scope a bit categorizing words into several factors like fear, joy, trust, etc
AFINN assigns words a positive or negative score ranging from -5 to 5
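To make the difference between binary and weighted scoring concrete, here is a minimal sketch. The two lexicons below are toy tables invented for illustration; they are not the real bing or AFINN entries, and the example tweet is assumed to be already tokenized.

```r
library(dplyr)

# Toy lexicons invented for illustration; NOT the real bing or AFINN entries.
toy_binary <- tibble(
  word = c("good", "superb", "bad", "catastrophic"),
  sentiment = c("positive", "positive", "negative", "negative")
)
toy_weighted <- tibble(
  word = c("good", "superb", "bad", "catastrophic"),
  value = c(1, 4, -1, -4)
)

# A pre-tokenized example tweet.
words <- tibble(word = c("superb", "speech", "bad", "timing"))

# Binary scoring: +1 per positive word, -1 per negative word.
binary_score <- words %>%
  inner_join(toy_binary, by = "word") %>%
  summarise(score = sum(ifelse(sentiment == "positive", 1, -1))) %>%
  pull(score)

# Weighted scoring: sum the per-word values.
weighted_score <- words %>%
  inner_join(toy_weighted, by = "word") %>%
  summarise(score = sum(value)) %>%
  pull(score)
```

With these toy values the binary score is 0 because one positive and one negative word cancel, while the weighted score is 3 because the positive word carries more weight. A binary lexicon therefore tends to produce more neutral totals.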

The function sentiment_bing below carries out several pre-processing steps and sets up scores for analysis, all of which comes in quite handy for replication. The steps this function executes are:

establishing tibbles
removing any links by targeting “http\\S+” with gsub
parsing text into individual words via unnest_tokens function
removing stop words which are words with little sentimental value such as pronouns and articles by means of anti_join
adding bing sentiment values via inner_join (note each run can take ~30 sec at n = 500)
counting and sorting the matched words, then ungrouping, to give one row per word for each tweet
score is assigned for each word in a tweet as n, the number of times the word appears in the tweet (often n = 1), multiplied by 1 if the word is positive and -1 if negative
sent.score is the aggregation of scores for words in a tweet
zero.type records whether the tweet contained no words from the lexicon (“Type 1”) or at least one (“Type 2”), which separates a score of 0 with no sentiment words from a score of 0 where positive and negative words cancel
returning the score, type, and word table together as a list

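sentiment_bing <- function(twt){
  # One row per sentiment-bearing word in the tweet
  twt_tbl = tibble(text = twt) %>%
    mutate(
      # Strip links before tokenizing
      stripped_text = gsub("http\\S+", "", text)
    ) %>%
    unnest_tokens(word, stripped_text) %>%
    anti_join(stop_words, by = "word") %>%               # drop stop words
    inner_join(get_sentiments("bing"), by = "word") %>%  # keep only words in the bing lexicon
    count(word, sentiment, sort = TRUE) %>%
    ungroup() %>%
    mutate(
      # Signed count: negative words subtract, positive words add
      score = case_when(
        sentiment == 'negative' ~ n * (-1),
        sentiment == 'positive' ~ n * 1)
    )

  # Tweet-level score: 0 when no word matched the lexicon
  sent.score = case_when(
    nrow(twt_tbl) == 0 ~ 0,
    nrow(twt_tbl) > 0 ~ sum(twt_tbl$score)
  )
  # "Type 1" = no sentiment words found; "Type 2" = at least one found
  zero.type = case_when(
    nrow(twt_tbl) == 0 ~ "Type 1",
    nrow(twt_tbl) > 0 ~ "Type 2"
  )
  list(score = sent.score, type = zero.type, twt_tbl = twt_tbl)
}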
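The link-removal step can be tried in isolation, and the same gsub approach extends to other Twitter-specific noise such as @mentions and the HTML-escaped ampersand (&amp;) visible in the sample tweets. The helper below is a hypothetical sketch, not part of the analysis above; clean_tweet and the example string are made up for illustration.

```r
# Hypothetical helper extending the link-stripping step in sentiment_bing.
clean_tweet <- function(text) {
  text <- gsub("http\\S+", "", text)                # links, as in the function above
  text <- gsub("@\\w+", "", text)                   # @mentions
  text <- gsub("&amp;", "and", text, fixed = TRUE)  # HTML-escaped ampersand
  text <- gsub("\\s+", " ", text)                   # collapse leftover whitespace
  trimws(text)
}

raw <- "@someone Losses &amp; gains https://t.co/abc123"
cleaned <- clean_tweet(raw)
```

Here cleaned is "Losses and gains". Whether removing mentions helps depends on the question being asked, so treat these extra steps as options rather than requirements.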

Function Application

The sentiment_bing function is then applied to the full_text column from the streamlined version of the data pulled from Twitter. The result for each tweet is a list holding the tweet's score, its type, and a tibble that itemizes each matched word along with its sentiment, the number of times n it was used, and the word's total score contribution. Again, this contribution is often 1 or -1, as a given word was typically not used more than once in a tweet.

A_sentiment <- lapply(tweets.A$full_text, function(x){sentiment_bing(x)})


B_sentiment <- lapply(tweets.B$full_text, function(x){sentiment_bing(x)})

head(A_sentiment)

## [[1]]
## [[1]]$score
## [1] -3
## 
## [[1]]$type
## [1] "Type 2"
## 
## [[1]]$twt_tbl
## # A tibble: 3 × 4
##   word     sentiment     n score
##   <chr>    <chr>     <int> <dbl>
## 1 dictator negative      1    -1
## 2 impeach  negative      1    -1
## 3 treason  negative      1    -1
## 
## 
## [[2]]
## [[2]]$score
## [1] -2
## 
## [[2]]$type
## [1] "Type 2"
## 
## [[2]]$twt_tbl
## # A tibble: 4 × 4
##   word       sentiment     n score
##   <chr>      <chr>     <int> <dbl>
## 1 deter      negative      1    -1
## 2 failure    negative      1    -1
## 3 resistance negative      1    -1
## 4 resolute   positive      1     1
## 
## 
## [[3]]
## [[3]]$score
## [1] -1
## 
## [[3]]$type
## [1] "Type 2"
## 
## [[3]]$twt_tbl
## # A tibble: 3 × 4
##   word       sentiment     n score
##   <chr>      <chr>     <int> <dbl>
## 1 criticism  negative      1    -1
## 2 rebuke     negative      1    -1
## 3 remarkable positive      1     1
## 
## 
## [[4]]
## [[4]]$score
## [1] -2
## 
## [[4]]$type
## [1] "Type 2"
## 
## [[4]]$twt_tbl
## # A tibble: 2 × 4
##   word  sentiment     n score
##   <chr> <chr>     <int> <dbl>
## 1 badly negative      1    -1
## 2 lie   negative      1    -1
## 
## 
## [[5]]
## [[5]]$score
## [1] -1
## 
## [[5]]$type
## [1] "Type 2"
## 
## [[5]]$twt_tbl
## # A tibble: 1 × 4
##   word   sentiment     n score
##   <chr>  <chr>     <int> <dbl>
## 1 losses negative      1    -1
## 
## 
## [[6]]
## [[6]]$score
## [1] -3
## 
## [[6]]$type
## [1] "Type 2"
## 
## [[6]]$twt_tbl
## # A tibble: 5 × 4
##   word     sentiment     n score
##   <chr>    <chr>     <int> <dbl>
## 1 corrupt  negative      1    -1
## 2 pathetic negative      1    -1
## 3 proven   positive      1     1
## 4 sad      negative      1    -1
## 5 weak     negative      1    -1

head(B_sentiment)

## [[1]]
## [[1]]$score
## [1] 0
## 
## [[1]]$type
## [1] "Type 2"
## 
## [[1]]$twt_tbl
## # A tibble: 2 × 4
##   word     sentiment     n score
##   <chr>    <chr>     <int> <dbl>
## 1 breaking negative      1    -1
## 2 trump    positive      1     1
## 
## 
## [[2]]
## [[2]]$score
## [1] -1
## 
## [[2]]$type
## [1] "Type 2"
## 
## [[2]]$twt_tbl
## # A tibble: 1 × 4
##   word   sentiment     n score
##   <chr>  <chr>     <int> <dbl>
## 1 danger negative      1    -1
## 
## 
## [[3]]
## [[3]]$score
## [1] 0
## 
## [[3]]$type
## [1] "Type 1"
## 
## [[3]]$twt_tbl
## # A tibble: 0 × 4
## # … with 4 variables: word <chr>, sentiment <chr>, n <int>, score <dbl>
## # ℹ Use `colnames()` to see all variable names
## 
## 
## [[4]]
## [[4]]$score
## [1] 0
## 
## [[4]]$type
## [1] "Type 2"
## 
## [[4]]$twt_tbl
## # A tibble: 2 × 4
##   word     sentiment     n score
##   <chr>    <chr>     <int> <dbl>
## 1 grateful positive      1     1
## 2 worse    negative      1    -1
## 
## 
## [[5]]
## [[5]]$score
## [1] 0
## 
## [[5]]$type
## [1] "Type 2"
## 
## [[5]]$twt_tbl
## # A tibble: 4 × 4
##   word          sentiment     n score
##   <chr>         <chr>     <int> <dbl>
## 1 blow          negative      1    -1
## 2 disrespecting negative      1    -1
## 3 master        positive      1     1
## 4 masters       positive      1     1
## 
## 
## [[6]]
## [[6]]$score
## [1] 4
## 
## [[6]]$type
## [1] "Type 2"
## 
## [[6]]$twt_tbl
## # A tibble: 6 × 4
##   word      sentiment     n score
##   <chr>     <chr>     <int> <dbl>
## 1 hates     negative      1    -1
## 2 loves     positive      1     1
## 3 pride     positive      1     1
## 4 proud     positive      1     1
## 5 supported positive      1     1
## 6 trump     positive      1     1

Aggregating Processed Data

The lists for each subject, A_sentiment and B_sentiment, are combined into one large tibble, Both_sentiment, by using bind_rows. Each subject is labeled for output, and the score and type elements are unlisted into columns before plotting and summarizing.

Both_sentiment <- bind_rows(
  tibble(
    Subject = "Putin",
    Score = unlist(map(A_sentiment, 'score')),
    type = unlist(map(A_sentiment, 'type'))
  ),
  tibble(
    Subject = "Biden",
    Score = unlist(map(B_sentiment, 'score')),
    type = unlist(map(B_sentiment, 'type'))
  )
)
head(Both_sentiment)

## # A tibble: 6 × 3
##   Subject Score type  
##   <chr>   <dbl> <chr> 
## 1 Putin      -3 Type 2
## 2 Putin      -2 Type 2
## 3 Putin      -1 Type 2
## 4 Putin      -2 Type 2
## 5 Putin      -1 Type 2
## 6 Putin      -3 Type 2
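The type column makes it possible to split zero-score tweets into those with no lexicon words at all ("Type 1") and those whose positive and negative words cancel ("Type 2"). A minimal sketch of that breakdown follows; toy_sentiment is an invented stand-in for Both_sentiment so the example runs on its own.

```r
library(dplyr)

# Toy stand-in for Both_sentiment; values are invented for illustration.
toy_sentiment <- tibble(
  Subject = c("A", "A", "A", "B", "B", "B"),
  Score   = c(0, 0, -2, 0, 1, 0),
  type    = c("Type 1", "Type 2", "Type 2", "Type 1", "Type 2", "Type 1")
)

# Among zero-score tweets, count how many had no lexicon words (Type 1)
# versus words that cancelled out (Type 2).
zero_breakdown <- toy_sentiment %>%
  filter(Score == 0) %>%
  count(Subject, type)
```

Running the same filter and count on Both_sentiment would show how much of the neutral peak comes from tweets the lexicon could not score at all.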

Visual Analysis

To visually compare sentiment, a basic histogram is used. Bins were set to 17 because initial runs resulted in scores ranging from -8 to 8, which is 17 integer values, and subsequent runs have not exceeded that range.

These histograms are not anomalies. Having run this analysis several times on multiple subjects, it was striking how close to normally distributed the sentiment scores were. Virtually every subject comparison tested, from political figures to religious figures, resulted in a roughly bell-shaped distribution of sentiment scores with a strong tendency to skew slightly negative. Across subjects the majority of tweets tended to be neutral, followed by slightly negative. The visual comparison below demonstrates the subtle differences in overall sentiment of the samples for each subject. Biden and Putin both skewed rather negative, as most subjects do. However, there were more positive-sentiment tweets about Biden than about Putin, which was ultimately the difference in this comparison.

ggplot(Both_sentiment, aes(x=Score, fill = Subject))+
  geom_histogram(bins = 17, alpha =.6)+
  facet_grid(~Subject)+
  theme_bw()
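The bins = 17 setting lines up with integer scores only when the data actually span -8 to 8. As an alternative sketch, fixing the bin width instead of the bin count gives one bar per integer score whatever the range turns out to be. The toy_scores data frame is invented for illustration.

```r
library(ggplot2)

# Invented integer scores, for illustration only.
toy_scores <- data.frame(
  Subject = rep(c("A", "B"), each = 5),
  Score   = c(-3, -1, 0, 0, 1, -2, 0, 0, 1, 2)
)

# binwidth = 1 with center = 0 puts one bar on each integer score,
# whatever the range of the data turns out to be.
p <- ggplot(toy_scores, aes(x = Score, fill = Subject)) +
  geom_histogram(binwidth = 1, center = 0, alpha = .6) +
  facet_grid(~Subject) +
  theme_bw()
```

Printing p draws the plot. The same two arguments could replace bins = 17 in the call above.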

Summary Statistics

Visual analysis lends credence to Biden-related Twitter sentiment being slightly more positive than Putin-related sentiment while still being slightly negative overall. The mean tweet score for each subject confirms this: -0.564 for Biden versus -0.798 for Putin, a gap of 0.234. Having run this analysis several times, it seems a difference in mean scores exceeding 0.1 signals a reasonably stark difference. Replicating the analysis and storing these mean scores presents an interesting route for future work on interpreting these scores.

While the means are the easiest descriptive statistic to interpret here, the other statistics support our general findings. Median scores tend to be 0, or neutral. Minimum scores very often exceeded maximum scores in absolute terms. First quartiles tended to be negative while third quartiles struggled to break the neutral threshold, further supporting the tendency for sentiment on Twitter to skew negative.

A_sentiment_summary <- Both_sentiment%>%
  filter(Subject=="Putin")%>%
  summary()
A_sentiment_summary

##    Subject              Score            type          
##  Length:500         Min.   :-8.000   Length:500        
##  Class :character   1st Qu.:-2.000   Class :character  
##  Mode  :character   Median : 0.000   Mode  :character  
##                     Mean   :-0.798                     
##                     3rd Qu.: 0.000                     
##                     Max.   : 5.000

B_sentiment_summary <- Both_sentiment%>%
  filter(Subject=="Biden")%>%
  summary()
B_sentiment_summary

##    Subject              Score            type          
##  Length:500         Min.   :-7.000   Length:500        
##  Class :character   1st Qu.:-1.000   Class :character  
##  Mode  :character   Median : 0.000   Mode  :character  
##                     Mean   :-0.564                     
##                     3rd Qu.: 0.000                     
##                     Max.   : 4.000
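For storing and comparing these statistics across runs, a grouped summary gives one row per subject instead of two printed tables. The sketch below uses invented scores; toy_sentiment, subject_summary, and mean_gap are hypothetical names, and share_negative is an extra statistic added here for illustration.

```r
library(dplyr)

# Invented scores, for illustration only.
toy_sentiment <- tibble(
  Subject = rep(c("A", "B"), each = 4),
  Score   = c(-2, -1, 0, 1, -1, 0, 0, 2)
)

subject_summary <- toy_sentiment %>%
  group_by(Subject) %>%
  summarise(
    mean_score     = mean(Score),
    median_score   = median(Score),
    share_negative = mean(Score < 0),   # fraction of tweets scoring below 0
    n_tweets       = n()
  )

# Difference in mean score, B minus A.
mean_gap <- diff(subject_summary$mean_score)
```

Applied to the output above, the same difference is -0.564 - (-0.798) = 0.234 in Biden's favor.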

Replication

The structure of the code allows for relatively easy replication of the analysis. In the code below, only the subjects of interest needed to change. Only the source code is featured below, with no messages or output from its evaluation aside from the visualization and summary statistics for comparison.

Below the analysis is replicated using the Federal Reserve and FBI as subjects given their relevance to current affairs. Many of the aforementioned trends are present here as well. The differences in sentiment can be subtle, with most tweets having neutral sentiment and both subjects skewing negative. It would appear the FBI’s involvement in investigating the former president has triggered more negative sentiment than the Federal Reserve deflating asset values as it combats inflation.

#Pulling A
A_en <- search_tweets("Federal Reserve", n=500, type = "mixed", include_rts = FALSE, 
                          lang = "en")

tweets.A = A_en %>% select(full_text)



#Pulling B
B_en <- search_tweets("FBI", n=500, type = "mixed", 
                              include_rts = FALSE, lang = "en")

tweets.B = B_en %>% select(full_text)


A_sentiment <- lapply(tweets.A$full_text, function(x){sentiment_bing(x)})


B_sentiment <- lapply(tweets.B$full_text, function(x){sentiment_bing(x)})

Both_sentiment <- bind_rows(
  tibble(
    Subject = "Federal Reserve",
    Score = unlist(map(A_sentiment, 'score')),
    type = unlist(map(A_sentiment, 'type'))
  ),
  tibble(
    Subject = "FBI",
    Score = unlist(map(B_sentiment, 'score')),
    type = unlist(map(B_sentiment, 'type'))
  )
)

ggplot(Both_sentiment, aes(x=Score, fill = Subject))+
  geom_histogram(bins = 17, alpha =.6)+
  facet_grid(~Subject)+
  theme_bw()

A_sentiment_summary <- Both_sentiment%>%
  filter(Subject=="Federal Reserve")%>%
  summary()
A_sentiment_summary

##    Subject              Score             type          
##  Length:482         Min.   :-5.0000   Length:482        
##  Class :character   1st Qu.:-1.0000   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character  
##                     Mean   :-0.2095                     
##                     3rd Qu.: 0.0000                     
##                     Max.   : 4.0000

B_sentiment_summary <- Both_sentiment%>%
  filter(Subject=="FBI")%>%
  summary()
B_sentiment_summary

##    Subject              Score            type          
##  Length:500         Min.   :-7.000   Length:500        
##  Class :character   1st Qu.:-1.000   Class :character  
##  Mode  :character   Median : 0.000   Mode  :character  
##                     Mean   :-0.632                     
##                     3rd Qu.: 0.000                     
##                     Max.   : 4.000
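Since only the subject names change between runs, the scoring and stacking steps can be folded into one function. This is a sketch under stated assumptions: compare_subjects and toy_scorer are hypothetical names, and the stand-in scorer simply counts exclamation marks so the example runs without the Twitter API or a lexicon.

```r
library(dplyr)
library(purrr)

# Hypothetical wrapper: score two sets of tweets and stack the results.
# `scorer` is any function returning list(score = ..., type = ...) for one
# tweet, which is the shape sentiment_bing returns.
compare_subjects <- function(texts_a, texts_b, label_a, label_b, scorer) {
  score_one <- function(texts, label) {
    scored <- lapply(texts, scorer)
    tibble(
      Subject = label,
      Score   = map_dbl(scored, "score"),
      type    = map_chr(scored, "type")
    )
  }
  bind_rows(score_one(texts_a, label_a), score_one(texts_b, label_b))
}

# Stand-in scorer: counts exclamation marks as "positive" hits.
toy_scorer <- function(twt) {
  hits <- as.numeric(lengths(regmatches(twt, gregexpr("!", twt, fixed = TRUE))))
  list(score = hits, type = if (hits == 0) "Type 1" else "Type 2")
}

both <- compare_subjects(
  c("great!", "meh"), c("wow!!", "ok", "fine"),
  "A", "B", toy_scorer
)
```

With the real pipeline the call would look like compare_subjects(tweets.A$full_text, tweets.B$full_text, "Federal Reserve", "FBI", sentiment_bing), and the result can go straight into the ggplot code above.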

Conclusion

My personal goals were accomplished here. Objectively evaluating textual data is new to me. Having read and analyzed Federal Reserve speeches for financial market action, I know the importance of sentiment and word usage. However, human analysis of text can be time-consuming and suffers from validity concerns given the subjectivity of individuals and the variance in their abilities.

The methods used here are subject to an incredible amount of confounding. Using this analysis as is to influence strategy and actions is not something I would recommend. Setting aside the likely presence of bots, I believe better validity can be achieved by increasing sample size, and surely there are better options for filtering the initial data pull than those used here. Experimenting with the choice of sentiment lexicon and filtering the text further in pre-processing could yield more reliable results too. Machine learning techniques such as neural networks could also be interesting for detecting patterns in wording and so better gauging sentiment.
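One concrete filtering step is suggested by the Biden output above, where "trump" is scored as a positive word even though in these tweets it is a surname. A minimal sketch of removing such subject-specific words before scoring follows; custom_stop_words is a hypothetical list, and tweet_words mirrors the word table from the first Biden tweet.

```r
library(dplyr)

# Word-level rows shaped like the twt_tbl output of sentiment_bing.
tweet_words <- tibble(
  word      = c("breaking", "trump"),
  sentiment = c("negative", "positive"),
  n         = c(1L, 1L),
  score     = c(-1, 1)
)

# Hypothetical list of subject-specific words to exclude from scoring.
custom_stop_words <- tibble(word = c("trump"))

filtered <- tweet_words %>%
  anti_join(custom_stop_words, by = "word")

filtered_score <- sum(filtered$score)
```

Dropping the surname moves this tweet's score from 0 to -1, which shows how a single ambiguous word can shift results in a unigram lexicon.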

There are a few conclusions from this basic sentiment analysis which I think deeper analysis would confirm:

Twitter sentiment is dynamic in nature. Regardless of subject, sentiment ebbs and flows. It would seem as though Horace was onto something roughly 2,000 years ago: the mob is indeed fickle.
Twitter sentiment tends to skew negative. Many a pundit asserts the rise of cynicism in recent years. A longitudinal study would be interesting to test that trend, and it seems likely to confirm that Twitter sentiment skews negative.
Distribution of sentiment scores was surprising. While far from perfectly normal, every replication resulted in a histogram reminiscent of a bell curve. This was a bit of a surprise given the polarization and tendency toward extreme speech for which online behavior is notorious. The bell shape follows from the bulk of tweets, regardless of subject, being neutral or near-neutral by sentiment score.