1. PREPARE

1a. Context

In a previous independent analysis (https://rpubs.com/CDNoonan/862191), I pulled tweets about mothers of children with disabilities from Twitter and contrasted the sentiment of tweets using the terms “mother(s)” and “disability” with that of tweets using “mother(s)” and “special needs”. The analysis seemed to indicate that sentiment surrounding mothers and disability was consistently more negative (according to all four lexicons) than sentiment surrounding mothers and special needs. These results may accurately represent differences in sentiment, or they may reflect flaws in the lexicons, which automatically assign a positive sentiment to “special” but a negative sentiment to words such as “disabled” and “disability”.

Human qualitative analysis of the tweets’ content would be one way to identify such miscategorization of sentiment. Another way might be to group words together by performing text mining on bi- or trigrams. The previous text mining analysis used n-grams of n = 1, meaning that sentiment was determined on a single-word basis; the sentiment of “special” would not necessarily have been associated with the sentiment of “needs.” In the analysis that follows, sentiment analysis of bigrams and trigrams will be performed and compared to the initial sentiment analysis to determine whether any differences result.
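As a small illustration of how the tokenization differs, here is a made-up one-line example (a sketch only; it assumes the dplyr and tidytext packages loaded in the set-up section below):

#(requires dplyr and tidytext, loaded in section 1c)
example <- tibble(text = "a special needs mom")

#unigrams: "special" and "needs" become separate tokens, scored independently
example %>% unnest_tokens(word, text)

#bigrams: "special needs" stays together as a single token
example %>% unnest_tokens(bigram, text, token = "ngrams", n = 2)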

1b. Guiding Questions

My guiding questions for the previous report were:

        1.  What is the overall sentiment of recent tweets on the topic of 
        mothers parenting children with disabilities?

        2.  Does sentiment vary based on keyword (disability vs. 
        special needs)?

        3.  Does sentiment vary by lexicon?
        

The previous project indicated that sentiment in tweets about mothers parenting children with disabilities was negative, while sentiment in tweets about parenting children with special needs was positive. Results varied by lexicon, but this positive/negative split appeared in every lexicon.

A fourth guiding question, which this analysis hopes to answer, is:

      4. Does sentiment of tweets on the topic of mothers parenting
      children with disabilities change, when bi- and trigrams are 
      analyzed?

1c. Set Up

In terms of set up, I will:

  1. Set up a new project in RStudio
  2. Load the necessary packages

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.0.5
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.5
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'tibble' was built under R version 4.0.5
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.5
## Warning: package 'purrr' was built under R version 4.0.5
## Warning: package 'stringr' was built under R version 4.0.5
## Warning: package 'forcats' was built under R version 4.0.5
library(tidyr)
library(ggplot2)
library(igraph)
## Warning: package 'igraph' was built under R version 4.0.5
library(ggraph)
## Warning: package 'ggraph' was built under R version 4.0.5
library(knitr)
## Warning: package 'knitr' was built under R version 4.0.5
library(textdata)
## Warning: package 'textdata' was built under R version 4.0.5
library(vader)
## Warning: package 'vader' was built under R version 4.0.5

2. WRANGLE

2a. Import Data

Next, I’ll read the data from the original project into my environment and assign it to a variable name.

dmtweets <- read_csv("data/dm_tweets.csv")
## Rows: 623 Columns: 91
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (31): created_at, screen_name, text, source, reply_to_screen_name, lang,...
## dbl (19): user_id, status_id, display_text_width, reply_to_status_id, reply_...
## lgl (41): is_quote, is_retweet, quote_count, reply_count, hashtags, symbols,...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
snmtweets <- read_csv("data/snm_tweets.csv")
## Rows: 491 Columns: 91
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (31): created_at, screen_name, text, source, reply_to_screen_name, lang,...
## dbl (19): user_id, status_id, display_text_width, reply_to_status_id, reply_...
## lgl (41): is_quote, is_retweet, quote_count, reply_count, hashtags, symbols,...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

The original search parameters pulled up to 5,000 tweets containing the relevant terms (for example: “mom”, “mother”, “special needs”, “special needs child”, “disability”, “disabled”).
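For reference, a pull like this is typically done with rtweet's search_tweets(); the sketch below is illustrative only, with hypothetical query strings standing in for the original project's exact searches:

#Illustrative sketch only: hypothetical queries, not the original search
library(rtweet)
dm_raw  <- search_tweets(q = '(mom OR mother) (disability OR disabled)', n = 5000, include_rts = FALSE)
snm_raw <- search_tweets(q = '(mom OR mother) "special needs"', n = 5000, include_rts = FALSE)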

2b. Tokenizing text into bigrams

Next, I’ll use tidytext functions to tokenize the text. This time, however, I’ll tokenize by bigram instead of unigram, using the unnest_tokens() function and specifying n = 2. I’ll look at the “disability” bigrams first.

dm_bigrams <- dmtweets %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

Now I’ll count to see common bigrams:

dm_bigrams %>%
  count(bigram, sort = TRUE)

The most frequent bigram is “disabled child”, with a frequency of 461. That seems pretty pertinent to my analysis, but the next few bigrams do not seem very relevant (for example, “http t.co”).

I’ll perform the same steps for the “special needs” tweets and then go about removing stop words.

snm_bigrams <- snmtweets %>%
  unnest_tokens(bigram, text, token = "ngrams" , n = 2)

Counting the top bigrams for “special needs” tweets:

snm_bigrams %>% 
  count(bigram, sort = TRUE)

As above, the top bigram (“special needs”) is very relevant; the next few are not.

2c. Remove stop words

Next, I will remove stop words. Because I’m working with bigrams, I can’t remove stop words directly: I first have to split each bigram into two columns with separate() and then drop stop words from each column with filter(). Then I’ll count the top bigrams again.

dm_bigrams_separated <- dm_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

dm_bigrams_filtered <- dm_bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

dm_bigram_counts <- dm_bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

dm_bigram_counts

Finally, I’ll combine the separated words with unite:

dm_bigrams_united <- dm_bigrams_filtered %>%
  unite(bigram, word1, word2, sep = " ")

dm_bigrams_united

Next, I’ll repeat the above steps for the “special needs” tweets:

snm_bigrams_separated <- snm_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

snm_bigrams_filtered <- snm_bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

snm_bigram_counts <- snm_bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

snm_bigram_counts

Interestingly, many of the top words here seem to have somewhat negative connotations (aggression, biting, anti). It will be interesting to see how the sentiment analysis turns out this time!

snm_bigrams_united <- snm_bigrams_filtered %>%
  unite(bigram, word1, word2, sep = " ")

snm_bigrams_united

3. VISUALIZE WORD NETWORKS

Next, as in the previous week’s Walkthrough, I’ll visualize a word network. To do so, I’ll create edges using the following three variables:

  1. from: the node an edge is coming from
  2. to: the node an edge is going towards
  3. weight: A numeric value associated with each edge

I’ll need to transform the two datasets (dm_bigram_counts and, separately, snm_bigram_counts) into these variables as follows: from is “word1”, to is “word2”, and weight is “n”.

I’ll use graph_from_data_frame to make the transformation:

dm_bigram_graph <- dm_bigram_counts %>%
  graph_from_data_frame()

dm_bigram_graph
## IGRAPH 1ca9897 DN-- 3079 2979 -- 
## + attr: name (v/c), n (e/n)
## + edges from 1ca9897 (vertex names):
##  [1] disabled       ->child           https          ->t.co           
##  [3] disabled       ->kid             real           ->life           
##  [5] ultimate       ->reality         severely       ->disabled       
##  [7] ashlacreme     ->lightshow10thpl diesel_dougie  ->promomcmc      
##  [9] fattrel        ->wale            lightshow10thpl->fattrel        
## [11] medusa_yq      ->ashlacreme      promomcmc      ->medusa_yq      
## [13] shyglizzy      ->diesel_dougie   kyliejenner    ->taylorswift13  
## [15] nickiminaj     ->rihanna         rihanna        ->thegirljt      
## + ... omitted several edges

Many bigrams only appear a few times, so I’ll keep only those appearing more than three times. This makes the visualization of the network clearer too.

dm_bigram_graph_filtered <- dm_bigram_counts %>%
  filter(n > 3) %>%
  graph_from_data_frame()

dm_bigram_graph_filtered
## IGRAPH 1ccad02 DN-- 55 45 -- 
## + attr: name (v/c), n (e/n)
## + edges from 1ccad02 (vertex names):
##  [1] disabled       ->child           https          ->t.co           
##  [3] disabled       ->kid             real           ->life           
##  [5] ultimate       ->reality         severely       ->disabled       
##  [7] ashlacreme     ->lightshow10thpl diesel_dougie  ->promomcmc      
##  [9] fattrel        ->wale            lightshow10thpl->fattrel        
## [11] medusa_yq      ->ashlacreme      promomcmc      ->medusa_yq      
## [13] shyglizzy      ->diesel_dougie   kyliejenner    ->taylorswift13  
## [15] nickiminaj     ->rihanna         rihanna        ->thegirljt      
## + ... omitted several edges

Next, I’ll visualize the word network:

set.seed(100)

a <- grid::arrow(type = "open", length = unit(.2, "inches"))

ggraph(dm_bigram_graph_filtered, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n, edge_width = n), 
                 show.legend = FALSE,
                 arrow = a, 
                 end_cap = circle(.07, 'inches'),
                 label_dodge = TRUE) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

Clearly there are some very relevant themes here and some fairly arbitrary ones!

I’ll repeat the process below for the “special needs” tweets. For the sake of comparing apples to apples, I’ll apply the same filter (n > 3).

snm_bigram_graph_filtered <- snm_bigram_counts %>%
  filter(n > 3) %>%
  graph_from_data_frame()

snm_bigram_graph_filtered
## IGRAPH 1f36494 DN-- 92 89 -- 
## + attr: name (v/c), n (e/n)
## + edges from 1f36494 (vertex names):
##  [1] https            ->t.co             adhd             ->special         
##  [3] aggression       ->https            anti             ->biting          
##  [5] arthritis        ->auto             autism           ->cerebral        
##  [7] auto             ->aggression       bite             ->protection      
##  [9] cerebral         ->palsy            child            ->autism          
## [11] etsy             ->shop             gnaws            ->hands           
## [13] palsy            ->arthritis        cerebralpalsytoys->https           
## [15] bitingmittens    ->protectivegloves child            ->https           
## + ... omitted several edges

And now I’ll create the visualization of the filtered graph:

set.seed(100)

a <- grid::arrow(type = "open", length = unit(.2, "inches"))

ggraph(snm_bigram_graph_filtered, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n, edge_width = n), 
                 show.legend = FALSE,
                 arrow = a, 
                 end_cap = circle(.07, 'inches'),
                 label_dodge = TRUE) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

This cluster of words seems much more relevant to the special needs community. There are issues relating to behavior, specific conditions, government benefits, teachers, parents, and estate planning. This word network also seems more dense than the one for disability tweets.

More information on creating these visualizations can be found in the ggraph documentation.

4. ANALYZING SENTIMENT

Finally, I want to return to the guiding questions of this secondary analysis. Now that we are dealing with bigrams, does sentiment vary between the two terms (disability vs. special needs) and/or does it vary by lexicon?

To start, I’ll filter the tweets by language, select the relevant columns, add a keyword column (“disability” vs. “special needs”), and relocate that column to the first position.

dm_tojoin <- dmtweets %>%
  filter(lang == "en") %>%
  select(screen_name, created_at, text) %>%
  mutate(keyword = "disability") %>%
  relocate(keyword)
snm_tojoin <- snmtweets %>%
  filter(lang == "en") %>%
  select(screen_name, created_at, text) %>%
  mutate(keyword = "special needs") %>%
  relocate(keyword)

Now I’ll join them together

tweets <- bind_rows(dm_tojoin, snm_tojoin)

Next I’ll tokenize the text, but again, looking for bigrams

all_bigrams <- tweets %>%
  unnest_tokens(bigram, text, token = "ngrams" , n = 2)

Removing those pesky stopwords…again.

all_bigrams_separated <- all_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

all_bigrams_filtered <- all_bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

all_bigram_counts <- all_bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

all_bigram_counts

Next, I’ll load the four lexicons so I can add sentiment values from each.

afinn <- get_sentiments("afinn")

bing <- get_sentiments("bing")

nrc <- get_sentiments("nrc")

loughran <- get_sentiments("loughran")

Analyzing sentiment via the afinn lexicon

#not working
#sentiment_afinn <- inner_join(all_bigrams, afinn, by = "bigram")

#sentiment_afinn

This join fails because the AFINN lexicon assigns values to single words (its key column is word), so there is no bigram column to join on.
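A possible workaround, sketched below, would be to approximate each bigram's AFINN score by joining the lexicon to the two words separately and summing the values (this uses the all_bigrams_separated object created above; words missing from the lexicon count as 0):

#Sketch only: per-word AFINN values summed into a bigram-level score
afinn_bigrams <- all_bigrams_separated %>%
  left_join(afinn, by = c("word1" = "word")) %>%
  rename(value1 = value) %>%
  left_join(afinn, by = c("word2" = "word")) %>%
  rename(value2 = value) %>%
  mutate(afinn_score = coalesce(value1, 0) + coalesce(value2, 0))

Rather than repeat that workaround for all four lexicons, I'm going to try the vader package, which scores whole strings of text, as a different approach.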

vader_allbigrams <- vader_df(all_bigrams$bigram)
## Warning in sentiments[i] <- senti_valence(wpe, i, item): number of items to
## replace is not a multiple of replacement length
## (identical warning repeated many times; duplicates omitted)

Next, I will attempt to visualize the compound scores by keyword.

vaderbigrams <- rename(vader_allbigrams, bigram = text)
sentiment_vader <- inner_join(all_bigrams, vaderbigrams, by = "bigram")
#I'll take a sample of this dataframe because it's so big
sentsample <- sample_n(sentiment_vader, 100)
#then plot the sample
ggplot(data = sentsample) +
  geom_point(mapping = aes(x = bigram, y = compound, color = keyword),
             alpha = 0.5)

According to Hutto, C. & Gilbert, E. (2014):

“The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a ‘normalized, weighted composite score’ is accurate.”

Typical threshold values are:

positive sentiment: compound score >= 0.05

neutral sentiment: (compound score > -0.05) and (compound score < 0.05)

negative sentiment: compound score <= -0.05
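As a quick check against these thresholds, the compound scores can be bucketed and counted by keyword; a minimal sketch using the sentiment_vader data frame created above (sentiment_category is just an illustrative column name):

#Sketch: bucket each compound score using the thresholds above
sentiment_vader %>%
  mutate(sentiment_category = case_when(
    compound >= 0.05 ~ "positive",
    compound <= -0.05 ~ "negative",
    TRUE ~ "neutral"
  )) %>%
  count(keyword, sentiment_category)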

If the above plot can be believed, sentiment is fairly neutral for both keywords (at least looking at bigrams alone). Below I want to take the mean of the compound scores for each keyword and compare the result.

sentiment_means <- sentiment_vader %>%
  group_by(keyword) %>%
  mutate(mean_by_keyword = mean(compound))
keyword_sentiment_means <- sentiment_means %>% 
  select(keyword, mean_by_keyword)

According to these calculations, the sentiment for bigrams associated with “disability” is 0.00539, while the sentiment for bigrams associated with “special needs” is 0.135.

Based on this new calculation, sentiment surrounding “special needs” is slightly POSITIVE, as in the initial unigram analysis. However, this time, tweets associated with “disability” are NEUTRAL, not negative as they were in the unigram analysis.

ggplot(data = keyword_sentiment_means) +
geom_col(mapping = aes(x = keyword, y = mean_by_keyword)) # +

  #ylim(-0.5, 0.5)

#Interestingly, when I tried to change the scale of the y axis
#(see the commented-out ylim above), the disability bar appeared close to 0.5
#rather than 0.005 and looked more positive than the special needs bar. This
#happens because keyword_sentiment_means still contains one row per bigram, so
#geom_col stacks the duplicated mean values instead of drawing one bar per mean.
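A sketch of how the bars could be made to match the reported means, collapsing to one row per keyword with distinct() before plotting:

#Sketch: one row per keyword so each bar shows the mean itself, not a stacked sum
keyword_sentiment_means %>%
  distinct(keyword, mean_by_keyword) %>%
  ggplot() +
  geom_col(mapping = aes(x = keyword, y = mean_by_keyword)) +
  ylim(-0.5, 0.5)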

5. CONCLUSION

Performing sentiment analysis using bigrams confirmed some results from the initial unigram analysis (https://rpubs.com/CDNoonan/862191) but challenged others.

In this analysis, bigram sentiment was positive for “special needs” tweets (as before), but neutral for “disability” tweets. Previous unigram analysis found sentiment surrounding “disability” to be negative.

There are some limitations to this analysis. The first is that a different method was used to analyze sentiment: the original analysis used four lexicons (afinn, bing, loughran, nrc), while this one used the vader package. Using a different method could itself have altered the results.

Another limitation is visible in the word networks: many of the bigrams in the “disability” network seem irrelevant to the topic. This suggests either flaws in the initial search or references to disability from unexpected posters (celebrity names, for example, were prominent in that network).