Analyzing Twitter Sentiment: Mothers of Children with Disabilities

# I had quite a bit of trouble getting RStudio Cloud to accept my lang == "en"
# filter in code chunks 5 & 6, so here I set the message language to English.
# I've never had to do this before, so it may not be necessary for others or
# even on a different day.
Sys.setlocale("LC_MESSAGES", "en_US.utf8")

## [1] "en_US.utf8"

PREPARE
1. Context:
  1. Working parents in the United States experience limited support systems due to a lack of affordable childcare, the absence of paid family leave, and abbreviated family leave (Ogrysko, 2019). Mothers disproportionately manage children’s needs, especially during their children’s early years, due to cultural expectations, the gender pay gap, and likely other factors as well (Kim, 2021; Ogrysko, 2019). As a result, mothers of young children face barriers to career advancement and personal wellbeing: their schedules are constrained by mothering duties, leaving less time for sleep, exercise, social engagement, and other self-care. The United States guarantees legal protection and accommodation for individuals with disabilities through legislation such as the Americans with Disabilities Act (ADA) and the Individuals with Disabilities Education Act (IDEA), and Medicaid provides subsidized health insurance for lower-income families with disabled children. “On the ground,” however, these programs can be time-consuming and confusing to navigate. Additionally, children with disabilities typically require frequent medical and therapy appointments and may have unpredictable episodes of illness or even hospitalization. Thus, mothers of children with disabilities face significant constraints on their time and are asked to balance many roles (Kim, 2021; Stewart, 2020). To better understand the current dialogue surrounding mothers of children with disabilities, I collected recent tweets containing keywords such as “mom” or “mother” combined with “disability” or “special needs.” I then analyzed their sentiment using the afinn, loughran, bing, and nrc lexicons. Lexicons are pre-existing collections of words, each with one or more sentiments attached. Some lexicons classify words along many dimensions (e.g., trust or fear), but all four used here also offer a basic positive/negative classification. Afinn provides this through numeric scores from -5 to 5.
Lexicons are built from particular collections of text (the Loughran lexicon, for example, was developed from financial documents) and therefore may not be accurate or valid in every context. Human review of the words and their sentiment values helps improve validity.
  2. My guiding questions for this report are:
    1. What is the overall sentiment of recent tweets on the topic of mothers parenting children with disabilities?
    2. Does sentiment vary based on keyword (disability vs. special needs)?
    3. Does sentiment vary by lexicon?
    Another question I am interested in is whether sentiment varies with the location of the Twitter user. I won’t address it in this project, but I may take it up in my final project for this course. More specifically, does sentiment differ in states with expansive free or low-cost pre-K programs?

    Some evidence suggests that greater access to low-cost early childhood education improves lifelong developmental and educational trajectories for children with disabilities (as well as for students from low-income families and English language learners). It would also be interesting to see whether such programs have a “spillover effect” for mothers of kids with disabilities. Are mothers’ experiences (as reflected in tweet sentiment) different in states with expansive pre-K programs than in states without them?
2. Set up: To begin, I’ll install the required packages. Then I load them with library().

# Install (or update) the development version of wordcloud2 from GitHub
remotes::install_github("lchiffon/wordcloud2")

## Skipping install of 'wordcloud2' from a github remote, the SHA1 (8a12a3b6) has not changed since last install.
##   Use `force = TRUE` to force installation

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)
library(tidyr)
library(rtweet)
library(writexl)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:readr':
## 
##     col_factor

        library(wordcloud2)

3.  Next, I'll store my API keys, use them to create an authentication token, and check that the token is loaded. Note: the secret keys are hidden.

## A previous code chunk stores the Twitter API keys. Because the keys are private,
# that chunk is hidden from the report with {r, include=FALSE}

## create a token from the stored keys (supplying access_token and
## access_secret means no browser login is needed)
        token <- create_token(
          app = app_name,
          consumer_key = api_key,
          consumer_secret = api_secret_key,
          access_token = access_token,
          access_secret = access_token_secret)
        ## check to see if the token is loaded
        get_token()

## <Token>
## <oauth_endpoint>
##  request:   https://api.twitter.com/oauth/request_token
##  authorize: https://api.twitter.com/oauth/authenticate
##  access:    https://api.twitter.com/oauth/access_token
## <oauth_app> SNAfoo
##   key:    clxr0mEFqC3FwV5ESDd8d3yNF
##   secret: <hidden>
## <credentials> oauth_token, oauth_token_secret
## ---

WRANGLE

2a. Importing Tweets
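1. First, I explored what Twitter returns for the #disability and #specialneeds hashtags. This exploratory code is commented out (see the note at the end of the chunk), so nothing is imported here.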

#disability_tweets <- search_tweets(q = "#disability", n=5000)

        #specialneeds_tweets <- search_tweets(q = "#specialneeds" , n=5000)

        #mom_disspecneeds_tweets <- search_tweets(q = "#disability OR #specialneeds 
         #                                      AND mom" ,
          #                                     n=5000,
           #                                    include_rts = FALSE)

       # kid_disspecneeds_tweets <- search_tweets(q = "#disabledchild OR #specialneeds 
        #                                       AND mom" ,
         #                                        n=5000,
          #                                       include_rts = FALSE)

        # There's a lot of overlap between the results of the previous two
        # searches, so the first set of terms seems to be sufficient.
        
        # I've commented this code out because it takes a long time during the knitting
        # process and I really don't need it for my analysis. This chunk is really
        # just me seeing what's out there on Twitter for these keywords

2.  Next, I'll create two search dictionaries (vectors of query strings): one for the keyword "disability" and one for the keyword phrase "special needs".

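#Next I'll create the dictionaries for 'special needs mom' and 'disabled
        #child mom'
        # Note: Twitter search treats a space between terms as AND and matches
        # text wrapped in double quotes as an exact phrase. Entries like
        # '"#specialneeds AND mother"' therefore likely match only that literal string.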

        specneedsmom_dictionary <- c("#specialneeds AND mom",
                             '"#specialneeds AND mother"',
                             '"special needs mom"',
                             '"special needs kid"',
                             '"special needs child"')

        snm_tweets <- search_tweets2(specneedsmom_dictionary,
                                      n=5000,
                                      include_rts = FALSE)

        diskidmom_dictionary <- c("#disabledchild AND mom",
                              '"#disabledchild AND mother"',
                              '"disabled child mom"',
                              '"disabled kid mom"',
                              '"disabled kid"',
                              '"disabled child"')

        dm_tweets <- search_tweets2(diskidmom_dictionary,
                                     n=5000,
                                     include_rts = FALSE)
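The quoting convention matters for these dictionaries. In Twitter search, a space between terms acts as AND, and text inside double quotes is matched as an exact phrase. The sketch below is a minimal illustration, assuming that convention. It uses a hypothetical helper, `build_queries()`, that quotes only the multi-word phrases. The query strings are illustrative and not a corrected version of the ones used above.

```r
# Hypothetical helper: multi-word phrases are wrapped in double quotes so that
# Twitter matches them exactly; hashtag + keyword queries are left unquoted
# because a space between terms already acts as AND.
build_queries <- function(phrases, hashtag_terms = character(0)) {
  c(hashtag_terms, sprintf('"%s"', phrases))
}

specneeds_queries <- build_queries(
  phrases = c("special needs mom", "special needs kid", "special needs child"),
  hashtag_terms = c("#specialneeds mom", "#specialneeds mother")
)

print(specneeds_queries)

# The resulting vector could then be passed to rtweet, e.g.
# search_tweets2(specneeds_queries, n = 5000, include_rts = FALSE)
```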

3.  Next, I'll save the tweet files to Excel. The standard Twitter search API only returns tweets from roughly the past 6 to 9 days, so search results change constantly. Saving them gives me a stable data set to return to if I want to do further analysis on these same tweets.

## Saving tweet files to Excel (the data folder must exist first)
        dir.create("data", showWarnings = FALSE)
        write_xlsx(snm_tweets, "data/snm_tweets.xlsx")
        write_xlsx(dm_tweets, "data/dm_tweets.xlsx")

    [2b. Tidying the text]{.ul}

4.  Here I'll filter tweets by language, select the relevant columns, add a keyword column ("disability" vs. "special needs"), and move that column to the first position.

#for disability
        dm_text <- dm_tweets %>%
          filter(lang == "en") %>%
          select(screen_name, created_at, text) %>%
          mutate(keyword = "disability") %>%
          relocate(keyword)

 #for special needs
        snm_text <- snm_tweets %>%
          filter(lang == "en") %>%
          select(screen_name, created_at, text) %>%
          mutate(keyword = "special needs") %>%
          relocate(keyword)

5.  Combining the data frames and looking at the head and tail of the result

tweets <- bind_rows(dm_text, snm_text)
        head(tweets)

## # A tibble: 6 × 4
##   keyword    screen_name     created_at          text                           
##   <chr>      <chr>           <dttm>              <chr>                          
## 1 disability NavyPrism       2022-02-04 15:24:35 "@AlliAlliG I actually havnt r…
## 2 disability eflask          2022-02-04 17:29:39 "@VermontStudent @michellebfay…
## 3 disability MacaroniMan2021 2022-02-04 15:49:54 "@MhairiHunter The Nazis kille…
## 4 disability CannaCaptain    2022-02-04 15:39:06 "@allan_cheapshot If you’re te…
## 5 disability MstrssVeronica  2022-02-04 15:29:28 "This doesn't offend me becaus…
## 6 disability NavyPrism       2022-02-04 15:24:35 "@AlliAlliG I actually havnt r…

        tail(tweets)

## # A tibble: 6 × 4
##   keyword       screen_name     created_at          text                        
##   <chr>         <chr>           <dttm>              <chr>                       
## 1 special needs RonSnowflake    2022-01-27 18:16:40 "@HaYanGamer420 @Jester2218…
## 2 special needs jeffreykniffin  2022-01-27 18:05:28 "@irbransmom Apparently she…
## 3 special needs FoxRothschild   2022-01-27 17:45:25 "Difficulties in divorce se…
## 4 special needs rnwalker        2022-01-27 17:30:05 "A #specialneeds #trust can…
## 5 special needs Brandon_Nedib   2022-01-27 16:58:08 "@kgeads17 @GovernorVA I ha…
## 6 special needs TheJayCalledLee 2022-01-27 13:37:14 "Fury over Muslim boy, 11, …
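Rows 1 and 6 of `head(tweets)` above are the same tweet. This is probably because `search_tweets2()` runs each query separately and the tweet matched more than one of them. Duplicates like this count the same words twice in every later step. Below is a minimal base-R sketch of removing exact duplicate rows. It uses a small, hypothetical data frame in place of `tweets`.

```r
# Toy data standing in for the combined `tweets` tibble (hypothetical values)
tweets_toy <- data.frame(
  keyword     = c("disability", "disability", "disability", "special needs"),
  screen_name = c("userA", "userB", "userA", "userC"),
  text        = c("first tweet", "second tweet", "first tweet", "third tweet"),
  stringsAsFactors = FALSE
)

# duplicated() flags every row that exactly repeats an earlier row;
# keeping the unflagged rows drops the repeats
tweets_dedup <- tweets_toy[!duplicated(tweets_toy), ]
nrow(tweets_dedup)  # 3 rows remain
```

In the dplyr pipeline, the equivalent would be `distinct()`. Once `status_id` is kept, `distinct(status_id, .keep_all = TRUE)` would also work.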

6.  Tokenizing the text

tweet_tokens <- 
          tweets %>%
          unnest_tokens(output = word, 
                        input = text, 
                        token = "tweets")

## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
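The sketch below is a rough, hypothetical base-R approximation of what splitting a tweet into lowercase word tokens involves. It is not the tokenizer that `unnest_tokens()` uses. It only mimics two behaviors visible in the word counts below: apostrophes disappear (so "I'm" becomes "im"), and hashtags such as "#etsy" stay intact.

```r
# Hypothetical approximation of tweet tokenization for a single tweet:
# lowercase, drop straight and curly apostrophes, then keep runs of letters
# and digits, optionally prefixed by # or @ so hashtags and mentions survive
simple_tweet_tokens <- function(text) {
  text <- tolower(text)
  text <- gsub("['\u2019]", "", text)
  regmatches(text, gregexpr("[#@]?[[:alnum:]]+", text))[[1]]
}

simple_tweet_tokens("I'm a proud #SpecialNeeds mom! @friend")
# returns c("im", "a", "proud", "#specialneeds", "mom", "@friend")
```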

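7.  Removing stop words, looking at the top word counts, and filtering out uninformative words: the search terms themselves, "amp" (left over from the HTML entity "&amp;"), "#etsy", and stray numbers.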

## Removing stop words
        tidy_tweets <-
          tweet_tokens %>%
          anti_join(stop_words, by = "word")

        ## Looking at top word counts
        count(tidy_tweets, word, sort = T)

## # A tibble: 6,390 × 2
##    word         n
##    <chr>    <int>
##  1 child      799
##  2 disabled   697
##  3 special    372
##  4 kid        272
##  5 amp         96
##  6 parent      94
##  7 dont        92
##  8 addition    85
##  9 bite        83
## 10 im          83
## # … with 6,380 more rows

        #Making the code more orderly and filtering out additional words
        tidy_tweets <-
          tweet_tokens %>%
          anti_join(stop_words, by = "word") %>%
          filter(!word %in% c("im", "special" , "amp" , "disabled",
                               "child", "kid" , "kids", "#etsy", "1",
                               "2", "3", "4"))
                 
        count(tidy_tweets, word, sort = T)

## # A tibble: 6,378 × 2
##    word                             n
##    <chr>                        <int>
##  1 parent                          94
##  2 dont                            92
##  3 addition                        85
##  4 bite                            83
##  5 shop                            83
##  6 share                           82
##  7 childautismcerebral             81
##  8 excited                         81
##  9 palsyarthritisautoaggression    81
## 10 care                            80
## # … with 6,368 more rows
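The stop-word and custom-word filters amount to keeping only the tokens that appear in neither list, and `count(word, sort = TRUE)` is a sorted frequency table. Here is a toy base-R version with made-up tokens and tiny stand-in lists. The real `stop_words` data has over a thousand entries.

```r
# Hypothetical tokens from a few tweets
tokens <- c("the", "child", "needs", "care", "and", "care",
            "amp", "im", "care", "needs")

# Tiny stand-ins for tidytext's stop_words and the custom exclusion list
stop_list   <- c("the", "and", "a", "of")
custom_list <- c("im", "amp", "child")

# Keep tokens that are in neither list (the anti_join + filter steps)
kept <- tokens[!tokens %in% c(stop_list, custom_list)]

# Count and sort, like count(word, sort = TRUE)
word_counts <- sort(table(kept), decreasing = TRUE)
word_counts
```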

    [2c. Sentiment Values]{.ul}

8.  Next, I'll load the sentiment lexicons. For every lexicon except bing, the textdata package asks me to confirm the download by selecting '1' in the console the first time it is used.

afinn <- get_sentiments("afinn")

        bing <- get_sentiments("bing")

        nrc <- get_sentiments("nrc")

        loughran <- get_sentiments("loughran")

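9.  Next, I'll join each lexicon to tidy_tweets with inner_join(). This keeps only the words that appear in the lexicon and adds that lexicon's sentiment label (or, for afinn, a numeric value) as a new column. Because nrc and loughran can assign more than one label to the same word ("worry" and "depending" below), a single word can produce several rows.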

sentiment_afinn <- inner_join(tidy_tweets, afinn, by = "word")

        sentiment_afinn

## # A tibble: 1,952 × 5
##    keyword    screen_name     created_at          word     value
##    <chr>      <chr>           <dttm>              <chr>    <dbl>
##  1 disability NavyPrism       2022-02-04 15:24:35 lol          3
##  2 disability eflask          2022-02-04 17:29:39 worry       -3
##  3 disability eflask          2022-02-04 17:29:39 supports     2
##  4 disability MacaroniMan2021 2022-02-04 15:49:54 killed      -3
##  5 disability MacaroniMan2021 2022-02-04 15:49:54 jokes        2
##  6 disability MacaroniMan2021 2022-02-04 15:49:54 hate        -3
##  7 disability MstrssVeronica  2022-02-04 15:29:28 offend      -2
##  8 disability MstrssVeronica  2022-02-04 15:29:28 offends     -2
##  9 disability MstrssVeronica  2022-02-04 15:29:28 ashamed     -2
## 10 disability NavyPrism       2022-02-04 15:24:35 lol          3
## # … with 1,942 more rows

        sentiment_bing <- inner_join(tidy_tweets, bing, by = "word")

        sentiment_bing

## # A tibble: 1,872 × 5
##    keyword    screen_name     created_at          word           sentiment
##    <chr>      <chr>           <dttm>              <chr>          <chr>    
##  1 disability eflask          2022-02-04 17:29:39 worry          negative 
##  2 disability eflask          2022-02-04 17:29:39 expensive      negative 
##  3 disability eflask          2022-02-04 17:29:39 supports       positive 
##  4 disability MacaroniMan2021 2022-02-04 15:49:54 killed         negative 
##  5 disability MacaroniMan2021 2022-02-04 15:49:54 hate           negative 
##  6 disability CannaCaptain    2022-02-04 15:39:06 invader        negative 
##  7 disability MstrssVeronica  2022-02-04 15:29:28 offend         negative 
##  8 disability MstrssVeronica  2022-02-04 15:29:28 profoundly     positive 
##  9 disability MstrssVeronica  2022-02-04 15:29:28 embarrassingly negative 
## 10 disability MstrssVeronica  2022-02-04 15:29:28 ashamed        negative 
## # … with 1,862 more rows

        sentiment_nrc <- inner_join(tidy_tweets, nrc, by = "word")

        sentiment_nrc

## # A tibble: 7,926 × 5
##    keyword    screen_name created_at          word    sentiment   
##    <chr>      <chr>       <dttm>              <chr>   <chr>       
##  1 disability NavyPrism   2022-02-04 15:24:35 time    anticipation
##  2 disability eflask      2022-02-04 17:29:39 worry   anticipation
##  3 disability eflask      2022-02-04 17:29:39 worry   fear        
##  4 disability eflask      2022-02-04 17:29:39 worry   negative    
##  5 disability eflask      2022-02-04 17:29:39 worry   sadness     
##  6 disability eflask      2022-02-04 17:29:39 argue   anger       
##  7 disability eflask      2022-02-04 17:29:39 argue   negative    
##  8 disability eflask      2022-02-04 17:29:39 offense anger       
##  9 disability eflask      2022-02-04 17:29:39 offense disgust     
## 10 disability eflask      2022-02-04 17:29:39 offense fear        
## # … with 7,916 more rows

        sentiment_loughran <- inner_join(tidy_tweets, loughran, by = "word")

        sentiment_loughran

## # A tibble: 958 × 5
##    keyword    screen_name     created_at          word       sentiment   
##    <chr>      <chr>           <dttm>              <chr>      <chr>       
##  1 disability eflask          2022-02-04 17:29:39 worry      negative    
##  2 disability eflask          2022-02-04 17:29:39 argue      negative    
##  3 disability eflask          2022-02-04 17:29:39 offense    litigious   
##  4 disability eflask          2022-02-04 17:29:39 adequately positive    
##  5 disability MacaroniMan2021 2022-02-04 15:49:54 depending  uncertainty 
##  6 disability MacaroniMan2021 2022-02-04 15:49:54 depending  constraining
##  7 disability MstrssVeronica  2022-02-04 15:29:28 offend     negative    
##  8 disability MstrssVeronica  2022-02-04 15:29:28 offends    negative    
##  9 disability LansleyAnna     2022-02-04 15:08:55 severely   negative    
## 10 disability LansleyAnna     2022-02-04 15:08:55 doubts     negative    
## # … with 948 more rows
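The joins above behave like an inner join with base R's `merge()`. Words missing from the lexicon are dropped, and a word with several labels yields one row per label. That is part of why `sentiment_nrc` has several times as many rows as the other joins. Here is a small sketch with a hypothetical, nrc-style lexicon.

```r
# Hypothetical tokens from two tweets
tokens <- data.frame(
  status_id = c("t1", "t1", "t2"),
  word      = c("worry", "happy", "table"),
  stringsAsFactors = FALSE
)

# Hypothetical nrc-style lexicon: one word can carry several labels
lexicon <- data.frame(
  word      = c("worry", "worry", "happy"),
  sentiment = c("fear", "negative", "positive"),
  stringsAsFactors = FALSE
)

# merge() with its default all = FALSE is an inner join:
# "table" has no lexicon entry and is dropped; "worry" appears twice
joined <- merge(tokens, lexicon, by = "word")
joined
```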

EXPLORE
1. Create a time series visualization

 ts_plot(tweets, by = "days") ##plot by days

        ts_plot(tweets, by = "hours") ## plot by hours

2.  Plot by groups of keywords

#plot by keyword (disability vs. special needs)
        ts_plot(dplyr::group_by(tweets, keyword), "hours")

        ts_plot(dplyr::group_by(tweets, keyword), "days")

3.  Analyzing sentiment with the **bing** lexicon: grouping by keyword, computing a net sentiment score (positive minus negative word count), and adding a lexicon variable (column) to the data frame.

# Bing: Creating a single sentiment score and adding a lexicon variable
        # (spread() from the tidyr package turns the sentiment column into
        # separate negative and positive columns containing the n counts;
        # newer tidyr code would use pivot_wider() for the same step)
        summary_bing <- sentiment_bing %>% 
          group_by(keyword) %>% 
          count(sentiment, sort = TRUE) %>% 
          spread(sentiment, n) %>%
          mutate(sentiment = positive - negative) %>%
          mutate(lexicon = "bing") %>%
          relocate(lexicon)

        summary_bing

## # A tibble: 2 × 5
## # Groups:   keyword [2]
##   lexicon keyword       negative positive sentiment
##   <chr>   <chr>            <int>    <int>     <int>
## 1 bing    disability         785      364      -421
## 2 bing    special needs      355      368        13
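The net score is simply the positive count minus the negative count within each keyword. Below is a base-R sketch of the same arithmetic on hypothetical rows. It uses `table()` in place of `count()` plus `spread()`.

```r
# Hypothetical word-level output of an inner join with bing
sentiment_toy <- data.frame(
  keyword   = c("disability", "disability", "disability",
                "special needs", "special needs"),
  sentiment = c("negative", "negative", "positive", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Cross-tabulate labels by keyword (base-R analogue of count() + spread())
counts <- table(sentiment_toy$keyword, sentiment_toy$sentiment)

# Net score per keyword: positive word count minus negative word count
net <- counts[, "positive"] - counts[, "negative"]
net
```

One difference worth knowing: `table()` reports 0 for a keyword with no words of one label. `spread()` leaves an `NA` there unless you pass `fill = 0`, and that `NA` would turn the subtraction into `NA`.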

4.  Repeating the steps above for the remaining lexicons:

    1.  afinn

 # for afinn, sum the numeric word values (-5 to 5) instead of counting labels
            summary_afinn <- sentiment_afinn %>% 
              group_by(keyword) %>% 
              summarise(sentiment = sum(value)) %>% 
              mutate(lexicon = "afinn") %>%
              relocate(lexicon)

            summary_afinn

## # A tibble: 2 × 3
##   lexicon keyword       sentiment
##   <chr>   <chr>             <dbl>
## 1 afinn   disability         -670
## 2 afinn   special needs       221
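The afinn and bing scores above are not on the same scale. Afinn sums word intensities from -5 to 5, while bing counts words. The two can even disagree in sign for the same set of words, as this toy calculation with hypothetical afinn values shows.

```r
# Hypothetical afinn values for the scored words in one keyword group
value <- c(-3, -3, 2, 1, 1, 1)

# afinn-style score: sum of intensities
weighted_score <- sum(value)                     # -1

# bing-style score: positive word count minus negative word count
count_score <- sum(value > 0) - sum(value < 0)   # 4 - 2 = 2
```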

    2.  loughran

# repeating bing steps above for loughran lexicon and filtering summary
            # to only see positive and negative values

            summary_loughran <- sentiment_loughran %>% 
              group_by(keyword) %>% 
              count(sentiment, sort = TRUE) %>% 
              spread(sentiment, n) %>%
              mutate(sentiment = positive - negative) %>%
              mutate(lexicon = "loughran") %>%
              relocate(lexicon)

            summary_loughran

## # A tibble: 2 × 9
## # Groups:   keyword [2]
##   lexicon  keyword       constraining litigious negative positive superfluous
##   <chr>    <chr>                <int>     <int>    <int>    <int>       <int>
## 1 loughran disability              21        52      408       59           1
## 2 loughran special needs           15        48      183      111          NA
## # … with 2 more variables: uncertainty <int>, sentiment <int>

            summary_loughran_2 <- summary_loughran %>%
              select(lexicon, keyword, negative, positive, sentiment)

            summary_loughran_2

## # A tibble: 2 × 5
## # Groups:   keyword [2]
##   lexicon  keyword       negative positive sentiment
##   <chr>    <chr>            <int>    <int>     <int>
## 1 loughran disability         408       59      -349
## 2 loughran special needs      183      111       -72

    3.  nrc

# repeating above steps for nrc lexicon; the select() step below keeps
            # only the "positive" and "negative" columns b/c the nrc
            # lexicon also contains emotions like "trust" and "sadness"

            summary_nrc <- sentiment_nrc %>% 
              group_by(keyword) %>% 
              count(sentiment, sort = TRUE) %>%
              spread(sentiment, n) %>%
              mutate(sentiment = positive - negative) %>%
              mutate(lexicon = "nrc") %>%
              relocate(lexicon)

            summary_nrc

## # A tibble: 2 × 13
## # Groups:   keyword [2]
##   lexicon keyword       anger anticipation disgust  fear   joy negative positive
##   <chr>   <chr>         <int>        <int>   <int> <int> <int>    <int>    <int>
## 1 nrc     disability      351          373     263   410   274      778      811
## 2 nrc     special needs   199          392     127   216   403      454      724
## # … with 4 more variables: sadness <int>, surprise <int>, trust <int>,
## #   sentiment <int>

            summary_nrc_2 <- summary_nrc %>%
              select(lexicon, keyword, negative, positive, sentiment)

            summary_nrc_2

## # A tibble: 2 × 5
## # Groups:   keyword [2]
##   lexicon keyword       negative positive sentiment
##   <chr>   <chr>            <int>    <int>     <int>
## 1 nrc     disability         778      811        33
## 2 nrc     special needs      454      724       270

MODEL
1. The “Modeling” step of the learning analytics workflow involves using statistical models to analyze data and, where possible, make predictions. I’ll also use this section to visualize how sentiment varies by lexicon and to examine word choice in tweets about mothering disabled or special needs children. Because I’m changing the order slightly from the Walkthrough, I’ll start with some “polishing” steps:
2. Polishing
  1. Step 1: Reorganizing the filters and keeping status_id so that sentiment can be scored tweet by tweet

dm_text <- dm_tweets %>%
              filter(lang == "en") %>%
              select(status_id, text) %>%
              mutate(keyword = "disability") %>%
              relocate(keyword)

snm_text <- snm_tweets %>%
              filter(lang == "en") %>%
              select(status_id, text) %>%
              mutate(keyword = "special needs") %>%
              relocate(keyword)

    2.  Step 2: merging data frames

 #merging the two data frames and taking a look
            tweets <- bind_rows(dm_text, snm_text)

            tweets

## # A tibble: 1,108 × 3
##    keyword    status_id           text                                          
##    <chr>      <chr>               <chr>                                         
##  1 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of …
##  2 disability 1489652384724836362 "@VermontStudent @michellebfay @selene_colbur…
##  3 disability 1489627282738143234 "@MhairiHunter The Nazis killed lots of disab…
##  4 disability 1489624563474436101 "@allan_cheapshot If you’re telling me that I…
##  5 disability 1489622140039188486 "This doesn't offend me because it's an imper…
##  6 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of …
##  7 disability 1489616967480877066 "@JoeMillerAS1 I think that ppl should only h…
##  8 disability 1489561390218203136 "I remember all the strange, embarrassing or …
##  9 disability 1489510120295788550 "@HSpearmano I would totes contribute better …
## 10 disability 1489451350534418432 "You CANNOT have the liberty to remove someon…
## # … with 1,098 more rows

            head(tweets)

## # A tibble: 6 × 3
##   keyword    status_id           text                                           
##   <chr>      <chr>               <chr>                                          
## 1 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of t…
## 2 disability 1489652384724836362 "@VermontStudent @michellebfay @selene_colburn…
## 3 disability 1489627282738143234 "@MhairiHunter The Nazis killed lots of disabl…
## 4 disability 1489624563474436101 "@allan_cheapshot If you’re telling me that In…
## 5 disability 1489622140039188486 "This doesn't offend me because it's an impers…
## 6 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of t…

            tail(tweets)

## # A tibble: 6 × 3
##   keyword       status_id           text                                        
##   <chr>         <chr>               <chr>                                       
## 1 special needs 1486765113155674118 "@HaYanGamer420 @Jester22183 @ChrisCampbell…
## 2 special needs 1486762296428974084 "@irbransmom Apparently she's never heard o…
## 3 special needs 1486757250735747077 "Difficulties in divorce settlements can in…
## 4 special needs 1486753389316907011 "A #specialneeds #trust can help care for a…
## 5 special needs 1486745351843454979 "@kgeads17 @GovernorVA I have a special nee…
## 6 special needs 1486694793996431369 "Fury over Muslim boy, 11, with special nee…

    3.  Step 3: analyzing sentiment from the afinn lexicon

#Cleaning up code for analyzing sentiment from each lexicon
            #afinn
            # words to drop: search terms, HTML leftovers, and stray numbers
            customwords <- c("im", "special", "amp", "disabled", "child",
                             "kid", "kids", "#etsy", "1", "2", "3", "4")

            sentiment_afinn <- tweets %>%
              unnest_tokens(output = word, 
                            input = text, 
                            token = "tweets")  %>% 
              anti_join(stop_words, by = "word") %>%
              filter(!word %in% customwords) %>%
              inner_join(afinn, by = "word")

## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.

            # Dr. J: wondering if I could have done filter(!word == "customwords") %>%
            # above instead

            sentiment_afinn

## # A tibble: 1,952 × 4
##    keyword    status_id           word     value
##    <chr>      <chr>               <chr>    <dbl>
##  1 disability 1489620910428487682 lol          3
##  2 disability 1489652384724836362 worry       -3
##  3 disability 1489652384724836362 supports     2
##  4 disability 1489627282738143234 killed      -3
##  5 disability 1489627282738143234 jokes        2
##  6 disability 1489627282738143234 hate        -3
##  7 disability 1489622140039188486 offend      -2
##  8 disability 1489622140039188486 offends     -2
##  9 disability 1489622140039188486 ashamed     -2
## 10 disability 1489620910428487682 lol          3
## # … with 1,942 more rows
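On the question in the comment above: `filter(!word == "customwords")` would compare each word with the literal string "customwords". It would therefore remove nothing, unless a tweet contained that exact word. `%in%`, by contrast, tests membership in the vector stored in `customwords`. Here is a base-R illustration with hypothetical words.

```r
customwords <- c("im", "amp", "child")
words <- c("im", "care", "child", "parent")

# == compares each word with the single literal string "customwords";
# nothing matches, so every word is kept
words[!words == "customwords"]   # "im" "care" "child" "parent"

# %in% checks membership in the vector stored in customwords
words[!words %in% customwords]   # "care" "parent"
```

The same logic applies inside `filter()`. `filter(!word %in% customwords)` works as long as `customwords` lists every word to drop.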

            afinn_score <- sentiment_afinn %>% 
              group_by(keyword, status_id) %>% 
              summarise(value = sum(value))

## `summarise()` has grouped output by 'keyword'. You can override using the
## `.groups` argument.

            afinn_score

## # A tibble: 901 × 3
## # Groups:   keyword [2]
##    keyword    status_id           value
##    <chr>      <chr>               <dbl>
##  1 disability 1486695407962972164     4
##  2 disability 1486696333209653250     0
##  3 disability 1486699099701264384     3
##  4 disability 1486715162333573125     4
##  5 disability 1486715666556018701    -5
##  6 disability 1486719514221834250    -4
##  7 disability 1486723057536290821    -2
##  8 disability 1486748881920552962     9
##  9 disability 1486750158570209280     1
## 10 disability 1486760075444330501    -3
## # … with 891 more rows

            afinn_sentiment <- afinn_score %>%
              filter(value != 0) %>%
              mutate(sentiment = if_else(value < 0, "negative", "positive"))

            afinn_sentiment

## # A tibble: 858 × 4
## # Groups:   keyword [2]
##    keyword    status_id           value sentiment
##    <chr>      <chr>               <dbl> <chr>    
##  1 disability 1486695407962972164     4 positive 
##  2 disability 1486699099701264384     3 positive 
##  3 disability 1486715162333573125     4 positive 
##  4 disability 1486715666556018701    -5 negative 
##  5 disability 1486719514221834250    -4 negative 
##  6 disability 1486723057536290821    -2 negative 
##  7 disability 1486748881920552962     9 positive 
##  8 disability 1486750158570209280     1 positive 
##  9 disability 1486760075444330501    -3 negative 
## 10 disability 1486761340475740163    -2 negative 
## # … with 848 more rows

            afinn_ratio <- afinn_sentiment %>% 
              group_by(keyword) %>% 
              count(sentiment) %>% 
              spread(sentiment, n) %>%
              mutate(ratio = negative/positive)

            afinn_ratio

## # A tibble: 2 × 4
## # Groups:   keyword [2]
##   keyword       negative positive ratio
##   <chr>            <int>    <int> <dbl>
## 1 disability         313      175 1.79 
## 2 special needs      149      221 0.674
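The per-tweet logic above can be sketched in base R with `aggregate()`: sum each tweet's word values, drop tweets that net to zero, label the rest, then divide the negative count by the positive count. The data below are hypothetical. Note that a duplicated tweet would have its words summed twice, which doubles its score but does not change its sign.

```r
# Hypothetical word-level afinn rows: several words per tweet
word_scores <- data.frame(
  keyword   = c(rep("disability", 6), rep("special needs", 4)),
  status_id = c("t1", "t1", "t2", "t3", "t3", "t6", "t4", "t4", "t5", "t7"),
  value     = c(-3, 2, 3, -2, 2, -4, 2, 1, -1, 1),
  stringsAsFactors = FALSE
)

# 1. One score per tweet: sum word values within keyword and status_id
tweet_scores <- aggregate(value ~ keyword + status_id,
                          data = word_scores, FUN = sum)

# 2. Drop neutral tweets (t3 nets to zero) and label the rest
tweet_scores <- tweet_scores[tweet_scores$value != 0, ]
tweet_scores$sentiment <- ifelse(tweet_scores$value < 0, "negative", "positive")

# 3. Negative-to-positive ratio per keyword
tab <- table(tweet_scores$keyword, tweet_scores$sentiment)
ratio <- tab[, "negative"] / tab[, "positive"]
ratio
```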

3.  Keyword Differences: graphing positive versus negative tweets for the two keywords

 #For keyword 'disability'
        afinn_counts_dis <- afinn_sentiment %>%
          group_by(keyword) %>% 
          count(sentiment) %>%
          filter(keyword == "disability")

        afinn_counts_dis %>%
          ggplot(aes(x="", y=n, fill=sentiment)) +
          geom_bar(width = .6, stat = "identity") +
          labs(title = "Disability, Disabled Child, & Mom",
               subtitle = "Proportion of Positive & Negative Tweets") +
          coord_polar(theta = "y") +
          theme_void()

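    Sentiment is decidedly more negative with the "disability" keyword than with the "special needs" keyword phrase (below). Of 488 scored "disability" tweets, 313 (about 64%) are negative, compared with 149 of 370 "special needs" tweets (about 40%).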

#Repeat for special needs

        afinn_counts_sn <- afinn_sentiment %>%
          group_by(keyword) %>% 
          count(sentiment) %>%
          filter(keyword == "special needs")

        afinn_counts_sn

## # A tibble: 2 × 3
## # Groups:   keyword [1]
##   keyword       sentiment     n
##   <chr>         <chr>     <int>
## 1 special needs negative    149
## 2 special needs positive    221

        afinn_counts_sn %>%
          ggplot(aes(x="", y=n, fill=sentiment)) +
          geom_bar(width = .6, stat = "identity") +
          labs(title = "Special Needs and Mom",
               subtitle = "Proportion of Positive & Negative Tweets") +
          coord_polar(theta = "y") +
          theme_void()
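The two pie charts show proportions. The underlying shares can be computed directly from the tweet counts in `afinn_ratio`, as in this short base-R sketch.

```r
# Tweet-level counts from afinn_ratio above (afinn, non-neutral tweets)
counts <- rbind(
  disability      = c(negative = 313, positive = 175),
  `special needs` = c(negative = 149, positive = 221)
)

# Row-wise proportions: each keyword's shares sum to 1
shares <- prop.table(counts, margin = 1)
round(shares, 3)
```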

4.  Calculating positive and negative counts for each lexicon and comparing them visually.

# Creating a "summary" data frame for each lexicon that counts positive and negative words by keyword.

            summary_afinn3 <- sentiment_afinn %>% 
              group_by(keyword) %>% 
              filter(value != 0) %>%
              mutate(sentiment = if_else(value < 0, "negative", "positive")) %>% 
              count(sentiment, sort = TRUE) %>% 
              mutate(method = "afinn")

            summary_bing3 <- sentiment_bing %>% 
              group_by(keyword) %>% 
              count(sentiment, sort = TRUE) %>% 
              mutate(method = "bing")

            summary_nrc3 <- sentiment_nrc %>% 
              filter(sentiment %in% c("positive", "negative")) %>%
              group_by(keyword) %>% 
              count(sentiment, sort = TRUE) %>% 
              mutate(method = "nrc") 

            summary_loughran3 <- sentiment_loughran %>% 
              filter(sentiment %in% c("positive", "negative")) %>%
              group_by(keyword) %>% 
              count(sentiment, sort = TRUE) %>% 
              mutate(method = "loughran")

        Next, I'll combine the lexicon summaries into one data frame, add the total number of scored words for each method and keyword, and visualize overall sentiment in a graph.

#Combining lexicon summaries to compare positive and negative sentiment scores in each lexicon.
            summary_sentiment <- bind_rows(summary_afinn3,
                                           summary_bing3,
                                           summary_nrc3,
                                           summary_loughran3) %>%
              arrange(method, keyword) %>%
              relocate(method)

            total_counts <- summary_sentiment %>%
              group_by(method, keyword) %>%
              summarise(total = sum(n))

## `summarise()` has grouped output by 'method'. You can override using the
## `.groups` argument.

            sentiment_counts <- left_join(summary_sentiment, total_counts)

## Joining, by = c("method", "keyword")

            sentiment_counts

## # A tibble: 16 × 5
## # Groups:   keyword [2]
##    method   keyword       sentiment     n total
##    <chr>    <chr>         <chr>     <int> <int>
##  1 afinn    disability    negative    710  1164
##  2 afinn    disability    positive    454  1164
##  3 afinn    special needs positive    463   788
##  4 afinn    special needs negative    325   788
##  5 bing     disability    negative    785  1149
##  6 bing     disability    positive    364  1149
##  7 bing     special needs positive    368   723
##  8 bing     special needs negative    355   723
##  9 loughran disability    negative    408   467
## 10 loughran disability    positive     59   467
## 11 loughran special needs negative    183   294
## 12 loughran special needs positive    111   294
## 13 nrc      disability    positive    811  1589
## 14 nrc      disability    negative    778  1589
## 15 nrc      special needs positive    724  1178
## 16 nrc      special needs negative    454  1178

        Positive and Negative Sentiment by Lexicon

        | Method   | Keyword       | Sentiment | n   |
        |----------|---------------|-----------|-----|
        | afinn    | disability    | negative  | 758 |
        | afinn    | disability    | positive  | 471 |
        | afinn    | special needs | positive  | 509 |
        | afinn    | special needs | negative  | 358 |
        | bing     | disability    | negative  | 819 |
        | bing     | disability    | positive  | 398 |
        | bing     | special needs | negative  | 407 |
        | bing     | special needs | positive  | 394 |
        | loughran | disability    | negative  | 476 |
        | loughran | disability    | positive  | 62  |
        | loughran | special needs | negative  | 210 |
        | loughran | special needs | positive  | 134 |
        | nrc      | disability    | negative  | 901 |
        | nrc      | disability    | positive  | 830 |
        | nrc      | special needs | positive  | 713 |
        | nrc      | special needs | negative  | 481 |

 # Converting the sentiment counts to percentages for easier visualization
        sentiment_percents <- sentiment_counts %>%
          mutate(percent = n/total * 100)

        sentiment_percents

## # A tibble: 16 × 6
## # Groups:   keyword [2]
##    method   keyword       sentiment     n total percent
##    <chr>    <chr>         <chr>     <int> <int>   <dbl>
##  1 afinn    disability    negative    710  1164    61.0
##  2 afinn    disability    positive    454  1164    39.0
##  3 afinn    special needs positive    463   788    58.8
##  4 afinn    special needs negative    325   788    41.2
##  5 bing     disability    negative    785  1149    68.3
##  6 bing     disability    positive    364  1149    31.7
##  7 bing     special needs positive    368   723    50.9
##  8 bing     special needs negative    355   723    49.1
##  9 loughran disability    negative    408   467    87.4
## 10 loughran disability    positive     59   467    12.6
## 11 loughran special needs negative    183   294    62.2
## 12 loughran special needs positive    111   294    37.8
## 13 nrc      disability    positive    811  1589    51.0
## 14 nrc      disability    negative    778  1589    49.0
## 15 nrc      special needs positive    724  1178    61.5
## 16 nrc      special needs negative    454  1178    38.5
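The `summarise()` and `left_join()` steps above first compute each keyword's total and then divide. As a design alternative, a grouped `mutate()` can do both at once, because `sum(n)` inside a grouped `mutate()` is evaluated separately for each group. This sketch reuses the four afinn rows printed above; the object names `afinn_rows` and `afinn_percents` are illustrative.

```r
library(dplyr)

# The afinn counts printed in the output above
afinn_rows <- tibble(
  method    = "afinn",
  keyword   = c("disability", "disability", "special needs", "special needs"),
  sentiment = c("negative", "positive", "positive", "negative"),
  n         = c(710L, 454L, 463L, 325L)
)

afinn_percents <- afinn_rows %>%
  group_by(method, keyword) %>%
  mutate(total   = sum(n),              # per-group total, so no join is needed
         percent = n / total * 100) %>%
  ungroup()

afinn_percents
```

The resulting `total` and `percent` columns match the afinn rows of `sentiment_percents` (1164 and 61.0/39.0 for disability, 788 and 58.8/41.2 for special needs), and the grouped `mutate()` avoids the implicit-join message.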

sentiment_percents %>%
          ggplot(aes(x = keyword, y = percent, fill=sentiment)) +
          geom_bar(width = .8, stat = "identity") +
          facet_wrap(~method, ncol = 1) +
          coord_flip() +
          labs(title = "Public Sentiment on Twitter", 
               subtitle = "Disability vs. Special Needs and Mom",
               x = "Keyword", 
               y = "Percentage of Words")

        summary_sentiment

## # A tibble: 16 × 4
## # Groups:   keyword [2]
##    method   keyword       sentiment     n
##    <chr>    <chr>         <chr>     <int>
##  1 afinn    disability    negative    710
##  2 afinn    disability    positive    454
##  3 afinn    special needs positive    463
##  4 afinn    special needs negative    325
##  5 bing     disability    negative    785
##  6 bing     disability    positive    364
##  7 bing     special needs positive    368
##  8 bing     special needs negative    355
##  9 loughran disability    negative    408
## 10 loughran disability    positive     59
## 11 loughran special needs negative    183
## 12 loughran special needs positive    111
## 13 nrc      disability    positive    811
## 14 nrc      disability    negative    778
## 15 nrc      special needs positive    724
## 16 nrc      special needs negative    454

4.  Visualizing Word Choice Through Word Clouds

    1.  Overall wordcloud

# Now I want to create a wordcloud of the tweets. However, there are
            # too many words to visualize, so I'll choose the top 50.
            top_tokens_all <- tidy_tweets %>%
              count(word, sort = TRUE) %>%
              top_n(50)

## Selecting by n

            wordcloud2(top_tokens_all)

            # Some words are irrelevant (1, 2, hes), but the cloud does give an
            # overall glimpse of the top words. I'm interested to see that "support"
            # is in the top 50, since support is an aspect of the phenomenon that
            # interests me.

# Below I filter out some of these uninformative words to see whether the
            # cloud looks any different (in revising, I think I filtered these out a
            # little earlier)

            top_tokens_all <- tidy_tweets %>%
              filter(!word %in% c("im", "special" , "amp" , "disabled",
                               "child", "kid" , "kids", "#etsy", "1",
                               "2", "3", "4")) %>%
              count(word, sort = TRUE) %>%
              top_n(50)

## Selecting by n

            wordcloud2(top_tokens_all)
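A note on the top-50 cut: `top_n()` keeps ties, so when many words share the count at the cut-off (as with the several words tied at 81 in the special needs output), the result can exceed 50 rows. dplyr now treats `top_n()` as superseded by `slice_max()`, which exposes the tie behavior through `with_ties`. A minimal sketch with hypothetical counts, cutting to the top 3 to keep it small:

```r
library(dplyr)

# Hypothetical word counts with a four-way tie at the cut-off
word_counts <- tibble(
  word = c("share", "shop", "bite", "excited", "school"),
  n    = c(81L, 81L, 81L, 81L, 37L)
)

# Ties kept (the default, like top_n()): all four 81-count words survive
with_ties <- word_counts %>% slice_max(n, n = 3)

# with_ties = FALSE enforces an exact row count
exact <- word_counts %>% slice_max(n, n = 3, with_ties = FALSE)

nrow(with_ties)
nrow(exact)
```

Here `with_ties` has 4 rows and `exact` has 3. For a word cloud either behavior is reasonable; `with_ties = FALSE` simply guarantees exactly 50 words.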

    2.  "Disability" wordcloud

##Tokenizing text - Disability and Mom
            tweet_tokens_dm <- 
              dm_tweets %>%
              unnest_tokens(output = word, 
                            input = text, 
                            token = "tweets")

## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.

            tidy_tweets_dm <-
              tweet_tokens_dm %>%
              anti_join(stop_words, by = "word") %>%
              filter(!word %in% c("im", "special" , "amp" , "disabled",
                               "child", "kid" , "kids", "#etsy", "1",
                               "2", "3", "4"))

            count(tidy_tweets_dm, word, sort = TRUE)

## # A tibble: 4,436 × 2
##    word         n
##    <chr>    <int>
##  1 dont        62
##  2 parent      59
##  3 life        54
##  4 people      51
##  5 support     45
##  6 children    43
##  7 school      37
##  8 hard        36
##  9 care        35
## 10 woman       35
## # … with 4,426 more rows

            #selecting top 50 disability & mom tokens
            top_tokens_dm <- tidy_tweets_dm %>%
              count(word, sort = TRUE) %>%
              top_n(50)

## Selecting by n

            wordcloud2(top_tokens_dm)

    3.  "Special Needs" wordcloud

##Tokenizing text - Special Needs and Mom
            tweet_tokens_snm <- 
              snm_tweets %>%
              unnest_tokens(output = word, 
                            input = text, 
                            token = "tweets")

## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.

            tidy_tweets_snm <-
              tweet_tokens_snm %>%
              anti_join(stop_words, by = "word") %>%
              filter(!word %in% c("im", "special" , "amp" , "disabled",
                               "child", "kid" , "kids", "#etsy", "1",
                               "2", "3", "4"))
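The same custom filter vector appears in three chunks. One way to keep it consistent is to define it once as a one-column tibble and remove it with `anti_join()`, just like `stop_words`. A minimal sketch on hypothetical tokens; `custom_stop` and `remove_custom_stop()` are illustrative names, not part of tidytext.

```r
library(dplyr)

# Words excluded in every word cloud above, defined once
custom_stop <- tibble(word = c("im", "special", "amp", "disabled",
                               "child", "kid", "kids", "#etsy",
                               "1", "2", "3", "4"))

# Hypothetical helper: drop the custom words from a one-word-per-row tibble
remove_custom_stop <- function(tokens) {
  tokens %>% anti_join(custom_stop, by = "word")
}

# Hypothetical tokens for illustration
toy_tokens <- tibble(word = c("support", "kids", "school", "#etsy", "2", "care"))

remove_custom_stop(toy_tokens)
```

Because `anti_join()` keeps the rows of its first argument in their original order, the result is "support", "school", "care". Changing the list in one place then updates all three word clouds.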

            count(tidy_tweets_snm, word, sort = TRUE)

## # A tibble: 2,875 × 2
##    word                             n
##    <chr>                        <int>
##  1 addition                        82
##  2 bite                            81
##  3 childautismcerebral             81
##  4 excited                         81
##  5 palsyarthritisautoaggression    81
##  6 share                           81
##  7 shop                            81
##  8 #protectivegloves               67
##  9 #cerebralpalsytoys              62
## 10 #bitingmittens                  55
## # … with 2,865 more rows

            #selecting top 50 special needs & mom tokens
            top_tokens_snm <- tidy_tweets_snm %>%
              count(word, sort = TRUE) %>%
              top_n(50)

## Selecting by n

            wordcloud2(top_tokens_snm)

COMMUNICATE

Select Research Questions of Interest. As noted above, my primary research questions for this study are:
1. What is the overall sentiment of recent tweets on the topic of mothers parenting children with disabilities?
2. Does sentiment vary based on keyword (disability vs. special needs)?
3. Does sentiment vary by lexicon?

Polish. I did most of my “polishing” in the “Model” section, so not much to report here…

Narrate

Purpose. Mothers experience disproportionate role stress when caring for young children, potentially limiting time and resources for self-care and career advancement. These demands increase when mothers care for children with special needs. This study analyzed recent Twitter posts to investigate sentiment surrounding mothers who care for children with special needs.

Methods. Using the Twitter API, I searched recent tweets (from the past six to nine days) for the following keywords:

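| Keyword       | Search Term               |
|---------------|---------------------------|
| special needs | #specialneeds AND mom     |
| special needs | #specialneeds AND mother  |
| special needs | special needs mom         |
| special needs | special needs kid         |
| special needs | special needs child       |
| disabled      | #disabledchild AND mom    |
| disabled      | #disabledchild AND mother |
| disabled      | disabled child mom        |
| disabled      | disabled kid mom          |
| disabled      | disabled kid              |
| disabled      | disabled child            |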

I then tokenized the text of the tweets, loaded lexicons (afinn, bing, nrc, loughran), created dictionaries, and analyzed sentiment of the tokenized tweets. Next, I visualized sentiment overall and between the keywords “disability” and “special needs”. Lastly, I visualized the top 50 words overall, and those associated with each keyword, in wordclouds.
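The keyword comparisons rely on every tweet (and therefore every token) carrying a `keyword` label. One possible way to produce that label, assuming the two search results are held in separate data frames, is `bind_rows()` with a named list and `.id`. This is a sketch, not necessarily how the label was created here; `toy_dm` and `toy_snm` are hypothetical stand-ins for `dm_tweets` and `snm_tweets`.

```r
library(dplyr)

# Hypothetical stand-ins for the two search results
toy_dm  <- tibble(text = c("first disability tweet", "second disability tweet"))
toy_snm <- tibble(text = c("a special needs tweet"))

# The list names become the values of the new `keyword` column
all_tweets <- bind_rows(list("disability"    = toy_dm,
                             "special needs" = toy_snm),
                        .id = "keyword")

count(all_tweets, keyword)
```

Tokenizing `all_tweets` afterwards carries `keyword` along to every token, which is what allows `group_by(keyword)` in the sentiment summaries.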

Findings. In every lexicon, the share of positive words is higher in tweets containing “special needs” and “mom” than in tweets containing “disability” and “mom” (for example, 58.8% vs. 39.0% with afinn and 37.8% vs. 12.6% with loughran). Top words overall include: don’t, parent, shop, share, kids, and bite. Top words for disability and mom tweets include: don’t, parent, life, people, support (position 5), children, school, and care. Top words for special needs and mom tweets include: share, addition, bite, excited, childautismcerebral, and palsyarthritisautoaggression.
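The keyword comparison above can be checked directly from the counts printed in `sentiment_counts`. This sketch copies the positive counts and totals for each lexicon, reshapes them with `tidyr::pivot_wider()`, and computes the gap in percentage points between “special needs” and “disability”; `positive_counts` and `positive_share` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Positive-word counts and totals copied from the sentiment_counts output
positive_counts <- tibble(
  method  = rep(c("afinn", "bing", "loughran", "nrc"), each = 2),
  keyword = rep(c("disability", "special needs"), times = 4),
  n       = c(454L, 463L, 364L, 368L, 59L, 111L, 811L, 724L),
  total   = c(1164L, 788L, 1149L, 723L, 467L, 294L, 1589L, 1178L)
)

positive_share <- positive_counts %>%
  mutate(pct_positive = n / total * 100) %>%
  select(method, keyword, pct_positive) %>%
  pivot_wider(names_from = keyword, values_from = pct_positive) %>%
  mutate(gap = `special needs` - disability)   # percentage points

positive_share
```

The gap is positive for all four lexicons, smallest for nrc (about 10 points) and largest for loughran (about 25 points).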

Discussion. The word “disability” in tweets seems to be associated with more negative sentiment, which may indicate that it carries more pejorative connotations than “special needs.” Some research has shown that mothers who themselves have disabilities are a highly stressed group (Lee et al., 2004), and it is unclear whether some tweets in this category reflect disabled mothers rather than mothers of disabled children. However, the difference in sentiment may also reflect inaccuracies in the lexicons themselves, which may ascribe positive sentiment to a word like “special” but negative sentiment to “disabled”. Terms such as “school”, along with “bite” and other references to self-injurious behavior, suggest that day-to-day concerns about safety and the educational environment dominate the recent Twitter discourse on mothering children with special needs: the focus is pragmatic. The frequency of the words “support” and “share” merits further investigation, in my opinion, as both may address support systems (familial, community, national) that are either present or absent for mothers raising children with special needs.

REFERENCES

Kim, J. (February 2, 2021). The mothers who already left. New York Magazine. https://www.thecut.com/2021/02/i-always-thought-id-be-a-working-mom.html

Lee, S., Oh, G., Hartmann, H., & Gault, B. (February 2004). The impact of disabilities on mothers’ work participation: Examining differences between single and married mothers. Institute for Women’s Policy Research, Washington, DC.

Ogrysko, N. (December 9, 2019). Lawmakers unveil details of ‘historic’ federal paid parental leave benefits. Federal News Network. Retrieved November 3, 2021, from https://federalnewsnetwork.com/workforce/2019/12/lawmakers-unveil-details-of-historic-federal-paid-parental-leave-benefits/

Stewart, N. (July 28, 2020). When caring for your child’s needs becomes a job all on its own. The New York Times. https://www.nytimes.com/2020/07/24/us/children-disabilities-parenting-poverty-assistance.html


Catherine Noonan

2/1/2022