PREPARE
Context:
Working parents in the United States experience limited support systems due to a lack of affordable childcare, the absence of paid family leave, and short family leave periods (Ogrysko, 2019). Mothers disproportionately manage children’s needs, especially during their children’s early years, due to cultural expectations, gender pay disparity, and likely other factors (Kim, 2021; Ogrysko, 2019). As a result, mothers of young children face barriers to career advancement and personal wellbeing, experiencing constrained schedules due to mothering duties and decreased time for sleep, exercise, social engagement, or other self-care.
While the United States guarantees legal protection and accommodation for individuals with disabilities through legislation such as the Americans with Disabilities Act (ADA) and the Individuals with Disabilities Education Act (IDEA), and provides subsidized health insurance for lower income families with disabled children (Medicaid), “on the ground” these programs can be time-consuming and confusing to navigate. Additionally, children with disabilities typically require frequent medical and therapy appointments and may have unpredictable episodes of illness or even hospitalization. Thus mothers of children with disabilities face significant constraints on their time and are asked to balance many roles (Kim, 2021; Stewart, 2020).
To better understand the current dialogue surrounding mothers of children with disabilities, I investigated recent tweets with keywords such as “mom” or “mother” and “disability” or “special needs” and analyzed the sentiment of the tweets using the afinn, loughran, bing, and nrc lexicons. Lexicons are pre-existing collections of words, each with one or more sentiments attached. While some lexicons characterize words on many axes (trust, etc.), all of the lexicons used here also offer a basic positive or negative characterization.
Lexicons are created using available texts, generally from online sources, and therefore may not be accurate or valid in all contexts. Validity is enhanced with human review of words and sentiment values to ensure accuracy.
My guiding questions for this report are:
What is the overall sentiment of recent tweets on the topic of mothers parenting children with disabilities?
Does sentiment vary based on keyword (disability vs. special needs)?
Does sentiment vary by lexicon?
Another question that interests me, which I won’t address in this project but may take up in my final project for this course, is whether sentiment varies with the location of the Twitter poster. More specifically, does sentiment differ in states with expansive free or low-cost pre-K programs?
Some evidence suggests that greater access to low-cost early childhood education improves lifelong developmental and educational trajectories for children with disabilities (as well as for students from low income families and English language learners). It would also be interesting to see if such programs offer a “spillover effect” to mothers of kids with disabilities. Are mothers’ experiences different (as viewed by sentiment of tweets) in states with expansive pre-K programs versus those without?
Set up: To begin, I’ll install the required packages (if they aren’t already installed) and then load them with library().
Sys.setlocale("LC_MESSAGES", "en_US.utf8")
## Warning in Sys.setlocale("LC_MESSAGES", "en_US.utf8"): LC_MESSAGES exists on
## Windows but is not operational
## Warning in Sys.setlocale("LC_MESSAGES", "en_US.utf8"): OS reports request to set
## locale to "en_US.utf8" cannot be honored
## [1] ""
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
## Warning: package 'readr' was built under R version 4.0.5
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
library(rtweet)
## Warning: package 'rtweet' was built under R version 4.0.5
library(writexl)
## Warning: package 'writexl' was built under R version 4.0.5
library(readxl)
## Warning: package 'readxl' was built under R version 4.0.5
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.0.5
library(textdata)
## Warning: package 'textdata' was built under R version 4.0.5
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
library(scales)
## Warning: package 'scales' was built under R version 4.0.5
##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.0.5
3. Next, I'll store API keys and authenticate them and check to see that the token is loaded. Note: secret keys are hidden.
## authenticate via web browser
token <- create_token(
app = app_name,
consumer_key = api_key,
consumer_secret = api_secret_key,
access_token = access_token,
access_secret = access_token_secret)
## check to see if the token is loaded
get_token()
## <Token>
## <oauth_endpoint>
## request: https://api.twitter.com/oauth/request_token
## authorize: https://api.twitter.com/oauth/authenticate
## access: https://api.twitter.com/oauth/access_token
## <oauth_app> SNAfoo
## key: clxr0mEFqC3FwV5ESDd8d3yNF
## secret: <hidden>
## <credentials> oauth_token, oauth_token_secret
## ---
WRANGLE
2a. Importing Tweets
#disability_tweets <- search_tweets(q = "#disability", n=5000)
# specialneeds_tweets <- search_tweets(q = "#specialneeds" , n=5000)
# mom_disspecneeds_tweets <- search_tweets(q = "#disability OR #specialneeds
# AND mom" ,
# n=5000,
# include_rts = FALSE)
#kid_disspecneeds_tweets <- search_tweets(q = "#disabledchild OR #specialneeds
# AND mom" ,
# n=5000,
# include_rts = FALSE)
#There's actually a lot of overlap b/t the findings of the previous two
#searches, so the first set of terms seems to be sufficient.
2. Next I'll create two dictionaries, one for the keyword "disability" and the other for the keyword phrase "special needs"
#Next I'll create the dictionaries for 'special needs mom' and 'disabled
#child mom'
specneedsmom_dictionary <- c("#specialneeds AND mom",
'"#specialneeds AND mother"',
'"special needs mom"',
'"special needs kid"',
'"special needs child"')
snm_tweets <- search_tweets2(specneedsmom_dictionary,
n=5000,
include_rts = FALSE)
diskidmom_dictionary <- c("#disabledchild AND mom",
'"#disabledchild AND mother"',
'"disabled child mom"',
'"disabled kid mom"',
'"disabled kid"',
'"disabled child"')
dm_tweets <- search_tweets2(diskidmom_dictionary,
n=5000,
include_rts = FALSE)
3. Next, I'll save the tweet files to Excel. This allows me to have a stable set of data, since Twitter and tweets are constantly changing. This would be useful if I want to do further analysis on this same set of tweets.
## Saving tweet files to Excel (need to create data folder first)
#write_xlsx(snm_tweets, "data/snm_tweets.xlsx")
#write_xlsx(dm_tweets, "data/dm_tweets.xlsx")
dm_tweets <- read_xlsx("data/dm_tweets.xlsx")
snm_tweets <- read_xlsx("data/snm_tweets.xlsx")
2b. Tidying the text
4. Here I'll filter tweets by language, select relevant columns, add a column for keyword ("disabled" vs. "special needs") and relocate that column to first position
#for disability
dm_text <- dm_tweets %>%
filter(lang == "en") %>%
select(screen_name, created_at, text) %>%
mutate(keyword = "disability") %>%
relocate(keyword)
#for special needs
snm_text <- snm_tweets %>%
filter(lang == "en") %>%
select(screen_name, created_at, text) %>%
mutate(keyword = "special needs") %>%
relocate(keyword)
5. Combining the data frames and looking at the head and tail of the result
tweets <- bind_rows(dm_text, snm_text)
head(tweets)
## # A tibble: 6 x 4
## keyword screen_name created_at text
## <chr> <chr> <dttm> <chr>
## 1 disability dropoutninja 2022-02-02 15:05:18 "you know what's even more deva~
## 2 disability dhargenerator 2022-02-02 11:37:11 "Disabled Kid threatens to fire~
## 3 disability dhargenerator 2022-01-29 11:57:25 "Disabled Kid witch hunts Famou~
## 4 disability dhargenerator 2022-01-30 21:46:59 "Gay Democrats threatens to fir~
## 5 disability RyanLavender94 2022-02-02 06:51:06 "@charlieINTEL Bragging about b~
## 6 disability Minagelina 2022-02-02 03:35:19 "@kaptanobveus I wonder what we~
tail(tweets)
## # A tibble: 6 x 4
## keyword screen_name created_at text
## <chr> <chr> <dttm> <chr>
## 1 special needs walrozt 2022-01-25 14:43:59 "@WataAce1 @marybaphomet No~
## 2 special needs MrProPHessional 2022-01-25 14:22:53 "@queenglitter4 As a father~
## 3 special needs KinsG8R 2022-01-25 14:11:38 "@ddespairmusic @RollingSto~
## 4 special needs Rasuberri 2022-01-25 14:01:57 "@pulte Single mom of a spe~
## 5 special needs HeatherBayne6 2022-01-25 13:58:46 "@Jayecane Yes I got into a~
## 6 special needs RoseDaddyMike 2022-01-25 13:28:38 "@Neilyoung was a pile of ~
6. Tokenizing the text
tweet_tokens <-
tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
7. Removing stop words, looking at top word counts, and filtering out nonsense words.
## Removing stop words
tidy_tweets <-
tweet_tokens %>%
anti_join(stop_words, by = "word")
## Looking at top word counts
count(tidy_tweets, word, sort = T)
## # A tibble: 6,283 x 2
## word n
## <chr> <int>
## 1 child 751
## 2 disabled 656
## 3 special 372
## 4 kid 258
## 5 amp 111
## 6 parent 99
## 7 addition 96
## 8 im 96
## 9 dont 94
## 10 share 94
## # ... with 6,273 more rows
#Making the code more orderly and removing nonsense words
tidy_tweets <-
tweet_tokens %>%
anti_join(stop_words, by = "word") %>%
filter(!word == "amp" & !word == "im" & !word == "special"
& !word == "disabled" & !word == "child"
& !word == "kid" & !word == "#etsy" & !word == "kids"
& !word == "1" & !word == "2" & !word == "3"
& !word == "4")
count(tidy_tweets, word, sort = T)
## # A tibble: 6,271 x 2
## word n
## <chr> <int>
## 1 parent 99
## 2 addition 96
## 3 dont 94
## 4 share 94
## 5 bite 93
## 6 shop 93
## 7 childautismcerebral 91
## 8 excited 91
## 9 palsyarthritisautoaggression 91
## 10 school 86
## # ... with 6,261 more rows
2c. Sentiment Values
8. Next I'll add sentiment values. For every lexicon but bing, I have to select '1' in the console to confirm the download.
afinn <- get_sentiments("afinn")
bing <- get_sentiments("bing")
nrc <- get_sentiments("nrc")
loughran <- get_sentiments("loughran")
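9. Next, I will join each lexicon with the tidy_tweets data frame. Each inner join keeps only the words that appear in that lexicon and adds the lexicon's own column: a numeric value for afinn, and a sentiment category for bing, nrc, and loughran. The column that names the lexicon is added later, in the Explore section.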
sentiment_afinn <- inner_join(tidy_tweets, afinn, by = "word")
sentiment_afinn
## # A tibble: 1,930 x 5
## keyword screen_name created_at word value
## <chr> <chr> <dttm> <chr> <dbl>
## 1 disability dropoutninja 2022-02-02 15:05:18 devastating -2
## 2 disability dropoutninja 2022-02-02 15:05:18 support 2
## 3 disability dropoutninja 2022-02-02 15:05:18 support 2
## 4 disability dhargenerator 2022-02-02 11:37:11 threatens -2
## 5 disability dhargenerator 2022-02-02 11:37:11 fire -2
## 6 disability dhargenerator 2022-02-02 11:37:11 shock -2
## 7 disability dhargenerator 2022-01-29 11:57:25 shock -2
## 8 disability dhargenerator 2022-01-30 21:46:59 threatens -2
## 9 disability dhargenerator 2022-01-30 21:46:59 fire -2
## 10 disability dhargenerator 2022-01-30 21:46:59 regrets -2
## # ... with 1,920 more rows
sentiment_bing <- inner_join(tidy_tweets, bing, by = "word")
sentiment_bing
## # A tibble: 1,801 x 5
## keyword screen_name created_at word sentiment
## <chr> <chr> <dttm> <chr> <chr>
## 1 disability dropoutninja 2022-02-02 15:05:18 devastating negative
## 2 disability dropoutninja 2022-02-02 15:05:18 support positive
## 3 disability dropoutninja 2022-02-02 15:05:18 support positive
## 4 disability dhargenerator 2022-02-02 11:37:11 spoiled negative
## 5 disability dhargenerator 2022-02-02 11:37:11 shock negative
## 6 disability dhargenerator 2022-01-29 11:57:25 famous positive
## 7 disability dhargenerator 2022-01-29 11:57:25 shock negative
## 8 disability dhargenerator 2022-01-30 21:46:59 instantly positive
## 9 disability dhargenerator 2022-01-30 21:46:59 regrets negative
## 10 disability ZeffieXD 2022-02-02 01:36:57 bad negative
## # ... with 1,791 more rows
sentiment_nrc <- inner_join(tidy_tweets, nrc, by = "word")
sentiment_nrc
## # A tibble: 7,744 x 5
## keyword screen_name created_at word sentiment
## <chr> <chr> <dttm> <chr> <chr>
## 1 disability dropoutninja 2022-02-02 15:05:18 devastating anger
## 2 disability dropoutninja 2022-02-02 15:05:18 devastating disgust
## 3 disability dropoutninja 2022-02-02 15:05:18 devastating fear
## 4 disability dropoutninja 2022-02-02 15:05:18 devastating negative
## 5 disability dropoutninja 2022-02-02 15:05:18 devastating sadness
## 6 disability dropoutninja 2022-02-02 15:05:18 devastating trust
## 7 disability dropoutninja 2022-02-02 15:05:18 disability negative
## 8 disability dropoutninja 2022-02-02 15:05:18 disability sadness
## 9 disability dhargenerator 2022-02-02 11:37:11 fire fear
## 10 disability dhargenerator 2022-02-02 11:37:11 shock anger
## # ... with 7,734 more rows
sentiment_loughran <- inner_join(tidy_tweets, loughran, by = "word")
sentiment_loughran
## # A tibble: 959 x 5
## keyword screen_name created_at word sentiment
## <chr> <chr> <dttm> <chr> <chr>
## 1 disability dropoutninja 2022-02-02 15:05:18 devastating negative
## 2 disability dhargenerator 2022-02-02 11:37:11 threatens negative
## 3 disability dhargenerator 2022-01-30 21:46:59 threatens negative
## 4 disability ZeffieXD 2022-02-02 01:36:57 bad negative
## 5 disability ZeffieXD 2022-02-02 01:36:57 fired negative
## 6 disability SolidCes 2022-02-01 20:36:05 successful positive
## 7 disability K_BallantyneArt 2022-02-01 19:29:57 frustrated negative
## 8 disability Randy_Bobandy88 2022-02-01 19:12:26 bad negative
## 9 disability Randy_Bobandy88 2022-02-01 19:12:26 criticize negative
## 10 disability ReitzPiper 2022-02-01 12:50:14 hurt negative
## # ... with 949 more rows
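To make the joins above concrete, here is a minimal sketch with a made-up three-word lexicon and made-up tokens from two tweets. `toy_lexicon` and `toy_tokens` are hypothetical stand-ins, not the real afinn data or the real tweets. It illustrates that `inner_join()` keeps only the tokens found in the lexicon and attaches the lexicon's score to each one.

```r
library(dplyr)

# Hypothetical mini-lexicon in the same shape as afinn: word + numeric value
toy_lexicon <- data.frame(
  word  = c("support", "devastating", "excited"),
  value = c(2, -2, 3)
)

# Hypothetical tokens, one row per word per tweet
toy_tokens <- data.frame(
  status_id = c("a", "a", "a", "b", "b"),
  word      = c("devastating", "school", "support", "excited", "shop")
)

# Words missing from the lexicon ("school", "shop") drop out of the result
toy_scored <- inner_join(toy_tokens, toy_lexicon, by = "word")
toy_scored
```

This is also why the four joined data frames above have such different row counts: each lexicon covers a different set of words, and nrc can match one word to several sentiments.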
EXPLORE
ts_plot(tweets, by = "days") ##plot by days
ts_plot(tweets, by = "hours") ## plot by hours
2. Plot by groups of keywords
#plot by keyword (disability vs. special needs)
ts_plot(dplyr::group_by(tweets, keyword), "hours")
ts_plot(dplyr::group_by(tweets, keyword), "days")
3. Analyzing sentiment, grouping by keyword, and creating a sentiment score for the **bing** lexicon, and adding a lexicon variable (column) to the data frame.
# Bing: Creating a single sentiment score and adding a lexicon variable
# (the spread function from the tidyr package transforms our sentiment
# column into separate columns for negative and positive
# that contain the n counts for each)
summary_bing <- sentiment_bing %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "bing") %>%
relocate(lexicon)
summary_bing
## # A tibble: 2 x 5
## # Groups: keyword [2]
## lexicon keyword negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 bing disability 727 356 -371
## 2 bing special needs 342 376 34
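Here is a small sketch of what the count-and-spread step does, using made-up rows (`toy_bing` is hypothetical). It uses `tidyr::pivot_wider()`, the newer equivalent of `spread()`, with `values_fill = 0` so that a sentiment missing for one keyword becomes 0 rather than NA. Filling with 0 is my assumption about what is wanted here, not something the code above does.

```r
library(dplyr)
library(tidyr)

# Hypothetical word-level sentiment labels, one row per matched word
toy_bing <- data.frame(
  keyword   = c("disability", "disability", "disability",
                "special needs", "special needs"),
  sentiment = c("negative", "negative", "positive",
                "positive", "positive")
)

toy_summary <- toy_bing %>%
  count(keyword, sentiment) %>%                       # n per keyword/sentiment
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%                    # absent category -> 0
  mutate(sentiment = positive - negative)             # net sentiment score
toy_summary
```

Without the fill, a keyword that has no words in one category would get NA in that column, and the net score for that keyword would be NA as well.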
4. Repeating the steps above for the remaining lexicons:
1. afinn
# repeating the step above but for afinn lexicon
summary_afinn <- sentiment_afinn %>%
group_by(keyword) %>%
summarise(sentiment = sum(value)) %>%
mutate(lexicon = "afinn") %>%
relocate(lexicon)
summary_afinn
## # A tibble: 2 x 3
## lexicon keyword sentiment
## <chr> <chr> <dbl>
## 1 afinn disability -622
## 2 afinn special needs 309
2. loughran
# repeating bing steps above for loughran lexicon and filtering summary
# to only see positive and negative values
summary_loughran <- sentiment_loughran %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "loughran") %>%
relocate(lexicon)
summary_loughran
## # A tibble: 2 x 9
## # Groups: keyword [2]
## lexicon keyword constraining litigious negative positive superfluous
## <chr> <chr> <int> <int> <int> <int> <int>
## 1 loughran disability 24 56 416 63 1
## 2 loughran special needs 18 24 177 122 NA
## # ... with 2 more variables: uncertainty <int>, sentiment <int>
summary_loughran_2 <- summary_loughran %>%
select(lexicon, keyword, negative, positive, sentiment)
summary_loughran_2
## # A tibble: 2 x 5
## # Groups: keyword [2]
## lexicon keyword negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 loughran disability 416 63 -353
## 2 loughran special needs 177 122 -55
3. nrc
# repeating above steps for nrc lexicon; also selecting
# only rows that contain "positive" and "negative" b/c
# nrc lexicon contains other values like "trust","sadness"
summary_nrc <- sentiment_nrc %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "nrc") %>%
relocate(lexicon)
summary_nrc
## # A tibble: 2 x 13
## # Groups: keyword [2]
## lexicon keyword anger anticipation disgust fear joy negative positive
## <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
## 1 nrc disability 378 379 272 469 282 784 750
## 2 nrc special needs 156 388 106 185 399 435 701
## # ... with 4 more variables: sadness <int>, surprise <int>, trust <int>,
## # sentiment <int>
summary_nrc_2 <- summary_nrc %>%
select(lexicon, keyword, negative, positive, sentiment)
summary_nrc_2
## # A tibble: 2 x 5
## # Groups: keyword [2]
## lexicon keyword negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 nrc disability 784 750 -34
## 2 nrc special needs 435 701 266
MODEL
The “Modeling” step of the learning analytics workflow involves using statistical models to analyze data and, where possible, make predictions. I’ll also use this section to visualize how sentiment varies by lexicon and what the nature of word choice is in tweets about mothering disabled or special needs children. Because I’m changing the order slightly from the Walkthrough, I’m going to do some “polishing” steps here…
Polishing
dm_text <-
dm_tweets %>%
filter(lang == "en") %>%
select(status_id, text) %>%
mutate(keyword = "disability") %>%
relocate(keyword)
snm_text <-
snm_tweets %>%
filter(lang == "en") %>%
select(status_id, text) %>%
mutate(keyword = "special needs") %>%
relocate(keyword)
2. Step 2: merging data frames
#merging the two data frames and taking a look
tweets <- bind_rows(dm_text, snm_text)
tweets
## # A tibble: 1,072 x 3
## keyword status_id text
## <chr> <chr> <chr>
## 1 disability 1488891280469991425 "you know what's even more devastating than n~
## 2 disability 1488838905637969920 "Disabled Kid threatens to fire Spoiled Impos~
## 3 disability 1487394447344091139 "Disabled Kid witch hunts Famous Mechanic, wh~
## 4 disability 1487905204007702528 "Gay Democrats threatens to fire Disabled Kid~
## 5 disability 1488766912565850113 "@charlieINTEL Bragging about beating this ga~
## 6 disability 1488717642504589319 "@kaptanobveus I wonder what we can do from h~
## 7 disability 1488712251443843075 "also i am attracted to the disabled kid"
## 8 disability 1488115941556715520 "also i am attracted to the disabled kid"
## 9 disability 1487813895284658177 "also i am attracted to the disabled kid"
## 10 disability 1487051331751710721 "also i am attracted to the disabled kid"
## # ... with 1,062 more rows
head(tweets)
## # A tibble: 6 x 3
## keyword status_id text
## <chr> <chr> <chr>
## 1 disability 1488891280469991425 "you know what's even more devastating than no~
## 2 disability 1488838905637969920 "Disabled Kid threatens to fire Spoiled Impost~
## 3 disability 1487394447344091139 "Disabled Kid witch hunts Famous Mechanic, wha~
## 4 disability 1487905204007702528 "Gay Democrats threatens to fire Disabled Kid,~
## 5 disability 1488766912565850113 "@charlieINTEL Bragging about beating this gam~
## 6 disability 1488717642504589319 "@kaptanobveus I wonder what we can do from he~
tail(tweets)
## # A tibble: 6 x 3
## keyword status_id text
## <chr> <chr> <chr>
## 1 special needs 1485986814200528902 "@WataAce1 @marybaphomet No, you used it as~
## 2 special needs 1485981505226776577 "@queenglitter4 As a father of a brilliant ~
## 3 special needs 1485978671315886082 "@ddespairmusic @RollingStone Principles? L~
## 4 special needs 1485976238149754882 "@pulte Single mom of a special needs child~
## 5 special needs 1485975433132879874 "@Jayecane Yes I got into a bad accident my~
## 6 special needs 1485967849881575424 "@Neilyoung was a pile of Sh#t that abando~
3. Step 3: analyzing sentiment from the afinn lexicon
#Cleaning up code for analyzing sentiment from each lexicon
#afinn
customwords <- c("amp" , "im" , "child" , "disabled" ,
"special" , "kid", "1" , "2" , "3" , "4")
sentiment_afinn <- tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets") %>%
anti_join(stop_words, by = "word") %>%
filter(!word == "amp" & !word == "im" & !word == "special"
& !word == "disabled" & !word == "child"
& !word == "kid" & !word == "#etsy" & !word == "kids"
& !word == "1" & !word == "2" & !word == "3"
& !word == "4") %>%
inner_join(afinn, by = "word")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
# Dr. J: wondering if I could have done filter(!word == "customwords") %>%
# above instead
sentiment_afinn
## # A tibble: 1,930 x 4
## keyword status_id word value
## <chr> <chr> <chr> <dbl>
## 1 disability 1488891280469991425 devastating -2
## 2 disability 1488891280469991425 support 2
## 3 disability 1488891280469991425 support 2
## 4 disability 1488838905637969920 threatens -2
## 5 disability 1488838905637969920 fire -2
## 6 disability 1488838905637969920 shock -2
## 7 disability 1487394447344091139 shock -2
## 8 disability 1487905204007702528 threatens -2
## 9 disability 1487905204007702528 fire -2
## 10 disability 1487905204007702528 regrets -2
## # ... with 1,920 more rows
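On the question in the comment above about filtering with `customwords`: `!word == "customwords"` would compare each token with the literal string "customwords", so it would remove nothing. A sketch of the usual alternative is `%in%`, which tests membership in a vector. The `toy_words` data below are made up, and the vector here also includes "#etsy", "kids", and "hes", which the long filters in this report use but the `customwords` vector above omits.

```r
library(dplyr)

customwords <- c("amp", "im", "special", "disabled", "child", "kid",
                 "#etsy", "kids", "1", "2", "3", "4", "hes")

# Hypothetical tokens standing in for the word column of the tweets
toy_words <- data.frame(
  word = c("amp", "parent", "kids", "school", "4", "support")
)

# Keep only rows whose word is NOT in the custom list
kept <- toy_words %>%
  filter(!word %in% customwords)
kept$word

# Comparing against the literal string "customwords" removes nothing
unchanged <- toy_words %>%
  filter(!word == "customwords")
nrow(unchanged)
```

An `anti_join()` against a one-column data frame of custom words should work as well, in the same way the stop words are removed.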
afinn_score <- sentiment_afinn %>%
group_by(keyword, status_id) %>%
summarise(value = sum(value))
## `summarise()` has grouped output by 'keyword'. You can override using the
## `.groups` argument.
afinn_score
## # A tibble: 879 x 3
## # Groups: keyword [2]
## keyword status_id value
## <chr> <chr> <dbl>
## 1 disability 1485965986058715137 -2
## 2 disability 1485970391323799553 5
## 3 disability 1485973110335627268 -3
## 4 disability 1485980376552165376 -3
## 5 disability 1485981992185438212 -2
## 6 disability 1485983403367411717 -2
## 7 disability 1485988403309162499 -4
## 8 disability 1485992900513124356 -7
## 9 disability 1485994909274460161 -1
## 10 disability 1485997459696504845 -1
## # ... with 869 more rows
afinn_sentiment <- afinn_score %>%
filter(value != 0) %>%
mutate(sentiment = if_else(value < 0, "negative", "positive"))
afinn_sentiment
## # A tibble: 836 x 4
## # Groups: keyword [2]
## keyword status_id value sentiment
## <chr> <chr> <dbl> <chr>
## 1 disability 1485965986058715137 -2 negative
## 2 disability 1485970391323799553 5 positive
## 3 disability 1485973110335627268 -3 negative
## 4 disability 1485980376552165376 -3 negative
## 5 disability 1485981992185438212 -2 negative
## 6 disability 1485983403367411717 -2 negative
## 7 disability 1485988403309162499 -4 negative
## 8 disability 1485992900513124356 -7 negative
## 9 disability 1485994909274460161 -1 negative
## 10 disability 1485997459696504845 -1 negative
## # ... with 826 more rows
afinn_ratio <- afinn_sentiment %>%
group_by(keyword) %>%
count(sentiment) %>%
spread(sentiment, n) %>%
mutate(ratio = negative/positive)
afinn_ratio
## # A tibble: 2 x 4
## # Groups: keyword [2]
## keyword negative positive ratio
## <chr> <int> <int> <dbl>
## 1 disability 290 170 1.71
## 2 special needs 143 233 0.614
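As a quick check of the ratio column, the arithmetic can be reproduced directly from the counts in the table above. A ratio above 1 means more negative than positive tweets, and a ratio below 1 means the reverse. The second calculation, the share of scored tweets that are negative, is an extra view I'm adding for illustration.

```r
# Counts of negative and positive tweets, copied from afinn_ratio above
negative <- c(disability = 290, `special needs` = 143)
positive <- c(disability = 170, `special needs` = 233)

# Negative-to-positive ratio per keyword
ratio <- negative / positive
round(ratio, 3)

# The same counts expressed as the percentage of scored tweets that are negative
share_negative <- negative / (negative + positive) * 100
round(share_negative, 1)
```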
3. Keyword Differences: graphing positive versus negative tweets for the two keywords
#For keyword 'disability'
afinn_counts_dis <- afinn_sentiment %>%
group_by(keyword) %>%
count(sentiment) %>%
filter(keyword == "disability")
afinn_counts_dis %>%
ggplot(aes(x="", y=n, fill=sentiment)) +
geom_bar(width = .6, stat = "identity") +
labs(title = "Disability, Disabled Child, & Mom",
subtitle = "Proportion of Positive & Negative Tweets") +
coord_polar(theta = "y") +
theme_void()
Sentiment is decidedly more negative with the "disability" keyword than it is with the "special needs" keyword phrase (below).
#Repeat for special needs
afinn_counts_sn <- afinn_sentiment %>%
group_by(keyword) %>%
count(sentiment) %>%
filter(keyword == "special needs")
afinn_counts_sn
## # A tibble: 2 x 3
## # Groups: keyword [1]
## keyword sentiment n
## <chr> <chr> <int>
## 1 special needs negative 143
## 2 special needs positive 233
afinn_counts_sn %>%
ggplot(aes(x="", y=n, fill=sentiment)) +
geom_bar(width = .6, stat = "identity") +
labs(title = "Special Needs and Mom",
subtitle = "Proportion of Positive & Negative Tweets") +
coord_polar(theta = "y") +
theme_void()
1. Calculating sentiment scores for each lexicon and then comparing positive and negative sentiment for each lexicon visually.
# Creating "summary" data frames for each sentiment, parsing out summary scores of positive and negative sentiment.
summary_afinn3 <- sentiment_afinn %>%
group_by(keyword) %>%
filter(value != 0) %>%
mutate(sentiment = if_else(value < 0, "negative", "positive")) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "afinn")
summary_bing3 <- sentiment_bing %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "bing")
summary_nrc3 <- sentiment_nrc %>%
filter(sentiment %in% c("positive", "negative")) %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "nrc")
summary_loughran3 <- sentiment_loughran %>%
filter(sentiment %in% c("positive", "negative")) %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "loughran")
Next, I'll combine lexicon summaries for summary of sentiment overall and visualize in a graph.
#Combining lexicon summaries to compare positive and negative sentiment scores in each lexicon.
summary_sentiment <- bind_rows(summary_afinn3,
summary_bing3,
summary_nrc3,
summary_loughran3) %>%
arrange(method, keyword) %>%
relocate(method)
total_counts <- summary_sentiment %>%
group_by(method, keyword) %>%
summarise(total = sum(n))
## `summarise()` has grouped output by 'method'. You can override using the
## `.groups` argument.
sentiment_counts <- left_join(summary_sentiment, total_counts)
## Joining, by = c("method", "keyword")
sentiment_counts
## # A tibble: 16 x 5
## # Groups: keyword [2]
## method keyword sentiment n total
## <chr> <chr> <chr> <int> <int>
## 1 afinn disability negative 676 1126
## 2 afinn disability positive 450 1126
## 3 afinn special needs positive 489 804
## 4 afinn special needs negative 315 804
## 5 bing disability negative 727 1083
## 6 bing disability positive 356 1083
## 7 bing special needs positive 376 718
## 8 bing special needs negative 342 718
## 9 loughran disability negative 416 479
## 10 loughran disability positive 63 479
## 11 loughran special needs negative 177 299
## 12 loughran special needs positive 122 299
## 13 nrc disability negative 784 1534
## 14 nrc disability positive 750 1534
## 15 nrc special needs positive 701 1136
## 16 nrc special needs negative 435 1136
Positive and Negative Sentiment by Lexicon
| Method | Keyword | Sentiment | n |
|----------|---------------|-----------|-----|
| afinn | disability | negative | 676 |
| afinn | disability | positive | 450 |
| afinn | special needs | positive | 489 |
| afinn | special needs | negative | 315 |
| bing | disability | negative | 727 |
| bing | disability | positive | 356 |
| bing | special needs | positive | 376 |
| bing | special needs | negative | 342 |
| loughran | disability | negative | 416 |
| loughran | disability | positive | 63 |
| loughran | special needs | negative | 177 |
| loughran | special needs | positive | 122 |
| nrc | disability | negative | 784 |
| nrc | disability | positive | 750 |
| nrc | special needs | positive | 701 |
| nrc | special needs | negative | 435 |
#converting the sentiment scores to percentages for easier visualization
sentiment_percents <- sentiment_counts %>%
mutate(percent = n/total * 100)
sentiment_percents
## # A tibble: 16 x 6
## # Groups: keyword [2]
## method keyword sentiment n total percent
## <chr> <chr> <chr> <int> <int> <dbl>
## 1 afinn disability negative 676 1126 60.0
## 2 afinn disability positive 450 1126 40.0
## 3 afinn special needs positive 489 804 60.8
## 4 afinn special needs negative 315 804 39.2
## 5 bing disability negative 727 1083 67.1
## 6 bing disability positive 356 1083 32.9
## 7 bing special needs positive 376 718 52.4
## 8 bing special needs negative 342 718 47.6
## 9 loughran disability negative 416 479 86.8
## 10 loughran disability positive 63 479 13.2
## 11 loughran special needs negative 177 299 59.2
## 12 loughran special needs positive 122 299 40.8
## 13 nrc disability negative 784 1534 51.1
## 14 nrc disability positive 750 1534 48.9
## 15 nrc special needs positive 701 1136 61.7
## 16 nrc special needs negative 435 1136 38.3
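The percentage step can also be sketched without the separate totals table and join, by computing the total inside each group. The rows below are the afinn counts from the output above. Using `sum(n)` inside `group_by()` is an alternative I'm suggesting, not what the code above does.

```r
library(dplyr)

# afinn rows copied from sentiment_counts above
afinn_counts <- data.frame(
  method    = "afinn",
  keyword   = c("disability", "disability", "special needs", "special needs"),
  sentiment = c("negative", "positive", "positive", "negative"),
  n         = c(676, 450, 489, 315)
)

afinn_percents <- afinn_counts %>%
  group_by(method, keyword) %>%
  mutate(total   = sum(n),            # total words per method/keyword
         percent = n / total * 100) %>%
  ungroup()
afinn_percents
```

The totals and percentages should match the afinn rows in the table above, since the calculation is the same and only the route to `total` differs.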
sentiment_percents %>%
ggplot(aes(x = keyword, y = percent, fill=sentiment)) +
geom_bar(width = .8, stat = "identity") +
facet_wrap(~method, ncol = 1) +
coord_flip() +
labs(title = "Public Sentiment on Twitter",
subtitle = "Disability vs. Special Needs and Mom",
x = "Keyword",
y = "Percentage of Words")
summary_sentiment
## # A tibble: 16 x 4
## # Groups: keyword [2]
## method keyword sentiment n
## <chr> <chr> <chr> <int>
## 1 afinn disability negative 676
## 2 afinn disability positive 450
## 3 afinn special needs positive 489
## 4 afinn special needs negative 315
## 5 bing disability negative 727
## 6 bing disability positive 356
## 7 bing special needs positive 376
## 8 bing special needs negative 342
## 9 loughran disability negative 416
## 10 loughran disability positive 63
## 11 loughran special needs negative 177
## 12 loughran special needs positive 122
## 13 nrc disability negative 784
## 14 nrc disability positive 750
## 15 nrc special needs positive 701
## 16 nrc special needs negative 435
4. Visualizing Word Choice Through Word Clouds
1. Overall wordcloud
#Now I want to create a wordcloud of the tweets. However, there are
#too many words to visualize. I'll choose the top 50.
top_tokens_all <- tidy_tweets %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_all)
#Some words are kind of irrelevant (1, 2, hes), but it does give a glimpse
#overall of the top words. I'm interested to see that "support" is in the top 50
#since that is an aspect of the phenomenon of interest to me.
#Below I'm going to try to filter out "customwords" to see if it looks any
#different
top_tokens_all <- tidy_tweets %>%
filter(!word == "amp" & !word == "im" & !word == "special"
& !word == "disabled" & !word == "child"
& !word == "kid" & !word == "#etsy" & !word == "kids"
& !word == "1" & !word == "2" & !word == "3"
& !word == "4" & !word == "hes") %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_all)
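`top_n()` still works, but current dplyr documentation marks it as superseded and points to `slice_max()` instead. Below is a minimal sketch with made-up counts (`toy_counts` is hypothetical), keeping the top 3 instead of the top 50. Like `top_n()`, `slice_max()` keeps ties by default, so more rows than requested can come back.

```r
library(dplyr)

# Hypothetical word counts in the same shape as count(word, sort = TRUE)
toy_counts <- data.frame(
  word = c("parent", "school", "support", "care", "time"),
  n    = c(99, 86, 37, 37, 34)
)

# Top 3 by n; "support" and "care" tie for third place, so both are kept
top_tokens_toy <- toy_counts %>%
  slice_max(order_by = n, n = 3)
top_tokens_toy

# with_ties = FALSE returns exactly 3 rows
top_tokens_exact <- toy_counts %>%
  slice_max(order_by = n, n = 3, with_ties = FALSE)
top_tokens_exact
```

Naming the column explicitly also avoids the "Selecting by n" message that `top_n(50)` prints.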
2. "Disability" wordcloud
##Tokenizing text - Disability and Mom
tweet_tokens_dm <-
dm_tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
tidy_tweets_dm <-
tweet_tokens_dm %>%
anti_join(stop_words, by = "word") %>%
filter(!word == "amp" & !word == "im" & !word == "special"
& !word == "disabled" & !word == "child"
& !word == "kid" & !word == "#etsy" & !word == "kids"
& !word == "1" & !word == "2" & !word == "3"
& !word == "4" & !word == "hes")
count(tidy_tweets_dm, word, sort = T)
## # A tibble: 4,298 x 2
## word n
## <chr> <int>
## 1 parent 69
## 2 dont 62
## 3 school 56
## 4 people 47
## 5 children 42
## 6 care 37
## 7 support 37
## 8 disability 34
## 9 time 34
## 10 parents 31
## # ... with 4,288 more rows
#selecting top 50 disability & mom tokens
top_tokens_dm <- tidy_tweets_dm %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_dm)
3. "Special Needs" wordcloud
##Tokenizing text - Special Needs and Mom
tweet_tokens_snm <-
snm_tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
tidy_tweets_snm <-
tweet_tokens_snm %>%
anti_join(stop_words, by = "word") %>%
filter(!word == "amp" & !word == "im" & !word == "special"
& !word == "disabled" & !word == "child"
& !word == "kid" & !word == "#etsy" & !word == "kids"
& !word == "1" & !word == "2" & !word == "3"
& !word == "4" & !word == "hes")
#Ask Dr. J re: more efficient way to filter this
count(tidy_tweets_snm, word, sort = TRUE)
## # A tibble: 2,939 x 2
## word n
## <chr> <int>
## 1 addition 93
## 2 share 92
## 3 bite 91
## 4 childautismcerebral 91
## 5 excited 91
## 6 palsyarthritisautoaggression 91
## 7 shop 91
## 8 #protectivegloves 77
## 9 #cerebralpalsytoys 72
## 10 #bitingmittens 64
## # ... with 2,929 more rows
#selecting top 50 special needs & mom tokens
top_tokens_snm <- tidy_tweets_snm %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_snm)
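The long chains of `!word == "..."` conditions repeated in the three filters above can be collapsed by storing the unwanted words once and testing membership against that vector. The sketch below is one possible approach, not the method used in the analysis: `custom_words`, `toy_tokens`, `filtered_in`, and `filtered_join` are hypothetical names, and the toy tibble only stands in for `tidy_tweets` so the example runs on its own.

```r
library(dplyr)
library(tibble)

# Hypothetical vector holding the custom words removed in the filters above.
custom_words <- c("amp", "im", "special", "disabled", "child", "kid",
                  "#etsy", "kids", "1", "2", "3", "4", "hes")

# Toy stand-in for tidy_tweets (one token per row), for illustration only.
toy_tokens <- tibble(
  word = c("support", "amp", "kids", "school", "support", "1")
)

# Option 1: a single membership test replaces the chain of != comparisons.
filtered_in <- toy_tokens %>%
  filter(!word %in% custom_words)

# Option 2: treat the custom words as an extra stop-word table and anti_join,
# mirroring the existing anti_join(stop_words, by = "word") step.
filtered_join <- toy_tokens %>%
  anti_join(tibble(word = custom_words), by = "word")

count(filtered_in, word, sort = TRUE)
```

Either option keeps the word list in one place, so adding or removing a custom word means editing `custom_words` once instead of three separate filters.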
COMMUNICATE
Select Research Questions of Interest. As noted above, my primary research questions for this study are:
What is the overall sentiment of recent tweets on the topic of mothers parenting children with disabilities?
Does sentiment vary based on keyword (disability vs. special needs)?
Does sentiment vary by lexicon?
Polish. I did most of my “polishing” in the “Model” section, so not much to report here…
Narrate
Purpose. Mothers experience disproportionate role stress when caring for young children, potentially limiting time and resources for self-care and career advancement. These demands increase when mothers care for children with special needs. This study analyzed sentiment in recent Twitter posts about mothering children with special needs.
Methods. Using the Twitter API, I searched recent tweets (from the past six to nine days) for the following keywords:
| Keyword | Search Term |
|---|---|
| special needs | #specialneeds AND mom |
| | #specialneeds AND mother |
| | special needs mom |
| | special needs kid |
| | special needs child |
| disabled | #disabledchild AND mom |
| | #disabledchild AND mother |
| | disabled child mom |
| | disabled kid mom |
| | disabled kid |
| | disabled child |
I then tokenized the text of the tweets, loaded lexicons (afinn, bing, nrc, loughran), created dictionaries, and analyzed sentiment of the tokenized tweets. Next, I visualized sentiment overall and between the keywords “disability” and “special needs”. Lastly, I visualized the top 50 words overall, and those associated with each keyword, in wordclouds.
Findings. Recent tweets are more positive for terms including “special needs” and “mom” than for “disability” and “mom” across all lexicons. Top words overall include: don’t, parent, shop, share, kids, and bite. Top words for disability and mom tweets include: parent, don’t, school, people, children, care, and support (position 7). Top words for special needs and mom tweets include: addition, share, bite, excited, childautismcerebral, and palsyarthritisautoaggression.
Discussion. The word “disability” in tweets seems to be associated with more negative sentiment, which may indicate pejorative connotations of this word relative to “special needs”. Some research has shown that mothers who themselves have disabilities are a highly stressed group (Lee et al., 2004), and it is unclear whether some tweets in this category reflect disabled mothers rather than mothers of disabled children. However, the difference in sentiment may also reflect inaccuracies in the lexicons themselves, which may ascribe positive sentiment to a word like “special” but negative sentiment to “disabled”. Terms such as “school” and “bite”, along with other references to self-injurious behavior, indicate that day-to-day concerns of safety and the educational environment dominate the recent Twitter discourse on mothering children with special needs; the focus is pragmatic. The frequency of the words “support” and “share” merits further investigation, in my opinion, as both may address support systems (familial, community, national) that are either present or absent for mothers raising children with special needs.
REFERENCES
Kim, J. (February 2, 2021). The mothers who already left. New York Magazine. https://www.thecut.com/2021/02/i-always-thought-id-be-a-working-mom.html
Lee, S., Oh, G., Hartmann, H., & Gault, B. (February 2004). The impact of disabilities on mothers’ work participation: Examining differences between single and married mothers. Institute for Women’s Policy Research, Washington, DC.
Ogrysko, N. (December 9, 2019). Lawmakers unveil details of ‘historic’ federal paid parental leave benefits. Federal News Network. Accessed from https://federalnewsnetwork.com/workforce/2019/12/lawmakers-unveil-details-of-historic-federal-paid-parental-leave-benefits/ on November 3, 2021.
Stewart, N. (July 28, 2020). When caring for your child’s needs becomes a job all on its own. The New York Times. https://www.nytimes.com/2020/07/24/us/children-disabilities-parenting-poverty-assistance.html