# I had quite a bit of trouble getting RStudio.cloud to accept my lang == "en"
# command in code chunks 5 & 6, so here I set the system language to English.
# I've never had to do this before, so it may not be necessary for others or
# even on a different day
Sys.setlocale("LC_MESSAGES", "en_US.utf8")
## [1] "en_US.utf8"
PREPARE
Context:
Working parents in the United States experience limited support systems due to lack of affordable childcare, absence of paid family leave, and abbreviated family leave (Ogrysko, 2019). Mothers disproportionately manage children’s needs, especially during their children’s early years, due to cultural expectations, gender pay disparity, and likely other factors as well (Kim, 2021; Ogrysko, 2019). As a result, mothers of young children face barriers to career advancement and personal wellbeing, experiencing constrained schedules due to mothering duties and decreased time for sleep, exercise, social engagement, or other self-care.
While the United States guarantees legal protection and accommodation for individuals with disabilities through legislation such as the Americans with Disabilities Act (ADA) and the Individuals with Disabilities Education Act (IDEA) and subsidized health insurance for lower-income families with disabled children (Medicaid), “on the ground” these programs can be time-consuming and confusing to negotiate. Additionally, children with disabilities typically require frequent medical and therapy appointments and may have unpredictable episodes of illness or even hospitalization. Thus, mothers of children with disabilities face significant constraints on their time and are asked to balance many roles (Kim, 2021; Stewart, 2020).
To better understand the current dialogue surrounding mothers of children with disabilities, I investigated recent tweets with keywords such as “mom” or “mother” and “disability” or “special needs” and analyzed the sentiment of the tweets using the afinn, loughran, bing, and nrc lexicons. Lexicons are pre-existing collections of words with one or more sentiments attached to them. While some lexicons characterize words on many axes (trust, etc.), all used here also offer a basic positive and negative characterization.
Lexicons are created using available texts, generally from online sources, and therefore may not be accurate or valid in all contexts. Validity is enhanced with human review of words and sentiment values to ensure accuracy.
My guiding questions for this report are:
What is the overall sentiment of recent tweets on the topic of mothers parenting children with disabilities?
Does sentiment vary based on keyword (disability vs. special needs)?
Does sentiment vary by lexicon?
Another question that interests me, which I won’t address in this project but may take up in my final project for this course, is whether sentiment varies based on the location of the Twitter poster. More specifically, does sentiment vary in states with expansive free or low-cost pre-K programs?
Some evidence suggests that greater access to low-cost early childhood education improves lifelong developmental and educational trajectories for children with disabilities (as well as for students from low income families and English language learners). It would also be interesting to see if such programs offer a “spillover effect” to mothers of kids with disabilities. Are mothers’ experiences different (as viewed by sentiment of tweets) in states with expansive pre-K programs versus those without?
Set up: To begin, I’ll install the required packages and then load them with library().
# This updates the wordcloud2 package
remotes::install_github("lchiffon/wordcloud2")
## Skipping install of 'wordcloud2' from a github remote, the SHA1 (8a12a3b6) has not changed since last install.
## Use `force = TRUE` to force installation
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(tidyr)
library(rtweet)
library(writexl)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
library(wordcloud2)
3. Next, I'll store API keys and authenticate them and check to see that the token is loaded. Note: secret keys are hidden.
## There is a previous code chunk that stores Twitter API keys but these are private
# so the code chunk is set to not show up in the report: {r, include=FALSE}
## authenticate with the stored keys and tokens
token <- create_token(
app = app_name,
consumer_key = api_key,
consumer_secret = api_secret_key,
access_token = access_token,
access_secret = access_token_secret)
## check to see if the token is loaded
get_token()
## <Token>
## <oauth_endpoint>
## request: https://api.twitter.com/oauth/request_token
## authorize: https://api.twitter.com/oauth/authenticate
## access: https://api.twitter.com/oauth/access_token
## <oauth_app> SNAfoo
## key: clxr0mEFqC3FwV5ESDd8d3yNF
## secret: <hidden>
## <credentials> oauth_token, oauth_token_secret
## ---
WRANGLE
2a. Importing Tweets
#disability_tweets <- search_tweets(q = "#disability", n=5000)
#specialneeds_tweets <- search_tweets(q = "#specialneeds" , n=5000)
#mom_disspecneeds_tweets <- search_tweets(q = "#disability OR #specialneeds
# AND mom" ,
# n=5000,
# include_rts = FALSE)
# kid_disspecneeds_tweets <- search_tweets(q = "#disabledchild OR #specialneeds
# AND mom" ,
# n=5000,
# include_rts = FALSE)
# There's actually a lot of overlap b/t the findings of the previous two
# searches, so the first set of terms seems to be sufficient.
# I've commented this code out because it takes a long time during the knitting
# process and I really don't need it for my analysis. This chunk is really
# just me seeing what's out there on Twitter for these keywords
2. Next I'll create two dictionaries, one for the keyword "disability" and the other for the keyword phrase "special needs"
#Next I'll create the dictionaries for 'special needs mom' and 'disabled
#child mom'
specneedsmom_dictionary <- c("#specialneeds AND mom",
'"#specialneeds AND mother"',
'"special needs mom"',
'"special needs kid"',
'"special needs child"')
snm_tweets <- search_tweets2(specneedsmom_dictionary,
n=5000,
include_rts = FALSE)
diskidmom_dictionary <- c("#disabledchild AND mom",
'"#disabledchild AND mother"',
'"disabled child mom"',
'"disabled kid mom"',
'"disabled kid"',
'"disabled child"')
dm_tweets <- search_tweets2(diskidmom_dictionary,
n=5000,
include_rts = FALSE)
3. Next, I'll save the tweet files to Excel. This allows me to have a stable set of data, since Twitter and tweets are constantly changing. This would be useful if I want to do further analysis on this same set of tweets.
## Saving tweet files to Excel (need to create data folder first)
write_xlsx(snm_tweets, "data/snm_tweets.xlsx")
write_xlsx(dm_tweets, "data/dm_tweets.xlsx")
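The note about creating the data folder first can also be handled from R. Below is a minimal sketch, assuming a toy data frame (`toy_tweets`) and a temporary directory in place of the real `data/` folder; both are illustrative stand-ins and not part of the original analysis.

```r
library(writexl)
library(readxl)

# Hypothetical stand-in for snm_tweets; the real object comes from search_tweets2()
toy_tweets <- data.frame(
  status_id = c("1", "2"),
  text = c("first example tweet", "second example tweet"),
  stringsAsFactors = FALSE
)

# Create the output folder only if it does not exist yet
# (tempdir() is used here so the sketch does not touch the project folder)
data_dir <- file.path(tempdir(), "data")
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# Save the data, then read it back to confirm the file holds a stable copy
out_path <- file.path(data_dir, "toy_tweets.xlsx")
write_xlsx(toy_tweets, out_path)
saved_tweets <- read_excel(out_path)
```

Reading the saved file back with `read_excel()` is also how a later session could reuse the same set of tweets without querying Twitter again.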
2b. Tidying the Text
4. Here I'll filter tweets by language, select relevant columns, add a column for keyword ("disabled" vs. "special needs") and relocate that column to first position
#for disability
dm_text <- dm_tweets %>%
filter(lang == "en") %>%
select(screen_name, created_at, text) %>%
mutate(keyword = "disability") %>%
relocate(keyword)
#for special needs
snm_text <- snm_tweets %>%
filter(lang == "en") %>%
select(screen_name, created_at, text) %>%
mutate(keyword = "special needs") %>%
relocate(keyword)
5. Combining the data frames and looking at the head & tail of the result
tweets <- bind_rows(dm_text, snm_text)
head(tweets)
## # A tibble: 6 × 4
## keyword screen_name created_at text
## <chr> <chr> <dttm> <chr>
## 1 disability NavyPrism 2022-02-04 15:24:35 "@AlliAlliG I actually havnt r…
## 2 disability eflask 2022-02-04 17:29:39 "@VermontStudent @michellebfay…
## 3 disability MacaroniMan2021 2022-02-04 15:49:54 "@MhairiHunter The Nazis kille…
## 4 disability CannaCaptain 2022-02-04 15:39:06 "@allan_cheapshot If you’re te…
## 5 disability MstrssVeronica 2022-02-04 15:29:28 "This doesn't offend me becaus…
## 6 disability NavyPrism 2022-02-04 15:24:35 "@AlliAlliG I actually havnt r…
tail(tweets)
## # A tibble: 6 × 4
## keyword screen_name created_at text
## <chr> <chr> <dttm> <chr>
## 1 special needs RonSnowflake 2022-01-27 18:16:40 "@HaYanGamer420 @Jester2218…
## 2 special needs jeffreykniffin 2022-01-27 18:05:28 "@irbransmom Apparently she…
## 3 special needs FoxRothschild 2022-01-27 17:45:25 "Difficulties in divorce se…
## 4 special needs rnwalker 2022-01-27 17:30:05 "A #specialneeds #trust can…
## 5 special needs Brandon_Nedib 2022-01-27 16:58:08 "@kgeads17 @GovernorVA I ha…
## 6 special needs TheJayCalledLee 2022-01-27 13:37:14 "Fury over Muslim boy, 11, …
6. Tokenizing the text
tweet_tokens <-
tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
7. Removing stop words, looking at top word counts, and filtering out nonsense words.
## Removing stop words
tidy_tweets <-
tweet_tokens %>%
anti_join(stop_words, by = "word")
## Looking at top word counts
count(tidy_tweets, word, sort = TRUE)
## # A tibble: 6,390 × 2
## word n
## <chr> <int>
## 1 child 799
## 2 disabled 697
## 3 special 372
## 4 kid 272
## 5 amp 96
## 6 parent 94
## 7 dont 92
## 8 addition 85
## 9 bite 83
## 10 im 83
## # … with 6,380 more rows
#Making the code more orderly and filtering out additional words
tidy_tweets <-
tweet_tokens %>%
anti_join(stop_words, by = "word") %>%
filter(!word %in% c("im", "special" , "amp" , "disabled",
"child", "kid" , "kids", "#etsy", "1",
"2", "3", "4"))
count(tidy_tweets, word, sort = TRUE)
## # A tibble: 6,378 × 2
## word n
## <chr> <int>
## 1 parent 94
## 2 dont 92
## 3 addition 85
## 4 bite 83
## 5 shop 83
## 6 share 82
## 7 childautismcerebral 81
## 8 excited 81
## 9 palsyarthritisautoaggression 81
## 10 care 80
## # … with 6,368 more rows
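To make the tokenizing and stop-word steps concrete, here is a small sketch on two made-up tweets. The data frame `toy_tweets` is hypothetical, and the sketch assumes the default word tokenizer of `unnest_tokens()` rather than the `token = "tweets"` option used in the report.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-row stand-in for the tweets data frame
toy_tweets <- tibble(
  keyword = c("disability", "special needs"),
  text = c("The school offers support", "Support for the family matters")
)

# One row per word; unnest_tokens() lowercases and strips punctuation by default
toy_tokens <- toy_tweets %>%
  unnest_tokens(output = word, input = text)

# Drop common stop words such as "the" by removing rows that match stop_words
toy_tidy <- toy_tokens %>%
  anti_join(stop_words, by = "word")

count(toy_tidy, word, sort = TRUE)
```

Because `anti_join()` removes every row whose word appears in `stop_words`, the remaining counts reflect content words only.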
2c. Sentiment Values
8. Next I'll add sentiment values. For every lexicon but bing, I have to select '1' in the console to confirm the download.
afinn <- get_sentiments("afinn")
bing <- get_sentiments("bing")
nrc <- get_sentiments("nrc")
loughran <- get_sentiments("loughran")
9. Next, I will join each lexicon with the tidy_tweets data frame, keeping only the words that appear in that lexicon along with their sentiment values. (The column that designates the lexicon is added later, in the summary steps.)
sentiment_afinn <- inner_join(tidy_tweets, afinn, by = "word")
sentiment_afinn
## # A tibble: 1,952 × 5
## keyword screen_name created_at word value
## <chr> <chr> <dttm> <chr> <dbl>
## 1 disability NavyPrism 2022-02-04 15:24:35 lol 3
## 2 disability eflask 2022-02-04 17:29:39 worry -3
## 3 disability eflask 2022-02-04 17:29:39 supports 2
## 4 disability MacaroniMan2021 2022-02-04 15:49:54 killed -3
## 5 disability MacaroniMan2021 2022-02-04 15:49:54 jokes 2
## 6 disability MacaroniMan2021 2022-02-04 15:49:54 hate -3
## 7 disability MstrssVeronica 2022-02-04 15:29:28 offend -2
## 8 disability MstrssVeronica 2022-02-04 15:29:28 offends -2
## 9 disability MstrssVeronica 2022-02-04 15:29:28 ashamed -2
## 10 disability NavyPrism 2022-02-04 15:24:35 lol 3
## # … with 1,942 more rows
sentiment_bing <- inner_join(tidy_tweets, bing, by = "word")
sentiment_bing
## # A tibble: 1,872 × 5
## keyword screen_name created_at word sentiment
## <chr> <chr> <dttm> <chr> <chr>
## 1 disability eflask 2022-02-04 17:29:39 worry negative
## 2 disability eflask 2022-02-04 17:29:39 expensive negative
## 3 disability eflask 2022-02-04 17:29:39 supports positive
## 4 disability MacaroniMan2021 2022-02-04 15:49:54 killed negative
## 5 disability MacaroniMan2021 2022-02-04 15:49:54 hate negative
## 6 disability CannaCaptain 2022-02-04 15:39:06 invader negative
## 7 disability MstrssVeronica 2022-02-04 15:29:28 offend negative
## 8 disability MstrssVeronica 2022-02-04 15:29:28 profoundly positive
## 9 disability MstrssVeronica 2022-02-04 15:29:28 embarrassingly negative
## 10 disability MstrssVeronica 2022-02-04 15:29:28 ashamed negative
## # … with 1,862 more rows
sentiment_nrc <- inner_join(tidy_tweets, nrc, by = "word")
sentiment_nrc
## # A tibble: 7,926 × 5
## keyword screen_name created_at word sentiment
## <chr> <chr> <dttm> <chr> <chr>
## 1 disability NavyPrism 2022-02-04 15:24:35 time anticipation
## 2 disability eflask 2022-02-04 17:29:39 worry anticipation
## 3 disability eflask 2022-02-04 17:29:39 worry fear
## 4 disability eflask 2022-02-04 17:29:39 worry negative
## 5 disability eflask 2022-02-04 17:29:39 worry sadness
## 6 disability eflask 2022-02-04 17:29:39 argue anger
## 7 disability eflask 2022-02-04 17:29:39 argue negative
## 8 disability eflask 2022-02-04 17:29:39 offense anger
## 9 disability eflask 2022-02-04 17:29:39 offense disgust
## 10 disability eflask 2022-02-04 17:29:39 offense fear
## # … with 7,916 more rows
sentiment_loughran <- inner_join(tidy_tweets, loughran, by = "word")
sentiment_loughran
## # A tibble: 958 × 5
## keyword screen_name created_at word sentiment
## <chr> <chr> <dttm> <chr> <chr>
## 1 disability eflask 2022-02-04 17:29:39 worry negative
## 2 disability eflask 2022-02-04 17:29:39 argue negative
## 3 disability eflask 2022-02-04 17:29:39 offense litigious
## 4 disability eflask 2022-02-04 17:29:39 adequately positive
## 5 disability MacaroniMan2021 2022-02-04 15:49:54 depending uncertainty
## 6 disability MacaroniMan2021 2022-02-04 15:49:54 depending constraining
## 7 disability MstrssVeronica 2022-02-04 15:29:28 offend negative
## 8 disability MstrssVeronica 2022-02-04 15:29:28 offends negative
## 9 disability LansleyAnna 2022-02-04 15:08:55 severely negative
## 10 disability LansleyAnna 2022-02-04 15:08:55 doubts negative
## # … with 948 more rows
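The joins above can be illustrated on a toy example. This is a sketch only: `toy_tokens` and `toy_lexicon` are hypothetical objects, and the three-word lexicon merely imitates the numeric style of afinn.

```r
library(dplyr)

# Hypothetical tokens, one word per row, shaped like tidy_tweets
toy_tokens <- tibble(
  keyword = c("disability", "disability", "special needs", "special needs"),
  word = c("worry", "school", "excited", "therapy")
)

# Hypothetical three-word lexicon with afinn-style numeric values
toy_lexicon <- tibble(
  word = c("worry", "excited", "hate"),
  value = c(-3, 3, -3)
)

# inner_join() keeps only the words found in both tables
toy_sentiment <- inner_join(toy_tokens, toy_lexicon, by = "word")
toy_sentiment
```

Words without a lexicon entry ("school" and "therapy" here) are dropped, which is why each lexicon returns a different number of rows for the same tweets.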
EXPLORE
ts_plot(tweets, by = "days") ##plot by days
ts_plot(tweets, by = "hours") ## plot by hours
2. Plot by groups of keywords
#plot by keyword (disability vs. special needs)
ts_plot(dplyr::group_by(tweets, keyword), "hours")
ts_plot(dplyr::group_by(tweets, keyword), "days")
3. Analyzing sentiment, grouping by keyword, and creating a sentiment score for the **bing** lexicon, and adding a lexicon variable (column) to the data frame.
# Bing: Creating a single sentiment score and adding a lexicon variable
# (the spread function from the tidyr package transforms our sentiment
# column into separate columns for negative and positive
# that contain the n counts for each)
summary_bing <- sentiment_bing %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "bing") %>%
relocate(lexicon)
summary_bing
## # A tibble: 2 × 5
## # Groups: keyword [2]
## lexicon keyword negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 bing disability 785 364 -421
## 2 bing special needs 355 368 13
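For comparison, the same net-score calculation can be sketched with `pivot_wider()`, which tidyr now recommends in place of the superseded `spread()`. The toy data frame `toy_bing` is hypothetical; `values_fill = 0` is an added choice here so that a sentiment category with no words becomes 0 rather than NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical word-level sentiment labels, shaped like sentiment_bing
toy_bing <- tibble(
  keyword = c("disability", "disability", "disability",
              "special needs", "special needs"),
  sentiment = c("negative", "negative", "positive",
                "positive", "positive")
)

toy_summary <- toy_bing %>%
  count(keyword, sentiment) %>%
  # One column per sentiment; values_fill = 0 avoids NA when a category is absent
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  # Net score: positive word count minus negative word count
  mutate(sentiment = positive - negative,
         lexicon = "bing") %>%
  relocate(lexicon)

toy_summary
```

In this toy data "special needs" has no negative words, so without the fill value its net score would come out as NA.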
4. Repeating the steps above for the remaining lexicons:
1. afinn
# repeating the step above but for afinn lexicon
summary_afinn <- sentiment_afinn %>%
group_by(keyword) %>%
summarise(sentiment = sum(value)) %>%
mutate(lexicon = "afinn") %>%
relocate(lexicon)
summary_afinn
## # A tibble: 2 × 3
## lexicon keyword sentiment
## <chr> <chr> <dbl>
## 1 afinn disability -670
## 2 afinn special needs 221
2. loughran
# repeating bing steps above for loughran lexicon and filtering summary
# to only see positive and negative values
summary_loughran <- sentiment_loughran %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "loughran") %>%
relocate(lexicon)
summary_loughran
## # A tibble: 2 × 9
## # Groups: keyword [2]
## lexicon keyword constraining litigious negative positive superfluous
## <chr> <chr> <int> <int> <int> <int> <int>
## 1 loughran disability 21 52 408 59 1
## 2 loughran special needs 15 48 183 111 NA
## # … with 2 more variables: uncertainty <int>, sentiment <int>
summary_loughran_2 <- summary_loughran %>%
select(lexicon, keyword, negative, positive, sentiment)
summary_loughran_2
## # A tibble: 2 × 5
## # Groups: keyword [2]
## lexicon keyword negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 loughran disability 408 59 -349
## 2 loughran special needs 183 111 -72
3. nrc
# repeating above steps for nrc lexicon; afterwards selecting
# only the "positive" and "negative" columns b/c
# nrc lexicon contains other values like "trust","sadness"
summary_nrc <- sentiment_nrc %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "nrc") %>%
relocate(lexicon)
summary_nrc
## # A tibble: 2 × 13
## # Groups: keyword [2]
## lexicon keyword anger anticipation disgust fear joy negative positive
## <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
## 1 nrc disability 351 373 263 410 274 778 811
## 2 nrc special needs 199 392 127 216 403 454 724
## # … with 4 more variables: sadness <int>, surprise <int>, trust <int>,
## # sentiment <int>
summary_nrc_2 <- summary_nrc %>%
select(lexicon, keyword, negative, positive, sentiment)
summary_nrc_2
## # A tibble: 2 × 5
## # Groups: keyword [2]
## lexicon keyword negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 nrc disability 778 811 33
## 2 nrc special needs 454 724 270
MODEL
The “Modeling” step of the learning analytics workflow involves using statistical models to analyze data and, where possible, make predictions. I’ll also use this section to visualize how sentiment varies by lexicon and what the nature of word choice is in tweets about mothering disabled or special needs children. Because I’m changing the order slightly from the Walkthrough, I’m going to do some “polishing” steps here…
Polishing
dm_text <- dm_tweets %>%
filter(lang == "en") %>%
select(status_id, text) %>%
mutate(keyword = "disability") %>%
relocate(keyword)
snm_text <- snm_tweets %>%
filter(lang == "en") %>%
select(status_id, text) %>%
mutate(keyword = "special needs") %>%
relocate(keyword)
2. Step 2: merging data frames
#merging the two data frames and taking a look
tweets <- bind_rows(dm_text, snm_text)
tweets
## # A tibble: 1,108 × 3
## keyword status_id text
## <chr> <chr> <chr>
## 1 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of …
## 2 disability 1489652384724836362 "@VermontStudent @michellebfay @selene_colbur…
## 3 disability 1489627282738143234 "@MhairiHunter The Nazis killed lots of disab…
## 4 disability 1489624563474436101 "@allan_cheapshot If you’re telling me that I…
## 5 disability 1489622140039188486 "This doesn't offend me because it's an imper…
## 6 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of …
## 7 disability 1489616967480877066 "@JoeMillerAS1 I think that ppl should only h…
## 8 disability 1489561390218203136 "I remember all the strange, embarrassing or …
## 9 disability 1489510120295788550 "@HSpearmano I would totes contribute better …
## 10 disability 1489451350534418432 "You CANNOT have the liberty to remove someon…
## # … with 1,098 more rows
head(tweets)
## # A tibble: 6 × 3
## keyword status_id text
## <chr> <chr> <chr>
## 1 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of t…
## 2 disability 1489652384724836362 "@VermontStudent @michellebfay @selene_colburn…
## 3 disability 1489627282738143234 "@MhairiHunter The Nazis killed lots of disabl…
## 4 disability 1489624563474436101 "@allan_cheapshot If you’re telling me that In…
## 5 disability 1489622140039188486 "This doesn't offend me because it's an impers…
## 6 disability 1489620910428487682 "@AlliAlliG I actually havnt really heard of t…
tail(tweets)
## # A tibble: 6 × 3
## keyword status_id text
## <chr> <chr> <chr>
## 1 special needs 1486765113155674118 "@HaYanGamer420 @Jester22183 @ChrisCampbell…
## 2 special needs 1486762296428974084 "@irbransmom Apparently she's never heard o…
## 3 special needs 1486757250735747077 "Difficulties in divorce settlements can in…
## 4 special needs 1486753389316907011 "A #specialneeds #trust can help care for a…
## 5 special needs 1486745351843454979 "@kgeads17 @GovernorVA I have a special nee…
## 6 special needs 1486694793996431369 "Fury over Muslim boy, 11, with special nee…
3. Step 3: analyzing sentiment from the afinn lexicon
#Cleaning up code for analyzing sentiment from each lexicon
#afinn
# Custom words to drop, stored once so the same list can be reused
customwords <- c("im", "special" , "amp" , "disabled",
"child", "kid" , "kids", "#etsy", "1",
"2", "3", "4")
sentiment_afinn <- tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets") %>%
anti_join(stop_words, by = "word") %>%
filter(!word %in% customwords) %>%
inner_join(afinn, by = "word")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
# Dr. J: I wondered whether filter(!word == "customwords") would work above.
# It would not, because that compares each word to the literal string
# "customwords"; filter(!word %in% customwords) checks against the vector.
sentiment_afinn
## # A tibble: 1,952 × 4
## keyword status_id word value
## <chr> <chr> <chr> <dbl>
## 1 disability 1489620910428487682 lol 3
## 2 disability 1489652384724836362 worry -3
## 3 disability 1489652384724836362 supports 2
## 4 disability 1489627282738143234 killed -3
## 5 disability 1489627282738143234 jokes 2
## 6 disability 1489627282738143234 hate -3
## 7 disability 1489622140039188486 offend -2
## 8 disability 1489622140039188486 offends -2
## 9 disability 1489622140039188486 ashamed -2
## 10 disability 1489620910428487682 lol 3
## # … with 1,942 more rows
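The difference between `==` and `%in%` when dropping a set of custom words can be checked on a toy vector. The vector `words` and the shortened `customwords` list below are made up for illustration.

```r
# Hypothetical tokens and a shortened custom word list for illustration
words <- c("amp", "support", "im", "care")
customwords <- c("amp", "im")

# Compares every token to the literal string "customwords": nothing is removed
kept_with_equals <- words[!words == "customwords"]

# Tests membership in the vector: "amp" and "im" are removed
kept_with_in <- words[!words %in% customwords]

kept_with_equals
kept_with_in
```

The same logic carries over to `filter()`: the condition has to refer to the vector object itself, not to its name in quotes.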
afinn_score <- sentiment_afinn %>%
group_by(keyword, status_id) %>%
summarise(value = sum(value))
## `summarise()` has grouped output by 'keyword'. You can override using the
## `.groups` argument.
afinn_score
## # A tibble: 901 × 3
## # Groups: keyword [2]
## keyword status_id value
## <chr> <chr> <dbl>
## 1 disability 1486695407962972164 4
## 2 disability 1486696333209653250 0
## 3 disability 1486699099701264384 3
## 4 disability 1486715162333573125 4
## 5 disability 1486715666556018701 -5
## 6 disability 1486719514221834250 -4
## 7 disability 1486723057536290821 -2
## 8 disability 1486748881920552962 9
## 9 disability 1486750158570209280 1
## 10 disability 1486760075444330501 -3
## # … with 891 more rows
afinn_sentiment <- afinn_score %>%
filter(value != 0) %>%
mutate(sentiment = if_else(value < 0, "negative", "positive"))
afinn_sentiment
## # A tibble: 858 × 4
## # Groups: keyword [2]
## keyword status_id value sentiment
## <chr> <chr> <dbl> <chr>
## 1 disability 1486695407962972164 4 positive
## 2 disability 1486699099701264384 3 positive
## 3 disability 1486715162333573125 4 positive
## 4 disability 1486715666556018701 -5 negative
## 5 disability 1486719514221834250 -4 negative
## 6 disability 1486723057536290821 -2 negative
## 7 disability 1486748881920552962 9 positive
## 8 disability 1486750158570209280 1 positive
## 9 disability 1486760075444330501 -3 negative
## 10 disability 1486761340475740163 -2 negative
## # … with 848 more rows
afinn_ratio <- afinn_sentiment %>%
group_by(keyword) %>%
count(sentiment) %>%
spread(sentiment, n) %>%
mutate(ratio = negative/positive)
afinn_ratio
## # A tibble: 2 × 4
## # Groups: keyword [2]
## keyword negative positive ratio
## <chr> <int> <int> <dbl>
## 1 disability 313 175 1.79
## 2 special needs 149 221 0.674
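As a quick check on how to read the ratio column, the arithmetic can be reproduced by hand. This sketch simply copies the tweet counts from the output above into named vectors.

```r
# Counts of negative and positive tweets copied from the afinn_ratio output
negative <- c(disability = 313, `special needs` = 149)
positive <- c(disability = 175, `special needs` = 221)

# A ratio above 1 means more negative than positive tweets;
# a ratio below 1 means more positive than negative tweets
ratio <- negative / positive
round(ratio, 3)
```

So the "disability" tweets lean negative (ratio above 1), while the "special needs" tweets lean positive (ratio below 1).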
3. Keyword Differences: graphing positive versus negative tweets for the two keywords
#For keyword 'disability'
afinn_counts_dis <- afinn_sentiment %>%
group_by(keyword) %>%
count(sentiment) %>%
filter(keyword == "disability")
afinn_counts_dis %>%
ggplot(aes(x="", y=n, fill=sentiment)) +
geom_bar(width = .6, stat = "identity") +
labs(title = "Disability, Disabled Child, & Mom",
subtitle = "Proportion of Positive & Negative Tweets") +
coord_polar(theta = "y") +
theme_void()
Sentiment is decidedly more negative with the "disability" keyword than it is with the "special needs" keyword phrase (below).
#Repeat for special needs
afinn_counts_sn <- afinn_sentiment %>%
group_by(keyword) %>%
count(sentiment) %>%
filter(keyword == "special needs")
afinn_counts_sn
## # A tibble: 2 × 3
## # Groups: keyword [1]
## keyword sentiment n
## <chr> <chr> <int>
## 1 special needs negative 149
## 2 special needs positive 221
afinn_counts_sn %>%
ggplot(aes(x="", y=n, fill=sentiment)) +
geom_bar(width = .6, stat = "identity") +
labs(title = "Special Needs and Mom",
subtitle = "Proportion of Positive & Negative Tweets") +
coord_polar(theta = "y") +
theme_void()
1. Calculating sentiment scores for each lexicon and then comparing positive and negative sentiment for each lexicon visually.
# Creating "summary" data frames for each lexicon, counting the words with positive and negative sentiment.
summary_afinn3 <- sentiment_afinn %>%
group_by(keyword) %>%
filter(value != 0) %>%
mutate(sentiment = if_else(value < 0, "negative", "positive")) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "afinn")
summary_bing3 <- sentiment_bing %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "bing")
summary_nrc3 <- sentiment_nrc %>%
filter(sentiment %in% c("positive", "negative")) %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "nrc")
summary_loughran3 <- sentiment_loughran %>%
filter(sentiment %in% c("positive", "negative")) %>%
group_by(keyword) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "loughran")
Next, I'll combine the lexicon summaries into one overall summary of sentiment and visualize it in a graph.
#Combining lexicon summaries to compare positive and negative sentiment scores in each lexicon.
summary_sentiment <- bind_rows(summary_afinn3,
summary_bing3,
summary_nrc3,
summary_loughran3) %>%
arrange(method, keyword) %>%
relocate(method)
total_counts <- summary_sentiment %>%
group_by(method, keyword) %>%
summarise(total = sum(n))
## `summarise()` has grouped output by 'method'. You can override using the
## `.groups` argument.
sentiment_counts <- left_join(summary_sentiment, total_counts)
## Joining, by = c("method", "keyword")
sentiment_counts
## # A tibble: 16 × 5
## # Groups: keyword [2]
## method keyword sentiment n total
## <chr> <chr> <chr> <int> <int>
## 1 afinn disability negative 710 1164
## 2 afinn disability positive 454 1164
## 3 afinn special needs positive 463 788
## 4 afinn special needs negative 325 788
## 5 bing disability negative 785 1149
## 6 bing disability positive 364 1149
## 7 bing special needs positive 368 723
## 8 bing special needs negative 355 723
## 9 loughran disability negative 408 467
## 10 loughran disability positive 59 467
## 11 loughran special needs negative 183 294
## 12 loughran special needs positive 111 294
## 13 nrc disability positive 811 1589
## 14 nrc disability negative 778 1589
## 15 nrc special needs positive 724 1178
## 16 nrc special needs negative 454 1178
Positive and Negative Sentiment by Lexicon
| Method   | Keyword       | Sentiment | n   |
|----------|---------------|-----------|-----|
| afinn    | disability    | negative  | 710 |
| afinn    | disability    | positive  | 454 |
| afinn    | special needs | positive  | 463 |
| afinn    | special needs | negative  | 325 |
| bing     | disability    | negative  | 785 |
| bing     | disability    | positive  | 364 |
| bing     | special needs | positive  | 368 |
| bing     | special needs | negative  | 355 |
| loughran | disability    | negative  | 408 |
| loughran | disability    | positive  | 59  |
| loughran | special needs | negative  | 183 |
| loughran | special needs | positive  | 111 |
| nrc      | disability    | positive  | 811 |
| nrc      | disability    | negative  | 778 |
| nrc      | special needs | positive  | 724 |
| nrc      | special needs | negative  | 454 |
#converting the sentiment counts to percentages for easier visualization
sentiment_percents <- sentiment_counts %>%
mutate(percent = n/total * 100)
sentiment_percents
## # A tibble: 16 × 6
## # Groups: keyword [2]
## method keyword sentiment n total percent
## <chr> <chr> <chr> <int> <int> <dbl>
## 1 afinn disability negative 710 1164 61.0
## 2 afinn disability positive 454 1164 39.0
## 3 afinn special needs positive 463 788 58.8
## 4 afinn special needs negative 325 788 41.2
## 5 bing disability negative 785 1149 68.3
## 6 bing disability positive 364 1149 31.7
## 7 bing special needs positive 368 723 50.9
## 8 bing special needs negative 355 723 49.1
## 9 loughran disability negative 408 467 87.4
## 10 loughran disability positive 59 467 12.6
## 11 loughran special needs negative 183 294 62.2
## 12 loughran special needs positive 111 294 37.8
## 13 nrc disability positive 811 1589 51.0
## 14 nrc disability negative 778 1589 49.0
## 15 nrc special needs positive 724 1178 61.5
## 16 nrc special needs negative 454 1178 38.5
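The totals and percentages can also be computed in a single grouped step, without building a separate totals table and joining it back. The sketch below is one possible alternative, not the approach used in the report; `toy_counts` copies only the afinn rows from the output above.

```r
library(dplyr)

# Counts copied from the afinn rows of the sentiment_counts output
toy_counts <- tibble(
  method = "afinn",
  keyword = c("disability", "disability", "special needs", "special needs"),
  sentiment = c("negative", "positive", "positive", "negative"),
  n = c(710, 454, 463, 325)
)

# Inside group_by(), sum(n) is the total for each method/keyword pair,
# so each row can be divided by its own group total
toy_percents <- toy_counts %>%
  group_by(method, keyword) %>%
  mutate(total = sum(n),
         percent = n / total * 100) %>%
  ungroup()

toy_percents
```

The grouped `mutate()` keeps one row per sentiment, which is the shape the stacked bar chart needs.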
sentiment_percents %>%
ggplot(aes(x = keyword, y = percent, fill=sentiment)) +
geom_bar(width = .8, stat = "identity") +
facet_wrap(~method, ncol = 1) +
coord_flip() +
labs(title = "Public Sentiment on Twitter",
subtitle = "Disability vs. Special Needs and Mom",
x = "Keyword",
y = "Percentage of Words")
summary_sentiment
## # A tibble: 16 × 4
## # Groups: keyword [2]
## method keyword sentiment n
## <chr> <chr> <chr> <int>
## 1 afinn disability negative 710
## 2 afinn disability positive 454
## 3 afinn special needs positive 463
## 4 afinn special needs negative 325
## 5 bing disability negative 785
## 6 bing disability positive 364
## 7 bing special needs positive 368
## 8 bing special needs negative 355
## 9 loughran disability negative 408
## 10 loughran disability positive 59
## 11 loughran special needs negative 183
## 12 loughran special needs positive 111
## 13 nrc disability positive 811
## 14 nrc disability negative 778
## 15 nrc special needs positive 724
## 16 nrc special needs negative 454
4. Visualizing Word Choice Through Word Clouds
1. Overall wordcloud
#Now I want to create a wordcloud of the tweets. However, there are
#too many words to visualize. I'll choose the top 50.
top_tokens_all <- tidy_tweets %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_all)
#Some words are kind of irrelevant (1, 2, hes), but it does give a glimpse
#overall of the top words. I'm interested to see that "support" is in the top 50
#since that is an aspect of the phenomenon of interest to me.
# Below I'm going to try to filter out those words to see if it looks any
#different (in revising, I think I filtered these out a little earlier)
top_tokens_all <- tidy_tweets %>%
filter(!word %in% c("im", "special" , "amp" , "disabled",
"child", "kid" , "kids", "#etsy", "1",
"2", "3", "4")) %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_all)
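Selecting the most frequent words can also be sketched with `slice_max()`, which dplyr recommends in place of the superseded `top_n()`. The word counts in `toy_counts` are hypothetical, and the sketch keeps 2 words instead of 50 so the result is easy to inspect.

```r
library(dplyr)

# Hypothetical word counts standing in for count(tidy_tweets, word)
toy_counts <- tibble(
  word = c("parent", "care", "support", "school"),
  n = c(94, 80, 45, 37)
)

# Keep the 2 most frequent words (the report keeps 50);
# the first n is the column to rank by, the second is how many rows to keep
top_tokens <- toy_counts %>%
  slice_max(n, n = 2)

top_tokens
```

Like `top_n()`, `slice_max()` keeps ties by default, so the result can contain more rows than requested when several words share the cutoff count.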
2. "Disability" wordcloud
##Tokenizing text - Disability and Mom
tweet_tokens_dm <-
dm_tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
tidy_tweets_dm <-
tweet_tokens_dm %>%
anti_join(stop_words, by = "word") %>%
filter(!word %in% c("im", "special" , "amp" , "disabled",
"child", "kid" , "kids", "#etsy", "1",
"2", "3", "4"))
count(tidy_tweets_dm, word, sort = TRUE)
## # A tibble: 4,436 × 2
## word n
## <chr> <int>
## 1 dont 62
## 2 parent 59
## 3 life 54
## 4 people 51
## 5 support 45
## 6 children 43
## 7 school 37
## 8 hard 36
## 9 care 35
## 10 woman 35
## # … with 4,426 more rows
#selecting top 50 disability & mom tokens
top_tokens_dm <- tidy_tweets_dm %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_dm)
3. "Special Needs" wordcloud
##Tokenizing text - Special Needs and Mom
tweet_tokens_snm <-
snm_tweets %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
tidy_tweets_snm <-
tweet_tokens_snm %>%
anti_join(stop_words, by = "word") %>%
filter(!word %in% c("im", "special" , "amp" , "disabled",
"child", "kid" , "kids", "#etsy", "1",
"2", "3", "4"))
count(tidy_tweets_snm, word, sort = TRUE)
## # A tibble: 2,875 × 2
## word n
## <chr> <int>
## 1 addition 82
## 2 bite 81
## 3 childautismcerebral 81
## 4 excited 81
## 5 palsyarthritisautoaggression 81
## 6 share 81
## 7 shop 81
## 8 #protectivegloves 67
## 9 #cerebralpalsytoys 62
## 10 #bitingmittens 55
## # … with 2,865 more rows
#selecting top 50 special needs & mom tokens
top_tokens_snm <- tidy_tweets_snm %>%
count(word, sort = TRUE) %>%
top_n(50)
## Selecting by n
wordcloud2(top_tokens_snm)
COMMUNICATE
Select Research Questions of Interest. As noted above, my primary research questions for this study are:
What is the overall sentiment of recent tweets on the topic of mothers parenting children with disabilities?
Does sentiment vary based on keyword (disability vs. special needs)?
Does sentiment vary by lexicon?
Polish. I did most of my “polishing” in the “Model” section, so not much to report here…
Narrate
Purpose. Mothers experience disproportionate role stress when caring for young children, potentially limiting time and resources for self-care and career advancement. These demands are increased when mothers care for children with special needs. This study analyzed recent Twitter posts to investigate sentiment surrounding motherhood while caring for children with special needs.
Methods. Using the Twitter API, I searched recent tweets (past six to nine days) for the following keywords:
| Keyword | Search Term |
|---|---|
| special needs | #specialneeds AND mom |
| | #specialneeds AND mother |
| | special needs mom |
| | special needs kid |
| | special needs child |
| disabled | #disabledchild AND mom |
| | #disabledchild AND mother |
| | disabled child mom |
| | disabled kid mom |
| | disabled kid |
| | disabled child |
I then tokenized the text of the tweets, loaded lexicons (afinn, bing, nrc, loughran), created dictionaries, and analyzed sentiment of the tokenized tweets. Next, I visualized sentiment overall and between the keywords “disability” and “special needs”. Lastly, I visualized the top 50 words overall, and those associated with each keyword, in wordclouds.
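The tokenize, join, and count steps described above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the analysis itself: the two tweets are invented, only the bing lexicon is used because it ships with tidytext (afinn, nrc, and loughran are downloaded through the textdata package), and the default word tokenizer is used because recent tidytext releases no longer offer `token = "tweets"`.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Toy tweets standing in for the real search results (invented text)
toy_tweets <- tibble(
  keyword = c("disability", "special needs"),
  text = c("It is hard to find support at school",
           "So excited and happy to share this")
)

sentiment_counts <- toy_tweets %>%
  unnest_tokens(output = word, input = text) %>%     # one row per word
  anti_join(stop_words, by = "word") %>%             # drop common stop words
  inner_join(get_sentiments("bing"), by = "word") %>% # keep words the lexicon scores
  count(keyword, sentiment)                          # tally by keyword group

sentiment_counts
```

Only words that appear in the lexicon survive the `inner_join()`, so the counts describe the scored vocabulary rather than every token in a tweet.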
Findings. Recent tweets are more positive for terms including “special needs” and “mom” versus “disability” and “mom” across all lexicons. Top words overall include: don’t, parent, shop, share, kids, and bite. Top words for disability and mom tweets include: parent, don’t, kids, school, people, care and (position 8) support. Top words for special needs and mom tweets include: share, addition, bite, excited, childautismcerebral, and palsyarthritisautoaggression.
Discussion. The word disability in tweets seems to be associated with more negative sentiment, which may indicate pejorative connotations of this word versus special needs. Some research has shown that mothers who themselves have disabilities are a highly stressed group (Lee, 2004), and it is unclear whether some tweets in this category may reflect not mothers of disabled children but disabled mothers. However, the difference in sentiment may also reflect inaccuracies in the lexicons themselves, which may ascribe positive sentiment to a word like “special” but a negative one to “disabled”. Terms such as “school”, “bite”, and other references to self-injurious behavior indicate that day-to-day concerns of safety and the educational environment dominate the recent Twitter discourse on mothering children with special needs; the focus is pragmatic. The frequency of the words “support” and “share” merits further investigation, in my opinion, as both may address support systems (familial, community, national) that are either present or absent for mothers raising children with special needs.
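One way to probe the lexicon explanation raised above is to look up the search keywords themselves in a lexicon and see whether they carry a score. The sketch below does this for the bing lexicon only, since it ships with tidytext; the vector `keyword_terms` is a hypothetical name, and no particular result is assumed here. The same filter could be applied to `get_sentiments("afinn")`, `get_sentiments("nrc")`, and `get_sentiments("loughran")` once those lexicons have been downloaded.

```r
library(dplyr)
library(tidytext)

# Search keywords whose own sentiment could tilt the comparison
# (hypothetical vector, chosen for illustration)
keyword_terms <- c("special", "needs", "disabled", "disability")

# Rows returned here are keywords that the lexicon scores directly
lexicon_hits <- get_sentiments("bing") %>%
  filter(word %in% keyword_terms)

lexicon_hits
```

If any keyword is scored, removing it before the sentiment join (for example with `filter(!word %in% keyword_terms)`) would show how much of the difference between groups comes from the search terms rather than the surrounding text.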
REFERENCES
Kim, J. (February 2, 2021). The mothers who already left. New York Magazine. https://www.thecut.com/2021/02/i-always-thought-id-be-a-working-mom.html
Lee, S., Oh, G., Hartmann, H., & Gault, B. (February 2004). The impact of disabilities on mothers’ work participation: Examining differences between single and married mothers. Institute for Women’s Policy Research, Washington, DC.
Ogrysko, N. (December 9, 2019). Lawmakers unveil details of ‘historic’ federal paid parental leave benefits. Federal News Network. Accessed from https://federalnewsnetwork.com/workforce/2019/12/lawmakers-unveil-details-of-historic-federal-paid-parental-leave-benefits/ on November 3, 2021.
Stewart, N. (July 28, 2020). When caring for your child’s needs becomes a job all on its own. The New York Times. https://www.nytimes.com/2020/07/24/us/children-disabilities-parenting-poverty-assistance.html