Background and Research Question
Over just the past month, the COVID-19 outbreak has become a global emergency. While the coronavirus is highly contagious and deadly, people have responded with mixed feelings on social media. According to Feng Lim’s analysis of 15,000 tweets with #Coronavirus and #COVID19 between January 30 and March 15, 2020, “false” is the most frequent word, which suggests a lack of understanding and widespread misconceptions about the virus [1]. On the other hand, people around the world do express sadness. Manlio De Domenico, a scientist at the Center for Information and Communication Technology of Italy’s Bruno Kessler Foundation, analyzed millions of coronavirus tweets and found that sentiment is negative worldwide [2].
As stricter government regulations are issued and social distancing is practiced worldwide in response to the exponential increase in coronavirus cases, this project examines whether and how people have changed their attitudes toward COVID-19. Specifically, the project addresses the following research questions:
How did people respond to the coronavirus as worldwide COVID-19 cases rose to 2 million on April 14?
What are people’s primary concerns during the pandemic?
Exploratory Data Analysis
Data Summary
The dataset used for this project was extracted by Shane Smith and uploaded to Kaggle [3]. It includes 449,492 tweets posted on April 14 with the following hashtags: #coronavirus, #coronavirusoutbreak, #coronavirusPandemic, #covid19, #covid_19, #epitwitter, #ihavecorona, #StayHomeStaySafe, #TestTraceIsolate.
The dataset contains 22 variables associated with Twitter. For text mining and sentiment analysis, this project focuses on English-language tweets and the following variables:
- created_at: The date and time of the tweet.
- text: The text of the tweet
- country_code: The location of the tweet
- favourites_count: The number of favourites this tweet has received.
- retweet_count: The number of retweets this tweet has received.
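As a rough illustration of the subsetting described above, the following base R sketch filters a toy data frame to English tweets and keeps the five variables of interest. The rows are invented stand-ins for the Kaggle CSV, and `toy_tweets` and `keep_cols` are hypothetical names; only the column names come from the dataset.

```r
# Toy stand-in for the Kaggle CSV; every row below is invented for illustration.
toy_tweets <- data.frame(
  created_at       = c("2020-04-14T00:00:01Z", "2020-04-14T00:00:02Z",
                       "2020-04-14T00:00:03Z", "2020-04-14T00:00:04Z"),
  text             = c("Stay home #covid19", "Bleib zu Hause #covid19",
                       "Wash your hands", "???"),
  country_code     = c("US", NA, "GB", NA),
  favourites_count = c(10L, 3L, 7L, 0L),
  retweet_count    = c(2L, 0L, 5L, 0L),
  lang             = c("en", "de", "en", NA),
  stringsAsFactors = FALSE
)

# The five variables the analysis focuses on
keep_cols <- c("created_at", "text", "country_code",
               "favourites_count", "retweet_count")

# which() keeps only rows where the comparison is TRUE,
# so rows with a missing lang are dropped rather than turned into NA rows
english <- toy_tweets[which(toy_tweets$lang == "en"), keep_cols]
print(english)
```

Wrapping the comparison in `which()` is a defensive choice: a plain logical index would return an all-`NA` row wherever `lang` is missing.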
Exploratory Analysis
#load libraries
suppressMessages(library(tidytext))
suppressMessages(library(stringr))
suppressMessages(library(readr))
suppressMessages(library(knitr))
suppressMessages(library(tidyverse))
suppressMessages(library(wordcloud))
suppressMessages(library(textdata))
suppressMessages(library(plotrix))
suppressMessages(library(radarchart))
suppressMessages(library(ggplot2))
suppressMessages(library(choroplethr))
suppressMessages(library(choroplethrMaps))
suppressMessages(library(sentimentr))
suppressMessages(library(sjmisc))
#get data
tweets0414 <- read_csv("2020-04-14 Coronavirus Tweets.csv")
tweets <- tweets0414[tweets0414$lang == 'en',] #retrieve English tweets
1. Geographical Tweet Distribution
Most tweets were sent from the US, followed by Canada, the UK, Nigeria, and India.
data(country.regions) #a dataset that contains country names in different versions from choroplethr
countryname<-as.data.frame(country.regions) #convert it as a dataframe
plotdata <- tweets %>%
filter(is.na(country_code) ==FALSE) %>% #filter na value
rename(iso2c = country_code) %>% #rename column
left_join(countryname) %>% #join countryname
count(region,sort = T) %>% #count region
rename(value = n) %>% #rename column
select(region, value) #select region and value
labs <- data.frame(region =tail(plotdata[order(plotdata$value),],5)$region, #the 5 regions with the most tweets
lon = c(5.44,-105,77.3,-2.6,-93.4), #longitude
lat = c(13.1,60,31.7,51.5,35)) #latitude
nplotdata <- plotdata %>% left_join(labs) #join labs
country_choropleth(nplotdata, num_colors = 1) + #plot choropleth
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", #set color by stats
guide ="colorbar", na.value="white", name="Counts of Tweets") +
geom_point(data = as.data.frame(nplotdata), #mark the top 5 regions
aes(x = lon, y = lat),
inherit.aes = FALSE,
color = ifelse(is.na(nplotdata$lon)==F & nplotdata$value >1500, 'navy', 'blue'),
size = ifelse(is.na(nplotdata$lon)==F, nplotdata$value/100, 0),
alpha = .6) +
geom_point(data = as.data.frame(nplotdata), #mark the top 5 regions
aes(x = lon, y = lat),
inherit.aes = FALSE,
color = 'green',
size = 1) +
geom_text(data = as.data.frame(nplotdata), #annotate the top 5 regions
aes(x = lon, y = lat,
label = toupper(region) , vjust = 1.5, hjust = 0.5), color='black',
inherit.aes = FALSE)+
labs(title = "Tweets by Country") +
theme(plot.title = element_text(size = 14, face = "bold"))
2. Top 5 Favorite Tweets
Global
The most-favorited tweets carry negative messages, such as excessive hoarding of supplies, governments’ insufficient response to COVID-19, the relocation of homeless people, and changes to logistics.
fav<-tweets %>%
#order the tweets descendingly by counts of favorites
arrange(desc(favourites_count)) %>%
#select the text and count
select(text,favourites_count) %>%
#get the top 5
head(5)
kable(fav,format = "html")
| text | favourites_count |
|---|---|
| We are not #InThisTogether if people have to go about hoarding supplies from stores & leaving others defenseless without the necessary supplies like what the heck is wrong with you people stop being a hoarder!!! 😡😤 #COVID19 | 1989070 |
| It’s absolutely irresponsible and reckless for Trump to be talking about “opening up” the country when TODAY ALONE there were 24,215 new #COVID cases and 2,284 fatalities from #coronavirus and there are no signs of the #COVID19 outbreak slowing. | 1543111 |
| Ordine presidenziale #civid19 #coronavirus BBC News - Coronavirus: Amazon ordered to deliver only essential items in France https://t.co/hxlo2NFrEj | 1122418 |
| Messaggi disorientanti # UK #covid19 #coronavirus government’s coronavirus response beset by mixed messages and U-turns https://t.co/WxxGpEpH8f | 1122107 |
| Nessuna pietà per i senza casa #covid19 #coronavirus Hotels sit vacant during the pandemic. But some locals don’t want homeless people moving in. https://t.co/cuXe9z4JdU | 1122107 |
US
People located in the US, on the other hand, are concerned about cuts to vital public services, economic decline, and heightened cybersecurity threats from working from home.
fav_us <- tweets %>%
#get tweets located in the US
filter(country_code == 'US' & is.na(country_code) == F) %>%
#order the tweets descendingly by counts of favorites
arrange(desc(favourites_count)) %>%
#select the text and count
select(text,favourites_count) %>%
#get the top 5
top_n(5)
kable(fav_us, format = "html")
| text | favourites_count |
|---|---|
| More than 2,100 US cities brace for huge budget shortfalls that will lead to thousands of layoffs, cuts in vital services and less cops on the streets during the #coronavirus pandemic https://t.co/8SdwHycfAE #EconTwitter #economy #publicservices | 570236 |
| A 500% increase in attacks related to #workfromhome individuals as a result of the #coronavirus pandemic #Security #Cybersecurity #Hackers #Databreach #Cybercrime #DataPrivacy #Ransomware #Cyberattacks #CSO #Infosec #Malware #CIS #CyberDefense #WFM https://t.co/6j4IyM5MX1 | 570234 |
| #AirTravel in the era of #coronavirus 😯 #airlines #aviation #aviationinlockdown #avgeek #avgeeks https://t.co/2EfFuXzuga | 570234 |
| #Taiwan steps up the fight against #COVID19 @MHiesboeck #COVID2019 https://t.co/ieypNL5I2D | 570096 |
| #Google to Display More Virtual #Healthcare Options in Search and Maps https://t.co/ymEh57HQMy… #telehealth #TelemedNow @IrmaRaste @eViRaHealth @HealthTap #COVID19 #COVID https://t.co/6qDhnMZLjx | 570089 |
3. Top 5 Retweeted Tweets
Global
Among the most-retweeted tweets, the content displays mixed sentiments: some spread positive, accurate messages such as social distancing and prevention tips, while others discuss conspiracy theories and lies about the virus.
rt<-tweets %>%
#order the tweets descendingly by counts of retweets
arrange(desc(retweet_count)) %>%
#select the text and count
select(text,retweet_count) %>%
#get the top 5
top_n(5)
kable(rt,format = 'html')
| text | retweet_count |
|---|---|
| We can fight the spread of COVID-19 together by sticking to the basics. 💓 Follow the steps to protect yourself with BT21! 📝 #COVID19 #Coronavirus #Prevention #tips #StayatHome #SocialDistancing #SelfQuarantine #FlatteningtheCurve #BT21 https://t.co/obH9uFvxu7 | 18350 |
| We can fight the spread of COVID-19 together by sticking to the basics. 💓 Follow the steps to protect yourself with BT21! 📝 #COVID19 #Coronavirus #Prevention #tips #StayatHome #SocialDistancing #SelfQuarantine #FlatteningtheCurve #BT21 https://t.co/2FgwSefIbq | 16720 |
| #Wapo has legit bombshell indicating the #coronavirus was created by & escaped from a #Chinese lab experimenting on bats, which means the whole wet market story was just BS cover for a bio-experiment fuckup of epic proportion. | 12484 |
| We’ve hit PEAK #COVID19 INTERNET https://t.co/GsY78RzBPK | 12350 |
| Dear @LindseyGrahamS: You lie. The impeachment trial ended Feb 5. Democrats in the House started writing legislation to address the pandemic in FEBRUARY. Democrats in the House held hearings on the #coronavirus in FEBRUARY. #FactsMatter https://t.co/rlN5nwGiHu | 11600 |
US
In the US, people publicize information about virus testing, social distancing, and the needs of seriously ill patients, as well as political criticism.
rt_us<-tweets %>%
#get tweets located in the US
filter(country_code == 'US' & is.na(country_code) == F) %>%
#order the tweets descendingly by counts of retweets
arrange(desc(retweet_count)) %>%
#select the text and count
select(text,retweet_count) %>%
#get the top 5
top_n(5)
kable(rt_us,format = "html")
| text | retweet_count |
|---|---|
| PRAYER REQUEST This brother in arms & his wife are both veterans. She has breast cancer, he has bladder cancer. #Covid_19 has complicated their situation with not being able to have anyone around, meaning no support. They are isolated at home. Please this vet couple up in prayer. https://t.co/G5MoNfthjH | 235 |
| Our city operated testing sites are now open to anyone who would like a test. 1,000 tests per day. Please call 832-393-4220 to get you unique code. We started with 18 operators, ramped up to 25, and tomorrow we will have 50 operators with the increase demand. #COVID19 | 149 |
| Florida Surgeon General suggests social distancing measures should go on until there is a vaccine and is scurried away by governor’s team b/c @GovRonDeSantis is probably going to make some bad policy decisions instead 😒 #coronavirusoutbreak | 99 |
| Do you have a close family member or close friend that is in the Trump cult? If so, has the undisputedly lethally inept #coronavirus response by Trump made a dent in their support of him? | 88 |
| The ambassador of China has been summoned to the French foreign ministry after a media campaign of its embassy to criticize the handling of #COVID19 by FRANCE (as a way of whitewashing China of any responsibility in the pandemic). https://t.co/wnskIWwyRb | 80 |
Method
Based on the geographical distribution of tweets in the exploratory analysis section, the project focuses on people’s responses to the coronavirus on Twitter both worldwide and in the United States.
In terms of methodology, text mining is used to answer the research questions. The project uses tokenization to extract word-level units from tweets. By counting word frequencies, the project explores people’s top concerns during the pandemic. By performing word-level sentiment analysis with the bing and nrc lexicons and sentence-level analysis with the sentimentr package, the project examines people’s attitudes in context.
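To make the three steps concrete (tokenize, count, match against a lexicon), here is a minimal base R sketch on invented tweets. The stop-word list and the polarity lexicon are tiny hypothetical stand-ins for `tidytext::stop_words` and the bing lexicon, and the simple letter-based tokenizer is an assumption for illustration, not the tokenizer used in the analysis below.

```r
# Invented example tweets; not taken from the dataset.
toy_text <- c(
  "People need support during the pandemic",
  "Debt is crushing people, cancel the debt",
  "Thank you for the support!"
)

# Hypothetical mini stop-word list and polarity lexicon
toy_stops   <- c("the", "is", "for", "you", "during")
toy_lexicon <- c(support = "positive", thank = "positive",
                 debt = "negative", crushing = "negative")

# 1. Tokenize: lowercase, then split on anything that is not a letter
tokens <- unlist(strsplit(tolower(toy_text), "[^a-z]+"))
tokens <- tokens[nzchar(tokens) & !tokens %in% toy_stops]

# 2. Word frequency, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)
print(head(freq, 3))

# 3. Word-level sentiment: keep only tokens found in the lexicon
matched <- toy_lexicon[tokens[tokens %in% names(toy_lexicon)]]
print(table(matched))
```

Words that are absent from the lexicon simply drop out in step 3, which is why lexicon-based percentages describe sentiment-bearing words rather than whole tweets.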
1. Word Frequency
Word tokenization is applied before formal analysis:
remove_reg <- "&amp;|&lt;|&gt;" #regex matching HTML entities left in the tweet text
newstops <- c('covid_19','covid-19','covid 19','coronavirus','covid19', '#coronavirus', '#coronavirusoutbreak', '#coronavirusPandemic', '#covid19', '#covid_19', '#epitwitter', '#ihavecorona', '#StayHomeStaySafe', '#TestTraceIsolate') #hashtags that need to be removed
tidy_tweets <- tweets %>%
mutate(text = str_remove_all(text, remove_reg)) %>% #remove text matching remove_reg
unnest_tokens(word, text, token = 'tweets',strip_url = TRUE) %>% #word tokenization
filter(!word %in% stop_words$word, #remove stopwords
!word %in% str_remove_all(stop_words$word, "'"),
!word %in% newstops, #remove those hashtags
str_detect(word, "[a-z]"))
Most Frequent Words Worldwide
- 10 Most Frequent Words
#get words and their frequency
frequency_global <- tidy_tweets %>% count(word, sort=T)
#get the top 10
frequency_global %>% top_n(10)
## # A tibble: 10 x 2
## word n
## <chr> <int>
## 1 people 26899
## 2 support 18246
## 3 pandemic 14664
## 4 time 11084
## 5 health 10969
## 6 economy 10799
## 7 million 10434
## 8 deaths 9705
## 9 home 9382
## 10 #stayhome 8844
- WordCloud
wordcloud(frequency_global$word,frequency_global$n, min.freq = 2200,
scale=c(4.5, .2), random.order = FALSE, random.color = FALSE,
colors = brewer.pal(8, "Dark2"), res=800)
As shown above, “people,” “support,” “pandemic,” “health,” and “economy” are among the words mentioned most on Twitter.
Most Frequent Words in the US
- 10 Most Frequent Words
#get cleaned tweets that are located in the US
tidy_us <- tidy_tweets[is.na(tidy_tweets$country_code)==F & tidy_tweets$country_code == "US", ]
#get words and their frequency
frequency_us <- tidy_us %>% count(word, sort=T)
#get the top 10
frequency_us %>% top_n(10)
## # A tibble: 10 x 2
## word n
## <chr> <int>
## 1 people 556
## 2 economy 311
## 3 #stayhome 303
## 4 million 293
## 5 student 278
## 6 package 271
## 7 debt 264
## 8 urge 263
## 9 #cancelstudentdebt 261
## 10 #studentdebtstimulus 259
- Word Cloud
wordcloud(frequency_us$word,frequency_us$n, min.freq =50, scale=c(4.5, .2), random.order = FALSE, random.color = FALSE,colors = brewer.pal(8, "Dark2"), res=800)
Besides the worldwide concerns, student debt is the main topic of discussion among people in the US. President Trump is also frequently mentioned.
2. Word-Level Sentiment Analysis
(a) Positive/Negative Sentiment
Globally, people present a relatively negative attitude on Twitter during the pandemic.
tweets_bing<-tidy_tweets%>%
# Implement sentiment analysis using the "bing" lexicon
inner_join(get_sentiments("bing"))
perc<-tweets_bing %>%
count(sentiment)%>% #count sentiment
mutate(total=sum(n)) %>% #get sum
group_by(sentiment) %>% #group by sentiment
mutate(percent=round(n/total,2)*100) %>% #get the proportion
ungroup()
label <-c( paste(perc$percent[1],'%',' - ',perc$sentiment[1],sep=''),#create label
paste(perc$percent[2],'%',' - ',perc$sentiment[2],sep=''))
pie3D(perc$percent,labels=label,labelcex=1.1,explode= 0.1,
main="Worldwide Sentiment") #create a pie chart
Sentiment Word Frequency
Global
People have negative feelings toward death and the virus, and especially toward the economic side effects of the pandemic: “debt” appears 8,370 times in global tweets and is the most common negative word. However, people are happy about the “support” and, interestingly, “Trump.”
top_words <- tweets_bing %>%
# Count by word and sentiment
count(word, sentiment) %>%
group_by(sentiment) %>% #group by sentiment
# Take the top 10 for each sentiment
top_n(10) %>%
ungroup() %>%
# Make word a factor in order of n
mutate(word = reorder(word, n))
#plot the result
ggplot(top_words, aes(word, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = n, hjust=1), size = 3.5, color = "black") +
facet_wrap(~sentiment, scales = "free") +
coord_flip() +
ggtitle("Most Common Positive and Negative words (Global)") +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5))
US
While the most common negative word in the US is still “debt,” “stimulate” is the most frequent positive word. However, we still need to evaluate the context around these words.
top_words_us <- tidy_us %>%
# Implement sentiment analysis using the "bing" lexicon
inner_join(get_sentiments("bing")) %>%
# Count by word and sentiment
count(word, sentiment) %>%
group_by(sentiment) %>%
# Take the top 10 for each sentiment
top_n(10) %>%
ungroup() %>%
# Make word a factor in order of n
mutate(word = reorder(word, n))
#plot the result above
ggplot(top_words_us, aes(word, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = n, hjust=1), size = 3.5, color = "black") +
facet_wrap(~sentiment, scales = "free") +
coord_flip() +
ggtitle("Most common positive and negative words (US)") +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5))
(b) NRC Emotional Lexicon
General Sentiments
People primarily express trust, fear and anticipation in tweets.
tidy_tweets %>%
# implement sentiment analysis using the "nrc" lexicon
inner_join(get_sentiments("nrc")) %>%
# remove "positive/negative" sentiments
filter(!sentiment %in% c("positive", "negative")) %>%
#get the frequencies of sentiments
count(sentiment,sort = T) %>%
#calculate the proportion
mutate(percent=100*n/sum(n)) %>%
select(sentiment, percent) %>%
#plot the result
chartJSRadar(showToolTipLabel = TRUE, main = "NRC Radar")
Sentiment Word Frequency
Global
People display anger, disgust, fear, and sadness toward the pandemic and related death. People also show trust in the economy and president, but they are surprised about Trump.
tidy_tweets %>%
# implement sentiment analysis using the "nrc" lexicon
inner_join(get_sentiments("nrc")) %>%
# remove "positive/negative" sentiments
filter(!sentiment %in% c("positive", "negative")) %>%
#get the frequencies of sentiments of words
count(word,sentiment) %>%
group_by(sentiment) %>%
top_n(10) %>%
ungroup() %>%
mutate(word=reorder(word,n)) %>%
#plot the sentiment word frequency
ggplot(aes(x=word,y=n,fill=sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ sentiment, scales = "free") +
coord_flip() +
ggtitle(label = "Sentiment Word Frequency (Global)") +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5))
US
Similarly, people located in the US are unhappy with the virus, but compared to the pandemic itself, debt seems to cause more sadness.
tidy_us %>%
# implement sentiment analysis using the "nrc" lexicon
inner_join(get_sentiments("nrc")) %>%
# remove "positive/negative" sentiments
filter(!sentiment %in% c("positive", "negative")) %>%
#get the frequencies of sentiments of words
count(word,sentiment) %>%
group_by(sentiment) %>%
top_n(10) %>%
ungroup() %>%
mutate(word=reorder(word,n)) %>%
#plot the sentiment word frequency
ggplot(aes(x=word,y=n,fill=sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ sentiment, scales = "free") +
coord_flip() +
ggtitle(label = "Sentiment Word Frequency (US)") +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5))
3. Sentence-Level Sentiment Analysis
In the previous sections, “support,” “trump,” “stimulate,” and “debt” appear most frequently among the sentiment words. In this section, these words are examined in the context of sentences to understand more fully how people respond to critical topics on Twitter.
The sentimentr package by Tyler Rinker is applied to analyze tweets that contain each of the words. Instead of simply matching words against a dictionary of words labeled “positive,” “negative,” or “neutral,” sentimentr accounts for valence shifters such as negators, amplifiers, and de-amplifiers, and outputs a sentiment score for each sentence by averaging the sentiment scores of the words it contains.
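To show why valence shifters matter, the base R sketch below scores a sentence with a deliberately simplified toy scorer. It is not sentimentr's actual algorithm: the lexicons, the two-word look-back window, the 1.8 amplifier weight, and the square-root length scaling are all assumptions chosen for illustration, and `toy_sentence_score` is a hypothetical helper.

```r
# Hypothetical mini-lexicons; real packages ship far larger ones.
polarity   <- c(good = 1, helpful = 1, bad = -1, inept = -1)
negators   <- c("not", "never", "no")
amplifiers <- c("very", "really")

toy_sentence_score <- function(sentence) {
  words <- unlist(strsplit(tolower(sentence), "[^a-z]+"))
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  total <- 0
  for (i in seq_along(words)) {
    base <- polarity[words[i]]
    if (is.na(base)) next                    # word carries no polarity
    weight <- 1
    # look back at up to two preceding words for valence shifters
    window <- tail(words[seq_len(i - 1)], 2)
    if (any(window %in% amplifiers)) weight <- weight * 1.8
    if (sum(window %in% negators) %% 2 == 1) weight <- -weight
    total <- total + base * weight
  }
  # scale by sentence length so long sentences do not dominate
  unname(total / sqrt(length(words)))
}

plain     <- toy_sentence_score("The response was good")
negated   <- toy_sentence_score("The response was not good")
amplified <- toy_sentence_score("The response was very good")
print(c(plain = plain, negated = negated, amplified = amplified))
```

A plain dictionary lookup would give all three sentences the same positive score because each contains “good”; the shifter-aware version flips the negated sentence and boosts the amplified one.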
“Support”
The tweets that contain “support” are mostly about channels for donating to health institutions and local businesses.
#get tweets that contain "support"
support<-tweets[sapply(1:nrow(tweets), function(x) str_contains(tolower(tweets$text[x]), "support")),]
#View(support$text)
head(support$text)
## [1] "Our Founders are only prepared to support a different city weekly. Our focus has shifted more towards New Orleans. Unemployment rate has sky rocketed over the last 3-weeks. Also if you can support please direct message.GOD BLESS. \n#NewOrleans #coronavirus #usaCoronavirus #cashapp"
## [2] "\U0001f4e2 The Deeper their States of mind scrambles for support the more they show their\U0001f918. Symbolism will be their downfall \n\U0001f53b\nNorth Carolina woman gets coronavirus despite staying home for three weeks \U0001f914...\n#NorthCarolina \n#CoronavirusOutbreak\nCLICK LINK\nhttps://t.co/Vv4kqV3VVi"
## [3] "JCF guest @ljiresearch's Dr. @EOSaphire & other #SanDiego researchers featured in @sdut for their work studying #coronavirus. Support our #PublicHealth #COVID19 Fund with your #donoradvisedfund or donate to support institutions doing this critical work. https://t.co/0UBoCs6NL1 https://t.co/jEHjY3aAwI"
## [4] "IN THE HOME--Kieding Senior Project Manager Jaime Brunner gets into some Friday at-home work. #takeabreath #kieding #doingourpart #kiedingith #supportingsmallbusiness #workathome #vigilance #coronavirus https://t.co/sL2WypY7n3"
## [5] "Link to Cauyunan appeal, please see \n\nhttps://t.co/DsioYKIrT1\n\nHow you can help, please see \n\nhttps://t.co/cIBtp9FCmt\n\nWe thank you for your continuing support and generosity. We are in this together!\n\n#resilienceinthetimeofcovid19\n#weareinthistogether\n#Covid_19"
## [6] "Thank you, Senator Gillibrand for your support of the Capital Region’s incredible medical professionals and local small businesses. #Covid19 #TroyNY #coronavirus https://t.co/zzqLyP9aCL"
Most of the related tweets tend to be optimistic, given the positive mean and median, but there are extreme outliers, with the most negative score being -1.11.
# get average sentiment score for each sentence
sentiment_support <- sentiment_by(get_sentences(support$text))
summary(sentiment_support$ave_sentiment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.10780 0.09751 0.16013 0.15320 0.18237 1.48076
#plot the score distribution
ggplot(sentiment_support,aes(ave_sentiment)) +
geom_histogram(bins = 50) +
labs(title = "Sentiment Histogram of Tweets that Contain 'Support' ", x = "Sentiment Score") +
theme_bw() +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5)) +
geom_vline(xintercept = 0, color = "red")
“Trump”
In the tweets that contain “trump,” people express a great deal of criticism of President Trump’s inaction on the coronavirus.
#get tweets that contain "trump"
trump<-tweets[sapply(1:nrow(tweets), function(x) str_contains(tolower(tweets$text[x]), "trump")),]
#View(trump$text)
head(trump$text)
## [1] "@BamaStephen When the World looks back on the #TrumpPresidency they will see how a smarter & more Compassionate person,would have taken Action in Feb. instead of calling the #coronavirus a Hoax & no more than a typical flu bug!Thousands died b/c of the #IdiotPresident #Trump"
## [2] "\U0001f534 LIVE PODCAST: CWR#866 4_13_20 on @Spreaker #china #coronavirus #fauci #pharma #trump https://t.co/HYfVXGJZxr"
## [3] "Another reason why the Trump propaganda unit should be dismantled due to gross incompetence...this video just skipped the month of February. So basically the Trump Admin did nothing that whole month. Well done. #Trump #coronavirus https://t.co/g5LugStcs6"
## [4] "US's global reputation hits rock-bottom over Trump's #coronavirus response\n\nhttps://t.co/scYsse1IFK"
## [5] "\U0001f51d #PlattsCommodityNews Americas Apr 13\n\U0001f4f0 WTI retreats as market weighs OPEC+ cuts | https://t.co/f1I7GyfTfz\n\U0001f4f0 Baker Hughes plans $15B impairment, citing #coronavirus | https://t.co/s27hN8m9Nw\n\U0001f3a7 Podcast: Has Trump found religion on low oil prices | https://t.co/cmUZGwQBFG https://t.co/4mLLt8xxET"
## [6] "@POTUSrox @ScottPresler @realDonaldTrump We Dems will watch you Reps die from #coronavirus and will take over government in the 2020 election. Darwin was right!"
Most of the related tweets tend to be mildly negative, given the negative mean and median, but there are extreme outliers: the most positive score is 1.30 and the most negative is -1.48.
# get average sentiment score for each sentence
sentiment_trump <- sentiment_by(get_sentences(trump$text))
summary(sentiment_trump$ave_sentiment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.48464 -0.17613 -0.05180 -0.06594 0.04564 1.30479
#plot the score distribution
ggplot(sentiment_trump,aes(ave_sentiment)) +
geom_histogram(bins = 50) +
labs(title = "Sentiment Histogram of Tweets that Contain 'Trump' ", x = "Sentiment Score") +
theme_bw() +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5)) +
geom_vline(xintercept = 0, color = "red")
“Stimulate”
The word “stimulate” appears primarily in tweets carrying the hashtag “#StudentDebtStimulus.” These tweets urge legislators to cancel student debt as a way to stimulate the economy.
#get tweets that contain "stimulate"
stimulate<-tweets[sapply(1:nrow(tweets), function(x) str_contains(tolower(tweets$text[x]), "stimulate")),]
#View(stimulate$text)
head(stimulate$text)
## [1] ".@SenSchumer, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [2] ".@Kilili_Sablan, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [3] ".@Senatemajldr, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [4] ".@GOPLeader, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [5] ".@RepBonamici, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [6] ".@CongressmanGT, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
The range of sentiment scores is narrow. Because the third quartile is below zero, at least 75% of these tweets score slightly negative.
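The step from a quartile to a share of tweets can be checked directly. The sketch below uses invented scores that mimic many near-identical template tweets plus a few outliers (`toy_scores` is hypothetical, not the project's data): when the third quartile is negative, at least three quarters of the values must be negative as well.

```r
# Invented scores: 17 identical template tweets plus three outliers
toy_scores <- c(rep(-0.0313, 17), -0.40, 0.10, 0.83)

# Third quartile and the share of strictly negative scores
q3 <- unname(quantile(toy_scores, 0.75))
share_negative <- mean(toy_scores < 0)

print(q3)
print(share_negative)
```

Computing `mean(scores < 0)` on the real scores would give the exact share instead of the lower bound implied by the quartiles.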
# get average sentiment score for each sentence
sentiment_stimulate <- sentiment_by(get_sentences(stimulate$text))
summary(sentiment_stimulate$ave_sentiment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.40367 -0.03130 -0.03130 -0.02837 -0.03130 0.82957
#plot the score distribution
ggplot(sentiment_stimulate,aes(ave_sentiment)) +
geom_histogram(bins = 50) +
labs(title = "Sentiment Histogram of Tweets that Contain 'stimulate' ", x = "Sentiment Score") +
theme_bw() +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5)) +
geom_vline(xintercept = 0, color = "red")
“Debt”
Similar to the tweets that contain “stimulate,” tweets that contain “debt” mainly carry the hashtag “#cancelstudentdebt” and often “#StudentDebtStimulus.” These tweets urge legislators to cancel student debt as a way to stimulate the economy.
#get tweets that contain "debt"
debt<-tweets[sapply(1:nrow(tweets), function(x)
str_contains(tolower(tweets$text[x]), "debt")),]
#View(debt$text)
head(debt$text)
## [1] ".@SenSchumer, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [2] ".@Kilili_Sablan, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [3] ".@Senatemajldr, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [4] ".@GOPLeader, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [5] ".@RepBonamici, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
## [6] ".@CongressmanGT, I urge you to #cancelstudentdebt in the next #coronavirus package. A #StudentDebtStimulus will help the 45 million people with student debt and stimulate the economy when it is needed most."
At least 75% of these tweets score negative, and on average they are slightly more negative than the tweets that contain “stimulate” (mean -0.032 versus -0.028).
# get average sentiment score for each sentence
sentiment_debt <- sentiment_by(get_sentences(debt$text))
summary(sentiment_debt$ave_sentiment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.98198 -0.03130 -0.03130 -0.03198 -0.03130 0.67082
#plot the score distribution
ggplot(sentiment_debt,aes(ave_sentiment)) +
geom_histogram(bins = 50) +
labs(title = "Sentiment Histogram of Tweets that Contain 'debt' ",
x = "Sentiment Score") +
theme_bw() +
theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5)) +
geom_vline(xintercept = 0, color = "red")
Conclusion
Overall, the tweets convey a moderately pessimistic sentiment, with 56% of the sentiment-bearing words in the tweets marked as negative. This is also reflected in the NRC emotion charts, where words labeled joy or trust occur less often than words with other emotion labels, especially fear and sadness, whose most frequent words reach counts of about 15,000.
The negativity stems not only from the health crisis itself but also from the destructive effect of the virus on the economy. On the one hand, compared with the previous research in which people held the misconception that COVID-19 is similar to the flu, the current analysis suggests that people have recognized the fatal and pervasive nature of the virus and are expressing concern. People complain about governments’ insufficient response to COVID-19, and President Trump is frequently mentioned. On the other hand, economic decline profoundly harms local businesses, education, and the workforce. In the United States, student debt is one of the most mentioned topics; people appeal to legislators to cancel student debt as a means of stimulating the economy.
Ultimately, “people” and “support” are the two most frequent words in all tweets and contribute the most positivity. People continuously share resources and channels to support people in need and express appreciation to health care workers.
Future work
This project, as an exploratory analysis, functions well in detecting social attitudes regarding the Coronavirus and gaining insights that can direct future research.
However, there are certain limitations. The Twitter data, though it has many entries, consists of tweets of only a single day. The analysis is also limited in that the project focuses on tweets that are in the English language and thus fails to capture possible topics and sentiments of tweets in other languages.
In the future, to produce a more comprehensive and representative analysis, multilingual sentiment analysis should be applied to account for tweets in other languages. Further, future research should collect data over a longer period. Such data would allow us to observe and understand trends and patterns in issues related to health, the economy, and politics during the pandemic. Building on the conclusions of this exploratory analysis, we could potentially predict future trends and identify ways to improve the current situation through methods such as time series analysis, topic modeling, and natural language processing.
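As a starting point for the multi-day analysis suggested above, sentence-level scores could be aggregated by day before any time series modeling. The base R sketch below uses invented timestamps and scores; the ISO-style `created_at` format and the names `toy` and `daily` are assumptions for illustration.

```r
# Invented multi-day scores; a real study would compute them from collected tweets.
toy <- data.frame(
  created_at = c("2020-04-14T08:00:00Z", "2020-04-14T21:30:00Z",
                 "2020-04-15T09:15:00Z", "2020-04-15T10:00:00Z",
                 "2020-04-16T12:00:00Z"),
  ave_sentiment = c(-0.2, 0.1, -0.1, -0.3, 0.4),
  stringsAsFactors = FALSE
)

# Keep only the date part of the timestamp (first 10 characters)
toy$day <- as.Date(substr(toy$created_at, 1, 10))

# Mean sentiment per day, plus the number of tweets behind each mean
daily <- aggregate(ave_sentiment ~ day, data = toy, FUN = mean)
daily$n_tweets <- as.vector(table(toy$day))
print(daily)
```

Keeping the tweet count next to each daily mean makes it easier to judge how much weight a given day's average deserves.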