Data 607 - Final Project
Research Objective
Trump’s presidency is unique among presidencies in the way he communicates public and economic policy via social media. We want to examine whether his remarks during the global pandemic (mid-March through April 2020) had tangible effects on the stock market. We analyze the text of the daily White House briefings using sentiment analysis, then compare and contrast that sentiment with stock market performance and Trump’s approval ratings over the same period.
Data Sources:
- White House Briefings - each weekday President Trump addresses the nation, and the transcripts can be found at https://www.whitehouse.gov/briefings-statements
- Stock Market Data - extracted with the quantmod package to obtain small-, mid-, and large-cap growth funds
- Trump Approval Ratings - daily approval ratings from FiveThirtyEight
- Twitter API (https://dev.twitter.com/apps)
Necessary R Packages:
The following libraries are used throughout our analysis:
- quantmod
- rvest
- stringr
- purrr
- tidytext
- dplyr
- tidyr
- ggplot2
- lubridate
- RCurl
- ggpubr
- gridExtra
- twitteR
- tm
- textmineR
- RWeka
- wordcloud
- RColorBrewer
Gather Data
Daily Presidential White House Briefings
These URLs all end in a number, so we first create a list of URLs and then use read_html() to grab the <p> tags from each page.
base_url <- 'https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-'
getPageURLs <- function(base_url) {
  add_number <- seq(2, 33)
  urls <- str_c(base_url, add_number)
  return(urls)
}
urls <- getPageURLs(base_url)
head(urls)
## [1] "https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-2"
## [2] "https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-3"
## [3] "https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-4"
## [4] "https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-5"
## [5] "https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-6"
## [6] "https://www.whitehouse.gov/briefings-statements/remarks-president-trump-vice-president-pence-members-coronavirus-task-force-press-briefing-7"
Stock Market Data
We decided it would be best to compare the top small-, mid-, and large-cap growth funds.
start<- as.Date("2020-03-15")
end <- as.Date("2020-04-30")
#Retrieving information on Top 5 Small Growth Funds as ranked by "U.S. News"
smallCapFunds <- c("PSGAX", "FKASX", "PGSGX", "QUASX", "TRSSX")
getSymbols(smallCapFunds, src = "yahoo", from = start, to = end)
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "PSGAX" "FKASX" "PGSGX" "QUASX" "TRSSX"
#Retrieving information on Top 5 Mid Growth Funds as ranked by "U.S. News"
midCapFunds <- c("DFDMX", "CCSMX","PRDMX", "OTCAX", "BMGAX")
getSymbols(midCapFunds, src = "yahoo", from = start, to = end)
## [1] "DFDMX" "CCSMX" "PRDMX" "OTCAX" "BMGAX"
#Retrieving information on Top 5 Large Growth Funds as ranked by "U.S. News"
largeCapFunds <- c("TRLGX", "PREFX", "TPLGX", "FDSVX", "PBLAX")
getSymbols(largeCapFunds, src = "yahoo", from = start, to = end)
## [1] "TRLGX" "PREFX" "TPLGX" "FDSVX" "PBLAX"
## [1] "DJIA"
## An 'xts' object on 2020-03-16/2020-04-29 containing:
## Data: num [1:32, 1:6] 30.3 31.4 29.3 30.5 29.2 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "PSGAX.Open" "PSGAX.High" "PSGAX.Low" "PSGAX.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2020-05-12 17:28:06"
Trump Approval Ratings
These ratings were available on FiveThirtyEight, broken out by Voters, Adults, and All polls for each day during April 2020.
#github URL
theURL <- getURL("https://raw.githubusercontent.com/geeman1209/MSDATA2020/master/DATA607/Final_Project/approval_topline.csv")
# Read csv from github
trump_apprdf <- read.csv(text = theURL,stringsAsFactors = FALSE)
# glimpse data
dplyr::glimpse(trump_apprdf)
## Observations: 3,582
## Variables: 10
## $ president <chr> "Donald Trump", "Donald Trump", "Donald Trump", "…
## $ subgroup <chr> "Voters", "Adults", "All polls", "Adults", "Voter…
## $ modeldate <chr> "4/30/2020", "4/30/2020", "4/30/2020", "4/29/2020…
## $ approve_estimate <dbl> 42.84215, 43.65953, 42.92110, 43.02131, 42.72302,…
## $ approve_hi <dbl> 46.74216, 47.53337, 46.79893, 46.37986, 46.54502,…
## $ approve_lo <dbl> 38.94214, 39.78569, 39.04326, 39.66275, 38.90103,…
## $ disapprove_estimate <dbl> 52.64487, 51.65192, 52.38926, 52.15911, 52.61465,…
## $ disapprove_hi <dbl> 56.27713, 56.48534, 56.54765, 56.86387, 56.20207,…
## $ disapprove_lo <dbl> 49.01261, 46.81850, 48.23088, 47.45436, 49.02723,…
## $ timestamp <chr> "15:46:08 30 Apr 2020", "15:44:36 30 Apr 2020", "…
Cleaning Data
Clean WH Briefing Data
In order to easily analyze the WH briefings, we needed to attach each briefing’s date along with its page number, i. The page number corresponds to the number at the end of the page URL.
#create empty dataframe
testFrame <- data.frame(date = character(),
stringsAsFactors = FALSE)
# Bind each briefing's scraped date(s) to its page index, i
for (i in 1:length(wh_dates)){
  testFrame <- rbind(testFrame, cbind(as.data.frame(unlist(wh_dates[[i]]), stringsAsFactors = FALSE), i))
}
dateFrame <- na.omit(testFrame)
data0 <- data.frame(text=character(),
Day=integer(),
stringsAsFactors=FALSE)
# Bind each briefing's paragraphs to its page index, i
for (i in 1:length(wh_briefings)){
  data0 <- rbind(data0, cbind(as.data.frame(unlist(wh_briefings[[i]]), stringsAsFactors = FALSE), i))
}
colnames(data0) <- c('text','day')
correctMatch <- inner_join(dateFrame,data0,by= c("i" = "day"))
colnames(correctMatch) <- c('date', 'day', 'text')
wh_data <- correctMatch
tidy_data <- wh_data %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text)
head(tidy_data)
## date day linenumber word
## 1 March 15, 2020 1 1 remarks
## 2 March 15, 2020 1 2 healthcare
## 3 March 15, 2020 1 3 issued
## 3.1 March 15, 2020 1 3 on
## 3.2 March 15, 2020 1 3 march
## 3.3 March 15, 2020 1 3 15
Clean Approval Ratings
To analyze the approval ratings, we needed to convert modeldate into the Date format ‘yyyy-mm-dd’ and filter the data to April 2020.
# modeldate - convert into date
trump_apprdf$modeldate <- mdy(trump_apprdf$modeldate)
dplyr::glimpse(trump_apprdf)
## Observations: 3,582
## Variables: 10
## $ president <chr> "Donald Trump", "Donald Trump", "Donald Trump", "…
## $ subgroup <chr> "Voters", "Adults", "All polls", "Adults", "Voter…
## $ modeldate <date> 2020-04-30, 2020-04-30, 2020-04-30, 2020-04-29, …
## $ approve_estimate <dbl> 42.84215, 43.65953, 42.92110, 43.02131, 42.72302,…
## $ approve_hi <dbl> 46.74216, 47.53337, 46.79893, 46.37986, 46.54502,…
## $ approve_lo <dbl> 38.94214, 39.78569, 39.04326, 39.66275, 38.90103,…
## $ disapprove_estimate <dbl> 52.64487, 51.65192, 52.38926, 52.15911, 52.61465,…
## $ disapprove_hi <dbl> 56.27713, 56.48534, 56.54765, 56.86387, 56.20207,…
## $ disapprove_lo <dbl> 49.01261, 46.81850, 48.23088, 47.45436, 49.02723,…
## $ timestamp <chr> "15:46:08 30 Apr 2020", "15:44:36 30 Apr 2020", "…
Data Analysis
White House Briefing & Sentiment Analysis
The tidytext package provides access to three general-purpose sentiment lexicons: AFINN, bing, and nrc, all based on unigrams, i.e., single words. (A quick look at the format each lexicon returns follows this list.)
- AFINN from Finn Årup Nielsen assigns each word an integer score between -5 and 5 for negative/positive sentiment.
- bing from Bing Liu and collaborators categorizes words in a binary fashion into positive and negative categories.
- nrc from Saif Mohammad and Peter Turney categorizes words in a binary fashion (“yes”/“no”) into categories of positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.
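As referenced above, get_sentiments() returns each lexicon as a two-column tibble, which is why they can be attached with inner_join() in the chunks below:
# AFINN pairs each word with an integer value; bing and nrc pair each
# word with a sentiment label
head(get_sentiments("afinn")) # columns: word, value
head(get_sentiments("bing"))  # columns: word, sentiment
head(get_sentiments("nrc"))   # columns: word, sentiment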
# Using lexicon afinn
afinnLex <- get_sentiments("afinn")
wh.afinn <- tidy_data %>%
  anti_join(stop_words) %>%
  group_by(day) %>%
  inner_join(afinnLex) %>%
  summarise(sentiment = sum(value)) %>%
  mutate(method = "afinn")
## Joining, by = "word"
## Joining, by = "word"
## # A tibble: 32 x 3
## day sentiment method
## <int> <dbl> <chr>
## 1 1 172 afinn
## 2 2 43 afinn
## 3 3 243 afinn
## 4 4 114 afinn
## 5 5 228 afinn
## 6 6 142 afinn
## 7 7 207 afinn
## 8 8 152 afinn
## 9 9 190 afinn
## 10 10 174 afinn
## # … with 22 more rows
# bar plot for positive and negative cumulative score
wh.afinn %>%
summarise(Positive = sum(sentiment[sentiment>0]), Negative = sum(sentiment[sentiment<0])) %>%
gather(variable, value, Positive:Negative) %>%
ggplot(aes(variable, value, fill = variable)) +
geom_bar(stat="identity") +
geom_text(aes(label=value), position=position_dodge(width=0.9), vjust=-0.25)
# Using lexicon bing
bingLex <- get_sentiments("bing")
wh.bing <- tidy_data %>%
anti_join(stop_words) %>%
group_by(day) %>%
inner_join(bingLex) %>%
count(day, sentiment) %>%
spread(sentiment, n, fill=0) %>%
mutate(sentiment = positive - negative) %>%
mutate(method="bing")
## Joining, by = "word"
## Joining, by = "word"
## # A tibble: 32 x 5
## # Groups: day [32]
## day negative positive sentiment method
## <int> <dbl> <dbl> <dbl> <chr>
## 1 1 125 187 62 bing
## 2 2 178 141 -37 bing
## 3 3 230 241 11 bing
## 4 4 180 180 0 bing
## 5 5 204 295 91 bing
## 6 6 241 274 33 bing
## 7 7 246 295 49 bing
## 8 8 333 295 -38 bing
## 9 9 97 143 46 bing
## 10 10 168 220 52 bing
## # … with 22 more rows
# bar plot for positive and negative sentiments
wh.bing %>%
ungroup() %>%
select(-day) %>%
select(negative, positive) %>%
summarise_all(funs(sum)) %>%
gather(variable, value, negative:positive) %>%
ggplot(aes(variable, value, fill = variable)) +
geom_bar(stat="identity") +
geom_text(aes(label=value), position=position_dodge(width=0.9), vjust=-0.25)
# Using lexicon nrc
nrcLex <- get_sentiments("nrc")
wh.nrc <- tidy_data %>%
anti_join(stop_words) %>%
group_by(day) %>%
inner_join(nrcLex) %>%
filter(sentiment %in% c("positive", "negative")) %>%
count(day, sentiment) %>%
spread(sentiment, n, fill=0) %>%
mutate(sentiment = positive - negative) %>%
mutate(method="nrc")
## Joining, by = "word"
## Joining, by = "word"
## # A tibble: 32 x 5
## # Groups: day [32]
## day negative positive sentiment method
## <int> <dbl> <dbl> <dbl> <chr>
## 1 1 145 496 351 nrc
## 2 2 198 503 305 nrc
## 3 3 251 710 459 nrc
## 4 4 226 540 314 nrc
## 5 5 256 691 435 nrc
## 6 6 282 804 522 nrc
## 7 7 318 838 520 nrc
## 8 8 412 884 472 nrc
## 9 9 108 341 233 nrc
## 10 10 217 587 370 nrc
## # … with 22 more rows
# bar graph for all categories in nrc lexicon
tidy_data %>%
anti_join(stop_words) %>%
group_by(day) %>%
inner_join(nrcLex) %>%
#filter(sentiment %in% c("positive", "negative")) %>%
count(day, sentiment) %>%
spread(sentiment, n, fill=0) %>%
ungroup() %>%
select(-day) %>%
summarise_all(funs(sum)) %>%
gather(variable, value, anger:trust) %>%
ggplot(aes(variable, value, fill = variable)) +
geom_bar(stat="identity") +
geom_text(aes(label=value), position=position_dodge(width=0.9), vjust=-0.25)
## Joining, by = "word"
## Joining, by = "word"
## president subgroup modeldate approve_estimate approve_hi approve_lo
## 1 Donald Trump Adults 2020-04-02 45.58512 50.25256 40.91767
## 2 Donald Trump Voters 2020-04-02 46.01426 50.15618 41.87234
## 3 Donald Trump All polls 2020-04-02 45.67397 49.99093 41.35702
## disapprove_estimate disapprove_hi disapprove_lo timestamp
## 1 49.40673 54.87109 43.94237 14:48:24 2 Apr 2020
## 2 50.63609 55.11203 46.16016 14:49:51 2 Apr 2020
## 3 50.03857 54.96909 45.10805 14:47:26 2 Apr 2020
## president subgroup modeldate approve_estimate approve_hi approve_lo
## 1 Donald Trump Voters 2020-04-03 45.77735 49.91039 41.64430
## 2 Donald Trump Adults 2020-04-03 46.01294 50.58268 41.44320
## 3 Donald Trump All polls 2020-04-03 45.84590 50.08724 41.60456
## disapprove_estimate disapprove_hi disapprove_lo timestamp
## 1 50.77266 55.23004 46.31528 09:44:49 3 Apr 2020
## 2 49.08885 54.33594 43.84177 09:43:23 3 Apr 2020
## 3 49.88251 54.68981 45.07522 09:42:26 3 Apr 2020
## president subgroup modeldate approve_estimate approve_hi approve_lo
## 1 Donald Trump Adults 2020-04-04 46.01294 50.58270 41.44317
## 2 Donald Trump Voters 2020-04-04 45.70196 49.85291 41.55100
## 3 Donald Trump All polls 2020-04-04 45.78480 50.03462 41.53498
## disapprove_estimate disapprove_hi disapprove_lo timestamp
## 1 49.08885 54.33595 43.84176 07:31:21 4 Apr 2020
## 2 50.79717 55.27833 46.31600 07:32:47 4 Apr 2020
## 3 49.91005 54.74942 45.07069 07:30:25 4 Apr 2020
Comparing Lexicons
The following graphs show the sentiment score each lexicon assigns to each day of briefings in the corpus.
# Comparing all 3 lexicons
wh.aff_bin_nrc <- bind_rows(wh.afinn, wh.bing, wh.nrc)
wh.aff_bin_nrc %>%
ggplot(aes(day, sentiment, fill = method)) +
geom_col(show.legend = FALSE) +
facet_wrap(~method, ncol = 1, scales = "free_y")
# comparing the cumulative sentiments for all 3 lexicons
wh.aff_bin_nrc %>%
group_by(method) %>%
summarise(sentiment = sum(sentiment)) %>%
ggplot(aes(method, sentiment, fill = method)) +
geom_bar(stat="identity") +
geom_text(aes(label=sentiment), position=position_dodge(width=0.9), vjust=-0.25)
# line graph for all 3 lexicons
wh.aff_bin_nrc %>%
ggplot(aes(x=day, y=sentiment, group=method, color=method)) +
geom_line(size=1) +
geom_point() + labs(x="day", y="sentiment", title = "Sentiment vs Days for all 3 lexicons") +
scale_color_manual(values=c("red", "green", "blue")) +
theme(plot.title = element_text(hjust = 0.5))
The following two graphs show the nrc sentiments: the former shows positive and trust, and the latter shows the remaining sentiments.
- There are peaks in words of trust and positive on April 3rd, April 7th and April 14th
- There are peaks in words of fear on April 1st and April 10th
a <- tidy_data %>%
anti_join(stop_words) %>%
inner_join(nrcLex) %>%
count(day, sentiment) %>%
spread(sentiment, n, fill=0) %>%
gather(variable, value, c('positive','trust')) %>%
ggplot(aes(day, value, color = variable,fill = variable)) +
geom_line(stat="identity") +
ylab('Sentiment')
## Joining, by = "word"
## Joining, by = "word"
b <- tidy_data %>%
anti_join(stop_words) %>%
inner_join(nrcLex) %>%
count(day, sentiment) %>%
spread(sentiment, n, fill=0) %>%
gather(variable, value, c('anger','anticipation','disgust','fear','joy','negative','sadness','surprise')) %>%
ggplot(aes(day, value, color = variable,fill = variable)) +
geom_line(stat="identity") +
ylab('Sentiment')
## Joining, by = "word"
## Joining, by = "word"
Trump approval ratings analysis
The two graphs below show Trump’s approval and disapproval ratings for April 2020. The three subgroups included in this analysis are Adults, All polls, and Voters. The graphs show approval estimates declining while disapproval estimates increase for all three subgroups.
# Disapproval line plot for all 3 sub groups
disapprove <- trump_apprsubdf %>%
ggplot(aes(x=modeldate,y=disapprove_estimate, group=subgroup, color=subgroup)) +
geom_line(size=1) +
geom_point() + labs(x="Date", y="Disapproval estimates", title = "") +
scale_color_manual(values=c("red", "green", "blue")) +
ggtitle('Trump disapproval estimates for April 2020')+
theme(plot.title = element_text(hjust = 0.5))
# Approval line plot for all 3 sub groups
approve <- trump_apprsubdf %>%
ggplot(aes(x=modeldate,y=approve_estimate, group=subgroup, color=subgroup)) +
geom_line(size=1) +
geom_point() + labs(x="Date", y="Approval estimates", title = "") +
scale_color_manual(values=c("red", "green", "blue")) +
ggtitle('Trump approval estimates for April 2020')+
theme(plot.title = element_text(hjust = 0.5))
ggarrange(approve, disapprove, ncol=1)
Stock Market Performance
We used the U.S. News top 5 small, mid, and large growth funds as proxies for the performance of small businesses vs. mid-sized businesses vs. big business in America. It was difficult to pick something that would track the health of “mom and pop” shops, which make up a sizable share of businesses.
A data frame was created that included all the closing prices along with the bing sentiment. The average closing price was calculated and compared against the sentiment over the designated time frame.
The cor() and lm() functions were used to calculate correlation coefficients and p-values between the average closing price and the sentiment variable.
Small Growth Funds & White House Sentiment Analysis
smFrame <- dateFrame
colnames(smFrame) <- c("Date","Day")
smFrame$PSGAX_Close <- PSGAX$PSGAX.Close
smFrame$FKASX_Close <- FKASX$FKASX.Close
smFrame$PGSGX_Close <- PGSGX$PGSGX.Close
smFrame$QUASX_Close <- QUASX$QUASX.Close
smFrame$TRSSX_Close <- TRSSX$TRSSX.Close
smFrame$Sentiment <- wh.bing$sentiment
tstFrame <- smFrame
tstFrame$AvgClose <- (smFrame$PSGAX_Close + smFrame$FKASX_Close + smFrame$PGSGX_Close + smFrame$TRSSX_Close + smFrame$QUASX_Close)/5
SmLm2 <- lm(AvgClose ~ Sentiment, data= tstFrame)
summary(SmLm2)
##
## Call:
## lm(formula = AvgClose ~ Sentiment, data = tstFrame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6750 -1.7080 0.1007 1.6570 4.5108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.955444 0.513187 56.423 <2e-16 ***
## Sentiment 0.011512 0.008987 1.281 0.21
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.458 on 30 degrees of freedom
## Multiple R-squared: 0.05186, Adjusted R-squared: 0.02026
## F-statistic: 1.641 on 1 and 30 DF, p-value: 0.21
## Sentiment AvgClose
## Sentiment 1.0000000 0.2277354
## AvgClose 0.2277354 1.0000000
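The correlation matrix above also lacks a visible source chunk; a sketch of the presumed call, where as.numeric() strips the xts class from the averaged closes (the same pattern would produce the matrices in the mid- and large-cap sections below):
# Correlation between daily bing sentiment and the average closing price
cor(data.frame(Sentiment = tstFrame$Sentiment,
               AvgClose = as.numeric(tstFrame$AvgClose)))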
plot1 <- tstFrame %>%
ggplot(aes(x=Day, y=AvgClose, group=1)) +
geom_line() +
geom_point() + labs(x=" Days", y="Closing Price", title = "Avg of Closing Price for Top 5 Small Growth Funds", subtitle = " March 15, 2020 - April 30, 2020")
plot2 <- tstFrame %>%
ggplot(aes(x=Day, y=Sentiment, group=1)) +
geom_line() +
geom_point() + labs(x="Days", y="WH Briefing Sentiment", title = "Bing Sentiment Scores over 32 Days of WH Briefings", subtitle = "March 15, 2020 - April 30, 2020")
grid.arrange(plot1, plot2, nrow = 2)
## Don't know how to automatically pick scale for object of type xts/zoo. Defaulting to continuous.
Mid Growth Funds & White House Sentiment Analysis
mdFrame <- dateFrame
colnames(mdFrame) <- c("Date","Day")
mdFrame$DFDMX_Close <- DFDMX$DFDMX.Close
mdFrame$CCSMX_Close <- CCSMX$CCSMX.Close
mdFrame$PRDMX_Close <- PRDMX$PRDMX.Close
mdFrame$OTCAX_Close <- OTCAX$OTCAX.Close
mdFrame$BMGAX_Close <- BMGAX$BMGAX.Close
mdFrame$Sentiment <- wh.bing$sentiment
tst2Frame <- mdFrame
tst2Frame$AvgClose <- (mdFrame$DFDMX_Close + mdFrame$CCSMX_Close + mdFrame$PRDMX_Close + mdFrame$OTCAX_Close + mdFrame$BMGAX_Close)/5
SmLm3 <- lm(AvgClose ~ Sentiment, data= tst2Frame)
summary(SmLm3)
##
## Call:
## lm(formula = AvgClose ~ Sentiment, data = tst2Frame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3896 -0.9952 0.1580 1.1440 2.7797
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.370883 0.358911 59.544 <2e-16 ***
## Sentiment 0.006627 0.006285 1.054 0.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.719 on 30 degrees of freedom
## Multiple R-squared: 0.03574, Adjusted R-squared: 0.003596
## F-statistic: 1.112 on 1 and 30 DF, p-value: 0.3001
## Sentiment AvgClose
## Sentiment 1.000000 0.189044
## AvgClose 0.189044 1.000000
plot1 <- tst2Frame %>%
ggplot(aes(x=Day, y=AvgClose, group=1)) +
geom_line() +
geom_point() + labs(x=" Days", y="Closing Price", title = "Avg of Closing Price for Top 5 Mid Growth Funds", subtitle = " March 15, 2020 - April 30, 2020")
plot2 <- tst2Frame %>%
ggplot(aes(x=Day, y=Sentiment, group=1)) +
geom_line() +
geom_point() + labs(x="Days", y="WH Briefing Sentiment", title = "Biin Sentiment Scores over 32 Days of WH Briefings", subtitle = "March 15, 2020 - April 30, 2020")
grid.arrange(plot1, plot2, nrow = 2)
## Don't know how to automatically pick scale for object of type xts/zoo. Defaulting to continuous.
Large Growth Funds & White House Sentiment Analysis
lgFrame <- dateFrame
colnames(lgFrame) <- c("Date","Day")
lgFrame$TRLGX_Close <- TRLGX$TRLGX.Close
lgFrame$PREFX_Close <- PREFX$PREFX.Close
lgFrame$TPLGX_Close <- TPLGX$TPLGX.Close
lgFrame$FDSVX_Close <- FDSVX$FDSVX.Close
lgFrame$PBLAX_Close <- PBLAX$PBLAX.Close
lgFrame$Sentiment <- wh.bing$sentiment
tst3Frame <- lgFrame
tst3Frame$AvgClose <- (lgFrame$TRLGX_Close + lgFrame$PREFX_Close + lgFrame$TPLGX_Close + lgFrame$FDSVX_Close + lgFrame$PBLAX_Close)/5
SmLm4 <- lm(AvgClose ~ Sentiment, data= tst3Frame)
summary(SmLm4)
##
## Call:
## lm(formula = AvgClose ~ Sentiment, data = tst3Frame)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2845 -1.9643 0.3148 2.2355 3.9929
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.300249 0.570832 60.088 <2e-16 ***
## Sentiment 0.011255 0.009996 1.126 0.269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.734 on 30 degrees of freedom
## Multiple R-squared: 0.04055, Adjusted R-squared: 0.008564
## F-statistic: 1.268 on 1 and 30 DF, p-value: 0.2691
## Sentiment AvgClose
## Sentiment 1.0000000 0.2013588
## AvgClose 0.2013588 1.0000000
plot1 <- tst3Frame %>%
ggplot(aes(x=Day, y=AvgClose, group=1)) +
geom_line() +
geom_point() + labs(x=" Days", y="Closing Price", title = "Avg of Closing Price for Top 5 Large Growth Funds", subtitle = " March 15, 2020 - April 30, 2020")
plot2 <- tst3Frame %>%
ggplot(aes(x=Day, y=Sentiment, group=1)) +
geom_line() +
geom_point() + labs(x="Days", y="WH Briefing Sentiment", title = "Biin Sentiment Scores over 32 Days of WH Briefings", subtitle = "March 15, 2020 - April 30, 2020")
grid.arrange(plot1, plot2, nrow = 2)
## Don't know how to automatically pick scale for object of type xts/zoo. Defaulting to continuous.
Twitter API
In this analysis, we use tweets directed at Donald Trump’s Twitter account, realDonaldTrump, that discuss the coronavirus pandemic. We requested 10,000 tweets matching #COVID-19 and @realDonaldTrump from March 15 through May 8, 2020.
We use R with the tidyr, tidytext, and dplyr packages for the text analytics and the twitteR package to connect to Twitter and download the data.
To collect the data, we created a Twitter application, downloaded the tweets, stored them in a CSV file in a Git repository, and performed the sentiment analysis on that file. The steps to connect the app and download data from Twitter are:
- Create a Twitter account
- Go to https://dev.twitter.com/apps and log in with your Twitter credentials
- Click “Create a new app”
- Provide a Name, Description, Website, and Callback URL (we used http://localhost:1410 for the Callback URL)
- Check the Developer Agreement box
- After creating the app, go to “Keys and Access Tokens”
- Copy the Consumer key (API Key) and Consumer secret (API Secret)
- Click “Create my access token”, then copy the Access token and Access token secret
Load necessary packages and collect the data from twitter.
library(twitteR)
library(tidyr)
consumer_key <- "Lsk1wfKakbL6rfFC1dcLfpSrb"
consumer_secret <- "rKywBX5uJ3d13drP8NxgWGxC3TtPraKZiucS58J2OzFW0Rpcci"
access_token <- "1257449305616650243-J8E4TpO0Fmu6v5KXyY8T9owgSsPI7v"
access_secret <- "tgCzUd7fXgnSZFrLvmeMBYNJzH3i1hZkDX16PK3APkM3E"
#now lets connect
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
## [1] "Using direct authentication"
# Collect 10000 tweets
trump_tweets <- twitteR::searchTwitter("#COVID-19 + @realDonaldTrump", n = 10000,lang = "en", since = "2020-03-15", until = "2020-05-08",retryOnRateLimit = 1e2)
# set to a data frame
trump_tweets_df = twListToDF(trump_tweets)
# write data in to a csv
write.csv(trump_tweets_df, file = "trump_tweets.csv")
# read data from Github
trump_tweets_df <- read.csv("https://raw.githubusercontent.com/geeman1209/MSDATA2020/master/DATA607/Final_Project/trump_tweets.csv", stringsAsFactors = FALSE)
# Raw data
head(trump_tweets_df)
## X
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## text
## 1 RT @no_silenced: San Francisco Delivers Weed, Opioids, Booze and Ciggies to Homeless Locked Up for COVID-19\n\nMr President, can you please n…
## 2 RT @grey_sister: @kyledcheney @juliabhaber & what a co-inkydink that @TheJusticeDept #AGBarr let Flynn walk free today, also good distracti…
## 3 RT @CaslerNoel: While Trump was berating a nurse in the Oval Office yesterday who was telling the truth about the lack on PPE she experienc…
## 4 RT @PARISDENNARD: Brittany from @SamaritansPurse gave a powerful testimony about her work in New York during #Covid_19 with other front lin…
## 5 RT @the_resistor: @JoeBiden "A Grave Hardship" ~ The INABILITY & UNWILLINGNESS of @realDonaldTrump to LEAD \n\n33.5 MILLION Unemployed\n76,681…
## 6 RT @Free_Media_Hub: @realDonaldTrump COVID-19 US update infections 1,291,804 recorded deaths 76,889 deaths per million population 232 new d…
## favorited favoriteCount replyToSN created truncated replyToSID
## 1 FALSE 0 <NA> 2020-05-07 23:59:48 FALSE NA
## 2 FALSE 0 <NA> 2020-05-07 23:59:39 FALSE NA
## 3 FALSE 0 <NA> 2020-05-07 23:59:31 FALSE NA
## 4 FALSE 0 <NA> 2020-05-07 23:59:22 FALSE NA
## 5 FALSE 0 <NA> 2020-05-07 23:59:21 FALSE NA
## 6 FALSE 0 <NA> 2020-05-07 23:59:14 FALSE NA
## id replyToUID
## 1 1.258547e+18 NA
## 2 1.258547e+18 NA
## 3 1.258547e+18 NA
## 4 1.258547e+18 NA
## 5 1.258547e+18 NA
## 6 1.258547e+18 NA
## statusSource
## 1 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
## 2 <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## 3 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
## 4 <a href="http://tapbots.com/tweetbot" rel="nofollow">Tweetbot for iΟS</a>
## 5 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 6 <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## screenName retweetCount isRetweet retweeted longitude latitude
## 1 jabelliott 524 TRUE FALSE NA NA
## 2 juliabhaber 3 TRUE FALSE NA NA
## 3 HollyL85225600 1305 TRUE FALSE NA NA
## 4 michaeljashmore 15 TRUE FALSE NA NA
## 5 dmgraden 25 TRUE FALSE NA NA
## 6 Roberttimestwo 6 TRUE FALSE NA NA
# Remove http from statusSource
trump_tweets_df$statusSource <- gsub("<.*?>", "",trump_tweets_df$statusSource)
# Most favorited tweets
trump_fav <- trump_tweets_df %>%
dplyr::arrange(desc(favoriteCount))
# Top 6 favorited tweets among the extracted 10000 tweets
head(trump_fav)
## X
## 1 5294
## 2 4216
## 3 5883
## 4 2214
## 5 7136
## 6 3682
## text
## 1 While Trump was berating a nurse in the Oval Office yesterday who was telling the truth about the lack on PPE she e… https://t.co/meo4q6ijTX
## 2 Whoops! @realDonaldTrump's personal valet **who serves Trump's food and beverage** has tested positive for… https://t.co/RjgrN4IePP
## 3 Trump could have been spreading Covid-19 to everyone in that factory on Tuesday because he didn’t want to look bad… https://t.co/O5ddIdL1d0
## 4 Today I launch the #ThankYou45 campaign to highlight how @realDonaldTrump has delivered for FL to respond to COVID-… https://t.co/wHJjsT1qj7
## 5 This concern of COVID-19 is a huge scam and political stunt. Orchestrated by globalists next step to one government… https://t.co/Bgg1pwmTXs
## 6 @realDonaldTrump The TRUTH IS going to come out \n\nIt ISN'T CHINA'S or @WHO's Fault\n\nIt IS @realDonaldTrump's FAULT… https://t.co/FeIgyTa8LH
## favorited favoriteCount replyToSN created truncated
## 1 FALSE 3991 <NA> 2020-05-07 16:01:22 TRUE
## 2 FALSE 2058 <NA> 2020-05-07 17:02:21 TRUE
## 3 FALSE 1461 <NA> 2020-05-07 15:11:51 TRUE
## 4 FALSE 802 <NA> 2020-05-07 19:27:40 TRUE
## 5 FALSE 602 <NA> 2020-05-07 12:37:54 TRUE
## 6 FALSE 512 realDonaldTrump 2020-05-07 17:30:36 TRUE
## replyToSID id replyToUID statusSource screenName
## 1 NA 1.258427e+18 NA Twitter for iPhone CaslerNoel
## 2 NA 1.258442e+18 NA Twitter Web App DrDenaGrayson
## 3 NA 1.258414e+18 NA Twitter for iPhone CaslerNoel
## 4 NA 1.258479e+18 NA Twitter for iPhone realannapaulina
## 5 NA 1.258376e+18 NA Twitter for iPhone RL9631
## 6 1.258448e+18 1.258449e+18 25073877 Twitter Web App the_resistor
## retweetCount isRetweet retweeted longitude latitude
## 1 1305 FALSE FALSE NA NA
## 2 661 FALSE FALSE NA NA
## 3 471 FALSE FALSE NA NA
## 4 276 FALSE FALSE NA NA
## 5 518 FALSE FALSE NA NA
## 6 78 FALSE FALSE NA NA
# Most retweeted
trump_retweet <- trump_tweets_df %>%
dplyr::arrange(desc(retweetCount)) %>%
dplyr::distinct(text, .keep_all = TRUE)
# Top 6 retweeted texts among the extracted 10000 tweets
head(trump_retweet)
## X
## 1 6284
## 2 4694
## 3 1449
## 4 316
## 5 6248
## 6 557
## text
## 1 RT @narendramodi: Fully agree with you President @realDonaldTrump. Times like these bring friends closer. The India-US partnership is stron…
## 2 RT @GOPChairwoman: State Rep. Whitsett nearly died from coronavirus, but because she dared to say something positive about @realDonaldTrump…
## 3 RT @MichaelCoudrey: UPDATE: @Twitter just suspended the account of the publicly traded biotech company AYTU BioScience that created a novel…
## 4 RT @SteveSchmidtSES: This ad has unnerved and rattled the unsteady hand of @realDonaldTrump. He was played by the Chinese over Covid-19.…
## 5 RT @IvankaTrump: Breaking: The House finally passes $480 billion package to deliver aid to millions of small businesses, workers and hospit…
## 6 RT @DonaldJTrumpJr: “COVID-19’s economic fallout is shining a light on exactly how dangerous it is to so heavily rely on China, as well as…
## favorited favoriteCount replyToSN created truncated replyToSID
## 1 FALSE 0 <NA> 2020-05-07 14:23:54 FALSE NA
## 2 FALSE 0 <NA> 2020-05-07 16:31:52 FALSE NA
## 3 FALSE 0 <NA> 2020-05-07 20:39:23 FALSE NA
## 4 FALSE 0 <NA> 2020-05-07 23:10:41 FALSE NA
## 5 FALSE 0 <NA> 2020-05-07 14:28:56 FALSE NA
## 6 FALSE 0 <NA> 2020-05-07 22:40:31 FALSE NA
## id replyToUID statusSource screenName retweetCount
## 1 1.258402e+18 NA Twitter for Android jaykrishna324 34595
## 2 1.258434e+18 NA Twitter Web App arrowsmithwoman 19387
## 3 1.258497e+18 NA Twitter for iPhone CarrieNemitz21 14000
## 4 1.258535e+18 NA Twitter for iPad handdonordad 12728
## 5 1.258403e+18 NA Twitter for Android njq5qd4hCP477fH 6947
## 6 1.258527e+18 NA Twitter for iPad RonaldM79125420 6728
## isRetweet retweeted longitude latitude
## 1 TRUE FALSE NA NA
## 2 TRUE FALSE NA NA
## 3 TRUE FALSE NA NA
## 4 TRUE FALSE NA NA
## 5 TRUE FALSE NA NA
## 6 TRUE FALSE NA NA
## X screenName retweetCount isRetweet
## 1 6284 jaykrishna324 34595 TRUE
## 2 4694 arrowsmithwoman 19387 TRUE
## 3 1449 CarrieNemitz21 14000 TRUE
## 4 316 handdonordad 12728 TRUE
## 5 6248 njq5qd4hCP477fH 6947 TRUE
## 6 557 RonaldM79125420 6728 TRUE
Data cleaning and tokenization
We convert the data set into a corpus and then clean it: lower-casing all characters and removing punctuation marks, extra white space, and stop words.
library(tm)
library(textmineR)
library(RWeka)
library(wordcloud)
library(RColorBrewer)
trump_tweets_df_2 <- trump_tweets_df[c(1,2)]
# Remove emojis and other non-ASCII characters from the text
trump_tweets_df_2$text <- gsub("[^\x01-\x7F]", "", trump_tweets_df_2$text)
# Convert the text column into a corpus
trump_tweets_corp <- tm::VCorpus(tm::VectorSource(trump_tweets_df_2$text))
# Data cleaning
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, tolower)
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, PlainTextDocument)
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, removePunctuation)
# Remove stop words
new_stops <-c("covid","iphone","coronavirus","android","web","rt","chuonlinenews","Fashion", "fashionblogger", "Covid_19", "Juventus", "WuhanVirus","covid19","dranthonyfauci","scotgov youre", "rvawonk two","false","president","realdonaldtrump","champion")
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, removeWords, words = c(stopwords("english"), new_stops))
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, stripWhitespace)
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, PlainTextDocument)
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, removePunctuation)
trump_tweets_corp <- tm::tm_map(trump_tweets_corp, removeNumbers)
# Tokenize tweets texts into words
tokenizer <- function(x) {
RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))
}
tdm <- TermDocumentMatrix(
trump_tweets_corp,
control = list(tokenize = tokenizer)
)
tdm <- as.matrix(tdm)
trump_tweets_cleaned_freq <- rowSums(tdm)
# Create a bi-gram (2-word) word cloud
pal <- RColorBrewer::brewer.pal(8,"Set1")
wordcloud::wordcloud(names(trump_tweets_cleaned_freq), trump_tweets_cleaned_freq, min.freq=50,max.words = 50, random.order=TRUE,random.color = TRUE, rot.per=.15, colors = pal,scale = c(3,1))
This word cloud shows the frequency of bi-grams (two-word phrases). From the bi-grams we can see what most people are talking about in response to Trump’s posts.
Sentiment Analysis
Sentiment analysis helps us understand people’s feelings toward a specific subject. We break the tweets’ sentences into words for further analysis.
library(tibble)
library(tidytext)
# Transform sentences into words
trump_data <- trump_tweets_df %>%
tidytext::unnest_tokens(output = "words", input = text, token = "words")
# Remove stop words from tibble
trump_clean_data <- trump_data %>%
dplyr::anti_join(stop_words, by=c("words"="word")) %>% dplyr::filter(words != "trump" )
Polarity scores help us make quantitative judgments about the feelings in a piece of text. In short, we categorize words from the tweets as positive or negative and give each a score for analysis. We assigned sentiments using the bing lexicon, then filtered the dataset to keep only words with an absolute polarity score of 80 or more.
library(tidyr)
library(ggplot2)
sentiment_data <- trump_clean_data %>%
# Inner join to bing lexicon by term = word
dplyr::inner_join(get_sentiments("bing"), by = c("words" = "word")) %>%
# Count by term and sentiment, weighted by count
dplyr::count(words, sentiment) %>%
# Spread sentiment, using n as values
tidyr::spread(sentiment, n, fill = 0) %>%
# Mutate to add a polarity column
dplyr::mutate(polarity = positive - negative)
# show summary of sentiment data
summary(sentiment_data)
## words negative positive polarity
## Length:860 Min. : 0.00 Min. : 0.000 Min. :-984.000
## Class :character 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: -3.000
## Mode :character Median : 1.00 Median : 0.000 Median : -1.000
## Mean : 10.16 Mean : 2.238 Mean : -7.926
## 3rd Qu.: 3.00 3rd Qu.: 1.000 3rd Qu.: 1.000
## Max. :984.00 Max. :854.000 Max. : 854.000
polarity_data <- sentiment_data %>%
# Filter for absolute polarity at least 80
dplyr::filter(abs(polarity) >= 80) %>%
# add new column named as sentiments, shows positive/negative
dplyr::mutate(
Sentiments = ifelse(polarity > 0, "positive", "negative")
)
ggplot2::ggplot(polarity_data, aes(reorder(words, polarity), polarity, fill = Sentiments)) +
geom_col() +
ggtitle("Sentiment Word Frequency") +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10))+
xlab("Word")
From the sentiment word frequencies we can see that negative sentiments occur much more often than positive ones.
To get a clearer picture of how positive and negative words are used, we assigned sentiments using the ‘bing’ lexicon and did a simple count to find the 15 most common positive and negative words in the extracted tweets.
word_counts <- trump_clean_data %>%
# sentiment analysis using the "bing" lexicon
dplyr::inner_join(get_sentiments("bing"), by = c("words" = "word")) %>%
# Count by word and sentiment
dplyr::count(words, sentiment)
top_words <- word_counts %>%
# Group by sentiment
dplyr:: group_by(sentiment) %>%
# Take the top 15 for each sentiment
dplyr::top_n(15) %>%
dplyr::ungroup() %>%
# Make word a factor in order of n
dplyr::mutate(words = reorder(words, n))
ggplot2::ggplot(top_words, aes(words, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = n, hjust=1), size = 3.5, color = "black") +
facet_wrap(~sentiment, scales = "free") +
coord_flip() +
ggtitle("Most common positive and negative words")
# Sentiment word cloud
tokenizer <- function(x) {
RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 1, max = 1))
}
tdm <- TermDocumentMatrix(
trump_tweets_corp,
control = list(tokenize = tokenizer)
)
tdm <- as.matrix(tdm)
trump_tweets_cleaned_freq <- rowSums(tdm)
# Create a uni-gram (1-word) word cloud
pal <- RColorBrewer::brewer.pal(9,"Set2")
wordcloud::wordcloud(names(trump_tweets_cleaned_freq), trump_tweets_cleaned_freq, min.freq=50,max.words = 50, random.order=TRUE,random.color = TRUE, rot.per=.15, colors = pal,scale = c(3,1))
Conclusion
Based on the \(R^2\) values (roughly 4%–5%) between White House briefing sentiment and the funds, there is little correlation and no statistical significance tying the briefings to the average closing price of small-, mid-, or large-cap growth funds. Ideally, we wanted to perform a sentiment analysis of Trump’s tweets to visualize the impact of his unfiltered words on the economy, especially during the current crisis. Unfortunately, Twitter’s API only allows downloading roughly a week’s worth of tweets from the present date.
Something to consider: there are many variables that affect stock and economic performance which we did not take into account. In the future, a sentiment analysis of news articles from the Financial Times or the Wall Street Journal, as well as the social media accounts of prominent financial commentators, compared against the White House briefings, might show a stronger correlation with market performance. Also, bing was the only lexicon we compared against market performance; we did not test the others.
Sentence-level sentiment analysis could also be considered here, since it scores whole sentences rather than individual words. It works well with larger amounts of data and could be explored to produce sentiment scores that align better with the stocks.
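As an illustration of this idea (not part of the original analysis), the sentimentr package scores whole sentences and accounts for valence shifters such as negators and amplifiers that unigram lexicons miss; a minimal sketch against the briefing data:
library(sentimentr)
# Split each briefing paragraph into sentences and score them;
# element_id maps each score back to a row of wh_data
briefing_sentences <- get_sentences(wh_data$text)
sentence_scores <- sentiment(briefing_sentences)
head(sentence_scores) # element_id, sentence_id, word_count, sentiment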
Overall, the tweets convey an optimistic sentiment, with high frequencies of words such as “positive”, “safe”, and “lead” around defeating the coronavirus, while the most frequent negative words are “lack”, “scam”, and “concern”. In the Sentiment Word Frequency bar graph, “lack” has the highest frequency of any word, suggesting many tweets about deaths in the COVID-19 pandemic and claims that the President’s statements were not accurate.
The most frequent bi-grams in the word cloud, such as “tested positive”, “approved trials”, “lung disinfectant”, “concern huge”, and “huge scam”, suggest that while the government pushed toward a vaccine, the number of infections kept rising and people grew more anxious about their health.
In this sentiment analysis we see far more negative sentiment than positive: infections were increasing daily with no vaccine yet released, the government was trying hard to help, and health-care professionals were themselves getting infected.
Challenges & Additional Comments
- Incorporating such different data into one graph for comparison
- Twitter data was only available for the past ~8 days, so we were unable to chart it over the full time frame
- A feature not discussed in class: using linear regression to relate stock prices to the sentiment scores
Resources
- https://bradleyboehmke.github.io/2015/12/scraping-html-text.html
- https://projects.fivethirtyeight.com/trump-approval-ratings/
- Twitter API using R: https://www.youtube.com/watch?v=M_PnapGrpNI
- How Twitter reacted to the coronavirus pandemic: https://towardsdatascience.com/how-did-twitter-react-to-the-coronavirus-pandemic-2857592b449a