In this project, we examine whether Americans support the impeachment inquiry into President Trump. To do so, we draw on three different sources: the Washington Post, which we scraped for a timeline of the road to impeachment; polling data from FiveThirtyEight.com; and tweets collected from Twitter. Using three sources lets us look at support for impeachment with less risk of bias from any single one.
library(knitr)
library(tm)
library(dplyr)
library(twitteR)
library(wordcloud)
library(wordcloud2)
library(ggwordcloud)
library(tidyverse)
library(tidytext)
library(DT)
library(scales)
library(ggthemes)
library(RColorBrewer)
library(RCurl)
library(XML)
library(stringr)
library(ggplot2)
library(htmlwidgets)
library(rvest)
library(data.table)
library(kableExtra)
library(DBI)
library(RMySQL)
library(readr)
library(lattice)
The House of Representatives is engaged in a formal impeachment inquiry of President Trump. It is focused on his efforts to secure specific investigations in Ukraine that carried political benefits for him, including aides allegedly tying those investigations to official U.S. government concessions. To build a timeline of the events leading up to the inquiry, we scraped the following website:
https://www.washingtonpost.com/graphics/2019/politics/trump-impeachment-timeline
#First we create an empty data frame for our function to fill. We call it ukraine.
ukraine <- data.frame( Title = character(),
Description = character(),
stringsAsFactors=FALSE
)
url <- "https://www.washingtonpost.com/graphics/2019/politics/trump-impeachment-timeline/"
var <- read_html(url)
title <- var %>%
html_nodes("div.pg-card .pg-card-title") %>%
html_text()
#title
description <- var %>%
html_nodes("div.pg-card .pg-card-description") %>%
html_text()
description <- gsub(pattern = "\\[\\]", replacement = "", description)
# drop the first asterisk in each description
description <- stringr::str_replace(description, '\\*', '')
description <- gsub(pattern = "\n", replacement = "", description)
ukraine <- rbind(ukraine, as.data.frame(cbind(title,description)))
## Warning in cbind(title, description): number of rows of result is not a
## multiple of vector length (arg 1)
ukraine <- ukraine %>% mutate(id = row_number())
ukraine <- ukraine[1:215,]
head(ukraine) %>% kable() %>% kable_styling()
title | description | id |
---|---|---|
February 22, 2014 | Ukrainian President Viktor Yanukovych is ousted from power during a popular uprising in the country. He flees to Russia. After his ouster, Ukrainian officials begin a wide-ranging investigation into corruption in the country. | 1 |
March 7, 2014 | Lev Parnas, eventually an associate of former New York City mayor Rudolph W. Giuliani, has his first known interaction with Donald Trump at a golf tournament in Florida. | 2 |
March 1, 2014 | Russia invades the Ukrainian peninsula of Crimea, annexing it. | 3 |
May 13, 2014 | Hunter Biden, a son of then-U.S. Vice President Joe Biden, joins the board of the Ukrainian energy company Burisma Holdings. It is owned by oligarch Mykola Zlochevsky, one of several subjects of the Ukrainian corruption probe. | 4 |
May 25, 2014 | Petro Poroshenko is elected president of Ukraine. | 5 |
February 10, 2015 | Viktor Shokin becomes Ukraines prosecutor general. | 6 |
tail(ukraine) %>% kable() %>% kable_styling()
 | title | description | id |
---|---|---|---|
210 | November 10, 2019 | Hill testifies. | 210 |
211 | November 13, 2019 | Kent testifies. | 211 |
212 | November 15, 2019 | McKinley testifies and explains his resignation. “I was disturbed by the implication that foreign governments were being approached to procure negative information on political opponents,” McKinley says. “I was convinced that this would also have a serious impact on Foreign Service morale and the integrity of our work overseas.” | 212 |
213 | November 19, 2019 | Sondland testifies, saying any pressure he applied on Ukraine to investigate Burisma came before he knew the case involved the Bidens. (He claims this despite Giuliani’s efforts and the Bidens’ proximity to them being in the news by early May.) Sondland says he is making that distinction “because I believe I testified that it would be improper” to push for such political investigations. Asked whether it would be illegal, Sondland says: “I’m not a lawyer, but I assume so.” | 213 |
214 | November 20, 2019 | Trump announces Perry will resign by the end of the year. | 214 |
215 | November 21, 2019 | Mulvaney in a news conference momentarily confirms a quid pro quo with Ukraine. “[Did Trump] also mention to me, in the past, that the corruption related to the DNC server?” Mulvaney said. “Absolutely, no question about that. But that’s it. And that’s why we held up the money. . . . The look back to what happened in 2016 certainly was part of the thing that he was worried about in corruption with that nation. And that is absolutely appropriate.” Mulvaney later issues a statement trying to reverse course, saying there actually was no connection. | 215 |
The code above first creates an empty data frame, then selects the relevant HTML nodes and scrapes their text into it.
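The warning printed by cbind above indicates that the title and description vectors had different lengths, so some rows may pair a date with the wrong text. One way to avoid this, sketched below on a small made-up page rather than the live site, is to select each card first and then pull one title and one description per card with html_node. The toy HTML and the name timeline are assumptions for illustration only.

```r
library(rvest)

# Toy page standing in for the timeline: the second card has no description.
page <- read_html('
  <div class="pg-card"><p class="pg-card-title">Day 1</p><p class="pg-card-description">First event.</p></div>
  <div class="pg-card"><p class="pg-card-title">Day 2</p></div>
  <div class="pg-card"><p class="pg-card-title">Day 3</p><p class="pg-card-description">Third event.</p></div>')

# Select the cards, then take exactly one title and one description per card.
# html_node() returns one result per input node, with NA where nothing matches,
# so the two columns always have the same length.
cards <- html_nodes(page, "div.pg-card")
timeline <- data.frame(
  title = html_text(html_node(cards, ".pg-card-title")),
  description = html_text(html_node(cards, ".pg-card-description")),
  stringsAsFactors = FALSE
)
timeline
```

With this layout a card that lacks a description shows up as a missing value in its own row instead of shifting every later row.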
allDescriptions <- ""
mdescription <- c()
for (i in (1:length(ukraine$title))){
mdescription <- ukraine$description[i]
allDescriptions <- paste0(allDescriptions,mdescription)
}
allDescriptions <- gsub(pattern = "\\\"", replace = "", allDescriptions)
allDescriptions <- gsub(pattern = "\\[\\]", replace = "", allDescriptions)
allDescriptions <- gsub(pattern = "\"", replace = "", allDescriptions)
allDescriptions <- gsub(pattern = "__", replace = "", allDescriptions)
allDescriptions <- gsub(pattern = "--", replace = "", allDescriptions)
allDescriptions <- gsub(pattern = "----", replace = "", allDescriptions)
#allDescriptions
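The loop and the chain of gsub calls above could also be written in vectorized form. This is a minimal sketch on made-up strings, assuming that a space between descriptions is wanted so that the last word of one entry does not fuse with the first word of the next (the loop above joins them with no separator).

```r
# Made-up descriptions standing in for ukraine$description.
descriptions <- c("Hill testifies.",
                  "Kent \"testifies\".",
                  "Trump -- calls [] Zelensky.")

# paste() with collapse joins the whole vector in one call.
all_text <- paste(descriptions, collapse = " ")

# One pattern covers double quotes, empty brackets, and runs of two or more _ or -.
all_text <- gsub("\"|\\[\\]|_{2,}|-{2,}", "", all_text)
all_text
```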
With the data in hand, we clean it using the tm package’s Corpus function and its transformations: converting to lowercase, removing numbers, punctuation, and stop words, and dropping a few frequent words that carry little meaning.
# putting the words in vector
words <- Corpus(VectorSource(allDescriptions))
#using tm to remove numbers, punctuation and convert to lowercase. Some high frequency words we do not want are removed.
words <- tm_map(words, tolower)
words <- tm_map(words, removeNumbers)
words <- tm_map(words, removePunctuation)
words <- tm_map(words, removeWords, stopwords("english"))
words <- tm_map(words, removeWords, c("will","according", "later", "say", "says", "said", "saying", "tells", "also", "â-", "__" ))
#inspect(words)
#Build a term-document matrix and dataframe d to show frequency of words
tdm <- TermDocumentMatrix(words)
m <- as.matrix(tdm)
#desc(m)
# head(m, 20) %>% kable() %>% kable_styling()
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word = names(v), freq=v)
head(d,8) %>% kable() %>% kable_styling()
 | word | freq |
---|---|---|
trump | trump | 110 |
ukraine | ukraine | 74 |
zelensky | zelensky | 70 |
taylor | taylor | 65 |
sondland | sondland | 52 |
house | house | 39 |
call | call | 38 |
giuliani | giuliani | 37 |
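To make the frequency table above less of a black box, here is a rough base R sketch of what the term-document matrix and rowSums steps amount to for a single document: split the text into words and count them. The sample sentence and the names tokens, counts, and d_toy are made up for illustration, and this simplified version does not reproduce every tm cleaning rule.

```r
# Made-up text standing in for the cleaned descriptions.
text <- "trump calls zelensky trump ukraine aid ukraine trump"

# Split on anything that is not a lowercase letter and drop empty pieces.
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[nchar(tokens) > 0]

# Count each word and sort from most to least frequent.
counts <- sort(table(tokens), decreasing = TRUE)
d_toy <- data.frame(word = names(counts),
                    freq = as.integer(counts),
                    stringsAsFactors = FALSE)
head(d_toy)
```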
#wordcloud2(d, size = 0.7)
wordcloud2(d, size = 1, color = "random-light", backgroundColor = "grey")
ggplot(head(d, 25), aes(reorder(word, freq),freq)) +
geom_bar(stat = "identity", fill = "#7300AB") + #03DAC6 #6200EE
labs(title = "Road To Impeachment words frequency",
x = "Words", y = "Frequency") +
geom_text(aes(label=freq), vjust=0.4, hjust= 1.2, size=3, color="white")+
coord_flip()
ggplot(d, aes(label = word, size=2)) +
geom_text_wordcloud_area(
mask = png::readPNG("t.png"),
rm_outside = TRUE, color="skyblue"
) +
scale_size_area(max_size = 10) +
theme_minimal()
## Some words could not fit on page. They have been removed.
Do Americans Support Impeaching Trump?
reference - https://projects.fivethirtyeight.com/impeachment-polls
poll <- read.csv("https://raw.githubusercontent.com/ekhahm/datascience/master/impeachment-polls.csv")
head(poll)
## Start End Pollster Sponsor SampleSize Pop
## 1 6/28/2019 7/1/2019 ABC News/Washington Post 1008 a
## 2 4/22/2019 4/25/2019 ABC News/Washington Post 1001 a
## 3 1/21/2019 1/24/2019 ABC News/Washington Post 1001 a
## 4 8/26/2018 8/29/2018 ABC News/Washington Post 1003 a
## 5 6/8/2019 6/12/2019 Civiqs 1559 rv
## 6 5/28/2019 5/31/2019 CNN/SSRS 1006 a
## tracking
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## Text
## 1 Based on what you know, do you think Congress should or should not begin impeachment proceedings that could lead to Trump being removed from office? Do you feel that way strongly or somewhat?
## 2 Based on what you know, do you think Congress should or should not begin impeachment proceedings that could lead to Trump being removed from office? Do you feel that way strongly or somewhat?
## 3 Based on what you know, do you think Congress should or should not begin impeachment proceedings that could lead to Trump being removed from office? Do you feel that way strongly or somewhat?
## 4 Based on what you know, do you think Congress should or should not begin impeachment proceedings that could lead to Trump being removed from office? Do you feel that way strongly or somewhat?
## 5 Do you think the House of Representatives should open an impeachment inquiry to determine if President Donald Trump should be removed from office?
## 6 Based on what you have read or heard, do you believe that President Trump should be impeached and removed from office, or don't you feel that way?
## Category Include. Yes No Unsure Rep.Sample Rep.Yes Rep.No
## 1 begin_proceedings yes 37 59 4 232 7 87
## 2 begin_proceedings yes 37 56 6 260 10 87
## 3 begin_proceedings yes 40 55 6 240 7 90
## 4 begin_proceedings yes 49 46 5 251 15 82
## 5 begin_inquiry yes 43 51 5 483 5 93
## 6 impeach_and_remove yes 41 54 5 342 6 93
## Dem.Sample Dem.Yes Dem.No Ind.Sample Ind.Yes Ind.No
## 1 292 61 36 373 37 59
## 2 290 62 29 360 36 59
## 3 320 64 30 370 42 53
## 4 331 75 21 371 49 46
## 5 577 77 15 499 41 53
## 6 272 76 18 392 35 59
## URL
## 1 https://games-cdn.washingtonpost.com/notes/prod/default/documents/2557e081-f90a-4c44-a04e-9c98f04bb725/note/d1660489-1b82-43c7-afce-812d2861ecf7.pdf#page=1
## 2 https://games-cdn.washingtonpost.com/notes/prod/default/documents/873ceb77-ad0f-439a-891b-d440139189d0/note/fae3467f-5c96-41b7-99ef-cd49f752e038.pdf
## 3 langerresearch.com/wp-content/uploads/1204a2TrumpInvestigations-1.pdf
## 4 https://www.langerresearch.com/wp-content/uploads/1200a1TrumpandtheMuellerInvestigation-1.pdf
## 5 https://civiqs.com/documents/Civiqs_DailyKos_monthly_banner_book_2019_06.pdf
## 6 https://cdn.cnn.com/cnn/2019/images/06/01/rel7a.-.trump,.investigations.pdf
## Notes
## 1
## 2
## 3
## 4
## 5
## 6
According to FiveThirtyEight’s Pollster Ratings, the three most reliable of the fifteen pollsters are Marist College, SurveyUSA, and Emerson College, each graded higher than A-. We analyze and visualize the survey questions from these three pollsters that ask whether Congress should impeach, or impeach and remove, Trump.
reference - https://projects.fivethirtyeight.com/pollster-ratings
poll$Start <- format(as.Date(poll$Start, format="%m/%d/%Y"))
poll1 <- poll %>%
filter(Pollster == "Marist College"|Pollster == "Emerson College"| Pollster =="SurveyUSA")%>% # filter down to the three pollsters
filter(str_detect(Category, "impeach"))%>%
gather("Answer", "percent",11:13)%>%
select(Start, End, Answer, percent)%>%
group_by(Start, Answer)%>%
summarise(percent = mean(percent))%>%
arrange(Start)
head(poll1)
## # A tibble: 6 x 3
## # Groups: Start [2]
## Start Answer percent
## <chr> <chr> <dbl>
## 1 2017-07-17 No 42
## 2 2017-07-17 Unsure 15
## 3 2017-07-17 Yes 42
## 4 2019-10-03 No 47.5
## 5 2019-10-03 Unsure 4
## 6 2019-10-03 Yes 49
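The pipeline above reshapes the Yes, No, and Unsure columns by position (11:13), which depends on the column order of the CSV staying fixed. A hedged alternative is to name the columns explicitly. The sketch below uses a small made-up data frame and assumes tidyr 1.0 or later for pivot_longer; toy_poll and toy_long are illustrative names.

```r
library(dplyr)
library(tidyr)

# Made-up rows with the same column names as the poll data.
toy_poll <- data.frame(
  Start  = c("2019-10-03", "2019-10-03", "2017-07-17"),
  Yes    = c(49, 49, 42),
  No     = c(47, 48, 42),
  Unsure = c(4, 4, 15)
)

# Reshape by column name, then average each answer per start date.
toy_long <- toy_poll %>%
  pivot_longer(cols = c(Yes, No, Unsure),
               names_to = "Answer", values_to = "percent") %>%
  group_by(Start, Answer) %>%
  summarise(percent = mean(percent), .groups = "drop") %>%
  arrange(Start)
toy_long
```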
gg <- ggplot(poll1, aes(x= Start, y=percent, fill= Answer))+
geom_bar(aes(fill=Answer), stat="identity", position="dodge",
color="white", width=0.85)
gg <- gg + geom_text(aes(label=percent),hjust=-0.15,
position=position_dodge(width=0.8), size=3)
gg <- gg + coord_flip()
gg <- gg + labs(x="Start_date", y= "percent", title="Do you support the impeachment of President Trump?")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(plot.title=element_text(hjust=0))
gg
Looking at which pollster has the largest sample, Ipsos has provided the largest sample size so far. We subset the Ipsos polls and keep the categories containing the word “impeach” to look specifically at people’s opinions about impeachment.
There are two kinds of poll category: begin and impeach. The begin categories cover polls asking whether the impeachment process should begin, while impeach or impeach_remove covers polls asking whether Congress should impeach, or impeach and remove, Trump.
poll_Ipsos <- poll %>%
filter(Pollster == "Ipsos") %>% # filter down to Ipsos
filter(str_detect(Category, "impeach"))%>% # subset category = impeach
gather("Answer", "count",11:13)%>% # makes "wide" data longer
select(Start, End, Answer, count)%>%
arrange(desc(Start))%>%
separate(Start, c("start_year", "start_month", "start_day"), sep = "-")
head(poll_Ipsos)
## start_year start_month start_day End Answer count
## 1 2019 12 02 12/3/2019 Yes 44
## 2 2019 12 02 12/3/2019 Yes 45
## 3 2019 12 02 12/3/2019 No 42
## 4 2019 12 02 12/3/2019 No 45
## 5 2019 12 02 12/3/2019 Unsure 13
## 6 2019 12 02 12/3/2019 Unsure 10
gg1 <- ggplot(poll_Ipsos, aes(x = start_day, y = count, color= Answer)) +
geom_point()+theme_minimal() +
facet_wrap( ~ start_month ) +
labs(title = "Impeachment survey from Ipsos 2019",
x = "Start_date", y = "percent") + theme_bw(base_size = 15)+
theme(axis.text.x = element_blank(),axis.ticks = element_blank())
gg1
## Warning: Removed 1 rows containing missing values (geom_point).
The poll dataset covers three populations: all adults (a), likely voters (lv), and registered voters (rv). In this section, we analyze how each population views the president’s impeachment.
poll %>%
group_by(Pop)%>%
summarise(sum =sum(SampleSize)) #calculating sample size for each population
## # A tibble: 3 x 2
## Pop sum
## <fct> <int>
## 1 a 497574
## 2 lv 9716
## 3 rv 233059
poll2 <- poll %>%
filter(str_detect(Category, "impeach")) # subset category = impeach
#Calculating average percent of each answers
poll3 <- poll2 %>%
filter(Pop == "a")%>%
summarise(Yes = mean(Yes), No = mean(No), Unsure = mean(Unsure, na.rm = TRUE))
poll4 <- poll2 %>%
filter(Pop == "lv")%>%
summarise(Yes = mean(Yes), No = mean(No), Unsure = mean(Unsure, na.rm = TRUE))
poll5 <- poll2 %>%
filter(Pop == "rv")%>%
summarise(Yes = mean(Yes), No = mean(No), Unsure = mean(Unsure, na.rm = TRUE))
# creating a table
poll_pop <- rbind("all adults"= poll3, "likely voters"=poll4, "registered voters"=poll5)
poll_pop
## Yes No Unsure
## all adults 43.55243 43.30278 12.86902
## likely voters 47.62000 45.46000 7.90000
## registered voters 43.79573 45.05897 11.16325
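The three separate filter and summarise blocks above can be collapsed into a single grouped summary. A minimal sketch on made-up numbers (toy and toy_pop are illustrative names, and the values are not taken from the poll file):

```r
library(dplyr)

# Made-up polls: two for all adults, one each for likely and registered voters.
toy <- data.frame(
  Pop    = c("a", "a", "lv", "rv"),
  Yes    = c(40, 44, 48, 43),
  No     = c(45, 43, 45, 46),
  Unsure = c(15, NA, 7, 11)
)

# One row of averages per population; na.rm skips polls with no Unsure figure.
toy_pop <- toy %>%
  group_by(Pop) %>%
  summarise(Yes = mean(Yes),
            No = mean(No),
            Unsure = mean(Unsure, na.rm = TRUE))
toy_pop
```

Grouping keeps the population codes in a column, so the result can be passed straight to ggplot without rebuilding row names.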
Next, we investigate whether support for impeaching Trump differs by party (Republicans, Democrats, and independents). We take the polls from Fox News and CNN/SSRS and compare the results. The total sample sizes of the two are about the same, around 800. The visualizations are grouped by party.
Fox News
### tidying data
poll6 <- poll %>%
select(Start, End, Pollster, Rep.Yes, Rep.No, Dem.Yes, Dem.No, Ind.Yes, Ind.No) %>%
filter(Pollster == "Fox News") %>%
gather("Answer", "percent",4:9) %>%
separate(Answer, c("Party", "YesNo"))%>% # separate character by non-character(".")
arrange(desc(Start))
head(poll6)
## Start End Pollster Party YesNo percent
## 1 2019-10-27 10/30/2019 Fox News Rep Yes 8
## 2 2019-10-27 10/30/2019 Fox News Rep No 87
## 3 2019-10-27 10/30/2019 Fox News Dem Yes 86
## 4 2019-10-27 10/30/2019 Fox News Dem No 9
## 5 2019-10-27 10/30/2019 Fox News Ind Yes 38
## 6 2019-10-27 10/30/2019 Fox News Ind No 47
ggplot(poll6, aes(x = YesNo, y = percent, fill = YesNo)) + geom_boxplot() +
facet_wrap(~ Party, ncol = 5)+
labs(title = "Impeachment opinion by party from Fox News",x = "Answer", y = "percent") + theme_bw(base_size = 15)
CNN/SSRS
### tidying data
poll7 <- poll %>%
filter(Pollster == "CNN/SSRS") %>%
filter(str_detect(Category, "impeach"))%>%
select(Start, End, Pollster, Rep.Yes, Rep.No, Dem.Yes, Dem.No, Ind.Yes, Ind.No) %>%
gather("Answer", "percent",4:9) %>%
separate(Answer, c("Party", "YesNo"))%>%
arrange(desc(Start))
head(poll7)
## Start End Pollster Party YesNo percent
## 1 2019-11-21 11/24/2019 CNN/SSRS Rep Yes 10
## 2 2019-11-21 11/24/2019 CNN/SSRS Rep No 87
## 3 2019-11-21 11/24/2019 CNN/SSRS Dem Yes 90
## 4 2019-11-21 11/24/2019 CNN/SSRS Dem No 6
## 5 2019-11-21 11/24/2019 CNN/SSRS Ind Yes 47
## 6 2019-11-21 11/24/2019 CNN/SSRS Ind No 45
ggplot(poll7, aes(x = YesNo, y = percent, fill = YesNo)) + geom_boxplot() +
facet_wrap(~ Party, ncol = 5)+
labs(title = "Impeachment opinion by party from CNN/SSRS",x = "Answer", y = "percent") + theme_bw(base_size = 15)
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
Now let’s see how this compares with all of the poll data in our sample. We create a new measure called ‘percent’, which is the ratio of Yes to No responses in each poll, and plot it over time. We then adjust for poll sample size by multiplying the ratio by the log of the sample size to see whether that changes our results.
poll$percent <- poll$Yes/poll$No
poll$EndDate <- as.Date(poll$End, "%m/%d/%Y")
ggplot(data = poll) +
geom_point(aes(x = EndDate, y = percent,
size = SampleSize,
colour = factor(Category))) +
geom_hline(yintercept=1)
ggplot(data = poll) +
geom_point(aes(x = EndDate, y = percent)) +
geom_smooth(data = poll,
aes(x = EndDate, y = percent))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
poll$wgtpct <- poll$percent * log(poll$SampleSize)
ggplot(data = poll) +
geom_point(aes(x = EndDate, y = wgtpct)) +
geom_smooth(data = poll,
aes(x = EndDate, y = wgtpct,
size = SampleSize))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(data = poll) +
geom_point(aes(x = EndDate, y = wgtpct)) +
geom_smooth(data = poll,
aes(x = EndDate, y = wgtpct,
size = SampleSize), method = "lm")
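As a quick worked example of the two measures plotted above, take the first row of the poll table shown earlier (Yes = 37, No = 59, SampleSize = 1008). The variable names below are illustrative only.

```r
# Values from the first row of head(poll).
yes <- 37
no  <- 59
n   <- 1008

# Ratio of Yes to No: below 1 means more No than Yes, above 1 the reverse.
ratio <- yes / no

# Sample-size adjustment: log() in R is the natural logarithm.
weighted <- ratio * log(n)

c(ratio = ratio, weighted = weighted)
```

Because every ratio is multiplied by a number well above 1, the horizontal line at 1 no longer marks an even split on the adjusted scale; the adjustment stretches larger polls upward rather than averaging them in with more weight.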
### Regression of poll data
Finally, we run a regression on the data and use the slope to estimate the trajectory of the ratio of Americans in favor of impeachment proceedings.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
##
## hour, isoweek, mday, minute, month, quarter, second, wday,
## week, yday, year
## The following object is masked from 'package:base':
##
## date
days <- yday(poll$EndDate) - 1 # so Jan 1 = day 0
total_days <- cumsum(days)
ref_date <- dmy("01-01-2017")
poll$alldays <- difftime(poll$EndDate,ref_date,units = "days")
lmpct <- lm(poll$percent ~ poll$alldays)
summary(lmpct)
##
## Call:
## lm(formula = poll$percent ~ poll$alldays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.55833 -0.10327 0.01491 0.09818 0.86686
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7091200 0.0361863 19.60 < 2e-16 ***
## poll$alldays 0.0003403 0.0000413 8.24 1.99e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1826 on 440 degrees of freedom
## Multiple R-squared: 0.1337, Adjusted R-squared: 0.1317
## F-statistic: 67.9 on 1 and 440 DF, p-value: 1.987e-15
365*3.328e-04
## [1] 0.121472
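The multiplication above uses a typed-in slope of 3.328e-04, while the summary prints 0.0003403. Reading the coefficient from the fitted model avoids that kind of mismatch. The sketch below shows the idea on simulated data, not the poll file; the simulated slope, noise level, and variable names are assumptions chosen only to make the example self-contained.

```r
set.seed(1)

# Simulated stand-in: a ratio that drifts upward by 0.0003 per day, plus noise.
days  <- 0:999
ratio <- 0.7 + 0.0003 * days + rnorm(length(days), sd = 0.05)

fit <- lm(ratio ~ days)

# Pull the per-day slope out of the model instead of retyping it.
slope_per_day  <- unname(coef(fit)["days"])
slope_per_year <- 365 * slope_per_day
slope_per_year
```

Applied to the model above, the same pattern would be 365 times the second element of coef(lmpct).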
Finally, let’s get tweets from Twitter using the twitteR package. Initially, we wanted to filter for tweets posted before certain events in the impeachment case, but our level of API access did not allow that. The step below authenticates with Twitter so that we can retrieve tweets.
## [1] "Using direct authentication"
Let’s get the tweets using keywords such as impeachment, whistleblower, and Ukraine. Our first attempt returned a large number of retweets, which we had to exclude from the data to get reliable results.
# Now let's start extracting tweets regarding impeachment inquiry by using few trending tweets
tweets <- searchTwitter('impeachment, whistleblower, Ukraine, -filter:retweets', n=2000, lang = 'en')
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 2000 tweets were requested but the
## API can only return 397
# Converting tweets into dataframe
tweets_df <- twListToDF(tweets)
tweets_df2 <- tweets_df$text # This vector contain only tweets
# Reading the csv file from local directory
tweets2 <- read.csv('tweets_only2.csv', row.names=NULL, stringsAsFactors = FALSE, header=TRUE)
Again we used Corpus for text mining and to clean the data. A number of unnecessary words had to be excluded; otherwise the results would have been meaningless.
words <- Corpus(VectorSource(tweets2$x)) # Saving the tweets in vector 'words' while x is column's name which was given randomly while importing
words <- tm_map(words, tolower)
## Warning in tm_map.SimpleCorpus(words, tolower): transformation drops
## documents
words <- tm_map(words, removeNumbers)
## Warning in tm_map.SimpleCorpus(words, removeNumbers): transformation drops
## documents
words <- tm_map(words, removePunctuation)
## Warning in tm_map.SimpleCorpus(words, removePunctuation): transformation
## drops documents
words <- tm_map(words, stripWhitespace)
## Warning in tm_map.SimpleCorpus(words, stripWhitespace): transformation
## drops documents
words <- tm_map(words, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(words, removeWords, stopwords("english")):
## transformation drops documents
words <- tm_map(words, removeWords, c("will", "got", "admits<U+0085>", "want", "say")) # This sentence would be helpful for later to remove any unnecessary words
## Warning in tm_map.SimpleCorpus(words, removeWords, c("will", "got",
## "admits<U+0085>", : transformation drops documents
# words <- tm_map(words, gsub, pattern = 'Impeached', replacement= 'Impeachment') # This line of code will replace impeached with impeachment
# Now let's build a matrix and dataframe to show the number of words to make wordcloud
tdm <- TermDocumentMatrix((words))
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word= names(v), freq=v)
head(d,8)
## word freq
## ukraine ukraine 169
## impeachment impeachment 153
## trump trump 134
## whistleblower whistleblower 133
## aid aid 42
## get get 34
## gop gop 33
## realdonaldtrump realdonaldtrump 32
Let’s visualize the data as a word cloud to see the frequency of words, and then use sentiment analysis to see what Americans think about President Trump’s impeachment inquiry.
set.seed(3322)
wordcloud(words=d$word, freq=d$freq, min.freq=10, max.words =200, random.order=FALSE, rot.per=0.05, colors=brewer.pal(8,"Dark2"))
# Using sentiment analysis to see people's reaction
imp_tdm <- tidy(tdm)
imp_senti <- imp_tdm %>%
inner_join(get_sentiments("bing"), by=c(term="word"))
imp_senti %>%
count(sentiment, term, wt=count) %>%
ungroup() %>%
filter(n>= 3) %>%
mutate(n= ifelse(sentiment=="negative", -n, n)) %>%
mutate(term=reorder(term,n)) %>%
ggplot(aes(term, n, fill=sentiment))+ geom_bar(stat="identity")+ylab("People's sentiment on Trump's Impeachment")+coord_flip()
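The sentiment step above joins each term to a lexicon and flips the sign of negative counts so that the bars point in opposite directions. The sketch below reproduces that logic on a tiny hand-made lexicon; toy_lexicon is hypothetical and is not the bing lexicon, and the counts are invented.

```r
library(dplyr)

# Invented term counts in the same shape as the tidied term-document matrix.
term_counts <- data.frame(
  term  = c("corrupt", "support", "scandal", "win", "aid"),
  count = c(5, 4, 3, 2, 6),
  stringsAsFactors = FALSE
)

# Hypothetical four-word lexicon; words not listed (here "aid") are dropped by the join.
toy_lexicon <- data.frame(
  word      = c("corrupt", "scandal", "support", "win"),
  sentiment = c("negative", "negative", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Negative terms get negative counts, positive terms keep theirs.
scored <- term_counts %>%
  inner_join(toy_lexicon, by = c(term = "word")) %>%
  mutate(n = ifelse(sentiment == "negative", -count, count))

# A single net score: below zero means negative words outweigh positive ones.
net <- sum(scored$n)
net
```

Note that a lexicon match only says a word tends to be negative or positive in general; it does not say whether the tweet's author is for or against impeachment.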
The sentiment analysis suggests that people are upset and angry about the Ukraine case. With fuller access to Twitter data, we could have compared sentiment across different time periods and filtered tweets by location, which would tie the project more closely to the question “Do Americans support Trump’s impeachment?”
Taken together, the three data sources show mixed opinion on whether President Trump should be impeached. The data from the Washington Post and Twitter suggest that people are overall angry and upset about the Ukraine case and see it as shameful, but the result is still not clear. FiveThirtyEight’s data clearly show mixed opinion. The overall polls show nearly equal support and opposition, but looking at the data more closely, Democrats clearly support impeachment while Republicans clearly oppose it. Since the sample sizes were around 1,000 in each poll, we cannot say exactly whether all Americans want him impeached, but we can clearly say that Democrats do and Republicans do not. We can also see that, overall, more people wanted him impeached after the Ukraine story than after the Mueller Report. Given the limits on data access, further research on this topic might bring more insights.