Introduction

In this project, we examine whether Americans support the impeachment inquiry into President Trump. We draw on three sources: a Washington Post timeline, scraped to trace the road to impeachment; data from FiveThirtyEight.com; and tweets collected from Twitter. Combining several sources lets us gauge support for impeachment without inheriting the bias of any single one.

Loading libraries

library(knitr)
library(tm)
library(dplyr)
library(twitteR)
library(wordcloud)
library(wordcloud2)
library(ggwordcloud)
library(tidyverse)
library(tidytext)
library(DT)
library(scales)
library(ggthemes)
library(RColorBrewer)
library(RCurl)
library(XML)
library(stringr)
library(ggplot2)
library(htmlwidgets)
library(rvest)
library(data.table)
library(kableExtra)
library(DBI)
#library(RMySQL)
library(readr)
library(lattice)

Scraping the data from the Washington Post

Road to impeachment

The House of Representatives is engaged in a formal impeachment inquiry of President Trump. It is focused on his efforts to secure specific investigations in Ukraine that carried political benefits for him, including aides allegedly tying those investigations to official U.S. government concessions. To build a timeline of the events leading up to the Trump-Ukraine impeachment inquiry, we scraped the following page:

https://www.washingtonpost.com/graphics/2019/politics/trump-impeachment-timeline

#First we create an empty data frame for our function to fill. We call it ukraine.
ukraine <- data.frame( Title = character(),
                       Description = character(),
                       stringsAsFactors=FALSE
                       ) 
url <- "https://www.washingtonpost.com/graphics/2019/politics/trump-impeachment-timeline/"
var <- read_html(url)
title <- var %>% 
    html_nodes("div.pg-card .pg-card-title") %>%
    html_text()
#title
description <- var %>%
    html_nodes("div.pg-card .pg-card-description") %>%
    html_text()
# Remove empty brackets, the first literal asterisk in each description, and newlines
description <- gsub(pattern = "\\[\\]", replacement = "", description)
description <- stringr::str_replace(description, '\\*', '')
description <- gsub(pattern = "\n", replacement = "", description)
ukraine <- rbind(ukraine, as.data.frame(cbind(title,description)))
## Warning in cbind(title, description): number of rows of result is not a
## multiple of vector length (arg 1)
ukraine <- ukraine %>% mutate(id = row_number())
ukraine <- ukraine[1:215,]
head(ukraine) %>% kable() %>% kable_styling()
| title | description | id |
|---|---|---|
| February 22, 2014 | Ukrainian President Viktor Yanukovych is ousted from power during a popular uprising in the country. He flees to Russia. After his ouster, Ukrainian officials begin a wide-ranging investigation into corruption in the country. | 1 |
| March 7, 2014 | Lev Parnas, eventually an associate of former New York City mayor Rudolph W. Giuliani, has his first known interaction with Donald Trump at a golf tournament in Florida. | 2 |
| March 1, 2014 | Russia invades the Ukrainian peninsula of Crimea, annexing it. | 3 |
| May 13, 2014 | Hunter Biden, a son of then-U.S. Vice President Joe Biden, joins the board of the Ukrainian energy company Burisma Holdings. It is owned by oligarch Mykola Zlochevsky, one of several subjects of the Ukrainian corruption probe. | 4 |
| May 25, 2014 | Petro Poroshenko is elected president of Ukraine. | 5 |
| February 10, 2015 | Viktor Shokin becomes Ukraine’s prosecutor general. | 6 |
tail(ukraine) %>% kable() %>% kable_styling()
|  | title | description | id |
|---|---|---|---|
| 210 | November 10, 2019 | Hill testifies. | 210 |
| 211 | November 13, 2019 | Kent testifies. | 211 |
| 212 | November 15, 2019 | McKinley testifies and explains his resignation. “I was disturbed by the implication that foreign governments were being approached to procure negative information on political opponents,” McKinley says. “I was convinced that this would also have a serious impact on Foreign Service morale and the integrity of our work overseas.” | 212 |
| 213 | November 19, 2019 | Sondland testifies, saying any pressure he applied on Ukraine to investigate Burisma came before he knew the case involved the Bidens. (He claims this despite Giuliani’s efforts and the Bidens’ proximity to them being in the news by early May.) Sondland says he is making that distinction “because I believe I testified that it would be improper” to push for such political investigations. Asked whether it would be illegal, Sondland says: “I’m not a lawyer, but I assume so.” | 213 |
| 214 | November 20, 2019 | Trump announces Perry will resign by the end of the year. | 214 |
| 215 | November 21, 2019 | Mulvaney in a news conference momentarily confirms a quid pro quo with Ukraine. “[Did Trump] also mention to me, in the past, that the corruption related to the DNC server?” Mulvaney said. “Absolutely, no question about that. But that’s it. And that’s why we held up the money. . . . The look back to what happened in 2016 certainly was part of the thing that he was worried about in corruption with that nation. And that is absolutely appropriate.” Mulvaney later issues a statement trying to reverse course, saying there actually was no connection. | 215 |

The chunk above first creates an empty data frame, then selects the relevant HTML nodes (the card titles and card descriptions) and scrapes their text into it.
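The warning printed by cbind() in the output above means that title and description did not have the same length, so the shorter vector was recycled and some rows may pair a date with the wrong event. Below is a minimal sketch of how one might detect this before binding. The vectors are made-up stand-ins for the scraped ones, and truncating to the common length is only an assumption for illustration; it does not guarantee correct alignment on the real page.

```r
# Hypothetical stand-ins for the scraped vectors (the real ones come from html_text())
title <- c("February 22, 2014", "March 1, 2014", "March 7, 2014")
description <- c("Yanukovych is ousted.", "Russia invades Crimea.")

# cbind() recycles the shorter vector; the warning is the only hint of misalignment
recycled <- suppressWarnings(cbind(title, description))

# Check the lengths explicitly before building the data frame
if (length(title) != length(description)) {
  message("Length mismatch: ", length(title), " titles vs ",
          length(description), " descriptions")
}

# One possible (assumed) strategy: keep only the rows both vectors can fill
n <- min(length(title), length(description))
aligned <- data.frame(title = title[seq_len(n)],
                      description = description[seq_len(n)],
                      stringsAsFactors = FALSE)
```

In the recycled matrix, the third title is paired with the first description again, which is exactly the kind of silent mismatch worth inspecting by hand on the real data.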

Get the text of all timeline events

# Concatenate every event description into one string (no separator, as in a plain paste0 loop)
allDescriptions <- paste0(ukraine$description, collapse = "")
# Strip straight double quotes, empty brackets, double underscores and double hyphens
allDescriptions <- gsub(pattern = "\"", replacement = "", allDescriptions)
allDescriptions <- gsub(pattern = "\\[\\]", replacement = "", allDescriptions)
allDescriptions <- gsub(pattern = "__", replacement = "", allDescriptions)
allDescriptions <- gsub(pattern = "--", replacement = "", allDescriptions)
#allDescriptions

Create the corpus and clean it up

With the text collected, we build a corpus with the tm package’s Corpus function and clean it: we convert the text to lowercase, remove numbers, punctuation and English stop words, and drop a few high-frequency words that carry little meaning here.

# Build a single-document corpus from the combined text
words <- Corpus(VectorSource(allDescriptions))
# Use tm to convert to lowercase and remove numbers, punctuation and stop words. Some high-frequency words we do not want are removed as well.
words <- tm_map(words, content_transformer(tolower))
words <- tm_map(words, removeNumbers)
words <- tm_map(words, removePunctuation)
words <- tm_map(words, removeWords, stopwords("english"))
words <- tm_map(words, removeWords, c("will","according", "later", "say", "says", "said", "saying", "tells", "also", "—-", "__" ))
#inspect(words)
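To make the effect of each cleaning step concrete, here is a rough base R sketch of the same sequence applied to one toy sentence. It is not how tm implements these transformations; the sentence and the four-word stop list are assumptions chosen for illustration (tm's stopwords("english") list is much longer).

```r
# Toy input; in the analysis the real text is allDescriptions
text <- "Sondland says he spoke to Trump on July 26, 2019."

# Tiny illustrative stop-word list (an assumption, not tm's list)
stop_words <- c("he", "to", "on", "says")

clean <- tolower(text)                      # same idea as tolower in tm_map
clean <- gsub("[[:digit:]]+", "", clean)    # roughly what removeNumbers does
clean <- gsub("[[:punct:]]+", "", clean)    # roughly what removePunctuation does

# Split on whitespace and drop empty strings and stop words
tokens <- strsplit(clean, "[[:space:]]+")[[1]]
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]
tokens
```

Only the content-bearing words (sondland, spoke, trump, july) survive, which is the form the term-document matrix is built from.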

Create a term-document matrix for the corpus

#Build a term-document matrix and dataframe d to show frequency of words
tdm <- TermDocumentMatrix(words)
m <- as.matrix(tdm)
#desc(m)
# head(m, 20) %>% kable() %>% kable_styling()
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word = names(v), freq=v)
head(d,8) %>% kable() %>% kable_styling()
|  | word | freq |
|---|---|---|
| trump | trump | 110 |
| ukraine | ukraine | 74 |
| zelensky | zelensky | 70 |
| taylor | taylor | 65 |
| sondland | sondland | 52 |
| house | house | 39 |
| call | call | 38 |
| giuliani | giuliani | 37 |
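Because the corpus holds a single document, the term-document matrix has one column and rowSums(m) is simply each word's count. The sketch below reproduces that counting step in base R on a made-up token vector (the tokens are an assumption, not the scraped data). It also shows why each word appears twice in the table above: the named vector v supplies both the row names and the word column.

```r
# Hypothetical tokens standing in for the cleaned corpus
tokens <- c("trump", "ukraine", "trump", "zelensky", "ukraine", "trump")

# Count occurrences and turn the table into a named numeric vector
counts <- table(tokens)
v <- sort(setNames(as.vector(counts), names(counts)), decreasing = TRUE)

# Same construction as in the analysis: the names of v become the row names
d_toy <- data.frame(word = names(v), freq = v, stringsAsFactors = FALSE)
d_toy
```

Passing row.names = NULL to data.frame() would drop the duplicated word labels if they are not wanted.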

Visualize the Text Events

#wordcloud2(d, size = 0.7)
wordcloud2(d, size = 1, color = "random-light", backgroundColor = "grey")

Display word frequencies

ggplot(head(d, 25), aes(reorder(word, freq),freq)) +
  geom_bar(stat = "identity", fill = "#7300AB") +  #03DAC6   #6200EE
  labs(title = "Road To Impeachment words frequency",
       x = "Words", y = "Frequency") +
  geom_text(aes(label=freq), vjust=0.4, hjust= 1.2, size=3, color="white")+
  coord_flip()

Show me the Tweets!!

ggplot(d, aes(label = word, size=2)) +
  geom_text_wordcloud_area(
    mask = png::readPNG("t.png"),
    rm_outside = TRUE, color="skyblue"
  ) +
  scale_size_area(max_size = 10) +
  theme_minimal()
## Some words could not fit on page. They have been removed.