Hillary Clinton served as U.S. Secretary of State from 2009 to 2013. During that time she used her own email server and her own BlackBerry for both personal and government email. She was investigated twice and cleared of criminal wrongdoing both times. The last FBI investigation was timed in a way that interfered with her potentially successful run for the U.S. presidency. One of the criticisms leveled at Secretary Clinton, whether or not actual criminal wrongdoing was found, concerned the mission of the Department of State itself: the Department is charged with safeguarding the safety and security of the United States and its citizens, both at home and abroad. By having sensitive federal communications sent to a personal server and BlackBerry, she was going against the advice of her digital policy and cyber team.

What Is a Word Cloud?

A word cloud is a graphical depiction of the most frequently occurring terms in a text. It is drawn after the text has been cleaned up by removing stop words and other words that do not contribute to the analysis. The more often a term occurs, the larger it is drawn.
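As a rough illustration of the idea, the input to a word cloud is just a table of words and their counts. The sketch below builds one in base R from an invented sentence and a tiny hand-made stop list; both are assumptions for illustration and are not taken from the email data.

```r
# Invented sentence, used only to illustrate the idea.
text <- "the budget meeting and the security meeting follow the press meeting"

# Split into lower-case words on anything that is not a letter.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[nzchar(words)]

# Tiny hand-made stop list (an assumption, not the tidytext list).
stop_list <- c("the", "and")
kept <- words[!words %in% stop_list]

# Frequency table, most frequent term first.
freq <- sort(table(kept), decreasing = TRUE)
freq_df <- data.frame(word = names(freq), n = as.integer(freq),
                      stringsAsFactors = FALSE)
freq_df
```

A word cloud would draw the top row of `freq_df` largest and the single-occurrence words smallest.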

This exercise will essentially be a data cleaning exercise with a visualization at the end.

Our Roadmap

Calling Required Libraries

library(tidyverse) # metapackage of all tidyverse packages
library(dplyr)
library(tidytext)
library(stringr)
library(tidyr)
library(scales)
library(DT)
library(knitr)
library(tm)
library(tibble)
library(magrittr)
library(purrr)
library(readr)
library(devtools)
library(wordcloud)
library(wordcloud2)

1) Bringing in Our Data

df <- read.csv("/cloud/project/HC.csv", header=TRUE, stringsAsFactors=FALSE)

head(df, 2)
##       docID                                subject              documentClass
## 1 C06245106                           (NO SUBJECT) Litigation_F-2016-07895_48
## 2 C06245079 Delivery Status Notification (Failure) Litigation_F-2016-07895_48
##                                                                          pdfLink
## 1 DOCUMENTS/Litigation_F-2016-07895_48/F-2016-07895/DOC_0C06245106/C06245106.pdf
## 2 DOCUMENTS/Litigation_F-2016-07895_48/F-2016-07895/DOC_0C06245079/C06245079.pdf
##   originalLink docDate postedDate                 from              to
## 1           NA         2018-10-04     Tom Buffenbarger Hillary Clinton
## 2           NA         2018-10-04 Mail Delivery System Hillary Clinton
##   messageNumber   caseNumber
## 1               F-2016-07895
## 2               F-2016-07895
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        docText
## 1                                                                                                                                                                                                                                         UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245106 Date: 09/26/2018From: Buffenbarger TomTo: Hillary Clinton (hdr22@clinfonemail.com) hdr22@clintonemail.comGood afternoon, Madam Secretary,RELEASE IN PART B6Today's New York Times confirms what we knew lay just over the horizon. "Rapid Declines in Manufactur ing Spread GlobalAnxiety" details what one analyst calls a "classic adverse feedback loop" Unless that loop is reversed, the contraction inmanufacturing could increase instability in many countries and it could undermine America's own recovery.Unfortunately, industrial output indicators lag behind events by 60 to 90 days. The magnitude of the manufacturing crisis isonly now becoming evident. Next month's G-20 meeting may come too soon to focus on this newest crisis, but I am confidentyou will find a way to start the consultations required for a coordinated and effective global recovery.Good luck in London and just know I and your union remain exceedingly proud of the work you do for all Americans.Tom BuffenbargerNotice: This message is intended for the addressee only and may contain privileged and/or confidentia I information. Use ordissemination by anyone other than the intended recipient is prohibited.UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245106 Date: 09126/2018\f
## 2 UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018Delivery Status Notification (Failure)From:To:Subject:Mail Delivery System MAILER-DAEMON@smip07.bis.na.bl ackberry.comSRSO.xqtKNI-I.AC.att.blackberry.ne 1.1r15@srs.bis.na.b lackberry.comDelivery Status Notification (Failure)The following message to was undeliverable.The reason for the problem:5.1.0 - Unknown address error 550-1#5.1.0 Address rejectedEmbedded MessageFinal-Recipient:Action: failedStatus: 5.0.0 (permanent failure)Remote-MTA: dns, [207.105.247.120]Diagnostic-Code: smtp, 5.1.0 - Unknown address error 550-'#5.1.0 Address rejected0) End of Embedded Message Embedded Message From: "H"To: "Jim Steinberg"Cc: "Thomas E. Donilon"Date: Sat, 4 Apr 2009 13:14:57Subject: IraqRELEASE IN PARTB5,B6B6(delivery attempts:UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018\f\nUNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018The President wants I told him you would lead it B5for State starting next week and that Chris Hill should be involved. He agreed and I talked to Donilon who said there had beensome work already but he'd get the broader policy review organized. B5The President was particularly focussed onI had already asked Jeff Feltmann to give me info about what NEA was doing in and about Iraq so there is material for us toreview.Let's discuss Monday. Thx.End of Embedded MessageB5135UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 0912012018\f

2) Data Cleaning

Dropping Some Variables

df <- df[c(8, 9, 12)]  # keep columns 8, 9 and 12: from, to, docText
df <- df[c(1, 3)]      # then drop "to", leaving from and docText
head(df,2)
##                   from
## 1     Tom Buffenbarger
## 2 Mail Delivery System
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        docText
## 1                                                                                                                                                                                                                                         UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245106 Date: 09/26/2018From: Buffenbarger TomTo: Hillary Clinton (hdr22@clinfonemail.com) hdr22@clintonemail.comGood afternoon, Madam Secretary,RELEASE IN PART B6Today's New York Times confirms what we knew lay just over the horizon. "Rapid Declines in Manufactur ing Spread GlobalAnxiety" details what one analyst calls a "classic adverse feedback loop" Unless that loop is reversed, the contraction inmanufacturing could increase instability in many countries and it could undermine America's own recovery.Unfortunately, industrial output indicators lag behind events by 60 to 90 days. The magnitude of the manufacturing crisis isonly now becoming evident. Next month's G-20 meeting may come too soon to focus on this newest crisis, but I am confidentyou will find a way to start the consultations required for a coordinated and effective global recovery.Good luck in London and just know I and your union remain exceedingly proud of the work you do for all Americans.Tom BuffenbargerNotice: This message is intended for the addressee only and may contain privileged and/or confidentia I information. Use ordissemination by anyone other than the intended recipient is prohibited.UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245106 Date: 09126/2018\f
## 2 UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018Delivery Status Notification (Failure)From:To:Subject:Mail Delivery System MAILER-DAEMON@smip07.bis.na.bl ackberry.comSRSO.xqtKNI-I.AC.att.blackberry.ne 1.1r15@srs.bis.na.b lackberry.comDelivery Status Notification (Failure)The following message to was undeliverable.The reason for the problem:5.1.0 - Unknown address error 550-1#5.1.0 Address rejectedEmbedded MessageFinal-Recipient:Action: failedStatus: 5.0.0 (permanent failure)Remote-MTA: dns, [207.105.247.120]Diagnostic-Code: smtp, 5.1.0 - Unknown address error 550-'#5.1.0 Address rejected0) End of Embedded Message Embedded Message From: "H"To: "Jim Steinberg"Cc: "Thomas E. Donilon"Date: Sat, 4 Apr 2009 13:14:57Subject: IraqRELEASE IN PARTB5,B6B6(delivery attempts:UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018\f\nUNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018The President wants I told him you would lead it B5for State starting next week and that Chris Hill should be involved. He agreed and I talked to Donilon who said there had beensome work already but he'd get the broader policy review organized. B5The President was particularly focussed onI had already asked Jeff Feltmann to give me info about what NEA was doing in and about Iraq so there is material for us toreview.Let's discuss Monday. Thx.End of Embedded MessageB5135UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 0912012018\f
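Selecting columns by position works, but it silently breaks if the column order of the CSV ever changes. A minimal sketch of the same selection by name, using an invented four-column data frame as a stand-in for the real one:

```r
# Invented stand-in for the email data; only the column names matter here.
toy <- data.frame(
  docID   = "C0",
  from    = "A",
  to      = "B",
  docText = "hello",
  stringsAsFactors = FALSE
)

by_position <- toy[c(2, 4)]              # depends on column order
by_name     <- toy[c("from", "docText")] # depends only on column names

identical(by_position, by_name)
```

On the real data, `names(df)` lists the columns with their positions, which is a quick way to confirm what an index such as 8 or 12 refers to.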

Renaming Some of the Columns

df <- rename(df, From = from, Email = docText)
head(df, 2)
##                   From
## 1     Tom Buffenbarger
## 2 Mail Delivery System
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Email
## 1                                                                                                                                                                                                                                         UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245106 Date: 09/26/2018From: Buffenbarger TomTo: Hillary Clinton (hdr22@clinfonemail.com) hdr22@clintonemail.comGood afternoon, Madam Secretary,RELEASE IN PART B6Today's New York Times confirms what we knew lay just over the horizon. "Rapid Declines in Manufactur ing Spread GlobalAnxiety" details what one analyst calls a "classic adverse feedback loop" Unless that loop is reversed, the contraction inmanufacturing could increase instability in many countries and it could undermine America's own recovery.Unfortunately, industrial output indicators lag behind events by 60 to 90 days. The magnitude of the manufacturing crisis isonly now becoming evident. Next month's G-20 meeting may come too soon to focus on this newest crisis, but I am confidentyou will find a way to start the consultations required for a coordinated and effective global recovery.Good luck in London and just know I and your union remain exceedingly proud of the work you do for all Americans.Tom BuffenbargerNotice: This message is intended for the addressee only and may contain privileged and/or confidentia I information. Use ordissemination by anyone other than the intended recipient is prohibited.UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245106 Date: 09126/2018\f
## 2 UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018Delivery Status Notification (Failure)From:To:Subject:Mail Delivery System MAILER-DAEMON@smip07.bis.na.bl ackberry.comSRSO.xqtKNI-I.AC.att.blackberry.ne 1.1r15@srs.bis.na.b lackberry.comDelivery Status Notification (Failure)The following message to was undeliverable.The reason for the problem:5.1.0 - Unknown address error 550-1#5.1.0 Address rejectedEmbedded MessageFinal-Recipient:Action: failedStatus: 5.0.0 (permanent failure)Remote-MTA: dns, [207.105.247.120]Diagnostic-Code: smtp, 5.1.0 - Unknown address error 550-'#5.1.0 Address rejected0) End of Embedded Message Embedded Message From: "H"To: "Jim Steinberg"Cc: "Thomas E. Donilon"Date: Sat, 4 Apr 2009 13:14:57Subject: IraqRELEASE IN PARTB5,B6B6(delivery attempts:UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018\f\nUNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 09120/2018The President wants I told him you would lead it B5for State starting next week and that Chris Hill should be involved. He agreed and I talked to Donilon who said there had beensome work already but he'd get the broader policy review organized. B5The President was particularly focussed onI had already asked Jeff Feltmann to give me info about what NEA was doing in and about Iraq so there is material for us toreview.Let's discuss Monday. Thx.End of Embedded MessageB5135UNCLASSIFIED U.S. Department of State Case No. F-2016-07895 Doc No. C06245079 Date: 0912012018\f

One token per row format

tidy_HC <- df %>%
  unnest_tokens(word, Email)
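To see what the one-token-per-row format looks like, here is a small sketch on an invented one-row data frame whose column names mirror the ones used above. By default `unnest_tokens()` splits on words, lower-cases them, and drops punctuation.

```r
library(dplyr)
library(tidytext)

# Invented one-row example; not taken from the email data.
toy <- data.frame(
  From  = "Tom",
  Email = "Good afternoon, Madam Secretary",
  stringsAsFactors = FALSE
)

toy_tokens <- toy %>%
  unnest_tokens(word, Email)  # one row per word, other columns repeated

toy_tokens
```

Because every token comes out in lower case, any later stop-word list has to be lower case as well in order to match.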

Removing stopwords

data(stop_words)

tidy_HC2 <- tidy_HC %>%
  anti_join(stop_words, by = "word")
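The anti-join keeps only the rows whose `word` has no match in the stop-word table. A minimal sketch on a few invented tokens:

```r
library(dplyr)
library(tidytext)

data(stop_words)  # tidytext's stop-word table, with columns word and lexicon

# Invented tokens, used only for illustration.
toy_tokens <- data.frame(
  word = c("the", "manufacturing", "of", "crisis"),
  stringsAsFactors = FALSE
)

toy_kept <- toy_tokens %>%
  anti_join(stop_words, by = "word")  # drop rows that match a stop word

toy_kept
```

Naming the join column with `by = "word"` makes the match explicit and avoids the message dplyr prints when it has to guess the column.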

Removing dataset-specific stopwords that do not contribute to our analysis

uni_sw <- data.frame(word = c("U.S. Department of State ","27","state.gov","fyi","45","2018","from","october", "u.s", "thursday", "tuesday", "day","april", "september","clintonemail.com","clintonemail","04", "16", "23", "march", "world", "hrod17","government","monday", "friday","saturday","people","2009","al", "21","25", "email","staff","country", "http","told","14","2010", "2011", "2012","2014", "2015","2016","30", "b6", "2", "b5", "4", "6", "20", "00", "26", "19", "05","january","UNCLASSIFIED", "02", "01", "07895", "time", "call", "07", "3", "29", "1", "09", "08", "02", "12", "31", "message", "10", "11", "gov", "hillary", "june", "july", "28", "24","august", "03", "17", "18", "0", "22", "wednesday", "sunday", "13", "5", "7", "15", "1.4", "8", "06", "office", "mailto", "01", "call", "07", "3", "29", "1", "fw", "09", "cc", "08", "12", "original", "message", "10", "11", "31", "gov", "pm", "subject", "20439", "doc", "date","department", "9","unclassified","Case No"))

tidy_HC3 <- tidy_HC2 %>%
  anti_join(uni_sw, by = "word")
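One thing to keep in mind with a hand-built list: the tokens produced earlier are single lower-case words, so entries that contain spaces or capital letters (for example "U.S. Department of State " or "UNCLASSIFIED") can never match and have no effect. The base-R sketch below flags such entries and removes duplicates; the short vector is a hypothetical excerpt, not the full list.

```r
# Hypothetical excerpt of a custom stop-word vector.
custom_words <- c("U.S. Department of State ", "UNCLASSIFIED", "Case No",
                  "unclassified", "fyi", "fyi")

# Entries with upper-case letters or whitespace cannot equal a
# lower-case, single-word token.
never_match <- grepl("[[:upper:]]|[[:space:]]", custom_words)
custom_words[never_match]

# Keep the usable entries, once each.
cleaned <- unique(custom_words[!never_match])
cleaned
```

Dropping the flagged entries does not change the result of the anti-join; it only makes the list easier to read and maintain.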

3) Identifying the Most Frequently Occurring Terms

Looking at the most frequently occurring terms

HC_count <- tidy_HC3 %>%
  count(word, sort = TRUE)
head(HC_count, 20)
##         word      n
## 1        cid 190355
## 2    release  34002
## 3     cheryl  29366
## 4       huma  25779
## 5      mills  25448
## 6   sullivan  25132
## 7      jacob  24223
## 8     abedin  20719
## 9  president  17879
## 10 secretary  17072
## 11     hdr22  13613
## 12  security  13458
## 13   clinton  13080
## 14   meeting  13038
## 15   foreign  12283
## 16     house  11641
## 17    united  11220
## 18      news  11070
## 19   millscd  10601
## 20     obama  10547

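Note that "cid" is most likely not a meaningful word here and probably does not stand for "Criminal Investigations Department". Strings of the form "(cid:NN)" are a common artifact of PDF text extraction, and "cid:" also prefixes inline-attachment references in email, so this token is better treated as noise and added to the custom stopword list.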
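A surprising top term such as "cid" is worth checking in context before it is interpreted. The sketch below pulls a few characters on either side of each occurrence of a term. The two strings are invented for illustration and are not taken from the dataset, and `show_context` is a hypothetical helper name.

```r
# Hypothetical helper: return each occurrence of `term` with up to
# `width` characters of surrounding text.
show_context <- function(text, term, width = 10) {
  pattern <- paste0(".{0,", width, "}", term, ".{0,", width, "}")
  unlist(regmatches(text, gregexpr(pattern, text, ignore.case = TRUE)))
}

# Invented strings, one occurrence each.
toy_emails <- c("image (cid:12) here", "refer to CID unit")

snippets <- show_context(toy_emails, "cid")
snippets
```

On the real data the same call could be run on a sample of the `Email` column, for example `show_context(head(df$Email, 100), "cid")`.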

4) Visualize Our Word Cloud

wordcloud2(HC_count, size = 2.3, minRotation = -pi/6, maxRotation = -pi/6, rotateRatio = 1)
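With a very large vocabulary it can help to plot only the most frequent terms. A minimal sketch, assuming a count table with columns `word` and `n` like `HC_count`; the toy counts and the cutoff of 3 are invented for illustration.

```r
# Invented counts standing in for HC_count.
toy_count <- data.frame(
  word = c("security", "meeting", "foreign", "house", "news"),
  n    = c(50L, 40L, 30L, 20L, 10L),
  stringsAsFactors = FALSE
)

top_n_words <- 3  # hypothetical cutoff; a few hundred is more typical

# Sort by count, highest first, and keep the top rows.
toy_top <- head(toy_count[order(-toy_count$n), ], top_n_words)

# Build the widget only if wordcloud2 is installed; print `cloud` to render it.
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(toy_top, size = 1)
}
```

The same `head()` step applied to `HC_count` would keep the call above unchanged apart from the data argument.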