2 Analysis of political speech

# Loading packages
library(RColorBrewer) # color palettes
library(tm)           # text mining (tm)
library(wordcloud)    # word clouds
library(DT)           # interactive tables (DataTables)

First, I load the speech from a text file.

# "UTF-8-BOM" drops the byte-order mark that readLines() would otherwise keep (printed as <U+FEFF>)
text <- readLines(file("C:/text.txt", encoding = "UTF-8-BOM"))
text[1:10]

##  [1] "Chief Justice Roberts, President Carter, President Clinton, President Bush, fellow Americans and people of the world  thank you."
##  [2] ""
##  [3] "We the citizens of America have now joined a great national effort to rebuild our county and restore its promise for all our people. "
##  [4] ""
##  [5] "Together we will determine the course of America for many, many years to come."
##  [6] ""
##  [7] "Together we will face challenges. We will confront hardships. But we will get the job done."
##  [8] ""
##  [9] "Every four years we gather on these steps to carry out the orderly and peaceful transfer of power."
## [10] ""
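A text file saved as UTF-8 by some Windows editors starts with a byte-order mark (BOM, U+FEFF). A plain `readLines()` call keeps it, so it is printed as `<U+FEFF>` and stays glued to the first word of the speech. The sketch below uses a hypothetical temporary file instead of the real `C:/text.txt` and shows that opening the file through a connection with `encoding = "UTF-8-BOM"` removes the mark:

```r
# Hypothetical demo file: a UTF-8 BOM followed by one line of text
tmp <- tempfile(fileext = ".txt")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))   # UTF-8 bytes of U+FEFF
writeBin(c(bom, charToRaw("Chief Justice Roberts\n")), tmp)

# Plain readLines(): the BOM bytes are still at the start of line 1
raw_line <- readLines(tmp, encoding = "UTF-8")[1]
charToRaw(raw_line)[1:3]   # ef bb bf

# Connection opened with "UTF-8-BOM": the mark is stripped while reading
clean_line <- readLines(file(tmp, encoding = "UTF-8-BOM"))[1]
clean_line                 # "Chief Justice Roberts"
```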

The next step is to clean the data.

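# Removing the stopwords (tm's list is lowercase, so capitalized words such as "We" survive this step)
text <- removeWords(text, stopwords("en"))
# Removing the punctuation
text <- removePunctuation(text)
# Removing the empty lines (a logical filter is safe even when there are none)
text <- text[text != ""]
# Making all text lowercase (tolower() is vectorized, so no loop is needed)
text <- tolower(text)
# Choosing extra words to remove by hand (mostly stopwords that escaped the first pass because they were capitalized)
text <- removeWords(text, c("the", "there", "this", "'ve", "it's", "their", "and"))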
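The order of the cleaning steps matters: `stopwords("en")` is all lowercase and `removeWords()` is case-sensitive, so a stopword that starts a sentence ("We", "The") is only caught if the text is lowercased first. The sketch below compares the order used above with a lowercase-first order on a hypothetical sample sentence; the alternative order is an assumption for illustration, not the pipeline used in this analysis:

```r
library(tm)

# Split a cleaned string into tokens, ignoring repeated spaces
tokens <- function(x) strsplit(trimws(stripWhitespace(x)), " ")[[1]]

# Hypothetical sample sentence (not read from the speech file)
line <- "Together We will face the challenges."

# Order used above: stopwords first, lowercase afterwards
tokens_a <- tokens(tolower(removePunctuation(removeWords(line, stopwords("en")))))

# Alternative order (assumption): lowercase first, then stopwords
tokens_b <- tokens(removePunctuation(removeWords(tolower(line), stopwords("en"))))

"we" %in% tokens_a   # TRUE: "We" was capitalized when the stopwords were removed
"we" %in% tokens_b   # FALSE
```

Lowercasing first would make most of the hand-picked extra words unnecessary, at the cost of different term counts than the ones reported below.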

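A corpus is a collection of text documents. With `VectorSource()`, each element of `text` (one paragraph of the speech) becomes one document.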

doc<-Corpus(VectorSource(text))
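A quick way to see that each vector element becomes its own document is to build a corpus from a hypothetical three-element vector:

```r
library(tm)

# Hypothetical three-paragraph input
paras <- c("rebuild our country", "restore its promise", "transfer of power")
toy_corpus <- Corpus(VectorSource(paras))

length(toy_corpus)          # 3: one document per element
content(toy_corpus[[2]])    # "restore its promise"
```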

The term-document matrix is a table with one row per term and one column per document (here, per paragraph); each cell holds the number of times that term occurs in that paragraph.
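On a hypothetical two-paragraph input the matrix is small enough to read directly. The sketch also shows two details of `TermDocumentMatrix()` with its default settings: terms are lowercased, and words shorter than three characters (such as "us") are dropped. Summing each row, as the `rowSums()` step later in this section does, gives a term's total frequency:

```r
library(tm)

# Hypothetical two-paragraph input
docs <- c("jobs jobs america", "america first us")
toy_tdm <- TermDocumentMatrix(Corpus(VectorSource(docs)))
toy_m <- as.matrix(toy_tdm)      # rows = terms, columns = documents

toy_m["jobs", ]                  # 2 in the first paragraph, 0 in the second
"us" %in% rownames(toy_m)        # FALSE: below the default minimum word length of 3

# Total frequency of each term across all paragraphs
toy_v <- sort(rowSums(toy_m), decreasing = TRUE)
```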

#Building a matrix of words
(tdm<-TermDocumentMatrix(doc))

## <<TermDocumentMatrix (terms: 425, documents: 61)>>
## Non-/sparse entries: 609/25316
## Sparsity           : 98%
## Maximal term length: 14
## Weighting          : term frequency (tf)

dim(tdm)

## [1] 425  61

After cleaning, the speech contains 425 distinct terms spread over 61 paragraphs (documents).
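The "98%" sparsity in the summary follows directly from these dimensions: the matrix has 425 × 61 cells, and only 609 of them are non-zero. The arithmetic, using the numbers printed above:

```r
# Numbers copied from the TermDocumentMatrix summary above
n_terms    <- 425
n_docs     <- 61
non_sparse <- 609

cells    <- n_terms * n_docs          # 25925 cells; 25925 - 609 = 25316 zero cells
sparsity <- 1 - non_sparse / cells    # share of cells that are zero
round(100 * sparsity)                 # 98, matching "Sparsity: 98%"
```

Most words appear in only one or two paragraphs, which is why nearly every cell is zero.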

m <- as.matrix(tdm)                        # dense matrix: terms x paragraphs
v <- sort(rowSums(m), decreasing = TRUE)   # total count of each term, most frequent first

To display every word of the speech together with its frequency, I use the DT package, which renders the data frame as an interactive, searchable table.
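When an interactive table is not available (for example in a PDF report), tm's `findFreqTerms()` lists the terms whose total frequency is at least a threshold. A small sketch on a hypothetical toy input; it also builds the word/frequency data frame with `row.names = NULL`, so the words are stored only in the `word` column instead of also becoming row names:

```r
library(tm)

# Hypothetical two-paragraph input
docs <- c("jobs jobs america", "america first")
toy_tdm <- TermDocumentMatrix(Corpus(VectorSource(docs)))

# Terms whose total frequency (row sum) is at least 2
frequent <- findFreqTerms(toy_tdm, lowfreq = 2)
frequent   # "america" "jobs"

# Word/frequency table with plain integer row names
toy_v <- sort(rowSums(as.matrix(toy_tdm)), decreasing = TRUE)
toy_d <- data.frame(word = names(toy_v), freq = toy_v, row.names = NULL)
```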

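d <- data.frame(word = names(v), freq = v)
# rownames = FALSE: the row names of d repeat the words, so they are hidden
datatable(d, rownames = FALSE, class = 'compact', options = list(
  # black header with white text
  initComplete = JS(
    "function(settings, json) {",
    "$(this.api().table().header()).css({'background-color': '#000', 'color': '#fff'});",
    "}")
))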

Finally, I create the word cloud.

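set.seed(1234)  # word placement is random; fixing the seed makes the layout reproducible
wordcloud(words = d$word, freq = d$freq, min.freq = 1, random.order = FALSE, max.words = 200,
          rot.per = 0.35, colors = brewer.pal(12, "Paired"))  # "Paired" has at most 12 colors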
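`brewer.pal()` cannot return more colors than a palette defines: for "Paired" the maximum is 12, so a request for 20 colors returns 12 with a warning. The sketch below reads that limit from `brewer.pal.info`, explains the `wordcloud()` arguments in comments, and writes the cloud to a temporary PDF so it also runs without a screen. The five words and their frequencies are hypothetical stand-ins for `d`, and the seed value is arbitrary:

```r
library(RColorBrewer)
library(wordcloud)

# "Paired" defines at most 12 colors
max_paired <- brewer.pal.info["Paired", "maxcolors"]
pal <- brewer.pal(max_paired, "Paired")

# Hypothetical toy frequencies standing in for the data frame d
toy_words <- data.frame(word = c("america", "jobs", "dreams", "borders", "people"),
                        freq = c(20, 12, 8, 6, 4))

out <- tempfile(fileext = ".pdf")
pdf(out)          # file device, no screen needed
set.seed(1234)    # placement and rotation are random; a fixed seed reproduces one layout
wordcloud(words = toy_words$word, freq = toy_words$freq,
          min.freq = 1,          # drop words seen fewer times than this
          max.words = 200,       # plot at most this many words
          random.order = FALSE,  # plot in decreasing frequency, most frequent first (near the center)
          rot.per = 0.35,        # share of words rotated by 90 degrees
          colors = pal)
invisible(dev.off())
```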

Note: Feel free to ask me about anything that seems unclear!

Analyzing Donald Trump's inauguration speech

Mouna BELAID, Engineering Student

January 22nd, 2017
