We have several speech transcripts from U.S. presidential candidates: Barack Obama, Donald Trump, and Joe Biden. At the end, we will see how the sentiment of their speeches reflects the situation in the USA.
We will explore the details through word clouds and other kinds of visualizations.
These libraries are used throughout the whole project.
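The setup chunk itself is not reproduced here; the following is a minimal sketch of the packages the code in this report relies on (the original chunk may load more):

library(tm)           # Corpus, tm_map, DocumentTermMatrix, TermDocumentMatrix
library(SnowballC)    # stemDocument
library(topicmodels)  # LDA
library(tidytext)     # tidy(), unnest_tokens(), get_sentiments(), reorder_within()
library(dplyr)        # pipes and data-manipulation verbs
library(tidyr)        # reshaping helpers such as pivot_wider()
library(ggplot2)      # bar charts and facets
library(wordcloud)    # wordcloud()
library(RColorBrewer) # brewer.pal() palettes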
In this part, we simply look at how each future president (at that time) spoke both on election night, before being announced as the winner, and in the victory speech. The details that follow are the length and the opening lines of each speech.
This part loads the original speech data that we want to examine further.
# Custom transformer: replace every match of `pattern` with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
BOE <- readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/Transcript of President Barack Obama’s Election Night Speech.txt", encoding ='UTF-8')
## Warning in readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/
## Transcript of President Barack Obama’s Election Night Speech.txt", :
## incomplete final line found on '/Users/jeank4723/Desktop/2021DSFS/TMS/
## TMS project_1/Transcript of President Barack Obama’s Election Night Speech.txt'
BOEs <- Corpus(VectorSource(BOE))
Here is Barack Obama’s victory speech, delivered after he was announced as the winner.
BOV <- readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/Transcript Of President Barack Obama's Victory Speech.txt", encoding ='UTF-8')
## Warning in readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/
## Transcript Of President Barack Obama's Victory Speech.txt", :
## incomplete final line found on '/Users/jeank4723/Desktop/2021DSFS/TMS/
## TMS project_1/Transcript Of President Barack Obama's Victory Speech.txt'
BOVs <- Corpus(VectorSource(BOV))
DTE <- readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/Transcript Of President Donald Trump's Election Night Speech.txt", encoding ='UTF-8')
## Warning in readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/
## Transcript Of President Donald Trump's Election Night Speech.txt", :
## incomplete final line found on '/Users/jeank4723/Desktop/2021DSFS/TMS/
## TMS project_1/Transcript Of President Donald Trump's Election Night Speech.txt'
DTEs <- Corpus(VectorSource(DTE))
DTV <- readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/Transcript Of President Donald Trump's Victory Speech.txt", encoding ='UTF-8')
DTVs <- Corpus(VectorSource(DTV))
JBE <- readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/Transcript Of President Joe Biden’s Election Night Speech.txt", encoding ='UTF-8')
## Warning in readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/
## Transcript Of President Joe Biden’s Election Night Speech.txt", :
## incomplete final line found on '/Users/jeank4723/Desktop/2021DSFS/TMS/
## TMS project_1/Transcript Of President Joe Biden’s Election Night Speech.txt'
JBEs <- Corpus(VectorSource(JBE))
JBV <- readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/Transcript Of President Joe Biden’s Victory Speech.txt", encoding ='UTF-8')
## Warning in readLines("/Users/jeank4723/Desktop/2021DSFS/TMS/TMS project_1/
## Transcript Of President Joe Biden’s Victory Speech.txt", :
## incomplete final line found on '/Users/jeank4723/Desktop/2021DSFS/TMS/
## TMS project_1/Transcript Of President Joe Biden’s Victory Speech.txt'
JBVs <- Corpus(VectorSource(JBV))
# inspect(BOEs)
head(BOE, n = 10)
## [1] "The following is the full text of President Obama’s victory speech on Wednesday (Transcript courtesy of the Federal News Service)."
## [2] ""
## [3] "PRESIDENT BARACK OBAMA: Thank you. Thank you. Thank you so much. (Sustained cheers, applause.)"
## [4] ""
## [5] "Tonight, more than 200 years after a former colony won the right to determine its own destiny, the task of perfecting our union moves forward. (Cheers, applause.)"
## [6] ""
## [7] "It moves forward because of you. It moves forward because you reaffirmed the spirit that has triumphed over war and depression, the spirit that has lifted this country from the depths of despair to the great heights of hope, the belief that while each of us will pursue our own individual dreams, we are an American family, and we rise or fall together as one nation and as one people. (Cheers, applause.)"
## [8] ""
## [9] "Tonight, in this election, you, the American people, reminded us that while our road has been hard, while our journey has been long, we have picked ourselves up, we have fought our way back, and we know in our hearts that for the United States of America, the best is yet to come."
## [10] ""
head(BOEs)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 6
length(BOEs)
## [1] 75
Next, we preview Barack Obama’s victory speech; its corpus length is 65.
# inspect(BOVs)
head(BOV, n = 10)
## [1] "In these prepared remarks provided by his campaign, President-Elect Barack Obama calls himself the unlikeliest presidential candidate. He thanks many members of his campaign, along with his enormous army of volunteers, and he warns supporters about what he calls the enormity of the tasks at hand that now face the U.S. He concludes by telling an anecdote about a 106-year-old African-American voter from Atlanta."
## [2] ""
## [3] "If there is anyone out there who still doubts that America is a place where all things are possible; who still wonders if the dream of our founders is alive in our time; who still questions the power of our democracy, tonight is your answer."
## [4] ""
## [5] "It's the answer told by lines that stretched around schools and churches in numbers this nation has never seen; by people who waited three hours and four hours, many for the very first time in their lives, because they believed that this time must be different; that their voice could be that difference."
## [6] ""
## [7] "It's the answer spoken by young and old, rich and poor, Democrat and Republican, black, white, Latino, Asian, Native American, gay, straight, disabled and not disabled — Americans who sent a message to the world that we have never been a collection of red states and blue states; we are, and always will be, the United States of America."
## [8] ""
## [9] "It's the answer that led those who have been told for so long by so many to be cynical, and fearful, and doubtful of what we can achieve to put their hands on the arc of history and bend it once more toward the hope of a better day."
## [10] ""
head(BOVs)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 6
length(BOVs)
## [1] 65
Next is Donald Trump’s election night speech; its corpus length is 375.
# inspect(DTEs)
head(DTE, n = 10)
## [1] "Today I‘d like to share my thoughts about the stakes in this election."
## [2] ""
## [3] "People have asked me why I am running for President."
## [4] ""
## [5] "I have built an amazing business that I love and I get to work side-by-side with my children every day."
## [6] ""
## [7] "We come to work together and turn visions into reality."
## [8] ""
## [9] "We think big, and then we make it happen."
## [10] ""
head(DTEs)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 6
length(DTEs)
## [1] 375
# inspect(DTVs)
head(DTV, n = 10)
## [1] "\"Thank you. Thank you very much, everyone. Sorry to keep you waiting; complicated business; complicated. Thank you very much.\""
## [2] ""
## [3] "I've just received a call from Secretary Clinton. She congratulated us, it's about us, on our victory, and I congratulated her and her family on a very, very hard-fought campaign. I mean, she fought very hard. Hillary has worked very long and very hard over a long period of time, and we owe her a major debt of gratitude for her service to our country. I mean that very sincerely."
## [4] ""
## [5] "\"Now it's time for America to bind the wounds of division, have to get together. To all Republicans and Democrats and independents across this nation, I say it is time for us to come together as one united people. It's time. I pledge to every citizen of our land that I will be president for all Americans, and this is so important to me."
## [6] ""
## [7] "\"For those who have chosen not to support me in the past, of which there were a few people, I'm reaching out to you for your guidance and your help so that we can work together and unify our great country."
## [8] ""
## [9] "\"As I've said from the beginning, ours was not a campaign, but rather an incredible and great movement made up of millions of hard-working men and women, who love their country and want a better, brighter future for themselves and for their families. It's a movement comprised of Americans from all races, religions, backgrounds and beliefs who want and expect our government to serve the people - and serve the people it will."
## [10] ""
head(DTVs)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 6
length(DTVs)
## [1] 33
# inspect(JBEs)
head(JBE, n = 10)
## [1] "Joe Biden: (00:00)"
## [2] "Your patience is commendable. We knew this was going to go long, but who knew we’re going to go into maybe tomorrow morning, maybe even longer. But look, we feel good about where we’re. We really do. I am here to tell you tonight, we believe we’re on track to win this election. We knew because of the unprecedented early vote and the mail-in vote it was going to take a while. We’re going to have to be patient until the hard work of tallying the votes is finished. And it ain’t over until every vote is counted, every ballot is counted."
## [3] ""
## [4] "Joe Biden: (00:51)"
## [5] "But we’re feeling good. We’re feeling good about where we’re. We believe one of the nets has suggested we’ve already won Arizona, but we’re confident about Arizona. That is a turnaround. We also just called it for Minnesota. And we’re still in the game in Georgia, although that is not one we expected. And we’re feeling real good about Wisconsin and Michigan. And by the way, it is going to take time to count the votes, we’re going to win Pennsylvania."
## [6] ""
## [7] "Joe Biden: (01:31)"
## [8] "I’ve been talking to the folks in Philly, Allegheny County, Scranton, and they’re really encouraged by the turnout and what they see. Look, we can know the results as early as tomorrow morning. But it may take a little longer. As I’ve said all along, it is not my place or Donald Trump is place to declare who is won this election. That is the decision of the American people. But I am optimistic about this outcome. And I want to thank everyone of you who came out and voted in this election. And by the way, Chris Coons and the Democrats, congratulations here in Delaware."
## [9] ""
## [10] "Jill Biden: (02:06)"
head(JBEs)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 6
length(JBEs)
## [1] 20
# inspect(JBVs)
head(JBV, n = 10)
## [1] "Transcript of President-elect Joe Biden’s victory speech Saturday night in Wilmington, Del., as delivered. Provided by the Biden campaign:"
## [2] ""
## [3] "___"
## [4] ""
## [5] "My fellow Americans, the people of this nation have spoken."
## [6] ""
## [7] "They have delivered us a clear victory. A convincing victory."
## [8] ""
## [9] "A victory for “We the People.”"
## [10] ""
head(JBVs)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 6
length(JBVs)
## [1] 219
Here we do some data cleaning, removing stop words, and we also make the format uniform across all speeches. The original files differ in format from one to another, so tidying them up makes the rest of the work easier.
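The identical cleaning pipeline is applied to all six corpora below, so it could be collapsed into one helper. A minimal sketch, using the hypothetical name clean_corpus (not part of the original code):

# Hypothetical helper wrapping the repeated cleaning steps below
# (the Trump election-night corpus is additionally stemmed with stemDocument)
clean_corpus <- function(corpus) {
  corpus <- corpus %>%
    tm_map(removeNumbers) %>%
    tm_map(removePunctuation) %>%
    tm_map(stripWhitespace) %>%
    tm_map(content_transformer(tolower)) %>%
    tm_map(removeWords, stopwords("english"))
  # curly-apostrophe contractions survive removePunctuation, so strip them here
  for (p in c("—", "’s", "’re", "’m", "’ve", "’d", "’ll")) {
    corpus <- tm_map(corpus, toSpace, p)
  }
  corpus
}
# e.g. BOEs <- clean_corpus(BOEs)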
BOEs <- BOEs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
BOEs <- tm_map(BOEs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(BOEs, content_transformer(tolower)):
## transformation drops documents
BOEs <- tm_map(BOEs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(BOEs, removeWords, stopwords("english")):
## transformation drops documents
BOEs <- tm_map(BOEs, toSpace, "—")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "—"): transformation drops
## documents
BOEs <- tm_map(BOEs, toSpace, "’s")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "’s"): transformation drops
## documents
BOEs <- tm_map(BOEs, toSpace, "’re")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "’re"): transformation drops
## documents
BOEs <- tm_map(BOEs, toSpace, "’m")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "’m"): transformation drops
## documents
BOEs <- tm_map(BOEs, toSpace, "’ve")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "’ve"): transformation drops
## documents
BOEs <- tm_map(BOEs, toSpace, "’d")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "’d"): transformation drops
## documents
BOEs <- tm_map(BOEs, toSpace, "’ll")
## Warning in tm_map.SimpleCorpus(BOEs, toSpace, "’ll"): transformation drops
## documents
BOVs <- BOVs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
BOVs <- tm_map(BOVs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(BOVs, content_transformer(tolower)):
## transformation drops documents
BOVs <- tm_map(BOVs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(BOVs, removeWords, stopwords("english")):
## transformation drops documents
BOVs <- tm_map(BOVs, toSpace, "—")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "—"): transformation drops
## documents
BOVs <- tm_map(BOVs, toSpace, "’s")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "’s"): transformation drops
## documents
BOVs <- tm_map(BOVs, toSpace, "’re")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "’re"): transformation drops
## documents
BOVs <- tm_map(BOVs, toSpace, "’m")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "’m"): transformation drops
## documents
BOVs <- tm_map(BOVs, toSpace, "’ve")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "’ve"): transformation drops
## documents
BOVs <- tm_map(BOVs, toSpace, "’d")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "’d"): transformation drops
## documents
BOVs <- tm_map(BOVs, toSpace, "’ll")
## Warning in tm_map.SimpleCorpus(BOVs, toSpace, "’ll"): transformation drops
## documents
DTEs <- DTEs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace) %>%
tm_map(stemDocument)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stemDocument): transformation drops documents
DTEs <- tm_map(DTEs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(DTEs, content_transformer(tolower)):
## transformation drops documents
DTEs <- tm_map(DTEs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(DTEs, removeWords, stopwords("english")):
## transformation drops documents
DTEs <- tm_map(DTEs, toSpace, "—")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "—"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "–")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "–"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "’s")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "’s"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "’re")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "’re"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "’m")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "’m"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "’ve")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "’ve"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "’d")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "’d"): transformation drops
## documents
DTEs <- tm_map(DTEs, toSpace, "’ll")
## Warning in tm_map.SimpleCorpus(DTEs, toSpace, "’ll"): transformation drops
## documents
DTVs <- DTVs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
DTVs <- tm_map(DTVs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(DTVs, content_transformer(tolower)):
## transformation drops documents
DTVs <- tm_map(DTVs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(DTVs, removeWords, stopwords("english")):
## transformation drops documents
DTVs <- tm_map(DTVs, toSpace, "—")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "—"): transformation drops
## documents
DTVs <- tm_map(DTVs, toSpace, "’s")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "’s"): transformation drops
## documents
DTVs <- tm_map(DTVs, toSpace, "’re")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "’re"): transformation drops
## documents
DTVs <- tm_map(DTVs, toSpace, "’m")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "’m"): transformation drops
## documents
DTVs <- tm_map(DTVs, toSpace, "’ve")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "’ve"): transformation drops
## documents
DTVs <- tm_map(DTVs, toSpace, "’d")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "’d"): transformation drops
## documents
DTVs <- tm_map(DTVs, toSpace, "’ll")
## Warning in tm_map.SimpleCorpus(DTVs, toSpace, "’ll"): transformation drops
## documents
JBEs <- JBEs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
JBEs <- tm_map(JBEs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(JBEs, content_transformer(tolower)):
## transformation drops documents
JBEs <- tm_map(JBEs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(JBEs, removeWords, stopwords("english")):
## transformation drops documents
JBEs <- tm_map(JBEs, toSpace, "—")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "—"): transformation drops
## documents
JBEs <- tm_map(JBEs, toSpace, "’s")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "’s"): transformation drops
## documents
JBEs <- tm_map(JBEs, toSpace, "’re")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "’re"): transformation drops
## documents
JBEs <- tm_map(JBEs, toSpace, "’m")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "’m"): transformation drops
## documents
JBEs <- tm_map(JBEs, toSpace, "’ve")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "’ve"): transformation drops
## documents
JBEs <- tm_map(JBEs, toSpace, "’d")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "’d"): transformation drops
## documents
JBEs <- tm_map(JBEs, toSpace, "’ll")
## Warning in tm_map.SimpleCorpus(JBEs, toSpace, "’ll"): transformation drops
## documents
JBVs <- JBVs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
JBVs <- tm_map(JBVs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(JBVs, content_transformer(tolower)):
## transformation drops documents
JBVs <- tm_map(JBVs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(JBVs, removeWords, stopwords("english")):
## transformation drops documents
JBVs <- tm_map(JBVs, toSpace, "—")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "—"): transformation drops
## documents
JBVs <- tm_map(JBVs, toSpace, "’s")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "’s"): transformation drops
## documents
JBVs <- tm_map(JBVs, toSpace, "’re")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "’re"): transformation drops
## documents
JBVs <- tm_map(JBVs, toSpace, "’m")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "’m"): transformation drops
## documents
JBVs <- tm_map(JBVs, toSpace, "’ve")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "’ve"): transformation drops
## documents
JBVs <- tm_map(JBVs, toSpace, "’d")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "’d"): transformation drops
## documents
JBVs <- tm_map(JBVs, toSpace, "’ll")
## Warning in tm_map.SimpleCorpus(JBVs, toSpace, "’ll"): transformation drops
## documents
Here we apply the topic-modeling approach taught in class.
As one reference puts it: "Latent Dirichlet allocation (LDA) is an approach used in topic modeling based on probabilistic vectors of words, which indicate their relevance to the text corpus. … The approach we propose is based on identifying topical clusters in text based on co-occurrence of words." (See the reference list at the end.)
LDA is an iterative algorithm that identifies a set of topics that occur in a set of documents. LDA needs to be told how many topics to search for (the parameter k).
In the end, after many iterations, we get a list of words for each topic with probabilities. We can then select the top words that tend to occur together in the same context (the plots below show the top 10 per topic).
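In standard LDA notation (an added aside), each topic k is a probability distribution over the vocabulary V, and the beta values tidied out of the fitted models below are exactly these per-topic word probabilities:

$$\beta_{k,w} = P(w \mid \text{topic } k), \qquad \sum_{w \in V} \beta_{k,w} = 1$$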
LDA is the most popular and well-tested method for the kind of topic modelling we want to do here.
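The same fitting steps are repeated for each of the six corpora below, so they could be wrapped in a single helper. A minimal sketch, using the hypothetical name fit_lda (not part of the original code):

# Hypothetical helper wrapping the repeated LDA steps below
fit_lda <- function(corpus, k = 3, seed = 1234) {
  dtm <- DocumentTermMatrix(corpus,
                            control = list(minDocFreq = 2, minWordLength = 2))
  dtm <- dtm[apply(dtm, 1, sum) > 0, ]  # drop documents with no remaining terms
  LDA(dtm, k = k, method = "Gibbs", control = list(seed = seed))
}
# e.g. BOEstopicModel <- fit_lda(BOEs)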
BOEslda.dtm <- DocumentTermMatrix(BOEs, control=list(minDocFreq=2, minWordLength=2))
BOEslda.rowTotals <- apply(BOEslda.dtm , 1, sum)
BOEsldadtm.new <- BOEslda.dtm[BOEslda.rowTotals> 0, ]
BOEstopicModel <- LDA(
BOEsldadtm.new,
k = 3,
method="Gibbs",
control = list(seed=1234))
BOEsap_topics <- BOEstopicModel %>%
tidy(matrix = "beta")
BOEstopicModel %>%
tidy(matrix = "beta")
## # A tibble: 1,881 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 courtesy 0.00259
## 2 2 courtesy 0.000222
## 3 3 courtesy 0.000225
## 4 1 federal 0.000235
## 5 2 federal 0.00244
## 6 3 federal 0.000225
## 7 1 following 0.000235
## 8 2 following 0.00244
## 9 3 following 0.000225
## 10 1 full 0.000235
## # … with 1,871 more rows
BOEsap_top_terms <- BOEsap_topics %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(desc(beta))
BOEsap_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
labs(title = "Topic Modeling (LDA) - Barack Obama's Victory Speech") +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
BOVslda.dtm <- DocumentTermMatrix(BOVs, control=list(minDocFreq=2, minWordLength=2))
BOVslda.rowTotals <- apply(BOVslda.dtm , 1, sum)
BOVsldadtm.new <- BOVslda.dtm[BOVslda.rowTotals> 0, ]
BOVstopicModel <- LDA(
BOVsldadtm.new,
k = 3,
method="Gibbs",
control = list(seed=1234))
BOVsap_topics <- BOVstopicModel %>%
tidy(matrix = "beta")
BOVstopicModel %>%
tidy(matrix = "beta")
## # A tibble: 1,746 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 africanamerican 0.000253
## 2 2 africanamerican 0.000277
## 3 3 africanamerican 0.00308
## 4 1 along 0.00278
## 5 2 along 0.00305
## 6 3 along 0.000280
## 7 1 anecdote 0.000253
## 8 2 anecdote 0.00305
## 9 3 anecdote 0.000280
## 10 1 army 0.00278
## # … with 1,736 more rows
BOVsap_top_terms <- BOVsap_topics %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(desc(beta))
BOVsap_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
labs(title = "Topic Modeling (LDA) - Barack Obama's Victory Speech") +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
DTEslda.dtm <- DocumentTermMatrix(DTEs, control=list(minDocFreq=2, minWordLength=2))
DTEslda.rowTotals <- apply(DTEslda.dtm , 1, sum)
DTEsldadtm.new <- DTEslda.dtm[DTEslda.rowTotals> 0, ]
DTEstopicModel <- LDA(
DTEsldadtm.new,
k = 3,
method="Gibbs",
control = list(seed=1234))
DTEsap_topics <- DTEstopicModel %>%
tidy(matrix = "beta")
DTEstopicModel %>%
tidy(matrix = "beta")
## # A tibble: 2,370 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 elect 0.000146
## 2 2 elect 0.000152
## 3 3 elect 0.00865
## 4 1 like 0.000146
## 5 2 like 0.00625
## 6 3 like 0.000142
## 7 1 share 0.00306
## 8 2 share 0.000152
## 9 3 share 0.00156
## 10 1 stake 0.000146
## # … with 2,360 more rows
DTEsap_top_terms <- DTEsap_topics %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(desc(beta))
DTEsap_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
labs(title = "Topic Modeling (LDA) - Donald Trump's Election Night Speech") +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
theme_minimal() +
scale_fill_brewer(palette = "Accent")
DTVslda.dtm <- DocumentTermMatrix(DTVs, control=list(minDocFreq=2, minWordLength=2))
DTVslda.rowTotals <- apply(DTVslda.dtm , 1, sum)
DTVsldadtm.new <- DTVslda.dtm[DTVslda.rowTotals> 0, ]
DTVstopicModel <- LDA(
DTVsldadtm.new,
k = 3,
method="Gibbs",
control = list(seed=1234))
DTVsap_topics <- DTVstopicModel %>%
tidy(matrix = "beta")
DTVstopicModel %>%
tidy(matrix = "beta")
## # A tibble: 816 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 business 0.0116
## 2 2 business 0.000549
## 3 3 business 0.000549
## 4 1 complicated 0.000552
## 5 2 complicated 0.000549
## 6 3 complicated 0.0115
## 7 1 everyone 0.000552
## 8 2 everyone 0.000549
## 9 3 everyone 0.0170
## 10 1 keep 0.000552
## # … with 806 more rows
DTVsap_top_terms <- DTVsap_topics %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(desc(beta))
DTVsap_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
labs(title = "Topic Modeling (LDA) - Donald Trump's Victory Speech") +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
theme_minimal() +
scale_fill_brewer(palette = "Accent")
JBEslda.dtm <- DocumentTermMatrix(JBEs, control=list(minDocFreq=2, minWordLength=2))
JBEslda.rowTotals <- apply(JBEslda.dtm , 1, sum)
JBEsldadtm.new <- JBEslda.dtm[JBEslda.rowTotals> 0, ]
JBEstopicModel <- LDA(
JBEsldadtm.new,
k = 3,
method="Gibbs",
control = list(seed=123))
JBEsap_topics <- JBEstopicModel %>%
tidy(matrix = "beta")
JBEstopicModel %>%
tidy(matrix = "beta")
## # A tibble: 414 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 biden 0.0703
## 2 2 biden 0.0127
## 3 3 biden 0.00136
## 4 1 joe 0.00115
## 5 2 joe 0.0588
## 6 3 joe 0.00136
## 7 1 ain’t 0.00115
## 8 2 ain’t 0.0127
## 9 3 ain’t 0.00136
## 10 1 ballot 0.00115
## # … with 404 more rows
JBEsap_top_terms <- JBEsap_topics %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(desc(beta))
JBEsap_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
labs(title = "Topic Modeling (LDA) - Joe Biden’s Election Night Speech") +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
theme_minimal() +
scale_fill_brewer(palette = "Set3")
JBVslda.dtm <- DocumentTermMatrix(JBVs, control=list(minDocFreq=2, minWordLength=2))
JBVslda.rowTotals <- apply(JBVslda.dtm , 1, sum)
JBVsldadtm.new <- JBVslda.dtm[JBVslda.rowTotals> 0, ]
JBVstopicModel <- LDA(
JBVsldadtm.new,
k = 3,
method="Gibbs",
control = list(seed=1234))
JBVsap_topics <- JBVstopicModel %>%
tidy(matrix = "beta")
JBVstopicModel %>%
tidy(matrix = "beta")
## # A tibble: 1,383 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 biden 0.000316
## 2 2 biden 0.00707
## 3 3 biden 0.000368
## 4 1 campaign 0.000316
## 5 2 campaign 0.0239
## 6 3 campaign 0.000368
## 7 1 del 0.000316
## 8 2 del 0.000337
## 9 3 del 0.00404
## 10 1 delivered 0.000316
## # … with 1,373 more rows
JBVsap_top_terms <- JBVsap_topics %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(desc(beta))
JBVsap_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
labs(title = "Topic Modeling (LDA) - Joe Biden’s Victory Speech") +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
theme_minimal() +
scale_fill_brewer(palette = "Set3")
We will now perform sentiment analysis on each document. Here we apply the AFINN, BING, and NRC lexicons and visualize the sentiment of each speech. The blue color represents positive values and the red color represents negative ones.
df.BOE <- tibble(sentence = 1:length(BOE), text = BOE)
tidy.df.BOE <- df.BOE %>%
unnest_tokens(word,text)
df.BOV <- tibble(sentence = 1:length(BOV), text = BOV)
tidy.df.BOV <- df.BOV %>%
unnest_tokens(word,text)
df.DTE <- tibble(sentence = 1:length(DTE), text = DTE)
tidy.df.DTE <- df.DTE %>%
unnest_tokens(word,text)
df.DTV <- tibble(sentence = 1:length(DTV), text = DTV)
tidy.df.DTV <- df.DTV %>%
unnest_tokens(word,text)
df.JBE <- tibble(sentence = 1:length(JBE), text = JBE)
tidy.df.JBE <- df.JBE %>%
unnest_tokens(word,text)
df.JBV <- tibble(sentence = 1:length(JBV), text = JBV)
tidy.df.JBV <- df.JBV %>%
unnest_tokens(word,text)
AFINN is one of the simplest and most popular lexicons, developed by Finn Årup Nielsen. It contains more than 3,300 words, each with an associated polarity score. It is basically a list of words rated for valence with integers between -5 and +5.
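As a toy illustration (an added sketch, not part of the analysis), scoring a single made-up sentence with AFINN looks like this:

# Toy sketch: net AFINN valence of a made-up sentence
toy <- tibble(text = "what a great win even though the fight was hard") %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("afinn"), by = "word")
sum(toy$value)  # matched words add their scores; unmatched words drop out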
AFINN <- get_sentiments("afinn")
# install.packages("textdata")
afinn.df.BOE <- tidy.df.BOE %>%
dplyr::filter(!word %in% stop_words$word) %>%
count(sentence,word) %>%
inner_join(AFINN) %>%
group_by(word) %>%
summarise(sentiment = sum(value)) %>%
ungroup() %>%
top_n(20, abs(sentiment)) %>%
ggplot(aes(reorder(word,sentiment), sentiment, fill = sentiment > 0)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis (AFINN) - Barack Obama's Victory Speech") +
coord_flip() +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
## Joining, by = "word"
afinn.df.BOE
afinn.df.BOV <- tidy.df.BOV %>%
dplyr::filter(!word %in% stop_words$word) %>%
count(sentence,word) %>%
inner_join(AFINN) %>%
group_by(word) %>%
summarise(sentiment = sum(value)) %>%
ungroup() %>%
top_n(20, abs(sentiment)) %>%
ggplot(aes(reorder(word,sentiment), sentiment, fill = sentiment > 0)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis (AFINN) - Barack Obama's Victory Speech") +
coord_flip() +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
## Joining, by = "word"
afinn.df.BOV
afinn.df.DTE <- tidy.df.DTE %>%
dplyr::filter(!word %in% stop_words$word) %>%
count(sentence,word) %>%
inner_join(AFINN) %>%
group_by(word) %>%
summarise(sentiment = sum(value)) %>%
ungroup() %>%
top_n(20, abs(sentiment)) %>%
ggplot(aes(reorder(word,sentiment), sentiment, fill = sentiment > 0)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis (AFINN) - Donald Trump's Election Night Speech") +
coord_flip() +
theme_minimal() +
scale_fill_brewer(palette = "Accent")
## Joining, by = "word"
afinn.df.DTE
afinn.df.DTV <- tidy.df.DTV %>%
dplyr::filter(!word %in% stop_words$word) %>%
count(sentence,word) %>%
inner_join(AFINN) %>%
group_by(word) %>%
summarise(sentiment = sum(value)) %>%
ungroup() %>%
top_n(20, abs(sentiment)) %>%
ggplot(aes(reorder(word,sentiment), sentiment, fill = sentiment > 0)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis (AFINN) - Donald Trump's Victory Speech") +
coord_flip() +
theme_minimal() +
scale_fill_brewer(palette = "Accent")
## Joining, by = "word"
afinn.df.DTV
afinn.df.JBE <- tidy.df.JBE %>%
dplyr::filter(!word %in% stop_words$word) %>%
count(sentence,word) %>%
inner_join(AFINN) %>%
group_by(word) %>%
summarise(sentiment = sum(value)) %>%
ungroup() %>%
top_n(20, abs(sentiment)) %>%
ggplot(aes(reorder(word,sentiment), sentiment, fill = sentiment > 0)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis (AFINN) - Joe Biden’s Election Night Speech") +
coord_flip() +
theme_minimal() +
scale_fill_brewer(palette = "Set3")
## Joining, by = "word"
afinn.df.JBE
afinn.df.JBV <- tidy.df.JBV %>%
dplyr::filter(!word %in% stop_words$word) %>%
count(sentence, word, sort = TRUE) %>%
inner_join(AFINN, by = "word") %>%
group_by(word) %>%
summarise(sentiment = sum(value)) %>%
top_n(20, abs(sentiment)) %>%
ggplot(aes(reorder(word,sentiment), sentiment, fill = sentiment > 0)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis (AFINN) - Joe Biden’s Victory Speech") +
coord_flip() +
theme_minimal() +
scale_fill_brewer(palette = "Set3")
afinn.df.JBV
The BING lexicon categorizes words into a binary result: positive and negative categories. It gives good results and performs especially well within a tidy data frame.
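Because the labels are binary, a net score per sentence is one subtraction away. A sketch using the tokenized Obama election-night tokens (pivot_wider comes from tidyr; this is an added illustration, not part of the original analysis):

# Sketch: net BING sentiment per sentence (positive minus negative counts)
tidy.df.BOE %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentence, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)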
BING <- get_sentiments("bing")
bing.df.BOE <- tidy.df.BOE %>%
inner_join(BING) %>%
count(word, sentiment, sort=TRUE) %>%
group_by(sentiment) %>%
arrange(desc(n)) %>%
slice(1:20) %>%
ggplot(aes(x=reorder(word, n), y = n)) +
geom_col(aes(fill=sentiment), show.legend=FALSE) +
labs(x="", y="number of times used", title = "Sentiment Analysis (BING) - Barack Obama’s Election Night Speech") +
coord_flip() +
facet_wrap(~sentiment, scales="free_y")+
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
## Joining, by = "word"
bing.df.BOE
bing.df.BOV <- tidy.df.BOV %>%
inner_join(BING) %>%
count(word, sentiment, sort=TRUE) %>%
group_by(sentiment) %>%
arrange(desc(n)) %>%
slice(1:20) %>%
ggplot(aes(x=reorder(word, n), y = n)) +
geom_col(aes(fill=sentiment), show.legend=FALSE) +
labs(x="", y="number of times used", title = "Sentiment Analysis (BING) - Barack Obama's Victory Speech") +
coord_flip() +
facet_wrap(~sentiment, scales="free_y")+
theme_minimal() +
scale_fill_brewer(palette = "Pastel1")
## Joining, by = "word"
bing.df.BOV
bing.df.DTE <- tidy.df.DTE %>%
inner_join(BING) %>%
count(word, sentiment, sort=TRUE) %>%
group_by(sentiment) %>%
arrange(desc(n)) %>%
slice(1:20) %>%
ggplot(aes(x=reorder(word, n), y = n)) +
geom_col(aes(fill=sentiment), show.legend=FALSE) +
labs(x="", y="number of times used", title = "Sentiment Analysis (BING) - Donald Trump's Election Night Speech") +
coord_flip() +
facet_wrap(~sentiment, scales="free_y")+
theme_minimal() +
scale_fill_brewer(palette = "Accent")
## Joining, by = "word"
bing.df.DTE
bing.df.DTV <- tidy.df.DTV %>%
inner_join(BING) %>%
count(word, sentiment, sort=TRUE) %>%
group_by(sentiment) %>%
arrange(desc(n)) %>%
slice(1:20) %>%
ggplot(aes(x=reorder(word, n), y = n)) +
geom_col(aes(fill=sentiment), show.legend=FALSE) +
labs(x="", y="number of times used", title = "Sentiment Analysis (BING) - Donald Trump's Victory Speech") +
coord_flip() +
facet_wrap(~sentiment, scales="free_y")+
theme_minimal() +
scale_fill_brewer(palette = "Accent")
## Joining, by = "word"
bing.df.DTV
bing.df.JBE <- tidy.df.JBE %>%
inner_join(BING) %>%
count(word, sentiment, sort=TRUE) %>%
group_by(sentiment) %>%
arrange(desc(n)) %>%
slice(1:20) %>%
ggplot(aes(x=reorder(word, n), y = n)) +
geom_col(aes(fill=sentiment), show.legend=FALSE) +
labs(x="", y="number of times used", title = "Sentiment Analysis (BING) - Joe Biden’s Election Night Speech") +
coord_flip() +
facet_wrap(~sentiment, scales="free_y")+
theme_minimal() +
scale_fill_brewer(palette = "Set3")
## Joining, by = "word"
bing.df.JBE
bing.df.JBV <- tidy.df.JBV %>%
inner_join(BING) %>%
count(word, sentiment, sort=TRUE) %>%
group_by(sentiment) %>%
arrange(desc(n)) %>%
slice(1:20) %>%
ggplot(aes(x=reorder(word, n), y = n)) +
geom_col(aes(fill=sentiment), show.legend=FALSE) +
labs(x="", y="number of times used", title = "Sentiment Analysis (BING) - Joe Biden’s Victory Speech") +
coord_flip() +
facet_wrap(~sentiment, scales="free_y")+
theme_minimal() +
scale_fill_brewer(palette = "Set3")
## Joining, by = "word"
bing.df.JBV
The NRC lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust); like BING, NRC also has positive and negative sentiment categories.
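If we wanted the polar plots below to show only the eight emotions, the positive/negative rows could be dropped first (an added sketch, not applied in the original code):

# Sketch: keep only the eight basic emotions from NRC
nrc_emotions <- get_sentiments("nrc") %>%
  dplyr::filter(!sentiment %in% c("positive", "negative"))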
NRC <- get_sentiments("nrc")
nrc.df.BOE <- tidy.df.BOE %>%
inner_join(NRC) %>%
count(sentiment, sort=TRUE) %>%
ggplot(aes(x=sentiment, y=n, fill=sentiment)) +
geom_bar(stat="identity") +
coord_polar() +
theme(legend.position = "none", axis.text.x = element_blank()) +
theme_minimal() +
scale_fill_brewer(palette = "RdYlBu") +
labs(x = "", y = "", title = "Sentiment Analysis (NRC) - Barack Obama’s Election Night Speech")
## Joining, by = "word"
nrc.df.BOE
nrc.df.BOV <- tidy.df.BOV %>%
inner_join(NRC) %>%
count(sentiment, sort=TRUE) %>%
ggplot(aes(x=sentiment, y=n, fill=sentiment)) +
geom_bar(stat="identity") +
coord_polar() +
theme(legend.position = "none", axis.text.x = element_blank()) +
theme_minimal() +
scale_fill_brewer(palette = "RdYlBu") +
labs(x = "", y = "", title = "Sentiment Analysis (NRC) - Barack Obama's Victory Speech")
## Joining, by = "word"
nrc.df.BOV
nrc.df.DTE <- tidy.df.DTE %>%
inner_join(NRC) %>%
count(sentiment, sort=TRUE) %>%
ggplot(aes(x=sentiment, y=n, fill=sentiment)) +
geom_bar(stat="identity") +
coord_polar() +
theme(legend.position = "none", axis.text.x = element_blank()) +
theme_minimal() +
scale_fill_brewer(palette = "RdYlGn") +
labs(x = "", y = "", title = "Sentiment Analysis (NRC) - Donald Trump's Election Night Speech")
## Joining, by = "word"
nrc.df.DTE
nrc.df.DTV <- tidy.df.DTV %>%
inner_join(NRC) %>%
count(sentiment, sort=TRUE) %>%
ggplot(aes(x=sentiment, y=n, fill=sentiment)) +
geom_bar(stat="identity") +
coord_polar() +
theme(legend.position = "none", axis.text.x = element_blank()) +
theme_minimal() +
scale_fill_brewer(palette = "RdYlGn") +
labs(x = "", y = "", title = "Sentiment Analysis (NRC) - Donald Trump's Victory Speech")
## Joining, by = "word"
nrc.df.DTV
nrc.df.JBE <- tidy.df.JBE %>%
inner_join(NRC) %>%
count(sentiment, sort=TRUE) %>%
ggplot(aes(x=sentiment, y=n, fill=sentiment)) +
geom_bar(stat="identity") +
coord_polar() +
theme(legend.position = "none", axis.text.x = element_blank()) +
theme_minimal() +
scale_fill_brewer(palette = "Spectral") +
labs(x = "", y = "", title = "Sentiment Analysis (NRC) - Joe Biden’s Election Night Speech")
## Joining, by = "word"
nrc.df.JBE
nrc.df.JBV <- tidy.df.JBV %>%
inner_join(NRC) %>%
count(sentiment, sort=TRUE) %>%
ggplot(aes(x=sentiment, y=n, fill=sentiment)) +
geom_bar(stat="identity") +
coord_polar() +
theme(legend.position = "none", axis.text.x = element_blank()) +
theme_minimal() +
scale_fill_brewer(palette = "Spectral") +
labs(x = "", y = "", title = "Sentiment Analysis (NRC) - Joe Biden’s Victory Speech")
## Joining, by = "word"
nrc.df.JBV
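Finally, the word clouds promised at the beginning. The same matrix-building and plotting steps are repeated for each corpus below; a minimal sketch of a wrapper, using the hypothetical name plot_cloud (not part of the original code):

# Hypothetical helper wrapping the repeated word-cloud steps below
plot_cloud <- function(corpus, palette = "Pastel1", seed = 1234) {
  freqs <- sort(rowSums(as.matrix(TermDocumentMatrix(corpus))), decreasing = TRUE)
  set.seed(seed)
  wordcloud(words = names(freqs), freq = freqs,
            min.freq = 1, max.words = 200,
            random.order = FALSE, rot.per = 0.35,
            colors = brewer.pal(8, palette))
}
# e.g. plot_cloud(BOEs, "Pastel1")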
BOEsdtm <- TermDocumentMatrix(BOEs)
BOEsmatrix <- as.matrix(BOEsdtm)
BOEswords <- sort(rowSums(BOEsmatrix),decreasing=TRUE)
BOEsdf <- data.frame(word = names(BOEswords),freq=BOEswords)
set.seed(1234)
wordcloud(words = BOEsdf$word,
freq = BOEsdf$freq,
min.freq = 1,
max.words=200,
random.order=FALSE,
rot.per=0.35,
colors=brewer.pal(8, "Pastel1"))
BOVsdtm <- TermDocumentMatrix(BOVs)
BOVsmatrix <- as.matrix(BOVsdtm)
BOVswords <- sort(rowSums(BOVsmatrix),decreasing=TRUE)
BOVsdf <- data.frame(word = names(BOVswords),freq=BOVswords)
set.seed(1234)
wordcloud(words = BOVsdf$word,
freq = BOVsdf$freq,
min.freq = 1,
max.words=200,
random.order=FALSE,
rot.per=0.35,
colors=brewer.pal(8, "Pastel1"))
DTEsdtm <- TermDocumentMatrix(DTEs)
DTEsmatrix <- as.matrix(DTEsdtm)
DTEswords <- sort(rowSums(DTEsmatrix),decreasing=TRUE)
DTEsdf <- data.frame(word = names(DTEswords),freq=DTEswords)
set.seed(1234)
wordcloud(words = DTEsdf$word,
freq = DTEsdf$freq,
min.freq = 1,
max.words=200,
random.order=FALSE,
rot.per=0.35,
colors=brewer.pal(8, "Accent"))
DTVsdtm <- TermDocumentMatrix(DTVs)
DTVsmatrix <- as.matrix(DTVsdtm)
DTVswords <- sort(rowSums(DTVsmatrix),decreasing=TRUE)
DTVsdf <- data.frame(word = names(DTVswords),freq=DTVswords)
set.seed(1234)
wordcloud(words =DTVsdf$word,
freq =DTVsdf$freq,
min.freq = 1,
max.words=200,
random.order=FALSE,
rot.per=0.35,
colors=brewer.pal(8, "Accent"))
JBEsdtm <- TermDocumentMatrix(JBEs)
JBEsmatrix <- as.matrix(JBEsdtm)
JBEswords <- sort(rowSums(JBEsmatrix),decreasing=TRUE)
JBEsdf <- data.frame(word = names(JBEswords),freq=JBEswords)
set.seed(1234)
wordcloud(words = JBEsdf$word,
freq = JBEsdf$freq,
min.freq = 1,
max.words=200,
random.order=FALSE,
rot.per=0.35,
colors=brewer.pal(8, "Set1"))
## Warning in wordcloud(words = JBEsdf$word, freq = JBEsdf$freq, min.freq = 1, :
## supporters could not be fit on page. It will not be plotted.
JBVsdtm <- TermDocumentMatrix(JBVs)
JBVsmatrix <- as.matrix(JBVsdtm)
JBVswords <- sort(rowSums(JBVsmatrix),decreasing=TRUE)
JBVsdf <- data.frame(word = names(JBVswords),freq=JBVswords)
set.seed(1234)
wordcloud(words = JBVsdf$word,
freq = JBVsdf$freq,
min.freq = 1,
max.words=200,
random.order=FALSE,
rot.per=0.35,
colors=brewer.pal(8, "Set1"))
The results are satisfying and surface keywords whose sentiment matches what we, as humans, naturally recognize.
Each president's speeches produce a different word cloud, and certain specific words stand out strongly and visibly.
There is also some variation in the sentiment each speaker builds. For example, in his election night speech Donald Trump repeatedly mentions "hillari clinton" (the stemmed form of "Hillary Clinton", since that corpus was run through stemDocument); she was his rival in that presidential race, and he mentioned her constantly.
References:
LDA
AFINN
AFINN from Github
AFINN with Tidy Data
BING Sentiment Analysis
NRC