During our coursework on sentiment analysis, we focused heavily on differences in speeches among politicians, specifically President Donald Trump, President Barack Obama, and Senator Mitt Romney. Because I am heavily interested in politics, I decided to extend this analysis for this project. A President’s first speech in office is a defining moment: it is where they lay out their own plan for the presidency and the policies and laws they want to enact. Because these speeches are watched by millions of people worldwide, I chose to focus on Inauguration speeches from the past twenty years and on how they have changed from President to President, and even from first term to second term when a President was re-elected. Here, I include visualizations using the NRC sentiment lexicon, the Bing positive/negative lexicon, and the Bing lexicon presented as a percent of total. Analysis follows each visualization below.

library(SentimentAnalysis)
## 
## Attaching package: 'SentimentAnalysis'
## The following object is masked from 'package:base':
## 
##     write
library(sentimentr)
library(widyr)
library(FactoMineR)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(scales)
library(magrittr)
library(xml2)
library(selectr)
library(rvest)
library(cowplot)
library(ggpubr)
## 
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
## 
##     get_legend
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(textdata)
library(tidytext)
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
library(ggthemes)
## 
## Attaching package: 'ggthemes'
## The following object is masked from 'package:cowplot':
## 
##     theme_map
nrc_sent = get_sentiments("nrc")
bing_sent = get_sentiments("bing")
webpage_biden = read_html("https://www.presidency.ucsb.edu/documents/inaugural-address-53")
webpage_trump = read_html("https://www.presidency.ucsb.edu/documents/inaugural-address-14")
webpage_obama2 = read_html("https://www.presidency.ucsb.edu/documents/inaugural-address-15")
webpage_obama1 = read_html("https://www.presidency.ucsb.edu/documents/inaugural-address-5")
webpage_bush2 = read_html("https://www.presidency.ucsb.edu/documents/inaugural-address-13")
webpage_bush1 = read_html("https://www.presidency.ucsb.edu/documents/inaugural-address-52")
dfw0_biden  = webpage_biden %>% html_nodes("p") %>% html_text()
dfw0_trump  = webpage_trump %>% html_nodes("p") %>% html_text()
dfw0_obama2  = webpage_obama2 %>% html_nodes("p") %>% html_text()
dfw0_obama1  = webpage_obama1 %>% html_nodes("p") %>% html_text()
dfw0_bush2  = webpage_bush2 %>% html_nodes("p") %>% html_text()
dfw0_bush1  = webpage_bush1 %>% html_nodes("p") %>% html_text()
dfw1_biden = data.frame(text = dfw0_biden)
dfw1_trump = data.frame(text = dfw0_trump)
dfw1_obama2 = data.frame(text = dfw0_obama2)
dfw1_obama1 = data.frame(text = dfw0_obama1)
dfw1_bush2 = data.frame(text = dfw0_bush2)
dfw1_bush1 = data.frame(text = dfw0_bush1)
dfw1_biden$text = as.character(dfw1_biden$text)
dfw1_trump$text = as.character(dfw1_trump$text)
dfw1_obama2$text = as.character(dfw1_obama2$text)
dfw1_obama1$text = as.character(dfw1_obama1$text)
dfw1_bush2$text = as.character(dfw1_bush2$text)
dfw1_bush1$text = as.character(dfw1_bush1$text)
dfw3_biden = dfw1_biden %>% unnest_tokens(word, text, 
                              to_lower = T,  
                              strip_punct = T, 
                              strip_numeric = T)
dfw3_trump = dfw1_trump %>% unnest_tokens(word, text, 
                              to_lower = T,  
                              strip_punct = T, 
                              strip_numeric = T)
dfw3_obama2 = dfw1_obama2 %>% unnest_tokens(word, text, 
                              to_lower = T,  
                              strip_punct = T, 
                              strip_numeric = T)
dfw3_obama1 = dfw1_obama1 %>% unnest_tokens(word, text, 
                              to_lower = T,  
                              strip_punct = T, 
                              strip_numeric = T)
dfw3_bush2 = dfw1_bush2 %>% unnest_tokens(word, text, 
                              to_lower = T,  
                              strip_punct = T, 
                              strip_numeric = T)
dfw3_bush1 = dfw1_bush1 %>% unnest_tokens(word, text, 
                              to_lower = T,  
                              strip_punct = T, 
                              strip_numeric = T)
dfw3_biden = dfw3_biden %>% anti_join(stop_words, by = "word")

dfw3_trump = dfw3_trump %>% anti_join(stop_words, by = "word")

dfw3_obama2 = dfw3_obama2 %>% anti_join(stop_words, by = "word")

dfw3_obama1 = dfw3_obama1 %>% anti_join(stop_words, by = "word")

dfw3_bush2 = dfw3_bush2 %>% anti_join(stop_words, by = "word")

dfw3_bush1 = dfw3_bush1 %>% anti_join(stop_words, by = "word")
# Number the tokens in speech order. row_number() applied to a whole data
# frame ranks the values it is given rather than returning row positions,
# so seq_len(nrow()) is used here instead.
dfw4_biden = cbind.data.frame(linenumber = seq_len(nrow(dfw3_biden)), dfw3_biden)

dfw4_trump = cbind.data.frame(linenumber = seq_len(nrow(dfw3_trump)), dfw3_trump)

dfw4_obama2 = cbind.data.frame(linenumber = seq_len(nrow(dfw3_obama2)), dfw3_obama2)

dfw4_obama1 = cbind.data.frame(linenumber = seq_len(nrow(dfw3_obama1)), dfw3_obama1)

dfw4_bush2 = cbind.data.frame(linenumber = seq_len(nrow(dfw3_bush2)), dfw3_bush2)

dfw4_bush1 = cbind.data.frame(linenumber = seq_len(nrow(dfw3_bush1)), dfw3_bush1)
dfw4_emotion_biden = dfw4_biden %>%
  inner_join(nrc_sent) %>%
  count(index = linenumber %/% 50, sentiment) 
## Joining, by = "word"
dfw4_emotion_trump = dfw4_trump %>%
  inner_join(nrc_sent) %>%
  count(index = linenumber %/% 50, sentiment) 
## Joining, by = "word"
dfw4_emotion_obama2 = dfw4_obama2 %>%
  inner_join(nrc_sent) %>%
  count(index = linenumber %/% 50, sentiment) 
## Joining, by = "word"
dfw4_emotion_obama1 = dfw4_obama1 %>%
  inner_join(nrc_sent) %>%
  count(index = linenumber %/% 50, sentiment) 
## Joining, by = "word"
dfw4_emotion_bush2 = dfw4_bush2 %>%
  inner_join(nrc_sent) %>%
  count(index = linenumber %/% 50, sentiment) 
## Joining, by = "word"
dfw4_emotion_bush1 = dfw4_bush1 %>%
  inner_join(nrc_sent) %>%
  count(index = linenumber %/% 50, sentiment) 
## Joining, by = "word"
gg_emotion_biden = dfw4_emotion_biden %>% 
  ggplot(aes(index, n, fill = as.factor(sentiment))) +
  geom_col() +
  theme(legend.position = "right")+
  ggtitle("Biden 2021")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5))

gg_emotion_trump = dfw4_emotion_trump %>% 
  ggplot(aes(index, n, fill = as.factor(sentiment))) +
  geom_col() +
  theme(legend.position = "right")+
  ggtitle("Trump 2017")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5))

gg_emotion_obama2 = dfw4_emotion_obama2 %>% 
  ggplot(aes(index, n, fill = as.factor(sentiment))) +
  geom_col() +
  theme(legend.position = "right")+
  ggtitle("Obama 2013")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5))

gg_emotion_obama1 = dfw4_emotion_obama1 %>% 
  ggplot(aes(index, n, fill = as.factor(sentiment))) +
  geom_col() +
  theme(legend.position = "right")+
  ggtitle("Obama 2009")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5))

gg_emotion_bush2 = dfw4_emotion_bush2 %>% 
  ggplot(aes(index, n, fill = as.factor(sentiment))) +
  geom_col() +
  theme(legend.position = "right")+
  ggtitle("Bush 2005")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5))

gg_emotion_bush1 = dfw4_emotion_bush1 %>% 
  ggplot(aes(index, n, fill = as.factor(sentiment))) +
  geom_col() +
  theme(legend.position = "right")+
  ggtitle("Bush 2001")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5))
title <- ggdraw() + draw_label("Emotional Sentiment In Presidential Inauguration Speeches", fontface='bold')
p<-ggpubr::ggarrange(gg_emotion_biden, 
          gg_emotion_trump, 
          gg_emotion_obama2, 
          gg_emotion_obama1, 
          gg_emotion_bush2, 
          gg_emotion_bush1, 
          common.legend = TRUE,
          legend = "bottom",
          align = "hv",
          nrow = 2,
          ncol = 3)
plot_grid(title, p, rel_heights=c(0.1, 1), ncol = 1, nrow = 2)

The NRC sentiment lexicon classifies words in the Presidential Inauguration speeches into 10 sentiment types: anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, and trust. First, the two speeches with the most words classified as anger are Obama 2013 and Biden 2021. For President Biden, this is understandable: only weeks earlier, a riot at the Capitol, where he was now standing, left several people dead as rioters sought to directly interfere in the Constitutionally mandated certification of his election by Congress. A little anger in his words is understandable when, in his mind, democracy itself had been directly threatened. For President Obama’s 2013 speech, a closer reading suggests that the anger comes from a portion of the speech about the Revolution of 1776 and forming a country free from slavery. Moving on, every President used a high number of anticipation words. Since these speeches come at the beginning of a President’s term, this sentiment is expected: they are laying out their vision in anticipation of meeting their goals. Beyond this, there are few significant points to pick out of the visualization for the other sentiments, as all Presidents showed joy, surprise, sadness, and trust throughout their speeches.
Turning to changes between speeches by the same President, President Obama’s 2013 speech had noticeably more words classified as anger, trust, and joy than his 2009 speech. Trust makes sense: by re-electing him, voters demonstrated that they trusted President Obama the most to carry out their preferred policies, and he naturally plays to that perceived trust in his speech. As for joy, President Obama entered office in 2009 during the Great Recession, the worst economic problem facing the country since the Great Depression. His 2013 speech is chock full of words explaining the resurgence of the economy, the scaling down of forever wars, and the triumph of building a new country together. This level of joy speaks directly to the progress he perceived he had made over his first term. President Bush likewise showed an increase in trust between his speeches in 2001 and 2005. More importantly, like President Obama, he showed an increase in joy. Here, after the attacks on 9/11 and the invasions of Iraq and Afghanistan, President Bush pointed to the progress being made in the War on Terror.
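
To check which words drive a category such as anger, the matched words can be ranked by frequency. The sketch below is only an illustration: `toy_tokens` and `toy_lexicon` are hypothetical stand-ins, not the real speech tokens or the real NRC lexicon, and the word-to-sentiment pairs are invented for the example.

```r
library(dplyr)

# Hypothetical tokens from a speech (not real data)
toy_tokens <- data.frame(
  word = c("tyranny", "freedom", "tyranny", "hope", "mob", "freedom"),
  stringsAsFactors = FALSE
)

# Hypothetical NRC-style lexicon: one row per (word, sentiment) pair,
# so a single word may appear under several sentiments
toy_lexicon <- data.frame(
  word      = c("tyranny", "mob", "mob", "freedom", "hope"),
  sentiment = c("anger", "anger", "fear", "joy", "anticipation"),
  stringsAsFactors = FALSE
)

# Restrict the lexicon to one category first, then count matching tokens
anger_lexicon <- toy_lexicon %>% filter(sentiment == "anger")

anger_words <- toy_tokens %>%
  inner_join(anger_lexicon, by = "word") %>%
  count(word, sort = TRUE)

anger_words
```

Applied to a real speech, the same pattern would use a data frame such as `dfw4_obama2` in place of `toy_tokens` and `nrc_sent` in place of `toy_lexicon`.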

Below, we change direction with a new lexicon, Bing, in which words are classified more simply as only positive or negative.
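
As a small illustration of what the code that follows computes, the sketch below nets positive against negative words within consecutive chunks of a speech. The labels are made up, and a chunk size of 3 stands in for the 50 used in the analysis purely to keep the example short.

```r
# Hypothetical Bing-style labels for matched words, in speech order
labels <- c("negative", "negative", "positive", "negative",
            "positive", "positive", "positive", "negative")
position <- seq_along(labels)

# Integer division groups consecutive positions into chunks.
# With 1-based positions, chunk 0 holds one word fewer than the others.
chunk <- position %/% 3

# Net sentiment per chunk: +1 for each positive word, -1 for each negative
score <- ifelse(labels == "positive", 1, -1)
net_by_chunk <- tapply(score, chunk, sum)

net_by_chunk
```

A chunk with a value below zero contains more negative than positive words, which is what a bar below the axis in the plots represents.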

dfw_sentiment_biden <- dfw4_biden %>%
  inner_join(bing_sent) %>%
  count(index = linenumber %/% 50, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
dfw_sentiment_trump <- dfw4_trump %>%
  inner_join(bing_sent) %>%
  count(index = linenumber %/% 50, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
dfw_sentiment_obama2 <- dfw4_obama2 %>%
  inner_join(bing_sent) %>%
  count(index = linenumber %/% 50, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
dfw_sentiment_obama1 <- dfw4_obama1 %>%
  inner_join(bing_sent) %>%
  count(index = linenumber %/% 50, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
dfw_sentiment_bush2 <- dfw4_bush2 %>%
  inner_join(bing_sent) %>%
  count(index = linenumber %/% 50, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
dfw_sentiment_bush1 <- dfw4_bush1 %>%
  inner_join(bing_sent) %>%
  count(index = linenumber %/% 50, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
gg_sentiment_biden = ggplot(dfw_sentiment_biden, aes(index, sentiment, fill = as.factor(sentiment ))) + geom_col()+
  ggtitle("Biden 2021")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5), legend.position="none")

gg_sentiment_trump = ggplot(dfw_sentiment_trump, aes(index, sentiment, fill = as.factor(sentiment ))) + geom_col()+
  ggtitle("Trump 2017")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5), legend.position="none")

gg_sentiment_obama2 = ggplot(dfw_sentiment_obama2, aes(index, sentiment, fill = as.factor(sentiment ))) + geom_col()+
  ggtitle("Obama 2013")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5), legend.position="none")

gg_sentiment_obama1 = ggplot(dfw_sentiment_obama1, aes(index, sentiment, fill = as.factor(sentiment ))) + geom_col()+
  ggtitle("Obama 2009")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5), legend.position="none")

gg_sentiment_bush2 = ggplot(dfw_sentiment_bush2, aes(index, sentiment, fill = as.factor(sentiment ))) + geom_col()+
  ggtitle("Bush 2005")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5), legend.position="none")

gg_sentiment_bush1 = ggplot(dfw_sentiment_bush1, aes(index, sentiment, fill = as.factor(sentiment ))) + geom_col()+
  ggtitle("Bush 2001")+
  theme_dark()+
  scale_fill_discrete(name="Sentiment")+
  theme(plot.title=element_text(hjust=0.5), legend.position="none")
title <- ggdraw() + draw_label("Positive/Negative Sentiment In Presidential Inauguration Speeches", fontface='bold')
p<-plot_grid(gg_sentiment_biden, 
          gg_sentiment_trump, 
          gg_sentiment_obama2, 
          gg_sentiment_obama1, 
          gg_sentiment_bush2, 
          gg_sentiment_bush1, 
          nrow = 2)
plot_grid(title, p, rel_heights=c(0.1, 1), ncol = 1, nrow = 2)

The visualization above uses the Bing lexicon, with positive and negative words netted against each other at each point in the index throughout the speech. The graphs for President Obama in 2009 and President Biden in 2021 are closely aligned: a large number of net negative words at the beginning of the speech, followed by a large chunk of net positive words, and a negative ending. These similarities can’t just be chalked up to the possibility of similar speechwriting staff. Rather, both men entered office and gave these speeches during crises facing the country. President Obama faced an economic collapse and meltdown. President Biden faced a worldwide pandemic and a recent threat to democracy. The speeches follow the same pattern: they begin with the problems facing the country (negative), continue with plans to fix them and hope for the future (positive), and end with warnings of what will happen if the crisis is not solved (negative). This format is especially effective in persuading a captive audience. Bush 2001, Bush 2005, and Obama 2013 were net positive across almost the entire speech, with only a few net negative points where the speaker was making a particular point. These speeches differ from Obama 2009 and Biden 2021 because they were not given during a period of national turmoil or crisis, so the same degree of negative language was not needed. One result in this visualization particularly surprised me: President Trump did not have a single point in his Inauguration speech with a net negative classification. This surprised me because the speeches I’m used to from President Trump are often saddled with language that shouldn’t be considered positive.
This, however, brings up one potential issue with using lexicons in this manner for sentiment analysis: they match only the dictionary meaning of a word, not what the speaker means by it. As many know, the same word can have vastly different connotations depending on who says it and how it is said. President Trump often uses words like beautiful and wonderful, along with other clearly positive words that this lexicon would have picked up on. However, he does not always use these words in a positive way. One way to better understand the meaning of an individual word is to look at the surrounding words, which give clues to the actual connotation. Since the lexicon looks only at each individual word and not at a collection of them, it becomes clear that this is a major flaw in conducting sentiment analysis in this manner.
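
One simple way to look at surrounding words is to tokenize into pairs of adjacent words (bigrams) and flag sentiment words that directly follow a negation. The sketch below is a rough illustration under stated assumptions: the sentence, the three-word lexicon, and the `negations` list are all hypothetical, and checking only the preceding word is a simplification rather than a full treatment of context.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical sentence and lexicon (not from the speeches or from Bing)
toy_text <- data.frame(
  text = "this is not good and not great but truly wonderful",
  stringsAsFactors = FALSE
)
toy_lexicon <- data.frame(
  word      = c("good", "great", "wonderful"),
  sentiment = "positive",
  stringsAsFactors = FALSE
)
negations <- c("not", "no", "never")

# Split the text into overlapping two-word sequences
bigrams <- toy_text %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# Score the second word, and flag it when the first word is a negation
flagged <- bigrams %>%
  inner_join(toy_lexicon, by = c("word2" = "word")) %>%
  mutate(negated = word1 %in% negations)

flagged
```

Stop words are deliberately not removed here, since words such as "not" are exactly the context this check relies on.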

Another problem I identified, both in this visualization and in the one above, is that the x-axis index for each speech is slightly different, as every President’s speech had a different length. President Biden’s speech had more words than any other Inauguration speech since President Ronald Reagan’s in 1985, which was a complete surprise to me. I have watched the inauguration speeches of Presidents Obama, Trump, and Biden, and I personally felt that Biden’s was the shortest. To have his come in as the longest in terms of words was a shock. Either way, the trouble with comparing positive and negative sentiment in this manner is that the length of the speech will partially cloud the data. Therefore, the code below converts the Bing positive/negative results into a percent of total format, so we can see what share of each speech’s matched sentiment words is positive and what share is negative across the speech as a whole.
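
To see why this normalization helps, consider the made-up counts below for two hypothetical speeches of very different length. The numbers are invented for illustration and are not results from this analysis.

```r
# Hypothetical positive/negative word counts (not real results)
counts <- data.frame(
  speech   = c("Short speech", "Long speech"),
  positive = c(30, 90),
  negative = c(20, 60),
  stringsAsFactors = FALSE
)

# The denominator is the number of lexicon-matched words,
# not the total number of words in the speech
counts$total       <- counts$positive + counts$negative
counts$pos_percent <- round(counts$positive / counts$total, 3)
counts$neg_percent <- round(counts$negative / counts$total, 3)

counts
```

The long speech has three times as many positive words as the short one, yet both come out with the same positive and negative shares, which is the comparison the raw counts obscure.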

dfw_sentiment_biden$total<-dfw_sentiment_biden$negative+dfw_sentiment_biden$positive

biden_positive<-sum(dfw_sentiment_biden$positive)
biden_negative<-sum(dfw_sentiment_biden$negative)
biden_total<-sum(dfw_sentiment_biden$total)

biden_pos_percent<-round(biden_positive/biden_total, 3)
biden_neg_percent<-round(biden_negative/biden_total, 3)



dfw_sentiment_trump$total<-dfw_sentiment_trump$negative+dfw_sentiment_trump$positive

trump_positive<-sum(dfw_sentiment_trump$positive)
trump_negative<-sum(dfw_sentiment_trump$negative)
trump_total<-sum(dfw_sentiment_trump$total)

trump_pos_percent<-round(trump_positive/trump_total, 3)
trump_neg_percent<-round(trump_negative/trump_total, 3)



dfw_sentiment_obama2$total<-dfw_sentiment_obama2$negative+dfw_sentiment_obama2$positive

obama2_positive<-sum(dfw_sentiment_obama2$positive)
obama2_negative<-sum(dfw_sentiment_obama2$negative)
obama2_total<-sum(dfw_sentiment_obama2$total)

obama2_pos_percent<-round(obama2_positive/obama2_total, 3)
obama2_neg_percent<-round(obama2_negative/obama2_total, 3)



dfw_sentiment_obama1$total<-dfw_sentiment_obama1$negative+dfw_sentiment_obama1$positive

obama1_positive<-sum(dfw_sentiment_obama1$positive)
obama1_negative<-sum(dfw_sentiment_obama1$negative)
obama1_total<-sum(dfw_sentiment_obama1$total)

obama1_pos_percent<-round(obama1_positive/obama1_total, 3)
obama1_neg_percent<-round(obama1_negative/obama1_total, 3)



dfw_sentiment_bush2$total<-dfw_sentiment_bush2$negative+dfw_sentiment_bush2$positive

bush2_positive<-sum(dfw_sentiment_bush2$positive)
bush2_negative<-sum(dfw_sentiment_bush2$negative)
bush2_total<-sum(dfw_sentiment_bush2$total)

bush2_pos_percent<-round(bush2_positive/bush2_total, 3)
bush2_neg_percent<-round(bush2_negative/bush2_total, 3)



dfw_sentiment_bush1$total<-dfw_sentiment_bush1$negative+dfw_sentiment_bush1$positive

bush1_positive<-sum(dfw_sentiment_bush1$positive)
bush1_negative<-sum(dfw_sentiment_bush1$negative)
bush1_total<-sum(dfw_sentiment_bush1$total)

bush1_pos_percent<-round(bush1_positive/bush1_total, 3)
bush1_neg_percent<-round(bush1_negative/bush1_total, 3)
presidents<-c("Biden 2021", "Trump 2017", "Obama 2013", "Obama 2009", "Bush 2005", "Bush 2001")
positivepercent<-c(biden_pos_percent, trump_pos_percent, obama2_pos_percent, obama1_pos_percent, bush2_pos_percent, bush1_pos_percent)
negativepercent<-c(biden_neg_percent, trump_neg_percent, obama2_neg_percent, obama1_neg_percent, bush2_neg_percent, bush1_neg_percent)

pres_df<-data.frame(presidents, positivepercent, negativepercent)
pres_df$presidents<-factor(pres_df$presidents, levels = c("Bush 2001", "Bush 2005", "Obama 2009", "Obama 2013", "Trump 2017", "Biden 2021"))

pres_df
##   presidents positivepercent negativepercent
## 1 Biden 2021           0.536           0.464
## 2 Trump 2017           0.705           0.295
## 3 Obama 2013           0.630           0.370
## 4 Obama 2009           0.557           0.443
## 5  Bush 2005           0.622           0.378
## 6  Bush 2001           0.659           0.341
pres_df_reshape<- pres_df %>% gather(key=Percent, value=Value, positivepercent, negativepercent)
pres_sentiment<-ggplot(pres_df_reshape, aes(x=presidents, y=Value, fill=Percent))+
  geom_col(position="dodge")+
  ylab("Percent of Speech")+
  xlab("Presidential Inauguration")+
  ggtitle("Sentiment of Presidential Inauguration Speeches")+
  scale_fill_discrete(name="Sentiment", labels=c("Negative", "Positive"))+
  theme_economist()+
  # Apply the title tweak after the complete theme so it is not overridden
  theme(plot.title=element_text(hjust=0.5))
pres_sentiment

As noted above, this turns the second visualization from total words into a percent of total for each speech as a whole. As with the second graph, it was a surprise that President Trump had the highest level of positive word use of any of the studied Inauguration speeches. Once again, the issue of connotation and the meaning of the words themselves comes forward. Another surprise was that President Biden had the highest level of negative speech of all the studied speeches, even compared to President Obama in 2009 during the Great Recession. Although this is understandable given the Covid-19 pandemic and the Capitol riot noted above, I personally listened to the speech and thought it was overall especially positive. For example, he spoke highly of unity, coming together, and similar themes. As with President Trump, connotation matters. A question comes to mind for both President Obama in 2009 and President Biden in 2021: how would these sentiment percentages change without their corresponding crises? Would they be closer to the levels seen in Obama 2013 and in both of Bush’s speeches? Although we cannot know for certain, it is definitely a point to ponder.

In conclusion, although this sentiment analysis was helpful because it both reinforced known facts and produced some surprises, it revealed one major flaw in lexicon-based sentiment analysis: it cannot account for the connotation of a word as it is used. Since the English language is constantly evolving, meaning can change across generations. One off-color example is the word “shit”. Saying “you’re shit” is negative, while saying “you’re the shit” is positive. Since this analysis removes “the” as a stop word, both phrases end up with the same sentiment even though, as noted, their sentiments are opposite. Multiply this problem over the course of a whole speech, and many wrong sentiments can be inferred. Therefore, even though the information coming from this analysis is useful, a more effective way of conducting sentiment analysis is needed moving forward.
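
The stop word problem described above can be reproduced with the same tokenizing and filtering steps used earlier in this analysis. This is a minimal sketch: the `phrases` data frame and its `id` labels are hypothetical and exist only to compare the two example phrases.

```r
library(dplyr)
library(tidytext)

# The two example phrases, labelled by their intended meaning
phrases <- data.frame(
  id   = c("insult", "compliment"),
  text = c("you're shit", "you're the shit"),
  stringsAsFactors = FALSE
)

# Tokenize and drop stop words, as done for the speeches
tokens <- phrases %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Compare what is left of each phrase
remaining <- split(tokens$word, tokens$id)

identical(remaining$insult, remaining$compliment)
```

Once "the" is gone, the two phrases are reduced to the same tokens, so any word-level lexicon has to give them the same score.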