It is the eve of another polarizing Presidential election in the United States. The winner will be inaugurated on January 20, 2025, and will then have the opportunity to speak to the nation for the first time as President via the inaugural address. While the corpus of each President’s recorded words spans far beyond the inaugural address, the address is a unique opportunity to set the tone for an administration, frame its values and the policies it will pursue, and attempt to heal the wounds wrought by a contentious election. Perhaps the two most influential figures in 21st-century American politics have been Barack Obama and Donald Trump. These men represent wildly different political viewpoints, rhetorical styles, and value systems. Moreover, their inaugural speeches were delivered in different historical contexts. Are these differences apparent in a sentiment analysis of their inaugural speeches?
Note that parts of this code were adapted from examples in Chapter 2 of “Text Mining with R: A Tidy Approach” by Julia Silge and David Robinson.
The texts of President Obama’s 2009 inaugural speech and President Trump’s 2017 inaugural speech were retrieved using the gutenbergr library. Each text is tokenized into individual words, and stop words are removed. Each speech is then scored separately with the bing, nrc, and afinn lexicons. Note that the nrc lexicon was limited to its “positive” and “negative” bins. Because bing and nrc both bin words as “positive” or “negative”, their results are combined for plotting and comparison in ggplot; the plot shows the percentage of each President’s sentiment-scored words that falls in each bin. For the afinn results, which assign each word a numeric score from -5 to 5, the mean score is plotted via ggplot.
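The tokenize, filter, and score steps described above can be sketched on toy data. This is only an illustration under stated assumptions: the two sentences and the `toy_lexicon` table are hypothetical stand-ins (not the real speeches or the bing/nrc lexicons), chosen so the example runs without downloading anything.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-sentence "speeches" (NOT the real inaugural texts)
toy_speeches <- tibble::tibble(
  president = c("A", "B"),
  text = c("We face this crisis with courage and peace",
           "Fear and carnage end now and prosperity returns")
)

# Hypothetical mini lexicon standing in for bing or nrc
toy_lexicon <- tibble::tibble(
  word = c("crisis", "courage", "peace", "fear", "carnage", "prosperity"),
  sentiment = c("negative", "positive", "positive",
                "negative", "negative", "positive")
)

toy_scored <- toy_speeches |>
  unnest_tokens(word, text) |>              # one lowercase word per row
  anti_join(stop_words, by = "word") |>     # drop common stop words
  inner_join(toy_lexicon, by = "word") |>   # keep only words the lexicon scores
  count(president, sentiment) |>            # words per president and bin
  group_by(president) |>
  mutate(percent_of_scored_words = n / sum(n) * 100) |>
  ungroup()

print(toy_scored)
```

Note that the denominator is the number of words matched by the lexicon, not the total number of words in the speech, which mirrors how the percentages in the plot below are computed.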
Surprisingly, given his reputation for controversial statements, President Trump used a greater proportion of positive words (as scored via nrc and bing) and had a higher mean sentiment score (afinn) than President Obama. It is important to remember that this analysis considers only a single speech from each President, so real patterns in sentiment probably cannot be reliably detected here. Additionally, scoring the sentiment of individual words without context is imperfect, especially given the small data set. A more thorough examination of the sentiments of each President would require a larger corpus of speeches and comments.
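The limitation of context-free scoring can be illustrated with a small sketch. The sentence and the one-word lexicon here are hypothetical, and the negation check (looking at the word immediately before a scored word) is one simple heuristic assumed for illustration, not something the analysis in this post performs.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical sentence and one-word lexicon, for illustration only
toy_text <- tibble::tibble(text = "the outcome was not good")
toy_lexicon <- tibble::tibble(word = "good", sentiment = "positive")

# Word-by-word scoring: "good" counts as positive despite the negation
unigram_hits <- toy_text |>
  unnest_tokens(word, text) |>
  inner_join(toy_lexicon, by = "word")

# Bigram check: find scored words directly preceded by a negation word
negated_hits <- toy_text |>
  unnest_tokens(bigram, text, token = "ngrams", n = 2) |>
  separate(bigram, into = c("word1", "word2"), sep = " ") |>
  filter(word1 %in% c("not", "no", "never")) |>
  inner_join(toy_lexicon, by = c("word2" = "word"))

print(unigram_hits)
print(negated_hits)
```

Here the single-word approach reports one positive word, while the bigram check flags that same word as negated, which is the kind of context a unigram lexicon cannot see.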
knitr::opts_chunk$set(echo = TRUE)
#Parts of this code were adapted from Chapter 2 of "Text Mining with R: A Tidy Approach" by Julia Silge and David Robinson
#Import libraries
library(tidytext)
library(dplyr)
library(tidyverse)
library(gutenbergr)
data("stop_words")
#Get and tidy Obama's speech
obama_speech <- gutenberg_download(c(28001))
## Determining mirror for Project Gutenberg from https://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
tidy_obama <- obama_speech |>
  unnest_tokens(word, text) |>            #one word per row
  anti_join(stop_words, by = "word") |>   #remove stop words
  mutate(president = "Obama")
#Get and tidy Trump's speech
trump_speech <- gutenberg_download(c(57953))
tidy_trump <- trump_speech |>
  unnest_tokens(word, text) |>            #one word per row
  anti_join(stop_words, by = "word") |>   #remove stop words
  mutate(president = "Trump")
#Generate positive and negative sentiment calls using bing and nrc on the combined words from each president's speech
combined_speeches_bing <- rbind(tidy_obama, tidy_trump) |>
  inner_join(get_sentiments("bing"), by = "word") |>
  mutate(method = 'bing')
#nrc can assign several sentiments to one word, so a many-to-many join is expected here
combined_speeches_nrc <- rbind(tidy_obama, tidy_trump) |>
  inner_join(get_sentiments("nrc"), by = "word",
             relationship = "many-to-many") |>
  filter(sentiment %in% c("positive", "negative")) |>
  mutate(method = 'nrc')
combined_speeches_afinn <- rbind(tidy_obama, tidy_trump) |>
  inner_join(get_sentiments("afinn"), by = "word") |>
  mutate(method = 'afinn')
#Combine the nrc and bing datasets
final_speech_pn <- rbind(combined_speeches_nrc,
                         combined_speeches_bing)
#Plot the percentage of sentiment-scored words that are positive or negative in each speech
final_speech_pn |>
  count(president, method, sentiment) |>          #scored words per bin
  group_by(president, method) |>
  mutate(percent_of_words = n / sum(n) * 100) |>  #share within each president and lexicon
  ungroup() |>
  ggplot(aes(x = sentiment, y = percent_of_words, fill = sentiment)) +
  geom_col() +
  facet_grid(rows = vars(method), cols = vars(president))
#Plot the mean values assigned by the afinn lexicon
combined_speeches_afinn |>
  group_by(president) |>
  summarize(mean_sentiment = mean(value)) |>
  ggplot(aes(x = president, y = mean_sentiment, fill = president)) +
  geom_col()