I analyzed the farewell speeches of First Ladies Melania Trump and Michelle Obama. I explore how many times the word “I” is used in each speech, the word count, the overall sentiment, and the most commonly used phrases. The two speeches were similar in nature, both focusing on children and education. The audiences were very different: First Lady Melania Trump gave her farewell speech at the Republican convention in 2020, when only a limited number of delegates were allowed to attend because of the threat of Covid-19, while First Lady Michelle Obama gave her speech in 2017 to a full audience at the White House. The similarity between the two speeches is worth noting because it has been widely speculated that First Lady Melania Trump plagiarized several speeches from First Lady Michelle Obama. I hypothesize that Michelle Obama’s speech will feature phrases of hope for the youth and future of the country, and that Melania Trump’s will focus on the tasks she accomplished in the role.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tidytext)
library(textdata)
library(wordcloud2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(readr)

First, I loaded all of the packages used in the analysis.

michelle_speech <- read_delim("~/Downloads/Michelle Obama- Farewell transcript.txt", 
                                                 delim = "\t", escape_double = FALSE, 
                                                 col_names = FALSE, trim_ws = TRUE)
## Rows: 29 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): X1
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
melania_speech <- read_delim("~/Downloads/Melania Trump- Farewell Speech .txt", 
                                             delim = "\t", escape_double = FALSE, 
                                             col_names = FALSE, trim_ws = TRUE)
## Rows: 27 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): X1
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Then I read in the transcripts of Michelle Obama’s and Melania Trump’s farewell speeches, with each line of a transcript stored as one row in the column X1.

new_stop_words <- stop_words %>% 
  filter(word != "i")

To measure how often the word “I” is used in each speech, I removed “i” from the standard stop-word list and saved the result as new_stop_words, so that “I” is kept when all other stop words are filtered out.

michelle_words <- michelle_speech %>% 
  unnest_tokens(word, X1, token="words") %>% 
  anti_join(new_stop_words)
## Joining, by = "word"

I then created a table of all of the words in Michelle Obama’s speech, keeping “I” but excluding all other stop words.
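As a small illustration of this step, the sketch below runs the same kind of filtering on a made-up two-line text. The names toy_speech, keep_i_stop_words, toy_words, and toy_counts are hypothetical and not part of the analysis; the only assumption is the stop_words table that ships with tidytext. The point is that “i” survives the anti_join while ordinary stop words such as “the” are dropped.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Made-up text standing in for a transcript (one line per row, column X1)
toy_speech <- tibble(X1 = c("I believe in the students",
                            "I know the future is bright and I am proud"))

# Stop-word list with "i" taken out, so "i" is NOT removed below
keep_i_stop_words <- stop_words %>%
  filter(word != "i")

# Tokenize into lowercase words, then drop every remaining stop word
toy_words <- toy_speech %>%
  unnest_tokens(word, X1) %>%
  anti_join(keep_i_stop_words, by = "word")

# Count how often each surviving word appears
toy_counts <- toy_words %>%
  count(word, sort = TRUE)
toy_counts
```

Passing by = "word" explicitly also avoids the “Joining, by” message.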

michelle_ngrams <- michelle_speech %>%
  unnest_tokens(bigram, X1, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))

This step splits the speech into bigrams, that is, every pair of adjacent words.

michelle_filtered <- michelle_ngrams %>%
  separate(bigram, c("word1", "word2"), sep=" ")
michelle_bigram_counts <- michelle_filtered %>%
  count(word1, word2, sort = TRUE)  
michelle_bigram_counts
## # A tibble: 2,160 × 3
##    word1 word2       n
##    <chr> <chr>   <int>
##  1 young people     16
##  2 i     want       11
##  3 to    be         10
##  4 want  to         10
##  5 all   of          9
##  6 our   young       9
##  7 this  country     9
##  8 going to          8
##  9 in    the         8
## 10 of    the         8
## # … with 2,150 more rows

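I split each bigram into its two words and counted the pairs to show commonly used phrases. Michelle Obama’s most common bigram is “young people”, which appears 16 times in her farewell. She also says “I want” 11 times and “to be” 10 times. “Our young” and “this country” each appear nine times. Stop words were not removed from the bigrams, so several of the top pairs, such as “of the” and “in the”, are pairs of function words.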
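If the function-word pairs are not of interest, one option is to drop any bigram in which either word is on an exclusion list. The sketch below is only an illustration on made-up text: toy_speech and function_words are hypothetical, and the short hand-written list is an assumption chosen so the example does not depend on which words a packaged stop-word lexicon happens to contain.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(tidytext)

# Made-up text standing in for a transcript
toy_speech <- tibble(X1 = c("our young people are the future of this country",
                            "young people need to be heard"))

# Hypothetical short list of function words to exclude from bigrams
function_words <- c("to", "be", "of", "the", "in", "all", "are", "our", "this")

toy_bigram_counts <- toy_speech %>%
  unnest_tokens(bigram, X1, token = "ngrams", n = 2) %>%   # adjacent word pairs
  filter(!is.na(bigram)) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%      # one column per word
  filter(!word1 %in% function_words,                        # keep a pair only if
         !word2 %in% function_words) %>%                    # neither word is excluded
  count(word1, word2, sort = TRUE)
toy_bigram_counts
```

On this toy text only “young people” (twice) and “people need” (once) remain.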

michelle_bigram_counts %>%
  filter(word1 == "i") %>%
  count(word2, sort = TRUE)
## # A tibble: 15 × 2
##    word2          n
##    <chr>      <int>
##  1 also           1
##  2 am             1
##  3 can            1
##  4 debuted        1
##  5 end            1
##  6 especially     1
##  7 feel           1
##  8 guarantee      1
##  9 have           1
## 10 hope           1
## 11 know           1
## 12 say            1
## 13 want           1
## 14 was            1
## 15 will           1

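I looked at which words most often follow “I” by filtering for bigrams whose first word is “i”. Because this step counts the rows of the already aggregated bigram table, every n is 1: the output lists the 15 distinct words that follow “I” rather than how often each one occurs, so on its own it is inconclusive. The bigram counts above show that “I want” is the most frequent of these pairs, at 11 occurrences.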
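The difference between counting rows and summing existing counts can be seen on a tiny made-up table. In the sketch below, toy_bigram_counts is hypothetical and its numbers are illustrative only; the wt argument of dplyr’s count() sums the existing n column instead of counting rows.

```r
library(dplyr)
library(tibble)

# Made-up, already aggregated bigram table (one row per distinct pair)
toy_bigram_counts <- tibble(
  word1 = c("i", "i", "to"),
  word2 = c("want", "hope", "be"),
  n     = c(11L, 2L, 10L)
)

# Counting rows: each distinct pair appears once, so every n is 1
row_counts <- toy_bigram_counts %>%
  filter(word1 == "i") %>%
  count(word2, sort = TRUE)

# Weighting by the existing n column recovers the real frequencies
weighted_counts <- toy_bigram_counts %>%
  filter(word1 == "i") %>%
  count(word2, wt = n, sort = TRUE)

row_counts
weighted_counts
```

Counting on the unaggregated table of separated bigrams, before any count() has been applied, would give the same frequencies as the weighted version.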

melania_words <- melania_speech %>% 
  unnest_tokens(word, X1, token="words") %>% 
  anti_join(new_stop_words)
## Joining, by = "word"

I then created a table of all of the words in Melania Trump’s speech, keeping “I” but excluding all other stop words.

melania_ngrams <- melania_speech %>%
  unnest_tokens(bigram, X1, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))

This step splits Melania Trump’s speech into bigrams, every pair of adjacent words.

melania_filtered <- melania_ngrams %>%
  separate(bigram, c("word1", "word2"), sep=" ")

This step separates each of Melania Trump’s bigrams into its two words.

melania_bigram_counts <- melania_filtered %>%
  count(word1, word2, sort = TRUE)

Counting the pairs shows that Melania Trump’s most commonly used phrases include “I have”, used 8 times, and “be best”, used 5 times.

melania_filtered %>%
  filter(word1== "i") %>%
  count(word2, sort = TRUE)
## # A tibble: 13 × 2
##    word2         n
##    <chr>     <int>
##  1 have          8
##  2 ask           3
##  3 think         3
##  4 am            1
##  5 came          1
##  6 conclude      1
##  7 launched      1
##  8 reflected     1
##  9 remember      1
## 10 say           1
## 11 see           1
## 12 thank         1
## 13 treasure      1

I looked at which words most often follow “I” by filtering for bigrams whose first word is “i” and sorting by count. Melania uses the word “have” 8 times after “I”, which is consistent with the speech being focused on her completed tasks as First Lady.

michelle_words_freq <- michelle_words %>% 
  group_by(word) %>% 
  summarise(cnt = n()) %>% 
  mutate(freq = round(cnt / sum(cnt), 3)) %>% 
  arrange(desc(freq)) %>% 
  head(7) %>% 
  ggplot(aes(freq, reorder(word, freq))) + geom_col() + ggtitle("Michelle") +  xlim(0, 0.06)
michelle_words_freq 

Next, I determined the 7 most frequently used words in Michelle’s speech and plotted their relative frequencies in a graph titled “Michelle”. To do this, I grouped her words, computed each word’s share of the words remaining after stop-word removal, sorted from most to least frequent, and plotted the top 7 with ggplot. I set the x-axis limit to 0.06 to show how much the word “I” was used.

melania_words_freq <- melania_words %>% 
  group_by(word) %>% 
  summarise(cnt = n()) %>% 
  mutate(freq = round(cnt / sum(cnt), 3)) %>% 
  arrange(desc(freq)) %>% 
  head(7) %>% 
  ggplot(aes(freq, reorder(word, freq))) + geom_col() + ggtitle("Melania") +  xlim(0, 0.07)
melania_words_freq 

Next, I determined the 7 most frequently used words in Melania’s speech and plotted their relative frequencies in a graph titled “Melania”, using the same steps as for Michelle. Melania used the word “I” more than Michelle, so I extended the x-axis limit to 0.07.

grid.arrange(michelle_words_freq, melania_words_freq, ncol=2, nrow=1)

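I then arranged the two plots side by side so they can be compared. Note that the two x axes have different limits (0.06 and 0.07), so bar lengths are not directly comparable across the panels.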
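Since the two plots above use different x limits (0.06 and 0.07), one alternative is to stack both frequency tables and facet by speaker, which gives both panels the same x axis. The sketch below is only an illustration: toy_freq and its numbers are made up, not results from the speeches.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Made-up frequencies for illustration only (not values from the analysis)
toy_freq <- bind_rows(
  tibble(speaker = "Michelle", word = c("i", "people", "hope"),
         freq = c(0.05, 0.03, 0.02)),
  tibble(speaker = "Melania", word = c("i", "children", "best"),
         freq = c(0.07, 0.03, 0.02))
)

# One panel per speaker; only the y axis is freed, so the x axis is shared
shared_axis_plot <- ggplot(toy_freq, aes(freq, reorder(word, freq))) +
  geom_col() +
  facet_wrap(~ speaker, scales = "free_y") +
  labs(x = "Relative frequency", y = NULL)
shared_axis_plot
```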

michelle_words %>% 
  count(word, sort=TRUE) %>% 
  wordcloud2()

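Next, I looked at the overall tone of each speech, using a word cloud to show the overarching message. Michelle’s speech reads as positive and mentions “I”, students, education, and hope very often. This is a visual reading of the word cloud rather than a lexicon-based sentiment score.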

melania_words %>% 
  count(word, sort=TRUE) %>% 
  wordcloud2()

I did the same for Melania’s speech. Her speech also reads as positive and mentions children, “I”, and Americans very often. Her word cloud is substantially smaller than Michelle’s. To determine why, I calculated the word count of each speech.

michelle_speech %>%
  unnest_tokens(word, X1) %>%
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  2716

Michelle’s speech has a total of 2,716 words. I used unnest_tokens to split each line of text into individual words and then counted the rows.

melania_speech %>%
  unnest_tokens(word, X1) %>%
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1   855

Melania’s speech is 855 words in total, counted the same way with unnest_tokens. Her lower word count explains the smaller word cloud.
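The sentiment of the speeches was judged above from the word clouds. A lexicon-based tally is one way such a reading could be checked. The sketch below is a minimal illustration on made-up text, assuming the Bing lexicon available through tidytext’s get_sentiments("bing"); toy_speech and toy_sentiment are hypothetical names, and this was not run on the actual transcripts.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Made-up text for illustration only
toy_speech <- tibble(X1 = c("a good and wonderful day",
                            "a bad and terrible loss"))

# Keep only words found in the Bing lexicon, then tally by sentiment label
toy_sentiment <- toy_speech %>%
  unnest_tokens(word, X1) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)
toy_sentiment
```

Applied to a real transcript, the balance of positive and negative counts would give a rough, lexicon-dependent measure of tone.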

Conclusion: First Ladies Melania Trump and Michelle Obama’s speeches both conveyed positive sentiments. Obama’s speech focused on the future, education, college, hope, and people. This aligns with my hypothesis about Obama projecting her hope for the future and education. Melania Trump’s speech features the word “I” a considerable number of times and mentions children very often. Michelle Obama’s speech was 2,716 words compared to Melania’s 855-word speech. Melania’s shorter speech does not produce a word cloud of the same size or as many commonly used phrases because there is less text to analyze. Melania used the word “I” 24 times in her 855-word speech and Michelle used the word “I” 31 times in her 2,716-word speech. I also explored the most commonly used word after “I”. For Michelle, the follow-up table was inconclusive because every count in it was 1, although the bigram counts show “I want” 11 times. Melania most commonly used the word “have” after “I”, which aligns with the hypothesis that her speech focused on the tasks she accomplished. The most commonly used phrase by Michelle Obama was “young people”, and Melania’s were “I have”, used 8 times, and “be best”, used 5 times. Overall, my hypothesis aligned with the results of the analysis.
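Because the two speeches differ so much in length, the raw counts of “I” reported above are easier to compare as a rate per 100 words. The short calculation below uses only the four numbers already stated (24 and 31 uses of “I”; 855 and 2,716 words); the variable names are hypothetical.

```r
# Counts of "I" and total word counts as reported in the analysis
i_counts    <- c(melania = 24, michelle = 31)
word_totals <- c(melania = 855, michelle = 2716)

# Uses of "I" per 100 words, rounded to two decimals
i_per_100 <- round(100 * i_counts / word_totals, 2)
i_per_100
# melania: 2.81, michelle: 1.14
```

On this basis Melania’s speech uses “I” roughly two and a half times as often per word as Michelle’s.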