Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 11199 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): The Project Gutenberg eBook of Mussolini as revealed in his politic...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 6941 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): {\rtf1\ansi\ansicpg1252\cocoartf2578
Rows: 4152 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): SPEECH 1
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ purrr 1.0.2
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 11200 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): X1
Rows: 6942 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): X1
Rows: 4153 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): X1
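The parsing warnings above appear because the speech files are free text being read as comma-delimited data, so any line containing a comma looks like it has extra columns. One column name, `{\rtf1\ansi\ansicpg1252\cocoartf2578`, also suggests that one file may be RTF rather than plain text, which would add formatting codes to the tokens. A minimal sketch of an alternative, assuming plain-text input, is to read whole lines with `readr::read_lines()` and split them into words afterwards. The file contents and the names `speech_file` and `speech_words` below are hypothetical stand-ins, not the files used in this report.

```r
library(readr)
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for one speech file; the real file names are not shown.
speech_file <- tempfile(fileext = ".txt")
writeLines(
  c("SPEECH 1",
    "We will win, and we will win again.",
    "People love our country."),
  speech_file
)

# read_lines() returns one string per line and never splits on commas,
# so the CSV parser (and its warnings) is not involved at all.
speech_words <- tibble(text = read_lines(speech_file)) |>
  mutate(word = str_extract_all(str_to_lower(text), "[a-z0-9']+")) |>
  unnest(word) |>
  select(word)

# One row per word, ready for counting.
count(speech_words, word, sort = TRUE)
```

The result has one row per word, the same shape as the `word` column in the tibble printed below.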
# A tibble: 20,804 × 3
# Groups: Speaker [3]
Speaker word n
<chr> <chr> <int>
1 Hitler 1 1
2 Hitler 1,000 1
3 Hitler 1,042 1
4 Hitler 10 1
5 Hitler 10,000 1
6 Hitler 10,000,000 1
7 Hitler 10,572 1
8 Hitler 100 1
9 Hitler 104 1
10 Hitler 10th 1
# ℹ 20,794 more rows
We can look at the most common negative and positive words used by each man and visualize them through a word cloud to see each word’s popularity.
Among these words, Hitler used “won”, “win”, and “love” most often throughout his speeches.
The word clouds show notable similarities among the three men’s most frequent negative and positive words. Trump often used “bad” and its synonyms, such as “terrible” and “horrible.” Mussolini focused more on death and suffering, with “dead”, “die”, and “destruction” appearing most often. Hitler frequently said “crisis” and “lost”, perhaps to stoke fear among his listeners about the state of Germany. The most frequent positive words are also similar: all three men said “love” and “win” most often. I infer that each leader wanted to build trust and a sense of connection with his followers, and to convince them that he would lead their country to victory over its enemies.
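The ranking behind word clouds like these can be sketched as a join between word counts and a sentiment lexicon, followed by picking the top words per speaker and sentiment. The speakers, counts, and the small lexicon below are invented purely for illustration; the report's actual lexicon and plotting code are not shown, so this is an assumption about the general approach and not the author's code.

```r
library(dplyr)

# Invented toy counts for two placeholder speakers (not the report's data).
word_counts <- tibble(
  Speaker = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  word    = c("bad", "terrible", "love", "win", "people",
              "dead", "die", "love", "win", "nation"),
  n       = c(5, 3, 4, 2, 9, 6, 2, 3, 7, 8)
)

# Hypothetical mini-lexicon standing in for a real sentiment lexicon.
lexicon <- tibble(
  word      = c("bad", "terrible", "dead", "die", "love", "win"),
  sentiment = c("negative", "negative", "negative", "negative",
                "positive", "positive")
)

# Keep only words found in the lexicon, then take the single most
# frequent word for each speaker and sentiment.
top_words <- word_counts |>
  inner_join(lexicon, by = "word") |>
  group_by(Speaker, sentiment) |>
  slice_max(n, n = 1, with_ties = FALSE) |>
  ungroup()

top_words
```

Words that are missing from the lexicon, such as “people” here, drop out at the join, which is why only sentiment-bearing words appear in the clouds.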
I expect each leader’s average sentiment across his speeches to be negative, so here I check whether that is correct.
Trump’s mean sentiment is -0.3678977, Mussolini’s is -0.2750846, and Hitler’s is -0.3725. On average, each leader’s speeches carried negative sentiment, and the scores are relatively close. Hitler has the most negative mean sentiment, followed closely by Trump, and then Mussolini.
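A per-speaker mean like the ones reported above can be computed by scoring each matched word and averaging within each speaker. The sketch below assumes every lexicon word is scored +1 or -1; the report does not say which lexicon or scoring scheme produced its figures, so the data, the `score` column, and the lexicon here are hypothetical.

```r
library(dplyr)

# Invented toy words for two placeholder speakers (not the report's data).
words <- tibble(
  Speaker = c("A", "A", "A", "A", "B", "B"),
  word    = c("bad", "love", "terrible", "people", "win", "dead")
)

# Hypothetical lexicon: +1 for positive words, -1 for negative words.
lexicon <- tibble(
  word  = c("bad", "terrible", "dead", "love", "win"),
  score = c(-1, -1, -1, 1, 1)
)

# Words without a lexicon entry are dropped by the join, so the mean
# is taken over sentiment-bearing words only.
mean_sentiment <- words |>
  inner_join(lexicon, by = "word") |>
  group_by(Speaker) |>
  summarize(mean_sentiment = mean(score))

mean_sentiment
```

Under this scoring the mean always lies between -1 and 1, and a negative value means negative words outnumber positive ones.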
To find each man’s most frequent words overall, I need to remove apostrophes and filter out filler words that carry no meaningful content, such as “im” and “its”, to see what each man liked to emphasize to his followers.
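The cleaning step described above can be sketched with `stringr` and `dplyr`: strip straight and curly apostrophes, then drop a list of filler words before counting. The toy words and the `custom_stop` list are hypothetical; the report's full stop-word list is not shown.

```r
library(dplyr)
library(stringr)

# Invented toy tokens, including straight and curly apostrophes.
words <- tibble(
  word = c("i'm", "it\u2019s", "people", "country", "its", "people")
)

# Hypothetical filler-word list; "im" and "its" come from the text above.
custom_stop <- c("im", "its")

cleaned <- words |>
  # Remove both ' and the curly apostrophe so "i'm" becomes "im".
  mutate(word = str_remove_all(word, "['\u2019]")) |>
  # Drop the filler words, then count what remains.
  filter(!word %in% custom_stop) |>
  count(word, sort = TRUE)

cleaned
```

Removing apostrophes first matters: otherwise “it’s” and “its” would be counted as different words, and only one of them would match the filler list.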
Trump said “people” far more than any other word in his speeches. From observing him over the last eight years, he uses his speeches to talk about other people, to criticize particular groups of people, and to speak directly to his audiences. As a presidential candidate and president, it also makes sense that he would often say “country” when discussing the United States and its relationships with other nations. Trump also enjoys talking about money, whether the nation’s economic state or his own finances, mostly the latter.
Mussolini’s most frequent words include “Italy,” “Italian,” and “war,” which all make sense: he was the leader of Italy, he addressed the Italian people in most of his speeches, and he led the country during World War II, so his speeches would typically concern the state of the war.
Hitler’s most frequent words included “people,” “German,” and “Germany.” Like Trump, Hitler used “people” both to talk about and blame enemy groups and to rally the people of Germany to believe in him. Like Mussolini, Hitler said “German” and “Germany” so often because he was the leader of Germany and was mostly speaking to or about the German people.
For the last portion of my study, I compare the average length of the words in each leader’s speeches, because I hypothesize that the words would be relatively short in order to appeal to a larger and less-educated crowd.
# Add each word's length in characters
BadMen2_length <- BadMen2 |>
  mutate(Length = str_length(word))

# Average the word lengths per speaker and plot them as a bar chart
BadMen2_length |>
  group_by(Speaker) |>
  summarize(average = mean(Length)) |>
  ggplot(aes(Speaker, average, fill = Speaker)) +
  geom_col() +
  labs(x = "Speaker", y = "Average Word Length by Letter", title = "Average word length in Speakers' Speeches")
Hitler’s average word length was the longest of the three leaders and Trump’s was the shortest, although the three averages are relatively close. Trump uses simple words like “good” and “bad” to appeal to his target audiences.
This report supports my hypothesis: the speeches of Trump, Mussolini, and Hitler contained generally negative sentiment and relied on short, common words.