Nazi Germany is a symbol of political propaganda and incitement. After the Nazis came to power in 1933, Hitler established the Reich Ministry of Public Enlightenment and Propaganda, with Joseph Goebbels as its head. The Ministry’s goal was to effectively deliver the Nazi message to the public through various mediums, including art, theater, music, film, radio, books, and the press. Through this, the Nazis aimed to spread nationalism and anti-Semitism within Germany and maintain the Nazi regime.
This report analyzes the speeches of Hitler and Goebbels, who led Nazi Germany’s propaganda efforts. The two men promoted Nazi ideology with different language and rhetorical styles. The analysis therefore explores how they mobilized the masses through language and identifies both the similarities and the differences between their approaches.
As a result of Nazi propaganda, many Germans enthusiastically supported the imperialistic wars of the Nazis and participated in the nationalist and anti-Semitic policies. This led to the outbreak of World War II and the horrific tragedy of the Holocaust. In order to prevent such historical tragedies from recurring, this report seeks to understand the structure of propaganda discourse by analyzing texts used in political incitement. Additionally, by analyzing the characteristics of hate speech, the report aims to raise awareness about the dangers of hate speech in political contexts.
Specifically, the report will first identify the key terms in the speeches through TF-IDF analysis. It will then use log odds ratio analysis to determine which words are most characteristic of each speaker’s discourse. Next, sentiment analysis based on the NRC Emotion Lexicon will examine the emotional connotations of the words used in the speeches. Finally, bigram analysis will identify frequently occurring word pairs, revealing the combinations and contexts in which key terms appear. Together, these steps allow a systematic comparison of the language and expressions used by the two figures.
The main questions are as follows:
- How do word frequency and usage patterns differ between Hitler’s and Goebbels’ speeches?
- What are the differences in emotional expression between Hitler’s and Goebbels’ speeches?
- Which themes are emphasized in Hitler’s and Goebbels’ speeches, respectively?
Expected Analysis Results: Given that Hitler mobilized the masses using emotional and persuasive language, he is expected to have used more blatant and extreme language (e.g., extermination, Jewish, must, fight). On the other hand, as Goebbels was a propaganda strategist, he is expected to have used more indirect and logical language, subtly embedding the ideology (e.g., Volk, duty, honor, responsibility).
The analysis revealed several significant differences between the speeches of Hitler and Goebbels. First, the TF-IDF graph showed that Hitler frequently used words that justified the war and his ideology, whereas Goebbels often used emotional language associated with empathy and solidarity.
The log odds ratio analysis, conducted to identify more characteristic words for each speaker, yielded similar results. Hitler focused primarily on the importance and legitimacy of war, frequently mentioning specific war regions and enemies. On the other hand, Goebbels emphasized the duty of being German, focusing on community, morality, and responsibility.
The NRC sentiment analysis showed which emotional words were most prevalent in each speech. The analysis revealed that both Hitler and Goebbels had the highest proportions of fear and trust in their speeches. However, Hitler showed higher proportions of anger and anticipation, while Goebbels had higher proportions of joy, disgust, and surprise. In other words, while both figures used words related to fear and trust, Hitler emphasized anger and hostility more, while Goebbels utilized a broader emotional spectrum.
Lastly, the bigram network analysis, which sought to identify significant word combinations, revealed that Hitler frequently used nation-centric combinations built around ‘German’, such as ‘German people’ and ‘German Reich’, together with ‘national socialism’, ‘world war’, and ‘British government’, emphasizing nationalism and militarism. In contrast, Goebbels, while also emphasizing the nation and ethnicity, frequently used anti-Semitic terms such as ‘Jewish star’ and ‘international Jewry’ and highlighted community and morality through combinations like ‘public life’ and ‘political morality’.
In summary, Hitler mainly used strategic language to emphasize the nation and ethnicity, justifying the war. This result slightly deviated from the initial expectation, as Hitler predominantly used systematic and strategic language rather than emotional language. On the other hand, Goebbels, who was expected to use more logical and systematic language along with moral duty, ended up using more emotional and abstract language.
This analysis provides valuable insights into the structure of propaganda discourse by examining the speeches of Hitler and Goebbels, who were central to Nazi Germany’s propaganda. It demonstrates that political speeches can mobilize the masses not only through content but also by using emotions, structure, and strategic word choices. Furthermore, it shows how propaganda operates through different strategies and contributes to a critical understanding of the role and influence of political language.
This analysis uses the speeches of Hitler and Goebbels as its dataset. The data was collected from the Internet Archive, a US-based digital library that hosts English translations of speeches by both men. A total of 20 speeches were collected, 10 from each speaker. Hitler’s speeches range from 1921 to the 1940s, while Goebbels’ speeches span 1933 to 1944, the period in which he served as Reich Minister of Propaganda. Each speaker’s speeches were spread across these periods (Hitler: 3 from the 1920s, 4 from the 1930s, and 3 from the 1940s; Goebbels: 5 from the 1930s and 5 from the 1940s).
The dataset contains four columns: speaker, year, title, and text. The speaker column contains the name of the speaker, year represents the year the speech was given, title is the title of the speech, and text includes the full transcript of the speech.
First, the hitler_goebbels_speeches.csv dataset is loaded into the ‘data’ variable. Next, the speech texts are split into words, and stop words and purely numeric tokens are removed. The resulting data is stored in a variable called word_data.
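# Load the packages used throughout the analysis
library(tidyverse)  # dplyr, tidyr, stringr, ggplot2
library(tidytext)   # unnest_tokens(), stop_words, bind_tf_idf(), get_sentiments()
library(igraph)     # graph_from_data_frame()
library(ggraph)     # network plots
# data = unprocessed speech data (columns: speaker, year, title, text);
# if umlauts come out garbled, check the file's encoding via read.csv(fileEncoding = ...)
data <- read.csv("hitler_goebbels_speeches.csv")
# Tokenize the 'text' column into individual words, then remove stop words
# and tokens that consist only of digits
word_data <- data %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!str_detect(word, "^\\d+$"))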
# Processing data for TF-IDF analysis
# Count how often each speaker used each word
frequency <- word_data %>%
  count(speaker, word, sort = TRUE)
# Compute the TF-IDF score of each word, treating each speaker's pooled speeches as one document
tfidf <- frequency %>%
  bind_tf_idf(term = word, document = speaker, n = n) %>%
  arrange(desc(tf_idf))
# Extract the top 10 TF-IDF words for each speaker
tfidf_top10 <- tfidf %>%
  group_by(speaker) %>%
  slice_max(tf_idf, n = 10, with_ties = FALSE) %>%
  ungroup()
# Plot each speaker's top words as a horizontal bar chart
ggplot(tfidf_top10, aes(x = reorder_within(word, tf_idf, speaker),
                        y = tf_idf,
                        fill = speaker)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~ speaker, scales = "free", ncol = 2) +
  scale_x_reordered() +
  scale_fill_manual(values = c("Hitler" = "red3", "Goebbels" = "steelblue")) +
  labs(x = NULL, y = "TF-IDF", title = "Top 10 TF-IDF Words: Hitler vs. Goebbels")
Top 10 TF-IDF Words
This graph shows the ten words most distinctive of each speaker according to TF-IDF. TF-IDF assigns high scores to words that a speaker uses frequently and lower scores to words that occur in both speakers’ speeches. It therefore goes beyond raw word frequency and highlights characteristic, meaningful words, which makes it well suited to detecting differences in the themes of the speeches; this is why it was chosen. The bar graph lets readers compare TF-IDF values intuitively through the length of the bars. Horizontal bars place the words along the vertical axis, which keeps them readable, and separate panels in contrasting colors set the two speakers apart visually.
The analysis revealed clear linguistic differences between Hitler and Goebbels. In Goebbels’ speeches, the words with the highest TF-IDF values were radio, hrer, responsibility, pain, morality, burdens, people’s, overcome, actions, and germany. The token hrer is a fragment left by an encoding error that dropped an umlaut; it is read here as Hörer (‘listener’), although it may also derive from Führer. The presence of radio, and of hrer on the Hörer reading, points to his focus on propaganda via radio. Words like responsibility, pain, burdens, and overcome relate to discussing suffering and calling for it to be overcome, while morality, people’s, and actions emphasize moral justification and calls for action. These findings suggest that Goebbels often used emotional language to evoke empathy and solidarity. In contrast, Hitler frequently used words like greece, greek, yugoslavia, balkans, churchill, officers, distress, aim, conception, and marxism, which point to his primary concerns with war, enemies, and ideology. Greece, yugoslavia, and balkans are locations tied to a major front of Germany’s military invasions (the 1941 Balkan campaign), and churchill, officers, and distress are likewise related to war. Aim, conception, and marxism carry ideological and philosophical connotations. Thus, Hitler often used language that justified the war and his ideology. In summary, Hitler used broad, macro-level language as a war leader, while Goebbels used more targeted, micro-level language as a political strategist rallying the public.
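To make the scoring concrete, the following base R sketch applies the tf-idf formula that tidytext’s bind_tf_idf() uses (tf = a word’s count divided by the total word count of its document; idf = natural log of the number of documents divided by the number of documents containing the word) to a handful of hypothetical counts, not counts from the speeches. Because document = speaker pools each man’s ten speeches into one document, there are only two documents, so any word used by both speakers receives idf = ln(2/2) = 0. In this setup TF-IDF therefore ranks words that only one of the two speakers used, weighted by how often he used them.

```r
# Minimal sketch of the tf-idf arithmetic (not tidytext's own code).
# The counts below are hypothetical and only illustrate the formula.
counts <- data.frame(
  speaker = c("Hitler", "Hitler", "Hitler", "Goebbels", "Goebbels"),
  word    = c("war", "greece", "people", "people", "radio"),
  n       = c(4, 2, 4, 6, 4)
)

# Term frequency: share of each word within its speaker's total word count
counts$tf <- counts$n / ave(counts$n, counts$speaker, FUN = sum)

# Inverse document frequency: each speaker is one document, so N = 2
n_docs <- length(unique(counts$speaker))
docs_with_word <- tapply(counts$speaker, counts$word,
                         function(s) length(unique(s)))
counts$idf <- log(n_docs / as.numeric(docs_with_word[as.character(counts$word)]))

# tf-idf: words shared by both speakers get idf = log(1) = 0
counts$tf_idf <- counts$tf * counts$idf
print(counts[order(-counts$tf_idf), ])
```

Here people scores zero for both speakers despite being frequent, while war and radio tie at 0.4 × ln 2 ≈ 0.277.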
# Processing data for log odds ratio analysis
# Pivot to one column per speaker, then compare add-one-smoothed relative frequencies
logodds_wide <- frequency %>%
  pivot_wider(names_from = speaker, values_from = n, values_fill = 0) %>%
  mutate(ratio_hitler = (Hitler + 1) / sum(Hitler + 1),
         ratio_goebbels = (Goebbels + 1) / sum(Goebbels + 1),
         # Positive values lean towards Hitler, negative values towards Goebbels
         log_odds_ratio = log(ratio_hitler / ratio_goebbels))
# Select the top 10 distinctive words per speaker by absolute log odds ratio
logodds_top10 <- logodds_wide %>%
  group_by(speaker = ifelse(log_odds_ratio > 0, "Hitler", "Goebbels")) %>%
  slice_max(abs(log_odds_ratio), n = 10, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(speaker = factor(speaker, levels = c("Goebbels", "Hitler")))
# Visualize the most distinctive words as a horizontal bar chart
ggplot(logodds_top10, aes(x = reorder(word, log_odds_ratio),
                          y = log_odds_ratio, fill = speaker)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~ speaker, scales = "free") +
  scale_fill_manual(values = c("Hitler" = "red3", "Goebbels" = "steelblue")) +
  labs(title = "Log Odds Ratio: Most Distinctive Words", x = NULL, y = "Log Odds Ratio")
Log Odds Ratio: Most Distinctive Words
In this section, log odds ratio analysis was used to identify which words each speaker used more often than the other. Whereas TF-IDF highlights the words characteristic of each speaker, the log odds ratio compares the two speakers’ word usage directly, giving a clearer picture of the messages each conveyed. The results were again visualized as horizontal bar graphs, in which bar length shows how strongly a word leans toward one speaker and the words, placed along the vertical axis, do not overlap. Separate panels were created for each figure to highlight the linguistic differences between them, and the colors reflect each speaker’s rhetorical style: red3 for Hitler’s aggressive speeches and steelblue for Goebbels’ strategic ones. Through this analysis, we aimed to gain a deeper understanding of the words that set each speaker apart.
The analysis showed that the most distinctive words in Goebbels’ speeches were responsibility, overcome, english, hrer (an umlaut-stripped fragment, likely of Hörer or Führer), radio, people’s, events, burdens, pain, and morality, closely matching the TF-IDF results. Goebbels appears to have used these words to emphasize community, morality, and a sense of responsibility to the German people. In contrast, the most distinctive words in Hitler’s speeches were greece, churchill, officers, distress, aim, conception, british, yugoslavia, greek, and balkans, which also aligned closely with the TF-IDF results. Hitler appears to have used these terms to focus on war zones and specific enemies, emphasizing the importance and legitimacy of the war to the German people.
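The smoothing and sign convention in the log odds ratio code can be checked on toy numbers. The base R sketch below repeats the same arithmetic on hypothetical word counts (not taken from the speeches). One is added to every count so that a word absent from one speaker does not produce log(0), the smoothed counts are turned into relative frequencies, and the log of their ratio is positive for words leaning toward Hitler and negative for words leaning toward Goebbels. Strictly speaking, this is a log ratio of smoothed relative frequencies, which is what the code above computes under the name log_odds_ratio.

```r
# Hypothetical word counts per speaker (illustration only, equal totals of 35)
freq <- data.frame(
  word     = c("greece", "churchill", "responsibility", "radio", "people"),
  Hitler   = c(9, 5, 0, 1, 20),
  Goebbels = c(0, 2, 9, 4, 20)
)

# Add-one smoothing, then convert to relative frequencies per speaker
ratio_hitler   <- (freq$Hitler + 1) / sum(freq$Hitler + 1)
ratio_goebbels <- (freq$Goebbels + 1) / sum(freq$Goebbels + 1)

# Log of the ratio: > 0 leans to Hitler, < 0 leans to Goebbels
freq$log_odds_ratio <- log(ratio_hitler / ratio_goebbels)
freq$leans_to <- ifelse(freq$log_odds_ratio > 0, "Hitler",
                        ifelse(freq$log_odds_ratio < 0, "Goebbels", "neither"))

print(freq[order(-abs(freq$log_odds_ratio)), ])
```

With equal totals, greece and responsibility land at ±ln 10 ≈ ±2.30, while people, used equally often by both, scores exactly 0.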
# Processing data for NRC analysis
# Load the NRC lexicon, keeping its eight emotions and dropping the positive/negative labels
nrc <- get_sentiments("nrc") %>%
  filter(!sentiment %in% c("positive", "negative"))
# Join word tokens with NRC emotions and count them per speaker and emotion.
# The join is many-to-many by design: words repeat in the speeches, and
# one NRC word can carry several emotions.
sentiment_count <- word_data %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  count(speaker, sentiment, sort = TRUE)
NRC donut graph: Goebbels
# Extract sentiment data only for Goebbels
sentiment_goebbels <- sentiment_count %>% filter(speaker == "Goebbels")
# Donut chart: a single stacked bar in polar coordinates; xlim() leaves the center empty
ggplot(sentiment_goebbels, aes(x = 2, y = n, fill = sentiment)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = paste0(round(n / sum(n) * 100, 1), "%")),
            position = position_stack(vjust = 0.5), size = 4) +
  xlim(0.5, 2.5) +
  labs(title = "Sentiment Distribution in Goebbels's Speeches") +
  # Keep theme_void() as the last complete theme; a later theme_light() would override it
  theme_void() +
  theme(legend.title = element_blank())
NRC donut graph: Hitler
# Extract sentiment data only for Hitler
sentiment_hitler <- sentiment_count %>%
  filter(speaker == "Hitler")
# Donut chart: a single stacked bar in polar coordinates; xlim() leaves the center empty
ggplot(sentiment_hitler, aes(x = 2, y = n, fill = sentiment)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = paste0(round(n / sum(n) * 100, 1), "%")),
            position = position_stack(vjust = 0.5), size = 4) +
  xlim(0.5, 2.5) +
  labs(title = "Sentiment Distribution in Hitler's Speeches") +
  # Keep theme_void() as the last complete theme; a later theme_light() would override it
  theme_void() +
  theme(legend.title = element_blank())
Sentiment Distribution Using Donut Charts
In this section, the sentiment distribution in Hitler’s and Goebbels’ speeches was visualized with donut charts. Words were first matched to emotion categories using the NRC Emotion Lexicon, and the proportion of each emotion was then compared between the two speakers. To allow a more nuanced emotional analysis, the lexicon’s general polarity labels, positive and negative, were excluded, leaving its eight emotions. The donut chart, a variation of the pie chart, was chosen because it clearly shows the share of the whole occupied by each category, which makes the two figures’ emotional profiles easy to compare at a glance.
Each emotion is shown in a different color so that the categories are easy to tell apart. Percentage labels are placed at the middle of each segment (position_stack(vjust = 0.5)) so that each emotion’s share can be read quickly, and the label size was set to size = 4 to limit overlap. theme_void() was used to strip the axes, gridlines, and background, keeping the viewer’s focus on the chart itself. Drawing the bar at x = 2 and setting xlim(0.5, 2.5) leaves an empty center, which creates the ring shape and reduces visual clutter.
The analysis revealed that fear was the dominant emotion in both Hitler’s and Goebbels’ speeches, followed by trust, suggesting that both figures sought to evoke a combination of fear and trust in their audiences. Goebbels showed higher proportions of joy, disgust, and surprise than Hitler, indicating that his speeches drew on a broader emotional spectrum; his share of joy, 2.0 percentage points higher than Hitler’s, suggests that he often coupled fear with positive messages. Hitler’s speeches, by contrast, contained higher proportions of anger and anticipation, emphasizing aggression and tension. Notably, both figures displayed relatively high levels of sadness, each reaching double-digit percentages, which suggests that both used narratives of “national suffering and sacrifice” to evoke public sympathy. In summary, while both Hitler and Goebbels emphasized fear and trust, Hitler leaned more on anger and hostility, whereas Goebbels used a more layered emotional strategy, incorporating joy and disgust alongside fear.
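The counting behind the donut charts can be sketched in base R. The tokens and the five-entry lexicon below are hypothetical stand-ins laid out like the NRC lexicon (one row per word-emotion pair); they are not real NRC entries or words from the speeches. Joining them shows why the join is many-to-many, and converting counts to row percentages shows that differences between the speakers’ shares are measured in percentage points.

```r
# Hypothetical tokens (after stop-word removal) for two speakers
tokens <- data.frame(
  speaker = c("Hitler", "Hitler", "Hitler", "Hitler",
              "Goebbels", "Goebbels", "Goebbels", "Goebbels"),
  word    = c("war", "war", "enemy", "loyal",
              "war", "loyal", "loyal", "celebrate")
)

# Hypothetical mini-lexicon in the NRC layout: one row per word-emotion pair,
# so a single word such as "war" can carry several emotions
lexicon <- data.frame(
  word      = c("war", "war", "enemy", "loyal", "celebrate"),
  sentiment = c("fear", "anger", "anger", "trust", "joy")
)

# Many-to-many join: repeated tokens x multi-emotion words
matched <- merge(tokens, lexicon, by = "word")

# Count emotion words per speaker, then convert each row to percentages
emotion_counts <- table(matched$speaker, matched$sentiment)
emotion_pct <- prop.table(emotion_counts, margin = 1) * 100
print(round(emotion_pct, 1))

# A difference between two shares is a gap in percentage points
trust_gap_pp <- emotion_pct["Goebbels", "trust"] - emotion_pct["Hitler", "trust"]
print(round(trust_gap_pp, 1))
```

In this toy case the trust share is 40% for Goebbels and about 16.7% for Hitler, a gap of about 23.3 percentage points.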
Bigram Network Analysis
# Processing data for bigram analysis
# Tokenize the text into bigrams (two-word phrases) and split each into two columns
bigrams <- data %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word,
         !is.na(word1), !is.na(word2))
# Bigram network for Goebbels's speeches: drop pairs containing digits, then count them
goebbels_bigram_counts <- bigrams %>%
  filter(speaker == "Goebbels") %>%
  filter(!str_detect(word1, "\\d"), !str_detect(word2, "\\d")) %>%
  count(word1, word2, sort = TRUE)
# Keep bigrams that occur at least three times (n > 2) and convert them to a graph
goebbels_bigram_graph <- goebbels_bigram_counts %>%
  filter(n > 2) %>%
  graph_from_data_frame()
goebbels_bigram_graph
## IGRAPH 34f16d7 DN-- 48 34 --
## + attr: name (v/c), n (e/n)
## + edges from 34f16d7 (vertex names):
## [1] german ->people national ->socialist
## [3] german ->radio air ->terror
## [5] entire ->german german ->nation
## [7] national ->socialism historical ->significance
## [9] socialist ->revolution war ->effort
## [11] broad ->masses jewish ->star
## [13] middle ->class public ->life
## [15] historical ->mission international->jewry
## + ... omitted several edges
# Visualize the bigram network for Goebbels (Fruchterman-Reingold force-directed layout)
set.seed(123)
ggraph(goebbels_bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = grid::arrow(length = unit(0.15, "inches"), type = "closed"),
                 end_cap = circle(0.07, "inches")) +
  geom_node_point(color = "steelblue", size = 4) +
  geom_node_text(aes(label = name), repel = TRUE, size = 4) +
  ggtitle("Bigram Network: Goebbels's Speeches") +
  # theme_void() last, so no later complete theme re-adds axes and gridlines
  theme_void()
# Bigram network for Hitler's speeches: drop pairs containing digits, then count them
hitler_bigram_counts <- bigrams %>%
  filter(speaker == "Hitler") %>%
  filter(!str_detect(word1, "\\d"), !str_detect(word2, "\\d")) %>%
  count(word1, word2, sort = TRUE)
# Keep bigrams that occur at least three times (n > 2) and convert them to a graph
hitler_bigram_graph <- hitler_bigram_counts %>%
  filter(n > 2) %>%
  graph_from_data_frame()
hitler_bigram_graph
## IGRAPH 36a651e DN-- 67 48 --
## + attr: name (v/c), n (e/n)
## + edges from 36a651e (vertex names):
## [1] german ->people national ->socialist
## [3] german ->reich german ->nation
## [5] air ->force german ->army
## [7] national ->socialism world ->war
## [9] economic ->life german ->government
## [11] noncommissioned->officers armed ->forces
## [13] british ->foreign british ->government
## [15] british ->people friendly ->relations
## + ... omitted several edges
# Visualize the bigram network for Hitler (Fruchterman-Reingold force-directed layout)
set.seed(123)
ggraph(hitler_bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = grid::arrow(length = unit(0.15, "inches"), type = "closed"),
                 end_cap = circle(0.07, "inches")) +
  geom_node_point(color = "red3", size = 4) +
  geom_node_text(aes(label = name), repel = TRUE, size = 4) +
  ggtitle("Bigram Network: Hitler's Speeches") +
  # theme_void() last, so no later complete theme re-adds axes and gridlines
  theme_void()
Bigram Analysis and Network Visualization
In this section, bigrams (pairs of consecutive words) were extracted from Hitler’s and Goebbels’ speeches and visualized as network graphs. Each network was built from the bigrams that appeared at least three times (n > 2) in that speaker’s speeches, after removing bigrams that contain a stop word or a digit. Bigram graphs are effective for identifying recurring expressions and show how words connect to form larger narratives, which helped reveal structural differences in the rhetorical patterns of the two orators.
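The extraction and threshold steps can be illustrated with a base R sketch on one made-up sentence. It pairs each token with the next one, drops pairs that contain a stop word, counts the ordered pairs (word1 to word2), and keeps those with n > 2. The sentence and the three-word stop list are hypothetical stand-ins for the speeches and tidytext’s stop_words, and the code is a simplified analogue of unnest_tokens(token = "ngrams", n = 2) plus separate(), not their implementation.

```r
# Hypothetical example sentence and stop-word list (stand-ins only)
text <- "The German people and the German nation stand with the German people and the German people"
stop_words_demo <- c("the", "and", "with")

# Lowercase, strip punctuation, and split on whitespace
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]

# Pair each token with the token that follows it
bigrams <- data.frame(
  word1 = head(tokens, -1),
  word2 = tail(tokens, -1)
)

# Drop bigrams where either word is a stop word
bigrams <- bigrams[!(bigrams$word1 %in% stop_words_demo) &
                   !(bigrams$word2 %in% stop_words_demo), ]

# Count each ordered pair (word1 -> word2), most frequent first
bigram_counts <- aggregate(n ~ word1 + word2,
                           data = transform(bigrams, n = 1), FUN = sum)
bigram_counts <- bigram_counts[order(-bigram_counts$n), ]
print(bigram_counts)

# Keep pairs seen at least three times, mirroring filter(n > 2)
frequent <- bigram_counts[bigram_counts$n > 2, ]
print(frequent)
```

Only german → people, which occurs three times, passes the n > 2 filter; german → nation and nation → stand are dropped.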
Separate bigram network graphs were created for Hitler and Goebbels to capture the most prominent bigrams in their speeches. The nodes were positioned with the Fruchterman-Reingold force-directed layout (layout = "fr"), which pulls connected words together and pushes unconnected ones apart, and the labels were repelled from one another to minimize overlap. Since no edge weights are passed to the layout, the distance between two nodes reflects the structure of the network rather than how often a pair occurs. Node colors distinguish the two speakers: red3 for Hitler, reflecting his aggressive and forceful rhetoric, and steelblue for Goebbels, representing a more strategic and calculated tone. Directional arrows indicate word order within each bigram, and edge opacity reflects frequency: the darker the edge, the more often the bigram appeared. theme_void() was used to minimize the background visuals and keep attention on the relationships between words.
The results revealed that the most frequently appearing bigrams in Hitler’s speeches were centered around national identity and militarism. Expressions such as “national socialism”, “world war”, “armed forces”, and “british government” appeared prominently, reflecting his emphasis on nationalism, military power, and foreign diplomacy. These combinations indicate that Hitler aimed to persuade the public through themes of nationalism, international conflict, and military legitimacy.
Goebbels’ speeches also featured commonly recurring bigrams such as “german people” and “national socialism”, but with notable differences. Phrases like “jewish star” and “international jewry” pointed to his frequent use of antisemitic messaging. Furthermore, expressions such as “public life” and “political morality” revealed his emphasis on moral and collective identity. These patterns suggest that Goebbels employed emotional and ideological strategies to strengthen internal unity and justify actions under the guise of moral obligation.
In conclusion, Hitler tended to incite the public through authoritative, structured crisis narratives, while Goebbels relied more on emotional appeal and moral persuasion to consolidate internal unity. These patterns were consistent across the TF-IDF, log odds ratio, and bigram analyses.
Summary of findings:
There were significant differences in the vocabulary used in the speeches of Hitler and Goebbels.
Both speakers relied most on fear and trust, but Hitler used proportionally more anger- and anticipation-related terms, whereas Goebbels drew on a broader emotional range, including joy, disgust, and surprise.
The bigram network analysis showed these differences in how each speaker combined words.