Analysis of the number of words, distinct words, and density of each inaugural speech.
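The code behind this summary is not shown here. As a minimal sketch, the three quantities could be computed from a one-row-per-word table like `inaug_words`. The toy data below is hypothetical, and "density" is assumed to mean distinct words divided by total words, which may differ from the definition used in the original analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for inaug_words: one row per word, per author
inaug_words_toy <- tibble(
  author = c("A", "A", "A", "A", "B", "B", "B"),
  word   = c("we", "the", "people", "the", "let", "us", "us")
)

speech_summary <- inaug_words_toy |>
  group_by(author) |>
  summarize(
    total_words    = n(),                          # every token counts
    distinct_words = n_distinct(word),             # unique tokens only
    density        = distinct_words / total_words  # assumed definition of density
  )

speech_summary
```

With real data, replacing `inaug_words_toy` with `inaug_words` should give one summary row per president.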
inaug_words |>
  mutate(word_length = nchar(word)) |>
  ggplot(aes(word_length)) +
  facet_wrap(vars(author), scales = "free_y") +
  geom_histogram(binwidth = 1) +
  labs(title = "Word length distributions for each inaugural speech")
Histogram of word lengths in each president's inaugural speech.
inaug_words |>
  group_by(author) |>
  count(word, sort = TRUE) |>
  top_n(5) |>
  ungroup() |>
  mutate(word = reorder(word, n)) |>
  ggplot(aes(word, n, fill = author)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~author, scales = "free") +  # creates separate graphs for each author
  scale_fill_viridis_d() +                # uses a nicer color scheme
  theme_minimal() +                       # removes the gray background
  labs(x = NULL, y = "Most common words")
Selecting by n
Graph of the most common words used by each president during their inaugural speech.
inaug_words |>
  anti_join(snowball) |>
  group_by(author) |>
  count(word, sort = TRUE) |>
  top_n(5) |>
  ungroup() |>
  mutate(word = reorder(word, n)) |>
  ggplot(aes(word, n, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "Most common words") +
  facet_wrap(vars(author), scales = "free") +
  scale_fill_viridis_d() +
  theme_minimal() +
  coord_flip()
Joining with `by = join_by(word)`
Selecting by n
Graph of the most common words, excluding stop words, used by each president in their inaugural speech.
# Count each word per author
inaug_word_counts <- inaug_speeches |>
  unnest_tokens(word, text) |>
  count(author, word, sort = TRUE)

# Count total words per author
total_words <- inaug_word_counts |>
  group_by(author) |>
  summarize(total = sum(n))

# Join the two
inaug_word_counts <- left_join(inaug_word_counts, total_words)
# A tibble: 619 × 3
   word       sentiment     n
   <chr>      <chr>     <int>
 1 good       positive     21
 2 freedom    positive     19
 3 great      positive     19
 4 right      positive     18
 5 work       positive     18
 6 peace      positive     17
 7 free       positive     16
 8 well       positive     15
 9 confidence positive     11
10 happiness  positive     11
# ℹ 609 more rows
Sentiment analysis of the common positive and negative words of each inaugural speech.
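The table above has the columns `word`, `sentiment`, and `n`, but the code that produced it is not shown. A count along the following lines would yield a table of that shape. This is a sketch under assumptions: the lexicon here is a three-word hypothetical stand-in for `bing`, and the word list is toy data, not the speeches themselves.

```r
library(dplyr)
library(tibble)

# Hypothetical tokenised words (stand-in for inaug_words)
words_toy <- tibble(
  word = c("good", "freedom", "good", "fear", "the", "good")
)

# Hypothetical miniature sentiment lexicon (stand-in for bing)
lexicon_toy <- tibble(
  word      = c("good", "freedom", "fear"),
  sentiment = c("positive", "positive", "negative")
)

sentiment_counts <- words_toy |>
  inner_join(lexicon_toy, by = "word") |>  # keeps only words found in the lexicon
  count(word, sentiment, sort = TRUE)      # most frequent sentiment words first

sentiment_counts
```

Naming the join column with `by = "word"` also avoids the "Joining with" message that an implicit join prints.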
inaug_words |>
  inner_join(bing) |>
  count(word, sentiment, sort = TRUE) |>
  group_by(sentiment) |>
  top_n(10) |>
  ungroup() |>
  mutate(word = reorder(word, n)) |>
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(sentiment), scales = "free") +
  labs(y = "inaugural speech: Words that contribute the most to each sentiment", x = NULL) +
  scale_fill_viridis_d() +
  coord_flip() +
  theme_minimal()
Joining with `by = join_by(word)`
Selecting by n
Graph of the common positive and negative words used by presidents in their inaugural speech.
inaug_bigrams <- inaug_speeches |>
  unnest_tokens(bigram, text, token = "ngrams", n = 2) |>
  select(bigram)

inaug_bigrams |>
  count(bigram, sort = TRUE)
# A tibble: 10,876 × 2
   bigram       n
   <chr>    <int>
 1 of the     146
 2 in the      80
 3 of our      55
 4 to the      55
 5 and the     38
 6 to be       37
 7 it is       35
 8 by the      34
 9 for the     30
10 that the    29
# ℹ 10,866 more rows
Analysis of the common bigrams of each inaugural speech.
# A tibble: 2,399 × 2
   bigram                 n
   <chr>              <int>
 1 let us                18
 2 fellow citizens       16
 3 united states         11
 4 american people        6
 5 federal government     4
 6 government can         4
 7 one section            4
 8 vice president         4
 9 will endure            4
10 among us               3
# ℹ 2,389 more rows
Analysis of the common bigrams excluding stop words of each inaugural speech.
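The filtering step that produced the stop-word-free bigram table is not shown. A common approach, sketched here with hypothetical toy data, is to split each bigram into its two words, drop any pair in which either word is a stop word, and then recombine and count. The three-word `stop_toy` vector is an assumption standing in for the `snowball` stop-word list used earlier.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical bigrams (stand-in for inaug_bigrams)
bigrams_toy <- tibble(
  bigram = c("of the", "let us", "fellow citizens",
             "in the", "let us", "united states")
)

# Hypothetical miniature stop-word list (stand-in for snowball)
stop_toy <- c("of", "the", "in")

filtered_bigrams <- bigrams_toy |>
  separate(bigram, c("word1", "word2"), sep = " ") |>  # split into two columns
  filter(!word1 %in% stop_toy,                         # drop pairs containing
         !word2 %in% stop_toy) |>                      # a stop word in either slot
  unite(bigram, word1, word2, sep = " ") |>            # glue the words back together
  count(bigram, sort = TRUE)

filtered_bigrams
```

Filtering on both positions is a design choice: filtering only `word1` would keep pairs such as "government the", which are rarely informative.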
first_word <- c("president", "citizens")  # these need to be lowercase

inaug_bigrams |>
  count(bigram, sort = TRUE) |>
  separate(bigram, c("word1", "word2"), sep = " ") |>  # separate the two words
  filter(word1 %in% first_word) |>                     # find first words from our list
  count(word1, word2, wt = n, sort = TRUE) |>
  mutate(word2 = factor(word2, levels = rev(unique(word2)))) |>  # put the words in order
  group_by(word1) |>
  top_n(5) |>
  ggplot(aes(word2, n, fill = word1)) +
  scale_fill_viridis_d() +  # set the color palette
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = NULL, title = "Word following:") +
  facet_wrap(~word1, scales = "free") +
  coord_flip() +
  theme_minimal()
Selecting by n
Graph of the words that most commonly followed "president" and "citizens" in the presidents' inaugural speeches.