library(tidyverse)
library(DT)
library(tidytext) # package for text analysis
library(readxl) # reads excel files, the format I used for the data
- The following reads in the inaugural speeches and splits the text into one word per row.
in_notes <- read_excel("inaug_speeches.xlsx")
in_notes
in_words <- in_notes %>%
  unnest_tokens(word, text)
in_words
The tables above show that we are looking at inaugural speeches from Washington, Jefferson, Lincoln, FDR, Kennedy, Reagan, and Obama. The second table lists every word used in each speech, one word per row.
- The following shows lexical diversity, lexical density, and the total number of words for each inaugural speech.
in_words %>%
  group_by(author) %>%
  summarise(num_words = n(),
            lex_diversity = n_distinct(word),
            lex_density = n_distinct(word) / n())
Lexical diversity here is the number of distinct words in the inaugural speech, which is one measure of the speaker’s vocabulary. However, longer speeches tend to contain more distinct words, so lexical density is also computed: the number of distinct words divided by the total number of words (a ratio also known as the type-token ratio). The higher the lexical density, the fewer repeated words are used. In the table above, Washington has the highest lexical density and Lincoln has the lowest.
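As a quick check of these definitions, the following minimal sketch applies the same summary to a tiny made-up corpus. The `toy` data frame, its two authors, and their text are hypothetical and are not part of the speech data.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-speech corpus, invented purely for illustration
toy <- data.frame(
  author = c("A", "B"),
  text   = c("we the people we the nation", "liberty and justice for all"),
  stringsAsFactors = FALSE
)

toy_stats <- toy %>%
  unnest_tokens(word, text) %>%   # one lowercase word per row
  group_by(author) %>%
  summarise(
    num_words     = n(),                    # total words
    lex_diversity = n_distinct(word),       # distinct words
    lex_density   = n_distinct(word) / n()  # distinct / total
  )

toy_stats
```

Author A repeats “we” and “the”, so 4 distinct words out of 6 give a density of about 0.67, while author B never repeats a word and gets a density of 1.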
- The following shows the mean word length of each inaugural speech.
in_words %>%
  group_by(author) %>%
  mutate(word_length = nchar(word)) %>%
  summarize(mean_word_length = mean(word_length)) %>%
  arrange(desc(mean_word_length))
The table above shows the mean word length for each inaugural speech. The speeches were very similar in mean word length: Washington had the largest (4.9) and Kennedy the smallest (4.4).
- The following graphs show the distribution of word lengths in each inaugural speech.
in_words %>%
  mutate(word_length = nchar(word)) %>%
  ggplot(aes(word_length)) +
  geom_histogram(binwidth = 1) + # one bar per word length
  facet_wrap(vars(author), scales = "free_y") +
  labs(title = "Distribution of Word Lengths in Inaugural Speeches")

According to the histograms, the presidents all had similar distributions of word length in their inaugural speeches. Note that the y-axis scale differs between histograms, so bar heights are not directly comparable across panels.
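Because each panel has its own y-axis, one way to compare speeches of different lengths is to convert the counts behind the histograms into proportions. The sketch below does this on a small made-up corpus; the `toy` data frame and the `prop` column are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)
library(tidytext)

# Hypothetical corpus, invented purely for illustration
toy <- data.frame(
  author = c("A", "B"),
  text   = c("we the people", "our nation must remain strong and free"),
  stringsAsFactors = FALSE
)

length_dist <- toy %>%
  unnest_tokens(word, text) %>%
  mutate(word_length = nchar(word)) %>%
  count(author, word_length) %>%      # the counts a histogram would draw
  group_by(author) %>%
  mutate(prop = n / sum(n)) %>%       # share of the author's words at each length
  ungroup()

length_dist
```

Within each author the `prop` values sum to 1, so they can be compared across speeches regardless of how many words each one contains.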
- The following shows the most common words used in each inaugural speech with stop words removed.
in_words %>%
  anti_join(stop_words, by = "word") %>% # drop common stop words
  group_by(author) %>%
  count(word, sort = TRUE) %>%
  slice_max(n, n = 5) %>% # top five words per author; ties are kept
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "Count",
       title = "Most Common Words in Each Inaugural Speech") +
  facet_wrap(vars(author), scales = "free") +
  scale_fill_viridis_d() +
  theme_minimal() +
  coord_flip()

The graphs above show the most common words in each president’s speech after stop words have been removed. There are both similarities and differences. For example, “government” was the most used word for Washington, Reagan, and Jefferson. The most common words for FDR and Obama were closely related: “national” and “nation”. Given the subject of an inaugural address, it makes sense that these were the most common words in the bar charts.
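To make the stop-word step concrete, here is a minimal sketch of how `anti_join()` with tidytext’s `stop_words` table filters a word list. The `toy_words` data frame is hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical word list, invented purely for illustration
toy_words <- data.frame(
  word = c("the", "government", "of", "and", "government"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose word does NOT appear in stop_words
kept <- toy_words %>%
  anti_join(stop_words, by = "word")

count(kept, word, sort = TRUE)
```

Passing `by = "word"` explicitly also avoids the “Joining, by” message that the join otherwise prints.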
- The following graphs show the words with the highest tf-idf scores in each inaugural speech.
in_word_counts <- in_notes %>% # This counts each word per author
  unnest_tokens(word, text) %>%
  count(author, word, sort = TRUE)
total_words <- in_word_counts %>% # This counts total words per author
  group_by(author) %>%
  summarize(total = sum(n))
in_word_counts <- left_join(in_word_counts, total_words, by = "author") # Joins the two
in_tf_idf <- in_word_counts %>% # Calculates tf-idf
  bind_tf_idf(word, author, n)
in_tf_idf %>% # Displays it, highest tf-idf first
  arrange(desc(tf_idf))
in_tf_idf %>%
  arrange(desc(tf_idf)) %>%
  mutate(word = factor(word, levels = rev(unique(word)))) %>%
  group_by(author) %>%
  slice_max(tf_idf, n = 5) %>% # top five words per author; ties are kept
  ungroup() %>%
  ggplot(aes(word, tf_idf, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf",
       title = "Most Distinctive Words in Each Inaugural Speech") +
  facet_wrap(vars(author), scales = "free") +
  coord_flip() +
  theme_minimal() +
  scale_fill_viridis_d()

The graphs above show the words that are most distinctive of each inaugural speech relative to the other speeches. tf-idf multiplies a word’s term frequency (its share of all words in one speech) by its inverse document frequency, which in tidytext is the natural log of the number of speeches divided by the number of speeches that contain the word. A word that appears in every speech therefore gets a tf-idf of zero, which is why most stop words drop out without being removed explicitly. Some panels show more words than others; for example, Washington’s shows more than Obama’s. This may reflect ties in tf-idf at the top-five cutoff, which are kept, rather than a larger distinctive vocabulary. The words in the graphs may also be a good indicator of what was happening at the time each president gave his inaugural speech.
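To illustrate how the score is built, the following sketch computes tf-idf by hand on made-up counts and compares the result with `bind_tf_idf()`. The `toy_counts` data frame and the `by_hand` and `with_tidytext` names are hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical word counts for three tiny "speeches"
toy_counts <- data.frame(
  author = c("A", "A", "B", "B", "C", "C"),
  word   = c("the", "union", "the", "freedom", "the", "union"),
  n      = c(4, 1, 3, 2, 5, 5),
  stringsAsFactors = FALSE
)

n_docs <- n_distinct(toy_counts$author)  # number of speeches

by_hand <- toy_counts %>%
  group_by(author) %>%
  mutate(tf = n / sum(n)) %>%                         # share of the speech's words
  group_by(word) %>%
  mutate(idf = log(n_docs / n_distinct(author))) %>%  # ln(speeches / speeches with word)
  ungroup() %>%
  mutate(tf_idf = tf * idf)

with_tidytext <- bind_tf_idf(toy_counts, word, author, n)

# Should be TRUE if the two calculations agree
all.equal(by_hand$tf_idf, with_tidytext$tf_idf)
```

In this toy data “the” occurs in all three speeches, so its idf is ln(3/3) = 0 and its tf-idf is zero for every author, while “freedom” occurs only in speech B and receives the largest idf, ln(3).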