library(tidyverse)
library(DT)
library(tidytext)        # package for text analysis
library(readxl)          # reads excel files, the format I used for the data

TEXT ANALYSIS OF INAUGURAL SPEECHES

In this analysis we compare word strength and complexity in inaugural speeches over time. The first table below lists the Presidents and their speeches. The second table shows those speeches split into individual words, one word per row. The speeches contain 14,874 words in total.

inaug_speeches <- read_excel("inaug_speeches.xlsx")

inaug_speeches
inaug_words <- inaug_speeches %>%
  unnest_tokens(word, text)

inaug_words
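To make the tokenization step concrete, here is a minimal sketch on a made-up two-row table. The name `toy_speeches` and its text are hypothetical stand-ins for `inaug_speeches`, not the real data. With its default settings, `unnest_tokens()` lowercases the text and drops punctuation, so the row count of the result is the total word count.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for inaug_speeches (invented text, not the real data)
toy_speeches <- tibble(
  author = c("A", "B"),
  text   = c("We the People, today!", "Fellow citizens: we meet again.")
)

# One row per word; the author column is carried along for every word
toy_words <- toy_speeches %>%
  unnest_tokens(word, text)

toy_words
nrow(toy_words)   # total number of words across both toy speeches
```

The same `nrow()` call on `inaug_words` is one way to check the total word count quoted above.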

Measures of Text Complexity

The following table shows, for each President, the number of words in the speech along with the diversity and density of those words. Here diversity is the number of distinct words, and density is the ratio of distinct words to total words. Both indicate verbal complexity, as does word length.

inaug_words %>% 
  group_by(author) %>% 
  summarise(num_words = n(),
            lex_diversity = n_distinct(word), 
            lex_density = n_distinct(word)/n())
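As a small worked example of these two measures, the sketch below applies the same summary to a made-up table (`toy_words` is hypothetical, not the speech data). The ratio of distinct words to total words is often called the type-token ratio: a value near 1 means few repeated words, and a value near 0 means heavy repetition.

```r
library(dplyr)

# Hypothetical word table: author A repeats words, author B does not
toy_words <- tibble(
  author = c("A", "A", "A", "A", "B", "B", "B", "B"),
  word   = c("we", "the", "we", "the", "liberty", "union", "peace", "we")
)

toy_complexity <- toy_words %>%
  group_by(author) %>%
  summarise(num_words     = n(),                     # total words
            lex_diversity = n_distinct(word),        # distinct words
            lex_density   = n_distinct(word) / n())  # distinct / total

toy_complexity
```

Both toy authors use four words, but A has two distinct words (density 0.5) while B has four (density 1). Because the ratio tends to fall as a text gets longer, comparisons between speeches of very different lengths deserve some caution.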

The next two tables show word lengths. The first lists each word, its length, and its author, sorted from longest to shortest. Presidents Lincoln, Reagan, and Obama used the longest words. However, as the second table shows, President Washington’s mean word length was 4.94, which is higher than that of all three.

inaug_words %>%
  mutate(word_length = nchar(word)) %>% 
  distinct(word, word_length, author) %>% 
  arrange(-word_length)
inaug_words %>%
  group_by(author) %>% 
  mutate(word_length = nchar(word)) %>% 
  summarize(mean_word_length = mean(word_length)) %>% 
  arrange(-mean_word_length)
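The two tables can disagree because a single long word says little about the average. The sketch below illustrates this on invented data (`toy_words` is hypothetical): author A uses the longest word, yet author B has the higher mean word length.

```r
library(dplyr)

# Hypothetical data: A has one very long word among short ones
toy_words <- tibble(
  author = c("A", "A", "A", "A", "B", "B", "B"),
  word   = c("a", "to", "we", "administration", "people", "nation", "liberty")
)

toy_lengths <- toy_words %>%
  mutate(word_length = nchar(word))

# Author of the single longest word
longest_author <- toy_lengths %>%
  arrange(desc(word_length)) %>%
  slice(1) %>%
  pull(author)

# Mean word length per author, highest first
toy_means <- toy_lengths %>%
  group_by(author) %>%
  summarize(mean_word_length = mean(word_length)) %>%
  arrange(desc(mean_word_length))

longest_author
toy_means
```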

The graphs below show the distribution of word lengths for each President.

inaug_words %>%
  mutate(word_length = nchar(word)) %>% 
  ggplot(aes(word_length)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(author), scales = "free_y") +
  labs(title = "Word Length By President")

Most Common Words Used in Inaugural Speeches

Now we will look at the speeches through a different lens of complexity. After removing very common words (stop words) such as “the”, “of”, and “I”, we can see which distinct words remain and how often each is used in a speech.

inaug_words %>%
  anti_join(stop_words, by = "word") %>%   # drop common stop words
  group_by(author) %>% 
  count(word, sort = TRUE) %>%             # word counts within each President
  top_n(5, n) %>%                          # five most frequent words (ties kept)
  ungroup() %>% 
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "Count",
       title = "Most Common Words Used in Inaugural Speeches") +
  facet_wrap(vars(author), scales = "free") +
  scale_fill_viridis_d() +
  theme_minimal() +
  coord_flip()
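The stop-word step can be seen in isolation on a tiny made-up table (`toy_words` is hypothetical). `stop_words` is the lexicon that ships with tidytext, and `anti_join()` keeps only the rows whose `word` does not appear in it.

```r
library(dplyr)
library(tidytext)

# Hypothetical tokens for a single author
toy_words <- tibble(
  author = "A",
  word   = c("the", "liberty", "of", "the", "union", "and", "the", "liberty")
)

toy_counts <- toy_words %>%
  anti_join(stop_words, by = "word") %>%   # removes "the", "of", "and"
  count(author, word, sort = TRUE)         # remaining words, most frequent first

toy_counts
```

Passing `by = "word"` explicitly also avoids the `Joining, by = "word"` message that dplyr prints when it has to guess the join column.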

The graph above shows the most common words in each President’s inaugural speech. Several words recur across speeches: Presidents Obama and Roosevelt used “nation” or “national” the most, while Presidents Washington, Jefferson, and Reagan used “government” more.

Finally, we look at how distinctive each word is to one speech compared with the others. The graph below shows the five most distinctive words in each inaugural speech, ranked by tf-idf.

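# Assumption: inaug_tf_idf is not defined anywhere in the notebook as shown.
# The step below is one plausible construction, treating each President's
# speech as a separate document.
inaug_tf_idf <- inaug_words %>%
  count(author, word) %>%
  bind_tf_idf(word, author, n)

inaug_tf_idf %>%
  arrange(-tf_idf) %>%
  mutate(word = factor(word, levels = rev(unique(word)))) %>% 
  group_by(author) %>% 
  top_n(5, tf_idf) %>%                     # five highest tf-idf words (ties kept)
  ggplot(aes(word, tf_idf, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf",
       title = "Top 5 Distinctive Words in each Inaugural Speech") +
  facet_wrap(~author, scales = "free") +
  coord_flip() +
  theme_minimal() +
  scale_fill_viridis_d()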
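For readers new to tf-idf, here is a minimal sketch on invented data (`toy_words` is hypothetical). Term frequency (tf) is a word's share of one document's words. Inverse document frequency (idf) is the natural log of the number of documents divided by the number of documents containing the word. Their product is high for words that are frequent in one speech but rare elsewhere, and zero for words that appear in every speech.

```r
library(dplyr)
library(tidytext)

# Hypothetical corpus: two authors, three words each
toy_words <- tibble(
  author = c("A", "A", "A", "B", "B", "B"),
  word   = c("nation", "nation", "liberty", "nation", "union", "union")
)

toy_tf_idf <- toy_words %>%
  count(author, word) %>%          # n = occurrences of each word per author
  bind_tf_idf(word, author, n) %>% # adds tf, idf, and tf_idf columns
  arrange(desc(tf_idf))

toy_tf_idf
```

In this toy corpus “nation” appears in both documents, so its idf is log(2/2) = 0 and its tf-idf is 0 even though it is A's most frequent word. “union” appears only in B's document, so it gets the highest score.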
