Some time ago, I came across a video on YouTube that detailed how the reading levels of U.S. presidential speeches have declined over time. When reading up on this topic, I found that earlier analyses have come to the same conclusion. Check out, for example, this article in “The Atlantic” or this blog post on “Towards Data Science”. The key result of all of these analyses is that, while George Washington addressed the nation using college-level rhetoric, the speeches of modern-day presidents are at the reading level of a middle schooler. The reason for this development appears to be the changing voting population in American politics. In the 18th and 19th centuries, politics was dominated by college-educated voters and politicians. Today, just about anybody can vote and participate in the political process, and politicians have adjusted their rhetoric accordingly. Now let’s see if we can corroborate these findings with our own analysis in R!
Source: https://www.youtube.com/watch?v=N5HbFfxWILY
The first question we need to answer is: “How do we measure readability?” It turns out that there are many different formulas for calculating how easy or hard a particular text is to read. Some of the more common ones are the Flesch reading-ease score (FRES), the Flesch–Kincaid grade level (FKGL), the automated readability index (ARI), the Coleman–Liau index (CLI), and the Linsear Write metric (LWM). These measures are based on how many words and sentences a text has and how long these words and sentences typically are. They do not take into account whether the individual words are “hard” or “rare”. It seems fair to assume, though, that on average harder texts have longer sentences and that harder words have more characters and more syllables.
Let’s introduce some notation: let \(n_c\) denote the number of characters, \(n_w\) the number of words, \(n_{se}\) the number of sentences, and \(n_{sy}\) the number of syllables in a text.
The first four readability formulas are then given by:
\[\begin{align} FRES &= 206.835 - 1.015 \cdot n_w/n_{se} - 84.6 \cdot n_{sy}/n_w \\[1em] FKGL &= 0.39 \cdot n_w/n_{se} + 11.8 \cdot n_{sy}/n_w - 15.59 \\[1em] ARI &= 4.71 \cdot n_c/n_w + 0.5 \cdot n_w/n_{se} - 21.43 \\[1em] CLI &= 0.0588 \cdot n_c/n_w \cdot 100 - 0.296 \cdot n_{se}/n_w \cdot 100 - 15.8 \end{align}\]
The LWM is computed by scoring every word in the text: words with fewer than three syllables are given one point, every other word is given three points. The points are then summed and divided by the number of sentences. Letting \(\Sigma\) denote this quotient, the LWM is given by
\[\begin{equation} LWM = \begin{cases} \Sigma/2, & \text{if } \Sigma > 20 \\ \Sigma/2 - 1, & \text{if } \Sigma \leq 20. \end{cases} \end{equation}\]
Note that \(FRES\) measures how “easy” a text is to read, while the other metrics measure how “hard” it is to read. \(FRES\) scores texts on a scale from 0 to 100, with 100 indicating a very easy text. The other metrics instead estimate a text’s grade level, such that a very easy text scores a 1 or a 2, while a college-level or university-level text might score as high as 20.
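To make the formulas concrete, here is a small helper of my own that maps the four count-based scores onto R code (the LWM needs the per-word syllable counts and is handled separately later in this post):
# sketch: the four count-based readability formulas as an R function
# n_c = characters, n_w = words, n_se = sentences, n_sy = syllables
readability_from_counts <- function(n_c, n_w, n_se, n_sy) {
  c(fres = 206.835 - 1.015 * n_w / n_se - 84.6 * n_sy / n_w,
    fkgl = 0.39 * n_w / n_se + 11.8 * n_sy / n_w - 15.59,
    ari  = 4.71 * n_c / n_w + 0.5 * n_w / n_se - 21.43,
    cli  = 0.0588 * n_c / n_w * 100 - 0.296 * n_se / n_w * 100 - 15.8)
}
# e.g. readability_from_counts(n_c = 435, n_w = 110, n_se = 7, n_sy = 137)
# reproduces the scores for the Kennedy excerpt computed further below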
Next, we have to get some data. You can download a large number of presidential speeches from The Grammar Lab. The data come in the form of text files. These text files all start with a header like this:
<title="Address on the Space Effort">
<date="September 12, 1962">
In a first step, I removed this header from each text file and also removed comments like “<Laughter.>” or “<Applause.>”. What we wind up with are long strings. In the case of Kennedy’s famous speech about the moon landing, the result looks like this:
But why, some say, the moon? Why choose this as our goal? And they may well ask why climb the highest mountain. Why, 35 years ago, fly the Atlantic? Why does Rice play Texas? We choose to go to the moon. We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too.
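For completeness, a minimal sketch of this cleaning step could look like the code below. The file name is made up and the regular expression is an assumption on my part; the files from The Grammar Lab may require a bit more fiddling.
# read one speech file and collapse it into a single string (file name is hypothetical)
raw <- readLines("kennedy_space_effort.txt", warn = FALSE)
text <- paste(raw, collapse = " ")
# drop the header tags (<title=...>, <date=...>) and comments like <Applause.>
text <- gsub("<[^>]*>", "", text)
# collapse repeated whitespace left behind by the removals
text <- gsub("\\s+", " ", text)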
Let’s say we have saved this particular string as a character object called text. To calculate the readability scores, we need to count the number of sentences, words, syllables and characters in this text. To do so, we load the following four packages and define a useful auxiliary function:
library(tm) # text mining tools
library(stringr) # manipulate strings
library(quanteda) # count syllables
library(tidyverse) # data wrangling
'%!in%' <- function(x, y) !(x %in% y) # opposite of '%in%'
First, we split the text into its individual sentences using the strsplit() function, specifying periods, colons, semicolons, question marks and exclamation marks as separators.
sentences <- strsplit(text, "[\\.:;?!]")
sentences
## [[1]]
## [1] "But why, some say, the moon"
## [2] " Why choose this as our goal"
## [3] " And they may well ask why climb the highest mountain"
## [4] " Why, 35 years ago, fly the Atlantic"
## [5] " Why does Rice play Texas"
## [6] " We choose to go to the moon"
## [7] " We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too"
While this works well in this case, it would not work in a case like this:
alternative_text <- "The U.S. is a pretty big country."
strsplit(alternative_text, "[\\.:;?!]")
## [[1]]
## [1] "The U" "S"
## [3] " is a pretty big country"
R interprets the periods in “U.S.” as sentence boundaries, which is not what we want. Therefore, I wrote a somewhat nasty and time-consuming workaround that merges these unwanted fragments back together. There are certainly more efficient solutions to this problem, but it worked for me and it saved me from hard-coding the numerous abbreviations that would otherwise have to be replaced with proper words:
sentences <- unlist(sentences)
while(TRUE){
  # stop once every sentence has at least three characters
  if(min(nchar(sentences), na.rm = TRUE) > 2) break
  for(i in 1:length(sentences)){
    if(nchar(sentences[i]) < 3){
      # glue the short fragment (and the piece after it) back onto the
      # preceding sentence, e.g. "The U" + "S" + " is a pretty big country"
      sentences[i-1] <- paste(sentences[i-1], sentences[i])
      sentences[i] <- NA
      if(i != length(sentences)){
        sentences[i-1] <- paste(sentences[i-1], sentences[i+1])
        sentences[i+1] <- NA
      }
      sentences <- sentences[!is.na(sentences)]
      break
    }
  }
}
This chunk of code uses an infinite while-loop to go through the unlisted “sentences” vector and append each unwanted pseudo-sentence to the sentence before it. The while-loop stops once all sentences contain at least three characters.
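As an aside, if you would rather not write such a workaround yourself, a dedicated sentence tokenizer may deal with abbreviations like “U.S.” more gracefully, though I have not verified this here. A possible alternative, not used in the rest of this post, is the tokenizers package:
# alternative sentence splitting via the tokenizers package
# (assumes the package is installed; results may differ from the approach above)
library(tokenizers)
sentences_alt <- unlist(tokenize_sentences(text))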
Next, we split the text into the different words and syllables, which will then give us all the necessary ingredients to compute the different readability metrics:
# split the text into words
words <- gsub(pattern = "\\W", replacement = " ", text) # replace non-word characters with spaces
words <- stripWhitespace(words)                         # collapse repeated whitespace
words <- unlist(str_split(words, " "))                  # split at spaces
words <- words[nchar(words) != 0]                       # drop empty strings
words <- tolower(words)                                 # convert to lower case
words
## [1] "but" "why" "some" "say" "the" "moon"
## [7] "why" "choose" "this" "as" "our" "goal"
## [13] "and" "they" "may" "well" "ask" "why"
## [19] "climb" "the" "highest" "mountain" "why" "35"
## [25] "years" "ago" "fly" "the" "atlantic" "why"
## [31] "does" "rice" "play" "texas" "we" "choose"
## [37] "to" "go" "to" "the" "moon" "we"
## [43] "choose" "to" "go" "to" "the" "moon"
## [49] "in" "this" "decade" "and" "do" "the"
## [55] "other" "things" "not" "because" "they" "are"
## [61] "easy" "but" "because" "they" "are" "hard"
## [67] "because" "that" "goal" "will" "serve" "to"
## [73] "organize" "and" "measure" "the" "best" "of"
## [79] "our" "energies" "and" "skills" "because" "that"
## [85] "challenge" "is" "one" "that" "we" "are"
## [91] "willing" "to" "accept" "one" "we" "are"
## [97] "unwilling" "to" "postpone" "and" "one" "which"
## [103] "we" "intend" "to" "win" "and" "the"
## [109] "others" "too"
# identify syllables
syllables <- sapply(words,nsyllable)
syllables
## but why some say the moon why choose
## 1 1 1 1 1 1 1 1
## this as our goal and they may well
## 1 1 2 1 1 1 1 1
## ask why climb the highest mountain why 35
## 1 1 1 1 2 2 1 NA
## years ago fly the atlantic why does rice
## 1 2 1 1 3 1 1 1
## play texas we choose to go to the
## 1 2 1 1 1 1 1 1
## moon we choose to go to the moon
## 1 1 1 1 1 1 1 1
## in this decade and do the other things
## 1 1 2 1 1 1 2 1
## not because they are easy but because they
## 1 2 1 1 2 1 2 1
## are hard because that goal will serve to
## 1 1 2 1 1 1 1 1
## organize and measure the best of our energies
## 3 1 2 1 1 1 2 3
## and skills because that challenge is one that
## 1 1 2 1 2 1 1 1
## we are willing to accept one we are
## 1 1 2 1 2 1 1 1
## unwilling to postpone and one which we intend
## 3 1 2 1 1 1 1 2
## to win and the others too
## 1 1 1 1 2 1
# number of characters, words, sentences and syllables
n_c <- sum(nchar(words))
n_w <- length(words)
n_se <- length(sentences)
n_sy <- sum(syllables, na.rm=T)
# display results
c(n_se,n_w,n_sy,n_c)
## [1] 7 110 137 435
Now, we are ready to compute the different readability metrics using the formulas explained above.
# flesch reading ease score (0 to 100)
fres <- 206.835 - 1.015*n_w/n_se - 84.6*n_sy/n_w
# flesch-kincaid grade level
fkgl <- 0.39*n_w/n_se + 11.8*n_sy/n_w - 15.59
# automated readability index (grade level)
ari <- 4.71*n_c/n_w + .5*n_w/n_se - 21.43
# coleman-liau index (grade level)
cli <- 0.0588*n_c/n_w*100 - 0.296*n_se/n_w*100 - 15.8
# linsear write metric (grade level)
tab <- sort(table(syllables))
value <- sum(tab[c("1","2")])
value <- value + sum(3*tab[which(names(tab)%!in%c("1","2"))])
value <- value/n_se
lwm <- ifelse(value > 20, value/2, (value-2)/2)
# display results
c(fres, fkgl, ari, cli, lwm)
## [1] 85.519545 5.234935 5.053052 5.569091 7.357143
As we can see, this excerpt from Kennedy’s speech on the space effort scores roughly an 86 out of 100 on the Flesch reading-ease scale. Of course, the different grade-level metrics don’t agree perfectly, but they all put the text somewhere between grade 5 and grade 7.
I applied the above code to all of the speeches found on The Grammar Lab homepage (up to Barack Obama) and stored the results in a data frame called “df” (a sketch of such a loop is shown after the table below). After removing some outliers, the data look like this:
print(df[1:25,], digits = 3)
## date president n_se n_w n_sy n_c fres fkgl ari cli lwm
## 1 1789-04-30 washington 45 1434 2403 7089 32.7 16.61 17.79 12.34 22.3
## 2 1789-10-03 washington 5 435 690 2065 -15.7 37.06 44.43 11.77 58.0
## 3 1790-01-08 washington 36 848 1466 4349 36.7 14.00 14.50 13.10 16.9
## 4 1790-12-08 washington 53 1400 2283 6856 42.1 13.95 14.84 11.87 18.0
## 5 1790-12-29 washington 53 1401 2048 6326 56.3 11.97 13.05 9.63 16.7
## 6 1791-10-25 washington 87 2267 3827 11402 37.6 14.49 15.29 12.64 18.6
## 7 1792-04-05 washington 9 156 262 810 47.2 10.99 11.69 13.02 12.2
## 8 1792-11-06 washington 87 2358 3873 11569 40.4 14.36 15.23 11.96 18.5
## 9 1792-12-12 washington 4 190 307 932 21.9 22.00 25.42 12.42 32.0
## 10 1793-03-04 washington 5 136 223 640 40.5 14.37 14.33 10.78 19.1
## 11 1793-04-22 washington 6 237 386 1158 29.0 19.03 21.33 12.18 26.9
## 12 1793-12-03 washington 65 1973 3194 9471 39.1 15.35 16.36 11.45 20.5
## 13 1794-08-07 washington 28 1284 2164 6343 17.7 22.18 24.77 12.60 32.7
## 14 1794-09-25 washington 22 653 1116 3339 32.1 16.15 17.49 13.27 21.6
## 15 1794-11-19 washington 118 2927 4871 14400 40.9 13.72 14.14 11.93 17.2
## 16 1795-12-08 washington 58 1977 3421 10015 25.8 18.12 19.47 13.12 24.3
## 17 1796-03-30 washington 42 1066 1733 5196 43.5 13.49 14.22 11.69 17.1
## 18 1796-08-29 washington 93 1602 2231 6935 71.5 7.56 7.57 7.94 10.2
## 19 1796-09-19 washington 221 6074 10256 29971 36.1 15.05 15.55 12.14 19.3
## 20 1796-12-07 washington 132 2865 4806 14165 42.9 12.67 12.71 11.91 15.1
## 21 1797-03-04 adams 56 2322 3899 11281 22.7 20.40 22.18 12.05 28.8
## 22 1797-05-16 adams 89 3025 5129 15079 28.9 17.67 19.04 12.64 23.9
## 23 1797-11-22 adams 62 2049 3393 10123 33.2 16.84 18.36 12.35 22.7
## 24 1798-03-23 adams 17 655 1115 3255 23.7 19.52 21.24 12.65 27.6
## 25 1798-12-08 adams 67 2218 3695 10967 32.3 16.98 18.41 12.38 23.0
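As mentioned above, “df” was built by wrapping the steps from the previous sections in a loop over all speech files. A rough sketch of what this could look like follows; the folder name is made up, the sentence splitting is simplified (no “U.S.” workaround), and the date/president columns, the LWM and the outlier removal are left out here.
# hypothetical loop over all downloaded speech files (sketch, not the original code)
score_speech <- function(f){
  text  <- paste(readLines(f, warn = FALSE), collapse = " ")
  text  <- gsub("<[^>]*>", "", text)               # strip header tags and comments
  sents <- unlist(strsplit(text, "[\\.:;?!]"))     # simplified sentence split
  sents <- sents[nchar(sents) > 2]
  words <- unlist(str_split(stripWhitespace(gsub("\\W", " ", text)), " "))
  words <- tolower(words[nchar(words) != 0])
  sylls <- sapply(words, nsyllable)
  n_c  <- sum(nchar(words));  n_w  <- length(words)
  n_se <- length(sents);      n_sy <- sum(sylls, na.rm = TRUE)
  tibble(file = f, n_se = n_se, n_w = n_w, n_sy = n_sy, n_c = n_c,
         fres = 206.835 - 1.015*n_w/n_se - 84.6*n_sy/n_w,
         fkgl = 0.39*n_w/n_se + 11.8*n_sy/n_w - 15.59,
         ari  = 4.71*n_c/n_w + 0.5*n_w/n_se - 21.43,
         cli  = 0.0588*n_c/n_w*100 - 0.296*n_se/n_w*100 - 15.8)
}
files <- list.files("speeches", pattern = "\\.txt$", full.names = TRUE)
df_sketch <- map_dfr(files, score_speech)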
Before studying how the readability scores evolved over time, let’s have a quick look at how strongly the different readability metrics are correlated to another.
pairs(df %>% select(fres, fkgl, ari, cli, lwm),
upper.panel = NULL)
For the most part, the different readability scores are strongly correlated with one another. Unsurprisingly, the Flesch reading-ease score is negatively correlated with the other metrics: an easy text receives a high Flesch score, while the other metrics assign it a low grade level.
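To put numbers on this impression, we could also print the correlation matrix directly (a small addition on my part):
# pairwise correlations between the readability metrics
df %>%
  select(fres, fkgl, ari, cli, lwm) %>%
  cor() %>%
  round(2)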
Finally, let’s have a look at the reading levels over time.
# flesch reading ease score (0 to 100)
ggplot(df, aes(date, fres)) +
geom_point() +
geom_smooth() +
labs(y="", x="", title="Flesh Reading Ease Score")
# flesch-kincaid formula (grade level)
ggplot(df, aes(date, fkgl)) +
geom_point() +
geom_smooth() +
labs(y="", x="", title="Flesh-Kincaid Grade Level")
# automated readability index (grade level)
ggplot(df, aes(date, ari)) +
geom_point() +
geom_smooth() +
labs(y="", x="", title="Automated readability index")
# coleman-liau index (grade level)
ggplot(df, aes(date, cli)) +
geom_point() +
geom_smooth()+
labs(y="", x="", title="Coleman-Liau index")
# linsear write metric (grade level)
ggplot(df, aes(date, lwm)) +
geom_point() +
geom_smooth() +
labs(y="", x="", title="Linsear Write metric")
On the whole, we find the same trend that has been found before: readability has increased over time, or, put less charitably, “speeches have gotten dumber”. However, our data contain quite a number of extremely high grade levels (or very low reading-ease scores). In fact, all metrics suggest higher grade levels than those found in earlier analyses. The most likely explanation appears to be that the earlier analyses selected their data samples more carefully and may have focused on particular types of speeches.
This was only a very superficial analysis, focused primarily on “How can you do it?” rather than “What are the final results?”. If you want to repeat this analysis more rigorously, you should probably read through the different speeches in your sample and check whether they are truly comparable. This step is necessary to ensure that you are not comparing a carefully crafted inaugural address, which was probably written by a professional speechwriter, to an unprepared remark during a press conference. A good place to look for presidential speeches that are grouped into different categories might be the American Presidency Project. Another useful data source is the Miller Center.