Some time ago, I came across a video on YouTube that detailed how the reading levels of U.S. presidential speeches have declined over time. When reading up on this topic, I found that earlier analyses have come to the same conclusion. Check out, for example, this article in “The Atlantic” or this blog post on “Towards Data Science”. The key result of all of these analyses is that, while George Washington addressed the nation using college-level rhetoric, the speeches of modern-day presidents are at the reading level of a middle schooler. The reason for this development appears to be the changing voting population in American politics. In the 18th and 19th centuries, politics was dominated by college-educated voters and politicians. Today, just about anybody can vote and participate in the political process. Consequently, politicians have had to adjust their rhetoric. Now let’s see if we can corroborate these findings with our own analysis in R!

Source: https://www.youtube.com/watch?v=N5HbFfxWILY

Reading level formulas

The first question we need to answer is: “How do we measure readability?” It turns out that there are many different formulas for calculating how easy or hard a particular text is to read. Some of the more common ones are the Flesch reading-ease score (FRES), the Flesch–Kincaid grade level (FKGL), the automated readability index (ARI), the Coleman–Liau index (CLI), and the Linsear Write metric (LWM). These measures are based on how many words and sentences a text has, and how long these words and sentences typically are. They do not take into account whether the individual words are “hard” or “rare”. However, it seems fair to assume that, on average, harder texts have longer sentences, and that harder words have more characters and more syllables in them.

Let’s introduce some notation: let \(n_c\) denote the number of characters in a text, \(n_w\) the number of words, \(n_{se}\) the number of sentences, and \(n_{sy}\) the number of syllables.

The first four reading level formulas are then given by:

\[\begin{align} FRES &= 206.835 - 1.015 \cdot n_w/n_{se} - 84.6 \cdot n_{sy}/n_w \\[1em] FKGL &= 0.39 \cdot n_w/n_{se} + 11.8 \cdot n_{sy}/n_w - 15.59 \\[1em] ARI &= 4.71 \cdot n_c/n_w + 0.5 \cdot n_w/n_{se} - 21.43 \\[1em] CLI &= 0.0588 \cdot n_c/n_w \cdot 100 - 0.296 \cdot n_{se}/n_w \cdot 100 - 15.8 \end{align}\]

The LWM is computed by scoring the readability of every word in the text. Words with fewer than three syllables are given one point; every other word is given three points. Letting \(\Sigma\) denote the sum of all points divided by the number of sentences \(n_{se}\), the LWM is given by

\[\begin{equation} LWM = \begin{cases} \Sigma/2, & \text{if } \Sigma > 20 \\ \Sigma/2 - 1, & \text{if } \Sigma \leq 20~. \end{cases} \end{equation}\]
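As a quick plug-in example: a text with two sentences and twenty words, three of which have three or more syllables, collects \(17 \cdot 1 + 3 \cdot 3 = 26\) points; dividing by the two sentences gives \(\Sigma = 13\), and since \(13 \leq 20\), the LWM is \(13/2 - 1 = 5.5\).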

Note that \(FRES\) evaluates how “easy” a text is to read, while the other metrics evaluate how “hard” it is to read. \(FRES\) scores texts on a scale from 0 to 100, with 100 indicating a very easy text. The other metrics instead estimate a text’s grade level, such that a very easy text would score a 1 or a 2, while a college-level or university-level text would, e.g., score a 20.
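Since the direction of these scales is easy to mix up, here is a small helper (my own addition, not part of any of the formulas) that maps a FRES value to the commonly cited Flesch difficulty bands:

# map a FRES value to the usual Flesch difficulty bands (labels approximate)
fres_band <- function(score) {
  cut(score, breaks = c(-Inf, 30, 50, 60, 70, 80, 90, Inf),
      labels = c("very difficult", "difficult", "fairly difficult",
                 "standard", "fairly easy", "easy", "very easy"))
}
fres_band(85.5)  # "easy"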

Getting the necessary data

Next, we have to get some data. You can download a large number of presidential speeches from The Grammar Lab. The data come in the form of text files. These text files all start with a header like this:

<title="Address on the Space Effort">
<date="September 12, 1962">

As a first step, I removed this header from each text file, along with comments like “<Laughter.>” or “<Applause.>”. What we wind up with are long strings. In the case of Kennedy’s famous speech about the moon landing, this looks like this:

But why, some say, the moon? Why choose this as our goal? And they may well ask why climb the highest mountain. Why, 35 years ago, fly the Atlantic? Why does Rice play Texas? We choose to go to the moon. We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too.
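In case you want to reproduce this cleaning step, here is a minimal base-R sketch (the file name is hypothetical, and the tag pattern is inferred from the header shown above):

# read the raw speech file (file name is hypothetical)
raw  <- readLines("kennedy_space_effort.txt", warn = FALSE)
text <- paste(raw, collapse = " ")

# drop header tags like <title="..."> and comments like <Laughter.>
text <- gsub("<[^>]*>", "", text)
text <- trimws(gsub("\\s+", " ", text))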

Let’s say we have saved this particular string as a character object called text. To calculate the readability scores, we need to count the number of sentences, words, syllables and characters in this text. To do so, we load the following four packages and define a very useful auxiliary function:

library(tm)        # text mining tools
library(stringr)   # manipulate strings
library(quanteda)  # count syllables
library(tidyverse) # data wrangling
'%!in%' <- function(x,y)!('%in%'(x,y)) # opposite of '%in%'
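For example, 2 %!in% c(1, 3) returns TRUE, whereas 2 %in% c(1, 3) returns FALSE.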

First, we split the text into a list of its different sentences using the strsplit() function while specifying periods, colons, semi-colons, question marks and exclamation marks as separators.

sentences <- strsplit(text, "[\\.:;?!]")
sentences
## [[1]]
## [1] "But why, some say, the moon"                                                                                                                                                                                                                                                                                                                                            
## [2] " Why choose this as our goal"                                                                                                                                                                                                                                                                                                                                           
## [3] " And they may well ask why climb the highest mountain"                                                                                                                                                                                                                                                                                                                  
## [4] " Why, 35 years ago, fly the Atlantic"                                                                                                                                                                                                                                                                                                                                   
## [5] " Why does Rice play Texas"                                                                                                                                                                                                                                                                                                                                              
## [6] " We choose to go to the moon"                                                                                                                                                                                                                                                                                                                                           
## [7] " We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too"

While this works well here, it would not work for text like the following:

alternative_text <- "The U.S. is a pretty big country."
strsplit(alternative_text, "[\\.:;?!]")
## [[1]]
## [1] "The U"                    "S"                       
## [3] " is a pretty big country"

R interprets the periods in “U.S.” as sentence boundaries, which is not what we want. Therefore, I wrote a somewhat nasty and time-consuming workaround that merges these unwanted pseudo-sentences back together. There are certainly more efficient solutions to this problem, but it worked for me, and I did not have to hard-code numerous abbreviations that would need to be changed into proper words:

sentences <- unlist(sentences)
while(TRUE){
  # stop once every sentence contains at least three characters
  if(min(nchar(sentences), na.rm=TRUE) > 2) break
  for(i in 1:length(sentences)) {
    if(nchar(sentences[i]) < 3){
      # append the pseudo-sentence to the sentence before it
      # (assumes the first sentence is long enough, so i > 1)
      sentences[i-1] <- paste(sentences[i-1],
                              sentences[i])
      sentences[i] <- NA
      # unless we are at the end, also pull in the sentence after it
      if(i != length(sentences)){
        sentences[i-1] <- paste(sentences[i-1],
                                sentences[i+1])
        sentences[i+1] <- NA
      }
      # drop the merged fragments and restart the scan
      sentences <- sentences[!is.na(sentences)]
      break
    }
  }
}

This chunk of code uses a while(TRUE) loop to go through an unlisted version of the “sentences” list and append the unwanted pseudo-sentences to the sentence before them. The while-loop stops once all sentences contain at least three characters.
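Applied to the problematic example from above, the loop indeed glues the fragments back into a single sentence (aside from some extra whitespace, which does not affect the word and sentence counts):

sentences <- strsplit(alternative_text, "[\\.:;?!]")
# rerun the unlist step and the while-loop from above, then:
sentences
## [1] "The U S  is a pretty big country"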

Next, we split the text into the different words and syllables, which will then give us all the necessary ingredients to compute the different readability metrics:

# split text into words: replace non-word characters by spaces,
# collapse runs of whitespace, then split on spaces
words <- gsub(pattern="\\W", replacement=" ", x=text)
words <- stripWhitespace(words)
words <- unlist(str_split(words, " "))
words <- words[nchar(words) != 0]  # drop empty strings
words <- tolower(words)
words
##   [1] "but"       "why"       "some"      "say"       "the"       "moon"     
##   [7] "why"       "choose"    "this"      "as"        "our"       "goal"     
##  [13] "and"       "they"      "may"       "well"      "ask"       "why"      
##  [19] "climb"     "the"       "highest"   "mountain"  "why"       "35"       
##  [25] "years"     "ago"       "fly"       "the"       "atlantic"  "why"      
##  [31] "does"      "rice"      "play"      "texas"     "we"        "choose"   
##  [37] "to"        "go"        "to"        "the"       "moon"      "we"       
##  [43] "choose"    "to"        "go"        "to"        "the"       "moon"     
##  [49] "in"        "this"      "decade"    "and"       "do"        "the"      
##  [55] "other"     "things"    "not"       "because"   "they"      "are"      
##  [61] "easy"      "but"       "because"   "they"      "are"       "hard"     
##  [67] "because"   "that"      "goal"      "will"      "serve"     "to"       
##  [73] "organize"  "and"       "measure"   "the"       "best"      "of"       
##  [79] "our"       "energies"  "and"       "skills"    "because"   "that"     
##  [85] "challenge" "is"        "one"       "that"      "we"        "are"      
##  [91] "willing"   "to"        "accept"    "one"       "we"        "are"      
##  [97] "unwilling" "to"        "postpone"  "and"       "one"       "which"    
## [103] "we"        "intend"    "to"        "win"       "and"       "the"      
## [109] "others"    "too"
# count syllables per word (nsyllable() returns NA for numerals such as "35")
syllables <- sapply(words,nsyllable)
syllables
##       but       why      some       say       the      moon       why    choose 
##         1         1         1         1         1         1         1         1 
##      this        as       our      goal       and      they       may      well 
##         1         1         2         1         1         1         1         1 
##       ask       why     climb       the   highest  mountain       why        35 
##         1         1         1         1         2         2         1        NA 
##     years       ago       fly       the  atlantic       why      does      rice 
##         1         2         1         1         3         1         1         1 
##      play     texas        we    choose        to        go        to       the 
##         1         2         1         1         1         1         1         1 
##      moon        we    choose        to        go        to       the      moon 
##         1         1         1         1         1         1         1         1 
##        in      this    decade       and        do       the     other    things 
##         1         1         2         1         1         1         2         1 
##       not   because      they       are      easy       but   because      they 
##         1         2         1         1         2         1         2         1 
##       are      hard   because      that      goal      will     serve        to 
##         1         1         2         1         1         1         1         1 
##  organize       and   measure       the      best        of       our  energies 
##         3         1         2         1         1         1         2         3 
##       and    skills   because      that challenge        is       one      that 
##         1         1         2         1         2         1         1         1 
##        we       are   willing        to    accept       one        we       are 
##         1         1         2         1         2         1         1         1 
## unwilling        to  postpone       and       one     which        we    intend 
##         3         1         2         1         1         1         1         2 
##        to       win       and       the    others       too 
##         1         1         1         1         2         1
# number of characters, words, sentences and syllables
n_c <- sum(nchar(words))
n_w <- length(words)
n_se <- length(sentences)
n_sy <- sum(syllables, na.rm=T)

# display results
c(n_se,n_w,n_sy,n_c)
## [1]   7 110 137 435

Now, we are ready to compute the different readability metrics using the formulas explained above.

# flesch reading ease score (0 to 100)
fres <- 206.835 - 1.015*n_w/n_se - 84.6*n_sy/n_w

# flesch-kincaid grade level
fkgl <- 0.39*n_w/n_se + 11.8*n_sy/n_w - 15.59

# automated readability index (grade level)
ari <- 4.71*n_c/n_w + .5*n_w/n_se - 21.43

# coleman-liau index (grade level)
cli <- 0.0588*n_c/n_w*100 - 0.296*n_se/n_w*100 - 15.8

# linsear write metric (grade level)
tab <- sort(table(syllables))
value <- sum(tab[c("1","2")])
value <- value + sum(3*tab[which(names(tab)%!in%c("1","2"))])
value <- value/n_se
lwm <- ifelse(value > 20, value/2, (value-2)/2)

# display results
c(fres, fkgl, ari, cli, lwm)
## [1] 85.519545  5.234935  5.053052  5.569091  7.357143

As we can see, this excerpt from Kennedy’s speech on the space effort scores roughly an 86 out of 100 on the Flesch reading-ease scale. Of course, the different grade level metrics don’t agree perfectly, but they all put the text at around a sixth-grade reading level.
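To process many speeches, it helps to wrap the steps above into a single function. Below is a minimal sketch of such a wrapper (the function name is my own invention; for brevity it simply drops pseudo-sentences shorter than three characters instead of merging them, so sentence counts can differ slightly from the loop above). It relies on the packages and the %!in% helper loaded earlier.

readability_scores <- function(text) {
  # sentences: split on end-of-sentence punctuation, drop short fragments
  sentences <- unlist(strsplit(text, "[\\.:;?!]"))
  sentences <- sentences[nchar(trimws(sentences)) > 2]
  # words and syllables, as above
  words <- unlist(str_split(stripWhitespace(gsub("\\W", " ", text)), " "))
  words <- tolower(words[nchar(words) != 0])
  syllables <- sapply(words, nsyllable)
  n_c  <- sum(nchar(words))
  n_w  <- length(words)
  n_se <- length(sentences)
  n_sy <- sum(syllables, na.rm=TRUE)
  # linsear write points per sentence
  tab   <- table(syllables)
  easy  <- sum(tab[names(tab) %in% c("1","2")])
  hard  <- sum(tab[names(tab) %!in% c("1","2")])
  value <- (easy + 3*hard)/n_se
  data.frame(n_se=n_se, n_w=n_w, n_sy=n_sy, n_c=n_c,
             fres = 206.835 - 1.015*n_w/n_se - 84.6*n_sy/n_w,
             fkgl = 0.39*n_w/n_se + 11.8*n_sy/n_w - 15.59,
             ari  = 4.71*n_c/n_w + 0.5*n_w/n_se - 21.43,
             cli  = 0.0588*n_c/n_w*100 - 0.296*n_se/n_w*100 - 15.8,
             lwm  = ifelse(value > 20, value/2, (value-2)/2))
}

# for the Kennedy excerpt, this should reproduce the scores computed above
readability_scores(text)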

Results

I applied the above code to all of the speeches found on The Grammar Lab homepage (up to Barack Obama) and stored the results in a data frame called “df”. After removing some outliers, the data look like this:

print(df[1:25,], digits = 3)
##          date  president n_se  n_w  n_sy   n_c  fres  fkgl   ari   cli  lwm
## 1  1789-04-30 washington   45 1434  2403  7089  32.7 16.61 17.79 12.34 22.3
## 2  1789-10-03 washington    5  435   690  2065 -15.7 37.06 44.43 11.77 58.0
## 3  1790-01-08 washington   36  848  1466  4349  36.7 14.00 14.50 13.10 16.9
## 4  1790-12-08 washington   53 1400  2283  6856  42.1 13.95 14.84 11.87 18.0
## 5  1790-12-29 washington   53 1401  2048  6326  56.3 11.97 13.05  9.63 16.7
## 6  1791-10-25 washington   87 2267  3827 11402  37.6 14.49 15.29 12.64 18.6
## 7  1792-04-05 washington    9  156   262   810  47.2 10.99 11.69 13.02 12.2
## 8  1792-11-06 washington   87 2358  3873 11569  40.4 14.36 15.23 11.96 18.5
## 9  1792-12-12 washington    4  190   307   932  21.9 22.00 25.42 12.42 32.0
## 10 1793-03-04 washington    5  136   223   640  40.5 14.37 14.33 10.78 19.1
## 11 1793-04-22 washington    6  237   386  1158  29.0 19.03 21.33 12.18 26.9
## 12 1793-12-03 washington   65 1973  3194  9471  39.1 15.35 16.36 11.45 20.5
## 13 1794-08-07 washington   28 1284  2164  6343  17.7 22.18 24.77 12.60 32.7
## 14 1794-09-25 washington   22  653  1116  3339  32.1 16.15 17.49 13.27 21.6
## 15 1794-11-19 washington  118 2927  4871 14400  40.9 13.72 14.14 11.93 17.2
## 16 1795-12-08 washington   58 1977  3421 10015  25.8 18.12 19.47 13.12 24.3
## 17 1796-03-30 washington   42 1066  1733  5196  43.5 13.49 14.22 11.69 17.1
## 18 1796-08-29 washington   93 1602  2231  6935  71.5  7.56  7.57  7.94 10.2
## 19 1796-09-19 washington  221 6074 10256 29971  36.1 15.05 15.55 12.14 19.3
## 20 1796-12-07 washington  132 2865  4806 14165  42.9 12.67 12.71 11.91 15.1
## 21 1797-03-04      adams   56 2322  3899 11281  22.7 20.40 22.18 12.05 28.8
## 22 1797-05-16      adams   89 3025  5129 15079  28.9 17.67 19.04 12.64 23.9
## 23 1797-11-22      adams   62 2049  3393 10123  33.2 16.84 18.36 12.35 22.7
## 24 1798-03-23      adams   17  655  1115  3255  23.7 19.52 21.24 12.65 27.6
## 25 1798-12-08      adams   67 2218  3695 10967  32.3 16.98 18.41 12.38 23.0

Before studying how the readability scores evolved over time, let’s have a quick look at how strongly the different readability metrics are correlated with one another.

pairs(df %>% select(fres, fkgl, ari, cli, lwm),
      upper.panel = NULL)

For the most part, the different readability scores are strongly correlated with one another. Unsurprisingly, the Flesch reading-ease score is negatively correlated with the other metrics: an easy text receives a high Flesch score, while the other metrics assign it a low grade level.
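If you prefer numbers to a scatterplot matrix, the same information can be computed directly:

# pairwise correlations between the five readability metrics
round(cor(df %>% select(fres, fkgl, ari, cli, lwm)), 2)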

Finally, let’s have a look at the reading levels over time.

# flesch reading ease score (0 to 100)
ggplot(df, aes(date, fres)) + 
  geom_point() + 
  geom_smooth() + 
  labs(y="", x="", title="Flesch Reading Ease Score")

# flesch-kincaid formula (grade level)
ggplot(df, aes(date, fkgl)) + 
  geom_point() + 
  geom_smooth() + 
  labs(y="", x="", title="Flesch-Kincaid Grade Level")

# automated readability index (grade level)
ggplot(df, aes(date, ari)) + 
  geom_point() + 
  geom_smooth() + 
  labs(y="", x="", title="Automated readability index")

# coleman-liau index (grade level)
ggplot(df, aes(date, cli)) + 
  geom_point() + 
  geom_smooth()+ 
  labs(y="", x="", title="Coleman-Liau index")

# linsear write metric (grade level)
ggplot(df, aes(date, lwm)) + 
  geom_point() + 
  geom_smooth() + 
  labs(y="", x="", title="Linsear Write metric")

On the whole, we find the same trend that has been found before: readability has increased over time; “speeches have gotten dumber”. However, our data show quite a number of extremely high grade levels (or low readability scores). In fact, all metrics suggest higher grade levels than those found in earlier analyses. The most likely explanation appears to be that the earlier analyses selected their data samples more carefully and may have focused on particular types of speeches.

Disclaimer

This was only a very superficial analysis, which primarily focused on “How can you do it?” rather than “What are the ultimate results?”. If you want to repeat this analysis in a more rigorous way, you should probably read through the different speeches in your sample and check whether they are truly comparable. This step is necessary to ensure that you are not comparing a carefully crafted inaugural address, which was probably written by a professional speech writer, to an unprepared remark during a press conference. A good place to look for presidential speeches that are grouped into different categories might be the American Presidency Project. Another useful data source is the Miller Center.