For REASONS, my husband asked me to repeat my analysis of readability scores on novels for All Quiet on the Western Front by Erich Maria Remarque. This book is not available on Project Gutenberg, so I pulled a text version from the Open Library.
library(readr)
library(dplyr)
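# Read the plain-text file, one element per line
allquiet <- read_lines("~/Public/Drop Box/All quiet on the western front - Remarque, Erich Maria, 1898-1970 plain.txt")
# tibble() replaces the deprecated data_frame()
allquiet <- tibble(line = seq_along(allquiet), text = allquiet)
# Skip the first six lines of the file
allquiet <- allquiet[7:nrow(allquiet), ]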
library(tidytext)
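# count_syllables() is not defined in this section; it is assumed to be
# the helper from the earlier readability analysis
tidy_allquiet <- allquiet %>%
  unnest_tokens(sentence, text, token = "sentences") %>%  # one row per sentence
  unnest_tokens(word, sentence, drop = FALSE) %>%         # one row per word, keeping the sentence
  rowwise() %>%                                           # apply the counter word by word
  mutate(n_syllables = count_syllables(word)) %>%
  ungroup()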
tidy_allquiet %>%
  select(word, n_syllables)
## # A tibble: 63,592 × 2
##          word n_syllables
##         <chr>       <dbl>
## 1        this           1
## 2        book           1
## 3          is           1
## 4          to           1
## 5          be           1
## 6     neither           2
## 7          an           1
## 8  accusation           4
## 9         nor           1
## 10          a           1
## # ... with 63,582 more rows
Now the text is tidied and in the form we need it.
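The pipeline above depends on a `count_syllables()` helper that this section does not define. If you want to run the code without it, here is a very rough stand-in. It is my own assumption, not the function used to produce the numbers in this post: it simply counts runs of vowels, so it will miscount words with silent vowels, and the name `count_syllables_naive` is made up for illustration.

```r
# Hypothetical stand-in for count_syllables(): counts runs of consecutive
# vowels (including y) as a crude proxy for syllables. Takes a single word.
count_syllables_naive <- function(word) {
  matches <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  # gregexpr() returns -1 when nothing matches
  n <- if (matches[1] == -1) 0L else length(matches)
  # Treat every word as having at least one syllable
  max(n, 1L)
}

count_syllables_naive("accusation")  # 4 vowel runs: a, u, a, io
count_syllables_naive("neither")     # 2 vowel runs: ei, e
```

Because it handles one word at a time, it could be dropped into the `rowwise()` step in place of `count_syllables()`, though the resulting counts would not match the ones shown here.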
How many sentences are there?
n_sentences <- tidy_allquiet %>%
  summarise(n_sentences = n_distinct(sentence)) %>%
  as.numeric()
n_sentences
## [1] 5027
How many words with 3 or more syllables are there?
n_polysyllables <- tidy_allquiet %>%
  filter(n_syllables >= 3) %>%
  summarise(n_polysyllables = n()) %>%
  as.numeric()
n_polysyllables
## [1] 3748
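Both counts can also be computed in a single `summarise()` call. The sketch below uses a tiny made-up data frame rather than the novel, and it adds a hypothetical `sentence_id` column that the tidy data above does not have. It is there to show one thing worth knowing: `n_distinct(sentence)` counts each distinct sentence text once, so a sentence that appears twice in the book is only counted once.

```r
library(dplyr)

# Toy data (invented for illustration): one row per word.
# Sentences 1 and 2 have identical text.
toy <- tibble(
  sentence_id = c(1, 2, 3, 3),
  sentence    = c("yes.", "yes.", "an accusation.", "an accusation."),
  word        = c("yes", "yes", "an", "accusation"),
  n_syllables = c(1, 1, 1, 4)
)

counts <- toy %>%
  summarise(
    n_by_text       = n_distinct(sentence),     # identical sentences collapse
    n_by_id         = n_distinct(sentence_id),  # every sentence counted
    n_polysyllables = sum(n_syllables >= 3)     # words with 3+ syllables
  )

counts
```

On the toy data, counting by text gives 2 sentences and counting by id gives 3. One option for the real pipeline would be to add an id with `mutate(sentence_id = row_number())` right after the first `unnest_tokens()` call, though I have not rerun the numbers in this post that way.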
What is the SMOG grade?
SMOG <- 1.0430 * sqrt(30 * n_polysyllables/n_sentences) + 3.1291
SMOG
## [1] 8.061863
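If you plan to repeat this for more books, the formula above can be wrapped in a small function. This is just a convenience sketch. The name `smog_grade` is my own, and the constants are the ones already used above. The factor of 30 rescales the polysyllable count to a 30-sentence sample, which is how SMOG is defined.

```r
# SMOG grade from raw counts, using the same constants as the formula above
smog_grade <- function(n_polysyllables, n_sentences) {
  stopifnot(n_sentences > 0)
  1.0430 * sqrt(30 * n_polysyllables / n_sentences) + 3.1291
}

# The counts from this post: 3748 polysyllabic words in 5027 sentences
smog_grade(3748, 5027)  # about 8.06, matching the value above
```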
So All Quiet on the Western Front has a SMOG grade of about 8 (i.e. about the beginning of 8th grade), a year lower than Little Women and Anne of Green Gables.