Which letters are most common in each position of 5-letter words? Words built from these high-frequency letters make good candidates for a first or second guess, since they have a high probability of turning up green or yellow letters.
TL;DR: Try these words to start:
SANES, SALES, SORES, CARES, BARES, TARES, SATES, SERES, PARES, MARES
Note: As far as Wordle strategy goes, you might not want to pick words from this list that have duplicate letters, since a repeated letter means the guess tests fewer distinct letters!
First load a dictionary of English words and subset them to only 5-letter words. Here I’m using the Grady Ward Augmented English dictionary.
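# packages used throughout this post
library(qdapDictionaries) # provides GradyAugmented
library(tidyverse)        # stringr, dplyr, purrr, tibble, ggplot2
library(tidytext)         # reorder_within(), scale_x_reordered()
library(flextable)
# load a dictionary
data(GradyAugmented)
# filter to only 5 letter words and convert to uppercase
wordles <- toupper(GradyAugmented[str_length(GradyAugmented) == 5])
# show a random sample of 10 words
flextable(data.frame(Words = sample(wordles, 10))) %>%
  set_caption(caption = "Example Wordles") %>%
  set_table_properties(layout = "autofit", width = 0.2)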
Words |
SLUGS |
AFOUL |
SPANK |
SONSY |
ZANZA |
SPIKY |
HEXER |
FLINN |
YAUDS |
CRUET |
For each of the five letter positions, we can calculate how often each letter appears in that position. For example, how often does the letter ‘S’ appear as the first letter of a 5-letter word, how often does ‘T’ appear as the second letter, and so on?
The plot below shows the top 10 letters in each position. S is the most common first (and last!) letter in 5-letter words.
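To make the idea concrete, here is a minimal base-R sketch on a toy word list. The four words and the names `toy_words`, `first_letters`, and `first_pct` are assumptions for illustration only, not the real dictionary or part of the analysis.

```r
# Toy word list (illustrative assumption, not the real dictionary)
toy_words <- c("SALES", "SORES", "CARES", "TARES")

# Letter found in position 1 of every word
first_letters <- substr(toy_words, 1, 1)

# Percentage of words that start with each letter
first_pct <- round(prop.table(table(first_letters)) * 100, 2)
first_pct
```

Two of the four toy words start with S, so S accounts for 50% of first letters, while C and T account for 25% each. Repeating this for positions 2 to 5 gives one frequency table per position.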
# function to tabulate letter frequency from a vector of letters
letter_freq <- function(x){
  tbl <- table(x)
  res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
  colnames(res) <- c('Count', 'Percentage')
  # return a data frame sorted from most to least frequent letter
  res %>%
    as.data.frame() %>%
    rownames_to_column("Letter") %>%
    arrange(desc(Percentage))
}
# wrapper function for letter table by letter position
letter_table <- function(n){letter_freq(str_sub(wordles, n, n))}
# collect the letter freq tables for all 5 letter positions
# (.id stores the position as a character, "1" to "5")
wordle_freq <- map_dfr(1:5, letter_table, .id = "position")
# plot the 10 most frequent letters in each position
wordle_freq %>%
  group_by(position) %>%
  top_n(10, Percentage) %>%
  ungroup() %>%
  mutate(position = as.factor(position),
         Letter = reorder_within(Letter, Percentage, position)) %>%
  ggplot(aes(Letter, Percentage, fill = position)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~position, scales = 'free_y') +
  coord_flip() +
  scale_x_reordered() +
  theme_bw() +
  labs(y = "Percent Letter Frequency",
       title = "Letter Frequency of 5-letter words by Letter Positions 1-5",
       subtitle = "Dictionary: Grady Ward's Augmented Dictionary")
Applying the letter frequencies by position to each 5-letter word in our dictionary, we can then calculate a combined score for the word by averaging the positional frequencies of its five letters.
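As a rough illustration of the averaging, the sketch below scores a toy word list with a base-R character matrix. This is a different technique from the join-based function used in this post, and the word list and the names `letters_mat`, `pct_mat`, and `toy_scores` are assumptions for illustration.

```r
# Toy word list (illustrative assumption, not the real dictionary)
toy_words <- c("SALES", "SORES", "CARES", "TARES")

# 4 x 5 character matrix: one row per word, one column per letter position
letters_mat <- do.call(rbind, strsplit(toy_words, split = ""))

# For each position (column), replace every letter with the percentage
# of words that have that letter in that position
pct_mat <- apply(letters_mat, 2, function(col) {
  pct <- prop.table(table(col)) * 100
  as.numeric(pct[col])
})

# Score = mean of the five positional percentages for each word
toy_scores <- setNames(rowMeans(pct_mat), toy_words)
sort(toy_scores, decreasing = TRUE)
```

In this toy list, CARES and TARES each score 75 and SALES and SORES each score 70. The -ES ending is shared by every word, so the difference comes from the first three positions.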
# function to score word by calculating mean percentage of letter position based on freq table
score_word <- function(word){
  # one row per letter, tagged with its position ("1" to "5"), joined to the frequency table
  chars <- data.frame(Letter = unlist(strsplit(word, split = "")), stringsAsFactors = FALSE) %>%
    mutate(position = as.character(row_number())) %>%
    inner_join(wordle_freq, by = c("position", "Letter"))
  df <- data.frame(Word = word, Score = mean(chars$Percentage))
  return(df)
}
# get word scores
word_scores <- map_dfr(wordles, score_word)
saveRDS(word_scores, "word_scores.rds")
# keep the 10 highest-scoring words for the table
df.wordscores <- word_scores %>%
  arrange(desc(Score)) %>%
  head(10)
flextable(df.wordscores) %>%
  set_caption(caption = "Top 10 Words by average letter position frequency score") %>%
  set_table_properties(layout = "autofit", width = 0.5)
Word | Score |
SANES | 16.618 |
SALES | 16.492 |
SORES | 16.310 |
CARES | 16.260 |
BARES | 16.090 |
TARES | 16.032 |
SATES | 15.978 |
SERES | 15.960 |
PARES | 15.948 |
MARES | 15.912 |
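The note at the top suggests avoiding words with duplicate letters. Here is a minimal sketch of how the top-10 list could be filtered that way. The helper `has_unique_letters` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original post): TRUE when no letter repeats
has_unique_letters <- function(word) {
  chars <- strsplit(word, split = "")[[1]]
  !any(duplicated(chars))
}

# The top 10 words from the table above
top_words <- c("SANES", "SALES", "SORES", "CARES", "BARES",
               "TARES", "SATES", "SERES", "PARES", "MARES")

# Keep only words made of five distinct letters
unique_words <- top_words[vapply(top_words, has_unique_letters, logical(1))]
unique_words
```

This keeps CARES, BARES, TARES, PARES, and MARES. The same filter could be applied to the full table of word scores before taking the top 10.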