Each day, a new five letter word is the day’s Wordle. Thanks to The Riddler’s fascination with the Riddler, the Riddler universe has access to a list of words that we are allowed to guess along with a list of words that could be the mystery Wordle word.
Let’s read in the mystery words and the words we can use to guess. Thanks Jason Ash for the text files online!
mysteryWords <- url("https://jtash.vercel.app/wordle-solutions.txt")
guessWords <- url("https://jtash.vercel.app/wordle-guesses.txt")
Mystery <- readLines(mysteryWords, warn = FALSE)
Guess <- readLines(guessWords, warn = FALSE)
To come up with a good first guess word, we’ll do a quick analysis of the distribution of letters in the 1st position, 2nd, and so on.
letterDist <- rep(0, 26)
names(letterDist) <- letters
## this function takes a word list and a number between 1 and 5 and then
## creates a distribution of letters for that given spot
distribute <- function(wordList, letNum){
letterDist <- rep(0, 26)
names(letterDist) <- letters
k <- length(wordList)
for(i in 1:k){
x <- strsplit(wordList[i], split="")[[1]][letNum]
letterDist[x] <- letterDist[x]+1
}
return(letterDist)
}
(letter1Dist <- distribute(Mystery, 1))
## a b c d e f g h i j k l m n o p q r s t
## 141 173 198 111 72 136 115 69 34 20 20 88 107 37 41 142 23 105 366 149
## u v w x y z
## 33 43 83 0 6 3
(letter2Dist <- distribute(Mystery, 2))
## a b c d e f g h i j k l m n o p q r s t
## 304 16 40 20 242 8 12 144 202 2 10 201 38 87 279 61 5 267 16 77
## u v w x y z
## 186 15 44 14 23 2
(letter3Dist <- distribute(Mystery, 3))
## a b c d e f g h i j k l m n o p q r s t
## 307 57 56 75 177 25 67 9 266 3 12 112 61 139 244 58 1 163 80 111
## u v w x y z
## 165 49 26 12 29 11
(letter4Dist <- distribute(Mystery, 4))
## a b c d e f g h i j k l m n o p q r s t
## 163 24 152 69 318 35 76 28 158 2 55 162 68 182 132 50 0 152 171 139
## u v w x y z
## 82 46 25 3 3 20
(letter5Dist <- distribute(Mystery, 5))
## a b c d e f g h i j k l m n o p q r s t
## 64 11 31 118 424 26 41 139 11 0 113 156 42 130 58 56 0 212 36 253
## u v w x y z
## 1 0 17 8 364 4
We see that ‘s’ is the most popular first letter while ‘a’ is the most popular 2nd and 3rd letters, ‘e’ the most popular 4th and 5th letters. If we look at the next most popular letters in the 3rd and 4th spots, we find ‘i’ and ‘n’, respectively. It turns out ‘saine’ is a guessable word which is an archaic term meaning to make the sign of the cross over oneself to protect from evil.
To illustrate the process of guessing the Wordle, I will walk through my guessing procedure for 1/17/2022. My first guess was ‘saine’, and it says that the ‘s’, ‘i’, and ‘e’ are all correct AND in the correct places while the ‘a’ and the ‘n’ are not in the word.
## A function that takes a word as input along with a vector of letters, and returns TRUE if any of the letters are in the word, False otherwise. This will be used to remove words.
remWords <- function(x, r){
return(!any(r %in% x))
}
## A function that takes a word and vector of letters as input and returns TRUE if all the letters are in the word, and FALSE otherwise. Will be used to find words to keep.
keepWords <- function(x, k){
return(all(k %in% x))
}
## A function that takes a word, a letter, and a position as input and returns TRUE if the word has that letter in that position and FALSE otherwise.
exactSpot <- function(x, e, pos){
return(x[pos] == e)
}
## A function that takes a word, a letter, and a position as input and returns TRUE if the word does not have that letter in that position.
notExactSpot <- function(x, e, pos){
return(x[pos] != e)
}
## inputs gw = guess word, MystDist is the distribution of Mystery words, and then px is 0 if letter in position x is not included, 1 if letter in position x is included but not in that position, and 2 if letter in position x is included and it the correct position.
filterList <- function(gW, MystDist, p1, p2, p3, p4, p5){
Mdist <- MystDist
gLets <- strsplit(gW, "")[[1]]
P <- c(p1, p2, p3, p4, p5)
i <- 1
for(p in P){
if(p==0){
Mdist <- Mdist[sapply(strsplit(Mdist, ""),
function(x) remWords(x, gLets[i]))]
}
if(p==1){
Mdist <- Mdist[sapply(strsplit(Mdist, ""),
function(x) keepWords(x, gLets[i]))]
Mdist <- Mdist[sapply(strsplit(Mdist, ""),
function(x) notExactSpot(x, gLets[i],i))]
}
if(p==2){
Mdist <- Mdist[sapply(strsplit(Mdist, ""),
function(x) exactSpot(x, gLets[i],i))]
}
i <- i+1
}
return(Mdist)
}
Using the function now for the wordle on 1/17/2022, I input ‘saine’, the Mystery word list, and then 2,0,2,0,2.
(MystRnd2 <- filterList("saine", Mystery, 2, 0, 2, 0, 2))
## [1] "seize" "shire" "slice" "slide" "slime" "smile" "smite" "spice" "spike"
## [10] "spire" "spite" "suite"
I would like to analyze each of these words as potential next guess words. If I were to have ‘seize’ as my guess word, what would be the maximum sized remaining third round distribution of words? Then, I choose the minimum over all guess words.
listFilter <- function(Mword, Gword, potentialWords){
Glets <- strsplit(Gword, "")[[1]]
Mlets <- strsplit(Mword, "")[[1]]
notInLets <- unique(Glets[!Glets %in% Mlets])
InLets <- unique(Glets[Glets %in% Mlets])
exactLets <- Glets[Glets == Mlets]
notexactLets <- setdiff(InLets, exactLets)
# Refine the word distribution according to the rules
NewDist <- potentialWords[sapply(strsplit(potentialWords, split=""), function(x) remWords(x, notInLets))]
NewDist <- NewDist[sapply(strsplit(NewDist, split=""), function(x) keepWords(x, InLets))]
if(length(exactLets)>0){
for(a in exactLets){
for(a1 in which(a==Mlets)){
NewDist <- NewDist[sapply(strsplit(NewDist, split=""), function(x) exactSpot(x, a, a1))]
}
}
}
if(length(notexactLets)>0){
for(a in notexactLets){
NewDist <- NewDist[sapply(strsplit(NewDist, split=""), function(x) notExactSpot(x, a, which(a==Glets)))]
}
}
return(NewDist)
}
nextGuess <- function(GuessList, MysteryList){
L <- length(MysteryList)
for(g in GuessList){
M <- 1
for(m in MysteryList){
gL <- listFilter(m, g, MysteryList)
if(length(gL)>M){
M <- length(gL)
}
}
if(M <= L){
if(M<L){G <- g}
else{ G <- c(G,g)}
L <- M
}
}
return(list(G, L))
}
nextGuess(MystRnd2, MystRnd2)
## [[1]]
## [1] "spite"
##
## [[2]]
## [1] 6
If this had given me a 1 or 2, I would have considered it. However, seeing that I could reasonably have a 1/6th chance of getting it on the third try I will instead analyze the distribution of the second and fourth letters in these words to come up with a better guess.
(letter2Dist_r2 <- distribute(MystRnd2,2))
## a b c d e f g h i j k l m n o p q r s t u v w x y z
## 0 0 0 0 1 0 0 1 0 0 0 3 2 0 0 4 0 0 0 0 1 0 0 0 0 0
(letter4Dist_r2 <- distribute(MystRnd2,4))
## a b c d e f g h i j k l m n o p q r s t u v w x y z
## 0 0 2 1 0 0 0 0 0 0 1 1 1 0 0 0 0 2 0 3 0 0 0 0 0 1
Using the above information, I will filter the guess list down to words that do not have any of the letters that have a zero above.
Gref <- Guess[sapply(strsplit(Guess, split=""), function(x) remWords(x, c("a", "b", "f", "g", "i", "j", "n", "o","q","s", "v", "w", "x", "y")))]
length(Gref)
## [1] 247
With 247 guess words to work with, let’s see what happens if we use the words in this list to narrow down the mystery word.
nextGuess(Gref, MystRnd2)
## [[1]]
## [1] "clept" "crump"
##
## [[2]]
## [1] 2
By using either ‘clept’ or ‘crump’ as our second guess, the maximum number of mystery words left will be two. Although we have zero possibility of getting it on the second guess, we have a much higher probability of guessing the mystery word on the third.
I’ve guessed ‘clept’ as my second round and the result is that there is no ‘c’, ‘l’, ‘p’, or ‘t’ in the word. We already know that ‘e’ is in the last spot. I can only cross my fingers and hope that the filter will reduce it list down to one.
Let’s use the function to eliminate further.
(MystRnd3 <- filterList("clept", MystRnd2, 0, 0, 1, 0, 0))
## [1] "seize" "shire"
Well, darn. I have two to choose from. Going with the first, I ended up being wrong. In hindsight, had I guessed ‘clump’ on the second try I would have got it on the third!
filterList("crump", MystRnd2, 0, 1, 0, 0, 0)
## [1] "shire"
This is the third time I’ve used this strategy, and there are times in which using one of the 2nd round mystery words is more beneficial than using a guess word. I have yet to automate the middle step in which we select whether the 2nd guess should come from the Mystery or Guess list, and as such, calculating the probability of getting it in three or less would be pretty difficult.