Wordle, while not quite as popular as it was in early February, is still played enough to generate more than 150,000 daily Twitter posts from users itching to showcase their five-letter word-guessing prowess. Within a few key boundaries, the rules are straightforward: each game challenges a player to identify a hidden word in as few tries as possible.
As there is only a single Wordle solution each day, sharing results to social media allows players to rank themselves against global competitors whose results are posted daily by the automated Twitter account, @WordleStats. The craze has spawned a flurry of imitators trying to create a similar buzz by challenging your solving abilities across a breadth of topics. As of late March, the count is up to 18 spin-off versions. Some of the more popular titles include:
Quordle: similar to Wordle, but you’re solving four puzzles at once,
Worldle: geography themed,
Nerdle: math themed,
Lewdle: like the original, but all the solutions are NSFW words.
Along with creating clones, there’s been an ever-increasing effort to outwit the game by determining the most effective solving strategies. These include identifying the best starter words, letter combos, and processes for eliminating incorrect letters. To this end, statisticians and data science gurus have taken to the web to post pointers on how to be the best wordler you can be. Much of this work applies statistical methods to data sets built from a combination of common word lists and the full Wordle solution set. Dictionaries outside of Wordle are used to determine averages of letter combinations across all known 5-letter words. An alternative is to use the word lists supplied by the game itself. Within the code are both the solution set of 2,315 words and an allowable set of just under 13,000 words. The allowable words are all the 5-letter words that the game will accept, of which the 2,315 solution words are the only possible winning answers. Using these word lists provides the best opportunity to statistically understand any potential patterns that could lead to more efficient strategies.
One of the more comprehensive articles on the various strategies players use was written by Chris Chow about a month into Wordle mania. Through a literature review of other gamers’ studies, he determined that in addition to having a great starter or “seed” word, the best players used a combination of tactics to play towards multiple goals or definitions of success. Some play to solve it in the fewest guesses, while others focus on gaining information to ensure success by the final round (playing not to lose). He created a gaming bot to simulate various tactics and determined that personal preference was as big a factor in a successful gaming strategy as what letters/words were chosen at the beginning of a round.
The following case study will rely on text network analysis techniques in an attempt to identify latent or hidden relationships between Wordle characters. My method will explore the below topics pertaining to each step of the data-intensive workflow process:
Prepare: Prior to analysis, I’ll explain the context from which the data came, formulate some research questions, and introduce the R packages that will enable analysis.
Wrangle: In Section 2, I’ll import the Wordle data set taken from Twitter, tidy it, and tokenize it into elements that can be statistically analyzed as well as input into network graphing models.
Analyze: In Section 3, I’ll explore the data elements in an effort to describe trends that can shape the tactics in building the graphing models.
Model: I’ll wrap up the analysis in Section 4 by introducing networks of letter combinations. These tools will assist in visualizing relationships between letters and may uncover insights into effective letter or word choice in the game.
Communicate: Finally, I’ll conclude by consolidating artifacts from the analysis to address the research focus from Section 1b.
As a relatively new student of the data science field (and a Wordle addict), I’m throwing my hat into the analytic ring, but I’m handicapping myself a bit. As a gaming purist, I want to do as little damage as possible to the Wordle experience, so I am refraining from digging into the coded word lists and building my analysis on only the games that have been played to date.
The goal of this analysis is to determine whether or not text network analysis can be used to describe the latent relationships between letters or letter combinations and thereby indicate those combinations most conducive to solving Wordle puzzles faster.
library(plotly)
library(tidyverse)
library(tidytext)
library(htmlwidgets)
library(dplyr)
library(here)
library(igraph)
library(ggraph)
library(zoo)
library(wordcloud2)
library(kableExtra)
The data for this case study was generated from Kevin O’Connor’s (@gooeyblob) automated @WordleStats account, which I put into a .csv file for ease of input. The data was collected between 7 January and 23 March 2022, for a total of 76 observations.
wordle_raw <- read_csv(here("data", "wordle.csv"))
wordle_raw %>%
kbl() %>%
kable_styling() %>%
scroll_box(width = "800px", height = "500px")
| Date | ID | Word | n | Hmode | 1 | 2 | 3 | 4 | 5 | 6 | X |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 23-Mar | 277 | PURGE | 156785 | 8555 | 0.01 | 0.04 | 0.22 | 0.35 | 0.26 | 0.11 | 0.02 |
| 22-Mar | 276 | SLOSH | 160161 | 8807 | 0.00 | 0.02 | 0.19 | 0.36 | 0.27 | 0.13 | 0.02 |
| 21-Mar | 275 | THEIR | 173636 | 9200 | 0.02 | 0.14 | 0.36 | 0.30 | 0.13 | 0.04 | 0.00 |
| 20-Mar | 274 | RENEW | 154987 | 8417 | 0.00 | 0.04 | 0.20 | 0.33 | 0.27 | 0.13 | 0.02 |
| 19-Mar | 273 | ALLOW | 156311 | 8515 | 0.00 | 0.05 | 0.21 | 0.32 | 0.26 | 0.14 | 0.03 |
| 18-Mar | 272 | SAUTE | 179830 | 9304 | 0.01 | 0.08 | 0.31 | 0.34 | 0.19 | 0.06 | 0.01 |
| 17-Mar | 271 | MOVIE | 169071 | 8847 | 0.01 | 0.05 | 0.18 | 0.30 | 0.26 | 0.16 | 0.03 |
| 16-Mar | 270 | CATER | 217856 | 11234 | 0.01 | 0.07 | 0.19 | 0.22 | 0.19 | 0.18 | 0.15 |
| 15-Mar | 269 | TEASE | 202855 | 10024 | 0.01 | 0.16 | 0.32 | 0.30 | 0.16 | 0.06 | 0.01 |
| 14-Mar | 268 | SMELT | 185406 | 9373 | 0.00 | 0.05 | 0.19 | 0.33 | 0.28 | 0.13 | 0.02 |
| 13-Mar | 267 | FOCUS | 179436 | 8937 | 0.01 | 0.04 | 0.23 | 0.36 | 0.24 | 0.10 | 0.01 |
| 12-Mar | 266 | TODAY | 192049 | 9353 | 0.01 | 0.07 | 0.29 | 0.35 | 0.20 | 0.07 | 0.01 |
| 11-Mar | 265 | WATCH | 226349 | 12400 | 0.01 | 0.06 | 0.14 | 0.18 | 0.17 | 0.24 | 0.20 |
| 10-Mar | 264 | LAPSE | 208884 | 9960 | 0.00 | 0.08 | 0.31 | 0.34 | 0.19 | 0.07 | 0.01 |
| 9-Mar | 263 | MONTH | 201799 | 9435 | 0.01 | 0.05 | 0.26 | 0.37 | 0.22 | 0.08 | 0.01 |
| 8-Mar | 262 | SWEET | 207473 | 9767 | 0.01 | 0.05 | 0.18 | 0.31 | 0.28 | 0.15 | 0.02 |
| 7-Mar | 261 | HOARD | 218595 | 9823 | 0.01 | 0.09 | 0.30 | 0.34 | 0.19 | 0.07 | 0.01 |
| 6-Mar | 260 | CLOTH | 218595 | 9911 | 0.01 | 0.08 | 0.33 | 0.34 | 0.17 | 0.07 | 0.01 |
| 5-Mar | 259 | BRINE | 229895 | 10405 | 0.01 | 0.09 | 0.25 | 0.29 | 0.22 | 0.12 | 0.03 |
| 4-Mar | 258 | AHEAD | 203730 | 9396 | 0.01 | 0.05 | 0.20 | 0.35 | 0.26 | 0.12 | 0.02 |
| 3-Mar | 257 | MOURN | 240018 | 10465 | 0.01 | 0.08 | 0.29 | 0.34 | 0.19 | 0.08 | 0.01 |
| 2-Mar | 256 | NASTY | 257304 | 10813 | 0.01 | 0.07 | 0.26 | 0.31 | 0.21 | 0.11 | 0.02 |
| 1-Mar | 255 | RUPEE | 240137 | 10577 | 0.01 | 0.02 | 0.17 | 0.35 | 0.30 | 0.13 | 0.02 |
| 28-Feb | 254 | CHOKE | 251094 | 10521 | 0.01 | 0.08 | 0.30 | 0.36 | 0.18 | 0.06 | 0.01 |
| 27-Feb | 253 | CHANT | 250413 | 10438 | 0.01 | 0.09 | 0.33 | 0.33 | 0.16 | 0.07 | 0.01 |
| 26-Feb | 252 | SPILL | 248363 | 10087 | 0.01 | 0.05 | 0.26 | 0.34 | 0.22 | 0.10 | 0.02 |
| 25-Feb | 251 | VIVID | 255907 | 11687 | 0.01 | 0.02 | 0.10 | 0.29 | 0.33 | 0.21 | 0.04 |
| 24-Feb | 250 | BLOKE | 250674 | 10405 | 0.01 | 0.06 | 0.21 | 0.32 | 0.25 | 0.12 | 0.02 |
| 23-Feb | 249 | TROVE | 277576 | 11411 | 0.01 | 0.05 | 0.16 | 0.24 | 0.25 | 0.22 | 0.08 |
| 22-Feb | 248 | THORN | 309356 | 11814 | 0.01 | 0.14 | 0.38 | 0.30 | 0.12 | 0.04 | 0.00 |
| 21-Feb | 247 | OTHER | 278731 | 10887 | 0.01 | 0.09 | 0.26 | 0.30 | 0.21 | 0.10 | 0.02 |
| 20-Feb | 246 | TACIT | 273306 | 11094 | 0.01 | 0.04 | 0.21 | 0.32 | 0.26 | 0.14 | 0.03 |
| 19-Feb | 245 | SWILL | 282327 | 11241 | 0.01 | 0.01 | 0.08 | 0.19 | 0.31 | 0.30 | 0.10 |
| 18-Feb | 244 | DODGE | 265238 | 10220 | 0.01 | 0.03 | 0.15 | 0.29 | 0.27 | 0.19 | 0.07 |
| 17-Feb | 243 | SHAKE | 342003 | 12767 | 0.01 | 0.06 | 0.16 | 0.23 | 0.24 | 0.21 | 0.09 |
| 16-Feb | 242 | CAULK | 289721 | 10740 | 0.01 | 0.04 | 0.20 | 0.31 | 0.26 | 0.15 | 0.03 |
| 15-Feb | 241 | AROMA | 287836 | 10343 | 0.01 | 0.06 | 0.25 | 0.33 | 0.22 | 0.11 | 0.02 |
| 14-Feb | 240 | CYNIC | 261521 | 10030 | 0.01 | 0.02 | 0.11 | 0.33 | 0.34 | 0.17 | 0.03 |
| 13-Feb | 239 | ROBIN | 277471 | 9249 | 0.01 | 0.06 | 0.29 | 0.34 | 0.21 | 0.08 | 0.01 |
| 12-Feb | 238 | ULTRA | 269885 | 9310 | 0.01 | 0.07 | 0.23 | 0.34 | 0.24 | 0.10 | 0.01 |
| 11-Feb | 237 | ULCER | 278826 | 10631 | 0.01 | 0.04 | 0.18 | 0.30 | 0.28 | 0.16 | 0.03 |
| 10-Feb | 236 | PAUSE | 304830 | 13480 | 0.01 | 0.08 | 0.26 | 0.32 | 0.21 | 0.10 | 0.02 |
| 9-Feb | 235 | HUMOR | 305372 | 13846 | 0.01 | 0.05 | 0.22 | 0.34 | 0.25 | 0.11 | 0.02 |
| 8-Feb | 234 | FRAME | 336236 | 15369 | 0.01 | 0.10 | 0.20 | 0.24 | 0.24 | 0.17 | 0.03 |
| 7-Feb | 233 | ELDER | 288228 | 13340 | 0.01 | 0.03 | 0.13 | 0.24 | 0.30 | 0.24 | 0.05 |
| 6-Feb | 232 | SKILL | 311018 | 13716 | 0.01 | 0.03 | 0.17 | 0.33 | 0.27 | 0.16 | 0.03 |
| 5-Feb | 231 | ALOFT | 319698 | 13708 | 0.01 | 0.04 | 0.22 | 0.36 | 0.25 | 0.11 | 0.02 |
| 4-Feb | 230 | PLEAT | 359679 | 14813 | 0.01 | 0.10 | 0.28 | 0.31 | 0.19 | 0.09 | 0.02 |
| 3-Feb | 229 | SHARD | 358176 | 14609 | 0.01 | 0.07 | 0.22 | 0.28 | 0.25 | 0.14 | 0.04 |
| 2-Feb | 228 | MOIST | 361908 | 14205 | 0.03 | 0.13 | 0.32 | 0.29 | 0.16 | 0.07 | 0.01 |
| 1-Feb | 227 | THOSE | 351663 | 13606 | 0.01 | 0.13 | 0.34 | 0.30 | 0.15 | 0.06 | 0.01 |
| 31-Jan | 226 | LIGHT | 341314 | 13347 | 0.01 | 0.10 | 0.25 | 0.27 | 0.19 | 0.12 | 0.05 |
| 30-Jan | 225 | WRUNG | 294687 | 11524 | 0.00 | 0.02 | 0.18 | 0.39 | 0.27 | 0.12 | 0.02 |
| 29-Jan | 224 | COULD | 313220 | 11592 | 0.01 | 0.07 | 0.29 | 0.35 | 0.20 | 0.08 | 0.01 |
| 28-Jan | 223 | PERKY | 296968 | 11148 | 0.01 | 0.04 | 0.17 | 0.30 | 0.27 | 0.17 | 0.04 |
| 27-Jan | 222 | MOUNT | 331844 | 11451 | 0.01 | 0.09 | 0.29 | 0.33 | 0.19 | 0.07 | 0.01 |
| 26-Jan | 221 | WHACK | 302348 | 10163 | 0.01 | 0.04 | 0.22 | 0.37 | 0.24 | 0.10 | 0.02 |
| 25-Jan | 220 | SUGAR | 276404 | 8708 | 0.01 | 0.06 | 0.25 | 0.34 | 0.23 | 0.09 | 0.01 |
| 24-Jan | 219 | KNOLL | 258038 | 8317 | 0.01 | 0.01 | 0.11 | 0.29 | 0.33 | 0.21 | 0.04 |
| 23-Jan | 218 | CRIMP | 269929 | 7630 | 0.01 | 0.05 | 0.28 | 0.38 | 0.20 | 0.07 | 0.01 |
| 22-Jan | 217 | WINCE | 241489 | 6850 | 0.01 | 0.03 | 0.17 | 0.33 | 0.29 | 0.15 | 0.03 |
| 21-Jan | 216 | PRICK | 273727 | 7409 | 0.01 | 0.08 | 0.30 | 0.33 | 0.19 | 0.07 | 0.01 |
| 20-Jan | 215 | ROBOT | 243964 | 6589 | 0.01 | 0.08 | 0.29 | 0.34 | 0.20 | 0.08 | 0.01 |
| 19-Jan | 214 | POINT | 280622 | 7094 | 0.01 | 0.16 | 0.37 | 0.28 | 0.12 | 0.04 | 0.01 |
| 18-Jan | 213 | PROXY | 220950 | 6206 | 0.01 | 0.02 | 0.11 | 0.24 | 0.31 | 0.26 | 0.06 |
| 17-Jan | 212 | SHIRE | 222197 | 5640 | 0.01 | 0.08 | 0.32 | 0.32 | 0.18 | 0.08 | 0.02 |
| 16-Jan | 211 | SOLAR | 209609 | 4955 | 0.01 | 0.09 | 0.32 | 0.32 | 0.18 | 0.07 | 0.01 |
| 15-Jan | 210 | PANIC | 205880 | 4655 | 0.01 | 0.09 | 0.35 | 0.34 | 0.16 | 0.05 | 0.01 |
| 14-Jan | 209 | TANGY | 169484 | 3985 | 0.01 | 0.04 | 0.21 | 0.30 | 0.24 | 0.15 | 0.05 |
| 13-Jan | 208 | ABBEY | 132726 | 3345 | 0.01 | 0.02 | 0.13 | 0.29 | 0.31 | 0.20 | 0.03 |
| 12-Jan | 207 | FAVOR | 137586 | 3073 | 0.01 | 0.04 | 0.15 | 0.26 | 0.29 | 0.21 | 0.04 |
| 11-Jan | 206 | DRINK | 153880 | 3017 | 0.01 | 0.09 | 0.35 | 0.34 | 0.16 | 0.05 | 0.01 |
| 10-Jan | 205 | QUERY | 107134 | 2242 | 0.01 | 0.04 | 0.16 | 0.30 | 0.30 | 0.17 | 0.02 |
| 9-Jan | 204 | GORGE | 91477 | 1913 | 0.01 | 0.03 | 0.13 | 0.27 | 0.30 | 0.22 | 0.04 |
| 8-Jan | 203 | CRANK | 101503 | 1763 | 0.01 | 0.05 | 0.23 | 0.31 | 0.24 | 0.14 | 0.02 |
| 7-Jan | 202 | SLUMP | 80630 | 1362 | 0.01 | 0.03 | 0.23 | 0.39 | 0.24 | 0.09 | 0.01 |
This data is still quite raw. To better understand player engagement and results, we must tidy it a bit by doing some computations and modifying a few columns to make plotting simpler.
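# Convert date column from Character to Date class
# (the raw dates carry no year, so append 2022 rather than rely on the current year)
wordle_date <- mutate(wordle_raw, Date = as.Date(paste0(Date, "-2022"), "%d-%b-%Y"))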
# Rename column 'n' to 'Players'
guesses_by_date <- wordle_date %>%
rename(Players = n)
# Rename guess variables
names(guesses_by_date)[6:12] <- c("one", "two", "three", "four", "five", "six", "wrong")
# Convert each share of players into a player count, replacing the share columns
wordle_guesses <- guesses_by_date %>%
  mutate(across(one:wrong, ~ .x * Players))
# Add column for average guess by word
wordle_guesses <- wordle_guesses %>%
mutate(avg_guess = (one + 2 * two + 3 * three + 4 * four +
5 * five + 6 * six + 7 * wrong) / Players) %>%
relocate(avg_guess, .before = Players)
# Reshape to long format: one row per word and guess outcome
wordle_tidy <- wordle_guesses %>%
  pivot_longer(one:last_col(), names_to = "name", values_to = "guesses")
wordle_guesses %>%
kbl() %>%
kable_styling() %>%
scroll_box(width = "800px", height = "500px")
| Date | ID | Word | avg_guess | Players | Hmode | one | two | three | four | five | six | wrong |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-03-23 | 277 | PURGE | 4.25 | 156785 | 8555 | 1567.85 | 6271.40 | 34492.70 | 54874.75 | 40764.10 | 17246.35 | 3135.70 |
| 2022-03-22 | 276 | SLOSH | 4.32 | 160161 | 8807 | 0.00 | 3203.22 | 30430.59 | 57657.96 | 43243.47 | 20820.93 | 3203.22 |
| 2022-03-21 | 275 | THEIR | 3.47 | 173636 | 9200 | 3472.72 | 24309.04 | 62508.96 | 52090.80 | 22572.68 | 6945.44 | 0.00 |
| 2022-03-20 | 274 | RENEW | 4.27 | 154987 | 8417 | 0.00 | 6199.48 | 30997.40 | 51145.71 | 41846.49 | 20148.31 | 3099.74 |
| 2022-03-19 | 273 | ALLOW | 4.36 | 156311 | 8515 | 0.00 | 7815.55 | 32825.31 | 50019.52 | 40640.86 | 21883.54 | 4689.33 |
| 2022-03-18 | 272 | SAUTE | 3.84 | 179830 | 9304 | 1798.30 | 14386.40 | 55747.30 | 61142.20 | 34167.70 | 10789.80 | 1798.30 |
| 2022-03-17 | 271 | MOVIE | 4.32 | 169071 | 8847 | 1690.71 | 8453.55 | 30432.78 | 50721.30 | 43958.46 | 27051.36 | 5072.13 |
| 2022-03-16 | 270 | CATER | 4.68 | 217856 | 11234 | 2178.56 | 15249.92 | 41392.64 | 47928.32 | 41392.64 | 39214.08 | 32678.40 |
| 2022-03-15 | 269 | TEASE | 3.72 | 202855 | 10024 | 2028.55 | 32456.80 | 64913.60 | 60856.50 | 32456.80 | 12171.30 | 2028.55 |
| 2022-03-14 | 268 | SMELT | 4.31 | 185406 | 9373 | 0.00 | 9270.30 | 35227.14 | 61183.98 | 51913.68 | 24102.78 | 3708.12 |
| 2022-03-13 | 267 | FOCUS | 4.09 | 179436 | 8937 | 1794.36 | 7177.44 | 41270.28 | 64596.96 | 43064.64 | 17943.60 | 1794.36 |
| 2022-03-12 | 266 | TODAY | 3.91 | 192049 | 9353 | 1920.49 | 13443.43 | 55694.21 | 67217.15 | 38409.80 | 13443.43 | 1920.49 |
| 2022-03-11 | 265 | WATCH | 4.96 | 226349 | 12400 | 2263.49 | 13580.94 | 31688.86 | 40742.82 | 38479.33 | 54323.76 | 45269.80 |
| 2022-03-10 | 264 | LAPSE | 3.89 | 208884 | 9960 | 0.00 | 16710.72 | 64754.04 | 71020.56 | 39687.96 | 14621.88 | 2088.84 |
| 2022-03-09 | 263 | MONTH | 4.02 | 201799 | 9435 | 2017.99 | 10089.95 | 52467.74 | 74665.63 | 44395.78 | 16143.92 | 2017.99 |
| 2022-03-08 | 262 | SWEET | 4.33 | 207473 | 9767 | 2074.73 | 10373.65 | 37345.14 | 64316.63 | 58092.44 | 31120.95 | 4149.46 |
| 2022-03-07 | 261 | HOARD | 3.89 | 218595 | 9823 | 2185.95 | 19673.55 | 65578.50 | 74322.30 | 41533.05 | 15301.65 | 2185.95 |
| 2022-03-06 | 260 | CLOTH | 3.86 | 218595 | 9911 | 2185.95 | 17487.60 | 72136.35 | 74322.30 | 37161.15 | 15301.65 | 2185.95 |
| 2022-03-05 | 259 | BRINE | 4.13 | 229895 | 10405 | 2298.95 | 20690.55 | 57473.75 | 66669.55 | 50576.90 | 27587.40 | 6896.85 |
| 2022-03-04 | 258 | AHEAD | 4.27 | 203730 | 9396 | 2037.30 | 10186.50 | 40746.00 | 71305.50 | 52969.80 | 24447.60 | 4074.60 |
| 2022-03-03 | 257 | MOURN | 3.90 | 240018 | 10465 | 2400.18 | 19201.44 | 69605.22 | 81606.12 | 45603.42 | 19201.44 | 2400.18 |
| 2022-03-02 | 256 | NASTY | 4.02 | 257304 | 10813 | 2573.04 | 18011.28 | 66899.04 | 79764.24 | 54033.84 | 28303.44 | 5146.08 |
| 2022-03-01 | 255 | RUPEE | 4.38 | 240137 | 10577 | 2401.37 | 4802.74 | 40823.29 | 84047.95 | 72041.10 | 31217.81 | 4802.74 |
| 2022-02-28 | 254 | CHOKE | 3.84 | 251094 | 10521 | 2510.94 | 20087.52 | 75328.20 | 90393.84 | 45196.92 | 15065.64 | 2510.94 |
| 2022-02-27 | 253 | CHANT | 3.79 | 250413 | 10438 | 2504.13 | 22537.17 | 82636.29 | 82636.29 | 40066.08 | 17528.91 | 2504.13 |
| 2022-02-26 | 252 | SPILL | 4.09 | 248363 | 10087 | 2483.63 | 12418.15 | 64574.38 | 84443.42 | 54639.86 | 24836.30 | 4967.26 |
| 2022-02-25 | 251 | VIVID | 4.70 | 255907 | 11687 | 2559.07 | 5118.14 | 25590.70 | 74213.03 | 84449.31 | 53740.47 | 10236.28 |
| 2022-02-24 | 250 | BLOKE | 4.15 | 250674 | 10405 | 2506.74 | 15040.44 | 52641.54 | 80215.68 | 62668.50 | 30080.88 | 5013.48 |
| 2022-02-23 | 249 | TROVE | 4.68 | 277576 | 11411 | 2775.76 | 13878.80 | 44412.16 | 66618.24 | 69394.00 | 61066.72 | 22206.08 |
| 2022-02-22 | 248 | THORN | 3.47 | 309356 | 11814 | 3093.56 | 43309.84 | 117555.28 | 92806.80 | 37122.72 | 12374.24 | 0.00 |
| 2022-02-21 | 247 | OTHER | 3.96 | 278731 | 10887 | 2787.31 | 25085.79 | 72470.06 | 83619.30 | 58533.51 | 27873.10 | 5574.62 |
| 2022-02-20 | 246 | TACIT | 4.35 | 273306 | 11094 | 2733.06 | 10932.24 | 57394.26 | 87457.92 | 71059.56 | 38262.84 | 8199.18 |
| 2022-02-19 | 245 | SWILL | 5.08 | 282327 | 11241 | 2823.27 | 2823.27 | 22586.16 | 53642.13 | 87521.37 | 84698.10 | 28232.70 |
| 2022-02-18 | 244 | DODGE | 4.66 | 265238 | 10220 | 2652.38 | 7957.14 | 39785.70 | 76919.02 | 71614.26 | 50395.22 | 18566.66 |
| 2022-02-17 | 243 | SHAKE | 4.62 | 342003 | 12767 | 3420.03 | 20520.18 | 54720.48 | 78660.69 | 82080.72 | 71820.63 | 30780.27 |
| 2022-02-16 | 242 | CAULK | 4.34 | 289721 | 10740 | 2897.21 | 11588.84 | 57944.20 | 89813.51 | 75327.46 | 43458.15 | 8691.63 |
| 2022-02-15 | 241 | AROMA | 4.10 | 287836 | 10343 | 2878.36 | 17270.16 | 71959.00 | 94985.88 | 63323.92 | 31661.96 | 5756.72 |
| 2022-02-14 | 240 | CYNIC | 4.63 | 261521 | 10030 | 2615.21 | 5230.42 | 28767.31 | 86301.93 | 88917.14 | 44458.57 | 7845.63 |
| 2022-02-13 | 239 | ROBIN | 3.96 | 277471 | 9249 | 2774.71 | 16648.26 | 80466.59 | 94340.14 | 58268.91 | 22197.68 | 2774.71 |
| 2022-02-12 | 238 | ULTRA | 4.07 | 269885 | 9310 | 2698.85 | 18891.95 | 62073.55 | 91760.90 | 64772.40 | 26988.50 | 2698.85 |
| 2022-02-11 | 237 | ULCER | 4.40 | 278826 | 10631 | 2788.26 | 11153.04 | 50188.68 | 83647.80 | 78071.28 | 44612.16 | 8364.78 |
| 2022-02-10 | 236 | PAUSE | 4.02 | 304830 | 13480 | 3048.30 | 24386.40 | 79255.80 | 97545.60 | 64014.30 | 30483.00 | 6096.60 |
| 2022-02-09 | 235 | HUMOR | 4.18 | 305372 | 13846 | 3053.72 | 15268.60 | 67181.84 | 103826.48 | 76343.00 | 33590.92 | 6107.44 |
| 2022-02-08 | 234 | FRAME | 4.20 | 336236 | 15369 | 3362.36 | 33623.60 | 67247.20 | 80696.64 | 80696.64 | 57160.12 | 10087.08 |
| 2022-02-07 | 233 | ELDER | 4.71 | 288228 | 13340 | 2882.28 | 8646.84 | 37469.64 | 69174.72 | 86468.40 | 69174.72 | 14411.40 |
| 2022-02-06 | 232 | SKILL | 4.42 | 311018 | 13716 | 3110.18 | 9330.54 | 52873.06 | 102635.94 | 83974.86 | 49762.88 | 9330.54 |
| 2022-02-05 | 231 | ALOFT | 4.24 | 319698 | 13708 | 3196.98 | 12787.92 | 70333.56 | 115091.28 | 79924.50 | 35166.78 | 6393.96 |
| 2022-02-04 | 230 | PLEAT | 3.92 | 359679 | 14813 | 3596.79 | 35967.90 | 100710.12 | 111500.49 | 68339.01 | 32371.11 | 7193.58 |
| 2022-02-03 | 229 | SHARD | 4.30 | 358176 | 14609 | 3581.76 | 25072.32 | 78798.72 | 100289.28 | 89544.00 | 50144.64 | 14327.04 |
| 2022-02-02 | 228 | MOIST | 3.70 | 361908 | 14205 | 10857.24 | 47048.04 | 115810.56 | 104953.32 | 57905.28 | 25333.56 | 3619.08 |
| 2022-02-01 | 227 | THOSE | 3.67 | 351663 | 13606 | 3516.63 | 45716.19 | 119565.42 | 105498.90 | 52749.45 | 21099.78 | 3516.63 |
| 2022-01-31 | 226 | LIGHT | 4.06 | 341314 | 13347 | 3413.14 | 34131.40 | 85328.50 | 92154.78 | 64849.66 | 40957.68 | 17065.70 |
| 2022-01-30 | 225 | WRUNG | 4.35 | 294687 | 11524 | 0.00 | 5893.74 | 53043.66 | 114927.93 | 79565.49 | 35362.44 | 5893.74 |
| 2022-01-29 | 224 | COULD | 3.97 | 313220 | 11592 | 3132.20 | 21925.40 | 90833.80 | 109627.00 | 62644.00 | 25057.60 | 3132.20 |
| 2022-01-28 | 223 | PERKY | 4.45 | 296968 | 11148 | 2969.68 | 11878.72 | 50484.56 | 89090.40 | 80181.36 | 50484.56 | 11878.72 |
| 2022-01-27 | 222 | MOUNT | 3.82 | 331844 | 11451 | 3318.44 | 29865.96 | 96234.76 | 109508.52 | 63050.36 | 23229.08 | 3318.44 |
| 2022-01-26 | 221 | WHACK | 4.17 | 302348 | 10163 | 3023.48 | 12093.92 | 66516.56 | 111868.76 | 72563.52 | 30234.80 | 6046.96 |
| 2022-01-25 | 220 | SUGAR | 4.00 | 276404 | 8708 | 2764.04 | 16584.24 | 69101.00 | 93977.36 | 63572.92 | 24876.36 | 2764.04 |
| 2022-01-24 | 219 | KNOLL | 4.71 | 258038 | 8317 | 2580.38 | 2580.38 | 28384.18 | 74831.02 | 85152.54 | 54187.98 | 10321.52 |
| 2022-01-23 | 218 | CRIMP | 3.96 | 269929 | 7630 | 2699.29 | 13496.45 | 75580.12 | 102573.02 | 53985.80 | 18895.03 | 2699.29 |
| 2022-01-22 | 217 | WINCE | 4.46 | 241489 | 6850 | 2414.89 | 7244.67 | 41053.13 | 79691.37 | 70031.81 | 36223.35 | 7244.67 |
| 2022-01-21 | 216 | PRICK | 3.83 | 273727 | 7409 | 2737.27 | 21898.16 | 82118.10 | 90329.91 | 52008.13 | 19160.89 | 2737.27 |
| 2022-01-20 | 215 | ROBOT | 3.95 | 243964 | 6589 | 2439.64 | 19517.12 | 70749.56 | 82947.76 | 48792.80 | 19517.12 | 2439.64 |
| 2022-01-19 | 214 | POINT | 3.47 | 280622 | 7094 | 2806.22 | 44899.52 | 103830.14 | 78574.16 | 33674.64 | 11224.88 | 2806.22 |
| 2022-01-18 | 213 | PROXY | 4.87 | 220950 | 6206 | 2209.50 | 4419.00 | 24304.50 | 53028.00 | 68494.50 | 57447.00 | 13257.00 |
| 2022-01-17 | 212 | SHIRE | 3.93 | 222197 | 5640 | 2221.97 | 17775.76 | 71103.04 | 71103.04 | 39995.46 | 17775.76 | 4443.94 |
| 2022-01-16 | 211 | SOLAR | 3.82 | 209609 | 4955 | 2096.09 | 18864.81 | 67074.88 | 67074.88 | 37729.62 | 14672.63 | 2096.09 |
| 2022-01-15 | 210 | PANIC | 3.77 | 205880 | 4655 | 2058.80 | 18529.20 | 72058.00 | 69999.20 | 32940.80 | 10294.00 | 2058.80 |
| 2022-01-14 | 209 | TANGY | 4.37 | 169484 | 3985 | 1694.84 | 6779.36 | 35591.64 | 50845.20 | 40676.16 | 25422.60 | 8474.20 |
| 2022-01-13 | 208 | ABBEY | 4.56 | 132726 | 3345 | 1327.26 | 2654.52 | 17254.38 | 38490.54 | 41145.06 | 26545.20 | 3981.78 |
| 2022-01-12 | 207 | FAVOR | 4.57 | 137586 | 3073 | 1375.86 | 5503.44 | 20637.90 | 35772.36 | 39899.94 | 28893.06 | 5503.44 |
| 2022-01-11 | 206 | DRINK | 3.77 | 153880 | 3017 | 1538.80 | 13849.20 | 53858.00 | 52319.20 | 24620.80 | 7694.00 | 1538.80 |
| 2022-01-10 | 205 | QUERY | 4.43 | 107134 | 2242 | 1071.34 | 4285.36 | 17141.44 | 32140.20 | 32140.20 | 18212.78 | 2142.68 |
| 2022-01-09 | 204 | GORGE | 4.64 | 91477 | 1913 | 914.77 | 2744.31 | 11892.01 | 24698.79 | 27443.10 | 20124.94 | 3659.08 |
| 2022-01-08 | 203 | CRANK | 4.22 | 101503 | 1763 | 1015.03 | 5075.15 | 23345.69 | 31465.93 | 24360.72 | 14210.42 | 2030.06 |
| 2022-01-07 | 202 | SLUMP | 4.13 | 80630 | 1362 | 806.30 | 2418.90 | 18544.90 | 31445.70 | 19351.20 | 7256.70 | 806.30 |
Note that this data describes only those games that were shared on Twitter. The Players column captures the number of players sharing their results on that day. While not reported anywhere officially, the total number of Wordle players is thought to be roughly 15 times the number who share to social media. Hmode denotes those players who chose to play in hard mode, which forces players to keep using letters that have been correctly identified. In normal mode, players can enter any valid word, which is helpful if your preference is to eliminate letters in an information-gathering strategy. The columns one through six give the estimated number of players who solved the puzzle on attempts 1 through 6 (the reported share multiplied by Players, so the values are not whole numbers). The column wrong identifies those players who failed to guess the correct word by the 6th attempt.
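As a quick check on the avg_guess column, the weighted average can be reproduced by hand for a single row. The sketch below uses the PURGE row (23-Mar) from the raw table and, like the wrangling code above, scores a failed game as 7 guesses. The variable names are illustrative only.

```r
# Share of players finishing on each attempt for PURGE (23-Mar); "wrong" = failed
shares <- c(one = 0.01, two = 0.04, three = 0.22, four = 0.35,
            five = 0.26, six = 0.11, wrong = 0.02)

# Guess count assigned to each outcome; a failed game is scored as 7
attempts <- 1:7

# Weighted average of attempts, the same arithmetic used for avg_guess
avg_guess <- sum(shares * attempts)
round(avg_guess, 2)
```

Because the published shares are rounded to two decimals (they sum to 1.01 for this row), the result should be read as an approximation rather than an exact mean.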
The final step in Section 2 involves breaking the words into individual characters to enable network analysis in Section 4. This tokenization will isolate individual letters (unigrams) and/or group them into bigrams (letter pairs) to explore latent relationships.
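To make the idea concrete before running it on the whole data set, here is a minimal base R sketch of what tokenizing one answer into unigrams and bigrams produces. It is an illustration only and is not meant to mirror how tidytext implements its tokenizers.

```r
# Split a word into single letters (unigrams)
word <- "PURGE"
letters_vec <- strsplit(word, "")[[1]]

# Build overlapping letter pairs (bigrams) by pasting each letter to its successor
bigrams <- paste0(head(letters_vec, -1), tail(letters_vec, -1))

letters_vec  # "P" "U" "R" "G" "E"
bigrams      # "PU" "UR" "RG" "GE"
```

Each five-letter answer therefore contributes five unigrams and four bigrams to the counts.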
# Reduce dataframe to only answers
wordle_answers <- wordle_raw %>%
select(ID, Word)
# Tokenize characters-unigrams
wordle_unigram <- wordle_answers %>%
unnest_tokens(letter, Word, token = "characters", to_lower = FALSE)
unigram_top_tokens <- wordle_unigram %>%
count(letter, sort = TRUE) %>%
top_n(26)
unigram_top_tokens %>%
kbl() %>%
kable_styling(fixed_thead = T) %>%
kable_paper() %>%
scroll_box(width = "25%", height = "250px")
| letter | n |
|---|---|
| E | 36 |
| R | 33 |
| A | 31 |
| O | 31 |
| T | 27 |
| L | 25 |
| S | 21 |
| I | 20 |
| C | 18 |
| H | 18 |
| N | 18 |
| U | 16 |
| P | 13 |
| K | 11 |
| M | 11 |
| D | 10 |
| G | 8 |
| W | 8 |
| Y | 8 |
| B | 6 |
| V | 5 |
| F | 4 |
| Q | 1 |
| X | 1 |
# Tokenize characters-bigrams
wordle_bigram <- wordle_answers %>%
unnest_tokens(bigram, Word, token = "character_shingles", n = 2)
wordle_bigram$bigram <- toupper(wordle_bigram$bigram)
bigram_top_tokens <- wordle_bigram %>%
count(bigram, sort = TRUE) %>%
top_n(31)
bigram_top_tokens %>%
kbl() %>%
kable_styling(fixed_thead = T) %>%
kable_paper() %>%
scroll_box(width = "25%", height = "250px")
| bigram | n |
|---|---|
| ER | 6 |
| MO | 6 |
| TH | 6 |
| AR | 5 |
| IN | 5 |
| LL | 5 |
| LO | 5 |
| RO | 5 |
| AN | 4 |
| HA | 4 |
| HO | 4 |
| NT | 4 |
| OR | 4 |
| RI | 4 |
| SE | 4 |
| SH | 4 |
| UL | 4 |
| AT | 3 |
| AU | 3 |
| CH | 3 |
| EA | 3 |
| GE | 3 |
| HE | 3 |
| IC | 3 |
| IL | 3 |
| KE | 3 |
| OT | 3 |
| OU | 3 |
| RA | 3 |
| TE | 3 |
| VI | 3 |
Since one of our goals is to understand which letter or combinations of letters lead to better outcomes, tidying the data allows us to explore player engagement at the word and attempt level.
wordle_tidy$name <- factor(wordle_tidy$name,
levels = c("one",
"two",
"three",
"four",
"five",
"six",
"wrong"))
wordle_tidy %>%
ggplot(aes(fill=name, y=guesses, x=Date)) +
geom_bar(position="stack", stat="identity") +
labs(subtitle = "Twitter-Reported Wordle Scores by Date", x = "", y = "Players") +
scale_fill_discrete(name = "Guesses") +
scale_y_continuous(labels = scales::comma)
Wordle’s moment in the sun may be waning, but there are still over 150,000 players posting their scores to Twitter on a daily basis. At roughly 15 players for every one who shares, that suggests upwards of 2 million players globally. Wordle was purchased by the New York Times (NYT) at the end of January, which coincided with the beginning of the decline in player interest. That is probably less about the NYT and more about the nature of game fads.
There was a rumor or belief that the NYT made the game more difficult, leading to players losing interest. Well, let’s see what the data says… (hover over each column to see that day’s word and scores)
scores_bar_graph <- wordle_guesses %>%
ggplot(aes(y= avg_guess, x = Date)) +
geom_col(aes(text = Word), fill = "pink") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
geom_ribbon(stat = "smooth", method = "lm", alpha = .15) +
scale_y_continuous(labels = scales::comma) +
coord_cartesian(ylim = c(3,5)) +
labs(title = "Wordle Averages by Date", x = "", y = "Average Score")
ggplotly(scores_bar_graph, tooltip = c("text", "Date", "avg_guess"))
When we take all scores into account from early January through 23 March, the trend line does not appear to imply an increase in difficulty. In fact, if you hover over the ends of the trend line, you can see that the most recent fitted average, 4.16, is lower than the fitted average in early January, 4.20. That is fairly flat and indicates no real change in difficulty level for this data set. But with all those peaks and troughs, it is probably a bit more accurate to look at a rolling average to capture short-term effects that could have influenced how players perceived game difficulty.
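For readers unfamiliar with rolling averages, here is a small sketch of a centered 5-point mean on the seven most recent avg_guess values from the table in Section 2. It uses base R's stats::filter as a stand-in for a dedicated rolling-mean function, and the variable names are assumptions made for the example.

```r
# Average scores for the seven most recent puzzles (23-Mar back to 17-Mar)
avg_guess <- c(4.25, 4.32, 3.47, 4.27, 4.36, 3.84, 4.32)

# Centered 5-point moving average: each value is the mean of itself,
# the two values before it, and the two values after it
window <- 5
rolling <- as.numeric(stats::filter(avg_guess, rep(1 / window, window), sides = 2))

round(rolling, 3)
```

The first two and last two positions come back as NA because a full five-value window is not available there, which is why a rolling line is shorter than the series it smooths.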
# 5-day centered rolling mean; edges without a full window are left as NA
rolling_bar_graph <- wordle_guesses %>%
  mutate(five_day_avg = rollmean(avg_guess, 5,
                                 align = "center",
                                 fill = NA)) %>%
  ggplot(aes(y = avg_guess, x = Date)) +
  geom_col(aes(text = Word), fill = "pink") +
  geom_line(aes(y = five_day_avg),
            color = "red") +
scale_y_continuous(labels = scales::comma) +
coord_cartesian(ylim = c(3,5)) +
labs(title = "5-Day Rolling Wordle Average", x = "", y = "Average Score")
ggplotly(rolling_bar_graph, tooltip = c("text", "Date", "avg_guess"))
This shows a very different adventure, though the ending is probably the same. Again, there are numerous peaks and dips that, over time, average out to a fairly flat trend. But depending on when you started playing, your experience could vary greatly from someone else’s. If we focus on the time around the NYT purchase of the game, you can see a steady increase in difficulty from the beginning of February until about the 18th. For anyone paying attention, it would absolutely look and feel like the game got harder for about 2 weeks after the NYT took the reins. Eventually, the game would re-balance over the following few weeks to get back in line with what has become a flat trend over the course of the game.
As it turns out, the NYT did nothing to make the game any more difficult. In fact, they removed solution words they felt were too obscure and could make the game less fun to play. The vast majority of original solution words are still in play in the same order the original creator laid out.
Prior to applying text networking models to the data, it may be informative to better understand key letters and letter combinations that could influence Wordle guess choice. Logic dictates that the most common letters (or combinations thereof) would make good options for early word selection. For this last piece of exploratory analysis, I’ll examine single letters (unigrams) and letter pairs (bigrams).
unigram_top_tokens <- wordle_unigram %>%
count(letter, sort = TRUE) %>%
top_n(26)
top_unigrams <- unigram_top_tokens %>%
mutate(freq = round(n / sum(n), 3)) %>%
arrange(desc(freq))
top_unigrams %>%
ggplot(aes(x = reorder(letter, -n), y = freq)) +
geom_bar(stat = "identity", fill = "palegreen") +
scale_y_continuous(breaks = c(0, .02, .04, .06, .08, .10)) +
labs(title = "Top Wordle Letters (Unigrams)", subtitle = "7 Jan - 23 Mar '22",
y = "Frequency of Appearance", x = "Letter")
The top five most used letters to date are E, R, A, O, and T. Odds are that if you use those letters early, you will see some success identifying usable clues. However, you can only form two actual words from those letters: ORATE and ROATE. I highly doubt that ROATE is one of the solution words, so this significantly limits a player’s ability to guess right on the first attempt. Let’s try this on bigrams to see if we uncover any additional clues.
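# Compute each bigram's share of ALL bigrams before keeping the top 31,
# so freq is not inflated by dividing by the top-31 total only
bigram_top_tokens <- wordle_bigram %>%
  count(bigram, sort = TRUE) %>%
  mutate(freq = round(n / sum(n), 3)) %>%
  top_n(31, n)
top_bigrams <- bigram_top_tokens %>%
  arrange(desc(freq))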
top_bigrams %>%
ggplot(aes(x = reorder(bigram, -n), y = freq)) +
geom_bar(stat = "identity", fill = "palegreen") +
scale_y_continuous(breaks = c(0, .02, .04, .06, .08, .10)) +
labs(title = "Top Wordle Letter Pairs (Bigrams)", subtitle = "7 Jan - 23 Mar '22",
y = "Frequency of Appearance", x = "Letter Pair")
The top three bigrams (ER, MO, and TH) are recognized as fairly common beginning or ending parts of words. Again, this makes sense, but as they’ve only been seen 6 times each out of 76 games, it doesn’t appear that bigrams alone are the key to victory in the first couple of guesses either.
Now that we’ve explored individual letters and bigram relationships mathematically, let’s transition to more visual representations. Section 4 will take this same data and experiment with network visuals in an attempt to identify latent relationships that standard statistical charts struggle to reveal.
To visualize relationships between Wordle letters (edges and nodes), we’ll need three pieces of information:
from: the letter an edge is coming from
to: the letter an edge is going towards
weight: A numeric value associated with each edge
We need to transform our dataset (wordle_bigram) into these variables in the following way: from is the “letter1”, to is the “letter2”, and weight is “n”.
The function graph_from_data_frame enables the transformation:
# Separate bigrams
bigram_separated <- wordle_bigram %>%
separate(bigram, c("letter1", "letter2"), sep = 1)
# Count Bigrams
bigram_counts <- bigram_separated %>%
count(letter1, letter2, sort = TRUE)
# Create graph
bigram_graph <- bigram_counts %>%
graph_from_data_frame()
set.seed(100)
bigram_graph |>
ggraph(layout = "stress") +
geom_node_text(aes(label = name)) +
geom_edge_link(aes(edge_alpha = n, start_cap = circle(2, 'mm'), end_cap = circle(2, 'mm')),
arrow = arrow(length = unit(2, 'mm')), color = "blue") +
theme_graph() +
labs(title = "Wordle Letter Pairs", edge_alpha = "#Connections")
Edges are shaded to reflect the number of connections between letters. You’ll see the top five letters (E, R, O, A, and T) from the frequency chart still play a prominent role in this visual, but a few more connections also stand out with this type of chart. The letters N, H, L, M, and I now appear to be quite an important part of the network. If we filter out some of the least common relationships, we can zero in on the most numerous letter pairs:
bigram_graph_filtered <- bigram_counts %>%
filter(n > 2) %>%
graph_from_data_frame()
set.seed(200)
bigram_graph_filtered |>
ggraph(layout = "fr") +
geom_node_text(aes(label = name)) +
geom_edge_link(aes(edge_alpha = n, start_cap = circle(2, 'mm'), end_cap = circle(2, 'mm')),
arrow = arrow(length = unit(2, 'mm')), color = "red") +
theme_graph() +
labs(title = "Most Common Wordle Letter Pairs", edge_alpha = "#Connections")
This relationship visual highlights letters and letter combinations that were not apparent in earlier charts. The letters H, N, and M don’t appear in our frequency chart until the tenth position or later.
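One way to put a number on what the network plot suggests is to total the bigram counts touching each letter, in either direction. Network packages usually call this weighted degree. The sketch below uses base R and a small hypothetical table standing in for `bigram_counts`; the values are illustrative, not the article's data.

```r
# Hypothetical counts with the same columns as bigram_counts.
bigram_counts <- data.frame(
  letter1 = c("e", "t", "m", "h", "o"),
  letter2 = c("r", "h", "o", "e", "n"),
  n       = c(6, 6, 6, 3, 2),
  stringsAsFactors = FALSE
)

# Stack both ends of every edge so each letter is credited
# whether it comes first or second in the pair.
edge_ends <- data.frame(
  letter = c(bigram_counts$letter1, bigram_counts$letter2),
  n      = rep(bigram_counts$n, 2),
  stringsAsFactors = FALSE
)

# Sum the counts per letter and rank letters by total connections.
totals <- aggregate(n ~ letter, data = edge_ends, FUN = sum)
totals <- totals[order(-totals$n, totals$letter), ]
print(totals)
```

A letter can rank high here while ranking modestly in the unigram chart, because this total rewards letters that pair with many different neighbors.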
While letter counts and frequency analysis give insights into the most common letters, they fall short in describing the key letter pairings that lead to effective word choice for early guessing.
Frequency analysis identified the top five most common letters as E, R, O, A, and T and the most common bigrams as ER, MO, and TH. Consulting the Scrabble dictionary, there are 22 words that could be built from combining these results. After removing uncommon words and words with repeated letters, 10 potential starter words remain:
earth, harem, hater, heart, homer, other, metro, tamer, torah, and orate.
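The exact filtering procedure isn't shown, but one plausible reading, consistent with the ten words listed, is to pool the top letters with the letters of the top bigrams and keep words that use only that pool without repeats. A minimal base R sketch under that assumption follows; the `candidates` vector is hypothetical and would be replaced by a full five-letter dictionary.

```r
top_letters <- c("e", "r", "o", "a", "t")
top_bigrams <- c("er", "mo", "th")

# Allowed pool: the top unigrams plus every letter found in the top bigrams.
allowed <- union(top_letters, unlist(strsplit(top_bigrams, "")))

# Hypothetical candidates; replace with a real dictionary.
candidates <- c("earth", "harem", "otter", "metro", "stare", "torah")

# A starter uses only allowed letters and never repeats a letter.
is_starter <- function(w) {
  ch <- strsplit(w, "")[[1]]
  all(ch %in% allowed) && anyDuplicated(ch) == 0
}

starters <- candidates[vapply(candidates, is_starter, logical(1), USE.NAMES = FALSE)]
print(starters)
```

Here "otter" is dropped for its repeated T and "stare" for using S, which is outside the pool. Removing uncommon words would still be a manual step.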
Network analysis, on the other hand, identified letter combinations above and beyond the mathematical exercises. After adding the letters L, I, and N, the potential starter word list grows to 72.
This case study was designed to answer two questions:
Can text network analysis identify latent relationships between Wordle letters?
Can those identified relationships lead to more effective Wordle solutions?
The first question can be answered in the affirmative, as the network visuals aided in identifying letter relationships that were not apparent in the frequency analysis at either the unigram or bigram level. Those additional combinations increased the pool of potential early-game words roughly sevenfold, from 10 to 72.
The second question, however, is a little more difficult to answer, and I believe the case study needs to be expanded to address the aspect of efficiency. Just having more potential words doesn’t necessarily equate to solving the puzzle more quickly. One aspect of the game this analysis did not address was letter positioning. Letter frequency analysis focused on how often letters appeared regardless of position within a word. An additional level of analysis could explore positions one through five to determine where letters appeared the most, thereby reducing the potential word list to those terms that closely matched letter placement.
Lastly, there were specific limitations on this case study based on the data set. My goal was to understand how text networks could be applied at the character level rather than understanding numerical absolutes. As a result, I specifically chose to remain within the puzzles that had already been solved so as not to spoil the game for those who may still be playing. This method restricted my word list and therefore my letter counts significantly. I also didn’t use any external lexicons or dictionaries to determine “average” letter or bigram frequencies.
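As a starting point for that positional follow-up, the letters can be cross-tabulated by their position within each word. This is only a sketch in base R, using a hypothetical handful of words rather than the solved-puzzle list used in this study.

```r
# Hypothetical sample of five-letter words.
words <- c("earth", "other", "metro", "tamer", "heart")

# One row per word, one column per letter position.
letters_mat <- do.call(rbind, strsplit(words, ""))

# Cross-tabulate letters against positions 1 through 5.
pos_counts <- table(
  letter   = as.vector(letters_mat),
  position = as.vector(col(letters_mat))
)
print(pos_counts)

# Most frequent letter in each position (ties go to the first alphabetically).
top_by_position <- apply(pos_counts, 2, function(x) names(x)[which.max(x)])
print(top_by_position)
```

With the full solution history in place of `words`, the same table would show whether a frequent letter such as E tends to cluster in particular positions, which could then be used to rank candidate starter words.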
Bernoff, J. (2022, January 20). A mathematical analysis of the best first guess for Wordle. Without Bullshit. Retrieved March 25, 2022, from https://withoutbullshit.com/blog/a-mathematical-analysis-of-the-best-first-guess-for-wordle
Chow, C. (2022, February 11). Loaded words in Wordle. Medium. Retrieved March 25, 2022, from https://towardsdatascience.com/loaded-words-in-wordle-e78cb36f1e3c#:~:text=In%20Wordle%2C%20there%20are%202%2C315,(%E2%80%9Csupport%20words%E2%80%9D).
Frias, J. (2022, March 23). Forget luck: Optimized wordle strategy. Medium. Retrieved March 26, 2022, from https://betterprogramming.pub/forget-luck-optimized-wordle-strategy-using-bigquery-c676771e316f
Gupta, R. (2022, January 25). WORDLE-Vision: Simple Analytics to up your Wordle Game. Medium. Retrieved March 25, 2022, from https://towardsdatascience.com/wordle-vision-simple-analytics-to-up-your-wordle-game-65daf4f1aa6f
Hinton, L. (2022, March 27). Wordle word answer - what’s the Wordle today? (March 27). Gfinity Esports. Retrieved March 25, 2022, from https://www.gfinityesports.com/wordle/answer-list/
Lesser, R. (2022, March 8). Wordle, 15 million tweets later. Observable. Retrieved March 25, 2022, from https://observablehq.com/@rlesser/wordle-twitter-exploration
The New York Times. (n.d.). Wordle - a daily word game. The New York Times. Retrieved March 23, 2022, from https://www.nytimes.com/games/wordle/index.html