Introduction
This data set, The New York Times Connections Archive from Kaggle,
contains every word, group name, group level, game ID, starting row,
and starting column from The New York Times word puzzle Connections.
It covers every game since June 12, 2023 (the game's launch date)
and is regularly updated with the most recent games.
As much as I love playing Connections, I can’t help
but feel frustrated at times when I get stuck or fail a game. For this
reason, I decided to analyze this data set to gain more insight into the
game, look for patterns, and maybe even use a little LLM or ML to predict
future Connections games. Just a fun little project I
did during my free time :)
Summary Table of the Connections Data Set
Each game consists of 4 groups of 4 words, so every difficulty level
has an equal number of observations. The purpose of this table is simply
to show how much data was released in each year from 2023 to 2025. Each
row of the data set is one word, so N counts words rather than games;
dividing N by 16 gives the number of games. Also because I just wanted
my R Markdown report to be longer.
| Characteristic | 2023, N = 3,248 | 2024, N = 5,856 | 2025, N = 608 |
|---|---|---|---|
| Group.Level | | | |
| 0 | 812 (25%) | 1,464 (25%) | 152 (25%) |
| 1 | 812 (25%) | 1,464 (25%) | 152 (25%) |
| 2 | 812 (25%) | 1,464 (25%) | 152 (25%) |
| 3 | 812 (25%) | 1,464 (25%) | 152 (25%) |
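The N values in the table count rows (one per word), not games. As a rough sanity check, the sketch below converts them to game counts and compares 2023 against the calendar. It is written in Python purely for illustration, not taken from the report's own code. It assumes one puzzle per calendar day, which the table's numbers are consistent with but which the data set description does not state outright.

```python
from datetime import date, timedelta

WORDS_PER_GAME = 4 * 4  # 4 groups of 4 words per game

# Row counts (N) copied from the summary table above.
rows_per_year = {2023: 3248, 2024: 5856, 2025: 608}

# Every N should split evenly into whole games.
assert all(n % WORDS_PER_GAME == 0 for n in rows_per_year.values())
games_per_year = {year: n // WORDS_PER_GAME for year, n in rows_per_year.items()}

# Calendar days from the launch date through the end of 2023, inclusive.
launch = date(2023, 6, 12)
days_2023 = (date(2023, 12, 31) - launch).days + 1

# Assumption: one puzzle per day, so the 2025 game count implies
# the last date covered by this snapshot of the archive.
last_2025 = date(2025, 1, 1) + timedelta(days=games_per_year[2025] - 1)

print(games_per_year)  # {2023: 203, 2024: 366, 2025: 38}
print(days_2023)       # 203
print(last_2025)       # 2025-02-07
```

The 2023 count matches the number of days since launch, and the 2024 count matches a 366-day leap year, so the archive appears to have no missing days in those years.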
Interactive Treemap of Connections
Click on a year and then a difficulty level (0 is the easiest
category, 3 the most difficult) to see the top 10 themes that have been
reused since the game launched.
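The ranking behind a treemap like this can be sketched as a count of group names per year and difficulty level. The Python below is one possible approach, not the code used for the report. The field names (`game_id`, `year`, `level`, `group_name`) and the sample rows are made up for illustration. Because the data set has one row per word, each group is deduplicated before counting so that a theme is counted once per game rather than four times.

```python
from collections import Counter, defaultdict

def top_themes(rows, n=10):
    """Return {(year, level): [(theme, games_used), ...]}, at most n themes each."""
    # Each group appears on 4 rows (one per word); keep one entry per game and group.
    groups = {(r["game_id"], r["year"], r["level"], r["group_name"]) for r in rows}
    counts = defaultdict(Counter)
    for _game_id, year, level, name in groups:
        counts[(year, level)][name] += 1
    # Sort by count (descending), then name, so ties come out in a stable order.
    return {
        key: sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for key, counter in counts.items()
    }

# Made-up sample: (game_id, year, level, group_name). Not real archive data.
sample = [
    (1, 2023, 0, "FISH"),
    (2, 2023, 0, "FISH"),
    (3, 2023, 0, "COLORS"),
    (4, 2024, 3, "HOMOPHONES"),
]
# Expand each group into 4 word rows to mimic the one-row-per-word layout.
rows = [
    {"game_id": g, "year": y, "level": lvl, "group_name": name, "word": f"word{i}"}
    for (g, y, lvl, name) in sample
    for i in range(4)
]

result = top_themes(rows)
print(result[(2023, 0)])  # [('FISH', 2), ('COLORS', 1)]
```

Group names would likely need normalizing (case, spacing, punctuation) before counting on the real data, since near-identical themes could otherwise be split across several entries.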
Hyrax says goodbye!