Introduction

This data set, The New York Times Connections Archive from Kaggle, contains every word, group name, group level, game ID, starting row, and starting column from The New York Times word puzzle Connections. It covers every game since the launch on June 12, 2023, and is continually updated with the most recent games.
As much as I love playing Connections, I can’t help but feel frustrated when I get stuck or fail a game. So I decided to analyze this data set to gain more insight into the game, look for patterns, and maybe even use a little LLM or ML to predict future Connections games. Just a fun little project I did in my free time :)

Summary Table of the Connections Data set

Each game consists of 4 groups of 4 words, so every difficulty level has the same number of observations. The purpose of this table was just to see how much was released in each year from 2023 to 2025. Note that N counts rows (one per word), so dividing N by 16 gives the number of games. Also, I just wanted my R Markdown report to be longer.
Characteristic    2023            2024            2025
                  N = 3,248¹      N = 5,856¹      N = 608¹
Group.Level
    0             812 (25%)       1,464 (25%)     152 (25%)
    1             812 (25%)       1,464 (25%)     152 (25%)
    2             812 (25%)       1,464 (25%)     152 (25%)
    3             812 (25%)       1,464 (25%)     152 (25%)
¹ n (%)
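
As a quick sanity check on the table, here is a minimal Python sketch (not part of the original R Markdown report) that converts the row counts into game counts. It assumes every game contributes exactly 16 rows and that one puzzle was released per day from the June 12, 2023 launch; the variable and function names are my own.

```python
from datetime import date

WORDS_PER_GROUP = 4
GROUPS_PER_GAME = 4
WORDS_PER_GAME = WORDS_PER_GROUP * GROUPS_PER_GAME  # 16 rows per game

# Row counts (N) per year, copied from the summary table.
rows_per_year = {2023: 3248, 2024: 5856, 2025: 608}


def games_from_rows(n_rows: int) -> int:
    """Convert a row count into a game count, failing on partial games."""
    games, remainder = divmod(n_rows, WORDS_PER_GAME)
    if remainder:
        raise ValueError(f"{n_rows} rows is not a whole number of games")
    return games


games_per_year = {year: games_from_rows(n) for year, n in rows_per_year.items()}

# Calendar check (assumption: one puzzle per day, starting at launch).
launch = date(2023, 6, 12)
days_2023 = (date(2023, 12, 31) - launch).days + 1
days_2024 = (date(2024, 12, 31) - date(2024, 1, 1)).days + 1

print(games_per_year)
print(days_2023, days_2024)
```

Under those assumptions, the game counts for 2023 and 2024 should line up with the number of calendar days the game was live in each year, which is a handy way to spot missing or duplicated games.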

Interactive Treemap of Connections

Click on a year and then a difficulty level (0 is the easiest category, 3 the most difficult) to see the top 10 themes that have been reused since the game launched.
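
The counting behind a treemap like this can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used for the report: the rows below are made-up toy data, and the tuple layout (year, game ID, group level, group name, word) is an assumed simplification of the data set's columns. The one thing to watch is that each theme appears on four rows per game (one per word), so each group is counted only once.

```python
from collections import Counter, defaultdict

# Toy rows invented for illustration only (NOT real Connections data):
# (year, game_id, group_level, group_name, word)
rows = [
    (2023, 1, 0, "FISH", "BASS"),
    (2023, 1, 0, "FISH", "SOLE"),
    (2023, 2, 0, "FISH", "PIKE"),
    (2023, 3, 0, "COLORS", "RED"),
    (2023, 1, 3, "HOMOPHONES", "KNIGHT"),
]


def top_themes(rows, k=10):
    """Return the k most frequent themes for each (year, level) cell."""
    seen_groups = set()            # (game_id, level) pairs already counted
    counts = defaultdict(Counter)  # (year, level) -> Counter of themes
    for year, game_id, level, theme, _word in rows:
        group_key = (game_id, level)
        if group_key in seen_groups:
            continue  # same group, different word: skip to avoid 4x counts
        seen_groups.add(group_key)
        counts[(year, level)][theme.strip().upper()] += 1
    return {cell: counter.most_common(k) for cell, counter in counts.items()}


result = top_themes(rows)
print(result)
```

Each (year, level) key corresponds to one clickable cell of the treemap, and the list attached to it holds the themes that would be drawn inside that cell.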
Hyrax says goodbye!