Winning Jeopardy Interactively

Author

Abrar Mohammad Hasanat

Introduction

In this portfolio project, I use Jeopardy clue data to develop a study strategy for the game. I focus on two text variables, Category and Question, to identify common topics and clue patterns. I also compare category words to clue value to see which common topics are connected to higher-value clues.

Category words

# A tibble: 15 × 2
   word         n
   <chr>    <int>
 1 world      414
 2 words      366
 3 history    279
 4 time       234
 5 names      233
 6 american   231
 7 u.s        230
 8 century    227
 9 science    223
10 letter     212
11 state      201
12 movies     187
13 movie      184
14 sports     184
15 music      174

The most common words in Jeopardy categories indicate that the show draws on fields like geography, history, language and culture. “World” is the most common category word, appearing 414 times, followed by “words” with 366 appearances and “history” with 279. Other frequent category words include “time,” “names,” “american,” “science,” “state,” “movies,” “sports” and “music.” I would therefore benefit from studying world knowledge, history, science, geography, wordplay and pop culture rather than focusing on a single subject.
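The counting step behind a table like this can be sketched as follows. This is a minimal Python illustration, not the code used for the report (whose output above is an R tibble). The stop-word list, the tokenizing pattern and the sample category names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list actually used in the report is not shown.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "to", "for", "on"}


def top_words(texts, n=15, stop_words=STOP_WORDS):
    """Return the n most common non-stop-words across texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters; an internal period or apostrophe
        # is allowed so that a token such as "u.s" stays in one piece.
        tokens = re.findall(r"[a-z]+(?:[.'][a-z]+)*", text.lower())
        counts.update(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


# Made-up category names, for illustration only.
sample_categories = [
    "WORLD HISTORY",
    "WORLD CAPITALS",
    "U.S. HISTORY",
    "SCIENCE",
    "THE MOVIES",
]

for word, count in top_words(sample_categories, n=3):
    print(word, count)
```

Because the sketch counts exact tokens, singular and plural forms are tallied separately, which is also how “movies” and “movie” appear as two rows in the table above.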

Clue words

# A tibble: 15 × 2
   word          n
   <chr>     <int>
 1 city        593
 2 country     482
 3 state       450
 4 u.s         445
 5 film        414
 6 word        403
 7 title       370
 8 american    288
 9 world       268
10 novel       266
11 president   264
12 capital     257
13 famous      251
14 french      249
15 time        244

Setting aside a few words that carry little meaning on their own, the most common words in Jeopardy clues show that many clues are built around geography, culture, politics and language. Words like “city,” “country,” “state,” “u.s” and “capital” show that geographic and national knowledge appears frequently in clue wording. Other common words, such as “film,” “title,” “novel” and “word,” point toward entertainment, literature and language-based clues. Based on this clue analysis, my study strategy should cover geography, U.S. knowledge, literature, film, politics and language rather than focusing on a single subject.

Counting subjects in categories

# A tibble: 2 × 2
  subject count
  <chr>   <int>
1 history   279
2 science   223

The subject_category_count() function takes a vector of subject words and counts how often each one appears in the Category column. In the test output, “history” appears 279 times and “science” appears 223 times. Both match the expected counts, which are also the counts shown for these words in the category word table above.
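The behavior described here can be sketched in Python as shown below. This is an illustration of the idea rather than the report's own function: the body, the second parameter and the whole-word matching rule are assumptions, and the sample category names are made up.

```python
import re
from collections import Counter


def subject_category_count(subjects, categories):
    """Count how often each subject word appears as a whole word in the category names.

    Assumption: matching is case-insensitive and on whole words, so "history"
    does not match "historic". The original function may match differently.
    """
    word_counts = Counter()
    for category in categories:
        word_counts.update(re.findall(r"[a-z]+", category.lower()))
    # A Counter returns 0 for words it never saw, so absent subjects get a count of 0.
    return {subject: word_counts[subject.lower()] for subject in subjects}


# Made-up category names, for illustration only.
sample_categories = ["WORLD HISTORY", "U.S. HISTORY", "SCIENCE", "HISTORIC HOMES"]
counts = subject_category_count(["history", "science", "opera"], sample_categories)
for subject, count in counts.items():
    print(subject, count)
```

Returning a count of 0 for a subject that never appears, instead of dropping it, keeps the output aligned with the input vector, which makes a test like the one above easy to read.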

Category words and clue value

# A tibble: 15 × 3
   word          appearances average_value
   <chr>               <int>         <dbl>
 1 science               221         1107.
 2 origins                50         1036 
 3 dictionary             60          993.
 4 islands                50          992 
 5 international          50          946 
 6 literary              140          937.
 7 french                 63          929.
 8 book                   85          913.
 9 art                   128          902.
10 artists                64          894.
11 winners                50          880 
12 song                   70          874.
13 lit                    78          863.
14 word                  125          850.
15 life                   70          847.

To connect category words to another variable, I compared common category words by their average clue value. I kept only category words that appeared at least 50 times so that the averages were not driven by rare words. Among these common words, “science” had the highest average clue value, at about $1,107. Other high-value category words included “origins,” “dictionary,” “islands,” “international,” “literary,” “french,” “book” and “art.”
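The filter-then-average step can be sketched as follows. This is a hedged Python illustration, not the report's code: the function name, the (category, value) row layout, the choice to skip rows with a missing value and the sample data are all assumptions. Only the minimum of 50 appearances comes from the text above.

```python
import re
from collections import defaultdict


def average_value_by_word(rows, min_appearances=50):
    """Summarize clue value per category word.

    rows: iterable of (category, value) pairs, one pair per clue.
    Returns (word, appearances, average_value) tuples for words seen at least
    min_appearances times, sorted from highest to lowest average value.
    """
    values_by_word = defaultdict(list)
    for category, value in rows:
        if value is None:
            # Assumption: clues without a dollar value are left out of the average.
            continue
        for word in re.findall(r"[a-z]+", category.lower()):
            values_by_word[word].append(value)

    summary = [
        (word, len(values), sum(values) / len(values))
        for word, values in values_by_word.items()
        if len(values) >= min_appearances
    ]
    return sorted(summary, key=lambda row: row[2], reverse=True)


# Made-up clues, for illustration only; the threshold is lowered to fit the tiny sample.
sample_rows = [
    ("SCIENCE", 800),
    ("SCIENCE", 1200),
    ("WORLD HISTORY", 400),
    ("WORLD CAPITALS", 600),
    ("ISLANDS", 2000),
    ("SCIENCE", None),
]
for word, appearances, average in average_value_by_word(sample_rows, min_appearances=2):
    print(word, appearances, average)
```

In the sample, “islands” has the highest single value but appears only once, so the threshold removes it. That is the effect the 50-appearance cutoff is meant to have on rare words.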

Strategy

Based on these analyses, my strategy for winning Jeopardy would be to focus on recurring knowledge areas rather than trying to memorize every possible fact. The category analysis showed that topics related to world knowledge, history, science, American topics, states, movies, sports and music appear often. The clue word analysis emphasized geography and culture, with common words such as “city,” “country,” “state,” “u.s,” “film,” “title,” “novel,” “president” and “capital.” The value analysis did the most to shape the final strategy, because categories related to science, dictionaries and language, literature, French, books and art showed higher average clue values among common category words. So, while I should still prepare broadly, I would prioritize geography, U.S. and world knowledge, science, literature, language, and arts and culture, because these areas appear often and most of them are connected to higher-value clues.

Interactivity

For Portfolio 4, I added interactivity to my Jeopardy project by turning two of my original static bar charts into interactive plotly graphs. This makes the report easier to explore, because I can now hover over each bar and see the exact word and number of appearances instead of estimating from the axis. I chose this kind of interactivity because the original project was mainly about comparing word frequencies, so hover labels make the graphs more useful without changing the overall analysis.