Winning Jeopardy Interactively
Introduction
In this portfolio project, I use Jeopardy clue data to develop a study strategy for the game. I focus on two text variables, Category and Question, to identify common topics and clue patterns. I also compare category words to clue value to see which common topics are connected to higher-value clues.
Category words
# A tibble: 15 × 2
word n
<chr> <int>
1 world 414
2 words 366
3 history 279
4 time 234
5 names 233
6 american 231
7 u.s 230
8 century 227
9 science 223
10 letter 212
11 state 201
12 movies 187
13 movie 184
14 sports 184
15 music 174
The most common words in Jeopardy categories show that the show draws on fields like geography, history, language and culture. “World” is the most common category word, appearing 414 times, followed by “words” with 366 appearances and “history” with 279. Other frequent category words include “time,” “names,” “american,” “science,” “state,” “movies,” “sports” and “music.” I would therefore benefit from studying world knowledge, history, science, geography, wordplay and pop culture rather than focusing on only one subject.
Clue words
# A tibble: 15 × 2
word n
<chr> <int>
1 city 593
2 country 482
3 state 450
4 u.s 445
5 film 414
6 word 403
7 title 370
8 american 288
9 world 268
10 novel 266
11 president 264
12 capital 257
13 famous 251
14 french 249
15 time 244
Setting aside a few words that carry little meaning on their own, the most common words in the Jeopardy clues show that many clues are built around geography, culture, politics and language. Words like “city,” “country,” “state,” “u.s” and “capital” show that geographic and national knowledge comes up often in clue wording. Other common words, such as “film,” “title,” “novel” and “word,” point toward entertainment, literature and language-based clues, while “president” points toward politics. Based on this clue analysis, my study strategy should cover geography, U.S. knowledge, literature, film, politics and language rather than focusing on only one subject.
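The report does not show the code behind this count, so the following is only an illustrative Python sketch of the general idea: split each clue into lowercase words, drop low-information "stop" words, and tally what remains. The function name top_words, the short stop-word list and the sample clues are all assumptions made for this example, not part of the original project. This simple tokenizer also would not reproduce tokens such as “u.s” exactly as they appear in the table above.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "this", "is", "to",
              "for", "and", "it", "its", "was", "on"}


def top_words(texts, n=15, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-word tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters (apostrophes allowed inside a word).
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


# Made-up clues for illustration only; these are not rows from the real dataset.
sample_clues = [
    "This city is the capital of France",
    "The capital city of this country is Ottawa",
    "This novel was adapted into a 1939 film",
]

for word, count in top_words(sample_clues, n=5):
    print(word, count)
```

Filtering before counting keeps words like “this” and “the” from crowding out the topic words that the strategy depends on.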
Counting subjects in categories
# A tibble: 2 × 2
subject count
<chr> <int>
1 history 279
2 science 223
The subject_category_count() function takes a vector of subject words and counts how often each one appears in the Category column. The test output shows that “history” appears 279 times and “science” appears 223 times, matching the expected counts from the category word table.
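As a rough illustration of what a function like this might do, here is a minimal Python sketch. It is not the report's implementation, which is not shown. It assumes that a subject "appears" when it occurs as a whole word in a category name, ignoring case, and that each category entry counts at most once per subject. The sample categories are invented.

```python
import re


def subject_category_count(subjects, categories):
    """For each subject word, count the category names that contain it as a whole word.

    Assumption: matching is case-insensitive and whole-word only, so
    "history" matches "WORLD HISTORY" but not "PREHISTORY".
    """
    rows = []
    for subject in subjects:
        pattern = re.compile(rf"\b{re.escape(subject)}\b", re.IGNORECASE)
        count = sum(1 for category in categories if pattern.search(category))
        rows.append({"subject": subject, "count": count})
    return rows


# Made-up category values for illustration only.
sample_categories = [
    "HISTORY", "WORLD HISTORY", "SCIENCE", "SCIENCE FICTION", "PREHISTORY",
]

for row in subject_category_count(["history", "science"], sample_categories):
    print(row["subject"], row["count"])
```

Whether partial matches such as "PREHISTORY" should count is a design choice. Whole-word matching keeps the result consistent with a word-level frequency table.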
Category words and clue value
# A tibble: 15 × 3
word appearances average_value
<chr> <int> <dbl>
1 science 221 1107.
2 origins 50 1036
3 dictionary 60 993.
4 islands 50 992
5 international 50 946
6 literary 140 937.
7 french 63 929.
8 book 85 913.
9 art 128 902.
10 artists 64 894.
11 winners 50 880
12 song 70 874.
13 lit 78 863.
14 word 125 850.
15 life 70 847.
To connect category words to another variable, I compared common category words by their average clue value. I kept only category words that appeared at least 50 times so that the results were not driven by rare words. Among these common words, “science” had the highest average clue value, at about $1,107. Other high-value category words included “origins,” “dictionary,” “islands,” “international,” “literary,” “french,” “book” and “art.”
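The comparison described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used in the report. It assumes clue values are already numeric and that a word is counted once per clue even if it repeats within a category name. The function name average_value_by_word and the sample rows are hypothetical.

```python
import re
from collections import defaultdict


def average_value_by_word(rows, min_appearances=50):
    """Average clue value per category word, keeping only sufficiently common words.

    rows: iterable of (category, value) pairs, where value is numeric.
    Returns (word, appearances, average_value) tuples, highest average first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # word -> [appearances, summed value]
    for category, value in rows:
        # Use a set so each word is counted once per clue.
        for word in set(re.findall(r"[a-z]+", category.lower())):
            totals[word][0] += 1
            totals[word][1] += value
    result = [
        (word, n, total / n)
        for word, (n, total) in totals.items()
        if n >= min_appearances  # drop rare words so averages are not based on a few clues
    ]
    return sorted(result, key=lambda r: r[2], reverse=True)


# Made-up rows for illustration only; the threshold is lowered to fit the tiny sample.
sample_rows = [
    ("SCIENCE", 800), ("SCIENCE", 1200),
    ("WORLD HISTORY", 400), ("HISTORY", 600),
    ("RARE WORD", 2000),
]

for word, n, avg in average_value_by_word(sample_rows, min_appearances=2):
    print(word, n, avg)
```

The minimum-appearance filter matters here: without it, the single high-value "RARE WORD" clue in the sample would rank above every common word.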
Strategy
Based on these analyses, my strategy for winning Jeopardy would be to focus on recurring knowledge areas rather than trying to memorize every possible fact. The category analysis showed that topics related to world knowledge, history, science, American topics, states, movies, sports and music appear frequently. The clue word analysis stressed geography and culture in particular, with common words such as “city,” “country,” “state,” “u.s,” “film,” “title,” “novel,” “president” and “capital.” The value analysis did the most to shape my final strategy, because science, dictionary/language, literature, French, book and art-related categories showed higher average clue values among common category words. So, while I should still prepare broadly, I would prioritize geography, U.S. and world knowledge, science, literature, language, and arts/culture because these areas appear often and most of them are connected to higher-value clues.
Interactivity
For Portfolio 4, I added interactivity to my Jeopardy project by turning two of my original static bar charts into interactive plotly graphs. This makes the report easier to explore because I can now hover over each bar and see the exact word and number of appearances instead of estimating from the axis. I chose this kind of interactivity because the original project was mainly about comparing word frequencies, so hover labels make the graphs more useful without changing the overall analysis.