For 35 years, trivia lovers and game-show enthusiasts alike have been captivated by the TV game show Jeopardy!. Created by game-show legend Merv Griffin, the show originated in 1964 with host Art Fleming and was rebooted in 1984 under the hosting duties of Alex Trebek.
Contestants compete to answer trivia clues in six categories, each offering five escalating dollar values (values that have been raised over the years). After the initial Jeopardy round, clue values are doubled in Double Jeopardy. During Final Jeopardy, contestants may wager any part of their winnings on one final written clue that all contestants answer.
One of the signature features of the show, however, is the Daily Double. Three appear per game, one in the Jeopardy round and two in Double Jeopardy, and each is available solely to the contestant who selects it. That contestant may wager any amount of their earned money (or, if their score is lower, up to the highest clue value on the board: currently $1,000 in the Jeopardy round and $2,000 in Double Jeopardy).
As shown during the recent Jeopardy! Greatest of All Time tournament between legends of the game, James Holzhauer (highest single-game winnings), Ken Jennings (longest win streak), and Brad Rutter (highest all-time winnings), the Daily Double is the single most important differentiator in the game. All three contestants frequently "hunted" for the Daily Double, and the point at which they found the clue, as well as their proficiency in answering it, proved pivotal to winning or losing. While Holzhauer and Jennings sparred over clues, Rutter was left rudderless by Daily Doubles found too early, either answering incorrectly or lacking a substantial bankroll to wager. Because of this, Rutter, the all-time Jeopardy! money leader, was shut out of each match and often failed to qualify for the Final Jeopardy clue.
Given the clear importance of these clues, it serves us to learn more about the common characteristics and features of Daily Double clues, both to strategize better and to focus topic study. Over the course of this study, we will attempt to learn more about the makeup of the Daily Double, including when it tends to be selected, which categories most often hold it, which answer types recur, and where it sits on the board.
The data set comprises information I scraped from the website J! Archive, which has meticulously recorded every clue since the show's reboot on September 10, 1984. This ultimately amounts to over 335,000 clues across the show's 35-year history. As the archive is maintained by fans, there are a few minor gaps (comments in the dataset indicate a handful of clues lost to gaps in the VHS recordings of original airings), but the greater part of the history is preserved. For each clue we have the category, value, and board location, as well as the answer and the number of incorrect and correct attempts on it. Most importantly for the present study, a binary feature indicates whether or not the clue was a Daily Double. This will be key to the visualizations and procedures we will compose.
###################
# Data Upload
jdf = read.csv('R/data/master_jeopardy_file.csv')
head(jdf)
## X epNum airDate extra_info round_name
## 1 0 1 1984-09-10 Premiere episode with Alex Trebek as host. Jeopardy
## 2 1 1 1984-09-10 Premiere episode with Alex Trebek as host. Jeopardy
## 3 2 1 1984-09-10 Premiere episode with Alex Trebek as host. Jeopardy
## 4 3 1 1984-09-10 Premiere episode with Alex Trebek as host. Jeopardy
## 5 4 1 1984-09-10 Premiere episode with Alex Trebek as host. Jeopardy
## 6 5 1 1984-09-10 Premiere episode with Alex Trebek as host. Jeopardy
## coord category order value daily_double
## 1 (1, 1) LAKES & RIVERS 11 (100,) False
## 2 (2, 1) INVENTIONS 13 (100,) False
## 3 (3, 1) ANIMALS 1 (100,) False
## 4 (4, 1) FOREIGN CUISINE 20 (100,) False
## 5 (5, 1) ACTORS & ROLES 3 (100,) False
## 6 (1, 2) LAKES & RIVERS 12 (200,) False
## question answer
## 1 River mentioned most often in the Bible the Jordan
## 2 Marconi's wonderful wireless the radio
## 3 These rodents first got to America by stowing away on ships rats
## 4 The "coq" in coq au vin chicken
## 5 Video in which Michael Jackson plays a werewolf & a zombie "Thriller"
## 6 Scottish word for lake loch
## correctAttempts wrongAttempts season
## 1 1 1 1
## 2 0 3 1
## 3 1 0 1
## 4 1 0 1
## 5 1 0 1
## 6 1 0 1
Visualization 1: Has the average time to selection of the Daily Double changed over time?
Beyond knowing where on the board the Daily Double is likely to hide, it is important to understand the strategic timeline of selecting it. As noted above, contestant Brad Rutter was effectively undone when his competitors forced him to seek out the Daily Double early; too often he arrived at the clue before building a score beyond the $2,000 he was allowed to wager regardless. Even when he answered correctly, he had not banked enough money to meaningfully benefit. The order in which one reaches the Daily Double therefore matters: players want to arrive at the clue with enough winnings accrued to maximize its value.
However, as this is a competition, it is safe to assume that the other competitors are employing similar tactics. Because of this, we want to know: on average, how many questions are selected before the Daily Double is reached? And has this changed over time?
######################################
# Graphic 1
# Has the average selection of the Daily Double changed over time?
# Subsetting the data to only DD answers
ddf = jdf[which(jdf$daily_double == "True"),]
head(ddf)
## X epNum airDate extra_info
## 11 10 1 1984-09-10 Premiere episode with Alex Trebek as host.
## 42 41 1 1984-09-10 Premiere episode with Alex Trebek as host.
## 93 92 10 1984-09-21
## 149 148 11 1984-09-24
## 162 161 110 1985-02-08
## 192 191 110 1985-02-08
## round_name coord category order value daily_double
## 11 Jeopardy (1, 3) LAKES & RIVERS 14 (800,) True
## 42 Double Jeopardy (6, 4) 4-LETTER WORDS 4 (1000,) True
## 93 Double Jeopardy (5, 4) HOMONYMS 10 (1000,) True
## 149 Double Jeopardy (5, 5) LANDMARKS 9 (600,) True
## 162 Jeopardy (5, 2) AUSTRALIA 19 (500,) True
## 192 Double Jeopardy (2, 4) LAKES & RIVERS 6 (1000,) True
## question
## 11 River in this famous song:
## 42 It's the first 4-letter word in "The Star Spangled Banner"
## 93 Didn't see the fog
## 149 Though unmarried & childless, he lives in the world's largest residential palace
## 162 Title of this song, which actually means "to tramp the roads with a backpack"
## 192 While poets pour over the Rhine & Danube this 2nd largest German river gets no press
## answer correctAttempts wrongAttempts season
## 11 the Volga River 1 0 1
## 42 what 1 0 1
## 93 missed (mist) 0 1 1
## 149 the Pope 0 1 1
## 162 "Waltzing Matilda" 1 0 1
## 192 the Elbe 0 1 1
# Quick base-R check: per-season mean and sd of the selection order
m = aggregate(ddf$order, by=list(ddf$season), FUN=mean)
jsd = aggregate(ddf$order, by=list(ddf$season), FUN=sd)
# If needed:
# install.packages(c("Rcpp", "ggplot2", "data.table", "dplyr", "rlang"), dependencies = TRUE)
# library() loads one package at a time
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Per-season mean, sd, and standard error of the selection order
order_summary = ddf %>%
group_by(season) %>%
summarise(order_means = mean(order),
order_sd = sd(order),
order_SE = sd(order)/sqrt(n()))
order_summary
## # A tibble: 35 x 4
## season order_means order_sd order_SE
## <int> <dbl> <dbl> <dbl>
## 1 1 13.3 7.37 0.733
## 2 2 15.5 7.98 0.633
## 3 3 15.5 8.08 0.424
## 4 4 16.2 8.20 0.386
## 5 5 15.9 8.02 0.405
## 6 6 16.4 8.21 0.381
## 7 7 15.2 8.40 0.648
## 8 8 16.4 8.36 0.553
## 9 9 17.8 7.93 0.492
## 10 10 16.9 8.37 0.650
## # ... with 25 more rows
# Bar chart of per-season means with +/- 1 SD error bars
jp.er1 = ggplot(order_summary, aes(season, order_means)) +
  geom_col()
jp.er2 = jp.er1 +
  geom_errorbar(aes(ymin = order_means - order_sd, ymax = order_means + order_sd),
                color = "black", width = 0.2) +
  labs(title = "Average Selection of Daily Double per Season (error bars: +/- 1 SD)",
       x = "Season", y = "Order of Question Selected")
jp.er2
Based upon this visualization, we see that the number of questions it takes contestants to reach the Daily Double has remained fairly stable over time, with considerable variation within each season, reflected by the wide error bars.
With per-season means around the 15th clue and standard deviations near 8, a savvy contestant who wants to beat the field to the Daily Double should start hunting for it around the 7th or 8th question.
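That rule of thumb can be checked directly against the order_summary table above: one standard deviation ahead of each season's mean lands in the single digits. A minimal sketch (hunt.start is our own throwaway variable, not part of the original analysis):
# Mean selection order minus one SD: a rough target for starting the hunt
hunt.start = order_summary$order_means - order_summary$order_sd
round(range(hunt.start), 1)  # roughly 6 to 10 for the seasons shown above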
To begin Daily Double hunting in earnest, however, the contestant will want to know more about how to root out the clue itself.
Visualization 2: What are the most common categories to receive the Daily Double?
When the board appears, six categories are revealed to the players, with five questions each. These categories can span any number of topics, from Arts to Zoology. To find the Daily Double, contestants must find the specific space on the board where the clue has been placed.
Over the show's 35-year history, nearly 40,000 unique categories have appeared. Certain categories recur often, while many others are one-off wordplay ('"DR" Movies' or '"Kings" & "Queens"') or tied to a time-sensitive subject (such as "David Letterman," which aired the night before David Letterman took over The Late Show). Categories that recur are predictable and can be planned for, so finding out whether there is regularity to which categories receive the Daily Double would help a contestant locate the clue early.
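The 40,000 figure is easy to check once the data is loaded (a one-line count against the full clue set):
# Number of distinct category titles across the archive
length(unique(jdf$category))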
######################################
# Graphic 2
# Identifying the most common Daily Double categories
# Getting a good sense of the field
# dd.catsum = summary(ddf$category)
# ndd.catsum = summary(nonddf$category)
dd.cat.df = as.data.frame(table(ddf$category))
ndd.cat.df = as.data.frame(table(jdf$category))
# Each category holds five clues, so dividing the clue count
# by 5 approximates how many times each category has appeared.
ndd.cat.df$Freq = ndd.cat.df$Freq/5
# Keeping only categories that have held the Daily Double at least 30 times
common.dd.cat = dd.cat.df[dd.cat.df$Freq >= 30,]
# Mean and sd of Daily Double counts, among categories
# that have held it more than once
avg.frq.dd = mean(dd.cat.df$Freq[dd.cat.df$Freq > 1])
sd.frq.dd = sd(dd.cat.df$Freq[dd.cat.df$Freq > 1])
avg.frq.dd
## [1] 4.562433
sd.frq.dd
## [1] 6.334391
Given that there are six categories on a given Jeopardy! board, one would expect that, if placement were random, each category would have a 1/6 (16.7%) chance of holding the round's Daily Double in the Jeopardy round, and a higher chance in Double Jeopardy, which hides two. What we want to find, therefore, are the categories that receive the Daily Double at a significantly higher rate, and that also appear often enough to plausibly come up in our own game.
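As a baseline, here is a minimal simulation sketch, assuming one Daily Double in the Jeopardy round and two in Double Jeopardy placed uniformly at random, of how often a fixed category (column 1) would hold at least one:
set.seed(42)
sims = replicate(10000, {
  j  = sample(1:6, 1)    # column hiding the lone Jeopardy-round Daily Double
  dj = sample(1:30, 2)   # cells hiding the two Double Jeopardy Daily Doubles
  c(j == 1, any((dj - 1) %% 6 + 1 == 1))
})
rowMeans(sims)  # ~0.17 in the Jeopardy round, ~0.31 in Double Jeopardy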
Visualization 3: Which categories hold the Daily Double at the highest rate?
nonddf = jdf[which(jdf$daily_double != "True"),]
dd.catsum = summary(ddf$category)
ndd.catsum = summary(nonddf$category)
dd.cat.table = table(ddf$category)
ndd.cat.table = table(nonddf$category)
table(ddf$wrongAttempts)
##
## 0 1
## 12208 6596
dd.cat.df = as.data.frame(dd.cat.table)
ndd.cat.df = as.data.frame(ndd.cat.table)
# Note: 'value' is stored as a factor of strings like "(800,)", not numbers
ddf$value[1:5]
## [1] (800,) (1000,) (1000,) (600,) (500,)
## 6537 Levels: () (0, 0) (0, 0, 100) (0, 0, 1000) (0, 0, 11000) ... (9999, 8500, 9601)
mean(ddf$value[which(ddf$wrongAttempts == 1)])
## Warning in mean.default(ddf$value[which(ddf$wrongAttempts == 1)]): argument is
## not numeric or logical: returning NA
## [1] NA
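The warning above is the direct consequence: mean() cannot average a factor. A minimal parsing sketch (parse_value and value_num are our own additions, not fields in the dataset); multi-valued entries such as "(0, 0, 100)" are left as NA:
parse_value = function(v) {
  v = gsub("[()]", "", as.character(v))   # strip the parentheses
  v = sub(",\\s*$", "", v)                # strip the trailing comma
  suppressWarnings(as.numeric(v))         # NA if not a single number
}
ddf$value_num = parse_value(ddf$value)
mean(ddf$value_num[ddf$wrongAttempts == 1], na.rm = TRUE)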
common.dd.cat = dd.cat.df[dd.cat.df$Freq >=30,]
dim(common.dd.cat)
## [1] 30 2
common.ndd.cat = ndd.cat.df[ndd.cat.df$Freq>=200,]
dim(common.ndd.cat)
## [1] 126 2
dim(dd.cat.df[dd.cat.df$Freq !=0,])
## [1] 12128 2
ndd.cat.dfc = as.data.frame(table(jdf$category))
ndd.cat.dfc$Freq=ndd.cat.dfc$Freq/5
head(ndd.cat.df)
## Var1 Freq
## 1 '20s TRANSPORTATION 5
## 2 '21 4
## 3 '30S FILM FACTS 5
## 4 '30s TV 5
## 5 '38 SPECIAL 5
## 6 '40s FICTION 5
head(ndd.cat.dfc)
## Var1 Freq
## 1 '20s TRANSPORTATION 1
## 2 '21 1
## 3 '30S FILM FACTS 1
## 4 '30s TV 1
## 5 '38 SPECIAL 1
## 6 '40s FICTION 1
head(dd.cat.df)
## Var1 Freq
## 1 '20s TRANSPORTATION 0
## 2 '21 1
## 3 '30S FILM FACTS 0
## 4 '30s TV 0
## 5 '38 SPECIAL 0
## 6 '40s FICTION 0
# The two tables share the same factor levels, so their rows align one-to-one
pct.dd.df = ndd.cat.dfc
pct.dd.df$Freq = dd.cat.df$Freq/ndd.cat.dfc$Freq
pct.dd.df$n = as.data.frame(table(jdf$category))$Freq
head(pct.dd.df)
## Var1 Freq n
## 1 '20s TRANSPORTATION 0 5
## 2 '21 1 5
## 3 '30S FILM FACTS 0 5
## 4 '30s TV 0 5
## 5 '38 SPECIAL 0 5
## 6 '40s FICTION 0 5
common.dd.pct = pct.dd.df[which(pct.dd.df$n >=100 & pct.dd.df$Freq >=0.2),]
# Restricting to only the top 20 categories by Daily Double Frequency
top.dd.vals = common.dd.pct[order(common.dd.pct$Freq, decreasing=TRUE),]
head(top.dd.vals)
## Var1 Freq n
## 17485 HAIL TO THE CHIEF 0.6084656 189
## 17154 GOVERNMENT & POLITICS 0.5660377 265
## 22712 LIBRARIES 0.5645161 186
## 36973 THE ELEMENTS 0.5365854 205
## 2813 12-LETTER WORDS 0.5357143 168
## 41370 U.S. PRESIDENTS 0.5166667 300
names(top.dd.vals) = c("category", "c.freq", "n")
top.dd.vals = top.dd.vals[1:20,]
top.dd.vals$order = findInterval(top.dd.vals$c.freq, sort(top.dd.vals$c.freq))
top.dd.vals$order = 21 - top.dd.vals$order
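For distinct values, the findInterval construction above is just a reversed rank; an equivalent and arguably clearer form (a drop-in alternative, not the original author's code):
# Descending rank: 1 = highest Daily Double rate
top.dd.vals$order = rank(-top.dd.vals$c.freq, ties.method = "first")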
# install.packages("ggplot2", dependencies = TRUE)
# library(ggplot2)
j.splot = ggplot(data = top.dd.vals, aes(x = c.freq, y = n)) +
  geom_point(size = 0.1, shape = 1, color = "#004c6d") +
  geom_text(aes(label = category), size = 2.5) +
  theme_minimal() +
  labs(title = "Highest Daily Double Rates by Category, with Category Occurrence",
       y = "Category Occurrences", x = "Percent Daily Double")
j.splot
Based on this graph, we see that specific categories receive the Daily Double more than twice as often as chance would suggest. In fact, nine categories are essentially a coin flip to hold the Daily Double whenever they appear, which makes those subjects ideally suited to study. It should be noted, however, that most of these categories have appeared about 300 times or fewer, under ten appearances per season on average, so they are far from the most common categories on the board.
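The coin-flip count can be read straight off the filtered table:
# Categories whose Daily Double rate is at or above 50%
sum(common.dd.pct$Freq >= 0.5)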
This leads us to ask: what is the Daily Double rate for the categories that appear most often? To answer this, we rework the dataset to view the top 20 most frequently occurring categories and graph them the same way, showing both category occurrences and the percentage of Daily Doubles.
Visualization 4: What is the Daily Double rate for the most common categories?
# Restricting to only the top 20 categories
top.c.vals = common.dd.pct[order(common.dd.pct$n, decreasing=TRUE),]
names(top.c.vals) = c("category", "c.freq", "n")
top.c.vals = top.c.vals[1:20,]
top.c.vals$order = findInterval(top.c.vals$n, sort(top.c.vals$n))
top.c.vals$order = 21 - top.c.vals$order
library(ggplot2)
jc.splot = ggplot(data = top.c.vals, aes(x = c.freq, y = n)) +
  geom_point(size = 0.1, shape = 1, color = "#004c6d") +
  geom_text(aes(label = category), size = 2.5) +
  theme_minimal() +
  labs(title = "Daily Double Rates for the Most Frequently Occurring Categories",
       y = "Category Occurrences", x = "Percent Daily Double")
jc.splot
Based upon this plot, we can see that several categories appear frequently enough to be worth studying in their own right, but offer only modest benefit for the specific purpose of hunting the Daily Double.
Given what we now know about the topics common to Daily Double categories, it becomes important to make sure we can actually answer the clue. Over the show's run, contestants have answered roughly 35% of Daily Doubles incorrectly, and an incorrect response can mean a massive loss of money. To prepare, it is worth identifying the answer subjects that recur within these topics and can be actionably studied before the competition.
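That error rate comes straight from the attempt counts tabulated earlier:
# Share of Daily Doubles answered incorrectly
wrong = table(ddf$wrongAttempts)
wrong["1"] / sum(wrong)  # about 0.35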
# Tally how often each answer has appeared, then keep the
# answers that have appeared at least 100 times.
ans.freq = table(jdf$answer)
jdf$a.freq = as.numeric(ans.freq[as.character(jdf$answer)])
common.answers = jdf[which(jdf$a.freq >= 100 & jdf$answer != "="),]
state.list = read.csv('R/data/statename.csv', fileEncoding = "UTF-8-BOM")
head(state.list)
## state latitude longitude name
## 1 AK 63.58875 -154.49306 Alaska
## 2 AL 32.31823 -86.90230 Alabama
## 3 AR 35.20105 -91.83183 Arkansas
## 4 AZ 34.04893 -111.09373 Arizona
## 5 CA 36.77826 -119.41793 California
## 6 CO 39.55005 -105.78207 Colorado
st.cap.list = read.csv('R/data/statecapname.csv')
st.cap.list = st.cap.list$capital
state.list = state.list$name
pres.list = read.csv('R/data/presname.csv')
pres.list = pres.list$President
country.list = read.csv('R/data/countryname.csv', fileEncoding = "UTF-8-BOM")
co.cap.list = country.list$capital
head(country.list)
## country capital type
## 1 Abkhazia Sukhumi countryCapital
## 2 Afghanistan Kabul countryCapital
## 3 Akrotiri and Dhekelia Episkopi Cantonment countryCapital
## 4 Albania Tirana countryCapital
## 5 Algeria Algiers countryCapital
## 6 American Samoa Pago Pago countryCapital
country.list = country.list$country
common.answers$a.type = ifelse(is.element(common.answers$answer, state.list), "State",
                        ifelse(is.element(common.answers$answer, pres.list), "President",
                        ifelse(is.element(common.answers$answer, country.list), "Country",
                        ifelse(is.element(common.answers$answer, st.cap.list), "S. Capital",
                        ifelse(is.element(common.answers$answer, co.cap.list), "C. Capital", "Other")))))
# Collapse to one row per unique answer
library(dplyr)
common.answers.un = distinct(common.answers, answer, .keep_all = TRUE)
# I did a little bit of manual data cleaning just to get a few more Types for the dataset
common.answers.un = read.csv('R/data/common_answers.csv')
# install.packages("treemap", dependencies = TRUE)
library(treemap)
table.common.ans = as.data.frame(table(common.answers.un$a.type))
names(table.common.ans) = c('type', 'freq')
table.common.ans$cat = ifelse(table.common.ans$type == "C. Capital" | table.common.ans$type == "Country" |
                              table.common.ans$type == "State" | table.common.ans$type == "S. Capital",
                              "Geography",
                              ifelse(table.common.ans$type == "Historical Figure" |
                                     table.common.ans$type == "President", "History", "Other"))
treemap(table.common.ans,
index=c("cat", "type"),
vSize="freq",
title="Types of Answers Occurring More Than 100 Times",
fontsize.labels=c(12,8),
align.labels=list(c("center", "center"), c("left", "top")))
Visualization 5: Identifying the most common space for Daily Double questions.
Finally, now that we know which categories are most likely to include the Daily Double when they appear on the board, as well as the most common categories to study for, we must accept that even a category featured over 300 times is still a fairly rare sight across the show's 5,723-episode run. More helpful, then, is knowing where the Daily Double lands on the game board. As mentioned, there are six columns and five rows on any Jeopardy! game board, and even casual watchers of the show could intuit that there is design in the placement of the Daily Double. Given that we have the coordinates of each Daily Double clue, we can produce a heatmap indicating its exact locations and frequencies.
######################################
# Graphic 5
#
# Produce a Heat Map of the Locations for
# Daily Double Answers
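The per-cell counts hard-coded below could equally be tallied straight from the coord field, which stores each clue's "(column, row)" position. A hedged sketch (dd.col and dd.row are our own helper variables, not columns in the dataset):
cr = gsub("[()]", "", as.character(ddf$coord))
dd.col = as.integer(sub(",.*", "", cr))
dd.row = as.integer(sub(".*,\\s*", "", cr))
table(dd.row, dd.col)  # should reproduce the matrix constructed below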
# Creating a dataframe of the frequencies
# of DD by coordinate
jcol1 = c(6, 393, 1072, 1434, 990)
jcol2 = c(3, 238, 681, 977, 609)
jcol3 = c(7, 326, 926, 1322, 899)
jcol4 = c(3, 271, 922, 1230, 913)
jcol5 = c(4, 305, 880, 1228, 830)
jcol6 = c(4, 206, 654, 867, 604)
coordmdf = data.frame(jcol1, jcol2, jcol3, jcol4, jcol5, jcol6)
coordmdf
## jcol1 jcol2 jcol3 jcol4 jcol5 jcol6
## 1 6 3 7 3 4 4
## 2 393 238 326 271 305 206
## 3 1072 681 926 922 880 654
## 4 1434 977 1322 1230 1228 867
## 5 990 609 899 913 830 604
jddmat = as.matrix(coordmdf)
my_palette = colorRampPalette(c("#ccccff", "#000999"))(n = 10)
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
heatmap.2(jddmat,
density.info="none",
trace="none",
offsetCol=-18,
labRow=c("$200", "$400", "$600", "$800", "$1000"),
# labCol=c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5", "Cat6"), (this was finnicky and didn't work well)
labCol=c("", "", "", "", "", ""),
main = "Daily Double Location Heat Map",
xlab = "Category",
ylab = "Dollar Amount",
margins =c(12,9),
col=my_palette,
# breaks=col_breaks,
dendrogram='none',
cellnote=jddmat,
notecex=1,
notecol = "#FFCC00",
Rowv=FALSE,
Colv=FALSE)
Based upon this heatmap, we can see that the fourth row is the most common home for the Daily Double. Knowing this, we can focus our selections on questions from that row to maximize our chances of securing the Daily Double clue.
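Both that claim and the column pattern cited in the conclusion below can be checked directly against the count matrix:
rowSums(jddmat)  # the fourth row holds by far the most Daily Doubles
colSums(jddmat)  # columns 1, 3, 4, and 5 outpace columns 2 and 6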
So, based upon our findings, we have learned several features that are key to securing one of the most important clues on the Jeopardy! board. Across the history of the show, contestants reach the Daily Double around the 15th or 16th question on average, meaning a savvy contestant should begin hunting for it around the 7th or 8th question. They should look for broad, general categories, the kind that might constitute an area of study at a university (or a university itself). Based upon the answer types identified above, they should have ample knowledge of History, Geography, or one of the other most common answer sets. And should the contestant see one of these generalized categories in the first, third, fourth, or fifth column, they should jump on the $800 question to play the odds and hunt the Daily Double.