So, this is my first attempt at solving the Riddler. It’s real clunky, but I think I’m right. Firstly, let’s simulate the game. Set the dictionary up, then chose a random sample from the dictionary.
dictionary <- (1:267751)
working_dic <- dictionary
upper_limit <- tail(working_dic, 1)
lower_limit <- as.integer(working_dic[1])
word_choice <- sample(dictionary, size = 1)
Then, using a for loop (old school), we progressively guess between the upper and lower limits. The most sensible way is to guess directly between them (forcing an integer). So the first guess is always in the middle of the dictionary. This will minimise guesses (I think)
## [1] 18
Then, we can replicate this and put all in a function. Then we replicate, 1 million times
set_up_game <- function() {
working_dic <- dictionary
upper_limit <- tail(working_dic, 1)
lower_limit <- as.integer(working_dic[1])
word_choice <- sample(dictionary, size = 1)
no_guesses <- 0
while(guess != word_choice) {
guess <- as.integer((upper_limit+lower_limit)/2)
if(guess > word_choice) {
upper_limit <- guess
} else {
lower_limit <- guess
}
no_guesses = no_guesses + 1
}
return(no_guesses)
}
sample <- replicate(10e4, set_up_game())
Then we plot…
plot <-as_tibble(sample)
ggplot(plot) +
geom_histogram(aes(x = value)) +
xlim(10,20) +
xlab("Number of guesses")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 193 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
plot %>%
count(value)
## # A tibble: 19 x 2
## value n
## <dbl> <int>
## 1 0 1
## 2 2 1
## 3 3 2
## 4 4 3
## 5 5 7
## 6 6 15
## 7 7 24
## 8 8 44
## 9 9 96
## 10 10 187
## 11 11 355
## 12 12 737
## 13 13 1535
## 14 14 3007
## 15 15 6289
## 16 16 12184
## 17 17 24535
## 18 18 48916
## 19 19 2062
Let’s see how much worse it is to guess a random number between lower and upper limits. I think this is really slow way of running a loop, but it works anyway.
random_guess <- function() {
working_dic <- dictionary
upper_limit <- tail(working_dic, 1)
lower_limit <- as.integer(working_dic[1])
word_choice <- sample(dictionary, size = 1)
no_guesses <- 0
while(guess != word_choice){
guess <- sample(lower_limit:upper_limit,1)
if(guess > word_choice) {
upper_limit <- guess
} else {
lower_limit <- guess
}
no_guesses <- no_guesses + 1
}
return(no_guesses)
}
sample <- replicate(10e4, random_guess())
So, it’s much less efficent, about twice as many guesses.
plot <-as_tibble(sample)
ggplot(plot) +
geom_histogram(aes(x = value)) +
xlab("Number of guesses")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot %>%
count(value, sort = TRUE)
## # A tibble: 55 x 2
## value n
## <dbl> <int>
## 1 25 6932
## 2 24 6898
## 3 26 6742
## 4 23 6645
## 5 22 6279
## 6 27 6170
## 7 28 5741
## 8 21 5710
## 9 29 5081
## 10 20 5019
## # ... with 45 more rows