```
BST 249 Course Note
Some code for visualization from Kazuki http://rpubs.com/kaz_yos/dyn-prog-1
Before entering the game, the participant starts with \(n\) dollars. Two possible values of \(\theta \in \{\theta_1, \theta_2\}\) are presented to the participant for guessing. The game mechanism randomly picks a value \(\theta\) out of \(\{\theta_1, \theta_2\}\) with equal probability \(1/2\). Then, this \(\theta\) is used as the true underlying probability of being \(Success\) in a Bernoulli random number generator which generates two potential outcomes \(\{Success, Failure\}\). Whenever the participant has money (i.e. \(\geq 1\) dollars) on his hand, he has the option to spend 1 dollar to observe the next outcome (i.e. \(Success\) of \(Failure\)) from the Bernoulli random number generator. Also, at any time throughout the game, the participant has one chance to stop further observing and guess which of \(\{\theta_1, \theta_2\}\) is the true underlying \(\theta\) that was used in the game. If the participant’s guess on \(\theta\) is correct, he will receive a fixed amount of payoff (here we consider the payoff as \(n\) dollars, the same as the initial amount of money); otherwise nothing. In both cases, the game stops.
Our objective is to find the optimal decision that maximizes the expected utility. Here, the utility is defined as \[U = -Cost + payoff,\] where
\[payoff = \begin{cases}n & \mbox{ if guesses correctly;}\\ 0 & \mbox{ otherwise.}\end{cases}\]
Let \(i\) and \(j\) be the total numbers of \(Success\) and \(Failure\) that have been observed. Then, a particular decision \(d\) is a map from any possible \((i,j)\) to whether “continue” looking or “stop” (and guess), i.e.
\[d: \left((i,j): i+ j \leq n-1\right) \longrightarrow \{continue, stop\}.\]
Note that if \(i + j = n\), then the participant has used all the money to observe with no more money on hand. Then he has to “stop” and guess the value of \(\theta\) which is a deterministic action. Therefore, the possible values of \((i,j)\) that are worth considering is \(i + j \leq n-1\).
The space of decisions is thus a class of maps, i.e. \[\mathcal{D} = \{d: \left((i,j): i+ j \leq n-1\right) \longrightarrow \{continue, stop\}\}.\]
The optimal decision is \[d^* = \text{argmax}_{d \in \mathcal{D}} U(d),\] where \(U(d)\) is the expected utility.
This problem essentially comes down to a binary decision tree: at each observed \((i,j)\) where \(0 \leq i + j < n\) (including \(i = j = 0\)), we have the option to decide whether to continue looking or stop. At \((i,j)\) where \(i + j = n\), we have to stop. Thus, from layer 0 to layer \(n-1\), there are \(1 + 2 +\cdots + n = \frac{n(n+1)}{2}\) possible nodes for the participant to choose whether to continue or stop, which leads to \(2^{\frac{n(n+1)}{2}}\) different patterns (decisions). Therefore, for a small to moderate \(n\), we may use brute force approach (i.e. calculate the expected utility at all possible values of \((i,j)\) and compare).
However, when \(n\) is moderate to large, the number of calculations using brute force grows exponentially. To retain efficiency, we can use Dynamic Programming (DP) which saves the amount of computations by memorizing and reusing the quantities from previous steps. Thus, there is a computation time-space trade-off for Dynamic Programming and in most cases the burden of using moderate to large space is acceptable.
First, we may adopt the objective prior belief of \(\theta\) being whether \(\theta_1\) or \(\theta_2\) as \(\pi = (1/2, 1/2)\). This prior is compatible according to the game mechanism and is further used for calculating the posterior of \(\theta\) being whether \(\theta_1\) or \(\theta_2\) after observing the data. Note that although the participant observes the entile sequence of \(Success\) and \(Failure\), the total number of \(Success\) \((i)\) and \(Failure\) \((j)\) are sufficient for obtaining the posterior.
Let \(d^*_h(i,j)\) be the optimal decision after observing \((i,j)\) where \(h = i+j\), \(U^*_h(i,j)\) be the optimal expected utility, \(U_h(i,j,C)\) is the optimal expected utility if we decide to continue looking, and \(U_h(i,j,S)\) is the optimal expected utility if we decide to stop. At each observed \((i,j)\) pair, if we decide to continue looking and observe a new \(Success\), then the optimal expected utility from then on is \(U^*_{h+1}(i+1,j)\). Similarly, if we decide to continue looking and observe a new \(Failure\), then the optimal expected utility from then on is \(U^*_{h+1}(i,j+1)\). Thus, by storing the values of \(U^*_{h+1}(i+1,j)\) and \(U^*_{h+1}(i,j+1)\), we may easily compute the value of \(U_h(i,j,C)\). This is the key idea of Dynamic Programming.
require(ggplot2)
post_prob <- function(pi_1, theta_1, theta_2, i, j){
pi_2 <- 1 - pi_1
num <- pi_1 * theta_1^i * (1 - theta_1)^j
denom <- num + pi_2 * theta_2^i * (1 - theta_2)^j
return(num/denom)
}
Utility_stop <- function(pi_1, theta_1, theta_2, i, j, payoff){
Cost <- i+j
post_prob_1 <- post_prob(pi_1, theta_1, theta_2, i, j)
Utility <- -1 * Cost + payoff * max(post_prob_1, 1-post_prob_1)
return(Utility)
}
Utility_cont <- function(U_star, pi_1, theta_1, theta_2, i, j, payoff){
n_step <- i+j
i_index <- i+1
j_index <- j+1
post_prob_1 <- post_prob(pi_1, theta_1, theta_2, i, j)
post_prob_success <- post_prob_1 * theta_1 + (1-post_prob_1) * theta_2
if (n_step == dim(U_star)[1]-1){
Utility <- post_prob_success * Utility_stop(pi_1, theta_1, theta_2, i+1, j, payoff) +
(1-post_prob_success) * Utility_stop(pi_1, theta_1, theta_2, i, j+1, payoff)
}else{
Utility <- post_prob_success * U_star[i_index+1, j_index] + (1-post_prob_success) * U_star[i_index, j_index+1]
}
return(Utility)
}
decision <- function(U_ij_cont, U_ij_stop){
if (U_ij_cont >= U_ij_stop){
return(1)
}else{
return(0)
}
}
AdaptiveDesign <- function(pi_1 = 0.5, theta_1, theta_2, n_init, payoff = n_init){
U_cont <- matrix(-Inf, nrow = n_init, ncol = n_init)
U_stop <- matrix(-Inf, nrow = n_init, ncol = n_init)
U_star <- matrix(-Inf, nrow = n_init, ncol = n_init)
Decision <- matrix(NA, nrow = n_init, ncol = n_init)
for (sum in (n_init-1):0){
for (i in 0:sum){
j <- sum-i
i_index <- i+1
j_index <- j+1
U_stop[i_index,j_index] <- Utility_stop(pi_1, theta_1, theta_2, i, j, payoff)
U_cont[i_index,j_index] <- Utility_cont(U_star, pi_1, theta_1, theta_2, i, j, payoff)
U_star[i_index,j_index] <- max(U_cont[i_index,j_index], U_stop[i_index,j_index])
Decision[i_index,j_index] <- decision(U_cont[i_index,j_index], U_stop[i_index,j_index])
}
}
data <- matrix(NA, nrow = (n_init)*(n_init+1)/2, ncol = 6)
line <- 1
for (i in 0:(n_init-1)){
for (j in 0:(n_init-1-i)){
i_index <- i+1
j_index <- j+1
data[line,1] <- i
data[line,2] <- j
data[line,3] <- Decision[i_index,j_index]
data[line,4] <- U_star[i_index,j_index]
data[line,5] <- U_cont[i_index,j_index]
data[line,6] <- U_stop[i_index,j_index]
line <- line+1
}
}
df <- as.data.frame(data)
colnames(df) <- c("Success","Failure","Decision_Binary","U_star","U_stop","U_cont")
df$Decision <- ifelse(df$Decision_Binary == 1, "continue", "stop")
return(df)
}
df1 <- AdaptiveDesign(theta_1 = 0.25, theta_2 = 0.75, n_init = 30)
head(df1)
## Success Failure Decision_Binary U_star U_stop U_cont Decision
## 1 0 0 1 23.80000 23.80000 15.00000 continue
## 2 0 1 1 23.80000 23.80000 21.50000 continue
## 3 0 2 0 25.00000 24.69000 25.00000 stop
## 4 0 3 0 25.92857 24.92857 25.92857 stop
## 5 0 4 0 25.63415 24.63415 25.63415 stop
## 6 0 5 0 24.87705 23.87705 24.87705 stop
ggplot(data = df1, mapping = aes(x = Success, y = Failure)) +
geom_point(aes(colour = Decision)) +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.25, ", theta[2], " = 0.75, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
## Plot optimal expected utilities at each (i,j)
ggplot(data = df1, mapping = aes(x = Success, y = Failure, colour = U_star)) +
geom_point(size = 5) +
scale_color_gradient2(low = "purple", high = "red") +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.25, ", theta[2], " = 0.75, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
df2 <- AdaptiveDesign(theta_1 = 0.5, theta_2 = 0.7, n_init = 30)
ggplot(data = df2, mapping = aes(x = Success, y = Failure)) +
geom_point(aes(colour = Decision)) +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.5, ", theta[2], " = 0.7, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
## Plot optimal expected utilities at each (i,j)
ggplot(data = df2, mapping = aes(x = Success, y = Failure, colour = U_star)) +
geom_point(size = 5) +
scale_color_gradient2(low = "purple", high = "red") +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.5, ", theta[2], " = 0.7, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
df3 <- AdaptiveDesign(theta_1 = 0.7, theta_2 = 0.9, n_init = 30)
ggplot(data = df3, mapping = aes(x = Success, y = Failure)) +
geom_point(aes(colour = Decision)) +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.7, ", theta[2], " = 0.9, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
## Plot optimal expected utilities at each (i,j)
ggplot(data = df3, mapping = aes(x = Success, y = Failure, colour = U_star)) +
geom_point(size = 5) +
scale_color_gradient2(low = "purple", high = "red") +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.7, ", theta[2], " = 0.9, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
df4 <- AdaptiveDesign(theta_1 = 0.4, theta_2 = 0.6, n_init = 30)
ggplot(data = df4, mapping = aes(x = Success, y = Failure)) +
geom_point(aes(colour = Decision)) +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.4, ", theta[2], " = 0.6, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
## Plot optimal expected utilities at each (i,j)
ggplot(data = df4, mapping = aes(x = Success, y = Failure, colour = U_star)) +
geom_point(size = 5) +
scale_color_gradient2(low = "purple", high = "red") +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.4, ", theta[2], " = 0.6, ", "initial = payoff = $30"))) + theme(plot.title = element_text(hjust = 0.5))
df5 <- AdaptiveDesign(theta_1 = 0.25, theta_2 = 0.75, n_init = 100)
ggplot(data = df5, mapping = aes(x = Success, y = Failure)) +
geom_point(aes(colour = Decision)) +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.25, ", theta[2], " = 0.75, ", "initial = payoff = $100"))) + theme(plot.title = element_text(hjust = 0.5))
## Plot optimal expected utilities at each (i,j)
ggplot(data = df5, mapping = aes(x = Success, y = Failure, colour = U_star)) +
geom_point(size = 5) +
scale_color_gradient2(low = "purple", high = "red") +
theme_bw() + theme(legend.key = element_blank()) +
ggtitle(expression(paste(theta[1], " = 0.25, ", theta[2], " = 0.75, ", "initial = payoff = $100"))) + theme(plot.title = element_text(hjust = 0.5))
This problem is an example in decision theory that uses Bayes updating on unknown parameters. The solution illustrate the efficiency of Dynamic Programming algorithm in terms of reducing computational time. For initial being 100 dollars, the computational speed is still very fast (~1 second).