Smith is in jail and has 3 dollars; he can get out on bail if he has 8 dollars. A guard agrees to make a series of bets with him. If Smith bets x dollars, he wins x dollars with probability 0.4 and loses x dollars with probability 0.6.
Smith may choose either of the following two betting strategies. (a) He bets 1 dollar each time (timid strategy). (b) He bets, each time, as much as possible but not more than necessary to bring his fortune up to 8 dollars (bold strategy).
Determine which strategy gives Smith the better chance of getting out of jail.
Solution:
We can formulate this problem as a Markov chain of nine states, with each state representing the amount of dollars in Smith’s possession. States 0 (no money left to bet) and 8 (the game has been won) are said to be absorbing states.
State 0 - $0
State 1 - $1
State 2 - $2
State 3 - $3
State 4 - $4
State 5 - $5
State 6 - $6
State 7 - $7
State 8 - $8
According to the timid strategy, the one-step transition matrix Q:
q <- matrix(c(0, .4, 0, 0, 0, 0, 0, .6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0), byrow=TRUE, nrow=7, ncol=7)
q
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.0 0.4 0.0 0.0 0.0 0.0 0.0
## [2,] 0.6 0.0 0.4 0.0 0.0 0.0 0.0
## [3,] 0.0 0.6 0.0 0.4 0.0 0.0 0.0
## [4,] 0.0 0.0 0.6 0.0 0.4 0.0 0.0
## [5,] 0.0 0.0 0.0 0.6 0.0 0.4 0.0
## [6,] 0.0 0.0 0.0 0.0 0.6 0.0 0.4
## [7,] 0.0 0.0 0.0 0.0 0.0 0.6 0.0
The canonical form:
r <- matrix(c(.6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.4), byrow=TRUE, nrow=7, ncol=2)
r
## [,1] [,2]
## [1,] 0.6 0.0
## [2,] 0.0 0.0
## [3,] 0.0 0.0
## [4,] 0.0 0.0
## [5,] 0.0 0.0
## [6,] 0.0 0.0
## [7,] 0.0 0.4
Using these two matrices we derive the probability of winning or losing from any starting position by solving \((I−Q)^{−1}R\). Doyle Theorem 11.6 (pg. 420)
solve(diag(7)-q) %*% r
## [,1] [,2]
## [1,] 0.9796987 0.02030135
## [2,] 0.9492466 0.05075337
## [3,] 0.9035686 0.09643140
## [4,] 0.8350515 0.16494845
## [5,] 0.7322760 0.26772403
## [6,] 0.5781126 0.42188739
## [7,] 0.3468676 0.65313243
From the starting position 1, the probability of winning is a mere 2%.
The probability of winning according to the bold strategy is a bit more straightforward to solve, and could be derived from simple probablity theory. However, given the initial state of 1 ($1), the only states that can ever be reached are Step 2 {0, 2}, Step 3 {0,4} and Step 4 {0,8}.
q3 <- matrix(c(0, .4, 0, 0, 0, 0, 0,
0, 0, 0, 0.4, 0, 0, 0,
0, 0, 0, 0, 0, 0.4, 0,
0, 0, 0, 0, 0, 0, 0,
0, 0.6, 0, 0, 0, 0, 0,
0, 0, 0, 0.6, 0, 0, 0,
0, 0, 0, 0, 0, 0.6, 0), byrow=TRUE, nrow=7, ncol=7)
q3
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0 0.4 0 0.0 0 0.0 0
## [2,] 0 0.0 0 0.4 0 0.0 0
## [3,] 0 0.0 0 0.0 0 0.4 0
## [4,] 0 0.0 0 0.0 0 0.0 0
## [5,] 0 0.6 0 0.0 0 0.0 0
## [6,] 0 0.0 0 0.6 0 0.0 0
## [7,] 0 0.0 0 0.0 0 0.6 0
The canonical form:
r3 <- matrix(c(.6, .6, .6, .6, 0, 0, 0,
0, 0, 0, .4, .4, .4, 0.4), byrow=FALSE, nrow=7, ncol=2)
r3
## [,1] [,2]
## [1,] 0.6 0.0
## [2,] 0.6 0.0
## [3,] 0.6 0.0
## [4,] 0.6 0.4
## [5,] 0.0 0.4
## [6,] 0.0 0.4
## [7,] 0.0 0.4
Again solving \((I−Q)^{−1}R\):
solve(diag(7)-q3) %*% r3
## [,1] [,2]
## [1,] 0.936 0.064
## [2,] 0.840 0.160
## [3,] 0.744 0.256
## [4,] 0.600 0.400
## [5,] 0.504 0.496
## [6,] 0.360 0.640
## [7,] 0.216 0.784
This can be confirmed by simple binomial calculation.
.4^3
## [1] 0.064
The chance of reaching state 8 is (0.4)^3 = 0.064 or 6.4%
The bold strategy is the better strategy. Since the prisoner is more likely to lose than win, the preferable strategy is be to play as few times as necessary as the absorbing state of 0 is the dominant one.
The bold strategy requires one to win a streak of three times, while the timid strategy could go on indefinitely, but to win, must be played at least 8 times.
q2 <- matrix(c(0, 0.4, 0, 0, 0, 0, 0,
0, 0, 0, 0.4, 0, 0, 0,
0, 0.6, 0, 0.4, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0,
0, 0.6, 0, 0, 0, 0, 0,
0, 0, 0, 0.6, 0, 0, 0,
0, 0, 0, 0, 0, 0.6, 0), byrow=TRUE, nrow=7, ncol=7)
q2
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0 0.4 0 0.0 0 0.0 0
## [2,] 0 0.0 0 0.4 0 0.0 0
## [3,] 0 0.6 0 0.4 0 0.0 0
## [4,] 0 0.0 0 0.0 0 0.0 0
## [5,] 0 0.6 0 0.0 0 0.0 0
## [6,] 0 0.0 0 0.6 0 0.0 0
## [7,] 0 0.0 0 0.0 0 0.6 0
r2 <- matrix(c(.6, .6, 0, 0.6, 0, 0, 0, 0, 0, 0, 0.4, 0.4, 0.4, 0.4), byrow=FALSE, nrow=7, ncol=2)
r2
## [,1] [,2]
## [1,] 0.6 0.0
## [2,] 0.6 0.0
## [3,] 0.0 0.0
## [4,] 0.6 0.4
## [5,] 0.0 0.4
## [6,] 0.0 0.4
## [7,] 0.0 0.4
solve(diag(7)-q2) %*% r2
## [,1] [,2]
## [1,] 0.936 0.064
## [2,] 0.840 0.160
## [3,] 0.744 0.256
## [4,] 0.600 0.400
## [5,] 0.504 0.496
## [6,] 0.360 0.640
## [7,] 0.216 0.784
With a win percentage of 6.4% percent, the prisoner should be indifferent between these two strategies as they are equally unlikely at 6.4%