CUNY 605 Week 10 - Markov Chains

Markov Chain - Prison Gambling

Smith is in jail and has 3 dollars; he can get out on bail if he has 8 dollars. A guard agrees to make a series of bets with him. If Smith bets x dollars, he wins x dollars with probability 0.4 and loses x dollars with probability 0.6.

Smith may choose either of the following two betting strategies. (a) He bets 1 dollar each time (timid strategy). (b) He bets, each time, as much as possible but not more than necessary to bring his fortune up to 8 dollars (bold strategy).

Determine which strategy gives Smith the better chance of getting out of jail.

Solution:

We can formulate this problem as a Markov chain of nine states, with each state representing the amount of dollars in Smith’s possession. States 0 (no money left to bet) and 8 (the game has been won) are said to be absorbing states.

State 0 - $0
State 1 - $1
State 2 - $2
State 3 - $3
State 4 - $4
State 5 - $5
State 6 - $6
State 7 - $7
State 8 - $8

According to the timid strategy, the one-step transition matrix Q:

q <- matrix(c(0, .4, 0, 0, 0, 0, 0, .6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 0, 0, 0, 0, 0.6, 0), byrow=TRUE, nrow=7, ncol=7)
q

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]  0.0  0.4  0.0  0.0  0.0  0.0  0.0
## [2,]  0.6  0.0  0.4  0.0  0.0  0.0  0.0
## [3,]  0.0  0.6  0.0  0.4  0.0  0.0  0.0
## [4,]  0.0  0.0  0.6  0.0  0.4  0.0  0.0
## [5,]  0.0  0.0  0.0  0.6  0.0  0.4  0.0
## [6,]  0.0  0.0  0.0  0.0  0.6  0.0  0.4
## [7,]  0.0  0.0  0.0  0.0  0.0  0.6  0.0

The canonical form:

r <- matrix(c(.6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.4), byrow=TRUE, nrow=7, ncol=2)
r

##      [,1] [,2]
## [1,]  0.6  0.0
## [2,]  0.0  0.0
## [3,]  0.0  0.0
## [4,]  0.0  0.0
## [5,]  0.0  0.0
## [6,]  0.0  0.0
## [7,]  0.0  0.4

Using these two matrices we derive the probability of winning or losing from any starting position by solving $(I−Q)^{−1}R$. Doyle Theorem 11.6 (pg. 420)

solve(diag(7)-q) %*% r

##           [,1]       [,2]
## [1,] 0.9796987 0.02030135
## [2,] 0.9492466 0.05075337
## [3,] 0.9035686 0.09643140
## [4,] 0.8350515 0.16494845
## [5,] 0.7322760 0.26772403
## [6,] 0.5781126 0.42188739
## [7,] 0.3468676 0.65313243

From the starting position 1, the probability of winning is a mere 2%.

The probability of winning according to the bold strategy is a bit more straightforward to solve, and could be derived from simple probablity theory. However, given the initial state of 1 ($1), the only states that can ever be reached are Step 2 {0, 2}, Step 3 {0,4} and Step 4 {0,8}.

q3 <- matrix(c(0, .4, 0, 0, 0, 0, 0, 
              0, 0, 0, 0.4, 0, 0, 0, 
              0, 0, 0, 0, 0, 0.4, 0, 
              0, 0, 0, 0, 0, 0, 0, 
              0, 0.6, 0, 0, 0, 0, 0, 
              0, 0, 0, 0.6, 0, 0, 0, 
              0, 0, 0, 0, 0, 0.6, 0), byrow=TRUE, nrow=7, ncol=7)
q3

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    0  0.4    0  0.0    0  0.0    0
## [2,]    0  0.0    0  0.4    0  0.0    0
## [3,]    0  0.0    0  0.0    0  0.4    0
## [4,]    0  0.0    0  0.0    0  0.0    0
## [5,]    0  0.6    0  0.0    0  0.0    0
## [6,]    0  0.0    0  0.6    0  0.0    0
## [7,]    0  0.0    0  0.0    0  0.6    0

The canonical form:

r3 <- matrix(c(.6, .6, .6, .6, 0, 0, 0, 
              0, 0, 0, .4, .4, .4, 0.4), byrow=FALSE, nrow=7, ncol=2)
r3

##      [,1] [,2]
## [1,]  0.6  0.0
## [2,]  0.6  0.0
## [3,]  0.6  0.0
## [4,]  0.6  0.4
## [5,]  0.0  0.4
## [6,]  0.0  0.4
## [7,]  0.0  0.4

Again solving $(I−Q)^{−1}R$:

solve(diag(7)-q3) %*% r3

##       [,1]  [,2]
## [1,] 0.936 0.064
## [2,] 0.840 0.160
## [3,] 0.744 0.256
## [4,] 0.600 0.400
## [5,] 0.504 0.496
## [6,] 0.360 0.640
## [7,] 0.216 0.784

This can be confirmed by simple binomial calculation.

.4^3

## [1] 0.064

The chance of reaching state 8 is (0.4)^3 = 0.064 or 6.4%

The bold strategy is the better strategy. Since the prisoner is more likely to lose than win, the preferable strategy is be to play as few times as necessary as the absorbing state of 0 is the dominant one.

The bold strategy requires one to win a streak of three times, while the timid strategy could go on indefinitely, but to win, must be played at least 8 times.

With the situation in Exercise 13, consider the strategy such that for i < 4, Smith bets min(i, 4-i), and for i >= 4, he bets according to the bold strategy, where i is his current fortune. Find the probability that he gets out of jail using this strategy. How does this probability compare with that obtained for the bold strategy?

q2 <- matrix(c(0, 0.4, 0, 0, 0, 0, 0, 
              0, 0, 0, 0.4, 0, 0, 0, 
              0, 0.6, 0, 0.4, 0, 0, 0, 
              0, 0, 0, 0, 0, 0, 0, 
              0, 0.6, 0, 0, 0, 0, 0, 
              0, 0, 0, 0.6, 0, 0, 0, 
              0, 0, 0, 0, 0, 0.6, 0), byrow=TRUE, nrow=7, ncol=7)
q2

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    0  0.4    0  0.0    0  0.0    0
## [2,]    0  0.0    0  0.4    0  0.0    0
## [3,]    0  0.6    0  0.4    0  0.0    0
## [4,]    0  0.0    0  0.0    0  0.0    0
## [5,]    0  0.6    0  0.0    0  0.0    0
## [6,]    0  0.0    0  0.6    0  0.0    0
## [7,]    0  0.0    0  0.0    0  0.6    0

r2 <- matrix(c(.6, .6, 0, 0.6, 0, 0, 0, 0, 0, 0, 0.4, 0.4, 0.4, 0.4), byrow=FALSE, nrow=7, ncol=2)
r2

##      [,1] [,2]
## [1,]  0.6  0.0
## [2,]  0.6  0.0
## [3,]  0.0  0.0
## [4,]  0.6  0.4
## [5,]  0.0  0.4
## [6,]  0.0  0.4
## [7,]  0.0  0.4

solve(diag(7)-q2) %*% r2

##       [,1]  [,2]
## [1,] 0.936 0.064
## [2,] 0.840 0.160
## [3,] 0.744 0.256
## [4,] 0.600 0.400
## [5,] 0.504 0.496
## [6,] 0.360 0.640
## [7,] 0.216 0.784

With a win percentage of 6.4% percent, the prisoner should be indifferent between these two strategies as they are equally unlikely at 6.4%

CUNY 605 Week 10 - Markov Chains

Zachary Herold

April 8, 2019

Markov Chain - Prison Gambling