This example is drawn from the book “Reinforcement Learning: An Introduction” by Sutton and Barto, pp. 89-90.

The Problem

Jack manages two locations for a nationwide car rental company. Each day, some number of customers arrive at each location to rent cars. If Jack has a car available, he rents it out and is credited $10 by the national company. If he is out of cars at that location, then the business is lost. Cars become available for renting the day after they are returned. To help ensure that cars are available where they are needed, Jack can move them between the two locations overnight, at a cost of $2 per car moved. We assume that the number of cars requested and returned at each location are Poisson random variables, meaning that the probability that the number is \(n\) is \(\frac{\lambda^{n}}{n!}e^{-\lambda}\), where \(\lambda\) is the expected number. Suppose \(\lambda\) is 3 and 4 for rental requests at the first and second locations and 3 and 2 for returns. To simplify the problem slightly, we assume that there can be no more than 20 cars at each location (any additional cars are returned to the nationwide company, and thus disappear from the problem) and a maximum of five cars can be moved from one location to the other in one night. We take the discount rate to be \(\gamma=0.9\) and formulate this as a continuing finite MDP, where the time steps are days, the state is the number of cars at each location at the end of the day, and the actions are the net numbers of cars moved between the two locations overnight.
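As a quick check of the Poisson formula above, the following R sketch evaluates it directly and compares it with dpois from base R. The helper name pois.pmf and the variable names are illustrative assumptions, not something taken from the book.

```r
# Poisson pmf written out from the formula in the text.
pois.pmf <- function(n, lambda) lambda^n / factorial(n) * exp(-lambda)

lambda.request <- c(3, 4)  # expected rental requests at locations 1 and 2
lambda.return  <- c(3, 2)  # expected returns at locations 1 and 2

n <- 0:5
manual  <- pois.pmf(n, lambda.request[1])
builtin <- dpois(n, lambda.request[1])  # base R (stats) Poisson pmf

print(round(manual, 4))
print(all.equal(manual, builtin))
```

The same call with lambda.return gives the return probabilities.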

Analysis

Let’s analyze this problem. Let the state be \(S_t=(N_{t,1},N_{t,2})\), where \(N_{t,i}\) is the number of cars at location \(i=1,2\) at the end of day \(t\). The action \(a_t\) is taken from \(\{\max(-5,-N_{t,2}),\dots,\min(5,N_{t,1})\}\) and means that we move \(a_t\) cars from location 1 to location 2 overnight; a negative \(a_t\) moves cars from location 2 to location 1.
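The action set can be enumerated directly. This is a minimal sketch, assuming the hypothetical helper name feasible.actions; like the formula above, it enforces only the five-car limit and the cars on hand, not the cap on cars per location.

```r
# Feasible net moves from location 1 to location 2 for state (n1, n2).
# Negative values move cars from location 2 to location 1.
feasible.actions <- function(n1, n2, max.move = 5) {
  seq(max(-max.move, -n2), min(max.move, n1))
}

print(feasible.actions(3, 8))  # from -5 up to 3
print(feasible.actions(0, 2))  # from -2 up to 0
```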

Let \(q_i\) and \(p_i\) be the numbers of cars requested and returned at location \(i\) during the next day. Moving \(a_t\) cars from location 1 to location 2 and ignoring the cap on cars per location, the car counts satisfy \[N_{t+1,1}-N_{t,1}+a_t=p_1-q_1\] \[N_{t+1,2}-N_{t,2}-a_t=p_2-q_2\] Requests and returns are independent Poisson variables, so the probability of \(q\) requests together with the number of returns implied by the next state is, for each location (using \(4^{q}2^{p}=2^{2q+p}\) at location 2): \[Pr_{t,1}(q)=\frac{3^{2q+N_{t+1,1}-N_{t,1}+a_t}}{q!(q+N_{t+1,1}-N_{t,1}+a_t)!}e^{-6}\] \[Pr_{t,2}(q)=\frac{2^{3q+N_{t+1,2}-N_{t,2}-a_t}}{q!(q+N_{t+1,2}-N_{t,2}-a_t)!}e^{-6}\]

Then the joint distribution for both locations is: \[Pr\{S_{t+1},q_1,q_2|S_t,a_t\}=\frac{3^{2q_{1}+N_{t+1,1}-N_{t,1}+a_t}\cdot 2^{3q_{2}+N_{t+1,2}-N_{t,2}-a_t}}{q_{1}!q_{2}!(q_{1}+N_{t+1,1}-N_{t,1}+a_t)!(q_2+N_{t+1,2}-N_{t,2}-a_t)!}e^{-12}\]
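The per-location closed forms can be checked numerically. In this sketch d stands for the net change \(p-q\) that appears in the exponents, and joint.prob is an illustrative helper rather than part of the original code.

```r
# Probability of q requests together with q + d returns at one location,
# where d = p - q is the net change caused by the day's business.
joint.prob <- function(q, d, lambda.request, lambda.return) {
  dpois(q, lambda.request) * dpois(q + d, lambda.return)
}

q <- 2; d <- 1
# Closed forms from the text: rates (3, 3) at location 1, (4, 2) at location 2.
closed1 <- 3^(2 * q + d) / (factorial(q) * factorial(q + d)) * exp(-6)
closed2 <- 2^(3 * q + d) / (factorial(q) * factorial(q + d)) * exp(-6)

print(all.equal(joint.prob(q, d, 3, 3), closed1))
print(all.equal(joint.prob(q, d, 4, 2), closed2))
```

Because the two locations are independent, multiplying the two per-location values gives the joint probability with the factor \(e^{-12}\).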

The reward is $10 per car rented minus $2 per car moved, in either direction: \[R\{S_{t+1}|S_t,q_1,q_2,a_t\}=10(q_1+q_2)-2|a_t|\]

Also note that rentals at each location cannot exceed the cars on hand after the overnight move: \[q_1 \leq N_{t,1}-a_t\]

\[q_2 \leq N_{t,2}+a_t\]
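The reward and the two constraints translate into a few lines of R. This is only a sketch: the function names day.reward and rentals.feasible are assumptions made for illustration, and the moving cost is charged at $2 per car in either direction, as in the problem statement.

```r
# Immediate reward: $10 per car rented minus $2 per car moved (either direction).
day.reward <- function(q1, q2, a) 10 * (q1 + q2) - 2 * abs(a)

# Rentals are feasible only if each location has enough cars after the move.
rentals.feasible <- function(n1, n2, a, q1, q2) {
  q1 <= n1 - a && q2 <= n2 + a
}

print(day.reward(q1 = 2, q2 = 3, a = -1))             # 48
print(rentals.feasible(3, 4, a = 1, q1 = 2, q2 = 5))  # TRUE
print(rentals.feasible(3, 4, a = 1, q1 = 3, q2 = 5))  # FALSE
```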

Coding

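Let’s initialize the policy to be uniform over the actions and all state values to zero. Note that the code caps each location at 10 cars (maxCar <- 10) rather than the 20 of the problem statement.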

maxCar <- 10
maxMove <- 5
V <- rep(0,(maxCar+1)^2)
tmp <- 2*maxMove + 1  # number of candidate actions (-maxMove..maxMove)
init.policy <- matrix(1/tmp,nrow = length(V),ncol = tmp)  # the policy for each state
value.a <- -5:5
gamma <- 0.9

Computing the state-value function:

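cons <- exp(-12)  # exp(-(3+3+4+2)): combined Poisson normalizing constant

# factorial of a non-negative integer (equivalent to base R's factorial())
frac <- function(t){
  if(t == 0 || t == 1) 1 else t*frac(t-1)
}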

compute.v <- function(policy,lastV){
  tV <- rep(0,length(lastV))
  tpv <- matrix(0,nrow=nrow(policy),ncol=ncol(policy))
  for(i in 0:maxCar) for(j in 0:maxCar){
    curState <- c(i,j)
    pv <- rep(0,ncol(policy))
    curInd <- (maxCar+1) * curState[1] + curState[2] + 1
    for(ta in 1:ncol(policy)){
      Ev <- 0
      a <- value.a[ta]
      if(a >= - curState[2] && a <= curState[1] && curState[1] - a <= maxCar && curState[2] + a <= maxCar){
        for(ntp11 in 0:maxCar) for(ntp12 in 0:maxCar){
          newInd <- (maxCar+1)*ntp11 + ntp12 + 1
          sum1 <- 0
          for(q1 in 0:(curState[1]-a)) for(q2 in 0:(curState[2]+a)){
            # returns implied by the next state: n' = n -/+ a - q + p
            t1 <- q1 + ntp11 - curState[1] + a
            t2 <- q2 + ntp12 - curState[2] - a
            if(t1 >= 0 && t2 >= 0){
              frac.q1 <- frac(q1)
              frac.q2 <- frac(q2)
              frac.t1 <- frac(t1)
              frac.t2 <- frac(t2)
              p <- (3^(t1+q1))*(2^(t2+2*q2))*cons/(frac.q1*frac.q2*frac.t1*frac.t2)
              # immediate reward ($2 per car moved, either direction) plus discounted next-state value
              reward <- 10*(q1 + q2) - 2*abs(a) + gamma * lastV[newInd]
              sum1 <- sum1 + p * reward
            }
          }
          Ev <- Ev + sum1
        }
        pv[ta] <- Ev
      }
    }
    tV[curInd] <- sum(policy[curInd,]*pv)  # expectation over actions, using this state's policy row
    tpv[curInd,] <- pv
  }
  list(V=tV,pv=tpv)
}
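The flat vectors used above store state \((i,j)\) at position (maxCar+1)*i + j + 1. The following sketch makes that mapping and its inverse explicit; the helper names state.to.index and index.to.state are hypothetical and only meant to help read the printed vectors.

```r
# Map a state (i cars at location 1, j cars at location 2) to its 1-based
# position in the flat value vector, and back again.
state.to.index <- function(i, j, max.car = 10) (max.car + 1) * i + j + 1
index.to.state <- function(k, max.car = 10) {
  c((k - 1) %/% (max.car + 1), (k - 1) %% (max.car + 1))
}

print(state.to.index(0, 0))    # 1
print(state.to.index(10, 10))  # 121
print(index.to.state(35))      # 3 1
```

For example, the 35th entry of a printed V vector belongs to the state with 3 cars at location 1 and 1 car at location 2.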

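The policy update is a softmax over the action values computed for each state: \[\pi_{t+1}(a|s)=\frac{\exp\left(v_{\pi_t}(s,a)\right)}{\sum_{a'}\exp\left(v_{\pi_t}(s,a')\right)}\]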

# Row-wise softmax of the action values pv; the argument V is not used.
compute.policy <- function(V,pv){
  t(apply(pv,1,function(r){
    er <- exp(r)
    er / sum(er)
  }))
}
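When action values grow large, exp() can overflow. A common remedy, shown here as a sketch and not as the original author's method, is to subtract the row maximum before exponentiating, which leaves the softmax unchanged. The function name softmax is an assumption for illustration.

```r
# Numerically stable softmax: shifting by max(x) does not change the result.
softmax <- function(x) {
  z <- exp(x - max(x))
  z / sum(z)
}

p <- softmax(c(1, 2, 3))
print(round(p, 4))
print(which.max(p))             # position of the most probable action
print(softmax(c(1000, 1001)))   # stays finite despite the large inputs
```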

The whole process:

policy <- init.policy
maxR <- 100
epsilon <- 1e-4
r <- 1
dis <- 100
while(r < maxR && dis > epsilon){
  res <- compute.v(policy,V)
  nextV <- res$V
  pv <- res$pv
  policy <- compute.policy(nextV,pv)
  
  dis <- sum(abs(nextV-V))
  V <- nextV
  
  print(paste("round",r))
  print("V:")
  print(V)
  print("policy:")
  pp <- as.array(apply(policy,1,function(r){
    m <- which(r==max(r))
    if(length(m) == 1) as.character(value.a[m])
    else 'Multi'
  }))
  for(i in 0:maxCar) for(j in 0:maxCar){
    names(pp)[i*(maxCar+1) + j + 1] <- paste0('[',i,',',j,']')
  }
  print(pp)
  
  r <- r + 1
}
## [1] "round 1"
## [1] "V:"
##   [1]  0.000000000  0.005861829  0.045828897  0.176680356  0.452702697
##   [6]  0.873642288  1.360956657  1.787607648  2.037766897  2.054576814
##  [11]  1.853767748  0.003913114  0.044443015  0.211259006  0.631945631
##  [16]  1.387266080  2.424358293  3.544590086  4.480258437  5.006311083
##  [21]  5.020040776  4.393493518  0.025564108  0.165207250  0.607927222
##  [26]  1.560177600  3.099248954  5.069682545  7.103128042  8.751309390
##  [31]  9.654325287  9.505171913  7.886472742  0.080698405  0.398562979
##  [36]  1.262033116  2.941274904  5.476436169  8.579372299 11.692955127
##  [41] 14.175042162 15.416670814 14.684058879 11.534734116  0.168398056
##  [46]  0.714365994  2.063492994  4.527397499  8.094078171 12.345618509
##  [51] 16.549174186 19.845177732 21.148675891 19.395669674 14.584232369
##  [56]  0.265635492  1.029901197  2.814205052  5.954607078 10.395183932
##  [61] 15.619116480 20.754060387 24.662753770 25.698074527 22.831814500
##  [66] 16.686841183  0.341432217  1.256932783  3.327781981  6.902759709
##  [71] 11.903977250 17.756990966 23.487952276 27.669440389 28.297023725
##  [76] 24.634051469 17.740865343  0.374456509  1.342695460  3.501357421
##  [81]  7.201402601 12.331448079 18.215358683 23.789374710 27.559181678
##  [86] 27.672270680 23.753019865 16.923730465  0.360258290  1.281216764
##  [91]  3.330231392  6.795339822 11.424370222 16.434539815 20.871738790
##  [96] 23.581593438 23.253071545 19.716473122 13.798909777  0.308504902
## [101]  1.103152729  2.831437410  5.594929098  9.014745628 12.423424053
## [106] 15.217250107 16.737192969 16.203174266 13.460656325  8.924828783
## [111]  0.236448037  0.814531301  1.956176712  3.604754941  5.466028966
## [116]  7.186962852  8.518428044  9.153417300  8.643758106  6.782440852
## [121]  3.655152223
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
## "Multi"     "0"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"     "0" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##    "-1"     "1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"    "-1"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "1"     "0"     "0"     "0"     "0"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0" 
## [1] "round 2"
## [1] "V:"
##   [1] 1.546237e-04 7.551310e-03 5.191278e-02 1.619612e-01 2.284616e-01
##   [6] 1.237326e-01 3.654282e-02 1.060585e-02 4.965413e-03 4.942730e-03
##  [11] 9.820523e-03 4.947574e-03 4.933453e-02 1.806848e-01 1.990781e-01
##  [16] 3.280120e-02 1.269461e-03 4.554810e-05 3.714818e-06 1.001908e-06
##  [21] 1.050113e-06 3.753931e-06 2.823470e-02 1.491185e-01 1.952238e-01
##  [26] 2.032309e-02 3.133671e-04 9.618630e-07 2.491702e-09 3.399663e-11
##  [31] 4.291577e-12 5.204329e-12 5.018055e-11 8.026095e-02 2.215208e-01
##  [36] 4.606552e-02 3.005124e-04 3.115340e-07 3.195772e-10 2.096501e-13
##  [41] 6.060162e-16 3.882463e-17 5.775154e-17 1.545781e-15 1.447884e-01
##  [46] 1.571966e-01 4.796144e-03 4.688975e-06 4.032005e-10 5.601107e-14
##  [51] 5.908237e-17 1.959474e-19 9.072131e-21 1.611481e-20 7.202739e-19
##  [56] 1.908039e-01 7.468143e-02 4.377842e-04 1.360895e-07 4.299215e-12
##  [61] 1.463555e-16 7.162821e-20 6.347500e-22 5.215355e-23 9.171441e-23
##  [66] 1.897481e-21 2.099552e-01 4.120532e-02 8.805168e-05 1.064613e-08
##  [71] 3.360088e-13 8.104301e-18 2.110800e-21 1.239488e-23 9.931467e-25
##  [76] 1.367357e-24 2.525147e-23 2.146480e-01 3.433634e-02 5.961365e-05
##  [81] 6.134980e-09 2.159831e-13 6.436803e-18 1.747282e-21 1.032052e-23
##  [86] 8.064725e-25 1.152153e-24 2.286278e-23 2.131707e-01 4.347017e-02
##  [91] 1.258694e-04 2.521233e-08 1.292347e-12 5.075554e-17 2.051596e-20
##  [96] 1.209781e-22 6.389596e-24 1.072960e-23 3.166702e-22 2.028487e-01
## [101] 7.407315e-02 6.086274e-04 3.822687e-07 3.350072e-11 3.151163e-15
## [106] 2.487786e-18 6.340665e-21 2.230833e-22 4.156032e-22 2.582830e-20
## [111] 1.772839e-01 1.293198e-01 3.838719e-03 8.186026e-06 2.467425e-09
## [116] 1.146984e-12 3.874476e-15 1.233023e-16 2.709474e-17 3.090723e-17
## [121] 1.618083e-16
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
##     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"    "-1" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##     "0"     "1"     "1"     "1"     "0"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"     "0"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "0"     "0"     "0"     "0"    "-1"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"    "-1"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0" 
## [1] "round 3"
## [1] "V:"
##   [1] 5.865297e-06 5.860276e-03 4.329632e-02 1.339388e-01 1.685925e-01
##   [6] 7.310859e-02 1.759422e-02 4.465591e-03 1.997669e-03 2.073483e-03
##  [11] 4.561565e-03 3.941005e-03 4.217620e-02 1.503729e-01 1.353135e-01
##  [16] 1.441291e-02 3.348218e-04 7.421735e-06 4.198002e-07 9.490671e-08
##  [21] 1.005475e-07 4.443308e-07 2.496085e-02 1.298006e-01 1.405472e-01
##  [26] 9.174648e-03 7.313741e-05 8.690253e-08 8.995011e-11 5.901005e-13
##  [31] 4.974904e-14 6.203121e-14 9.604812e-13 7.320353e-02 1.858324e-01
##  [36] 2.638991e-02 8.349004e-05 3.207087e-08 8.943477e-12 1.305619e-15
##  [41] 1.229476e-18 4.329273e-20 6.895780e-20 3.780247e-18 1.332107e-01
##  [46] 1.226018e-01 2.398300e-03 8.609002e-07 1.780471e-11 4.457119e-16
##  [51] 6.749745e-20 4.292109e-23 9.034522e-25 1.637826e-24 7.178281e-23
##  [56] 1.755072e-01 5.502181e-02 2.046708e-04 2.339183e-08 1.040677e-13
##  [61] 4.288236e-19 2.187322e-23 2.205135e-26 5.137898e-28 5.379033e-28
##  [66] 1.069068e-26 1.931296e-01 2.976940e-02 3.996637e-05 2.016512e-09
##  [71] 7.630302e-15 1.420429e-20 3.099466e-25 1.652435e-28 2.514088e-30
##  [76] 1.945059e-30 5.225409e-29 1.984537e-01 2.516481e-02 2.800396e-05
##  [81] 1.276783e-09 5.827979e-15 1.179628e-20 2.719846e-25 1.268045e-28
##  [86] 1.692437e-30 1.941307e-30 6.606470e-29 1.990651e-01 3.313119e-02
##  [91] 6.465142e-05 5.909171e-09 3.893147e-14 1.572851e-19 6.287420e-24
##  [96] 1.478434e-27 1.616821e-29 3.962063e-29 3.723414e-27 1.917519e-01
## [101] 5.953544e-02 3.508201e-04 9.626910e-08 1.524387e-12 2.515007e-17
## [106] 1.037552e-21 1.286487e-25 1.882205e-27 5.303484e-27 1.596383e-24
## [111] 1.695150e-01 1.111985e-01 2.485689e-03 2.462076e-06 2.347264e-10
## [116] 3.175669e-14 2.960351e-17 2.947423e-19 2.852382e-20 2.455331e-20
## [121] 1.843568e-19
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
##     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"     "0" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##    "-1"     "1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"    "-1"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "1"     "0"     "0"     "0"     "0"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0" 
## [1] "round 4"
## [1] "V:"
##   [1] 4.670858e-06 5.860084e-03 4.370197e-02 1.402298e-01 2.004948e-01
##   [6] 1.094291e-01 3.250796e-02 9.480268e-03 4.456318e-03 4.452304e-03
##  [11] 8.878981e-03 3.937346e-03 4.249504e-02 1.584433e-01 1.752821e-01
##  [16] 2.886525e-02 1.116157e-03 4.005111e-05 3.269537e-06 8.828289e-07
##  [21] 9.263225e-07 3.344705e-06 2.502788e-02 1.337352e-01 1.744509e-01
##  [26] 1.800313e-02 2.756107e-04 8.408594e-07 2.167729e-09 2.948789e-11
##  [31] 3.714927e-12 4.511656e-12 4.419489e-11 7.382823e-02 2.025070e-01
##  [36] 4.151830e-02 2.669133e-04 2.730887e-07 2.776785e-10 1.809913e-13
##  [41] 5.198984e-16 3.315661e-17 4.940709e-17 1.346161e-15 1.355842e-01
##  [46] 1.449836e-01 4.336468e-03 4.171941e-06 3.530766e-10 4.842286e-14
##  [51] 5.065168e-17 1.669015e-19 7.675224e-21 1.363939e-20 6.198216e-19
##  [56] 1.804814e-01 6.920424e-02 3.958833e-04 1.206439e-07 3.763418e-12
##  [61] 1.263126e-16 6.115299e-20 5.376353e-22 4.382685e-23 7.692550e-23
##  [66] 1.613204e-21 1.998925e-01 3.833333e-02 7.973291e-05 9.405988e-09
##  [71] 2.929617e-13 6.987581e-18 1.798952e-21 1.046467e-23 8.305504e-25
##  [76] 1.138652e-24 2.123782e-23 2.053759e-01 3.207297e-02 5.413469e-05
##  [81] 5.425098e-09 1.880876e-13 5.551878e-18 1.489512e-21 8.706619e-24
##  [86] 6.726957e-25 9.541975e-25 1.904387e-23 2.048430e-01 4.078882e-02
##  [91] 1.147849e-04 2.244432e-08 1.137701e-12 4.421007e-17 1.765980e-20
##  [96] 1.029650e-22 5.363187e-24 8.907205e-24 2.630409e-22 1.956940e-01
## [101] 6.985792e-02 5.617984e-04 3.479238e-07 3.027471e-11 2.815964e-15
## [106] 2.197735e-18 5.530544e-21 1.911502e-22 3.507298e-22 2.168898e-20
## [111] 1.716421e-01 1.238108e-01 3.636661e-03 7.704426e-06 2.304906e-09
## [116] 1.060086e-12 3.536398e-15 1.108300e-16 2.390062e-17 2.678096e-17
## [121] 1.388891e-16
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
##     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"     "0" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##    "-1"     "1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"    "-1"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "1"     "0"     "0"     "0"     "0"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0" 
## [1] "round 5"
## [1] "V:"
##   [1] 5.296722e-06 5.865874e-03 4.372604e-02 1.402973e-01 2.006319e-01
##   [6] 1.095362e-01 3.254063e-02 9.488334e-03 4.459253e-03 4.454417e-03
##  [11] 8.881811e-03 3.941377e-03 4.251762e-02 1.585273e-01 1.754584e-01
##  [16] 2.891722e-02 1.118694e-03 4.014040e-05 3.275518e-06 8.840031e-07
##  [21] 9.271638e-07 3.346664e-06 2.504059e-02 1.337967e-01 1.746228e-01
##  [26] 1.803753e-02 2.762863e-04 8.433264e-07 2.174023e-09 2.955922e-11
##  [31] 3.721512e-12 4.516979e-12 4.422732e-11 7.385579e-02 2.026335e-01
##  [36] 4.158366e-02 2.675918e-04 2.739204e-07 2.784985e-10 1.814951e-13
##  [41] 5.210997e-16 3.321217e-17 4.946149e-17 1.347069e-15 1.356315e-01
##  [46] 1.451225e-01 4.344849e-03 4.183043e-06 3.541732e-10 4.856588e-14
##  [51] 5.076964e-17 1.671949e-19 7.684882e-21 1.365042e-20 6.200961e-19
##  [56] 1.805471e-01 6.928803e-02 3.967727e-04 1.209821e-07 3.773808e-12
##  [61] 1.266316e-16 6.127310e-20 5.383218e-22 4.386160e-23 7.696143e-23
##  [66] 1.613559e-21 1.999666e-01 3.838181e-02 7.991603e-05 9.433683e-09
##  [71] 2.937235e-13 7.001639e-18 1.801668e-21 1.047478e-23 8.309972e-25
##  [76] 1.138994e-24 2.124100e-23 2.054465e-01 3.211050e-02 5.424930e-05
##  [81] 5.439864e-09 1.885333e-13 5.560689e-18 1.491195e-21 8.712565e-24
##  [86] 6.729240e-25 9.543759e-25 1.904567e-23 2.049020e-01 4.082886e-02
##  [91] 1.149903e-04 2.249471e-08 1.139683e-12 4.426364e-17 1.767484e-20
##  [96] 1.030072e-22 5.363997e-24 8.908004e-24 2.630564e-22 1.957371e-01
## [101] 6.990932e-02 5.625634e-04 3.484681e-07 3.031027e-11 2.818609e-15
## [106] 2.198977e-18 5.531161e-21 1.911588e-22 3.507390e-22 2.168947e-20
## [111] 1.716695e-01 1.238706e-01 3.639975e-03 7.711887e-06 2.306904e-09
## [116] 1.060861e-12 3.538245e-15 1.108655e-16 2.390469e-17 2.678311e-17
## [121] 1.388938e-16
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
##     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"     "0" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##    "-1"     "1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"    "-1"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "1"     "0"     "0"     "0"     "0"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0" 
## [1] "round 6"
## [1] "V:"
##   [1] 5.300668e-06 5.865870e-03 4.372495e-02 1.402839e-01 2.005741e-01
##   [6] 1.094782e-01 3.252103e-02 9.483162e-03 4.457292e-03 4.452975e-03
##  [11] 8.879869e-03 3.941389e-03 4.251666e-02 1.585087e-01 1.753789e-01
##  [16] 2.888868e-02 1.117201e-03 4.008568e-05 3.271717e-06 8.832301e-07
##  [21] 9.265944e-07 3.345310e-06 2.504034e-02 1.337860e-01 1.745490e-01
##  [26] 1.801964e-02 2.759065e-04 8.418416e-07 2.170107e-09 2.951359e-11
##  [31] 3.717181e-12 4.513375e-12 4.420472e-11 7.385356e-02 2.025902e-01
##  [36] 4.155120e-02 2.672230e-04 2.734513e-07 2.780166e-10 1.811797e-13
##  [41] 5.203264e-16 3.317546e-17 4.942453e-17 1.346434e-15 1.356233e-01
##  [46] 1.450622e-01 4.340562e-03 4.176882e-06 3.535424e-10 4.848194e-14
##  [51] 5.069836e-17 1.670068e-19 7.678503e-21 1.364295e-20 6.199077e-19
##  [56] 1.805303e-01 6.924883e-02 3.963127e-04 1.207964e-07 3.767750e-12
##  [61] 1.264416e-16 6.120056e-20 5.378968e-22 4.383918e-23 7.693791e-23
##  [66] 1.613332e-21 1.999440e-01 3.835846e-02 7.982023e-05 9.418532e-09
##  [71] 2.932825e-13 6.993150e-18 1.800004e-21 1.046850e-23 8.307126e-25
##  [76] 1.138773e-24 2.123895e-23 2.054238e-01 3.209219e-02 5.418878e-05
##  [81] 5.431731e-09 1.882751e-13 5.555302e-18 1.490150e-21 8.708831e-24
##  [86] 6.727778e-25 9.542602e-25 1.904450e-23 2.048834e-01 4.080927e-02
##  [91] 1.148811e-04 2.246668e-08 1.138507e-12 4.423035e-17 1.766538e-20
##  [96] 1.029805e-22 5.363475e-24 8.907477e-24 2.630462e-22 1.957248e-01
## [101] 6.988442e-02 5.621550e-04 3.481595e-07 3.028850e-11 2.816949e-15
## [106] 2.198193e-18 5.530785e-21 1.911535e-22 3.507330e-22 2.168914e-20
## [111] 1.716632e-01 1.238428e-01 3.638212e-03 7.707560e-06 2.305676e-09
## [116] 1.060376e-12 3.537080e-15 1.108430e-16 2.390208e-17 2.678171e-17
## [121] 1.388907e-16
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
##     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"     "0" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##    "-1"     "1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"    "-1"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "1"     "0"     "0"     "0"     "0"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0" 
## [1] "round 7"
## [1] "V:"
##   [1] 5.299209e-06 5.865858e-03 4.372490e-02 1.402838e-01 2.005738e-01
##   [6] 1.094779e-01 3.252096e-02 9.483147e-03 4.457286e-03 4.452971e-03
##  [11] 8.879865e-03 3.941379e-03 4.251661e-02 1.585084e-01 1.753784e-01
##  [16] 2.888855e-02 1.117195e-03 4.008547e-05 3.271703e-06 8.832276e-07
##  [21] 9.265927e-07 3.345306e-06 2.504030e-02 1.337859e-01 1.745484e-01
##  [26] 1.801953e-02 2.759045e-04 8.418354e-07 2.170092e-09 2.951343e-11
##  [31] 3.717167e-12 4.513364e-12 4.420466e-11 7.385348e-02 2.025897e-01
##  [36] 4.155098e-02 2.672208e-04 2.734486e-07 2.780141e-10 1.811784e-13
##  [41] 5.203236e-16 3.317534e-17 4.942442e-17 1.346433e-15 1.356232e-01
##  [46] 1.450616e-01 4.340530e-03 4.176843e-06 3.535386e-10 4.848146e-14
##  [51] 5.069799e-17 1.670060e-19 7.678480e-21 1.364292e-20 6.199071e-19
##  [56] 1.805300e-01 6.924850e-02 3.963092e-04 1.207951e-07 3.767713e-12
##  [61] 1.264405e-16 6.120015e-20 5.378947e-22 4.383908e-23 7.693782e-23
##  [66] 1.613331e-21 1.999438e-01 3.835828e-02 7.981950e-05 9.418422e-09
##  [71] 2.932796e-13 6.993099e-18 1.799994e-21 1.046847e-23 8.307112e-25
##  [76] 1.138772e-24 2.123894e-23 2.054235e-01 3.209204e-02 5.418833e-05
##  [81] 5.431673e-09 1.882734e-13 5.555270e-18 1.490144e-21 8.708810e-24
##  [86] 6.727771e-25 9.542596e-25 1.904449e-23 2.048832e-01 4.080912e-02
##  [91] 1.148803e-04 2.246649e-08 1.138500e-12 4.423015e-17 1.766532e-20
##  [96] 1.029803e-22 5.363472e-24 8.907475e-24 2.630462e-22 1.957247e-01
## [101] 6.988423e-02 5.621521e-04 3.481575e-07 3.028837e-11 2.816939e-15
## [106] 2.198189e-18 5.530784e-21 1.911535e-22 3.507330e-22 2.168914e-20
## [111] 1.716631e-01 1.238426e-01 3.638200e-03 7.707533e-06 2.305669e-09
## [116] 1.060373e-12 3.537073e-15 1.108428e-16 2.390207e-17 2.678171e-17
## [121] 1.388907e-16
## [1] "policy:"
##   [0,0]   [0,1]   [0,2]   [0,3]   [0,4]   [0,5]   [0,6]   [0,7]   [0,8] 
##     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-2"    "-2"    "-2" 
##   [0,9]  [0,10]   [1,0]   [1,1]   [1,2]   [1,3]   [1,4]   [1,5]   [1,6] 
##    "-2"    "-2"     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1" 
##   [1,7]   [1,8]   [1,9]  [1,10]   [2,0]   [2,1]   [2,2]   [2,3]   [2,4] 
##    "-1"    "-1"    "-1"    "-1"     "1"     "0"     "0"     "0"     "0" 
##   [2,5]   [2,6]   [2,7]   [2,8]   [2,9]  [2,10]   [3,0]   [3,1]   [3,2] 
##    "-1"    "-1"    "-1"    "-1"    "-1"    "-1"     "1"     "1"     "0" 
##   [3,3]   [3,4]   [3,5]   [3,6]   [3,7]   [3,8]   [3,9]  [3,10]   [4,0] 
##     "0"     "0"     "0"    "-1"    "-1"    "-1"    "-1"    "-1"     "1" 
##   [4,1]   [4,2]   [4,3]   [4,4]   [4,5]   [4,6]   [4,7]   [4,8]   [4,9] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"    "-1" 
##  [4,10]   [5,0]   [5,1]   [5,2]   [5,3]   [5,4]   [5,5]   [5,6]   [5,7] 
##    "-1"     "1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##   [5,8]   [5,9]  [5,10]   [6,0]   [6,1]   [6,2]   [6,3]   [6,4]   [6,5] 
##    "-1"    "-1"     "0"     "1"     "1"     "1"     "1"     "0"     "0" 
##   [6,6]   [6,7]   [6,8]   [6,9]  [6,10]   [7,0]   [7,1]   [7,2]   [7,3] 
##     "0"     "0"     "0"     "0"     "0"     "1"     "1"     "1"     "1" 
##   [7,4]   [7,5]   [7,6]   [7,7]   [7,8]   [7,9]  [7,10]   [8,0]   [8,1] 
##     "1"     "0"     "0"     "0"     "0"     "0"     "0"     "1"     "1" 
##   [8,2]   [8,3]   [8,4]   [8,5]   [8,6]   [8,7]   [8,8]   [8,9]  [8,10] 
##     "1"     "1"     "0"     "0"     "0"     "0"    "-1"    "-1"     "0" 
##   [9,0]   [9,1]   [9,2]   [9,3]   [9,4]   [9,5]   [9,6]   [9,7]   [9,8] 
##     "1"     "1"     "1"     "1"     "0"     "0"     "0"    "-1"    "-1" 
##   [9,9]  [9,10]  [10,0]  [10,1]  [10,2]  [10,3]  [10,4]  [10,5]  [10,6] 
##    "-1"    "-1"     "1"     "1"     "1"     "0"     "0"     "0"     "0" 
##  [10,7]  [10,8]  [10,9] [10,10] 
##     "0"     "0"     "0"     "0"