Tic-tac-toe Monte Carlo

I have to admit I went a bit overboard for such a pointless project based on the most basic (barely) strategic game out there. Which is why frankly I’m writing this up quickly in order to get it off my plate and move on to other things. Maybe at some point I’ll expand to slightly more complicated games, leveraging at least some of the guts from this Tic-tac-toe simulation.

Anyway, how did it come to this? I’ve been playing some basic strategy games with my five year old daughter lately, and I thought “I should do a quick simulation of Tic-tac-toe in R!” Both as a personal challenge, as well as a way to show her something that would hopefully be cool.

One thing to note: I didn’t do any research. I’m sure this has been done a million times, but given that I was taking this on as a personal challenge, I didn’t want to see anyone else’s approach to the problem. I just wanted to think through it myself and develop my own approach.

Also, frankly, I stopped myself short of going fully down the rabbit hole (which is, again, why I’m writing this in haste now). I could have spent time making the players actually play strategically, but I didn’t (more on that later).

So what is this exactly?

The idea was to create a simple simulation of Tic-tac-toe to see… something. The idea was to capture data and look for patterns. I’m assuming we all know the game, but just in case: Tic-tac-toe is a 2 player game played on a 3x3 board which looks something like this:

board <- data.frame(matrix(nrow=3, ncol=3, dimnames = list(c("1","2","3"),c("1","2","3"))))
board[is.na(board)] <- 0
board

##   X1 X2 X3
## 1  0  0  0
## 2  0  0  0
## 3  0  0  0

Players take turns placing either an “x” (say for player 1) or an “o” (say for player 2) in any open cell. the object is to get 3 in a row either horizontally, vertically, or diagonally. The game continues until either player wins or we have a stalemate.

There are a few things to keep in mind: * We don’t know how many turns a game will last * We can’t let a player play in a “taken” cell * If all cells are taken and neither player has won, we have a stalemate * We want to run a bunch of successive games * We want to keep track of every play on every game * We want to keep track of who wins each game (or if the game ends in stalemate)

Lastly, our players will be equally skilled (or unskilled, rather). Their play is random and un-strategic. We could change this quite easily by making each player consider whether the other has 2 in a row, and blocking them with some probability. But I didn’t go this far. In any case, I thought if play is random and I run enough games, I should capture these basic defensive strategies.

Let’s see some code

Really, there’s nothing particularly exciting here. No cutting edge, complicated stuff. In fact it’s all straight up old school R, but we have to define a number of functions to make it work.

First we initialize some things and setup up some constants we need. Our player 1 will play a “1”, player 2 will play a “10”, P1 wins will be flagged as code 99, P2 as code 88, and stalemates as code 77.

all_games           <-list()
game_count          <-1
total_games         <-10000
p1win_code          <-99
p2win_code          <-88
stalemate_code      <-77
p1                  <-1
p2                  <-10

We’ll also set up a function to play a turn, and a function for updating the existing game board based on the current play.

The “play” function simply draws an integer uniformly from either 1, 2, or 3, for each x and y coordinates. Then it checks that we haven’t yet played that cell. If the cell is taken we try again, otherwise the function returns the coordinates for the play.

The “update_board” function updates the board based on the play coordinates and whether it’s P1 or P2’s turn.

play <- function(board){
  repeat{
    x<-sample.int(3,1,replace=TRUE)
    y<-sample.int(3,1,replace=TRUE)
  
    if (board[x,y]==0) {
      break
    } 
  }
  return(c(x,y))
}

update_board <- function(player, play){
  if (player==1) {
    board[play[1],play[2]] <- p1
  } else {
    board[play[1],play[2]] <- p2
  }
  return(board)
}

Next we set up a couple more functions to check for a win by either P1 or P2. All we do is sum across columns, rows, and diagonals. If the sum in either case is 3, we have P1 as the winner. If the sum is 30, we have P2 as the winner. I have 2 separate functions though if I thought about it a bit I could surely consolidate.

#check if P1 won  
winner_p1 <-function(board) {
  winner <- FALSE
  #check if we have a full column. 
  if (sum(board$X1) == p1*3 | sum(board$X2) == p1*3 | sum(board$X3) == p1*3) {
    winner <- TRUE
  #check if we have a full row. 
  } else if (sum(board[1,]) == p1*3 | sum(board[2,]) == p1*3 | sum(board[3,]) == p1*3) {
    winner <- TRUE
  #check if we have a full diag. 
  } else if (board[1,1]+board[2,2]+board[3,3] == p1*3 | board[1,3]+board[2,2]+board[3,1] == p1*3) {
    winner <- TRUE
  }
  return(winner)
}

#check if p2 won
winner_p2 <-function(board) {
  winner <- FALSE
  #check if we have a full column. 
  if (sum(board$X1) == p2*3 | sum(board$X2) == p2*3 | sum(board$X3) == p2*3) {
    winner <- TRUE
    #check if we have a full row. 
  } else if (sum(board[1,]) == p2*3 | sum(board[2,]) == p2*3 | sum(board[3,]) == p2*3) {
    winner <- TRUE
    #check if we have a full diag. 
  } else if (board[1,1]+board[2,2]+board[3,3] == p2*3 | board[1,3]+board[2,2]+board[3,1] == p2*3) {
    winner <- TRUE
  }
  return(winner)
}

And finally, we play some games. 10,000 of them. We initialize a new board for each game. During a single game we track each turn and save each play to a list. Once we get a win or a stalemate, we save the appropriate code to the same list. SO each game is a list of play coordinates ending in a win or stalemate code. We then store each game to a master list, which will have 10,000 elements, each representing a game.

set.seed(9999)
repeat {
  #initialize a bunch of stuff that needs to be refreshed each game
  board               <- data.frame(matrix(nrow=3, ncol=3, dimnames = list(c("1","2","3"),c("1","2","3"))))
  board[is.na(board)] <- 0
  current_game        <-list()
  turn_count          <-1
  
    repeat{
    if (turn_count > 2) {
      if (winner_p1(board)==TRUE) {
        current_game[[turn_count]] <- p1win_code
        break
      }  
      if (winner_p2(board)==TRUE) {
        current_game[[turn_count]] <- p2win_code
        break
      }
      if (!(0 %in% board$X1 | 0 %in% board$X2 | 0 %in% board$X3)) {
        current_game[[turn_count]] <- stalemate_code
        break
      }
    }
      
    if (turn_count/2==floor(turn_count/2)) {
        played<-play(board)
        board<-update_board(2,played)
      } else {
        played<-play(board)
        board<-update_board(1,played)
      }
        current_game[[turn_count]]<-played
        turn_count <- turn_count + 1
    }
all_games[[game_count]]<-current_game
if (game_count==total_games){break}
game_count<-game_count+1
}

Great so… now what?

Well we’ve now captured every move, along with the result, of 10,000 games of Tic-tac-toe. We can ask some basic questions such as, is there any advantage to playing first?

unlisted_games<-unlist(all_games)
p1wins        <-length(unlisted_games[unlisted_games==p1win_code])
p2wins        <-length(unlisted_games[unlisted_games==p2win_code])
ties          <-length(unlisted_games[unlisted_games==stalemate_code])
p1wins

## [1] 5963

p2wins

## [1] 2838

ties

## [1] 1199

It turns out P1 has a huge advantage. P1 wins 0.5963 percent of games, while P2 takes home only 0.2838 percent. 0.1199 percent of games end in a stalemate.

What’s the best first move for P1? We can look at the first move in all games won by P1 to see if any pattern exists.

#first we'll find all the games p1 won, and subset our list by those games
#then we get the first plays from each game
p1games   <-sapply(all_games, function(x){p1win_code %in% x})
p1wins    <-all_games[p1games]
firstplay <-sapply(p1wins, function(x){paste(unlist(x[1])[1],unlist(x[1])[2], sep="")})
table(firstplay)

## firstplay
##  11  12  13  21  22  23  31  32  33 
## 723 577 681 659 750 575 704 598 696

So there seems to be a small benefit to playing the middle cell at (2,2) first. We could conduct a hypothesis test to confirm statistical significance but… I’m going to leave that for another day. Either way the edge would be extremely small and pales in comparison to the benefit of simply being P1.

Is there a “best” first move for P2?

#first we'll find all the games p2 won, and subset our list by those games
#then we get the first plays from each game
p2games   <-sapply(all_games, function(x){p2win_code %in% x})
p2wins    <-all_games[p2games]
secplay   <-sapply(p2wins, function(x){paste(unlist(x[2])[1],unlist(x[2])[2], sep="")})
table(secplay)

## secplay
##  11  12  13  21  22  23  31  32  33 
## 332 251 376 234 468 254 348 254 321

Interestingly (or actually, maybe not interestingly at all), P2’s best move is to play (2,2) as well, assuming it hasn’t been played by P1. So what if we know P1 played (2,2), is there an optimal first move to counter with?

#we start from the games won by P2, then look for games where P1 played (2,2) first.
playfirst    <-sapply(p2wins, function(x){identical(x[[1]],as.integer(c(2,2)))})
p2wins       <-p2wins[playfirst]
secplay      <-sapply(p2wins, function(x){paste(unlist(x[2])[1],unlist(x[2])[2], sep="")})
table(secplay)

## secplay
## 11 12 13 21 23 31 32 33 
## 32 13 40 18 17 31 17 41

It turns our there are very few games won by P2 if P1 plays (2,2) on the first turn. In fact this occurs in < 1% of games.

Anyway we could go on like this but… I’m going to leave it here. Potential future things I will likely never get to? Creating a Shiny app that allows you to play, and making the players play strategically as opposed to picking plays randomly.

For now, if you’re playing Tic-tac-toe against a small child or a cat, and their play is pretty random, and you want to win, be player 1 and start in the middle. Any advantage discussed here would evaporate with “skilled” players, who could effectively force a stalemate in every game.

Tic-tac-toe Monte Carlo

Philippe Jette

January 5, 2019

So what is this exactly?

Let’s see some code

Great so… now what?