A Draw from Polya’s Urn

With Polya’s Urn (named after George Polya) we remove a marble from an urn and then return it to the urn along with another marble of the same color. Let’s simulate a few draws from an urn with 3 red marbles, 2 green marble and 1 blue marble. Try to understand what each line of code is doing. If you’re unsure of what the sample function does, type ?sample into the console (this works with all functions). If you’re unsure of what kind of an object “draw” is, just type draw into the console after creating it. Note that your marble counts may well be different from the ones shown here since this is a random process and you might have drawn a different marble than I did.

marble.counts <- c(3,2,1)
num.colors <- length(marble.counts)

# drawing from the urn with probabilities determined by the number of
# marbles of each color
draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)


# replacing the marble and adding one of the same color
marble.counts[draw] <-marble.counts[draw] + 1
marble.counts
## [1] 3 3 1

Many Draws from Polya’s Urn

Now, let’s make 100 draws and see how it plays out. The code looks the same, except that to repeat the drawing and replacing 100 times, we’ll use a for loop.

marble.counts <- c(3,2,1)
num.colors <- length(marble.counts)

for (i in 1:100){
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
}

marble.counts
## [1] 99  5  2

Moran Process

Now, just like yesterday, let’s both add and remove marbles (“births” and “deaths” of marbles) where both the probability of each type/color being born and later dying depends on its frequency in the population. This is called a Moran process, after Patrick Moran.

marble.counts <- c(3,2,1)
num.colors <- length(marble.counts)

for (i in 1:100){
  
  # Birth
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
  
  # Death
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] - 1
}

marble.counts
## [1] 6 0 0

Try this Moran process several times, what happens?

Let’s try to see the Moran process in action by plotting the population of marbles as it changes in time.

marble.counts <- c(3,2,1)
num.colors <- length(marble.counts)
cycles <- 100
plot(rep(0,num.colors), marble.counts, cex=0.5, xlim=c(0,cycles), ylim=c(0,sum(marble.counts)), col=2:(num.colors+1), pch=19, xlab="cycles complete", ylab="number of marbles")


for (i in 1:cycles){
  
  # Birth
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
  
  # Death
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] - 1
  
  #add data points
  points(rep(i,num.colors), marble.counts, cex=0.5, col=2:(num.colors+1), pch=19)
}

Try running this code a few times and see the graphs it produces.

Moran Function

Finally, let’s write a function to create such graphs and allows the user to choose the starting numbers of marbles of different colors and the number of cycles. Copy and paste this entire code, into the console to create the function.

moran.function <- function(marble.counts, cycles){
  num.colors <- length(marble.counts)

plot(rep(0,num.colors), marble.counts, cex=0.5, xlim=c(0,cycles), ylim=c(0,sum(marble.counts)), col=2:(num.colors+1), pch=19, xlab="cycles complete", ylab="number of marbles")


for (i in 1:cycles){
  
  # Birth
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
  
  # Death
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] - 1
  
  #add data points
  points(rep(i,num.colors), marble.counts, cex=0.5, col=2:(num.colors+1), pch=19)
}
return(marble.counts)
}

Now, let’s try it out! You can try your own combinations but don’t go past 10,000 or so cycles or else you might be waiting a while.

moran.function(marble.counts=c(3,2,1), cycles=50)

moran.function(marble.counts=c(1,1), cycles=10)

moran.function(marble.counts=c(10,10), cycles=500)

moran.function(marble.counts=c(100,100), cycles=5000)

Gambler’s Ruin

If we’re uncomfortable with the cycle of life, we could think of the Moran process as a game. Red, green and blue are gambling. Red starts with 3 chips, blue with 2 and green with 1. In each round (what we used to refer to as a birth and a death), one of the players might gain a chip from one of the other players but the total number of chips stays the same. The gamblers intend to play until all players but one have lost their entire bankroll (gone extinct, if you will). The question, which was solved by dutch mathematician Christiaan Huygens, is how each player’s chance of winning depends on their number of chips. Let’s modify our code to play out the gambler’s ruin and see if we can find the answer. [Note: Huygens solved this problem in 1657, prior to the creation of R.]

Instead of use a for loop which runs through a specific number of cycles, we can use a while loop that keeps going until a certain condition is met. This code keeps going until one player has all the chips (or, equivalently, all the marbles are one color) and then tells you the winner.

marble.counts <- c(3,2,1)
num.colors <- length(marble.counts)


while(max(marble.counts) < sum(marble.counts)){
  
  # Birth
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
  
  # Death
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] - 1

}
which(marble.counts == sum(marble.counts))
## [1] 2

Now, let’s run this code 100 times, each time keeping track of the winner:

num.sims <- 100
winners <- rep(NA, num.sims)

for (i in 1:100){
  marble.counts <- c(3,2,1)
  num.colors <- length(marble.counts)
while(max(marble.counts) < sum(marble.counts)){
  
  # Birth
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
  
  # Death
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] - 1

}
  winners[i] <- which(marble.counts == sum(marble.counts))
  
}

Now, let’s look at how often each color won:

table(winners)
## winners
##  1  2  3 
## 44 36 20

Try this function with different numbers of marbles and see if you can solve the gambler’s ruin problem.

A second question of possible interest might be how long it will take until one player has all the chips. Let’s run this function again, but instead of keeping track of who wins, we’ll keep track of how long the game takes.

num.sims <- 100
number.cycles <- rep(NA, num.sims)

for (i in 1:100){
  marble.counts <- c(3,2,1)
  num.colors <- length(marble.counts)
  nm <- 0
while(max(marble.counts) < sum(marble.counts)){
  
  # Birth
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] + 1
  
  # Death
  draw <- sample(1:num.colors, 1, prob=marble.counts, replace=FALSE)
  marble.counts[draw] <-marble.counts[draw] - 1
  nm <- nm+1

}
  number.cycles[i] <- nm
  
}

And then, we can look at how long the 100 games took on average and even look at the standard deviation in game length and plot the game lengths:

mean(number.cycles)
median(number.cycles)
sd(number.cycles)
hist(number.cycles)

Try this with different numbers of chips or with different starting distributions of chips and see what you can say about how long this game takes to play out. Note: Don’t use more than about 100 chips or else you may be waiting for a while.