Problem Statement from The Riddler

Suppose you’re playing a match at the U.S. Open, and you’re slightly better than the competition: your chances of winning any given point are exactly 55 percent. (Yes, most players are more likely to win the points they serve, but we’re simplifying things a bit.) What are your chances of winning a three-set match, as played by the women, or a five-set match, as played by the men? And what are your chances of winning the whole tournament (seven consecutive matches)?

If you’re not familiar with the scoring system in tennis, the first to 4 points wins a game (as long as they’ve won 2 more points than their opponent), the first to six games wins a set (as long as they’ve won two more games), and the first to two sets (for women) or three sets (for men) wins the match. If at any point a set is tied at six games apiece, that set is decided by a tiebreaker, in which the first to 7 points wins (with the same 2-point margin rule applying here).

Part I: Simulation

Through simulation, we’ll get a feel for the exact probability that we are trying to obtain. Our first goal is to simulate a game. We’ll do this using a function that will take the probability of winning a point as an input, use a binomial distribution to generate results of each point, and then output the result of the entire game.

We’ll also give it a boolean argument, tBreak, which defaults to FALSE and can be set to TRUE when the game is a set tie-breaker.

## Inputs: p, the probability that you win a point
##         tBreak, TRUE if a set tie breaker game, FALSE otherwise. 
tennisGame <- function(p, tBreak=FALSE){ 
  ## Let's make sure the probability is between 0 and 1. We stop with
  ## an error so a bad input cannot silently flow into later sums. 
  if(p<0 || p>1){stop("input is not a probability between 0 and 1")}
  ## Let's say we are player A and our opponent is player B. We begin 
  ## zero points each. 
  pntsA <- 0
  pntsB <- 0
  ## The total game points we are trying to get is 4 unless it is a 
  ## set tiebreaker, in which case it is 7. 
  gamept <- ifelse(tBreak,7,4)  
  ## Let's play points while neither player has reached gamept points
  ## OR the difference in our points is less than 2. 
  while(max(pntsA,pntsB)<gamept || abs(pntsA-pntsB)<2){ 
    # Each point is dictated by a random binomial distribution.
    outcome <- rbinom(1,1,p)
    ## Our points increase depending on that outcome. 
    pntsA <- pntsA + outcome
    pntsB <- pntsB + (1-outcome)
  }
  ## Return 1 if we win, 0 if we lose. 
  return(ifelse(pntsA>pntsB,1,0))
}

tennisGame(0.25) ## Will most likely be 0.

## [1] 0

tennisGame(0.75) ## Will most likely be 1.

## [1] 1

Now, let’s simulate 10,000 games with known point probability 0.55 to see what the resulting probability of winning a tennis game might be.

set.seed(20190910)
G <- 0
for(i in 1:10000){
  G <- G + tennisGame(0.55)
}
G/10000

## [1] 0.6285

Gtb <- 0
for(i in 1:10000){
  Gtb <- Gtb + tennisGame(0.55, tBreak = TRUE)
}
Gtb/10000

## [1] 0.6488

We’re looking for something around 62.85% as the probability of winning a single regular tennis game, and 64.88% as the probability of winning a tie-breaker.
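
How much should we trust estimates built from 10,000 games? A rough gauge is the standard error of a sample proportion, \(\sqrt{\hat{p}(1-\hat{p})/n}\). The sketch below applies that textbook formula to the two simulated values above; the helper name propSE is my own and is not part of the original analysis.

```r
## Standard error of a proportion estimated from n independent trials.
## propSE is a hypothetical helper name used only for this illustration.
propSE <- function(phat, n){
  sqrt(phat*(1-phat)/n)
}

seGame <- propSE(0.6285, 10000) ## regular game estimate
seTb   <- propSE(0.6488, 10000) ## tie-breaker estimate
round(c(seGame, seTb), 4)
```

Both standard errors come out a little under half a percentage point, so the simulated values are best read as accurate to roughly two decimal places.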

Now, let’s simulate a set in tennis. Again, we’ll use a function with a structure similar to the tennisGame function. However, we need to be careful when dealing with tie-breaker games.

## Only input is the probability of winning a point, p.
tennisSet <- function(p){
  if(p<0 || p>1){stop("input is not a probability between 0 and 1")}
  gmsA <- 0
  gmsB <- 0
  ## While the maximum number of games either of us has won is less
  ## than 6, we must keep playing games. 
  while(max(gmsA,gmsB)<6){
    outcome <- tennisGame(p)
    gmsA <- gmsA + outcome
    gmsB <- gmsB + (1-outcome)
  }
  ## Now that one of us has won 6 games, is the difference between
  ## the number of games we have won greater than 1? If so, who won?
  if(abs(gmsA-gmsB)>1){return(ifelse(gmsA>gmsB,1,0))}
  ## If not, the score is 6-5 and we need to play another game. 
  outcome <- tennisGame(p)
  gmsA <- gmsA + outcome
  gmsB <- gmsB + (1-outcome)
  ## Now that this game is done, are we tied 6-6, or did one of us
  ## win 7-5?  If the set is tied 6-6, then we play a tie-breaker 
  ## game, so we set the tBreak argument to TRUE. 
  if(abs(gmsA-gmsB)>1){return(ifelse(gmsA>gmsB,1,0))}
  tennisGame(p,TRUE)
}

tennisSet(0.25) ## should be 0

## [1] 0

tennisSet(0.75) ## should be 1.

## [1] 1

Again, let’s simulate this set 10,000 times to get an empirical probability that we will later try to reproduce theoretically.

set.seed(6285)
S <- 0
for(i in 1:10000){
  S <- S + tennisSet(0.55)
}
S/10000

## [1] 0.8096

Looks like the probability of winning a set will be close to 80.96%. Next up we have a tennis match. This function is a little easier to build as there will be fewer cases. We’ll build it to take the probability of winning an individual point and whether the player is male or female as inputs.

## Inputs: the probability of winning a point, p
##      and, whether or not the player is male or female (boolean)
tennisMatch <- function(p, male=TRUE){
  if(p<0 || p>1){stop("input is not a probability between 0 and 1")}
  stsA <- 0
  stsB <- 0
  ## A match is won with 3 sets if male, and 2 sets if female. 
  stmax <- ifelse(male,3,2)
  ## Let's play sets until one of us is at the set maximum. 
  while(max(stsA,stsB)<stmax){ 
    outcome <- tennisSet(p)
    stsA <- stsA + outcome
    stsB <- stsB + (1-outcome)
  }
  return(ifelse(stsA==stmax,1,0))
}

tennisMatch(0.25) # should be 0

## [1] 0

tennisMatch(0.75) # should be 1.

## [1] 1

Last simulation! Let’s simulate 10,000 tennis matches to get an empirical probability of winning a tennis match when the probability of an individual point is 0.55. We’ll do this once for the males and once for the females.

set.seed(8096)
M <- 0 
for(i in 1:10000){
  M <- M+tennisMatch(0.55)
}
M/10000

## [1] 0.9501

Mf <- 0
for(i in 1:10000){
  Mf <- Mf + tennisMatch(0.55, male = FALSE)
}
Mf/10000

## [1] 0.9091

The probability of a male winning the tennis match is going to be close to 95.01% and for a female, it will be close to 90.91%. To find the probability of winning the entire tournament, we raise these values to the 7th power as we have to win 7 matches in a row to win the tournament.
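
As a quick illustration of that last step, here is a minimal sketch that raises the simulated match proportions to the 7th power. It assumes, as the problem statement does, that the seven matches are independent and that the point probability stays at 0.55 throughout; the variable names are mine.

```r
## Simulated match-win proportions from the runs above.
## (simMale, simFemale and simTournament are illustrative names.)
simMale   <- 0.9501
simFemale <- 0.9091
## All seven matches must be won, so the probabilities multiply.
simTournament <- c(male = simMale^7, female = simFemale^7)
simTournament
```

Because small errors in the match estimate compound over seven rounds, these tournament figures are rougher than the match figures they are built from.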

Part II: The Theoretical Probabilities

Once we have the probability of winning a tennis match, \(p_m\), the probability of winning the entire tournament is \[ p_t = (p_m)^7. \] To calculate \(p_m\), we need the probability of winning a set, \(p_s\). Since winning a match requires males to win 3 sets and females to win 2 sets, the number of sets the opponent wins before we clinch the match follows a negative binomial distribution with parameters \(r=3\) (\(r=2\) for females) and \(p = p_s\). The match probability is then the sum of the probabilities that the opponent has 0, 1, or 2 set wins before we get our 3rd (or 0 or 1 before we get our 2nd, for females). To illustrate, we’ll use .8096 as the probability of winning a set.

pnbinom(2,3,0.8096) ## Is this close to .9501?

## [1] 0.9491878

pnbinom(1,2,0.8096) ## Is this close to .9091?

## [1] 0.9050483
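
To make the negative binomial sum concrete, the sketch below writes it out term by term and compares it with pnbinom. The probability that the opponent wins exactly \(k\) sets before our \(r\)-th set win is \(\binom{k+r-1}{k} p_s^r (1-p_s)^k\). The names psSim and nbByHand are illustrative and not part of the original code.

```r
## Simulated set probability, as used above.
psSim <- 0.8096

## Sum P(opponent wins exactly k sets before our r-th win) for k = 0..r-1.
## nbByHand is a hypothetical helper written only for this check.
nbByHand <- function(r, p){
  k <- 0:(r-1)
  sum(choose(k + r - 1, k) * p^r * (1-p)^k)
}

nbByHand(3, psSim) ## compare with pnbinom(2,3,0.8096)
nbByHand(2, psSim) ## compare with pnbinom(1,2,0.8096)
```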

Once we know \(p_s\) exactly, we’ll come back and plug this in. To find \(p_s\), we’ll need both \(p_g\), the probability of winning a regular game, and \(p_{tbg}\), the probability of winning a tie-breaking game.

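The probability of winning six games while the opponent wins at most four (set scores of 6-0 through 6-4) is again a negative binomial probability, this time with \(r=6\) and \(p=p_g\). However, the set can also pass through a score of 5-5, reached with either the opponent or me winning a fifth game last. (Note: the only way to arrive at a 6-6 split is via a 5-5 split.) Again, we’ll use a negative binomial with \(r=5\) and \(p=p_g\), which gives the probability that I am the one to reach five games last. Every ordering of five wins and five losses has the same probability, and exactly as many of them end with the opponent reaching five last, so we multiply this by 2.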

Once at a 5-5 split, we can win the set by winning two games in a row with probability \((p_g)^2\), or we can get to a 6-6 split with probability \(2p_g(1-p_g)\) and then win the tie-breaker with probability \(p_{tbg}\).

To test this theory, let’s plug this all in using simulated empirical values.

## If the theory is correct, then the value below should be close to .8096.
pnbinom(4,6,0.6285)+2*dnbinom(5,5,.6285)*(.6285^2+2*.6285*(1-.6285)*.6488)

## [1] 0.8246078

Well, it’s close, but not as close as I would like. This is probably because that calculation relies on several empirical estimates, each carrying its own simulation error.
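
The factor of 2 in the set calculation can be double-checked independently: reaching 5-5 just means winning exactly five of the first ten games, which is a plain binomial probability. A minimal check, using the simulated game probability (the variable names are illustrative):

```r
## Simulated game probability, as used above.
pgSim <- 0.6285

## Probability of a 5-5 split, computed two ways.
viaNegBinom <- 2*dnbinom(5,5,pgSim) ## either player reaches 5 last
viaBinom    <- dbinom(5,10,pgSim)   ## exactly 5 wins in 10 games
all.equal(viaNegBinom, viaBinom)
```

The two agree because \(2\binom{9}{5} = \binom{10}{5} = 252\).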

The probability of winning an individual game is next. This will require a gambler’s ruin type of calculation. To win a game we can either get to 4 points while our opponent has at most 2, or arrive at a 3-3 split (called 40-40, or deuce, in actual tennis) and then pull 2 points ahead.

The first value is again a negative binomial probability with \(r=4\), \(p=0.55\), and summing over 0, 1, and 2 wins by the opponent.

The second value is trickier. Let \(P\) be the probability of winning when the current score is tied up (deuce). Let \(P_{-1}\) be the probability of winning when the opponent has the advantage and \(P_1\) be the probability of winning when you have the advantage. Here is the relationship between these three variables.

\[ P_{-1} = 0.55P \]

That is, the probability of winning the game if your opponent has the advantage is the probability you win the next point times the probability of winning the game when you are at deuce.

\[ P = 0.45P_{-1}+0.55P_1 \]

That is, the probability of winning the game when at deuce is the probability of losing the next point times the probability of winning when your opponent has the advantage, plus the probability you win the next point times the probability of winning when you have the advantage.

\[ P_1 = 0.55+0.45P \]

That is, the probability of winning when you have the advantage is equal to the probability of you winning the next point, plus the probability of you losing the next point times the probability of winning when you are at deuce.

Substitute \(P_1\) from the third equation and \(P_{-1}\) from the first equation into the second to get an equation in \(P\) only: \(P = 0.45(0.55P) + 0.55(0.55+0.45P)\), so \(P\,(1-2(0.45)(0.55)) = (0.55)^2\). The algebra will give you \[ P = \frac{(0.55)^2}{1-2(0.45)(0.55)} \approx 0.599010. \]
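
If you would rather not do the algebra by hand, the three equations form a small linear system that R can solve directly. The sketch below is one way to set it up, moving all unknowns to the left-hand side; the matrix layout and variable names are my own choices.

```r
p <- 0.55
q <- 1 - p

## Unknowns, in order: P_{-1}, P, P_1.
##   P_{-1} - p*P            = 0
##  -q*P_{-1} + P - p*P_1    = 0
##            - q*P + P_1    = p
A <- rbind(c( 1, -p,  0),
           c(-q,  1, -p),
           c( 0, -q,  1))
b <- c(0, 0, p)

sol <- solve(A, b)
names(sol) <- c("Pminus1", "P", "P1")
sol
```

The middle entry is the deuce probability and should match the closed form \(p^2/(1-2pq)\).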

Let’s put it all together now, remembering that we could arrive at 3-3 by either me getting to 3 points last or the opponent getting to 3 points last.

pg <- pnbinom(2,4,0.55) + 2*dnbinom(3,3,0.55)*(0.55)^2/(1-2*(.45)*(.55))
pg

## [1] 0.6231485

This is the exact answer for \(p_g\), the probability of winning a game. Since our simulated probability of 0.6285 is close to this, we probably did the calculation correctly. Let’s change a few values to find \(p_{tbg}\).

ptbg <- pnbinom(5,7,0.55) + 2*dnbinom(6,6,0.55)*(0.55)^2/(1-2*(.45)*(.55))
ptbg

## [1] 0.6541508

Again, this is close to our empirical probability of 0.6488. Let’s use these values to obtain the probability of winning a set.

ps <- pnbinom(4,6,pg)+2*dnbinom(5,5,pg)*(pg^2+2*pg*(1-pg)*ptbg)
ps

## [1] 0.815042

Using the exact values instead of the empirical values brought it much closer to our empirical probability .8096! We now have the exact probability of winning a set. Let’s now use it to find the exact probabilities of winning a match for males and females.

pmMale <- pnbinom(2,3,ps) 
pmMale ## Is this close to .9501?

## [1] 0.9529824

pmFemale <- pnbinom(1,2,ps) 
pmFemale ## Is this close to .9091?

## [1] 0.9100262

Here again, we have values very close to the simulated ones, so I think we have the exact probabilities of winning a match for both males and females!

Our final calculation is for the tournament itself.

(pmMale)^7

## [1] 0.7138292

(pmFemale)^7

## [1] 0.5168653
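
Since every step above depends only on the point probability, the whole chain can be wrapped into a few small functions. This is a sketch under the same assumptions as the calculations above (independent points with a fixed probability, a tie-breaker at 6-6 in every set); the function names raceProb, setProb, and matchProb are hypothetical.

```r
## Probability of winning a race to n points (win by 2) when each
## point is won with probability p. n = 4 is a game, n = 7 a tie-breaker.
raceProb <- function(p, n){
  deuce <- p^2/(1 - 2*p*(1-p))  ## win from a tied score
  pnbinom(n-2, n, p) + 2*dnbinom(n-1, n-1, p)*deuce
}

## Probability of winning a set given the point probability p.
setProb <- function(p){
  g  <- raceProb(p, 4)
  tb <- raceProb(p, 7)
  pnbinom(4,6,g) + 2*dnbinom(5,5,g)*(g^2 + 2*g*(1-g)*tb)
}

## Probability of winning a match that needs setsToWin sets.
matchProb <- function(p, setsToWin){
  pnbinom(setsToWin-1, setsToWin, setProb(p))
}

matchProb(0.55, 3)   ## should reproduce pmMale
matchProb(0.55, 2)   ## should reproduce pmFemale
matchProb(0.55, 3)^7 ## men's tournament
```

Written this way it is easy to try other point probabilities. For instance, matchProb(0.5, 3) comes out to 0.5 (up to floating-point rounding), as symmetry demands.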

Summary

Now, revisiting the original problems and questions, we can give our answers.

What are your chances of winning a three-set match, as played by the women, or a five-set match, as played by the men?

0.9100262 (women)
0.9529824 (men)

And what are your chances of winning the whole tournament (seven consecutive matches)?

0.5168653 (women)
0.7138292 (men)

U.S. Open Chances

Jason Shaw

Tuesday, September 10, 2019
