library(dplyr)

Binomial Probability

During this viral pandemonium, while we all contemplate how and when we might receive our vaccine shots, I have decided to take the time to consider another kind of shot and offer my mind some relief from the troubles of this pandemic. The shot I am talking about is that of a basketball through a hoop suspended over the court. We are going to push thoughts of disease aside and consider free throw percentage, and the probability that one of the NBA’s most famous players to date will keep his best-ever free throw percentage. That’s right, we are going to discuss Stephen Curry.

Thus far in his twelve-year career, Stephen Curry has made 90.7% of his free throws. He currently holds the highest free throw percentage in NBA history. He also broke the record for highest free throw percentage in the first seven games of the season with 98.2% of shots made. For reference, the average free throw percentage across the entire NBA is about 73%. To add some depth to that, another well-known player, Michael Jordan, sits at 83.5%.

Now, free throws are not everything, and sharing these statistics is not to say he is the best player, or even the best all-around shooter. Rather, we are going to use these metrics and a little statistical probability to predict the likelihood of Curry maintaining his free throw percentage in the next season based solely on his previous seasons. So, just push those player status arguments aside, at least for now.

When calculating the probability of an event, we first need to consider our approach. In the case of free throw percentage, there are only two outcomes. Either Curry will make the shots or miss. For purposes of simplification, we also assume that the results of each shot taken are completely independent of the other shots. Given these assumptions, our event should follow a binomial probability distribution.
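
As a rough illustration of these two assumptions, the sketch below simulates free throws as independent make-or-miss trials. The 0.907 success probability is Curry’s career figure quoted above; the seed, the number of simulated shots, and the variable names are arbitrary choices for illustration and are not part of the analysis itself.

```r
# Assumption: every shot is an independent trial with the same chance of success
set.seed(42)        # arbitrary seed so the simulation is repeatable
p_make  <- 0.907    # career free throw percentage quoted above
n_shots <- 1000     # hypothetical number of simulated attempts

# Each shot is a single binomial trial: 1 = made, 0 = missed
shots <- rbinom(n_shots, size = 1, prob = p_make)

# Tally the makes and misses
table(shots)

# The share of made shots should land close to p_make
mean(shots)
```

With enough simulated shots, the share of makes settles near the assumed percentage, which is what the binomial model predicts when shots really are independent.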

Since each individual shot has only two potential outcomes, the number of shots made out of a fixed number of attempts also follows a binomial distribution. In this case, it would be greatly beneficial to know how many shots were taken and how many were made. Thankfully, these are collected each game and compiled. Here is a quick look at the data we’ll be working with.

# Read Curry's per-season statistics from a comma-separated text file
curry <- read.delim("https://raw.githubusercontent.com/palmorezm/misc/master/CurryStats.txt", header = TRUE, sep = ",")
head(curry)
##    Season Age  Tm  Lg Pos  G GS   MP  FG  FGA   FG. X3P X3PA  X3P. X2P X2PA
## 1 2009-10  21 GSW NBA  PG 80 77 36.2 6.6 14.3 0.462 2.1  4.8 0.437 4.5  9.5
## 2 2010-11  22 GSW NBA  PG 74 74 33.6 6.8 14.2 0.480 2.0  4.6 0.442 4.8  9.6
## 3 2011-12  23 GSW NBA  PG 26 23 28.2 5.6 11.4 0.490 2.1  4.7 0.455 3.5  6.7
## 4 2012-13  24 GSW NBA  PG 78 78 38.2 8.0 17.8 0.451 3.5  7.7 0.453 4.5 10.1
## 5 2013-14  25 GSW NBA  PG 78 78 36.5 8.4 17.7 0.471 3.3  7.9 0.424 5.0  9.8
## 6 2014-15  26 GSW NBA  PG 80 80 32.7 8.2 16.8 0.487 3.6  8.1 0.443 4.6  8.7
##    X2P.  eFG.  FT FTA   FT. ORB DRB TRB AST STL BLK TOV  PF  PTS
## 1 0.474 0.535 2.2 2.5 0.885 0.6 3.9 4.5 5.9 1.9 0.2 3.0 3.2 17.5
## 2 0.498 0.551 2.9 3.1 0.934 0.7 3.2 3.9 5.8 1.5 0.3 3.1 3.1 18.6
## 3 0.514 0.583 1.5 1.8 0.809 0.6 2.8 3.4 5.3 1.5 0.3 2.5 2.4 14.7
## 4 0.449 0.549 3.4 3.7 0.900 0.8 3.3 4.0 6.9 1.6 0.2 3.1 2.5 22.9
## 5 0.509 0.566 3.9 4.5 0.885 0.6 3.7 4.3 8.5 1.6 0.2 3.8 2.5 24.0
## 6 0.528 0.594 3.9 4.2 0.914 0.7 3.6 4.3 7.7 2.0 0.2 3.1 2.0 23.8

There is no need for the columns unrelated to our objective of studying free throw percentage. For that reason, we select only the variables most likely to be useful to us. Then we run a quick calculation on the free throw percentage column, labeled ‘FT.’, to verify the figure quoted above and to see how the average is computed.

ft <- curry %>% 
  select(Season, FT, FTA, FT.)
mean(ft$FT.)
## [1] 0.9071538

Curry’s average free throw percentage of 90.7% is calculated using the arithmetic mean and is shown above for reference. Of course, we should remember, basketball, like most sports, is dynamic. The variables that we use to predict the probability of a shot are not inherently stable. In fact, many would argue that these statistics are constantly in motion. This means we can only predict with the most recent data. Otherwise, the probability might be wildly off. Nevertheless, we give our best efforts.

To find the probability of the event, we first must set a goal. We want to know the probability of Curry maintaining his current free throw percentage, based only on his seasonal data. Since we know his percentage (90.7%), we can treat every season at or above this value as a success, while seasons below it are failures. This binomial probability is represented by this equation:

\[P_{x}=\binom{n}{x}\times p^{x} \times q^{n - x}\]

Here \(P_{x}\) is the probability of exactly \(x\) successes in \(n\) events, \(p\) is the probability of success in a single event, and \(q = 1 - p\) is the probability of failure. If this seems like gibberish, hang in there. The equation breaks down into constituent parts, which makes it quite intuitive. In other words, we only use this equation as a reference, since we end up applying it just by thinking through the problem logically.
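
To see the pieces of the formula at work, here is a minimal sketch in R. The scenario (exactly 9 makes in 10 free throws, with a 0.907 chance on each) is a hypothetical chosen purely for illustration; `choose()` and `dbinom()` are base R functions, and the variable names are our own.

```r
# Hypothetical scenario: exactly 9 makes in 10 independent free throws
n <- 10       # number of events (shots)
x <- 9        # number of successes we ask about
p <- 0.907    # probability of success on a single shot
q <- 1 - p    # probability of failure on a single shot

# The formula, one part at a time
ways    <- choose(n, x)            # ways to place x successes among n events
by_hand <- ways * p^x * q^(n - x)  # weight each arrangement by its probability

# R's built-in binomial density computes the same quantity
built_in <- dbinom(x, size = n, prob = p)

by_hand
built_in
all.equal(by_hand, built_in)
```

The hand-built value and `dbinom()` should agree, which is a handy way to confirm that each term of the equation has been read correctly.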

Working through the problem logically like this is perhaps the most straightforward and useful way to quickly grasp the concept of binomial probability. Once we are finished, it is all the more useful, as it can be applied to any event where there are only two outcomes. Think about how many of those exist in the world (the answer: a lot).

For example, consider how we might find the probability of a successful event, that is, a season in which the free throw percentage is greater than or equal to 0.907. To start, we count how many events there are. In R, we use a function for that.

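# Total possible events: one row per season plus the career-average row
events <- length(ft$Season)
# Drop the career-average row, which is not a season
events <- (events - 1)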
events
## [1] 12

With this function we now know there are 13 total events, including the career average. Technically, the career average is not a seasonal statistic, so we drop it from the total; otherwise our result would be skewed. The total number of true events with seasonal averages becomes 12.

Expressing this as a fraction, we want to know how many events were successes, so that we can find the proportion of successes over the true events. As mentioned, we find the successes by observing where the seasonal free throw average is greater than or equal to 0.907 and counting their occurrences.

# Total successes
success <- ft %>% 
  filter(FT. >= .907) 
# Exclude career average 
success <- (length(success$FT.) - 1)
# Total failures 
fail <- events - success
fail
## [1] 5
success
## [1] 7

While we were counting the occurrences of success, we were simultaneously able to find the number of failures. Recall that there are only two outcomes possible. If our total of \(n\) events is held constant and we have \(x\) successes, then the number of failures must always make up the remaining difference, \(n - x\). This is an extremely useful trick for binomials and one that should be remembered!
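
The same counting trick can be sketched in base R without a filtering step. The six percentages below are the `FT.` values for the first six seasons printed earlier, used only to keep the example small, so the counts differ from the full twelve-season result. The variable names are illustrative.

```r
# FT. values for the first six seasons shown in head(curry) above
ft_pct    <- c(0.885, 0.934, 0.809, 0.900, 0.885, 0.914)
threshold <- 0.907   # career free throw percentage

# A logical vector marks each season as a success (TRUE) or failure (FALSE)
is_success <- ft_pct >= threshold

n_events  <- length(ft_pct)
n_success <- sum(is_success)        # TRUE counts as 1, FALSE as 0
n_fail    <- n_events - n_success   # failures are whatever remains

n_success
n_fail
```

Because the career-average row never enters the vector here, there is nothing to subtract afterwards.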

Lastly, we crunch the final numbers. We find the probability of success given a certain number of events. In this case, Curry has played for 12 seasons and has a seasonal average for each of them. Following the same principle of proportions, we create another fraction to find the probability of success. Of course, we could easily get the probability of failure by subtracting the probability of success from one, but here we take the long route and compute it from the total events as well.

# Probability of success 
success/events
## [1] 0.5833333
# Probability of failure 
fail/events
## [1] 0.4166667

And there we have it! Our result is that the probability of Curry maintaining his average free throw percentage at 90.7% is approximately 58.3%, based solely on his historical records. We know that the unfortunate reality for all basketball players in 2020 is that the year was unprecedented and created a whole host of statistical anomalies. For this reason and many others (comparable free throw percentages for other players, for example), the actual probability is likely lower than this. However, this does offer a great example to learn about binomial probabilities!

As a final word on the subject, consider this scenario in which the number of events is held constant. We had 12 events possible corresponding to twelve total seasons. If we sum the probability of successful events and failed events what number should we expect? We crunch those numbers to find out.

# Sum of binomial probability
(success/events) + (fail/events)
## [1] 1

The answer is one, and it turns out that the probabilities of success and failure in a binomial will always sum to one! We could throw more math at you to prove it, but when we think about this logically, as we have for this entire free throw scenario, the probabilities of failure and success must always sum to one because there are only two outcomes possible. Anything else is simply not a binomial!
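
For readers who would like a numerical check rather than a proof, the sketch below reuses the counts found above (7 successes in 12 events) and relies on base R’s `dbinom()`. It checks the single-event case and then the full formula summed over every possible number of successes, which should also come to one. The variable names are illustrative, and `all.equal()` is used because floating-point sums are rarely exact.

```r
events  <- 12   # seasons, from the count above
success <- 7    # seasons at or above 0.907

p <- success / events   # probability of success
q <- 1 - p              # probability of failure

# Single event: success or failure, nothing else
isTRUE(all.equal(p + q, 1))

# Full formula: probabilities of 0, 1, ..., 12 successes out of 12 events
probs <- dbinom(0:events, size = events, prob = p)
isTRUE(all.equal(sum(probs), 1))
```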