The Poisson distribution is commonly used as a model for the number of occurrences of some “rare” event in a large number of (weakly dependent) trials. Consider, for example, touchdown passes in a football game. Peyton Manning has thrown a total of 520 touchdown passes (the NFL record) in 249 career games, for an average of 520/249 ≈ 2.0883534 touchdowns per game. His game-by-game statistics can be downloaded from <http://www.pro-football-reference.com/players/M/MannPe00/gamelog/>.

After downloading this data and cleaning it up a bit, we import the results into R:

peyton_manning <- read.csv("C:/Users/jmayberr/Desktop/peyton_manning.csv")

and look at his game-by-game TD pass statistics:

table(peyton_manning$TD)
## 
##  0  1  2  3  4  5  6  7 
## 26 64 68 57 25  6  2  1
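
As a quick sanity check, the totals quoted earlier can be recovered from this table alone. The sketch below hard-codes the counts shown above (so it does not need the CSV file); the variable names `td_counts`, `n_games`, `total_tds` and `lambda` are just illustrative choices.

```r
# Observed counts copied from the table above; names are TD passes per game.
td_counts <- c("0" = 26, "1" = 64, "2" = 68, "3" = 57,
               "4" = 25, "5" = 6, "6" = 2, "7" = 1)
td_values <- as.integer(names(td_counts))

n_games   <- sum(td_counts)              # total number of games
total_tds <- sum(td_values * td_counts)  # total number of touchdown passes
lambda    <- total_tds / n_games         # per-game average

c(games = n_games, touchdowns = total_tds, lambda = lambda)
```

The games and touchdowns should come out to the 249 and 520 used throughout, and `lambda` is the per-game rate plugged into the Poisson model.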

Let’s compare these numbers with what one would get using the Poisson approximation. We take \(\lambda = 520/249\) (his per-game average) and compute the probability mass function for \(x = 0, 1, \ldots, 7\):

dpois(0:7,520/249)
## [1] 0.123890965 0.258728120 0.270157876 0.188061708 0.098184827 0.041008924
## [7] 0.014273521 0.004258308
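
For readers who want to see what `dpois` is doing, the Poisson probabilities are given by \(P(X = x) = e^{-\lambda}\lambda^x/x!\). A minimal sketch evaluating this formula directly and checking it against `dpois` (the name `pmf_by_hand` is just an illustrative choice):

```r
lambda <- 520 / 249
x <- 0:7

# Poisson probability mass function written out explicitly
pmf_by_hand <- exp(-lambda) * lambda^x / factorial(x)

# Should agree with the built-in function up to floating-point error
all.equal(pmf_by_hand, dpois(x, lambda))
```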

If we multiply these probabilities by 249 (the number of games played), we get the expected number of games in which he would have thrown \(0, 1, \ldots, 7\) TD passes:

249*dpois(0:7,520/249)
## [1] 30.848850 64.423302 67.269311 46.827365 24.448022 10.211222  3.554107
## [8]  1.060319
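
It can be easier to judge the fit with the observed and expected counts side by side. The following is one possible way to lay this out; the observed counts are copied by hand from the table of game-by-game results, and the data frame name `comparison` is an arbitrary choice.

```r
# Observed counts for 0, 1, ..., 7 TD passes, copied from the table output
observed <- c(26, 64, 68, 57, 25, 6, 2, 1)

# Expected counts under a Poisson model with rate 520/249 over 249 games
expected <- 249 * dpois(0:7, 520 / 249)

comparison <- data.frame(
  td       = 0:7,
  observed = observed,
  expected = round(expected, 1),
  diff     = round(observed - expected, 1)
)
print(comparison)
```

Note that the expected counts sum to slightly less than 249, because the small probability of eight or more TD passes is not included.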

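The expected counts track the actual numbers reasonably well; the largest gaps are at 3 TD passes (57 observed versus about 47 expected) and 5 TD passes (6 observed versus about 10 expected). For the statistically savvy, we can assess the “goodness-of-fit” using the \(\chi^2\) distribution: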

# Poisson probabilities for 0-6 TD passes, plus a final bin for 7 or more
p <- c(dpois(0:6, 520/249), 1 - ppois(6, 520/249))
chisq.test(table(peyton_manning$TD), p = p, simulate.p.value = TRUE)
## 
##  Chi-squared test for given probabilities with simulated p-value
##  (based on 2000 replicates)
## 
## data:  table(peyton_manning$TD)
## X-squared = 5.5346, df = NA, p-value = 0.6092
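
The test statistic itself is just \(\sum_i (O_i - E_i)^2/E_i\), where \(O_i\) and \(E_i\) are the observed and expected counts in each bin. As a rough sketch, it can be reproduced by hand from the observed counts (hard-coded here from the earlier table, with the last bin standing for seven or more TD passes); the name `x_squared` is an illustrative choice.

```r
lambda   <- 520 / 249
observed <- c(26, 64, 68, 57, 25, 6, 2, 1)

# Bin probabilities: 0 through 6 TD passes, then 7 or more
p <- c(dpois(0:6, lambda), 1 - ppois(6, lambda))
expected <- sum(observed) * p

# Pearson chi-squared statistic; compare with X-squared in the output above
x_squared <- sum((observed - expected)^2 / expected)
x_squared
```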

Note that we “bin” together the probabilities of seven or more TD passes per game so that the probabilities add up to 1. The simulate.p.value argument is used because some of the expected cell counts are rather small, and in such cases the asymptotic \(\chi^2\) approximation to the distribution of the test statistic may not be accurate. If you leave this argument out, R will warn that the chi-squared approximation may be incorrect. With a p-value of about 0.61, the test gives no evidence against the Poisson model.
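
To get a feel for what a simulated p-value means, here is a minimal Monte Carlo sketch in the same spirit: draw many data sets of 249 games from the fitted bin probabilities, compute the test statistic for each, and see how often it is at least as large as the observed one. This is an illustration under stated assumptions, not a copy of R's internal code; the seed, the helper `chisq_stat`, and the other variable names are arbitrary choices, and the result will differ slightly from the p-value reported by `chisq.test` because of simulation noise.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

lambda   <- 520 / 249
observed <- c(26, 64, 68, 57, 25, 6, 2, 1)
p        <- c(dpois(0:6, lambda), 1 - ppois(6, lambda))
n        <- sum(observed)
expected <- n * p

# Pearson statistic for a vector of bin counts (hypothetical helper)
chisq_stat <- function(counts) sum((counts - expected)^2 / expected)
stat_obs <- chisq_stat(observed)

# Simulate B seasons' worth of binned counts under the Poisson model
B <- 2000
sims <- rmultinom(B, size = n, prob = p)   # one column per simulated data set
stat_sim <- apply(sims, 2, chisq_stat)

# Proportion of simulated statistics at least as extreme as the observed one
p_value <- (1 + sum(stat_sim >= stat_obs)) / (B + 1)
p_value
```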