The following analysis was conducted using publicly available game and player data from regular season and postseason games since 1999 via the nflfastR and nflseedR packages.
Gambling markets are efficient. Over a large enough sample, actual game results will (roughly) converge to pre-game betting lines. That is, teams favored by 3 will win by an average of 3; games with an O/U of 50 will have an average of 50 points scored. Further, for both game spreads and game totals, actual results are normally distributed around the pre-game lines with a standard deviation of ~13.
NFL Point Spreads Summary Statistics, 1999-2022

| Spread | n | Result Avg | Result Median | Result Stdev |
|---|---|---|---|---|
| 3.0 | 1029 | 2.45 | 3 | 13.45 |
| 3.5 | 586 | 4.12 | 3 | 13.37 |
| 1.0 | 519 | 0.18 | 1 | 13.93 |
| 7.0 | 445 | 8.98 | 7 | 13.44 |
| 2.5 | 421 | 0.96 | 2 | 13.67 |
| 4.0 | 316 | 4.07 | 4 | 11.76 |
| 6.0 | 311 | 5.87 | 6 | 12.86 |
| 6.5 | 301 | 6.42 | 5 | 13.47 |
| 7.5 | 246 | 8.30 | 8 | 12.75 |
| 5.5 | 241 | 5.61 | 4 | 13.49 |
| 4.5 | 229 | 5.20 | 4 | 12.45 |
| 10.0 | 199 | 10.09 | 9 | 12.22 |
| 5.0 | 155 | 5.43 | 5 | 13.05 |
| 2.0 | 149 | 2.72 | 3 | 11.01 |
| 9.0 | 146 | 8.22 | 7 | 14.18 |
| 1.5 | 142 | -0.58 | 0 | 12.42 |
| 9.5 | 140 | 10.26 | 8 | 13.93 |
| 10.5 | 126 | 11.13 | 10 | 13.03 |
| 8.0 | 92 | 9.63 | 11 | 15.03 |
| 8.5 | 89 | 9.45 | 8 | 14.61 |
| 11.0 | 77 | 11.18 | 10 | 11.65 |
| 14.0 | 76 | 13.37 | 12 | 13.69 |
| 13.0 | 67 | 12.87 | 14 | 12.85 |
| 13.5 | 57 | 12.33 | 8 | 14.32 |
| 11.5 | 41 | 11.12 | 13 | 15.40 |
| 12.0 | 32 | 8.56 | 6 | 13.10 |
| 12.5 | 32 | 14.78 | 18 | 15.41 |
| 0.0 | 31 | 0.97 | 2 | 13.97 |
| 14.5 | 22 | 11.68 | 8 | 14.57 |
| 17.0 | 16 | 19.31 | 18 | 10.13 |
| 15.5 | 15 | 20.93 | 19 | 12.37 |
| 16.0 | 14 | 16.50 | 16 | 8.52 |
| 16.5 | 13 | 15.92 | 14 | 14.88 |
| 15.0 | 10 | 19.60 | 18 | 14.67 |
| 17.5 | 4 | 12.75 | 7 | 20.47 |
| 18.0 | 4 | 25.00 | 25 | 16.43 |
| 20.5 | 4 | 14.75 | 13 | 8.38 |
| 19.0 | 3 | 20.33 | 18 | 18.61 |
| 22.0 | 2 | 23.00 | 23 | 2.83 |
| 18.5 | 1 | 31.00 | 31 | NA |
| 19.5 | 1 | 28.00 | 28 | NA |
| 20.0 | 1 | 26.00 | 26 | NA |
| 24.0 | 1 | 3.00 | 3 | NA |
| 27.0 | 1 | 16.00 | 16 | NA |
Data from nflseedR

NFL Game Totals Summary Statistics, 1999-2022

| O/U | n | Result Avg | Result Median | Result Stdev |
|---|---|---|---|---|
| 44.0 | 341 | 46.03 | 45 | 13.34 |
| 41.0 | 302 | 42.24 | 41 | 12.75 |
| 43.0 | 283 | 43.97 | 43 | 13.95 |
| 43.5 | 266 | 43.67 | 44 | 13.60 |
| 47.0 | 247 | 45.82 | 45 | 12.29 |
| 45.0 | 240 | 44.73 | 44 | 12.58 |
| 44.5 | 231 | 43.43 | 43 | 12.83 |
| 42.0 | 220 | 43.21 | 42 | 13.51 |
| 46.0 | 216 | 46.71 | 47 | 12.84 |
| 41.5 | 209 | 41.82 | 41 | 13.53 |
| 45.5 | 207 | 45.38 | 45 | 14.58 |
| 46.5 | 203 | 47.67 | 46 | 13.48 |
| 42.5 | 188 | 43.26 | 43 | 14.21 |
| 40.5 | 186 | 41.95 | 41 | 13.47 |
| 40.0 | 181 | 40.56 | 40 | 11.68 |
| 47.5 | 179 | 47.98 | 46 | 14.03 |
| 37.0 | 176 | 35.86 | 34 | 12.72 |
| 37.5 | 174 | 39.23 | 40 | 15.00 |
| 48.0 | 171 | 49.14 | 47 | 14.77 |
| 38.0 | 170 | 39.42 | 37 | 14.37 |
| 48.5 | 157 | 49.85 | 50 | 14.34 |
| 39.5 | 155 | 39.41 | 38 | 13.60 |
| 39.0 | 141 | 39.91 | 40 | 13.10 |
| 38.5 | 124 | 37.72 | 37 | 12.60 |
| 49.0 | 118 | 48.62 | 47 | 13.38 |
| 36.5 | 117 | 37.94 | 37 | 14.16 |
| 49.5 | 108 | 47.62 | 47 | 13.27 |
| 51.0 | 91 | 51.98 | 51 | 12.59 |
| 36.0 | 90 | 38.13 | 37 | 14.62 |
| 50.0 | 78 | 47.94 | 47 | 11.95 |
| 50.5 | 73 | 52.45 | 51 | 14.94 |
| 35.5 | 70 | 39.97 | 40 | 11.86 |
| 35.0 | 61 | 35.69 | 36 | 13.45 |
| 34.5 | 59 | 36.97 | 36 | 12.78 |
| 34.0 | 57 | 39.30 | 36 | 14.36 |
| 52.0 | 52 | 52.75 | 52 | 14.17 |
| 51.5 | 48 | 53.83 | 54 | 14.28 |
| 52.5 | 46 | 51.91 | 51 | 10.46 |
| 53.0 | 45 | 52.78 | 53 | 13.91 |
| 33.0 | 44 | 35.43 | 36 | 11.84 |
| 53.5 | 40 | 55.90 | 55 | 13.75 |
| 54.0 | 40 | 56.17 | 56 | 13.41 |
| 33.5 | 35 | 34.91 | 33 | 12.11 |
| 55.0 | 35 | 49.66 | 49 | 11.52 |
| 54.5 | 28 | 55.43 | 62 | 15.45 |
| 55.5 | 17 | 51.18 | 50 | 17.52 |
| 56.5 | 16 | 58.19 | 52 | 18.51 |
| 32.0 | 14 | 31.21 | 28 | 11.39 |
| 56.0 | 14 | 55.07 | 52 | 14.72 |
| 57.0 | 8 | 54.00 | 57 | 8.14 |
| 31.0 | 6 | 28.50 | 25 | 14.22 |
| 32.5 | 6 | 42.00 | 42 | 8.69 |
| 58.0 | 5 | 55.00 | 62 | 20.48 |
| 57.5 | 4 | 61.25 | 68 | 21.96 |
| 59.5 | 4 | 68.00 | 69 | 13.52 |
| 58.5 | 3 | 67.67 | 70 | 7.77 |
| 31.5 | 2 | 18.50 | 18 | 3.54 |
| 30.0 | 1 | 33.00 | 33 | NA |
| 30.5 | 1 | 19.00 | 19 | NA |
| 60.0 | 1 | 61.00 | 61 | NA |
| 61.0 | 1 | 48.00 | 48 | NA |
| 63.0 | 1 | 58.00 | 58 | NA |
| 63.5 | 1 | 105.00 | 105 | NA |
Data from nflseedR

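To make the normal approximation concrete, here is a minimal Python sketch (not the R code behind this analysis) that turns a spread into a win probability and a game total into an over probability. The function names are hypothetical, the standard deviation of 13 is the rough figure from the tables above, and the sketch ignores pushes and the fact that football scores are whole numbers.

```python
from statistics import NormalDist

SD = 13.0  # rough standard deviation of results around the line (from the tables above)

def favorite_win_prob(spread, sd=SD):
    """P(favorite's margin > 0), assuming margin ~ Normal(spread, sd)."""
    return 1.0 - NormalDist(mu=spread, sigma=sd).cdf(0.0)

def over_prob(line, alt_total, sd=SD):
    """P(total points > alt_total), assuming total ~ Normal(line, sd)."""
    return 1.0 - NormalDist(mu=line, sigma=sd).cdf(alt_total)

# A 3-point favorite, and an alternate total set above a 50-point O/U
print(round(favorite_win_prob(3.0), 3))
print(round(over_prob(50.0, 56.5), 3))
```

With a standard deviation this large relative to typical spreads, even a clear favorite is far from a lock under this approximation.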
However, as illustrated below, simply sampling from a
normal distribution doesn’t work because of key (critical)
numbers: final margins cluster on values such as 3 and 7
rather than spreading smoothly around the line.
To simulate possible game scores, I want to take advantage of these
normally distributed pre-game lines while also accounting for key
numbers.
To do so, I first randomly generated game results from
a normal distribution with mean=1.5 (line = PHI -1.5)
and stdev=13, and game totals from a normal
distribution with mean=50.5 (O/U=50.5) and
stdev=13.
The first few rows of randomly
generated results are shown below. Note that
result = home score - away score and
total = home score + away score (PHI is treated as the home team).
## result total
## 1 4 43
## 2 -13 29
## 3 -13 54
## 4 -18 51
## 5 -13 45
## 6 -10 34
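The sampling step can be sketched in a few lines of Python. This is an illustration rather than the original R code: the function name is hypothetical, and drawing the result and total independently and rounding them to whole points are assumptions that the text does not spell out.

```python
import random

def simulate_lines(n, spread=1.5, total=50.5, sd=13.0, seed=None):
    """Draw n (result, total) pairs from two independent normals, rounded to whole points."""
    rng = random.Random(seed)
    return [
        (round(rng.gauss(spread, sd)), round(rng.gauss(total, sd)))
        for _ in range(n)
    ]

draws = simulate_lines(50_000, seed=2023)
avg_result = sum(r for r, _ in draws) / len(draws)
avg_total = sum(t for _, t in draws) / len(draws)

print(draws[:6])                                  # first few simulated (result, total) pairs
print(round(avg_result, 2), round(avg_total, 2))  # should sit close to 1.5 and 50.5
```

Raw draws like these can describe scores that cannot happen. For example, result=4 with total=43 implies a home score of 23.5, which is one reason to map each draw onto real historical games.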
Next, I created a function that, for a given randomly generated
game result and total, does the following:
1. Expands the range of each by 3 in both directions
2. Looks through all historical games that fall within this expanded range
3. Randomly picks one game
For example, if result=-2 and total=50, the function extracts all previously played
games where:
result > -6 AND result < 2 AND total > 46 AND total < 54
(this particular example has occurred 210 times since 1999), and then randomly selects one of these
rows.
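The three steps above can be sketched as follows. This is a hedged Python illustration, not the author's function: `match_historical` and `HISTORY` are hypothetical names, the tiny `HISTORY` list is a placeholder built from scores in the most-common-scores table in this post (a real run would use every game since 1999), and returning `None` when no game falls inside the window is an assumption.

```python
import random
from collections import Counter

# Placeholder history of (home_score, away_score) pairs. NOT the real dataset.
HISTORY = [(27, 24), (24, 27), (20, 23), (23, 20), (31, 24),
           (20, 17), (31, 28), (34, 31), (27, 20), (30, 27)]

def match_historical(result, total, history, window=3, rng=random):
    """Pick a random past game whose result and total are both within `window` of the draw.

    Returns a (home, away) tuple, or None if no past game falls inside the window.
    """
    candidates = [
        (home, away) for home, away in history
        if abs((home - away) - result) <= window
        and abs((home + away) - total) <= window
    ]
    return rng.choice(candidates) if candidates else None

# Tally simulated scores: draw from the normals, then snap each draw to a real game.
rng = random.Random(7)
counts = Counter()
for _ in range(10_000):
    draw = (round(rng.gauss(1.5, 13)), round(rng.gauss(50.5, 13)))
    game = match_historical(*draw, HISTORY, rng=rng)
    if game is not None:  # assumption: draws with no historical neighbor are skipped
        counts[game] += 1

print(counts.most_common(3))
```

For whole-number scores, `abs(x - result) <= 3` is the same condition as the strict inequalities in the example (result > -6 AND result < 2 when result=-2). Because every simulated score is copied from a game that actually happened, key numbers show up at their historical frequency.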
Here are the 10 most common outcomes after running 50,000
simulations.

10 Most Common Scores (50,000 simulations)

| PHI | KC | Result | Total | Winner | Occurrences |
|---|---|---|---|---|---|
| 27 | 24 | 3 | 51 | PHI | 646 |
| 24 | 27 | -3 | 51 | KC | 478 |
| 20 | 23 | -3 | 43 | KC | 460 |
| 23 | 20 | 3 | 43 | PHI | 460 |
| 31 | 24 | 7 | 55 | PHI | 458 |
| 20 | 17 | 3 | 37 | PHI | 396 |
| 31 | 28 | 3 | 59 | PHI | 363 |
| 34 | 31 | 3 | 65 | PHI | 352 |
| 27 | 20 | 7 | 47 | PHI | 321 |
| 30 | 27 | 3 | 57 | PHI | 304 |
Avg Result=1.51, Avg Total=50.38

My first step was to examine Kelce’s relevant metrics dating back to
2018, Mahomes’ first year as the starter.

Travis Kelce Summary Statistics

| Season | GP | Targets | Tgts/Gm | Yds | Yds/Tgt | Depth of Target (Median) | Depth of Target (Avg) | Longest Catch of Game (Median) | Longest Catch of Game (Avg) |
|---|---|---|---|---|---|---|---|---|---|
| 2018 | 18 | 166 | 9.2 | 1467 | 8.84 | 7 | 9.16 | 24.0 | 23.78 |
| 2019 | 19 | 158 | 8.3 | 1436 | 9.09 | 7 | 8.67 | 20.0 | 22.79 |
| 2020 | 18 | 185 | 10.3 | 1776 | 9.60 | 7 | 8.31 | 24.5 | 25.83 |
| 2021 | 19 | 161 | 8.5 | 1424 | 8.84 | 6 | 7.43 | 20.0 | 24.68 |
| 2022 | 19 | 177 | 9.3 | 1514 | 8.55 | 5 | 6.89 | 23.0 | 25.84 |
Kelce’s average longest reception actually peaked this
year, though his aDOT and yards/target both hit five-year lows. All the
while, his volume has remained consistent.
I deployed a
classical bootstrap, resampling yards gained from all
Kelce targets in 2021 and 2022 (pre-SB), which seemed to strike a fair
balance between representativeness and sample size. Since we are
looking for the median (presumably for the purpose of betting the longest
reception prop), it is not an issue that this sample does not include every
possible reception length (for example, the sample includes a 69-yard
and a 52-yard catch with nothing in between).
In each run, the number of targets was sampled from a Poisson
distribution, with the lambda parameter set as Kelce’s average
targets/game from 2018-2022.
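To clarify, I wrote a function that, for each run, draws a number of targets from the Poisson
distribution, resamples that many yardage values from the pool of Kelce targets, and records
the longest one.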
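A bootstrap of this kind can be sketched as below. This is a minimal Python illustration under stated assumptions, not the original implementation. `PLACEHOLDER_YARDS` contains made-up values standing in for the real per-target yardage from 2021 and 2022, so the printed median is not a reproduction of the result. `LAMBDA` is total targets divided by total games from the table above (847 / 93), which is one reading of "average targets/game". Treating a zero-target game as a longest catch of 0 is also an assumption.

```python
import math
import random
from statistics import median

# Hypothetical yards gained per target (0 = incompletion). NOT Kelce's real data.
PLACEHOLDER_YARDS = [0, 0, 0, 4, 6, 7, 9, 11, 12, 15, 18, 22, 27, 52, 69]
LAMBDA = 847 / 93  # targets per game, 2018-2022, summed from the table above

def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def longest_catch_distribution(yards_per_target, lam, n_sims=10_000, seed=None):
    """Simulate the longest catch of a game n_sims times."""
    rng = random.Random(seed)
    longest = []
    for _ in range(n_sims):
        n_targets = poisson(lam, rng)
        # Resample with replacement; a game with zero targets has a longest catch of 0
        sample = rng.choices(yards_per_target, k=n_targets) if n_targets else [0]
        longest.append(max(sample))
    return longest

longest = longest_catch_distribution(PLACEHOLDER_YARDS, LAMBDA, seed=57)
print(median(longest))  # median of the simulated longest-catch distribution
```

The median of `longest` is the number to compare against the posted line. Because only the median matters, gaps in the upper tail of the yardage pool have little effect on it.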
It looks like the line was 22.5 yards, while the median of the
sampling distribution is 23 yards (his actual longest catch turned out
to be 22 yards!).
Though projecting the “true talent level” of an NFL player is much
harder than doing so for an MLB player due to the accuracy and
availability of predictive statistics in the respective sports, baseball
projection systems provide a useful framework for attacking this type of
question. Specifically, I’m an advocate of emulating the approach used
by Dan Szymborski in his ZiPS projections.
If you are
unfamiliar or need a refresher on how ZiPS works, here is a brief
description of the methodology (the full article can
be found here).
How does “ZiPS project(s) future production? First, using both recent playing data with adjustments for…, ZiPS establishes a baseline estimate for every player being projected. To get an idea of where the player is going, the system compares that baseline to the baselines of all other players in its database…Using a whole lot of stats, information on shape, and player characteristics, ZiPS then finds a large cohort that is most similar to the player. I use Mahalanobis distance extensively for this.”
Using sources like game statistics, draft data, PFF grades, and Madden ratings (plus any additional data you may have access to), I propose clustering players and then formulating projections for various metrics (like YPA for a QB) from these clusters.
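As a rough illustration of the cohort-finding idea, the sketch below ranks players by Mahalanobis distance to a target player using two features. It is a Python sketch under loud assumptions, not the ZiPS method or a finished projection system. The player names and (YPA, completion %) values are invented placeholders, the helper names are hypothetical, and the covariance inverse is written out by hand for the two-feature case only.

```python
from statistics import mean

# Hypothetical (YPA, completion %) baselines. Placeholder values, not real players.
PLAYERS = {
    "QB_A": (7.9, 67.0),
    "QB_B": (6.4, 61.5),
    "QB_C": (7.6, 64.0),
    "QB_D": (6.9, 66.0),
    "QB_E": (8.3, 65.5),
}

def covariance_2d(rows):
    """Sample covariance of two features, returned as (sxx, sxy, syy)."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    n = len(rows) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows) / n
    return sxx, sxy, syy

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2D points a and b for covariance (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse expanded
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

def most_similar(target, players, k=3):
    """Names of the k players closest to `target` by Mahalanobis distance."""
    cov = covariance_2d(list(players.values()))
    ranked = sorted(players, key=lambda name: mahalanobis_2d(target, players[name], cov))
    return ranked[:k]

# Cohort for QB_A, excluding QB_A itself
others = {name: feats for name, feats in PLAYERS.items() if name != "QB_A"}
print(most_similar(PLAYERS["QB_A"], others, k=2))
```

Unlike plain Euclidean distance, Mahalanobis distance rescales each feature by its variance and accounts for correlation between features, so metrics on very different scales can be compared directly. A projection for a metric like YPA could then be built from how the nearest cohort went on to perform.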