The following analysis was conducted using publicly available game and player data from regular season and postseason games since 1999 via the nflfastR and nflseedR packages.

Question 1: Simulate Super Bowl score results (e.g., 27-23, 30-21) based on pregame Vegas odds (game total, spread)

Gambling markets are efficient. Over a large enough sample, actual game results (roughly) converge to the pregame betting lines: teams favored by 3 win by an average of about 3, and games with an O/U of 50 average about 50 total points. Further, for both spreads and totals, actual results are roughly normally distributed around the pregame line with a standard deviation of ~13 points.

NFL Point Spreads Summary Statistics
1999-2022
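Columns: pregame spread, number of games (n), and the average, median, and standard deviation of the actual result
Spread n Avg Median Stdev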
3.0 1029 2.45 3 13.45
3.5 586 4.12 3 13.37
1.0 519 0.18 1 13.93
7.0 445 8.98 7 13.44
2.5 421 0.96 2 13.67
4.0 316 4.07 4 11.76
6.0 311 5.87 6 12.86
6.5 301 6.42 5 13.47
7.5 246 8.30 8 12.75
5.5 241 5.61 4 13.49
4.5 229 5.20 4 12.45
10.0 199 10.09 9 12.22
5.0 155 5.43 5 13.05
2.0 149 2.72 3 11.01
9.0 146 8.22 7 14.18
1.5 142 -0.58 0 12.42
9.5 140 10.26 8 13.93
10.5 126 11.13 10 13.03
8.0 92 9.63 11 15.03
8.5 89 9.45 8 14.61
11.0 77 11.18 10 11.65
14.0 76 13.37 12 13.69
13.0 67 12.87 14 12.85
13.5 57 12.33 8 14.32
11.5 41 11.12 13 15.40
12.0 32 8.56 6 13.10
12.5 32 14.78 18 15.41
0.0 31 0.97 2 13.97
14.5 22 11.68 8 14.57
17.0 16 19.31 18 10.13
15.5 15 20.93 19 12.37
16.0 14 16.50 16 8.52
16.5 13 15.92 14 14.88
15.0 10 19.60 18 14.67
17.5 4 12.75 7 20.47
18.0 4 25.00 25 16.43
20.5 4 14.75 13 8.38
19.0 3 20.33 18 18.61
22.0 2 23.00 23 2.83
18.5 1 31.00 31 NA
19.5 1 28.00 28 NA
20.0 1 26.00 26 NA
24.0 1 3.00 3 NA
27.0 1 16.00 16 NA
Data from nflseedR



NFL Game Totals Summary Statistics
1999-2022
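Columns: pregame over/under (O/U), number of games (n), and the average, median, and standard deviation of actual total points
O/U n Avg Median Stdev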
44.0 341 46.03 45 13.34
41.0 302 42.24 41 12.75
43.0 283 43.97 43 13.95
43.5 266 43.67 44 13.60
47.0 247 45.82 45 12.29
45.0 240 44.73 44 12.58
44.5 231 43.43 43 12.83
42.0 220 43.21 42 13.51
46.0 216 46.71 47 12.84
41.5 209 41.82 41 13.53
45.5 207 45.38 45 14.58
46.5 203 47.67 46 13.48
42.5 188 43.26 43 14.21
40.5 186 41.95 41 13.47
40.0 181 40.56 40 11.68
47.5 179 47.98 46 14.03
37.0 176 35.86 34 12.72
37.5 174 39.23 40 15.00
48.0 171 49.14 47 14.77
38.0 170 39.42 37 14.37
48.5 157 49.85 50 14.34
39.5 155 39.41 38 13.60
39.0 141 39.91 40 13.10
38.5 124 37.72 37 12.60
49.0 118 48.62 47 13.38
36.5 117 37.94 37 14.16
49.5 108 47.62 47 13.27
51.0 91 51.98 51 12.59
36.0 90 38.13 37 14.62
50.0 78 47.94 47 11.95
50.5 73 52.45 51 14.94
35.5 70 39.97 40 11.86
35.0 61 35.69 36 13.45
34.5 59 36.97 36 12.78
34.0 57 39.30 36 14.36
52.0 52 52.75 52 14.17
51.5 48 53.83 54 14.28
52.5 46 51.91 51 10.46
53.0 45 52.78 53 13.91
33.0 44 35.43 36 11.84
53.5 40 55.90 55 13.75
54.0 40 56.17 56 13.41
33.5 35 34.91 33 12.11
55.0 35 49.66 49 11.52
54.5 28 55.43 62 15.45
55.5 17 51.18 50 17.52
56.5 16 58.19 52 18.51
32.0 14 31.21 28 11.39
56.0 14 55.07 52 14.72
57.0 8 54.00 57 8.14
31.0 6 28.50 25 14.22
32.5 6 42.00 42 8.69
58.0 5 55.00 62 20.48
57.5 4 61.25 68 21.96
59.5 4 68.00 69 13.52
58.5 3 67.67 70 7.77
31.5 2 18.50 18 3.54
30.0 1 33.00 33 NA
30.5 1 19.00 19 NA
60.0 1 61.00 61 NA
61.0 1 48.00 48 NA
63.0 1 58.00 58 NA
63.5 1 105.00 105 NA
Data from nflseedR
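The per-line summaries in the two tables above amount to a group-by over historical games. A minimal Python sketch, assuming each game is reduced to a (pregame line, actual outcome) pair; the function name and the sample rows are made up for illustration and are not real nflseedR data:

```python
from collections import defaultdict
from statistics import mean, median, stdev

def summarize_by_line(games):
    """Group (line, actual) pairs by pregame line and summarize the actual values.

    Returns {line: (n, avg, median, stdev)}; stdev is None when n < 2,
    mirroring the NA entries in the tables.
    """
    groups = defaultdict(list)
    for line, actual in games:
        groups[line].append(actual)
    summary = {}
    for line, values in groups.items():
        sd = stdev(values) if len(values) > 1 else None
        summary[line] = (len(values), mean(values), median(values), sd)
    # Sort by sample size, largest first, as in the tables
    return dict(sorted(summary.items(), key=lambda kv: -kv[1][0]))

# Hypothetical rows: (pregame line, actual outcome)
sample = [(3.0, 7), (3.0, -4), (3.0, 3), (7.0, 10), (7.0, 4), (24.0, 3)]
for line, (n, avg, med, sd) in summarize_by_line(sample).items():
    print(line, n, round(avg, 2), med, sd if sd is None else round(sd, 2))
```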



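However, as illustrated below, simply sampling from a normal distribution doesn’t work because of key (critical) numbers: NFL margins of victory cluster at values like 3 and 7, the value of a field goal and of a touchdown plus extra point, and a smooth normal curve cannot reproduce those spikes.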
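One way to see the problem numerically: if margins were drawn from a normal distribution with mean 1.5 and standard deviation 13 and then rounded to whole points, the probability of each margin comes straight from the normal CDF. This is an illustrative Python check, not the author's code; the function names are hypothetical:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def rounded_margin_prob(k, mu=1.5, sigma=13.0):
    """P(round(X) == k) for X ~ Normal(mu, sigma): the mass on [k - 0.5, k + 0.5)."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

for k in range(1, 9):
    print(k, round(rounded_margin_prob(k), 4))
```

Every margin from 1 to 8 gets roughly 3% of the probability, declining smoothly, so the normal model gives 3 and 7 no extra weight over their neighbors.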



To simulate possible game scores, I want to take advantage of these normally distributed pre-game lines while also accounting for key numbers.


To do so, I first randomly generated game results from a normal distribution with mean=1.5 (line: PHI -1.5, so positive results favor PHI) and stdev=13, and game totals from a normal distribution with mean=50.5 (O/U=50.5) and stdev=13.

The first few rows of randomly generated results are shown below. Note that result = home score - away score and total = home score + away score.

##   result total
## 1      4    43
## 2    -13    29
## 3    -13    54
## 4    -18    51
## 5    -13    45
## 6    -10    34
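This draw step can be sketched in Python as follows. It is an illustration rather than the author's code, and it assumes the result and total are drawn independently and rounded to whole points (as the integer values above suggest); `draw_lines` and the seed are hypothetical:

```python
import random

def draw_lines(n, spread_mean=1.5, total_mean=50.5, sd=13.0, seed=None):
    """Draw n (result, total) pairs from two independent normals, rounded to integers.

    result = home score - away score (positive favors PHI here),
    total = home score + away score.
    """
    rng = random.Random(seed)
    return [
        (round(rng.gauss(spread_mean, sd)), round(rng.gauss(total_mean, sd)))
        for _ in range(n)
    ]

draws = draw_lines(50_000, seed=2023)
print(draws[:6])
```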


Next, I created a function that, for a given randomly generated result and total, does the following:
1. Expands the range of each by 3 points in both directions
2. Finds all historical games whose result and total both fall within these expanded ranges
3. Randomly picks one of those games and uses its final score

For example, if result=-2 and total=50, the function extracts all previously played games where:
result > -6 AND result < 2 AND total > 46 AND total < 54 (this particular example has occurred 210 times since 1999), and then randomly selects one of these rows.
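The matching step might look like the following Python sketch. Instead of scanning every historical game for each of the 50k draws, it indexes games once by (result, total); the (home_score, away_score) tuple format and the function names are hypothetical:

```python
import random
from collections import defaultdict

def build_index(history):
    """Index historical (home_score, away_score) pairs by (result, total)."""
    index = defaultdict(list)
    for home, away in history:
        index[(home - away, home + away)].append((home, away))
    return index

def sample_matching_game(result, total, index, window=3, rng=random):
    """Pick one historical score whose result and total are each within
    +/- window points (inclusive) of the simulated values; None if no match."""
    candidates = []
    for r in range(result - window, result + window + 1):
        for t in range(total - window, total + window + 1):
            candidates.extend(index.get((r, t), ()))
    return rng.choice(candidates) if candidates else None

history = [(27, 24), (24, 27), (20, 23), (31, 24), (17, 14)]  # made-up scores
index = build_index(history)
print(sample_matching_game(-2, 50, index, rng=random.Random(1)))  # -> (24, 27), the only toy game in range
```

With `window=3` and integer scores, the inclusive check matches the strict inequalities in the example (result > -6 and result < 2), and each draw looks up at most 49 keys rather than every game.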


Here are the 10 most common outcomes after running 50k simulations.

10 Most Common Scores
50,000 simulations
PHI KC Result Total Winner Occurrences
27 24 3 51 PHI 646
24 27 -3 51 KC 478
20 23 -3 43 KC 460
23 20 3 43 PHI 460
31 24 7 55 PHI 458
20 17 3 37 PHI 396
31 28 3 59 PHI 363
34 31 3 65 PHI 352
27 20 7 47 PHI 321
30 27 3 57 PHI 304
Avg Result=1.51, Avg Total=50.38
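Given a list of simulated (PHI score, KC score) outcomes, a most-common table like the one above can be tallied with `collections.Counter`. A minimal sketch; the function name and toy input are hypothetical, and a "TIE" label is included because resampled historical games can include ties:

```python
from collections import Counter

def top_scores(simulated, k=10):
    """Count simulated (phi_score, kc_score) outcomes and return the k most common
    as rows of (PHI, KC, result, total, winner, occurrences)."""
    rows = []
    for (phi, kc), count in Counter(simulated).most_common(k):
        winner = "PHI" if phi > kc else "KC" if kc > phi else "TIE"
        rows.append((phi, kc, phi - kc, phi + kc, winner, count))
    return rows

sims = [(27, 24), (24, 27), (27, 24), (23, 20), (27, 24), (24, 27)]  # toy input
for row in top_scores(sims, k=3):
    print(row)
```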

Question 2: Estimate the median of Travis Kelce’s longest reception for the Super Bowl using Bayesian Bootstrapping or another methodology

My first step was to examine Kelce’s relevant metrics dating back to 2018, Mahomes’ first year as the starter.

Travis Kelce Summary Statistics
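Columns: season, games played (GP), targets, targets per game, receiving yards, yards per target, median and average depth of target, median and average longest catch per game
Season GP Targets Tgts/Gm Yds Yds/Tgt DoT-Med DoT-Avg Long-Med Long-Avg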
2018 18 166 9.2 1467 8.84 7 9.16 24.0 23.78
2019 19 158 8.3 1436 9.09 7 8.67 20.0 22.79
2020 18 185 10.3 1776 9.60 7 8.31 24.5 25.83
2021 19 161 8.5 1424 8.84 6 7.43 20.0 24.68
2022 19 177 9.3 1514 8.55 5 6.89 23.0 25.84



Kelce’s average longest reception per game narrowly peaked this year (25.84 yards vs. 25.83 in 2020), though his average depth of target (aDOT) and yards per target hit five-year lows. All the while, his target volume has remained consistent.

I deployed a classical bootstrap, resampling yards gained from all of Kelce’s 2021 and 2022 targets (pre-Super Bowl), which seemed to strike a fair balance between representativeness and sample size. Since we are looking for the median (presumably for the purpose of betting the longest-reception prop), gaps in the sample’s reception lengths (for example, it includes a 69-yard and a 52-yard catch with nothing in between) are not an issue: they sit in the upper tail, far from where the median falls.

In each run, the number of targets was sampled from a Poisson distribution, with the lambda parameter set to Kelce’s average targets per game from 2018-2022 (9.1).

To clarify, I wrote a function that does the following:

  1. Randomly selects the number of targets (t) from a Poisson distribution with lambda=9.1
  2. Randomly selects t plays (with replacement) from the set of Kelce’s 2021-2022 actual targets
  3. Extracts yards gained from each selected play
  4. Stores the longest reception of these t plays
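A Python sketch of that function, under stated assumptions: `target_yards` is a hypothetical list of yards gained on each 2021-2022 target (0 for incompletions), plays are resampled with replacement as in a classical bootstrap, and Poisson draws use Knuth's method because Python's standard library has no Poisson sampler. Function names and toy data are illustrative:

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's method (fine for small lambda like 9.1)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_longest(target_yards, lam=9.1, n_sims=10_000, seed=None):
    """Bootstrap the longest reception in a game.

    Each run draws t ~ Poisson(lam) targets, resamples t plays with replacement
    from target_yards, and records the maximum yards (0 if t == 0).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        t = poisson(lam, rng)
        plays = rng.choices(target_yards, k=t)
        results.append(max(plays, default=0))
    return results

toy_yards = [0, 5, 12, 25, 0, 8, 31, 0, 14]  # made-up targets, not Kelce's data
sims = simulate_longest(toy_yards, n_sims=5_000, seed=42)
print(sims[:10])
```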

The line appears to have been 22.5 yards, while the median of the simulated distribution is 23 yards (his actual longest catch turned out to be 22 yards!)
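Comparing a simulated distribution to a prop line takes two numbers: its median and the share of runs that clear the line. A small Python sketch; the function name and toy values are hypothetical, not the actual simulation output:

```python
from statistics import median

def prop_summary(longest, line=22.5):
    """Median of simulated longest receptions and the share of runs over the line."""
    over = sum(1 for y in longest if y > line) / len(longest)
    return median(longest), over

sims = [12, 18, 23, 25, 31, 9, 22, 40]  # toy values, not real output
print(prop_summary(sims))  # -> (22.5, 0.5)
```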

Question 3: Determine Brock Purdy’s true-talent YPA (yards per attempt) at this point in his career

Though projecting the “true talent level” of an NFL player is much harder than doing so for an MLB player, because baseball has far more accurate and widely available predictive statistics, baseball projection systems still provide a useful framework for attacking this type of question. Specifically, I advocate emulating the approach Dan Szymborski uses in his ZiPS projections.

If you are unfamiliar with ZiPS or need a refresher on how it works, here is a brief description of the methodology (the full article can be found here):

“How does ZiPS project future production? First, using both recent playing data with adjustments for…, ZiPS establishes a baseline estimate for every player being projected. To get an idea of where the player is going, the system compares that baseline to the baselines of all other players in its database…Using a whole lot of stats, information on shape, and player characteristics, ZiPS then finds a large cohort that is most similar to the player. I use Mahalanobis distance extensively for this.”

Using sources such as game statistics, draft data, PFF grades, and Madden ratings (plus any additional data you may have access to), I propose clustering players and then building projections for various metrics (such as YPA for a QB) from those clusters.
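A rough Python/NumPy sketch of the comparable-player step. It follows the Mahalanobis-distance cohort idea from the ZiPS description (nearest neighbors, not a clustering algorithm), and the feature columns, toy numbers, and function names are all hypothetical; a real version would draw its features from the sources listed above:

```python
import numpy as np

def nearest_comps(target, pool, k=5):
    """Return indices of the k pool rows closest to `target` by Mahalanobis distance.

    target: 1-D feature vector for the player being projected.
    pool: 2-D array (players x features) of historical players.
    The covariance is estimated from the pool; pinv guards against singularity.
    """
    pool = np.asarray(pool, dtype=float)
    diff = pool - np.asarray(target, dtype=float)
    inv_cov = np.linalg.pinv(np.cov(pool, rowvar=False))
    # Squared Mahalanobis distance per row: diff_i^T @ inv_cov @ diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.argsort(d2)[:k]

def project_from_comps(target, pool, future_metric, k=5):
    """Project a metric (e.g. next-season YPA) as the mean outcome of the k nearest comps."""
    idx = nearest_comps(target, pool, k)
    return float(np.mean(np.asarray(future_metric, dtype=float)[idx]))

# Hypothetical features per past QB: [early-career YPA, completion %]
pool = [[7.0, 63.0], [8.1, 68.0], [6.5, 60.0], [7.9, 66.5], [7.2, 64.0], [8.4, 69.5]]
future = [6.8, 7.9, 6.2, 7.6, 7.0, 8.0]  # hypothetical following-season YPA
target = [8.0, 67.0]  # hypothetical feature vector for the QB being projected
print(project_from_comps(target, pool, future, k=2))
```

Mahalanobis distance accounts for correlated features (YPA and completion percentage tend to move together), so a comp is judged by how unusual its differences are given that correlation rather than by raw distance.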