Background : We decided to have an NFL “Survivor League” for our fall office pool this year (2021-2022). The rules are fairly simple.
Pick 1 winning team for each week of the 18 game regular season. If your team wins or ties, you advance to the next week. Week 12 is a special case where you have to pick two teams and both have to win in order to advance
You can only pick each team one time. Therefore you have to pick 19 teams total.
The last survivor takes 75% of the money in the pool while the runner up takes 25%. If there are ties, they divide the money accordingly.
So the question is, how can we use analytics to make the best picks?
Note : This article, like most of mine on R-Pubs, is both instructional and investigative. You can re-create any of these charts and tables from the embedded R code. Click the “Code” button to view.
First we will load in the 2021 schedule from www.pro-football-reference.com where we can get both the schedule and results.
Below is an example of the table we built where each row represents a possible outcome of a scheduled game.
## 2 - Load Data - Schedules
SchedURL = "https://www.pro-football-reference.com/years/2021/games.htm"
SchedYear = "2021"
ScheduleDF = bind_cols(
tibble(
Week_Num = SchedURL %>% read_html() %>% html_nodes('*[data-stat="week_num"]') %>% html_text(),
Date = SchedURL %>% read_html() %>% html_nodes('*[data-stat="game_date"]') %>% html_text(),
VisTm = SchedURL %>% read_html() %>% html_nodes('*[data-stat="winner"]') %>% html_text(),
HomeTm = SchedURL %>% read_html() %>% html_nodes('*[data-stat="loser"]') %>% html_text(),
GameLoc = SchedURL %>% read_html() %>% html_nodes('*[data-stat="game_location"]') %>% html_text(),
PtsWin = SchedURL %>% read_html() %>% html_nodes('*[data-stat="pts_win"]') %>% html_text()) %>%
filter(Date != "Date"),
tibble(
VisRef = SchedURL %>% read_html() %>% html_nodes('*[data-stat="winner"] a') %>% html_attr("href"),
HomeRef = SchedURL %>% read_html() %>% html_nodes('*[data-stat="loser"] a') %>% html_attr("href"))) %>%
mutate(VisID = str_extract(VisRef, "(?<=/teams/).*(?=/)") %>% toupper(),
HomeID = str_extract(HomeRef, "(?<=/teams/).*(?=/)") %>% toupper(),
Date = paste0(Date) %>% as.Date(format = "%Y-%m-%d"),
Winner = if_else(PtsWin != "", VisID, ""),
SwapTm = if_else(GameLoc != "@", VisTm, ""),
SwapID = if_else(GameLoc != "@", VisID, ""),
VisTm = if_else(GameLoc != "@", HomeTm, VisTm),
VisID = if_else(GameLoc != "@", HomeID, VisID),
HomeTm = if_else(GameLoc != "@", SwapTm, HomeTm),
HomeID = if_else(GameLoc != "@", SwapID, HomeID)) %>%
dplyr::select(-SwapTm, -SwapID, -GameLoc, -PtsWin)
PlanDF = ScheduleDF %>%
filter(Week_Num %in% as.character(seq(1, 18))) %>%
mutate(V = VisID,
H = HomeID,
Week_Num = as.numeric(Week_Num)) %>%
dplyr::select(Week_Num, Date, VisID, HomeID, V, H) %>%
pivot_longer(-c(Date, Week_Num, VisID, HomeID),
names_to = "Version", values_to = "Winner") %>%
group_by(Week_Num) %>%
mutate(OutcomeID = paste0(str_pad(Week_Num, 2, "0", side = c("left")),
str_pad(row_number(), 3, "0", side = c("left")))) %>%
ungroup()
ResultDF = ScheduleDF %>%
filter(Winner != "" & !is.na(Winner)) %>%
mutate(V = VisID,
H = HomeID,
Week_Num = as.numeric(Week_Num)) %>%
dplyr::select(Week_Num, Date, VisID, HomeID, V, H, Winner) %>%
pivot_longer(-c(Date, Week_Num, VisID, HomeID, Winner),
names_to = "Version", values_to = "TeamID") %>%
mutate(Result = if_else(Winner == TeamID, "Winner", "Loser") %>%
factor(levels = c("Winner", "Loser"))) %>%
dplyr::select(Week_Num, TeamID, Result)
# PossibleDF = expand_grid(
# "Winner01" = filter(PlanDF, Week_Num == 1)$Winner,
# "Winner02" = filter(PlanDF, Week_Num == 2)$Winner,
# "Winner03" = filter(PlanDF, Week_Num == 3)$Winner) %>%
# filter(Winner01 != Winner02 & Winner01 != Winner03 & Winner02 != Winner03)
# VertSchedDF = PlanDF %>%
# mutate(Opp = if_else(Winner==VisID, HomeID, VisID)) %>%
# rename(Team = Winner, Place = Version) %>%
# dplyr::select(Week_Num, Team, Place, Opp)
head(PlanDF, 10) %>%
kable("html", escape = FALSE,
caption = paste0('<p style="color:black; font-size:18px">',
'2021 NFL Schedule - Possible Outcomes',
'</p>')) %>%
kable_styling("striped",
bootstrap_options = c("hover", "condensed")) %>%
row_spec(0, color = "white", background = "black")
| Week_Num | Date | VisID | HomeID | Version | Winner | OutcomeID |
|---|---|---|---|---|---|---|
| 1 | 2021-09-09 | DAL | TAM | V | DAL | 01001 |
| 1 | 2021-09-09 | DAL | TAM | H | TAM | 01002 |
| 1 | 2021-09-12 | PHI | ATL | V | PHI | 01003 |
| 1 | 2021-09-12 | PHI | ATL | H | ATL | 01004 |
| 1 | 2021-09-12 | NYJ | CAR | V | NYJ | 01005 |
| 1 | 2021-09-12 | NYJ | CAR | H | CAR | 01006 |
| 1 | 2021-09-12 | MIN | CIN | V | MIN | 01007 |
| 1 | 2021-09-12 | MIN | CIN | H | CIN | 01008 |
| 1 | 2021-09-12 | SEA | CLT | V | SEA | 01009 |
| 1 | 2021-09-12 | SEA | CLT | H | CLT | 01010 |
Now we need to get some idea of what the probability is of each of these possible outcomes happening. We will build that by:
Taking the consensus ordinal pre-season ranking from three major sports networks (ESPN, CBS and Fox) to form our basis.
Using that pre-season ranking we will calculate a “Relative Probability Index” or RPI (we’re borrowing the acronym) of each team’s probability of beating the average team in the league. After looking at the data for prior years and rounding for simplicity, the best team in the league has roughly an 80% chance of beating an average team while the worst team in the league has roughly a 20% chance.
Of course those pre-season expectations change as the reality of performance, injuries and other factors impact the actual results of games. Therefore, as the season progresses, we need to update our RPI. Rather than build a complicated “bottom-up” model based on hundreds of factors, we’re going to keep it simple and follow the money. Specifically the odds set by the major online betting platforms.
## Load Data - Vegas Odds and Pre-Season Rankings
OddsUrl = "https://www.vegasinsider.com/nfl/odds/las-vegas/money/"
RawTable = OddsUrl %>%
read_html() %>%
html_node('*[class="frodds-data-tbl"]') %>%
html_table(fill = TRUE)
OddsTable = RawTable %>%
filter(X10 != "") %>%
mutate(DateText = str_extract(X1, "\\d\\d\\/\\d\\d"),
MonthText = str_sub(DateText,1,2),
TimeText = str_extract(X1, "(?<=\\d\\d\\/\\d\\d\\s).*(?=\\n)"),
YearText = if_else(as.numeric(MonthText)<=2, year(Sys.Date()) + 1, year(Sys.Date())),
GameTime = paste0(DateText, "/", YearText, TimeText) %>%
as.POSIXct(format = "%m/%d/%Y %H:%M"),
VisTm = str_extract(X1, "(?<=\\d\\d\\d.).*"),
HomeTm = str_replace(X1, "(?<=\\d\\d\\d.).*", "") %>%
str_extract("(?<=\\d\\d\\d.).*$"),
VisLine = str_extract(X10, "(?<=^)\\-\\d*|(?<=(^\\+))\\d*") %>% as.numeric() ,
HomeLine = str_extract(X10, "\\-\\d*(?<=$)|(?<=(\\+))\\d*(?<=$)") %>% as.numeric(),
VisProb = LineToProb(VisLine),
HomeProb = LineToProb(HomeLine),
ProbGap = (VisProb + HomeProb) - 1) %>%
dplyr::select(GameTime, VisTm, HomeTm, VisLine, HomeLine, VisProb, HomeProb, ProbGap)
# Need to store and lock game odds so that we don't loose games that already happened
ArchiveOdds = readRDS("C:\\Users\\Chris.Woolery\\Documents\\2021\\ArchiveOdds.RDS")
NewArchiveOdds = bind_rows(
OddsTable %>%
mutate(ArchiveDate = Sys.time()) %>%
dplyr::select(ArchiveDate, everything()),
ArchiveOdds) %>%
group_by(GameTime, VisTm) %>%
mutate(MaxArchiveDate = max(ArchiveDate)) %>%
ungroup() %>%
arrange(GameTime, VisTm) %>%
filter(ArchiveDate == MaxArchiveDate) %>%
dplyr::select(-MaxArchiveDate) %>%
distinct()
saveRDS(NewArchiveOdds, "C:\\Users\\Chris.Woolery\\Documents\\2021\\ArchiveOdds.RDS")
# Restore Odds Table with Old Odds Plus Latest Odds
OddsTable = NewArchiveOdds %>%
dplyr::select(-ArchiveDate)
OddsTableLong =
bind_rows(OddsTable %>%
mutate(Date = as.Date(GameTime),
TeamName_VegasInsider = VisTm,
Vegas_Prob = VisProb) %>%
dplyr::select(Date, TeamName_VegasInsider, Vegas_Prob),
OddsTable %>%
mutate(Date = as.Date(GameTime),
TeamName_VegasInsider = HomeTm,
Vegas_Prob = HomeProb) %>%
dplyr::select(Date, TeamName_VegasInsider, Vegas_Prob)) %>%
arrange(Date, desc(Vegas_Prob))
TeamFile = "C:\\Users\\Chris.Woolery\\Documents\\2021\\NF Misc.xlsx"
TeamDF = read_xlsx(TeamFile, sheet = "Presentation") %>%
filter(!is.na(TeamID))
PlanDF = PlanDF %>%
left_join(TeamDF %>%
dplyr::select(TeamID, TeamName_VegasInsider),
by = c("Winner" = "TeamID"))
ForecastDF = TeamDF %>%
mutate(Week = 1) %>%
rename(Pre_RPI = PS_RPI) %>%
dplyr::select(-TeamName_ProFBRef, -TeamName_VegasInsider) %>%
right_join(PlanDF, by = c("TeamID" = "Winner", "Week" = "Week_Num")) %>%
mutate(OppID = if_else(VisID == TeamID, HomeID, VisID)) %>%
dplyr::select(OutcomeID, Week, Date, TeamID, TeamName_VegasInsider, Pre_RPI, OppID) %>%
left_join(TeamDF %>%
mutate(Week = 1) %>%
rename(OppPre_RPI = PS_RPI) %>%
dplyr::select(Week, TeamID, OppPre_RPI),
by = c("OppID" = "TeamID", "Week" = "Week")) %>%
mutate(Pre_Prob = Pre_RPI + (Pre_RPI * (0.5 - OppPre_RPI))) %>%
left_join(OddsTableLong,
by = c("Date", "TeamName_VegasInsider")) %>%
mutate(Post_Prob = if_else(Vegas_Prob > 0,
(Vegas_Prob + Pre_Prob * Week) / (Week + 1),
Pre_Prob),
Post_RPI = if_else(Vegas_Prob > 0,
(Vegas_Prob / (-1 * OppPre_RPI + 1.5) + Pre_RPI * Week) / (Week + 1),
Pre_RPI)) %>%
arrange(OutcomeID)
# Loop through and populate Pre/Post Odds
StartRow = 33
for (i in StartRow :nrow(ForecastDF)) {
ForecastDF$TeamName_VegasInsider[i] =
if_else(sum((ForecastDF$Week == ForecastDF$Week[i] -1) * (ForecastDF$TeamID == ForecastDF$TeamID[i])) > 0,
max("",ForecastDF$TeamName_VegasInsider[ForecastDF$Week == ForecastDF$Week[i] -1
& ForecastDF$TeamID == ForecastDF$TeamID[i]]),
max("",ForecastDF$TeamName_VegasInsider[ForecastDF$Week == ForecastDF$Week[i] -2
& ForecastDF$TeamID == ForecastDF$TeamID[i]]))
ForecastDF$Pre_RPI[i] =
if_else(sum((ForecastDF$Week == ForecastDF$Week[i] -1) * (ForecastDF$TeamID == ForecastDF$TeamID[i])) > 0,
max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -1 & ForecastDF$TeamID == ForecastDF$TeamID[i]]),
max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -2 & ForecastDF$TeamID == ForecastDF$TeamID[i]]))
ForecastDF$OppPre_RPI[i] =
if_else(sum((ForecastDF$Week == ForecastDF$Week[i] -1) * (ForecastDF$TeamID == ForecastDF$OppID[i])) > 0,
max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -1 & ForecastDF$TeamID == ForecastDF$OppID[i]]),
max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -2 & ForecastDF$TeamID == ForecastDF$OppID[i]]))
ForecastDF$Pre_Prob[i] =
ForecastDF$Pre_RPI[i] + (ForecastDF$Pre_RPI[i] * (0.5 - ForecastDF$OppPre_RPI[i]))
ForecastDF$Vegas_Prob[i] =
if_else(is.na(ForecastDF$Vegas_Prob[i]), 0, ForecastDF$Vegas_Prob[i])
# Old Version weighted total of prior weeks
# ForecastDF$Post_Prob[i] =
# if_else(ForecastDF$Vegas_Prob[i] > 0,
# (ForecastDF$Vegas_Prob[i] + ForecastDF$Pre_Prob[i] * ForecastDF$Week[i]) / (ForecastDF$Week[i] + 1),
# ForecastDF$Pre_Prob[i])
# New Version weights prior weeks equal to current odds
ForecastDF$Post_Prob[i] =
if_else(ForecastDF$Vegas_Prob[i] > 0,
(ForecastDF$Vegas_Prob[i] + ForecastDF$Pre_Prob[i]) / 2,
ForecastDF$Pre_Prob[i])
ForecastDF$Post_RPI[i] =
if_else(ForecastDF$Vegas_Prob[i] > 0,
(ForecastDF$Vegas_Prob[i] / (-1 * ForecastDF$OppPre_RPI[i] + 1.5) + ForecastDF$Pre_RPI[i] * ForecastDF$Week[i])
/ (ForecastDF$Week[i] + 1),
ForecastDF$Pre_RPI[i])
}
# Maybe archive and retrieve ForecastDF at this point?
# saveRDS(ForecastDF, "C:\\Users\\Chris.Woolery\\Documents\\2021\\ForecastDF.RDS")
head(ForecastDF, 5) %>%
kable("html", escape = FALSE,
caption = paste0('<p style="color:black; font-size:18px">',
'2021 NFL Schedule - Forecasted Outcomes',
'</p>')) %>%
kable_styling("striped",
bootstrap_options = c("hover", "condensed"),
full_width = T) %>%
row_spec(0, color = "white", background = "black")
| OutcomeID | Week | Date | TeamID | TeamName_VegasInsider | Pre_RPI | OppID | OppPre_RPI | Pre_Prob | Vegas_Prob | Post_Prob | Post_RPI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 01001 | 1 | 2021-09-09 | DAL | Dallas | 0.55 | TAM | 0.80 | 0.3850 | 0.2272727 | 0.3061364 | 0.4373377 |
| 01002 | 1 | 2021-09-09 | TAM | Tampa Bay | 0.80 | DAL | 0.55 | 0.7600 | 0.8181818 | 0.7890909 | 0.8306220 |
| 01003 | 1 | 2021-09-12 | PHI | Philadelphia | 0.30 | ATL | 0.40 | 0.3300 | 0.4081633 | 0.3690816 | 0.3355288 |
| 01004 | 1 | 2021-09-12 | ATL | Atlanta | 0.40 | PHI | 0.30 | 0.4800 | 0.6363636 | 0.5581818 | 0.4651515 |
| 01005 | 1 | 2021-09-12 | NYJ | N.Y. Jets | 0.25 | CAR | 0.35 | 0.2875 | 0.3921569 | 0.3398284 | 0.2955030 |
Here you can see:
Pre_RPI : The RPI prior to considering the Vegas line for this weeks’ games. Remember, the RPI is the relative probability of that team beating the average team in the league.
OppPre_RPI : The Pre_RPI for the opposing team.
Pre_Prob : The probability of the selected team beating the opponent that week prior to considering the Vegas line. For example a team with a RPI of 0.80 playing a team with a RPI of 0.55 has a 76% chance of winning.
Vegas_Prob : The 6 site consensus Vegas money line for the game (if available), converted to probability.
Post_Prob : The Pre_Prob adjusted by the Vegas_Prob where both are weighted equally (This is a change after week 2, previously the Pre_Prob was weighted by the number of weeks results it is based on while the Vegas_Prob was weighted at 1. This resulted in too much weighting to the pre-season rankings).
Post_RPI : The Post_Prob converted back to RPI.
Now what we want to do is use the Post_Prob from each week to create a system of linear equations to maximize. In other words, we want to maximize our total probability of winning.
Let’s look at how we would represent this mathematically:
Where :
p = The probability of winning a specific game
Z = The total probability of picking the winning team
t = Subscript, team id
w = Subscript, week number
Solve for :
x = 1 pick the game, 0 don’t pick the game
So we will simultaneously solve the following series of 32 (one for each team) equations with 18 (the weeks in the regular season) variables.
\[ Z_{t1} = p_{t1w1}*x_{t1w1} + p_{t1w2}*x_{t1w2} + p_{t1w3}*x_{t1w3} ... p_{t1w18}*x_{t1w18}\\ Z_{t2} = p_{t2w1}*x_{t2w1} + p_{t2w2}*x_{t2w2} + p_{t2w3}*x_{t2w3} ... p_{t2w18}*x_{t2w18}\\ Z_{t3} = p_{t3w1}*x_{t3w1} + p_{t3w2}*x_{t3w2} + p_{t3w3}*x_{t3w3} ... p_{t3w18}*x_{t3w18}\\ ...\\ Z_{t32} = p_{t32w1}*x_{t32w1} + p_{t32w2}*x_{t32w2} + p_{t32w3}*x_{t32w3} ... p_{t32w18}*x_{t32w18} \]
Which can be written as: \[ \max Z = \sum_{t=1}^{32}\sum_{w=1}^{18}p_{tw}x_{tw} \]
Subject to these constraints: \[ x_{t1w1} + x_{t1w2} + x_{t1w3} ... x_{t1w18} = 1\\ x_{t1w1} + x_{t2w1} + x_{t3w1} ... x_{t32w1} = 1\\ \]
This type of problem is not your typical linear programming scenario, in fact, it looks a lot like a “Transportation problem” https://en.wikipedia.org/wiki/Transportation_theory_(mathematics).
So in bridging the concepts here, we can see that each team “supplies” a win while each week “demands” a win. Also, week 12 demands two wins.
Therefore we will hijack the “lp.transport” method from the “lpsolve” package where:
The probabilities are the “costs” that we want to maximize.
Each team supplies at most 1 game for all 18 weeks.
Each week demands 1 game for weeks 1-11 and 13-18 with week 12 demanding 2 games
# Thanks to this article for tips on how to use lpsolve
# https://towardsdatascience.com/operations-research-in-r-transportation-problem-1df59961b2ad
# Set-up a 'cost' matrix which is the odds of winning. Odds are zero in a bye week so override
# Teams are the rows or Suppliers in this transportation matrix
# weeks are the columns or customers
costs = ForecastDF %>%
dplyr::select(TeamID, Week, Pre_Prob) %>%
mutate(Pre_Prob = if_else(is.na(Pre_Prob), 0, Pre_Prob)) %>%
pivot_wider(names_from = Week, values_from = Pre_Prob, values_fill = 0) %>%
as.matrix()
rownames(costs) = costs[,1]
costs = costs[,-1]
# Build in picked weeks and teams
complete_weeks = as.numeric(c(1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 12))
picked_teams = as.character(c("SFO", "GNB", "RAV", "BUF", "MIN",
"CLT", "NWE", "KAN", "MIA", "PIT",
"CLE", "DAL", "TAM"))
CompletedPicksDF = tibble(
TeamID = picked_teams,
Week = complete_weeks,
Picked = rep(-1, length(complete_weeks))
)
# Suppliers constraints are <= their supply
row.signs = rep("<=", 32)
# In our case that's games supplied to win are 1 per each of the 32 teams
row.rhs = rep(1, 32)
# Zero out for teams already picked in prior weeks
row.rhs = row.rhs * (!rownames(costs) %in% picked_teams)
# Customer constraints are = their demand
col.signs = rep("=", 18)
# In our case that's games demanded to win are 1 per weeks 1-11 and 13-16 plus 2 in week 12
col.rhs = c(rep(1, 11), 2, rep(1, 6))
# Zero out for weeks already picked
col.rhs = col.rhs * (!colnames(costs) %in% complete_weeks)
# run the solution
SolutionMatrix = lp.transport(costs, "max", row.signs, row.rhs, col.signs, col.rhs)$solution
rownames(SolutionMatrix) = rownames(costs)
colnames(SolutionMatrix) = colnames(costs)
PicksDF = SolutionMatrix %>%
as_tibble() %>%
mutate(TeamID = rownames(SolutionMatrix)) %>%
dplyr::select(TeamID, everything()) %>%
pivot_longer(-TeamID, names_to = "Week", values_to = "Picked") %>%
filter(Picked > 0) %>%
mutate(Week = as.numeric(Week)) %>%
bind_rows(CompletedPicksDF) %>%
arrange(Week, TeamID)
PicksDisplayDF = ForecastDF %>%
dplyr::select(Week, TeamID, Post_Prob) %>%
left_join(PicksDF, by = c("Week", "TeamID")) %>%
mutate(TeamID = factor(TeamID),
Picked = factor(
case_when(
Picked == -1 ~ "Past Pick - Winner",
Picked == 1 ~ "Future Pick",
TRUE ~ "Not Picked"), levels = c("Past Pick - Winner", "Past Pick - Loser",
"Future Pick", "Not Picked"))) %>%
arrange(TeamID, Week) %>%
left_join(ResultDF, by = c("TeamID" = "TeamID", "Week" = "Week_Num")) %>%
mutate(TeamID = factor(TeamID),
Category = case_when(
Post_Prob > 0.5 & Result == "Loser" ~ "Upset",
Post_Prob > 0.5 & Result == "Winner" ~ "Expected",
Post_Prob < 0.5 & Result == "Loser" ~ "Expected",
Post_Prob < 0.5 & Result == "Winner" ~ "Upset",
TRUE ~ as.character("Future")) %>%
factor(levels = c("Expected", "Upset", "Future")),
Result = as.character(Result),
Result = if_else(is.na(Result), "Future", Result) %>%
factor(levels=c("Winner", "Loser", "Future")),
Picked = if_else(Picked == "Past Pick - Winner" & Result == "Loser",
"Past Pick - Loser",
as.character(Picked)),
Picked = factor(Picked, levels = c("Past Pick - Winner", "Past Pick - Loser",
"Future Pick", "Not Picked"))
)
Here is what our pick matrix looks like:
PicksDisplayDF %>%
ggplot(aes(Week, TeamID, fill = Picked)) +
geom_tile() +
geom_text(aes(label = percent(Post_Prob, accuracy = 1), color = Picked, size = Picked)) +
scale_fill_manual(values = c("black", "dark red", "dark blue", "white")) +
scale_color_manual(values = c("white", "white", "white", "black")) +
scale_size_manual(values = c(4, 4, 4, 4)) +
scale_x_continuous(breaks = seq(1:18)) +
scale_y_discrete(limits = rev(levels(PicksDisplayDF$TeamID))) +
theme_classic() +
theme(legend.position="none") +
labs(title = "NFL Survivor Pool Picks by Winning Probability",
subtitle = "Version 1 : Equal Weighting, 2 Picks Week 12, 1 Pick All Other Weeks",
y = "",
caption = "Black highlight indicates past winning pick, Red highlight means past losing pick")
This solution represents the highest cumulative probability of winning for the 18 weeks.
Just to throw in a twist, there is a one time “Mulligan” option for the first 10 weeks in our pool where you can get back into the pool if your team looses; all prior picks still stand.
Given this stipulation, we would consider giving greater weight to 11-18 where a loss is final. That’s just what we have done with this pick matrix:
# Because the pool rules allow for one loss and a re-entry in weeks 1:10 then the
# games in week 11:18 need to be weighted with higher importance
# After Week 10, this code was edited to optimize only through the next 2 weeks beyond results
# WeightingFactor = 2
CapWeek = max(ResultDF$Week_Num) + 2
costs_wg = ForecastDF %>%
filter(Week <= CapWeek) %>%
# mutate(Pre_Prob = if_else(Week > 10, Pre_Prob * WeightingFactor, Pre_Prob)) %>%
dplyr::select(TeamID, Week, Pre_Prob) %>%
mutate(Pre_Prob = if_else(is.na(Pre_Prob), 0, Pre_Prob)) %>%
pivot_wider(names_from = Week, values_from = Pre_Prob, values_fill = 0) %>%
as.matrix()
rownames(costs_wg) = costs_wg[,1]
costs_wg = costs_wg[,-1]
# Customer constraints are = their demand
col.signs = rep("=", CapWeek)
# In our case that's games demanded to win are 1 per weeks 1-11 and 13-16 plus 2 in week 12
col.rhs = c(rep(1, 11), 2)
AddWks = CapWeek - length(col.rhs)
col.rhs = c(col.rhs, rep(1, AddWks))
# Zero out for weeks already picked
col.rhs = col.rhs * (!colnames(costs_wg) %in% complete_weeks)
SolutionMatrix_wg = lp.transport(costs_wg, "max", row.signs, row.rhs, col.signs, col.rhs)$solution
rownames(SolutionMatrix_wg) = rownames(costs_wg)
colnames(SolutionMatrix_wg) = colnames(costs_wg)
PicksDF_wg = SolutionMatrix_wg %>%
as_tibble() %>%
mutate(TeamID = rownames(SolutionMatrix)) %>%
dplyr::select(TeamID, everything()) %>%
pivot_longer(-TeamID, names_to = "Week", values_to = "Picked") %>%
filter(Picked > 0) %>%
mutate(Week = as.numeric(Week)) %>%
bind_rows(CompletedPicksDF) %>%
arrange(Week, TeamID)
PicksDisplayDF_wg = ForecastDF %>%
dplyr::select(Week, TeamID, Post_Prob) %>%
left_join(PicksDF_wg, by = c("Week", "TeamID")) %>%
mutate(TeamID = factor(TeamID),
Picked = factor(
case_when(
Picked == -1 ~ "Past Pick - Winner",
Picked == 1 ~ "Future Pick",
TRUE ~ "Not Picked"), levels = c("Past Pick - Winner", "Past Pick - Loser",
"Future Pick", "Not Picked"))) %>%
arrange(TeamID, Week) %>%
left_join(ResultDF, by = c("TeamID" = "TeamID", "Week" = "Week_Num")) %>%
mutate(TeamID = factor(TeamID),
Category = case_when(
Post_Prob > 0.5 & Result == "Loser" ~ "Upset",
Post_Prob > 0.5 & Result == "Winner" ~ "Expected",
Post_Prob < 0.5 & Result == "Loser" ~ "Expected",
Post_Prob < 0.5 & Result == "Winner" ~ "Upset",
TRUE ~ as.character("Future")) %>%
factor(levels = c("Expected", "Upset", "Future")),
Result = as.character(Result),
Result = if_else(is.na(Result), "Future", Result) %>%
factor(levels=c("Winner", "Loser", "Future")),
Picked = if_else(Picked == "Past Pick - Winner" & Result == "Loser",
"Past Pick - Loser",
as.character(Picked)),
Picked = factor(Picked, levels = c("Past Pick - Winner", "Past Pick - Loser",
"Future Pick", "Not Picked"))
)
PicksDisplayDF_wg %>%
ggplot(aes(Week, TeamID, fill = Picked)) +
geom_tile() +
geom_text(aes(label = percent(Post_Prob, accuracy = 1), color = Picked, size = Picked)) +
scale_fill_manual(values = c("black", "dark red", "dark green", "white")) +
scale_color_manual(values = c("white", "white", "white", "black")) +
scale_size_manual(values = c(4, 4, 4, 4)) +
scale_x_continuous(breaks = seq(1:18)) +
scale_y_discrete(limits = rev(levels(PicksDisplayDF$TeamID))) +
theme_classic() +
theme(legend.position="none") +
labs(title = "NFL Survivor Pool Picks by Winning Probability",
subtitle = "Version 2 : Maximizes Winning Probability at Current Week + 2",
y = "",
caption = "Black highlight indicates past winning pick, Red highlight means past losing pick")
Version 1 - No Weighting
Weeks 1-10 Mean Probability : 77.6%
Weeks 11+ Mean Probability : 73.0%
Version 2 - Capped at Current Week +2
Weeks 1-10 Mean Probability : 77.6%
Weeks 11+ Mean Probability : 74.3%
With all of this analytic power, what are the odds of surviving the entire regular season? Not that great as it turns out:
# Hey, you peaked at the code!
# For your troubles, here is a nice article showing the typical survival odds given a pool size
# https://www.teamrankings.com/nfl-survivor-pool-picks/articles/expected-length-of-nfl-survivor-pools/
PicksDisplayDF %>%
mutate(Latest_Probability = if_else(Category != "Future", 1, Post_Prob)) %>%
filter(Picked != "Not Picked") %>%
arrange(Week) %>%
dplyr::select(Week, Post_Prob, Latest_Probability) %>%
rename(Pick_Probability = Post_Prob) %>%
mutate(Starting_Probability = percent(cumprod(Pick_Probability), accuracy = .1),
Latest_Probability = percent(cumprod(Latest_Probability), accuracy = .1),
Pick_Probability = percent(Pick_Probability, accuracy = .1)) %>%
kable("html", escape = FALSE,
caption = paste0('<p style="color:black; font-size:18px">',
'2021 NFL Schedule - Forecasted Probability (Unweighted Model)',
'</p>')) %>%
kable_styling("striped",
bootstrap_options = c("hover", "condensed"),
full_width = T) %>%
row_spec(0, color = "white", background = "black")
| Week | Pick_Probability | Latest_Probability | Starting_Probability |
|---|---|---|---|
| 1 | 79.0% | 100.0% | 79.0% |
| 2 | 86.6% | 100.0% | 68.4% |
| 3 | 80.6% | 100.0% | 55.2% |
| 4 | 94.2% | 100.0% | 52.0% |
| 5 | 71.1% | 100.0% | 36.9% |
| 6 | 69.8% | 100.0% | 25.8% |
| 7 | 70.4% | 100.0% | 18.2% |
| 8 | 84.4% | 100.0% | 15.3% |
| 9 | 64.6% | 100.0% | 9.9% |
| 10 | 75.2% | 100.0% | 7.4% |
| 11 | 78.3% | 100.0% | 5.8% |
| 12 | 68.6% | 100.0% | 4.0% |
| 12 | 68.7% | 100.0% | 2.7% |
| 13 | 86.0% | 86.0% | 2.4% |
| 14 | 69.0% | 59.4% | 1.6% |
| 15 | 80.9% | 48.0% | 1.3% |
| 16 | 72.2% | 34.7% | 1.0% |
| 17 | 65.4% | 22.7% | 0.6% |
| 18 | 68.2% | 15.5% | 0.4% |
Even making the best possible picks over the 18 weeks, there is less than a half percent chance of making it beyond week #16!
Speaking of results, let’s look at the season to date probabilities versus the actual results.
UpsetString = PicksDisplayDF %>%
filter(Category != "Future") %>%
mutate(UpsetCount = if_else(Category == "Upset", 1, 0)) %>%
group_by(Week) %>%
summarize(UpsetRate = percent(sum(UpsetCount) / n()), .groups = "drop") %>%
pull(UpsetRate)
UpsetString = paste(rep("Week ", length(UpsetString)), seq(1, length(UpsetString), by = 1),
rep(": ", length(UpsetString)),
UpsetString,
rep(", ", length(UpsetString)), sep = "", collapse = "")
PicksDisplayDF %>%
ggplot(aes(Week, TeamID, fill = Category)) +
geom_tile() +
geom_text(aes(label = percent(Post_Prob, accuracy = 1), color = Category)) +
scale_fill_manual(values = c("black", "dark red", "white")) +
scale_color_manual(values = c("white", "white", "grey30")) +
scale_size_manual(values = c(4, 4, 4)) +
scale_x_continuous(breaks = seq(1:18)) +
scale_y_discrete(limits = rev(levels(PicksDisplayDF$TeamID))) +
theme_classic() +
theme(legend.position="none") +
labs(title = "NFL Actual Results Versus Winning Probability",
subtitle = "Upsets in RED, Expected in BLACK, Future games not highlighted",
y = "",
caption = paste0("Upset Rate : ", UpsetString))
Finally, for some additional fun I have prepared a simple Underdog betting model. For each week the model picks up to 2 teams that meet the following criteria.
# QA
UnderDog_DF = ForecastDF %>%
mutate(Est_Return = Pre_Prob * ProbToLine(Vegas_Prob),
Vegas_Line = ProbToLine(Vegas_Prob)) %>%
dplyr::select(Week,TeamID, OppID, Pre_Prob, Vegas_Prob, Vegas_Line, Est_Return) %>%
filter(Vegas_Prob != 0 & Pre_Prob > 0.5 & Est_Return > 0) %>%
group_by(Week) %>%
slice_max(order_by = Est_Return, n = 2) %>%
inner_join(PicksDisplayDF %>%
dplyr::select(Week, TeamID, Result),
by = c("Week", "TeamID")) %>%
ungroup() %>%
mutate(ActualResult = case_when(
Result == "Winner" ~ Vegas_Line,
Result == "Loser" ~ -100,
TRUE ~ 0),
CumResult = cumsum(ActualResult))
UnderDog_DF %>%
kable("html", escape = FALSE,
caption = paste0('<p style="color:black; font-size:18px">',
'Underdog Betting Model',
'</p>')) %>%
kable_styling("striped",
bootstrap_options = c("hover", "condensed"),
full_width = T) %>%
row_spec(0, color = "white", background = "black")
| Week | TeamID | OppID | Pre_Prob | Vegas_Prob | Vegas_Line | Est_Return | Result | ActualResult | CumResult |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SDG | WAS | 0.5250000 | 0.4878049 | 105 | 55.12500 | Winner | 105 | 105 |
| 2 | OTI | SEA | 0.5367405 | 0.3076923 | 225 | 120.76661 | Winner | 225 | 330 |
| 2 | MIN | CRD | 0.5065015 | 0.3846154 | 160 | 81.04025 | Loser | -100 | 230 |
| 3 | CLT | OTI | 0.5028491 | 0.3333333 | 200 | 100.56981 | Loser | -100 | 130 |
| 3 | GNB | SFO | 0.6303683 | 0.4081633 | 145 | 91.40340 | Winner | 145 | 275 |
| 4 | SEA | SFO | 0.5832676 | 0.4347826 | 130 | 75.82478 | Winner | 130 | 405 |
| 5 | SFO | CRD | 0.6065645 | 0.3225806 | 210 | 127.37853 | Loser | -100 | 305 |
| 5 | BUF | KAN | 0.5354509 | 0.4444444 | 125 | 66.93136 | Winner | 125 | 430 |
| 6 | SEA | PIT | 0.6147124 | 0.3448276 | 190 | 116.79536 | Loser | -100 | 330 |
| 6 | NWE | DAL | 0.5505174 | 0.3773585 | 165 | 90.83538 | Loser | -100 | 230 |
| 7 | SEA | NOR | 0.5590839 | 0.3448276 | 190 | 106.22594 | Loser | -100 | 130 |
| 7 | DEN | CLE | 0.5113618 | 0.3921569 | 155 | 79.26108 | Loser | -100 | 30 |
| 8 | GNB | CRD | 0.5895359 | 0.2941176 | 240 | 141.48862 | Winner | 240 | 270 |
| 8 | NWE | SDG | 0.5361000 | 0.3125000 | 220 | 117.94200 | Winner | 220 | 490 |
| 9 | DEN | DAL | 0.5293431 | 0.2127660 | 370 | 195.85696 | Winner | 370 | 860 |
| 9 | CLE | CIN | 0.6453802 | 0.4545455 | 120 | 77.44563 | Winner | 120 | 980 |
| 10 | SEA | GNB | 0.5048446 | 0.3636364 | 175 | 88.34780 | Loser | -100 | 880 |
| 10 | NOR | OTI | 0.5662178 | 0.4166667 | 140 | 79.27050 | Loser | -100 | 780 |
| 11 | PIT | SDG | 0.5390249 | 0.3571429 | 180 | 97.02448 | Loser | -100 | 680 |
| 11 | NOR | PHI | 0.6152988 | 0.4878049 | 105 | 64.60638 | Loser | -100 | 580 |
| 12 | OTI | NWE | 0.5149173 | 0.3225806 | 210 | 108.13264 | Loser | -100 | 480 |
| 12 | CLE | RAV | 0.5088362 | 0.3508772 | 185 | 94.13469 | Loser | -100 | 380 |
| 13 | SDG | CIN | 0.5690518 | 0.4000000 | 150 | 85.35777 | Future | 0 | 380 |
| 14 | PIT | MIN | 0.5271081 | 0.3773585 | 165 | 86.97284 | Future | 0 | 380 |
| 14 | BUF | TAM | 0.5438447 | 0.4000000 | 150 | 81.57670 | Future | 0 | 380 |
I certainly wouldn’t bet real money based on a relatively simple model but it’s fun to track the results.
In closing, enjoy the NFL season and..
Choose your picks with savvy