NFL Survivor Project

Background : We decided to have an NFL “Survivor League” for our fall office pool this year (2021-2022). The rules are fairly simple.

Pick 1 winning team for each week of the 18 game regular season. If your team wins or ties, you advance to the next week. Week 12 is a special case where you have to pick two teams and both have to win in order to advance
You can only pick each team one time. Therefore you have to pick 19 teams total.
The last survivor takes 75% of the money in the pool while the runner up takes 25%. If there are ties, they divide the money accordingly.

So the question is, how can we use analytics to make the best picks?

Note : This article, like most of mine on R-Pubs, is both instructional and investigative. You can re-create any of these charts and tables from the embedded R code. Click the “Code” button to view.

First we will load in the 2021 schedule from www.pro-football-reference.com where we can get both the schedule and results.

Below is an example of the table we built where each row represents a possible outcome of a scheduled game.

## 2 - Load Data - Schedules

SchedURL = "https://www.pro-football-reference.com/years/2021/games.htm"
SchedYear = "2021"

ScheduleDF = bind_cols(
 tibble(
  Week_Num = SchedURL %>% read_html() %>% html_nodes('*[data-stat="week_num"]') %>% html_text(),
  Date = SchedURL %>% read_html() %>% html_nodes('*[data-stat="game_date"]') %>% html_text(),
  VisTm = SchedURL %>% read_html() %>% html_nodes('*[data-stat="winner"]') %>% html_text(),
  HomeTm = SchedURL %>% read_html() %>% html_nodes('*[data-stat="loser"]') %>% html_text(),
  GameLoc = SchedURL %>% read_html() %>% html_nodes('*[data-stat="game_location"]') %>% html_text(),
  PtsWin = SchedURL %>% read_html() %>% html_nodes('*[data-stat="pts_win"]') %>% html_text()) %>%
filter(Date != "Date"), 
 tibble(
  VisRef = SchedURL %>% read_html() %>% html_nodes('*[data-stat="winner"] a') %>% html_attr("href"),
  HomeRef = SchedURL %>% read_html() %>% html_nodes('*[data-stat="loser"] a') %>% html_attr("href"))) %>%
  mutate(VisID = str_extract(VisRef, "(?<=/teams/).*(?=/)") %>% toupper(),
         HomeID = str_extract(HomeRef, "(?<=/teams/).*(?=/)") %>% toupper(),
         Date = paste0(Date) %>% as.Date(format = "%Y-%m-%d"),
         Winner = if_else(PtsWin != "", VisID, ""),
         SwapTm = if_else(GameLoc != "@", VisTm, ""),
         SwapID = if_else(GameLoc != "@", VisID, ""),
         VisTm = if_else(GameLoc != "@", HomeTm, VisTm),
         VisID = if_else(GameLoc != "@", HomeID, VisID),
         HomeTm = if_else(GameLoc != "@", SwapTm, HomeTm),
         HomeID = if_else(GameLoc != "@", SwapID, HomeID)) %>%
  dplyr::select(-SwapTm, -SwapID, -GameLoc, -PtsWin)

PlanDF = ScheduleDF %>%
  filter(Week_Num %in% as.character(seq(1, 18))) %>%
  mutate(V = VisID, 
         H = HomeID,
         Week_Num = as.numeric(Week_Num)) %>%
  dplyr::select(Week_Num, Date, VisID, HomeID, V, H) %>%
  pivot_longer(-c(Date, Week_Num, VisID, HomeID), 
               names_to = "Version", values_to = "Winner") %>%
  group_by(Week_Num) %>%
  mutate(OutcomeID = paste0(str_pad(Week_Num, 2, "0", side = c("left")),
                         str_pad(row_number(), 3, "0", side = c("left")))) %>%
  ungroup()

ResultDF = ScheduleDF %>%
  filter(Winner != "" & !is.na(Winner)) %>%
  mutate(V = VisID, 
         H = HomeID,
         Week_Num = as.numeric(Week_Num)) %>%
  dplyr::select(Week_Num, Date, VisID, HomeID, V, H, Winner) %>%
  pivot_longer(-c(Date, Week_Num, VisID, HomeID, Winner), 
               names_to = "Version", values_to = "TeamID") %>%
  mutate(Result = if_else(Winner == TeamID, "Winner", "Loser") %>%
           factor(levels = c("Winner", "Loser"))) %>%
  dplyr::select(Week_Num, TeamID, Result)

# PossibleDF = expand_grid(
#   "Winner01" = filter(PlanDF, Week_Num == 1)$Winner,
#   "Winner02" = filter(PlanDF, Week_Num == 2)$Winner,
#   "Winner03" = filter(PlanDF, Week_Num == 3)$Winner) %>%
#   filter(Winner01 != Winner02 & Winner01 != Winner03 & Winner02 != Winner03)


# VertSchedDF = PlanDF %>%
#   mutate(Opp = if_else(Winner==VisID, HomeID, VisID)) %>%
#   rename(Team = Winner, Place = Version) %>%
#   dplyr::select(Week_Num, Team, Place, Opp)

head(PlanDF, 10) %>%
  kable("html", escape = FALSE,
        caption = paste0('<p style="color:black; font-size:18px">',
        '2021 NFL Schedule - Possible Outcomes',
        '</p>')) %>%
    kable_styling("striped",
                  bootstrap_options = c("hover", "condensed")) %>%
    row_spec(0, color = "white", background = "black")

2021 NFL Schedule - Possible Outcomes
Week_Num	Date	VisID	HomeID	Version	Winner	OutcomeID
1	2021-09-09	DAL	TAM	V	DAL	01001
1	2021-09-09	DAL	TAM	H	TAM	01002
1	2021-09-12	PHI	ATL	V	PHI	01003
1	2021-09-12	PHI	ATL	H	ATL	01004
1	2021-09-12	NYJ	CAR	V	NYJ	01005
1	2021-09-12	NYJ	CAR	H	CAR	01006
1	2021-09-12	MIN	CIN	V	MIN	01007
1	2021-09-12	MIN	CIN	H	CIN	01008
1	2021-09-12	SEA	CLT	V	SEA	01009
1	2021-09-12	SEA	CLT	H	CLT	01010

Now we need to get some idea of what the probability is of each of these possible outcomes happening. We will build that by:

Taking the consensus ordinal pre-season ranking from three major sports networks (ESPN, CBS and Fox) to form our basis.
Using that pre-season ranking we will calculate a “Relative Probability Index” or RPI (we’re borrowing the acronym) of each team’s probability of beating the average team in the league. After looking at the data for prior years and rounding for simplicity, the best team in the league has roughly an 80% chance of beating an average team while the worst team in the league has roughly a 20% chance.

Of course those pre-season expectations change as the reality of performance, injuries and other factors impact the actual results of games. Therefore, as the season progresses, we need to update our RPI. Rather than build a complicated “bottom-up” model based on hundreds of factors, we’re going to keep it simple and follow the money. Specifically the odds set by the major online betting platforms.

We will take the consensus money line from the 6 major online betting platforms each week and use that to adjust our RPI and game probabilities. This will be a continuous weighted adjustment so the RPI for each team will shift gradually rather than being fully overwritten by the latest money line.

## Load Data - Vegas Odds and Pre-Season Rankings

OddsUrl = "https://www.vegasinsider.com/nfl/odds/las-vegas/money/"

RawTable = OddsUrl %>%
  read_html() %>%
  html_node('*[class="frodds-data-tbl"]') %>%
  html_table(fill = TRUE)

OddsTable = RawTable %>%
  filter(X10 != "") %>%
  mutate(DateText = str_extract(X1, "\\d\\d\\/\\d\\d"),
         MonthText = str_sub(DateText,1,2),
         TimeText = str_extract(X1, "(?<=\\d\\d\\/\\d\\d\\s).*(?=\\n)"),
         YearText = if_else(as.numeric(MonthText)<=2, year(Sys.Date()) + 1, year(Sys.Date())),
         GameTime = paste0(DateText, "/", YearText, TimeText) %>%
           as.POSIXct(format = "%m/%d/%Y %H:%M"),
         VisTm = str_extract(X1, "(?<=\\d\\d\\d.).*"),
         HomeTm = str_replace(X1, "(?<=\\d\\d\\d.).*", "") %>%
                  str_extract("(?<=\\d\\d\\d.).*$"),
         VisLine = str_extract(X10, "(?<=^)\\-\\d*|(?<=(^\\+))\\d*") %>% as.numeric() ,
         HomeLine = str_extract(X10, "\\-\\d*(?<=$)|(?<=(\\+))\\d*(?<=$)") %>% as.numeric(),
         VisProb = LineToProb(VisLine),
         HomeProb = LineToProb(HomeLine),
         ProbGap = (VisProb + HomeProb) - 1) %>%
  dplyr::select(GameTime, VisTm, HomeTm, VisLine, HomeLine, VisProb, HomeProb, ProbGap)

# Need to store and lock game odds so that we don't loose games that already happened

ArchiveOdds = readRDS("C:\\Users\\Chris.Woolery\\Documents\\2021\\ArchiveOdds.RDS")

NewArchiveOdds = bind_rows(
  OddsTable %>%
  mutate(ArchiveDate = Sys.time()) %>%
  dplyr::select(ArchiveDate, everything()),
  ArchiveOdds) %>%
  group_by(GameTime, VisTm) %>%
  mutate(MaxArchiveDate = max(ArchiveDate)) %>%
  ungroup() %>%
  arrange(GameTime, VisTm) %>%
  filter(ArchiveDate == MaxArchiveDate) %>%
  dplyr::select(-MaxArchiveDate) %>%
  distinct()

saveRDS(NewArchiveOdds, "C:\\Users\\Chris.Woolery\\Documents\\2021\\ArchiveOdds.RDS")

# Restore Odds Table with Old Odds Plus Latest Odds
OddsTable = NewArchiveOdds %>%
  dplyr::select(-ArchiveDate)

OddsTableLong = 
  bind_rows(OddsTable %>%
              mutate(Date = as.Date(GameTime),
                     TeamName_VegasInsider = VisTm,
                     Vegas_Prob = VisProb) %>%
              dplyr::select(Date, TeamName_VegasInsider, Vegas_Prob),
            OddsTable %>%
              mutate(Date = as.Date(GameTime),
                     TeamName_VegasInsider = HomeTm,
                     Vegas_Prob = HomeProb) %>%
              dplyr::select(Date, TeamName_VegasInsider, Vegas_Prob)) %>%
            arrange(Date, desc(Vegas_Prob))

TeamFile = "C:\\Users\\Chris.Woolery\\Documents\\2021\\NF Misc.xlsx"

TeamDF = read_xlsx(TeamFile, sheet = "Presentation") %>%
  filter(!is.na(TeamID))

PlanDF = PlanDF %>%
  left_join(TeamDF %>%
              dplyr::select(TeamID, TeamName_VegasInsider),
            by = c("Winner" = "TeamID"))

ForecastDF = TeamDF %>%
  mutate(Week = 1) %>% 
  rename(Pre_RPI = PS_RPI) %>%
  dplyr::select(-TeamName_ProFBRef, -TeamName_VegasInsider) %>%
  right_join(PlanDF, by = c("TeamID" = "Winner", "Week" = "Week_Num")) %>%
  mutate(OppID = if_else(VisID == TeamID, HomeID, VisID)) %>%
  dplyr::select(OutcomeID, Week, Date, TeamID, TeamName_VegasInsider, Pre_RPI, OppID) %>%
  left_join(TeamDF %>%
              mutate(Week = 1) %>% 
              rename(OppPre_RPI = PS_RPI) %>%
              dplyr::select(Week, TeamID, OppPre_RPI),
            by = c("OppID" = "TeamID", "Week" = "Week")) %>%
  mutate(Pre_Prob = Pre_RPI + (Pre_RPI * (0.5 - OppPre_RPI))) %>%
  left_join(OddsTableLong,
            by = c("Date", "TeamName_VegasInsider")) %>%
  mutate(Post_Prob = if_else(Vegas_Prob > 0,
                             (Vegas_Prob + Pre_Prob * Week) / (Week + 1),
                             Pre_Prob),
         Post_RPI = if_else(Vegas_Prob > 0, 
                            (Vegas_Prob / (-1 * OppPre_RPI + 1.5) + Pre_RPI * Week) / (Week + 1),
                            Pre_RPI)) %>%
  arrange(OutcomeID)

# Loop through and populate Pre/Post Odds
StartRow = 33
for (i in StartRow :nrow(ForecastDF)) {
  ForecastDF$TeamName_VegasInsider[i] = 
    if_else(sum((ForecastDF$Week == ForecastDF$Week[i] -1) * (ForecastDF$TeamID == ForecastDF$TeamID[i])) > 0,
      max("",ForecastDF$TeamName_VegasInsider[ForecastDF$Week == ForecastDF$Week[i] -1 
                              & ForecastDF$TeamID == ForecastDF$TeamID[i]]),
      max("",ForecastDF$TeamName_VegasInsider[ForecastDF$Week == ForecastDF$Week[i] -2 
                              & ForecastDF$TeamID == ForecastDF$TeamID[i]]))
  ForecastDF$Pre_RPI[i] = 
    if_else(sum((ForecastDF$Week == ForecastDF$Week[i] -1) * (ForecastDF$TeamID == ForecastDF$TeamID[i])) > 0,
      max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -1 & ForecastDF$TeamID == ForecastDF$TeamID[i]]),
      max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -2 & ForecastDF$TeamID == ForecastDF$TeamID[i]]))
  ForecastDF$OppPre_RPI[i] = 
    if_else(sum((ForecastDF$Week == ForecastDF$Week[i] -1) * (ForecastDF$TeamID == ForecastDF$OppID[i])) > 0,
      max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -1 & ForecastDF$TeamID == ForecastDF$OppID[i]]),
      max(0, ForecastDF$Post_RPI[ForecastDF$Week == ForecastDF$Week[i] -2 & ForecastDF$TeamID == ForecastDF$OppID[i]]))
  ForecastDF$Pre_Prob[i] = 
    ForecastDF$Pre_RPI[i] + (ForecastDF$Pre_RPI[i] * (0.5 - ForecastDF$OppPre_RPI[i])) 
  ForecastDF$Vegas_Prob[i] = 
    if_else(is.na(ForecastDF$Vegas_Prob[i]), 0, ForecastDF$Vegas_Prob[i])
  # Old Version weighted total of prior weeks
  # ForecastDF$Post_Prob[i] = 
  #   if_else(ForecastDF$Vegas_Prob[i] > 0, 
  #     (ForecastDF$Vegas_Prob[i] + ForecastDF$Pre_Prob[i] * ForecastDF$Week[i]) / (ForecastDF$Week[i] + 1),
  #     ForecastDF$Pre_Prob[i])
  # New Version weights prior weeks equal to current odds
  ForecastDF$Post_Prob[i] = 
    if_else(ForecastDF$Vegas_Prob[i] > 0, 
      (ForecastDF$Vegas_Prob[i] + ForecastDF$Pre_Prob[i]) / 2,
      ForecastDF$Pre_Prob[i])
  ForecastDF$Post_RPI[i] = 
    if_else(ForecastDF$Vegas_Prob[i] > 0, 
      (ForecastDF$Vegas_Prob[i] / (-1 * ForecastDF$OppPre_RPI[i] + 1.5) + ForecastDF$Pre_RPI[i] * ForecastDF$Week[i]) 
        / (ForecastDF$Week[i] + 1),
      ForecastDF$Pre_RPI[i])
}

# Maybe archive and retrieve ForecastDF at this point?
# saveRDS(ForecastDF, "C:\\Users\\Chris.Woolery\\Documents\\2021\\ForecastDF.RDS")

head(ForecastDF, 5) %>%
  kable("html", escape = FALSE,
        caption = paste0('<p style="color:black; font-size:18px">',
        '2021 NFL Schedule - Forecasted Outcomes',
        '</p>')) %>%
    kable_styling("striped",
                  bootstrap_options = c("hover", "condensed"),
                  full_width = T) %>%
    row_spec(0, color = "white", background = "black")

2021 NFL Schedule - Forecasted Outcomes
OutcomeID	Week	Date	TeamID	TeamName_VegasInsider	Pre_RPI	OppID	OppPre_RPI	Pre_Prob	Vegas_Prob	Post_Prob	Post_RPI
01001	1	2021-09-09	DAL	Dallas	0.55	TAM	0.80	0.3850	0.2272727	0.3061364	0.4373377
01002	1	2021-09-09	TAM	Tampa Bay	0.80	DAL	0.55	0.7600	0.8181818	0.7890909	0.8306220
01003	1	2021-09-12	PHI	Philadelphia	0.30	ATL	0.40	0.3300	0.4081633	0.3690816	0.3355288
01004	1	2021-09-12	ATL	Atlanta	0.40	PHI	0.30	0.4800	0.6363636	0.5581818	0.4651515
01005	1	2021-09-12	NYJ	N.Y. Jets	0.25	CAR	0.35	0.2875	0.3921569	0.3398284	0.2955030

Here you can see:

Pre_RPI : The RPI prior to considering the Vegas line for this weeks’ games. Remember, the RPI is the relative probability of that team beating the average team in the league.
OppPre_RPI : The Pre_RPI for the opposing team.
Pre_Prob : The probability of the selected team beating the opponent that week prior to considering the Vegas line. For example a team with a RPI of 0.80 playing a team with a RPI of 0.55 has a 76% chance of winning.
Vegas_Prob : The 6 site consensus Vegas money line for the game (if available), converted to probability.
Post_Prob : The Pre_Prob adjusted by the Vegas_Prob where both are weighted equally (This is a change after week 2, previously the Pre_Prob was weighted by the number of weeks results it is based on while the Vegas_Prob was weighted at 1. This resulted in too much weighting to the pre-season rankings).
Post_RPI : The Post_Prob converted back to RPI.

Now what we want to do is use the Post_Prob from each week to create a system of linear equations to maximize. In other words, we want to maximize our total probability of winning.

Let’s look at how we would represent this mathematically:

Where :

p = The probability of winning a specific game

Z = The total probability of picking the winning team

t = Subscript, team id

w = Subscript, week number

Solve for :

x = 1 pick the game, 0 don’t pick the game

So we will simultaneously solve the following series of 32 (one for each team) equations with 18 (the weeks in the regular season) variables.

\[ Z_{t1} = p_{t1w1}*x_{t1w1} + p_{t1w2}*x_{t1w2} + p_{t1w3}*x_{t1w3} ... p_{t1w18}*x_{t1w18}\\ Z_{t2} = p_{t2w1}*x_{t2w1} + p_{t2w2}*x_{t2w2} + p_{t2w3}*x_{t2w3} ... p_{t2w18}*x_{t2w18}\\ Z_{t3} = p_{t3w1}*x_{t3w1} + p_{t3w2}*x_{t3w2} + p_{t3w3}*x_{t3w3} ... p_{t3w18}*x_{t3w18}\\ ...\\ Z_{t32} = p_{t32w1}*x_{t32w1} + p_{t32w2}*x_{t32w2} + p_{t32w3}*x_{t32w3} ... p_{t32w18}*x_{t32w18} \]

Which can be written as: \[ \max Z = \sum_{t=1}^{32}\sum_{w=1}^{18}p_{tw}x_{tw} \]

Subject to these constraints: \[ x_{t1w1} + x_{t1w2} + x_{t1w3} ... x_{t1w18} = 1\\ x_{t1w1} + x_{t2w1} + x_{t3w1} ... x_{t32w1} = 1\\ \]

This type of problem is not your typical linear programming scenario, in fact, it looks a lot like a “Transportation problem” https://en.wikipedia.org/wiki/Transportation_theory_(mathematics).

So in bridging the concepts here, we can see that each team “supplies” a win while each week “demands” a win. Also, week 12 demands two wins.

Therefore we will hijack the “lp.transport” method from the “lpsolve” package where:

The probabilities are the “costs” that we want to maximize.
Each team supplies at most 1 game for all 18 weeks.
Each week demands 1 game for weeks 1-11 and 13-18 with week 12 demanding 2 games

# Thanks to this article for tips on how to use lpsolve 
# https://towardsdatascience.com/operations-research-in-r-transportation-problem-1df59961b2ad
# Set-up a 'cost' matrix which is the odds of winning.  Odds are zero in a bye week so override
# Teams are the rows or Suppliers in this transportation matrix
# weeks are the columns or customers
costs = ForecastDF %>%
  dplyr::select(TeamID, Week, Pre_Prob) %>%
  mutate(Pre_Prob = if_else(is.na(Pre_Prob), 0, Pre_Prob)) %>%
  pivot_wider(names_from = Week, values_from = Pre_Prob, values_fill = 0) %>%
  as.matrix()
rownames(costs) = costs[,1]
costs = costs[,-1]

# Build in picked weeks and teams
complete_weeks = as.numeric(c(1, 2, 3, 4, 5, 
                              6, 7, 8, 9, 10,
                              11, 12, 12))
picked_teams = as.character(c("SFO", "GNB", "RAV", "BUF", "MIN", 
                              "CLT", "NWE", "KAN", "MIA", "PIT",
                              "CLE", "DAL", "TAM"))

CompletedPicksDF = tibble(
  TeamID = picked_teams,
  Week = complete_weeks,
  Picked = rep(-1, length(complete_weeks))
)

# Suppliers constraints are <= their supply
row.signs = rep("<=", 32)
# In our case that's games supplied to win are 1 per each of the 32 teams
row.rhs = rep(1, 32)
# Zero out for teams already picked in prior weeks
row.rhs = row.rhs * (!rownames(costs) %in% picked_teams)

# Customer constraints are = their demand  
col.signs = rep("=", 18)
# In our case that's games demanded to win are 1 per weeks 1-11 and 13-16 plus 2 in week 12
col.rhs = c(rep(1, 11), 2, rep(1, 6))
# Zero out for weeks already picked
col.rhs = col.rhs * (!colnames(costs) %in% complete_weeks)

# run the solution
SolutionMatrix = lp.transport(costs, "max", row.signs, row.rhs, col.signs, col.rhs)$solution
rownames(SolutionMatrix) = rownames(costs)
colnames(SolutionMatrix) = colnames(costs)

PicksDF = SolutionMatrix %>%
  as_tibble() %>%
  mutate(TeamID = rownames(SolutionMatrix)) %>%
  dplyr::select(TeamID, everything()) %>%
  pivot_longer(-TeamID, names_to = "Week", values_to = "Picked") %>%
  filter(Picked > 0) %>%
  mutate(Week = as.numeric(Week)) %>%
  bind_rows(CompletedPicksDF) %>%
  arrange(Week, TeamID)

PicksDisplayDF = ForecastDF %>%
  dplyr::select(Week, TeamID, Post_Prob) %>%
  left_join(PicksDF, by = c("Week", "TeamID")) %>%
  mutate(TeamID = factor(TeamID),
         Picked = factor(
           case_when(
             Picked == -1 ~ "Past Pick - Winner",
             Picked == 1 ~ "Future Pick",
             TRUE ~ "Not Picked"), levels = c("Past Pick - Winner", "Past Pick - Loser", 
                                              "Future Pick", "Not Picked"))) %>%
  arrange(TeamID, Week) %>%
  left_join(ResultDF, by = c("TeamID" = "TeamID", "Week" = "Week_Num")) %>%
  mutate(TeamID = factor(TeamID),
         Category = case_when(
            Post_Prob > 0.5 & Result == "Loser" ~ "Upset", 
            Post_Prob > 0.5 & Result == "Winner" ~ "Expected", 
            Post_Prob < 0.5 & Result == "Loser" ~ "Expected",
            Post_Prob < 0.5 & Result == "Winner" ~ "Upset",
            TRUE ~ as.character("Future")) %>%
              factor(levels = c("Expected", "Upset", "Future")),
         Result = as.character(Result),
         Result = if_else(is.na(Result), "Future", Result) %>%
              factor(levels=c("Winner", "Loser", "Future")),
         Picked = if_else(Picked == "Past Pick - Winner" & Result == "Loser", 
                          "Past Pick - Loser", 
                          as.character(Picked)),
         Picked = factor(Picked, levels = c("Past Pick - Winner", "Past Pick - Loser", 
                                              "Future Pick", "Not Picked"))
         )

Here is what our pick matrix looks like:

PicksDisplayDF %>%
  ggplot(aes(Week, TeamID, fill = Picked)) +
  geom_tile() +
  geom_text(aes(label = percent(Post_Prob, accuracy = 1), color = Picked, size = Picked)) +
  scale_fill_manual(values = c("black", "dark red", "dark blue", "white")) +
  scale_color_manual(values = c("white", "white", "white", "black")) +
  scale_size_manual(values = c(4, 4, 4, 4)) +
  scale_x_continuous(breaks = seq(1:18)) +
  scale_y_discrete(limits = rev(levels(PicksDisplayDF$TeamID))) +
  theme_classic() +
  theme(legend.position="none") +
  labs(title = "NFL Survivor Pool Picks by Winning Probability",
       subtitle = "Version 1 : Equal Weighting, 2 Picks Week 12, 1 Pick All Other Weeks",
       y = "",
       caption = "Black highlight indicates past winning pick, Red highlight means past losing pick")

This solution represents the highest cumulative probability of winning for the 18 weeks.

Just to throw in a twist, there is a one time “Mulligan” option for the first 10 weeks in our pool where you can get back into the pool if your team looses; all prior picks still stand.

Given this stipulation, we would consider giving greater weight to 11-18 where a loss is final. That’s just what we have done with this pick matrix:

# Because the pool rules allow for one loss and a re-entry in weeks 1:10 then the 
# games in week 11:18 need to be weighted with higher importance

# After Week 10, this code was edited to optimize only through the next 2 weeks beyond results

# WeightingFactor = 2
CapWeek = max(ResultDF$Week_Num) + 2

costs_wg = ForecastDF %>%
  filter(Week <= CapWeek) %>%
#  mutate(Pre_Prob = if_else(Week > 10, Pre_Prob * WeightingFactor, Pre_Prob)) %>%
  dplyr::select(TeamID, Week, Pre_Prob) %>%
  mutate(Pre_Prob = if_else(is.na(Pre_Prob), 0, Pre_Prob)) %>%
  pivot_wider(names_from = Week, values_from = Pre_Prob, values_fill = 0) %>%
  as.matrix()
rownames(costs_wg) = costs_wg[,1]
costs_wg = costs_wg[,-1]

# Customer constraints are = their demand  
col.signs = rep("=", CapWeek)
# In our case that's games demanded to win are 1 per weeks 1-11 and 13-16 plus 2 in week 12
col.rhs = c(rep(1, 11), 2)
AddWks = CapWeek - length(col.rhs)
col.rhs = c(col.rhs, rep(1, AddWks))
# Zero out for weeks already picked
col.rhs = col.rhs * (!colnames(costs_wg) %in% complete_weeks)


SolutionMatrix_wg = lp.transport(costs_wg, "max", row.signs, row.rhs, col.signs, col.rhs)$solution
rownames(SolutionMatrix_wg) = rownames(costs_wg)
colnames(SolutionMatrix_wg) = colnames(costs_wg)

PicksDF_wg = SolutionMatrix_wg %>%
  as_tibble() %>%
  mutate(TeamID = rownames(SolutionMatrix)) %>%
  dplyr::select(TeamID, everything()) %>%
  pivot_longer(-TeamID, names_to = "Week", values_to = "Picked") %>%
  filter(Picked > 0) %>%
  mutate(Week = as.numeric(Week)) %>%
  bind_rows(CompletedPicksDF) %>%
  arrange(Week, TeamID)

PicksDisplayDF_wg = ForecastDF %>%
  dplyr::select(Week, TeamID, Post_Prob) %>%
  left_join(PicksDF_wg, by = c("Week", "TeamID")) %>%
  mutate(TeamID = factor(TeamID),
         Picked = factor(
           case_when(
             Picked == -1 ~ "Past Pick - Winner",
             Picked == 1 ~ "Future Pick",
             TRUE ~ "Not Picked"), levels = c("Past Pick - Winner", "Past Pick - Loser", 
                                              "Future Pick", "Not Picked"))) %>%
  arrange(TeamID, Week) %>%
  left_join(ResultDF, by = c("TeamID" = "TeamID", "Week" = "Week_Num")) %>%
  mutate(TeamID = factor(TeamID),
         Category = case_when(
            Post_Prob > 0.5 & Result == "Loser" ~ "Upset", 
            Post_Prob > 0.5 & Result == "Winner" ~ "Expected", 
            Post_Prob < 0.5 & Result == "Loser" ~ "Expected",
            Post_Prob < 0.5 & Result == "Winner" ~ "Upset",
            TRUE ~ as.character("Future")) %>%
              factor(levels = c("Expected", "Upset", "Future")),
         Result = as.character(Result),
         Result = if_else(is.na(Result), "Future", Result) %>%
              factor(levels=c("Winner", "Loser", "Future")),
         Picked = if_else(Picked == "Past Pick - Winner" & Result == "Loser", 
                          "Past Pick - Loser", 
                          as.character(Picked)),
         Picked = factor(Picked, levels = c("Past Pick - Winner", "Past Pick - Loser", 
                                              "Future Pick", "Not Picked"))
         )

PicksDisplayDF_wg %>%
  ggplot(aes(Week, TeamID, fill = Picked)) +
  geom_tile() +
  geom_text(aes(label = percent(Post_Prob, accuracy = 1), color = Picked, size = Picked)) +
  scale_fill_manual(values = c("black", "dark red", "dark green", "white")) +
  scale_color_manual(values = c("white", "white", "white", "black")) +
  scale_size_manual(values = c(4, 4, 4, 4)) +
  scale_x_continuous(breaks = seq(1:18)) +
  scale_y_discrete(limits = rev(levels(PicksDisplayDF$TeamID))) +
  theme_classic() +
  theme(legend.position="none") +
  labs(title = "NFL Survivor Pool Picks by Winning Probability",
       subtitle = "Version 2 : Maximizes Winning Probability at Current Week + 2",
       y = "",
       caption = "Black highlight indicates past winning pick, Red highlight means past losing pick")

Version 1 - No Weighting

Weeks 1-10 Mean Probability : 77.6%
Weeks 11+ Mean Probability : 73.0%

Version 2 - Capped at Current Week +2

Weeks 1-10 Mean Probability : 77.6%
Weeks 11+ Mean Probability : 74.3%

With all of this analytic power, what are the odds of surviving the entire regular season? Not that great as it turns out:

# Hey, you peaked at the code!
# For your troubles, here is a nice article showing the typical survival odds given a pool size
# https://www.teamrankings.com/nfl-survivor-pool-picks/articles/expected-length-of-nfl-survivor-pools/


PicksDisplayDF %>%
  mutate(Latest_Probability = if_else(Category != "Future", 1, Post_Prob)) %>%
  filter(Picked != "Not Picked") %>%
  arrange(Week) %>%
  dplyr::select(Week, Post_Prob, Latest_Probability) %>%
  rename(Pick_Probability = Post_Prob) %>%
  mutate(Starting_Probability = percent(cumprod(Pick_Probability), accuracy = .1),
         Latest_Probability = percent(cumprod(Latest_Probability), accuracy = .1),
         Pick_Probability = percent(Pick_Probability, accuracy = .1)) %>%
  kable("html", escape = FALSE,
      caption = paste0('<p style="color:black; font-size:18px">',
      '2021 NFL Schedule - Forecasted Probability (Unweighted Model)',
      '</p>')) %>%
  kable_styling("striped",
                bootstrap_options = c("hover", "condensed"),
                full_width = T) %>%
  row_spec(0, color = "white", background = "black")

2021 NFL Schedule - Forecasted Probability (Unweighted Model)
Week	Pick_Probability	Latest_Probability	Starting_Probability
1	79.0%	100.0%	79.0%
2	86.6%	100.0%	68.4%
3	80.6%	100.0%	55.2%
4	94.2%	100.0%	52.0%
5	71.1%	100.0%	36.9%
6	69.8%	100.0%	25.8%
7	70.4%	100.0%	18.2%
8	84.4%	100.0%	15.3%
9	64.6%	100.0%	9.9%
10	75.2%	100.0%	7.4%
11	78.3%	100.0%	5.8%
12	68.6%	100.0%	4.0%
12	68.7%	100.0%	2.7%
13	86.0%	86.0%	2.4%
14	69.0%	59.4%	1.6%
15	80.9%	48.0%	1.3%
16	72.2%	34.7%	1.0%
17	65.4%	22.7%	0.6%
18	68.2%	15.5%	0.4%

Even making the best possible picks over the 18 weeks, there is less than a half percent chance of making it beyond week #16!

Speaking of results, let’s look at the season to date probabilities versus the actual results.

UpsetString = PicksDisplayDF %>%
  filter(Category != "Future") %>%
  mutate(UpsetCount = if_else(Category == "Upset", 1, 0)) %>%
  group_by(Week) %>%
  summarize(UpsetRate = percent(sum(UpsetCount) / n()), .groups = "drop") %>%
  pull(UpsetRate)

UpsetString = paste(rep("Week ", length(UpsetString)), seq(1, length(UpsetString), by = 1), 
                rep(": ", length(UpsetString)),
                UpsetString,
                rep(", ", length(UpsetString)), sep = "", collapse = "")

PicksDisplayDF %>%
  ggplot(aes(Week, TeamID, fill = Category)) +
  geom_tile() +
  geom_text(aes(label = percent(Post_Prob, accuracy = 1), color = Category)) +
  scale_fill_manual(values = c("black", "dark red", "white")) +
  scale_color_manual(values = c("white", "white", "grey30")) +
  scale_size_manual(values = c(4, 4, 4)) +
  scale_x_continuous(breaks = seq(1:18)) +
  scale_y_discrete(limits = rev(levels(PicksDisplayDF$TeamID))) +
  theme_classic() +
  theme(legend.position="none") +
  labs(title = "NFL Actual Results Versus Winning Probability",
       subtitle = "Upsets in RED, Expected in BLACK, Future games not highlighted",
       y = "",
       caption = paste0("Upset Rate : ", UpsetString))

Finally, for some additional fun I have prepared a simple Underdog betting model. For each week the model picks up to 2 teams that meet the following criteria.

The model predicts that the team will win (has a probability of > 0.5) prior to considering that weeks’ Vegas line.
The model calculates the Estimated Return = The Model’s Pre-Vegas Probability X The Vegas Money Line and chooses the top 2 that have positive returns.

# QA

UnderDog_DF = ForecastDF %>%
  mutate(Est_Return = Pre_Prob * ProbToLine(Vegas_Prob),
         Vegas_Line = ProbToLine(Vegas_Prob)) %>%
  dplyr::select(Week,TeamID, OppID, Pre_Prob, Vegas_Prob, Vegas_Line, Est_Return) %>%
  filter(Vegas_Prob != 0 & Pre_Prob > 0.5 & Est_Return > 0) %>%
  group_by(Week) %>%
  slice_max(order_by = Est_Return, n = 2) %>%
  inner_join(PicksDisplayDF %>%
               dplyr::select(Week, TeamID, Result),
             by = c("Week", "TeamID")) %>%
  ungroup() %>%
  mutate(ActualResult = case_when(
          Result == "Winner" ~ Vegas_Line,
          Result == "Loser" ~ -100,
          TRUE ~ 0),
         CumResult = cumsum(ActualResult))

UnderDog_DF %>%
  kable("html", escape = FALSE,
      caption = paste0('<p style="color:black; font-size:18px">',
      'Underdog Betting Model',
      '</p>')) %>%
  kable_styling("striped",
                bootstrap_options = c("hover", "condensed"),
                full_width = T) %>%
  row_spec(0, color = "white", background = "black")

Underdog Betting Model
Week	TeamID	OppID	Pre_Prob	Vegas_Prob	Vegas_Line	Est_Return	Result	ActualResult	CumResult
1	SDG	WAS	0.5250000	0.4878049	105	55.12500	Winner	105	105
2	OTI	SEA	0.5367405	0.3076923	225	120.76661	Winner	225	330
2	MIN	CRD	0.5065015	0.3846154	160	81.04025	Loser	-100	230
3	CLT	OTI	0.5028491	0.3333333	200	100.56981	Loser	-100	130
3	GNB	SFO	0.6303683	0.4081633	145	91.40340	Winner	145	275
4	SEA	SFO	0.5832676	0.4347826	130	75.82478	Winner	130	405
5	SFO	CRD	0.6065645	0.3225806	210	127.37853	Loser	-100	305
5	BUF	KAN	0.5354509	0.4444444	125	66.93136	Winner	125	430
6	SEA	PIT	0.6147124	0.3448276	190	116.79536	Loser	-100	330
6	NWE	DAL	0.5505174	0.3773585	165	90.83538	Loser	-100	230
7	SEA	NOR	0.5590839	0.3448276	190	106.22594	Loser	-100	130
7	DEN	CLE	0.5113618	0.3921569	155	79.26108	Loser	-100	30
8	GNB	CRD	0.5895359	0.2941176	240	141.48862	Winner	240	270
8	NWE	SDG	0.5361000	0.3125000	220	117.94200	Winner	220	490
9	DEN	DAL	0.5293431	0.2127660	370	195.85696	Winner	370	860
9	CLE	CIN	0.6453802	0.4545455	120	77.44563	Winner	120	980
10	SEA	GNB	0.5048446	0.3636364	175	88.34780	Loser	-100	880
10	NOR	OTI	0.5662178	0.4166667	140	79.27050	Loser	-100	780
11	PIT	SDG	0.5390249	0.3571429	180	97.02448	Loser	-100	680
11	NOR	PHI	0.6152988	0.4878049	105	64.60638	Loser	-100	580
12	OTI	NWE	0.5149173	0.3225806	210	108.13264	Loser	-100	480
12	CLE	RAV	0.5088362	0.3508772	185	94.13469	Loser	-100	380
13	SDG	CIN	0.5690518	0.4000000	150	85.35777	Future	0	380
14	PIT	MIN	0.5271081	0.3773585	165	86.97284	Future	0	380
14	BUF	TAM	0.5438447	0.4000000	150	81.57670	Future	0	380

I certainly wouldn’t bet real money based on a relatively simple model but it’s fun to track the results.

In closing, enjoy the NFL season and..

Choose your picks with savvy

NFL Survivor Project

Chris at Savvy Analytics