“You may not look like a winning team, … but you are one. So … play like one tonight.” - Billy Beane (Moneyball, 2011)

The Question

The holy grail in sports analytics is a single number that describes the value of a player. In most sports this metric is referred to as WAR (Wins Above Replacement). This figure is what helps determine how much to give in a contract, how much to ask for in a trade, how close your team is to competing, and basically everything a decision-maker may want to know about player evaluation. But, after you have the number, you’re still only part of the way to constructing a team. Knowing the values of players, and parlaying that knowledge into a real team that takes advantage of that databse are two very different things. The quote at the start of the piece was part of a less-than-inspiring locker room speech by Billy Beane — played by Brad Pitt — who has assembled a team that is supposed to be excellent on paper, but looks like no winning team anyone has ever seen in person. But, what does a “winning team” look like when using analytics? Or, even more pointedly, what does the “perfect” team look like?

I’m a writer at All About the Jersey — the SBNation blog covering the New Jersey Devils of the NHL. My managing editor, John Fischer, approached me asking what the ideal WAR team would look like. I didn’t immediately know, and so thought it would be an excellent opportunity to try to learn how to answer this type of question as objectively as possible. I’ll try to describe the logic of the approach broadly since the basic idea is generalizeable to any sport and any metric (not just WAR). This piece will detail the basic framework for how you can maximize the value of a given metric given certain constraints pertaining to roster construction. We’ll use the NHL as an example, but the methodology scales to any league. The main feature of the approach we’ll use here is called “linear programming”.

Objective

Linear programming is a method that will allow us to optimize the value of some expression, while controlling for the limitations of the variables therein. In the case of roster construction in the NHL, the thing we want to maximize could be something like GAR (Goals Above Replacement). We want the most valuable team money can buy according to that metric. There are some limitations from the way a sport is played (ex: NHL is typically 12 forwards, 6 defencemen, 2 goalies), the league rules (Ex: NHL allows no more than 23 players on the “active roster”), and fiscal restrictions (Ex: NHL Salary Cap for 2020 is $81.5M) that prevent us from just taking all of the best players and putting them on our roster. So what we want to do is find the maximum value possible for a set of players that abide by these rules. In order to do this, we start with an “objective function” — quite literally the function that describes the objective.

For this problem the objective function is going to be the sum of all values of players chosen (in GAR).

\(\mathrm{Maximize:} G_1x_1+G_2x_2+...+G_nx_n\)

In the equation above, \(G_i\) represents the GAR value of the \(i^{\mathrm{th}}\) player in the dataset, and \(x_i\) is an indicator variable identifying the number of times that player is selected to the team. We obviously can’t select a player more than once, because there’s only one of each player, but we’ll explain how to codify that later. For now, just accept that \(x_i\) is either going to be a zero or a one and, by the end, we want to find out the ideal combination of 23 (size of NHL roster) \(x_i\)s to be ones so as to maximize the total sum of the \(G_i\)s of our players.

Constraints

Now that we have established our objective, we need to list the “constraints”. These are the rules we cannot break when creating the team. Let’s start off with the one that is both easiest and closest to our original function: cap hit. We need to make sure that the the annual average value of the players’ contracts doesn’t exceed the league allowance ($81.5M). So, instead of multiplying the indicator variable by their GAR value as we did in the objective function, we multiply it by their cap hit. In the equation below, \(C_i\) represents the cap hit of the \(i^{\mathrm{th}}\) player, and \(x_i\) is the same indicator as above.

\(C_1x_1+C_2x_2+...+C_nx_n \leq 81,500,000\)

   

Another easy constraint that we can check off in similarly easy fashion is roster size. The NHL does not allow you to select more than 23 players for a team, so that means when we add up all of the players chosen (sum of all \(x_i\)s), it should not exceed 23.

\(x_1+x_2+...x_n \leq 23\)

   

Now we get to the limitation described earlier — the fact that no player can be selected more than once. As much as we’d all like 12 Elias Petterssons on ELCs, there is only one of him. For each player \(i\), we need to make sure \(x_i \leq1\). We need to put this in terms of a linear combination of all the players though because it is linked to the objective function. So in order to make sure that player \(i\) is not chosen more than once, we just need to zero-out the remaining player-variables. For player one, this would look like \(1x_1+0x_2+0x_3+...0x_n \leq 1\); for player two, \(0x_1+1x_2+0x_3+...0x_n \leq 1\); and so on through player “n”, \(0x_1+0x_2+0x_3+...1x_n \leq 1\). If we stack these together, we get an inequality matrix like the one shown below:

\(\begin{pmatrix}1x_{1}&0x_{2}&...&0x_{n}\\ \:0x_{1}\:&1x_2&\ddots&\vdots\\ \:\vdots\:&\ddots&\ddots&\vdots\\ 0x_{1}&...&...&1x_{n}\end{pmatrix} \leq \begin{pmatrix}1\\ \vdots\\ \vdots\\ 1\end{pmatrix}\).

   

There is another constraint on how many of a type of player we should be able to pick. This time, the constraint is a minimum rather than maximum and it pertains to the positional requirements of the roster. In the above matrix, we only care about 1 player at a time. In the final rule that we’ll set, the logic is the same as above, but this time we care about groups of players rather than individual ones — specifically, positions. An important note here is that there’s not actual rule about this, but it is generally the case that an NHL team will have 4 centers, 4 left wingers, 4 rights wingers, 6 defenders, and 2 goalies on the active roster. If one were to believe in a different distribution, the matrix can be easily tweaked accordingly.

Let’s say that our data is conveniently sorted such that players 1 through \(c\) are all centers. They are all counted towards the total in this case, so rather than giving just one of them the coefficient of 1 like we did above, every center gets one, and we can’t have less than 4 of them. As an inequality, this looks like \((x_1...+...x_c)\geq 4\) where \(c\) is the last position of a “Center” in the dataset, and the \(x_i\)s are the same indicators they’ve always been. But remember, just as in the individual player example, this needs to be expressed as a linear combination of all players in in the dataset, not just the centers. That inequality would look like this: \((1x_1...+...1x_c)+(0x_{c+1}...0x_r)+(0x_{r+1}...0x_l)+(0x_{l+1}...0x_d)+(0x_{d+1}...+...0x_g) \geq 4\), where \(c\) is the last position of a “Center” in the dataset, \(r\) is the last position of a “Right Wing” in the dataset, and so on. A clarifying point is that, despite the subscript, \(x_{c+1}\) is not a center, it is to be read as “One index past that last center” which is really the first right wing. Sorry about that, every other subscript structure I thought of would’ve involved 3+ levels which would’ve made things simply hideous. Each positional designation gets one of these, just like each player got one of their own rows in the matrix above. The resulting matrix looks like this:

\(\begin{pmatrix}1x_1...&...1x_c&0x_{c+1}...0x_r&0x_{r+1}...0x_l&0x_{l+1}...0x_d&0x_{d+1}...&...0x_g\\ 0x_1...&...0x_c&1x_{c+1}...1x_r&\ddots&\ddots&0x_{d+1}...&...0x_g\\ \vdots&\vdots&\vdots&\ddots&\ddots&\vdots&\vdots\\ \vdots&\vdots&\vdots&\ddots&\ddots&\vdots&\vdots\\ 0x_1...&...0x_c&0x_{c+1}...0x_r&...&...&1x_{d+1}...&...1x_g\end{pmatrix}\geq \begin{pmatrix}4\\ 4\\4 \\6\\ 2\end{pmatrix}\)

Okay, that’s it for the constraints! Let’s see what the final program looks like, and run it on the dataset.

   

The Linear Program

\(\mathrm{Maximize:} G_1x_1+G_2x_2+...+G_nx_n\)

\(\mathrm{Subject}\) \(\mathrm{to:}\)

\(\begin{pmatrix}C_1x_1&C_2x_2&...&...&...&...&C_nx_n&\le 81,500,000\\x_1&x_2&...&...&...&...&x_n&\le 23\\1x_{1}&0x_{2}&...&...&...&...&0x_{n}&\le 1\\0x_{1}\:&1x_2&\ddots&\ddots&\ddots&\ddots&\vdots&\le 1\\\:\vdots\:&\ddots&\ddots&\ddots&\ddots&\ddots&\vdots&\vdots\\ 0x_{1}&...&...&...&...&...&1x_{n}&\le 1\\1x_1...&...1x_c&0x_{c+1}...0x_r&0x_{r+1}...0x_l&0x_{l+1}...0x_d&0x_{g-1}&0x_g&\ge 4\\ 0x_1...&...0x_c&1x_{c+1}...1x_r&\ddots&\ddots&0x_{g-1}&0x_g&\ge 4\\ \vdots&\vdots&\vdots&\ddots&\ddots&\vdots&\vdots&\ge 4\\ \vdots&\vdots&\vdots&\ddots&\ddots&\vdots&\vdots&\ge 6\\ 0x_1...&...0x_c&0x_{c+1}...0x_r&...&...&1x_{g-1}&1x_g&\ge 2\end{pmatrix}\)

Now that we’ve set up the math, it’s time to write up the code.

R Code

Using GAR data from Evolving-Hockey and salary data from CapFriendly, I created a data frame of players from the 2020 season along with their GAR, cap hit, and a positional indicator matrix. The dataset is shown here:

head(df)
##         Player     Cap  GAR c r l d g
## 1   aaron-dell 1900000  7.0 0 0 0 0 1
## 2 aaron-ekblad 7500000 11.4 0 0 0 1 0
## 3   aaron-ness  725000  3.7 0 0 0 1 0
## 4 adam-boqvist  894167  5.0 0 0 0 1 0
## 5  adam-brooks  759167  0.9 1 0 0 0 0
## 6    adam-erne 1050000 -6.0 0 0 1 0 0

Now, onto the linear programming. In R, there is a handy package designed specifically for this type of problem called lpSolve. It also allows for a fairly intuitive coefficient definition process. See the below code1 for how it could be applied to the example we’ve described in this piece. It’s also been set to only use integers for the \(x_i\)s because we don’t want it choosing fractions of people.

## Load Libraries
library(lpSolve);
library(scales);library(knitr) # <- Just for Presentation purposes

## Objective function coefficients
objective.in  <- df$GAR 


## Create constraint martix (LHS)
const.mat <- matrix(c(df$Cap,                         # Salary Cap Vector
                      diag(nrow(df)) %>% as.numeric,  # Player Identity Matrix
                      rep(1,nrow(df)),                # Roster Count
                      df$c,                           # Position Indicator  (C)
                      df$r,                           #        " "          (RW)
                      df$l,                           #        " "          (LW)    
                      df$d,                           #        " "          (D)
                      df$g),                          #        " "          (G)
                    nrow=7+nrow(df), 
                    byrow=TRUE)

## Define constraint constants (RHS of Matrix)
cap_constraint <- 81.5*1000000
nonrep_constraint <- rep(1,nrow(df))
roster_constraint <- 23
minpos_constraint <- c(4,4,4,6,2)


## Set the RHS of matrix using constraints
const.rhs <- c(cap_constraint,   
               nonrep_constraint,
               roster_constraint, 
               minpos_constraint)

## Constraint directions
const.dir  <- c(rep("<=",length(const.rhs)-length(minpos_constraint)), # Cap, Non-repetition, roster size
                rep(">=",length(minpos_constraint)))                   # Positional minimums

## Find the optimal solution
optimum <-  lp(direction="max",  
               objective.in, 
               const.mat, 
               const.dir,  
               const.rhs,
               int.vec = seq(1,nrow(df)))


df %>%
  pivot_longer(cols = c(c,r,l,d,g),names_to = "Position") %>%
  group_by(Player) %>% 
  filter(value==1) %>% ungroup %>%
  mutate(Position = toupper(Position)) %>%
  select(Player,Cap,GAR,Position) %>%
  cbind(solution = optimum$solution == 1) %>% # Label optimal roster players
  filter(solution) %>%                        # Remove non-optimals
  arrange(Position,-Cap) %>%
  mutate(Cap = dollar(as.numeric(Cap)),
         Player = toupper(Player)) %>%
  select(Player,Position,Cap,GAR) %>%
  kable
Player Position Cap GAR
BRAYDEN-POINT C $6,750,000 21.5
NATHAN-MACKINNON C $6,300,000 19.8
ELIAS-PETTERSSON C $925,000 23.7
ANTHONY-CIRELLI C $728,333 15.7
RYAN-ELLIS D $6,250,000 23.2
ROMAN-JOSI D $4,000,000 17.9
RYAN-PULOCK D $2,000,000 19.3
ADAM-FOX D $925,000 14.9
JOHN-MARINO D $925,000 15.3
CALE-MAKAR D $880,833 15.8
TUUKKA-RASK G $7,000,000 19.7
CONNOR-HELLEBUYCK G $6,166,666 26.7
ANTON-KHUDOBIN G $2,500,000 14.4
DARCY-KUEMPER G $1,850,000 15.6
MACKENZIE-BLACKWOOD G $697,500 12.6
ARTEMI-PANARIN L $11,642,857 24.9
BRAD-MARCHAND L $6,125,000 20.6
ANDREI-SVECHNIKOV L $925,000 14.2
VALERI-NICHUSHKIN L $850,000 14.7
DAVID-PASTRNAK R $6,666,666 17.8
BRYAN-RUST R $3,500,000 15.9
OLIVER-BJORKSTRAND R $2,500,000 13.8
ROBERT-THOMAS R $894,166 10.5

Analysis and Amendments

Those with knowledge of how hockey works probably recognize some of the issues with this initial lineup. The most obvious problem is that there are 5 goalies. You can only play 1 goalie at a time and so teams typically only carry 2 on their roster: a starter and a backup (or a platoon). Quarterbacks in football would fall into the same trap in a mechanism like this. The problem is that the model doesn’t know that only one goalie is playing at a time. More broadly, this algorithm doesn’t understand playing time at all — it assums all players will receive the same distribution of minutes they have received in this, the 2020 season. Following the quarterback example, Kirk Cousins is no longer going to be worth the same value if he were Patrick Mahomes’s backup. As one might expect, the problem is harder than has been described thus far. Here are just a few things to consider if we are going to refine and perfect this model:

  1. Playing Time: One way to consider doing this would be to allow the model to select numbers between 0 and 1 rather than just integers. In this structure each player the coefficient would represent the total percent of available minutes they would play in a given situation. You’d then have to append the objective function (and every row of the matrix) to itself and for each situation.

  2. Positional Budget: If the above task is too daunting (it was for me) then you cant put a budget on a position. In the NHL, maybe that budget could look something like $9M for a goalie. Of the 163 goalies under contract in the NHL, only 3 earn over $7M, and even if that top goalie were selected, there would still be 123 goalies left over that earn $2M or less to fill in the backup spot.

  3. Force Roster Structure: If the positional budget wasn’t enough of a hint for the algorithm, or if it can’t be performed for whatever reason, then we can explicitly tell the model what we want.

When John was presented with this roster, he had some understandable gripes, like any good GM would. Also, the premise of the article implied that this roster was supposed to be the ideal roster for the Devils team. For some context, the Devils have no good defenders in the farm system, so, of those three players that we have not on our active roster, we want two of them to be defenders. Furthermore, since we want to prevent injuries from forcing us into the minors for defence, we specifically want one defender on either side. The closest proxy for this was in the dataset I used was “handedness”. So, I adjusted the positional restrictions from \((C\geq4,R\geq4,L\geq4,D\geq6,G\geq2)\) to \((C\geq4,R\geq4,L\geq4,RD=4,LD=4,G=2)\). To prevent the model from pouring 8-figures into the 2 goalies, I also chose to restrict the goalie position’s cap hit to less than $9M. As a restriction, this looks idencital to the cap hit restriction for the team, except all non-goalie \(x_i\)s have a coefficient of zero and the goalies are one. These are example of how options 2 and 3 above could be adapted to the model.

The “Perfect” Team 2.0

Player Position Cap GAR
ELIAS-PETTERSSON C $925,000 23.7
BRAYDEN-POINT C $6,750,000 21.5
NATHAN-MACKINNON C $6,300,000 19.8
ANTHONY-CIRELLI C $728,333 15.7
CONNOR-HELLEBUYCK G $6,166,666 26.7
DARCY-KUEMPER G $1,850,000 15.6
ROMAN-JOSI LD $4,000,000 17.9
JACCOB-SLAVIN LD $5,300,000 16.5
VINCE-DUNN LD $722,500 12.1
MIRO-HEISKANEN LD $894,166 11.3
ARTEMI-PANARIN LW $11,642,857 24.9
BRAD-MARCHAND LW $6,125,000 20.6
VALERI-NICHUSHKIN LW $850,000 14.7
ANDREI-SVECHNIKOV LW $925,000 14.2
JARED-MCCANN LW $1,250,000 12.8
RYAN-ELLIS RD $6,250,000 23.2
RYAN-PULOCK RD $2,000,000 19.3
CALE-MAKAR RD $880,833 15.8
JOHN-MARINO RD $925,000 15.3
DAVID-PASTRNAK RW $6,666,666 17.8
BRYAN-RUST RW $3,500,000 15.9
OLIVER-BJORKSTRAND RW $2,500,000 13.8
PAVEL-BUCHNEVICH RW $3,250,000 13.4

That’s much better! We now have a roster that is viable and maximized for total GAR. The roster above has a cap hit of $80,402,021 — over a million under the cap — and produced over 400 goals above replacement value. For some perspective on that figure, no real NHL team cleared 160 goals above replacement. Even if we went back a year and chose the Tampa team that was not only historically great, but also played 82 games unlike our team, they still only barely managed to crack 200 GAR. I’d say twice as good as an historically dominant team is fairly impressive.

In my final roster recommendation to John, I played around with the lines and pairings so that they could make as much sense as possible. This was the arrangement I came up with for the project.

Still not perfect. For instance, the “handedness” approach to choosing sided defenders failed because multiple selected players use their off-hand. And, obviously, this still doesn’t account for playing time (we’d need to use GAR/60 and more complex constraints for that). But, it offers a description for how my recommendation came to be, and how you could apply linear programming optimization to maximize roster strength. This is how you create a winning team, whether or not it looks like one.

Hat tip to CapFriendly who provided the salary data; and to the Evolving Wild twins, Luke and Josh Younggren for their GAR model and for helping me merge the two datasets; and John for giving me the idea to run with and allowing me to post this piece explaining the methodology.


  1. The code for this section would https://towardsdatascience.com/linear-programming-in-r-444e9c199280