Goal:

Hockey analytics have come a long way in terms of measuring a player’s performance and impact on the ice. From counting a player’s individual goals in the early era of statistics to isolating a player’s impact on driving his team’s performance, we have advanced the way we evaluate players. One of the most complete metrics that is publicly available is the idea of measuring a player’s impact on his team’s performance and attributing a value that represents how many wins that player contributes to his team, built off data derived from the play by play. A method of calculating that measure that I specifically like is the one created by Josh and Luke Younggren, creators of evolving-hockey.com.

What I want to know is how to predict a player’s future overall impact, represented by Evolving Hockey’s Goals Above Replacement (GAR) per 60 minutes, using the information that we have seen so far. I gave that task a shot using statistical techniques that most represent how I think about the question at hand.

Key Thought Processes:

I understand the complexity of the question at hand, and I wanted to emphasize the uncertainty around solving the problem. There were two ideas in my mind when I researched this question:

Thinking probabilistically and implementing uncertainty: I definitely wanted to add prediction intervals, as there was a lot of uncertainty due to the nature of the sport, the nature of the metric, and the incomplete collection of publicly available metrics. I also like to think of the values and player rankings in terms of probabilities in addition to the useful baseline measure. So, rather than showing solely that a player is expected to generate X more wins than another player, I like to show their expected values accompanied by a range of possibilities and to use probabilities derived from the range of potential outcomes to show the chances that a player is more valuable than another.

Bayesian thinking: The idea behind thinking in a Bayesian manner is to have some sort of prior expectation of the truth and adjust your understanding based on the information you observe.
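
For reference, Bayes’ theorem can be written in its standard form as follows. The symbols H (a hypothesis, such as a player’s true talent level) and D (the observed data) are notation chosen here for illustration.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Here P(H) is the prior, P(D | H) is the likelihood of the observations under the hypothesis, and P(H | D) is the updated (posterior) belief.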

Bayes Theorem (image from Google credited to gierfi.com)

I wanted to model having some sort of prior information about the relationships in the data and wanted to maximize the repeatability of my inference as well as minimize my susceptibility to noise in order to best predict out of sample.

Methodology:

I wanted to be able to use the data we have today to predict each player’s GAR in the future. We have advanced data back to 2007-08, but I wanted to use performance dating back two seasons in this model, so I capped the predictions at five years into the future: the sample of players who played more than five seasons within that shortened time frame diminishes quickly. The further I predict into the future, the more uncertainty there is, both from the smaller sample of players and from greater deviation in player paths, so I stopped after five years.

First, I used a combination of individual stats, GAR from the past three seasons, and demographic info about the player (with interactions between the variables where useful) to predict the next season’s GAR. I used a Bayesian Regression model with the first couple seasons as priors for the model parameters to predict the distribution of GAR for 2020-21 through the ensuing Markov Chain Monte Carlo Simulations. I stored the simulations and used the average of the distribution as the player’s Expected Value, which I then included in a new model with the same methodology to predict 2021-22 GAR. I used the same process to predict each of the next five seasons.
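
To make one step of this process concrete, here is a minimal sketch of a Bayesian regression that produces a distribution of next-season values for a single player. It is not the model described above: it swaps MCMC for a closed-form Normal posterior (noise standard deviation treated as known), and the three features, the prior, the coefficients, and all data are synthetic assumptions made up for illustration.

```python
import numpy as np

def posterior_coefficients(X, y, prior_mean, prior_cov, noise_sd):
    """Closed-form Normal posterior for regression coefficients (noise_sd assumed known)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_sd**2)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_sd**2)
    return post_mean, post_cov

def simulate_next_season(x_new, post_mean, post_cov, noise_sd, n_draws, rng):
    """Draw from the posterior predictive distribution for one player's feature vector."""
    betas = rng.multivariate_normal(post_mean, post_cov, size=n_draws)
    return betas @ x_new + rng.normal(0.0, noise_sd, size=n_draws)

rng = np.random.default_rng(0)

# Synthetic stand-in data (not real GAR): intercept, last season's value, centered age.
n_players = 400
X = np.column_stack([
    np.ones(n_players),
    rng.normal(0.0, 0.3, n_players),
    rng.normal(0.0, 4.0, n_players),
])
true_beta = np.array([0.05, 0.6, -0.01])  # hypothetical coefficients
noise_sd = 0.2
y = X @ true_beta + rng.normal(0.0, noise_sd, n_players)

# Hypothetical weak prior centered at zero.
prior_mean = np.zeros(3)
prior_cov = np.eye(3)

post_mean, post_cov = posterior_coefficients(X, y, prior_mean, prior_cov, noise_sd)

# Simulate next season for one hypothetical player and take the mean as the expected value.
x_new = np.array([1.0, 0.4, -3.0])
draws = simulate_next_season(x_new, post_mean, post_cov, noise_sd, 10_000, rng)
expected_value = draws.mean()
```

In a chained setup like the one described, `expected_value` would then become an input feature for the following season’s model.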

I used the predicted distributions to derive the Expected Value, Median projection, and 95% probability interval (there’s a 95% chance this player’s GAR will fall within this range) for each of the five seasons. I also calculated the average GAR value over the entire sample and the average over the first three seasons.
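
As a rough illustration of how those summaries can be read off a set of simulated draws, the sketch below uses a central percentile interval. The function name `summarize_draws` and the synthetic draws are assumptions for this example, not the original code or data.

```python
import numpy as np

def summarize_draws(draws, interval=0.95):
    """Expected value, median, and central probability interval of simulated outcomes."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - interval) / 2.0 * 100.0  # e.g. 2.5 for a 95% interval
    lower, upper = np.percentile(draws, [tail, 100.0 - tail])
    return {
        "expected_value": float(draws.mean()),
        "median": float(np.median(draws)),
        "lower": float(lower),
        "upper": float(upper),
    }

# Synthetic draws standing in for one player's simulated season.
rng = np.random.default_rng(1)
draws = rng.normal(0.3, 0.2, size=20_000)
summary = summarize_draws(draws)
```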

Using the distributions for each player, I then created a plot of their predicted range of outcomes that also allows distributions to be compared between multiple players.
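
One simple way to turn two players’ simulated distributions into a head-to-head probability is to count how often one draw exceeds the other. This is a sketch under the assumption that the two sets of draws are independent and equally sized. `prob_higher` and the synthetic numbers are hypothetical.

```python
import numpy as np

def prob_higher(draws_a, draws_b):
    """Share of paired simulations in which player A's value exceeds player B's."""
    a = np.asarray(draws_a, dtype=float)
    b = np.asarray(draws_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("draws_a and draws_b must have the same shape")
    return float(np.mean(a > b))

# Synthetic draws for two hypothetical players.
rng = np.random.default_rng(2)
player_a = rng.normal(0.40, 0.2, size=50_000)
player_b = rng.normal(0.30, 0.2, size=50_000)
p_a_over_b = prob_higher(player_a, player_b)
```

If the simulations shared information across players (for example, the same coefficient draws), pairing them by simulation index rather than assuming independence would be the more faithful comparison.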

Next, I wanted to use the projected ranges of player on-ice value to analyze the value each player’s contract will bring relative to the budget constraint at hand: the salary cap.

We have two types of player contracts at the moment: those with existing signed contracts (the known) that extend past this season and those with expired contracts that need to sign new ones this offseason (the unknown). For the former group, I took their current deals from CapFriendly and for the latter group I used Evolving Hockey’s Contract Projection Tool to get their predicted contracts.

Using these, I was able to derive each player’s predicted value per dollar compared to league average over the course of their contract. I gave the players a Contract Value score, comparing their predicted GAR per dollar to the average and normalizing it. I also calculated the probability of each player generating more value per dollar than three different z-score thresholds of GAR per dollar.
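
The scoring described above could be sketched roughly as follows. The exact normalization and the three thresholds are not specified in this write-up, so the choices here (a z-score across players’ expected GAR per million dollars, and cutoffs at z = -1, 0, and 1) are assumptions, as are the function name and the synthetic inputs.

```python
import numpy as np

def contract_value(gar_draws, cap_hits_musd, z_thresholds=(-1.0, 0.0, 1.0)):
    """Z-scored GAR per $M and the probability of clearing each z-score cutoff.

    gar_draws: array of shape (n_players, n_draws) with simulated GAR.
    cap_hits_musd: array of shape (n_players,) with cap hits in millions of dollars.
    """
    gar_draws = np.asarray(gar_draws, dtype=float)
    cap_hits = np.asarray(cap_hits_musd, dtype=float)
    per_dollar = gar_draws / cap_hits[:, None]

    expected = per_dollar.mean(axis=1)
    league_mean, league_sd = expected.mean(), expected.std()
    scores = (expected - league_mean) / league_sd

    # Convert z-score thresholds back to GAR-per-$M cutoffs, then count draws above each.
    cutoffs = league_mean + np.asarray(z_thresholds) * league_sd
    probs = (per_dollar[:, :, None] > cutoffs).mean(axis=1)
    return scores, probs

# Synthetic players: one expensive star, one cheap rookie-deal player, two others.
rng = np.random.default_rng(3)
mean_gar = np.array([1.0, 0.5, 0.8, 0.2])
cap_hits = np.array([12.5, 0.9, 6.0, 4.0])
gar_draws = rng.normal(mean_gar[:, None], 0.3, size=(4, 5_000))
scores, probs = contract_value(gar_draws, cap_hits)
```

In this toy data the cheap player scores highest even with a lower raw GAR, which is the same pattern the rookie-contract results show.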

Dashboard:

A link to this Dashboard can be found here

The dashboard presents each method of analysis in different tabs.

Player GAR Projections: A table consisting of every current player, some of their known and current data, and their projections for Expected Value, Median, and 95% Probability Interval. Each category is filterable using the white boxes.

Dashboard First Tab

Player Visual Projection: A plot of a user-selected player’s predicted range of GAR with the context of league median and league top 5% values shown as vertical lines.

Dashboard Second Tab

Player Comparison: The user selects two players and the plot will update showing each player’s projected range of outcomes as well as the probability of the first player chosen having a higher GAR than the second player chosen.

Dashboard Third Tab

Existing Contract Analysis: A table consisting of each current contract, projected GAR, and the analysis surrounding its projected GAR per dollar value. Each column is filterable using the white boxes.

Dashboard Fourth Tab

Upcoming Free Agent Projected Value: A table consisting of each upcoming free agent contract projection, projected GAR, and the analysis surrounding its projected GAR per dollar value. Each column is filterable using the white boxes.

Dashboard Fifth Tab

Player Contract Analysis: A plot of a user-selected player’s predicted range of GAR with the context of thresholds for GAR per dollar value shown as vertical lines.

Dashboard Sixth Tab

Takeaways:

I believe this gives good insight into projecting how a player will perform, and I think the inclusion of micro stats allows us to understand which players show signs of repeatable production through their individual actions on the ice. We are able to identify which players we expect to rise above the rest, we see the uncertainty in projecting players and comparing the production of multiple players, and we see which individual stats are predictors of future success.

I think of this as the framework of how to project the future of NHL players and implement this thought process into team building and decision making. I believe the specifics can be improved upon continuously, both with improved data on players’ specific contributions and innovation in Machine Learning technology and techniques.

Example Conclusion:

Connor McDavid is projected to have the highest average GAR over the next five seasons among current NHL players. He has more than a 90% chance of being in the top half of players in each of the next five seasons, and he has between a 30% and 51% chance in each of the next three seasons of being among the top 5% in the league in GAR per 60 minutes of play, despite finishing 36th in 2019-20 among players who logged at least 800 minutes.

Connor McDavid Projections

McDavid has a 59% chance of providing more value over the next five seasons than the projected 2nd most valuable player in that span, David Pastrnak. McDavid has a 76.6% chance of providing more value than the 2018-19 Hart Trophy winner, Nikita Kucherov. Kucherov, the 26-year-old reigning MVP, has a somewhat flat projection over the next three years before dropping off; by 2024-25 he has only a 56.6% chance of being in the top half of players.

Connor McDavid and Nikita Kucherov Comparison

Most of the players currently in the top 15 in salary are on the downward portion of their career arcs and are projected to provide subpar value per dollar. Drew Doughty, for example, is projected to provide relatively poor value per dollar, as he is currently 30 years old and signed for 7 more seasons. The highly paid young players, primarily on Restricted Free Agent contracts signed fresh off their rookie deals, provide above average value despite earning a lot of money. Examples include Connor McDavid, Auston Matthews, Jack Eichel, and Sebastian Aho. Artemi Panarin, the biggest Unrestricted Free Agent signing from last summer, is projected to provide just about average value for his salary. He provided very good value in his first season in New York, is projected to provide good value over the next three seasons, and then is projected to drop off at the back end of his deal. For a player who is currently second in Average Annual Cap Hit, that is a deal New York should be satisfied with.

High Contracts

A trend that shows in the contract analysis is the powerful value of good players on rookie deals. These contracts cost so much less than the value the players provide that they make up the bulk of the list for highest value per dollar, even for players who are not yet superstars, such as Jack Hughes and Kirby Dach. Having very good players such as Elias Pettersson, Quinn Hughes, Cale Makar, and Brady Tkachuk on rookie contracts provides higher value than any other type of deal.

Highest Value Contracts

That trend continues when it comes to this summer’s free agents and the value they are predicted to provide. Matt Barzal, a young player about to sign his second deal, seems primed to provide above average value despite a projected 8-year deal worth over $9 mil per year. On the other hand, veteran Unrestricted Free Agents such as Ryan Strome and Tyson Barrie seem more likely to provide significantly below average value per dollar. Anthony Cirelli, Valeri Nichushkin, and Andrew Mangiapane are three players who seem likely to provide more value than their next contracts would suggest.

Highest Projected Free Agent Contracts

Projecting players’ probable ranges of outcomes gives a better sense of the trends and uncertainty surrounding player development and can lead to a better understanding of where players go from this point in their careers and which potential contracts provide better value.

Goalies:

I performed a similar analysis with goalies and created similarly named tabs for that analysis on the dashboard. To no one’s surprise, I found there was a lot more uncertainty in these predictions and a much smaller difference between the high performers and replacement level. The major lessons you can take from the goalie predictions are that it is difficult to predict sustained standout goalie performance and that it becomes especially difficult as the players age.

A closer look at the contracts provides a clear ideal strategy when it comes to goalie contracts. The contracts that provide above average value are short, cheap, and mostly given to young players. There are no multi-year contracts given to goalies above $5 mil per year that are expected to yield high value, and there are no long-term contracts given to goalies that are expected to yield positive value.

Through the predictions you can see it is very uncertain as to who will sustain success over the next five years, and thus it is likely best practice to take fliers on young players on lower deals. The current promising goalies who may yield positive value include Carter Hart, Mackenzie Blackwood, Ilya Samsonov, Thatcher Demko, and the Rangers’ duo of Igor Shesterkin and Alexandar Georgiev. When faced with the dilemma of paying big money and term for a goalie who has performed well in the past or taking a flier on a potentially above average younger goalie for a lower price, the decision seems pretty clear.

Future:

I want to keep improving the method of predicting next season’s GAR, and the more I learn about advanced statistical techniques the higher chance I have of improving my methodology. I also specifically want to improve how the priors are chosen.

Specifically, I want to improve not just the model parameter priors, but also the pre-draft prior evaluation of the young players. At the moment I treat each player as the average value of their draft position. For example, the pre-draft evaluation of Nolan Patrick is treated as equal in the model to the pre-draft evaluation of Jack Eichel, who was commonly thought of as an especially exciting 2nd pick. I would love to implement a draft projection model to get a better understanding of young players’ prospects.

I could also look at percentage of the cap for contract analysis instead of solely dollar values. In analyzing current contracts I opted to look at the fixed AAV because I wanted to see how much GAR per dollar each player projects to bring his team. However, when making decisions on future contracts, teams generally look at what percentage of the salary cap to offer the player, as the cap is subject to change over time. If the salary cap rises, the contract being analyzed becomes more valuable because the player is providing a higher output relative to the share of the budget constraint he takes up.
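
The arithmetic behind that last point can be sketched in a few lines. The contract size, starting cap, and 3% annual growth rate below are illustrative assumptions, not projections, and `cap_share_by_season` is a hypothetical helper.

```python
def cap_share_by_season(aav, starting_cap, annual_cap_growth, n_seasons):
    """Share of the salary cap a fixed AAV takes up in each season of a deal."""
    return [
        aav / (starting_cap * (1.0 + annual_cap_growth) ** season)
        for season in range(n_seasons)
    ]

# Illustrative numbers only: a $10M AAV against an $81.5M cap growing 3% per year.
shares = cap_share_by_season(
    aav=10_000_000,
    starting_cap=81_500_000,
    annual_cap_growth=0.03,
    n_seasons=5,
)
```

With any positive growth rate the share shrinks every season, so the same projected GAR costs a smaller slice of the budget later in the deal.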

I also look to try to predict goalie performance in the future using this methodology, as this analysis was only done for skaters.