Whose value resume is more impressive, Pavel Datsyuk’s or Connor McDavid’s? It’s a really difficult question to answer. We only have the beginning of McDavid’s career and the end of Datsyuk’s in the public data used for GAR calculations. Surely there must be a way to control for the impact of age to determine how impressive their respective performances are. This was the crux of my interest when I began looking at NHL aging — simply to acquire age-adjusted performances. The further I went down the rabbit hole, though, the more apparent it became that this question can only be answered through a comprehensive approach to what the process of NHL aging looks like. Ultimately, and unfortunately, this would mean constructing an aging curve. I say “unfortunately” because this is a well-trod area of research, and one rife with potholes. As such, this study will be delivered in three parts. First, we will summarize aging research up to this point, focusing on the current standard and how mine differs; second, we will describe the new approach and some preliminary findings; and last, we will investigate one possible approach to making the curves a little more flexible moving forward. This piece will serve as the first part — describing the state of aging curves.
An aging curve is, simply put, any graph where the x-axis is age and the y-axis is some variable that could feasibly rise and fall predictably with age. For instance, in medicine, to see the impact of age on mortality, you could graph the mortality rate at every age – a curve that can help us understand things like the Gompertz–Makeham Law of Mortality. In sports, we typically choose a metric that either represents a valuation (ex: GAR) or a specific skill (ex: Sh%) and graph the expected trajectory of a player’s career over that time. The quickest way to do this would be to take every player-season and graph the performance of players in the metric versus the age of the players in the season they produced it, like what was done here for goalies.
The immediate problem with this is that the only players in the league at the far ends of the age spectrum are likely very good. As an example: for skaters, Connor McDavid is going to start in the NHL earlier than a random 4th-rounder, and Jaromir Jagr was 100% of the 40+ year-old age pool at one point. This issue is a specific brand of selection bias called “survivor bias.” Briefly, I’d like to credit Burtch here for humbly acknowledging the error and correcting it, but also for leaving up the initial article so we may use it as an example in pieces like this. This bias must be fixed to have any hope of portraying age accurately. The most common way to deal with this in the sabermetrics community is by using something called the “delta” method.
This method was pioneered in baseball and has since been applied elsewhere. You can see Tom Tango write about it here, here, and here. This method was expounded on by Michael Lichtman here, where he introduced phantom years to further adjust for survivor bias here – a model adapted for hockey by EvolvingWild (Josh and Luke Younggren) here and here¹. Until very recently, this has essentially been the gold standard for hockey aging.
In the delta method, we make age “buckets” that contain the change in value of a given metric (for the purposes of this article, we’ll say GAR) between two age-seasons. For example, you would calculate the difference between Sami Vatanen’s last two seasons (0.078 - 0.092 = -0.014) and put that in the age bucket corresponding to the transition between his ages in those two seasons — the “26->27” age bucket. This is done for every player in the dataset. Then you would average the observations within each age bucket and “chain” them together to view the cumulative effect. For example, if the peak age is 23 and the average player drops -0.2 GAR/60 from 23 to 24 and -0.4 GAR/60 from 24 to 25, then their “chained” decline at 25 would be -0.6 – indicating they are -0.6 GAR/60 below their peak. From that you would construct an aging curve.
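To make the arithmetic concrete, here is a minimal sketch of the delta method in R. The data frame `df` and its columns (`player`, `age`, `gar60`) are assumed names, and this naive version treats adjacent rows within a player as consecutive seasons:

```r
library(dplyr)

# Naive delta method: year-over-year changes, averaged per age bucket, chained
delta_curve <- function(df) {
  df %>%
    arrange(player, age) %>%
    group_by(player) %>%
    mutate(delta = gar60 - lag(gar60)) %>%    # change vs. previous season
    ungroup() %>%
    filter(!is.na(delta)) %>%
    group_by(age) %>%                         # bucket by age of the later season
    summarise(avg_delta = mean(delta)) %>%    # average within each bucket
    mutate(chained = cumsum(avg_delta),       # chain the buckets together
           vs_peak = chained - max(chained))  # express relative to the peak age
}
```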
This method solves the first, and biggest, problem introduced by survivorship — that “better” players outlive “worse” players — because each player is compared only to his own personal history. For survivorship to still be impactful, we’d need to provide evidence for the claim “players age at significantly different rates.” This may be true, but it’s certainly less obvious than the statement “there is a significant difference in the value of players.” The delta method is effective, to a degree — but it is not the only method. Which brings me to my counterproposal — regression.
This is hardly an original idea. In fact, at about the same time Lichtman wrote his delta curve piece, J.C. Bradbury released an alternative method using multiple regression (paywall). You can see Bradbury’s summary, as well as criticism from Lichtman and others in the comments, here (the Lichtman piece I cited above also has links to some of the debate). I’m not going to relitigate that entire dispute here, but the most popular criticism was that he only used players with very large samples (who are likely quite good since they were allowed to play so long), but billed his findings in a more universal fashion. It might not matter, for reasons I’ll get into later, but at minimum it was reasonable to question the assumption that the good players would not differ significantly. Micah McCurdy utilized something of a hybrid approach when analyzing age’s impact on shot data — he still binned the data and found the differential between years (like delta), but after doing so, applied a model to predict the delta result (using regression). Overall, I’m of the opinion that Bradbury’s approach (sampling controversy notwithstanding) is superior to the delta method due to some pretty clear advantages. Before explaining my specific model, I want to describe how the regression will work, and where I believe it has benefits over the delta method.
Regression is an entirely different approach in structure. It is a method in which you provide a list of important metrics and how you think they’re related, and the algorithm responds with a precise formula that maps those factors onto some desired response. In the context of aging, at minimum, the formula must contain some information about the player producing the metric and the age they were when they produced it. Imagine we have a speaker that is constantly playing a single note. We have multiple dials, all of which contribute to raising or lowering the volume of the speaker. The volume of the speaker is like a response variable (ex: GAR) and the different dials are the variables you are using to predict it (ex: Player and Age). What the regression lets you do is hold one of the dials steady to figure out what happens when you turn the other one. For instance, if we hold player steady, what does the “age dial” do to the volume? And how quickly does it do that? Cycling through those settings is how we construct the curve.
My particular method will start off by including just these two variables – the player’s name as a categorical variable and their age as a numeric variable. I will also specifically be using a GAM, not a multiple regression as Bradbury did. I’ll give more on that in piece #2. With this information about the two models, we can see some very clear benefits emerge from the use of a regression-based approach over the delta.
First, delta has no choice but to bin ages³. In order to have a sufficiently large number of players to average in each bucket, this must be done. But exactly when a player’s birthday falls, especially early in an NHL career, may have a significant impact on when he starts and how quickly he improves. Second, a key feature of measuring the impact of aging is having the best a priori measure of “true talent” you can. The delta method has a “memory” of only one season – each transition bucket just compares one year to the previous. The regression is based on a categorical predictor which becomes a placeholder for the quality of a player’s entire career. Third, an additional consequence of this failure of memory is that it limits our sample – only consecutive seasons can be used⁴. Regression can use an entire career no matter how discontinuous. Fourth, regression scales better to future problems due to its flexibility. To provide an example: I won’t have time to do it in this piece, but let’s say I wish to create a projection system for prospects. Knowing the pace of aging is very important for that process because age matters a lot for prospects. But prospects play all over the world in hundreds of different leagues with different levels of competition. How would delta handle this? I would expect that it would have to involve an outsourced NHLe measure because, otherwise, I see no way of combining this data. In comparison, for regression, we need only include “league” as a covariate in the model. Exactly how it is treated would depend on some preliminary findings, but regardless, I believe regression-based methods leave delta in the dust with regards to scalability.
In summary, regression-based approaches have the following advantages over the traditional delta method:

1. No binning: age can be treated as an exact, continuous variable.
2. Career-long memory: the player term stands in for the quality of a player’s entire career, not just the previous season.
3. No consecutive-season requirement: discontinuous careers can be used in full.
4. Scalability: new covariates (like league) can be folded directly into the model.
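As a tiny illustration of that last point, the multi-league extension is, in principle, one added covariate. This is a sketch only; `all_league_seasons` and its columns are hypothetical, and how `league` should actually be treated would depend on preliminary findings:

```r
library(mgcv)

# Hypothetical multi-league extension: league enters as one more covariate
fit_multi <- gam(gar60 ~ player + league + s(age), data = all_league_seasons)
```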
This explains why I am interested in using a regression instead of a delta method. But before ending this piece and beginning the more technical aspects of the study, I wanted to share some general thoughts about aging that I’ve developed through my work and in conversations with many others, including Bradbury, Tango, Lau Sze Yui, the Younggrens, and several others on Twitter.
I’d like to clarify a few things about the fabled “survivor bias” — a concept that has proven to be fairly omnipresent in aging curve research. The earliest cited account of survivor bias used in analysis was Abraham Wald’s analysis of WW2 bombers. What Wald found was that, in deciding what parts of the planes to reinforce, they should not discount the areas with the fewest bullet holes. Though it may be initially counterintuitive, the problem is that the only planes they analyzed were the ones that returned. The ones that did NOT return could have feasibly taken shots to those areas – which would mean those areas were even MORE in need of reinforcement, since a sufficient number of shots there would down the plane. This is an egregious case of survivor bias. It is the same problem you would observe if taking the raw totals of GAR/60 of all NHL players at each age. But by controlling for the quality of the player, we’ve already done the most important work in controlling for survivorship. All future corrections will be marginal improvements in comparison to this one. Furthermore, some changes may make no improvement at all. As the last part of this piece, I will offer a brief commentary on the overreaction to some components of survivorship in aging.
The first clarification I want to make is that being better does not necessarily mean a player will age more slowly. A really common concern voiced on my tweet-previews of my aging curves was that either A) they didn’t apply to the “greats” because they’re better longer or B) the “greats” are the only ones in the data at the extremes, so they bias the curve. Both of these arguments presuppose that the better a player is, the slower they will age. We don’t have good evidence to say that this is true, and when pressed, many of you didn’t believe that argument either. Jeff Zimmerman’s work had Hall of Fame hitters aging at largely the same rate as “normal guys”. Henry Druschel recorded phenoms as aging only marginally slower than non-phenoms. In my dataset, career GAR/60 impact explained only about 3% of the steepness* of a player’s age curve (*“steepness” defined as the leading coefficient of the quadratic regression fit to their career results). It is certainly not obvious, and debatably not even true, that better players will age more slowly. It’s not good versus bad that concerns us in this conversation, it’s fast-aging versus slow-aging.
Another clarification is that we have not yet proven the magnitude — or even direction — of the impact of survivorship on the aging curves⁵. As such, most methods of “accounting for” it are likely speculative. Here’s an illustration of what I’m talking about. The following two features are both biases impacting curves:

1. The players who survive to the extreme ages are disproportionately good, so the raw curve overestimates how well the typical player performs at those ages.
2. The players who would have declined the fastest are the ones who drop out, so the year-over-year changes we do observe underestimate the true rate of decline.
Now, a common method employed to address concerns about the first problem is to put in “phantom years” – projecting players forward based off their recent history and an “aging” factor. How is that aging factor determined? Typically, you would build a simple linear model off the players that are in the dataset. So, in order to solve problem 1 (overestimation), a model is built off data that suffers from problem 2 (underestimation), thus potentially exacerbating it. Indeed, in my testing, the “phantom projection” model made worse predictions on holdout years than the unaltered model.
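To make the circularity concrete, here is a hedged sketch of the phantom-year mechanism, with every object and column name (`observed_deltas`, `seasons`, `delta`, `gar60`) assumed for illustration:

```r
library(dplyr)

# Estimate a simple linear "aging factor" from the survivors' observed deltas
af <- coef(lm(delta ~ age, data = observed_deltas))

# Project each player's final observed season forward one year (the "phantom")
# (a real implementation would only add phantoms for players who dropped out)
phantoms <- seasons %>%
  group_by(player) %>%
  slice_max(age, n = 1) %>%                    # each player's last season
  ungroup() %>%
  mutate(age   = age + 1,
         gar60 = gar60 + af[1] + af[2] * age)  # last value + modeled aging step

# Append the phantoms and rebuild the curve on the augmented data
augmented <- bind_rows(seasons, phantoms)
```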
In order to analyze aging, the best method will ultimately be to utilize a regression of some kind because it doesn’t require binning, has a career-long memory, can count non-consecutive seasons, and scales better to future problems. This will be elaborated on in Part 2. Survivor bias is definitely a thing, but the most important focal points for its impact are corrected for by both delta and regression models. It’s not clear what the anatomy of the bias is, and so we cannot say a curve will be substantively different in a specific way — we can only aim to make our curve flexible enough to handle the different agers. This will be handled in Part #3.
In part 1, I introduced the concept of aging curves. In part 2, I’m going to explain my specific approach to the aging curve produced by regression and offer some preliminary findings.
My first step in this process was actually to treat age as categorical, not dissimilar to how delta works, to get an idea of the rough shape of the curve. Upon looking at it, it was not obvious to me that it should be symmetrical – players may not develop at the same pace with which they deteriorate. This was one of Birnbaum’s critiques of Bradbury that I thought landed: Bradbury treated age as a quadratic curve, which forces symmetry. I wanted my model to be more flexible than that, so I used a Generalized Additive Model (GAM) and treated age with something called a “spline.”
A typical linear regression fits an equation to a set of data following the form: \(y = \beta_1{x} + \beta_0 + \epsilon\). This will find the straight line of best fit for any dataset, even if the data isn’t shaped like a line. A GAM has the same structure, but it doesn’t HAVE to use a linear coefficient \(\beta_1\); it can use any function: \(y = f(x) + \beta_0 + \epsilon\). This allows a much more flexible treatment of nonlinear terms. Age clearly has a “rise and fall” effect, so we will need a nonlinear treatment of that variable. Bradbury essentially did this, using a quadratic as the \(f(x)\) function. In particular, I will be using the aforementioned spline as my \(f(x)\). A spline is a piecewise function – a process that will segment the data and write different functions for different parts, and then connect them. That would look a little choppy if unadjusted, so there are also various forms of smoothing involved to make it look like one big function. In calculus you could say that the smoothing process makes the function “continuous and differentiable.” Here is an example of how this progression could be visualized on a pseudorandom dataset that is not dissimilar from an aging curve.
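As a stand-in for the visualization (this is not the original figure), here is a small, self-contained demo: simulate a rise-and-fall pattern loosely resembling an aging curve, then compare a straight-line fit to a spline fit. All names and numbers here are made up for the demo:

```r
library(mgcv)
library(ggplot2)

set.seed(42)
demo <- data.frame(age = runif(300, 18, 40))
demo$y <- -0.02 * (demo$age - 25)^2 + rnorm(300, 0, 0.5)  # peak near 25, plus noise

demo$line   <- predict(lm(y ~ age, data = demo))      # straight line of best fit
demo$spline <- predict(gam(y ~ s(age), data = demo))  # smoothed spline fit

ggplot(demo, aes(age, y)) +
  geom_point(alpha = 0.3) +
  geom_line(aes(y = line),   color = "red") +
  geom_line(aes(y = spline), color = "blue")
```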
With our vocab terms defined, let’s now present the first attempt at a model for an aging curve.
\[gar60 \sim player + s(age)\]
At this point, it’s worth reiterating that this is a Generalized ADDITIVE Model. In other words, the impact of the player does not directly interact with the shape of the curve, just its position. Therefore, Brooks Orpik and Erik Karlsson will age in the same shape, but Karlsson’s curve will obviously be much higher. This feature is one way of extracting the quality of a player’s age-adjusted impact. I addressed this a little in my “clarifications and misconceptions” in part one of this series, but I also have a method for adjusting for the Orpik-Karlsson problem that I’ll elaborate on in part three. For now, we just want league baselines, so we will put a pin in that feature.
Applying this model, and holding the player category constant, we can create curves for each position. On advice from a few who have worked on this type of project before (namely, McCurdy and Tango), I used all player-seasons in the Evolving-Hockey dataset — there was no TOI cutoff. Yes, some of the low-TOI results were erratic, but regressing those results toward their TOI peers did not improve the model. Using a better prior than “TOI peers” may improve the projection, but that would be an entire piece unto itself. As such, for now, we will make no further adjustments outside of standardizing GAR/60 by year (GAR per 100 FA for goalies).
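Mechanically, “holding the player category constant” looks something like the sketch below: fix the player term at any one player (it shifts the level of the curve, not its shape) and sweep across ages. `fwd_seasons` and its columns are assumed names:

```r
library(mgcv)

fit_f <- gam(gar60 ~ player + s(age), data = fwd_seasons)

grid <- data.frame(
  player = fwd_seasons$player[1],  # hold the "player dial" steady
  age    = seq(18, 40, by = 0.1)   # exact decimal ages (no binning)
)
grid$pred    <- predict(fit_f, newdata = grid)
grid$vs_peak <- grid$pred - max(grid$pred)  # expected distance below peak
```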
A note on the technical side of things: GAMs have multiple options for “smoothing terms.” These are different techniques for molding the disjoint set of piecewise functions into the continuous, smooth curve shown above. In testing, the R-default smoothing method — the thin-plate regression spline — graded out as having the lowest AIC for test models (I included the testing for the sample of all-situation forwards as an example in the RMarkdown linked at the end of this piece; a quick sketch of the comparison also follows below). The graphs below are the all-situation age curves for the 3 positions, where the y-axis is how many standard deviations below their peak we expect players to be at each age.
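For reference, that smoother comparison can be sketched like this, with assumed names. The basis codes are mgcv’s: “tp” is the thin-plate regression spline (the default), “cr” a cubic regression spline, “ps” a P-spline; the lowest AIC grades out best:

```r
library(mgcv)

bases <- c("tp", "cr", "ps")
aics <- sapply(bases, function(b) {
  AIC(gam(gar60 ~ player + s(age, bs = b), data = fwd_seasons))
})
aics  # thin-plate ("tp") had the lowest AIC in my testing
```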
The most recent and thorough attempt at calculating hockey aging curves for value metrics was by the Younggrens (linked above), and they found forwards peak at 25 and defenders peak at 22⁶. My model indicates that it may actually be forwards who peak earlier. I can’t claim to understand the anatomy of the difference in its entirety, but the Younggrens did use total WAR whereas I’m using a rate stat, so I think it’s possible their model picked up some signal from TOI which may have impacted the results. In McCurdy’s work, he found that maximum threat impact occurred at age 24 for skaters — bang on for my defender curve, and slightly older than my forward curve. For goalies, the most recent work is from Garik16, who opted for a linear projection method, but had he chosen the quadratic, his peak would have probably been around 25.5, which is about a year off of what I’ve found here. It’s also worth noting that, unlike in the skaters’ curves, the goalie curve’s interval is so wide relative to its slope that it wouldn’t be surprising for a goalie to perform at exactly the same GAR/FA rate their entire career.
We can also apply this methodology to specific categories of GAR. This is what you would get for each GAR component, by skater position, if we run a model of the same structure and simply replace the response variable.
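A sketch of those per-component refits, with assumed column names for the components:

```r
library(mgcv)

# Same model structure, different response variable for each GAR component
components <- c("ev_gar60", "pp_gar60", "sh_gar60", "pen_gar60")  # assumed names
fits <- lapply(components, function(resp) {
  gam(as.formula(paste(resp, "~ player + s(age)")), data = fwd_seasons)
})
names(fits) <- components
```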
Once again, let’s compare this to previous findings. The even-strength GAR curves show the same phenomenon as the all-situations ones, with the forwards peaking at a surprisingly early age. The penalty aging curve nosedives almost immediately in both my model and the Younggrens’, so it seems that, as stark as it is, there’s possible merit to the shape. If I were to venture a guess as to why: players are often both at their fastest and their weakest when they are very young. This combination of skills likely gets them roughed up a fair bit, which could make it easier to draw penalties.
The fact that powerplay skill peaks later is not as surprising – the Younggrens’ curves were pretty constant from 23-30 for forwards and from 22-28 for defenders, and my findings are in the middle of that. It’s also worth noting that some of these components indicate very little movement over an entire career. For instance, it seems that defender PP_GAR/60 and forward SH_GAR/60 have almost no movement over the entirety of the age curve. As it turns out, indeed, the age term is insignificant for those two components:
| Component | Forwards | Defenders |
|---|---|---|
| Even-Strength | 4.06e-24 | 2.01e-06 |
| Powerplay | 1.84e-07 | 0.411 |
| Shorthanded | 0.149 | 0.0846 |
| Penalty_GAR | 2.04e-10 | 0.00136 |
| Position | P-value |
|---|---|
| Forwards | 7.11e-30 |
| Defenders | 1.71e-10 |
| Goalies | 0.0129 |
One possible explanation here could be that shooting and passing seem like skills likely to age better than, say, skating. Adherence to scheme also seems like something likely to age well, and it may help on special teams. Given that the less “active” member of the special teams unit (defenders matter more for the PK, forwards for the PP) is the one with the insignificant age factor, it also seems possible that there’s a fair deal of randomness involved.
The model employed here is a GAM with two predictors — the player, and a spline on his age. The spline allows for a flexible, nonlinear treatment of the age variable. The resulting curves are fairly similar to previous findings, though they indicate forwards may peak even earlier than anticipated. Age plays no significant role in shorthanded forward value or powerplay defender value, but is statistically significant for all other GAR components.
This method does not allow for any departure from the mean aging curve — it assumes all players age identically. It also does not attempt to adjust for survivor bias in any way beyond controlling for the player. In the third and final part, we will look at one possible approach that may be able to address the bias by introducing some flexibility into the graph, in addition to investigating the effectiveness of a somewhat Bayesian correction method.
In part 1, we introduced the concept of aging curves, and talked about common approaches (delta vs regression) and handling of biases. In part 2, we elaborated on the chosen method — regression — and offered some preliminary results. In this third and final installment, we will look to improve the model in 2 specific ways — installing a prior for the low-sample player-seasons, and using a tensor to make the aging curve more flexible.
As I mentioned in part 1, one of the most common criticisms of aging curves is that they are not true for some players. Specifically — though, in my opinion, incorrectly — people generally assume this to mean “better” players. In other words, if a player is sufficiently good, they can delay the impact of aging, and so their curve should look notably different. As I hinted at in part 1, I don’t believe this is necessarily the case. Players that are very good may stay in the league longer, but it’s not because they’re aging slowly — it’s because they were so good to start that the impact of aging takes longer to drag them out of the league. Gordie Howe played until he was 51, but his best season by Point Shares came when he was just 25. Wayne Gretzky, remarkably, finished top 5 in Hart voting in his age-37 season, but his best season was at just 21 years old. More recently, Jaromir Jagr is the modern ageless “great” — his best season was at just 23. Not only do great players often age as expected, but decidedly not-great players will sometimes age surprisingly slowly. I don’t think anyone will confuse Mike Knuble for a Hall of Famer anytime soon, but every one of his top 8 seasons occurred at age 30 or older. Long story short (too late?): it’s very possible for some players to have different curves than others, but it’s not because they’re good players. It’s because they’re slow agers — and those two things are not the same.
We’re still, however, left with this pesky issue of players who age at different rates. Obviously, every player has their own unique physiology, training, circumstances, etc. that will affect the onset of aging. But if we simply gave each player their own curve, we would not only have a much less robust basis for each curve, but it would also be difficult to make generalized statements about the league’s aging as a whole. I decided it would be necessary to include something like a “decay rate” for each player that could slightly alter the curve. Surely there are better ways to do this, but I’ll demonstrate one possibility in this section.
Each player has a set of points for their performance in every season they’ve played, relative to their age at the time of that performance. In Bradbury’s piece linked above, he acknowledged that this would likely follow a quadratic pattern. We will take that methodology and map it onto every player’s unique progression. We fit a quadratic curve, \(f(x)=ax^2+bx+c\) (a parabola), to every player. The “decay rate” of a parabola is its leading coefficient, \(a\) — the only parameter that changes the “steepness” of the curve. I calculated this term for every player and decided to use it as a proxy for decay rate. Now we have one variable for the overall “quality” of a player, one for the impact of general aging, and one that adjusts that aging trend for this specific player’s decay rate. In order to make these work as intended, we need to find a way to have the player-specific decay rate communicate with the league-wide impact of aging.
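A sketch of that per-player quadratic fit, mirroring the lm(gar ~ age + I(age^2)) call that appears in the simulation code in the addendum; the data frame and column names are assumed:

```r
library(dplyr)

# Fit a quadratic to each player's career and keep the leading coefficient
decay_rates <- seasons %>%
  group_by(player) %>%
  filter(n() >= 3) %>%  # need at least 3 seasons to fit a quadratic
  group_modify(~ data.frame(decay = coef(lm(gar60 ~ age + I(age^2), data = .x))[3])) %>%
  ungroup()

seasons_with_decay <- left_join(seasons, decay_rates, by = "player")
```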
In GAM regressions, there are multiple tools that allow for interactions between predictor variables, but the most useful one for our purposes here is likely the tensor product. Put simply, a tensor product is one way for GAMs to handle interactions between variables on different scales, as explained here. For instance, in our case, one of the terms is “age” (measured in years) and one is “GAR rate decay” (measured in GAR SDs) — very different units. So, our new formula will look like this:
\[ gar60 \sim player + te(age,decay)\]
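In code, under the same assumed names as the sketches above, that model is simply:

```r
library(mgcv)

# te(age, decay) lets the shape of the age curve vary smoothly with decay rate
fit_te <- gam(gar60 ~ player + te(age, decay), data = seasons_with_decay)
```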
Obviously, this is going to be a little more difficult to show graphically because we’ve added a third dimension to the input (decay). So next we have to think about what we’re interested in showing. The ultimate goal is to show the range of possible aging curves. One option is to graph the aging curve exactly as described in part 2, but then iterate through all of the possible decay rates to see what the model does to the curve as it cycles through the values for that parameter. To do this, we must first determine what the range of reasonable decay rates is. In order to illustrate this procedure, I’ll use total GAR/60 for forwards as the sample data (again, no TOI-cutoff). The graph below shows the distribution of quadratic leading coefficients for that subgroup of the dataset.
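Before looking at the distribution itself, here is roughly what that iteration could look like in code, sweeping the decay-rate percentiles (the 5th through 95th, per the discussion below) while holding the player fixed. Names continue the assumptions above:

```r
# Predict one curve per decay-rate percentile, player held constant
decays <- quantile(seasons_with_decay$decay,
                   probs = seq(0.05, 0.95, by = 0.05), na.rm = TRUE)
curves <- lapply(decays, function(d) {
  grid <- data.frame(player = seasons_with_decay$player[1],
                     age    = seq(18, 40, by = 0.1),
                     decay  = d)
  grid$pred <- predict(fit_te, newdata = grid)
  grid
})
```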
If we crunch into specifics, it becomes clear that about 90% of the data is between -0.4 and 0.4, so that is, roughly, the range we will cycle through. The problem is that the original curves for the “positive decay” players actually had a point where they changed concavity early in careers (the curve dipped before rising). I believe this is due to the “late entries” — the model “knew” young players were pretty good, so when someone entered the league late (and maybe even improved over their first couple seasons), it made a graph that went down first, then up. Regardless of the reasoning, this down-up-down pattern is clearly not indicative of an age curve we’d like to generalize. However, if we cut the curve off at its lowest point, we get something that I believe represents the late-entry phenomenon the model was picking up on. The resulting curve is shown below, animated from the 5th through the 95th percentiles:
This, debatably, poses more questions than answers. The first interesting feature is that the tensor seems to have picked up on two distinct curves — we’ll call them “late” agers and “early” agers. As you can see, around the time where the decay rate turns positive (a fairly bizarre result), the curve’s peak-age jumps about a decade later than it was prior. If we look at the list of curves all at once, the two distinct peaks become even clearer.
Some players have concave-up (U-shaped) performance arcs, and an aging model will struggle to understand what is happening with these players. The model seems to have determined that these players may peak as late as 33-34. The players that age closer to the manner in which we’d expect (concave-down) seem to peak closer to the point with which we’ve become familiar (23-24 years old). This could be evidence that players are able to stave off aging by up to a decade. It could mean that games played may be as impactful to deterioration as age. It could also mean that, regardless of when you enter the NHL, it takes a few years to figure it out, and so even late entries will improve over their first few years. It could also just be a statistical artifact — after all, it seems unlikely that there are actually two distinct aging patterns as opposed to a fairly even spectrum.
It’s tough to know exactly what the tensor is “seeing” in the data, but what I do think is a reasonable inference is that aging in sports is indeed very fluid — a 10-year gap in potential “peaks” is a startling span. This is by no means the definitive approach to isolating the nature or anatomy of this fluidity, but it’s an attempt and an entry point for marrying the clear facts of aging (no one is as athletic at 83 as at 23) with the equally clear lack of ubiquity for “peaks.”
A lot of people have covered aging in sports — I’ve mentioned many of their names or pseudonyms here. This piece is not meant as a refutation of any of their innovative work, nor a rebuttal to their philosophies, but as a tribute to their foundational research. Aging in sports is as essential as it is confounding, and I merely aimed to build off this previous work in a manner that would allow us to iteratively create a model for aging that is flexible enough to mirror the range of possibilities we see in the real world.
No model can hope to capture all of the idiosyncratic complexity of a sport in which people can peak anywhere from 19 to 39. But with a sufficiently adaptive approach, we can marginally de-mystify the divergences from the canonical curve. No model will be able to mold to all shapes of careers, but the splined-age GAM uses exact age rather than bins, may be adapted to include data from other leagues, and can offer an asymmetric nonlinear curve.
The answer for how to build a model for aging doesn’t have to be a GAM, and the way to make it adaptive doesn’t have to be a tensor. That’s just a first attempt of what I believe should be many. Everyone ages, but at different rates. Aging is different to everyone, but foreign to no one. A model like this is one way of attempting to capture that elusive phenomenon.
With that, I’d like to thank Micah McCurdy, Tom Tango, and J.C. Bradbury for being responsive to my questions about their respective attempts at curves. I’d like to thank Lau Sze Yui for all of his advice in the use of GAMs, and everyone with whom I spoke about this on Twitter, including Josh, Garik, and Adam Harstad. My additional appreciation to Prashanth Iyer, Ryan Stimson, and Peter Flynn, who offered edits to this piece. And, of course, a massive thanks to Luke and Josh Younggren, who provided all of the data for this through their GAR model and their scraper, and whose counsel has been invaluable both for this specific project and for my growth as an analyst in general.
I don’t know how to end long research pieces. So, here’s a link to the code used to create this Markdown annnnddddd …
This piece will serve as a living addendum to my 3-part aging curve piece. I will include any corrections/clarifications if necessary, in addition to some follow-up work that is specifically relevant to the research I’ve done heretofore.
As I said at the onset of this piece, this is a topic that has been thoroughly well-researched by a fairly wide array of smart people. Many of them were kind enough to offer feedback on this piece after it was released originally on 11/13/19. That feedback can be largely found in the threads attached to my tweet that made the rollout of the piece official. It was argued by Phil Birnbaum that the long-career guys will be weighted too strongly by the regression model. It was recommended by Tom Tango that a simulation may be the best way to discover if and how the regression-only model would err. I decided it was worth checking if my model would have some potential pitfalls in comparison to the delta method. I will show that comparison here.
This is the basic structure of the simulation:
```r
library(tidyverse); library(zoo); library(mgcv)
library(ggpubr); library(modelr); library(gam); library(caret)
# Range of Decay is from x to 1
# y is the "true" baseline aging curve
trial_run = function(x,y){
temp <- data.frame(
p1 = rep(letters,each=16*20),
p2 = rep(rep(1:20,each=16),26), ## Create Players
age = rep(20:35,20*26), # Ages
traj = rep(y,26*20), # Baseline Aging
r_goodness = rep(rnorm(20*26,5,2),each=16), # Random "Quality"
r_rate = rep(runif(20*26,x,1),each=16), # Random "Decay Rate"
r_spray = rep(rnorm(20*26*16,0,4)) # Random Yearly Noise
) %>%
mutate(gar = r_goodness+r_spray+traj*r_rate, # Rolls Dice
player = paste0(p1,p2)) %>% # Name Player
group_by(player) %>%
# "Dropout Mechanism" -- Based on 3-year GAR
# (Avg player career = 6.6 seasons)
mutate(lag1 = lag(gar),lag2 = lag(gar,2)) %>%
rowwise() %>%
mutate(threeyr = mean(c(lag1,lag2,gar),na.rm = T)) %>%
group_by(player) %>%
mutate(fcode = ifelse(lag(player)!=player,0,1),
include = ifelse(threeyr<4,0,1),
d = gar - lag(gar)) %>%
filter(include==1)
temp = temp %>%
group_by(player) %>%
do(lm(gar ~ age + I(age^2), data=.) %>% coef %>% `[`(3) %>% enframe)%>%
select(player,decay=value) %>%
right_join(temp,by="player") %>%
filter(!(duplicated(player)& duplicated(age)))
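# Fit the candidate models: spline GAM, spline + lag term, tensor product of
# (age, decay) with marginal ti() terms, tensor + lag, and a loess fit from
# the gam package for comparison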
spline_mod = mgcv::gam(gar ~ player + s(age),data=temp)
spline_lag = mgcv::gam(gar ~ player + lag1 + s(age),data=temp)
tensor_mod = mgcv::gam(gar ~ player + ti(age)+ti(decay)+ti(age,decay),data=temp)
tensor_lag = mgcv::gam(gar ~ player + lag1 + ti(age)+ti(decay)+ti(age,decay),data=temp)
loess_mod = gam::gam(gar ~ player + lo(age,span = 0.25,degree = 1),data=temp)
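# Benchmark: the delta method -- average year-over-year change per age, chained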
delta = temp %>%
group_by(age) %>%
summarise(change = mean(d)) %>%
mutate(change = ifelse(is.na(change),0,change),
chained = cumsum(change),
delta_gar = chained - max(chained)) %>%
select(age,delta_gar)
trugar = data.frame(age=20:35,true_gar = y)
thisguy = last(temp %>% filter(!is.na(decay)) %>% `$`(player))
shell = data.frame(
player = rep(thisguy,16),
age = 20:35,
lag1 = rep(0,16),
lag2 = rep(1,16)
) %>%
left_join(delta,by="age") %>%
left_join(trugar,by="age") %>%
left_join(temp %>% select(player,decay),by="player") %>%
filter(!duplicated(age)) %>%
add_predictions(spline_mod,"spline_mod") %>%
add_predictions(spline_lag,"spline_lag") %>%
add_predictions(tensor_mod,"tensor_mod") %>%
add_predictions(tensor_lag,"tensor_lag") %>%
add_predictions(loess_mod,"loess_mod") %>%
mutate(spline_mod=spline_mod-max(spline_mod)) %>%
mutate(spline_lag=spline_lag-max(spline_lag))%>%
mutate(tensor_mod=tensor_mod-max(tensor_mod)) %>%
mutate(tensor_lag=tensor_lag-max(tensor_lag)) %>%
mutate(loess_mod=loess_mod -max(loess_mod))
sgr=paste0("MAE=",round(mean(abs(shell$true_gar-shell$spline_mod)),1))
slr=paste0("MAE=",round(mean(abs(shell$true_gar-shell$spline_lag)),1))
tgr=paste0("MAE=",round(mean(abs(shell$true_gar-shell$tensor_mod)),1))
tlr=paste0("MAE=",round(mean(abs(shell$true_gar-shell$tensor_lag)),1))
dgr=paste0("MAE=",round(mean(abs(shell$true_gar-shell$delta_gar)),1))
lgr=paste0("MAE=",round(mean(abs(shell$true_gar-shell$loess_mod)),1))
forPlot = shell %>% select(-player,-lag1,-lag2,-decay)%>%
gather("model","gar",-age) %>%
mutate(model=as.factor((model)))
ggplot(forPlot,aes(x=age,y=gar,color=model)) +
geom_line() +
ylim(c(-11,0.1))+
scale_color_manual(values=c("darkorange3","green3","darkred", "red", "deepskyblue2","deepskyblue4","black"))+
annotate(geom="text", x=25, y=-5.5, label=sgr,color="red")+
annotate(geom="text", x=25, y=-6.5, label=slr,color="darkred")+
annotate(geom="text", x=25, y=-8.5, label=tgr,color="deepskyblue4")+
annotate(geom="text", x=25, y=-7.5, label=tlr,color="deepskyblue2")+
annotate(geom="text", x=25, y=-10.5, label=dgr,color="darkorange3")+
annotate(geom="text", x=25, y=-9.5, label=lgr,color="green3")+
ylab("GAR") + xlab("Age")
}
quadform = function(x){
-(1/7)*(x-26)^2
}
y1=quadform(20:35)
y2=c(-5:0,-1:-10)
seed = 123
set.seed(seed)
p1 = trial_run(0,y1)+ggtitle("Decay Rate Span (0-1)") + theme(legend.position = "none");set.seed(seed)
p2 = trial_run(0.5,y1)+ggtitle("Decay Rate Span (0.5-1)") + theme(legend.position = "none");set.seed(seed)
p3 = trial_run(1,y1)+ggtitle("Decay Rate Span (Just 1)") + theme(legend.position = "none");set.seed(seed)
q1 = trial_run(0,y2)+ggtitle("Decay Rate Span (0-1)") + theme(legend.position = "none");set.seed(seed)
q2 = trial_run(0.5,y2)+ggtitle("Decay Rate Span (0.5-1)") + theme(legend.position = "none");set.seed(seed)
q3 = trial_run(1,y2)+ggtitle("Decay Rate Span (Just 1)") + theme(legend.position = "none")
ggarrange(p1,p2,p3, q1,q2,q3,
          ncol = 3, nrow = 2, common.legend = T)
```
The results demonstrate that, even with a little added help from a lag term, the GAM models I’ve proposed in this piece do not perform as well as the delta method in modeling the true aging curve of this simulated distribution of player-seasons. Why is this? Phil Birnbaum offered a bit of a description in a blog post here, and it essentially boils down to leverage. This is a fuzzier, multi-dimensional version of an outlier dominating a regression equation. That is to say, it’s not the length of the long careers that is doing the heavy lifting as much as their position. By virtue of spanning to both ends of the scale — the highest-leverage locations — the players with long careers are dominating the equation. The long-career players are likely, on average, slower agers, and this feature biases the curve upward. The delta curve doesn’t care about a player’s history, only their previous year. The feature I described as a “lack of memory” in Part #1 turns out to be helpful in containing the impact of slow agers.
It’s not entirely surprising that a regression would suffer from this bias, but the hope was that the spline (smoothed piecewise functions) or loess curve (localized linear models) would be sufficiently “forgetful” so as to proxy delta’s handling of this issue. As it turns out, that did not occur in this simulation. Feel free to run the sim for yourself with different seeds to observe the impact on different trials.
What does this mean for aging curves moving forward? Is delta ever going to be toppled? I’m still hopeful for regression. As you can see, we’ve made a bit of progress in getting closer to the delta curve already — the ground to make up is shrinking. Furthermore, it seems that the tensor method described in Part #3 has some unique untapped potential. You’ll see that if we simply run the same experiment, but re-randomize with a different starting point for the random number generator — `set.seed(123425)` — we get a model for which the tensor wins the day.
The reason I didn’t show that first is that I ran 10 different randomized seeds, and only one of them produced a result in which delta wasn’t pretty clearly the best model. Specifically, the tensor model often missed pretty disastrously. I believe this is because the model is using the decay rate as a “guess” as to which aging class the player falls in; when it guesses right, it nails it, but when it guesses wrong, it’s using information that hurts the model. If someone (because I’m growing fairly exhausted with the topic) can identify a better measure for curve “shape” or a better method — perhaps a shape-constrained GAM to rein in the tensor model a bit, or including more metrics descriptive of the curve’s shape as covariates, or, as proposed by Jonathan Judge, perhaps an entirely different approach using FDA — I’d encourage them to run with it.
Therefore, I think a fair description of the state of the research at the moment is that while the ceiling for some of these other measures may exceed that of the delta (see graph above), we’re not yet at the point where we can say regression either is or will be definitively better. Indeed, at the moment, there’s reason to believe delta is still the best method as currently constructed. I also think, however, that some of the methods I’ve proposed produced encouraging results and have a higher ceiling with regards to flexibility, applicability, and adaptability.
1. In the original piece, I did not credit Eric Tulsky, who, before the Younggrens, had written extensively on aging here, here, here, here, here, and here.
2. When I refer to the “delta” method in Part #1, I’m speaking about the conventional method that has been most frequently cited in the public sphere.
3. Other versions of the delta method have utilized decimal ages with sliding half-year intervals on either side. This is still technically “binning,” but it would remove the most pernicious aspect of the practice.
4. There are less conventional versions of the delta that can span more than a year and bridge the gap, but they would still miss a “bucket” in the tradeoff.
5. In the addendum in Part 4 of this piece, I run a simulation experiment that attempts to quantify this issue.
6. It’s worth noting that you can likely draw many existing aging curves, including the Younggrens’, through the shaded region of mine. One could potentially claim this means we don’t have sufficient evidence to support the claim that the aging trajectory is significantly different from the consensus up to this point.