This is an attempt to build a simple model that predicts free agent contract salaries for the 2015/2016 MLB free agent season. The model is simple in that it tries to predict a player’s future salary given the player’s WAR (wins above replacement) for the 2015, 2014, and 2013 seasons, the player’s current age, and their primary position. The predictions are based on free agent data from 2006 through 2014 (found here) and were estimated using several different models: linear regression, partial least squares, lasso, multivariate adaptive regression splines (MARS), support vector machine (SVM), bagged trees, gradient boosting machine (GBM), and Cubist.
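
To make the inputs concrete, here is a minimal sketch of how one player could be turned into a row of predictors. The `FreeAgent` class, the `to_features` helper, and the one-hot encoding of position are illustrative assumptions, not the code used to fit the models; the position list only covers positions that appear in the table further down.

```python
from dataclasses import dataclass

# Positions that appear in the table of predictions; the training data
# may well contain others (this list is an assumption for illustration).
POSITIONS = ["C", "1B", "2B", "SS", "LF", "CF", "RF", "SP", "RP"]


@dataclass
class FreeAgent:
    """Hypothetical container for one free agent's predictors."""
    name: str
    pos: str
    age: int
    war_present: float  # 2015 WAR
    war_lag1: float     # 2014 WAR
    war_lag2: float     # 2013 WAR


def to_features(p: FreeAgent) -> list[float]:
    """Numeric predictors followed by a one-hot encoding of position."""
    one_hot = [1.0 if p.pos == pos else 0.0 for pos in POSITIONS]
    return [p.war_present, p.war_lag1, p.war_lag2, float(p.age)] + one_hot


# Values taken from Jason Heyward's row in the table.
heyward = FreeAgent("Jason Heyward", "RF", 26, 6.5, 6.2, 3.7)
print(to_features(heyward))
```

Tree-based models can often take position as a categorical variable directly; the one-hot form is simply the most portable choice across the model types listed above.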

Repeated cross-validation was employed for model selection, and the three best-performing models (in terms of RMSE and \(R^2\)) are presented in the table below: MARS, GBM, and Cubist. For the fun of it, I’ve also included an average of the three models.
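
For readers unfamiliar with repeated cross-validation, the idea can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the fold count, the number of repeats, the toy data, and the placeholder "model" that always predicts the mean training salary. It is not the procedure or the settings behind the results in this post.

```python
import math
import random


def repeated_kfold(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles the rows."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def cv_rmse(X, y, fit, predict, k=5, repeats=3, seed=0):
    """Average held-out RMSE over every fold of every repeat."""
    scores = []
    for train, test in repeated_kfold(len(y), k, repeats, seed):
        model = fit([X[j] for j in train], [y[j] for j in train])
        preds = [predict(model, X[j]) for j in test]
        scores.append(rmse([y[j] for j in test], preds))
    return sum(scores) / len(scores)


# Placeholder "model": always predict the mean training salary.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, row):
    return model


# Toy data, made up for illustration: salary in $M as a function of WAR.
X = [[w / 2] for w in range(20)]
y = [1.0 + 4.0 * row[0] for row in X]
print(round(cv_rmse(X, y, fit_mean, predict_mean), 2))
```

Model selection then amounts to running `cv_rmse` once per candidate model with the same seed and keeping the candidates with the lowest scores.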

These are the top 50 free agents according to ESPN; however, some free agents are missing from this list, generally because they were missing WAR values that couldn’t be easily imputed. Note that both the predictions and the Actual column are annual salaries, that is, the overall size of the contract divided by the number of contract years. This does not take into account any attempt on the part of the team to backload the contract.
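
As a small worked example of that salary definition, the sketch below divides a contract's total value by its length. The function name is made up, and the contract terms are assumptions used for illustration (they correspond to the widely reported deals for Heyward and Greinke and reproduce the Actual values in the table).

```python
def average_annual_value(total_value: float, years: int) -> int:
    """Total contract value spread evenly over its years, to the nearest dollar."""
    if years <= 0:
        raise ValueError("years must be positive")
    return round(total_value / years)


# Assumed contract terms, for illustration only.
print(average_annual_value(184_000_000, 8))  # 23000000
print(average_annual_value(206_500_000, 6))  # 34416667
```

Because the value is spread evenly, a backloaded deal and a flat deal of the same size and length produce the same figure.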

nameLast nameFirst yearID Pos Age Rk WARpresent WARlag1 WARlag2 mars gbm cube Avg Actual
Heyward Jason 2015 RF 26 1 6.5 6.2 3.7 21467653 17275523 34249434 24330870 23000000
Greinke Zack 2015 SP 32 2 9.3 4.3 3.9 75444329 19333737 50638917 48472328 34416667
Price David 2015 SP 30 3 6.0 4.6 2.8 41396859 22283416 28613963 30764746 31000000
Upton Justin 2015 LF 28 4 4.4 3.2 2.9 17657209 16960706 15790983 16802966 22125000
Cespedes Yoenis 2015 CF 30 5 6.3 4.1 1.5 24663879 15373549 19939450 19992293 NA
Gordon Alex 2015 LF 31 6 2.8 6.6 4.2 17691107 15610988 16657572 16653222 18000000
Davis Chris 2015 1B 29 7 5.2 1.8 6.5 26435351 19975776 21205889 22539005 23000000
Fowler Dexter 2015 CF 29 8 2.2 1.8 2.0 9616638 7876682 7908118 8467146 NA
Cueto Johnny 2015 SP 29 9 3.9 6.4 1.4 26164379 21457146 22188956 23270160 21666667
Kazmir Scott 2015 SP 31 10 3.3 1.7 1.1 16338624 11753035 12243065 13444908 16000000
Leake Mike 2015 SP 28 11 2.9 1.5 3.0 16410339 12000949 13923312 14111533 16000000
Lackey John 2015 SP 37 12 5.7 1.1 2.8 22304739 13120942 17518442 17648041 16000000
Chen Wei-Yin 2015 SP 30 13 3.8 1.8 1.8 19838772 16070969 14592884 16834208 20000000
Kendrick Howie 2015 2B 32 14 1.1 5.3 3.4 8704445 7154584 9127213 8328747 NA
Jackson Austin 2015 RF 28 15 1.5 1.8 3.6 7440309 7136385 8834637 7803777 NA
Span Denard 2015 CF 31 16 0.8 3.7 2.3 7109511 5870335 6507299 6495715 10333333
Wieters Matt 2015 C 29 17 0.8 0.7 0.6 4024326 3157619 3100007 3427318 15800000
Zimmermann Jordan 2015 SP 29 18 3.5 4.9 3.7 30811959 21165514 21153595 24377023 22000000
Desmond Ian 2015 SS 30 19 2.0 3.9 3.6 13356649 10401790 11444390 11734276 NA
Samardzija Jeff 2015 SP 30 20 0.2 3.7 1.0 7829895 6713795 7877918 7473869 18000000
Anderson Brett 2015 SP 27 21 1.5 0.9 -0.8 5764224 4382958 7168454 5771879 15800000
Zobrist Ben 2015 2B 34 22 1.9 4.9 5.0 13303036 9556857 11244290 11368061 14000000
Rasmus Colby 2015 LF 29 23 2.6 1.0 4.8 12791213 10499482 10490021 11260239 15800000
Murphy Daniel 2015 2B 30 24 1.4 1.9 1.8 6800693 5056689 5747586 5868322 12500000
Latos Mat 2015 SP 27 25 -0.5 0.9 3.8 4171263 3506838 6066638 4581580 NA
Happ J.A. 2015 SP 33 26 3.0 1.4 0.1 12732351 7059599 9949470 9913807 12000000
Kennedy Ian 2015 SP 30 27 -0.4 1.4 -1.5 3789439 3807986 3822497 3806641 14000000
Sipp Tony 2015 RP 32 28 1.7 0.8 -0.4 6205972 4748219 5616339 5523510 6000000
Parra Gerardo 2015 RF 28 30 1.0 -0.3 6.1 5581670 5240446 7392084 6071400 9166667
Gallardo Yovani 2015 SP 29 31 4.1 2.5 0.5 21036862 14591706 15225164 16951244 NA
Cabrera Asdrubal 2015 SS 29 32 1.7 0.9 1.0 6176405 5713862 5745459 5878575 9250000
O’Day Darren 2015 RP 33 34 2.8 2.3 2.0 14107000 10179099 10643870 11643323 7750000
Colon Bartolo 2015 SP 42 36 1.0 0.5 5.0 7224578 6301357 7563089 7029675 7250000
Utley Chase 2015 2B 36 37 0.4 3.7 3.6 6065979 5471305 6687126 6074804 7000000
Pelfrey Mike 2015 SP 31 38 1.4 -0.7 -0.3 5473629 4436511 5430170 5113437 8000000
Hill Rich 2015 SP 35 39 1.6 0.2 -1.2 6051715 5500647 5484850 5679070 6000000
Venable Will 2015 LF 33 40 0.3 0.9 3.2 3982375 3202299 3987977 3724217 NA
Young Chris 2015 RF 32 41 1.2 0.8 -0.2 4092190 3641385 4775557 4169711 6500000
Estrada Marco 2015 SP 32 42 3.6 0.6 1.6 14721013 10670239 11734392 12375215 13000000
Lincecum Tim 2015 SP 31 43 0.3 -0.7 -0.6 3471515 3055110 3449553 3325393 NA
Soto Geovany 2015 C 32 44 1.0 0.2 1.5 4065745 3255256 4393034 3904679 2800000
Cahill Trevor 2015 RP 27 45 -0.3 -1.5 0.7 1772771 1536245 2254581 1854532 4250000
Fister Doug 2015 SP 31 46 0.2 4.5 4.1 10391227 8803480 11586550 10260419 NA
Ramirez Alexei 2015 SS 34 47 1.0 3.1 2.8 7014151 6536792 7130061 6893668 4000000
Lowe Mark 2015 RP 32 48 1.6 -0.2 -0.6 5026229 3791707 4702290 4506742 5500000
Kelley Shawn 2015 RP 31 49 0.8 -0.2 0.0 3950505 2625004 3103729 3226413 5000000
de Aza Alejandro 2015 LF 31 50 1.0 0.8 0.0 3958433 3155024 4067343 3726933 4500000
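
The Avg column is the unweighted mean of the three model predictions, rounded to the nearest dollar. A quick check against the first two rows of the table (the helper name `ensemble_average` is made up for this sketch):

```python
def ensemble_average(*predictions: float) -> int:
    """Unweighted mean of the model predictions, to the nearest dollar."""
    return round(sum(predictions) / len(predictions))


# (mars, gbm, cube) predictions copied from the table.
rows = {
    "Heyward": (21467653, 17275523, 34249434),
    "Greinke": (75444329, 19333737, 50638917),
}
for name, preds in rows.items():
    print(name, ensemble_average(*preds))
# Heyward 24330870
# Greinke 48472328
```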

A few points to note. First, the models simply can’t handle Zack Greinke and his 2015 WAR of 9.3: their predictions for him range from about $19.3 million (GBM) to $75.4 million (MARS). Second, the MARS model has an affinity for pitching and is willing to overpay for it relative to the predictions of the other models.

Postscript:

Of the 47 free agents listed above, 37 have now signed a contract. Thus far the models have done a pretty good job in their predictions, with \(R^2\) values of 0.71 for MARS, 0.73 for GBM, 0.73 for Cubist, and 0.77 for the average. Of the individual models, Cubist has the lowest RMSE, and the ensemble of the three has the lowest RMSE overall. Where large discrepancies between predicted and actual salaries are found, they appear to occur because teams are overpaying for a level of past performance not seen during the past three years. For instance, Matt Wieters is being paid for the hitter he was in 2012 and 2011, not for what he’s done most recently. The same could be said for the Trevor Cahill deal. And it’s your best guess as to what Ian Kennedy is being paid for.
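
For anyone who wants to recompute these statistics from the table, here is a minimal sketch. It assumes \(R^2\) means the squared Pearson correlation between predicted and actual salary, which is one common definition and may not be the one used above. It also uses only the first six signed players, so it will not reproduce the figures reported for all 37.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in dollars."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# (Avg prediction, Actual) for the first six signed players in the table.
signed = [
    (24330870, 23000000),  # Heyward
    (48472328, 34416667),  # Greinke
    (30764746, 31000000),  # Price
    (16802966, 22125000),  # Upton
    (16653222, 18000000),  # Gordon
    (22539005, 23000000),  # Davis
]
predicted = [p for p, _ in signed]
actual = [a for _, a in signed]
print(f"RMSE: {rmse(actual, predicted):,.0f}")
print(f"R^2:  {r_squared(actual, predicted):.2f}")
```

Swapping in the mars, gbm, or cube column for the Avg column gives the per-model comparison.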

On the opposite end are players who are being paid for their 2015 performance even though their earlier seasons were not as strong. Players like J.A. Happ and Brett Anderson appear to be overpaid, in the hope that they don’t regress to their previous selves in 2016.

Below is a plot of the observed versus predicted salaries for the model averages. As can be seen, the predictions are actually conservative: where they missed, they generally missed low. All in all though, not bad for a simple model.
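
One way to check the direction of the misses without the plot is to look at the sign of actual minus predicted. The sketch below does this for a contiguous slice of the table (the signed players ranked 17 through 24), so it illustrates the calculation rather than summarizing all 37 signings; the variable names are made up.

```python
# (player, Avg prediction, Actual) copied from the table, ranks 17-24,
# skipping Desmond, who had not signed.
rows = [
    ("Wieters", 3427318, 15800000),
    ("Zimmermann", 24377023, 22000000),
    ("Samardzija", 7473869, 18000000),
    ("Anderson", 5771879, 15800000),
    ("Zobrist", 11368061, 14000000),
    ("Rasmus", 11260239, 15800000),
    ("Murphy", 5868322, 12500000),
]

# Positive residual: the player was paid more than the model predicted.
residuals = {name: actual - pred for name, pred, actual in rows}
under = sum(1 for r in residuals.values() if r > 0)
mean_error = sum(residuals.values()) / len(residuals)

print(f"{under} of {len(rows)} predictions were below the actual salary")
print(f"mean (actual - predicted): {mean_error:,.0f}")
```

In this slice, six of the seven predictions come in under the actual salary, with Zimmermann the only player predicted above his contract.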