This is an attempt to build a simple model that predicts free agent contract salaries for the 2015/2016 MLB free agent season. The model is simple in that it tries to predict a player’s future salary using only the player’s WAR (wins above replacement) for the 2015, 2014, and 2013 seasons, the player’s current age, and his primary position. The predictions are based on free agent data from 2006 through 2014 (found here) and were estimated using several different models: linear regression, partial least squares, lasso, MARS, support vector machine (SVM), bagged trees, gradient boosting machine (GBM), and Cubist.
Repeated cross-validation was employed for model selection, and the three best-performing models (in terms of RMSE and \(R^2\)) are presented in the table below: MARS, GBM, and Cubist. For the fun of it, I’ve also included an average of the three models.
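To make the selection step concrete, here is a minimal sketch of repeated k-fold cross-validation that picks a model by RMSE. It is not the code behind this post: the two candidate models (a mean-only baseline and a one-variable least-squares line), the synthetic WAR and salary data, and all function names are assumptions chosen so the example runs with only the Python standard library.

```python
import math
import random
import statistics


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-variable ordinary least squares."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def repeated_cv_rmse(fit, xs, ys, k=5, repeats=3, seed=0):
    """Average held-out RMSE over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    fold_rmses = []
    for _ in range(repeats):
        idx = list(range(len(xs)))
        rng.shuffle(idx)  # a fresh random partition for each repeat
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in test]
            fold_rmses.append(math.sqrt(statistics.fmean(sq_err)))
    return statistics.fmean(fold_rmses)


# Synthetic data (made up for illustration): salary in $ millions vs. WAR.
data_rng = random.Random(42)
war = [data_rng.uniform(-1, 8) for _ in range(60)]
salary = [2.0 + 3.0 * w + data_rng.gauss(0, 1.5) for w in war]

candidates = {"mean_only": fit_mean, "least_squares_line": fit_line}
scores = {name: repeated_cv_rmse(fit, war, salary) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lowest cross-validated RMSE wins
print(scores, best)
```

Repeating the k-fold split with different shuffles smooths out the luck of any single partition, which matters when the training set is only a few hundred contracts.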
These are the top 50 free agents according to ESPN. Some of the FA are missing from this list, generally because they were missing WAR values that couldn’t be easily imputed. Note that the predictions and the Actual column are both expressed as average annual salary (the overall size of the contract divided by the number of contract years), which does not account for any attempt on the part of the team to backload the contract.
| nameLast | nameFirst | yearID | Pos | Age | Rk | WARpresent | WARlag1 | WARlag2 | mars | gbm | cube | Avg | Actual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Heyward | Jason | 2015 | RF | 26 | 1 | 6.5 | 6.2 | 3.7 | 21467653 | 17275523 | 34249434 | 24330870 | 23000000 |
| Greinke | Zack | 2015 | SP | 32 | 2 | 9.3 | 4.3 | 3.9 | 75444329 | 19333737 | 50638917 | 48472328 | 34416667 |
| Price | David | 2015 | SP | 30 | 3 | 6.0 | 4.6 | 2.8 | 41396859 | 22283416 | 28613963 | 30764746 | 31000000 |
| Upton | Justin | 2015 | LF | 28 | 4 | 4.4 | 3.2 | 2.9 | 17657209 | 16960706 | 15790983 | 16802966 | 22125000 |
| Cespedes | Yoenis | 2015 | CF | 30 | 5 | 6.3 | 4.1 | 1.5 | 24663879 | 15373549 | 19939450 | 19992293 | NA |
| Gordon | Alex | 2015 | LF | 31 | 6 | 2.8 | 6.6 | 4.2 | 17691107 | 15610988 | 16657572 | 16653222 | 18000000 |
| Davis | Chris | 2015 | 1B | 29 | 7 | 5.2 | 1.8 | 6.5 | 26435351 | 19975776 | 21205889 | 22539005 | 23000000 |
| Fowler | Dexter | 2015 | CF | 29 | 8 | 2.2 | 1.8 | 2.0 | 9616638 | 7876682 | 7908118 | 8467146 | NA |
| Cueto | Johnny | 2015 | SP | 29 | 9 | 3.9 | 6.4 | 1.4 | 26164379 | 21457146 | 22188956 | 23270160 | 21666667 |
| Kazmir | Scott | 2015 | SP | 31 | 10 | 3.3 | 1.7 | 1.1 | 16338624 | 11753035 | 12243065 | 13444908 | 16000000 |
| Leake | Mike | 2015 | SP | 28 | 11 | 2.9 | 1.5 | 3.0 | 16410339 | 12000949 | 13923312 | 14111533 | 16000000 |
| Lackey | John | 2015 | SP | 37 | 12 | 5.7 | 1.1 | 2.8 | 22304739 | 13120942 | 17518442 | 17648041 | 16000000 |
| Chen | Wei-Yin | 2015 | SP | 30 | 13 | 3.8 | 1.8 | 1.8 | 19838772 | 16070969 | 14592884 | 16834208 | 20000000 |
| Kendrick | Howie | 2015 | 2B | 32 | 14 | 1.1 | 5.3 | 3.4 | 8704445 | 7154584 | 9127213 | 8328747 | NA |
| Jackson | Austin | 2015 | RF | 28 | 15 | 1.5 | 1.8 | 3.6 | 7440309 | 7136385 | 8834637 | 7803777 | NA |
| Span | Denard | 2015 | CF | 31 | 16 | 0.8 | 3.7 | 2.3 | 7109511 | 5870335 | 6507299 | 6495715 | 10333333 |
| Wieters | Matt | 2015 | C | 29 | 17 | 0.8 | 0.7 | 0.6 | 4024326 | 3157619 | 3100007 | 3427318 | 15800000 |
| Zimmermann | Jordan | 2015 | SP | 29 | 18 | 3.5 | 4.9 | 3.7 | 30811959 | 21165514 | 21153595 | 24377023 | 22000000 |
| Desmond | Ian | 2015 | SS | 30 | 19 | 2.0 | 3.9 | 3.6 | 13356649 | 10401790 | 11444390 | 11734276 | NA |
| Samardzija | Jeff | 2015 | SP | 30 | 20 | 0.2 | 3.7 | 1.0 | 7829895 | 6713795 | 7877918 | 7473869 | 18000000 |
| Anderson | Brett | 2015 | SP | 27 | 21 | 1.5 | 0.9 | -0.8 | 5764224 | 4382958 | 7168454 | 5771879 | 15800000 |
| Zobrist | Ben | 2015 | 2B | 34 | 22 | 1.9 | 4.9 | 5.0 | 13303036 | 9556857 | 11244290 | 11368061 | 14000000 |
| Rasmus | Colby | 2015 | LF | 29 | 23 | 2.6 | 1.0 | 4.8 | 12791213 | 10499482 | 10490021 | 11260239 | 15800000 |
| Murphy | Daniel | 2015 | 2B | 30 | 24 | 1.4 | 1.9 | 1.8 | 6800693 | 5056689 | 5747586 | 5868322 | 12500000 |
| Latos | Mat | 2015 | SP | 27 | 25 | -0.5 | 0.9 | 3.8 | 4171263 | 3506838 | 6066638 | 4581580 | NA |
| Happ | J.A. | 2015 | SP | 33 | 26 | 3.0 | 1.4 | 0.1 | 12732351 | 7059599 | 9949470 | 9913807 | 12000000 |
| Kennedy | Ian | 2015 | SP | 30 | 27 | -0.4 | 1.4 | -1.5 | 3789439 | 3807986 | 3822497 | 3806641 | 14000000 |
| Sipp | Tony | 2015 | RP | 32 | 28 | 1.7 | 0.8 | -0.4 | 6205972 | 4748219 | 5616339 | 5523510 | 6000000 |
| Parra | Gerardo | 2015 | RF | 28 | 30 | 1.0 | -0.3 | 6.1 | 5581670 | 5240446 | 7392084 | 6071400 | 9166667 |
| Gallardo | Yovani | 2015 | SP | 29 | 31 | 4.1 | 2.5 | 0.5 | 21036862 | 14591706 | 15225164 | 16951244 | NA |
| Cabrera | Asdrubal | 2015 | SS | 29 | 32 | 1.7 | 0.9 | 1.0 | 6176405 | 5713862 | 5745459 | 5878575 | 9250000 |
| O’Day | Darren | 2015 | RP | 33 | 34 | 2.8 | 2.3 | 2.0 | 14107000 | 10179099 | 10643870 | 11643323 | 7750000 |
| Colon | Bartolo | 2015 | SP | 42 | 36 | 1.0 | 0.5 | 5.0 | 7224578 | 6301357 | 7563089 | 7029675 | 7250000 |
| Utley | Chase | 2015 | 2B | 36 | 37 | 0.4 | 3.7 | 3.6 | 6065979 | 5471305 | 6687126 | 6074804 | 7000000 |
| Pelfrey | Mike | 2015 | SP | 31 | 38 | 1.4 | -0.7 | -0.3 | 5473629 | 4436511 | 5430170 | 5113437 | 8000000 |
| Hill | Rich | 2015 | SP | 35 | 39 | 1.6 | 0.2 | -1.2 | 6051715 | 5500647 | 5484850 | 5679070 | 6000000 |
| Venable | Will | 2015 | LF | 33 | 40 | 0.3 | 0.9 | 3.2 | 3982375 | 3202299 | 3987977 | 3724217 | NA |
| Young | Chris | 2015 | RF | 32 | 41 | 1.2 | 0.8 | -0.2 | 4092190 | 3641385 | 4775557 | 4169711 | 6500000 |
| Estrada | Marco | 2015 | SP | 32 | 42 | 3.6 | 0.6 | 1.6 | 14721013 | 10670239 | 11734392 | 12375215 | 13000000 |
| Lincecum | Tim | 2015 | SP | 31 | 43 | 0.3 | -0.7 | -0.6 | 3471515 | 3055110 | 3449553 | 3325393 | NA |
| Soto | Geovany | 2015 | C | 32 | 44 | 1.0 | 0.2 | 1.5 | 4065745 | 3255256 | 4393034 | 3904679 | 2800000 |
| Cahill | Trevor | 2015 | RP | 27 | 45 | -0.3 | -1.5 | 0.7 | 1772771 | 1536245 | 2254581 | 1854532 | 4250000 |
| Fister | Doug | 2015 | SP | 31 | 46 | 0.2 | 4.5 | 4.1 | 10391227 | 8803480 | 11586550 | 10260419 | NA |
| Ramirez | Alexei | 2015 | SS | 34 | 47 | 1.0 | 3.1 | 2.8 | 7014151 | 6536792 | 7130061 | 6893668 | 4000000 |
| Lowe | Mark | 2015 | RP | 32 | 48 | 1.6 | -0.2 | -0.6 | 5026229 | 3791707 | 4702290 | 4506742 | 5500000 |
| Kelley | Shawn | 2015 | RP | 31 | 49 | 0.8 | -0.2 | 0.0 | 3950505 | 2625004 | 3103729 | 3226413 | 5000000 |
| de Aza | Alejandro | 2015 | LF | 31 | 50 | 1.0 | 0.8 | 0.0 | 3958433 | 3155024 | 4067343 | 3726933 | 4500000 |
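As a quick illustration of how the Actual column is defined, the sketch below divides total contract value by contract years. The contract totals and lengths for Heyward and Greinke are the publicly reported figures, not numbers taken from this post's data set, and the backloaded schedule is purely hypothetical.

```python
def average_annual_value(total_dollars, years):
    """Total contract value spread evenly over the contract years."""
    return total_dollars / years


# Publicly reported totals and lengths (an assumption, not from the post's data).
contracts = {
    "Heyward": (184_000_000, 8),
    "Greinke": (206_500_000, 6),
}
aav = {name: round(average_annual_value(total, years))
       for name, (total, years) in contracts.items()}
print(aav)

# Hypothetical backloaded deal: the yearly payments differ, the average does not.
flat = [15_000_000] * 4
backloaded = [10_000_000, 10_000_000, 20_000_000, 20_000_000]
same_aav = (average_annual_value(sum(flat), 4)
            == average_annual_value(sum(backloaded), 4))
```

The last comparison shows why backloading is invisible here: two deals with very different payment schedules produce the same average.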
A few points to note: first, the models simply can’t handle Zack Greinke and his 2015 WAR of 9.3; second, the MARS model has an affinity for pitching and is willing to overpay for it relative to the predictions of the other models.
Postscript:
Of the 47 FA listed above, 37 have now signed a contract (the other 10 show NA in the Actual column). Thus far the models have done a pretty good job in their predictions, with \(R^2\) values of 0.71 for MARS, 0.73 for GBM, 0.73 for Cubist, and 0.77 for the average. Of the individual models, Cubist has the lowest RMSE, and the ensemble of the three has the lowest RMSE overall. Where large discrepancies between predicted and actual are found, they appear to occur because teams are paying for a level of past performance not seen during the past three years. For instance, Matt Wieters is being paid for the hitter he was in 2012 and 2011, not for what he’s done most recently. The same could be said of the deal for Trevor Cahill. And it’s anyone’s guess what Ian Kennedy is being paid for.
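For readers who want to see how such scores are computed, here is a minimal sketch using only the first six signed players from the table, so its output will not reproduce the figures quoted for all 37. The post does not say which \(R^2\) definition was used; the squared Pearson correlation between predicted and actual is assumed here.

```python
import math

# (player, mars, gbm, cubist, actual): first six signed rows of the table.
rows = [
    ("Heyward", 21467653, 17275523, 34249434, 23000000),
    ("Greinke", 75444329, 19333737, 50638917, 34416667),
    ("Price",   41396859, 22283416, 28613963, 31000000),
    ("Upton",   17657209, 16960706, 15790983, 22125000),
    ("Gordon",  17691107, 15610988, 16657572, 18000000),
    ("Davis",   26435351, 19975776, 21205889, 23000000),
]


def rmse(pred, actual):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def r_squared(pred, actual):
    """Squared Pearson correlation (assumed definition, see lead-in)."""
    n = len(actual)
    mean_p, mean_a = sum(pred) / n, sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)


actual = [r[4] for r in rows]
preds = {
    "mars": [r[1] for r in rows],
    "gbm": [r[2] for r in rows],
    "cubist": [r[3] for r in rows],
}
# The ensemble is the plain mean of the three model predictions.
preds["avg"] = [(m + g + c) / 3
                for m, g, c in zip(preds["mars"], preds["gbm"], preds["cubist"])]

metrics = {name: (rmse(p, actual), r_squared(p, actual)) for name, p in preds.items()}
for name, (err, r2) in metrics.items():
    print(f"{name:>6}: RMSE = {err:,.0f}  R^2 = {r2:.2f}")
```

The simple mean in `preds["avg"]` reproduces the table's Avg column for these rows, which is how the ensemble is described in the text.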
On the opposite end are players who are being paid for their 2015 performance even though their earlier seasons were not nearly as strong. Players like J.A. Happ and Brett Anderson appear to be overpaid in the hope that they don’t regress to their previous selves in 2016.
Below is a plot of the observed versus predicted salaries for the model average. As can be seen, the predictions are actually conservative: where they missed, they generally missed low. All in all though, not bad for a simple model.
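The tendency to miss low can also be checked directly from the table. The sketch below takes the Avg and Actual columns for the 37 signed players and counts how often the ensemble came in under the actual salary; the variable names are illustrative only.

```python
import statistics

# (player, Avg prediction, Actual) for the 37 signed players in the table.
signed = [
    ("Heyward", 24330870, 23000000), ("Greinke", 48472328, 34416667),
    ("Price", 30764746, 31000000), ("Upton", 16802966, 22125000),
    ("Gordon", 16653222, 18000000), ("Davis", 22539005, 23000000),
    ("Cueto", 23270160, 21666667), ("Kazmir", 13444908, 16000000),
    ("Leake", 14111533, 16000000), ("Lackey", 17648041, 16000000),
    ("Chen", 16834208, 20000000), ("Span", 6495715, 10333333),
    ("Wieters", 3427318, 15800000), ("Zimmermann", 24377023, 22000000),
    ("Samardzija", 7473869, 18000000), ("Anderson", 5771879, 15800000),
    ("Zobrist", 11368061, 14000000), ("Rasmus", 11260239, 15800000),
    ("Murphy", 5868322, 12500000), ("Happ", 9913807, 12000000),
    ("Kennedy", 3806641, 14000000), ("Sipp", 5523510, 6000000),
    ("Parra", 6071400, 9166667), ("Cabrera", 5878575, 9250000),
    ("O'Day", 11643323, 7750000), ("Colon", 7029675, 7250000),
    ("Utley", 6074804, 7000000), ("Pelfrey", 5113437, 8000000),
    ("Hill", 5679070, 6000000), ("Young", 4169711, 6500000),
    ("Estrada", 12375215, 13000000), ("Soto", 3904679, 2800000),
    ("Cahill", 1854532, 4250000), ("Ramirez", 6893668, 4000000),
    ("Lowe", 4506742, 5500000), ("Kelley", 3226413, 5000000),
    ("de Aza", 3726933, 4500000),
]

# Signed error: negative means the ensemble predicted less than the player got.
errors = [avg - actual for _, avg, actual in signed]
under = sum(e < 0 for e in errors)
over = sum(e > 0 for e in errors)
mean_error = statistics.fmean(errors)

print(f"under-predicted: {under}, over-predicted: {over}")
print(f"mean signed error: {mean_error:,.0f}")
```

On these rows the ensemble is below the actual salary for 29 of the 37 players and above it for 8.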