Suppose you are the General Manager of a baseball team, and you are selecting two players for your team. You have a budget of $10,500,000, and you have the choice between the following players:

Player Name     OBP    SLG    Salary
Yandy Diaz      0.403  0.511  $8,000,000
Joey Meneses    0.320  0.366  $723,600
Jose Abreu      0.292  0.358  $19,500,000
Ryan Noda       0.384  0.400  $720,000
Nate Lowe       0.365  0.426  $4,050,000

Given your budget and the player statistics, which two players would you select?

We’re going to use OBP (on-base percentage) and SLG (slugging percentage) to predict how many runs a team might score. The idea is to pick the players who give us the best shot at scoring runs, and therefore at winning games.

# Read in data
baseball = read.csv("baseball.csv")
str(baseball)
'data.frame':   1232 obs. of  15 variables:
 $ Team        : chr  "ARI" "ATL" "BAL" "BOS" ...
 $ League      : chr  "NL" "NL" "AL" "AL" ...
 $ Year        : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ RS          : int  734 700 712 734 613 748 669 667 758 726 ...
 $ RA          : int  688 600 705 806 759 676 588 845 890 670 ...
 $ W           : int  81 94 93 69 61 85 97 68 64 88 ...
 $ OBP         : num  0.328 0.32 0.311 0.315 0.302 0.318 0.315 0.324 0.33 0.335 ...
 $ SLG         : num  0.418 0.389 0.417 0.415 0.378 0.422 0.411 0.381 0.436 0.422 ...
 $ BA          : num  0.259 0.247 0.247 0.26 0.24 0.255 0.251 0.251 0.274 0.268 ...
 $ Playoffs    : int  0 1 1 0 0 0 1 0 0 1 ...
 $ RankSeason  : int  NA 4 5 NA NA NA 2 NA NA 6 ...
 $ RankPlayoffs: int  NA 5 4 NA NA NA 4 NA NA 2 ...
 $ G           : int  162 162 162 162 162 162 162 162 162 162 ...
 $ OOBP        : num  0.317 0.306 0.315 0.331 0.335 0.319 0.305 0.336 0.357 0.314 ...
 $ OSLG        : num  0.415 0.378 0.403 0.428 0.424 0.405 0.39 0.43 0.47 0.402 ...
# Subset to seasons before 2002 (the Moneyball years)
moneyball = subset(baseball, Year < 2002)
str(moneyball)
'data.frame':   902 obs. of  15 variables:
 $ Team        : chr  "ANA" "ARI" "ATL" "BAL" ...
 $ League      : chr  "AL" "NL" "NL" "AL" ...
 $ Year        : int  2001 2001 2001 2001 2001 2001 2001 2001 2001 2001 ...
 $ RS          : int  691 818 729 687 772 777 798 735 897 923 ...
 $ RA          : int  730 677 643 829 745 701 795 850 821 906 ...
 $ W           : int  75 92 88 63 82 88 83 66 91 73 ...
 $ OBP         : num  0.327 0.341 0.324 0.319 0.334 0.336 0.334 0.324 0.35 0.354 ...
 $ SLG         : num  0.405 0.442 0.412 0.38 0.439 0.43 0.451 0.419 0.458 0.483 ...
 $ BA          : num  0.261 0.267 0.26 0.248 0.266 0.261 0.268 0.262 0.278 0.292 ...
 $ Playoffs    : int  0 1 1 0 0 0 0 0 1 0 ...
 $ RankSeason  : int  NA 5 7 NA NA NA NA NA 6 NA ...
 $ RankPlayoffs: int  NA 1 3 NA NA NA NA NA 4 NA ...
 $ G           : int  162 162 162 162 161 162 162 162 162 162 ...
 $ OOBP        : num  0.331 0.311 0.314 0.337 0.329 0.321 0.334 0.341 0.341 0.35 ...
 $ OSLG        : num  0.412 0.404 0.384 0.439 0.393 0.398 0.427 0.455 0.417 0.48 ...
# Regression model to predict runs scored
RunsReg = lm(RS ~ OBP + SLG, data = moneyball)
summary(RunsReg)

Call:
lm(formula = RS ~ OBP + SLG, data = moneyball)

Residuals:
    Min      1Q  Median      3Q     Max 
-70.838 -17.174  -1.108  16.770  90.036 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -804.63      18.92  -42.53   <2e-16 ***
OBP          2737.77      90.68   30.19   <2e-16 ***
SLG          1584.91      42.16   37.60   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.79 on 899 degrees of freedom
Multiple R-squared:  0.9296,    Adjusted R-squared:  0.9294 
F-statistic:  5934 on 2 and 899 DF,  p-value: < 2.2e-16
# Fitted equation: Runs = -804.63 + 2737.77*OBP + 1584.91*SLG
# Predicted runs for Yandy Diaz (OBP 0.403, SLG 0.511)
Runs_Diaz = round(-804.63 + 2737.77*0.403 + 1584.91*0.511)
Runs_Diaz
[1] 1109
# Predicted runs for Joey Meneses (OBP 0.320, SLG 0.366)
Runs_Meneses = round(-804.63 + 2737.77*0.320 + 1584.91*0.366)
Runs_Meneses
[1] 652
# Predicted runs for Ryan Noda (OBP 0.384, SLG 0.400)
Runs_Noda = round(-804.63 + 2737.77*0.384 + 1584.91*0.400)
Runs_Noda
[1] 881
# Predicted runs for Nate Lowe (OBP 0.365, SLG 0.426)
Runs_Lowe = round(-804.63 + 2737.77*0.365 + 1584.91*0.426)
Runs_Lowe
[1] 870
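
The per-player calculations above can also be done in one vectorized step. The following is a minimal sketch, assuming the rounded coefficients printed by summary(RunsReg); the names `coefs`, `players`, and `PredRuns` are introduced here for illustration and are not part of the original notebook. It also includes Jose Abreu, who is not projected above.

```r
# Rounded coefficients copied from the summary(RunsReg) output above
coefs <- c(intercept = -804.63, OBP = 2737.77, SLG = 1584.91)

# Candidate players from the table (hypothetical data frame name)
players <- data.frame(
  Player = c("Diaz", "Meneses", "Abreu", "Noda", "Lowe"),
  OBP    = c(0.403, 0.320, 0.292, 0.384, 0.365),
  SLG    = c(0.511, 0.366, 0.358, 0.400, 0.426),
  Salary = c(8000000, 723600, 19500000, 720000, 4050000)
)

# Apply the regression equation to every player at once
players$PredRuns <- round(
  coefs[["intercept"]] +
    coefs[["OBP"]] * players$OBP +
    coefs[["SLG"]] * players$SLG
)

# Sort from highest to lowest projected runs
players[order(-players$PredRuns), c("Player", "PredRuns", "Salary")]

# If the fitted model is still in the session, an alternative would be
#   round(predict(RunsReg, newdata = players))
# which uses the unrounded coefficients and may differ slightly.
```

Keeping the players in a data frame means a new candidate only needs one more row, rather than another hand-typed copy of the equation.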

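Based on our analysis using OBP and SLG, the top two player choices are Yandy Díaz, with a projected 1,109 runs, and Ryan Noda, with a projected 881 runs. Because the regression was fit on team-season totals, each figure is the number of runs a team hitting at that player's OBP and SLG would be expected to score, not an individual run total. Together the two cost $8,720,000, which fits within the $10,500,000 budget. Nate Lowe projects almost as well as Noda (870 runs), but pairing him with Díaz would cost $12,050,000, and Jose Abreu's $19,500,000 salary alone exceeds the budget.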
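
The budget constraint can be checked directly by enumerating every pair of players. This is a rough sketch under one loud assumption: a pair is scored by the sum of the two players' projected runs, which is a convenience for ranking and not a criterion stated in the analysis. The names `players`, `budget`, `pair_table`, and `affordable` are hypothetical.

```r
# Player data from the table; PredRuns applies the fitted equation
# Runs = -804.63 + 2737.77*OBP + 1584.91*SLG (rounded coefficients)
players <- data.frame(
  Player = c("Diaz", "Meneses", "Abreu", "Noda", "Lowe"),
  OBP    = c(0.403, 0.320, 0.292, 0.384, 0.365),
  SLG    = c(0.511, 0.366, 0.358, 0.400, 0.426),
  Salary = c(8000000, 723600, 19500000, 720000, 4050000)
)
players$PredRuns <- round(-804.63 + 2737.77 * players$OBP + 1584.91 * players$SLG)

budget <- 10500000

# Every unordered pair of row indices (one pair per column)
idx <- combn(nrow(players), 2)

# Assumption: a pair's value is the sum of the two projections
pair_table <- data.frame(
  Pair        = paste(players$Player[idx[1, ]], players$Player[idx[2, ]], sep = " + "),
  TotalSalary = players$Salary[idx[1, ]] + players$Salary[idx[2, ]],
  TotalRuns   = players$PredRuns[idx[1, ]] + players$PredRuns[idx[2, ]]
)

# Keep only pairs that fit the budget, ranked by combined projected runs
affordable <- pair_table[pair_table$TotalSalary <= budget, ]
affordable <- affordable[order(-affordable$TotalRuns), ]
affordable
```

Under that scoring rule, the first row of `affordable` is the pair with the highest combined projection that still fits the budget.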
