In-class activity 9

Suppose you are the General Manager of a baseball team, and you are selecting two players for your team. You have a budget of $10,500,000, and you have the choice between the following players:

Player Name     OBP     SLG     Salary
Yandy Diaz      0.403   0.511   $8,000,000
Joey Meneses    0.320   0.366   $723,600
Jose Abreu      0.292   0.358   $19,500,000
Ryan Noda       0.384   0.400   $720,000
Nate Lowe       0.365   0.426   $4,050,000

Given your budget and the player statistics, which two players would you select?

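For this exercise we first exclude Jose Abreu, since his $19,500,000 salary alone exceeds the budget. For the remaining four players we compute predicted runs scored from OBP and SLG with a linear regression, because we want the players most likely to help the team score runs and therefore win games. Because the regression is fitted on team-season data, each prediction is best read as the runs a team hitting at that player's OBP and SLG would score in a season. That is enough to rank the players against each other.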

# Read in data
baseball = read.csv("baseball.csv")
str(baseball)
## 'data.frame':    1232 obs. of  15 variables:
##  $ Team        : chr  "ARI" "ATL" "BAL" "BOS" ...
##  $ League      : chr  "NL" "NL" "AL" "AL" ...
##  $ Year        : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  $ RS          : int  734 700 712 734 613 748 669 667 758 726 ...
##  $ RA          : int  688 600 705 806 759 676 588 845 890 670 ...
##  $ W           : int  81 94 93 69 61 85 97 68 64 88 ...
##  $ OBP         : num  0.328 0.32 0.311 0.315 0.302 0.318 0.315 0.324 0.33 0.335 ...
##  $ SLG         : num  0.418 0.389 0.417 0.415 0.378 0.422 0.411 0.381 0.436 0.422 ...
##  $ BA          : num  0.259 0.247 0.247 0.26 0.24 0.255 0.251 0.251 0.274 0.268 ...
##  $ Playoffs    : int  0 1 1 0 0 0 1 0 0 1 ...
##  $ RankSeason  : int  NA 4 5 NA NA NA 2 NA NA 6 ...
##  $ RankPlayoffs: int  NA 5 4 NA NA NA 4 NA NA 2 ...
##  $ G           : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ OOBP        : num  0.317 0.306 0.315 0.331 0.335 0.319 0.305 0.336 0.357 0.314 ...
##  $ OSLG        : num  0.415 0.378 0.403 0.428 0.424 0.405 0.39 0.43 0.47 0.402 ...
# Subset to only include moneyball years
moneyball = subset(baseball, Year < 2002)
str(moneyball)
## 'data.frame':    902 obs. of  15 variables:
##  $ Team        : chr  "ANA" "ARI" "ATL" "BAL" ...
##  $ League      : chr  "AL" "NL" "NL" "AL" ...
##  $ Year        : int  2001 2001 2001 2001 2001 2001 2001 2001 2001 2001 ...
##  $ RS          : int  691 818 729 687 772 777 798 735 897 923 ...
##  $ RA          : int  730 677 643 829 745 701 795 850 821 906 ...
##  $ W           : int  75 92 88 63 82 88 83 66 91 73 ...
##  $ OBP         : num  0.327 0.341 0.324 0.319 0.334 0.336 0.334 0.324 0.35 0.354 ...
##  $ SLG         : num  0.405 0.442 0.412 0.38 0.439 0.43 0.451 0.419 0.458 0.483 ...
##  $ BA          : num  0.261 0.267 0.26 0.248 0.266 0.261 0.268 0.262 0.278 0.292 ...
##  $ Playoffs    : int  0 1 1 0 0 0 0 0 1 0 ...
##  $ RankSeason  : int  NA 5 7 NA NA NA NA NA 6 NA ...
##  $ RankPlayoffs: int  NA 1 3 NA NA NA NA NA 4 NA ...
##  $ G           : int  162 162 162 162 161 162 162 162 162 162 ...
##  $ OOBP        : num  0.331 0.311 0.314 0.337 0.329 0.321 0.334 0.341 0.341 0.35 ...
##  $ OSLG        : num  0.412 0.404 0.384 0.439 0.393 0.398 0.427 0.455 0.417 0.48 ...
# Regression model to predict runs scored
RunsReg = lm(RS ~ OBP + SLG , data=moneyball)
summary(RunsReg)
## 
## Call:
## lm(formula = RS ~ OBP + SLG, data = moneyball)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -70.838 -17.174  -1.108  16.770  90.036 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -804.63      18.92  -42.53   <2e-16 ***
## OBP          2737.77      90.68   30.19   <2e-16 ***
## SLG          1584.91      42.16   37.60   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.79 on 899 degrees of freedom
## Multiple R-squared:  0.9296, Adjusted R-squared:  0.9294 
## F-statistic:  5934 on 2 and 899 DF,  p-value: < 2.2e-16
#Runs= -804.63 + 2737.77*obp + 1584.91*slg
Runs_Diaz= round(-804.63 + 2737.77*0.403 + 1584.91*0.511)
Runs_Diaz
## [1] 1109
#Runs= -804.63 + 2737.77*obp + 1584.91*slg
Runs_Meneses= round(-804.63 + 2737.77*0.320 + 1584.91*0.366)
Runs_Meneses
## [1] 652
#Runs= -804.63 + 2737.77*obp + 1584.91*slg
Runs_Noda= round(-804.63 + 2737.77*0.384 + 1584.91*0.400)
Runs_Noda
## [1] 881
#Runs= -804.63 + 2737.77*obp + 1584.91*slg
Runs_Lowe= round(-804.63 + 2737.77*0.365 + 1584.91*0.426)
Runs_Lowe
## [1] 870

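According to these results, the two players I would choose based on OBP and SLG are Yandy Diaz, with an estimated 1109 runs scored, and Ryan Noda, with an estimated 881 runs scored. Their combined salary of $8,720,000 fits within the $10,500,000 budget, whereas pairing Diaz with Nate Lowe would cost $12,050,000 and exceed it.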
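The choice can also be checked by listing every pair of players and keeping only those that fit the budget. The following is a minimal sketch, not part of the original activity: it hard-codes the rounded coefficients printed by summary(RunsReg) rather than calling the fitted model, and the names players, budget, pairs, affordable, and best are introduced here for illustration. Adding two players' predicted runs is used only as a ranking score, which is an assumption of this sketch.

```r
# Player table copied from the activity prompt
players <- data.frame(
  Name   = c("Yandy Diaz", "Joey Meneses", "Jose Abreu", "Ryan Noda", "Nate Lowe"),
  OBP    = c(0.403, 0.320, 0.292, 0.384, 0.365),
  SLG    = c(0.511, 0.366, 0.358, 0.400, 0.426),
  Salary = c(8000000, 723600, 19500000, 720000, 4050000)
)
budget <- 10500000

# Predicted runs from the rounded coefficients reported by summary(RunsReg)
players$Runs <- round(-804.63 + 2737.77 * players$OBP + 1584.91 * players$SLG)

# combn() returns one pair of row indices per column
idx <- combn(nrow(players), 2)
pairs <- data.frame(
  Player1 = players$Name[idx[1, ]],
  Player2 = players$Name[idx[2, ]],
  Salary  = players$Salary[idx[1, ]] + players$Salary[idx[2, ]],
  Runs    = players$Runs[idx[1, ]] + players$Runs[idx[2, ]]
)

# Keep pairs within budget, sorted by combined predicted runs (highest first)
affordable <- pairs[pairs$Salary <= budget, ]
affordable <- affordable[order(-affordable$Runs), ]
best <- affordable[1, ]
print(affordable)
```

Under these assumptions the first row of affordable is the Diaz and Noda pair. Every pair that includes Jose Abreu is filtered out by the budget check, so he does not need to be removed by hand.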