Background

UDEMY’s Data Science and Machine Learning Bootcamp with R Capstone Project is based on the historical problem of The Oakland Athletic’s 2002 season. The Athletics’ 2002 campaign ranks among the most famous in franchise history. Following the 2001 season, Oakland saw the departure of three key players (the lost boys). Billy Beane, the team’s general manager, responded with a series of under-the-radar free agent signings. The new-look Athletics, despite a comparative lack of star power, surprised the baseball world by besting the 2001 team’s regular season record.

In this project we will perform an analysis that aims to inform decision makers on the selection of replacements for the departed players. We will be using real data provided by Sean Lahaman’s Website (http://www.seanlahman.com/baseball-archive/statistics/).

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Loading initial data

We’ll start loading and exploring the data.

batting <- read.csv('Batting.csv')
head(batting, 12)
##     playerID yearID stint teamID lgID   G G_batting  AB   R   H X2B X3B HR RBI
## 1  aardsda01   2004     1    SFN   NL  11        11   0   0   0   0   0  0   0
## 2  aardsda01   2006     1    CHN   NL  45        43   2   0   0   0   0  0   0
## 3  aardsda01   2007     1    CHA   AL  25         2   0   0   0   0   0  0   0
## 4  aardsda01   2008     1    BOS   AL  47         5   1   0   0   0   0  0   0
## 5  aardsda01   2009     1    SEA   AL  73         3   0   0   0   0   0  0   0
## 6  aardsda01   2010     1    SEA   AL  53         4   0   0   0   0   0  0   0
## 7  aardsda01   2012     1    NYA   AL   1        NA  NA  NA  NA  NA  NA NA  NA
## 8  aaronha01   1954     1    ML1   NL 122       122 468  58 131  27   6 13  69
## 9  aaronha01   1955     1    ML1   NL 153       153 602 105 189  37   9 27 106
## 10 aaronha01   1956     1    ML1   NL 153       153 609 106 200  34  14 26  92
## 11 aaronha01   1957     1    ML1   NL 151       151 615 118 198  27   6 44 132
## 12 aaronha01   1958     1    ML1   NL 153       153 601 109 196  34   4 30  95
##    SB CS BB SO IBB HBP SH SF GIDP G_old
## 1   0  0  0  0   0   0  0  0    0    11
## 2   0  0  0  0   0   0  1  0    0    45
## 3   0  0  0  0   0   0  0  0    0     2
## 4   0  0  0  1   0   0  0  0    0     5
## 5   0  0  0  0   0   0  0  0    0    NA
## 6   0  0  0  0   0   0  0  0    0    NA
## 7  NA NA NA NA  NA  NA NA NA   NA    NA
## 8   2  2 28 39  NA   3  6  4   13   122
## 9   3  1 49 61   5   3  7  4   20   153
## 10  2  4 37 54   6   2  5  7   21   153
## 11  1  1 57 58  15   0  0  3   13   151
## 12  4  1 59 49  16   1  0  3   21   153
str(batting)
## 'data.frame':    97889 obs. of  24 variables:
##  $ playerID : chr  "aardsda01" "aardsda01" "aardsda01" "aardsda01" ...
##  $ yearID   : int  2004 2006 2007 2008 2009 2010 2012 1954 1955 1956 ...
##  $ stint    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ teamID   : chr  "SFN" "CHN" "CHA" "BOS" ...
##  $ lgID     : chr  "NL" "NL" "AL" "AL" ...
##  $ G        : int  11 45 25 47 73 53 1 122 153 153 ...
##  $ G_batting: int  11 43 2 5 3 4 NA 122 153 153 ...
##  $ AB       : int  0 2 0 1 0 0 NA 468 602 609 ...
##  $ R        : int  0 0 0 0 0 0 NA 58 105 106 ...
##  $ H        : int  0 0 0 0 0 0 NA 131 189 200 ...
##  $ X2B      : int  0 0 0 0 0 0 NA 27 37 34 ...
##  $ X3B      : int  0 0 0 0 0 0 NA 6 9 14 ...
##  $ HR       : int  0 0 0 0 0 0 NA 13 27 26 ...
##  $ RBI      : int  0 0 0 0 0 0 NA 69 106 92 ...
##  $ SB       : int  0 0 0 0 0 0 NA 2 3 2 ...
##  $ CS       : int  0 0 0 0 0 0 NA 2 1 4 ...
##  $ BB       : int  0 0 0 0 0 0 NA 28 49 37 ...
##  $ SO       : int  0 0 0 1 0 0 NA 39 61 54 ...
##  $ IBB      : int  0 0 0 0 0 0 NA NA 5 6 ...
##  $ HBP      : int  0 0 0 0 0 0 NA 3 3 2 ...
##  $ SH       : int  0 1 0 0 0 0 NA 6 7 5 ...
##  $ SF       : int  0 0 0 0 0 0 NA 4 4 7 ...
##  $ GIDP     : int  0 0 0 0 0 0 NA 13 20 21 ...
##  $ G_old    : int  11 45 2 5 NA NA NA 122 153 153 ...
head(batting$AB, 5)
## [1] 0 2 0 1 0
head(batting$X2B)
## [1] 0 0 0 0 0 0
colnames(batting)
##  [1] "playerID"  "yearID"    "stint"     "teamID"    "lgID"      "G"        
##  [7] "G_batting" "AB"        "R"         "H"         "X2B"       "X3B"      
## [13] "HR"        "RBI"       "SB"        "CS"        "BB"        "SO"       
## [19] "IBB"       "HBP"       "SH"        "SF"        "GIDP"      "G_old"

Preliminary plots

Some exploratory plots:

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 6413 rows containing non-finite values (stat_bin).

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 6413 rows containing non-finite values (stat_bin).

Let’s move on adding more useful statistics to the intial data frame: Batting Average, On Base Percentage, Slugging Percentage.

batting <- batting %>% mutate(AVG=H/AB, OBP=(H+BB+HBP)/(AB+BB+HBP+SF), 
                        OBP=((H+BB+HBP)/(AB+BB+HBP+SF)),
                        SLG=((H-X2B-X3B-HR)+2*X2B+3*X3B+4*HR)/AB)
str(batting)
## 'data.frame':    97889 obs. of  27 variables:
##  $ playerID : chr  "aardsda01" "aardsda01" "aardsda01" "aardsda01" ...
##  $ yearID   : int  2004 2006 2007 2008 2009 2010 2012 1954 1955 1956 ...
##  $ stint    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ teamID   : chr  "SFN" "CHN" "CHA" "BOS" ...
##  $ lgID     : chr  "NL" "NL" "AL" "AL" ...
##  $ G        : int  11 45 25 47 73 53 1 122 153 153 ...
##  $ G_batting: int  11 43 2 5 3 4 NA 122 153 153 ...
##  $ AB       : int  0 2 0 1 0 0 NA 468 602 609 ...
##  $ R        : int  0 0 0 0 0 0 NA 58 105 106 ...
##  $ H        : int  0 0 0 0 0 0 NA 131 189 200 ...
##  $ X2B      : int  0 0 0 0 0 0 NA 27 37 34 ...
##  $ X3B      : int  0 0 0 0 0 0 NA 6 9 14 ...
##  $ HR       : int  0 0 0 0 0 0 NA 13 27 26 ...
##  $ RBI      : int  0 0 0 0 0 0 NA 69 106 92 ...
##  $ SB       : int  0 0 0 0 0 0 NA 2 3 2 ...
##  $ CS       : int  0 0 0 0 0 0 NA 2 1 4 ...
##  $ BB       : int  0 0 0 0 0 0 NA 28 49 37 ...
##  $ SO       : int  0 0 0 1 0 0 NA 39 61 54 ...
##  $ IBB      : int  0 0 0 0 0 0 NA NA 5 6 ...
##  $ HBP      : int  0 0 0 0 0 0 NA 3 3 2 ...
##  $ SH       : int  0 1 0 0 0 0 NA 6 7 5 ...
##  $ SF       : int  0 0 0 0 0 0 NA 4 4 7 ...
##  $ GIDP     : int  0 0 0 0 0 0 NA 13 20 21 ...
##  $ G_old    : int  11 45 2 5 NA NA NA 122 153 153 ...
##  $ AVG      : num  NaN 0 NaN 0 NaN ...
##  $ OBP      : num  NaN 0 NaN 0 NaN ...
##  $ SLG      : num  NaN 0 NaN 0 NaN ...

Merge salary data

We know we don’t just want the best players, we want the most undervalued players, meaning we will also need to know current salary information.

salary <- read.csv("Salaries.csv")
str(salary)
## 'data.frame':    23956 obs. of  5 variables:
##  $ yearID  : int  1985 1985 1985 1985 1985 1985 1985 1985 1985 1985 ...
##  $ teamID  : chr  "BAL" "BAL" "BAL" "BAL" ...
##  $ lgID    : chr  "AL" "AL" "AL" "AL" ...
##  $ playerID: chr  "murraed02" "lynnfr01" "ripkeca01" "lacyle01" ...
##  $ salary  : int  1472819 1090000 800000 725000 641667 625000 581250 560000 558333 547143 ...
summary(salary)
##      yearID        teamID              lgID             playerID        
##  Min.   :1985   Length:23956       Length:23956       Length:23956      
##  1st Qu.:1993   Class :character   Class :character   Class :character  
##  Median :1999   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :1999                                                           
##  3rd Qu.:2006                                                           
##  Max.   :2013                                                           
##      salary        
##  Min.   :       0  
##  1st Qu.:  250000  
##  Median :  507950  
##  Mean   : 1864357  
##  3rd Qu.: 2100000  
##  Max.   :33000000

Salary data starts in 1985, while Batting data starts in 1871. To facilitate things we can remove the batting data that occurred before 1985.

batting <- filter(batting, yearID >= 1985)
summary(batting)
##    playerID             yearID         stint         teamID         
##  Length:35652       Min.   :1985   Min.   :1.00   Length:35652      
##  Class :character   1st Qu.:1993   1st Qu.:1.00   Class :character  
##  Mode  :character   Median :2000   Median :1.00   Mode  :character  
##                     Mean   :2000   Mean   :1.08                     
##                     3rd Qu.:2007   3rd Qu.:1.00                     
##                     Max.   :2013   Max.   :4.00                     
##                                                                     
##      lgID                 G           G_batting            AB       
##  Length:35652       Min.   :  1.0   Min.   :  0.00   Min.   :  0.0  
##  Class :character   1st Qu.: 14.0   1st Qu.:  4.00   1st Qu.:  3.0  
##  Mode  :character   Median : 34.0   Median : 27.00   Median : 47.0  
##                     Mean   : 51.7   Mean   : 46.28   Mean   :144.7  
##                     3rd Qu.: 77.0   3rd Qu.: 77.00   3rd Qu.:241.0  
##                     Max.   :163.0   Max.   :163.00   Max.   :716.0  
##                                     NA's   :1406     NA's   :4377   
##        R                H               X2B              X3B        
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :  4.00   Median :  8.00   Median : 1.000   Median : 0.000  
##  Mean   : 19.44   Mean   : 37.95   Mean   : 7.293   Mean   : 0.824  
##  3rd Qu.: 30.00   3rd Qu.: 61.00   3rd Qu.:11.000   3rd Qu.: 1.000  
##  Max.   :152.00   Max.   :262.00   Max.   :59.000   Max.   :23.000  
##  NA's   :4377     NA's   :4377     NA's   :4377     NA's   :4377    
##        HR              RBI               SB                CS        
##  Min.   : 0.000   Min.   :  0.00   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.: 0.000  
##  Median : 0.000   Median :  3.00   Median :  0.000   Median : 0.000  
##  Mean   : 4.169   Mean   : 18.41   Mean   :  2.811   Mean   : 1.219  
##  3rd Qu.: 5.000   3rd Qu.: 27.00   3rd Qu.:  2.000   3rd Qu.: 1.000  
##  Max.   :73.000   Max.   :165.00   Max.   :110.000   Max.   :29.000  
##  NA's   :4377     NA's   :4377     NA's   :4377      NA's   :4377    
##        BB               SO              IBB               HBP        
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  1.00   1st Qu.:  0.000   1st Qu.: 0.000  
##  Median :  3.00   Median : 12.00   Median :  0.000   Median : 0.000  
##  Mean   : 14.06   Mean   : 27.03   Mean   :  1.171   Mean   : 1.273  
##  3rd Qu.: 21.00   3rd Qu.: 42.00   3rd Qu.:  1.000   3rd Qu.: 1.000  
##  Max.   :232.00   Max.   :223.00   Max.   :120.000   Max.   :35.000  
##  NA's   :4377     NA's   :4377     NA's   :4378      NA's   :4387    
##        SH               SF              GIDP           G_old      
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.00   Min.   :  0.0  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.00   1st Qu.: 11.0  
##  Median : 0.000   Median : 0.000   Median : 1.00   Median : 32.0  
##  Mean   : 1.465   Mean   : 1.212   Mean   : 3.25   Mean   : 49.7  
##  3rd Qu.: 2.000   3rd Qu.: 2.000   3rd Qu.: 5.00   3rd Qu.: 77.0  
##  Max.   :39.000   Max.   :17.000   Max.   :35.00   Max.   :163.0  
##  NA's   :4377     NA's   :4378     NA's   :4377    NA's   :5189   
##       AVG             OBP             SLG       
##  Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.136   1st Qu.:0.188   1st Qu.:0.167  
##  Median :0.233   Median :0.296   Median :0.333  
##  Mean   :0.205   Mean   :0.262   Mean   :0.304  
##  3rd Qu.:0.274   3rd Qu.:0.342   3rd Qu.:0.423  
##  Max.   :1.000   Max.   :1.000   Max.   :4.000  
##  NA's   :8905    NA's   :8821    NA's   :8905

Now we can merge salary data into batting data frame.

combo <- merge(batting, salary, by=c("playerID", "yearID"))
summary(combo)
##    playerID             yearID         stint         teamID.x        
##  Length:25397       Min.   :1985   Min.   :1.000   Length:25397      
##  Class :character   1st Qu.:1993   1st Qu.:1.000   Class :character  
##  Mode  :character   Median :1999   Median :1.000   Mode  :character  
##                     Mean   :1999   Mean   :1.098                     
##                     3rd Qu.:2006   3rd Qu.:1.000                     
##                     Max.   :2013   Max.   :4.000                     
##                                                                      
##     lgID.x                G            G_batting            AB       
##  Length:25397       Min.   :  1.00   Min.   :  0.00   Min.   :  0.0  
##  Class :character   1st Qu.: 26.00   1st Qu.:  8.00   1st Qu.:  5.0  
##  Mode  :character   Median : 50.00   Median : 42.00   Median : 85.0  
##                     Mean   : 64.06   Mean   : 57.58   Mean   :182.4  
##                     3rd Qu.:101.00   3rd Qu.:101.00   3rd Qu.:336.0  
##                     Max.   :163.00   Max.   :163.00   Max.   :716.0  
##                                      NA's   :906      NA's   :2661   
##        R                H               X2B              X3B        
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  1.00   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :  9.00   Median : 19.00   Median : 3.000   Median : 0.000  
##  Mean   : 24.71   Mean   : 48.18   Mean   : 9.276   Mean   : 1.033  
##  3rd Qu.: 43.00   3rd Qu.: 87.25   3rd Qu.:16.000   3rd Qu.: 1.000  
##  Max.   :152.00   Max.   :262.00   Max.   :59.000   Max.   :23.000  
##  NA's   :2661     NA's   :2661     NA's   :2661     NA's   :2661    
##        HR              RBI               SB                CS       
##  Min.   : 0.000   Min.   :  0.00   Min.   :  0.000   Min.   : 0.00  
##  1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.: 0.00  
##  Median : 1.000   Median :  8.00   Median :  0.000   Median : 0.00  
##  Mean   : 5.369   Mean   : 23.56   Mean   :  3.568   Mean   : 1.54  
##  3rd Qu.: 7.000   3rd Qu.: 39.00   3rd Qu.:  3.000   3rd Qu.: 2.00  
##  Max.   :73.000   Max.   :165.00   Max.   :110.000   Max.   :29.00  
##  NA's   :2661     NA's   :2661     NA's   :2661      NA's   :2661   
##        BB               SO              IBB               HBP        
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  2.00   1st Qu.:  0.000   1st Qu.: 0.000  
##  Median :  6.00   Median : 20.00   Median :  0.000   Median : 0.000  
##  Mean   : 17.98   Mean   : 33.52   Mean   :  1.533   Mean   : 1.614  
##  3rd Qu.: 29.00   3rd Qu.: 55.00   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :232.00   Max.   :223.00   Max.   :120.000   Max.   :35.000  
##  NA's   :2661     NA's   :2661     NA's   :2662      NA's   :2670    
##        SH               SF              GIDP            G_old       
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :  0.00  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 20.00  
##  Median : 0.000   Median : 0.000   Median : 2.000   Median : 47.00  
##  Mean   : 1.786   Mean   : 1.554   Mean   : 4.127   Mean   : 61.43  
##  3rd Qu.: 2.000   3rd Qu.: 2.000   3rd Qu.: 7.000   3rd Qu.:101.00  
##  Max.   :39.000   Max.   :17.000   Max.   :35.000   Max.   :163.00  
##  NA's   :2661     NA's   :2662     NA's   :2661     NA's   :3414    
##       AVG             OBP             SLG          teamID.y        
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Length:25397      
##  1st Qu.:0.160   1st Qu.:0.208   1st Qu.:0.200   Class :character  
##  Median :0.242   Median :0.305   Median :0.351   Mode  :character  
##  Mean   :0.212   Mean   :0.270   Mean   :0.317                     
##  3rd Qu.:0.276   3rd Qu.:0.346   3rd Qu.:0.432                     
##  Max.   :1.000   Max.   :1.000   Max.   :4.000                     
##  NA's   :5618    NA's   :5562    NA's   :5618                      
##     lgID.y              salary        
##  Length:25397       Min.   :       0  
##  Class :character   1st Qu.:  255000  
##  Mode  :character   Median :  550000  
##                     Mean   : 1879256  
##                     3rd Qu.: 2150000  
##                     Max.   :33000000  
## 

Analysis

Analyzing lost players and their relevant statistics for year 2001.

lost_pay <- combo %>% 
      filter(playerID == "giambja01" | playerID == "damonjo01" | playerID == "saenzol01") %>%
      filter(yearID == 2001) %>%
      select(playerID,H,X2B,X3B,HR,OBP,SLG,AVG,AB)

Finding replacements whom satisfy the conditions:

The total combined salary of the three players can not exceed 15 million dollars. Their combined number of At Bats (AB) needs to be equal to or greater than the lost players. *Their mean OBP had to equal to or greater than the mean OBP of the lost players

Let’s first take a look at the merged data frame.

ggplot(combo, aes(salary/1000))+
      geom_histogram()+
      labs(title="Salaries histogram", x="Earinings in thousands of dollars", y="Count")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(combo, aes(AB, OBP, colour = salary/1000))+
      geom_point(alpha=.5)+
      labs(title="Selection Variables", x="At Bat", y="OBP")
## Warning: Removed 5562 rows containing missing values (geom_point).

In order to balance the combination, we will set the following caps to filter the merged data frame. Salary limit is 15 million. It makes sense to cut players who individually earn 10 million. At bat figure will be set to half the mean of the 3 lost players. OBP minimum value will be set as a third of the mean from lost players.

SAL_cap <- 10000000 
AB_lostpl <- sum(lost_pay$AB)
OBP_mean_lostpl <- mean(lost_pay$OBP)
AB_cap <- AB_lostpl/6
OBP_cap <- OBP_mean_lostpl/3
combo_k <- combo %>%
      filter(yearID==2001) %>%
      filter(playerID != "giambja01" & playerID != "damonjo01" & playerID != "saenzol01") %>%
      select(playerID, yearID, AB, OBP, SLG, AVG, salary) %>%
      filter(salary <= SAL_cap) 
combo_k <- combo_k %>%
      filter(AB >= AB_cap) %>%
      filter(OBP >= OBP_cap)

Plots

ggplot(combo_k, aes(salary/1000))+
      geom_histogram()+
      labs(title="Salaries histogram", x="Earinings in thousands of dollars", y="Count")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(combo_k, aes(AB, OBP, colour = salary/1000))+
      geom_point(alpha=.5, size = 3)+
      labs(title="Selection Variables", x="At Bat", y="OBP")

summary(combo_k)
##    playerID             yearID           AB             OBP        
##  Length:247         Min.   :2001   Min.   :245.0   Min.   :0.2438  
##  Class :character   1st Qu.:2001   1st Qu.:375.0   1st Qu.:0.3169  
##  Mode  :character   Median :2001   Median :456.0   Median :0.3346  
##                     Mean   :2001   Mean   :452.7   Mean   :0.3381  
##                     3rd Qu.:2001   3rd Qu.:542.0   3rd Qu.:0.3633  
##                     Max.   :2001   Max.   :692.0   Max.   :0.4317  
##       SLG              AVG             salary        
##  Min.   :0.2853   Min.   :0.1870   Min.   :  200000  
##  1st Qu.:0.3757   1st Qu.:0.2531   1st Qu.:  345000  
##  Median :0.4311   Median :0.2699   Median : 2350000  
##  Mean   :0.4364   Mean   :0.2716   Mean   : 2933150  
##  3rd Qu.:0.4768   3rd Qu.:0.2908   3rd Qu.: 4991500  
##  Max.   :0.6880   Max.   :0.3497   Max.   :10000000

The plots shown above allow us to think that there are a significant number of good candidates to substitute the lost players. We need players with an OBP mean of above 0.3639 and an at bat combined score of above 1469. Let’s try a simple approach. We can cut the At Bat threshold at 500, and OBP at .35, this way we ca be certain that remaining players will meet performance cap.

combo_k1 <- combo_k%>%
      filter(OBP >= .35 & AB >=500)

Now we have ended up with a “short list” of 45 players. Let’s take a look at their salary histogram.

ggplot(combo_k1, aes(salary/1000))+
      geom_histogram()+
      labs(title="Histogram salaries-SHORT LIST", 
           x="Earinings in thousands of dollars", y="Count")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can confidently say that we can pull out at leas one set of three players that could satisfy the conditions. We can arrange the short list based on the OBP score.

combo_k1 <- arrange(combo_k1, desc(OBP))
head(combo_k1, 10)
##     playerID yearID  AB       OBP       SLG       AVG  salary
## 1  heltoto01   2001 587 0.4316547 0.6848382 0.3356048 4950000
## 2  berkmla01   2001 577 0.4302326 0.6204506 0.3310225  305000
## 3  gonzalu01   2001 609 0.4285714 0.6880131 0.3251232 4833333
## 4  sheffga01   2001 515 0.4174757 0.5825243 0.3106796 9916667
## 5  thomeji01   2001 526 0.4161491 0.6235741 0.2908745 7875000
## 6  alomaro01   2001 575 0.4146707 0.5408696 0.3356522 7750000
## 7  edmonji01   2001 500 0.4102142 0.5640000 0.3040000 6333333
## 8  gilesbr02   2001 576 0.4035608 0.5902778 0.3090278 7333333
## 9  pujolal01   2001 590 0.4029630 0.6101695 0.3288136  200000
## 10 olerujo01   2001 572 0.4011799 0.4720280 0.3024476 6700000

Conclusion

A look at the first three players in the above table already give us a feasible answer. Every player in it meets the performance threshold. Lets make a simple summation to see if they can be afforded with the 15 million cap.

play1 <- combo_k1[1, 1]
play2 <- combo_k1[2, 1]
play3 <- combo_k1[3, 1]

z_sal <- sum(combo_k1[1:3, 7])

paste("Combined salayes of", play1,",", play2, ",", "and", play3, "is equal to", z_sal)
## [1] "Combined salayes of heltoto01 , berkmla01 , and gonzalu01 is equal to 10088333"

We can now conclude that players heltoto01, berkmla01, and gonzalu01 have a combined At Bat score greater than 1469, an OBP with a mean grater that .3639 and altogether cost less than 15 million dollars. We can confidently recommend their hiring, as they are also the ones with the highest OBP among the short list.