Background

This project was completed as part of ‘R for Data Science and Machine Learning’ by Jose Portilla via Udemy. Below are some snippets as provided by the course for general information on the project:

The 2002 Oakland A’s

The Oakland Athletics’ 2002 season was the team’s 35th in Oakland, California. It was also the 102nd season in franchise history. The Athletics finished first in the American League West with a record of 103-59.

The Athletics’ 2002 campaign ranks among the most famous in franchise history. Following the 2001 season, Oakland saw the departure of three key players (the lost boys). Billy Beane, the team’s general manager, responded with a series of under-the-radar free agent signings. The new-look Athletics, despite a comparative lack of star power, surprised the baseball world by besting the 2001 team’s regular season record. The team is most famous, however, for winning 20 consecutive games between August 13 and September 4, 2002.[1] The Athletics’ season was the subject of Michael Lewis’ 2003 book Moneyball: The Art of Winning an Unfair Game, as Lewis was given the opportunity to follow the team throughout that season.

This project is based off the book written by Michael Lewis (later turned into a movie).

Moneyball Book

The central premise of the book Moneyball is that the collective wisdom of baseball insiders (including players, managers, coaches, scouts, and the front office) over the past century is subjective and often flawed. Statistics such as stolen bases, runs batted in, and batting average, typically used to gauge players, are relics of a 19th-century view of the game and the statistics available at that time. The book argues that the Oakland A’s front office took advantage of more analytical gauges of player performance to field a team that could better compete against richer competitors in Major League Baseball (MLB).

Rigorous statistical analysis had demonstrated that on-base percentage and slugging percentage are better indicators of offensive success, and the A’s became convinced that these qualities were cheaper to obtain on the open market than more historically valued qualities such as speed and contact. These observations often flew in the face of conventional baseball wisdom and the beliefs of many baseball scouts and executives.

By re-evaluating the strategies that produce wins on the field, the 2002 Athletics, with approximately US$44 million in salary, were competitive with larger-market teams such as the New York Yankees, who spent over US$125 million in payroll that same season.

Because of the team’s smaller revenues, Oakland is forced to find players undervalued by the market, and their system for finding value in undervalued players has proven itself thus far. This approach brought the A’s to the playoffs in 2002 and 2003.

In this project we’ll work with real data, with the goal of finding replacement players for the ones lost at the start of the off-season. During the 2001–02 offseason, the team lost three key free agents to larger-market teams: 2000 AL MVP Jason Giambi to the New York Yankees, outfielder Johnny Damon to the Boston Red Sox, and closer Jason Isringhausen to the St. Louis Cardinals.

The main goal of this project is for you to feel comfortable working with R on real data to try to derive actionable insights.

Replacement Players

Now we have all the information we need! Here is your final task: find replacement players for the three key players we lost. However, you have three constraints:

The total combined salary of the three players cannot exceed 15 million dollars.
Their combined number of At Bats (AB) needs to be equal to or greater than that of the lost players (1469).
Their mean OBP has to be equal to or greater than the mean OBP of the lost players (0.3638686667).

Data

We’ll be using data from Sean Lahman’s website, a very useful source for baseball statistics.

Project Walkthrough

Setup

library(dplyr)
library(ggplot2)
library(plotly)

File Load

batting <- read.csv('/Users/spencer/moneyball_project/Batting.csv')
sal <- read.csv('/Users/spencer/moneyball_project/Salaries.csv')

Data Prep

Following the project instructions, I used ‘Batting.csv’ and created 4 new columns for the following information:

Batting Average:

batting$BA <- batting$H / batting$AB

On Base Percentage:

batting$OBP <- (batting$H + batting$BB + batting$HBP) / (batting$AB + batting$BB + batting$HBP + batting$SF)

Slugging Average:

To calculate the slugging average (slugging percentage), the number of singles hit is needed. To calculate that figure:

batting$X1B <- batting$H - batting$X2B - batting$X3B - batting$HR

Now to figure the Slugging Percentage:

batting$SLG <- (batting$X1B + (2 * batting$X2B) + (3 * batting$X3B) + (4 * batting$HR)) / batting$AB
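As a quick sanity check on these formulas, the sketch below recomputes BA and SLG by hand for a single season line. It uses the 2001 counting stats for berkmla01 as printed in the final table of this write-up; the lowercase variable names are only for illustration and are not part of the project code.

```r
# 2001 counting stats for berkmla01, copied from the final table in this write-up
h <- 191; x2b <- 55; x3b <- 5; hr <- 34; ab <- 577

x1b <- h - x2b - x3b - hr                        # singles: 97
ba  <- h / ab                                    # batting average
slg <- (x1b + 2 * x2b + 3 * x3b + 4 * hr) / ab   # total bases per at bat

round(c(BA = ba, SLG = slg), 7)
# BA 0.3310225 and SLG 0.6204506, matching the values in the table
```

OBP is not recomputed here because the walks, hit-by-pitch, and sacrifice-fly counts for that season are not printed individually in this write-up.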

Check Current ‘batting’ table to make sure all data is present:

head(batting)
##    playerID yearID stint teamID lgID  G G_batting AB R H X2B X3B HR RBI SB CS
## 1 aardsda01   2004     1    SFN   NL 11        11  0 0 0   0   0  0   0  0  0
## 2 aardsda01   2006     1    CHN   NL 45        43  2 0 0   0   0  0   0  0  0
## 3 aardsda01   2007     1    CHA   AL 25         2  0 0 0   0   0  0   0  0  0
## 4 aardsda01   2008     1    BOS   AL 47         5  1 0 0   0   0  0   0  0  0
## 5 aardsda01   2009     1    SEA   AL 73         3  0 0 0   0   0  0   0  0  0
## 6 aardsda01   2010     1    SEA   AL 53         4  0 0 0   0   0  0   0  0  0
##   BB SO IBB HBP SH SF GIDP G_old  BA OBP X1B SLG
## 1  0  0   0   0  0  0    0    11 NaN NaN   0 NaN
## 2  0  0   0   0  1  0    0    45   0   0   0   0
## 3  0  0   0   0  0  0    0     2 NaN NaN   0 NaN
## 4  0  1   0   0  0  0    0     5   0   0   0   0
## 5  0  0   0   0  0  0    0    NA NaN NaN   0 NaN
## 6  0  0   0   0  0  0    0    NA NaN NaN   0 NaN

The summary of the batting data frame shows that the data reaches back to 1871 (see yearID below), far more history than this evaluation needs. I decided to keep only the years 1985 and later:

summary(batting)
##    playerID             yearID         stint          teamID         
##  Length:97889       Min.   :1871   Min.   :1.000   Length:97889      
##  Class :character   1st Qu.:1931   1st Qu.:1.000   Class :character  
##  Mode  :character   Median :1970   Median :1.000   Mode  :character  
##                     Mean   :1962   Mean   :1.077                     
##                     3rd Qu.:1995   3rd Qu.:1.000                     
##                     Max.   :2013   Max.   :5.000                     
##                                                                      
##      lgID                 G            G_batting            AB       
##  Length:97889       Min.   :  1.00   Min.   :  0.00   Min.   :  0.0  
##  Class :character   1st Qu.: 13.00   1st Qu.:  7.00   1st Qu.:  9.0  
##  Mode  :character   Median : 35.00   Median : 32.00   Median : 61.0  
##                     Mean   : 51.65   Mean   : 49.13   Mean   :154.1  
##                     3rd Qu.: 81.00   3rd Qu.: 81.00   3rd Qu.:260.0  
##                     Max.   :165.00   Max.   :165.00   Max.   :716.0  
##                                      NA's   :1406     NA's   :6413   
##        R                H               X2B            X3B        
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.0   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  1.00   1st Qu.: 0.0   1st Qu.: 0.000  
##  Median :  5.00   Median : 12.00   Median : 2.0   Median : 0.000  
##  Mean   : 20.47   Mean   : 40.37   Mean   : 6.8   Mean   : 1.424  
##  3rd Qu.: 31.00   3rd Qu.: 66.00   3rd Qu.:10.0   3rd Qu.: 2.000  
##  Max.   :192.00   Max.   :262.00   Max.   :67.0   Max.   :36.000  
##  NA's   :6413     NA's   :6413     NA's   :6413   NA's   :6413    
##        HR              RBI               SB                CS        
##  Min.   : 0.000   Min.   :  0.00   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.: 0.000  
##  Median : 0.000   Median :  5.00   Median :  0.000   Median : 0.000  
##  Mean   : 3.002   Mean   : 18.47   Mean   :  3.265   Mean   : 1.385  
##  3rd Qu.: 3.000   3rd Qu.: 28.00   3rd Qu.:  2.000   3rd Qu.: 1.000  
##  Max.   :73.000   Max.   :191.00   Max.   :138.000   Max.   :42.000  
##  NA's   :6413     NA's   :6837     NA's   :7713      NA's   :29867   
##        BB               SO              IBB              HBP        
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  2.00   1st Qu.:  0.00   1st Qu.: 0.000  
##  Median :  4.00   Median : 11.00   Median :  0.00   Median : 0.000  
##  Mean   : 14.21   Mean   : 21.95   Mean   :  1.28   Mean   : 1.136  
##  3rd Qu.: 21.00   3rd Qu.: 31.00   3rd Qu.:  1.00   3rd Qu.: 1.000  
##  Max.   :232.00   Max.   :223.00   Max.   :120.00   Max.   :51.000  
##  NA's   :6413     NA's   :14251    NA's   :42977    NA's   :9233    
##        SH               SF             GIDP           G_old     
##  Min.   : 0.000   Min.   : 0.0    Min.   : 0.00   Min.   :  0   
##  1st Qu.: 0.000   1st Qu.: 0.0    1st Qu.: 0.00   1st Qu.: 11   
##  Median : 1.000   Median : 0.0    Median : 1.00   Median : 34   
##  Mean   : 2.564   Mean   : 1.2    Mean   : 3.33   Mean   : 51   
##  3rd Qu.: 3.000   3rd Qu.: 2.0    3rd Qu.: 5.00   3rd Qu.: 82   
##  Max.   :67.000   Max.   :19.0    Max.   :36.00   Max.   :165   
##  NA's   :12751    NA's   :42446   NA's   :32521   NA's   :5189  
##        BA             OBP             X1B              SLG       
##  Min.   :0.000   Min.   :0.00    Min.   :  0.00   Min.   :0.000  
##  1st Qu.:0.148   1st Qu.:0.19    1st Qu.:  1.00   1st Qu.:0.179  
##  Median :0.231   Median :0.29    Median :  9.00   Median :0.309  
##  Mean   :0.209   Mean   :0.26    Mean   : 29.14   Mean   :0.291  
##  3rd Qu.:0.275   3rd Qu.:0.34    3rd Qu.: 48.00   3rd Qu.:0.397  
##  Max.   :1.000   Max.   :1.00    Max.   :225.00   Max.   :4.000  
##  NA's   :13520   NA's   :49115   NA's   :6413     NA's   :13520
batting1 <- subset(batting,subset = yearID > 1984)

Check our new table ‘batting1’ summary to confirm our year filter is correct:

summary(batting1)
##    playerID             yearID         stint         teamID         
##  Length:35652       Min.   :1985   Min.   :1.00   Length:35652      
##  Class :character   1st Qu.:1993   1st Qu.:1.00   Class :character  
##  Mode  :character   Median :2000   Median :1.00   Mode  :character  
##                     Mean   :2000   Mean   :1.08                     
##                     3rd Qu.:2007   3rd Qu.:1.00                     
##                     Max.   :2013   Max.   :4.00                     
##                                                                     
##      lgID                 G           G_batting            AB       
##  Length:35652       Min.   :  1.0   Min.   :  0.00   Min.   :  0.0  
##  Class :character   1st Qu.: 14.0   1st Qu.:  4.00   1st Qu.:  3.0  
##  Mode  :character   Median : 34.0   Median : 27.00   Median : 47.0  
##                     Mean   : 51.7   Mean   : 46.28   Mean   :144.7  
##                     3rd Qu.: 77.0   3rd Qu.: 77.00   3rd Qu.:241.0  
##                     Max.   :163.0   Max.   :163.00   Max.   :716.0  
##                                     NA's   :1406     NA's   :4377   
##        R                H               X2B              X3B        
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :  4.00   Median :  8.00   Median : 1.000   Median : 0.000  
##  Mean   : 19.44   Mean   : 37.95   Mean   : 7.293   Mean   : 0.824  
##  3rd Qu.: 30.00   3rd Qu.: 61.00   3rd Qu.:11.000   3rd Qu.: 1.000  
##  Max.   :152.00   Max.   :262.00   Max.   :59.000   Max.   :23.000  
##  NA's   :4377     NA's   :4377     NA's   :4377     NA's   :4377    
##        HR              RBI               SB                CS        
##  Min.   : 0.000   Min.   :  0.00   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.: 0.000  
##  Median : 0.000   Median :  3.00   Median :  0.000   Median : 0.000  
##  Mean   : 4.169   Mean   : 18.41   Mean   :  2.811   Mean   : 1.219  
##  3rd Qu.: 5.000   3rd Qu.: 27.00   3rd Qu.:  2.000   3rd Qu.: 1.000  
##  Max.   :73.000   Max.   :165.00   Max.   :110.000   Max.   :29.000  
##  NA's   :4377     NA's   :4377     NA's   :4377      NA's   :4377    
##        BB               SO              IBB               HBP        
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:  0.00   1st Qu.:  1.00   1st Qu.:  0.000   1st Qu.: 0.000  
##  Median :  3.00   Median : 12.00   Median :  0.000   Median : 0.000  
##  Mean   : 14.06   Mean   : 27.03   Mean   :  1.171   Mean   : 1.273  
##  3rd Qu.: 21.00   3rd Qu.: 42.00   3rd Qu.:  1.000   3rd Qu.: 1.000  
##  Max.   :232.00   Max.   :223.00   Max.   :120.000   Max.   :35.000  
##  NA's   :4377     NA's   :4377     NA's   :4378      NA's   :4387    
##        SH               SF              GIDP           G_old      
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.00   Min.   :  0.0  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.00   1st Qu.: 11.0  
##  Median : 0.000   Median : 0.000   Median : 1.00   Median : 32.0  
##  Mean   : 1.465   Mean   : 1.212   Mean   : 3.25   Mean   : 49.7  
##  3rd Qu.: 2.000   3rd Qu.: 2.000   3rd Qu.: 5.00   3rd Qu.: 77.0  
##  Max.   :39.000   Max.   :17.000   Max.   :35.00   Max.   :163.0  
##  NA's   :4377     NA's   :4378     NA's   :4377    NA's   :5189   
##        BA             OBP             X1B              SLG       
##  Min.   :0.000   Min.   :0.000   Min.   :  0.00   Min.   :0.000  
##  1st Qu.:0.136   1st Qu.:0.188   1st Qu.:  0.00   1st Qu.:0.167  
##  Median :0.233   Median :0.296   Median :  6.00   Median :0.333  
##  Mean   :0.205   Mean   :0.262   Mean   : 25.66   Mean   :0.304  
##  3rd Qu.:0.274   3rd Qu.:0.342   3rd Qu.: 42.00   3rd Qu.:0.423  
##  Max.   :1.000   Max.   :1.000   Max.   :225.00   Max.   :4.000  
##  NA's   :8905    NA's   :8821    NA's   :4377     NA's   :8905

For the last bit of housekeeping, I merged the salary table with the batting stats so that the salaries of the departing players and their replacements can be compared:

combo <- merge(batting1, sal, by= c('playerID','yearID'))
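It may help to see what merge() does with two key columns. By default it performs an inner join, and any non-key column that exists in both tables gets a .x or .y suffix, which is where teamID.x and teamID.y in the later printouts come from. The sketch below uses two small toy tables; every ID and value in them is made up for illustration.

```r
# Toy stand-ins for the batting and salary tables (all values are made up)
bat_toy <- data.frame(
  playerID = c("a01", "a01", "b01"),
  yearID   = c(2000, 2001, 2001),
  teamID   = c("OAK", "OAK", "NYA"),
  AB       = c(500, 520, 610)
)
sal_toy <- data.frame(
  playerID = c("a01", "b01", "c01"),
  yearID   = c(2001, 2001, 2001),
  teamID   = c("OAK", "NYA", "BOS"),
  salary   = c(4e6, 9e6, 1e6)
)

# Inner join on both keys: only player-seasons present in BOTH tables survive
toy <- merge(bat_toy, sal_toy, by = c("playerID", "yearID"))

names(toy)  # teamID appears twice: teamID.x (batting) and teamID.y (salary)
nrow(toy)   # 2: a01/2000 has no salary row, and c01 has no batting row
```

One consequence worth keeping in mind is that player-seasons without a matching salary record silently drop out of the merged table.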

Review of Lost Players

For the first order of business, I felt it was necessary to take a look at the players the A’s were losing to get a sense of where I needed to look for replacements. To do this, I set up a few new data frames to look at different pieces of the data:

lostplayers <- combo %>% 
  filter(playerID %in% c('giambja01','damonjo01','saenzol01'))

lostplayers2 <- subset(lostplayers, subset = yearID == 2001)

lostplayers3 <- lostplayers2[,c('playerID','H','X2B','X3B','HR','OBP','SLG','BA','AB')]

combo2 <- subset(combo, subset = yearID == 2001)

The two data frames I use going forward are lostplayers3 and combo2. lostplayers3 holds only the 3 lost players (giambja01, damonjo01, and saenzol01 in this data), with their stats from their last year with the A’s (2001) and just the subset of stats being evaluated. combo2 narrows the full player pool down to 2001 stats and is the starting point for my evaluation.
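The two numeric thresholds in the project constraints (1469 at bats and a mean OBP of about 0.3639) can be reproduced from the lost players’ 2001 lines. The sketch below copies the AB and OBP values from the lostplayers3 printout at the end of this write-up; the names lost, ab_target, and obp_target are hypothetical.

```r
# 2001 AB and OBP for the three lost players, copied from the lostplayers3
# printout at the end of this write-up
lost <- data.frame(
  playerID = c("damonjo01", "giambja01", "saenzol01"),
  AB       = c(644, 520, 305),
  OBP      = c(0.3235294, 0.4769001, 0.2911765)
)

ab_target  <- sum(lost$AB)    # combined at bats: 1469
obp_target <- mean(lost$OBP)  # mean OBP: about 0.3638687
```

Note that the OBP target is a simple mean of the three individual OBPs, not an OBP weighted by plate appearances.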

Based on the parameters of the project, I had to stay within the $15 million salary cap, reach combined at bats of 1469 or more, and reach a mean OBP of 0.3639 or more. Creating another data frame and plotting it shows which players best meet those requirements. I chose $7,000,000 as a per-player salary cutoff, on the assumption that it would be hard to find 3 players within the cap with the necessary stats if too much went to one or two of them.

Initial Data Frame with these parameters:

# subset() treats NA conditions as FALSE, so no all-NA rows end up in combo3
combo3 <- subset(combo2, OBP > 0.36 & AB > 490 & salary < 7000000)

Building Our Plot

pl <- ggplot(combo3, aes(x = OBP, y = AB))
combo3_plot <- ggplotly(
  pl +
    geom_point(aes(size = salary, color = salary)) +
    scale_colour_gradient(high = "red", low = "blue") +
    geom_text(aes(label = playerID), nudge_y = 5, color = "gray20", check_overlap = TRUE)
)

# Display the interactive plot
combo3_plot

For the plot, I thought it was essential to try to get a good overview of the players available and compare the AB, OBP, and Salary all at once.

From this review, there were 3 immediate standouts: gonzalu01, heltoto01, and berkmla01

Narrowing the focus, I set up another dataframe, just for these players:

targets <- combo3 %>% 
  filter(playerID %in% c('gonzalu01','heltoto01','berkmla01'))

Replacement Player Review

Next, review the summary of the new data frame to confirm that the three players together meet the mean OBP requirement:

summary(targets)
##    playerID             yearID         stint     teamID.x        
##  Length:3           Min.   :2001   Min.   :1   Length:3          
##  Class :character   1st Qu.:2001   1st Qu.:1   Class :character  
##  Mode  :character   Median :2001   Median :1   Mode  :character  
##                     Mean   :2001   Mean   :1                     
##                     3rd Qu.:2001   3rd Qu.:1                     
##                     Max.   :2001   Max.   :1                     
##     lgID.x                G           G_batting           AB     
##  Length:3           Min.   :156.0   Min.   :156.0   Min.   :577  
##  Class :character   1st Qu.:157.5   1st Qu.:157.5   1st Qu.:582  
##  Mode  :character   Median :159.0   Median :159.0   Median :587  
##                     Mean   :159.0   Mean   :159.0   Mean   :591  
##                     3rd Qu.:160.5   3rd Qu.:160.5   3rd Qu.:598  
##                     Max.   :162.0   Max.   :162.0   Max.   :609  
##        R               H              X2B             X3B       
##  Min.   :110.0   Min.   :191.0   Min.   :36.00   Min.   :2.000  
##  1st Qu.:119.0   1st Qu.:194.0   1st Qu.:45.00   1st Qu.:3.500  
##  Median :128.0   Median :197.0   Median :54.00   Median :5.000  
##  Mean   :123.3   Mean   :195.3   Mean   :48.33   Mean   :4.667  
##  3rd Qu.:130.0   3rd Qu.:197.5   3rd Qu.:54.50   3rd Qu.:6.000  
##  Max.   :132.0   Max.   :198.0   Max.   :55.00   Max.   :7.000  
##        HR             RBI            SB          CS          BB        
##  Min.   :34.00   Min.   :126   Min.   :1   Min.   :1   Min.   : 92.00  
##  1st Qu.:41.50   1st Qu.:134   1st Qu.:4   1st Qu.:3   1st Qu.: 95.00  
##  Median :49.00   Median :142   Median :7   Median :5   Median : 98.00  
##  Mean   :46.67   Mean   :138   Mean   :5   Mean   :5   Mean   : 96.67  
##  3rd Qu.:53.00   3rd Qu.:144   3rd Qu.:7   3rd Qu.:7   3rd Qu.: 99.00  
##  Max.   :57.00   Max.   :146   Max.   :7   Max.   :9   Max.   :100.00  
##        SO             IBB             HBP              SH        
##  Min.   : 83.0   Min.   : 5.00   Min.   : 5.00   Min.   :0.0000  
##  1st Qu.: 93.5   1st Qu.:10.00   1st Qu.: 9.00   1st Qu.:0.0000  
##  Median :104.0   Median :15.00   Median :13.00   Median :0.0000  
##  Mean   :102.7   Mean   :14.67   Mean   :10.67   Mean   :0.3333  
##  3rd Qu.:112.5   3rd Qu.:19.50   3rd Qu.:13.50   3rd Qu.:0.5000  
##  Max.   :121.0   Max.   :24.00   Max.   :14.00   Max.   :1.0000  
##        SF             GIDP        G_old             BA              OBP        
##  Min.   :5.000   Min.   : 8   Min.   :156.0   Min.   :0.3251   Min.   :0.4286  
##  1st Qu.:5.000   1st Qu.:11   1st Qu.:157.5   1st Qu.:0.3281   1st Qu.:0.4294  
##  Median :5.000   Median :14   Median :159.0   Median :0.3310   Median :0.4302  
##  Mean   :5.333   Mean   :12   Mean   :159.0   Mean   :0.3306   Mean   :0.4302  
##  3rd Qu.:5.500   3rd Qu.:14   3rd Qu.:160.5   3rd Qu.:0.3333   3rd Qu.:0.4309  
##  Max.   :6.000   Max.   :14   Max.   :162.0   Max.   :0.3356   Max.   :0.4317  
##       X1B             SLG           teamID.y            lgID.y         
##  Min.   :92.00   Min.   :0.6205   Length:3           Length:3          
##  1st Qu.:94.50   1st Qu.:0.6526   Class :character   Class :character  
##  Median :97.00   Median :0.6848   Mode  :character   Mode  :character  
##  Mean   :95.67   Mean   :0.6644                                        
##  3rd Qu.:97.50   3rd Qu.:0.6864                                        
##  Max.   :98.00   Max.   :0.6880                                        
##      salary       
##  Min.   : 305000  
##  1st Qu.:2569166  
##  Median :4833333  
##  Mean   :3362778  
##  3rd Qu.:4891666  
##  Max.   :4950000

Confirmed: the mean OBP of 0.4302 is above the required 0.3639

Review the salary cap requirement (combined salary of at most $15,000,000):

sum(targets$salary)
## [1] 10088333

Confirmed: $10,088,333 is under the $15,000,000 cap

Review At Bats (AB) to confirm the sum is at least the required 1469:

sum(targets$AB)
## [1] 1773

Confirmed above 1469
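The three checks above can also be bundled into one small helper. This is only a sketch: check_constraints and picks are hypothetical names, the numbers are copied from the printouts in this write-up, and the assignment of salaries to individual players is inferred from the text (heltoto01 is later described as the most expensive of the picks), so treat that mapping as an assumption.

```r
# Values copied from the printouts in this write-up. The salary-to-player
# mapping is an inference from the text, not something printed directly.
picks <- data.frame(
  playerID = c("berkmla01", "gonzalu01", "heltoto01"),
  AB       = c(577, 609, 587),
  OBP      = c(0.4302326, 0.4285714, 0.4316547),
  salary   = c(305000, 4833333, 4950000)
)

# Hypothetical helper: one TRUE/FALSE per project constraint
check_constraints <- function(df, cap = 15e6, min_ab = 1469, min_obp = 0.3638687) {
  c(
    salary_ok = sum(df$salary) <= cap,
    ab_ok     = sum(df$AB) >= min_ab,
    obp_ok    = mean(df$OBP) >= min_obp
  )
}

check_constraints(picks)  # all three come back TRUE
```

Keeping the thresholds as function arguments makes it easy to rerun the same check on any other trio of candidates.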

Done?

At this point, the 3 players found meet the criteria set by the project, but I felt a little more analysis might benefit the team. The salary obligations for the 3 selected players leave a considerable amount of cap room. That may be a good thing, since those resources could go to other positions or to keeping players who were previously out of budget. But if we wanted to make full use of the $15 million on these 3 players, a closer look is needed, especially since the initial review only covered players with salaries below $7 million.

Further Analysis

cap_play <- (sum(targets$salary))-4950000
print(cap_play)
## [1] 5138333

Referring back to the plot of targets, gonzalu01 and heltoto01 are the two picks that would free up the most salary if put back. heltoto01 is the more expensive of the two and their ABs are close, so I chose to return him, understanding that this is more of a salary exercise and that further analysis may show gonzalu01 to be the superior player (revisited later).

Returning heltoto01 leaves the following cap space for another player:

cap_room_v1 <- 15000000 - cap_play
print(cap_room_v1)
## [1] 9861667

The new cap room is $9,861,667, which is above the range I looked at before, so it is worth investigating whether our third player could come from this newly available pool:

combo4 <- combo2[combo2$OBP > 0.36 & combo2$AB > 490 & combo2$salary > 7000000 & combo2$salary < 9861667,]

print(combo4)
##        playerID yearID stint teamID.x lgID.x   G G_batting  AB   R   H X2B X3B
## 346   alomaro01   2001     1      CLE     AL 157       157 575 113 193  34  12
## 1971  biggicr01   2001     1      HOU     NL 155       155 617 118 180  35   3
## NA         <NA>     NA    NA     <NA>   <NA>  NA        NA  NA  NA  NA  NA  NA
## 7943  gilesbr02   2001     1      PIT     NL 160       160 576 116 178  37   7
## 17275 palmera01   2001     1      TEX     AL 160       160 600  98 164  33   0
## 22628 thomeji01   2001     1      CLE     AL 156       156 526 101 153  26   1
##       HR RBI SB CS  BB  SO IBB HBP SH SF GIDP G_old        BA       OBP X1B
## 346   20 100 30  6  80  71   5   4  9  9    9   157 0.3356522 0.4146707 127
## 1971  20  70  7  4  66 100   4  28  0  6   11   155 0.2917342 0.3821478 122
## NA    NA  NA NA NA  NA  NA  NA  NA NA NA   NA    NA        NA        NA  NA
## 7943  37  95 13  6  90  67  14   4  0  4   10   160 0.3090278 0.4035608  97
## 17275 47 123  1  1 101  90   8   7  0  6    8   160 0.2733333 0.3809524  84
## 22628 49 124  0  1 111 185  14   4  0  3    9   156 0.2908745 0.4161491  77
##             SLG teamID.y lgID.y  salary
## 346   0.5408696      CLE     AL 7750000
## 1971  0.4554295      HOU     NL 7750000
## NA           NA     <NA>   <NA>      NA
## 7943  0.5902778      PIT     NL 7333333
## 17275 0.5633333      TEX     AL 9000000
## 22628 0.6235741      CLE     AL 7875000

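The takeaway from combo4 is that 5 players match our criteria and could be one of our replacement players. The all-NA row in the printout is not a sixth player: bracket subsetting with a logical condition returns a row of NAs wherever the condition itself evaluates to NA (presumably a player with a missing value in at least one of the filtered columns), and it can be ignored.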
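The all-NA row in the combo4 printout is a general R behavior rather than something specific to this data. The sketch below reproduces it on a toy data frame (all values made up) and shows two common ways to avoid it, which() and subset().

```r
# Toy data (made-up values); the second player has a missing OBP
toy <- data.frame(
  playerID = c("a01", "b01", "c01"),
  OBP      = c(0.40, NA, 0.30),
  AB       = c(550, 600, 580)
)

# Bracket subsetting: the NA in the condition becomes an all-NA row
with_na <- toy[toy$OBP > 0.36 & toy$AB > 490, ]
nrow(with_na)  # 2: a01 plus one row of NAs

# which() keeps only the positions where the condition is TRUE
clean1 <- toy[which(toy$OBP > 0.36 & toy$AB > 490), ]

# subset() also treats an NA condition as FALSE
clean2 <- subset(toy, OBP > 0.36 & AB > 490)
nrow(clean2)  # 1
```

Either alternative gives the same real rows as the bracket version, just without the placeholder row.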

At this point, we have a solid pool of 8 candidates to replace the 3 lost players. My next step was to put all 11 together and compare their performance side by side.

all_targets <- combo %>%
  # 'damonjo01' is the ID used for the lost players above (was mistyped as 'damonj01')
  filter(yearID == 2001 & playerID %in% c('giambja01','damonjo01','saenzol01','alomaro01','biggicr01','gilesbr02','palmera01','thomeji01','gonzalu01', 'heltoto01','berkmla01'))

Put them on our graph of OBP and AB to see how they compare:

pl3 <- ggplot(all_targets, aes(x = OBP, y = AB))

pl3_plot <- ggplotly(
  pl3 +
    geom_point(aes(size = salary, color = salary)) +
    scale_colour_gradient(high = "red", low = "blue") +
    geom_text(aes(label = playerID), nudge_y = 5, color = "gray20", check_overlap = TRUE)
)

pl3_plot

Looking at this graph, my original 3 picks still stand out as the favorites for a couple of reasons. They are cheaper than the additions from the $7 million to $9.86 million range, and their performance for this particular year is comparable. A good next step was to review each player’s OBP history to see where the trends were heading.

replacement_players <- combo %>%
  # 'damonjo01' is the ID used for the lost players above (was mistyped as 'damonj01')
  filter(playerID %in% c('giambja01','damonjo01','saenzol01','alomaro01','biggicr01','gilesbr02','palmera01','thomeji01','gonzalu01', 'heltoto01','berkmla01')) %>%
  filter(yearID < 2002)

pl4 <- ggplot(replacement_players,aes(x= yearID, y = OBP))
pl4_plot <- pl4 + geom_line() + facet_grid(rows=vars(playerID), labeller = labeller(group = label_wrap_gen(width = 25))) + theme(strip.text.y = element_text(angle = 0), legend.position = "none")

pl4_plot

This visualization convinced me that my original 3 picks are the best options. Some of the other players, such as ‘palmera01’, have many years of MLB stats, which suggests an older player who may be past his prime and more prone to injury. Among the players I chose, ‘berkmla01’ is trending upwards and has only a couple of years of wear and tear. The same can be said for ‘heltoto01’, though his most recent year was a down year. The last pick, ‘gonzalu01’, has a longer history, but his on-base trend remains fairly consistent and looks more favorable than the higher-salary options. Bringing in a veteran to pair with our 2 younger players also seems like a solid strategy.
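Reading trends off a faceted line plot is a judgment call. One possible way to put a number on "trending upwards" is a least-squares slope of OBP against year for each player. The sketch below is not part of the original analysis and runs on made-up OBP histories with hypothetical player IDs; it only illustrates the technique.

```r
# Toy OBP histories (made-up numbers, hypothetical player IDs)
hist_toy <- data.frame(
  playerID = rep(c("rising01", "fading01"), each = 4),
  yearID   = rep(1998:2001, times = 2),
  OBP      = c(0.340, 0.360, 0.385, 0.400,   # rising01
               0.410, 0.395, 0.380, 0.360)   # fading01
)

# Least-squares slope of OBP on year, one value per player
obp_slope <- sapply(
  split(hist_toy, hist_toy$playerID),
  function(d) coef(lm(OBP ~ yearID, data = d))[["yearID"]]
)

obp_slope  # positive for rising01, negative for fading01
```

With only a handful of seasons per player a slope like this is noisy, so it would complement the plot rather than replace it.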

Final Notes

For my 3 replacement players, I chose to go with: ‘berkmla01’, ‘heltoto01’, and ‘gonzalu01’.

##     playerID   H X2B X3B HR       OBP       SLG        BA  AB
## 2  berkmla01 191  55   5 34 0.4302326 0.6204506 0.3310225 577
## 27 gonzalu01 198  36   7 57 0.4285714 0.6880131 0.3251232 609
## 39 heltoto01 197  54   2 49 0.4316547 0.6848382 0.3356048 587

Lost players (2001 season), for comparison:

##     playerID   H X2B X3B HR       OBP       SLG        BA  AB
## 7  damonjo01 165  34   4  9 0.3235294 0.3633540 0.2562112 644
## 24 giambja01 178  47   2 38 0.4769001 0.6596154 0.3423077 520
## 40 saenzol01  67  21   1  9 0.2911765 0.3836066 0.2196721 305
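As a cross-check on the final choice, the eight candidates considered above are few enough that every possible trio can be enumerated and tested against the three constraints. The sketch below uses combn() for that. The AB, OBP, and salary values are copied from the printouts in this write-up, except that the salary assignment for the three picks is inferred from the text and should be treated as an assumption; cand, trios, and result are hypothetical names.

```r
# Eight candidates; values copied from the printouts in this write-up.
# Salaries for the first three rows are an inferred mapping (assumption).
cand <- data.frame(
  playerID = c("berkmla01", "gonzalu01", "heltoto01", "alomaro01",
               "biggicr01", "gilesbr02", "palmera01", "thomeji01"),
  AB       = c(577, 609, 587, 575, 617, 576, 600, 526),
  OBP      = c(0.4302326, 0.4285714, 0.4316547, 0.4146707,
               0.3821478, 0.4035608, 0.3809524, 0.4161491),
  salary   = c(305000, 4833333, 4950000, 7750000,
               7750000, 7333333, 9000000, 7875000)
)

# Each column of trios holds the row indices of one possible trio
trios <- combn(nrow(cand), 3)

# Apply the three project constraints to every trio
ok <- apply(trios, 2, function(i) {
  sum(cand$salary[i]) <= 15e6 &&
    sum(cand$AB[i]) >= 1469 &&
    mean(cand$OBP[i]) >= 0.3638687
})

valid <- trios[, ok, drop = FALSE]

# One row per valid trio: the three IDs and their combined salary
result <- data.frame(
  players = apply(valid, 2, function(i) paste(cand$playerID[i], collapse = ", ")),
  salary  = apply(valid, 2, function(i) sum(cand$salary[i]))
)
result <- result[order(result$salary), ]
result
```

Because every candidate was pre-filtered on OBP and AB, the salary cap is the only constraint that actually eliminates trios here, and sorting by combined salary puts the cheapest valid combination first.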

Replacements