This project was completed as part of ‘R for Data Science and Machine Learning’ by Jose Portilla via Udemy. Below are some snippets as provided by the course for general information on the project:
The 2002 Oakland A’s
The Oakland Athletics’ 2002 season was the team’s 35th in Oakland, California. It was also the 102nd season in franchise history. The Athletics finished first in the American League West with a record of 103-59.
The Athletics’ 2002 campaign ranks among the most famous in franchise history. Following the 2001 season, Oakland saw the departure of three key players (the lost boys). Billy Beane, the team’s general manager, responded with a series of under-the-radar free agent signings. The new-look Athletics, despite a comparative lack of star power, surprised the baseball world by besting the 2001 team’s regular season record. The team is most famous, however, for winning 20 consecutive games between August 13 and September 4, 2002.[1] The Athletics’ season was the subject of Michael Lewis’ 2003 book Moneyball: The Art of Winning an Unfair Game (Lewis was given the opportunity to follow the team throughout that season).
This project is based on the book written by Michael Lewis (later turned into a movie).
Moneyball Book
The central premise of the book Moneyball is that the collective wisdom of baseball insiders (including players, managers, coaches, scouts, and the front office) over the past century is subjective and often flawed. Statistics such as stolen bases, runs batted in, and batting average, typically used to gauge players, are relics of a 19th-century view of the game and the statistics available at that time. The book argues that the Oakland A’s front office took advantage of more analytical gauges of player performance to field a team that could better compete against richer competitors in Major League Baseball (MLB).
Rigorous statistical analysis had demonstrated that on-base percentage and slugging percentage are better indicators of offensive success, and the A’s became convinced that these qualities were cheaper to obtain on the open market than more historically valued qualities such as speed and contact. These observations often flew in the face of conventional baseball wisdom and the beliefs of many baseball scouts and executives.
By re-evaluating the strategies that produce wins on the field, the 2002 Athletics, with approximately US$44 million in salary, were competitive with larger market teams such as the New York Yankees, who spent over US$125 million in payroll that same season.
Because of the team’s smaller revenues, Oakland is forced to find players undervalued by the market, and their system for finding value in undervalued players has proven itself thus far. This approach brought the A’s to the playoffs in 2002 and 2003.
In this project we’ll work with some data with the goal of finding replacement players for the ones lost at the start of the off-season. During the 2001–02 offseason, the team lost three key free agents to larger market teams: 2000 AL MVP Jason Giambi to the New York Yankees, outfielder Johnny Damon to the Boston Red Sox, and closer Jason Isringhausen to the St. Louis Cardinals.
The main goal of this project is for you to feel comfortable working with R on real data to try to derive actionable insights.
Replacement Players
Now we have all the information we need! Here is your final task: find replacement players for the three key players we lost. However, you have three constraints:
The total combined salary of the three players cannot exceed 15 million dollars.
Their combined number of At Bats (AB) needs to be equal to or greater than that of the lost players (1469).
Their mean OBP has to be equal to or greater than the mean OBP of the lost players (0.3638686667).
We’ll be using data from Sean Lahman’s website, a very useful source for baseball statistics.
library(dplyr)
library(ggplot2)
library(plotly)
batting <- read.csv('/Users/spencer/moneyball_project/Batting.csv')
sal <- read.csv('/Users/spencer/moneyball_project/Salaries.csv')
Following the project instructions, I used ‘Batting.csv’ and created four new columns:
Batting Average:
batting$BA <- batting$H / batting$AB
On Base Percentage:
batting$OBP <- (batting$H + batting$BB + batting$HBP) / (batting$AB + batting$BB + batting$HBP + batting$SF)
Slugging Percentage:
Slugging percentage (also called slugging average) requires the number of singles, which the data does not record directly. Singles are hits minus doubles, triples, and home runs:
batting$X1B <- batting$H - batting$X2B - batting$X3B - batting$HR
With singles available, the slugging percentage is total bases divided by at bats:
batting$SLG <- (batting$X1B + (2 * batting$X2B) + (3 * batting$X3B) + (4 * batting$HR)) / batting$AB
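As a sanity check on the four formulas above, they can be applied by hand to a single batting line. The sketch below uses the 2001 line for alomaro01 as it appears in the combo4 output later in this write-up; the scalar variable names are only for illustration.

```r
# 2001 batting line for alomaro01, copied from the combo4 output in this write-up
AB <- 575; H <- 193; X2B <- 34; X3B <- 12; HR <- 20
BB <- 80; HBP <- 4; SF <- 9

# Same formulas as the column definitions above, applied to scalars
BA  <- H / AB
OBP <- (H + BB + HBP) / (AB + BB + HBP + SF)
X1B <- H - X2B - X3B - HR
SLG <- (X1B + 2 * X2B + 3 * X3B + 4 * HR) / AB

round(c(BA = BA, OBP = OBP, X1B = X1B, SLG = SLG), 7)
```

The results should agree with the BA, OBP, X1B, and SLG values printed for that player in the combo4 table.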
Check the current ‘batting’ table to make sure the new columns are present:
head(batting)
## playerID yearID stint teamID lgID G G_batting AB R H X2B X3B HR RBI SB CS
## 1 aardsda01 2004 1 SFN NL 11 11 0 0 0 0 0 0 0 0 0
## 2 aardsda01 2006 1 CHN NL 45 43 2 0 0 0 0 0 0 0 0
## 3 aardsda01 2007 1 CHA AL 25 2 0 0 0 0 0 0 0 0 0
## 4 aardsda01 2008 1 BOS AL 47 5 1 0 0 0 0 0 0 0 0
## 5 aardsda01 2009 1 SEA AL 73 3 0 0 0 0 0 0 0 0 0
## 6 aardsda01 2010 1 SEA AL 53 4 0 0 0 0 0 0 0 0 0
## BB SO IBB HBP SH SF GIDP G_old BA OBP X1B SLG
## 1 0 0 0 0 0 0 0 11 NaN NaN 0 NaN
## 2 0 0 0 0 1 0 0 45 0 0 0 0
## 3 0 0 0 0 0 0 0 2 NaN NaN 0 NaN
## 4 0 1 0 0 0 0 0 5 0 0 0 0
## 5 0 0 0 0 0 0 0 NA NaN NaN 0 NaN
## 6 0 0 0 0 0 0 0 NA NaN NaN 0 NaN
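The NaN entries in the output above come from rows where AB is 0, so the ratio is 0 divided by 0. A small sketch of how R treats these values follows; the first two AB/H pairs mirror rows 1 and 2 of the output, the third is Jason Giambi's 2001 line from the final table of this write-up, and `ba_safe` is a name introduced only for illustration.

```r
ab <- c(0, 2, 520)
h  <- c(0, 0, 178)

ba <- h / ab
is.nan(ba)               # flags the 0/0 row

# NaN propagates through summaries unless it is removed explicitly
mean(ba)
mean(ba, na.rm = TRUE)   # na.rm = TRUE drops NaN as well as NA

# One possible alternative: store NA where there were no at bats
ba_safe <- ifelse(ab > 0, h / ab, NA_real_)
```

Comparisons against NaN return NA, which matters later when the rate columns are used inside logical filters.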
The summary of the batting data frame shows that the data goes back to 1871, far more history than this evaluation needs. I decided to keep only the years 1985 and later:
summary(batting)
## playerID yearID stint teamID
## Length:97889 Min. :1871 Min. :1.000 Length:97889
## Class :character 1st Qu.:1931 1st Qu.:1.000 Class :character
## Mode :character Median :1970 Median :1.000 Mode :character
## Mean :1962 Mean :1.077
## 3rd Qu.:1995 3rd Qu.:1.000
## Max. :2013 Max. :5.000
##
## lgID G G_batting AB
## Length:97889 Min. : 1.00 Min. : 0.00 Min. : 0.0
## Class :character 1st Qu.: 13.00 1st Qu.: 7.00 1st Qu.: 9.0
## Mode :character Median : 35.00 Median : 32.00 Median : 61.0
## Mean : 51.65 Mean : 49.13 Mean :154.1
## 3rd Qu.: 81.00 3rd Qu.: 81.00 3rd Qu.:260.0
## Max. :165.00 Max. :165.00 Max. :716.0
## NA's :1406 NA's :6413
## R H X2B X3B
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 5.00 Median : 12.00 Median : 2.0 Median : 0.000
## Mean : 20.47 Mean : 40.37 Mean : 6.8 Mean : 1.424
## 3rd Qu.: 31.00 3rd Qu.: 66.00 3rd Qu.:10.0 3rd Qu.: 2.000
## Max. :192.00 Max. :262.00 Max. :67.0 Max. :36.000
## NA's :6413 NA's :6413 NA's :6413 NA's :6413
## HR RBI SB CS
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 0.000 Median : 5.00 Median : 0.000 Median : 0.000
## Mean : 3.002 Mean : 18.47 Mean : 3.265 Mean : 1.385
## 3rd Qu.: 3.000 3rd Qu.: 28.00 3rd Qu.: 2.000 3rd Qu.: 1.000
## Max. :73.000 Max. :191.00 Max. :138.000 Max. :42.000
## NA's :6413 NA's :6837 NA's :7713 NA's :29867
## BB SO IBB HBP
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 4.00 Median : 11.00 Median : 0.00 Median : 0.000
## Mean : 14.21 Mean : 21.95 Mean : 1.28 Mean : 1.136
## 3rd Qu.: 21.00 3rd Qu.: 31.00 3rd Qu.: 1.00 3rd Qu.: 1.000
## Max. :232.00 Max. :223.00 Max. :120.00 Max. :51.000
## NA's :6413 NA's :14251 NA's :42977 NA's :9233
## SH SF GIDP G_old
## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0
## 1st Qu.: 0.000 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 11
## Median : 1.000 Median : 0.0 Median : 1.00 Median : 34
## Mean : 2.564 Mean : 1.2 Mean : 3.33 Mean : 51
## 3rd Qu.: 3.000 3rd Qu.: 2.0 3rd Qu.: 5.00 3rd Qu.: 82
## Max. :67.000 Max. :19.0 Max. :36.00 Max. :165
## NA's :12751 NA's :42446 NA's :32521 NA's :5189
## BA OBP X1B SLG
## Min. :0.000 Min. :0.00 Min. : 0.00 Min. :0.000
## 1st Qu.:0.148 1st Qu.:0.19 1st Qu.: 1.00 1st Qu.:0.179
## Median :0.231 Median :0.29 Median : 9.00 Median :0.309
## Mean :0.209 Mean :0.26 Mean : 29.14 Mean :0.291
## 3rd Qu.:0.275 3rd Qu.:0.34 3rd Qu.: 48.00 3rd Qu.:0.397
## Max. :1.000 Max. :1.00 Max. :225.00 Max. :4.000
## NA's :13520 NA's :49115 NA's :6413 NA's :13520
batting1 <- subset(batting,subset = yearID > 1984)
Check our new table ‘batting1’ summary to confirm our year filter is correct:
summary(batting1)
## playerID yearID stint teamID
## Length:35652 Min. :1985 Min. :1.00 Length:35652
## Class :character 1st Qu.:1993 1st Qu.:1.00 Class :character
## Mode :character Median :2000 Median :1.00 Mode :character
## Mean :2000 Mean :1.08
## 3rd Qu.:2007 3rd Qu.:1.00
## Max. :2013 Max. :4.00
##
## lgID G G_batting AB
## Length:35652 Min. : 1.0 Min. : 0.00 Min. : 0.0
## Class :character 1st Qu.: 14.0 1st Qu.: 4.00 1st Qu.: 3.0
## Mode :character Median : 34.0 Median : 27.00 Median : 47.0
## Mean : 51.7 Mean : 46.28 Mean :144.7
## 3rd Qu.: 77.0 3rd Qu.: 77.00 3rd Qu.:241.0
## Max. :163.0 Max. :163.00 Max. :716.0
## NA's :1406 NA's :4377
## R H X2B X3B
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 4.00 Median : 8.00 Median : 1.000 Median : 0.000
## Mean : 19.44 Mean : 37.95 Mean : 7.293 Mean : 0.824
## 3rd Qu.: 30.00 3rd Qu.: 61.00 3rd Qu.:11.000 3rd Qu.: 1.000
## Max. :152.00 Max. :262.00 Max. :59.000 Max. :23.000
## NA's :4377 NA's :4377 NA's :4377 NA's :4377
## HR RBI SB CS
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 0.000 Median : 3.00 Median : 0.000 Median : 0.000
## Mean : 4.169 Mean : 18.41 Mean : 2.811 Mean : 1.219
## 3rd Qu.: 5.000 3rd Qu.: 27.00 3rd Qu.: 2.000 3rd Qu.: 1.000
## Max. :73.000 Max. :165.00 Max. :110.000 Max. :29.000
## NA's :4377 NA's :4377 NA's :4377 NA's :4377
## BB SO IBB HBP
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.00 Median : 12.00 Median : 0.000 Median : 0.000
## Mean : 14.06 Mean : 27.03 Mean : 1.171 Mean : 1.273
## 3rd Qu.: 21.00 3rd Qu.: 42.00 3rd Qu.: 1.000 3rd Qu.: 1.000
## Max. :232.00 Max. :223.00 Max. :120.000 Max. :35.000
## NA's :4377 NA's :4377 NA's :4378 NA's :4387
## SH SF GIDP G_old
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 11.0
## Median : 0.000 Median : 0.000 Median : 1.00 Median : 32.0
## Mean : 1.465 Mean : 1.212 Mean : 3.25 Mean : 49.7
## 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 5.00 3rd Qu.: 77.0
## Max. :39.000 Max. :17.000 Max. :35.00 Max. :163.0
## NA's :4377 NA's :4378 NA's :4377 NA's :5189
## BA OBP X1B SLG
## Min. :0.000 Min. :0.000 Min. : 0.00 Min. :0.000
## 1st Qu.:0.136 1st Qu.:0.188 1st Qu.: 0.00 1st Qu.:0.167
## Median :0.233 Median :0.296 Median : 6.00 Median :0.333
## Mean :0.205 Mean :0.262 Mean : 25.66 Mean :0.304
## 3rd Qu.:0.274 3rd Qu.:0.342 3rd Qu.: 42.00 3rd Qu.:0.423
## Max. :1.000 Max. :1.000 Max. :225.00 Max. :4.000
## NA's :8905 NA's :8821 NA's :4377 NA's :8905
For the last bit of house-keeping on the data, we needed to merge the salary table and the baseball stats in order to compare the salaries of leaving players and their replacements:
combo <- merge(batting1, sal, by= c('playerID','yearID'))
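By default `merge()` performs an inner join, and columns that share a name in both tables but are not join keys get `.x` and `.y` suffixes, which is where `teamID.x` and `teamID.y` in the later output come from. The sketch below illustrates this on two tiny tables; the alomaro01 and thomeji01 values are taken from the combo4 output in this write-up, while `nosalary01` is a made-up player used only to show what happens to unmatched rows.

```r
batting_toy <- data.frame(
  playerID = c("alomaro01", "thomeji01", "nosalary01"),  # nosalary01 is hypothetical
  yearID   = c(2001, 2001, 2001),
  teamID   = c("CLE", "CLE", "XXX"),
  AB       = c(575, 526, 100),
  stringsAsFactors = FALSE
)
sal_toy <- data.frame(
  playerID = c("alomaro01", "thomeji01"),
  yearID   = c(2001, 2001),
  teamID   = c("CLE", "CLE"),
  salary   = c(7750000, 7875000),
  stringsAsFactors = FALSE
)

# Inner join: only players present in both tables survive
inner <- merge(batting_toy, sal_toy, by = c("playerID", "yearID"))
names(inner)

# Left join: keep every batting row, with NA salary where none was found
left <- merge(batting_toy, sal_toy, by = c("playerID", "yearID"), all.x = TRUE)
```

Because the join is on player and year only, a player with several batting stints or salary rows in one season can appear more than once in the merged result.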
For the first order of business, I felt it was necessary to take a look at the players the A’s were losing to get a sense of where I needed to look for replacements. To do this, I set up a few new data frames to look at different pieces of the data:
lostplayers <- combo %>%
filter(playerID %in% c('giambja01','damonjo01','saenzol01'))
lostplayers2 <- subset(lostplayers, subset = yearID ==2001)
lostplayers3 <- lostplayers2[,c('playerID','H','X2B','X3B','HR','OBP','SLG','BA','AB')]
combo2 <- subset(combo, subset = yearID == 2001)
The two I use going forward are lostplayers3 and combo2. The lostplayers3 table holds just the three lost players, with their stats from their last year with the A’s, limited to the columns we’re evaluating. The combo2 data frame narrows all players down to their 2001 stats and serves as the starting point for my evaluation.
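The AB and OBP thresholds quoted in the project constraints can be reproduced from the lost players' 2001 lines. The sketch below rebuilds the relevant columns of lostplayers3 by hand from the final table in this write-up; `lost`, `ab_needed`, and `obp_needed` are names introduced only for illustration.

```r
# AB and OBP for the three lost players, copied from the final table in this write-up
lost <- data.frame(
  playerID = c("damonjo01", "giambja01", "saenzol01"),
  AB       = c(644, 520, 305),
  OBP      = c(0.3235294, 0.4769001, 0.2911765),
  stringsAsFactors = FALSE
)

ab_needed  <- sum(lost$AB)    # combined at bats the replacements must match
obp_needed <- mean(lost$OBP)  # mean OBP the replacements must match

ab_needed
round(obp_needed, 7)
```

With the real data frame, the same two lines would be `sum(lostplayers3$AB)` and `mean(lostplayers3$OBP)`.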
Based on the project parameters, I had to stay within the $15 million salary cap, reach combined at bats of 1469 or more, and reach a mean OBP of 0.3639 or more. Creating another data frame and plotting it shows which players best meet those requirements. I chose $7,000,000 as a per-player salary ceiling, assuming it would be difficult to find three players within the cap and with the necessary stats if we spent too much on one or two of them.
Initial Data Frame with these parameters:
combo3 <- combo2[combo2$OBP > 0.36 & combo2$AB > 490 & combo2$salary < 7000000,]
Building Our Plot
pl <- ggplot(combo3, aes(x = OBP, y = AB))
combo3_plot <- ggplotly(pl+geom_point(aes(size = salary, color = salary)) + scale_colour_gradient(high='red',low = "blue") + geom_text(aes(label = playerID), nudge_y = 5, color = "gray20",check_overlap = TRUE))
For the plot, I thought it was essential to try to get a good overview of the players available and compare the AB, OBP, and Salary all at once.
From this review, there were three immediate standouts: gonzalu01, heltoto01, and berkmla01.
Narrowing the focus, I set up another dataframe, just for these players:
targets <- combo3 %>%
filter(playerID %in% c('gonzalu01','heltoto01','berkmla01'))
Review the summary of the new data frame to confirm that the three players together meet the mean OBP requirement:
summary(targets)
## playerID yearID stint teamID.x
## Length:3 Min. :2001 Min. :1 Length:3
## Class :character 1st Qu.:2001 1st Qu.:1 Class :character
## Mode :character Median :2001 Median :1 Mode :character
## Mean :2001 Mean :1
## 3rd Qu.:2001 3rd Qu.:1
## Max. :2001 Max. :1
## lgID.x G G_batting AB
## Length:3 Min. :156.0 Min. :156.0 Min. :577
## Class :character 1st Qu.:157.5 1st Qu.:157.5 1st Qu.:582
## Mode :character Median :159.0 Median :159.0 Median :587
## Mean :159.0 Mean :159.0 Mean :591
## 3rd Qu.:160.5 3rd Qu.:160.5 3rd Qu.:598
## Max. :162.0 Max. :162.0 Max. :609
## R H X2B X3B
## Min. :110.0 Min. :191.0 Min. :36.00 Min. :2.000
## 1st Qu.:119.0 1st Qu.:194.0 1st Qu.:45.00 1st Qu.:3.500
## Median :128.0 Median :197.0 Median :54.00 Median :5.000
## Mean :123.3 Mean :195.3 Mean :48.33 Mean :4.667
## 3rd Qu.:130.0 3rd Qu.:197.5 3rd Qu.:54.50 3rd Qu.:6.000
## Max. :132.0 Max. :198.0 Max. :55.00 Max. :7.000
## HR RBI SB CS BB
## Min. :34.00 Min. :126 Min. :1 Min. :1 Min. : 92.00
## 1st Qu.:41.50 1st Qu.:134 1st Qu.:4 1st Qu.:3 1st Qu.: 95.00
## Median :49.00 Median :142 Median :7 Median :5 Median : 98.00
## Mean :46.67 Mean :138 Mean :5 Mean :5 Mean : 96.67
## 3rd Qu.:53.00 3rd Qu.:144 3rd Qu.:7 3rd Qu.:7 3rd Qu.: 99.00
## Max. :57.00 Max. :146 Max. :7 Max. :9 Max. :100.00
## SO IBB HBP SH
## Min. : 83.0 Min. : 5.00 Min. : 5.00 Min. :0.0000
## 1st Qu.: 93.5 1st Qu.:10.00 1st Qu.: 9.00 1st Qu.:0.0000
## Median :104.0 Median :15.00 Median :13.00 Median :0.0000
## Mean :102.7 Mean :14.67 Mean :10.67 Mean :0.3333
## 3rd Qu.:112.5 3rd Qu.:19.50 3rd Qu.:13.50 3rd Qu.:0.5000
## Max. :121.0 Max. :24.00 Max. :14.00 Max. :1.0000
## SF GIDP G_old BA OBP
## Min. :5.000 Min. : 8 Min. :156.0 Min. :0.3251 Min. :0.4286
## 1st Qu.:5.000 1st Qu.:11 1st Qu.:157.5 1st Qu.:0.3281 1st Qu.:0.4294
## Median :5.000 Median :14 Median :159.0 Median :0.3310 Median :0.4302
## Mean :5.333 Mean :12 Mean :159.0 Mean :0.3306 Mean :0.4302
## 3rd Qu.:5.500 3rd Qu.:14 3rd Qu.:160.5 3rd Qu.:0.3333 3rd Qu.:0.4309
## Max. :6.000 Max. :14 Max. :162.0 Max. :0.3356 Max. :0.4317
## X1B SLG teamID.y lgID.y
## Min. :92.00 Min. :0.6205 Length:3 Length:3
## 1st Qu.:94.50 1st Qu.:0.6526 Class :character Class :character
## Median :97.00 Median :0.6848 Mode :character Mode :character
## Mean :95.67 Mean :0.6644
## 3rd Qu.:97.50 3rd Qu.:0.6864
## Max. :98.00 Max. :0.6880
## salary
## Min. : 305000
## 1st Qu.:2569166
## Median :4833333
## Mean :3362778
## 3rd Qu.:4891666
## Max. :4950000
CONFIRMED
Reviewing Salary Cap requirement (<$15,000,000)
sum(targets$salary)
## [1] 10088333
CONFIRMED
Reviewing At Bats (AB) to confirm the sum is at or above the required 1469
sum(targets$AB)
## [1] 1773
Confirmed above 1469
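The three checks above can also be bundled into one reusable function. This is a minimal sketch, not part of the original analysis: `meets_constraints` and `targets_toy` are hypothetical names, and the salary, AB, and OBP values are copied from the outputs in this write-up (with the salary-to-player pairing inferred from the statement that heltoto01 is the more expensive of the two higher-paid targets).

```r
# Returns one TRUE/FALSE per project constraint for a data frame of players
meets_constraints <- function(players,
                              salary_cap = 15000000,
                              min_ab     = 1469,
                              min_obp    = 0.3638686667) {
  c(
    salary  = sum(players$salary) <= salary_cap,
    at_bats = sum(players$AB) >= min_ab,
    obp     = mean(players$OBP) >= min_obp
  )
}

# Hand-built stand-in for the 'targets' data frame
targets_toy <- data.frame(
  playerID = c("berkmla01", "gonzalu01", "heltoto01"),
  salary   = c(305000, 4833333, 4950000),
  AB       = c(577, 609, 587),
  OBP      = c(0.4302326, 0.4285714, 0.4316547),
  stringsAsFactors = FALSE
)

checks <- meets_constraints(targets_toy)
checks
all(checks)
```

Returning a named logical vector rather than a single value makes it easy to see which constraint a candidate trio fails.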
At this point, the three players meet the project’s criteria, but a little more analysis may benefit the team. Their combined salary leaves considerable cap room. That may be an advantage, since those resources could go to other positions or to keeping players previously out of budget. But if we wanted to make full use of the $15 million on these three roster spots, a closer look is needed, especially since the initial review only covered players with a cap hit below $7 million.
cap_play <- (sum(targets$salary))-4950000
print(cap_play)
## [1] 5138333
Referring back to the plot of targets, dropping either gonzalu01 or heltoto01 would free up the most salary. heltoto01 is the more expensive of the two and their ABs are close, so I chose to drop him, understanding that this is more of a salary exercise and that further analysis may prove that gonzalu01 is the superior player (revisited later).
Dropping heltoto01 leaves the following cap space for a third player:
cap_room_v1 <- 15000000 - cap_play
print(cap_room_v1)
## [1] 9861667
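The same arithmetic can be written without hard-coding the 4950000 figure by looking salaries up by player. This is a sketch under the assumption, inferred from the text and the summary output, that the three salaries pair with the players as shown; `salaries`, `kept`, and `cap_room` are illustrative names.

```r
# Salaries of the three initial targets (values from the summary output above)
salaries   <- c(berkmla01 = 305000, gonzalu01 = 4833333, heltoto01 = 4950000)
salary_cap <- 15000000

# Drop one player by name and see how much cap is left for a replacement
kept     <- salaries[names(salaries) != "heltoto01"]
cap_room <- salary_cap - sum(kept)

sum(kept)
cap_room
```

Changing the name in the filter shows the cap room that dropping a different player would leave.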
The new cap room is $9,861,667, which is above the $7 million ceiling I used before, so it is worth investigating whether our third player could come from this newly available pool:
combo4 <- combo2[combo2$OBP > 0.36 & combo2$AB > 490 & combo2$salary > 7000000 & combo2$salary < 9861667,]
print(combo4)
## playerID yearID stint teamID.x lgID.x G G_batting AB R H X2B X3B
## 346 alomaro01 2001 1 CLE AL 157 157 575 113 193 34 12
## 1971 biggicr01 2001 1 HOU NL 155 155 617 118 180 35 3
## NA <NA> NA NA <NA> <NA> NA NA NA NA NA NA NA
## 7943 gilesbr02 2001 1 PIT NL 160 160 576 116 178 37 7
## 17275 palmera01 2001 1 TEX AL 160 160 600 98 164 33 0
## 22628 thomeji01 2001 1 CLE AL 156 156 526 101 153 26 1
## HR RBI SB CS BB SO IBB HBP SH SF GIDP G_old BA OBP X1B
## 346 20 100 30 6 80 71 5 4 9 9 9 157 0.3356522 0.4146707 127
## 1971 20 70 7 4 66 100 4 28 0 6 11 155 0.2917342 0.3821478 122
## NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 7943 37 95 13 6 90 67 14 4 0 4 10 160 0.3090278 0.4035608 97
## 17275 47 123 1 1 101 90 8 7 0 6 8 160 0.2733333 0.3809524 84
## 22628 49 124 0 1 111 185 14 4 0 3 9 156 0.2908745 0.4161491 77
## SLG teamID.y lgID.y salary
## 346 0.5408696 CLE AL 7750000
## 1971 0.4554295 HOU NL 7750000
## NA NA <NA> <NA> NA
## 7943 0.5902778 PIT NL 7333333
## 17275 0.5633333 TEX AL 9000000
## 22628 0.6235741 CLE AL 7875000
The takeaway from combo4 is that five players match our criteria and could be one of our replacement players.
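The all-NA row in the printed output is a side effect of indexing a data frame with a logical vector that contains NA: `[` returns a placeholder row for each NA, while `subset()` and `dplyr::filter()` treat NA as FALSE and drop it. The sketch below shows the difference on a made-up three-row data frame; the player IDs and values are hypothetical.

```r
# Hypothetical data: the middle player has a missing OBP
df <- data.frame(
  playerID = c("a", "b", "c"),
  OBP      = c(0.40, NA, 0.30),
  stringsAsFactors = FALSE
)

with_bracket <- df[df$OBP > 0.36, ]     # keeps an all-NA row for player "b"
with_subset  <- subset(df, OBP > 0.36)  # NA condition is treated as FALSE

with_bracket
with_subset
```

Applying `subset()` or `filter()` to the combo4 conditions would presumably return the same five players without the placeholder row.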
At this point, we have a solid pool of eight candidates to replace the three lost players. My next step was to put all eleven together and compare their performance side by side.
all_targets <- combo %>%
filter(yearID == 2001 & playerID %in% c('giambja01','damonjo01','saenzol01','alomaro01','biggicr01','gilesbr02','palmera01','thomeji01','gonzalu01', 'heltoto01','berkmla01'))
Put them on our graph of OBP and AB to see how they compare:
pl3 <- ggplot(all_targets, aes(x = OBP, y = AB))
pl3_plot <- ggplotly(pl3+geom_point(aes(size = salary, color = salary)) + scale_colour_gradient(high='red',low = "blue") + geom_text(aes(label = playerID), nudge_y = 5, color = "gray20",check_overlap = TRUE))
pl3_plot
Looking at this graph, my first three targets still stand out as the favorites for a couple of reasons: they are cheaper than the additions from the $7 million to $9.86 million range, and their performance for this year is comparable. A good next step was to review each player’s OBP history to see where the trends were heading.
replacement_players <- combo %>%
filter(playerID %in% c('giambja01','damonjo01','saenzol01','alomaro01','biggicr01','gilesbr02','palmera01','thomeji01','gonzalu01', 'heltoto01','berkmla01')) %>%
filter(yearID <2002)
pl4 <- ggplot(replacement_players,aes(x= yearID, y = OBP))
pl4_plot <- pl4 + geom_line() + facet_grid(rows=vars(playerID), labeller = labeller(group = label_wrap_gen(width = 25))) + theme(strip.text.y = element_text(angle = 0), legend.position = "none")
pl4_plot
This visualization convinced me that my first three targets are the best options. Some of the other players, such as ‘palmera01’, have many years of MLB stats, which suggests an older player who may be past his prime and more prone to injury. Of the players I chose, ‘berkmla01’ is trending upward and has only a couple of years of wear and tear. The same can be said for ‘heltoto01’, though his most recent year was a down year. The last, ‘gonzalu01’, has a longer history, but his on-base trend remains fairly consistent and looks more favorable than the higher-salary options. Pairing a veteran with our two younger players seems like a solid strategy as well.
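With only eight candidates, it is also feasible to enumerate every trio and test it against the three project constraints, as a cross-check on choosing by eye. The sketch below is not part of the original analysis: the salary, AB, and OBP values are copied from the outputs in this write-up, and `candidates`, `summarise_trio`, `results`, and `feasible` are names introduced only for illustration.

```r
# 2001 salary, AB, and OBP for the eight candidates (values from the outputs above)
candidates <- data.frame(
  playerID = c("berkmla01", "gonzalu01", "heltoto01", "alomaro01",
               "biggicr01", "gilesbr02", "palmera01", "thomeji01"),
  salary   = c(305000, 4833333, 4950000, 7750000,
               7750000, 7333333, 9000000, 7875000),
  AB       = c(577, 609, 587, 575, 617, 576, 600, 526),
  OBP      = c(0.4302326, 0.4285714, 0.4316547, 0.4146707,
               0.3821478, 0.4035608, 0.3809524, 0.4161491),
  stringsAsFactors = FALSE
)

# Every combination of three row indices (one trio per column)
trios <- combn(nrow(candidates), 3)

# Collapse one trio into a single summary row
summarise_trio <- function(idx) {
  picked <- candidates[idx, ]
  data.frame(
    players      = paste(picked$playerID, collapse = ", "),
    total_salary = sum(picked$salary),
    total_AB     = sum(picked$AB),
    mean_OBP     = mean(picked$OBP),
    stringsAsFactors = FALSE
  )
}

results <- do.call(
  rbind,
  lapply(seq_len(ncol(trios)), function(j) summarise_trio(trios[, j]))
)

# Keep trios that satisfy all three constraints, best mean OBP first
feasible <- subset(
  results,
  total_salary <= 15000000 & total_AB >= 1469 & mean_OBP >= 0.3638686667
)
feasible <- feasible[order(-feasible$mean_OBP), ]
head(feasible)
```

This ranks only by single-season mean OBP, so it complements rather than replaces the trend and age considerations discussed above.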
For my three replacement players, I chose ‘berkmla01’, ‘heltoto01’, and ‘gonzalu01’.
Replacements (2001 stats):
## playerID H X2B X3B HR OBP SLG BA AB
## 2 berkmla01 191 55 5 34 0.4302326 0.6204506 0.3310225 577
## 27 gonzalu01 198 36 7 57 0.4285714 0.6880131 0.3251232 609
## 39 heltoto01 197 54 2 49 0.4316547 0.6848382 0.3356048 587
Lost players (2001 stats):
## playerID H X2B X3B HR OBP SLG BA AB
## 7 damonjo01 165 34 4 9 0.3235294 0.3633540 0.2562112 644
## 24 giambja01 178 47 2 38 0.4769001 0.6596154 0.3423077 520
## 40 saenzol01 67 21 1 9 0.2911765 0.3836066 0.2196721 305