Lahman Data base was used to analyze team and pitcher data from 2007 to 2016. Analyzed both individual and a summarization of individuals that make up a team to find characteristics that tend to have a lower cost to strike out ratio. Only used pitchers that played in 10 or more games.
library(RSQLite)
library(dplyr)
library(lubridate)
library(ggplot2)
library(caret)
library(kableExtra)
SQLight was used to select data and rename columns. From the Lahman data base tables Master, Pitching, Teams, and Salaries were used, and the data was loaded into R for analysis.
q_pitch <- "
SELECT playerID, yearID, teamID, G as pit_games,
BB as pit_walks, IBB as pit_intwalks,
SO as pit_so, ERA
FROM Pitching
WHERE yearID BETWEEN 2007 AND 2016;
"
q_master <- "
SELECT playerID, birthYear, birthMonth,
nameLast, nameGiven, debut, finalGame
FROM Master
WHERE finalGame >= 2007
"
q_salaries <- "
SELECT yearID, teamID, playerID, salary as pit_salary
FROM Salaries
WHERE yearID BETWEEN 2007 AND 2016;
"
q_teams <- "
SELECT yearID, lgID, teamID, W as team_wins, L as team_losses,
SOA as team_so, AB as team_atbat, BB as team_batters_walk
FROM Teams
WHERE yearID BETWEEN 2007 AND 2016;
"
conn <- dbConnect(drv = SQLite(), dbname = "lahman2016.db")
pitchers <- dbGetQuery(conn = conn, statement = q_pitch)
teams <- dbGetQuery(conn = conn, statement = q_teams)
master <- dbGetQuery(conn = conn, statement = q_master)
salaries <- dbGetQuery(conn = conn, statement = q_salaries)
dbDisconnect(conn)
All four tables were joined together to get individual statistics and then grouped to get team statics by team and year. Calculations performed to get information that was not provided in database such as cost per strike out, career length, total walks. Grouping performed to calculate team data such as average career length, cost per strike out, and average age.
baseball_full <-tbl_df(pitchers) %>%
filter(pit_games >= 10) %>%
inner_join(y= master, by = 'playerID') %>%
inner_join(y = salaries, by = c('playerID', 'yearID', 'teamID')) %>%
inner_join(y = teams, by = c('yearID', 'teamID')) %>%
mutate(pit_age = ifelse(birthMonth < 4, yearID - birthYear,
yearID - birthYear -1)) %>%
mutate(pit_costso = pit_salary/pit_so) %>%
mutate(career_length = yearID - year(debut)) %>%
mutate(pit_tot_walks = pit_walks + pit_intwalks) %>%
mutate(wl_ratio = team_wins/(team_wins + team_losses)) %>%
mutate(team_atbat_walkpct = team_batters_walk / team_atbat) %>%
group_by(teamID, yearID) %>%
mutate(team_salary = sum(pit_salary),
team_avg_age = mean(pit_age),
team_avg_career_length = mean(career_length),
team_costso = team_salary/team_so) %>%
ungroup ()%>%
select(playerID, nameGiven, nameLast, yearID, pit_games, pit_so,
pit_tot_walks, ERA, pit_salary, pit_costso, pit_age, finalGame, career_length,
team_atbat_walkpct, teamID, wl_ratio, team_wins, team_losses, team_so,
team_costso,team_avg_career_length, team_avg_age, team_salary)
summary (baseball_full)
## playerID nameGiven nameLast yearID
## Length:3355 Length:3355 Length:3355 Min. :2007
## Class :character Class :character Class :character 1st Qu.:2009
## Mode :character Mode :character Mode :character Median :2011
## Mean :2011
## 3rd Qu.:2014
## Max. :2016
## pit_games pit_so pit_tot_walks ERA
## Min. :10.00 Min. : 1.00 Min. : 0.00 Min. : 0.000
## 1st Qu.:26.00 1st Qu.: 38.00 1st Qu.: 18.00 1st Qu.: 3.150
## Median :33.00 Median : 62.00 Median : 29.00 Median : 3.920
## Mean :39.93 Mean : 76.99 Mean : 33.95 Mean : 4.127
## 3rd Qu.:58.00 3rd Qu.:106.00 3rd Qu.: 46.00 3rd Qu.: 4.870
## Max. :92.00 Max. :301.00 Max. :114.00 Max. :13.500
## pit_salary pit_costso pit_age finalGame
## Min. : 380000 Min. : 1528 Min. :20.00 Length:3355
## 1st Qu.: 500000 1st Qu.: 9202 1st Qu.:26.00 Class :character
## Median : 1250000 Median : 27667 Median :28.00 Mode :character
## Mean : 3302823 Mean : 55849 Mean :28.83
## 3rd Qu.: 4250000 3rd Qu.: 70455 3rd Qu.:31.00
## Max. :32571000 Max. :923077 Max. :49.00
## career_length team_atbat_walkpct teamID wl_ratio
## Min. : 0.00 Min. :0.06844 Length:3355 Min. :0.3148
## 1st Qu.: 2.00 1st Qu.:0.08314 Class :character 1st Qu.:0.4506
## Median : 5.00 Median :0.09152 Mode :character Median :0.5031
## Mean : 5.58 Mean :0.09183 Mean :0.5007
## 3rd Qu.: 8.00 3rd Qu.:0.09925 3rd Qu.:0.5556
## Max. :26.00 Max. :0.12355 Max. :0.6358
## team_wins team_losses team_so team_costso
## Min. : 51.0 Min. : 59.00 Min. : 911 Min. : 7162
## 1st Qu.: 73.0 1st Qu.: 72.00 1st Qu.:1096 1st Qu.:21172
## Median : 81.0 Median : 80.00 Median :1179 Median :30740
## Mean : 81.1 Mean : 80.87 Mean :1177 Mean :32729
## 3rd Qu.: 90.0 3rd Qu.: 89.00 3rd Qu.:1249 3rd Qu.:40819
## Max. :103.0 Max. :111.00 Max. :1450 Max. :81455
## team_avg_career_length team_avg_age team_salary
## Min. :2.727 Min. :25.70 Min. : 6982500
## 1st Qu.:4.750 1st Qu.:27.93 1st Qu.:25058500
## Median :5.385 Median :28.82 Median :35479900
## Mean :5.580 Mean :28.83 Mean :38822138
## 3rd Qu.:6.357 3rd Qu.:29.67 3rd Qu.:48572271
## Max. :9.769 Max. :33.08 Max. :98203500
dim(diamonds)
## [1] 53940 10
Looking to build a pitching team where one would get the best deal for cost of strike out.Looking at cost per strike out of indivual pitchers the majority fall below $120,000 with a few outliers.
ggplot(data = baseball_full) +
geom_histogram(mapping = aes( x = pit_costso), bins = 30, fill ="steel blue")+
scale_x_continuous(labels = scales::comma) +
geom_vline(xintercept = mean(baseball_full$pit_costso), color ="dark red") +
labs(x = "Pitchers Cost per Strike Out",
y = "Count" ,
title = "Comparison of Cost per Strike Out Among Pitchers")
Correlation table gives a quick view of how each attribute effects the other attributes. Pitcher salary, career and pitcher age have this highest corrolation values where the team win loss ratio has the lowest.
sel_data <- baseball_full %>%
select(pit_games, pit_so, ERA, pit_age, pit_salary, pit_costso, pit_tot_walks,
career_length, wl_ratio)
sel_data_cor <- round(cor(sel_data),4)
head(sel_data_cor)
## pit_games pit_so ERA pit_age pit_salary pit_costso
## pit_games 1.0000 -0.0841 -0.4583 0.0802 -0.1592 -0.2098
## pit_so -0.0841 1.0000 -0.2830 -0.1652 0.3676 -0.2256
## ERA -0.4583 -0.2830 1.0000 0.0157 -0.0665 0.1981
## pit_age 0.0802 -0.1652 0.0157 1.0000 0.3556 0.4037
## pit_salary -0.1592 0.3676 -0.0665 0.3556 1.0000 0.5094
## pit_costso -0.2098 -0.2256 0.1981 0.4037 0.5094 1.0000
## pit_tot_walks career_length wl_ratio
## pit_games -0.1337 0.0069 0.0580
## pit_so 0.7809 -0.0208 0.1175
## ERA -0.0603 0.0111 -0.1795
## pit_age -0.1726 0.8506 0.0528
## pit_salary 0.2417 0.4832 0.0870
## pit_costso -0.2245 0.4429 -0.0096
Graph the high and low correlations to see how the compair to one another and how the correlation value is linked to the RSquared factor that is used in linear modeling.
features_pit_age <- baseball_full %>%
select(pit_age)
labels_pit_costso <- baseball_full$pit_costso
model <- train(x = features_pit_age, y = labels_pit_costso, method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 74880.03 0.1696633 41628.33 5546.62 0.01864453 1051.357
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = pit_age, y = pit_costso)) +
labs(x = "Pitchers Age",
y = "Cost per Strike Out" ,
title = "Pitcher Age to Strike Out Value") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
As the graph shows there is a decent interrelationship between the age of the pitcher and the cost per strike out. As pitcher get older the value per strike out goes down.
features_career_length <- baseball_full %>%
select(career_length)
labels_pit_costso <- baseball_full$pit_costso
model <- train(x = features_career_length, y = labels_pit_costso, method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 73306.72 0.1995042 39463.56 5448.225 0.01692802 1327.326
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = career_length, y = pit_costso)) +
labs(x = "Career Length",
y = "Cost per Strike Out" ,
title = "Career Length to Strike Out Value") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
As the graph shows there is a decent interrelationship between the career length of the pitcher and the cost per strike out. As pitcher’s career gets longer the value per strike out goes down.
features_pit_salary <- baseball_full %>%
select(pit_salary)
labels_pit_costso <- baseball_full$pit_costso
model <- train(x = features_pit_salary, y = labels_pit_costso, method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 72443.15 0.2563275 36930.24 6883.971 0.03165741 1593.329
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = pit_salary, y = pit_costso))+
labs(x = "Pitcher Salary",
y = "Cost per Strike Out" ,
title = "Pitcher Salary to Cost per Strike Out") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
As the graph shows there is a decent interrelationship between the salary of the pitcher and the cost per strike out. As pitcher’s goes up the cost per strike up goes up as well. These two values should have a direct corelations because pitcher’s salary is used to calculate the cost per strike out. The variation the graph could be related to the value of the pitcher off the mound, health of the pitcher, or power of negotiating salary in baseball.
features_wl_ratio <- baseball_full %>%
select(wl_ratio)
labels_pit_costso <- baseball_full$pit_costso
model <- train(x = features_wl_ratio, y = labels_pit_costso, method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 81830.83 0.0006658077 50432.45 4713.404 0.00104132 1159.298
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = wl_ratio, y = pit_costso)) +
labs(x = "Percentage of Wins",
y = "Cost per Strike Out" ,
title = "Strike Out cost to Wins") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
There is very little correlation between the cost of a strike out to a team having a winning record. A manager or owner would have a hard time going out and buying a team of pitchers to get a winning record.
Manager or owner was to get the biggest bang for their buck on a team of pitchers. Looking for pitchers that have above average stats with below average pay. Gathering a team of 13 pitchers. The first rank has more rigorous stats and might result in less than 13 pitchers for a given year. The second rank just has below average cost per strike out and above average amount of strike outs. Taken the first list out of the second list to ensure that there are no duplicates. Joining the groups back together ordered them by rank to ensure the starter group was included then the rest of the pitchers by cost per strike out. Took the top 13 to form a “Good Deal” bullpen.
starters <- baseball_full %>%
filter(pit_salary < mean(pit_salary),
pit_so > mean(pit_so),
pit_tot_walks < mean(pit_tot_walks),
ERA < mean(ERA),
wl_ratio > 0.50,
year(finalGame) == 2016,
yearID == 2016) %>%
select(playerID, nameGiven, nameLast, pit_salary, pit_so, pit_costso,
pit_tot_walks, ERA, career_length, pit_age) %>%
mutate(rank = 1) %>%
arrange(pit_costso)
good_costso<-baseball_full %>%
filter(pit_salary < mean(pit_salary),
pit_so > mean(pit_so),
year(finalGame) == 2016,
yearID == 2016) %>%
select(playerID, nameGiven, nameLast, pit_salary, pit_so, pit_costso,
pit_tot_walks, ERA, career_length, pit_age) %>%
arrange(pit_costso)
second_string <- anti_join(good_costso, starters,
by = c('playerID', 'nameGiven', 'nameLast',
'pit_salary', 'pit_so',
'pit_costso', 'pit_tot_walks',
'ERA', 'career_length', 'pit_age')) %>%
mutate(rank = 2)
good_deal_team <- starters %>%
full_join(y = second_string,
by = c('playerID', 'nameGiven', 'nameLast', 'pit_salary',
'pit_so', 'pit_costso', 'pit_tot_walks', 'ERA',
'career_length','pit_age','rank')) %>%
select(nameGiven, nameLast, pit_salary, pit_so, pit_tot_walks,
ERA, pit_costso, rank) %>%
rename(FirstMiddle = nameGiven,
Last = nameLast,
Salary = pit_salary,
StrikeOut = pit_so,
Walks = pit_tot_walks,
SOCost = pit_costso) %>%
arrange(rank, SOCost) %>%
slice (1:13)
good_deal_team %>%
kable("html") %>%
kable_styling()
| FirstMiddle | Last | Salary | StrikeOut | Walks | ERA | SOCost | rank |
|---|---|---|---|---|---|---|---|
| Kenneth Robert | Giles | 528200 | 102 | 26 | 4.11 | 5178.431 | 1 |
| Roberto | Osuna | 516100 | 82 | 18 | 2.68 | 6293.902 | 1 |
| Brad | Brach | 1250000 | 92 | 26 | 2.05 | 13586.957 | 1 |
| Robert Glenn | Ray | 521000 | 218 | 75 | 4.90 | 2389.908 | 2 |
| Jonathan Charles | Gray | 509500 | 185 | 61 | 4.61 | 2754.054 | 2 |
| Collin Alexander | McHugh | 529000 | 177 | 55 | 4.34 | 2988.701 | 2 |
| Kevin John | Gausman | 532000 | 174 | 48 | 3.61 | 3057.471 | 2 |
| Jerad Joseph | Eickhoff | 516000 | 167 | 44 | 3.65 | 3089.820 | 2 |
| Marcus Earl | Stroman | 515900 | 166 | 54 | 4.37 | 3107.831 | 2 |
| Daniel Steven | Straily | 512100 | 162 | 77 | 3.76 | 3161.111 | 2 |
| Aaron Jacob | Sanchez | 517800 | 161 | 63 | 3.00 | 3216.149 | 2 |
| Daniel Dariel | Salazar | 536200 | 161 | 66 | 3.87 | 3330.435 | 2 |
| Vincent John | Velasquez | 516000 | 152 | 46 | 4.12 | 3394.737 | 2 |
Correlation table gives a quick view of how each attribute effects the other attributes. The team cost per strike out has a high interrelationship with team average age, team avgerage career length and the total team salary. There is a much smaller correlation with the number of strike outs to cost per strike out. The ultimate goal in baseball is to win. When it comes to pitchers there is no winning stratagy. All of the team attributes have a low interrelationship to the win loss ratio.
sel_team_data <- baseball_full %>%
select(team_so, team_atbat_walkpct,
team_avg_age, team_avg_career_length, team_salary, team_costso, wl_ratio)
sel_team_cor <- round(cor(sel_team_data),4)
head(sel_team_cor)
## team_so team_atbat_walkpct team_avg_age
## team_so 1.0000 -0.0545 0.0226
## team_atbat_walkpct -0.0545 1.0000 0.1313
## team_avg_age 0.0226 0.1313 1.0000
## team_avg_career_length -0.0062 0.1297 0.8804
## team_salary 0.3594 -0.0665 0.4574
## team_costso 0.1848 -0.0618 0.4902
## team_avg_career_length team_salary team_costso
## team_so -0.0062 0.3594 0.1848
## team_atbat_walkpct 0.1297 -0.0665 -0.0618
## team_avg_age 0.8804 0.4574 0.4902
## team_avg_career_length 1.0000 0.5166 0.5568
## team_salary 0.5166 1.0000 0.9800
## team_costso 0.5568 0.9800 1.0000
## wl_ratio
## team_so 0.3549
## team_atbat_walkpct 0.3425
## team_avg_age 0.1742
## team_avg_career_length 0.2271
## team_salary 0.2765
## team_costso 0.2281
Looking at ten years of team data for the cost per strike out the team as a whole the cost is more copared to that of the indiduals because the pitching bullpen is lob sided to the number of pitchers playing in any game. The extreams are reduced but there is still some outliers.
data_baseball <- baseball_full %>%
select(yearID, teamID, team_wins, team_losses, team_so,
team_atbat_walkpct,team_avg_age, team_avg_career_length,
team_salary, team_costso, wl_ratio) %>%
group_by(teamID, yearID)
team_baseball<- distinct(data_baseball)
mean_team_cost <- mean(baseball_full$team_costso)
ggplot(data = team_baseball) +
geom_histogram(mapping = aes( x = team_costso), bins = 25) +
scale_x_continuous(labels = scales::comma) +
geom_vline(xintercept=mean(baseball_full$team_costso), color="red")
ggplot(data = baseball_full) +
geom_boxplot(mapping = aes(x = yearID, y = team_costso,
group = yearID,
fill = yearID)) +
labs(x = "Year",
y = "Team Price for Strike Out" ,
title = "Team Strike Out Cost by Year") +
scale_y_continuous(labels = scales::comma) +
theme_dark()
features_team_avg_age <- baseball_full %>%
select(team_avg_age)
labels_team_costso <- baseball_full$team_costso
model <- train(x = features_team_avg_age,
y = labels_team_costso,
method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 12983.11 0.2381298 10262.29 241.4323 0.01986402 161.9459
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = team_avg_age,
y = team_costso)) +
labs(x = "Average Team Age",
y = "Team Cost per Strike Out" ,
title = "Team Cost per Strike Out to age") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
As the graph shows there is a decent correlation between the average age of the pitching team and the cost per strike out. The same interrelationships is with individuals, as the pitching team get older the value per strike out goes down making the cost per strike out go up.
features_team_avg_career_length <- baseball_full %>%
select(team_avg_career_length)
labels_team_costso <- baseball_full$team_costso
model <- train(x = features_team_avg_career_length,
y = labels_team_costso,
method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 12310.64 0.3091983 9729.782 215.5717 0.02364329 159.7309
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = team_avg_career_length,
y = team_costso)) +
labs(x = "Team Career Length",
y = "Team Cost per Strike Out" ,
title = "Career Length to Strike Out Cost") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
As the graph shows there is a decent correlation between the average career of the pitching team and the cost per strike out. The same interrelationships is with individuals, as the pitching team’s career gets long the cost per strike out goes up and value per strike out goes down.
features_wl_ratio <- baseball_full %>%
select(wl_ratio)
labels_team_avg_career_length <- baseball_full$team_avg_career_length
model <- train(x = features_wl_ratio,
y = labels_team_avg_career_length,
method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 1.236875 0.05315146 0.9758566 0.0200377 0.01037708 0.01778283
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = wl_ratio,
y = team_avg_career_length)) +
labs(x = "Win Percentage",
y = "Team Career Length" ,
title = "Career Length to Wins") +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
There is a very low correlation between career length and the capabity to win games. Despite younger player and teams having a better cost per strike out it does not transpher to wins.
features_wl_ratio <- baseball_full %>%
select(wl_ratio)
labels_team_salary <- baseball_full$team_salary
model <- train(x = features_wl_ratio,
y = labels_team_salary,
method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 TRUE 18194131 0.07637461 14036615 314264.8 0.01170675 277928.9
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = wl_ratio,
y = team_salary)) +
labs(x = "Win Percentage",
y = "Team Salary" ,
title = "Team Salary to Wins") +
scale_y_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
As salaries increase so does the win percentage.
features_team_so <- baseball_full %>%
select(team_so)
labels_wl_ratio <- baseball_full$wl_ratio
model <- train(x = features_team_so,
y = labels_wl_ratio,
method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD
## 1 TRUE 0.06199979 0.1262335 0.0524347 0.0007881526 0.01276425
## MAESD
## 1 0.0006361553
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = team_so,
y = wl_ratio)) +
labs(x = "Strike Outs",
y = "Win Percentage" ,
title = "Team Strike Out Cost to Win Ratio") +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
Also as salaries increase so do win percentages
features_team_costso <- baseball_full %>%
select(team_costso)
labels_wl_ratio <- baseball_full$wl_ratio
model <- train(x = features_team_costso,
y = labels_wl_ratio,
method = 'lm')
model$results
## intercept RMSE Rsquared MAE RMSESD RsquaredSD
## 1 TRUE 0.06473426 0.05558987 0.05462158 0.0005627593 0.008806155
## MAESD
## 1 0.0005630158
ggplot(data = baseball_full) +
geom_point(mapping = aes(x = team_costso,
y = wl_ratio)) +
labs(x = "Cost per Strike Out",
y = "Win Percentage" ,
title = "Team Strike Out Cost to Win Ratio") +
scale_x_continuous(labels = scales::comma) +
theme_dark() +
geom_abline(intercept = model$finalModel$coefficients[[1]],
slope = model$finalModel$coefficients[[2]],
color = 'dark red', size = 1)
There is only a low interrelationship bewtween win percentages and the cost per strike out.
team_baseball %>%
select(yearID, teamID, team_costso, wl_ratio) %>%
arrange(team_costso) %>%
head(n=10)
## # A tibble: 10 x 4
## # Groups: teamID, yearID [10]
## yearID teamID team_costso wl_ratio
## <int> <chr> <dbl> <dbl>
## 1 2009 OAK 7162. 0.463
## 2 2008 PIT 7251. 0.414
## 3 2007 TBA 7403. 0.407
## 4 2009 FLO 8004. 0.537
## 5 2013 HOU 8143. 0.315
## 6 2009 WAS 8144. 0.364
## 7 2008 WAS 8373. 0.366
## 8 2014 MIA 8671. 0.475
## 9 2008 FLO 8968. 0.522
## 10 2016 CIN 10213. 0.420
Out of the top ten team cost per strike out only two of the teams had a winning record.
To get the best price per strike out the best interrelationship factors are age and career length. Both of these atributes have the highest correlations on both the indivudal and team scales. There is no strong factor with pitching teams that would determine a strong winning record.