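library(Lahman)        # baseball statistics, including the Pitching table
library(tidyverse)     # dplyr, ggplot2, and related packages
library(ggplot2)
library(ggthemes)      # provides theme_fivethirtyeight()
library(RColorBrewer)
data("Pitching")       # load the Pitching data frame from Lahman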

RPubs link: http://rpubs.com/ortizanthony001/267423

We compare pitching data from the Lahman dataset, looking at the relationship between Earned Run Average (ERA) and Batting Average Against (BAA) in the Steroid Era (1995-2003) and the Post-Steroid Era (2009-2015). These eras have no concrete start or end dates; the Steroid Era is generally described as running from the mid-1980s to the mid-2000s, so the year ranges used here are a working definition. The 1998 and 1999 seasons are excluded from the Steroid Era data. We kept only pitcher seasons with an ERA below 7.00 and a BAA below .350, since pitchers with values at or above these cutoffs most likely made very few appearances. Our main objective was to see whether ERA and BAA are positively correlated in each of these two eras.
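The claim that the excluded pitcher seasons involve very few appearances can be checked directly. The sketch below is one way to do that, not part of the original analysis: `workload_by_filter` is a hypothetical helper name, and it assumes the `G` (games) and `IPouts` (outs pitched) columns of the Lahman `Pitching` table.

```r
library(dplyr)

# Hypothetical helper: compare the workload of pitcher seasons that pass the
# ERA/BAA cutoffs with those that do not. Cutoffs default to the report's values.
workload_by_filter <- function(df, max_era = 7.00, max_baa = 0.350) {
  df %>%
    filter(!is.na(ERA), !is.na(BAOpp)) %>%          # drop rows with no ERA or BAA
    mutate(kept = ERA < max_era & BAOpp < max_baa) %>%
    group_by(kept) %>%
    summarize(
      pitcher_seasons = n(),
      median_games = median(G, na.rm = TRUE),
      median_innings = median(IPouts, na.rm = TRUE) / 3,  # 3 outs per inning
      .groups = "drop"
    )
}
```

Calling `workload_by_filter(Pitching)` returns one row for the excluded seasons (`kept = FALSE`) and one for the retained seasons (`kept = TRUE`). If the assumption holds, the excluded row should show much lower median games and innings.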

filtered_data1 <- filter(Pitching, ERA < 7.00 & BAOpp < .350, yearID %in% c(1995:1997, 2000:2003))

plot1 <- ggplot(filtered_data1, mapping = aes(x = ERA, y = BAOpp)) +
  geom_point(color = "dodgerblue1", alpha = .4) +
  facet_wrap(~yearID) +
  geom_smooth(se = FALSE, color = "black", lwd = .5) +
  labs(
    x = "Earned Run Average",
    y = "Batting Average Against",
    title = "Batting Average Against vs. Earned Run Average",
    subtitle = "1995-2003, (1998, 1999 excluded)"
  ) +
  theme_fivethirtyeight()

plot1

ggsave(filename = "plot1.pdf", plot = plot1)

For the Steroid Era, we plotted Batting Average Against vs. Earned Run Average for each year. Although the data differ from year to year, the graphic shows a clear, strong positive correlation between BAA and ERA in every season. The points are drawn semi-transparent (alpha = .4), so darker regions mark where many pitcher seasons overlap, mostly near the trend line, while isolated points farther from the line appear lighter.
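The strength of the relationship can also be put into numbers rather than read off the plot. The following is a minimal sketch that computes the Pearson correlation between ERA and BAA within each season. `era_baa_cor` is a hypothetical helper name, and Pearson's r is an assumed choice of measure; the original analysis relies on the graphic alone.

```r
library(dplyr)

# Hypothetical helper: Pearson correlation between ERA and BAOpp per season.
era_baa_cor <- function(df) {
  df %>%
    group_by(yearID) %>%
    summarize(
      n = n(),                                    # pitcher seasons in that year
      r = cor(ERA, BAOpp, use = "complete.obs"),  # ignore incomplete pairs
      .groups = "drop"
    )
}
```

Calling `era_baa_cor(filtered_data1)` returns one row per season. Values of `r` close to 1 would support the reading of a strong positive correlation, and the same call on the Post-Steroid Era subset allows a like-for-like comparison.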

filtered_data2 <- filter(Pitching, ERA < 7.00 & BAOpp < .350, yearID %in% c(2009:2015))

plot2 <- ggplot(filtered_data2, mapping = aes(x = ERA, y = BAOpp)) +
  geom_point(color = "dodgerblue1", alpha = .4) +
  facet_wrap(~yearID) +
  geom_smooth(se = FALSE, color = "black", lwd = .5) +
  labs(
    x = "Earned Run Average",
    y = "Batting Average Against",
    title = "Batting Average Against vs. Earned Run Average",
    subtitle = "2009-2015"
  ) +
  theme_fivethirtyeight()

plot2

ggsave(filename = "plot2.pdf", plot = plot2)

For the Post-Steroid Era, we again plotted Batting Average Against vs. Earned Run Average for each year. The data differ from year to year, but the graphic shows the same clear, strong positive correlation between BAA and ERA in every season. As before, the points are semi-transparent (alpha = .4), so darker regions mark where many pitcher seasons overlap, mostly near the trend line, while isolated points farther from the line appear lighter.

filtered_data <- filter(Pitching, ERA < 7.00 & BAOpp < .350, yearID %in% c(1995:1997, 2000:2003, 2009:2015))

avg_values <- filtered_data %>%
  group_by(yearID) %>%
  summarize(
    meanERA = mean(ERA, na.rm = TRUE),
    meanBAA = mean(BAOpp, na.rm = TRUE),
    meanHR = mean(HR, na.rm = TRUE)
  )

avg_plot <- ggplot(avg_values, mapping = aes(x = meanERA, y = meanBAA)) +
  geom_point(aes(color = yearID)) +
  labs(
    x = "Average ERA",
    y = "Average Batting Average Against",
    title = "Comparison of Steroid-Era and Post Steroid-Era"
  )

avg_plot

ggsave(filename = "avg_plot.pdf", plot = avg_plot)

We plotted the average BAA vs. the average ERA for every year in the Steroid and Post-Steroid Eras to compare pitching statistics between the two. Points are colored by year: the darker the blue, the earlier the year. The graphic shows that both the average BAA and the average ERA are higher in Steroid Era seasons than in Post-Steroid Era seasons. This suggests that steroids influenced pitchers' statistics, although other changes in the game between the two periods could also contribute.
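Because the year is mapped to a continuous color scale, it can be hard to tell which points belong to which era. One possible variation, sketched here with hypothetical helper names (`label_era`, `plot_by_era`), tags each season with its era and colors by that label instead. The 2003 cutoff simply follows the year ranges used in this report and is an assumption, not an official boundary.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: tag each season with its era, using the report's ranges.
label_era <- function(df, last_steroid_year = 2003) {
  df %>%
    mutate(era = if_else(yearID <= last_steroid_year,
                         "Steroid Era", "Post-Steroid Era"))
}

# Hypothetical helper: same axes as avg_plot, but colored by era label.
plot_by_era <- function(avg_df) {
  ggplot(label_era(avg_df), aes(x = meanERA, y = meanBAA, color = era)) +
    geom_point(size = 3) +
    labs(x = "Average ERA", y = "Average Batting Average Against", color = "Era")
}
```

Calling `plot_by_era(avg_values)` draws the yearly averages with two discrete colors, so any separation between the eras is visible at a glance.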

avg_values1 <- filtered_data1 %>%
  group_by(yearID) %>%
  summarize(
    meanERA = mean(ERA, na.rm = TRUE),
    meanBAA = mean(BAOpp, na.rm = TRUE),
    meanHR = mean(HR, na.rm = TRUE)
  )
avg_values2 <- filtered_data2 %>%
  group_by(yearID) %>%
  summarize(
    meanERA = mean(ERA, na.rm = TRUE),
    meanBAA = mean(BAOpp, na.rm = TRUE),
    meanHR = mean(HR, na.rm = TRUE)
  )

avg_of_avg1 <- avg_values1 %>%
  summarize(
    meanERA1 = mean(meanERA, na.rm = TRUE),
    meanBAA1 = mean(meanBAA, na.rm = TRUE),
    meanHR1 = mean(meanHR, na.rm = TRUE)
  )
avg_of_avg2 <- avg_values2 %>%
  summarize(
    meanERA1 = mean(meanERA, na.rm = TRUE),
    meanBAA1 = mean(meanBAA, na.rm = TRUE),
    meanHR1 = mean(meanHR, na.rm = TRUE)
  )

avg_of_avg1
## # A tibble: 1 × 3
##   meanERA1  meanBAA1  meanHR1
##      <dbl>     <dbl>    <dbl>
## 1  4.24503 0.2499917 8.900118
avg_of_avg2
## # A tibble: 1 × 3
##   meanERA1  meanBAA1  meanHR1
##      <dbl>     <dbl>    <dbl>
## 1 3.779881 0.2449057 7.220001

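We calculated the average ERA, average BAA, and average home runs allowed for each era as a whole by averaging the yearly means, so every season carries equal weight. All three statistics are higher in the Steroid Era than in the Post-Steroid Era: ERA 4.25 vs. 3.78, BAA .250 vs. .245, and home runs allowed 8.90 vs. 7.22. These differences help answer the question of whether steroids impacted the game.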
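A natural follow-up is whether the gap between the eras is larger than ordinary year-to-year variation. The sketch below applies a Welch two-sample t-test to the per-season means. This test is an assumed choice, not something the original analysis used, and `compare_eras` is a hypothetical helper name. It treats each season as an independent observation, which is a simplification with only seven seasons per era.

```r
# Hypothetical helper: difference in era means plus a Welch t-test p-value.
# Inputs are numeric vectors of per-season means, one vector per era.
compare_eras <- function(steroid, post) {
  test <- t.test(steroid, post)  # t.test() uses the Welch variant by default
  c(
    difference = mean(steroid) - mean(post),
    p_value = test$p.value
  )
}
```

For example, `compare_eras(avg_values1$meanERA, avg_values2$meanERA)` returns the ERA gap between the eras and its p-value. The same call works for `meanBAA` and `meanHR`.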