MLB Home Runs

Author

Matt Conover

The Evolution of MLB

Initial Question

Over more than 150 years, the game of baseball has evolved tremendously. A lot has changed, especially the prevalence of the home run. Which MLB seasons saw the most home runs? What does this tell us about the evolution of the sport as a whole?

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
batting <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/johnsoni4_xavier_edu/EWdUkH1KJkVMnMm3-y9qjIUBUXzafbCx2amHtpSh4Zm1fw?download=1")
Rows: 110495 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): playerID, teamID, lgID
dbl (19): yearID, stint, G, AB, R, H, 2B, 3B, HR, RBI, SB, CS, BB, SO, IBB, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
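# Total home runs per season, summed over every player stint in that year
hr_by_season <- batting %>% 
  group_by(yearID) %>% 
  summarize(total_hr = sum(HR, na.rm = TRUE))  # na.rm guards against missing HR values

# Line chart of season totals over time
hr_by_season %>% 
  ggplot(aes(x = yearID, y = total_hr)) +
  geom_line() +
  labs(title = "Total Home Runs Hit Each Season in MLB",
       x = "Year",
       y = "Total Home Runs")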
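The initial question asks which seasons saw the most home runs, and the line chart only answers that by eye. A minimal sketch of ranking the seasons directly is below. The helper name `top_hr_seasons` and the toy data are hypothetical, added only for illustration. The function assumes a data frame shaped like `hr_by_season`, with columns `yearID` and `total_hr`.

```r
library(dplyr)

# Hypothetical helper: return the n seasons with the highest home run totals.
# Expects columns yearID and total_hr, like hr_by_season.
top_hr_seasons <- function(season_totals, n = 5) {
  season_totals %>%
    slice_max(total_hr, n = n, with_ties = FALSE)  # sorted from highest to lowest
}

# Toy data with made-up numbers, only to show the call; not real MLB totals.
toy_totals <- tibble(
  yearID   = c(2001, 2002, 2003, 2004),
  total_hr = c(10, 40, 30, 20)
)

top_hr_seasons(toy_totals, n = 2)
```

With the real data, `top_hr_seasons(hr_by_season, n = 10)` would list the ten highest-scoring seasons by total home runs.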

Analysis

The graph shows a long-run increase in the number of home runs hit each season throughout MLB history, with the shortened 2020 season as the obvious exception. Several trends stand out, and they line up with the different “eras” of MLB history.
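Raw totals depend on how many games and at-bats a season contains, which is why a shortened season like 2020 drops out of the trend. One possible way to compare seasons on equal footing is a rate, such as home runs per 100 at-bats. The sketch below is an assumption about how that could be done, not part of the original analysis. The helper name `hr_rate_by_season` and the toy data are hypothetical. It relies only on the `yearID`, `HR`, and `AB` columns listed in the column specification above.

```r
library(dplyr)

# Hypothetical helper: home runs per 100 at-bats for each season.
# Expects columns yearID, HR, and AB, like the batting data.
hr_rate_by_season <- function(batting_data) {
  batting_data %>%
    group_by(yearID) %>%
    summarize(
      total_hr      = sum(HR, na.rm = TRUE),
      total_ab      = sum(AB, na.rm = TRUE),
      hr_per_100_ab = 100 * total_hr / total_ab,
      .groups = "drop"
    )
}

# Toy data with made-up numbers: the second season has fewer at-bats
# and fewer home runs, but the same home run rate.
toy_batting <- tibble(
  yearID = c(2019, 2019, 2020, 2020),
  HR     = c(30, 10, 12, 4),
  AB     = c(500, 300, 200, 120)
)

hr_rate_by_season(toy_batting)
```

Applied to the real data, `hr_rate_by_season(batting)` could be plotted with `hr_per_100_ab` on the y-axis to check whether a dip in totals reflects less power or simply fewer at-bats.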

The “Eras”

Through the 1910s, home runs were rare, as hitters focused on putting the ball in play. This was known as the “dead ball era”. Babe Ruth changed that: his record-setting home run totals in 1919 and 1920 sparked a league-wide power surge that carried through the 1920s and 1930s. Home run totals kept rising through the mid-20th century and into the 1990s, when MLB entered the “steroid era”. Some players used performance-enhancing drugs while breaking home run records. Once MLB began testing for and penalizing those drugs in the early 2000s, power numbers dipped. In the mid-2010s, home runs climbed back to steroid-era levels. This culminated in 2019, often called the year of the “juiced ball”, when changes in the baseball itself were widely reported to be helping the ball carry farther. Multiple teams broke home run records in 2019.