Is The Home Run Era Ruining Baseball

By: Lucas Quintos

library(tidyverse)
library(Lahman)
library(httr)
library(rvest)

library(lubridate)
library(magrittr)

library(knitr)

Introduction

Baseball has always been an interesting sport to me. Some find it too slow and boring, while others find that its pace is what makes it so good. Some fans love small ball and stringing hits together, while others just want to see the home run ball. Over the last few decades, home runs have become a bigger part of Major League Baseball, which made me wonder: Is the home run era ruining baseball?

I am interested in this question because home runs are exciting, but if the game becomes too focused on them, baseball might lose some of the variety that makes it fun. As a Cubs fan, I have noticed that this year's team can both play small ball and hit for power, which seems far more exciting to me than a steady diet of home runs, walks, and strikeouts. To decide whether the home run ball is ruining baseball, I ask: Has the rise in home runs made MLB less balanced and less entertaining from an offensive perspective?

My hypothesis is that home runs are not ruining baseball by themselves, but the modern game has become more dependent on them. I expect to find that home runs are strongly connected to scoring, but also that strikeouts have increased, batting average has gone down, and the game has become more all-or-nothing.

Data

For my primary dataset, I used the Lahman baseball database package in R. The Lahman database is one of the most widely used historical baseball datasets and contains team-level MLB statistics, including runs, home runs, strikeouts, hits, walks, and wins, dating back to the 1800s. I used this database in my Sabermetrics class. I restrict the analysis to seasons from 1950 onward so that the modern game can be compared with earlier eras. The dataset source is: https://seanlahman.com/baseball-archive/statistics/

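# Team-season statistics from the Lahman package
teams_raw <- Teams

# Keep seasons from 1950 on and add per-game and per-at-bat rates
mlb <- teams_raw %>%
  filter(yearID >= 1950) %>%
  mutate(
    HR_per_game = HR / G,
    R_per_game = R / G,
    SO_per_game = SO / G,
    BB_per_game = BB / G,
    BA = H / AB,        # batting average
    HR_rate = HR / AB,
    SO_rate = SO / AB
  )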

Data Dictionary

data_dictionary <- tibble(
  Variable = c(
    "yearID", "teamID", "G", "W", "R", "HR", "SO", "BB", "H", "AB",
    "HR_per_game", "R_per_game", "SO_per_game", "BB_per_game",
    "BA", "HR_rate", "SO_rate"
  ),
  Meaning = c(
    "Season year",
    "Team abbreviation",
    "Games played",
    "Wins",
    "Runs scored",
    "Home runs",
    "Strikeouts",
    "Walks",
    "Hits",
    "At bats",
    "Home runs per game",
    "Runs per game",
    "Strikeouts per game",
    "Walks per game",
    "Batting average",
    "Home runs divided by at bats",
    "Strikeouts divided by at bats"
  )
)

kable(data_dictionary)
Variable Meaning
yearID Season year
teamID Team abbreviation
G Games played
W Wins
R Runs scored
HR Home runs
SO Strikeouts
BB Walks
H Hits
AB At bats
HR_per_game Home runs per game
R_per_game Runs per game
SO_per_game Strikeouts per game
BB_per_game Walks per game
BA Batting average
HR_rate Home runs divided by at bats
SO_rate Strikeouts divided by at bats

Summary Statistics

summary_stats <- mlb %>%
  summarize(
    team_seasons = n(),
    first_year = min(yearID),
    most_recent_year = max(yearID),
    average_runs_per_game = mean(R_per_game, na.rm = TRUE),
    average_home_runs_per_game = mean(HR_per_game, na.rm = TRUE),
    average_strikeouts_per_game = mean(SO_per_game, na.rm = TRUE),
    average_walks_per_game = mean(BB_per_game, na.rm = TRUE),
    average_batting_average = mean(BA, na.rm = TRUE)
  )
kable(summary_stats, digits = 3)
team_seasons first_year most_recent_year average_runs_per_game average_home_runs_per_game average_strikeouts_per_game average_walks_per_game average_batting_average
1922 1950 2025 4.424 0.942 6.269 3.28 0.257

These summary statistics give a basic overview of the data: 1,922 team-seasons from 1950 through 2025. Because the data span 76 seasons, they allow me to compare modern baseball with older eras.

year_summary <- mlb %>%
  group_by(yearID) %>%
  summarize(
    avg_HR_per_game = mean(HR_per_game, na.rm = TRUE),
    avg_R_per_game = mean(R_per_game, na.rm = TRUE),
    avg_SO_per_game = mean(SO_per_game, na.rm = TRUE),
    avg_BB_per_game = mean(BB_per_game, na.rm = TRUE),
    avg_BA = mean(BA, na.rm = TRUE),
    .groups = "drop"
  )

Analysis of Primary Data

Home Runs Over Time

ggplot(year_summary, aes(x = yearID, y = avg_HR_per_game)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Home Runs Have Become a Bigger Part of MLB",
    x = "Season",
    y = "Average Home Runs Per Game"
  ) +
  theme_minimal()

Home runs have clearly become more common over time: by decade, average home runs per team per game rose from about 0.84 in the 1950s to about 1.18 in the 2020s. This supports the idea that modern baseball is more power-focused than earlier eras. It does not automatically mean baseball is worse, but it does show that the style of play has changed.

Runs Over Time

ggplot(year_summary, aes(x = yearID, y = avg_R_per_game)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Runs Per Game Have Not Increased as Clearly as Home Runs",
    x = "Season",
    y = "Average Runs Per Game"
  ) +
  theme_minimal()

This graph matters because if home runs automatically made offenses better, runs per game would rise along with them. Instead, scoring moves up and down over time, with the highest decade averages in the 1990s and 2000s rather than in the 2020s, when home runs peak. This suggests that more home runs do not always translate into more scoring.

Strikeouts Over Time

ggplot(year_summary, aes(x = yearID, y = avg_SO_per_game)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Strikeouts Have Increased in the Modern Game",
    x = "Season",
    y = "Average Strikeouts Per Game"
  ) +
  theme_minimal()

This is probably the biggest argument against the home run era. Home runs are exciting, but strikeouts take action away from the field. When strikeouts rise, there are fewer defensive plays, fewer hits, and fewer chances for baserunning.
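The Baseball Reference table scraped later in this report has a BIP (balls in play) column, which makes the "fewer balls in play" point concrete. Below is a minimal base-R sketch, assuming BIP is computed as AB - SO - HR + SF. Under that assumption, the per-game values copied from the scraped 2024 through 2026 rows reproduce the published BIP figures (24.35, 24.43, and 24.34).

```r
# Per-game batting values copied from the scraped Baseball Reference rows
# (2026 was a partial season when the page was scraped)
bref <- data.frame(
  Year = c(2026, 2025, 2024),
  AB   = c(33.51, 33.68, 33.69),
  SO   = c(8.38, 8.36, 8.48),
  HR   = c(1.06, 1.16, 1.12),
  SF   = c(0.27, 0.27, 0.26)
)

# Assumed definition: at bats minus strikeouts minus home runs, plus
# sacrifice flies (balls in play that are not counted as at bats)
balls_in_play <- function(AB, SO, HR, SF) {
  AB - SO - HR + SF
}

bref$BIP <- with(bref, balls_in_play(AB, SO, HR, SF))
print(round(bref$BIP, 2))
```

In this formula each strikeout is an at bat that never becomes a ball in play, so every extra strikeout per game removes one ball in play per game.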

Batting Average Over Time

ggplot(year_summary, aes(x = yearID, y = avg_BA)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Batting Average Has Declined Over Time",
    x = "Season",
    y = "Average Team Batting Average"
  ) +
  theme_minimal()

Batting average is not the only way to judge offense, but it still shows how often teams get hits. The decline since the 2000s (decade averages fall from about .265 to about .245) suggests that baseball has become less about contact and more about power, walks, and strikeouts. The rise in the mid-1990s likely reflects, at least in part, the steroid era.
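Batting average is simply hits divided by at bats. The short base-R sketch below checks that formula against three seasons of the scraped Baseball Reference per-game values, then converts the decade-level drop from .259 (1950s) to .245 (2020s) into hits per game, assuming a team gets 2025's 33.68 at bats per game. All inputs are copied from tables in this report.

```r
# Per-game hits and at bats copied from the scraped Baseball Reference rows
seasons <- c(2025, 2024, 2023)
H  <- c(8.26, 8.20, 8.40)
AB <- c(33.68, 33.69, 33.83)

# Batting average = hits / at bats. Both inputs are averages over the same
# games, so the games cancel and this matches the season-level ratio
# (up to rounding in the published per-game values).
BA <- H / AB
names(BA) <- seasons
print(round(BA, 3))

# Hits per team per game lost when batting average falls from .259 to .245,
# assuming 33.68 at bats per game
hits_lost_per_game <- 33.68 * (0.259 - 0.245)
print(round(hits_lost_per_game, 2))
```

The computed averages match Baseball Reference's listed .245, .243, and .248. The drop works out to roughly half a hit per team per game.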

Do Home Runs Help Teams Score?

ggplot(mlb, aes(x = HR_per_game, y = R_per_game)) +
  geom_point(alpha = 0.35) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Teams That Hit More Home Runs Usually Score More Runs",
    x = "Home Runs Per Game",
    y = "Runs Per Game"
  ) +
  theme_minimal()

This graph helps explain why teams keep chasing home runs. Even if some fans dislike the modern style, home runs are strongly connected to scoring: the fitted line slopes upward, and the team-level correlation between home runs per game and runs per game is about 0.70. From an analyst's perspective, hitting for more power is a sound recommendation for a team that wants to score more runs.
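`geom_smooth(method = "lm")` draws a least-squares line, but the plot does not report its slope. This minimal sketch shows how to extract that number with `lm()`. To keep it self-contained, it uses the eight decade averages from the decade summary table in this report instead of the full 1,922 team-seasons. With the real data the call would be `lm(R_per_game ~ HR_per_game, data = mlb)`, and the slope would differ.

```r
# Decade averages copied from the decade summary table in this report
decades <- data.frame(
  decade      = seq(1950, 2020, by = 10),
  HR_per_game = c(0.843, 0.820, 0.746, 0.804, 0.960, 1.073, 1.069, 1.179),
  R_per_game  = c(4.447, 4.042, 4.155, 4.288, 4.686, 4.758, 4.388, 4.486)
)

# Same model geom_smooth(method = "lm") fits: y ~ x
fit <- lm(R_per_game ~ HR_per_game, data = decades)
slope <- unname(coef(fit)["HR_per_game"])

# Slope: change in runs per game associated with one more HR per game
cat("Runs per game per extra HR per game:", round(slope, 2), "\n")
cat("R-squared:", round(summary(fit)$r.squared, 2), "\n")
```

For a single predictor, the least-squares slope equals the covariance of x and y divided by the variance of x, which is a quick way to sanity-check the `lm()` output.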

Correlation Table

Now that we have looked at each statistic related to our question, let's look at the correlations between them.

cor_table <- mlb %>%
  select(R_per_game, HR_per_game, SO_per_game, BB_per_game, BA, HR_rate, SO_rate) %>%
  cor(use = "complete.obs") %>%
  round(3)

kable(cor_table)
R_per_game HR_per_game SO_per_game BB_per_game BA HR_rate SO_rate
R_per_game 1.000 0.695 0.087 0.525 0.744 0.669 0.046
HR_per_game 0.695 1.000 0.531 0.259 0.228 0.998 0.507
SO_per_game 0.087 0.531 1.000 -0.099 -0.333 0.548 0.997
BB_per_game 0.525 0.259 -0.099 1.000 0.230 0.259 -0.101
BA 0.744 0.228 -0.333 0.230 1.000 0.185 -0.383
HR_rate 0.669 0.998 0.548 0.259 0.185 1.000 0.529
SO_rate 0.046 0.507 0.997 -0.101 -0.383 0.529 1.000

The correlation table gives a more detailed look at how these offensive statistics relate to one another across all team-seasons since 1950.

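HR_per_game and R_per_game have a correlation of about 0.695, one of the strongest relationships with scoring in the table; only batting average (0.744) is more strongly correlated with runs per game. (The near-perfect values, such as 0.998 between HR_per_game and HR_rate, simply reflect two versions of the same statistic.) This shows that teams that hit more home runs generally score more runs, which supports the idea that power hitting is one of the most effective offensive strategies.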

There is also a moderately strong positive relationship between SO_per_game and HR_per_game at about 0.531. This suggests that the modern power approach often comes with more strikeouts. Teams that focus heavily on power tend to accept strikeouts as part of their offensive strategy.

Another important result is the strong positive relationship between batting average (BA) and runs scored (R_per_game) at about 0.744. Even in the home run era, teams still score more runs when they consistently get hits and put the ball in play.

The table also shows a negative relationship between batting average and strikeout rate (SO_rate) at about -0.383. This makes sense because teams that strike out more frequently usually get fewer hits overall.

Overall, the correlation table supports both sides of the argument. Home runs clearly help teams score runs, which explains why teams value power so much. At the same time, the data suggest that the modern power-focused style comes with more strikeouts and less contact, even though getting hits remains closely tied to scoring runs.
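The values in the correlation table come from `cor()`, which computes Pearson correlation by default: the covariance of two variables divided by the product of their standard deviations. Below is a short base-R sketch of that formula, checked against `cor()`. As example inputs it uses strikeouts and batting average from the decade summary table in this report, so the result will not match the team-level -0.333 in the table. In the report's own call, `use = "complete.obs"` drops any team-season with a missing value in a selected column before computing.

```r
# Pearson correlation: sum of cross-deviations scaled by both spreads
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Decade averages copied from the decade summary table in this report
SO_per_game <- c(4.399, 5.712, 5.145, 5.342, 6.151, 6.561, 7.819, 8.534)
BA          <- c(0.259, 0.249, 0.256, 0.259, 0.265, 0.265, 0.254, 0.245)

r_manual  <- pearson_r(SO_per_game, BA)
r_builtin <- cor(SO_per_game, BA)
print(c(manual = r_manual, builtin = r_builtin))
```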

Comparing Baseball By Decade

decade_summary <- mlb %>%
  mutate(decade = floor(yearID / 10) * 10) %>%
  group_by(decade) %>%
  summarize(
    HR_per_game = mean(HR_per_game, na.rm = TRUE),
    R_per_game = mean(R_per_game, na.rm = TRUE),
    SO_per_game = mean(SO_per_game, na.rm = TRUE),
    BA = mean(BA, na.rm = TRUE),
    .groups = "drop"
  )

kable(decade_summary, digits = 3)
decade HR_per_game R_per_game SO_per_game BA
1950 0.843 4.447 4.399 0.259
1960 0.820 4.042 5.712 0.249
1970 0.746 4.155 5.145 0.256
1980 0.804 4.288 5.342 0.259
1990 0.960 4.686 6.151 0.265
2000 1.073 4.758 6.561 0.265
2010 1.069 4.388 7.819 0.254
2020 1.179 4.486 8.534 0.245
ggplot(decade_summary, aes(x = factor(decade), y = HR_per_game)) +
  geom_col() +
  labs(
    title = "Average Home Runs Per Game by Decade",
    x = "Decade",
    y = "Home Runs Per Game"
  ) +
  theme_minimal()

Looking at the data by decade makes the trend easier to understand. The 2020s average about 1.18 home runs per team per game, compared with about 0.84 in the 1950s and a low of about 0.75 in the 1970s, which shows that this truly is an era focused on home runs.
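To put the decade table on one scale, the sketch below computes the percent change from the 1950s to the 2020s for each statistic. The values are copied from the decade summary table above. Note that the 2020s average only covers 2020 through 2025.

```r
# Decade averages copied from the decade summary table
stats_1950s <- c(HR_per_game = 0.843, R_per_game = 4.447, SO_per_game = 4.399, BA = 0.259)
stats_2020s <- c(HR_per_game = 1.179, R_per_game = 4.486, SO_per_game = 8.534, BA = 0.245)

# Percent change from the 1950s to the 2020s for each statistic
pct_change <- 100 * (stats_2020s - stats_1950s) / stats_1950s
print(round(pct_change, 1))
```

With these inputs, home runs per game rise about 39.9%, strikeouts per game about 94.0%, and runs per game only about 0.9%, while batting average falls about 5.4%.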

Secondary Data Source: Web Scraped Baseball Reference Data

For my secondary data source, I used web scraping to collect MLB batting statistics from Baseball Reference. This lets me check the Lahman dataset against a public website that reports league-wide batting statistics, as per-game averages, for every season.

url <- "https://www.baseball-reference.com/leagues/majors/bat.shtml"

batting_page <- read_html(url)

batting_table <- batting_page %>%
  html_table(fill = TRUE)

batting_reference <- batting_table[[1]]

glimpse(batting_reference)
Rows: 162
Columns: 30
$ Year   <chr> "2026", "2025", "2024", "2023", "2022", "2021", "2020", "2019",…
$ Tms    <chr> "30", "30", "30", "30", "30", "30", "30", "30", "30", "30", "30…
$ `#Bat` <chr> "499", "763", "741", "765", "790", "1373", "618", "1287", "1271…
$ BatAge <chr> "28.1", "27.9", "27.9", "28.0", "28.2", "28.4", "28.0", "27.9",…
$ `R/G`  <chr> "4.48", "4.45", "4.39", "4.62", "4.28", "4.53", "4.65", "4.83",…
$ G      <chr> "1126", "4860", "4858", "4860", "4860", "4858", "1796", "4858",…
$ PA     <chr> "37.98", "37.64", "37.56", "37.88", "37.46", "37.43", "37.03", …
$ AB     <chr> "33.51", "33.68", "33.69", "33.83", "33.63", "33.33", "32.87", …
$ R      <chr> "4.48", "4.45", "4.39", "4.62", "4.28", "4.53", "4.65", "4.83",…
$ H      <chr> "8.13", "8.26", "8.20", "8.40", "8.16", "8.13", "8.04", "8.65",…
$ `1B`   <chr> "5.37", "5.38", "5.34", "5.35", "5.33", "5.15", "5.06", "5.34",…
$ `2B`   <chr> "1.57", "1.59", "1.60", "1.69", "1.63", "1.62", "1.57", "1.76",…
$ `3B`   <chr> "0.13", "0.13", "0.14", "0.15", "0.13", "0.14", "0.13", "0.16",…
$ HR     <chr> "1.06", "1.16", "1.12", "1.21", "1.07", "1.22", "1.28", "1.39",…
$ RBI    <chr> "4.28", "4.27", "4.19", "4.43", "4.09", "4.32", "4.44", "4.63",…
$ SB     <chr> "0.70", "0.71", "0.74", "0.72", "0.51", "0.46", "0.49", "0.47",…
$ CS     <chr> "0.21", "0.20", "0.20", "0.18", "0.17", "0.15", "0.16", "0.17",…
$ BB     <chr> "3.63", "3.16", "3.07", "3.25", "3.06", "3.25", "3.39", "3.27",…
$ SO     <chr> "8.38", "8.36", "8.48", "8.61", "8.40", "8.68", "8.68", "8.81",…
$ BA     <chr> ".243", ".245", ".243", ".248", ".243", ".244", ".245", ".252",…
$ OBP    <chr> ".322", ".315", ".312", ".320", ".312", ".317", ".322", ".323",…
$ SLG    <chr> ".392", ".404", ".399", ".414", ".395", ".411", ".418", ".435",…
$ OPS    <chr> ".714", ".719", ".711", ".734", ".706", ".728", ".740", ".758",…
$ TB     <chr> "13.15", "13.60", "13.45", "14.01", "13.28", "13.69", "13.73", …
$ GDP    <chr> "0.67", "0.64", "0.66", "0.71", "0.70", "0.69", "0.69", "0.71",…
$ HBP    <chr> "0.42", "0.40", "0.42", "0.43", "0.42", "0.43", "0.46", "0.41",…
$ SH     <chr> "0.14", "0.12", "0.09", "0.09", "0.08", "0.16", "0.07", "0.16",…
$ SF     <chr> "0.27", "0.27", "0.26", "0.25", "0.25", "0.24", "0.22", "0.24",…
$ IBB    <chr> "0.12", "0.11", "0.10", "0.10", "0.10", "0.14", "0.11", "0.16",…
$ BIP    <chr> "24.34", "24.43", "24.35", "24.26", "24.41", "23.67", "23.13", …
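# Drop blank and repeated header rows, convert the text columns to numbers,
# and keep the same 1950-onward window used for the Lahman data
batting_reference_clean <- batting_reference %>%
  filter(!is.na(Year)) %>%
  filter(Year != "Year") %>%
  mutate(
    Year = as.numeric(Year),
    HR = as.numeric(HR),
    R = as.numeric(R),
    SO = as.numeric(SO),
    BA = as.numeric(BA)
  ) %>%
  filter(Year >= 1950)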
baseball_reference_summary <- batting_reference_clean %>%
  select(Year, HR, R, SO, BA) %>%
  arrange(desc(Year)) %>%
  slice_head(n = 10)

kable(baseball_reference_summary)
Year HR R SO BA
2026 1.06 4.48 8.38 0.243
2025 1.16 4.45 8.36 0.245
2024 1.12 4.39 8.48 0.243
2023 1.21 4.62 8.61 0.248
2022 1.07 4.28 8.40 0.243
2021 1.22 4.53 8.68 0.244
2020 1.28 4.65 8.68 0.245
2019 1.39 4.83 8.81 0.252
2018 1.15 4.45 8.48 0.248
2017 1.26 4.65 8.25 0.255



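This table shows Baseball Reference's league-wide per-game averages (per team, per game) for the ten most recent seasons. The 2026 row covers only part of a season (1,126 team-games, compared with 4,860 in 2025), so its values may still change.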
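The cleaning step before this table handles two quirks visible in the `glimpse()` output. Every scraped column arrives as text, and the original code also filters out rows whose Year is literally "Year", which are repeated header rows. The sketch below reproduces both quirks on a hypothetical three-season toy table; the repeated header row is illustrative, not copied from the page. It uses base R in place of the report's `dplyr` pipeline.

```r
# Hypothetical toy version of the scraped table: all columns are character,
# and a header row is repeated partway down
raw <- data.frame(
  Year = c("2025", "2024", "Year", "2023"),
  HR   = c("1.16", "1.12", "HR", "1.21"),
  BA   = c(".245", ".243", "BA", ".248"),
  stringsAsFactors = FALSE
)

# Drop the repeated header rows, then convert text to numbers;
# as.numeric() reads a leading-decimal string such as ".245" as 0.245
clean <- raw[raw$Year != "Year", ]
clean[] <- lapply(clean, as.numeric)
str(clean)
```

If the header rows were left in, `as.numeric("Year")` would return `NA` with a coercion warning rather than an error, which is why filtering them first keeps the cleaned table tidy.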

Comparing Primary and Secondary Data

The Lahman data is team-level data, while the Baseball Reference data is league-level data. To compare them, I aggregated the Lahman data to the league level for each season. Since the Baseball Reference table reports home runs per team per game, I divided the league's total home runs by its total team games in the Lahman data. This puts both sources on the same scale and makes them comparable.
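Dividing total home runs by total games is not the same as averaging each team's own HR per game, which is what `year_summary` does. The two methods agree when every team plays the same number of games and drift apart when game counts differ; for example, Baseball Reference lists 4,858 rather than 4,860 team-games for 2024. The sketch below shows the difference using three hypothetical teams (made-up numbers, not Lahman data).

```r
# Hypothetical team-seasons (illustrative numbers only)
teams <- data.frame(
  teamID = c("AAA", "BBB", "CCC"),
  HR     = c(200, 150, 100),
  G      = c(162, 162, 60)
)

# League rate: total home runs over total team games. Each game appears once
# per team in G, so this is HR per team-game, the same basis as Baseball
# Reference's per-game columns (4,860 team-games in a full 30-team season)
pooled_rate <- sum(teams$HR) / sum(teams$G)

# Unweighted mean of each team's own HR per game, as in year_summary
mean_of_team_rates <- mean(teams$HR / teams$G)

print(c(pooled = pooled_rate, mean_of_rates = mean_of_team_rates))
```

The pooled rate weights each team by its games, so the short-season team "CCC" pulls the unweighted mean upward but barely moves the pooled figure.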

lahman_league_totals <- teams_raw %>%
  filter(yearID >= 1950) %>%
  group_by(yearID) %>%
  summarize(
    Lahman_HR_per_game = sum(HR, na.rm = TRUE) / sum(G, na.rm = TRUE),
    .groups = "drop"
  )

baseball_reference_totals <- batting_reference_clean %>%
  select(
    Year,
    BR_HR_per_game = HR
  )

comparison <- lahman_league_totals %>%
  inner_join(
    baseball_reference_totals,
    by = c("yearID" = "Year")
  )

# First ten overlapping seasons (1950-1959) as a spot check
comparison_first_decade <- comparison %>%
  arrange(yearID) %>%
  slice_head(n = 10)

kable(comparison_first_decade, digits = 3)
yearID Lahman_HR_per_game BR_HR_per_game
1950 0.837 0.84
1951 0.752 0.75
1952 0.686 0.69
1953 0.837 0.84
1954 0.783 0.78
1955 0.901 0.90
1956 0.926 0.93
1957 0.891 0.89
1958 0.907 0.91
1959 0.909 0.91
comparison_long <- comparison %>%
  pivot_longer(
    cols = c(Lahman_HR_per_game, BR_HR_per_game),
    names_to = "Source",
    values_to = "HR_per_game"
  )

ggplot(
  comparison_long,
  aes(
    x = yearID,
    y = HR_per_game,
    color = Source
  )
) +
  geom_line(linewidth = 1) +
  labs(
    title = "Home Runs Per Game Across Data Sources",
    x = "Season",
    y = "Home Runs Per Game"
  ) +
  theme_minimal()

Both datasets show that home runs per game have generally increased since 1950, especially beginning in the mid-1990s and continuing into the modern era. The graph also shows that recent seasons include some of the highest home run rates since 1950; Baseball Reference lists 1.39 home runs per team per game in 2019.

This comparison matters because it shows that the rise in home runs is not a quirk of one dataset. Two separately compiled sources agree closely year by year and show the same overall trend, supporting the idea that baseball has become increasingly focused on power hitting over time.
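In the 1950-1959 comparison table, the two sources differ only in the third decimal. The quick check below tests whether those gaps are just rounding, assuming Baseball Reference rounds to the two decimal places its table displays. The values are copied from the comparison table above.

```r
# Home runs per game, 1950-1959, copied from the comparison table
lahman <- c(0.837, 0.752, 0.686, 0.837, 0.783, 0.901, 0.926, 0.891, 0.907, 0.909)
bref   <- c(0.84, 0.75, 0.69, 0.84, 0.78, 0.90, 0.93, 0.89, 0.91, 0.91)

# Round Lahman to two decimals, like Baseball Reference, and compare
agree   <- abs(round(lahman, 2) - bref) < 1e-9
max_gap <- max(abs(lahman - bref))

print(all(agree))
print(max_gap)
```

Every season matches after rounding, and the largest raw gap is below 0.005, which is what rounding alone would produce.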

Main Findings

After looking at the data, a few things stand out. First, home runs have clearly become a bigger part of Major League Baseball; the game today is much more power-focused than it used to be. Second, home runs do help teams score: the scatterplot shows that teams with more home runs per game usually score more runs per game, which explains why teams keep building around power hitters. Third, the downside is that strikeouts have risen sharply, from about 4.4 per team per game in the 1950s to about 8.5 in the 2020s. This is where the game can become less entertaining, because more strikeouts mean fewer balls in play, fewer defensive plays, and fewer chances for baserunning. Finally, batting average has declined since the 2000s. This supports the idea that baseball has become less about contact and more about the “three true outcomes”: home runs, walks, and strikeouts.
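The three true outcomes can be expressed as the share of plate appearances that end in a home run, walk, or strikeout. The sketch below computes that share for three recent seasons from the scraped Baseball Reference per-game values. It assumes this simple definition; some analysts also count hit-by-pitches, which are left out here. It is a snapshot of recent seasons, not a comparison with earlier eras.

```r
# Per-game league values copied from the scraped Baseball Reference rows
tto <- data.frame(
  Year = c(2025, 2024, 2023),
  PA   = c(37.64, 37.56, 37.88),
  HR   = c(1.16, 1.12, 1.21),
  BB   = c(3.16, 3.07, 3.25),
  SO   = c(8.36, 8.48, 8.61)
)

# Assumed definition: (home runs + walks + strikeouts) / plate appearances
tto_share <- with(tto, (HR + BB + SO) / PA)
names(tto_share) <- tto$Year
print(round(100 * tto_share, 1))
```

Under this definition, roughly one in three plate appearances in these seasons ends without the ball being put in play by the defense's standards, or leaves the park entirely.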

Conclusion

My final answer is that the home run era is not completely ruining baseball, but it has made the game more one-dimensional. Home runs are exciting, and the data shows that they help teams score. Because of that, it makes sense that teams value power. At the same time, baseball is more fun when there is a mix of everything: home runs, stolen bases, defense, contact hitting, and strategy. The problem is not the home run itself. The problem is when the game becomes too focused on only home runs and strikeouts. Overall, the league should encourage more balls in play, more stolen bases, and more action. The home run era is not the end of baseball, but baseball is better when the home run is only one part of the game instead of the whole game.