Baseball Eras and Their Run Environments

Do the Eras Matter When It Comes to Scoring?

Ryan Lentini

Introduction

I am a huge baseball fan (the Red Sox are my favorite team), and baseball has some of the most interesting data because its records go back so far. The Lahman data set that I am using contains batting and pitching data from 1871 to 2020, a ridiculously large number of baseball games. The game has also changed drastically over the years, and since I love history, I thought I could combine the two by analyzing the different eras of baseball. I will focus on the seven main eras of baseball that are widely accepted and discussed by baseball people:

IMPORTANT NOTE: ALL STATS ARE DIVIDED BY THE NUMBER OF GAMES IN THE GIVEN SEASON TO MAKE STATS COMPARABLE ACROSS ERAS

19th century baseball: 1800-1900
Dead Ball: 1901-1919
Lively Ball: 1920-1941
Integration: 1942-1960
Expansion: 1961-1976
Free Agency: 1977-1993
Long Ball: 1993-2020 (present data)
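The `era` variable in the data dictionary can be derived from `yearID` with a range lookup. Below is a minimal Python sketch of that step, assuming the boundaries listed above; it is not the code used for this paper, and the names `ERAS` and `era_for_year` are hypothetical. Because the list gives 1993 to both the Free Agency and Long Ball eras, the sketch assigns it to the first matching era.

```python
# Hypothetical era table built from the boundaries listed above (inclusive).
ERAS = [
    ("19th Century", 1800, 1900),
    ("Dead Ball", 1901, 1919),
    ("Lively Ball", 1920, 1941),
    ("Integration", 1942, 1960),
    ("Expansion", 1961, 1976),
    ("Free Agency", 1977, 1993),
    ("Long Ball", 1993, 2020),  # overlaps Free Agency in 1993; first match wins
]


def era_for_year(year_id: int) -> str:
    """Return the era name for a season, checking the eras in order."""
    for name, start, end in ERAS:
        if start <= year_id <= end:
            return name
    raise ValueError(f"yearID {year_id} is outside the listed eras")


print(era_for_year(1871))  # 19th Century
print(era_for_year(1993))  # Free Agency
print(era_for_year(2019))  # Long Ball
```

Checking the eras in order keeps the assignment deterministic, and moving a boundary only requires editing one tuple.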

I want to analyze how these eras differ in some key stats and see whether some of the eras live up to their names.

The data used for this project comes from the Sean Lahman database, a database of MLB data that includes almost anything you can think of, from player stats to game attendance. This analysis will use four main parts of the database: people, batting, pitching, and team statistics. Below is the data dictionary for the variables used. The database is an overwhelming amount of data for anyone replicating this paper, but as you will see below, only a few variables from each table are needed. The main stats that describe the different eras in baseball are home runs, runs, and stolen bases.

Intro to the dataset

The data used for this project is the teams data, aggregated year by year. Each row records one season and includes the following variables:

Variable Description
yearID Year
era The era of baseball based upon the yearID
G Games played
R Runs scored
H Hits by batters
X2B Doubles
X3B Triples
HR Home runs by batters
BB Walks by batters
SO Strikeouts by batters
SB Stolen bases
ER Earned runs allowed
CG Complete games

Available data for teams dataset

The teams dataset contains year-by-year aggregated data on a per-game basis.
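As a minimal sketch of how per-game figures like these can be built, assume one row per team per season (as in the Lahman Teams table) and that each season's team totals are summed and then divided by total games played. The rows below are made-up numbers, and `per_game_by_year` is a hypothetical helper, not the code used for this paper.

```python
from collections import defaultdict

# Made-up team-season rows for illustration: (yearID, G, R, HR, SO).
team_rows = [
    (1915, 154, 550, 20, 600),
    (1915, 152, 520, 18, 620),
    (2019, 162, 800, 250, 1400),
    (2019, 162, 700, 200, 1500),
]


def per_game_by_year(rows):
    """Sum each season's team totals, then divide by total games played."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # running G, R, HR, SO
    for year_id, g, r, hr, so in rows:
        season = totals[year_id]
        season[0] += g
        season[1] += r
        season[2] += hr
        season[3] += so
    return {
        year_id: {"R": r / g, "HR": hr / g, "SO": so / g}
        for year_id, (g, r, hr, so) in sorted(totals.items())
    }


for year_id, stats in per_game_by_year(team_rows).items():
    print(year_id, {k: round(v, 2) for k, v in stats.items()})
```

Dividing by games played is what lets a 154-game season be compared with a 162-game one.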

Section 1.2: Summary statistics

The statistics below give an overview and some color about baseball. I focus on home runs here because they are usually what people find most intriguing, along with stolen bases. We will dig into these further as we assess the key stats in the environment of each era.

Average Number of Home Runs per Game

## [1] 0.5837044

Standard Deviation of Home Runs

## [1] 0.3527894

What Was the Max Number of Home Runs Per Game in a Season?

## [1] 1.394813

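Sum of Home Runs per Game Across All Seasons (this is not the total number of home runs hit in MLB history, since each season's value is divided by games played)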

## [1] 87.55567

Average Number of Stolen Bases per Game, excluding 1876-1885 because stolen bases were not recorded in those seasons

## [1] 0.7765376
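A minimal Python sketch of these summary statistics, assuming a hypothetical mapping from `yearID` to a per-game value (the numbers below are made up). It also shows why the home run "history" figure is a sum of per-game rates rather than a count of home runs: the sum of n values always equals their mean times n, and 0.5837044 × 150 seasons (1871 to 2020) = 87.55566, which matches the reported 87.55567 up to rounding of the mean. The helpers `summarize` and `mean_excluding` are hypothetical.

```python
import statistics

# Made-up per-game values keyed by yearID, for illustration only.
hr_per_game = {1901: 0.2, 1950: 0.5, 2000: 1.1}
sb_per_game = {1875: 1.0, 1880: 99.0, 1890: 3.0}


def summarize(values):
    """Mean, sample SD (n - 1 denominator, like R's sd()), max, and sum."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "max": max(values),
        "sum": sum(values),
    }


def mean_excluding(per_game, start, end):
    """Average per-game values, skipping seasons from start to end inclusive."""
    kept = [v for year_id, v in per_game.items() if not start <= year_id <= end]
    return statistics.mean(kept)


print(summarize(hr_per_game.values()))
print(mean_excluding(sb_per_game, 1876, 1885))  # 2.0: the 1880 value is skipped
```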

Part 2

2.1 Distribution of Home Runs Across Eras

Here we can see that the Dead Ball era truly produced fewer home runs per game than any other era in baseball history. What is really interesting as a fan is the stagnation of home runs between the Expansion and Free Agency eras, with the Free Agency era actually having a lower median number of home runs per game, apart from one real outlier season. The Integration era is also fun to observe: home runs per game vary widely from season to season, as pitchers kept home runs down in some years while hitters had good years in others. My last observation is that the Lively Ball era does not really live up to its name, although it produced far more home runs than the Dead Ball era.
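The comparisons above lean on each era's median. A minimal sketch of that grouping step in Python, assuming hypothetical `(era, value)` pairs with made-up numbers, such as per-game home run rates combined with an era label; `median_by_era` is a hypothetical helper, not the paper's plotting code.

```python
import statistics
from collections import defaultdict

# Made-up (era, home runs per game) pairs for illustration only.
seasons = [
    ("Dead Ball", 0.10),
    ("Dead Ball", 0.20),
    ("Dead Ball", 0.15),
    ("Long Ball", 1.00),
    ("Long Ball", 1.20),
]


def median_by_era(pairs):
    """Group per-game values by era and return each era's median."""
    groups = defaultdict(list)
    for era, value in pairs:
        groups[era].append(value)
    return {era: statistics.median(values) for era, values in groups.items()}


print(median_by_era(seasons))
```

The median is a sensible summary here because a single outlier season, like the one in the Free Agency era, barely moves it.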

2.2 Comparing Number of Runs Per Game Across Eras

I would expect eras with more home runs to have more runs per game, but in this “three outcomes” era of baseball, where so many plate appearances end in a strikeout, walk, or home run, there could actually be fewer runs.

We get a surprising result here. Runs per game were noticeably higher in the 19th Century and Lively Ball eras than in the more recent eras. Comparing this graph to the previous one gives even more insight into how the style of play differs between eras. Runs per game differ by less than one run between the Free Agency and Long Ball eras, even though home runs per game have increased substantially, with the Long Ball era's median above one home run per game. The Dead Ball era no longer seems quite as boring, since its runs per game are not dramatically lower.

This leads me to believe that fewer players are getting on base: even though there are more home runs per game, those home runs are not driving in many runs, and I suspect many are solo shots. The above graph gives credence to the criticism that modern baseball lacks the action on the bases, or “small ball,” that is more fun to watch than home runs.

Next, let's see whether this small ball theory holds water by looking at one of the most common stats that represents small ball: the number of stolen bases (SB).

2.3 Comparing Stolen Bases Across Eras

In the above graph we see some evidence that stolen bases were higher in earlier eras. This holds for the 19th Century and Dead Ball eras, where stolen bases per game are about triple and double, respectively, the level of the modern Long Ball era. The Lively Ball and Integration eras go against my initial hypothesis, as they actually have fewer stolen bases per game than the Long Ball era.

So what can we attribute the difference in runs to? Having ruled out home runs and stolen bases, the most likely remaining explanation is that the Lively Ball, Integration, and to some extent Free Agency eras scored more through hits per game, which suggests that earlier generations had better contact skills.

Next, I will assess whether bat-to-ball contact explains why runs per game are not as high as expected. The best way to assess this with the available data is to look at the trend in strikeouts per game in each era.

2.4 Does the modern long ball era strike out more?

Note: Years 1911 and 1912 are missing in the data set.

The above graph shows that strikeouts per game increased dramatically, by about 50%, from the Integration era to the Long Ball era. Two extra strikeouts might not sound like a lot, but a team typically needs only 27 outs in a game, so those are two extra outs where the ball is not in play and there is no action on the bases. Seven strikeouts a game is roughly 26% of the 27 outs a team needs (7/27), so over a quarter of a team's outs come without a ball put in play. You can see why this might contribute to baseball losing popularity. Strikeouts are especially damaging for scoring because even a ball in play that results in an out can be a “productive out,” such as a sac fly or a ground ball to the right side of the infield with a base runner on third. For the baseball purist, and for entertainment value, strikeouts are an especially annoying and increasingly frustrating trend in modern baseball.
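The strikeout arithmetic can be checked in a few lines of Python. This sketch assumes, as the text does, 27 outs per team in a regulation nine-inning game (ignoring extra innings and home teams that do not bat in the ninth); the 4-to-6 example only illustrates that two extra strikeouts on a base of four is a 50% increase. Both helper names are hypothetical.

```python
OUTS_PER_TEAM_GAME = 27  # regulation nine innings x 3 outs


def share_of_outs(strikeouts_per_game: float, outs: int = OUTS_PER_TEAM_GAME) -> float:
    """Fraction of a team's outs that come from strikeouts."""
    return strikeouts_per_game / outs


def percent_increase(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


print(f"{share_of_outs(7):.1%}")  # 25.9%
print(percent_increase(4, 6))     # 50.0
```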

2.5 What does the recent trend in strikeouts look like?

Strikeouts have risen steadily over the Long Ball era, which is a troubling sign for the health and entertainment value of the game, as teams now average more than 8 strikeouts per game.
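One way to put a number on a steady rise is the least-squares slope of strikeouts per game against `yearID`, measured in strikeouts per game per season. A minimal Python sketch with made-up values; `trend_slope` is a hypothetical helper, and the graph in this section does not necessarily use a fitted line.

```python
def trend_slope(years, values):
    """Ordinary least-squares slope of values regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Made-up strikeouts per game for four consecutive seasons.
years = [2000, 2001, 2002, 2003]
so_per_game = [6.0, 6.2, 6.4, 6.6]
print(round(trend_slope(years, so_per_game), 3))  # 0.2
```

On Python 3.10 or later, `statistics.linear_regression(years, so_per_game).slope` gives the same slope.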

3.1 How do people feel about strikeouts?

I am interested in how people on Twitter talk about strikeouts, and in particular whether they actually like them. The strikeout tweets contain a surprising number of positive words, which probably come from people tweeting about their own team's pitcher. The top positive word in the strikeout tweets is “top,” which probably refers to a player being among the league leaders in strikeouts or something similar. What was interesting was how common “fatigued” is. Do fatigued players strike out more? This is a tricky area, as fatigue can apply to either side of the ball (a tired pitcher or a tired hitter). Also, in my opinion the word “strike” is usually negative on Twitter, because people love to tweet when they think an umpire missed a strike call.
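Word counts like these come from matching tweet words against a sentiment lexicon. The analysis does not say which lexicon was used, so the Python sketch below uses a tiny hypothetical lexicon and made-up tweets purely to show the counting step; the labels in `LEXICON` are assumptions, and a real analysis would use an established lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the word labels are illustrative assumptions.
LEXICON = {"top": "positive", "great": "positive", "fatigued": "negative", "missed": "negative"}


def sentiment_word_counts(tweets):
    """Count (word, sentiment) pairs for lexicon words found in the tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            label = LEXICON.get(word)
            if label is not None:
                counts[(word, label)] += 1
    return counts


tweets = [
    "Top of the order strikes out again",
    "Pitcher looked fatigued, top velocity gone",
    "Ump missed that strike call",
]
for (word, label), n in sentiment_word_counts(tweets).most_common():
    print(word, label, n)
```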

4.1 Analysis of Variance

Let's see whether era is a statistically significant factor in runs scored.

To do this, we will run a one-way ANOVA test to determine whether era has an effect on runs per game. The results and interpretation of the test are below.

##              Df Sum Sq Mean Sq F value Pr(>F)    
## era           6  83.09  13.848   31.12 <2e-16 ***
## Residuals   143  63.63   0.445                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value (< 2e-16) is far below the significance level α = 0.05, so we reject the null hypothesis that mean runs per game are the same in every era. In other words, era is statistically significant: mean runs scored per game differ between at least some of the eras. The ANOVA alone does not tell us which eras differ from each other.
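The F value in the table is the ratio of the two mean squares, where each mean square is a sum of squares divided by its degrees of freedom: 83.09 / 6 = 13.848 and 63.63 / 143 = 0.445, giving F ≈ 31.12. With 7 eras and 150 seasons (1871 to 2020), the degrees of freedom are 7 − 1 = 6 and 150 − 7 = 143, matching the table. The Python sketch below computes a one-way ANOVA F statistic from scratch; it is an illustration, not the R code that produced the table, and `one_way_anova` is a hypothetical helper. The p-value would then come from the upper tail of the F distribution (for example R's `pf(F, 6, 143, lower.tail = FALSE)`).

```python
def one_way_anova(groups):
    """Return between/within degrees of freedom, sums of squares, and F."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    # Between-group sum of squares: group size times squared distance of group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (residual) sum of squares: spread around each group's own mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value


# Check against the table above: F = (Sum Sq era / Df era) / (Sum Sq residual / Df residual).
print(round((83.09 / 6) / (63.63 / 143), 2))  # 31.12

# Made-up runs per game for two tiny "eras".
print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```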

So how does this affect the way I watch and interpret the sport?

As stated above, the test shows that era does have an impact on scoring. This can be useful as a fan, because if we start to notice trends, we can look back at this paper and the previous graphs to see which era we may be entering or leaving. For example, if stolen bases go up and strikeouts go down, we might expect baseball to head toward a Dead Ball era scoring environment (see figures 2.3 and 2.4).