The purpose of this document is to analyze the season history of the Cincinnati Reds baseball franchise from 1882 to 2019. As a die-hard Reds fan and a follower of baseball statistics, I enjoyed researching my favorite team’s season history to find interesting trends and revelations. In addition, a Sabermetrics class I took at Xavier inspired me to keep playing around with baseball data in my free time. I scraped a data table from Baseball-Reference that lists every season the Cincinnati Reds have played to date. Here is the link:
https://www.baseball-reference.com/teams/CIN/index.shtml
The table includes wins, losses, runs scored, runs allowed, winning percentage, average batter age, average pitcher age, and more.
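One way to pull that table into R is sketched below. This is a minimal sketch, not necessarily the exact scraping code used here: it assumes rvest 1.0 or later, and it assumes the seasons table on the page carries the HTML id franchise_years, which is a guess about Baseball-Reference's markup rather than something confirmed. The helper names extract_table_by_id and scrape_reds_seasons are hypothetical.

```r
library(rvest)

# Hypothetical helper: return the table with HTML id `table_id` from a parsed
# page as a data frame. The default id "franchise_years" is an assumption
# about Baseball-Reference's markup, not something confirmed here.
extract_table_by_id <- function(page, table_id = "franchise_years") {
  node <- html_element(page, paste0("table#", table_id))
  if (inherits(node, "xml_missing")) {
    stop("No table with id '", table_id, "' found on the page")
  }
  # With the default header = NA, html_table() uses the first row as
  # column names when that row contains <th> cells
  as.data.frame(html_table(node))
}

# Hypothetical wrapper that downloads and parses the page (needs network access).
scrape_reds_seasons <- function(
    url = "https://www.baseball-reference.com/teams/CIN/index.shtml") {
  extract_table_by_id(read_html(url))
}
```

Calling scrape_reds_seasons() requires network access, and Baseball-Reference may limit repeated automated requests, so it is worth saving the result locally (for example with write.csv()) rather than re-scraping on every run.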
The R packages I loaded and used are listed below, along with their purpose in my analysis:
XML: Reading XML and HTML documents, including tables on webpages
rvest: Useful tools for scraping HTML and XML
RCurl: Making HTTP requests, including form submission
magrittr: Adding the pipe operator (%>%)
httr: Useful for HTTP requests and web authentication
tidyverse: A collection of open-source R packages that help model, transform, and visualize data. From the tidyverse, I used ggplot2 throughout my analysis to create my visualizations and dplyr to filter and manipulate the data.
scales: I used the scales package to keep R from labeling my axes in scientific notation.
To install these packages in R, use the following command: install.packages(c("XML", "rvest", "RCurl", "magrittr", "httr", "tidyverse", "scales"))
After scraping the data table from Baseball-Reference, I came up with five questions to investigate.
The Steroid Era is a dark chapter in baseball history, when many players used performance-enhancing drugs (PEDs) to gain an advantage. As a result, players such as Barry Bonds, Mark McGwire, and Sammy Sosa put up spectacular home run numbers throughout the ’90s and early ’00s. I’m curious whether the Steroid Era is reflected in the Reds’ history too. I will compare the average number of runs scored and allowed per season in each era by grouping every Reds season into its respective era. I identified seven eras in baseball: Deadball (1901-1919), Lively Ball (1920-1945), Post-War (1946-1960), Expansion (1961-1976), Free Agency (1977-1993), Steroid (1994-2005), and Modern (2006-2019).
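The era grouping can be sketched in base R with cut(), which bins each year into a labeled interval, followed by aggregate() to average runs within each era. This is a hedged sketch rather than the code behind the graphs (the analysis used dplyr, where group_by() followed by summarise() does the same job). It assumes the scraped table has numeric columns named Year, R (runs scored), and RA (runs allowed); if Year comes back as text, convert it with as.integer() first. The function names are hypothetical, and seasons before 1901 fall outside all seven eras, so they are dropped.

```r
# Hypothetical helper: map each season to one of the seven eras in the text.
# cut() uses right-closed intervals, so (1900, 1919] is Deadball (1901-1919),
# (1919, 1945] is Lively Ball (1920-1945), and so on. Years outside
# 1901-2019 (e.g. 1882-1900) become NA.
assign_era <- function(year) {
  cut(year,
      breaks = c(1900, 1919, 1945, 1960, 1976, 1993, 2005, 2019),
      labels = c("Deadball", "Lively Ball", "Post-War", "Expansion",
                 "Free Agency", "Steroid", "Modern"))
}

# Hypothetical helper: average runs scored (R) and allowed (RA) per season
# in each era. Assumes numeric columns named Year, R, and RA.
era_run_averages <- function(seasons) {
  seasons$Era <- assign_era(seasons$Year)
  seasons <- seasons[!is.na(seasons$Era), ]  # drop seasons outside every era
  aggregate(cbind(R, RA) ~ Era, data = seasons, FUN = mean)
}
```

Because Era is a factor, the result comes back in chronological era order rather than alphabetical order, which also keeps the bars in order when the averages are plotted.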
Based on the two bar graphs, the Steroid Era had the highest average runs scored and the highest average runs allowed per season in Reds history. This suggests that Reds hitters were probably using PEDs to hit more long balls, while Reds pitchers faced lineups from other teams where PED use was common. All in all, while blame for the Steroid Era tends to fall on individuals like Barry Bonds, it’s important to note that PED use was probably widespread across the league at the time, because MLB didn’t test its players for PEDs until 2003.
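A bar graph of this kind could be drawn with ggplot2's geom_col(), one bar per era. This is a sketch under stated assumptions, not the code that produced the original graphs: plot_era_bars is a hypothetical helper, it expects a data frame of per-era averages with an Era column plus numeric columns such as R and RA, and it applies scales::comma to the y-axis so labels are never shown in scientific notation.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: one bar per era for the chosen column of era averages.
# `era_avgs` is assumed to have an Era column plus numeric average columns.
plot_era_bars <- function(era_avgs, column = "R",
                          title = "Average runs scored per season by era") {
  ggplot(era_avgs, aes(x = Era, y = .data[[column]])) +
    geom_col(fill = "firebrick") +
    # comma() formats axis labels as e.g. 1,000 instead of 1e+03
    scale_y_continuous(labels = comma) +
    labs(x = "Era", y = column, title = title)
}
```

The second graph would then be a call such as plot_era_bars(era_avgs, "RA", "Average runs allowed per season by era"). Keeping Era as a factor with chronological levels makes the bars appear in era order; a character column would be sorted alphabetically instead.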