Percentage Change of Average MLB Player Salary 1985–2015

This article was published Aug 12, 2017

Article Stats

Reading time (minutes)   Word count   Likes   Comments
29                       7401         95      3

Summary of “A Brief Exploration of Baseball Statistics”

Using sabermetrics, author Will Koehrsen argues that teams should make an effort to discover players before they have a breakout season. His reasoning is twofold: players regress to the mean in performance, and salaries are strongly affected by performance in “flashier” metrics, such as runs batted in (RBI) for batters and number of wins for pitchers, in the two years before signing.

Author’s Note:

The following exploratory data analysis project was completed as part of the Udacity Data Analyst Nanodegree that I finished in May 2017. All of the code can be found on my GitHub repository for the class. I highly recommend the course to anyone interested in data analysis (that is, anyone who wants to make sense of the massive amounts of data generated in our modern world) as well as to those who want to learn basic programming skills in an application-based format.

https://medium.com/@williamkoehrsen/data-analysis-with-python-19434f5d6324

Conclusion

Several conclusions can be drawn from this analysis, but numerous caveats prevent them from being accepted as fact. I also must state the ever-present rule that correlation does not imply causation. In other words, players’ statistics may be correlated with salary, but that does not mean the statistics caused a higher or a lower salary.

The main conclusions are as follows:

  1. When looking at batters from the range 2006–2010, the number of RBIs was the performance metric most highly correlated with salary in 2008.
  2. When looking at pitchers from the range 2006–2010, the number of wins was the performance metric most highly correlated with salary in 2008.
  3. These correlations were stronger in the two seasons preceding 2008 than in the two seasons following 2008. A t-test conducted on batters found that batters with an above-average salary in 2008 exhibited larger declines in performance (as measured by number of RBIs) from the preceding seasons to the following seasons than batters with a below-average salary in 2008. This strongly suggests that players with an above-average salary experience a regression to the mean in terms of performance.
  4. Based on the above point, teams should make an effort to discover players before they have a breakout season. That is basically what the Oakland Athletics management team, the subject of the book Moneyball, did in the early 2000s, and they were able to compete with and beat teams that paid much higher player salaries. By finding players who were undervalued, they were able to achieve sustained success despite their meager payroll. Once a player has reached the level where they command a higher salary, they will tend not to perform as well as before their salary increased, because their prior performance was an outlier and will gradually decline to a more average level. This phenomenon is not limited to baseball, but can be observed in all aspects of daily life, as shown by numerous researchers.
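
The two calculations behind points 1 to 3 can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the player records are made-up toy values rather than data from the article, the helper names `pearson_r` and `welch_t` are hypothetical, and Welch's unequal-variance t statistic is an assumption, since the summary does not say which variant of the t-test was used.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical toy records (NOT from the article):
# (2008 salary in $M, mean RBI 2006-2007, mean RBI 2009-2010)
players = [
    (9.0, 105, 88),
    (7.5, 98, 90),
    (6.0, 92, 80),
    (1.2, 60, 62),
    (0.8, 55, 58),
    (0.5, 48, 47),
]

salaries = [p[0] for p in players]
rbi_before = [p[1] for p in players]
rbi_after = [p[2] for p in players]

# Points 1 and 2: how strongly does a performance metric track salary?
r_before = pearson_r(rbi_before, salaries)
r_after = pearson_r(rbi_after, salaries)

# Point 3: compare the change in RBI for above- vs. below-average earners.
avg_salary = mean(salaries)
change_high = [p[2] - p[1] for p in players if p[0] > avg_salary]
change_low = [p[2] - p[1] for p in players if p[0] <= avg_salary]
t_stat, df = welch_t(change_high, change_low)

print(f"r (before 2008) = {r_before:.3f}, r (after 2008) = {r_after:.3f}")
print(f"Welch t = {t_stat:.2f} with about {df:.1f} degrees of freedom")
```

A negative t statistic here means the above-average earners lost more RBIs on average than the below-average earners, which is the pattern point 3 describes. Turning the statistic into a p-value would require a t-distribution, for example from SciPy, which is left out to keep the sketch dependency-free.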

Written by Will Koehrsen

Author Will Koehrsen, Data Scientist at Cortex Intel, Data Science Communicator; Visit Will’s Medium page at https://medium.com/@williamkoehrsen for more info


Author Stats

Following   Followers   Follower-to-following ratio   Comments
15          32000       2133                          3

Further Reading

Writing DS Articles Weekly

What I Learned from Writing a Data Science Article Every Week for a Year

Drawing on his own experience of writing a data science article every week for a year, the author argues that slow yet consistent dedication to learning data science is the single most important factor in becoming proficient in the field.

Gaussian Models

Gaussian Mixture Models and Expectation-Maximization (A full explanation)

Taking a Bayesian perspective, this article uses complex mathematics to show how Gaussian Mixture Models and the Expectation-Maximization algorithm might be implemented by data scientists. Both methods are commonly seen in machine learning, so the article offers an interesting, albeit high-level, primer on advanced data science.

DS in Tokyo

Exploring the Tokyo Neighbourhoods: Data-Science in Real Life

This article delves deeply into what being a data scientist in the corporate world means, demonstrating how information must be processed according to the client’s wishes, which often complicates the process. Specifically, Python libraries were used to scrape web data and the Foursquare API was used to explore the major districts of Tokyo, all to meet the requirements of the project.

Visualizing COVID-19

Building COVID-19 interactive dashboard from Jupyter Notebooks

In this article, the author demonstrates the step-by-step process he used to gather, process, plot, and export up-to-date coronavirus case data so that the pandemic could be better understood through visualization.

Wrangling in Python

Cleaning and Preparing Data in Python

Noting that 70-80% of a data scientist’s work is consumed by the process of data wrangling, this article explores best practices for processing data, using Python to demonstrate.