A statistical analysis of baseball involves measuring individual, team, and league performance in two broad categories: offensive and defensive performance. Play-by-play recording captures each event that occurs within a single game between two teams. The statistical summary of each game is recorded in a box score, which reports quantitative measures such as runs scored and the number of hits, strikes, walks, outs, errors, or balls put into play. In addition, a box score records categorical information such as a team’s lineup (the sequential order of its players), the position of each player, the names of the umpires and managers present, and the game’s date, to name a few. The crux of a box score is to record objective data by assigning a logical value (0 for false, 1 for true) to each event according to whether it did or did not occur within a game. For example, if Noah Syndergaard, a starting pitcher for the New York Mets, were to strike out a batter in the first inning of a game, he would be credited with one “out” in the first inning. An aggregation of these scores is the starting point of this report, which aims not only to identify career leaders on certain statistical measures, but also to assemble a team of statistically powerful players who are currently active in the Major Leagues (neither deceased nor retired) and to forecast their possible performance.
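As a minimal sketch of the aggregation idea described above, the Python snippet below sums 0/1 event flags per player. The record layout and the field names (`pitcher`, `inning`, `strikeout`, `walk`) are assumptions made for illustration only; they are not the Retrosheet schema, and the three events are invented.

```python
from collections import Counter

# Hypothetical play-by-play records: each event carries 0/1 flags.
# The field names and the events themselves are illustrative assumptions.
events = [
    {"pitcher": "Noah Syndergaard", "inning": 1, "strikeout": 1, "walk": 0},
    {"pitcher": "Noah Syndergaard", "inning": 1, "strikeout": 0, "walk": 1},
    {"pitcher": "Noah Syndergaard", "inning": 2, "strikeout": 1, "walk": 0},
]


def aggregate(events, key, flags):
    """Sum each 0/1 flag for every distinct value of `key` (e.g. per pitcher)."""
    totals = {}
    for event in events:
        counts = totals.setdefault(event[key], Counter())
        for flag in flags:
            counts[flag] += event[flag]
    return totals


totals = aggregate(events, "pitcher", ["strikeout", "walk"])
print(totals["Noah Syndergaard"]["strikeout"])  # 2
```

Because every flag is either 0 or 1, a plain sum per player is enough to turn individual events into counting statistics.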
In 1985, Bill James pioneered the empirical analysis of baseball known as Sabermetrics in order to accurately measure the player performance that contributes to a team’s win or loss. Unlike Sabermetrics, conventional statistics (both of which I explain later in this report) typically do not factor out variables that cannot logically be traced back to a particular player or to the position he plays. Luck, for example, is not an objective measure that can be easily quantified or logically supported without some subjectivity, yet many conventional statistics still incorporate it. To do away with the subjectivity that misleads one into believing a player is better or worse than he actually is, James and other Sabermetricians derived several improved formulas that portray a player’s performance more accurately. Using these Sabermetrics, this report breaks the categories of offensive and defensive performance down into a player’s ability to bat, pitch, and field balls put into play in order to either score runs or prevent them from being scored.
The database from which I derived play-by-play data is an amalgam of box scores collected from 1952 to 2015, the latest season documented by Retrosheet Inc., an open-source organization dedicated to collecting game accounts, unifying those accounts into a user-friendly system, and recording that system in a computerized format. I restricted my analysis to the years 1976 to 2015, a 40-year span, which I imported from the open-source relational database management software MySQL. In conjunction with the Retrosheet database, I imported certain data sets covering all players, pitchers, and fielders from the Lahman Database, another open-source online archive of summarized baseball statistics, to assist in calculating statistics that would otherwise have been prone to human error.
Also, with the guidance of Joseph Adler’s Baseball Hacks: Tips and Tools for Analyzing and Winning with Statistics, I was able to derive the most up-to-date formulas used to measure batting, fielding, and pitching statistics, in addition to defining variables that are commonly abbreviated in the industry. The Sabermetric statistics used in this report are categorized by batting, pitching, and fielding performance, and will be discussed in that order.
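The restriction of the data to the 1976 to 2015 seasons described above can be sketched as a simple range query. This is only an illustration: an in-memory SQLite database stands in for the MySQL server, and the table `games`, its columns, and the game identifiers are hypothetical rather than the actual Retrosheet layout.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game_id TEXT PRIMARY KEY, season INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    # Made-up identifiers, one per sample season.
    [("G1952", 1952), ("G1976", 1976), ("G1999", 1999), ("G2015", 2015)],
)


def games_in_span(conn, first_season=1976, last_season=2015):
    """Return the game ids whose season falls inside the inclusive span."""
    rows = conn.execute(
        "SELECT game_id FROM games WHERE season BETWEEN ? AND ? ORDER BY season",
        (first_season, last_season),
    )
    return [game_id for (game_id,) in rows]


selected = games_in_span(conn)
print(selected)  # ['G1976', 'G1999', 'G2015']
```

`BETWEEN` is inclusive at both ends, so both the 1976 and the 2015 seasons are kept.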
A conventional statistic like Batting Average is commonly used to explain how often a player gets a hit relative to the number of times he is at bat. This simplistic equation cannot, however, be used to compare the batting average of a seasoned player who has had a large number of at bats with that of a less seasoned player whose number of at bats is significantly smaller. Usually, Batting Averages are reported along with a player’s number of at bats to allow fans to gauge players slightly more effectively, but this pairing does not by itself establish statistical significance. For players to “qualify for titles such as the Highest Batting Average,” Major League Baseball requires that a “batter has on average 3.1 at bats per game,” that is, about “500 at bats” (ADLER, 333).
## 0% 25% 50% 75% 100%
## 0 272 438 546 716
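The batting average calculation and the qualification threshold quoted from Adler can be sketched in Python as follows. The 162-game season length is an assumption (it is the standard modern schedule and gives roughly the “500 at bats” figure), and the two player seasons are invented to show why at bats matter when comparing averages.

```python
GAMES_PER_SEASON = 162       # assumption: standard modern MLB schedule
MIN_AT_BATS_PER_GAME = 3.1   # threshold quoted from Adler


def batting_average(hits, at_bats):
    """Hits divided by at bats; None when the player has no at bats."""
    return hits / at_bats if at_bats else None


def qualifies(at_bats, games=GAMES_PER_SEASON):
    """True when at bats meet the 3.1-per-game minimum over the season."""
    return at_bats >= MIN_AT_BATS_PER_GAME * games


# Made-up seasons as (hits, at bats): a regular starter and a brief call-up.
seasons = {"veteran": (180, 600), "call_up": (4, 10)}

for name, (hits, at_bats) in seasons.items():
    print(name, round(batting_average(hits, at_bats), 3), qualifies(at_bats))
```

In this invented example the call-up has the higher average (.400 versus .300) on only 10 at bats, which is exactly the kind of comparison the minimum at-bat requirement is meant to rule out.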