This analysis uses the MLB data from the Baseball Databank on Kaggle (https://www.kaggle.com/datasets/open-source-sports/baseball-databank). The data sets span many years, but baseball as a sport changes considerably over time. Given how much the game has varied, this research focuses on modern baseball (2000-2015).
Sports news today often talks about “Super Teams”, which here loosely means teams whose payroll sits well over one standard deviation above the league average (a rough working definition, so do not hold me to it). Even though we often hear about these teams, are they guaranteed to perform well, or even to outperform teams with lower payrolls?
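To make the "one standard deviation above average" idea concrete, here is a minimal Python sketch. It is not the code behind this analysis: the function name `flag_super_teams`, the use of the population standard deviation, and the example payrolls are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_super_teams(payrolls, threshold=1.0):
    """Return teams whose payroll is more than `threshold` standard
    deviations above the league mean for one season.

    `payrolls` maps a team name to its total payroll (hypothetical input).
    """
    values = list(payrolls.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std dev; an assumption of this sketch
    if sigma == 0:
        return []  # every team spends the same, so nobody stands out
    return sorted(t for t, p in payrolls.items() if (p - mu) / sigma > threshold)


# Made-up payrolls in millions of dollars, for illustration only.
example = {"AAA": 200, "BBB": 100, "CCC": 95, "DDD": 90, "EEE": 85}
super_teams = flag_super_teams(example)
print(super_teams)
```

Raising `threshold` narrows the definition to only the most extreme spenders.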
(Division champions are marked with a star in the plot above.) From the plot we can see some variation between the divisions with regard to salary and success. In the American League East, the New York Yankees and the Boston Red Sox (the two teams with the highest payrolls) won the division in 11 of the 16 years. The American League West, on the other hand, has seen the top-salary team win only 7 times (the Moneyball Oakland A’s won the division 6 times with the lowest payroll). Some divisions show little difference between team salaries, for example the National League Central, where payrolls show little spread among the division champions. Overall, salaries trend upward, with almost all teams showing a significant increase from 2000 to 2015. Does this general increase in team salaries reflect pay-to-win, or simply inflation over the years?
The plot above shows the average salary by position for modern-era baseball. Average salary rises for essentially every position in MLB. First basemen have the highest average salary, which nearly doubled over the period. Pitchers are a close second, and their average salary also nearly doubled. Catchers, on the other hand, have the lowest average salary (odd, considering they carry one of the highest workloads of the 7 groups).
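The grouping behind a plot like this can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the `(year, position, salary)` row layout, the function name, and the sample rows are hypothetical and not taken from the original analysis.

```python
from collections import defaultdict


def mean_salary_by_position(rows):
    """Average salary for each (year, position) pair.

    `rows` is an iterable of (year, position, salary) tuples, an assumed
    layout for joined salary and fielding data.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (year, pos) -> [salary sum, count]
    for year, position, salary in rows:
        bucket = totals[(year, position)]
        bucket[0] += salary
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up rows, for illustration only.
rows = [
    (2000, "1B", 4_000_000),
    (2000, "1B", 2_000_000),
    (2000, "C", 1_000_000),
    (2015, "1B", 6_000_000),
]
averages = mean_salary_by_position(rows)
print(averages[(2000, "1B")])
```

Plotting one line per position from `averages` would give a chart of the same shape as the one described.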
The plot above shows the 6 main statistics for a batter. The data was filtered to show only batters who averaged >= 3.1 at bats per game (close to the batting-title requirement, which is officially 3.1 plate appearances per team game) or had over 300 plate appearances.
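The filter described above might look like the following Python sketch. It assumes "per game" means the games the player actually appeared in, which the text does not spell out; the function and parameter names are hypothetical.

```python
def is_qualified_batter(at_bats, games, plate_appearances,
                        min_ab_per_game=3.1, min_pa=300):
    """Keep a batter-season if it averages at least `min_ab_per_game`
    at bats per game OR has more than `min_pa` plate appearances.

    Assumption: `games` is the player's own games played, not team games.
    """
    if games <= 0:
        return False  # no games played, nothing to evaluate
    return at_bats / games >= min_ab_per_game or plate_appearances > min_pa


# Made-up season lines, for illustration only.
print(is_qualified_batter(at_bats=500, games=150, plate_appearances=560))
print(is_qualified_batter(at_bats=200, games=100, plate_appearances=250))
```

Because the two conditions are joined with "or", a part-time player with a high at-bat rate still passes, as does a regular with a lower rate but more than 300 plate appearances.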
In general, these statistics show that more tends to mean better. This trend reflects the value of a dependable everyday player who can hit his weight + some heavy coats. Although more is better, the data still holds some interesting details. HRperSeason has a small inflection point around 20 HRs, followed by a more bowed-out slope above 20. HRs also show the starkest increase in salary as the x-value increases. Note that it is rather hard to fit a line to batting average, since hardly any qualifying hitters hit below .200.
The plot above shows the 6 main statistics for a starting pitcher. The data was filtered to show only pitchers who started more than 8 games in a season (relievers sometimes have to fill in and start when a starting pitcher gets injured).
A few things become evident from the plots. Teams value a tenured starting pitcher: there is a roughly linear relationship between total wins and salary. Teams also seem to value a starting pitcher with an ERA below 4 more than one with a higher ERA. Finally, and not surprisingly, there is value in strikeouts. This makes sense: if a pitcher racks up a lot of strikeouts, metrics like WHIP and ERA are likely to go down as well.
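For reference, ERA and WHIP (walks plus hits per inning pitched) can be computed from basic counting stats. The sketch below uses the standard definitions and assumes innings are stored as outs recorded, as the Baseball Databank does with its `IPouts` column; the function names and the sample numbers are made up for illustration.

```python
def innings_pitched(outs_recorded):
    """Convert outs recorded to innings (3 outs per inning)."""
    if outs_recorded <= 0:
        raise ValueError("need at least one out to compute a rate stat")
    return outs_recorded / 3


def era(earned_runs, outs_recorded):
    """Earned run average: earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings_pitched(outs_recorded)


def whip(walks, hits, outs_recorded):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched(outs_recorded)


# Made-up season: 200 innings (600 outs), 80 earned runs, 50 walks, 180 hits.
print(round(era(80, 600), 2), round(whip(50, 180, 600), 2))
```

Both are rate stats where lower is better, which is why the salary relationship runs in the opposite direction from wins and strikeouts.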
The plot above shows the 6 main statistics for a relief pitcher. The data was filtered to show only pitchers who started fewer than 8 games in a season (relievers sometimes have to fill in and start when a starting pitcher gets injured).
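The starter and reliever split used for these plots can be written as a small classifier. This Python sketch follows the cutoffs as stated in the text (more than 8 starts for starters, fewer than 8 for relievers); the function name is hypothetical, and leaving a pitcher with exactly 8 starts unclassified is a literal reading of those two rules rather than a confirmed detail of the original code.

```python
def pitcher_role(games_started, cutoff=8):
    """Classify a pitcher-season by games started.

    Returns "starter" above the cutoff, "reliever" below it, and None when
    games started equals the cutoff (neither rule in the text covers it).
    """
    if games_started > cutoff:
        return "starter"
    if games_started < cutoff:
        return "reliever"
    return None


# Made-up values, for illustration only.
print([pitcher_role(gs) for gs in (32, 2, 8)])
```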
From the plots we can see that relief pitchers share certain similarities with starting pitchers but differ in a few respects. First of all, a relief pitcher’s value is tied more to Saves than to Wins (relief pitchers often come in when the goal is to get a few outs or hold a score steady). Again, there is an almost linear relationship between Saves and salary. A Save essentially means closing out a game, so it works as a complement to a Win. One could say it is called a save because it saves the win for the team. Relief pitchers are also valued at lower ERA and WHIP levels than starting pitchers. Strikeouts have a positive correlation with salary, but not as drastic as for starting pitchers.
| playerID | TotHR | YrT_HR | TotBA | Avg_Salary |
|---|---|---|---|---|
| pujolal01 | 560 | 37.333 | 0.312 | 11936029 |
| rodrial01 | 539 | 35.933 | 0.293 | 24650117 |
| ortizda01 | 493 | 30.812 | 0.285 | 8990156 |
| dunnad01 | 460 | 30.667 | 0.237 | 8668846 |
| soriaal01 | 409 | 27.267 | 0.271 | 11425714 |
| cabremi01 | 408 | 31.385 | 0.321 | 12339279 |
| thomeji01 | 405 | 33.750 | 0.271 | 9535897 |
| konerpa01 | 403 | 28.786 | 0.282 | 8560434 |
| teixema01 | 391 | 27.929 | 0.273 | 14703846 |
| beltrad01 | 391 | 24.438 | 0.287 | 10296875 |
Computing the top 10 HR hitters of the time frame gives a few familiar and expected names, such as Albert Pujols and Alex Rodriguez. Comparing these players’ salaries to the position averages, we can see that they are easily paid above the average. Home Runs are king!!
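A table like the one above can be built by summing each player's seasons and ranking by total home runs. The Python sketch below is an illustration, not the original code. It assumes one input record per player-season with `playerID`, `HR`, `H`, `AB`, and `salary` fields, that `YrT_HR` is total home runs divided by seasons played (the table's values are consistent with this), and that `TotBA` is total hits over total at bats.

```python
from collections import defaultdict


def top_home_run_hitters(seasons, n=10):
    """Aggregate player-seasons and return the top `n` players by total HR."""
    agg = defaultdict(lambda: {"HR": 0, "H": 0, "AB": 0, "salary": 0.0, "n": 0})
    for s in seasons:
        a = agg[s["playerID"]]
        a["HR"] += s["HR"]
        a["H"] += s["H"]
        a["AB"] += s["AB"]
        a["salary"] += s["salary"]
        a["n"] += 1  # one record per season (assumption)

    table = [
        {
            "playerID": pid,
            "TotHR": a["HR"],
            "YrT_HR": round(a["HR"] / a["n"], 3),
            "TotBA": round(a["H"] / a["AB"], 3) if a["AB"] else None,
            "Avg_Salary": round(a["salary"] / a["n"]),
        }
        for pid, a in agg.items()
    ]
    table.sort(key=lambda row: row["TotHR"], reverse=True)
    return table[:n]


# Made-up player-seasons, for illustration only.
seasons = [
    {"playerID": "aaa01", "HR": 40, "H": 180, "AB": 600, "salary": 10_000_000},
    {"playerID": "aaa01", "HR": 30, "H": 150, "AB": 500, "salary": 12_000_000},
    {"playerID": "bbb01", "HR": 20, "H": 100, "AB": 400, "salary": 1_000_000},
]
top = top_home_run_hitters(seasons)
print(top[0])
```

Sorting on `TotBA` or total games instead of `TotHR` would give rankings of the same form by batting average or by games played.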
| playerID | TotBA | Avg_Salary | TotG |
|---|---|---|---|
| bondsba01 | 0.323 | 15790533 | 972 |
| cabremi01 | 0.321 | 12339279 | 1938 |
| guerrvl01 | 0.318 | 10342621 | 1729 |
| heltoto01 | 0.317 | 11625000 | 1901 |
| ramirma02 | 0.317 | 16338147 | 1424 |
| diazma02 | 0.316 | 1426786 | 427 |
| walkela01 | 0.314 | 12496032 | 652 |
| suzukic01 | 0.314 | 10808766 | 2357 |
| mauerjo01 | 0.313 | 12418750 | 1421 |
| ordonma01 | 0.312 | 11070062 | 1525 |
Furthermore, we can see the value of a dependable hitter. Every player on this list but one averages a salary of over 10 million dollars per season, well above the league average. The list also has a few expected names, such as Ichiro Suzuki, Barry Bonds (cheater), and Joe Mauer (one of the best hitting catchers of the era). Maybe Batting Average is king: these salaries appear to be a little higher than those in the top 10 HR table!
| playerID | TotG | Avg_Salary |
|---|---|---|
| suzukic01 | 2357 | 10808766 |
| beltrad01 | 2338 | 10296875 |
| pujolal01 | 2274 | 11936029 |
| hunteto01 | 2230 | 10709688 |
| rolliji01 | 2220 | 6518667 |
| ortizda01 | 2146 | 8990156 |
| beltrca01 | 2136 | 11911424 |
| ramirar01 | 2104 | 9232188 |
| jeterde01 | 2092 | 17212540 |
| rodrial01 | 2077 | 24650117 |
Finally, this table looks at the value of an everyday player. There are a few expected and repeated names on the list, such as Ichiro Suzuki and Albert Pujols. Even though a few players in this table have high average salaries, it appears that home runs and batting average may take precedence over not missing a few games. The value of a sturdy player is still evident.
The plot above shows the payroll breakdown between playoff and non-playoff teams. We can see a steady increase in the payroll of playoff teams throughout the period. More interesting is that only 8-10 of the 30 teams made the playoffs in this era. So even though the bar splits look even, the comparison is not truly even. For example, in 2000 only 8 teams (about 27% of the league) made the postseason, yet they accounted for half of the total payroll. The same applies to 2015, when 10 playoff teams (one third of the league) accounted for over half of the 30-team total payroll.
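The comparison in this paragraph comes down to two shares: the fraction of league payroll held by playoff teams versus the fraction of teams that made the playoffs. Here is a minimal Python sketch of that arithmetic; the function name and the payroll figures are hypothetical and chosen only to mirror the "8 of 30 teams, half the payroll" situation.

```python
def playoff_payroll_share(payrolls, playoff_teams):
    """Return (share of total payroll, share of teams) for playoff teams.

    `payrolls` maps team -> payroll; `playoff_teams` is a collection of
    team names. Teams missing from `payrolls` are ignored.
    """
    in_playoffs = set(playoff_teams) & set(payrolls)
    total = sum(payrolls.values())
    playoff_total = sum(payrolls[t] for t in in_playoffs)
    return playoff_total / total, len(in_playoffs) / len(payrolls)


# Made-up league: 8 playoff teams at 110 (millions), 22 others at 40.
payrolls = {f"P{i}": 110 for i in range(8)}
payrolls.update({f"N{i}": 40 for i in range(22)})
playoff_teams = [f"P{i}" for i in range(8)]

payroll_share, team_share = playoff_payroll_share(payrolls, playoff_teams)
print(round(payroll_share, 3), round(team_share, 3))
```

In this made-up league the playoff teams hold half of the payroll while making up only 8 of 30 teams, so the average playoff team spends 2.75 times as much as the average non-playoff team (110 versus 40).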
In this plot we assess whether paying more necessarily increases a team’s chance of winning the World Series. Note that only 8 teams made the playoffs in the earlier part of the period, while 10 teams made it in the later part. Although the graph shows total payroll increasing for playoff teams, there is no clear indication that paying more leads to winning it all. World Series winners do show a slight increase in salary, but nothing major.
This plot helps visualize the breakdown between playoff and non-playoff teams further. Playoff teams are in green and non-playoff teams in red (a triangle denotes the World Series champion). Relative to the salary distribution, the WS champions never seem to fall below the middle of the pack. World Series champions’ salaries show a slight upward trend since 2010 (though not in 2015). The plot also shows that not all teams with big payrolls make the playoffs. For example, in 2015, 4 of the 6 top-payroll teams missed the playoffs.
I really enjoyed this class! I feel I have learned a lot more about visualizing data and about using R as a whole. The real-life examples in this class are a godsend!