This analysis uses the MLB databases from Kaggle (https://www.kaggle.com/datasets/open-source-sports/baseball-databank). The data sets span many seasons, but baseball as a sport evolves frequently. Given how much the game has changed over time, this research focuses on modern baseball (2000-2015).

1. The Rise of Salaries in Modern Baseball

Rise of Salary by Team

Sports news today often mentions “Super Teams”, teams that spend well above the league average (say, more than one standard deviation above it; do not hold me to this statement). Even though we often hear about these teams, are they guaranteed to perform well or to outperform teams with lower payrolls?

(Division champions are denoted by a star in the plot above.) The plot shows some variation between the divisions in salary and success. In the American League East, the New York Yankees and the Boston Red Sox (the two teams with the highest payrolls) won the division 11 of the 16 years. On the other hand, the top-payroll team in the American League West won the division only 7 times (the Moneyball Oakland A’s won it 6 times with the lowest payroll). Some divisions show little difference between team salaries; the National League Central, for example, has little payroll spread between division champions. Overall, salaries trend upward, with almost all teams showing a substantial increase from 2000 to 2015. Does this general increase in team salaries reflect paying to win, or simply inflation over the years?
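Counts like "the top salary team won it 7 times" come from checking, for every division and season, whether the highest-payroll team was also the champion. The sketch below does that in plain Python. The tuple layout, team codes, and payrolls are hypothetical placeholders (not values from the Kaggle data), and it assumes payrolls and division titles have already been joined into one record per team-season.

```python
from collections import defaultdict

def top_payroll_titles(rows):
    """Count the seasons in which each division's highest-payroll team won the division.

    rows: iterable of (year, division, team, payroll, won_division) tuples.
    This layout is hypothetical; it assumes payrolls and titles are already joined.
    """
    seasons = defaultdict(list)
    for year, division, team, payroll, won in rows:
        seasons[(year, division)].append((payroll, won))

    counts = defaultdict(int)
    for (_, division), teams in seasons.items():
        # Pick the team with the largest payroll in this division-season.
        _, top_won = max(teams, key=lambda t: t[0])
        counts[division] += int(top_won)
    return dict(counts)

# Hypothetical data: two divisions, placeholder team codes, payrolls in millions.
rows = [
    (2000, "East", "T1", 110, True),
    (2000, "East", "T2", 60, False),
    (2001, "East", "T1", 115, False),
    (2001, "East", "T2", 65, True),
    (2000, "West", "T3", 50, True),
    (2000, "West", "T4", 90, False),
]
print(top_payroll_titles(rows))  # {'East': 1, 'West': 0}
```

Divisions where the top spender never wins still appear with a count of 0, which keeps them visible when comparing divisions.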

Rise of Salary by Position

The plot above shows the average salary by position for modern-era baseball. It shows a general increase in average salary for essentially every position in the MLB. First basemen have the highest average salary, which nearly doubled over the period. Pitchers are a close second, and their average salary also nearly doubled. On the other hand, catchers have the lowest average salary (odd, considering they have one of the highest workloads of the 7 groups).
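"Nearly doubled" can be checked as the ratio of a position's average salary in the last season to its average in the first. A minimal sketch, assuming a hypothetical list of (year, position, salary) tuples; in the Kaggle/Lahman data, salaries and fielding positions sit in separate tables that would first need joining on player and year.

```python
from collections import defaultdict
from statistics import mean

def salary_growth_by_position(rows, start, end):
    """Average salary per (position, year), then the end/start ratio per position.

    rows: iterable of (year, position, salary) tuples (hypothetical layout).
    A ratio near 2.0 means the position's average salary roughly doubled.
    """
    by_key = defaultdict(list)
    for year, position, salary in rows:
        by_key[(position, year)].append(salary)
    averages = {key: mean(vals) for key, vals in by_key.items()}

    positions = {pos for pos, _ in averages}
    # Only positions observed in both the start and end season get a ratio.
    return {pos: averages[(pos, end)] / averages[(pos, start)]
            for pos in positions
            if (pos, start) in averages and (pos, end) in averages}

# Hypothetical salaries in dollars.
rows = [
    (2000, "1B", 4_000_000), (2000, "1B", 6_000_000), (2015, "1B", 10_000_000),
    (2000, "C", 2_000_000), (2015, "C", 3_000_000),
]
print(sorted(salary_growth_by_position(rows, 2000, 2015).items()))
# [('1B', 2.0), ('C', 1.5)]
```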

2. What Brings in the Money, A Batter’s Perspective

The plot above shows the 6 main statistics for a batter. Data was cleaned to show only batters who averaged at least 3.1 at bats per game (similar to the batting-title requirement, which is 3.1 plate appearances per team game) or had over 300 plate appearances.
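The cleaning rule can be written as a small filter. This sketch assumes each player-season is a dict keyed like the Lahman batting columns `G`, `AB`, `BB`, `HBP`, `SH`, and `SF`. Plate appearances are approximated as AB + BB + HBP + SH + SF (catcher's interference is not counted), and the per-game rate uses the player's own games, as the text describes; both are assumptions about how the original R code did it.

```python
def plate_appearances(rec):
    """Approximate plate appearances from Lahman-style batting counts.

    Missing or blank counts are treated as 0; catcher's interference is not included.
    """
    return sum(rec.get(col) or 0 for col in ("AB", "BB", "HBP", "SH", "SF"))

def is_qualifying_batter(rec, ab_per_game=3.1, min_pa=300):
    """Keep a season if at bats per game >= ab_per_game OR plate appearances > min_pa."""
    games = rec.get("G") or 0
    ab_rate = (rec.get("AB") or 0) / games if games else 0.0
    return ab_rate >= ab_per_game or plate_appearances(rec) > min_pa

# Hypothetical player-seasons.
regular = {"G": 150, "AB": 550, "BB": 60, "HBP": 5, "SH": 0, "SF": 5}
bench = {"G": 80, "AB": 150, "BB": 15, "HBP": 1, "SH": 2, "SF": 1}
platoon = {"G": 120, "AB": 280, "BB": 30, "HBP": 2, "SH": 0, "SF": 3}
print([is_qualifying_batter(r) for r in (regular, bench, platoon)])  # [True, False, True]
```

The platoon player fails the per-game rate (about 2.3 AB per game) but still qualifies through the 300 plate appearance branch, which is what the "or" in the rule allows.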

In general, these statistics show that higher numbers tend to mean higher salaries. This trend shows the value of a dependable everyday player who can hit his weight plus some heavy coats. Although more is better, there are still some interesting notes to take from the data. There is a small inflection point in HRperSeason around 20 HRs, followed by a more sharply bowed curve above 20. Furthermore, HRs showed the starkest increase in salary as the x-value increased. Note that it is hard to fit a line to batting average, since very few qualifying hitters bat below .200.
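One way to check the bend around 20 HR is to fit separate least-squares lines on either side of it and compare the slopes. This is a swapped-in piecewise check, not necessarily how the trend lines in the plot were drawn, and the (HR per season, salary in millions) points below are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def slopes_around(points, knot=20):
    """Fit x <= knot and x > knot separately; return (slope_low, slope_high)."""
    low = [(x, y) for x, y in points if x <= knot]
    high = [(x, y) for x, y in points if x > knot]
    return ols_slope(*zip(*low)), ols_slope(*zip(*high))

# Hypothetical (HR per season, salary in $ millions) points.
points = [(5, 1), (10, 2), (15, 3), (20, 4), (25, 7), (30, 10), (35, 13)]
low, high = slopes_around(points)
print(round(low, 3), round(high, 3))  # 0.2 0.6
```

The `sxx` term also explains the batting-average caveat: when nearly all qualifying hitters sit in a narrow band above .200, `sxx` is small, so the fitted slope is sensitive to just a few points.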

3. What Brings in the Money, A Pitcher’s Perspective

Starting Pitchers

The plot above shows the 6 main statistics for a starting pitcher. Data was cleaned to show only pitchers who started more than 8 games a season (sometimes relievers have to fill in and start when a starting pitcher gets injured).

  • ERA -> pitcher's Earned Run Average -> 9 * Earned Runs Allowed / Innings Pitched
  • Total Wins -> pitcher's total wins
  • SOperSeason -> average strikeouts per season
  • WIPperSeason -> average WIP (walks plus hits per inning pitched, usually written WHIP) per season -> (# hits allowed + # walks allowed) / Innings Pitched
  • OBAperSeason -> average opponents' batting average per season
  • GSperSeason -> average games started per season
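The ERA and WIP formulas above translate directly into code. One detail worth knowing: the Lahman pitching table records innings as outs (`IPouts`), so innings pitched is `IPouts / 3`. A minimal sketch with a hypothetical pitcher-season:

```python
def innings_pitched(ipouts):
    """Lahman's pitching table stores outs recorded (IPouts); three outs make one inning."""
    return ipouts / 3

def era(er, ipouts):
    """Earned Run Average: 9 * earned runs allowed / innings pitched."""
    return 9 * er / innings_pitched(ipouts)

def whip(h, bb, ipouts):
    """Walks plus hits per inning pitched: (hits + walks) / innings pitched."""
    return (h + bb) / innings_pitched(ipouts)

# Hypothetical season: 600 outs (200 innings), 70 earned runs, 180 hits, 50 walks.
print(round(era(70, 600), 2), round(whip(180, 50, 600), 2))  # 3.15 1.15
```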

From the plots a few things become evident. Teams put value on a tenured starting pitcher: there is a roughly linear correlation between total wins and salary. Also, teams seem to value a starting pitcher with an ERA below 4 more than a pitcher with a higher ERA. Finally, not surprisingly, there is value in strikeouts. This makes sense: a pitcher who records a lot of strikeouts allows fewer balls in play, so metrics like WIP and ERA are likely to go down as well.
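The ERA-below-4 observation amounts to comparing salaries on either side of a cutoff. This sketch uses medians (an assumed choice, so one very large contract does not dominate either group) and hypothetical (ERA, salary in millions) pairs; it is an illustration, not the analysis behind the plot.

```python
from statistics import median

def salary_by_era_band(pitchers, threshold=4.0):
    """Median salary of pitchers with ERA below vs. at or above `threshold`.

    pitchers: iterable of (era, salary) pairs (hypothetical data).
    Raises statistics.StatisticsError if either band is empty.
    """
    below = [sal for e, sal in pitchers if e < threshold]
    above = [sal for e, sal in pitchers if e >= threshold]
    return median(below), median(above)

# Hypothetical starters: (ERA, salary in $ millions).
starters = [(3.1, 15), (3.6, 12), (3.9, 9), (4.2, 6), (4.8, 4), (5.5, 3)]
print(salary_by_era_band(starters))  # (12, 4)
```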

Relief Pitchers

The plot above shows the 6 main statistics for a relief pitcher. Data was cleaned to show only pitchers who started fewer than 8 games a season (sometimes relievers have to fill in and start when a starting pitcher gets injured).

  • ERA -> pitcher's Earned Run Average -> 9 * Earned Runs Allowed / Innings Pitched
  • Total Saves -> pitcher's total saves
  • SOperSeason -> average strikeouts per season
  • WIPperSeason -> average WIP (walks plus hits per inning pitched, usually written WHIP) per season -> (# hits allowed + # walks allowed) / Innings Pitched
  • OBAperSeason -> average opponents' batting average per season
  • BBperSeason -> average walks allowed per season
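Taken literally, the two cleaning rules (more than 8 starts for starters, fewer than 8 for relievers) leave pitchers with exactly 8 starts in neither group. A minimal sketch of the rule as written makes that gap explicit; whether the original analysis used strict or inclusive comparisons is not stated, so the boundary handling here is an assumption.

```python
def pitcher_role(games_started, cutoff=8):
    """Classify a pitcher-season by games started, following the cutoffs in the text.

    More than `cutoff` starts -> "starter"; fewer than `cutoff` -> "reliever".
    Exactly `cutoff` starts satisfies neither rule as written, so None is returned.
    """
    if games_started > cutoff:
        return "starter"
    if games_started < cutoff:
        return "reliever"
    return None

print([pitcher_role(gs) for gs in (32, 0, 8)])  # ['starter', 'reliever', None]
```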

From the plots we can see that relief pitchers share certain similarities with starting pitchers but differ in a few aspects. First of all, a relief pitcher's value is more focused on Saves than Wins (relief pitchers often come in when the goal is to get a few outs or hold a score steady). Again, there is an almost linear relationship between Saves and salary. A save essentially means closing out a game, so it works as a complement to a Win. One could say it is called a save because it saves the win for the team. On the other hand, for relief pitchers the salary premium appears at a lower ERA and WIP than for starting pitchers. Strikeouts are positively correlated with salary, but not as strongly as for starting pitchers.

4. The Leaders of the Pack

Hitters

Top 10 HR Hitters
(TotHR = total home runs; YrT_HR = average home runs per season; TotBA = batting average over the period; Avg_Salary = average annual salary, US dollars)

playerID    TotHR   YrT_HR   TotBA    Avg_Salary
pujolal01     560   37.333   0.312    11,936,029
rodrial01     539   35.933   0.293    24,650,117
ortizda01     493   30.812   0.285     8,990,156
dunnad01      460   30.667   0.237     8,668,846
soriaal01     409   27.267   0.271    11,425,714
cabremi01     408   31.385   0.321    12,339,279
thomeji01     405   33.750   0.271     9,535,897
konerpa01     403   28.786   0.282     8,560,434
teixema01     391   27.929   0.273    14,703,846
beltrad01     391   24.438   0.287    10,296,875

The top 10 HR hitters of the period include a few familiar and expected names, such as Albert Pujols and Alex Rodriguez. Compared with their position averages, these players are easily paid above average. Home runs are king!!
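Tables like this one come from aggregating player-seasons. A minimal Python sketch of that aggregation, assuming each season is a dict with hypothetical `playerID`, `HR`, `H`, `AB`, and `salary` keys (one row per player-season, salary already joined). The original analysis was done in R and its exact steps are not shown, so this only illustrates the idea. YrT_HR is computed as total HR divided by seasons played, which matches the figures in the table.

```python
from collections import defaultdict
from statistics import mean

def top_hr_hitters(seasons, n=10):
    """Return (playerID, TotHR, YrT_HR, TotBA, Avg_Salary) for the n players with the most HR.

    seasons: iterable of dicts with playerID, HR, H, AB and salary keys (hypothetical layout).
    """
    hr, hits, at_bats = defaultdict(int), defaultdict(int), defaultdict(int)
    salaries = defaultdict(list)
    for s in seasons:
        pid = s["playerID"]
        hr[pid] += s["HR"]
        hits[pid] += s["H"]
        at_bats[pid] += s["AB"]
        salaries[pid].append(s["salary"])
    rows = [(pid, hr[pid], hr[pid] / len(salaries[pid]),
             hits[pid] / at_bats[pid], mean(salaries[pid]))
            for pid in hr]
    rows.sort(key=lambda r: r[1], reverse=True)  # most total HR first
    return rows[:n]

# Hypothetical player-seasons.
seasons = [
    {"playerID": "p1", "HR": 40, "H": 150, "AB": 500, "salary": 10_000_000},
    {"playerID": "p1", "HR": 30, "H": 160, "AB": 500, "salary": 12_000_000},
    {"playerID": "p2", "HR": 25, "H": 170, "AB": 520, "salary": 8_000_000},
]
print(top_hr_hitters(seasons, n=1))  # [('p1', 70, 35.0, 0.31, 11000000)]
```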

Top 10 Batting Average
(TotBA = batting average over the period; Avg_Salary = average annual salary, US dollars; TotG = total games played)

playerID    TotBA    Avg_Salary   TotG
bondsba01   0.323    15,790,533    972
cabremi01   0.321    12,339,279   1938
guerrvl01   0.318    10,342,621   1729
heltoto01   0.317    11,625,000   1901
ramirma02   0.317    16,338,147   1424
diazma02    0.316     1,426,786    427
walkela01   0.314    12,496,032    652
suzukic01   0.314    10,808,766   2357
mauerjo01   0.313    12,418,750   1421
ordonma01   0.312    11,070,062   1525

Furthermore, we can see the value in a dependable hitter. Every player on this list except one (diazma02, who also played the fewest games, 427) averages over $10 million in salary per season, well above the league average. This list also has a few expected names, such as Ichiro Suzuki, Barry Bonds (cheater), and Joe Mauer (one of the best hitting catchers of the era). Maybe batting average is king: most of these salaries appear a little higher than those in the top 10 HR table!
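Whether one table's salaries are "higher" depends on the summary used. The short snippet below computes the mean and median of the Avg_Salary columns, copied directly from the Top 10 HR and Top 10 Batting Average tables, using only Python's standard `statistics` module.

```python
from statistics import mean, median

# Avg_Salary columns copied from the two tables above (US dollars).
hr_top10 = [11936029, 24650117, 8990156, 8668846, 11425714,
            12339279, 9535897, 8560434, 14703846, 10296875]
ba_top10 = [15790533, 12339279, 10342621, 11625000, 16338147,
            1426786, 12496032, 10808766, 12418750, 11070062]

for name, salaries in (("Top 10 HR", hr_top10), ("Top 10 BA", ba_top10)):
    print(f"{name}: mean ${mean(salaries):,.0f}, median ${median(salaries):,.0f}")
```

The two summaries disagree. The median is higher for the batting-average table (about $12.0 million vs. about $10.9 million), while the mean is higher for the HR table (about $12.1 million vs. about $11.5 million), because rodrial01's $24.7 million pulls the HR mean up and diazma02's $1.4 million pulls the batting-average mean down.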

Top 10 Games Played
(TotG = total games played; Avg_Salary = average annual salary, US dollars)

playerID    TotG    Avg_Salary
suzukic01   2357    10,808,766
beltrad01   2338    10,296,875
pujolal01   2274    11,936,029
hunteto01   2230    10,709,688
rolliji01   2220     6,518,667
ortizda01   2146     8,990,156
beltrca01   2136    11,911,424
ramirar01   2104     9,232,188
jeterde01   2092    17,212,540
rodrial01   2077    24,650,117

Finally, this table looks at the value of an everyday player. There are a few expected and repeated names on this list, such as Ichiro Suzuki and Albert Pujols. Even though a few players in this table have high average salaries, it appears that home runs and batting average may take precedence over not missing games. Still, the value of a sturdy player is evident.

5. Playoffs? Pay to Win?

The plot above shows the payroll breakdown between playoff and non-playoff teams. We can see a steady increase in the payroll of playoff teams throughout the period. More interesting is that in this era only 8-10 teams made the playoffs (out of 30 teams in total). Therefore, even though the bars look evenly split, the split is not truly even. For example, in 2000 only 8 teams made the postseason, yet they accounted for half the total payroll. The same applies to 2015, when 10 playoff teams accounted for over half of the 30-team total payroll.
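The uneven split can be made concrete with a little arithmetic. If k of N teams make the playoffs and together hold a share s of total payroll, then the average playoff payroll divided by the average non-playoff payroll is (s / k) / ((1 - s) / (N - k)), which simplifies to s(N - k) / ((1 - s)k). A minimal Python version:

```python
def payroll_ratio(playoff_share, playoff_teams, total_teams=30):
    """Average playoff-team payroll divided by average non-playoff-team payroll.

    (s / k) / ((1 - s) / (N - k)) simplifies to s * (N - k) / ((1 - s) * k).
    """
    s, k, n = playoff_share, playoff_teams, total_teams
    return s * (n - k) / ((1 - s) * k)

print(payroll_ratio(0.5, 8))   # 2.75
print(payroll_ratio(0.5, 10))  # 2.0
```

For 2000 (8 playoff teams holding half the payroll), the average playoff team spent about 2.75 times as much as the average non-playoff team. With 10 playoff teams, an exactly even split already gives a ratio of 2.0, so "over half" in 2015 means the average playoff team spent more than twice as much.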

In this plot we assess whether paying more necessarily increases a team's chance of winning the World Series. Note that in the earlier part of the period only 8 teams made the playoffs, while in the later part 10 teams made it. Although this graph shows the increase in total payroll for playoff teams, there is no clear sign that paying more leads to winning it all. There is a slight increase in payroll for World Series-winning teams, but nothing major.

This plot helps visualize the breakdown between playoff and non-playoff teams further. Playoff teams are in green while red corresponds to non-playoff teams (a triangle denotes the World Series champion). Relative to the salary distribution, the WS champions never seem to fall below the middle of the pack. World Series champions' payrolls show a slight upward trend since 2010 (except in 2015). The plot also shows that not all teams with big payrolls make the playoffs. For example, in 2015, 4 of the 6 highest-payroll teams did not make the playoffs.
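The "middle of the pack" and "4 of the 6 top payroll teams" observations both reduce to ranking teams by payroll within a season. A minimal sketch with hypothetical team codes and payrolls (not values from the data):

```python
def payroll_ranking(payrolls):
    """Teams ordered from highest to lowest payroll for one season."""
    return sorted(payrolls, key=payrolls.get, reverse=True)

def champion_payroll_rank(payrolls, champion):
    """1-based payroll rank of the champion (1 = highest payroll)."""
    return payroll_ranking(payrolls).index(champion) + 1

def top_payroll_missed_playoffs(payrolls, playoff_teams, n=6):
    """How many of the n highest-payroll teams did not make the playoffs."""
    return sum(team not in playoff_teams for team in payroll_ranking(payrolls)[:n])

# Hypothetical season: team code -> payroll in $ millions.
season = {"A": 200, "B": 150, "C": 100, "D": 90, "E": 60, "F": 50}
print(champion_payroll_rank(season, "C"))                # 3
print(top_payroll_missed_playoffs(season, {"B", "F"}, n=3))  # 2
```

With 30 teams, a champion rank of 15 or better corresponds to the top half of the payroll distribution.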

THANK YOU!

I really enjoyed this class! I feel I have learned a lot about visualizing data and using R as a whole. The real-life examples in this class are a godsend!