| Metric | mean | sd | min | med | max |
| on_base_percent | 0.329 | 0.03 | 0.27 | 0.325 | 0.458 |
| avg_swing_speed | 71.9 | 2.75 | 63.1 | 71.8 | 78.6 |
| squared_up_contact | 33.4 | 3.50 | 24 | 33 | 46 |
| squared_up_swing | 26.0 | 4.17 | 17.7 | 25.4 | 43.3 |
| avg_swing_length | 7.34 | 0.405 | 6 | 7.4 | 8.2 |
| swords | 17.4 | 8.38 | 1 | 16 | 41 |
| sweet_spot_percent | 34.4 | 3.42 | 26.6 | 34.3 | 43.1 |
| barrel_batted_rate | 9.16 | 4.16 | 0.8 | 8.8 | 26.9 |
| poorlyunder_percent | 25.6 | 4.74 | 16 | 25.6 | 36.6 |
| poorlytopped_percent | 30.2 | 5.37 | 17.3 | 29.9 | 44.6 |
| hard_hit_percent | 41.8 | 7.43 | 19.5 | 41.7 | 61 |
| avg_best_speed | 101 | 2.54 | 94.4 | 100 | 107 |
| avg_hyper_speed | 94.6 | 1.52 | 91.3 | 94.4 | 99.1 |
| whiff_percent | 24.0 | 5.96 | 6.9 | 24.4 | 36.4 |
| swing_percent | 47.9 | 4.69 | 37.2 | 48 | 62.3 |
2024 Baseball Data
Final Project
Introduction
For my analysis, I will examine data from the 2024 MLB season to explore which hitting statistics correlate most strongly with team success. For example, a high team on-base percentage (OBP) may be closely linked to winning more games. I also plan to analyze individual swing characteristics to determine which traits are most associated with strong batting performance. There are many metrics that could measure batting performance, but I will use OBP. For example, players with faster swing speeds might reach base more often, which would show up as a higher OBP.
Ultimately, I want to uncover which swing characteristics are linked to strong hitting metrics, and which hitting metrics best predict a team’s success. These insights could be valuable to both baseball fans and managers looking to build a winning roster. I’m especially interested in this topic because I played baseball growing up, and the sport’s deep integration of data—especially around swing mechanics and player performance—has always fascinated me. As a fan of the game, I want to know the answers.
2024 Player Data
Dataset Summary:
This dataset from Baseball Savant (https://baseballsavant.mlb.com/statcast_leaderboard) contains 2024 MLB season statistics at the individual player level, focusing on batting performance and swing characteristics. It includes both traditional stats (like home runs, strikeouts, and batting average) and advanced metrics (like swing speed, hard-hit percentage, and sweet spot rate). It covers 129 MLB players.
Data Dictionary:
| Column Name | Description |
| last_name, first_name | Player’s full name |
| player_id | Unique identifier for each MLB player |
| year | Season year (2024) |
| ab | At-bats |
| pa | Plate appearances |
| hit | Total hits |
| single | Singles hit |
| double | Doubles hit |
| triple | Triples hit |
| home_run | Home runs hit |
| strikeout | Total strikeouts |
| walk | Total walks (bases on balls) |
| k_percent | Strikeout rate (% of plate appearances ending in strikeout) |
| bb_percent | Walk rate (% of plate appearances ending in walk) |
| batting_avg | Batting average (hits ÷ at-bats) |
| on_base_percent | On-base percentage |
| avg_swing_speed | Average swing speed (mph) |
| squared_up_contact | % of contact that was “squared up” (well-struck balls) |
| squared_up_swing | % of swings that produced squared-up contact |
| avg_swing_length | Average length of a player’s swing (feet) |
| swords | Number of “sword” swings (awkward or defensive swings) |
| sweet_spot_percent | % of batted balls hit in the launch-angle “sweet spot” (8 to 32 degrees) |
| barrel_batted_rate | % of batted balls that were “barrels” (optimal combo of exit velocity and launch angle) |
| poorlyunder_percent | % of batted balls hit under the ball poorly (pop-ups) |
| poorlytopped_percent | % of batted balls that were poorly topped (weak ground balls) |
| hard_hit_percent | % of batted balls with exit velocity ≥95 mph |
| avg_best_speed | Average exit velocity of a player’s hardest-hit 50% of batted balls (mph; Statcast’s EV50) |
| avg_hyper_speed | Adjusted exit velocity: average exit velocity with each batted ball below 88 mph counted as 88 (mph) |
| whiff_percent | % of swings that completely missed the ball |
| swing_percent | % of pitches a player swings at |
Summary Insights from 2024 MLB Swing and Hitting Data
This table presents summary statistics for on-base percentage and various swing-related metrics that may affect OBP for 129 MLB players from the 2024 season. Most hitters had similar average swing speeds (around 72 mph, with a standard deviation under 3 mph), and the top-end speed metrics (avg_best_speed and avg_hyper_speed) were also tightly clustered. On average, players barreled 9% of batted balls and hit 42% at 95 mph or higher, both strong indicators of performance. About one-third of batted balls fell in the sweet spot, while poorly hit grounders and pop-ups remained common. Whiff rates ranged widely (6.9% to 36.4%), showing differences in contact ability. Overall, small differences in swing metrics, especially those tied to contact quality, may help explain differences in offensive success.
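A summary table of this shape can be produced in a few lines of base R. This is only a sketch, not the code behind the report: the data frame `players` and every value in it are invented stand-ins for the Baseball Savant export, and only two of the fifteen metrics are included.

```r
# Toy stand-in for the player-level data; all values are invented for illustration.
players <- data.frame(
  on_base_percent = c(0.310, 0.345, 0.298, 0.372, 0.325),
  avg_swing_speed = c(70.1, 73.4, 68.9, 75.2, 71.8)
)

# Compute mean, sd, min, median, and max for one numeric column.
summarise_metric <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), med = median(x), max = max(x))
}

# Apply to every column, then transpose so each metric becomes one row.
stats <- t(sapply(players, summarise_metric))
summary_tbl <- data.frame(Metric = rownames(stats), round(stats, 3),
                          row.names = NULL)
print(summary_tbl)
```

Keeping the five statistics in one helper function makes it easy to add or drop a statistic for all metrics at once.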
Descriptive Analytics
[1] "Correlation between Swing Speed and OBP: 0.249"
The plot shows a weak positive relationship between swing speed and on-base percentage, with a correlation of 0.249, suggesting that swing speed on its own is only weakly associated with getting on base.
[1] "Correlation between Hard-Hit Percentage and OBP: 0.4"
The plot shows a moderate positive relationship between hard-hit percentage and on-base percentage, with a correlation of 0.400, suggesting that players who make more hard contact tend to get on base more often.
[1] "Correlation between Swing Percentage and OBP: -0.355"
The plot shows a moderate negative relationship between swing percentage and on-base percentage, with a correlation of -0.355, suggesting that players who swing more frequently tend to get on base less often.
[1] "Correlation between Swing Length and OBP: -0.022"
The plot shows almost no relationship between swing length and on-base percentage, with a correlation of -0.022, indicating that swing length has essentially no linear association with a player’s ability to get on base.
[1] "Correlation between Barrel Rate and OBP: 0.444"
The plot shows a moderate positive relationship between barrel rate and on-base percentage, with a correlation of 0.444, suggesting that players who generate more barrels tend to reach base more often.
[1] "Correlation between Poorly Topped Percentage and OBP: -0.227"
The plot shows a weak negative relationship between poorly topped percentage and on-base percentage, with a correlation of -0.227, suggesting that frequent grounders have a modest association with lower OBP.
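The correlations in this section are Pearson correlations between one metric and OBP. A minimal base R sketch of that calculation follows. The data frame `players` and its values are invented for illustration, and the `labels` lookup is a hypothetical helper for formatting the printed messages, so the numbers it prints will not match the ones reported here.

```r
# Toy stand-in for the player-level data; all values are invented for illustration.
players <- data.frame(
  on_base_percent  = c(0.310, 0.345, 0.298, 0.372, 0.325, 0.301),
  avg_swing_speed  = c(70.1, 73.4, 68.9, 75.2, 71.8, 72.5),
  hard_hit_percent = c(38.2, 45.1, 33.0, 50.3, 41.7, 36.4),
  swing_percent    = c(49.5, 44.0, 52.3, 41.2, 47.9, 50.8)
)

# Hypothetical lookup: column name -> label used in the printed message.
labels <- c(
  avg_swing_speed  = "Swing Speed",
  hard_hit_percent = "Hard-Hit Percentage",
  swing_percent    = "Swing Percentage"
)

# Pearson correlation of each metric with OBP, rounded to three decimals.
cors <- sapply(names(labels), function(col) {
  round(cor(players[[col]], players$on_base_percent), 3)
})

for (col in names(labels)) {
  print(paste0("Correlation between ", labels[[col]], " and OBP: ", cors[[col]]))
}
```

Looping over a named vector keeps the six metric comparisons consistent and avoids copying the same `cor()` call for each plot.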
The box plot shows that players with both high hard-hit percentage and low swing percentage tend to have higher on-base percentages (OBP) compared to all other players. This suggests that combining strong contact with selective swinging is associated with better offensive performance.
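The grouping behind a comparison like this can be sketched as follows. The report does not state the cutoffs for "high" and "low", so this sketch assumes a split at the sample median. The data frame `players` and its values are invented.

```r
# Toy stand-in for the player-level data; all values are invented for illustration.
players <- data.frame(
  on_base_percent  = c(0.310, 0.345, 0.298, 0.372, 0.325, 0.301),
  hard_hit_percent = c(38.2, 45.1, 33.0, 50.3, 41.7, 36.4),
  swing_percent    = c(49.5, 44.0, 52.3, 41.2, 47.9, 50.8)
)

# Assumption: "high" and "low" are defined relative to the sample median.
high_hard_hit <- players$hard_hit_percent > median(players$hard_hit_percent)
low_swing     <- players$swing_percent < median(players$swing_percent)

# Players meeting both conditions form one group; everyone else forms the other.
players$group <- ifelse(high_hard_hit & low_swing,
                        "High hard-hit, low swing", "All other players")

# Compare median OBP by group, then draw the box plot.
group_medians <- tapply(players$on_base_percent, players$group, median)
print(group_medians)
boxplot(on_base_percent ~ group, data = players,
        xlab = "", ylab = "On-base percentage")
```

A different cutoff, such as the top quartile, would change group sizes and could change the size of the gap, so the threshold is worth stating next to the plot.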
Secondary Data Source: Baseball Reference
In the previous section, I examined individual swing characteristics to identify which traits are most strongly associated with a high on-base percentage (OBP). It turns out that plate discipline (swinging less often) combined with a high hard-hit percentage is a major factor. Building on that, I now want to explore how closely OBP correlates with winning baseball games compared to other hitting statistics from the same year, 2024.
The data for this analysis was scraped from Baseball-Reference.com, a comprehensive online database that provides detailed statistics, historical records, and player profiles for Major League Baseball (MLB) and other leagues. Fans, analysts, and researchers widely use it for its in-depth data on team performance, individual player stats, game logs, and advanced metrics.
I will scrape data from Baseball Reference containing team hitting statistics from the 2024 MLB season. After importing the data into R, I will manually add a column with each team’s win percentage for that season by going to MLB.com and looking at the teams’ records for the year 2024. My goal is to analyze how various offensive metrics—such as Batting Average (AVG), On-Base Percentage (OBP), Strikeouts (SO), and Home Runs (HR)—correlate with team success. I will use scatter plots to visually explore the relationship between these offensive statistics and win percentage.
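The step of adding win percentage to the team table can be sketched in base R as shown here. Everything in this sketch is hypothetical: the team codes, the statistics, the records, and the column names `Tm`, `W`, and `L` are placeholders, not the scraped data.

```r
# Hypothetical team hitting table; team codes and numbers are invented.
team_hitting <- data.frame(
  Tm  = c("AAA", "BBB", "CCC", "DDD"),
  OBP = c(0.335, 0.301, 0.318, 0.309),
  HR  = c(230, 160, 195, 180)
)

# Records entered by hand (also invented for this sketch).
records <- data.frame(
  Tm = c("AAA", "BBB", "CCC", "DDD"),
  W  = c(98, 62, 85, 79),
  L  = c(64, 100, 77, 83)
)

# Win percentage = wins / (wins + losses).
records$win_pct <- records$W / (records$W + records$L)

# Join on the team code so each row is matched to its own record.
teams <- merge(team_hitting, records[, c("Tm", "win_pct")], by = "Tm")

# Scatter plot and correlation for one offensive statistic.
plot(teams$OBP, teams$win_pct,
     xlab = "On-base percentage", ylab = "Win percentage")
print(round(cor(teams$OBP, teams$win_pct), 3))
```

Joining by team code rather than pasting a column in by position guards against the two tables being sorted differently.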
Data Dictionary:
Analysis
[1] "Correlation between Home Runs and Win Percentage: 0.672"
The correlation between Home Runs and Win Percentage is 0.672, indicating a moderately strong positive relationship. Teams that hit more home runs generally tend to win more games, making home run production a key offensive factor worth tracking.
[1] "Correlation between Batting Average and Win Percentage: 0.617"
The correlation between Batting Average and Win Percentage is 0.617, indicating a moderately strong positive relationship. This suggests that teams with higher batting averages tend to perform better overall, though the impact is slightly less pronounced than with home runs.
[1] "Correlation between On-Base Percentage and Win Percentage: 0.789"
The correlation between On-Base Percentage and Win Percentage is 0.789, indicating a strong positive relationship. This suggests that consistently getting on base is a significant driver of team success, more so than batting average or home runs alone.
[1] "Correlation between the team's batters' ages and Win Percentage: 0.309"
The correlation between a team’s average batter age and Win Percentage is 0.309, indicating a weak positive relationship. Older lineups tended to do slightly better than younger ones in 2024, but age alone is not a strong predictor of team success.
[1] "Correlation between Strikeouts and Win Percentage: -0.305"
The correlation between Strikeouts and Win Percentage is -0.305, indicating a weak negative relationship. Teams that strike out more tend to win slightly less often, but the connection is not particularly strong. This makes sense because a team can strike out often and still hit for power, producing lots of doubles and home runs.
[1] "Correlation between Stolen Bases and Win Percentage: 0.1"
The correlation between Stolen Bases and Win Percentage is 0.1, indicating a very weak positive relationship. This suggests that while stealing bases may offer some advantage, it has minimal overall impact on a team’s win percentage.
[1] "Correlation between Slugging Percentage and Win Percentage: 0.749"
The correlation between Slugging Percentage and Win Percentage is 0.749, indicating a strong positive relationship. This suggests that teams with higher slugging percentages—those generating more extra-base hits—tend to perform significantly better in terms of winning games.
[1] "Correlation between BatAge and Win Percentage: 0.309"
The correlation between BatAge and Win Percentage is 0.309, indicating a weak positive relationship. This is the same batter-age result reported above: teams with slightly older batting lineups performed marginally better, possibly because of greater experience, but age alone is not a strong indicator of winning.
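To compare these results side by side, the reported correlations can be ranked by absolute size. The values in this sketch are copied from the output above, and the vector name `team_cors` is only for illustration.

```r
# Correlations with win percentage, copied from the results reported above.
team_cors <- c(
  HR = 0.672, AVG = 0.617, OBP = 0.789, BatAge = 0.309,
  SO = -0.305, SB = 0.100, SLG = 0.749
)

# Order by absolute size so a strong negative value is not pushed to the bottom.
ranked <- team_cors[order(abs(team_cors), decreasing = TRUE)]
print(ranked)
```

Ranking on the absolute value matters here because strikeouts are negatively related to winning, and a plain sort would place them last regardless of strength.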
2024 Team Data Conclusion:
This analysis highlights several key insights about the offensive factors that contribute to team success in Major League Baseball. Among the offensive statistics examined, On-Base Percentage (OBP) and Slugging Percentage (SLG) showed the strongest positive correlations with winning, indicating that teams that consistently get on base and hit for power are most likely to succeed. Overall though, OBP had the highest correlation with winning.
Home Runs (HR) and Batting Average (AVG) also displayed moderately strong positive relationships with win percentage, reinforcing the importance of both power hitting and consistent contact at the plate. In contrast, Strikeouts (SO) had a weak negative relationship with winning, suggesting that while avoiding strikeouts can help, it is not the sole determinant of success. Similarly, Stolen Bases (SB) showed a very weak positive correlation, implying that while speed can be a small advantage, it is much less critical than power and getting on base.
Average Batter Age (BatAge) had a weak positive relationship with winning percentage, suggesting that experience might provide a slight edge but is not a major driver of team success.
Overall, the data supports the idea that building a strong offensive team in today’s MLB environment depends more on getting on base and producing extra-base hits than on traditional measures like batting average or aggressive base running. Teams aiming to maximize their chances of winning should prioritize players who can consistently reach base and deliver impactful hits.
Overall Findings
In the 2024 MLB season, on-base percentage (OBP) had the strongest correlation with team winning percentage (0.789) of any hitting statistic I examined. Teams that consistently got players on base tended to be the most successful, even more so than teams that simply hit a lot of home runs or had a high batting average.
At the individual player level, several hitting traits were associated with a high OBP. Players with higher hard-hit percentages and barrel rates tended to have better on-base numbers (correlations of 0.400 and 0.444), which points to the importance of consistently making strong contact. In contrast, players who swung more frequently tended to have lower OBPs (-0.355), suggesting that selectivity at the plate, meaning swinging less often but making better contact when doing so, was an important factor. Swing speed showed only a weak relationship with OBP (0.249), while players who combined hard contact with patience reached base more often than everyone else. Overall, the data suggests that strong hitting performance in 2024 was less about swinging hard or often, and more about hitting the ball hard, barreling it consistently, and being disciplined enough to wait for good pitches.