Introduction

Baseball has always been a sport defined by numbers, trends, and the evolving ways players generate offense. To better understand how hitters contribute to run production and overall team success, this analysis focuses on a historical dataset of Major League Baseball hitting performance. By examining key offensive metrics, ranging from traditional counting stats like hits and home runs to modern efficiency measures such as on-base percentage (OBP) and on-base plus slugging (OPS), this project aims to uncover patterns in player performance and highlight the characteristics that separate average hitters from elite offensive contributors. Through data visualization and statistical exploration, the goal is to translate raw numbers into meaningful insights about how hitters impact the game.

Dataset

The analysis draws exclusively from the baseball_hitting.csv dataset, a comprehensive collection of player-level offensive statistics spanning multiple eras of Major League Baseball. Each row represents a single player’s career totals, capturing both foundational metrics, such as games played, at-bats, hits, doubles, triples, home runs, RBIs, walks, and strikeouts, and rate statistics including batting average (AVG), on-base percentage (OBP), slugging percentage (SLG), and on-base plus slugging (OPS). Together, these variables provide a detailed view of a hitter’s production, efficiency, and power. Although the broader database includes pitching data, this project focuses solely on the hitting portion to allow for a deeper, more targeted exploration of offensive performance across MLB history.

Findings

The visualizations that follow explore how offensive performance in Major League Baseball has taken shape across different eras and player profiles. Patterns in the data reveal how hitters generate value through a mix of contact ability, power, plate discipline, and efficiency. The charts highlight both long‑standing trends and modern shifts toward slugging and extra‑base production. Together, the findings offer a clearer understanding of how elite hitters separate themselves from the league and how offensive strategy has evolved throughout MLB history.

import pandas as pd

path = "C:/Users/Kyle/Downloads/Data_Visualization/"
filename = "baseball_hitting.csv"
df = pd.read_csv(path + filename)

Descriptive Statistics

The dataset contains individual offensive statistics for MLB hitters, including traditional metrics (Hits, Doubles, Triples, Home Runs, RBI, Walks) and rate-based performance measures (AVG, OBP, SLG, OPS). Several columns carry nonstandard labels in the file; for example, the triples column appears as "third baseman" in the output below. Descriptive statistics across the 2,500 rows reveal meaningful variation among players. Batting average (AVG) has a mean of about .263, in line with typical league norms, while on-base percentage averages about .332 and slugging percentage about .410. Home run totals show a much wider spread: the mean is roughly 101 and the 75th percentile is 125.5, yet the maximum is 762, indicating a long right tail and a clear separation between power hitters and contact-oriented players. OPS averages about .742 and is missing for 12 rows (2,488 non-null values). The range across extra-base hit categories (2B, 3B, HR) highlights the diversity of hitter profiles within the dataset. These descriptive statistics provide a foundational understanding of the distribution and variability of offensive performance before moving into visualization and deeper analysis.

df.head()
##    Player name position  ...  Slugging Percentage  On-base Plus Slugging
## 0      B Bonds       LF  ...                0.607                  1.051
## 1      H Aaron       RF  ...                0.555                  0.929
## 2       B Ruth       RF  ...                0.690                  1.164
## 3     A Pujols       1B  ...                0.544                  0.918
## 4  A Rodriguez       SS  ...                0.550                  0.930
## 
## [5 rows x 18 columns]
df.columns = df.columns.str.strip()
numeric_df = df.select_dtypes(include=['float64', 'int64'])
desc_stats = numeric_df.describe().T
desc_stats
##                         count         mean  ...       75%        max
## Games                  2500.0  1084.558000  ...  1438.250   3562.000
## At-bat                 2500.0  3714.962000  ...  5105.750  14053.000
## Runs                   2500.0   521.644800  ...   719.250   2295.000
## Hits                   2500.0  1010.865600  ...  1399.250   4256.000
## Double (2B)            2500.0   181.858000  ...   249.000    792.000
## third baseman          2500.0    32.330800  ...    42.250    309.000
## home run               2500.0   100.611600  ...   125.500    762.000
## run batted in          2500.0   494.206400  ...   656.250   2297.000
## a walk                 2500.0   373.038000  ...   486.250   2558.000
## stolen base            2500.0    76.095200  ...    89.000   1406.000
## AVG                    2500.0     0.263320  ...     0.278      0.367
## On-base Percentage     2500.0     0.331582  ...     0.351      0.482
## Slugging Percentage    2500.0     0.409925  ...     0.441      0.690
## On-base Plus Slugging  2488.0     0.741695  ...     0.784      1.164
## 
## [14 rows x 8 columns]
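
To compare spread across columns measured on very different scales (career home runs versus a rate like AVG), one option is the coefficient of variation, the standard deviation divided by the mean. The sketch below is illustrative only: `spread_summary` is a hypothetical helper, and the toy rows are made-up values that reuse the dataset's column names, not real player data.

```python
import pandas as pd


def spread_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count, mean, std, and coefficient of variation per numeric column."""
    numeric = frame.select_dtypes(include="number")
    summary = pd.DataFrame({
        "missing": numeric.isna().sum(),
        "mean": numeric.mean(),
        "std": numeric.std(),
    })
    # Coefficient of variation: unit-free, so counts and rates can be compared.
    summary["cv"] = summary["std"] / summary["mean"]
    return summary.sort_values("cv", ascending=False)


# Hypothetical toy rows (not taken from baseball_hitting.csv).
toy = pd.DataFrame({
    "AVG": [0.250, 0.300, 0.275, 0.260],
    "home run": [10, 400, 150, 40],
    "On-base Plus Slugging": [0.700, 0.950, None, 0.760],
})
toy_summary = spread_summary(toy)
print(toy_summary)
```

Calling `spread_summary(df)` on the loaded data would rank the real columns the same way, which gives a quick check on which statistics vary most relative to their mean.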

Relationship Between Batting Average and Home Runs

This visualization highlights one of baseball’s longest-running debates: whether hitters must sacrifice batting average to generate elite home run power. By plotting batting average against career home run totals, with marker size scaled by career hits, the chart reveals that MLB history is filled with a wide range of offensive profiles. While some sluggers maintain only modest batting averages, the most iconic power hitters, those who sit atop the home run leaderboard, tend to cluster in a surprisingly balanced zone, combining strong contact ability with elite power. This pattern underscores a broader truth about MLB hitting: the greatest offensive threats are rarely one-dimensional. Instead, they blend consistent contact with the ability to change a game in a single swing, shaping how teams evaluate and develop hitters across eras.

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(18,10))

colors = df['home run']
sizes = df['Hits'] / 5

plt.scatter(df['AVG'], df['home run'], 
            c=colors, s=sizes, cmap='viridis', alpha=0.7)

plt.title('Relationship Between Batting Average and Home Runs', fontsize=18)
plt.xlabel('Batting Average (AVG)', fontsize=14)
plt.ylabel('Home Runs (Career Total)', fontsize=14)

cbar = plt.colorbar()
cbar.set_label('Home Runs (Career Total)', rotation=270, fontsize=14, labelpad=30)

top5 = df.nlargest(5, 'home run')
for i, row in top5.iterrows():
    plt.text(row['AVG'] + 0.002, row['home run'] + 5, row['Player name'], fontsize=12)

my_x_ticks = [round(x,3) for x in list(np.arange(df['AVG'].min(), df['AVG'].max()+0.01, 0.02))]
plt.xticks(my_x_ticks, fontsize=12)
my_y_ticks = list(range(0, int(df['home run'].max())+50, 50))
plt.yticks(my_y_ticks, fontsize=12)
plt.grid(True, linestyle='--', alpha=0.3)

plt.show()
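
A scatter plot shows the shape of the relationship, and a correlation coefficient can put a number on it. The following is a minimal sketch, assuming the `AVG` and `home run` column names used above; `avg_hr_correlation` is a hypothetical helper and the toy values are invented for illustration. Spearman's coefficient is computed here as the Pearson correlation of the ranks, which makes it less sensitive to the long right tail in home run totals.

```python
import pandas as pd


def avg_hr_correlation(frame, avg_col="AVG", hr_col="home run"):
    """Pearson and rank-based (Spearman) correlation between AVG and home runs."""
    pair = frame[[avg_col, hr_col]].dropna()
    pearson = pair[avg_col].corr(pair[hr_col])
    # Spearman's rho equals the Pearson correlation of the ranked values.
    spearman = pair[avg_col].rank().corr(pair[hr_col].rank())
    return {"pearson": pearson, "spearman": spearman, "n": len(pair)}


# Hypothetical toy rows (not taken from baseball_hitting.csv).
toy = pd.DataFrame({
    "AVG": [0.240, 0.255, 0.270, 0.285, 0.300],
    "home run": [20, 60, 45, 200, 310],
})
result = avg_hr_correlation(toy)
print(result)
```

A value near zero on the real data would suggest little systematic trade-off between average and power, while a clearly negative value would support the "sacrifice" view.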

Top 10 Hitters by OPS

OPS is one of the most complete measures of offensive dominance, and this visualization makes clear just how far above the league norm the greatest hitters have performed. By comparing the top ten players in the dataset to a league-average OPS, the chart illustrates the extraordinary gap between everyday hitters and the sport’s all-time offensive legends. The red dashed line marks an approximate league average of .750, a fixed reference value rather than a figure computed from the dataset. These players didn’t just excel in one area; they consistently reached base and hit for power at levels unmatched by their peers. Although league averages fluctuate with changes in pitching, ballparks, and strategy, the very best hitters transcend their eras. Their OPS values tower over the baseline, reminding us that true offensive greatness is both rare and historically significant.

import matplotlib.pyplot as plt
import numpy as np

d = df.sort_values('On-base Plus Slugging', ascending=False).head(10).copy()

colors = plt.cm.Blues(np.linspace(0.4, 1, len(d)))

plt.figure(figsize=(18, 10))
bars = plt.barh(d['Player name'], d['On-base Plus Slugging'],
                color=colors, edgecolor='black')

plt.gca().invert_yaxis()

for bar in bars:
    width = bar.get_width()
    plt.text(width + 0.005,
             bar.get_y() + bar.get_height()/2,
             f"{width:.3f}",
             va='center',
             fontsize=12,
             fontweight='bold')
    
league_avg_ops = 0.750
plt.axvline(league_avg_ops, color='red', linestyle='dashed', linewidth=2)
plt.text(league_avg_ops + 0.005, -0.5,
         "League Avg OPS (~.750)",
         color='red',
         fontsize=12)

plt.title("Top 10 Hitters by On-base Plus Slugging\nA Complete Measure of Offensive Dominance",
          fontsize=20, fontweight='bold')
plt.xlabel("On-base Plus Slugging (OPS)", fontsize=16)
plt.ylabel("Player", fontsize=16)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.tight_layout()
plt.show()
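
The reference line above uses a hard-coded value of 0.750. As an alternative, a baseline can be derived from the data itself. The sketch below assumes the `Player name` and `On-base Plus Slugging` columns shown earlier; `ops_baseline` and `gap_over_baseline` are hypothetical helpers, and the toy players are invented. Note that an unweighted mean of career OPS across the players in a file is not the same thing as a true league average, so either number is best read as a rough reference.

```python
import pandas as pd

OPS_COL = "On-base Plus Slugging"


def ops_baseline(frame, ops_col=OPS_COL):
    """Mean and median OPS over rows where OPS is present."""
    ops = frame[ops_col].dropna()
    return {"mean": ops.mean(), "median": ops.median(), "n": int(ops.count())}


def gap_over_baseline(frame, baseline, ops_col=OPS_COL, n=10):
    """Top-n hitters by OPS with their margin over the chosen baseline."""
    top = frame.nlargest(n, ops_col)[["Player name", ops_col]].copy()
    top["gap"] = top[ops_col] - baseline
    return top


# Hypothetical toy rows (not taken from baseball_hitting.csv).
toy = pd.DataFrame({
    "Player name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    OPS_COL: [1.000, 0.800, 0.700, None, 0.900],
})
baseline = ops_baseline(toy)
top = gap_over_baseline(toy, baseline["mean"], n=2)
print(baseline)
print(top)
```

On the real data, `ops_baseline(df)["mean"]` should match the mean of about 0.742 reported in the descriptive statistics, which could replace the fixed 0.750 in `plt.axvline` if a data-derived line is preferred.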

Do Power Hitters Sacrifice Batting Average?

This chart deepens the exploration of power versus contact by adding OPS as a color dimension, revealing how overall offensive value fits into the relationship between home runs and batting average. The visualization shows that while some high-home-run hitters do carry lower batting averages, the most complete offensive players, those with elite OPS, tend to excel in both categories. The color gradient highlights how OPS acts as a bridge between raw power and consistent on-base ability. Across MLB history, the hitters who shaped offensive eras were not simply home run machines; they were multidimensional threats who combined patience, power, and contact. This reinforces a key theme in baseball analytics: OPS captures more of a hitter’s offensive impact than batting average or home runs alone.

import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(14, 10))
scatter = plt.scatter(df['home run'],
                      df['AVG'],
                      c=df['On-base Plus Slugging'],
                      cmap='coolwarm',
                      s=120,
                      edgecolor='black')
cbar = plt.colorbar(scatter)
cbar.set_label('On-base Plus Slugging (OPS)', fontsize=14)
plt.xlabel('Home Runs (Career Total)', fontsize=16)
plt.ylabel('Batting Average (AVG)', fontsize=16)
plt.title('Do Power Hitters Sacrifice Batting Average?\nRelationship Between HR and AVG',
          fontsize=20, fontweight='bold')
top_players = df.sort_values('home run', ascending=False).head(3)
for _, row in top_players.iterrows():
    plt.text(row['home run'] + 1,
             row['AVG'],
             row['Player name'],
             fontsize=12,
             fontweight='bold')
plt.grid(alpha=0.3)
plt.show()
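
One way to test the "sacrifice" question directly is to group hitters into home run tiers and compare the mean batting average in each tier. This is only a sketch: `avg_by_hr_tier` is a hypothetical helper, the tier boundaries (100 and 300 career home runs) are arbitrary assumptions chosen for illustration, and the toy rows are invented.

```python
import pandas as pd


def avg_by_hr_tier(frame, bins=(0, 100, 300, 800),
                   labels=("under 100", "100-299", "300+")):
    """Count of hitters and mean AVG within each career home run tier."""
    # right=False makes each tier include its lower edge: [0, 100), [100, 300), ...
    tiers = pd.cut(frame["home run"], bins=list(bins),
                   labels=list(labels), right=False)
    return frame.groupby(tiers, observed=True)["AVG"].agg(["count", "mean"])


# Hypothetical toy rows (not taken from baseball_hitting.csv).
toy = pd.DataFrame({
    "home run": [20, 80, 150, 250, 400, 700],
    "AVG": [0.250, 0.270, 0.260, 0.280, 0.290, 0.300],
})
tier_table = avg_by_hr_tier(toy)
print(tier_table)
```

If the mean AVG on the real data does not fall as the tier rises, that would be consistent with the chart's reading that elite power hitters do not, on the whole, give up batting average.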

How Elite Hitters Build Their OPS

Breaking OPS into on-base percentage and slugging percentage reveals the diverse ways hitters generate offensive value. Because OPS is the sum of OBP and SLG, each stacked bar’s total height equals the player’s OPS, and the split between the two segments shows where that value comes from. Some players built their success on exceptional plate discipline and the ability to reach base, while others relied more heavily on raw power. The variety of profiles across eras reflects the shifting nature of MLB offense. Earlier generations often emphasized contact and patience, whereas modern baseball leans more heavily toward slugging and extra-base hits. Yet across all eras, elite hitters find a way to excel in at least one of these areas, and the most iconic names stand out for their strength in both. The visualization captures the wide range of offensive identities that have shaped MLB history and highlights how different skill sets can lead to sustained excellence.

import matplotlib.pyplot as plt
import numpy as np

d = df.sort_values('On-base Plus Slugging', ascending=False).head(20).copy()
plt.figure(figsize=(18, 10))
plt.bar(d['Player name'], d['On-base Percentage'],
        label='On-base % (OBP)', color='blue', edgecolor='black')
plt.bar(d['Player name'], d['Slugging Percentage'],
        bottom=d['On-base Percentage'],
        label='Slugging % (SLG)', color='red', edgecolor='black')
plt.xticks(rotation=45, ha='right', fontsize=14)
plt.ylabel('OPS Components', fontsize=16)
plt.title('How Elite Hitters Build Their OPS\nOBP vs SLG Contribution',
          fontsize=20, fontweight='bold')
plt.legend(fontsize=14)
plt.tight_layout()
plt.show()
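
Since OPS = OBP + SLG, the share of each hitter's OPS that comes from reaching base can be computed directly, which makes the stacked bars easier to compare across players. The sketch below assumes the column names used above; `ops_components` is a hypothetical helper and the toy players are invented.

```python
import pandas as pd


def ops_components(frame):
    """Rebuild OPS from its parts and report the fraction contributed by OBP."""
    out = frame[["Player name", "On-base Percentage", "Slugging Percentage"]].copy()
    out["ops_from_parts"] = out["On-base Percentage"] + out["Slugging Percentage"]
    out["obp_share"] = out["On-base Percentage"] / out["ops_from_parts"]
    return out


# Hypothetical toy rows (not taken from baseball_hitting.csv).
toy = pd.DataFrame({
    "Player name": ["Patient Hitter", "Slugger"],
    "On-base Percentage": [0.350, 0.400],
    "Slugging Percentage": [0.350, 0.600],
})
parts = ops_components(toy)
print(parts)
```

A higher `obp_share` points to a profile built on reaching base, and a lower one to a profile driven by slugging. Comparing `ops_from_parts` with the file's own OPS column is also a quick consistency check, allowing for rounding.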

Skill Profile Comparison Among Top Hitters

The heatmap provides a comprehensive look at how legendary hitters performed across multiple offensive metrics, offering a side-by-side comparison of their strengths. Each metric is min-max scaled across the 15 players shown, so the colors compare these hitters with one another rather than with the full dataset, while the cell annotations show the raw values. Normalized values reveal distinct identities: some players shine as high-average contact hitters, others as power-driven sluggers, and a select few as complete offensive forces who excel in every category. MLB has always been defined by a mix of styles, and the heatmap makes those contrasts easy to see. The variety of strengths across players and eras highlights how offensive excellence can emerge in many forms. Whether through patience, power, or consistent contact, each hitter left a unique imprint on the league’s offensive landscape and contributed to the evolution of hitting philosophy.

import seaborn as sns
import matplotlib.pyplot as plt

df.columns = df.columns.str.strip()
metrics = ['AVG', 'On-base Percentage', 'Slugging Percentage', 'home run', 'On-base Plus Slugging']
hm_df = df[['Player name'] + metrics].dropna(subset=['On-base Plus Slugging']).set_index('Player name')
top_n = 15
hm_df = hm_df.sort_values('On-base Plus Slugging', ascending=False).head(top_n)
hm_norm = (hm_df - hm_df.min()) / (hm_df.max() - hm_df.min())
annot_vals = hm_df.round(3).values

plt.figure(figsize=(14, 8))
sns.heatmap(
    hm_norm,
    cmap='coolwarm',
    linewidths=0.4,
    annot=annot_vals,
    fmt=".3f",
    annot_kws={'size': 9},
    cbar_kws={'label': 'Normalized Performance (0–1)'}
)
plt.title("Top Hitters by OPS – Skill Profile Heatmap", fontsize=18, pad=16)
plt.xlabel("Offensive Metric", fontsize=14)
plt.ylabel("Player", fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=10)
plt.tight_layout()
plt.show()
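
The normalization used for the heatmap colors is min-max scaling, applied column by column: (x - min) / (max - min). A small worked example may make the scale easier to interpret. `min_max_normalize` is a hypothetical helper that mirrors the `hm_norm` expression above, and the toy values are invented.

```python
import pandas as pd


def min_max_normalize(frame):
    """Scale each column to [0, 1] using that column's own minimum and maximum."""
    # A column with identical values has zero range and comes out as NaN.
    return (frame - frame.min()) / (frame.max() - frame.min())


# Hypothetical toy rows (not taken from baseball_hitting.csv).
toy = pd.DataFrame(
    {"AVG": [0.300, 0.320, 0.340], "home run": [200, 500, 350]},
    index=["Player A", "Player B", "Player C"],
)
toy_norm = min_max_normalize(toy)
print(toy_norm)
```

In each column, 0 marks the lowest value and 1 the highest among the rows passed in. A 0 therefore means "lowest within this group", not a weak hitter in absolute terms.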

Conclusion

The visualizations work together to reveal a clear narrative about how MLB hitters have shaped the sport across generations. Patterns in the data show that offensive excellence emerges from a combination of power, contact ability, and plate discipline, and the most influential players consistently excel in more than one area. The charts highlight how offensive styles shift with changes in strategy, training, and league environment, yet the core traits that define great hitters remain remarkably stable. The story told by the data is one of evolution layered on top of continuity, where new approaches emerge but the foundations of hitting success endure.

The comparisons across eras also show how different offensive identities can lead to similar levels of impact. Some players dominate through patience and on‑base ability, others through raw power, and a select few through a rare blend of both. Those contrasts illustrate how MLB has always been shaped by a diverse mix of hitting philosophies. By examining these relationships visually, the analysis uncovers how individual performances connect to broader trends, offering a deeper understanding of MLB’s offensive history and the forces that continue to influence the modern game.