Introduction
1.1 Brief Summary of Project Proposal
I am currently doing a part-time internship with Archbishop Moeller High
School’s baseball team, where I am part of the data analytics team.
Other interns and I attend games and practices to collect pitching and
hitting data. The dataset I will use contains all the pitching data we
have collected this year and is called “PreseasonPens.csv”. It has 17
columns and 4,741 recorded pitches. There are few missing values because
we are tasked with cleaning the sheet every week. In my final project I
will investigate how successful hitters are in different ball-strike
counts. Below I explain the variables and their values in this CSV.
1.2 Dataset Name
- “PreseasonPens.csv” – contains all pitch data collected this year.
Dataset Structure
2.1 Rows
- Each row represents a single pitch.
2.2 Columns
- A total of 17 variables are recorded for each pitch.
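As a quick sanity check on this structure, a short Python sketch can count rows, columns, and blank cells in the CSV. It uses only the standard library and assumes the file has a single header row; the function name summarize_pitches is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def summarize_pitches(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (row count, column count, blank cells per column).

    `lines` can be an open file or any iterable of CSV lines; the first
    line is assumed to be the header.
    """
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []  # reads the header row
    missing: Counter = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        for col in columns:
            value = row.get(col)
            # Short rows yield None; empty fields yield "".
            if value is None or value.strip() == "":
                missing[col] += 1
    return n_rows, len(columns), missing
```

Passing the open file (with open("PreseasonPens.csv", newline="") as f: summarize_pitches(f)) should give 4,741 rows and 17 columns for the file described above, and the counter shows which columns still have gaps after the weekly cleaning.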
Variable Details
3.1 Player Information
- pitcher: Pitcher’s name (first and last; the first name is
occasionally missing).
- throws: Pitcher’s throwing hand (“R” for right-handed, “L” for
left-handed).
- batter: Batter’s name (first and last; the first name is occasionally
missing).
- bats: Batter’s hitting side (“R” for right-handed, “L” for
left-handed).
3.2 Pitch Details
- pitch_result: Outcome of the pitch (ball, called strike, swinging
strike, foul, or ball in play).
- pitch_type: Type of pitch (fastball, sinker, curveball, slider,
changeup, or splitter).
- pitch_velocity: Speed of the pitch in MPH. A fast pitch is anything
above 85 MPH, and a slow pitch is anything below 70 MPH; pitches below
70 MPH are typically breaking balls.
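The velocity thresholds above can be expressed as a small bucketing function. This is only a sketch: the name velocity_category and the "medium" label for the 70–85 MPH range are hypothetical, since the description names only fast and slow pitches.

```python
def velocity_category(mph: float) -> str:
    """Bucket a pitch_velocity value (MPH) using the thresholds above."""
    if mph > 85:
        return "fast"
    if mph < 70:
        return "slow"  # typically a breaking ball
    # Hypothetical label: the notes do not name the 70-85 MPH range.
    return "medium"
```

Both boundaries are exclusive, matching "above 85" and "below 70", so exactly 70 or 85 MPH falls in the middle bucket.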
3.3 Count & Impact Metrics
- ball: Number of balls in the count when the pitch was thrown. (We
record this to see how well hitters and pitchers perform in specific
count situations.)
- strikes: Number of strikes in the count when the pitch was thrown.
- exit velocity: Speed at which the ball leaves the bat, in MPH
(occasionally missing for foul balls).
- launch_angle: Vertical angle of the batted ball in degrees (high
positive values for fly balls, low positive values for line drives,
negative values for ground balls).
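The launch_angle rule of thumb can be sketched as a simple classifier. The notes give no numeric boundary between "low" and "high" positive angles, so the 25-degree cutoff (LINE_DRIVE_MAX_DEG) is a hypothetical value to tune against the charted data, as is treating exactly 0 degrees as a line drive.

```python
# Hypothetical cutoff (degrees) between "low positive" line-drive angles
# and "high positive" fly-ball angles; the dataset notes give no number.
LINE_DRIVE_MAX_DEG = 25.0


def batted_ball_type(launch_angle: float,
                     line_drive_max: float = LINE_DRIVE_MAX_DEG) -> str:
    """Classify a batted ball from its launch angle in degrees."""
    if launch_angle < 0:
        return "ground ball"
    if launch_angle <= line_drive_max:
        return "line drive"  # includes 0 degrees (an assumption)
    return "fly ball"
```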
3.4 Play Outcome
- play_result: Final outcome of the at bat (e.g., groundout, flyout,
line out, single, double, triple, home run, walk, called strikeout,
swinging strikeout).
- bip_position: Fielding position the ball in play was hit to (one of 9
positions: P (Pitcher), C (Catcher), 1B (First Baseman), 2B (Second
Baseman), 3B (Third Baseman), SS (Shortstop), LF (Left Fielder), CF
(Center Fielder), RF (Right Fielder)).
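Since the project asks how successful hitters are in different counts, a sketch of the core tally may help. For each at bat that ended, it takes the count recorded on the final pitch and computes the share of those at bats that ended in a hit. Several details are assumptions about the file rather than facts from it: the helper names load_rows and hit_rate_by_count, the exact column headers "ball", "strikes", and "play_result", the outcome spellings in HITS, and the idea that play_result is blank on pitches that did not end the at bat.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Assumed spellings of hit outcomes in play_result (from the list above).
HITS = {"single", "double", "triple", "home run"}


def load_rows(path: str) -> List[dict]:
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def hit_rate_by_count(
    rows: Iterable[Mapping[str, str]],
) -> Dict[Tuple[int, int], float]:
    """Share of at bats ending in a hit, keyed by the (balls, strikes)
    count in effect when the final pitch was thrown."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    hits: Dict[Tuple[int, int], int] = defaultdict(int)
    for row in rows:
        result = (row.get("play_result") or "").strip().lower()
        if not result:
            continue  # assumption: blank means the at bat continued
        # float() first in case counts were exported as "2.0"
        count = (int(float(row["ball"])), int(float(row["strikes"])))
        totals[count] += 1
        if result in HITS:
            hits[count] += 1
    return {count: hits[count] / totals[count] for count in sorted(totals)}
```

For example, hit_rate_by_count(load_rows("PreseasonPens.csv")) would map each (balls, strikes) pair to a rate between 0 and 1. Walks and strikeouts stay in the denominator here; dropping walks would give a figure closer to a conventional batting average.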
3.5 Additional Information
- charter_name: Name of the intern who charted the pitch.
- attack_zone: Numeric code for the pitch location relative to the
strike zone:
  - 1–9: Heart of the strike zone (always a strike).
  - 11–19: Shadow of the strike zone (could be a strike).
  - 21–29: Just outside the strike zone (likely a ball).
  - 31–39: Far outside the strike zone (a wild pitch).
  Note: Zones are also segmented by column: left (1, 4, 7), middle
  (2, 5, 8), and right (3, 6, 9).
- date: Date and time when the pitch was recorded.
- email: Always the constant value test@gmail.com. It has nothing to do
with the pitch result and can be ignored.
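The attack_zone codes can be decoded arithmetically: the tens digit gives the band and the ones digit the cell. This sketch assumes the ones digit indexes a 3x3 grid numbered three cells per row, so that the left/middle/right split (1, 4, 7 / 2, 5, 8 / 3, 6, 9) applies to every band, not only the heart; that reading, and the name describe_zone, are assumptions.

```python
from typing import Tuple

# Band is the tens digit of the code, per the ranges listed above.
BANDS = {0: "heart", 1: "shadow", 2: "near", 3: "wild"}
# Assumption: the ones digit (1-9) indexes a 3x3 grid read row by row,
# so 1/4/7 form the left column, 2/5/8 the middle, and 3/6/9 the right.
COLUMNS = {0: "left", 1: "middle", 2: "right"}


def describe_zone(code: int) -> Tuple[str, str]:
    """Split an attack_zone code into (band, column)."""
    band, cell = divmod(code, 10)
    if band not in BANDS or not 1 <= cell <= 9:
        raise ValueError(f"unrecognized attack_zone code: {code}")
    return BANDS[band], COLUMNS[(cell - 1) % 3]
```

Codes such as 10, 20, or 40 fall outside the listed ranges, so the function rejects them rather than guessing.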