July 30, 2015

Introduction

The data set is a data frame with 15,307 observations on 12 variables; the variables used in this project are described below. Each observation is a single pitch.

About Verlander:

  • Height 6'5"
  • Born February 20, 1983
  • He dropped out of college
  • He played for the Detroit Tigers
  • He played pitcher

Data:

  • pitch_type
    • Type of pitch thrown:
      • CH (Change-up)
      • CU (Curve ball)
      • FF (Four-Seam Fastball)
      • FT (Two-Seam Fastball)
      • SL (Slider)

  • batter_hand
    • A factor with two values
      • L (left)
      • R (right)
  • speed
    • speed of pitch (in mph).
  • px
    • x-coordinate of pitch (in feet, measured from center of plate)
  • pz
    • vertical coordinate of pitch (in feet above plate)

  • pfx_x
    • the horizontal movement, in inches, of the pitch between the release point and home plate, as compared to a theoretical pitch thrown at the same speed with no spin-induced movement.
    • Measured at 40 feet from home plate.
  • pfx_z
    • the vertical movement, in inches, of the pitch between the release point and home plate, as compared to a theoretical pitch thrown at the same speed with no spin-induced movement.
    • Measured at 40 feet from home plate.

In this project:

  • How does the PITCHf/x program use different variables to determine pitch type?
    • Does batter hand affect pitch type?
    • Does the strike count affect pitch type?

Methods

Does batter hand affect pitch type?

  • Because both are factor variables, we use cross tables, row percentages, and bar charts.
  • Cross Tables: show the relationship between two factor variables by counting the observations in each combination of levels.
  • Row Percents: take the counts from the cross table and convert each row to percentages, so the rows can be compared directly.
    • explanatory variable = batter hand
    • response variable = pitch type
  • Bar Charts: provide a quick graphical exploration of factor variables.

Does the strike count affect pitch type?

  • Because we have both a numerical variable (strike count) and a factor variable (pitch type), we will use favorite statistics and histograms.
  • Favorite Statistics: to study the relationship between a numerical and a factor variable, break the data into groups by the factor and compare the summary statistics of each group. Large differences between groups suggest that the variables are related.
  • Histogram: used to study the distribution of a numerical variable.
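
The group-by-group comparison described above can be sketched in a few lines. The output in this report appears to come from R; the Python below is only an illustrative sketch, and the (pitch type, strikes) pairs and the helper name `summarize_by_group` are invented for the example, not taken from the Verlander data.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (pitch_type, strikes) pairs, invented for illustration only.
pitches = [
    ("FF", 0), ("FF", 1), ("FF", 0), ("FF", 2),
    ("CU", 2), ("CU", 1), ("CU", 2),
    ("CH", 0), ("CH", 1),
]

def summarize_by_group(rows):
    """Group the numerical values by factor level and summarize each group."""
    groups = defaultdict(list)
    for pitch_type, strikes in rows:
        groups[pitch_type].append(strikes)
    return {
        pitch_type: {
            "n": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
            "mean": mean(values),
        }
        for pitch_type, values in groups.items()
    }

summary = summarize_by_group(pitches)
for pitch_type, stats in summary.items():
    print(pitch_type, stats)
```

If the group means or medians differ noticeably, as they do between "FF" and "CU" in this toy data, that is the kind of difference the method looks for.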

How does the PITCHf/x program determine pitch type?

  • Because we want to relate pitch type (a factor variable) to several numerical variables at once, we will use cloud plots and K-Nearest Neighbor.
  • Cloud Plot: a three-dimensional scatter plot of variables from the data set.
    • allows three variables to be displayed together
  • K-Nearest Neighbor: calculates the distance from a new point to the points already in the data and predicts its pitch type from the closest ones.

Results

Does batter hand affect pitch type?

Here's a cross table of pitch type and batter hand.

##            pitch_type
## batter_hand   CH   CU   FF   FT   SL
##           L 2024 1529 3832 1303  178
##           R  526 1187 2924  718 1086

Here's the cross table data in percentage form.

##            pitch_type
## batter_hand     CH     CU     FF     FT     SL  Total
##           L  22.83  17.25  43.22  14.70   2.01 100.00
##           R   8.17  18.43  45.40  11.15  16.86 100.00
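
As a check on how the percentages follow from the counts, here is a minimal Python sketch (the report's own output appears to come from R, so this is not the code that produced the tables). The counts are copied from the first cross table; the function name `row_percents` is an assumption made for the example.

```python
# Counts copied from the cross table above
# (rows: batter hand, columns: pitch type).
pitch_types = ["CH", "CU", "FF", "FT", "SL"]
counts = {
    "L": [2024, 1529, 3832, 1303, 178],
    "R": [526, 1187, 2924, 718, 1086],
}

def row_percents(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for hand, row in table.items():
        total = sum(row)
        result[hand] = [round(100 * n / total, 2) for n in row]
    return result

percents = row_percents(counts)
for hand, row in percents.items():
    print(hand, dict(zip(pitch_types, row)))
```

Each row is divided by its own total, so the two rows can be compared even though Verlander faced more left-handed than right-handed batters.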

Here's a bar chart showing the data.

Does the strike count affect pitch type?

Here are summary statistics of the strike count for each pitch type.

##   pitch_type min Q1 median Q3 max      mean        sd    n missing
## 1         CH   0  0      1  2   2 0.8250980 0.8304696 2550       0
## 2         CU   0  1      2  2   2 1.3435199 0.7409356 2716       0
## 3         FF   0  0      1  2   2 0.8502072 0.8284727 6756       0
## 4         FT   0  0      1  1   2 0.7659574 0.8145636 2021       0
## 5         SL   0  0      1  2   2 1.0838608 0.8361918 1264       0

Here is the data in graph form.

How does the PITCHf/x program determine pitch type?

Here are different views of multiple variables plotted on a three-dimensional graph.

This is a top view which shows the variation of speed in pitches.

This is an angled view showing where different pitches land in the batter's box.

This is an angled view showing the arc of each pitch type.

K-Nearest Neighbor

K-Nearest Neighbor is a non-parametric method used for classification and regression. A new observation is assigned the class (or value) most common among its nearest neighbors. It can be useful to weight the neighbors so that nearer points contribute more than distant ones. The neighbors are taken from a set of observations whose classes or values are already known, called the training set.
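
To make the idea concrete, here is a minimal unweighted K-Nearest Neighbor classifier in Python. It is a sketch under stated assumptions, not the code behind this report: the training points are invented (speed, pfx_x, pfx_z) triples, and the function name `knn_predict` and the choice k = 3 are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical training pitches, invented for illustration:
# (speed in mph, pfx_x in inches, pfx_z in inches) paired with a known pitch type.
training = [
    ((95.0, -5.0, 10.0), "FF"),
    ((96.0, -4.5, 10.5), "FF"),
    ((94.5, -5.5, 9.5), "FF"),
    ((80.0, 5.0, -6.0), "CU"),
    ((79.0, 5.5, -6.5), "CU"),
    ((81.0, 4.5, -5.5), "CU"),
    ((86.0, -8.0, 5.0), "CH"),
    ((87.0, -7.5, 5.5), "CH"),
]

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    # Sort the training points by Euclidean distance to the new point.
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_predict(training, (95.5, -5.0, 10.0))
print(prediction)
```

Because the vote depends on raw distances, variables measured on very different scales (mph versus inches, for example) are often standardized first; this sketch skips that step.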

These are the first few predictions of which type of pitch was thrown.

## [1] CU FF FF FF CH
## Levels: CH CU FF FT SL

This compares those predictions to the pitch types that were actually thrown.

## [1] TRUE TRUE TRUE TRUE TRUE

This is the proportion of pitches in the test set that the computer predicted correctly.

## [1] 0.9749353

This is a cross table of the predicted pitch types (rows) against the actual pitch types in the test set (columns).

##         verTest$pitch_type
## knn.pred   CH   CU   FF   FT   SL
##       CH  810    0    0    5   13
##       CU    0  868    0    0    8
##       FF   15    0 2216   53    0
##       FT    2    0   27  595    0
##       SL    0    3    0    0  412
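
The proportion correct can be recovered from this cross table: correct predictions lie on the diagonal, and every other cell is a misclassification. The Python sketch below is illustrative only (the report's output appears to come from R); the counts are copied from the table, and the function name `accuracy` is an assumption.

```python
labels = ["CH", "CU", "FF", "FT", "SL"]

# Rows: predicted pitch type; columns: actual pitch type
# (counts copied from the cross table above).
confusion = [
    [810, 0, 0, 5, 13],
    [0, 868, 0, 0, 8],
    [15, 0, 2216, 53, 0],
    [2, 0, 27, 595, 0],
    [0, 3, 0, 0, 412],
]

def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

acc = accuracy(confusion)
print(round(acc, 7))
```

The diagonal sums to 4,901 of 5,027 test pitches, which agrees with the 0.9749353 reported above. The largest off-diagonal cells are between FF and FT, so most errors are confusions between the two fastballs.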

Discussion

After examining the results above, we found that certain variables are related to pitch type and to how the PITCHf/x program classifies pitches from the data it collects at a baseball game.

In the first research question, we analyzed whether or not batter hand affects the pitch type that is thrown. Overall, the fastballs are used about equally against left- and right-handed batters; however, Verlander threw change-ups about 15 percentage points more often to left-handed batters (22.83% versus 8.17%). He also threw sliders about 15 percentage points more often to right-handed batters (16.86% versus 2.01%).

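For our second research question, we looked into the relationship between the strike count and pitch type. Fastballs and change-ups were thrown at lower strike counts (means of about 0.77 to 0.85), whereas curve balls were thrown at a higher strike count on average (mean of about 1.34, median of 2). This suggests that Verlander tends to start with a fastball or change-up and turns to the curve ball once he has strikes on the batter.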

Our overall research question asked how the PITCHf/x program uses different variables to determine pitch type. In the cloud plot we are able to see the relationship between the position and speed of the ball. Speed alone is not enough to classify a pitch: the two-seam and four-seam fastballs are about equal in speed, but there are clearly different breaks in their positions. The two-seamer typically has more break because it has less spin, whereas the four-seamer has more spin and stays on path.

With our K-Nearest Neighbor model, we were able to predict which type of pitch Verlander threw. The computer classified the pitches in the test set correctly about 97% of the time.