Introduction

A data frame with 15,307 observations of 12 variables; the variables used in this project are described below. Each observation is a single pitch thrown by Justin Verlander.

About Verlander:

  • Height: 6'5"
  • Born February 20, 1983
  • He dropped out of college
  • He played for the Detroit Tigers
  • Position: pitcher

Data:

  • pitch_type (type of pitch thrown):
    • CH (Change-up)
    • CU (Curveball)
    • FF (Four-Seam Fastball)
    • FT (Two-Seam Fastball)
    • SL (Slider)
  • px
    • x-coordinate of pitch (in feet, measured from center of plate)
  • pz
    • vertical coordinate of pitch (in feet above plate)
  • pfx_x
    • the horizontal movement, in inches, of the pitch between the release point and home plate, as compared to a theoretical pitch thrown at the same speed with no spin-induced movement.
    • Measured at 40 feet from home plate.
  • pfx_z
    • the vertical movement, in inches, of the pitch between the release point and home plate, as compared to a theoretical pitch thrown at the same speed with no spin-induced movement.
    • Measured at 40 feet from home plate.
  • batter_hand
    • A factor with two values: L (left) and R (right).

In this project:

  • How does the PITCHf/x program use different variables to determine pitch type?
    • Does batter hand affect pitch type?
    • Does the strike count affect pitch type?

Methods

Does batter hand affect pitch type?

Does the strike count affect pitch type?

How does the PITCHf/x program determine pitch type?

Results

Does batter hand affect pitch type?

Here is a cross table of pitch counts by batter hand and pitch type.

##            pitch_type
## batter_hand   CH   CU   FF   FT   SL
##           L 2024 1529 3832 1303  178
##           R  526 1187 2924  718 1086

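Here is the same table as row percentages: each count is divided by the total number of pitches thrown to that batter hand, so each row sums to 100.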

##            pitch_type
## batter_hand     CH     CU     FF     FT     SL  Total
##           L  22.83  17.25  43.22  14.70   2.01 100.00
##           R   8.17  18.43  45.40  11.15  16.86 100.00
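Since the percentages are row percentages, they can be recomputed directly from the counts. The following is a minimal, illustrative Python sketch (not the code used in the original analysis) that divides each count by its batter-hand total:

```python
# Counts copied from the cross table of batter_hand by pitch_type above.
PITCH_TYPES = ["CH", "CU", "FF", "FT", "SL"]
COUNTS = {
    "L": [2024, 1529, 3832, 1303, 178],
    "R": [526, 1187, 2924, 718, 1086],
}


def row_percentages(counts):
    """Divide each count by its row total so that every row sums to 100."""
    result = {}
    for hand, row in counts.items():
        total = sum(row)  # 8866 pitches to left-handed, 6441 to right-handed batters
        result[hand] = [round(100 * n / total, 2) for n in row]
    return result


if __name__ == "__main__":
    for hand, pcts in row_percentages(COUNTS).items():
        print(hand, dict(zip(PITCH_TYPES, pcts)))
```

The rounded values match the percentage table. For example, 2024 / 8866 gives 22.83% change-ups to left-handed batters.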

Here is a bar chart showing the data.

Does the strike count affect pitch type?

Here are summary statistics of the strike count (0, 1, or 2 strikes) at the time each pitch type was thrown.

##   pitch_type min Q1 median Q3 max      mean        sd    n missing
## 1         CH   0  0      1  2   2 0.8250980 0.8304696 2550       0
## 2         CU   0  1      2  2   2 1.3435199 0.7409356 2716       0
## 3         FF   0  0      1  2   2 0.8502072 0.8284727 6756       0
## 4         FT   0  0      1  1   2 0.7659574 0.8145636 2021       0
## 5         SL   0  0      1  2   2 1.0838608 0.8361918 1264       0
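Each row of this table is a per-group summary of one numeric variable, the strike count, split by pitch type. The following standard-library Python sketch shows how such a summary can be computed. The (pitch_type, strikes) pairs are invented toy data and are not Verlander's pitches. The output columns simply mirror the table.

```python
import statistics
from collections import defaultdict

# Hypothetical toy data: (pitch_type, strikes in the count when the pitch was thrown).
PITCHES = [
    ("FF", 0), ("FF", 1), ("FF", 2), ("FF", 0),
    ("CU", 2), ("CU", 2), ("CU", 1), ("CU", 0),
    ("CH", 0), ("CH", 1),
]


def summarize_strikes(pitches):
    """Group strike counts by pitch type and compute min/Q1/median/Q3/max/mean/sd/n."""
    groups = defaultdict(list)
    for pitch_type, strikes in pitches:
        groups[pitch_type].append(strikes)

    summary = {}
    for pitch_type, values in sorted(groups.items()):
        # method="inclusive" interpolates like R's default quantile() (type 7);
        # each group needs at least two values for quantiles() and stdev().
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[pitch_type] = {
            "min": min(values),
            "Q1": q1,
            "median": median,
            "Q3": q3,
            "max": max(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
            "n": len(values),
        }
    return summary
```

With this toy data, the CU group (strikes 2, 2, 1, 0) has a median of 1.5 and a mean of 1.25.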

Here is the data in graph form.

How does the PITCHf/x program determine pitch type?

Here are different views of several variables plotted together on a three-dimensional graph (a cloud plot).

This top view shows how speed varies across pitches.

This angled view shows where the different pitch types cross the plate.

This is an angled view showing the arc of each pitch type.

K-Nearest Neighbors

K-nearest neighbors (k-NN) is a non-parametric method used for classification and regression. To classify a new observation, it finds the k closest observations in a set whose classes or values are already known, called the training set, and takes a majority vote of their classes. It can also be useful to weight the neighbors so that data within a certain distance contribute more than distant data, for example by weighting each neighbor by the inverse of its distance.
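The following minimal pure-Python sketch illustrates the voting step for a k-NN classifier with optional inverse-distance weighting. The training points and the choice of features (speed, pfx_x, pfx_z) are hypothetical. This is not the model that produced the predictions in this report.

```python
import math
from collections import defaultdict

# Hypothetical training set of (features, label) pairs. The features stand in for
# (speed in mph, pfx_x, pfx_z); all values are invented for illustration.
TRAINING = [
    ((95.0, -4.0, 10.0), "FF"),
    ((94.0, -8.0, 7.0), "FT"),
    ((85.0, 5.0, -6.0), "CU"),
    ((86.0, -7.0, 5.0), "CH"),
    ((88.0, 2.0, 1.0), "SL"),
    ((96.0, -5.0, 9.5), "FF"),
]


def knn_predict(training, query, k=3, weighted=True):
    """Classify `query` by a (optionally distance-weighted) vote of its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(
        ((math.dist(features, query), label) for features, label in training),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for distance, label in neighbors:
        # Inverse-distance weighting lets closer neighbors count more;
        # the small constant avoids division by zero for exact matches.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)
```

For example, `knn_predict(TRAINING, (95.5, -4.5, 9.8))` returns `"FF"`: its three nearest neighbors are the two four-seam points and one two-seam point. Because k-NN relies on distances, features measured on different scales (mph vs. inches) are usually standardized first. This sketch skips that step.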

These are the predicted pitch types for the first five pitches in the test set.

## [1] CU FF FF FF CH
## Levels: CH CU FF FT SL

This compares each of those predictions with the pitch type actually thrown. TRUE means the prediction was correct.

## [1] TRUE TRUE TRUE TRUE TRUE

This is the proportion of test-set pitches the model predicted correctly (about 97.5%).

## [1] 0.9749353

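This is a cross table (confusion matrix) of predicted pitch type (rows, knn.pred) against actual pitch type (columns, verTest$pitch_type). Correct predictions lie on the diagonal.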

##         verTest$pitch_type
## knn.pred   CH   CU   FF   FT   SL
##       CH  810    0    0    5   13
##       CU    0  868    0    0    8
##       FF   15    0 2216   53    0
##       FT    2    0   27  595    0
##       SL    0    3    0    0  412
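Overall accuracy is the diagonal of this matrix divided by its total. The per-type hit rate (recall) for each actual pitch type is the diagonal entry divided by its column total. The following short Python sketch recomputes both from the counts above. It is illustrative only and is not the code used in the analysis.

```python
# Confusion matrix copied from the table above: rows are predicted pitch types
# (knn.pred), columns are actual pitch types (verTest$pitch_type).
LABELS = ["CH", "CU", "FF", "FT", "SL"]
CONFUSION = [
    [810, 0, 0, 5, 13],
    [0, 868, 0, 0, 8],
    [15, 0, 2216, 53, 0],
    [2, 0, 27, 595, 0],
    [0, 3, 0, 0, 412],
]


def accuracy(matrix):
    """Share of all test pitches that fall on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_by_class(matrix, labels):
    """For each actual pitch type (column), the share the model labeled correctly."""
    result = {}
    for j, label in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        result[label] = matrix[j][j] / column_total
    return result
```

Here `accuracy(CONFUSION)` evaluates to 4901 / 5027, about 0.9749, which matches the proportion reported above. Recall is lowest for two-seam fastballs (595 of 653, about 91%).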

Discussion

After examining the data above, we found that some variables are related to pitch type, and that these variables help explain how PITCHf/x classifies the pitches it records during a baseball game.

In the first research question, we analyzed whether batter hand affects the pitch type Verlander throws. Fastballs made up a similar share of pitches to left- and right-handed batters (about 58% and 57% respectively, counting four-seam and two-seam fastballs together). However, Verlander threw change-ups about 15 percentage points more often to left-handed batters (22.8% vs. 8.2% of pitches). He threw sliders about 15 percentage points more often to right-handed batters (16.9% vs. 2.0%).

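For our second research question, we looked at the relationship between the strike count and pitch type. Fastballs and change-ups were thrown with fewer strikes on average (mean strike count of about 0.77 to 0.85). Curveballs came later in the count: their median strike count was 2, compared with 1 for every other pitch type, so at least half of his curveballs were thrown with two strikes. Sliders fell in between (mean of about 1.08).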

Our overall research question asked how the PITCHf/x system uses the variables it measures to determine which pitch type was thrown. In the cloud plot we can see the relationship between the position and the speed of the ball. We tried to classify each pitch based on its speed. The two-seam and four-seam fastballs are thrown at about the same speed, but their positions clearly show different amounts of break. The two-seamer typically has more break because it has less spin, whereas the four-seamer has more spin and stays on a straighter path. The confusion matrix reflects how similar these two pitches are: 53 of the 58 misclassified two-seam fastballs were labeled as four-seamers.

Using k-nearest neighbors, we were able to classify which type of pitch Verlander threw from the pitch data. The model identified the pitch type correctly for about 97.5% of the pitches in the test set (4,901 of 5,027).