The data set is a data frame with 15,307 observations on 12 variables. Each observation is a single pitch thrown by Verlander.
Does batter hand affect pitch type?
Does the strike count affect pitch type?
How does the PITCHf/x program determine pitch type?
Here’s a cross table of pitch type and batter hand.
## pitch_type
## batter_hand CH CU FF FT SL
## L 2024 1529 3832 1303 178
## R 526 1187 2924 718 1086
Here’s the same cross table as row percentages, so each batter hand sums to 100.
## pitch_type
## batter_hand CH CU FF FT SL Total
## L 22.83 17.25 43.22 14.70 2.01 100.00
## R 8.17 18.43 45.40 11.15 16.86 100.00
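The percentages above are row percentages: each count is divided by the total number of pitches thrown to that batter hand. The Python sketch below is our own illustration, not the authors' R code. It reproduces the percentage table from the counts in the first cross table. The helper name `row_percentages` is hypothetical.

```python
# Pitch counts by batter hand, copied from the cross table above
counts = {
    "L": {"CH": 2024, "CU": 1529, "FF": 3832, "FT": 1303, "SL": 178},
    "R": {"CH": 526, "CU": 1187, "FF": 2924, "FT": 718, "SL": 1086},
}

def row_percentages(table):
    """Convert each row of counts to percentages of that row's total (hypothetical helper)."""
    result = {}
    for hand, row in table.items():
        total = sum(row.values())  # all pitches thrown to this batter hand
        result[hand] = {pitch: round(100 * n / total, 2) for pitch, n in row.items()}
    return result

pct = row_percentages(counts)
for hand, row in pct.items():
    print(hand, row)
```

Because each percentage is rounded to two decimals, a row can add up to 100.01 rather than exactly 100.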
Here’s a bar chart showing the data.
Here is a summary of the strike count at the time of each pitch, grouped by pitch type.
## pitch_type min Q1 median Q3 max mean sd n missing
## 1 CH 0 0 1 2 2 0.8250980 0.8304696 2550 0
## 2 CU 0 1 2 2 2 1.3435199 0.7409356 2716 0
## 3 FF 0 0 1 2 2 0.8502072 0.8284727 6756 0
## 4 FT 0 0 1 1 2 0.7659574 0.8145636 2021 0
## 5 SL 0 0 1 2 2 1.0838608 0.8361918 1264 0
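Each row of this table summarizes the strike counts (0, 1, or 2) recorded for one pitch type. The layout looks like the output of `favstats` from R's mosaic package, but that is an assumption. The Python sketch below computes the same columns for a list of values. It uses `statistics.quantiles(method="inclusive")`, which we believe matches R's default quantile definition (type 7), and the sample standard deviation, as R's `sd` does. The helper `favstats_like` and the strike counts it is run on are hypothetical, not the Verlander data.

```python
import statistics

def favstats_like(values):
    """Min, quartiles, max, mean, sample sd and n (hypothetical helper)."""
    # "inclusive" treats the min and max as the 0th and 100th percentiles
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "n": len(values),
    }

# Hypothetical strike counts at the time of a few pitches, by pitch type
strikes_by_pitch = {
    "CU": [2, 2, 1, 2, 0, 2],
    "FF": [0, 1, 0, 2, 1, 0],
}

for pitch, strikes in strikes_by_pitch.items():
    print(pitch, favstats_like(strikes))
```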
Here is the data in graph form.
Here are different views of multiple variables plotted on a three-dimensional graph.
This top view shows how pitch speed varies.
This is an angled view showing where different pitches land in the batter’s box.
This is an angled view showing the arc of each pitch type.
K-Nearest Neighbor
K-nearest neighbors (k-NN) is a non-parametric method used for classification and regression. It relies on a training set, which is a set of observations whose classes or values are already known. To classify a new observation, k-NN finds the k training observations closest to it and assigns the class most common among those neighbors. It can be useful to weight the neighbors so that data within a close range contributes more than distant data.
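The procedure described above can be sketched in a few lines. This is a minimal Python illustration of distance-weighted k-NN, not the R code behind the results in this report. The function name `knn_predict`, the choice of Euclidean distance, the inverse-distance weights, and the two-feature toy training set are all hypothetical assumptions made for the example.

```python
import math
from collections import defaultdict

def knn_predict(train_X, train_y, x, k=5, weighted=True):
    """Classify point x by a vote among its k nearest training points.

    With weighted=True each neighbor's vote is 1/distance, so closer
    neighbors contribute more than distant ones.
    """
    # Euclidean distance from x to every training point, paired with its known class
    neighbors = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = defaultdict(float)
    for d, label in neighbors[:k]:
        # Small epsilon avoids dividing by zero when x equals a training point
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    # Return the class with the largest total vote
    return max(votes, key=votes.get)

# Hypothetical toy training set: two made-up numeric features per pitch
train_X = [(95.0, -4.0), (96.0, -5.0), (94.5, -8.0), (95.5, -9.0), (80.0, 6.0), (79.0, 7.0)]
train_y = ["FF", "FF", "FT", "FT", "CU", "CU"]

print(knn_predict(train_X, train_y, (95.2, -8.5), k=3))  # two of the three nearest are FT
print(knn_predict(train_X, train_y, (80.5, 6.5), k=3))
```

Weighting matters most when the vote is close. In the example, with `k=5` the unweighted vote for the first query is a 2 to 2 tie between FT and FF. With inverse-distance weights, FT clearly wins because its two neighbors are much closer.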
Here are five of the model’s predicted pitch types.
## [1] CU FF FF FF CH
## Levels: CH CU FF FT SL
This compares the predictions to what actually occurred.
## [1] TRUE TRUE TRUE TRUE TRUE
This is the proportion of test-set pitches the model predicted correctly.
## [1] 0.9749353
This is a cross table of predicted pitch types (rows) against the actual pitch types in the test set (columns).
## verTest$pitch_type
## knn.pred CH CU FF FT SL
## CH 810 0 0 5 13
## CU 0 868 0 0 8
## FF 15 0 2216 53 0
## FT 2 0 27 595 0
## SL 0 3 0 0 412
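The accuracy reported above can be read directly off this table, because the correct predictions lie on its diagonal. The Python sketch below is our illustration, not the original R code. It uses the counts from this table to recompute the accuracy and the per-class recall, which is the share of each actual pitch type that the model classified correctly.

```python
# Counts from the cross table above: rows = predicted, columns = actual
labels = ["CH", "CU", "FF", "FT", "SL"]
confusion = [
    [810, 0, 0, 5, 13],
    [0, 868, 0, 0, 8],
    [15, 0, 2216, 53, 0],
    [2, 0, 27, 595, 0],
    [0, 3, 0, 0, 412],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))  # diagonal = correct predictions
accuracy = correct / total
print(f"{correct}/{total} = {accuracy:.7f}")

# Per-class recall: correct predictions divided by the column (actual) total
recall = {}
for j, label in enumerate(labels):
    actual_total = sum(row[j] for row in confusion)
    recall[label] = confusion[j][j] / actual_total
    print(label, round(recall[label], 4))
```

Of the 126 misclassified pitches, 80 are confusions between four-seam (FF) and two-seam (FT) fastballs. This is consistent with the two fastballs being similar in speed.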
After examining the data above, we found that certain variables are related to pitch type. We also gained a better sense of how PITCHf/x collects pitch data during a baseball game and uses it to identify pitches.
In the first research question, we analyzed whether batter hand affects the pitch type that is thrown. Overall, Verlander throws four-seam fastballs about equally often to left- and right-handed batters (43.22% vs. 45.40%). However, he throws changeups about 15 percentage points more often to left-handed batters (22.83% vs. 8.17%). He throws sliders about 15 percentage points more often to right-handed batters (16.86% vs. 2.01%).
For our second research question, we looked at the relationship between the strike count and pitch type. Four-seam fastballs, two-seam fastballs, and changeups have a median of 1 strike and a mean of about 0.8 strikes. Curveballs have a median of 2 strikes and the highest mean of any pitch type (1.34). This suggests that Verlander relies on fastballs and changeups early in the count and turns to his curveball once he has two strikes.
Our overall research question asked how the PITCHf/x system uses variables measured during a game to determine which pitch type was thrown. In the cloud plot, we can see the relationship between the position and the speed of the ball. We tried to classify each pitch based on its speed. However, the two-seam and four-seam fastballs have nearly equal speeds, while their positions clearly show different amounts of break. The two-seamer typically has more break because it has less spin, whereas the four-seamer has more spin and stays on its path.
Using k-nearest neighbors, we were able to predict which type of pitch Verlander threw. The model classified about 97.5% of the test-set pitches correctly.
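The "about 15" figures for changeups and sliders are differences between two row percentages from the percentage table, so they are percentage points rather than relative increases. The relative ratios are much larger:

$$
\begin{aligned}
\text{CH: } 22.83 - 8.17 &= 14.66 \text{ percentage points}, & \frac{22.83}{8.17} &\approx 2.79 \\
\text{SL: } 16.86 - 2.01 &= 14.85 \text{ percentage points}, & \frac{16.86}{2.01} &\approx 8.39
\end{aligned}
$$

As a share of pitches, changeups are about 2.8 times as common against left-handed batters. Sliders are about 8.4 times as common against right-handed batters.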