July 30, 2015
Justin Verlander is a Major League Baseball pitcher who currently plays for the Detroit Tigers. He won the Rookie of the Year Award in 2006 and the AL Cy Young Award in 2011. In MLB, every pitch is tracked by a system called PITCHf/x. It records variables such as the speed, horizontal movement, and vertical movement of each pitch, which are used to determine what type of pitch was thrown.
We set out to see if the horizontal movement of Verlander's pitches was a factor when trying to predict the type of pitch thrown.
Justin Verlander throws 5 different types of pitches:
- FF: Four-Seam Fastball
- CH: Change-Up
- CU: Curveball
- FT: Two-Seam Fastball
- SL: Slider
A classification tree can be used to analyze multiple numerical variables for a set of input values. In this case, the root of the tree tests the first variable, speed, at the first node. Each input goes down the left or right branch of that node, depending on whether its value is lower or higher than the split point, and another variable can be tested at the end of that branch. There the input is again sent down the lower or higher branch, where a third variable is tested to make the most accurate prediction of pitch type. For this data, a classification tree was the best way to describe the relationships among the three variables and to predict the pitch type more accurately than by looking at one variable alone. RStudio can also display these input values in a three-dimensional model.
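The walk from root to leaf described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R used for the analysis, and every threshold and split in it is hypothetical: the tree actually fitted to Verlander's data is not shown in this report.

```python
# Hypothetical classification tree. The thresholds and the order of the
# splits are invented for illustration; they are NOT the fitted model.

def leaf(label):
    return {"label": label}

def node(feature, threshold, low, high):
    # Values below the threshold follow `low`; all others follow `high`.
    return {"feature": feature, "threshold": threshold, "low": low, "high": high}

TREE = node(
    "speed", 90.0,
    # Slower pitches: test vertical movement, then horizontal movement.
    low=node(
        "vertical", 0.0,
        low=leaf("CU"),
        high=node("horizontal", -3.0, low=leaf("CH"), high=leaf("SL")),
    ),
    # Faster pitches: horizontal movement separates the two fastballs.
    high=node("horizontal", -6.0, low=leaf("FT"), high=leaf("FF")),
)

def classify(pitch, tree=TREE):
    """Walk from the root to a leaf, testing one variable per node."""
    current = tree
    while "label" not in current:
        value = pitch[current["feature"]]
        branch = "low" if value < current["threshold"] else "high"
        current = current[branch]
    return current["label"]

example = {"speed": 95.2, "horizontal": -4.1, "vertical": 9.8}
print(classify(example))  # FF
```

Each pitch is tested on one variable per node, so no single variable has to separate all five pitch types on its own.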
In this data there was one pitch that was significantly slower than the rest, thrown at 59 MPH. To minimize error, we cut out this outlier, only using pitches above 70 MPH.
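The outlier cut can be expressed as a simple filter. The sketch below is in Python rather than R; the four sample rows and the field names `type` and `speed` are made up for illustration, and only the 70 MPH cutoff and the 59 MPH outlier come from the text.

```python
# Hypothetical sample rows; the real data has one row per tracked pitch.
pitches = [
    {"type": "FF", "speed": 95.4},
    {"type": "CU", "speed": 79.8},
    {"type": "CH", "speed": 59.0},  # stand-in for the slow outlier
    {"type": "SL", "speed": 86.1},
]

MIN_SPEED_MPH = 70.0

# Keep only pitches strictly above the cutoff, as described in the text.
kept = [p for p in pitches if p["speed"] > MIN_SPEED_MPH]
dropped = [p for p in pitches if p["speed"] <= MIN_SPEED_MPH]

print(len(kept), "kept;", len(dropped), "dropped")
```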
In this graph, we looked at whether a pitch's horizontal movement affected its vertical movement.
In this graph, we looked at whether speed affected the horizontal movement of the ball from the pitcher's mound to the plate.
In this graph, we looked at how the speed of the pitch affected its vertical movement.
The first chart covers 10 random pitches. Each decimal number is the estimated probability (on a scale from 0 to 1) that the pitch is of that type, based on our variables. To produce this chart we used a training group, a random 2/3 of our data set, to build the model and then made predictions for our test group.
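A random 2/3 split of this kind might look like the following. It is a sketch in Python, not the code behind this report; the function name `split_train_test`, the fixed seed, and the 30 stand-in records are all assumptions made for the example.

```python
import random

def split_train_test(rows, train_fraction=2 / 3, seed=0):
    """Randomly assign train_fraction of rows to training, the rest to test."""
    rng = random.Random(seed)           # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

rows = list(range(30))                  # stand-in for 30 pitch records
train, test = split_train_test(rows)
print(len(train), len(test))            # 20 10
```

Keeping the test pitches out of the training group means the accuracy is measured on pitches the model has not seen.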
The output underneath it shows the predicted pitch type for each of the 10 pitches.
| | CH | CU | FF | FT | SL |
|---|---|---|---|---|---|
| 1 | 0.002395927 | 0.00000000 | 0.988769093 | 0.007786763 | 0.001048218 |
| 3 | 0.000000000 | 0.96333938 | 0.000000000 | 0.000000000 | 0.036660617 |
| 5 | 0.002395927 | 0.00000000 | 0.988769093 | 0.007786763 | 0.001048218 |
| 16 | 0.002395927 | 0.00000000 | 0.988769093 | 0.007786763 | 0.001048218 |
| 20 | 0.002395927 | 0.00000000 | 0.988769093 | 0.007786763 | 0.001048218 |
| 23 | 0.955454889 | 0.00000000 | 0.001132503 | 0.021140053 | 0.022272556 |
| 27 | 0.002395927 | 0.00000000 | 0.988769093 | 0.007786763 | 0.001048218 |
| 34 | 0.002395927 | 0.00000000 | 0.988769093 | 0.007786763 | 0.001048218 |
| 43 | 0.002583979 | 0.05254091 | 0.000000000 | 0.000000000 | 0.944875108 |
| 44 | 0.002583979 | 0.05254091 | 0.000000000 | 0.000000000 | 0.944875108 |
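Each row of probabilities can be turned into a single predicted pitch type by picking the column with the largest value. The Python sketch below does this for four of the rows shown above; the helper name `most_likely` is an assumption for the example, and the original analysis was done in R.

```python
CLASSES = ["CH", "CU", "FF", "FT", "SL"]

# Four of the probability rows shown above, keyed by row number.
probabilities = {
    1:  [0.002395927, 0.00000000, 0.988769093, 0.007786763, 0.001048218],
    3:  [0.000000000, 0.96333938, 0.000000000, 0.000000000, 0.036660617],
    23: [0.955454889, 0.00000000, 0.001132503, 0.021140053, 0.022272556],
    43: [0.002583979, 0.05254091, 0.000000000, 0.000000000, 0.944875108],
}

def most_likely(row):
    """Return the pitch type whose probability is largest in this row."""
    best = max(range(len(row)), key=row.__getitem__)
    return CLASSES[best]

predicted = {number: most_likely(row) for number, row in probabilities.items()}
print(predicted)  # {1: 'FF', 3: 'CU', 23: 'CH', 43: 'SL'}
```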
When we ran the model on our test group, we predicted 5065 pitches correctly out of 5240. The accuracy of our predictions is therefore 96.7%.
## [1] FF CU FF FF FF CH FF FF SL SL
## Levels: CH CU FF FT SL
## [1] 5065
## [1] 5240
## [1] 0.9666031
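The accuracy figure is the number of correct predictions divided by the number of test pitches. A short Python check of that arithmetic, using the two counts printed above (the function name `accuracy` is just a label chosen for this sketch):

```python
def accuracy(correct, total):
    """Fraction of test pitches whose predicted type matched the actual type."""
    return correct / total

value = accuracy(5065, 5240)
print(f"{value:.7f}")  # 0.9666031
print(f"{value:.1%}")  # 96.7%
```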
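The table shows what we predicted the pitches to be and what they actually were. Each row is a predicted pitch type and each column is the actual pitch type. For example, we predicted 896 change-ups (the sum of the CH row), but there were actually 862 (the sum of the CH column).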
| | CH | CU | FF | FT | SL |
|---|---|---|---|---|---|
| CH | 857 | 0 | 1 | 22 | 16 |
| CU | 0 | 902 | 0 | 0 | 40 |
| FF | 4 | 0 | 2302 | 24 | 3 |
| FT | 0 | 0 | 44 | 653 | 0 |
| SL | 1 | 19 | 0 | 0 | 351 |
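The row and column totals of this table can be checked with a short script. The Python sketch below copies the counts from the table and assumes that rows are predicted types and columns are actual types, which is the reading that matches the 896 and 862 change-up totals mentioned above.

```python
CLASSES = ["CH", "CU", "FF", "FT", "SL"]

# Counts copied from the table. Assumed orientation: one row per predicted
# type, one column per actual type, both in CLASSES order.
matrix = [
    [857,   0,    1,  22,  16],
    [  0, 902,    0,   0,  40],
    [  4,   0, 2302,  24,   3],
    [  0,   0,   44, 653,   0],
    [  1,  19,    0,   0, 351],
]

# Row sums: how many pitches were predicted to be each type.
predicted_totals = {c: sum(row) for c, row in zip(CLASSES, matrix)}
# Column sums: how many pitches actually were each type.
actual_totals = {c: sum(col) for c, col in zip(CLASSES, zip(*matrix))}
# Diagonal: predictions that matched the actual type.
correct = sum(matrix[i][i] for i in range(len(CLASSES)))

print(predicted_totals["CH"], actual_totals["CH"], correct)  # 896 862 5065
```

The largest off-diagonal cells are the 44 pitches predicted as FT that were actually FF and the 40 predicted as CU that were actually SL, so most errors are between similar pitch types.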
This is a 3D chart of the pitches plotted by speed, horizontal movement, and vertical movement. The y-axis has been rotated for a view from y=180.
This is a 3D chart of the pitches plotted by speed, horizontal movement, and vertical movement. The y-axis has been rotated for a view from y=90.
Overall, we looked at how the pitch type was affected by speed, horizontal movement, and vertical movement, based on Verlander's pitching data from the 2006 season through the 2012 season.
After analyzing the results of our predictions, we concluded that the horizontal movement of the ball does affect the pitch type. To predict the pitch type accurately, two other numerical variables, speed and vertical movement, are also needed.