# Load needed libraries
library(tidyverse)
library(readr)
library(knitr)
library(sqldf)

Home Run Exit Velocity and Distance

I am creating a simple linear model to test whether the exit velocity of a ball hit out of the park correlates with the distance it travels. I found a dataset that contains home run data from 2006 through 2017, with fields such as exit velocity, distance, and elevation angle. The data was collected by the now-discontinued hittrackeronline.com. I will use only the 2017 home run data.
# filename <- "/Users/Audiorunner13/CUNY MSDS Course Work/DATA605 Fall 2022/Week 11/HR Tracker.csv"
filename <- tempfile()
download.file("https://raw.githubusercontent.com/audiorunner13/Masters-Coursework/main/DATA605%20Fall%202022/Week%2011/HR%20Tracker.csv",filename)

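# Filter while reading the CSV. If GAME_DATE is stored as text, this is a string
# comparison and isolates 2017 only when the dates sort lexically by year.
hr_2017 <- read.csv.sql(filename, "select * from file where GAME_DATE > '01/01/2017' and EXIT_VELOCITY > 70", sep=",")
# Keep the first 1,000 rows for plotting and modeling
hr_by_player_2017 <- head(hr_2017, 1000)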
plot(hr_by_player_2017[,'EXIT_VELOCITY'],hr_by_player_2017[,'TRUE_DISTANCE'],main="Home Runs Distance by Exit Velocity",xlab="Exit Velocity (mph)",ylab="Distance (ft)")
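
The date condition in the query above deserves a check, because comparing dates as strings does not necessarily order them by year. The following is a minimal sketch of a year filter on parsed dates, assuming GAME_DATE holds MM/DD/YYYY strings. That format is an assumption, and the three-row data frame `hr_demo` is made up purely for illustration.

```r
# Hypothetical demo rows; the MM/DD/YYYY format is an assumption
hr_demo <- data.frame(
  GAME_DATE = c("04/05/2016", "04/03/2017", "09/30/2017"),
  EXIT_VELOCITY = c(101.2, 98.7, 110.4),
  stringsAsFactors = FALSE
)

# As text, "04/05/2016" sorts after "01/01/2017", so a string filter keeps it
text_filter <- hr_demo$GAME_DATE > "01/01/2017"

# Parse to Date, then keep only rows whose year is 2017
game_date <- as.Date(hr_demo$GAME_DATE, format = "%m/%d/%Y")
hr_demo_2017 <- hr_demo[format(game_date, "%Y") == "2017", ]
print(hr_demo_2017)
```

If the real column uses a year-first format such as YYYY-MM-DD, the string comparison already sorts correctly and no change is needed.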

The simple plot of distance against exit velocity shows a clear upward trend: distance increases as exit velocity increases. The question is how strong that correlation is.
(hr_by_player_2017_lm <- lm(TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017))
## 
## Call:
## lm(formula = TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017)
## 
## Coefficients:
##   (Intercept)  EXIT_VELOCITY  
##        0.2302         3.8417
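
The two coefficients above define the fitted line, distance = 0.2302 + 3.8417 × exit velocity. As a small worked example, the sketch below plugs in an exit velocity of 105 mph. The helper `predict_distance` and the 105 mph input are illustrative choices, not part of the original analysis.

```r
# Coefficients copied (rounded) from the lm() output above
intercept <- 0.2302
slope <- 3.8417

# Predicted distance (ft) for a given exit velocity (mph)
predict_distance <- function(exit_velocity) {
  intercept + slope * exit_velocity
}

predict_distance(105)  # 0.2302 + 3.8417 * 105 = 403.6087
```

With the fitted model object available, `predict(hr_by_player_2017_lm, newdata = data.frame(EXIT_VELOCITY = 105))` gives the same prediction from the unrounded coefficients. Because the data contains only home runs with exit velocity above 70 mph, predictions outside the observed range should be treated with caution.
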
I use the linear model function, lm(), to estimate the y-intercept and the slope and to compute the residuals. I then overlay the fitted line on the scatter plot with abline() to show how well the model fits the data.
plot(TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017)
abline(hr_by_player_2017_lm, col='red')
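
Beyond the overlaid line, one common way to judge fit is to plot the residuals against the fitted values and to check them against a normal distribution. The sketch below uses simulated data (`sim`) so that it runs on its own. The slope and noise level are chosen loosely to resemble this model and are not taken from the dataset.

```r
set.seed(1)
# Simulated stand-in data; NOT the HR Tracker dataset
sim <- data.frame(EXIT_VELOCITY = runif(200, min = 90, max = 115))
sim$TRUE_DISTANCE <- 3.8 * sim$EXIT_VELOCITY + rnorm(200, sd = 18)

sim_lm <- lm(TRUE_DISTANCE ~ EXIT_VELOCITY, data = sim)

# Residuals vs. fitted values: look for curvature or a funnel shape
plot(fitted(sim_lm), resid(sim_lm),
     xlab = "Fitted distance (ft)", ylab = "Residual (ft)")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(resid(sim_lm))
qqline(resid(sim_lm))
```

To apply the same checks to the real model, replace `sim_lm` with `hr_by_player_2017_lm`.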

summary(hr_by_player_2017_lm)
## 
## Call:
## lm(formula = TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -71.92 -11.60   1.85  13.25  38.99 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.2302    13.0252   0.018    0.986    
## EXIT_VELOCITY   3.8417     0.1255  30.622   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.44 on 998 degrees of freedom
## Multiple R-squared:  0.4844, Adjusted R-squared:  0.4839 
## F-statistic: 937.7 on 1 and 998 DF,  p-value: < 2.2e-16
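The large t value of 30.62 for EXIT_VELOCITY, with a p-value below 2e-16, is strong evidence that distance increases with exit velocity: each additional mph of exit velocity is associated with about 3.84 ft of extra distance. The R-squared of 0.4844 shows that exit velocity explains roughly 48% of the variation in distance, which corresponds to a correlation of about 0.70. The relationship is therefore real but moderate, and other factors, plausibly including elevation angle, must account for the remaining variation.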
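
The correlation itself can also be computed directly rather than read off the regression summary. In a regression with a single predictor, the Multiple R-squared reported by summary() is the square of the Pearson correlation, and the t statistic from cor.test() matches the slope's t value. The sketch below demonstrates both identities on simulated data (`sim`), which stands in for the real dataset so that the code runs on its own.

```r
set.seed(2)
# Simulated stand-in data; NOT the HR Tracker dataset
sim <- data.frame(EXIT_VELOCITY = runif(200, min = 90, max = 115))
sim$TRUE_DISTANCE <- 3.8 * sim$EXIT_VELOCITY + rnorm(200, sd = 18)

# Pearson correlation and its significance test
r <- cor(sim$EXIT_VELOCITY, sim$TRUE_DISTANCE)
r_test <- cor.test(sim$EXIT_VELOCITY, sim$TRUE_DISTANCE)

# Simple linear regression on the same data
sim_lm <- lm(TRUE_DISTANCE ~ EXIT_VELOCITY, data = sim)
r_squared <- summary(sim_lm)$r.squared
slope_t <- summary(sim_lm)$coefficients["EXIT_VELOCITY", "t value"]

# r^2 matches R-squared, and the two t statistics agree
print(c(r = r, r_times_r = r^2, r_squared = r_squared))
print(c(cor_test_t = unname(r_test$statistic), slope_t = slope_t))
```

On the real data, `cor(hr_by_player_2017$EXIT_VELOCITY, hr_by_player_2017$TRUE_DISTANCE)` would give the correlation directly.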