# Load needed libraries
library(tidyverse)
library(readr)
library(knitr)
library(sqldf)
Home Run Exit Velocity and Distance
I am building a simple model to test whether there is a direct
correlation between the exit velocity of a ball hit out of the park and
the distance that it travels. I found a dataset that contains home run
data from 2006 through 2017, with fields such as exit velocity, distance,
and elevation angle. The data was collected by the now-discontinued
hittrackeronline.com. I will be using only 2017 home run data.
# filename <- "/Users/Audiorunner13/CUNY MSDS Course Work/DATA605 Fall 2022/Week 11/HR Tracker.csv"
# Download the CSV to a temporary file
filename <- tempfile()
download.file("https://raw.githubusercontent.com/audiorunner13/Masters-Coursework/main/DATA605%20Fall%202022/Week%2011/HR%20Tracker.csv", filename)
# Filter rows while reading; note that the GAME_DATE condition is a comparison against the string '01/01/2017'
hr_2017 <- read.csv.sql(filename, "select * from file where GAME_DATE > '01/01/2017' and EXIT_VELOCITY > 70", sep = ",")
# Keep the first 1000 rows for plotting and modeling
hr_by_player_2017 <- head(hr_2017, 1000)
plot(hr_by_player_2017[, 'EXIT_VELOCITY'], hr_by_player_2017[, 'TRUE_DISTANCE'], main = "Home Runs Distance by Exit Velocity", xlab = "Exit Velocity (mph)", ylab = "Distance (ft)")

The scatter plot of distance against exit velocity shows a clear upward
trend: distance increases as exit velocity increases. The question is
how strong that correlation is.
(hr_by_player_2017_lm <- lm(TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017))
##
## Call:
## lm(formula = TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017)
##
## Coefficients:
##   (Intercept)  EXIT_VELOCITY
##        0.2302         3.8417
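The linear model function, lm(), estimates the y-intercept (0.2302) and
the slope (3.8417) and stores the residuals, giving the fitted line
distance = 0.2302 + 3.8417 × exit velocity. I then overlay this line on
the scatter plot with abline() to show how well the model fits the data.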
plot(TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017)
abline(hr_by_player_2017_lm, col='red')
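
As a rough illustration of what the fitted line means, the printed coefficients can be wrapped in a small helper that returns the distance the model predicts for a given exit velocity. The function name `predict_distance` and the 105 mph input are illustrative assumptions, not part of the original analysis; with the fitted model object in memory, `predict()` would do the same job.

```r
# Coefficients copied from the lm() output (rounded as printed).
intercept <- 0.2302
slope     <- 3.8417

# Hypothetical helper: predicted distance (ft) for an exit velocity (mph).
predict_distance <- function(exit_velocity) {
  intercept + slope * exit_velocity
}

# Example input chosen for illustration only.
predicted_105 <- predict_distance(105)
print(predicted_105)
```

Under this model, each additional 1 mph of exit velocity adds about 3.84 ft to the predicted distance.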

summary(hr_by_player_2017_lm)
##
## Call:
## lm(formula = TRUE_DISTANCE ~ EXIT_VELOCITY, data = hr_by_player_2017)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -71.92 -11.60   1.85  13.25  38.99
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)     0.2302    13.0252   0.018    0.986
## EXIT_VELOCITY   3.8417     0.1255  30.622   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.44 on 998 degrees of freedom
## Multiple R-squared: 0.4844, Adjusted R-squared: 0.4839
## F-statistic: 937.7 on 1 and 998 DF, p-value: < 2.2e-16
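The large t-value of 30.62 for EXIT_VELOCITY (p-value < 2e-16) is strong
evidence of a correlation between the exit velocity of the hit ball and
the distance that it travels. The R-squared of 0.4844 indicates that
exit velocity alone explains about 48% of the variation in distance.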
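
The figures in the summary are tied together by simple arithmetic, which the sketch below reproduces from the printed (rounded) values. The variable names are illustrative, and small differences from the printed output are expected because the inputs are rounded.

```r
# Values copied from the summary() output (rounded as printed).
estimate  <- 3.8417   # slope for EXIT_VELOCITY
std_error <- 0.1255   # standard error of the slope
r_squared <- 0.4844   # Multiple R-squared

# The t value is the estimate divided by its standard error.
t_value <- estimate / std_error

# With a single predictor, the F-statistic equals the squared t value.
f_value <- t_value^2

# With a single predictor, the correlation coefficient is the square root
# of R-squared, carrying the sign of the slope (positive here).
r <- sign(estimate) * sqrt(r_squared)

print(round(c(t = t_value, F = f_value, r = r), 3))
```

The resulting correlation of roughly 0.70 is a more direct answer to the question of how strong the relationship is than the t-value alone, which mainly shows that the slope is very unlikely to be zero.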