Background
The National Football League (NFL) is the major professional American football league in the United States. The NFL is composed of 32 teams split into two conferences, the AFC and the NFC. The NFL publicly posts statistics for each player and team for each year of play, known as a season, on its official website.
Question for Analysis:
Is there a correlation between the number of passing yards and the number of touchdowns for quarterbacks in the NFL?
To perform the statistical analysis for this question, I took the following data from the official NFL website for the 2025 season: the name of each quarterback, the team played for, the number of touchdowns, and the number of passing yards. The data set includes only quarterbacks who completed at least one touchdown during the 2025 season, for a total of 62 quarterbacks. Passing yards and touchdowns are cumulative season totals, not per-game figures. Quarterbacks with no completed touchdowns in the 2025 season were excluded from the data set to prevent skew. Players whose teams played more games are likely to have more passing yards as well as more touchdowns, but this should not affect the distribution.
Table of Data:
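library(DT)     # provides datatable()
library(dplyr)  # provides select()
datatable(quarterbackdata)
tdpy <- select(quarterbackdata, PassingYrds, TD)  # keep only the two modeling columns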
Mathematical Model:
\[
\underbrace{Y_i}_\text{Touchdowns} = \overbrace{\beta_0}^\text{y-int} +
\overbrace{\beta_1}^\text{slope} \underbrace{X_i}_\text{Passing Yards} +
\epsilon_i \quad \text{where} \ \epsilon_i \sim N(0, \sigma^2)
\]
The estimated regression line is
written as:
\[
\underbrace{\hat{Y}_i}_\text{Touchdowns} = \overbrace{b_0}^\text{est.
y-int} + \overbrace{b_1}^\text{est. slope}
\underbrace{X_i}_\text{Passing Yards}
\]
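For reference, the estimates \(b_0\) and \(b_1\) in the line above are the least-squares values, which is what R's lm() computes for this kind of model. The standard formulas are stated here, where \(\bar{X}\) and \(\bar{Y}\) are notation introduced for the sample means of passing yards and touchdowns over the \(n = 62\) quarterbacks:
\[
b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad
b_0 = \bar{Y} - b_1 \bar{X}
\]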
Hypothesis:
My null hypothesis is that the slope of the linear regression model is zero, meaning that there is no linear relationship between passing yards and the number of touchdowns achieved by a quarterback. My alternative hypothesis is that the slope of the linear regression model is not equal to zero, meaning that there is a linear relationship between passing yards and touchdowns achieved by a quarterback. I will use a significance level of 0.05.
\[
\begin{aligned}
H_0&: \beta_1 = 0 \quad \text{(No relationship between passing yards
and touchdowns)}\\
H_a&: \beta_1 \neq 0 \quad \text{(Relationship between passing yards
and touchdowns exists)}\\
\alpha &= 0.05
\end{aligned}
\]
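These hypotheses are assessed with the usual t-test for a regression slope. As a sketch of its standard form, where \(s_{b_1}\) is notation introduced here for the standard error of the estimated slope \(b_1\) and \(n\) is the number of quarterbacks:
\[
t = \frac{b_1 - 0}{s_{b_1}} \sim t_{n-2} \ \text{ under } H_0, \qquad
\text{reject } H_0 \text{ if the two-sided p-value is less than } \alpha
\]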
Assumptions:
When creating a linear regression model, five assumptions are made:
1. The true regression relationship between the Y and X variables is linear.
2. The error terms are normally distributed with a mean of zero.
3. The variance of the error terms is constant for all values of X.
4. The X values are fixed and measured without error.
5. The error terms are independent.
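library(ggplot2)
library(plotly)
# Fit the simple linear regression of touchdowns on passing yards
tdpy.lm <- lm(TD ~ PassingYrds, data=tdpy)
# Show the three diagnostic plots side by side
par(mfrow=c(1,3))
# Residuals vs. fitted (linearity, constant variance) and normal Q-Q (normality)
plot(tdpy.lm, which=1:2)
# Residuals in data order (independence)
plot(tdpy.lm$residuals, col="firebrick", pch=19)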

For this linear regression model, all of the assumptions above appear to be met. The diagnostic plots flag data points 1, 25, and 17 as potential outliers, meaning observations with unusually large residuals. These data points represent the quarterbacks Matthew Stafford, Tyler Shough, and Cam Ward. The outliers were not removed from the analysis because their effects are expected to even out across the data set.
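The visual checks above can be complemented with a few numeric summaries. The following is a minimal sketch using base R only. The helper name `check_lm_assumptions` is hypothetical, and because the quarterback data are not reproduced here, the demonstration runs on R's built-in `cars` data set as a stand-in.

```r
# Numeric companions to the diagnostic plots, using base R only.
# `check_lm_assumptions` is a hypothetical helper name, not part of any package.
check_lm_assumptions <- function(fit, n_flag = 3) {
  std_res <- rstandard(fit)  # standardized residuals
  list(
    # Shapiro-Wilk normality test on the residuals
    # (a large p-value means no evidence against normality)
    shapiro_p = shapiro.test(residuals(fit))$p.value,
    # Labels of the n_flag observations with the largest absolute standardized residuals
    largest_residuals = names(sort(abs(std_res), decreasing = TRUE))[seq_len(n_flag)],
    # Largest Cook's distance, a measure of how strongly one point influences the fit
    max_cooks = max(cooks.distance(fit))
  )
}

# Stand-in example on the built-in `cars` data set (NOT the quarterback data)
demo_fit <- lm(dist ~ speed, data = cars)
demo_checks <- check_lm_assumptions(demo_fit)
print(demo_checks)
```

For the model in this report, the equivalent call would be `check_lm_assumptions(tdpy.lm)`.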
Analysis:
library(pander)  # pander() formats the model summary as a table
tdpy.lm <- lm(TD ~ PassingYrds, data=tdpy)
pander(summary(tdpy.lm))
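|   | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -1.013 | 0.5835 | -1.736 | 0.08776 |
| PassingYrds | 0.007235 | 0.0002409 | 30.03 | 7.54e-38 |

Fitting linear model: TD ~ PassingYrds

| Observations | Residual Std. Error | R² | Adjusted R² |
|---|---|---|---|
| 62 | 2.724 | 0.9376 | 0.9366 |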
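As a quick arithmetic check on the output above, the t value and p-value for the slope can be recomputed from the reported estimate and standard error. This is a sketch that uses the rounded values from the output, so the results will differ slightly from what pander prints. The 95% confidence interval is an addition that does not appear in the original output.

```r
# Values copied from the model summary output
b1    <- 0.007235   # estimated slope for PassingYrds
se_b1 <- 0.0002409  # standard error of the slope
n     <- 62         # number of quarterbacks

# t value for H0: beta_1 = 0
t_stat <- b1 / se_b1
# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
# 95% confidence interval for the slope (not part of the original output)
ci_95 <- b1 + c(-1, 1) * qt(0.975, df = n - 2) * se_b1

print(c(t = t_stat, p = p_value))
print(ci_95)
```

Because the whole interval lies above zero, it points to the same decision as the p-value at the 0.05 level.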
Interpretation of the Analysis:
The hypothesis test provides a p-value of 7.54e-38 for the slope, which is far below the alpha level of 0.05. We therefore reject the null hypothesis and conclude that there is a positive linear relationship between passing yards and touchdowns for quarterbacks in the NFL for the 2025 season. The estimated slope is 0.007235, meaning that each additional passing yard is associated with an average increase of 0.007235 touchdowns. The intercept estimate is -1.013, meaning that the model predicts -1.013 touchdowns for a quarterback with zero passing yards. This is not possible, since a player cannot score a negative number of touchdowns. The intercept is also not significantly different from zero at the 0.05 level (p-value of 0.08776). The R-squared value is 0.9376 (adjusted R-squared 0.9366), meaning that 93.76% of the variation in touchdowns is explained by passing yards.
Regression Graphic:
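library(ggplot2)
library(plotly)
# Scatterplot of touchdowns against passing yards with the fitted regression line
plot <- ggplot(quarterbackdata, aes(x = PassingYrds, y = TD)) +
  # Least-squares line with its confidence band
  geom_smooth(method = "lm", se = TRUE, color = "firebrick", linetype = "solid") +
  # ggplot2 itself ignores the text aesthetic; ggplotly() uses it for the hover tooltip
  geom_point(aes(text = paste("Player:", QuarterbackName, "\nTeam:", Team)),
             size = 3, color = "midnightblue", alpha = 0.1) +
  labs(title = "Regression Model: Passing Yards VS Touchdowns",
       x = "Passing Yards",
       y = "Touchdowns") +
  theme_minimal()
# Convert to an interactive plotly widget that shows only the text tooltip
plot2 <- ggplotly(plot, tooltip = "text")
plot2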
Assuming that the relationship is
linear, the equation of the fitted line shown in the plot above
is:
\[
\underbrace{\hat{Y}_i}_\text{Touchdowns} = -1.013 + 0.007235
\underbrace{X_i}_\text{Passing Yards}
\]
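To make the fitted equation concrete, the sketch below plugs a few season totals into it. The helper name `predict_td` is hypothetical, and the example yardages are illustrative round numbers, not values taken from the data set.

```r
# Coefficients copied from the fitted equation
b0 <- -1.013     # estimated intercept
b1 <- 0.007235   # estimated slope (touchdowns per passing yard)

# `predict_td` is a hypothetical helper name used only for this illustration
predict_td <- function(passing_yards) b0 + b1 * passing_yards

# Illustrative season totals, not taken from the data set
example_yards <- c(1000, 2500, 4000)
predicted <- predict_td(example_yards)
print(round(predicted, 1))   # roughly 6.2, 17.1, and 27.9 touchdowns

# Passing yards associated with one additional touchdown
yards_per_td <- 1 / b1
print(round(yards_per_td))   # about 138 yards
```

Reading the slope as roughly one touchdown per 138 passing yards is often easier to relate to than 0.007235 touchdowns per yard.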
Conclusion:
The hypothesis test provides a p-value of 7.54e-38, which is below the 0.05 significance level, so there is a positive relationship between passing yards and touchdowns for quarterbacks in the NFL for the 2025 season. The estimated value of the slope is 0.007235, meaning that each additional passing yard is associated with an average of 0.007235 more touchdowns completed by quarterbacks. The R-squared value is 0.9376, meaning that there is a strong linear relationship between passing yards and touchdowns completed by quarterbacks in the NFL. It also means that 93.76% of the variance in the dependent variable (touchdowns) is explained by the independent variable (passing yards). The other 6.24% of the variance may be explained by factors such as “the red zone squeeze,” in which an offense that has already moved close to the end zone completes a touchdown on a short pass, so the quarterback gains fewer passing yards for that touchdown.
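For a simple linear regression with a single predictor, the sample correlation coefficient can be recovered from R-squared and the sign of the slope. Using the multiple R-squared of 0.9376 from the model summary and the positive slope estimate \(b_1\):
\[
r = \operatorname{sign}(b_1)\sqrt{R^2} = +\sqrt{0.9376} \approx 0.968
\]
A value this close to 1 is consistent with the strong positive linear relationship described above.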