This week I will be selecting three variables for the upcoming tasks.
Those variables are:
A continuous variable that is the most “valuable”, which will serve as the response variable: PTS_per_100 (points per 100 possessions)
A categorical column that I expect to influence the response variable, which will serve as the explanatory variable: Playoffs (whether the team qualified for the playoffs)
A second continuous variable that might influence the response variable: FG_Percent (field goal percentage)
Using these three variables, I will complete two tasks: an ANOVA test and a linear regression model. I will then summarize the results of each and explain what can be taken away from them.
The null hypothesis for the ANOVA test is “The mean PTS_per_100 is the same for playoff teams and non-playoff teams”.
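Written in symbols, the two hypotheses look like this. Here μ stands for the mean PTS_per_100 within each group; the subscripts are just notation introduced for this sketch:

```latex
H_0:\ \mu_{\text{playoff}} = \mu_{\text{non-playoff}}
\qquad\text{vs.}\qquad
H_a:\ \mu_{\text{playoff}} \neq \mu_{\text{non-playoff}}
```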
Now I will test this hypothesis:
# One-way ANOVA: does mean PTS_per_100 differ between playoff and non-playoff teams?
anova_model <- aov(PTS_per_100 ~ Playoffs, data = NBA)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Playoffs 1 3018 3018.2 149.9 <2e-16 ***
## Residuals 1400 28188 20.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
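As a check on the table, the F value is the Playoffs mean square divided by the residual mean square. The residual mean square is the residual sum of squares divided by its 1400 degrees of freedom. Using the printed (rounded) numbers:

```latex
F = \frac{\text{MS}_{\text{Playoffs}}}{\text{MS}_{\text{Residuals}}}
  = \frac{3018.2}{28188 / 1400}
  \approx \frac{3018.2}{20.134}
  \approx 149.9
```

This matches the printed F value of 149.9. The table rounds the residual mean square to 20.1, which is why dividing by 20.1 directly gives a slightly larger number (about 150.2).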
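To summarize the results of the ANOVA test, I will focus on the p-value.
R reports the p-value as <2e-16. R displays any p-value below about 2.2 × 10^-16 this way, so the p-value is not exactly 2 × 10^-16 (0.0000000000000002) but something even smaller. With an F value of 149.9 on 1 and 1400 degrees of freedom, the p-value is far below the typical significance level of 0.05, so we reject the null hypothesis.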
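Because the printed table only shows a bound, it can help to pull the p-value out of the model object directly. The sketch below uses a simulated data frame `sim` as a stand-in for `NBA`. The numbers are made up and are not the real data. With the real data, the same extraction would apply to `anova_model`.

```r
# Hypothetical stand-in for the NBA data: 'sim' is simulated, not real results
set.seed(42)
sim <- data.frame(
  Playoffs = factor(rep(c("No", "Yes"), each = 100)),
  PTS_per_100 = c(rnorm(100, mean = 105, sd = 4),
                  rnorm(100, mean = 109, sd = 4))
)

fit <- aov(PTS_per_100 ~ Playoffs, data = sim)

# summary() of an aov fit is a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]

# First row of the "Pr(>F)" column = p-value for the Playoffs term
p_value <- anova_table[["Pr(>F)"]][1]
print(p_value)

# R's printed tables show p-values below this threshold (about 2.2e-16) as "<" a bound
print(.Machine$double.eps)
print(p_value < .Machine$double.eps)
```

With the real model, `summary(anova_model)[[1]][["Pr(>F)"]][1]` returns the stored p-value instead of the printed bound.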
Here is a box plot to further illustrate the results:
boxplot(PTS_per_100 ~ Playoffs,
data = NBA,
main = "Points per 100 Possessions by Playoff Status",
xlab = "Playoff Qualification",
ylab = "Points per 100 Possessions")
There is strong statistical evidence that the average scoring rate (PTS_per_100) differs between teams that make the playoffs and teams that miss them.
The ANOVA only tests whether the means differ. The direction comes from comparing the two groups, such as in the box plot above, and that comparison suggests that teams that make the playoffs tend to score more efficiently than teams that do not.
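The F test on its own says that the group means differ, not which one is higher. A quick way to confirm the direction is to compare the group means directly. The sketch below again uses simulated stand-in data. The level names "No" and "Yes" are assumptions, since the coding of the real Playoffs column is not shown.

```r
# Hypothetical simulated data shaped like the Playoffs / PTS_per_100 columns
set.seed(1)
sim <- data.frame(
  Playoffs = factor(rep(c("No", "Yes"), times = c(120, 80))),
  PTS_per_100 = c(rnorm(120, mean = 105, sd = 4),
                  rnorm(80, mean = 109, sd = 4))
)

# Mean scoring rate within each Playoffs group
group_means <- tapply(sim$PTS_per_100, sim$Playoffs, mean)
print(group_means)

# Difference in means (Yes minus No); a positive value means playoff teams score more
diff_means <- group_means[["Yes"]] - group_means[["No"]]
print(diff_means)
```

On the real data, `tapply(NBA$PTS_per_100, NBA$Playoffs, mean)` gives the same comparison, whatever the Playoffs levels are called.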
The main insight I take from this ANOVA test is that teams that make the playoffs score significantly more points per 100 possessions than teams that miss the playoffs. This matters because it shows that offensive production is clearly associated with making the playoffs. Because this is an observed association rather than a controlled experiment, it suggests, but does not prove, that improving a team's ability to score would increase its chances of making the playoffs. An additional question to explore is whether defense (points allowed) matters more or less than a team's own scoring.
Now I will build a linear regression model with FG_Percent as the predictor and evaluate how well it fits the response variable, PTS_per_100.
Here is the model:
# Simple linear regression: PTS_per_100 as a linear function of FG_Percent
lm_model <- lm(PTS_per_100 ~ FG_Percent, data = NBA)
summary(lm_model)
##
## Call:
## lm(formula = PTS_per_100 ~ FG_Percent, data = NBA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2720 -2.4021 0.1518 2.5109 13.6736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.901 2.392 20.86 <2e-16 ***
## FG_Percent 122.435 5.160 23.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.987 on 1400 degrees of freedom
## Multiple R-squared: 0.2868, Adjusted R-squared: 0.2863
## F-statistic: 563 on 1 and 1400 DF, p-value: < 2.2e-16
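The p-values are again reported as bounds: <2e-16 for both coefficients and < 2.2e-16 for the overall F-statistic. Each is smaller than roughly 2 × 10^-16 rather than equal to it.
Based on these p-values, both coefficients are highly statistically significant. In particular, the data give strong evidence that the slope for FG_Percent (field goal percentage) is not zero, so field goal percentage is a significant, positive predictor of scoring output. Statistical significance alone does not say how much of the scoring it explains; that is what R^2 measures.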
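To make the slope concrete, the fitted line can be used to compute predictions by hand. This sketch copies the two estimates from the output above. It assumes FG_Percent is stored as a proportion (0.45 for 45%), because the intercept and slope only give realistic values on that scale. Plugging in 45 instead would predict several thousand points per 100 possessions. The helper `predict_pts` is hypothetical and not part of the original analysis.

```r
# Estimates copied from summary(lm_model) above
b0 <- 49.901   # intercept
b1 <- 122.435  # slope for FG_Percent

# Hypothetical helper: fitted PTS_per_100 for a given FG_Percent (as a proportion)
predict_pts <- function(fg) b0 + b1 * fg

pts_45 <- predict_pts(0.45)           # roughly 105 points per 100 possessions
gain   <- predict_pts(0.46) - pts_45  # about 1.22 points for +1 percentage point
print(c(pts_45 = pts_45, gain = gain))
```

With the real fitted model, `predict(lm_model, newdata = data.frame(FG_Percent = c(0.45, 0.46)))` would give essentially the same predictions without copying the rounded coefficients.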
The R^2 value is 0.2868, which means about 28.7% of the variation in scoring is explained by field goal percentage alone. That is a meaningful share, and it shows that shooting efficiency matters. However, the remaining 71.3% or so of the variation comes from factors other than field goal percentage.
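The R^2 reported by summary() is one minus the ratio of the residual sum of squares to the total sum of squares. In other words, it is the share of variation in the response that the fitted line accounts for. This sketch checks that definition on simulated data, not the NBA data. The variable names `fg` and `pts` are stand-ins.

```r
# Hypothetical simulated data: fg plays the role of FG_Percent, pts of PTS_per_100
set.seed(7)
fg  <- runif(300, min = 0.40, max = 0.50)
pts <- 50 + 120 * fg + rnorm(300, sd = 4)
fit <- lm(pts ~ fg)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((pts - mean(pts))^2)
r2_manual <- 1 - ss_res / ss_tot

# Matches the "Multiple R-squared" value from summary(fit)
print(all.equal(r2_manual, summary(fit)$r.squared))

# With a single predictor, R^2 also equals the squared correlation
print(all.equal(r2_manual, cor(fg, pts)^2))
```

For a single predictor, there is also a direct link to the F-statistic: R^2 = F / (F + residual df). With the output above, 563 / (563 + 1400) ≈ 0.2868, which matches the reported Multiple R-squared.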
plot(NBA$FG_Percent, NBA$PTS_per_100,
xlab = "Field Goal Percentage",
ylab = "Points per 100 Possessions",
main = "Relationship Between Shooting Efficiency and Scoring")
abline(lm_model)
After reviewing the model, we can recommend that teams prioritize shooting efficiency. This could include emphasizing quality shot selection, improving the offensive game plan to create more open shots, or focusing on improving players’ ability to make shots efficiently.
The insight I take from this linear regression model is that teams that shoot more efficiently score significantly more points per 100 possessions. This matters because shooting efficiency is a major part of a team's offensive production and success. Even small improvements in field goal percentage are associated with meaningful gains. Based on the estimated slope of 122.435, and assuming FG_Percent is measured as a proportion, a 1 percentage point increase corresponds to roughly 1.2 more points per 100 possessions. Further questions to investigate include how three-point percentage and free throw percentage compare with field goal percentage in their connection to scoring.