Response Variable: Field Goal Percentage (FG%)

\[ \begin{align} H_0 &: \text{All positions have the same mean FG\%.} \\ H_1 &: \text{At least one position has a different mean FG\%.} \end{align} \]

Insights: Field goal percentage (FG%) is the chosen response variable because it is one of the most widely used measures of offensive efficiency in basketball; analysts, coaches, and even fans look at FG% when evaluating a player’s offensive performance. Significance: Analyzing which factors affect shooting efficiency (FG%) can help analysts interpret player performance and compare players more meaningfully.
Further Question: Instead of FG%, are there advanced metrics, such as effective field goal percentage, that could measure shooting performance more accurately?

# Creating the anova model comparing FG% and position
anova_model <- aov(fg_percent ~ pos, data = df)

summary(anova_model)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## pos            4   6.88  1.7209   133.1 <2e-16 ***
## Residuals   3216  41.57  0.0129                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 36 observations deleted due to missingness

# Box plot summarizing FG% for each position
ggplot(df, aes(x = pos, y = fg_percent)) +
  geom_boxplot() +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Field Goal Percentage by Position",
    x = "Position",
    y = "FG%"
  ) +
  theme_minimal()
## Warning: Removed 36 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Insights: The ANOVA test assesses whether mean FG% differs across positions. The results suggest that at least one position group has a different mean FG%. The boxplot is consistent with this: centers and power forwards generally have higher shooting percentages than guards. Significance: This difference in FG% by position likely reflects the different shot selection of players in different roles. Centers and power forwards tend to take more shots close to the basket, which are easier to convert, while guards take more perimeter shots (farther from the basket). This suggests that comparing raw FG% across positions may be misleading unless positional context is included.
Further Question: Between which positions is the gap in shooting efficiency largest?

Interpretation: The ANOVA test shows that mean FG% differs significantly across positions (F = 133.1 on 4 and 3216 degrees of freedom, p < 2e-16, well below the 0.05 threshold). Centers and power forwards generally have higher FG% than guards (PG and SG), likely because they take more shots near the basket. Guards typically attempt more perimeter shots, which lowers their overall FG%.
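
As a rough check on the size of the effect, the F statistic and the share of variance explained by position (eta squared, which is not reported in the output above) can be recomputed from the ANOVA table. The calculation uses the rounded table entries, so the F ratio comes out slightly different from the reported 133.1.

\[ \begin{align} F &= \frac{MS_{\text{pos}}}{MS_{\text{res}}} = \frac{1.7209}{0.0129} \approx 133.4 \\ \eta^2 &= \frac{SS_{\text{pos}}}{SS_{\text{pos}} + SS_{\text{res}}} = \frac{6.88}{6.88 + 41.57} \approx 0.142 \end{align} \]

By this estimate, position accounts for roughly 14% of the variability in FG%. The difference is statistically significant, but most of the variation in FG% occurs among players at the same position.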

Why This Matters: Since the ANOVA results suggest that shooting efficiency differs across positions, analysts and coaches in the NBA should consider positional roles when evaluating players. For example, comparing the FG% of centers and point guards may lead to incorrect conclusions unless the appropriate context is provided.
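
One way to approach the further question about where the largest gap lies is a Tukey HSD post-hoc test, which compares every pair of positions after a significant ANOVA. The sketch below is illustrative only: `toy_df` and `toy_model` are hypothetical stand-ins built from simulated values (not the NBA data), used so that the example runs on its own. With the real data, `TukeyHSD(anova_model)` would be the analogous call.

```r
# Hypothetical toy data standing in for `df`; the values are simulated, not NBA data
set.seed(1)
toy_df <- data.frame(
  pos = factor(rep(c("C", "PF", "PG", "SF", "SG"), each = 30)),
  fg_percent = c(
    rnorm(30, mean = 0.52, sd = 0.05),  # C
    rnorm(30, mean = 0.48, sd = 0.05),  # PF
    rnorm(30, mean = 0.42, sd = 0.05),  # PG
    rnorm(30, mean = 0.44, sd = 0.05),  # SF
    rnorm(30, mean = 0.42, sd = 0.05)   # SG
  )
)

# Same model form as the ANOVA above, fitted to the toy data
toy_model <- aov(fg_percent ~ pos, data = toy_df)

# Tukey's Honest Significant Difference: all pairwise position comparisons
tukey <- TukeyHSD(toy_model, conf.level = 0.95)

# Sort the pairwise comparisons by absolute mean difference, largest gap first
pairs <- as.data.frame(tukey$pos)
pairs_sorted <- pairs[order(abs(pairs$diff), decreasing = TRUE), ]
print(pairs_sorted)
```

The first row of `pairs_sorted` is the pair of positions with the largest estimated gap in mean FG%, and the `p adj` column shows whether that gap remains significant after adjusting for multiple comparisons.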

# Creating a linear regression model comparing FG% and average shot distance
lm_model <- lm(fg_percent ~ dist, data = df)

summary(lm_model)
## 
## Call:
## lm(formula = fg_percent ~ dist, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60375 -0.02913  0.01207  0.04808  0.69388 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.6133500  0.0053222  115.25   <2e-16 ***
## dist        -0.0120013  0.0003598  -33.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1058 on 3219 degrees of freedom
##   (36 observations deleted due to missingness)
## Multiple R-squared:  0.2569, Adjusted R-squared:  0.2566 
## F-statistic:  1113 on 1 and 3219 DF,  p-value: < 2.2e-16

# Visualizing FG% and average shot distance
ggplot(df, aes(x = dist, y = fg_percent)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Average Shot Distance vs FG%",
    x = "Average Shot Distance (feet)",
    y = "FG%"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 36 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 36 rows containing missing values or values outside the scale range
## (`geom_point()`).

Looking at the coefficients, the intercept implies a predicted FG% of about 61.3% when average shot distance is 0 feet. For every 1-foot increase in average shot distance, predicted FG% decreases by about 1.2 percentage points.
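
Written out, the fitted line from the coefficient table is shown below, together with one worked prediction. The 10-foot average shot distance is a hypothetical value chosen only for illustration.

\[ \begin{align} \widehat{\text{FG\%}} &= 0.6134 - 0.0120 \times \text{dist} \\ \widehat{\text{FG\%}}(\text{dist} = 10) &= 0.6134 - 0.0120 \times 10 \approx 0.493 \end{align} \]

Under this model, a player whose average shot comes from 10 feet would be predicted to shoot about 49.3% from the field.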

Insights: The regression model (paired with the visual) shows a negative relationship between average shot distance and FG%. As average shot distance increases, FG% decreases, which is consistent with the idea that longer shots are generally more difficult to make. Significance: The R squared is 0.257, which means about 25.7% of the variability in FG% is explained by average shot distance in this linear model. This indicates that shot distance is an important factor in shooting efficiency, but other variables probably also play a role in determining a player’s FG%.
Further Question: If more variables were added to explain FG%, would a different type of model be necessary?
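
Because the model has a single predictor, two quantities follow directly from the regression output as a consistency check: the correlation between FG% and average shot distance is the negative square root of R squared (negative because the slope is negative), and the overall F statistic equals the square of the slope's t value.

\[ \begin{align} r &= -\sqrt{R^2} = -\sqrt{0.2569} \approx -0.507 \\ F &= t^2 = (-33.36)^2 \approx 1113 \end{align} \]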

Recommendation: Based on the regression results, I would recommend that teams build their offensive strategy around generating high-percentage shots close to the basket, for example through cuts and certain pick-and-roll actions.