This is my week 6 data dive, where I created two metrics: Shooting Efficiency (compared to FG%) and Distance-Adjusted Efficiency (compared to average shot distance). I plotted each metric against its comparison variable, then examined the correlations and confidence intervals to understand what these metrics add to the analysis of shooting efficiency for NBA players.
# Loading the packages used throughout this analysis
library(dplyr)
library(ggplot2)
library(scales)
# Creating the mutated variables Distance-Adjusted Efficiency and Shooting Efficiency
df2 <-
  df |>
  mutate(
    fg_vs_3p_diff = fg_percent - fg_3p, # Shooting Efficiency
    efficiency_per_foot = fg_percent / dist # Distance-Adjusted Efficiency
  ) |>
  filter(
    !is.na(fg_percent),
    !is.na(fg_3p),
    !is.na(dist),
    dist > 0
  )
# Creating a scatterplot comparing FG% and the "Shooting Efficiency" Metric
df2 |>
  ggplot(aes(x = fg_percent, y = fg_vs_3p_diff)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(labels = percent_format()) +
  scale_y_continuous(labels = percent_format()) +
  labs(
    title = "Overall FG% vs Shooting Efficiency (FG% − 3P%)",
    x = "FG%",
    y = "Shooting Efficiency (FG% - 3P%)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
I created the variable “Shooting Efficiency”, which is the difference
between FG% and 3PT%. Positive values mean a player’s overall FG% is
higher than their 3PT%, which points to better shooting inside the
three-point line. Negative values mean a player’s 3PT% is higher than
their overall FG%.
Insight: The plot shows a positive linear relationship between FG%
and Shooting Efficiency. As FG% increases, Shooting Efficiency tends to
increase as well. Players with a higher FG% tend to have a higher
“Shooting Efficiency”, meaning their efficiency is driven more by closer
shots than by three-point shooting.
Significance: This reinforces different NBA shot profiles: interior
scoring (shots closer to the basket) generally has a higher efficiency
than three-point shooting. Three-point specialists might have an
efficient 3PT% but not necessarily an efficient overall FG%.
Outliers: Points in the top-right region have both high FG% and high
Shooting Efficiency. These players are most likely rim-dominant bigs or
low-volume three-point shooters. Negative Shooting Efficiency values
belong to players whose 3PT% is higher than their overall FG%, which is
unusual in most cases. This applies to three-point shooters who struggle
to score inside the arc. Overall, the outliers tend to be low-volume
shooters, whose small shot counts produce extreme values of both FG%
and Shooting Efficiency. These players should be filtered out in future
work to allow a proper league-wide analysis.
Further Questions: How many outliers remain if we filter for players
with a minimum number of minutes played or a minimum number of
three-point attempts (3PA)?
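The volume filter raised in the question above could be sketched roughly as follows. This is only an illustration on toy data: the column names `fg3a` (three-point attempts) and `mp` (minutes played) and both thresholds are assumptions, not names or cutoffs confirmed in the dataset.

```r
library(dplyr)

# Toy rows standing in for the real data (values are made up for illustration).
# `fg3a` and `mp` are hypothetical column names for 3PA and minutes played.
toy <- tibble(
  player     = c("A", "B", "C", "D"),
  fg_percent = c(0.62, 0.45, 0.40, 0.48),
  fg_3p      = c(1.00, 0.38, 0.00, 0.36),
  fg3a       = c(1, 410, 3, 250),
  mp         = c(90, 2400, 150, 1800)
)

min_3pa <- 50   # illustrative threshold, not a recommended cutoff
min_mp  <- 500  # illustrative threshold, not a recommended cutoff

# Compute the metric, then keep only players above both volume thresholds
filtered <- toy |>
  mutate(fg_vs_3p_diff = fg_percent - fg_3p) |>
  filter(fg3a >= min_3pa, mp >= min_mp)

# Number of low-volume rows the thresholds removed
n_removed <- nrow(toy) - nrow(filtered)
```

Re-running the scatterplot and correlation on the filtered data would show how much of the spread comes from low-volume shooters.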
# Creating a scatterplot to compare Average Distance and "Distance-Adjusted Efficiency" metric
df2 |>
  ggplot(aes(x = dist, y = efficiency_per_foot)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_y_continuous(labels = percent_format()) +
  labs(
    title = "Average Shot Distance vs Distance-Adjusted Efficiency",
    x = "Average Shot Distance (feet)",
    y = "Distance Adjusted Efficiency (FG% / Distance)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
I created the variable “Distance-Adjusted Efficiency”, which is the ratio of FG% to average shot distance (FG% / Distance). High values mean the player shoots efficiently relative to how far from the basket they shoot. Low values mean the player’s efficiency is low relative to their average shot distance.
Insight: There is a strong negative relationship between average shot
distance and Distance Adjusted Efficiency. As average shot distance
increases, Distance-Adjusted Efficiency tends to decrease, which is
consistent with the idea that farther shots are harder to convert. Part
of this pattern is built in, because distance is the denominator of the
metric.
Significance: This plot shows the trade-off between shot difficulty and
efficiency. Players who take closer shots (low distance) achieve higher
Distance Adjusted Efficiency, while players with a higher average shot
distance have significantly lower Distance Adjusted Efficiency. This
reinforces how shot distance plays a role in shooting efficiency.
Outliers: There are high-efficiency outliers at very low distances (high
FG% with short average distance). These might be due to noise, but they
are most likely rim finishers or dunk-heavy players (positional role).
At long distances, Distance Adjusted Efficiency falls to near 0%, which
suggests limited long-distance efficiency. The metric itself cannot be
negative, since FG% is non-negative and distance is filtered to be
positive, so any negative values on the plot come from the fitted line.
The plot clearly shows a non-linear relationship, so a non-linear model
would be a better choice for comparing these variables in the future.
Further Questions: Does this relationship differ by position (guards vs
bigs)?
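One possible way to follow up on the non-linearity is to fit the relationship on a log-log scale, where a ratio such as FG% / Distance becomes roughly linear. The sketch below uses simulated data, not the NBA dataset, and the data-generating numbers are arbitrary assumptions chosen only to make the example run.

```r
set.seed(1)

# Simulated stand-in for df2 (NOT real NBA data): FG% declines mildly with distance
n <- 200
dist <- runif(n, min = 2, max = 25)
fg_percent <- 0.65 - 0.01 * dist + rnorm(n, sd = 0.04)
fg_percent <- pmin(pmax(fg_percent, 0.05), 0.95)  # keep FG% in a plausible range

sim <- data.frame(
  dist = dist,
  efficiency_per_foot = fg_percent / dist
)

# Straight-line fit on the original scale (what geom_smooth(method = "lm") draws)
fit_linear <- lm(efficiency_per_foot ~ dist, data = sim)

# Log-log fit: log(FG% / dist) = log(FG%) - log(dist), so the slope sits near -1,
# and falls below -1 when FG% itself declines with distance
fit_loglog <- lm(log(efficiency_per_foot) ~ log(dist), data = sim)

coef(fit_loglog)
```

On the plot itself, swapping `method = "lm"` for `method = "loess"` in `geom_smooth()` would be a quick visual check of the curvature. The R-squared values of the two fits are on different scales, so they should not be compared directly.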
# Creating the correlation matrix for FG% and Shooting Efficiency
cor_matrix <-
  df2 |>
  select(
    fg_percent,
    fg_vs_3p_diff
  ) |>
  cor(use = "complete.obs", method = "pearson")
cor_matrix
##               fg_percent fg_vs_3p_diff
## fg_percent     1.0000000     0.4493096
## fg_vs_3p_diff  0.4493096     1.0000000
Insight: According to the correlation matrix for FG% and Shooting Efficiency, there is a moderate positive correlation (about 0.45) between the two variables. This makes sense mathematically, since FG% appears on both sides: Shooting Efficiency = FG% - 3PT%. The upward trend in the scatterplot is consistent with this relationship.
Significance: Even though Shooting Efficiency includes FG% in its formula, the correlation is well below 1. This shows that 3PT% varies substantially among players with similar FG%, so overall FG% is not purely driven by close shots. This is consistent with the idea that modern players are more diverse in their scoring than previous generations.
Further Question: Would the correlation strengthen or weaken within a chosen position (guards vs bigs)?
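The built-in part of this correlation can be written out explicitly. The symbols below are notation introduced here for illustration: X is FG%, Y is 3PT%, and D is the Shooting Efficiency metric.

```latex
\begin{aligned}
X &= \text{FG\%}, \quad Y = \text{3PT\%}, \quad D = X - Y \\
\operatorname{Cov}(X, D) &= \operatorname{Var}(X) - \operatorname{Cov}(X, Y) \\
\operatorname{Corr}(X, D) &= \frac{\sigma_X^2 - \sigma_{XY}}
  {\sigma_X \sqrt{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}}
\end{aligned}
```

If 3PT% were the same for every player (so that both its variance and its covariance with FG% were zero), the expression would reduce to exactly 1. The further the observed correlation falls below 1, the more 3PT% varies independently of FG%.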
# Creating the correlation matrix for Average Shot Distance and Distance Adjusted Efficiency
cor_matrix <-
  df2 |>
  select(
    dist,
    efficiency_per_foot
  ) |>
  cor(use = "complete.obs", method = "pearson")
cor_matrix
##                           dist efficiency_per_foot
## dist                 1.0000000          -0.7593307
## efficiency_per_foot -0.7593307           1.0000000
Insight: According to the correlation matrix for Average Shot Distance and Distance Adjusted Efficiency, there is a strong negative correlation (about -0.76) between the two variables. This makes sense intuitively for basketball, since a longer average shot distance tends to lead to a lower FG%. The scatterplot shows a strong downward trend, confirming this relationship.
Significance: This shows that shot distance plays a major role in shaping scoring efficiency. Players who score closer to the basket have higher efficiency, while long-range-heavy shot profiles compress distance-adjusted metrics. This is consistent with modern offensive spacing, which accepts a lower shooting percentage in exchange for higher shot value.
Further Question: Would the correlation strengthen or weaken within a chosen position (guards vs bigs)?
# Calculating the 95% confidence interval for the Shooting Efficiency Metric
ci_fg_diff <-
t.test(df2$fg_vs_3p_diff, conf.level = 0.95)
ci_fg_diff$conf.int
## [1] 0.1213756 0.1318723
## attr(,"conf.level")
## [1] 0.95
Insight: The 95% confidence interval for Shooting Efficiency (the mean difference between FG% and 3PT%) is approximately [0.121, 0.132], meaning we can be 95% confident that the true average Shooting Efficiency for NBA players lies within this interval. The positive, narrow interval shows that, on average, overall FG% exceeds 3PT% by about 12-13 percentage points.
Significance: In this sample of NBA players, overall FG% exceeds 3PT% by a statistically meaningful margin. The narrow interval also suggests a large sample size and strong precision in the estimate.
Further Question: What happens if players are weighted by minutes played?
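For reference, the interval reported by `t.test()` is the sample mean plus or minus a t critical value times the standard error. The sketch below reproduces that calculation by hand on a small made-up vector (not the NBA data) and checks it against `t.test()`.

```r
# Illustrative values only (not taken from the NBA dataset)
x <- c(0.10, 0.15, 0.08, 0.20, 0.12, 0.14, 0.11, 0.16)

n      <- length(x)
x_bar  <- mean(x)                 # sample mean
se     <- sd(x) / sqrt(n)         # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value

# Manual 95% confidence interval: mean +/- t * SE
manual_ci <- c(x_bar - t_crit * se, x_bar + t_crit * se)

# Interval from t.test(), with attributes stripped for comparison
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual_ci, builtin_ci)
```

Because the standard error shrinks with the square root of the sample size, a large number of players produces a narrow interval even when individual values are widely spread.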
# Calculating the 95% confidence interval for the Distance Adjusted Efficiency Metric
ci_eff_per_foot <-
t.test(df2$efficiency_per_foot, conf.level = 0.95)
ci_eff_per_foot$conf.int
## [1] 0.03847939 0.04107020
## attr(,"conf.level")
## [1] 0.95
Insight: The 95% confidence interval for Distance Adjusted Efficiency is approximately [0.038, 0.041], meaning we can be 95% confident that the mean Distance Adjusted Efficiency for NBA players lies between about 3.8% and 4.1% per foot. The narrow interval reflects a precise estimate of the mean, helped by the large sample. It does not by itself show that individual players are tightly clustered.
Significance: In this sample of NBA players, the average relationship between shot distance and scoring efficiency is estimated with little uncertainty. The narrow interval suggests that conclusions about the mean Distance Adjusted Efficiency are unlikely to change much with resampling.
Further Question: Does the confidence interval change significantly if previous seasons are included, specifically the seasons before the three-point revolution?