# Converting "Season" column into Date format
df_time <-
  df |>
  mutate(
    season_start_year = as.integer(substr(season, 1, 4)),
    season_date = as.Date(paste0(season_start_year, "-10-01"))
  )
# Confirming conversion worked
df_time |>
  select(season, season_start_year, season_date) |>
  distinct() |>
  arrange(season_date)
## # A tibble: 5 × 3
##   season  season_start_year season_date
##   <chr>               <int> <date>
## 1 2018-19              2018 2018-10-01
## 2 2021-22              2021 2021-10-01
## 3 2022-23              2022 2022-10-01
## 4 2023-24              2023 2023-10-01
## 5 2024-25              2024 2024-10-01
season_summary <-
  df_time |>
  group_by(season, season_date, season_start_year) |>
  summarise(
    mean_fga_3p = mean(fga_3p, na.rm = TRUE),
    n_players = n(),
    .groups = "drop"
  ) |>
  arrange(season_date)
season_summary
## # A tibble: 5 × 5
##   season  season_date season_start_year mean_fga_3p n_players
##   <chr>   <date>                  <int>       <dbl>     <int>
## 1 2018-19 2018-10-01               2018       0.378       622
## 2 2021-22 2021-10-01               2021       0.399       715
## 3 2022-23 2022-10-01               2022       0.407       609
## 4 2023-24 2023-10-01               2023       0.410       657
## 5 2024-25 2024-10-01               2024       0.422       654
# Creating a tibble with average 3pt attempt rate for each season
time_tibble <-
  season_summary |>
  select(season_date, mean_fga_3p)
time_tibble
## # A tibble: 5 × 2
##   season_date mean_fga_3p
##   <date>            <dbl>
## 1 2018-10-01        0.378
## 2 2021-10-01        0.399
## 3 2022-10-01        0.407
## 4 2023-10-01        0.410
## 5 2024-10-01        0.422
Note: Because the dataset contains only five NBA seasons, there is not enough information to split the data into separate eras and fit a trend line to each. For the same reason, comparing different time windows to look for patterns is not practical: smaller windows would contain too few observations to interpret reliably.
# Plotting Average 3pt Attempt Rate over seasons
ggplot(time_tibble, aes(x = season_date, y = mean_fga_3p)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Average 3-Point Attempt Rate Over Time",
    x = "Season",
    y = "Average FGA_3P"
  ) +
  theme_minimal()
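Insights: The plot shows that the average three-point attempt rate has increased across the five seasons in the dataset, from 0.378 in 2018-19 to 0.422 in 2024-25. The 2019-20 and 2020-21 seasons are not in the data, so the line segment between 2018-19 and 2021-22 spans a three-year gap.
Significance: This allows users to connect league-wide shooting trends to individual shooting trends.
Further Question: How long will this upward trend continue? When will the peak occur?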
# Running regression model to understand trends
trend_model <-
  lm(mean_fga_3p ~ season_start_year, data = season_summary)
summary(trend_model)
##
## Call:
## lm(formula = mean_fga_3p ~ season_start_year, data = season_summary)
##
## Residuals:
##          1          2          3          4          5
##  7.373e-05 -4.102e-05  1.303e-03 -2.926e-03  1.590e-03
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)       -1.382e+01  9.068e-01  -15.24 0.000613 ***
## season_start_year  7.036e-03  4.486e-04   15.69 0.000563 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002065 on 3 degrees of freedom
## Multiple R-squared: 0.988, Adjusted R-squared: 0.9839
## F-statistic: 246 on 1 and 3 DF, p-value: 0.0005632
Trend: There is an upward trend in three-point attempt rate over time: across seasons, players are taking a steadily higher proportion of their shots from behind the three-point line.
Insights: The regression results show a positive relationship between season_start_year and three-point attempt rate (mean_fga_3p). The coefficient for season_start_year is 0.00704, meaning that on average the proportion of shots taken from three-point range increases by about 0.70 percentage points per season. The coefficient is statistically significant (p-value = 0.000563) and the model explains a large portion of the variation in the season averages (R-squared = 0.988), showing a strong association between time and three-point shooting behavior. Because the model is fit to only five season averages (3 residual degrees of freedom), these figures are best read as descriptive rather than as precise estimates.
Significance: This result is evidence of a steady shift in NBA shot selection over time, in line with the league-wide movement toward perimeter-oriented play. The consistent increase suggests the change is not random but a sustained trend in shooting habits.
Further Question: Are there certain players or teams that rely more heavily on the three-point shot than others?
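The conversion from the slope to percentage points can be checked with a short sketch. It assumes the rounded season means printed in the season_summary output above, so the slope differs slightly from the full-precision fit; the names season_means and check_model are illustrative and not part of the original analysis.

```r
# Rounded season means copied from the season_summary output (assumption:
# three-decimal rounding, so results differ slightly from the full fit)
season_means <- data.frame(
  season_start_year = c(2018, 2021, 2022, 2023, 2024),
  mean_fga_3p       = c(0.378, 0.399, 0.407, 0.410, 0.422)
)

check_model <- lm(mean_fga_3p ~ season_start_year, data = season_means)

# The slope is in proportion units per year; multiply by 100 to express it
# in percentage points per season
slope <- unname(coef(check_model)["season_start_year"])
slope_pp <- slope * 100
round(slope_pp, 2)

# 95% confidence interval for the slope, also in percentage points
confint(check_model, "season_start_year") * 100
```

With only five points the confidence interval is a more honest summary than the point estimate alone.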
# Residual vs Fitted
plot(trend_model, which = 1)
Insights: The residuals are close to zero across all fitted values (the largest is about 0.003 in absolute value), meaning the linear model fits the data well overall. However, there is a small, visible pattern in the residuals suggesting slight curvature.
Issue/Severity: The small pattern in the residuals suggests that the relationship between three-point attempt rate and season is not completely linear. The spread of the residuals is very small and consistent, so the issue is of low severity and does not undermine confidence in the model.
Significance: This shows that a linear model provides a good approximation of the upward trend in three-point attempt rate over time. The small deviation suggests that while the relationship is mostly linear, there may be some minor nonlinear behavior that the model does not capture. However, given the small number of seasons assessed, this deviation is not a major concern and does not significantly affect the overall interpretation.
Further Question: Would adding a non-linear term better capture the small deviations, or are they due to the limited number of seasons in the dataset?
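One way to explore the question about a non-linear term is to compare the linear fit with a quadratic one. The sketch below is an illustration only: it uses the rounded season means printed earlier rather than the full data, and the names season_means, year_c, linear_fit, and quadratic_fit are hypothetical. With five points the quadratic model has just 2 residual degrees of freedom, so the test has very little power.

```r
# Rounded season means copied from the season_summary output (assumption)
season_means <- data.frame(
  season_start_year = c(2018, 2021, 2022, 2023, 2024),
  mean_fga_3p       = c(0.378, 0.399, 0.407, 0.410, 0.422)
)

# Center the year so the linear and squared terms are less collinear
season_means$year_c <- season_means$season_start_year - 2021

linear_fit    <- lm(mean_fga_3p ~ year_c, data = season_means)
quadratic_fit <- lm(mean_fga_3p ~ year_c + I(year_c^2), data = season_means)

# F-test for whether the squared term improves on the straight line
anova(linear_fit, quadratic_fit)
```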
# Using Smoothing to further understand three-point attempt rate over time
ggplot(season_summary, aes(x = season_date, y = mean_fga_3p)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    title = "Smoothed Trend in Average 3-Point Attempt Rate",
    x = "Season",
    y = "Average FGA_3P"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 17794
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 1472
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : There are other near singularities as well. 5.505e+05
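# Checking autocorrelation in the season-level series
# Note: ts() assumes evenly spaced observations, so the five seasons are
# treated as consecutive years even though 2019-20 and 2020-21 are missing
fga_ts <-
  ts(season_summary$mean_fga_3p,
     start = min(season_summary$season_start_year),
     frequency = 1)
acf(fga_ts, main = "ACF of Average 3-Point Attempt Rate")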
Insights: The smoothed trend shows a steady increase in average three-point attempt rate over time, with no drops or cyclical patterns. The loess warnings above (for example, "span too small. fewer data values than degrees of freedom") indicate that five points are too few for a stable smooth, so the curve is best read as a visual aid rather than a fitted model. The ACF plot shows positive autocorrelation at lag 1, meaning the three-point attempt rate in one season is related to that of the previous season. With only five observations, however, the approximate 95% bounds are very wide (about ±1.96/√5 ≈ ±0.88), and the autocorrelations at lag 1 and beyond remain within them. The ACF therefore provides no strong evidence of repeating seasonality.
Significance: These results suggest that the increase in three-point attempt rate reflects a continuous, structural change over time rather than seasonal or cyclical behavior. The positive lag-1 autocorrelation is consistent with each season building on the previous one, and the lack of significant correlations at higher lags gives no sign of a repeating cycle in shooting behavior. This reinforces the view that changes in three-point shooting are not temporary fluctuations but reflect a long-term change in play style in the NBA. Note that with only five seasons, two of them separated by a gap, it is difficult to detect seasonality or autocorrelation at all.
Further Question: Would including more NBA seasons reveal any cyclical patterns in three-point shooting?
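To put the lag-1 autocorrelation in context, the sketch below recomputes it numerically and compares it with the approximate 95% bounds that acf() draws on its plot. It assumes the rounded season means printed earlier and, like the ts() call, treats the five seasons as evenly spaced; the names fga_rate, lag1, and bound are illustrative.

```r
# Rounded season means copied from the season_summary output (assumption)
fga_rate <- c(0.378, 0.399, 0.407, 0.410, 0.422)
n <- length(fga_rate)

# acf() returns lag 0 first (always 1), so lag 1 is the second element
acf_values <- acf(fga_rate, plot = FALSE)$acf
lag1 <- acf_values[2]

# Approximate 95% bounds under the white-noise null: +/- qnorm(0.975) / sqrt(n)
bound <- qnorm(0.975) / sqrt(n)

c(lag1 = lag1, bound = bound)

# TRUE only if lag 1 falls outside the bounds
abs(lag1) > bound
```

Reading the numeric values alongside the plot avoids mistaking the lag-0 bar, which is 1 by definition, for evidence of autocorrelation.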