nba <- nba %>%
  distinct(Year, Player, Tm, .keep_all = TRUE) %>%  # keep the first row per Year/Player/Tm
  filter(G > 5)                                     # keep players with more than 5 games
Add day and month to the Year column
Use January 1st for each row and convert the result to a Date column (example: for a Year of 1980, year_updated is 1980/01/01, i.e., January 1, 1980).
nba <- nba %>%
  mutate(year_updated = as.Date(paste0(Year, "/01/01")), .after = Year)
nba_ <- nba %>%
  group_by(year_updated) %>%
  summarise(total_3PA = sum(`3PA`))  # league-wide 3 point attempts per season
Use the `year()` function to extract the year from the date. Otherwise, `fill_gaps()` treats the Date index as daily and fills the 364 days between consecutive seasons with NA values.
nba_ts <- as_tsibble(nba_, index = year_updated) %>%
  index_by(date = year(year_updated)) %>%  # re-index by integer year
  summarise(total_3PA = sum(total_3PA)) %>%
  fill_gaps()                              # missing seasons become explicit NA rows
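To see why the `year()` step matters, here is a minimal base-R sketch (no tsibble involved) that counts how many NA rows a daily gap-filling step would create between two consecutive January 1st observations, compared with an integer-year index. The dates are toy values chosen only to illustrate the arithmetic; 1981 is used because it is not a leap year, and `format(..., "%Y")` stands in for `year()`.

```r
# Two consecutive seasons stored as January 1st dates (toy values)
obs_dates <- as.Date(paste0(c(1981, 1982), "/01/01"))

# Gap-filling a Date index at daily resolution implies one row per calendar day
daily_grid <- seq(min(obs_dates), max(obs_dates), by = "day")
na_rows_daily <- length(daily_grid) - length(obs_dates)

# Extracting the year first gives an integer index with a step of 1
obs_years <- as.integer(format(obs_dates, "%Y"))
yearly_grid <- seq(min(obs_years), max(obs_years))
na_rows_yearly <- length(yearly_grid) - length(obs_years)

cat("NA rows with a daily index: ", na_rows_daily, "\n")
cat("NA rows with a yearly index:", na_rows_yearly, "\n")
```

With the integer year as the index, consecutive seasons are one step apart, so gap filling only adds rows for seasons that are genuinely missing.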
Create an xts object
nba_xts <- xts(x = nba_ts$total_3PA,
               order.by = as.Date(paste0(nba_ts$date, "/01/01")))
nba_xts <- setNames(nba_xts, "total_threes")
nba_ts %>%
  ggplot() +
  geom_line(mapping = aes(x = date, y = total_3PA)) +
  labs(title = "3 Point Attempts by Season")
Here we can see a dramatic increase over time, with a couple of dips. These dips are associated with the NBA lockouts, which shortened the 1998-99 and 2011-12 seasons.
nba_ts %>%
  filter(date < 2002) %>%
  ggplot() +
  geom_line(mapping = aes(x = date, y = total_3PA)) +
  ylim(0, 90000) +
  labs(title = '3 Point Attempts by Season 1980-2001',
       subtitle = 'Pre Daryl Morey',
       x = 'Season',
       y = 'Total 3 Point Attempts')
This graph covers the period before Daryl Morey took his first major basketball operations job with an NBA team. He was very influential in using new-age analytics to build his teams, which led them to shoot a lot of threes, sometimes too many for their own good.
nba_ts %>%
  filter(date >= 2002) %>%
  ggplot() +
  geom_line(mapping = aes(x = date, y = total_3PA)) +
  ylim(0, 90000) +
  labs(title = '3 Point Attempts by Season 2002-2020',
       subtitle = 'Daryl Morey has arrived',
       x = 'Season',
       y = 'Total 3 Point Attempts')
This is the period since Daryl Morey has served as President or Vice President of Basketball Operations. His direct influence would be clearer from the Rockets' numbers alone; the trend here reflects many teams across the league adopting similar strategies.
nba_ts %>%
  ggplot(mapping = aes(x = date, y = total_3PA)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "3 Point Attempts by Season",
       x = 'Season',
       y = 'Total 3 Point Attempts')
## `geom_smooth()` using formula = 'y ~ x'
As the earlier plots suggested, there is a strong linear trend. The trend looks fairly consistent over time; however, from 2011 to 2020 the growth appears more rapid than in previous years. This may be an overreaction coming out of the lockout (the dip at 2012) or the start of a new trend. It would be interesting to see how adding data from the 2021 and 2022 seasons would change the outlook on this part of the graph.
pre2012 <- nba_ts %>%
  filter(date < 2012)
post2012 <- nba_ts %>%
  filter(date >= 2012)
# Pass each period as layer data instead of using `$` inside aes()
ggplot(mapping = aes(x = date, y = total_3PA)) +
  geom_line(data = pre2012) +
  geom_smooth(data = pre2012, method = "lm", se = FALSE) +
  geom_line(data = post2012) +
  geom_smooth(data = post2012, method = "lm", se = FALSE) +
  geom_vline(xintercept = 2011.5, color = "red", linetype = "dotted") +
  labs(title = "3 Point Attempts by Season",
       x = 'Season',
       y = 'Total 3 Point Attempts')
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Splitting the trend at the 2012 season suggests that the slope may differ between the pre-2012 seasons and 2012-2020. This is not especially surprising, because it took time for teams to assemble rosters suited to this style of basketball.
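The split-trend comparison can also be made numerically rather than only visually. The sketch below is a minimal base-R example on a hypothetical, perfectly piecewise-linear series (not the NBA data): it fits separate `lm()` slopes before and after 2012, then a single model with a `date * post2012` interaction, whose interaction coefficient estimates the change in slope. The column names mirror `nba_ts`, but the values and the 500 and 2000 per-season slopes are invented for illustration.

```r
years <- 1980:2020

# Hypothetical series: +500 attempts per season before 2012, +2000 from 2012 on
toy <- data.frame(
  date = years,
  total_3PA = ifelse(years < 2012,
                     500 * (years - 1980),
                     500 * (2011 - 1980) + 2000 * (years - 2011))
)
toy$post2012 <- toy$date >= 2012

# Separate straight-line fits for each period
fit_pre  <- lm(total_3PA ~ date, data = subset(toy, !post2012))
fit_post <- lm(total_3PA ~ date, data = subset(toy, post2012))
slopes <- c(pre  = unname(coef(fit_pre)["date"]),
            post = unname(coef(fit_post)["date"]))

# One model: the interaction term is the change in slope after 2012
fit_int <- lm(total_3PA ~ date * post2012, data = toy)
slope_change <- unname(coef(fit_int)["date:post2012TRUE"])

print(slopes)
cat("Estimated change in slope after 2012:", slope_change, "\n")
```

On real data, `summary(fit_int)` would add a standard error for the interaction term, which gives a rough sense of whether the post-2012 acceleration is larger than season-to-season noise.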
nba_xts %>%
  # centred 10-season mean; fill = NA pads the edges (fill = FALSE would pad with 0)
  rollapply(width = 10, FUN = \(x) mean(x, na.rm = TRUE), fill = NA) %>%
  ggplot(mapping = aes(x = Index, y = total_threes)) +
  geom_line() +
  labs(title = "3 Point Attempts by Season",
       subtitle = "10 Season Rolling Average",
       x = 'Season',
       y = 'Total 3 Point Attempts')
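A rolling mean replaces each value with the average of a fixed-size window of neighbouring values. The sketch below is a minimal base-R version on toy numbers, written with a right-aligned window so the indexing is easy to follow; `zoo::rollapply()` centres the window by default (`align = "center"`), so this illustrates the mechanics rather than reproducing the plot above. The helper `rolling_mean()` is hypothetical. It also shows why the padding value matters: positions without a full window are left as `NA`, whereas a logical fill such as `FALSE` is coerced to 0 once combined with numbers.

```r
# Hypothetical helper: right-aligned rolling mean over a window of k values.
# Each output averages the current value and the k - 1 values before it.
rolling_mean <- function(x, k) {
  out <- rep(NA_real_, length(x))  # positions without a full window stay NA
  for (i in seq_along(x)) {
    if (i >= k) {
      out[i] <- mean(x[(i - k + 1):i], na.rm = TRUE)
    }
  }
  out
}

toy_totals <- c(10, 20, 30, 40, 50)  # toy values, not NBA data
smoothed <- rolling_mean(toy_totals, k = 3)
print(smoothed)

# Padding with FALSE instead of NA becomes 0 in a numeric vector
padded_with_false <- c(FALSE, FALSE, smoothed[3:5])
print(padded_with_false)
```

Zeros at the edges would be drawn as real data points and pull the line down, while `NA` values are simply dropped from the plot.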
nba_ts %>%
  ggplot(mapping = aes(x = date, y = total_3PA)) +
  geom_point(size = 1, shape = 'o') +
  geom_smooth(span = 0.4, se = FALSE) +
  labs(title = "3 Point Attempts by Season",
       subtitle = "loess smoothing - span = 0.4",
       x = 'Season',
       y = 'Total 3 Point Attempts')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
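As the message above reports, `geom_smooth()` without a `method` uses loess for small datasets, and `span = 0.4` means each local fit uses roughly the nearest 40% of seasons, with closer seasons weighted more heavily. The base-R sketch below calls `stats::loess()` directly on a hypothetical, exactly linear series (seasons numbered 0 to 40, invented values) and checks that the local fits reproduce the line. The `surface = "direct"` setting is an assumption added here so every point is fitted exactly instead of interpolated; ggplot2 does not necessarily use it.

```r
# Hypothetical, exactly linear series: seasons numbered 0..40, invented values
toy <- data.frame(season = 0:40)
toy$attempts <- 1000 + 500 * toy$season

# Local quadratic fits (degree = 2, the loess default) on the nearest 40% of
# points; surface = "direct" computes the fit at every point
fit <- loess(attempts ~ season, data = toy, span = 0.4,
             control = loess.control(surface = "direct"))

max_error <- max(abs(fitted(fit) - toy$attempts))
cat("Largest deviation from the straight line:", max_error, "\n")
```

On curved data the same call bends with the series; a smaller `span` follows shorter-run changes more closely, while a larger one approaches a single global fit.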
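# na.pass keeps the yearly spacing if fill_gaps() introduced NA seasons
acf(nba_ts$total_3PA, ci = 0.95, na.action = na.pass,
    main = "ACF for 3 Point Attempts")
pacf(nba_xts, na.action = na.pass,
     xlab = "Lag", main = "PACF for 3 Point Attempts")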
Based on the two plots above, we can conclude that there is no major seasonality in the data; the largest component is the overall year-to-year trend. This makes sense, as there is no league-wide cycle, such as an Olympic cycle, that would introduce a seasonal period longer than one season. Seasonal patterns might be more apparent at a day-by-day level, since one team shooting a lot could raise total attempts on its game days, but there is no seasonality from one NBA season to the next.
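The difference between trend and seasonality shows up clearly in autocorrelations. The base-R sketch below computes sample autocorrelations with `acf(..., plot = FALSE)` for a hypothetical pure linear trend and for a hypothetical series with a four-season cycle (standing in for something like an Olympic cycle). Neither series is NBA data: the trend gives large autocorrelations that shrink slowly with the lag, while the cycle gives a spike at lag 4.

```r
n_seasons <- 40

# Hypothetical pure trend: attempts rise by the same amount every season
trend <- 1:n_seasons

# Hypothetical 4-season cycle: a spike every fourth season, no trend
cycle <- rep(c(0, 0, 0, 10), times = n_seasons / 4)

# Sample autocorrelations at lags 0..8 (element 1 is lag 0)
acf_trend <- drop(acf(trend, lag.max = 8, plot = FALSE)$acf)
acf_cycle <- drop(acf(cycle, lag.max = 8, plot = FALSE)$acf)

print(round(rbind(trend = acf_trend, cycle = acf_cycle), 3))
```

If the real ACF resembles the first row (slow decay) rather than the second (isolated spikes at a fixed lag), that supports reading the series as trend rather than seasonal.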