Week 12 Data Dive - Time Based Data

For this week’s data dive I will be creating a time series visual and then performing analysis with a linear regression model and smoothing.

The time-based column I will be using is the “Season” column from my data, with “FG_Percent” as the response variable to analyze over time. I will convert each season to its starting year, average FG_Percent within each year, and create a tsibble object from the year and the response variable.

Next, I will use a linear regression model to detect any upwards or downwards trends.

Lastly, I will use smoothing to detect at least one season in the data. I will attempt to illustrate seasonality using ACF and PACF.

Time Series

# Work on a copy of the original data
NBA_ts <- NBA

# Extract starting year (e.g., "2010-11" -> 2010)
NBA_ts$Year <- as.numeric(substr(NBA_ts$Season, 1, 4))

# Keep only needed columns
NBA_ts <- NBA_ts[, c("Year", "FG_Percent")]

# Remove missing values
NBA_ts <- NBA_ts[complete.cases(NBA_ts), ]

# Average FG% across all rows within each year
NBA_yearly <- aggregate(FG_Percent ~ Year, data = NBA_ts, FUN = mean)

library(tsibble)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## 
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
NBA_tsibble <- as_tsibble(NBA_yearly, index = Year)
# Plot the yearly averages so there is one point per season
plot(NBA_yearly$Year, NBA_yearly$FG_Percent,
     type = "l",
     xlab = "Season (Year)",
     ylab = "Field Goal Percentage",
     main = "FG% Over Time")

The dip around the year 2000 stands out to me immediately. I also quickly noticed that the mid-1980s have the highest average field goal percentage.
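To check what the plot suggests, the lowest and highest years can be pulled directly from the yearly averages. The sketch below is only an illustration: toy_nba and its values are made up (they are not from my data), but it uses the same Season and FG_Percent column names and repeats the same steps of parsing the starting year, averaging by year, and then looking up the minimum and maximum.

```r
# Hypothetical toy data with the same column names as my data set
toy_nba <- data.frame(
  Season     = c("1984-85", "1984-85", "1999-00", "1999-00", "2010-11"),
  FG_Percent = c(0.49, 0.50, 0.44, 0.45, 0.46)
)

# Starting year of each season ("1984-85" -> 1984)
toy_nba$Year <- as.numeric(substr(toy_nba$Season, 1, 4))

# One average FG% per year
toy_yearly <- aggregate(FG_Percent ~ Year, data = toy_nba, FUN = mean)

# Years with the lowest and highest average FG%
low_year  <- toy_yearly$Year[which.min(toy_yearly$FG_Percent)]
high_year <- toy_yearly$Year[which.max(toy_yearly$FG_Percent)]

print(toy_yearly)
cat("Lowest:", low_year, " Highest:", high_year, "\n")
```

Running the same two lookups on NBA_yearly would confirm whether the dip and the peak sit where they appear to on the plot.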

Insights, Significance, and Questions

This visual offers a snapshot of how the average field goal percentage changes over the history of the NBA. This is significant because it gives us a reference point for what to expect in further analysis. My initial question to look into is: why was there such a drop in FG% around 2000?

Linear Regression Model

trend_model <- lm(FG_Percent ~ Year, data = NBA_ts)

summary(trend_model)
## 
## Call:
## lm(formula = FG_Percent ~ Year, data = NBA_ts)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.062604 -0.013729 -0.000354  0.013630  0.075271 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.3377332  0.0719858   18.58   <2e-16 ***
## Year        -0.0004375  0.0000360  -12.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01964 on 1400 degrees of freedom
## Multiple R-squared:  0.0954, Adjusted R-squared:  0.09476 
## F-statistic: 147.7 on 1 and 1400 DF,  p-value: < 2.2e-16

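Some key things I would like to point out: the coefficient on Year is -0.0004375, so with FG_Percent stored as a proportion, FG% falls by roughly 0.04 percentage points per year on average; that slope is statistically significant (p < 2e-16); and the R-squared is only 0.0954, so Year alone explains less than 10% of the variation in FG%.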

Insights, Significance, and Questions

The key insights I gather from this model are that FG% has declined slightly over time and that the change is gradual. Time alone doesn’t offer much explanation for the change in field goal percentage, since the R-squared is only about 0.095. Other factors would need to be explored, such as shot selection, pace, or other sources of data. This is significant because it helps us understand how much weight to put on the model and the analysis it offers. A question I would want to explore further is whether including three-point percentage and two-point percentage would help explain more of the change in field goal percentage.
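To put the size of the slope in more concrete terms, here is a small worked calculation using only the coefficients printed in the model summary. The helper fitted_fg and the choice of 1985 and 2000 as comparison years are my own for illustration; they are the two periods that stood out in the plot.

```r
# Coefficients copied from the summary(trend_model) output
intercept <- 1.3377332
slope     <- -0.0004375

# Fitted FG% (as a proportion) for a given year on the trend line
fitted_fg <- function(year) intercept + slope * year

fg_1985 <- fitted_fg(1985)
fg_2000 <- fitted_fg(2000)

# Change implied by the trend line, in percentage points
change_pts <- (fg_2000 - fg_1985) * 100
per_decade <- slope * 10 * 100

cat("Fitted FG% in 1985:", round(fg_1985, 4), "\n")
cat("Fitted FG% in 2000:", round(fg_2000, 4), "\n")
cat("Change 1985 to 2000 (pct points):", round(change_pts, 3), "\n")
cat("Change per decade (pct points):", round(per_decade, 3), "\n")
```

The trend line moves by well under one percentage point between those years, which is small next to the residual standard error of 0.01964 (about 2 percentage points).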

Smoothing

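# Simple moving average (window = 5 seasons)
# Use the yearly averages so each window step is one season;
# stats:: avoids a clash with dplyr::filter if dplyr is attached
smooth_fg <- stats::filter(NBA_yearly$FG_Percent, rep(1/5, 5), sides = 2)

plot(NBA_yearly$Year, NBA_yearly$FG_Percent,
     type = "l",
     xlab = "Year",
     ylab = "FG%",
     main = "FG% with Smoothing")

# Overlay the smoothed series as a thicker line
lines(NBA_yearly$Year, smooth_fg, lwd = 2)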

The smoothing visual helps show the gradual changes in FG% and reveals specific periods where efficiency shifted significantly.
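To make clear what the moving average is doing, here is a minimal sketch on a short made-up series (toy_fg is hypothetical, not my data). It shows that a centered 5-point window leaves the first two and last two positions empty, and that each smoothed value is just the mean of the five observations around it.

```r
# Hypothetical yearly values, not my data
toy_fg <- c(0.48, 0.49, 0.47, 0.45, 0.44, 0.45, 0.46, 0.47)

# Centered 5-point moving average with equal weights
toy_smooth <- stats::filter(toy_fg, rep(1/5, 5), sides = 2)

# The first and last two positions have no full window, so they are NA
print(round(as.numeric(toy_smooth), 4))

# The third smoothed value is the plain mean of observations 1 to 5
manual_third <- mean(toy_fg[1:5])
cat("Manual mean of first five:", manual_third, "\n")
```

Because of those missing ends, the smoothed line on the plot starts two seasons late and stops two seasons early.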

# Use the yearly averages so each lag is one season
acf(NBA_yearly$FG_Percent,
    main = "ACF of FG%")

The ACF shows that FG% is strongly correlated with nearby seasons, so there is persistence over time: field goal percentage changes gradually across the history of the NBA rather than jumping from season to season. This also gives a first look at seasonality, which would appear as spikes that repeat at a regular lag.
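As a reference for reading the ACF plot, the sketch below compares two made-up series (neither is my data): one with a steady trend and one that repeats every 4 steps. The trend series gives autocorrelations that stay high and fade slowly, while the repeating series gives a spike at lag 4 and again at lag 8.

```r
# Two hypothetical series for comparison
trend_series    <- 1:48                            # steady trend, no cycle
seasonal_series <- rep(c(1, 3, 2, 5), times = 12)  # repeats every 4 steps

# plot = FALSE returns the autocorrelations instead of drawing them
acf_trend    <- acf(trend_series, lag.max = 8, plot = FALSE)$acf
acf_seasonal <- acf(seasonal_series, lag.max = 8, plot = FALSE)$acf

# Index 1 is lag 0, so lag k sits at index k + 1
trend_lags    <- round(acf_trend[2:9], 2)
seasonal_lags <- round(acf_seasonal[2:9], 2)

print(trend_lags)     # lags 1 to 8 for the trend series
print(seasonal_lags)  # lags 1 to 8 for the repeating series
```

Comparing the ACF of FG% against these two shapes is one way to judge whether the pattern looks more like persistence or like a repeating cycle.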

# Use the yearly averages so each lag is one season
pacf(NBA_yearly$FG_Percent,
     main = "PACF of FG%")

The PACF shows the important lag relationships as spikes in the visual. Each spike is the correlation at that lag after accounting for the shorter lags, which helps us learn about the structure in the data and which past seasons have a stronger direct influence on current FG%. This also helps with illustrating any seasonality.
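To show what a clear PACF spike looks like, here is a small simulation (not my data). The series is generated so that each value depends only on the one before it, which is an assumption I chose for illustration; the seed and the 0.8 coefficient are arbitrary. In that case the PACF should have one large spike at lag 1 and values near zero afterwards.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

# Simulated AR(1) series: each value depends only on the previous one
ar1_series <- arima.sim(model = list(ar = 0.8), n = 500)

pacf_vals <- pacf(ar1_series, lag.max = 6, plot = FALSE)$acf

# For pacf(), index 1 is lag 1 (there is no lag 0 entry)
print(round(pacf_vals[1:6], 2))

# Approximate 95% bound, similar to the dashed lines on the plot
bound <- 1.96 / sqrt(length(ar1_series))
cat("Approximate 95% bound:", round(bound, 3), "\n")
```

Spikes in the PACF of FG% that rise above the dashed lines at later lags would point to seasons further back having a direct influence.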

Insights, Significance, and Questions

The insights I gather from the smoothing, ACF, and PACF are that while time has a measurable effect on field goal percentage, it is not the main cause of changes in shooting efficiency. This is significant because it tells us that we would need to include more factors to better understand what causes the changes in field goal percentage over time. Naturally, this leads to the question: what other factors account for changes in field goal percentage over the history of the NBA?