Assess the behavior/data-generating process of your time-series using the techniques discussed in class. From your understanding of the process, would you expect the time-series to be mean stationary? Variance stationary? Why or why not?
Code
# Plot the raw vehicle sales series over time
vehicle_df_plot <- ggplot(data = vehicle_sales_tbl_ts, aes(x = date, y = value)) +
  geom_line() +
  xlab("Time") +
  ylab("Values") +
  ggtitle("Time vs Values Plot")

vehicle_df_plot
The time series plot shows that the series is neither mean stationary nor variance stationary. As a first step, we therefore examine the rolling standard deviation of the differenced series.
If the data appear to be variance non-stationary (using a rolling SD or other method), transform your time-series using a natural log or Box-Cox transformation.
Code
# Difference the series, then track a 12-period rolling standard deviation
# of the differenced values to check whether the variance changes over time
vehicle_sales_diff <- vehicle_sales_tbl_ts %>%
  mutate(value_diff = value - lag(value, 2)) %>%
  as_tsibble(index = date)

vehicle_sales_diff %>%
  mutate(diff_sd = zoo::rollapply(value_diff, FUN = sd, width = 12, fill = NA)) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(date, diff_sd)) +
  geom_smooth(aes(date, diff_sd), method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Standard Deviation of Differenced Vehicle Sales, over Time") +
  ylab("SD of Differenced Vehicle Sales") +
  xlab("Date")
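The transformation step itself is not shown above, so the following is only a hedged sketch of how a log or Box-Cox transformation could be applied if the rolling standard deviation trends upward; it assumes the feasts and fabletools packages are available, and vehicle_sales_trans, log_value, and bc_value are illustrative names not used elsewhere in this analysis.
Code
# Estimate a Box-Cox lambda with the Guerrero method, then apply either a
# natural log or the Box-Cox transform to stabilise the variance
library(dplyr)
library(fabletools)
library(feasts)

lambda <- vehicle_sales_tbl_ts %>%
  features(value, features = guerrero) %>%
  pull(lambda_guerrero)

vehicle_sales_trans <- vehicle_sales_tbl_ts %>%
  mutate(
    log_value = log(value),             # simple natural-log transform
    bc_value  = box_cox(value, lambda)  # Box-Cox transform with estimated lambda
  )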
If seasonality is present in your data, use seasonal differencing to remove the seasonal effect (for example, for monthly data, take the difference between each observation and the observation 12 months earlier).
Seasonal differencing indicates that the vehicle sales data do not exhibit a strong seasonal pattern.
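The seasonal-differencing step can be sketched as follows; this is only an illustration that assumes monthly data in vehicle_sales_tbl_ts with the date and value columns used above, and vehicle_sales_sdiff / value_sdiff are illustrative names.
Code
# Seasonal difference at lag 12 (monthly data) and inspect the result
library(dplyr)
library(tsibble)
library(ggplot2)

vehicle_sales_sdiff <- vehicle_sales_tbl_ts %>%
  mutate(value_sdiff = difference(value, lag = 12))

vehicle_sales_sdiff %>%
  na.omit() %>%
  ggplot(aes(date, value_sdiff)) +
  geom_line() +
  theme_bw() +
  ggtitle("Seasonally Differenced Vehicle Sales") +
  xlab("Date") +
  ylab("Seasonal Difference (lag 12)")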
Conduct and interpret a KPSS test for stationarity after performing the above operations (if necessary). Do the results suggest that the process is mean stationary? If the process is mean non-stationary, difference the data until it is mean stationary (after the log/Box-Cox transformation and seasonal differencing) and visualize it.
The KPSS p-value is greater than 0.05, so we fail to reject the null hypothesis of stationarity; this suggests that the seasonally differenced series we derived is now mean stationary.
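A hedged sketch of the KPSS test using the unitroot_kpss feature from feasts, applied to the illustrative seasonally differenced column from the sketch above; the KPSS null hypothesis is stationarity, so a p-value above 0.05 is consistent with a mean-stationary series.
Code
# KPSS test: the null hypothesis is that the series is stationary
library(feasts)
library(fabletools)

vehicle_sales_sdiff %>%
  features(value_sdiff, unitroot_kpss)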
Section 2
Produce and interpret ACF/PACF plots of the transformed time series after the above operations. Based on the ACF/PACF alone, does the time-series appear to be an autoregressive process, moving average process, combined, or neither? What might you suspect the order of the time-series to be (e.g. ARIMA(0,1,2)) using the ACF/PACF plots and stationarity tests?
The ACF plot suggests a moving average (MA) component of order 2, while the PACF cuts off after lag 2, which points to an autoregressive (AR) order of 2.
Based on the ACF/PACF plots and the stationarity tests, a reasonable initial guess is ARIMA(2,0,2).
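The ACF/PACF display can be produced with feasts as in the hedged sketch below, again using the illustrative seasonally differenced column from the earlier sketches.
Code
# Time plot, ACF, and PACF of the differenced series in one display
library(feasts)

vehicle_sales_sdiff %>%
  gg_tsdisplay(value_sdiff, plot_type = "partial")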
Section 3
Fit several ARIMA models to your time-series variable based on the “best guesses” above. Which model is the “best” according to the AIC or BIC? After comparing several models, implement the automated ARIMA function from fable to find the “best” model. Does the automated ARIMA function select the same model as you did? If not, why do you think this is the case?
According to the BIC, the best-fitting model is ARIMA(2,1,2), which is close to the order we anticipated from the ACF and PACF plots.
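A hedged sketch of the model comparison with fable: a few candidate orders are fit alongside the automated search and ranked by BIC. The candidate orders other than ARIMA(2,0,2) and ARIMA(2,1,2), and the model names, are illustrative.
Code
# Fit several candidate ARIMA models plus fable's automated search,
# then compare information criteria
library(fable)
library(fabletools)
library(dplyr)

vehicle_fits <- vehicle_sales_tbl_ts %>%
  model(
    arima202 = ARIMA(value ~ pdq(2, 0, 2)),
    arima212 = ARIMA(value ~ pdq(2, 1, 2)),
    arima012 = ARIMA(value ~ pdq(0, 1, 2)),
    auto     = ARIMA(value)            # automated order selection
  )

vehicle_fits %>%
  glance() %>%
  arrange(BIC) %>%
  select(.model, AIC, AICc, BIC)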
After selecting the best model, derive the fitted values for the time series and plot them against the observed values of the series. Do the in-sample predicted values tend to follow the trends in the data?
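This comparison can be visualized with a plot like the hedged sketch below, assuming best_model is the selected fable model used in the residual diagnostics that follow.
Code
# Overlay in-sample fitted values on the observed series
library(fabletools)
library(ggplot2)

best_model %>%
  augment() %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = value, colour = "Observed")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  scale_colour_manual(values = c(Observed = "black", Fitted = "red")) +
  theme_bw() +
  ggtitle("Observed vs Fitted Vehicle Sales") +
  xlab("Date") +
  ylab("Vehicle Sales")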
Compute and analyze the residuals from the selected model.
Code
# Residual diagnostics: residual time plot, ACF, and histogram
best_model %>%
  gg_tsresiduals(lag = 52)
Conduct a Box-Ljung test for residual autocorrelation and examine the ACF/PACF plots in the residuals. Do the residuals appear to be white noise? If the residuals do not appear to be white noise, either interpret your results (or lack thereof) or suggest a possible solution to the problem.
Code
# Ljung-Box test on the innovation residuals
best_model %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 2)
With a p-value of 0.2183795, we do not have enough evidence to reject the null hypothesis of no residual autocorrelation. The residuals therefore appear consistent with white noise, which is the desired outcome for a well-specified time series model.