Assess the behavior/data-generating process of your time-series using the techniques discussed in class. From your understanding of the process, would you expect the time-series to be mean stationary? Variance stationary? Why or why not?
Code
milk_df_plot <- ggplot(data = milk_price_tbl_ts, aes(x = date, y = value)) +
  geom_line() +
  xlab("Time") +
  ylab("Values") +
  ggtitle("Time vs Values Plot")
milk_df_plot
The time series plot shows that the series has neither a stationary mean nor a stationary variance. As a first step, we therefore examine the rolling standard deviation of the differenced series.
If the data appear to be variance non-stationary (using a rolling SD or other method), transform your time-series using a natural log or Box-Cox transformation.
Code
# First difference of the series
milk_price_diff <- milk_price_tbl_ts %>%
  mutate(value_diff = value - lag(value)) %>%
  as_tsibble(index = date)

milk_price_diff %>%
  # Rolling 12-observation SD of the differenced series (not of the raw values)
  mutate(diff_sd = zoo::rollapply(value_diff, FUN = sd, width = 12, fill = NA)) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(date, diff_sd)) +
  geom_smooth(aes(date, diff_sd), method = 'lm', se = FALSE) +
  ggtitle("Standard Deviation of Differenced Milk Price, over Time") +
  ylab("SD of Differenced Milk Price") +
  xlab("Date") +
  theme_bw()
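The transformation step asked for above can be sketched as follows. This is a minimal illustration, not the analysis actually run on the milk data: `sim_ts` is a simulated, strictly positive monthly series standing in for `milk_price_tbl_ts`, and the monthly frequency is an assumption.

```r
library(dplyr)
library(tsibble)
library(feasts)

set.seed(42)
# Simulated stand-in for the milk price series (assumed monthly, strictly positive)
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month", length.out = 120)),
  value = exp(2 + cumsum(rnorm(120, sd = 0.05)))
) %>%
  as_tsibble(index = date)

# Guerrero's method suggests a Box-Cox lambda for the series
lambda <- sim_ts %>%
  features(value, features = guerrero) %>%
  pull(lambda_guerrero)

# Keep both candidate transformations side by side for comparison
sim_transformed <- sim_ts %>%
  mutate(
    value_log = log(value),
    value_bc  = fabletools::box_cox(value, lambda)
  )
```

If the rolling SD of the differenced, transformed series is flatter than that of the untransformed one, the transformation is doing its job.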
If seasonality is present in your data, use seasonal differencing to remove the seasonal effect (for example, for monthly data, estimate the difference between each observation and the one 12 months earlier).
Seasonal differencing reveals the absence of seasonality in the milk data.
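One way to check whether a seasonal difference is needed is sketched below. It uses a simulated monthly random walk (`sim_ts`, a stand-in for the milk series, with the monthly frequency assumed) rather than the real data.

```r
library(dplyr)
library(tsibble)
library(feasts)

set.seed(1)
# Simulated stand-in for the milk price series (assumed monthly)
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month", length.out = 120)),
  value = 20 + cumsum(rnorm(120))
) %>%
  as_tsibble(index = date)

# Suggested number of seasonal differences, based on seasonal strength
ns <- sim_ts %>% features(value, unitroot_nsdiffs)
ns

# Lag-12 seasonal difference, should one be required
sim_sdiff <- sim_ts %>%
  mutate(value_sdiff = difference(value, lag = 12))
```

An `nsdiffs` of 0 would support the conclusion that no seasonal differencing is required.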
Conduct and interpret a KPSS test for stationarity after performing the above operations (if necessary). Do the results suggest that the process is mean stationary? If the process is mean non-stationary, difference the data until it is mean stationary (after log/Box-Cox and seasonal differencing) and visualize it.
The KPSS p-value is greater than 0.05, so we fail to reject the null hypothesis of stationarity. This suggests that the transformed series we derived is now mean stationary.
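A KPSS test of this kind can be run with `unitroot_kpss` from feasts (which relies on the urca package). The sketch below uses a simulated monthly random walk, `sim_ts`, as a stand-in for the milk series, so its numbers say nothing about the real data.

```r
library(dplyr)
library(tsibble)
library(feasts)

set.seed(1)
# Simulated stand-in for the milk price series (assumed monthly)
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month", length.out = 120)),
  value = 20 + cumsum(rnorm(120))
) %>%
  as_tsibble(index = date)

# KPSS on the level of the series (null hypothesis: stationary)
kpss_level <- sim_ts %>% features(value, unitroot_kpss)

# KPSS after one ordinary difference
kpss_diff <- sim_ts %>%
  mutate(value_diff = difference(value)) %>%
  features(value_diff, unitroot_kpss)

# Number of first differences suggested by repeated KPSS tests
n_diffs <- sim_ts %>% features(value, unitroot_ndiffs)

kpss_level
kpss_diff
n_diffs
```

The reported `kpss_pvalue` is truncated to the range 0.01 to 0.1, so a value of 0.1 should be read as "0.1 or larger".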
Section 2
Produce and interpret ACF/PACF plots of the transformed time series after the above operations. Based on the ACF/PACF alone, does the time-series appear to be an autoregressive process, moving average process, combined, or neither? What might you suspect the order of the time-series to be (e.g. ARIMA(0,1,2)) using the ACF/PACF plots and stationarity tests?
The ACF plot suggests a moving average (MA) component of order 2, and the last significant lag in the PACF plot is also 2, which suggests an autoregressive (AR) component of order 2.
Based on the ACF/PACF plots and the stationarity test, a reasonable first guess is ARIMA(2,1,2).
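ACF and PACF plots of a differenced series can be produced in one call with `gg_tsdisplay`. The sketch below assumes a simulated monthly series, `sim_ts`, in place of the milk data.

```r
library(dplyr)
library(tsibble)
library(feasts)

set.seed(1)
# Simulated stand-in for the milk price series (assumed monthly)
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month", length.out = 120)),
  value = 20 + cumsum(rnorm(120))
) %>%
  as_tsibble(index = date)

# Time plot, ACF and PACF of the first-differenced series in one figure
sim_ts %>%
  gg_tsdisplay(difference(value), plot_type = "partial")

# The same quantities as tables, for reading off individual lags
acf_tbl  <- sim_ts %>% ACF(difference(value), lag_max = 24)
pacf_tbl <- sim_ts %>% PACF(difference(value), lag_max = 24)
```

As a rule of thumb, an ACF that cuts off after lag q points to MA(q), and a PACF that cuts off after lag p points to AR(p); when both tail off gradually, a mixed model is more plausible.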
Section 3
Fit several ARIMA models to your time-series variable based on the “best guesses” above. Which model is the “best” according to the AIC or BIC? After comparing several models, implement the automated ARIMA function from fable to find the “best” model. Does the automated ARIMA function select the same model as you did? If not, why do you think this is the case?
According to the BIC, the best model is ARIMA(2,1,2), which matches the order we suspected from the ACF and PACF plots.
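A comparison like this can be set up by fitting the candidates and the automated search in a single `model()` call. The sketch below is illustrative only: the series is simulated from an ARIMA(2,1,2) process with made-up coefficients, and the candidate orders other than (2,1,2) are arbitrary choices.

```r
library(dplyr)
library(tsibble)
library(fable)

set.seed(123)
# Simulated ARIMA(2,1,2) series standing in for the milk data (assumed monthly)
value <- 20 + as.numeric(arima.sim(
  list(order = c(2, 1, 2), ar = c(0.5, -0.3), ma = c(0.4, 0.2)),
  n = 199
))
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month",
                        length.out = length(value))),
  value = value
) %>%
  as_tsibble(index = date)

# Hand-picked candidates (seasonal part switched off) plus the automated search
fits <- sim_ts %>%
  model(
    arima_212 = ARIMA(value ~ pdq(2, 1, 2) + PDQ(0, 0, 0)),
    arima_210 = ARIMA(value ~ pdq(2, 1, 0) + PDQ(0, 0, 0)),
    arima_011 = ARIMA(value ~ pdq(0, 1, 1) + PDQ(0, 0, 0)),
    auto      = ARIMA(value)
  )

# Rank the models by BIC; AIC and AICc are shown for comparison
ic_table <- glance(fits) %>%
  arrange(BIC) %>%
  select(.model, AIC, AICc, BIC)
ic_table
```

By default `ARIMA()` selects by AICc and uses a stepwise search, so the automated choice can differ from a BIC ranking of hand-picked models.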
After selecting the best model, derive the fitted values for the time series and plot them against the observed values of the series. Do the in-sample predicted values tend to follow the trends in the data?
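Fitted values can be pulled from a fitted model with `augment()` and plotted against the observations. The sketch below assumes a simulated series and a hypothetical model object, `sim_fit`, in place of the model selected for the milk data.

```r
library(dplyr)
library(tsibble)
library(fable)
library(ggplot2)

set.seed(123)
# Simulated ARIMA(2,1,2) series standing in for the milk data (assumed monthly)
value <- 20 + as.numeric(arima.sim(
  list(order = c(2, 1, 2), ar = c(0.5, -0.3), ma = c(0.4, 0.2)),
  n = 199
))
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month",
                        length.out = length(value))),
  value = value
) %>%
  as_tsibble(index = date)

# Hypothetical stand-in for the selected model
sim_fit <- sim_ts %>%
  model(arima_212 = ARIMA(value ~ pdq(2, 1, 2) + PDQ(0, 0, 0)))

# augment() returns the observed values, .fitted, .resid and .innov
fitted_tbl <- augment(sim_fit)

ggplot(fitted_tbl, aes(x = date)) +
  geom_line(aes(y = value, colour = "Observed")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(x = "Date", y = "Value", colour = NULL,
       title = "Observed vs fitted values") +
  theme_bw()
```

In-sample fitted values are one-step-ahead predictions, so they usually track the observed series closely; a close match here is expected and is not by itself evidence of good forecasting performance.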
Compute and analyze the residuals from the selected model.
Code
best_model %>%
  gg_tsresiduals(lag_max = 52)
Conduct a Box-Ljung test for residual autocorrelation and examine the ACF/PACF plots in the residuals. Do the residuals appear to be white noise? If the residuals do not appear to be white noise, either interpret your results (or lack thereof) or suggest a possible solution to the problem.
Code
best_model %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 2)
Given the high p-value (0.9424673), we fail to reject the null hypothesis, suggesting that there is no significant autocorrelation in the residuals. This implies that the residuals appear to be white noise, which is desirable in time series modeling.
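One detail worth checking is the `dof` argument. A common convention is to set it to the number of estimated ARMA coefficients, p + q, which would be 4 for an ARIMA(2,1,2), whereas the call above passes `dof = 2`. The sketch below shows the call under that convention on a simulated series; `sim_fit` is a hypothetical stand-in for `best_model`, and the resulting p-value will not match the one reported for the milk data.

```r
library(dplyr)
library(tsibble)
library(fable)
library(feasts)

set.seed(123)
# Simulated ARIMA(2,1,2) series standing in for the milk data (assumed monthly)
value <- 20 + as.numeric(arima.sim(
  list(order = c(2, 1, 2), ar = c(0.5, -0.3), ma = c(0.4, 0.2)),
  n = 199
))
sim_ts <- tibble(
  date  = yearmonth(seq(as.Date("2000-01-01"), by = "1 month",
                        length.out = length(value))),
  value = value
) %>%
  as_tsibble(index = date)

# Hypothetical stand-in for best_model
sim_fit <- sim_ts %>%
  model(arima_212 = ARIMA(value ~ pdq(2, 1, 2) + PDQ(0, 0, 0)))

# Ljung-Box test on the innovation residuals, with dof = p + q = 4
lb <- sim_fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 4)
lb
```

A larger `dof` lowers the degrees of freedom of the reference chi-squared distribution, so the p-value can change; with a p-value as high as the one reported, the white-noise conclusion is unlikely to be overturned, but rerunning the test is the only way to confirm it.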