Below is a plot of the original time series data.
Upon initial inspection, the series shows a clear downward quadratic trend. Additionally, there seems to be seasonality, with upward spikes roughly every 200 days. Finally, the variance appears to decrease over time, which suggests heteroscedasticity in the data.
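The variance decreases along with the mean, roughly in proportion to the square of the mean (that is, the standard deviation is roughly proportional to the level of the series), so we can use a log transformation to stabilize the variance.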
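As a rough illustration of why the log helps here, the sketch below simulates two segments whose standard deviation is proportional to their level and compares the spread before and after taking logs. The levels (100 and 10), the relative noise size, and the helper name `simulate` are assumptions made for the example, not values from this data set.

```python
import math
import random
import statistics

random.seed(0)

def simulate(mean_level, n=2000, rel_sd=0.1):
    # Multiplicative noise: the standard deviation scales with the level.
    return [mean_level * math.exp(random.gauss(0.0, rel_sd)) for _ in range(n)]

high = simulate(100.0)  # hypothetical early, high-level segment
low = simulate(10.0)    # hypothetical late, low-level segment

# Ratio of spreads on the raw scale: far from 1 because the spread follows the level.
raw_ratio = statistics.stdev(high) / statistics.stdev(low)

# Ratio of spreads on the log scale: close to 1 because the log removes the scaling.
log_ratio = (statistics.stdev([math.log(v) for v in high])
             / statistics.stdev([math.log(v) for v in low]))

print(raw_ratio, log_ratio)
```

If the spread of the real series does not track its level this way, a different transformation (or none) may be more appropriate.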
With the variance stabilized, we can move to the model building.
As I mentioned in the Exploratory Data Analysis, there is a clear downward quadratic trend. I will first remove this by differencing the series twice, since second-order differencing eliminates a quadratic trend.
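To see why two rounds of differencing are enough, the minimal sketch below differences a noise-free quadratic. The coefficients are made up for illustration, and `difference` is a hypothetical helper, not the function used in this analysis.

```python
def difference(series, times=1):
    """Apply first differencing `times` times."""
    out = list(series)
    for _ in range(times):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical downward quadratic trend: 500 - 0.3 t - 0.002 t^2
trend = [500.0 - 0.3 * t - 0.002 * t * t for t in range(10)]

d1 = difference(trend, 1)  # still contains a linear trend in t
d2 = difference(trend, 2)  # constant, equal to twice the quadratic coefficient

print(d1[:3])
print(d2[:3])
```

For a trend a + bt + ct^2, the second difference is the constant 2c, so no trend remains to be modeled after differencing twice.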
Next, we will check the periodogram of the residuals to see whether any seasonality is occurring.
The periodogram displays hints of leakage: power from a single underlying frequency spreads into neighboring Fourier frequencies, producing multiple spikes in the graph even though the structure of the data is really not that complicated. This makes it tempting to conclude that several Fourier frequencies are present, but that is not the case, and some of the spikes may simply be due to randomness. For now, we will ignore any possible seasonality and directly examine the ACF and PACF of the twice-differenced series.
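Leakage can be reproduced on a toy example. The sketch below computes a raw periodogram for two pure cosines of length 64: one whose frequency is exactly a Fourier frequency and one that falls halfway between two Fourier frequencies. The series length, the frequencies, and the 5% threshold are arbitrary choices for the illustration.

```python
import cmath
import math

def periodogram(x):
    """Raw periodogram at the Fourier frequencies k/n, k = 1..n//2."""
    n = len(x)
    ordinates = []
    for k in range(1, n // 2 + 1):
        dft = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        ordinates.append(abs(dft) ** 2 / n)
    return ordinates

def count_large(ordinates, fraction=0.05):
    """Count ordinates larger than `fraction` of the largest ordinate."""
    peak = max(ordinates)
    return sum(1 for p in ordinates if p > fraction * peak)

n = 64
on_grid = [math.cos(2 * math.pi * 8 * t / n) for t in range(n)]     # exactly 8 cycles
off_grid = [math.cos(2 * math.pi * 8.5 * t / n) for t in range(n)]  # 8.5 cycles

spikes_on = count_large(periodogram(on_grid))    # a single dominant ordinate
spikes_off = count_large(periodogram(off_grid))  # power spread over several ordinates

print(spikes_on, spikes_off)
```

Both inputs contain a single sinusoid, yet the off-grid one shows several sizable ordinates, which is the pattern described above.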
The ACF and PACF of twice-differenced prices are displayed below.
## ACF PACF
## [1,] -0.50 -0.50
## [2,] -0.02 -0.35
## [3,] 0.04 -0.22
## [4,] -0.04 -0.19
## [5,] 0.03 -0.12
## [6,] -0.04 -0.13
## [7,] 0.00 -0.14
## [8,] 0.02 -0.12
## [9,] 0.01 -0.08
## [10,] -0.02 -0.09
## [11,] 0.01 -0.07
## [12,] 0.02 -0.04
## [13,] -0.01 -0.03
## [14,] -0.03 -0.07
## [15,] 0.01 -0.08
## [16,] 0.00 -0.09
## [17,] 0.03 -0.03
## [18,] -0.02 -0.02
## [19,] -0.01 -0.03
## [20,] 0.00 -0.05
## [21,] 0.06 0.03
## [22,] -0.07 -0.02
## [23,] 0.01 -0.03
## [24,] 0.03 0.00
## [25,] -0.06 -0.08
## [26,] 0.08 -0.01
## [27,] -0.04 -0.02
## [28,] 0.00 -0.04
## [29,] 0.02 -0.04
## [30,] 0.01 -0.01
## [31,] 0.00 0.02
## [32,] -0.01 0.03
## [33,] -0.03 -0.02
## [34,] 0.04 0.03
## [35,] -0.01 0.04
## [36,] -0.03 0.00
## [37,] -0.01 -0.06
## [38,] 0.06 0.01
## [39,] -0.06 -0.03
## [40,] 0.00 -0.05
## [41,] 0.05 0.01
## [42,] -0.04 0.01
## [43,] -0.01 -0.03
## [44,] 0.05 0.05
## [45,] -0.03 0.04
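There is a clear cutoff after lag 1 in the ACF, so an MA(1) model may be a good fit. The PACF does not cut off; instead it decays gradually in magnitude (-0.50, -0.35, -0.22, ...), which is also consistent with a moving-average component. Therefore, we fit an ARIMA(0,2,1) model.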
Visually, the residuals above look to be approximately white noise. We will verify this by checking that the ACF has no significant lags beyond lag 0 (i.e., no spikes outside the 95% confidence bands).
The ACF has no significant lags beyond lag 0, except for an isolated spike at lag 22 that has no obvious interpretation and is most likely due to chance. Therefore, the residuals of this model are approximately white noise.
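The white-noise check described here can be sketched as follows. The code computes a sample ACF and flags lags that fall outside the approximate 95% bands of plus or minus 1.96 divided by the square root of n. It runs on simulated Gaussian noise, not on the actual residuals, and the function names are hypothetical.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / n / c0
            for h in range(max_lag + 1)]

def significant_lags(x, max_lag):
    """Lags (excluding lag 0) whose ACF lies outside the approximate 95% bands."""
    band = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [h for h in range(1, max_lag + 1) if abs(acf[h]) > band]

random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]  # simulated white noise
flagged = significant_lags(noise, 45)

print(flagged)
```

With 45 lags and 95% bands, roughly two lags are expected to cross the bands by chance even for true white noise, which is why a single isolated spike is not treated as evidence of remaining structure.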
For my second model, I will fit a linear regression with a quadratic (degree 2) polynomial in time to the data in order to estimate the trend. I will then analyze the ACF and PACF plots of the residuals in order to properly model the noise component.
## [1] 0.93
The ACF has no clear cutoff point, but the PACF seems to cut off after lag 1. Therefore, an AR(1) fit on the residuals of the linear model can potentially model the noise well. The ACF and PACF of the residuals after fitting the AR(1) model are shown below.
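The two-step approach (quadratic trend by least squares, then AR(1) on the residuals) can be sketched on simulated data. Everything in the block is an assumption for illustration: the trend coefficients, the AR coefficient of 0.9, the series length, and the helper names. The AR(1) coefficient is estimated with a simple lag-1 regression rather than the estimator used in the analysis.

```python
import random

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(y):
    """Least-squares fit of b0 + b1*u + b2*u^2, with time rescaled to u in [0, 1]."""
    n = len(y)
    design = [[1.0, t / (n - 1), (t / (n - 1)) ** 2] for t in range(n)]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * val for row, val in zip(design, y)) for i in range(3)]
    beta = solve(xtx, xty)
    fitted = [sum(b * f for b, f in zip(beta, row)) for row in design]
    return beta, [val - fit for val, fit in zip(y, fitted)]

def ar1_coefficient(r):
    """Lag-1 regression estimate of the AR(1) coefficient (no intercept)."""
    return sum(a * b for a, b in zip(r[1:], r[:-1])) / sum(v * v for v in r[:-1])

# Simulated series: hypothetical quadratic trend plus AR(1) noise with phi = 0.9.
random.seed(2)
n = 1200
noise = [0.0]
for _ in range(n - 1):
    noise.append(0.9 * noise[-1] + random.gauss(0.0, 0.1))
y = [5.0 - 2.0 * (t / (n - 1)) - 3.0 * (t / (n - 1)) ** 2 + e
     for t, e in enumerate(noise)]

beta, resid = fit_quadratic(y)
phi_hat = ar1_coefficient(resid)
innovations = [resid[t] - phi_hat * resid[t - 1] for t in range(1, n)]
lag1_after = ar1_coefficient(innovations)  # should be near 0 if AR(1) is adequate

print(phi_hat, lag1_after)
```

A lag-1 value near zero for the innovations plays the same role as the flat ACF and PACF in the residual diagnostics.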
## [1] 0.01
The ACF shows a relatively clear cutoff after lag 0, meaning that none of the spikes are significant. Once again, we see a spike at lag 22, but that is more likely due to randomness than some underlying process. The PACF also shows a clear cutoff at lag 0. Therefore, we have reached approximately white noise.
6 even windows of 199 data points:
Out of Sample (CV):
The first model has a lower CV score.
Model A, the first model (ARIMA(0,2,1) on the log-transformed series), seems to be the “best” model in this case. It performed better out of sample, as shown by its lower cross-validation score.
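The out-of-sample comparison can be sketched as a rolling evaluation over six consecutive windows of 199 points. The exact scheme is not spelled out in this report, so the sketch assumes each window after the first is forecast from all earlier windows and scored by mean squared error. The two forecasters (`drift` and `last_value`) are simple stand-ins, not the two models fitted above, and the series is synthetic.

```python
def split_windows(series, n_windows, window_size):
    """Cut the series into consecutive, equally sized windows."""
    return [series[i * window_size:(i + 1) * window_size] for i in range(n_windows)]

def cv_score(series, forecaster, n_windows=6, window_size=199):
    """Average out-of-sample MSE, forecasting each window from all earlier ones."""
    windows = split_windows(series, n_windows, window_size)
    errors = []
    for k in range(1, n_windows):
        train = [v for w in windows[:k] for v in w]
        test = windows[k]
        preds = forecaster(train, len(test))
        errors.append(sum((p - a) ** 2 for p, a in zip(preds, test)) / len(test))
    return sum(errors) / len(errors)

def last_value(train, horizon):
    # Stand-in forecaster: repeat the last observed value.
    return [train[-1]] * horizon

def drift(train, horizon):
    # Stand-in forecaster: extend the average slope of the training data.
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]

# Synthetic linear series with 6 * 199 = 1194 points.
series = [100.0 - 0.05 * t for t in range(6 * 199)]
score_a = cv_score(series, drift)
score_b = cv_score(series, last_value)

print(score_a, score_b)
```

Under this scheme the model with the lower average score is preferred, matching the comparison made above.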