63%%6
## [1] 3
plot(UKgas, col='brown', lwd=2, main='Time series plot of UKgas')
acf(UKgas, col='brown', lwd=2)
pacf(UKgas, col='brown', lwd=2)
plot(decompose(UKgas, type = c("multiplicative")), col='brown', lwd=2)
ggsubseriesplot(UKgas)
new_UKgas<-tsclean(UKgas)
Here, we estimated outliers with tsclean function in forecast package.
traindata<-ts(new_UKgas[0:75], frequency = 4, start=c(1960,1))
testdata<-ts(new_UKgas[76:length(UKgas)], frequency = 4, start=c(1978,4))
Augmented Dickey-Fuller (ADF) test and The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test can be used to test for stationary. These tests are available in tseries package with the function adf.test() for the ADF test and kpss.test() for the KPSS test.
adf.test(traindata)
##
## Augmented Dickey-Fuller Test
##
## data: traindata
## Dickey-Fuller = -2.1996, Lag order = 4, p-value = 0.4941
## alternative hypothesis: stationary
In the ADF test,
H0 : the series is not stationary
Ha : the series is stationary
Hence, a small p-value (i.e., less than Alpha=0.05) suggests that the series is stationary. Here, the p-value(0.5043) is greater than Alpha(0.05). So, we can not reject H0. That means the ADF test shows that the UKgas dataset is not stationary.
kpss.test(traindata)
##
## KPSS Test for Level Stationarity
##
## data: traindata
## KPSS Level = 1.7857, Truncation lag parameter = 3, p-value = 0.01
In the KPSS test,
H0 : the series is stationary
Ha : the series is not stationary
Hence, a small p-value suggests that the series is not stationary and a differencing is required. Here, the p-value(0.01) is less than Alpha(0.05). So, we can reject H0. That means the KPSS test shows that the UKgas dataset is not stationary.
So, here the seasonal differences are required. Now, we can check stationary after taking 1st differences.
D1_UKgas<-diff(traindata, lag=4)
tsdisplay(D1_UKgas, col='brown', lwd=2)
adf.test(D1_UKgas)
##
## Augmented Dickey-Fuller Test
##
## data: D1_UKgas
## Dickey-Fuller = -5.7039, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(D1_UKgas)
##
## KPSS Test for Level Stationarity
##
## data: D1_UKgas
## KPSS Level = 0.36104, Truncation lag parameter = 3, p-value = 0.09395
After taking 1st difference the ADF test shows that the p-value is less than Alpha(0.05) means the data is stationary and the kpss test shows that the p-value is greater than Alpha(0.05) means the data is stationary. So, we can say that one seasonal and 1 nonseasonal difference is enough for the M1 model UKgas dataset to get stationary series.
Here, we can see ACF and PACF plot our seasonal spike is AR=1, MA=1. And non seasonal spike is AR=3, MA=1. So the nonseasonal order is (p=3,d=1,q=1) and seasonal order is (P=1,D=1,Q=1).
m_data<-traindata
M1<-Arima(m_data, order = c(3, 1, 1), seasonal = c(1, 1, 1), method="ML")
summary(M1)
## Series: m_data
## ARIMA(3,1,1)(1,1,1)[4]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 sma1
## -0.8951 -0.8121 -0.8816 0.3557 -0.5754 -0.6343
## s.e. 0.0726 0.0920 0.0698 0.1419 0.1029 0.1185
##
## sigma^2 = 694.8: log likelihood = -327.29
## AIC=668.58 AICc=670.39 BIC=684.32
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 3.581664 24.34885 14.11413 0.7893229 5.71129 0.631229 -0.04288031
ARIMA (3,1,1)(1,1,1)\[4\]
Estimated Model: (1+0.7303B+0.6530B2+0.7449B3)(1+0.4127B4)(1-B)(1-B4)yt= (1-0.1884B)(1+0.5744B4)Zt, where{Zt}~WN(0, sigma_square=909.4)
test(M1$residuals)
## Null hypothesis: Residuals are iid noise.
## Test Distribution Statistic p-value
## Ljung-Box Q Q ~ chisq(20) 13.19 0.8689
## McLeod-Li Q Q ~ chisq(20) 13.4 0.8598
## Turning points T (T-48.7)/3.6 ~ N(0,1) 49 0.9264
## Diff signs S (S-37)/2.5 ~ N(0,1) 37 1
## Rank P (P-1387.5)/109.3 ~ N(0,1) 1442 0.6181
Assumptions:
i. {et} uncorrelated. Here, we can see the ACF and PACF plot of residuals diagnostics, there is no significant spike. So, we can say that the {et} is uncorrelated.
ii. {et} have mean zero. From our Residuals plot, we can say that mean zero but variance in not much constant.
iii. {et} is normally distributed. Here, the Normal Q-Q plot shows that the residuals are approximately normally distributed because the data is near 45 degree.
Here, H0: Residuals are iid noise.
We can see the summary of the residuals test here all p-value is greater than Alpha (0.05). So, here we can not reject H0. So, the M1 model satisfied all assumptions of residuals.
As it has increasing upward trend and multiplicative seasonality, cube root transformation is perfect for this data set.
Cube root transformation:
cube_root<-traindata**(1/3)
plot(cube_root, col='brown', lwd=2)
After cube root transformation, the time plot shows that a upward dumped trend and additive seasonality.
ACF and PACF plot:
Acf(cube_root, col='brown', lwd=2)
Pacf(cube_root, col='brown', lwd=2)
ADF and KPSS Test:
adf.test(cube_root)
##
## Augmented Dickey-Fuller Test
##
## data: cube_root
## Dickey-Fuller = -3.044, Lag order = 4, p-value = 0.1496
## alternative hypothesis: stationary
kpss.test(cube_root)
##
## KPSS Test for Level Stationarity
##
## data: cube_root
## KPSS Level = 1.8497, Truncation lag parameter = 3, p-value = 0.01
Here, the ADF test shows that the p-value is greater than Alpha(0.05) means the data is not stationary and the kpss test shows that the p-value is less than Alpha(0.05) means the data is not stationary. So, here 1 seasonal differences is required.
D2_UKgas<-diff(cube_root, lag=4)
tsdisplay(D2_UKgas, col='brown', lwd=2)
adf.test(D2_UKgas)
##
## Augmented Dickey-Fuller Test
##
## data: D2_UKgas
## Dickey-Fuller = -5.0527, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(D2_UKgas)
##
## KPSS Test for Level Stationarity
##
## data: D2_UKgas
## KPSS Level = 0.22712, Truncation lag parameter = 3, p-value = 0.1
After taking 1st difference the ADF test shows that the p-value is less than Alpha(0.05) means the data is stationary and the kpss test shows that the p-value is greater than Alpha(0.05) means the data is stationary. So, we can say that one seasonal and 0 nonseasonal difference is enough for the M2 model UKgas dataset to get stationary series.
Here, we can see ACF and PACF plot our seasonal spike is AR=1, MA=1. And non seasonal spike is AR=3, MA=1. So the nonseasonal order is (p=3,d=0,q=1) and seasonal order is (P=1,D=1,Q=0).
M2<-Arima(cube_root, order = c(3, 0, 1), seasonal = c(1, 1, 0), include.drift = TRUE)
summary(M2)
## Series: cube_root
## ARIMA(3,0,1)(1,1,0)[4] with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 sar1 drift
## -0.3526 0.2369 -0.3725 0.9148 -0.3803 0.0341
## s.e. 0.1198 0.1168 0.1102 0.0646 0.1121 0.0055
##
## sigma^2 = 0.04108: log likelihood = 14.92
## AIC=-15.83 AICc=-14.05 BIC=0.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.001834469 0.1886841 0.1275057 -0.2280462 2.107927 0.6772917
## ACF1
## Training set 0.02461169
ARIMA (3,0,1)(1,1,0)\[4\]
Estimated Model: (1+0.7303B+0.6530B2+0.7449B3)(1+0.4127B4)(1-B)(1-B4)yt= (1-0.1884B)(1+0.5744B4)Zt, where{Zt}~WN(0, sigma_square=909.4)
test(M2$residuals)
## Null hypothesis: Residuals are iid noise.
## Test Distribution Statistic p-value
## Ljung-Box Q Q ~ chisq(20) 13.62 0.849
## McLeod-Li Q Q ~ chisq(20) 22.76 0.3008
## Turning points T (T-48.7)/3.6 ~ N(0,1) 55 0.0791
## Diff signs S (S-37)/2.5 ~ N(0,1) 38 0.6911
## Rank P (P-1387.5)/109.3 ~ N(0,1) 1612 0.04 *
Assumptions:
i. {et} uncorrelated. Here, we can see the ACF and PACF plot of residuals diagnostics, there is no significant spike. So, we can say that the {et} is uncorrelated.
ii. {et} have mean zero. From our Residuals plot, we can say that mean zero but variance in not much constant.
iii. {et} is normally distributed. Here, the Normal Q-Q plot shows that the residuals are approximately normally distributed because the data is near 45 degree.
Here, H0: Residuals are iid noise.
We can see the summary of the residuals test here without one other p-value is greater than Alpha (0.05). So, here we can not reject H0. So, the M2 model satisfied all assumptions of residuals.
Optimum lambda value:
lambda<-BoxCox.lambda(UKgas)
lambda
## [1] -0.4457023
boxcox<-BoxCox(UKgas, lambda = lambda)
plot(boxcox, col='brown', lwd=2)
After box-cox transformation the time plot shows that upward dumped trend.
ACF and PACF plot:
Acf(boxcox, col='brown', lwd=2)
Pacf(boxcox, col='brown', lwd=2)
ADF and KPSS Test:
adf.test(boxcox)
##
## Augmented Dickey-Fuller Test
##
## data: boxcox
## Dickey-Fuller = -1.0813, Lag order = 4, p-value = 0.9218
## alternative hypothesis: stationary
kpss.test(boxcox)
##
## KPSS Test for Level Stationarity
##
## data: boxcox
## KPSS Level = 2.1876, Truncation lag parameter = 4, p-value = 0.01
Here, the ADF test shows that the p-value is greater than Alpha(0.05) means the data is not stationary and the kpss test shows that the p-value is less than Alpha(0.05) means the data is not stationary. So, here seasonal differences is required.
D3_UKgas<-diff(boxcox, lag=4)
tsdisplay(D3_UKgas, col='brown', lwd=2)
adf.test(D3_UKgas)
##
## Augmented Dickey-Fuller Test
##
## data: D3_UKgas
## Dickey-Fuller = -4.3319, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(D3_UKgas)
##
## KPSS Test for Level Stationarity
##
## data: D3_UKgas
## KPSS Level = 0.20619, Truncation lag parameter = 4, p-value = 0.1
After taking 1st difference the ADF test shows that the p-value is less than Alpha(0.05) means the data is stationary and the kpss test shows that the p-value is greater than Alpha(0.05) means the data is stationary. So, we can say that one seasonal difference is enough for the M3 model UKgas dataset to get stationary series.
Here, we can see ACF and PACF plot our seasonal spike is AR=1, MA=0. And seasonal order is (P=1,D=1,Q=0).
##### Fit the M3 model:
M3<-Arima(boxcox, order = c(0, 0, 0), seasonal = c(1, 1, 0), include.drift = TRUE)
summary(M3)
## Series: boxcox
## ARIMA(0,0,0)(1,1,0)[4] with drift
##
## Coefficients:
## sar1 drift
## -0.2675 0.0014
## s.e. 0.0942 0.0002
##
## sigma^2 = 9.051e-05: log likelihood = 337.5
## AIC=-669 AICc=-668.76 BIC=-661.07
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 1.036009e-05 0.009245614 0.005702586 -0.0003883941 0.27852
## MASE ACF1
## Training set 0.7878176 -0.1271794
ARIMA (0,0,0)(1,1,0)\[4\] Estimated Model: (1+0.7303B+0.6530B2+0.7449B3)(1+0.4127B4)(1-B)(1-B4)yt= (1-0.1884B)(1+0.5744B4)Zt, where{Zt}~WN(0, sigma_square=909.4)
test(M3$residuals)
## Null hypothesis: Residuals are iid noise.
## Test Distribution Statistic p-value
## Ljung-Box Q Q ~ chisq(20) 20.01 0.457
## McLeod-Li Q Q ~ chisq(20) 41.13 0.0036 *
## Turning points T (T-70.7)/4.3 ~ N(0,1) 68 0.5394
## Diff signs S (S-53.5)/3 ~ N(0,1) 52 0.6187
## Rank P (P-2889)/188.3 ~ N(0,1) 2735 0.4135
Assumptions: i. {et} uncorrelated. Here, we can see the ACF and PACF plot of residuals diagnostics, there is no significant spike. So, we can say that the {et} is uncorrelated.
ii. {et} have mean zero. From our Residuals plot, we can say that mean zero but variance in not much constant.
iii. {et} is normally distributed. Here, the Normal Q-Q plot shows that the residuals are approximately normally distributed because the data is near 45 degree.
Here, H0: Residuals are iid noise.
We can see the summary of the residuals test here all p-value is greater than Alpha (0.05). So, here we can not reject H0. So, the M3 model satisfied all assumptions of residuals.
| Model | AIC | AICc | BIC |
|---|---|---|---|
| M1 | |||
| M2 | |||
| M3 |
6.
| Model | MSE | RMSE | MAE | MPE | MAPE |
|---|---|---|---|---|---|
| M1 | |||||
| M2 | |||||
| M3 |
Comment on Time-plot:
          Trends                 : The above time series plot shows a clear upward trend.
          Seasonal pattern : These data show a multiplicative seasonal pattern because the pattern repeats every 4 quarters. The frequency is 4, and therefore period is quarterly, so a seasonal component is present.
          Sharp Changes   : The above time series plot shows that there is no sharp change in our data.
          Outliers                : Following the above time series plot, we can assume that there are some outliers in the data. We can check outliers with the help of forecast package and tsoutliers function in R.