Selecting the R Dataset:

63 %% 6
## [1] 3
      My ID is 20215063 and the last two digits are 63. So, 63 %% 6 = 3. Hence, the given dataset is UKgas.

Dataset Details:

      UK Quarterly Gas Consumption
      Description: Quarterly UK gas consumption from 1960Q1 to 1986Q4, in millions of therms.
      Usage: UKgas
      Format: A quarterly time series of length 108.
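
The report itself does not show the package setup, so here is a minimal sketch of the libraries the calls below appear to rely on (attributing test() to the itsmr package is an assumption, since the report does not name it):

library(forecast)   # Arima, tsoutliers, tsclean, ggsubseriesplot, tsdisplay, BoxCox
library(tseries)    # adf.test, kpss.test
library(itsmr)      # test() used for the residual diagnostics (assumed)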

1. Plots

Time-Plot:
plot(UKgas, col='brown', lwd=2, main='Time series plot of UKgas')

Comment on Time-plot:
      The time plot highlights the following main features of the data.
          Trend                : The series shows a clear upward trend.
          Seasonal pattern     : The pattern repeats every 4 quarters (frequency 4, quarterly data), so a seasonal component is present; the seasonal swings grow with the level of the series, which indicates multiplicative seasonality.
          Sharp changes        : The plot shows no sharp changes (structural breaks) in the data.
          Outliers             : From the plot we suspect that some outliers are present. We can check for them with the tsoutliers() function from the forecast package.
tsoutliers(UKgas)
## $index
##  [1]  44  57  65  69  73  77  79  81  83  84  85  87  88  89  91  93  95  97  98
## [20]  99 101 103 105 107
## 
## $replacements
##  [1] 301.1369 358.9973 404.5556 475.4278 492.0752 531.2751 394.3690 524.9169
##  [9] 383.0432 565.7420 547.0516 402.9917 585.6832 559.4504 453.9700 643.2421
## [17] 495.6343 670.9894 559.0268 534.9950 676.2419 551.6219 744.3583 588.4111
ACF (Autocorrelation Function) Plot:
acf(UKgas, col='brown', lwd=2)

Comment on ACF plot:
      The ACF plot shows two significant non-seasonal spikes at the start, and high coefficients at the seasonal lags 4, 8 and 12. So a tentative non-seasonal MA order is q = 2.
PACF (Partial Autocorrelation Function) Plot:
pacf(UKgas, col='brown', lwd=2)

Comment on PACF plot:
      The PACF plot shows three significant non-seasonal spikes at the start, and high coefficients at the seasonal lags 4, 8 and 12. So a tentative non-seasonal AR order is p = 3.
Decomposition Plot:
plot(decompose(UKgas, type = c("multiplicative")), col='brown', lwd=2)

Comment on Decomposition plot:
      The decomposition plot shows a clear upward trend, multiplicative seasonality, and random fluctuation in the remainder component.
Seasonal-Subseries Plot:
ggsubseriesplot(UKgas)

Comment:
          A seasonal subseries plot is used to determine whether significant seasonality is present in a time series. For our quarterly data, all the Q1 values are plotted first, then all the Q2 values, and so on. From this plot we can clearly see that seasonality is present in the UKgas data.
Estimated Outliers:
new_UKgas<-tsclean(UKgas)

Here, we replaced the estimated outliers using the tsclean() function from the forecast package.

Splitting the data into training (70%) and test (30%) data sets:
Training Data:
traindata<-ts(new_UKgas[1:75], frequency = 4, start=c(1960,1))   # first 75 observations (about 70%)
Test Data:
testdata<-ts(new_UKgas[76:length(UKgas)], frequency = 4, start=c(1978,4))   # remaining 33 observations (about 30%)
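
An equivalent split can also be done by date with window(), which keeps the time-series attributes automatically; a small sketch (traindata2 and testdata2 are illustrative names, not used elsewhere in the report):

traindata2 <- window(new_UKgas, end = c(1978, 3))    # 1960 Q1 - 1978 Q3, 75 observations
testdata2  <- window(new_UKgas, start = c(1978, 4))  # 1978 Q4 - 1986 Q4, 33 observations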

2.

Checking Stationarity:

The Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test can be used to test for stationarity. Both are available in the tseries package, via adf.test() for the ADF test and kpss.test() for the KPSS test.

Augmented Dickey-Fuller Test:
adf.test(traindata)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  traindata
## Dickey-Fuller = -2.1996, Lag order = 4, p-value = 0.4941
## alternative hypothesis: stationary

In the ADF test,
H0 : the series is not stationary
Ha : the series is stationary
Hence, a small p-value (i.e., less than alpha = 0.05) suggests that the series is stationary. Here, the p-value (0.4941) is greater than alpha (0.05), so we cannot reject H0. The ADF test therefore indicates that the UKgas training series is not stationary.

Kwiatkowski-Phillips-Schmidt-Shin Test:
kpss.test(traindata)
## 
##  KPSS Test for Level Stationarity
## 
## data:  traindata
## KPSS Level = 1.7857, Truncation lag parameter = 3, p-value = 0.01

In the KPSS test,
H0 : the series is stationary
Ha : the series is not stationary
Hence, a small p-value suggests that the series is not stationary and that differencing is required. Here, the p-value (0.01) is less than alpha (0.05), so we reject H0. The KPSS test therefore also indicates that the UKgas training series is not stationary.

So, a seasonal difference is required. We can check stationarity again after taking the first seasonal difference (lag 4).
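
Before differencing, the number of differences suggested by the forecast package can be cross-checked with ndiffs() and nsdiffs(); a quick sketch, not part of the original analysis:

nsdiffs(traindata)  # suggested number of seasonal differences
ndiffs(traindata)   # suggested number of non-seasonal differences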

D1_UKgas<-diff(traindata, lag=4)
tsdisplay(D1_UKgas, col='brown', lwd=2)

adf.test(D1_UKgas)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  D1_UKgas
## Dickey-Fuller = -5.7039, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(D1_UKgas)
## 
##  KPSS Test for Level Stationarity
## 
## data:  D1_UKgas
## KPSS Level = 0.36104, Truncation lag parameter = 3, p-value = 0.09395

After taking the first seasonal difference, the ADF test p-value is less than alpha (0.05), indicating stationarity, and the KPSS test p-value is greater than alpha (0.05), also indicating stationarity. So one seasonal and one non-seasonal difference (d = 1, D = 1) are used for the M1 model to obtain a stationary series.

From the ACF and PACF plots of the differenced series, the seasonal spikes suggest seasonal AR = 1 and MA = 1, and the non-seasonal spikes suggest AR = 3 and MA = 1. So the non-seasonal order is (p = 3, d = 1, q = 1) and the seasonal order is (P = 1, D = 1, Q = 1); a quick automatic cross-check is sketched below.
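
As a sanity check on these hand-picked orders (a sketch, not part of the original analysis; auto_fit is an illustrative name), auto.arima() from the forecast package can search for a seasonal ARIMA automatically with the same amount of differencing:

auto_fit <- auto.arima(traindata, d = 1, D = 1, stepwise = FALSE, approximation = FALSE)
summary(auto_fit)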

Fit the M1 model:
m_data<-traindata
M1<-Arima(m_data, order = c(3, 1, 1), seasonal = c(1, 1, 1), method="ML")
summary(M1)
## Series: m_data 
## ARIMA(3,1,1)(1,1,1)[4] 
## 
## Coefficients:
##           ar1      ar2      ar3     ma1     sar1     sma1
##       -0.8951  -0.8121  -0.8816  0.3557  -0.5754  -0.6343
## s.e.   0.0726   0.0920   0.0698  0.1419   0.1029   0.1185
## 
## sigma^2 = 694.8:  log likelihood = -327.29
## AIC=668.58   AICc=670.39   BIC=684.32
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE    MAPE     MASE        ACF1
## Training set 3.581664 24.34885 14.11413 0.7893229 5.71129 0.631229 -0.04288031
Mathematical Expression of M1 model:

ARIMA(3,1,1)(1,1,1)[4]

Estimated Model: (1 + 0.8951B + 0.8121B^2 + 0.8816B^3)(1 + 0.5754B^4)(1 - B)(1 - B^4)y_t = (1 + 0.3557B)(1 - 0.6343B^4)Z_t, where {Z_t} ~ WN(0, sigma^2 = 694.8).

Residual Diagnostic Checking:
test(M1$residuals)
## Null hypothesis: Residuals are iid noise.
## Test                        Distribution Statistic   p-value
## Ljung-Box Q                Q ~ chisq(20)     13.19    0.8689
## McLeod-Li Q                Q ~ chisq(20)      13.4    0.8598
## Turning points T   (T-48.7)/3.6 ~ N(0,1)        49    0.9264
## Diff signs S         (S-37)/2.5 ~ N(0,1)        37         1
## Rank P         (P-1387.5)/109.3 ~ N(0,1)      1442    0.6181

Assumptions:
i. {et} are uncorrelated: the ACF and PACF plots of the residuals show no significant spikes, so we can say the residuals are uncorrelated.
ii. {et} have mean zero: from the residual plot, the mean appears to be zero, although the variance is not quite constant.
iii. {et} are normally distributed: the normal Q-Q plot shows the points lying close to the 45-degree line, so the residuals are approximately normally distributed.

Here, H0: Residuals are iid noise.
We can see from the summary of the residual tests that all p-values are greater than alpha (0.05), so we cannot reject H0. The M1 model therefore satisfies the residual assumptions.
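
As an additional cross-check not included in the original report, the forecast package provides checkresiduals(), which plots the residuals, their ACF and a histogram, and runs a Ljung-Box test on the fitted model:

checkresiduals(M1)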

3. Transformation:

As the series has an increasing upward trend and multiplicative seasonality, a cube-root transformation is a reasonable choice for stabilising the variance of this data set.

Cube root transformation:

cube_root<-traindata**(1/3)
plot(cube_root, col='brown', lwd=2)

After the cube-root transformation, the time plot shows a damped upward trend and roughly additive seasonality.
ACF and PACF plot:

Acf(cube_root, col='brown', lwd=2)

Pacf(cube_root, col='brown', lwd=2)

ADF and KPSS Test:

adf.test(cube_root)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cube_root
## Dickey-Fuller = -3.044, Lag order = 4, p-value = 0.1496
## alternative hypothesis: stationary
kpss.test(cube_root)
## 
##  KPSS Test for Level Stationarity
## 
## data:  cube_root
## KPSS Level = 1.8497, Truncation lag parameter = 3, p-value = 0.01

Here, the ADF test p-value is greater than alpha (0.05), indicating that the transformed series is not stationary, and the KPSS test p-value is less than alpha (0.05), also indicating non-stationarity. So one seasonal difference is required.

D2_UKgas<-diff(cube_root, lag=4)
tsdisplay(D2_UKgas, col='brown', lwd=2)

adf.test(D2_UKgas)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  D2_UKgas
## Dickey-Fuller = -5.0527, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(D2_UKgas)
## 
##  KPSS Test for Level Stationarity
## 
## data:  D2_UKgas
## KPSS Level = 0.22712, Truncation lag parameter = 3, p-value = 0.1

After taking the first seasonal difference, the ADF test p-value is less than alpha (0.05) and the KPSS test p-value is greater than alpha (0.05), both indicating stationarity. So one seasonal difference and no non-seasonal difference are enough for the M2 model to obtain a stationary series.

From the ACF and PACF plots of the differenced series, the seasonal spikes suggest seasonal AR = 1, and the non-seasonal spikes suggest AR = 3 and MA = 1. So the non-seasonal order is (p = 3, d = 0, q = 1) and the seasonal order is (P = 1, D = 1, Q = 0).

Fit the M2 model:
M2<-Arima(cube_root, order = c(3, 0, 1), seasonal = c(1, 1, 0), include.drift = TRUE)
summary(M2)
## Series: cube_root 
## ARIMA(3,0,1)(1,1,0)[4] with drift 
## 
## Coefficients:
##           ar1     ar2      ar3     ma1     sar1   drift
##       -0.3526  0.2369  -0.3725  0.9148  -0.3803  0.0341
## s.e.   0.1198  0.1168   0.1102  0.0646   0.1121  0.0055
## 
## sigma^2 = 0.04108:  log likelihood = 14.92
## AIC=-15.83   AICc=-14.05   BIC=0.01
## 
## Training set error measures:
##                        ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.001834469 0.1886841 0.1275057 -0.2280462 2.107927 0.6772917
##                    ACF1
## Training set 0.02461169
Mathematical Expression of M2 model:

ARIMA(3,0,1)(1,1,0)[4] with drift

Estimated Model: (1 + 0.3526B - 0.2369B^2 + 0.3725B^3)(1 + 0.3803B^4)(1 - B^4)(y_t - 0.0341t) = (1 + 0.9148B)Z_t, where y_t denotes the cube-root transformed series and {Z_t} ~ WN(0, sigma^2 = 0.04108).

Residual Diagnostic Checking:
test(M2$residuals)
## Null hypothesis: Residuals are iid noise.
## Test                        Distribution Statistic   p-value
## Ljung-Box Q                Q ~ chisq(20)     13.62     0.849
## McLeod-Li Q                Q ~ chisq(20)     22.76    0.3008
## Turning points T   (T-48.7)/3.6 ~ N(0,1)        55    0.0791
## Diff signs S         (S-37)/2.5 ~ N(0,1)        38    0.6911
## Rank P         (P-1387.5)/109.3 ~ N(0,1)      1612      0.04 *

Assumptions:
i. {et} are uncorrelated: the ACF and PACF plots of the residuals show no significant spikes, so we can say the residuals are uncorrelated.
ii. {et} have mean zero: from the residual plot, the mean appears to be zero, although the variance is not quite constant.
iii. {et} are normally distributed: the normal Q-Q plot shows the points lying close to the 45-degree line, so the residuals are approximately normally distributed.

Here, H0: Residuals are iid noise.
We can see from the summary of the residual tests that all p-values except the rank test are greater than alpha (0.05), so we cannot reject H0 for those tests. The M2 model therefore approximately satisfies the residual assumptions.

4. Box-Cox Transformation:

Optimum lambda value:

lambda<-BoxCox.lambda(UKgas)
lambda
## [1] -0.4457023
boxcox<-BoxCox(UKgas, lambda = lambda)
plot(boxcox, col='brown', lwd=2)

After the Box-Cox transformation, the time plot shows a damped upward trend.
ACF and PACF plot:

Acf(boxcox, col='brown', lwd=2)

Pacf(boxcox, col='brown', lwd=2)

ADF and KPSS Test:

adf.test(boxcox)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  boxcox
## Dickey-Fuller = -1.0813, Lag order = 4, p-value = 0.9218
## alternative hypothesis: stationary
kpss.test(boxcox)
## 
##  KPSS Test for Level Stationarity
## 
## data:  boxcox
## KPSS Level = 2.1876, Truncation lag parameter = 4, p-value = 0.01

Here, the ADF test p-value is greater than alpha (0.05), indicating that the transformed series is not stationary, and the KPSS test p-value is less than alpha (0.05), also indicating non-stationarity. So a seasonal difference is required.

D3_UKgas<-diff(boxcox, lag=4)
tsdisplay(D3_UKgas, col='brown', lwd=2)

adf.test(D3_UKgas)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  D3_UKgas
## Dickey-Fuller = -4.3319, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(D3_UKgas)
## 
##  KPSS Test for Level Stationarity
## 
## data:  D3_UKgas
## KPSS Level = 0.20619, Truncation lag parameter = 4, p-value = 0.1

After taking the first seasonal difference, the ADF test p-value is less than alpha (0.05) and the KPSS test p-value is greater than alpha (0.05), both indicating stationarity. So one seasonal difference is enough for the M3 model to obtain a stationary series.

From the ACF and PACF plots of the differenced series, the seasonal spike suggests seasonal AR = 1 and MA = 0, so the seasonal order is (P = 1, D = 1, Q = 0).
Fit the M3 model:

M3<-Arima(boxcox, order = c(0, 0, 0), seasonal = c(1, 1, 0), include.drift = TRUE)
summary(M3)
## Series: boxcox 
## ARIMA(0,0,0)(1,1,0)[4] with drift 
## 
## Coefficients:
##          sar1   drift
##       -0.2675  0.0014
## s.e.   0.0942  0.0002
## 
## sigma^2 = 9.051e-05:  log likelihood = 337.5
## AIC=-669   AICc=-668.76   BIC=-661.07
## 
## Training set error measures:
##                        ME        RMSE         MAE           MPE    MAPE
## Training set 1.036009e-05 0.009245614 0.005702586 -0.0003883941 0.27852
##                   MASE       ACF1
## Training set 0.7878176 -0.1271794
Mathematical Expression of M3 model:

ARIMA(0,0,0)(1,1,0)[4] with drift

Estimated Model: (1 + 0.2675B^4)(1 - B^4)(y_t - 0.0014t) = Z_t, where y_t denotes the Box-Cox transformed series and {Z_t} ~ WN(0, sigma^2 = 9.051e-05).

Residual Diagnostic Checking:
test(M3$residuals)
## Null hypothesis: Residuals are iid noise.
## Test                        Distribution Statistic   p-value
## Ljung-Box Q                Q ~ chisq(20)     20.01     0.457
## McLeod-Li Q                Q ~ chisq(20)     41.13    0.0036 *
## Turning points T   (T-70.7)/4.3 ~ N(0,1)        68    0.5394
## Diff signs S         (S-53.5)/3 ~ N(0,1)        52    0.6187
## Rank P           (P-2889)/188.3 ~ N(0,1)      2735    0.4135

Assumptions:
i. {et} are uncorrelated: the ACF and PACF plots of the residuals show no significant spikes, so we can say the residuals are uncorrelated.
ii. {et} have mean zero: from the residual plot, the mean appears to be zero, although the variance is not quite constant.
iii. {et} are normally distributed: the normal Q-Q plot shows the points lying close to the 45-degree line, so the residuals are approximately normally distributed.

Here, H0: Residuals are iid noise.
We can see from the summary of the residual tests that all p-values except the McLeod-Li test are greater than alpha (0.05), so we cannot reject H0 for those tests; the significant McLeod-Li statistic (p = 0.0036) points to some remaining dependence in the squared residuals. Overall, the M3 model approximately satisfies the residual assumptions.

5. Model Selection Criteria:
Model      AIC        AICc       BIC
M1       668.58     670.39     684.32
M2       -15.83     -14.05       0.01
M3      -669.00    -668.76    -661.07

(Values taken from the model summaries above.)
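
The values above can be pulled directly from the fitted model objects; a minimal sketch (the data.frame layout is only illustrative):

data.frame(Model = c("M1", "M2", "M3"),
           AIC   = c(M1$aic,  M2$aic,  M3$aic),
           AICc  = c(M1$aicc, M2$aicc, M3$aicc),
           BIC   = c(M1$bic,  M2$bic,  M3$bic))

Note that M1, M2 and M3 are fitted to differently transformed series, so their information criteria are on different likelihood scales and should not be compared directly across models.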

6.

Model      MSE          RMSE        MAE        MPE        MAPE
M1       592.87       24.3489     14.1141     0.7893     5.7113
M2         0.0356      0.1887      0.1275    -0.2280     2.1079
M3         0.0000855   0.0092      0.0057    -0.0004     0.2785

(Training set error measures taken from the model summaries above, with MSE = RMSE^2. Note that M2 and M3 are measured on the cube-root and Box-Cox transformed scales respectively, so they are not directly comparable with M1.)
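
The test-set error measures could be filled in by forecasting over the hold-out period and comparing with testdata using accuracy(). A sketch under some assumptions: M2 forecasts are cubed to undo the cube-root transform, M3 is refitted on the training data (the report fitted it to the Box-Cox transform of the full series) and back-transformed with InvBoxCox(); h, fc1, fc2, fc3 and M3_train are illustrative names.

h <- length(testdata)

fc1 <- forecast(M1, h = h)$mean       # M1 forecasts, already on the original scale
fc2 <- forecast(M2, h = h)$mean^3     # M2 forecasts, cubed to undo the cube-root transform

M3_train <- Arima(BoxCox(traindata, lambda), order = c(0, 0, 0),
                  seasonal = c(1, 1, 0), include.drift = TRUE)
fc3 <- InvBoxCox(forecast(M3_train, h = h)$mean, lambda = lambda)

accuracy(fc1, testdata)   # gives RMSE, MAE, MPE, MAPE on the hold-out set (MSE = RMSE^2)
accuracy(fc2, testdata)
accuracy(fc3, testdata)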