Loading and cleaning of data

Define the date column and arrange it in ascending order.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
# Monthly sequence of first-of-month dates, April 2012 to March 2020 (96 months).
# seq.Date() already returns Date objects, so no further as.Date() conversion is needed.
Daterow <- seq.Date(from = as.Date("2012-04-01"), to = as.Date("2020-03-01"), by = "month")

IIP <- cbind(Daterow, IIP)
IIP$Daterow  # confirm the date column
##  [1] "2012-04-01" "2012-05-01" "2012-06-01" "2012-07-01" "2012-08-01"
##  [6] "2012-09-01" "2012-10-01" "2012-11-01" "2012-12-01" "2013-01-01"
## [11] "2013-02-01" "2013-03-01" "2013-04-01" "2013-05-01" "2013-06-01"
## [16] "2013-07-01" "2013-08-01" "2013-09-01" "2013-10-01" "2013-11-01"
## [21] "2013-12-01" "2014-01-01" "2014-02-01" "2014-03-01" "2014-04-01"
## [26] "2014-05-01" "2014-06-01" "2014-07-01" "2014-08-01" "2014-09-01"
## [31] "2014-10-01" "2014-11-01" "2014-12-01" "2015-01-01" "2015-02-01"
## [36] "2015-03-01" "2015-04-01" "2015-05-01" "2015-06-01" "2015-07-01"
## [41] "2015-08-01" "2015-09-01" "2015-10-01" "2015-11-01" "2015-12-01"
## [46] "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01" "2016-05-01"
## [51] "2016-06-01" "2016-07-01" "2016-08-01" "2016-09-01" "2016-10-01"
## [56] "2016-11-01" "2016-12-01" "2017-01-01" "2017-02-01" "2017-03-01"
## [61] "2017-04-01" "2017-05-01" "2017-06-01" "2017-07-01" "2017-08-01"
## [66] "2017-09-01" "2017-10-01" "2017-11-01" "2017-12-01" "2018-01-01"
## [71] "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01" "2018-06-01"
## [76] "2018-07-01" "2018-08-01" "2018-09-01" "2018-10-01" "2018-11-01"
## [81] "2018-12-01" "2019-01-01" "2019-02-01" "2019-03-01" "2019-04-01"
## [86] "2019-05-01" "2019-06-01" "2019-07-01" "2019-08-01" "2019-09-01"
## [91] "2019-10-01" "2019-11-01" "2019-12-01" "2020-01-01" "2020-02-01"
## [96] "2020-03-01"
View(IIP)  # interactive viewer; shows nothing in the knitted document
plot(IIP$Daterow, IIP$Manufacturing, type = "l", xlab = "year", ylab = "Manufacturing Index")

# The level series does not look stationary, so difference it once
d.manu <- diff(IIP$Manufacturing)
plot(d.manu, type = "l", xlab = "index", ylab = "Differenced Manufacturing Index")

# Lambda <- BoxCox.lambda(IIP$Manufacturing)  # Box-Cox transformation parameter (forecast package)
# Lambda  # a single scalar, so summary()/plot() add nothing here
# d.manu looks stationary
# summary(IIP$Manufacturing)
# summary(d.manu)

We perform the Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF) tests to confirm stationarity statistically before proceeding.

# DF & ADF tests for stationarity of the level series
# adf.test() is from the tseries package; k is the number of lags,
# and k = 0 gives the plain Dickey-Fuller test
# adf.test(IIP$Manufacturing, alternative = "stationary", k = 0)

# Augmented Dickey-Fuller test with the default lag order
# adf.test(IIP$Manufacturing, alternative = "stationary")

# DF & ADF tests for the differenced series
# adf.test(d.manu, alternative = "stationary", k = 0)
# adf.test(d.manu, alternative = "stationary")
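
A minimal runnable sketch of these tests, assuming the tseries package is installed; we reject non-stationarity when the p-value is below 0.05:

library(tseries)

# ADF on the level series and on the first difference
adf_level <- adf.test(IIP$Manufacturing, alternative = "stationary")
adf_diff  <- adf.test(d.manu, alternative = "stationary")

# The level series should fail to reject; the differenced series should reject (p < 0.05)
c(level = adf_level$p.value, differenced = adf_diff$p.value)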

The autocorrelation function (ACF) gives the autocorrelation at every possible lag. The autocorrelation at lag 0 is included by default and always equals 1, since it is the correlation of the series with itself. The ACF also helps in choosing among the candidate time-series models:

1. Autoregressive (AR) model
2. Moving average (MA) model
3. ARMA (AR + MA) model
4. ARIMA (autoregressive integrated moving average) model

It also gives a rough estimate of the number of lags to include in the model.

# ACF and PACF of the differenced series
acf(d.manu, na.action = na.omit)
pacf(d.manu, na.action = na.omit)

As we can infer from the graphs above, the autocorrelation decreases as the lag increases, suggesting little linear association between observations separated by larger lags. The autocorrelations also oscillate in sign, which points to a negative coefficient on the lagged dependent variable.


Time Series Modelling

# ARIMA model search (auto.arima is from the forecast package)
# auto.arima(IIP$Manufacturing, trace = TRUE)

All candidate models are estimated here. In an ARIMA(p, d, q) model, p is the number of autoregressive lags (i.e. past values of IIP), d is the number of times the series is differenced (the "integrated" part), and q is the number of moving-average lags (i.e. past values of the unexplained error term).

All estimated models have d = 1. As we saw earlier, the manufacturing series is not stationary, so auto.arima makes it stationary by differencing once; the ADF test above confirmed that the differenced series is stationary.

The suggested best model is ARIMA(2,1,0). We choose the model with the minimal AIC/BIC value; at the same time, the model should be parsimonious, i.e. have fewer parameters.
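
A minimal sketch of such a comparison, assuming the forecast package is loaded; the candidate orders here are illustrative:

# Compare AIC across a few candidate non-seasonal orders
candidates <- list(c(2, 1, 0), c(1, 1, 1), c(0, 1, 1))
sapply(candidates, function(o) AIC(Arima(IIP$Manufacturing, order = o)))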

ARIMA(2,1,0) is the best model as per the lowest AIC value. We estimate the model below:

# fitarima <- Arima(IIP$Manufacturing, order = c(2, 1, 0), include.drift = TRUE)
# coeftest(fitarima)  # coeftest() is from the lmtest package

All the estimates are significant, with p-values < 0.05.

Residuals should be white noise; if not, there is information left in the residuals that is not captured by the model. For a white-noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly zero, as there is some random variation. For a white-noise series, we expect 95% of the spikes in the ACF to lie within ±2/√T, where T is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines). If one or more large spikes lie outside these bounds, or if substantially more than 5% of the spikes do, then the series is probably not white noise.

# checkresiduals(fitarima)  # forecast package: residual ACF plot plus Ljung-Box test

The ACF plot of the residuals from the ARIMA model shows that not all autocorrelations lie within the threshold limits, indicating that the residuals are not behaving like white noise. A portmanteau test returns a small p-value, also suggesting that the residuals are not white noise.
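
A minimal sketch of this check, assuming fitarima from the chunk above has been estimated; the bound ±2/√T is the one described earlier:

res <- residuals(fitarima)
bound <- 2 / sqrt(length(res))
res_acf <- acf(res, plot = FALSE)$acf[-1]  # drop the lag-0 autocorrelation
mean(abs(res_acf) > bound)  # share of spikes outside the bounds; ~5% for white noise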

Seasonal-trend decomposition using Loess (STL) works as an additive process: the data are decomposed into trend, seasonality, and remainder, and each component can be taken apart for analysis. The strength of seasonality can be scored as F_S = max(0, 1 − Var(R_t) / Var(S_t + R_t)), where S_t and R_t are the seasonal and remainder components (source: https://otexts.com/fpp2/seasonal-strength.html). A seasonal strength above 0.64 is regarded as strong; a sketch computing this score follows the STL fit below.

manu <- as.numeric(IIP$Manufacturing)  # avoid shadowing base::t and stats::ts
manu_ts <- ts(manu[1:96], start = c(2012, 4), frequency = 12)

View(manu_ts)  # interactive viewer; shows nothing in the knitted document

modelStl <- stl(manu_ts, s.window = "periodic")

plot(modelStl)

We note that the trend dominates the data series; consequently, the grey bars on the STL panels are of similar size.
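
A minimal sketch of the seasonal-strength score defined above, computed from the STL components of modelStl:

Rt <- modelStl$time.series[, "remainder"]
St <- modelStl$time.series[, "seasonal"]
Fs <- max(0, 1 - var(Rt) / var(St + Rt))  # seasonal strength; above 0.64 counts as strong
Fs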

Seasonal ARIMA Model

library(astsa)
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
## 
##     gas
diff12 <- diff(IIP$Manufacturing, 12)
adf.test(diff12, "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff12
## Dickey-Fuller = -0.85482, Lag order = 4, p-value = 0.9531
## alternative hypothesis: stationary
diff12and1 <- diff(diff12,1)
adf.test(diff12and1, "stationary")
## Warning in adf.test(diff12and1, "stationary"): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff12and1
## Dickey-Fuller = -5.7975, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
acf2(diff12and1,48)

##       [,1]  [,2] [,3]  [,4]  [,5] [,6] [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  -0.25  0.02 0.01 -0.13 -0.03 0.04 0.16 -0.11 0.09 -0.02  0.11 -0.28  0.13
## PACF -0.25 -0.05 0.00 -0.13 -0.11 0.00 0.18 -0.04 0.04  0.02  0.18 -0.25  0.00
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.03 -0.05  0.16 -0.03     0  0.04  0.00 -0.07 -0.03  0.06  0.03  0.02
## PACF -0.02  0.00  0.05  0.01     0  0.15 -0.02 -0.02 -0.09  0.11 -0.01  0.03
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF   0.01  0.03 -0.11 -0.01  0.03 -0.14  0.03  0.06  0.02 -0.05  0.03 -0.08
## PACF -0.05  0.07 -0.04 -0.05 -0.05 -0.08 -0.10  0.03 -0.01 -0.02 -0.02 -0.02
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF   0.05  0.02 -0.03  0.03 -0.02  0.06 -0.03  0.05 -0.08  0.04 -0.08
## PACF  0.09  0.05 -0.08  0.00  0.04  0.01 -0.04  0.06 -0.03  0.07 -0.11
plot(diff12and1, type = "l", xlab = "index", ylab = "Seasonally and first-differenced Manufacturing Index")
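
Before the automated search below, the differencing orders can be cross-checked with the unit-root helpers in the forecast package; a minimal sketch, assuming forecast is loaded:

nsdiffs(manu_ts)  # suggested number of seasonal differences
ndiffs(manu_ts)   # suggested number of first differences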

fit <- auto.arima(manu_ts, trace = TRUE, test = "kpss", ic = "bic")
## 
##  ARIMA(2,1,2)(1,0,1)[12] with drift         : 586.481
##  ARIMA(0,1,0)            with drift         : 646.8043
##  ARIMA(1,1,0)(1,0,0)[12] with drift         : 583.5002
##  ARIMA(0,1,1)(0,0,1)[12] with drift         : 582.7973
##  ARIMA(0,1,0)                               : 642.2885
##  ARIMA(0,1,1)            with drift         : 591.024
##  ARIMA(0,1,1)(1,0,1)[12] with drift         : 574.488
##  ARIMA(0,1,1)(1,0,0)[12] with drift         : 574.8235
##  ARIMA(0,1,1)(2,0,1)[12] with drift         : 578.1159
##  ARIMA(0,1,1)(1,0,2)[12] with drift         : 578.0847
##  ARIMA(0,1,1)(0,0,2)[12] with drift         : 579.7059
##  ARIMA(0,1,1)(2,0,0)[12] with drift         : 574.124
##  ARIMA(0,1,0)(2,0,0)[12] with drift         : 596.4863
##  ARIMA(1,1,1)(2,0,0)[12] with drift         : 577.164
##  ARIMA(0,1,2)(2,0,0)[12] with drift         : 577.3498
##  ARIMA(1,1,0)(2,0,0)[12] with drift         : 580.597
##  ARIMA(1,1,2)(2,0,0)[12] with drift         : 581.6329
##  ARIMA(0,1,1)(2,0,0)[12]                    : 569.9051
##  ARIMA(0,1,1)(1,0,0)[12]                    : 571.0745
##  ARIMA(0,1,1)(2,0,1)[12]                    : 573.9534
##  ARIMA(0,1,1)(1,0,1)[12]                    : 570.4569
##  ARIMA(0,1,0)(2,0,0)[12]                    : 591.9771
##  ARIMA(1,1,1)(2,0,0)[12]                    : 572.9409
##  ARIMA(0,1,2)(2,0,0)[12]                    : 573.1104
##  ARIMA(1,1,0)(2,0,0)[12]                    : 576.0453
##  ARIMA(1,1,2)(2,0,0)[12]                    : 577.4237
## 
##  Best model: ARIMA(0,1,1)(2,0,0)[12]
sarima(manu_ts, 0, 1, 1, 2, 0, 0, 12)
## initial  value 1.963973 
## iter   2 value 1.636345
## iter   3 value 1.528207
## iter   4 value 1.521169
## iter   5 value 1.514816
## iter   6 value 1.512420
## iter   7 value 1.509543
## iter   8 value 1.509011
## iter   9 value 1.508971
## iter  10 value 1.508955
## iter  11 value 1.508953
## iter  12 value 1.508953
## iter  13 value 1.508952
## iter  14 value 1.508952
## iter  14 value 1.508952
## iter  14 value 1.508952
## final  value 1.508952 
## converged
## initial  value 1.509705 
## iter   2 value 1.501966
## iter   3 value 1.494504
## iter   4 value 1.485137
## iter   5 value 1.483398
## iter   6 value 1.483039
## iter   7 value 1.482932
## iter   8 value 1.482928
## iter   9 value 1.482928
## iter   9 value 1.482928
## iter   9 value 1.482928
## final  value 1.482928 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1    sar1    sar2  constant
##       -0.7475  0.3090  0.3534    0.1595
## s.e.   0.0837  0.1277  0.1452    0.2565
## 
## sigma^2 estimated as 18.01:  log likelihood = -275.68,  aic = 561.35
## 
## $degrees_of_freedom
## [1] 91
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1       -0.7475 0.0837 -8.9320  0.0000
## sar1       0.3090 0.1277  2.4197  0.0175
## sar2       0.3534 0.1452  2.4344  0.0169
## constant   0.1595 0.2565  0.6217  0.5357
## 
## $AIC
## [1] 5.908996
## 
## $AICc
## [1] 5.913674
## 
## $BIC
## [1] 6.04341
# Portmanteau (Ljung-Box) check on the SARIMA residuals
# Box.test(residuals(fit), type = "Ljung", lag = 20)

Forecasting

for0 <- forecast(fit, h = 6)
for0
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2020       121.3803 115.8554 126.9052 112.9306 129.8299
## May 2020       126.9431 121.2203 132.6659 118.1909 135.6954
## Jun 2020       124.2620 118.3480 130.1761 115.2172 133.3068
## Jul 2020       125.3735 119.2741 131.4729 116.0453 134.7017
## Aug 2020       124.8019 118.5227 131.0811 115.1988 134.4051
## Sep 2020       124.4126 117.9586 130.8666 114.5420 134.2831
plot(for0, type = "l", xlab = "year", ylab ="Forecasted Manufacturing Index")

# forecast 6 steps ahead from the raw vector (hence time index 97-102 below)
for1 <- sarima.for(IIP$Manufacturing, 6, 0, 1, 1, 2, 0, 0, 12)

round(for1$pred)
## Time Series:
## Start = 97 
## End = 102 
## Frequency = 1 
## [1] 122 128 125 126 126 125
round(for1$se, 2)
## Time Series:
## Start = 97 
## End = 102 
## Frequency = 1 
## [1] 4.24 4.38 4.51 4.63 4.75 4.87

The output prints the forecasts and their standard errors, and supplies a graphic of the forecast with ±1 and ±2 prediction-error bounds.
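
A minimal sketch rebuilding the ±2 standard-error bounds from the sarima.for output above:

upper2 <- for1$pred + 2 * for1$se
lower2 <- for1$pred - 2 * for1$se
round(cbind(forecast = for1$pred, lower2, upper2), 1)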

author: “Ramya Emandi”