The series in Loans.csv is the monthly volume of commercial bank real-estate loans, in billions of dollars, from January 1973 to October 1978, a total of 70 observations. The data are derived from reports to the Federal Reserve System from large commercial banks. For this data set, check for stationarity, apply a transformation and/or differencing if needed, then carry out model identification, model selection, parameter estimation and diagnostic checking for an ARIMA model, and forecast the next two years.
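library(magrittr)   # %>% (package list inferred from the calls used below)
library(ggplot2)    # ylab, xlab
library(forecast)   # ggtsdisplay, Arima, forecast, checkresiduals, autoplot
library(TSA)        # BoxCox.ar, eacf
library(tseries)    # adf.test
library(lmtest)     # coeftest
library(urca)       # ur.df
data <- as.ts(read.csv('Loans.csv'))
data %>% ggtsdisplay(lag.max = 40)
data %>% diff() %>% acf()
bcx <- BoxCox.ar(data)
bcx$lambda[which.max(bcx$loglike)]
## [1] -0.1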
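The Box-Cox family that `BoxCox.ar` searches over can be written out directly. The sketch below uses a hypothetical helper, `box_cox()`, and made-up positive values (neither is part of the original analysis) to show that lambda = 0 corresponds to the log and that lambda = -0.5 is a linear rescaling of the reciprocal square root.

```r
# Hypothetical helper: the standard Box-Cox transform for positive x
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(46.5, 52.1, 60.3, 75.8, 87.6)  # illustrative positive values, not the loan data

# lambda = -0.5 is a linear function of 1/sqrt(x): 2 - 2/sqrt(x)
all.equal(box_cox(x, -0.5), 2 - 2 / sqrt(x))

# lambda close to 0 approaches log(x)
max(abs(box_cox(x, 1e-6) - log(x)))
```

Since the estimate of -0.1 lies between these two cases, both are reasonable candidates to inspect visually.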
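# Compare two simple transforms that bracket the estimated lambda of -0.1
par(mfrow=c(1,2))
plot(1/(data^0.5))   # reciprocal square root (lambda = -0.5)
plot(log(data))      # log (lambda = 0)
# Continue with the reciprocal square-root transform
data_Xn <- (1/(data^0.5))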
data_Xn %>% ggtsdisplay(lag.max = 40)
data_Xn %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -1.1827, Lag order = 4, p-value = 0.9028
## alternative hypothesis: stationary
data_Xn %>% diff() %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -2.405, Lag order = 4, p-value = 0.4106
## alternative hypothesis: stationary
data_Xn %>% diff() %>% diff() %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -3.9737, Lag order = 4, p-value = 0.01628
## alternative hypothesis: stationary
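Each test above reports "Lag order = 4". By default `tseries::adf.test` sets the lag order to `trunc((length(x) - 1)^(1/3))`. The sketch below reproduces that rule with a hypothetical helper, `adf_default_lag()`, assuming the 70 observations stated in the problem and one observation lost per difference.

```r
# Default lag order used by tseries::adf.test: trunc((n - 1)^(1/3))
adf_default_lag <- function(n) trunc((n - 1)^(1/3))

n_obs  <- 70                 # assumed series length, from the problem statement
n_by_d <- n_obs - 0:2        # lengths after 0, 1 and 2 differences
lags   <- sapply(n_by_d, adf_default_lag)
names(lags) <- paste0("d=", 0:2)
lags
```

With the lag order fixed at 4, the p-values read 0.9028 (d = 0), 0.4106 (d = 1) and 0.01628 (d = 2), so the unit-root null is rejected at the 5% level only after the second difference.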
data_Xn %>% diff() %>% diff() %>% ggtsdisplay(lag.max = 40)
data_Xn %>% diff() %>% diff() %>% eacf()
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o o o o o o o o o o o o o
## 1 x o o o o o o o o o o o o o
## 2 x o o o o o o o o o o o o o
## 3 x o o o o o o o o o o o o o
## 4 o o o o o o o o o o o o o o
## 5 o o x x o o o o o o o o o o
## 6 x x o o x o o o o o o o o o
## 7 x x o o x o o o o o o o o o
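EACF Analysis:
In the EACF table, row 0 has an x at MA order 0 followed by o's from MA order 1 onward, which points to an MA(1) for the twice-differenced series, that is, ARIMA(0,2,1). ARIMA(0,2,2) and ARIMA(1,2,1) are also fitted as neighbouring candidates, and the three models are compared on AIC and coefficient significance.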
(fit1 <- arima(data_Xn, order = c(0,2,1)))
##
## Call:
## arima(x = data_Xn, order = c(0, 2, 1))
##
## Coefficients:
## ma1
## -0.3931
## s.e. 0.1104
##
## sigma^2 estimated as 8.189e-08: log likelihood = 451.5, aic = -901
(fit2 <- arima(data_Xn, order = c(0,2,2)))
##
## Call:
## arima(x = data_Xn, order = c(0, 2, 2))
##
## Coefficients:
## ma1 ma2
## -0.4241 0.1049
## s.e. 0.1234 0.1508
##
## sigma^2 estimated as 8.131e-08: log likelihood = 451.73, aic = -899.46
(fit3 <- arima(data_Xn, order = c(1,2,1)))
##
## Call:
## arima(x = data_Xn, order = c(1, 2, 1))
##
## Coefficients:
## ar1 ma1
## -0.1701 -0.2476
## s.e. 0.2781 0.2700
##
## sigma^2 estimated as 8.149e-08: log likelihood = 451.66, aic = -899.31
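The three AIC values can be cross-checked from the printed log likelihoods. The sketch below is plain arithmetic on numbers copied from the outputs above. The printed values appear consistent with -2 * loglik + 2 * (number of coefficients). The more common definition also counts the innovation variance as a parameter, which would shift every AIC by the same amount and leave the ranking unchanged.

```r
# Values copied from the three arima() outputs above
loglik   <- c(fit1 = 451.5,  fit2 = 451.73,  fit3 = 451.66)
n_coef   <- c(fit1 = 1,      fit2 = 2,       fit3 = 2)
reported <- c(fit1 = -901,   fit2 = -899.46, fit3 = -899.31)

aic_check <- -2 * loglik + 2 * n_coef
round(aic_check - reported, 2)     # differences are rounding error only

names(which.min(reported))         # model with the lowest AIC
```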
coeftest(fit2)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.42408 0.12341 -3.4364 0.0005895 ***
## ma2 0.10494 0.15082 0.6958 0.4865634
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(fit3)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.17015 0.27807 -0.6119 0.5406
## ma1 -0.24755 0.27001 -0.9168 0.3592
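The z values and p-values reported by `coeftest` follow directly from the estimates and standard errors. The sketch below recomputes them for `fit2` from the printed numbers, assuming the usual Wald test against a standard normal.

```r
# Estimates and standard errors copied from the coeftest(fit2) output above
est <- c(ma1 = -0.42408, ma2 = 0.10494)
se  <- c(ma1 =  0.12341, ma2 = 0.15082)

z <- est / se                 # Wald z statistics
p <- 2 * pnorm(-abs(z))       # two-sided p-values under a standard normal
round(cbind(z = z, p = p), 4)
```

Only `ma1` is significant at the 5% level. The same calculation on `fit3` gives p-values above 0.3 for both coefficients, as its printed output shows.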
checkresiduals(fit1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 7.5676, df = 9, p-value = 0.5782
##
## Model df: 1. Total lags used: 10
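The Ljung-Box part of this check can be reproduced with base R. The sketch below is a minimal illustration on a simulated series (a stand-in, not the loan data) using `stats::arima` and `Box.test`. The `fitdf = 1` argument removes one degree of freedom for the estimated MA coefficient, which is how 10 lags become df = 9.

```r
set.seed(1)
# Simulated stand-in for the transformed series (not the loan data)
y <- arima.sim(model = list(order = c(0, 2, 1), ma = -0.4), n = 70)

fit <- arima(y, order = c(0, 2, 1))   # stats::arima from base R

# Ljung-Box test on the residuals: 10 lags, minus 1 estimated coefficient
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb$parameter   # degrees of freedom: 10 - 1 = 9
lb$p.value     # a large value gives no evidence of leftover autocorrelation
```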
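ARIMA(0,2,1) is chosen for forecasting: it has the lowest AIC (-901), the additional ma2 and ar1 terms in the larger models are not significant, and the Ljung-Box test (p-value = 0.5782) shows no evidence of residual autocorrelation.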
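Written out with the estimates from the `fit1` output, and assuming R's sign convention for the MA term (a plus sign in front of the coefficient), the fitted model for the transformed series is:

```latex
% Y_t is the transformed series, X_t the original loan volume, B the backshift operator
Y_t = X_t^{-1/2}, \qquad
(1 - B)^2 \, Y_t = e_t + \hat{\theta}_1 e_{t-1}, \qquad
\hat{\theta}_1 = -0.3931 \ (\text{s.e. } 0.1104), \qquad
\hat{\sigma}^2 = 8.189 \times 10^{-8}

% Equivalent recursive form
Y_t = 2\,Y_{t-1} - Y_{t-2} + e_t - 0.3931\, e_{t-1}
```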
data_Xn %>%
Arima(order=c(0,2,1)) %>%
forecast(h = 24) %>%
autoplot() +
ylab("Monthly volume of commercial bank real-estate loans") + xlab("Year")
The plot above shows the forecast of the transformed series (the reciprocal square root of the monthly volume of commercial bank real-estate loans) for November 1978 to October 1980.
data %>%
Arima(order=c(0,2,1), lambda = -0.1) %>%
forecast(h = 24) %>%
autoplot() +
ylab("Monthly volume of commercial bank real-estate loans") + xlab("Year")
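The plot above shows the forecast on the original scale of the monthly volume of commercial bank real-estate loans for November 1978 to October 1980, obtained by refitting ARIMA(0,2,1) with a Box-Cox parameter of lambda = -0.1. Note that lambda = -0.1 is the BoxCox.ar estimate, whereas the candidate models above were compared on the reciprocal square-root scale (lambda = -0.5). The prediction intervals widen quickly with the horizon because the model is differenced twice (d = 2).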
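Forecasts made on the reciprocal square-root scale can also be back-transformed by hand. The sketch below is an illustration on a synthetic series (not the loan data) with base R's `stats::arima` and `predict`. The 1.96 multiplier assumes approximately normal forecast errors. Because y = x^(-1/2) is a decreasing function, the interval limits swap when they are inverted.

```r
set.seed(42)
# Synthetic, steadily growing positive series (a stand-in, not the loan data)
x <- ts(cumsum(c(46, rnorm(69, mean = 0.6, sd = 0.2))),
        start = c(1973, 1), frequency = 12)

y   <- 1 / sqrt(x)                       # transformed scale used for modelling
fit <- arima(y, order = c(0, 2, 1))      # stats::arima
pr  <- predict(fit, n.ahead = 24)

# Approximate 95% limits on the transformed scale
y_lo <- pr$pred - 1.96 * pr$se
y_hi <- pr$pred + 1.96 * pr$se

# Invert y = x^(-1/2)  =>  x = y^(-2); the transform is decreasing,
# so the upper limit on y becomes the lower limit on x
x_point <- pr$pred^(-2)
x_lower <- y_hi^(-2)
x_upper <- y_lo^(-2)

head(cbind(x_lower, x_point, x_upper), 3)
```

Passing `lambda` to `Arima()`, as in the code above, lets the forecast package handle the back-transformation internally.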
The data set Disposable_Income.csv contains (read across) the quarterly disposable income in Japan during the period of 1961 through 1987. Fit an appropriate (seasonal) ARIMA model to the disposable income series and forecast the disposable income in 1988.
inc <- as.ts(read.csv('Disposable_Income.csv'))
inc %>% ggtsdisplay(lag.max = 40)
inc %>% diff() %>% acf()
The variance is stabilized with a log transform before the seasonal difference is taken, because differencing first can produce negative values, for which the log is undefined.
inc %>% diff(lag = 4) %>% ggtsdisplay(lag.max = 40)
inc %>% log() %>% ggtsdisplay(lag.max = 40)
inc %>% log() %>% diff(lag = 4) %>% ggtsdisplay(lag.max = 40)
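The order of the two operations matters. The sketch below uses a small made-up quarterly series (not the income data) to show that taking the log after a seasonal difference breaks as soon as one year-over-year change is negative, while taking the log first always works for a positive series.

```r
# Made-up quarterly values with one year-over-year decline (not the income data)
x <- ts(c(10, 14, 9, 16, 12, 13, 11, 20, 15, 21, 14, 25),
        start = c(1961, 1), frequency = 4)

d_first <- diff(x, lag = 4)               # seasonal difference on the raw scale
any(d_first <= 0)                         # TRUE: 13 - 14 = -1

bad <- suppressWarnings(log(d_first))     # NaN wherever the difference is negative
sum(is.nan(bad))

good <- diff(log(x), lag = 4)             # log first, then seasonal difference
all(is.finite(good))
```

The log-then-difference series can also be read as an approximate year-over-year growth rate.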
However, the PACF plot suggests an AR(3)-type structure, so the ur.df test, with an explicitly chosen lag order, is used instead of adf.test to check stationarity.
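# Lag-order rule of thumb; this uses the loans series `data` rather than `inc`, but floor() gives 4 for either length
(lagdf = (nrow(data)-1)^(1/3))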
## [1] 4.081655
inc %>% log() %>% diff(lag = 4) %>%
ur.df(type ="none", lags = floor(lagdf)) %>% summary()
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.117279 -0.010617 0.001973 0.013726 0.067756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.05398 0.04298 -1.256 0.2123
## z.diff.lag1 -0.46735 0.10415 -4.487 2.06e-05 ***
## z.diff.lag2 -0.28731 0.11468 -2.505 0.0140 *
## z.diff.lag3 0.07082 0.11315 0.626 0.5329
## z.diff.lag4 -0.18987 0.09983 -1.902 0.0603 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02766 on 93 degrees of freedom
## Multiple R-squared: 0.356, Adjusted R-squared: 0.3214
## F-statistic: 10.28 on 5 and 93 DF, p-value: 7.22e-08
##
##
## Value of test-statistic is: -1.256
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
Here, the test statistic, -1.256, is greater than the 5% critical value of -1.95. The null hypothesis of a unit root therefore cannot be rejected, and the seasonally differenced series is treated as non-stationary.
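The decision rule for this left-tailed test can be spelled out with the numbers from the `ur.df` summary. The sketch below copies the test statistic and the tau1 critical values from the output above.

```r
# Values copied from the ur.df summary above
tau_stat <- -1.256
crit <- c("1pct" = -2.58, "5pct" = -1.95, "10pct" = -1.62)

# Reject the unit-root null only when the statistic is below the critical value
reject_unit_root <- tau_stat < crit
reject_unit_root
```

The null is not rejected at any of the three levels, which motivates the additional regular difference applied next.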
inc %>% log() %>% diff(lag = 4) %>% diff() %>% ggtsdisplay(lag.max = 40)
| fit | Models | AIC values |
|---|---|---|
| 1 | SARIMA(7,1,5)(0,1,0)[4] | -439.82 |
| 2 | SARIMA(6,1,5)(0,1,0)[4] | -442.53 |
| 3 | SARIMA(8,1,5)(0,1,0)[4] | -443.27 |
(fit_inc <- Arima(inc, order=c(8,1,5), seasonal=c(0,1,0),lambda=0 , include.constant = FALSE))
## Series: inc
## ARIMA(8,1,5)
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## 0.7636 -0.1935 0.5094 0.6696 -0.7659 0.1910 -0.5026 0.3231
## s.e. 0.1598 0.1695 0.2255 0.2123 0.1627 0.1735 0.2287 0.2129
## ma1 ma2 ma3 ma4 ma5
## -1.5023 0.7487 -0.5622 0.1170 0.2772
## s.e. 0.1748 0.2929 0.3452 0.4501 0.2354
##
## sigma^2 estimated as 0.0006411: log likelihood=235.64
## AIC=-443.27 AICc=-438.66 BIC=-405.99
checkresiduals(fit_inc)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(8,1,5)
## Q* = 5.8696, df = 3, p-value = 0.1181
##
## Model df: 13. Total lags used: 16
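The degrees of freedom and p-values of the two Ljung-Box tests in this document can be recomputed from the printed statistics. The sketch below assumes the usual chi-square reference distribution with df equal to the number of lags minus the number of estimated coefficients.

```r
# Numbers copied from the two checkresiduals() outputs in this document
q_stat <- c(loans = 7.5676, income = 5.8696)
lags   <- c(loans = 10,     income = 16)
n_coef <- c(loans = 1,      income = 13)

df <- lags - n_coef                                # 9 and 3
p  <- pchisq(q_stat, df = df, lower.tail = FALSE)  # upper-tail chi-square probability
round(rbind(df = df, p = p), 4)
```

With 13 estimated coefficients, only 3 degrees of freedom remain for the income model.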
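SARIMA(8,1,5)(0,1,0)[4] has the lowest AIC of the three candidates (-443.27), and its Ljung-Box p-value of 0.1181 gives no evidence of residual autocorrelation at the 5% level, so it is chosen for forecasting.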
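The seasonal period of an ARIMA fit comes from the frequency of the time series unless it is given explicitly. The printed model label above reads "ARIMA(8,1,5)" with no "[4]" suffix, which may indicate that `inc` was stored with frequency 1, so `frequency(inc)` is worth checking. The sketch below shows the effect with base R's `stats::arima` on a synthetic quarterly series (not the income data) and a deliberately small model.

```r
set.seed(7)
# Synthetic quarterly series: growth, a seasonal pattern and noise (not the income data)
n      <- 108
trend  <- cumsum(rnorm(n, mean = 0.02, sd = 0.01))
season <- rep(c(-0.10, 0.00, -0.05, 0.15), length.out = n)
vals   <- exp(3 + trend + season + rnorm(n, sd = 0.01))

x_q <- ts(vals, start = c(1961, 1), frequency = 4)   # declared as quarterly
x_1 <- ts(vals)                                      # frequency defaults to 1

spec  <- list(order = c(0, 1, 0))                    # seasonal part, period left unset
fit_q <- arima(log(x_q), order = c(0, 1, 1), seasonal = spec)
fit_1 <- arima(log(x_1), order = c(0, 1, 1), seasonal = spec)

# Element 5 of $arma is the seasonal period actually used
c(quarterly = fit_q$arma[5], unit = fit_1$arma[5])
```

With frequency 1 the "seasonal" difference is taken at lag 1, so the model behaves like one with two regular differences. Declaring the series with `frequency = 4`, or passing `period = 4` in the seasonal list, avoids this.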
fit_inc %>% forecast(h=4) %>% autoplot() +
ylab("Quarterly disposable income in Japan 1961-1988") + xlab("Year")
coeftest(fit_inc)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.76357 0.15984 4.7772 1.778e-06 ***
## ar2 -0.19351 0.16951 -1.1416 0.253604
## ar3 0.50936 0.22549 2.2589 0.023891 *
## ar4 0.66956 0.21233 3.1534 0.001614 **
## ar5 -0.76594 0.16268 -4.7084 2.497e-06 ***
## ar6 0.19096 0.17352 1.1005 0.271094
## ar7 -0.50264 0.22868 -2.1980 0.027950 *
## ar8 0.32311 0.21286 1.5179 0.129030
## ma1 -1.50235 0.17480 -8.5947 < 2.2e-16 ***
## ma2 0.74875 0.29291 2.5562 0.010582 *
## ma3 -0.56221 0.34524 -1.6284 0.103431
## ma4 0.11702 0.45012 0.2600 0.794887
## ma5 0.27719 0.23540 1.1775 0.238979
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the coefficients of the model,