Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values of the same series.
In this project, we are given a time series of the monthly volume of commercial bank real-estate loans, in billions of dollars, from January 1973 to October 1978. It has a total of 70 observations.
The data is as follows:
46.5 47.0 47.5 48.3 49.1 50.1 51.1 52.0 53.2 53.9 54.5 55.2 55.6 55.7
56.1 56.8 57.5 58.3 58.9 59.4 59.8 60.0 60.0 60.3 60.1 59.7 59.5 59.4
59.3 59.2 59.1 59.0 59.3 59.5 59.5 59.5 59.7 59.7 60.5 60.7 61.3 61.4
61.8 62.4 62.4 62.9 63.2 63.4 63.9 64.5 65.0 65.4 66.3 67.7 69.0 70.0
71.4 72.5 73.4 74.6 75.2 75.9 76.8 77.9 79.2 80.5 82.6 84.4 85.9 87.6
We will perform the following steps in order to analyze the data and make predictions:
First, we will load the required packages:
library(tseries)
library(forecast)
Once the packages are loaded, we will import the data into R as a time series.
bank_case <- as.ts(scan("bank_case.txt"))
bank_case
## Time Series:
## Start = 1
## End = 70
## Frequency = 1
## [1] 46.5 47.0 47.5 48.3 49.1 50.1 51.1 52.0 53.2 53.9 54.5 55.2 55.6 55.7
## [15] 56.1 56.8 57.5 58.3 58.9 59.4 59.8 60.0 60.0 60.3 60.1 59.7 59.5 59.4
## [29] 59.3 59.2 59.1 59.0 59.3 59.5 59.5 59.5 59.7 59.7 60.5 60.7 61.3 61.4
## [43] 61.8 62.4 62.4 62.9 63.2 63.4 63.9 64.5 65.0 65.4 66.3 67.7 69.0 70.0
## [57] 71.4 72.5 73.4 74.6 75.2 75.9 76.8 77.9 79.2 80.5 82.6 84.4 85.9 87.6
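Because the observations are monthly, the series could also carry calendar information. The following is a minimal sketch using base R's `ts()` with `start` and `frequency`; the names `loans_head` and `loans_monthly` are hypothetical, and only the first 14 values are reproduced for brevity. All outputs in this report use the plain index 1 to 70 instead.

```r
# First 14 observations from the data listing above (Jan 1973 to Feb 1974);
# the full series would be passed in the same way.
loans_head <- c(46.5, 47.0, 47.5, 48.3, 49.1, 50.1, 51.1,
                52.0, 53.2, 53.9, 54.5, 55.2, 55.6, 55.7)

# Hypothetical object: a monthly series starting in January 1973.
loans_monthly <- ts(loans_head, start = c(1973, 1), frequency = 12)

frequency(loans_monthly)  # observations per year
start(loans_monthly)      # year and month of the first observation
end(loans_monthly)        # year and month of the last observation
```

With a frequency of 12, plots are labeled by year, but defaults that depend on the frequency (such as the number of lags shown) may differ from the output shown in this report.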
Now that we have imported the data as a time series, we plot the series together with its ACF and PACF. This gives a first impression of the behavior of the series, in particular whether it shows a trend and whether it appears stationary:
par(mfrow=c(1,3))
plot(bank_case,main="Time Series")
acf(bank_case, main="ACF of real-estate loans data",lag.max=10,ylim = c(-1,1))
pacf(bank_case,main="PACF of real-estate loans data",lag.max=10,ylim=c(-1,1))
It can be clearly seen from the plot that the time series is not stationary and requires differencing in order to become stationary. The ACF also decays gradually, which is a further hint that the series is not stationary.
Though it is clearly seen in the plots that the time series is not stationary, we will still perform the augmented Dickey-Fuller (ADF) test, at a significance level of 0.05, to check stationarity formally.
adf.test(bank_case)
##
## Augmented Dickey-Fuller Test
##
## data: bank_case
## Dickey-Fuller = -0.25591, Lag order = 4, p-value = 0.99
## alternative hypothesis: stationary
We can see in the output that the p-value is 0.99, which is greater than 0.05. Thus, at a significance level of 0.05, we fail to reject the null hypothesis of a unit root and conclude that the time series is non-stationary.
We will take the first-order difference and check the stationarity once again:
bank_case_d1 <- diff(bank_case)
adf.test(bank_case_d1)
##
## Augmented Dickey-Fuller Test
##
## data: bank_case_d1
## Dickey-Fuller = -1.7615, Lag order = 4, p-value = 0.6721
## alternative hypothesis: stationary
Again, we can see from the output that the p-value is 0.6721, which is still greater than 0.05. Thus, at a significance level of 0.05, we again fail to reject the null hypothesis and conclude that the first-differenced series is still non-stationary.
We will now take a second-order difference and test the stationarity.
bank_case_d2 <- diff(bank_case_d1)
adf.test(bank_case_d2)
##
## Augmented Dickey-Fuller Test
##
## data: bank_case_d2
## Dickey-Fuller = -4.0774, Lag order = 4, p-value = 0.01157
## alternative hypothesis: stationary
After the second-order difference, we can see from the output that the p-value is 0.01157, which is less than 0.05. Hence, at a significance level of 0.05, we reject the null hypothesis and conclude that the time series is stationary after second-order differencing.
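As a small illustration of what second-order differencing does, the sketch below applies it to the first six observations in three equivalent ways: nested `diff()` calls, `diff()` with `differences = 2`, and the explicit formula z[t] - 2 z[t-1] + z[t-2]. The variable names are hypothetical and chosen only for this example.

```r
# First six observations of the loan series.
x <- c(46.5, 47.0, 47.5, 48.3, 49.1, 50.1)
n <- length(x)

# Differencing twice, as done in the text.
d2_nested <- diff(diff(x))

# The same operation in a single call.
d2_direct <- diff(x, differences = 2)

# Written out: (1 - B)^2 z[t] = z[t] - 2 * z[t-1] + z[t-2].
d2_manual <- x[3:n] - 2 * x[2:(n - 1)] + x[1:(n - 2)]

# Each difference shortens the series by one observation.
length(d2_direct)
```

Applied to the full series, this leaves 68 of the 70 observations.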
Below is the plot of the second-order differenced (SOD) time series, along with its ACF and PACF:
par(mfrow=c(1,3))
plot(bank_case_d2,main="SOD Time Series")
acf(bank_case_d2, main="ACF of SOD of Time Series ",lag.max=10,ylim = c(-1,1))
pacf(bank_case_d2,main="PACF of SOD of Time Series",lag.max=10,ylim=c(-1,1))
Now that we have made the time series stationary, we will fit a model to the original time series and specify d = 2. The above ACF and PACF suggest that an MA(1) model for the differenced series is a good candidate, which corresponds to ARIMA(0,2,1) for the original series.
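For reference, the theoretical pattern that points to MA(1) is an ACF that cuts off after lag 1 and a PACF that decays gradually. A minimal sketch with `ARMAacf()` from the stats package shows this pattern; the coefficient -0.4 is purely illustrative (an assumption, not an estimate from the data).

```r
# Illustrative MA(1) coefficient in R's sign convention (assumed value).
theta_illustrative <- -0.4

# Theoretical ACF for lags 0 to 5: nonzero at lag 1, zero afterwards.
rho <- ARMAacf(ma = theta_illustrative, lag.max = 5)
rho

# Closed form for the lag-1 autocorrelation of an MA(1) process.
rho1_closed_form <- theta_illustrative / (1 + theta_illustrative^2)

# Theoretical PACF for lags 1 to 5: shrinks in magnitude, no sharp cutoff.
phi_kk <- ARMAacf(ma = theta_illustrative, lag.max = 5, pacf = TRUE)
phi_kk
```

A sample ACF with one significant spike at lag 1 and a slowly shrinking sample PACF is therefore consistent with MA(1).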
bank_fit <- arima(x = bank_case, order = c(0, 2, 1))
bank_fit
##
## Call:
## arima(x = bank_case, order = c(0, 2, 1))
##
## Coefficients:
## ma1
## -0.3722
## s.e. 0.1070
##
## sigma^2 estimated as 0.08094: log likelihood = -11.09, aic = 26.17
From the above output, the ratio of the estimate to its standard error is |-0.3722/0.1070| ≈ 3.48, which is greater than 2. Hence, we reject the null hypothesis that the MA(1) coefficient is zero and conclude that the coefficient is significant.
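The same check can be sketched in a few lines of base R, using the estimate and standard error printed above. The approximate normal p-value is an addition for illustration; the variable names are hypothetical.

```r
# Estimate and standard error copied from the arima() output above.
ma1_hat <- -0.3722
ma1_se  <- 0.1070

# Ratio of the estimate to its standard error.
z_stat <- ma1_hat / ma1_se

# Two-sided p-value under an approximate normal reference distribution.
p_value <- 2 * pnorm(-abs(z_stat))

c(z = z_stat, p = p_value)
```

With the fitted object available, the same quantities can be obtained from `coef(bank_fit)` and `sqrt(diag(vcov(bank_fit)))` rather than typed by hand.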
Once we have estimated the parameters, we will find the fitted values:
fitted(bank_fit)
## Time Series:
## Start = 1
## End = 70
## Frequency = 1
## [1] 46.47920 47.06127 47.50005 48.00251 48.99033 49.85927 51.01043
## [8] 52.06666 52.92481 54.29758 54.74797 55.19228 55.89713 56.11058
## [15] 55.95280 56.44522 57.36796 58.15086 59.04450 59.55378 59.95723
## [22] 60.25852 60.29621 60.11024 60.52938 60.05980 59.43390 59.27540
## [29] 59.25363 59.18274 59.09358 58.99761 58.89911 59.45080 59.68169
## [36] 59.56762 59.52517 59.83493 59.75022 61.02096 61.01945 61.79559
## [43] 61.64722 62.14314 62.90441 62.58772 63.28378 63.53118 63.64882
## [50] 64.30652 65.02799 65.51042 65.84109 67.02921 68.85036 70.24431
## [57] 71.09092 72.68497 73.66884 74.40005 75.72559 75.99560 76.63558
## [64] 77.63881 78.90279 80.38939 81.75884 84.38695 86.19514 87.50984
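To see where a fitted value comes from, the sketch below reproduces the last one by hand. For ARIMA(0,2,1), the one-step prediction is 2 z[t-1] - z[t-2] + ma1 * e[t-1], where e[t-1] is the previous residual (observed minus fitted) and ma1 is the coefficient as reported by R. All inputs are copied from the outputs above; the variable names are hypothetical.

```r
# MA(1) estimate as reported by arima() (R's sign convention).
ma1_hat <- -0.3722

# Observations 68 and 69, and the fitted value at time 69.
z_68 <- 84.4
z_69 <- 85.9
fitted_69 <- 86.19514

# Residual at time 69: observed minus fitted.
resid_69 <- z_69 - fitted_69

# One-step prediction for time 70.
fitted_70 <- 2 * z_69 - z_68 + ma1_hat * resid_69
fitted_70
```

The result agrees with the last fitted value in the output (87.50984) up to the rounding of the inputs.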
While modeling the time series data, we made certain assumptions about the nature of the error term.
We need to plot the residuals, together with their ACF and PACF, in order to check whether these assumptions hold. If the model fits the data well, the residuals will behave like white noise.
#Residual
par(mfrow=c(1,3))
plot(bank_fit$residuals, ylab='Residuals')
acf(bank_fit$residuals,ylim=c(-1,1))
pacf(bank_fit$residuals,ylim=c(-1,1))
From the residual plot, we can confirm that the residuals have a mean of approximately 0 and a roughly constant variance. The ACF is close to 0 for lag > 0, and so is the PACF.
So, we can say that the residuals behave like white noise and conclude that the ARIMA(0,2,1) model fits the data well. Alternatively, we can test at a significance level of 0.05 whether the residuals follow white noise.
checkresiduals(bank_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 9.141, df = 9, p-value = 0.4244
##
## Model df: 1. Total lags used: 10
Here, the p-value is 0.4244, which is greater than 0.05. Hence, at a significance level of 0.05, we fail to reject the null hypothesis that the residuals are uncorrelated, so the residuals are consistent with white noise. This means that the model fits the data well.
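The Ljung-Box test can also be run directly with `Box.test()` from the stats package. The sketch below refits the model on the data listed at the top and uses 10 lags with `fitdf = 1` for the single estimated MA coefficient, which gives the 9 degrees of freedom seen in the output above. The object names `loans`, `fit`, and `lb` are hypothetical.

```r
# The 70 monthly observations, typed in from the data listing.
loans <- c(
  46.5, 47.0, 47.5, 48.3, 49.1, 50.1, 51.1, 52.0, 53.2, 53.9, 54.5, 55.2, 55.6, 55.7,
  56.1, 56.8, 57.5, 58.3, 58.9, 59.4, 59.8, 60.0, 60.0, 60.3, 60.1, 59.7, 59.5, 59.4,
  59.3, 59.2, 59.1, 59.0, 59.3, 59.5, 59.5, 59.5, 59.7, 59.7, 60.5, 60.7, 61.3, 61.4,
  61.8, 62.4, 62.4, 62.9, 63.2, 63.4, 63.9, 64.5, 65.0, 65.4, 66.3, 67.7, 69.0, 70.0,
  71.4, 72.5, 73.4, 74.6, 75.2, 75.9, 76.8, 77.9, 79.2, 80.5, 82.6, 84.4, 85.9, 87.6
)

# Same model as in the text: ARIMA(0,2,1).
fit <- arima(loans, order = c(0, 2, 1))

# Ljung-Box test on the residuals; fitdf subtracts the one estimated
# coefficient from the degrees of freedom (10 - 1 = 9).
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb
```

A p-value above 0.05 leads to the same conclusion as the `checkresiduals()` output.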
Now that we know the model is a good fit, we forecast the next two years (24 monthly observations). This can be done as follows:
bank_predict <- forecast(bank_fit,h=24)
bank_predict
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 71 89.26645 88.90184 89.63105 88.70883 89.82406
## 72 90.93289 90.23633 91.62946 89.86759 91.99820
## 73 92.59934 91.52157 93.67711 90.95103 94.24765
## 74 94.26579 92.76016 95.77142 91.96312 96.56845
## 75 95.93223 93.95588 97.90859 92.90966 98.95481
## 76 97.59868 95.11200 100.08536 93.79563 101.40173
## 77 99.26512 96.23123 102.29902 94.62518 103.90506
## 78 100.93157 97.31583 104.54731 95.40177 106.46137
## 79 102.59802 98.36770 106.82834 96.12830 109.06773
## 80 104.26446 99.38846 109.14046 96.80727 111.72166
## 81 105.93091 100.37954 111.48228 97.44083 114.42099
## 82 107.59736 101.34217 113.85254 98.03088 117.16384
## 83 109.26380 102.27745 116.25016 98.57909 119.94851
## 84 110.93025 103.18635 118.67414 99.08698 122.77351
## 85 112.59669 104.06977 121.12362 99.55589 125.63750
## 86 114.26314 104.92850 123.59778 99.98704 128.53924
## 87 115.92959 105.76327 126.09590 100.38155 131.47762
## 88 117.59603 106.57475 128.61731 100.74044 134.45162
## 89 119.26248 107.36357 131.16139 101.06466 137.46030
## 90 120.92893 108.13027 133.72758 101.35508 140.50277
## 91 122.59537 108.87541 136.31534 101.61250 143.57825
## 92 124.26182 109.59946 138.92417 101.83768 146.68596
## 93 125.92826 110.30290 141.55363 102.03132 149.82520
## 94 127.59471 110.98614 144.20328 102.19410 152.99533
plot(forecast(bank_fit,h=24))
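The structure of the forecast table can be checked by hand. For an ARIMA(0,2,1) model without a constant, the point forecasts beyond the first step lie on a straight line through the last observation and the one-step forecast, and the one-step 95% interval is the point forecast plus or minus about 1.96 standard deviations of the innovations. The sketch below uses values copied from the outputs above; the variable names are hypothetical, and the calendar labels assume that index 70 is October 1978.

```r
# Last observation, one-step forecast, and innovation variance (from the outputs).
z_70   <- 87.6
f_1    <- 89.26645
sigma2 <- 0.08094

# Slope of the forecast line: one-step forecast minus last observation.
slope <- f_1 - z_70

# Point forecasts for horizons 1 to 24.
h <- 1:24
point_fc <- f_1 + (h - 1) * slope

# 95% prediction interval for the one-step forecast.
lo95_h1 <- f_1 - qnorm(0.975) * sqrt(sigma2)
hi95_h1 <- f_1 + qnorm(0.975) * sqrt(sigma2)

# Calendar months for indices 71 to 94 (November 1978 onward).
fc_months <- seq(as.Date("1978-11-01"), by = "month", length.out = 24)

data.frame(month = format(fc_months, "%Y-%m"), forecast = round(point_fc, 3))
```

The widening intervals at longer horizons in the table reflect the accumulation of forecast error under second-order differencing; this sketch reproduces only the one-step interval.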
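Based on the steps described above, the fitted model for the given time series is ARIMA(0,2,1). R writes the MA polynomial as (1 + ma1 B), so the estimate ma1 = -0.3722 corresponds to θ1 = 0.3722 in the form below. The final equation is:
\[ (1-B)^2 Z_t = (1-\theta_1 B)\, a_t \] \[ (1-B)^2 Z_t = (1-0.3722B)\, a_t, \quad a_t \sim N(0,\, 0.08094) \] \[ Z_t = 2Z_{t-1} - Z_{t-2} + a_t - 0.3722\, a_{t-1} \]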