Dataset is downloaded from Kaggel.com website. The dataset contains two columns: Quarterly time from year 1947 to 2020 and Quarterly US GDP values from 1947 to 2020.
\(\underline{\textbf{ARIMA model:}}\)
ARIMA (p,d,q): which is an ARMA(p,q) model where \(Y_t\) has been differenced \(d\) times.
AR(p): \(Y_t=c+ \phi Y_{t-1}+ \phi Y_{t-2}+\cdots + \phi Y_{t-p} +e_t\)
MA(q): \(Y_t=c+ e_t+ \theta e_{t-1}+ \theta e_{t-2}+\cdots + \theta e_{t-q}\)
\(\underline{\textbf{4 stages of analysis:}}\)
- Identification: test for stationality and deciding for appropriate model (best value of \(p,d,q\))
- Estimation: finding the best fit model. Get estimates of coefficient
- Diagnostic: check residuals are white noise (iid) random independent errors
- Forecasting: constructing forecasting confident interval
library(readr)
library(itsmr)
library(tseries)
library(forecast)
library(aTSA)
setwd("~/Documents/Academics/Oakland_University/Grad_Courses/STAT/STAT _5330(TimeSeries)/FINAL_PROJECT")
gdp_us <- read_csv("GDPUS.csv")
sum(which(is.na(gdp_us$NA000334Q)))
## [1] 0
There is no missing value so we’re good. Let’s graph the histogram next to see the overall data.
hist(gdp_us$NA000334Q, col = "lightblue",
main = "Histogram of US Quartly GDP from 1947 to 2020",
xlab = "Year")
gdp_us_ts <- ts(gdp_us$NA000334Q);class(gdp_us_ts);plotc(gdp_us_ts)
## [1] "ts"
We see an upward trend thus I’m going to apply differencing next
Take the differencing to stabilizing the mean, however, we see increase in variance. I take the inverse transformation next.
Construct the time series with difference of inverse in real GDP: \(y_t\):
\[y_t=-1/ GDP_t- (-1/ GDP_{t-1})\]
The data is now significantly more stationary (the mean and variance are roughly consistent over the years).
par(mfrow=c(1,2));gdp_us_tsd <- diff(gdp_us_ts);plotc(gdp_us_tsd)
gdp_us_tsdi <- diff(-1/(gdp_us_tsd));plotc(gdp_us_tsdi)
adf.test(gdp_us_tsdi)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -29.6 0.01
## [2,] 1 -20.8 0.01
## [3,] 2 -16.3 0.01
## [4,] 3 -14.6 0.01
## [5,] 4 -13.2 0.01
## [6,] 5 -12.0 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -29.6 0.01
## [2,] 1 -20.8 0.01
## [3,] 2 -16.3 0.01
## [4,] 3 -14.6 0.01
## [5,] 4 -13.2 0.01
## [6,] 5 -12.0 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -29.5 0.01
## [2,] 1 -20.8 0.01
## [3,] 2 -16.3 0.01
## [4,] 3 -14.5 0.01
## [5,] 4 -13.1 0.01
## [6,] 5 -12.0 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
Next, we need to find the best ARIMA model. We use ACF and PACF to find the best values of \(p,d,q\). Using general guidline:
- AR(p): ACF declines quickly to 0, and PACF has significant spikes at lag 1 to p.
- MA(q): ACF has significant spikes at lag 1 to p, and PACF declines quickly to 0.
par(mfrow=c(2,2))
acf(gdp_us$NA000334Q, main="ACF for Real GDP")
pacf(gdp_us$NA000334Q, main="PACF for Real GDP")
acf(gdp_us_tsdi, main="ACF for GDP after taking inverse diffrenced")
pacf(gdp_us_tsdi, main="PACF for GDP after taking inverse diffrenced")
So the suggesting model here is ARIMA \((0,1,1)\)
gdp_fit <- arima(gdp_us_tsdi, order = c(0,0,1))
print(gdp_fit)
##
## Call:
## arima(x = gdp_us_tsdi, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## -1.0000 0e+00
## s.e. 0.0112 1e-04
##
## sigma^2 estimated as 6.977e-05: log likelihood = 986.83, aic = -1967.67
The equation is:
\(Y_t=e_t-1.0000(e_{t-1}-e_{t-2})=0,\;\text{where}\; e \sim \text{WN}\)
Checking validity
checkresiduals(gdp_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 1.2503, df = 9, p-value = 0.9986
##
## Model df: 1. Total lags used: 10
p-value is \(0.99> 0.05\), implies the residuals are white noise.
Check the estimated model for adequacy
Plot inverted AR and MA roots to check stationarity and invertibility. If time series can be represented as a finite order moving average process, it is stationary. If time series can be represented as a finite order autoregressive process, it is invertible.
plot(gdp_fit)
From the figure above, the inverse MA roots lie inside the unit circle. Thus this time series are stationary.
gdp_forecast<- forecast(gdp_fit, lead =5*4)
## Forecast for univariate time series:
## Lead Forecast S.E Lower Upper
## 295 1 1.53e-04 0.00837 -0.0162 0.0166
## 296 2 -1.75e-06 0.01181 -0.0232 0.0232
## 297 3 -1.75e-06 0.01181 -0.0232 0.0232
## 298 4 -1.75e-06 0.01181 -0.0232 0.0232
## 299 5 -1.75e-06 0.01181 -0.0232 0.0232
## 300 6 -1.75e-06 0.01181 -0.0232 0.0232
## 301 7 -1.75e-06 0.01181 -0.0232 0.0232
## 302 8 -1.75e-06 0.01181 -0.0232 0.0232
## 303 9 -1.75e-06 0.01181 -0.0232 0.0232
## 304 10 -1.75e-06 0.01181 -0.0232 0.0232
## 305 11 -1.75e-06 0.01181 -0.0232 0.0232
## 306 12 -1.75e-06 0.01181 -0.0232 0.0232
## 307 13 -1.75e-06 0.01181 -0.0232 0.0232
## 308 14 -1.75e-06 0.01181 -0.0232 0.0232
## 309 15 -1.75e-06 0.01181 -0.0232 0.0232
## 310 16 -1.75e-06 0.01181 -0.0232 0.0232
## 311 17 -1.75e-06 0.01181 -0.0232 0.0232
## 312 18 -1.75e-06 0.01181 -0.0232 0.0232
## 313 19 -1.75e-06 0.01181 -0.0232 0.0232
## 314 20 -1.75e-06 0.01181 -0.0232 0.0232
## ------
## Note: confidence level = 95 %
plot(gdp_forecast)
\[\textbf{THANK YOU FOR LISTENING!}\]