I. About Dataset and possible analysis

Dataset is downloaded from Kaggel.com website. The dataset contains two columns: Quarterly time from year 1947 to 2020 and Quarterly US GDP values from 1947 to 2020.

\(\underline{\textbf{ARIMA model:}}\)

ARIMA (p,d,q): which is an ARMA(p,q) model where \(Y_t\) has been differenced \(d\) times.

AR(p): \(Y_t=c+ \phi Y_{t-1}+ \phi Y_{t-2}+\cdots + \phi Y_{t-p} +e_t\)

MA(q): \(Y_t=c+ e_t+ \theta e_{t-1}+ \theta e_{t-2}+\cdots + \theta e_{t-q}\)

\(\underline{\textbf{4 stages of analysis:}}\)

Identification: test for stationality and deciding for appropriate model (best value of \(p,d,q\))

Estimation: finding the best fit model. Get estimates of coefficient

Diagnostic: check residuals are white noise (iid) random independent errors

Forecasting: constructing forecasting confident interval

II. Descriptive summary and data clean up

library(readr)
library(itsmr)
library(tseries)
library(forecast)
library(aTSA)
setwd("~/Documents/Academics/Oakland_University/Grad_Courses/STAT/STAT _5330(TimeSeries)/FINAL_PROJECT")
gdp_us <- read_csv("GDPUS.csv")
sum(which(is.na(gdp_us$NA000334Q)))

## [1] 0

There is no missing value so we’re good. Let’s graph the histogram next to see the overall data.

hist(gdp_us$NA000334Q, col = "lightblue",
     main = "Histogram of US Quartly GDP from 1947 to 2020",
     xlab = "Year")

gdp_us_ts <- ts(gdp_us$NA000334Q);class(gdp_us_ts);plotc(gdp_us_ts)

## [1] "ts"

We see an upward trend thus I’m going to apply differencing next

III. Identification

Take the differencing to stabilizing the mean, however, we see increase in variance. I take the inverse transformation next.

Construct the time series with difference of inverse in real GDP: \(y_t\):

\[y_t=-1/ GDP_t- (-1/ GDP_{t-1})\]

The data is now significantly more stationary (the mean and variance are roughly consistent over the years).

par(mfrow=c(1,2));gdp_us_tsd <- diff(gdp_us_ts);plotc(gdp_us_tsd)
gdp_us_tsdi <- diff(-1/(gdp_us_tsd));plotc(gdp_us_tsdi)

adf.test(gdp_us_tsdi)

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag   ADF p.value
## [1,]   0 -29.6    0.01
## [2,]   1 -20.8    0.01
## [3,]   2 -16.3    0.01
## [4,]   3 -14.6    0.01
## [5,]   4 -13.2    0.01
## [6,]   5 -12.0    0.01
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -29.6    0.01
## [2,]   1 -20.8    0.01
## [3,]   2 -16.3    0.01
## [4,]   3 -14.6    0.01
## [5,]   4 -13.2    0.01
## [6,]   5 -12.0    0.01
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -29.5    0.01
## [2,]   1 -20.8    0.01
## [3,]   2 -16.3    0.01
## [4,]   3 -14.5    0.01
## [5,]   4 -13.1    0.01
## [6,]   5 -12.0    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

IV. Estimation

Next, we need to find the best ARIMA model. We use ACF and PACF to find the best values of \(p,d,q\). Using general guidline:

AR(p): ACF declines quickly to 0, and PACF has significant spikes at lag 1 to p.

MA(q): ACF has significant spikes at lag 1 to p, and PACF declines quickly to 0.

par(mfrow=c(2,2))
acf(gdp_us$NA000334Q, main="ACF for Real GDP")
pacf(gdp_us$NA000334Q, main="PACF for Real GDP")

acf(gdp_us_tsdi, main="ACF for GDP after taking inverse diffrenced")
pacf(gdp_us_tsdi, main="PACF for GDP after taking inverse diffrenced")

So the suggesting model here is ARIMA \((0,1,1)\)

gdp_fit <- arima(gdp_us_tsdi, order = c(0,0,1))
print(gdp_fit)

## 
## Call:
## arima(x = gdp_us_tsdi, order = c(0, 0, 1))
## 
## Coefficients:
##           ma1  intercept
##       -1.0000      0e+00
## s.e.   0.0112      1e-04
## 
## sigma^2 estimated as 6.977e-05:  log likelihood = 986.83,  aic = -1967.67

The equation is:

\(Y_t=e_t-1.0000(e_{t-1}-e_{t-2})=0,\;\text{where}\; e \sim \text{WN}\)

V. Diagnostic

Checking validity

checkresiduals(gdp_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 1.2503, df = 9, p-value = 0.9986
## 
## Model df: 1.   Total lags used: 10

p-value is \(0.99> 0.05\), implies the residuals are white noise.

Check the estimated model for adequacy

Plot inverted AR and MA roots to check stationarity and invertibility. If time series can be represented as a finite order moving average process, it is stationary. If time series can be represented as a finite order autoregressive process, it is invertible.

plot(gdp_fit)

From the figure above, the inverse MA roots lie inside the unit circle. Thus this time series are stationary.

VI. Forecasting

gdp_forecast<- forecast(gdp_fit, lead =5*4)

## Forecast for univariate time series: 
##     Lead  Forecast     S.E   Lower  Upper
## 295    1  1.53e-04 0.00837 -0.0162 0.0166
## 296    2 -1.75e-06 0.01181 -0.0232 0.0232
## 297    3 -1.75e-06 0.01181 -0.0232 0.0232
## 298    4 -1.75e-06 0.01181 -0.0232 0.0232
## 299    5 -1.75e-06 0.01181 -0.0232 0.0232
## 300    6 -1.75e-06 0.01181 -0.0232 0.0232
## 301    7 -1.75e-06 0.01181 -0.0232 0.0232
## 302    8 -1.75e-06 0.01181 -0.0232 0.0232
## 303    9 -1.75e-06 0.01181 -0.0232 0.0232
## 304   10 -1.75e-06 0.01181 -0.0232 0.0232
## 305   11 -1.75e-06 0.01181 -0.0232 0.0232
## 306   12 -1.75e-06 0.01181 -0.0232 0.0232
## 307   13 -1.75e-06 0.01181 -0.0232 0.0232
## 308   14 -1.75e-06 0.01181 -0.0232 0.0232
## 309   15 -1.75e-06 0.01181 -0.0232 0.0232
## 310   16 -1.75e-06 0.01181 -0.0232 0.0232
## 311   17 -1.75e-06 0.01181 -0.0232 0.0232
## 312   18 -1.75e-06 0.01181 -0.0232 0.0232
## 313   19 -1.75e-06 0.01181 -0.0232 0.0232
## 314   20 -1.75e-06 0.01181 -0.0232 0.0232
## ------ 
## Note: confidence level = 95 %

plot(gdp_forecast)

\[\textbf{THANK YOU FOR LISTENING!}\]

Final Project, Time Series I, STAT 5330, W2023

Mimi Tran

2023-04-24