Problem 1

Category A: Real Gross Private Domestic Investment

The first step is to import the monthly time series for Real Gross Private Domestic Investment, FRED/GPDIC1, using Quandl packages. We will denote FRED/GPDIC1 as \(INV_{t}\).

inv <- Quandl("FRED/GPDIC1", type="ts")
str(inv)
 Time-Series [1:276] from 1947 to 2016: 223 206 200 238 263 ...
plot(inv, ylab="Real Gross Private Domestic Investment")

Judging by the movement of the data that exhibit an exponential trend and unit used in the dataset (billions of chained 2005 dollars seasonally adjusted), the proper transformation for this dataset is a log transformation. We define log changes of quarterly Real Gross Private Domestic Investment as \(LINV_{t}= \Delta \log{INV_{t}} = \log{INV_{t}} - \log{INV_{t-1}}\), where \(INV_{t}\) is the original quarterly Real Gross Private Domestic Investment, we can implement the calculation in R using the following code:

linv <- diff(log(inv), differences=1)

The plot of the new transformed \(LINV_{t}\) can be seen as follow:

plot(linv, ylab="Log of Real Gross Private Domestic Investment")

To check for the unit-root, we will run ADF test and KPSS test for \(LINV_{t}\) as follow:

linv.urdf <- ur.df(linv, type = "trend", selectlags = "BIC")
summary(linv.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.180696 -0.021800  0.000491  0.023798  0.206680 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.044e-02  5.878e-03   1.777   0.0767 .  
## z.lag.1     -7.283e-01  7.575e-02  -9.614   <2e-16 ***
## tt          -2.402e-05  3.654e-05  -0.657   0.5116    
## z.diff.lag  -7.447e-02  6.038e-02  -1.233   0.2186    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04753 on 269 degrees of freedom
## Multiple R-squared:  0.3973, Adjusted R-squared:  0.3906 
## F-statistic: 59.11 on 3 and 269 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -9.6143 30.8294 46.2414 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.98 -3.42 -3.13
## phi2  6.15  4.71  4.05
## phi3  8.34  6.30  5.36
test1 <- kpss.test(linv, null = "Trend")
test1

    KPSS Test for Trend Stationarity

data:  linv
KPSS Trend = 0.01618, Truncation lag parameter = 3, p-value = 0.1

From ADF test we conclude that \(LINV_{t}\) is trend stationary. KPSS test confirms our finding form ADF-test, it shows a borderline rejection of \(H_{0}: LINV_{t}\), our series is Trend Stationary.

The next step is to decide whether to use zero mean or non-zero mean in our specification for \(LINV_{t}\). To make the determination, we can compare the ADF test from both specification as follow:

linv.urdf.nonzeromean <- ur.df(linv, type ="drift", selectlags = "BIC")
linv.urdf.zeromean <- ur.df(linv, selectlags = "BIC")

We then compare the \(\tau_{3}\) statistics for non-zero and zero mean and found the following:

linv.urdf.nonzeromean@teststat[1]
[1] -9.604417
linv.urdf.zeromean@teststat[1]
[1] -9.219451

We found that non-zero mean specification is more preferable compared to zero mean. Hence, we will estimate ARIMA(p,0,q)(p,0,q)[4] with non-zero mean as our preferred specification if we want to conduct forecasting.

Category B: NYSE Composite Index

The first step is to import the daily closing price of NYSE Composite Index, YAHOO/INDEX_NYA, using Quandl packages. We will denote closing price of YAHOO/INDEX_NYA as \(NYSE_{t}\).

nyse <- Quandl("YAHOO/INDEX_NYA", type="zoo")
str(nyse[,4])
'zoo' series from 1965-12-31 to 2016-02-22
  Data: num [1:12623] 529 527 528 531 532 ...
  Index:  Date[1:12623], format: "1965-12-31" "1966-01-03" "1966-01-04" "1966-01-05" ...
plot(nyse[,4], ylab="Closing Price of NYSE Composite Index")

Judging by the movement of the data that exhibit an exponential trend and unit used in the dataset (current US$), the proper transformation for this dataset is a log transformation. We define log changes of quarterly Real Gross Private Domestic Investment as \(LNYSE_{t}= \Delta \log{NYSE_{t}} = \log{NYSE_{t}} - \log{NYSE_{t-1}}\), where \(NYSE_{t}\) is the original quarterly Real Gross Private Domestic Investment, we can implement the calculation in R using the following code:

lnyse <- diff(log(nyse[,4]), differences=1)

The plot of the new transformed \(LNYSE_{t}\) can be seen as follow:

plot(lnyse, ylab="Log of Closing Price of NYSE Composite Index")

To check for the unit-root, we will run ADF test and KPSS test for \(LNYSE_{t}\) as follow:

lnyse.urdf <- ur.df(lnyse, type = "trend", selectlags = "BIC")
summary(lnyse.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.211468 -0.004575  0.000215  0.004864  0.113151 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.972e-04  1.804e-04   1.093 0.274426    
## z.lag.1     -9.867e-01  1.230e-02 -80.227  < 2e-16 ***
## tt           4.715e-09  2.475e-08   0.190 0.848928    
## z.diff.lag   3.306e-02  8.899e-03   3.716 0.000204 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01013 on 12616 degrees of freedom
## Multiple R-squared:  0.4781, Adjusted R-squared:  0.478 
## F-statistic:  3853 on 3 and 12616 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -80.2268 2145.445 3218.168 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34
test2 <- kpss.test(lnyse, null = "Trend")
test2

    KPSS Test for Trend Stationarity

data:  lnyse
KPSS Trend = 0.097619, Truncation lag parameter = 25, p-value =
0.1

From ADF test we conclude that \(LNYSE_{t}\) is trend stationary. KPSS test confirms our finding form ADF-test, it shows a borderline rejection of \(H_{0}: LNYSE_{t}\), our series is Trend Stationary.

The next step is to decide whether to use zero mean or non-zero mean in our specification for \(LNYSE_{t}\). To make the determination, we can compare the ADF test from both specification as follow:

lnyse.urdf.nonzeromean <- ur.df(lnyse, type ="drift", selectlags = "BIC")
lnyse.urdf.zeromean <- ur.df(lnyse, selectlags = "BIC")

We then compare the \(\tau_{3}\) statistics for non-zero and zero mean and found the following:

lnyse.urdf.nonzeromean@teststat[1]
[1] -80.22961
lnyse.urdf.zeromean@teststat[1]
[1] -80.17325

We found that non-zero mean specification is more preferable compared to zero mean. Hence, we will estimate ARIMA(p,0,q)(p,0,q) with non-zero mean as our preferred specification if we want to conduct forecasting.

Category C: 3-Month Treasury Bill Secondary Market Rate

The first step is to import the monthly 3-Month Treasury Bill Secondary Market Rate, FRED/TB3MS, using Quandl packages. We will denote FRED/TB3MS as \(TB_{t}\).

tb <- Quandl("FRED/TB3MS", type="ts")
str(tb)
 Time-Series [1:985] from 1934 to 2016: 0.72 0.62 0.24 0.15 0.16 0.15 0.15 0.19 0.21 0.27 ...
plot(tb, ylab="3-Month Treasury Bill: Secondary Market Rate (%)")

Since the unit used in the dataset is percentage, the proper transformation for this dataset is to leave it as is. Hence, we can move to the next stage to check for the unit-root, we will run ADF test and KPSS test for \(TB_{t}\) as follow:

tb.urdf <- ur.df(tb, type = "trend", selectlags = "BIC")
summary(tb.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8517 -0.0666 -0.0222  0.0851  2.5890 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.111e-02  2.342e-02   1.329   0.1843    
## z.lag.1     -9.025e-03  3.631e-03  -2.485   0.0131 *  
## tt           1.886e-06  4.062e-05   0.046   0.9630    
## z.diff.lag   3.406e-01  3.009e-02  11.322   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3472 on 979 degrees of freedom
## Multiple R-squared:  0.1189, Adjusted R-squared:  0.1162 
## F-statistic: 44.02 on 3 and 979 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -2.4853 2.2068 3.31 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34

From the result of ADF-test, using \(\tau_{3}\) statistics, we failed reject \(H_{0}: \gamma = 0\). This indicate that \(TB_{t}\) is not trend stationary. Furthermore, using \(\phi_{3}\) statistics, we also failed to reject \(H_{0}: \gamma = \beta = 0\).

Further testing using alternative specification of non-zero mean (drift, Model B as in the notes) and zero mean (Model A as in the notes) are conducted using the following code:

tb.urdf <- ur.df(tb, type = "drift", selectlags = "BIC")
summary(tb.urdf)

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression drift 


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8522 -0.0671 -0.0216  0.0852  2.5887 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.031875   0.016666   1.913   0.0561 .  
z.lag.1     -0.008979   0.003488  -2.574   0.0102 *  
z.diff.lag   0.340576   0.030038  11.338   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.347 on 980 degrees of freedom
Multiple R-squared:  0.1189,    Adjusted R-squared:  0.1171 
F-statistic: 66.09 on 2 and 980 DF,  p-value: < 2.2e-16


Value of test-statistic is: -2.5738 3.3125 

Critical values for test statistics: 
      1pct  5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1  6.43  4.59  3.78
tb.urdf <- ur.df(tb, selectlags = "BIC")
summary(tb.urdf)

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 


Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8914 -0.0461  0.0035  0.0967  2.5531 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
z.lag.1    -0.00399    0.00232   -1.72   0.0857 .  
z.diff.lag  0.33797    0.03005   11.25   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3474 on 981 degrees of freedom
Multiple R-squared:  0.1156,    Adjusted R-squared:  0.1138 
F-statistic: 64.09 on 2 and 981 DF,  p-value: < 2.2e-16


Value of test-statistic is: -1.7202 

Critical values for test statistics: 
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62

From both non-zero mean and zero mean model for ADF-test result, the rejection of \(H_{0}\) are borderline at best, to get more conclusive result we will run KPPS test as follow:

test3 <- kpss.test(tb, null = "Trend")
test3

    KPSS Test for Trend Stationarity

data:  tb
KPSS Trend = 2.2343, Truncation lag parameter = 7, p-value = 0.01

KPSS test confirms our finding form ADF-test, this indicate that \(TB_{t}\) is not trend stationary and following a random walk that warrant for differencing before we can use the time series for forecasting purpose.

The next step is to transform \(TB_{t}\) into first difference form and re-checking for the existence of unit root and stationary of our new dataset \(\Delta TB_{t}\) using both ADF test and KPSS test:

dtb <- diff(tb, differences=1)
dtb.urdf <- ur.df(dtb, type = "trend", selectlags = "BIC")
summary(dtb.urdf)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3199 -0.0564 -0.0001  0.0790  2.1806 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.665e-02  2.180e-02   0.764    0.445    
## z.lag.1     -8.035e-01  3.604e-02 -22.298  < 2e-16 ***
## tt          -3.369e-05  3.836e-05  -0.878    0.380    
## z.diff.lag   2.085e-01  3.126e-02   6.672 4.22e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3406 on 978 degrees of freedom
## Multiple R-squared:  0.3618, Adjusted R-squared:  0.3598 
## F-statistic: 184.8 on 3 and 978 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -22.2981 165.7374 248.6052 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34
test4 <- kpss.test(dtb, null = "Trend")
test4

    KPSS Test for Trend Stationarity

data:  dtb
KPSS Trend = 0.03477, Truncation lag parameter = 7, p-value = 0.1

From ADF test we conclude that \(\Delta TB_{t}\) is difference stationary. KPSS test confirms our finding form ADF-test, it shows a borderline rejection of \(H_{0}: \Delta TB_{t}\), our series is difference stationary.

The next step is to decide whether to use zero mean or non-zero mean in our specification for \(\Delta TB_{t}\). To make the determination, we can compare the ADF test from both specification as follow:

dtb.urdf.nonzeromean <- ur.df(dtb, type ="drift", selectlags = "BIC")
dtb.urdf.zeromean <- ur.df(dtb, selectlags = "BIC")

We then compare the \(\tau_{3}\) statistics for non-zero and zero mean and found the following:

dtb.urdf.nonzeromean@teststat[1]
[1] -22.28352
dtb.urdf.zeromean@teststat[1]
[1] -22.29494

We found that zero mean specification is more preferable compared to zero mean. Hence, we will estimate ARIMA(p,0,q)(p,0,q)[12] with zero mean as our preferred specification if we want to conduct forecasting.