1-step-ahead forecast \(R_{100}(1) = 0.2a_{100} = 0.002\)
The standard deviation of the 1-step forecast error \(e(1) = a_{101}\) is \(\sigma_{a} = 0.025\). The lag-1 autocorrelation of the return series is: \[ corr(R_{t},R_{t-1}) = cov(R_{t},R_{t-1})/(1.04\,var(a_t)) = 0.2\,var(a_t)/(1.04\,var(a_t)) = 0.2/1.04 = 0.1923 \]
2-step-ahead forecast \(R_{100}(2) = E[a_{102}+0.2a_{101}\mid F_{100}] = 0\)
The standard deviation of the 2-step forecast error \(e(2) = a_{102}+0.2a_{101}\) is \((1+0.2^2)^{0.5}\,\sigma_{a} = 0.0255\). Because an MA(1) model has a memory of only one lag, the lag-2 autocorrelation of the return series is 0.
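A quick numerical check of these results (a minimal sketch, assuming the MA(1) model \(R_t = a_t + 0.2a_{t-1}\) with \(\sigma_a = 0.025\) and \(a_{100} = 0.01\), the values implied by the computations above):
# assumed model inputs, inferred from the answers above
theta<-0.2
sigma_a<-0.025
a_100<-0.01
c(fcst_1=theta*a_100,              # 1-step-ahead forecast: 0.002
  sd_e1=sigma_a,                   # sd of the 1-step forecast error: 0.025
  fcst_2=0,                        # 2-step-ahead forecast: the unconditional mean
  sd_e2=sqrt(1+theta^2)*sigma_a,   # sd of the 2-step forecast error: 0.0255
  rho_1=theta/(1+theta^2),         # lag-1 autocorrelation: 0.1923
  rho_2=0)                         # lag-2 autocorrelation: the MA(1) memory cuts off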
The mean of the return series is \(E(r_t) = 0.01/(1-0.2) = 0.0125\).
The variance of the return series is \(Var(r_t) = Var(a_t)/(1-0.2^2) = 0.02/0.96 = 0.0208\).
Lag-1 and lag-2 autocorrelations: \[ corr(r_{t},r_{t-1}) = cov(r_{t},r_{t-1})/var(r_t) = 0, \qquad corr(r_{t},r_{t-2}) = cov(r_{t},r_{t-2})/var(r_t) = 0.2 \]
1-step-ahead forecast \[ r_{100}(1) = 0.01+0.2\,r_{99} = 0.014 \] The standard deviation of the 1-step forecast error \(e(1) = a_{101}\) equals the standard deviation of \(a_t\), i.e. 0.1414.
2-step-ahead forecast \[ r_{100}(2) = 0.01+0.2\,r_{100} = 0.008 \] Because the model involves only the lag-2 term, the 2-step forecast error is \(e(2) = a_{102}\), so its standard deviation also equals the standard deviation of \(a_t\), i.e. 0.1414.
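The same kind of check for this model (a sketch, assuming \(r_t = 0.01 + 0.2r_{t-2} + a_t\) with \(Var(a_t) = 0.02\), \(r_{99} = 0.02\) and \(r_{100} = -0.01\), as implied by the computations above):
# assumed model inputs, inferred from the answers above
phi2<-0.2
sigma2a<-0.02
r_99<-0.02
r_100<- -0.01
c(mean_r=0.01/(1-phi2),       # mean of the return series: 0.0125
  var_r=sigma2a/(1-phi2^2),   # variance of the return series: 0.0208
  rho_1=0,                    # lag-1 autocorrelation
  rho_2=phi2,                 # lag-2 autocorrelation: 0.2
  fcst_1=0.01+phi2*r_99,      # 1-step-ahead forecast: 0.014
  fcst_2=0.01+phi2*r_100,     # 2-step-ahead forecast: 0.008
  sd_e=sqrt(sigma2a))         # sd of both forecast errors: 0.1414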
First, I loaded the dataset and attached the packages needed.
#load the dataset
path = "D:\\university\\23春夏\\金融计量\\作业1"
setwd(path)
library(readxl)
## Warning: package 'readxl' was built under R version 4.2.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## Warning: package 'ggplot2' was built under R version 4.2.2
## Warning: package 'tidyr' was built under R version 4.2.2
## Warning: package 'dplyr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(astsa)
## Warning: package 'astsa' was built under R version 4.2.2
library(forecast)
## Warning: package 'forecast' was built under R version 4.2.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Attaching package: 'forecast'
##
## The following object is masked from 'package:astsa':
##
## gas
library(tseries)
## Warning: package 'tseries' was built under R version 4.2.2
library(agricolae)
## Warning: package 'agricolae' was built under R version 4.2.3
data<-read_excel("quarterly.xls")
glimpse(data)
## Rows: 212
## Columns: 19
## $ DATE <chr> "1960Q1", "1960Q2", "1960Q3", "1960Q4", "1961Q1", "1961Q2", "…
## $ FFR <dbl> 3.93, 3.70, 2.94, 2.30, 2.00, 1.73, 1.68, 2.40, 2.46, 2.61, 2…
## $ Tbill <dbl> 3.87, 2.99, 2.36, 2.31, 2.35, 2.30, 2.30, 2.46, 2.72, 2.72, 2…
## $ Tb1yr <dbl> 4.57, 3.87, 3.07, 2.99, 2.87, 2.94, 3.01, 3.10, 3.21, 3.02, 3…
## $ r5 <dbl> 4.64, 4.30, 3.67, 3.75, 3.64, 3.62, 3.90, 3.84, 3.84, 3.63, 3…
## $ r10 <dbl> 4.49, 4.26, 3.83, 3.89, 3.79, 3.79, 3.98, 3.97, 4.02, 3.87, 3…
## $ PPINSA <dbl> 31.67, 31.73, 31.63, 31.70, 31.80, 31.47, 31.50, 31.53, 31.70…
## $ Finished <dbl> 33.20, 33.40, 33.43, 33.67, 33.63, 33.33, 33.33, 33.37, 33.53…
## $ CPI <dbl> 29.40, 29.57, 29.59, 29.78, 29.84, 29.83, 29.95, 29.99, 30.11…
## $ CPICORE <dbl> 18.92, 19.00, 19.07, 19.14, 19.17, 19.23, 19.32, 19.37, 19.44…
## $ M1NSA <dbl> 140.53, 138.40, 139.60, 142.67, 142.23, 141.40, 142.00, 146.6…
## $ M2SA <dbl> 896.1, 903.3, 919.4, 932.8, 948.9, 966.4, 982.7, 1000.0, 1020…
## $ M2NSA <dbl> 299.40, 300.03, 305.50, 312.30, 317.10, 320.97, 326.50, 334.7…
## $ Unemp <dbl> 5.13, 5.23, 5.53, 6.27, 6.80, 7.00, 6.77, 6.20, 5.63, 5.53, 5…
## $ IndProd <dbl> 23.93, 23.41, 23.02, 22.47, 22.13, 23.00, 23.74, 24.57, 24.94…
## $ RGDP <dbl> 2845.3, 2832.0, 2836.6, 2800.2, 2816.9, 2869.6, 2915.9, 2975.…
## $ Potent <dbl> 2824.2, 2851.2, 2878.7, 2906.7, 2934.8, 2962.9, 2991.3, 3019.…
## $ Deflator <dbl> 18.521, 18.579, 18.648, 18.700, 18.743, 18.785, 18.843, 18.90…
## $ Curr <dbl> 31.830, 31.862, 32.217, 32.624, 32.073, 32.131, 32.699, 33.42…
y<-diff(log(data$IndProd),lag=1)
mdl_a1_lag1<-lm(y~lag(y,1))
summary(mdl_a1_lag1)
##
## Call:
## lm(formula = y ~ lag(y, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.047714 -0.006337 0.000170 0.006327 0.044898
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0028088 0.0009497 2.957 0.00346 **
## lag(y, 1) 0.5999273 0.0548041 10.947 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0127 on 208 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.3655, Adjusted R-squared: 0.3625
## F-statistic: 119.8 on 1 and 208 DF, p-value: < 2.2e-16
After adding the lag-8 AR term, the adjusted R^2 of the equation increases, which means the explanatory power of the model improves.
Box.test(y)
##
## Box-Pierce test
##
## data: y
## X-squared = 75.941, df = 1, p-value < 2.2e-16
mdl_a2_lag8<-lm(y~lag(y,1)+lag(y,8))
summary(mdl_a2_lag8)
##
## Call:
## lm(formula = y ~ lag(y, 1) + lag(y, 8))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.044260 -0.006154 0.000062 0.006414 0.046069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.003618 0.001015 3.565 0.000455 ***
## lag(y, 1) 0.588308 0.055857 10.532 < 2e-16 ***
## lag(y, 8) -0.131225 0.053174 -2.468 0.014434 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01225 on 200 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.3802, Adjusted R-squared: 0.374
## F-statistic: 61.33 on 2 and 200 DF, p-value: < 2.2e-16
However, simply adding a lag-8 AR term raises two concerns: first, the equation may omit the effects of intermediate lags (the ACF plot suggests that lags 2 and 3 should also be included); second, the high order reduces the usable sample size.
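One way to address both concerns is to let an information criterion choose the lag length rather than imposing a single high-order term. A minimal sketch using stats::ar, which fits AR models up to order.max and keeps the AIC-minimizing order:
#let AIC pick the AR order for the industrial production growth series
ar_fit<-ar(y,order.max=8,aic=TRUE)
ar_fit$order   # selected lag length
ar_fit$ar      # coefficients of the selected AR model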
acf(y)
ggplot(data=data,aes(x=DATE,y=Unemp))+geom_point()
acf(data$Unemp)
y=data$Unemp
mdl_b1_lag2<-lm(y~lag(y,1)+lag(y,2))
summary(mdl_b1_lag2)
##
## Call:
## lm(formula = y ~ lag(y, 1) + lag(y, 2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71003 -0.15973 -0.02581 0.13814 1.02401
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.22616 0.06725 3.363 0.000918 ***
## lag(y, 1) 1.64595 0.05096 32.299 < 2e-16 ***
## lag(y, 2) -0.68267 0.05110 -13.360 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2485 on 207 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.9767, Adjusted R-squared: 0.9765
## F-statistic: 4335 on 2 and 207 DF, p-value: < 2.2e-16
Its characteristic equation is \(\lambda^2 - \phi_{1}\lambda - \phi_{2} = 0\).
the characteristic roots are \(\lambda_{1}=\frac{\phi_1+\sqrt{\phi_1^2+4 \phi_2}}{2}, \lambda_2=\frac{\phi_1-\sqrt{\phi_1^2+4 \phi_2}}{2}\).
Therefore, the characteristic roots of the deterministic part of the difference equation are \(\lambda_{1} = 0.82+0.074i,\ \lambda_{2} = 0.82-0.074i\).
The roots are complex conjugates with modulus \(\sqrt{0.82^2+0.074^2} \approx 0.83 < 1\), so the implied adjustment process is stationary and converges to its mean through damped oscillations.
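The modulus can be checked directly from the estimated coefficients, for example with base R's polyroot (a quick sketch):
#roots of lambda^2 - phi1*lambda - phi2 = 0, coefficients passed in increasing order of power
phi1<-1.64595
phi2<- -0.68267
roots<-polyroot(c(-phi2,-phi1,1))
roots          # approximately 0.82 +/- 0.07i
Mod(roots)     # modulus approximately 0.83, inside the unit circle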
y=data$Unemp
mdl_b1_lag1<-lm(y~lag(y,1))
summary(mdl_b1_lag1)
##
## Call:
## lm(formula = y ~ lag(y, 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.78687 -0.18887 -0.06627 0.08743 1.66773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.13427 0.09070 1.48 0.14
## lag(y, 1) 0.98000 0.01443 67.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3376 on 209 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9566, Adjusted R-squared: 0.9564
## F-statistic: 4609 on 1 and 209 DF, p-value: < 2.2e-16
dly<-diff(log(data$CPICORE),lag=1)
acf(dly)
pacf(dly)
The d2ly series (constructed below) looks more stationary than dly, so the modeler may prefer to work with the second difference of the logarithm of core CPI.
d2ly<-diff(dly,lag=1)
acf(d2ly)
pacf(d2ly)
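The visual impression can be checked more formally, for instance with an augmented Dickey-Fuller test from the tseries package loaded above (a sketch; rejecting the null favors stationarity):
adf.test(dly)    #first difference of log core CPI
adf.test(d2ly)   #second difference of log core CPI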
ii. Both the AR(1) and the MA(1) models have significant coefficients. Based on the AIC, the MA(1) is slightly better than the AR(1). The fitted MA(1) model is \(d2ly_{t} = a_{t} - 0.2704\,a_{t-1}\).
# ar(1) model
mdl_d2ly_ar1<-arima(d2ly,order=c(1,0,0))
mdl_d2ly_ar1
##
## Call:
## arima(x = d2ly, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## -0.2497 0e+00
## s.e. 0.0667 1e-04
##
## sigma^2 estimated as 4.169e-06: log likelihood = 1002.71, aic = -1999.41
# ma(1) model
mdl_d2ly_ma1<-arima(d2ly,order=c(0,0,1))
mdl_d2ly_ma1
##
## Call:
## arima(x = d2ly, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## -0.2704 0e+00
## s.e. 0.0679 1e-04
##
## sigma^2 estimated as 4.15e-06: log likelihood = 1003.2, aic = -2000.39
d2ly<-data.frame(d2ly)
train_set<-d2ly[1:round(0.8*length(d2ly$d2ly)),]
evaluation_set<-d2ly[(round(0.8*length(d2ly$d2ly))+1):length(d2ly$d2ly),]
mdl_train_ar1<-arima(train_set,order=c(1,0,0))
mdl_train_ma1<-arima(train_set,order=c(0,0,1))
summary(mdl_train_ar1)
##
## Call:
## arima(x = train_set, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## -0.2438 0e+00
## s.e. 0.0749 2e-04
##
## sigma^2 estimated as 4.636e-06: log likelihood = 793.25, aic = -1580.5
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -6.9603e-07 0.002153073 0.001521556 128.5028 164.8044 0.6221351
## ACF1
## Training set -0.006486651
summary(mdl_train_ma1)
##
## Call:
## arima(x = train_set, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## -0.2490 0e+00
## s.e. 0.0746 1e-04
##
## sigma^2 estimated as 4.633e-06: log likelihood = 793.3, aic = -1580.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -1.131661e-06 0.002152453 0.00152763 133.2132 167.0032 0.6246186
## ACF1
## Training set -0.004947497
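The error measures above are in-sample only. A sketch of an out-of-sample comparison on the held-out evaluation set, using forecast::accuracy with the horizon set to the length of the evaluation set:
h<-length(evaluation_set)
accuracy(forecast(mdl_train_ar1,h=h),evaluation_set)
accuracy(forecast(mdl_train_ma1,h=h),evaluation_set)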
mdl_dly_ar2<-arima(dly,order=c(2,0,0))
forecast(mdl_dly_ar2,h=12)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 212 0.002762031 0.0001771710 0.005346891 -0.001191171 0.006715233
## 213 0.002967140 -0.0002182688 0.006152549 -0.001904522 0.007838803
## 214 0.003202030 -0.0005130304 0.006917091 -0.002479665 0.008883726
## 215 0.003416511 -0.0007064766 0.007539500 -0.002889055 0.009722078
## 216 0.003622875 -0.0008407211 0.008086471 -0.003203606 0.010449356
## 217 0.003818882 -0.0009318913 0.008569656 -0.003446799 0.011084564
## 218 0.004005638 -0.0009914016 0.009002678 -0.003636675 0.011647951
## 219 0.004183443 -0.0010268106 0.009393698 -0.003784953 0.012151839
## 220 0.004352758 -0.0010435481 0.009749065 -0.003900180 0.012605697
## 221 0.004513981 -0.0010456408 0.010073602 -0.003988727 0.013016688
## 222 0.004667500 -0.0010361614 0.010371161 -0.004055497 0.013390496
## 223 0.004813682 -0.0010175062 0.010644871 -0.004104351 0.013731716
plot(forecast(mdl_dly_ar2,h=12))
Then I fit the MA(1) model to the d2ly series and make 12-step-ahead forecasts. After transforming these forecasts into forecasts of core CPI, the plot below also shows a rising tendency, but, unlike the AR(2) model of the dly series, the point forecasts are flat from the second step onward.
mdl_d2ly_ma1<-arima(d2ly,order=c(0,0,1))
summary(mdl_d2ly_ma1)
##
## Call:
## arima(x = d2ly, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## -0.2704 0e+00
## s.e. 0.0679 1e-04
##
## sigma^2 estimated as 4.15e-06: log likelihood = 1003.2, aic = -2000.39
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -9.937184e-07 0.002037075 0.001477732 124.407 159.0768 0.6175522
## ACF1
## Training set 0.001615847
forecasts<-forecast(mdl_d2ly_ma1,h=12)
forecasts
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 211 2.126180e-04 -0.002397999 0.002823235 -0.003779976 0.004205212
## 212 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 213 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 214 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 215 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 216 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 217 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 218 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 219 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 220 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 221 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
## 222 -6.932459e-06 -0.002711293 0.002697428 -0.004142895 0.004129030
#transform the forecast to the cpicore forecasts
x<-as.numeric(forecasts$mean)
#previous-period values: the last observed d2ly followed by the first 11 forecasts
x.lag<-c(tail(as.numeric(forecasts$x),1),head(x,11))
cpicore_forecasts<-x-0.2704*x.lag
plot(cpicore_forecasts,type="l")
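As an alternative to the transformation above, the d2ly forecasts can be mapped all the way back to the core CPI level by undoing each difference with a cumulative sum (a sketch that reuses the dly series and the forecasts object defined earlier):
#invert the second difference: dly forecasts = last observed dly + cumulated d2ly forecasts
dly_fc<-tail(dly,1)+cumsum(as.numeric(forecasts$mean))
#invert the first difference on the log scale, then exponentiate to get the CPI level
logcpi_fc<-tail(log(data$CPICORE),1)+cumsum(dly_fc)
cpi_fc<-exp(logcpi_fc)
plot(cpi_fc,type="l")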