Homework 1

Data: Built-in data in library “astsa” and “electricity.RData”.

# install packages as needed
#install.packages("astsa")
# setup
library(astsa)

setwd("C:/Users/Emily/Dropbox/My PC (DESKTOP-39IBD6A)/Desktop/Brit School/Spring 2021/Data Decision/hw1")

load(file = "electricity.RData")

# is.na(electricity) No missing values..
set.seed(100)

# pull data from astsa package
data(cmort)
data(gtemp)

Exercise 1

ACF and PACF from simulated data under an ARMA(p,q) model. For simulated stationary data (n = 200) under the different ARMA models listed below, (i) plot the observations, (ii) plot the ACF and PACF, and (iii) describe what you observe from the ACF and PACF.
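Since each part below produces the same three plots (series, ACF, PACF), a small helper function could cut down on the repetition. This is only an optional sketch; the name plot_sim_diagnostics is my own and not part of astsa.

# Optional helper: plot a simulated series with its ACF and PACF in one row.
# (plot_sim_diagnostics is a made-up name for this homework, not an astsa function.)
plot_sim_diagnostics <- function(x, label) {
  par(mfrow = c(1, 3))
  plot.ts(x, main = label)
  acf(x, lwd = 3, main = paste("acf -", label))
  pacf(x, lwd = 3, main = paste("pacf -", label))
}
# Example usage (equivalent to the per-simulation plotting code below):
# plot_sim_diagnostics(ar3.sim, "sim 1")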

a) Two sets of simulated stationary data under an AR(3) model, specifying coefficients (𝜙1, 𝜙2, 𝜙3) of your choice

NOTE: for a nonstationary model, R will give a warning message.
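One way to double-check that a chosen set of AR coefficients gives a stationary model before simulating is to look at the roots of the AR polynomial: all of their moduli should exceed 1. A small sketch for the coefficients used in the first AR(3) simulation below:

# Stationarity check: the AR(3) below is stationary when all roots of
# 1 - 0.5*z + 0.3*z^2 - 0.08*z^3 lie outside the unit circle.
Mod(polyroot(c(1, -c(0.5, -0.3, 0.08))))  # all moduli > 1 => stationary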

AR(3)

ar3.sim <- arima.sim(list(order =c(3,0,0), ar=c(0.5, -0.3, .08)), n = 200)

ar3.sim2 <- arima.sim(list(order =c(3,0,0), ar=c(0.8, -0.5, .06)), n = 200)


par(mfrow=c(2, 3))
# sim 1
plot.ts(ar3.sim, main = "sim 1")
acf(ar3.sim, lwd = 3, main = "acf - sim 1")
pacf(ar3.sim, lwd = 3, main = "pacf - sim 1")
# sim 2
plot.ts(ar3.sim2, main = "sim 2")
acf(ar3.sim2, lwd = 3, main = "acf - sim 2")
pacf(ar3.sim2, lwd = 3, main = "pacf - sim 2")

Both simulations look similar in their PACFs, but the raw series look different. The two ACFs also differ slightly around the second lag.
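For reference, the population ACF and PACF implied by these AR(3) coefficients can be computed with ARMAacf and compared against the sample plots above (lag.max = 20 is an arbitrary choice):

# Theoretical ACF/PACF for the two AR(3) parameter sets, for comparison
ARMAacf(ar = c(0.5, -0.3, 0.08), lag.max = 20)
ARMAacf(ar = c(0.5, -0.3, 0.08), lag.max = 20, pacf = TRUE)
ARMAacf(ar = c(0.8, -0.5, 0.06), lag.max = 20, pacf = TRUE)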

b) Two sets of simulated stationary data under an MA(3) model, specifying coefficients (𝜃1, 𝜃2, 𝜃3) of your choice

MA(3)

ma3.sim <- arima.sim(list(order =c(0,0,3), ma = c(0.5, 0.8, 0.3)), n = 200)
ma3.sim2 <- arima.sim(list(order =c(0,0,3), ma = c(0.5, -0.8, 0.4)), n = 200)



par(mfrow=c(2,3))
#sim 1
plot.ts(ma3.sim, main = "sim 1")
acf(ma3.sim, lwd = 3, main = "acf - sim 1")
pacf(ma3.sim, lwd = 3, main = "pacf - sim 1")
# sim 2
plot.ts(ma3.sim2, main = "sim 2")
acf(ma3.sim2, lwd = 3, main = "acf - sim 2")
pacf(ma3.sim2, lwd = 3, main = "pacf - sim 2")

The two simulated datasets look quite different here, even though both are MA(3). While both ACFs gradually plateau, they do so in very different patterns. The patterns of the two PACFs look almost opposite.
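As a sanity check on the MA(3) behavior, the theoretical ACF of an MA(3) process is exactly zero beyond lag 3, which ARMAacf can confirm for the coefficients used above:

# Theoretical ACF of the two MA(3) models: values beyond lag 3 are exactly 0
ARMAacf(ma = c(0.5, 0.8, 0.3), lag.max = 6)
ARMAacf(ma = c(0.5, -0.8, 0.4), lag.max = 6)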

c) Two sets of simulated stationary data under an ARMA(2,2) model, specifying coefficients (𝜙1, 𝜙2, 𝜃1, 𝜃2) of your choice

ARMA(2,2)

arma22.sim1 <- arima.sim(list(order =c(2,0,2), ar=c(0.5, 0.3), ma = c(-0.5, 0.8)), n = 200)
arma22.sim2 <- arima.sim(list(order =c(2,0,2), ar=c(0.3, 0.5), ma = c(-0.3, 0.6)), n = 200)



par(mfrow=c(2,3))
# sim 1
plot.ts(arma22.sim1, main = "sim 1")
acf(arma22.sim1, lwd = 3, main = "acf - sim 1")
pacf(arma22.sim1, lwd = 3, main = "pacf - sim 1")
# sim 2
plot.ts(arma22.sim2, main = "sim 2")
acf(arma22.sim2, lwd = 3, main = "acf - sim 2")
pacf(arma22.sim2, lwd = 3, main = "pacf - sim 2")

These two simulated time series look very similar in the raw data, ACF, and PACF. That makes sense, since I did not change the coefficients much between the two simulations.

d) Two sets of simulated stationary data under an ARMA(3,3) model, specifying coefficients (𝜙1, 𝜙2, 𝜙3, 𝜃1, 𝜃2, 𝜃3) of your choice

ARMA(3,3)

arma33.sim1 <- arima.sim(list(order =c(3,0,3), ar=c(0.5, -0.3, -0.5), ma = c(0.5, 0.3, 0.8)), n = 200)
arma33.sim2 <- arima.sim(list(order =c(3,0,3), ar=c(0.3, 0.3, -0.8), ma = c(-0.5, 0.4, 0.6)), n = 200)


par(mfrow=c(2, 3))
# sim 1
plot.ts(arma33.sim1, main = "sim 1")
acf(arma33.sim1, lwd = 3, main = "acf - sim 1")
pacf(arma33.sim1, lwd = 3, main = "pacf - sim 1")
# sim 2
plot.ts(arma33.sim2, main = "sim 2")
acf(arma33.sim2, lwd = 3, main = "acf - sim 2")
pacf(arma33.sim2, lwd = 3, main = "pacf - sim 2")

The two simulated time series here look very different in their ACFs: sim 1 oscillates in clustered groups, while sim 2 alternates back and forth almost immediately. The PACFs appear nearly opposite.

Exercise 2

For the data examples listed below, check whether the data look stationary using plots of the observations or the ACF/PACF plots. Explain your reasoning.

a) Use the data “cmort” in the astsa package – weekly cardiovascular mortality in Los Angeles County

par(mfrow=c(1, 3))
plot.ts(cmort)
acf(cmort)
pacf(cmort)

The data does not look stationary. There appears to be a linear trend.

b) Use the data “gtemp” in the astsa package – annual global mean land-ocean temperature deviations
par(mfrow=c(1, 3))
plot.ts(gtemp)
acf(gtemp)
pacf(gtemp)

The data is not stationary. There is definitely a trend, which begins increasing around 1980.

c) Based on the data “gtemp” in the astsa package – annual global mean land-ocean temperature deviations, the same data as (b), use the first-order differenced series, 𝑌𝑡′ = 𝑌𝑡 − 𝑌𝑡−1
par(mfrow=c(1, 3))
gtemp.diff = diff(gtemp, differences = 1)
plot.ts(gtemp.diff)
acf(gtemp.diff)
pacf(gtemp.diff)

After taking the first difference, the data appears to be stationary. There is no pattern I can see; the series seems to vary randomly around 0.

d) Use the data “electricity” in electricity.RData – monthly US electricity generated
par(mfrow=c(1, 3))
plot.ts(electricity)
acf(electricity)
pacf(electricity)

The data is not stationary; there is a clear increasing linear trend.

e) Based on the data “electricity” in electricity.RData – monthly US electricity generated, use the log-transformed series, log(𝑌𝑡)
par(mfrow=c(1, 3))
electricity.log = log(electricity)
plot.ts(electricity.log)
acf(electricity.log)
pacf(electricity.log)

The data is not stationary. There is a clear linear trend even with the log transformation.

f) Based on the data “electricity” in electricity.RData – monthly US electricity generated, use the first-order differenced log-transformed series, log(𝑌𝑡)′ = log(𝑌𝑡) − log(𝑌𝑡−1)
par(mfrow=c(1, 3))
electricity.log.diff = diff(electricity.log, differences = 1)
plot.ts(electricity.log.diff)
acf(electricity.log.diff)
pacf(electricity.log.diff)

The data now looks stationary; it appears to vary randomly around 0.
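As an optional, more formal complement to these visual checks, an augmented Dickey-Fuller test could be applied to the differenced log series. This assumes the tseries package (not otherwise used in this homework) is installed:

# Optional formal stationarity check (requires the tseries package)
# install.packages("tseries")
library(tseries)
adf.test(electricity.log.diff)  # a small p-value supports rejecting a unit root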

Exercise 3

Use the data “cmort” in the astsa package examined in Exercise 2.

a) By visually checking, decide what ARMA model is appropriate, i.e., decide p and q in ARMA(p,q). Explain your answer.

plot.ts(cmort)

From the plot above, the data is not stationary, so the first thing I want to do is take one difference and see whether the data becomes stationary after that.

cmort.diff = diff(cmort, differences = 1)
plot.ts(cmort.diff)

The data now appears to be stationary, so I'll plot the ACF and PACF to help decide p and q.

acf(cmort.diff)

pacf(cmort.diff)

Based on the spikes in the ACF and PACF, I'd choose p = 2 and q = 1.

But I'd like to explore p = 1 as well, so next I'll fit an ARMA(2,1) model and an ARMA(1,1) model to the first-differenced data.
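As a cross-check on the visual choice, one could also scan a small grid of (p, q) candidates and compare AICs directly with stats::arima; the grid limits here are just an illustration:

# Sketch: compare AIC across a small grid of ARMA(p, q) candidates for cmort.diff
candidates <- expand.grid(p = 0:2, q = 0:2)
candidates$AIC <- apply(candidates, 1, function(r) {
  fit <- try(arima(cmort.diff, order = c(r["p"], 0, r["q"])), silent = TRUE)
  if (inherits(fit, "try-error")) NA else AIC(fit)
})
candidates[order(candidates$AIC), ]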

b) Fit the model(s) which you chose in part (a) using the sarima function in R and interpret the result.
arma21 = sarima(cmort.diff, 2, 0, 1)
## initial  value 1.907405 
## iter   2 value 1.856514
## iter   3 value 1.764037
## iter   4 value 1.758706
## iter   5 value 1.756237
## iter   6 value 1.756227
## iter   7 value 1.756194
## iter   8 value 1.756122
## iter   9 value 1.755948
## iter  10 value 1.755652
## iter  11 value 1.755529
## iter  12 value 1.755462
## iter  13 value 1.755461
## iter  14 value 1.755459
## iter  15 value 1.755453
## iter  16 value 1.755443
## iter  17 value 1.755433
## iter  18 value 1.755426
## iter  19 value 1.755425
## iter  19 value 1.755425
## final  value 1.755425 
## converged
## initial  value 1.756350 
## iter   2 value 1.756343
## iter   3 value 1.756340
## iter   4 value 1.756340
## iter   4 value 1.756340
## iter   4 value 1.756340
## final  value 1.756340 
## converged

arma11 = sarima(cmort.diff, 1, 0, 1)
## initial  value 1.908706 
## iter   2 value 1.864950
## iter   3 value 1.758481
## iter   4 value 1.758244
## iter   5 value 1.758051
## iter   6 value 1.757827
## iter   7 value 1.757772
## iter   8 value 1.757764
## iter   8 value 1.757764
## final  value 1.757764 
## converged
## initial  value 1.757729 
## iter   2 value 1.757725
## iter   3 value 1.757720
## iter   4 value 1.757715
## iter   5 value 1.757713
## iter   6 value 1.757713
## iter   6 value 1.757713
## iter   6 value 1.757713
## final  value 1.757713 
## converged

arma21$AIC
## [1] 6.37028
arma11$AIC
## [1] 6.369082

Looking at the AIC, I will go with ARMA(1,1): ARMA(2,1) had an AIC of 6.37028, while ARMA(1,1) had an AIC of 6.369082.
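sarima also stores BIC in its return value, which could be used as a secondary criterion since it penalizes the extra AR term more heavily; a quick sketch using the fits above:

# Secondary comparison using BIC from the sarima fits above
arma21$BIC
arma11$BIC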

c) Check model diagnostics and explain if your chosen model seems appropriate.
sarima(cmort.diff, 1, 0, 1)
## initial  value 1.908706 
## iter   2 value 1.864950
## iter   3 value 1.758481
## iter   4 value 1.758244
## iter   5 value 1.758051
## iter   6 value 1.757827
## iter   7 value 1.757772
## iter   8 value 1.757764
## iter   8 value 1.757764
## final  value 1.757764 
## converged
## initial  value 1.757729 
## iter   2 value 1.757725
## iter   3 value 1.757720
## iter   4 value 1.757715
## iter   5 value 1.757713
## iter   6 value 1.757713
## iter   6 value 1.757713
## iter   6 value 1.757713
## final  value 1.757713 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ma1    xmean
##       -0.3777  -0.1719  -0.0269
## s.e.   0.0882   0.0967   0.1549
## 
## sigma^2 estimated as 33.61:  log likelihood = -1610.56,  aic = 3229.12
## 
## $degrees_of_freedom
## [1] 504
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1    -0.3777 0.0882 -4.2842  0.0000
## ma1    -0.1719 0.0967 -1.7770  0.0762
## xmean  -0.0269 0.1549 -0.1735  0.8623
## 
## $AIC
## [1] 6.369082
## 
## $AICc
## [1] 6.369176
## 
## $BIC
## [1] 6.402443

The model seems appropriate based on the standardized residuals and the Q-Q plot, but the plot of p-values for the Ljung-Box statistic does not suggest a good fit.
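To put a number on the Ljung-Box concern, the test can also be run directly on the residuals of the chosen fit at a specific lag; the lag choice below is illustrative, with fitdf = p + q:

# Ljung-Box test on the ARMA(1,1) residuals; fitdf = p + q = 2
Box.test(resid(arma11$fit), lag = 20, type = "Ljung-Box", fitdf = 2)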

d) Make a prediction for the next 4 values based on your final model.
predict(arima(cmort.diff, order = c(1, 0, 1)), n.ahead = 4)$pred
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1]  1.06291191 -0.43853155  0.12863308 -0.08561123
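Note that these forecasts are on the differenced scale. If forecasts of the original cmort series were wanted, one could undo the differencing by adding cumulative sums of the predicted differences to the last observed value; a sketch:

# Convert forecasts of the differenced series back to the original cmort scale
diff.pred <- predict(arima(cmort.diff, order = c(1, 0, 1)), n.ahead = 4)$pred
cmort[length(cmort)] + cumsum(as.numeric(diff.pred))

Fitting arima(cmort, order = c(1, 1, 1)) and predicting would give level forecasts directly; the results differ slightly because the fit to the differenced series above includes a mean term.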

Exercise 4

Use the data “gtemp” in astsa package examined in Exercise 2

a) By visually checking, decide what ARMA model is appropriate for the first-order differenced series, 𝑌𝑡′ = 𝑌𝑡 − 𝑌𝑡−1. Explain your answer. If multiple ARMA models look appropriate, try all of them, but at most three models.
gtemp.diff = diff(gtemp, differences = 1)
acf(gtemp.diff)

pacf(gtemp.diff)

I want to try an ARMA(1,3) model for this data: there is only one clear spike in the ACF and about three spikes in the PACF.

b) Fit the model(s) which you chose in part (a) using the sarima function and interpret the result. If you have multiple models, report all of them.
sarima(gtemp.diff, 1, 0, 3)
## initial  value -2.211269 
## iter   2 value -2.328791
## iter   3 value -2.335632
## iter   4 value -2.337057
## iter   5 value -2.344885
## iter   6 value -2.345040
## iter   7 value -2.345467
## iter   8 value -2.345676
## iter   9 value -2.346966
## iter  10 value -2.349997
## iter  11 value -2.352061
## iter  12 value -2.352665
## iter  13 value -2.355933
## iter  14 value -2.358148
## iter  15 value -2.359904
## iter  16 value -2.360401
## iter  17 value -2.360972
## iter  18 value -2.363412
## iter  19 value -2.363529
## iter  20 value -2.363654
## iter  21 value -2.363678
## iter  22 value -2.363681
## iter  22 value -2.363681
## final  value -2.363681 
## converged
## initial  value -2.365998 
## iter   2 value -2.366003
## iter   3 value -2.366007
## iter   4 value -2.366009
## iter   5 value -2.366010
## iter   5 value -2.366010
## iter   5 value -2.366010
## final  value -2.366010 
## converged
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1     ma1      ma2      ma3   xmean
##       -0.9376  0.4846  -0.6341  -0.2857  0.0065
## s.e.   0.0946  0.1165   0.0955   0.0866  0.0025
## 
## sigma^2 estimated as 0.008755:  log likelihood = 122.17,  aic = -232.34
## 
## $degrees_of_freedom
## [1] 124
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1    -0.9376 0.0946 -9.9090  0.0000
## ma1     0.4846 0.1165  4.1612  0.0001
## ma2    -0.6341 0.0955 -6.6373  0.0000
## ma3    -0.2857 0.0866 -3.2988  0.0013
## xmean   0.0065 0.0025  2.6171  0.0100
## 
## $AIC
## [1] -1.80112
## 
## $AICc
## [1] -1.797338
## 
## $BIC
## [1] -1.668105
arma13 = sarima(gtemp.diff, 1, 0, 3)
## initial  value -2.211269 
## iter   2 value -2.328791
## iter   3 value -2.335632
## iter   4 value -2.337057
## iter   5 value -2.344885
## iter   6 value -2.345040
## iter   7 value -2.345467
## iter   8 value -2.345676
## iter   9 value -2.346966
## iter  10 value -2.349997
## iter  11 value -2.352061
## iter  12 value -2.352665
## iter  13 value -2.355933
## iter  14 value -2.358148
## iter  15 value -2.359904
## iter  16 value -2.360401
## iter  17 value -2.360972
## iter  18 value -2.363412
## iter  19 value -2.363529
## iter  20 value -2.363654
## iter  21 value -2.363678
## iter  22 value -2.363681
## iter  22 value -2.363681
## final  value -2.363681 
## converged
## initial  value -2.365998 
## iter   2 value -2.366003
## iter   3 value -2.366007
## iter   4 value -2.366009
## iter   5 value -2.366010
## iter   5 value -2.366010
## iter   5 value -2.366010
## final  value -2.366010 
## converged

arma13$fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1     ma1      ma2      ma3   xmean
##       -0.9376  0.4846  -0.6341  -0.2857  0.0065
## s.e.   0.0946  0.1165   0.0955   0.0866  0.0025
## 
## sigma^2 estimated as 0.008755:  log likelihood = 122.17,  aic = -232.34

The p-values for the Ljung-Box statistic drop close to 0 after about lag 5, which is a concern for this model.

I'm also going to look at an ARMA(1,2) model, just to see what it looks like.

sarima(gtemp.diff, 1, 0, 2)
## initial  value -2.211269 
## iter   2 value -2.319512
## iter   3 value -2.330118
## iter   4 value -2.336819
## iter   5 value -2.338492
## iter   6 value -2.340501
## iter   7 value -2.340763
## iter   8 value -2.341026
## iter   9 value -2.341472
## iter  10 value -2.341919
## iter  11 value -2.341964
## iter  12 value -2.342001
## iter  13 value -2.342041
## iter  14 value -2.342062
## iter  15 value -2.342072
## iter  16 value -2.342077
## iter  17 value -2.342084
## iter  18 value -2.342096
## iter  19 value -2.342101
## iter  20 value -2.342102
## iter  21 value -2.342102
## iter  22 value -2.342102
## iter  22 value -2.342102
## iter  22 value -2.342102
## final  value -2.342102 
## converged
## initial  value -2.346486 
## iter   2 value -2.346529
## iter   3 value -2.346569
## iter   4 value -2.346631
## iter   5 value -2.346686
## iter   6 value -2.346869
## iter   7 value -2.346936
## iter   8 value -2.346943
## iter   9 value -2.346943
## iter   9 value -2.346943
## iter   9 value -2.346943
## final  value -2.346943 
## converged
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1      ma2   xmean
##       0.0318  -0.5419  -0.1738  0.0065
## s.e.  0.2770   0.2644   0.1753  0.0025
## 
## sigma^2 estimated as 0.009109:  log likelihood = 119.71,  aic = -229.43
## 
## $degrees_of_freedom
## [1] 125
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1     0.0318 0.2770  0.1147  0.9089
## ma1    -0.5419 0.2644 -2.0496  0.0425
## ma2    -0.1738 0.1753 -0.9917  0.3233
## xmean   0.0065 0.0025  2.5553  0.0118
## 
## $AIC
## [1] -1.778491
## 
## $AICc
## [1] -1.77599
## 
## $BIC
## [1] -1.667645
arma12 = sarima(gtemp.diff, 1, 0, 2)
## initial  value -2.211269 
## iter   2 value -2.319512
## iter   3 value -2.330118
## iter   4 value -2.336819
## iter   5 value -2.338492
## iter   6 value -2.340501
## iter   7 value -2.340763
## iter   8 value -2.341026
## iter   9 value -2.341472
## iter  10 value -2.341919
## iter  11 value -2.341964
## iter  12 value -2.342001
## iter  13 value -2.342041
## iter  14 value -2.342062
## iter  15 value -2.342072
## iter  16 value -2.342077
## iter  17 value -2.342084
## iter  18 value -2.342096
## iter  19 value -2.342101
## iter  20 value -2.342102
## iter  21 value -2.342102
## iter  22 value -2.342102
## iter  22 value -2.342102
## iter  22 value -2.342102
## final  value -2.342102 
## converged
## initial  value -2.346486 
## iter   2 value -2.346529
## iter   3 value -2.346569
## iter   4 value -2.346631
## iter   5 value -2.346686
## iter   6 value -2.346869
## iter   7 value -2.346936
## iter   8 value -2.346943
## iter   9 value -2.346943
## iter   9 value -2.346943
## iter   9 value -2.346943
## final  value -2.346943 
## converged

arma12$fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1      ma2   xmean
##       0.0318  -0.5419  -0.1738  0.0065
## s.e.  0.2770   0.2644   0.1753  0.0025
## 
## sigma^2 estimated as 0.009109:  log likelihood = 119.71,  aic = -229.43

The Ljung-Box p-values look much better for this model, but its AIC is slightly higher than that of the ARMA(1,3) model.

c) Check model diagnostics and explain if your chosen model seems appropriate. If you have multiple models, choose the final one based on AIC and the diagnostic plots.
sarima(gtemp.diff, 1, 0, 2)
## initial  value -2.211269 
## iter   2 value -2.319512
## iter   3 value -2.330118
## iter   4 value -2.336819
## iter   5 value -2.338492
## iter   6 value -2.340501
## iter   7 value -2.340763
## iter   8 value -2.341026
## iter   9 value -2.341472
## iter  10 value -2.341919
## iter  11 value -2.341964
## iter  12 value -2.342001
## iter  13 value -2.342041
## iter  14 value -2.342062
## iter  15 value -2.342072
## iter  16 value -2.342077
## iter  17 value -2.342084
## iter  18 value -2.342096
## iter  19 value -2.342101
## iter  20 value -2.342102
## iter  21 value -2.342102
## iter  22 value -2.342102
## iter  22 value -2.342102
## iter  22 value -2.342102
## final  value -2.342102 
## converged
## initial  value -2.346486 
## iter   2 value -2.346529
## iter   3 value -2.346569
## iter   4 value -2.346631
## iter   5 value -2.346686
## iter   6 value -2.346869
## iter   7 value -2.346936
## iter   8 value -2.346943
## iter   9 value -2.346943
## iter   9 value -2.346943
## iter   9 value -2.346943
## final  value -2.346943 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1      ma1      ma2   xmean
##       0.0318  -0.5419  -0.1738  0.0065
## s.e.  0.2770   0.2644   0.1753  0.0025
## 
## sigma^2 estimated as 0.009109:  log likelihood = 119.71,  aic = -229.43
## 
## $degrees_of_freedom
## [1] 125
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1     0.0318 0.2770  0.1147  0.9089
## ma1    -0.5419 0.2644 -2.0496  0.0425
## ma2    -0.1738 0.1753 -0.9917  0.3233
## xmean   0.0065 0.0025  2.5553  0.0118
## 
## $AIC
## [1] -1.778491
## 
## $AICc
## [1] -1.77599
## 
## $BIC
## [1] -1.667645

Even though the AIC of ARMA(1,3) is slightly better, I'm going to choose ARMA(1,2) because of its better-looking Ljung-Box p-value plot, which suggests that ARMA(1,2) fits the data better than ARMA(1,3).
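As in Exercise 3, the Ljung-Box comparison can also be checked numerically on the residuals of the two fits stored above (the lag choice is illustrative; fitdf = p + q for each model):

# Numeric Ljung-Box check for the two candidate models
Box.test(resid(arma13$fit), lag = 10, type = "Ljung-Box", fitdf = 4)
Box.test(resid(arma12$fit), lag = 10, type = "Ljung-Box", fitdf = 3)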

d) Make a prediction for the next 4 values based on your final model.
predict(arima(gtemp.diff, order = c(1, 0, 2)), n.ahead = 4)$pred
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] -0.008370590 -0.007246593  0.006029457  0.006451301
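As in Exercise 3, these forecasts are for the differenced series; to express them as forecasts of gtemp itself, the cumulative sums can be added back to the last observed value. A sketch:

# Convert the differenced-series forecasts back to the gtemp (level) scale
gtemp.pred <- predict(arima(gtemp.diff, order = c(1, 0, 2)), n.ahead = 4)$pred
gtemp[length(gtemp)] + cumsum(as.numeric(gtemp.pred))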