Exercise 1

For simulated stationary data (n=200) under different ARMA models listed below, (i) plot observations, (ii) plot its ACF and PACF plots, and (iii) describe what you observe from ACF and PACF.

a) Two sets of simulated stationary data under AR(3) model by specifying coefficients (𝜙1,𝜙2,𝜙3) with your choice

For ar3.1 and ar3.2 we see similar plots. Their acf plot has a gradual decay while their pacf spikes at one and has a lag cut off at 3. For ar3.3 and ar3.4 we see a spike at lag1 in the acf plot and with the pacf plot a negative spike at lag1 and lag2.

ar3.1 = arima.sim(list(order =c(3,0,0), ar=c(0.3, 0.2, 0.4)), n = 200)
ar3.2 = arima.sim(list(order =c(3,0,0), ar=c(0.4, 0.3, 0.2)), n = 200)
ar3.3 = arima.sim(list(order =c(3,0,0), ar=c(-0.8, -0.7, -0.3)), n = 200)
ar3.4 = arima.sim(list(order =c(3,0,0), ar=c(-0.5, -0.4, -0.1)), n = 200)
#par(mfrow=c(2,2))
#plot.ts(ar3.1, main = "phi = 0.2")
#plot.ts(ar3.2, main = "phi = 0.7")
#plot.ts(ar3.3, main = "phi = -0.3")
#plot.ts(ar3.4, main = "phi = -0.8")
par(mfrow=c(1,2))
acf(ar3.1, lwd = 3); pacf(ar3.1, lwd = 3)

acf(ar3.2, lwd = 3); pacf(ar3.2, lwd = 3)

acf(ar3.3, lwd = 3); pacf(ar3.3, lwd = 3)

acf(ar3.4, lwd = 3); pacf(ar3.4, lwd = 3)

b) Two sets of simulated stationary data under MA(3) model by specifying coefficients (𝜃1, 𝜃2, 𝜃3) with your choice

For ma3.1 the acf plot spikes at lag1 then slowly decays past lag4. In the pacf plot it spikes at lag1. ma3.2 the acf plot spikes at lag1 drops down and the has one more significant lag at 4 while the pacf isn’t even significant at lag1. ma3.3 spikes at lag1 in the acf plot while most of the lags in the pacf plot are negative. ma3.4 is very similar to ma3.3 with the exception that lag1 in the pacf plot has a negative spike at lag1.

ma3.1 = arima.sim(list(order =c(0,0,3), ma=c(0.9, 0.6, 0.4)), n = 200)
ma3.2 = arima.sim(list(order =c(0,0,3), ma=c(0.4, -0.4, 0.3)), n = 200)
ma3.3 = arima.sim(list(order =c(0,0,3), ma=c(-0.8, -0.6, -0.3)), n = 200)
ma3.4 = arima.sim(list(order =c(0,0,3), ma=c(-0.5, 0.7, -0.2)), n = 200)
par(mfrow=c(1,2))
acf(ma3.1, lwd = 3); pacf(ma3.1, lwd = 3)

acf(ma3.2, lwd = 3); pacf(ma3.2, lwd = 3)

acf(ma3.3, lwd = 3); pacf(ma3.3, lwd = 3)

acf(ma3.4, lwd = 3); pacf(ma3.4, lwd = 3)

c) Two sets of simulated stationary data under ARMA(2,2) by specifying coefficients (𝜙1,𝜙2, 𝜃1, 𝜃2) with your choice

The first acf plot we observe an exponential decay. In the pacf plot we have a spike at lag1, and then a cut off at lag2.

par(mfrow=c(1,2))
arma2 = arima.sim(list(order =c(2,0,2), ar = c(0.2,0.6), ma = c(0.4, 0.5)), n =200)
acf(arma2, lwd=3); pacf(arma2, lwd=3)

The second acf plot we have a spike at lag1 for the acf plot followed by a majority negative lag at the pacf plot. The first three lags in the pacf plot all appear to be significant but should be noted that it does not spike at lag1.

par(mfrow=c(1,2))
arma2.2 = arima.sim(list(order =c(2,0,2), ar = c(-0.4,0.3), ma = c(-0.2, -0.7)), n =200)
acf(arma2.2, lwd=3); pacf(arma2.2, lwd=3)

d) Two sets of simulated stationary data under ARMA(3,3) by specifying coefficients (𝜙1,𝜙2,𝜙3, 𝜃1, 𝜃2, 𝜃3) with your choice

The first acf and pacf plots look very similar to that seen in the first acf and pacf plots in part c. We see an exponential decay within the acf plot. The pacf plot has a spike and cutoff at lag1 then taper to zero.

par(mfrow=c(1,2))
arma3 = arima.sim(list(order =c(3,0,3), ar = c(0.2, 0.5 ,0.1), ma = c(0.4, 0.5, 0.5)), n =200)
acf(arma3, lwd=3); pacf(arma3, lwd=3)

The second acf plot has a spike at zero then tapers to zero. pacf does not spike at lag1 but has several significant positive and negative lags that will eventually taper down to zero.

par(mfrow=c(1,2))
arma3.3 = arima.sim(list(order =c(3,0,3), ar = c(0.4, -0.5 ,0.1), ma = c(-0.2, 0.8, -0.5)), n =200)
acf(arma3.3, lwd=3); pacf(arma3.3, lwd=3)

Exercise 2: ACF and PACF from data examples

For data examples listed below, check whether the data look stationary using plotted observations or ACF/ PACF plots. Expain the reason.

a) Use the data “cmort” in astsa package – weekly cardiovascular mortality in Los Angeles County

In the acf plot we see a gradual decay and then a sharp cut off at lag2 in the pacf plot. Looking at the time series plot we can conclude that the data is not stationary

par(mfrow=c(1,2))
acf(cmort, main = "acf"); pacf(cmort, main = "pacf")

plot.ts(cmort)

b) Use the data “gtemp” in astsa package – annual global mean land-ocean temperature deviations

In the acf plot we see a gradual decay and then a sharp cut off at lag2 in the pacf plot, very similar to the gtemp plot. it does look like at lag4 it does become significant again, but it’s too close to say either way. The data is certainly not stationary.

par(mfrow=c(1,2))
acf(gtemp, main = "acf"); pacf(gtemp, main = "pacf")

plot.ts(gtemp)

c) Based on the data “gtemp” in astsa package – annual global mean land-ocean temperature deviations, same data as (b), use the 1 order differenced series, 𝑌𝑡′ = 𝑌𝑡 − 𝑌𝑡−1

We see a large spike at lag1 in the acf plot followed by a negative trend, and one last significant spike at lag5. In the pacf plot we see the first four lags being significant but also negative. I would say based off of the plots that the data looks stationary.

diff.gtemp = diff(gtemp, lag=1)
par(mfrow=c(1,2))
acf(diff.gtemp, main = "acf"); pacf(diff.gtemp, main = "pacf")

plot.ts(diff.gtemp)

d) Use the data “electricity” in electricity.RData – monthly US electricity generated

For electricity we see all of the lags are significant and tend to be distributed in an almost wave pattern. The pacf plot spikes at lag1 followed by several less significant lags both positive and negative. Looking at the plot though it appears that the data is not stationary.

par(mfrow=c(1,2))
acf(electricity, main = "acf"); pacf(electricity, main = "pacf")

e) Based on the data “electricity” in electricity.RData – monthly US electricity generated, use the log transformed series, log(𝑌𝑡)

The log of electricity doesn’t seem to have a different effect on the acf and pacf plots.

par(mfrow=c(1,2))

acf(log(electricity), main = "acf"); pacf(log(electricity), main = "pacf")

plot.ts(electricity)

f) Based on the data “electricity” in electricity.RData – monthly US electricity generated, use 1 order differenced log transformed series, log(𝑌𝑡)′ = log(𝑌𝑡) − log(𝑌𝑡−1)

This one is certainly harder to interpret stationality, but based on acf and pacf, because we don’t see them tail off we’ll assume non-stationary.

par(mfrow=c(1,2))
diff.elec = diff(log(electricity), lag=1)
acf(diff.elec, main = "acf"); pacf(diff.elec, main = "pacf")

plot.ts(diff.elec)

Exercise 3: Model Fit

Use the data “cmort” in astsa package examined in Exercise 2

a) By visually checking, decide what ARMA model is appropriate, i.e., decide p, q in ARMA(p,q). Explain your answer.

Examining the acf and pacf plots generated in exercise 2 for the cmort data we see the acf plot tail off and a cut-off on the pacf plot at lag 2 so we can assume it’s an AR(2) model, so we’ll use p=2.

par(mfrow=c(1,2))
acf(cmort, main = "acf"); pacf(cmort, main = "pacf")

b) Fit the model(s) which you chose in part (a) using sarima function in R and interpret the result.

For model diagnostics we see that both ar1 and ar2 have a p-value of 0 which tells us that each point is significantly different from 0. Our Aic value is equal to 6.33 which is pretty low, so it’s a good sign.

ar.cmort = sarima(cmort,2,0,0)
## initial  value 2.300097 
## iter   2 value 2.148325
## iter   3 value 1.756362
## iter   4 value 1.751728
## iter   5 value 1.737837
## iter   6 value 1.737835
## iter   7 value 1.737834
## iter   8 value 1.737832
## iter   9 value 1.737831
## iter  10 value 1.737830
## iter  11 value 1.737829
## iter  12 value 1.737824
## iter  13 value 1.737818
## iter  14 value 1.737810
## iter  15 value 1.737805
## iter  16 value 1.737804
## iter  17 value 1.737804
## iter  18 value 1.737804
## iter  18 value 1.737804
## iter  18 value 1.737804
## final  value 1.737804 
## converged
## initial  value 1.740035 
## iter   2 value 1.740030
## iter   3 value 1.740016
## iter   4 value 1.740012
## iter   5 value 1.740006
## iter   6 value 1.740000
## iter   7 value 1.739986
## iter   8 value 1.739968
## iter   9 value 1.739959
## iter  10 value 1.739956
## iter  11 value 1.739954
## iter  12 value 1.739953
## iter  13 value 1.739952
## iter  14 value 1.739952
## iter  15 value 1.739952
## iter  16 value 1.739951
## iter  17 value 1.739951
## iter  18 value 1.739951
## iter  19 value 1.739950
## iter  20 value 1.739950
## iter  21 value 1.739949
## iter  22 value 1.739949
## iter  23 value 1.739949
## iter  24 value 1.739948
## iter  24 value 1.739948
## iter  24 value 1.739948
## final  value 1.739948 
## converged

ar.cmort
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2    xmean
##       0.4301  0.4424  88.8538
## s.e.  0.0397  0.0398   1.9407
## 
## sigma^2 estimated as 32.37:  log likelihood = -1604.71,  aic = 3217.43
## 
## $degrees_of_freedom
## [1] 505
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1     0.4301 0.0397 10.8404       0
## ar2     0.4424 0.0398 11.1245       0
## xmean  88.8538 1.9407 45.7839       0
## 
## $AIC
## [1] 6.333521
## 
## $AICc
## [1] 6.333615
## 
## $BIC
## [1] 6.366832

c) Check model diagnostics and explain if your chosen model seems appropriate.

We have large p-values for Ljung-Box test which tells us that there is evidence of white noise in the model, telling us that AR(2) is able to explain all of the autocorrelation within the data. We see significance for lag7 and what looks like lag12 autocorrelation on residuals. Overall we can conclude that the model does seem appropriate.

d) Make a prediction for next 4 values based on your final model.

87.66207, 86.85311, 87.46615, and 87.37190

sarima.for(cmort, n.ahead=4, 2, 0, 0)

## $pred
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1] 87.66207 86.85311 87.46615 87.37190
## 
## $se
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1] 5.689543 6.193387 7.148343 7.612531

Exercise 4: Model Fit

Use the data “gtemp” in astsa package examined in Exercise 2

a) By visually checking, decide what ARMA model is appropriate for 1 order differenced series, 𝑌𝑡′ = 𝑌𝑡 −𝑌𝑡−1 . Explain your answer. If you see multiple ARMA models look appropriate, try all of them, but at most three models.

First looking at the acf and pacf plots the first model that comes to mind is MA(1). I came to this conclusion based on the spike at 1 and that there is no indication of a gradual decay.

par(mfrow=c(1,2))
acf(diff.gtemp, main = "acf"); pacf(diff.gtemp, main = "pacf")

b) Fit the model(s) which you chose in part (a) using sarima function and interpret the result. If you have multiple models, report all of them.

For model diagnostics we see MA(1) have a p-value of 0 which tells us that each point is significantly different from 0. Our Aic value of -1.76.

gt.cmort = sarima(diff.gtemp,0,0,1)
## initial  value -2.213853 
## iter   2 value -2.283711
## iter   3 value -2.285917
## iter   4 value -2.299585
## iter   5 value -2.304774
## iter   6 value -2.320202
## iter   7 value -2.323954
## iter   8 value -2.327090
## iter   9 value -2.328063
## iter  10 value -2.328249
## iter  11 value -2.328250
## iter  12 value -2.328250
## iter  13 value -2.328251
## iter  14 value -2.328251
## iter  14 value -2.328251
## iter  14 value -2.328251
## final  value -2.328251 
## converged
## initial  value -2.326014 
## iter   2 value -2.326045
## iter   3 value -2.326046
## iter   4 value -2.326047
## iter   5 value -2.326048
## iter   5 value -2.326048
## iter   5 value -2.326048
## final  value -2.326048 
## converged

gt.cmort
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1   xmean
##       -0.6550  0.0063
## s.e.   0.0792  0.0030
## 
## sigma^2 estimated as 0.0095:  log likelihood = 117.02,  aic = -228.03
## 
## $degrees_of_freedom
## [1] 127
## 
## $ttable
##       Estimate     SE t.value p.value
## ma1    -0.6550 0.0792 -8.2660  0.0000
## xmean   0.0063 0.0030  2.1092  0.0369
## 
## $AIC
## [1] -1.767707
## 
## $AICc
## [1] -1.766969
## 
## $BIC
## [1] -1.7012

c) Check model diagnostics and explain if your chosen model seems appropriate. If you have multiple models, choose the final one based on AIC and diagnostics plot.

Examining the diagnostics plot we can first see that within the ACF plot all insignificant autocorrelations telling us that there is evidence of white noise. Our Lyjung-Box also shows the smaller p-values also implying white noise. Based off of both of these indicators we can assume that it will most likely not be appropriate.

d) Make a prediction for next 4 values based on your final model.

-0.023226899, 0.006342873, 0.006342873, and 0.006342873

sarima.for(diff.gtemp, n.ahead=4, p=0, d=0, q=1)

## $pred
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] -0.023226899  0.006342873  0.006342873  0.006342873
## 
## $se
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] 0.09746909 0.11651634 0.11651634 0.11651634