Exercise 1: ACF and PACF from simulated data under ARMA(p,q) model

For simulated stationary data (n=200) under different ARMA models listed below, (i) plot observations, (ii) plot its ACF and PACF plots, and (iii) describe what you observe from ACF and PACF.

a) Two sets of simulated stationary data under AR(3) model by specifying coefficients (𝜙1,𝜙2,𝜙3) with your choice NOTE: for the case of non-stationary model, R will give a warning message.

set.seed(100)
ar3.sim1 = arima.sim(list(order =c(3,0,0), ar=c(0.7, 0.2,-0.5)), n = 200)
ar3.sim2 = arima.sim(list(order =c(3,0,0), ar=c(-0.8, 0.1, 0.3)), n = 200)

par(mfrow=c(1,2))
plot.ts(ar3.sim1)
plot.ts(ar3.sim2)

acf(ar3.sim1, lwd = 3)
acf(ar3.sim2, lwd = 3)

pacf(ar3.sim1, lwd = 3)
pacf(ar3.sim2, lwd = 3)

For the ACF on both simulated models, we see that the data tails off to zero. For the PACF there is a cut off around lag=3.

b) Two sets of simulated stationary data under MA(3) model by specifying coefficients (𝜃1, 𝜃2, 𝜃3) with your choice

ma3.sim1 = arima.sim(list(order =c(0,0,3), ma=c(0.9, 0.6,-0.1)), n = 200)
ma3.sim2 = arima.sim(list(order =c(0,0,3), ma=c(0.1, 0.4, -0.4)), n = 200)

par(mfrow=c(1,2))
plot.ts(ma3.sim1)
plot.ts(ma3.sim2)

acf(ma3.sim1, lwd = 3)
acf(ma3.sim2, lwd = 3)

pacf(ma3.sim1, lwd = 3)
pacf(ma3.sim2, lwd = 3)

For the ACF on both simulated models, we see that the data tails off to zero. Actually, for the second model it is hard to tell whether it is tailing off or has a cut off at lag=4. For the PACF, they both seem to have a cutoff around lag=7.

c) Two sets of simulated stationary data under ARMA(2,2) by specifying coefficients (𝜙1,𝜙2, 𝜃1, 𝜃2) with your choice

ar2ma2.sim1 = arima.sim(list(order =c(2,0,2), ar=c(0.7, 0.2), ma=c(-0.3, 0.5)), n = 200)
ar2ma2.sim2 = arima.sim(list(order =c(2,0,2), ar=c(-0.3, 0.6), ma=c(-0.5, 0.1)), n = 200)

par(mfrow=c(1,2))
plot.ts(ar2ma2.sim1)
plot.ts(ar2ma2.sim2)

acf(ar2ma2.sim1, lwd = 3)
acf(ar2ma2.sim2, lwd = 3)

pacf(ar2ma2.sim1, lwd = 3)
pacf(ar2ma2.sim2, lwd = 3)

For both models we can see that the ACF tails off to zero. However, in the first model it tails off with all positive values while the second model tails off in a “funnel” looking pattern. In the PACF plots, we can see a clear cutoff for both. The cutoff for the first model is at lag=2. The cutoff for the second model is at lag=1.

d) Two sets of simulated stationary data under ARMA(3,3) by specifying coefficients (𝜙1,𝜙2,𝜙3, 𝜃1, 𝜃2, 𝜃3) with your choice

ar3ma3.sim1 = arima.sim(list(order =c(3,0,3), ar=c(0.3, -0.2, 0.4), ma=c(0.2, -0.7, 0.8)), n = 200)
ar3ma3.sim2 = arima.sim(list(order =c(3,0,3), ar=c(0.4, -0.5, -0.1), ma=c(0.6, -0.2, -0.7)), n = 200)

par(mfrow=c(1,2))
plot.ts(ar3ma3.sim1)
plot.ts(ar3ma3.sim2)

acf(ar3ma3.sim1, lwd = 3)
acf(ar3ma3.sim2, lwd = 3)

pacf(ar3ma3.sim1, lwd = 3)
pacf(ar3ma3.sim2, lwd = 3)

For the ACF plots, it looks as though both tail off but it is kind of hard to say. For the first model PACF, it seems as if there is a cutoff at lag=5. For the second model it also seems like there is a cutoff around lag=8.

Exercise 2: ACF and PACF from data examples

For data examples listed below, check whether the data looks stationary using plotted observations or ACF/ PACF plots. Explain the reasoning.

a) Use the data “cmort” in astsa package – weekly cardiovascular mortality in Los Angeles County

plot.ts(cmort)

acf(cmort, lwd = 3)

pacf(cmort, lwd = 3)

I would say that this data is stationary because it is roughly horizontal (constant mean), seems to have constant variance, and no major long-term pattern. It is sloping downwards a little bit but I don’t think it is enough of a trend to say this is non-stationary. Also, the ACF plot drops to zero very quickly which is indicative of stationary data.

b) Use the data “gtemp” in astsa package – annual global mean land-ocean temperature deviations

plot.ts(gtemp)

acf(gtemp, lwd = 3)

pacf(gtemp, lwd = 3)

I would say this data is non-stationary because there is a pretty obvious upward trend and the mean is clearly not constant.

c) Based on the data “gtemp” in astsa package – annual global mean land-ocean temperature deviations, same data as (b), use the 1 order differenced series, \(𝑌𝑡′ = 𝑌𝑡 − 𝑌𝑡−1\)

diff.gtemp = diff(gtemp, lag=1)
plot.ts(diff.gtemp)

acf(diff.gtemp, lwd = 3)

pacf(diff.gtemp, lwd = 3)

After performing a first order difference on gtemp, we can see that it now looks stationary. It appears to have a constant mean and constant variance. Variance increases a little bit over time, but I would not say that it is enough to call the data non-stationary. Also, we can see from the ACF plot that the value drops quickly to zero, which is also indicative of stationary data.

d) Use the data “electricity” in electricity.RData – monthly US electricity generated

load("C:/Users/bstok/Desktop/Park DA 6213 Data Drive Decision Making and Design/Homework 1/electricity.RData")
plot.ts(electricity)

acf(electricity, lwd = 3)

pacf(electricity, lwd = 3)

As we can see from the time series plot, this data is not stationary due to an upward trend in the data (non constant mean). There seems to also be heteroscedasticity as the variance seems to increase over time.

e) Based on the data “electricity” in electricity.RData – monthly US electricity generated, use the log transformed series, \(log(𝑌𝑡)\)

log.electricity = log(electricity)
plot.ts(log.electricity)

acf(log.electricity, lwd = 3)

pacf(log.electricity, lwd = 3)

After performing a log transform on electricity we see that the time series plot still seems to suggest the data is non stationary. Although the values are smaller, we still see an upward trend in the data (non constant mean). However, the data does seem to have homoscedasticity after the log transform.

f) Based on the data “electricity” in electricity.RData – monthly US electricity generated, use 1 order differenced log transformed series, \(log(𝑌𝑡)′ = log(𝑌𝑡) − log(𝑌𝑡−1)\)

diff.log.electricity = diff(log(electricity), lag=1)
plot.ts(diff.log.electricity)

acf(diff.log.electricity, lwd = 3)

pacf(diff.log.electricity, lwd = 3)

After performing the first order differenced log transform on the data, we can see that the data now appears to be stationary. We have constant mean and constant variance.

Exercise 3: Model Fit

Use the data “cmort” in astsa package examined in Exercise 2

a) By visually checking, decide what ARMA model is appropriate, i.e., decide p, q in ARMA(p,q). Explain your answer.

plot(cmort)

acf(cmort, lwd = 3)

pacf(cmort, lwd = 3)

From the ACF and PACF plots we can see that the ACF tails off to zero and the PACF has a cutoff at lag 2 indicating an AR(2) model may fit this data.

b) Fit the model(s) which you chose in part (a) using sarima function in R and interpret the result.

fit.cmort = sarima(cmort, p=2, d=0, q=0) 
## initial  value 2.300097 
## iter   2 value 2.148325
## iter   3 value 1.756362
## iter   4 value 1.751728
## iter   5 value 1.737837
## iter   6 value 1.737835
## iter   7 value 1.737834
## iter   8 value 1.737832
## iter   9 value 1.737831
## iter  10 value 1.737830
## iter  11 value 1.737829
## iter  12 value 1.737824
## iter  13 value 1.737818
## iter  14 value 1.737810
## iter  15 value 1.737805
## iter  16 value 1.737804
## iter  17 value 1.737804
## iter  18 value 1.737804
## iter  18 value 1.737804
## iter  18 value 1.737804
## final  value 1.737804 
## converged
## initial  value 1.740035 
## iter   2 value 1.740030
## iter   3 value 1.740016
## iter   4 value 1.740012
## iter   5 value 1.740006
## iter   6 value 1.740000
## iter   7 value 1.739986
## iter   8 value 1.739968
## iter   9 value 1.739959
## iter  10 value 1.739956
## iter  11 value 1.739954
## iter  12 value 1.739953
## iter  13 value 1.739952
## iter  14 value 1.739952
## iter  15 value 1.739952
## iter  16 value 1.739951
## iter  17 value 1.739951
## iter  18 value 1.739951
## iter  19 value 1.739950
## iter  20 value 1.739950
## iter  21 value 1.739949
## iter  22 value 1.739949
## iter  23 value 1.739949
## iter  24 value 1.739948
## iter  24 value 1.739948
## iter  24 value 1.739948
## final  value 1.739948 
## converged

The p-values for both of our AR coefficients are significant and our AIC value is 6.333521.

c) Check model diagnostics and explain if your chosen model seems appropriate.

Based on the diagnostic plots from part (b), we can see that all of our residuals appear as white noise as they are all within the boundaries. For our Ljung-Box statistic, we can see that all p-values are all above 0.05. After observing the behavior of both of these plots, we can say that our model is appropriate.

d) Make a prediction for next 4 values based on your final model.

sarima.for(cmort, 4, p=2, d=0, q=0)

## $pred
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1] 87.66207 86.85311 87.46615 87.37190
## 
## $se
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1] 5.689543 6.193387 7.148343 7.612531

We can see from the output that our model predictions are: 87.66207, 86.85311, 87.46615, and 87.37190.

Exercise 4: Model Fit

Use the data “gtemp” in astsa package examined in Exercise 2

a) By visually checking, decide what ARMA model is appropriate for 1 order differenced series, 𝑌𝑡′=𝑌𝑡−𝑌𝑡−1 . Explain your answer. If you see multiple ARMA models look appropriate, try all of them, but at most three models.

diff.gtemp = diff(gtemp, lag=1)
plot.ts(diff.gtemp)

acf(diff.gtemp, lwd = 3)

pacf(diff.gtemp, lwd = 3)

From the PACF and ACF plots, we notice that there is a cutoff at 2 on the ACF graph which indicates an MA(2) model may fit this data. Also, there appears to be a cutoff at 5 on the PACF graph which indicates an AR(5) model may fit this data. We will try fitting both of these models to the data.

b) Fit the model(s) which you chose in part (a) using sarima function and interpret the result. If you have multiple models, report all of them.

fit.diff.gtemp.ma2 = sarima(diff.gtemp, p=0, d=0, q=2) 
## initial  value -2.213853 
## iter   2 value -2.315754
## iter   3 value -2.346899
## iter   4 value -2.347576
## iter   5 value -2.348789
## iter   6 value -2.348964
## iter   7 value -2.349074
## iter   8 value -2.349076
## iter   8 value -2.349076
## final  value -2.349076 
## converged
## initial  value -2.346874 
## iter   2 value -2.346888
## iter   3 value -2.346891
## iter   4 value -2.346892
## iter   5 value -2.346893
## iter   5 value -2.346893
## iter   5 value -2.346893
## final  value -2.346893 
## converged

fit.diff.gtemp.ma2
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2   xmean
##       -0.5134  -0.1915  0.0065
## s.e.   0.0830   0.0780  0.0025
## 
## sigma^2 estimated as 0.00911:  log likelihood = 119.71,  aic = -231.41
## 
## $degrees_of_freedom
## [1] 126
## 
## $ttable
##       Estimate     SE t.value p.value
## ma1    -0.5134 0.0830 -6.1847  0.0000
## ma2    -0.1915 0.0780 -2.4552  0.0154
## xmean   0.0065 0.0025  2.5425  0.0122
## 
## $AIC
## [1] -1.793892
## 
## $AICc
## [1] -1.792404
## 
## $BIC
## [1] -1.705216
fit.diff.gtemp.ar5 = sarima(diff.gtemp, p=5, d=0, q=0) 
## initial  value -2.197587 
## iter   2 value -2.302454
## iter   3 value -2.338428
## iter   4 value -2.355919
## iter   5 value -2.356806
## iter   6 value -2.357706
## iter   7 value -2.358177
## iter   8 value -2.358216
## iter   9 value -2.358216
## iter   9 value -2.358216
## iter   9 value -2.358216
## final  value -2.358216 
## converged
## initial  value -2.369568 
## iter   2 value -2.369639
## iter   3 value -2.369649
## iter   4 value -2.369685
## iter   5 value -2.369688
## iter   6 value -2.369688
## iter   6 value -2.369688
## iter   6 value -2.369688
## final  value -2.369688 
## converged

fit.diff.gtemp.ar5
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5   xmean
##       -0.4592  -0.4483  -0.3914  -0.0844  -0.1955  0.0064
## s.e.   0.0862   0.0952   0.0971   0.0949   0.0862  0.0032
## 
## sigma^2 estimated as 0.008693:  log likelihood = 122.65,  aic = -231.29
## 
## $degrees_of_freedom
## [1] 123
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1    -0.4592 0.0862 -5.3243  0.0000
## ar2    -0.4483 0.0952 -4.7114  0.0000
## ar3    -0.3914 0.0971 -4.0290  0.0001
## ar4    -0.0844 0.0949 -0.8898  0.3753
## ar5    -0.1955 0.0862 -2.2664  0.0252
## xmean   0.0064 0.0032  1.9751  0.0505
## 
## $AIC
## [1] -1.792972
## 
## $AICc
## [1] -1.787634
## 
## $BIC
## [1] -1.637788

We can see that all of the coefficients for the MA(2) model are significant. However, not all of the coefficients from the AR(5) model are significant.

c) Check model diagnostics and explain if your chosen model seems appropriate. If you have multiple models, choose the final one based on AIC and diagnostics plot.

Based on the diagnostics plot from part (b), we can see that both of the models are appropriate because the residuals appear to be white noise as they are all within the boundaries and all p-values are all above 0.05 for our Ljung-Box statistic. We can see that both AIC values are similar but the MA(2) model is slightly better based on AIC criteria and is the simpler model, therefore we will choose the MA(2) model over the AR(5) model.

d) Make a prediction for next 4 values based on your final model.

sarima.for(diff.gtemp, 4, p=0, d=0, q=2)

## $pred
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] -0.007650622 -0.007983511  0.006464234  0.006464234
## 
## $se
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] 0.09544714 0.10729062 0.10883624 0.10883624

We can see from the output that our model predictions are: -0.007650622, -0.007983511, 0.006464234, and 0.006464234.