Exercise 1: ACF and PACF from simulated data under ARMA(p,q) model

For simulated stationary data (n = 200) under each of the ARMA models listed below, (i) plot the observations, (ii) plot the ACF and PACF, and (iii) describe what you observe in the ACF and PACF.

a: Two sets of simulated stationary data under AR(3) model by specifying coefficients of your choice

#AR(3)
set.seed(9)
ar3.sim1 = arima.sim(list(order = c(3,0,0), ar = c(0.1, 0.2, 0.4)), n = 200)
ar3.sim2 = arima.sim(list(order = c(3,0,0), ar = c(0.2, 0.3, 0.4)), n = 200)

#plotting observations
par(mfrow=c(1,2))
plot.ts(ar3.sim1)
plot.ts(ar3.sim2)

#plotting ACF
par(mfrow=c(1,2))
acf(ar3.sim1, lwd = 3, main = "acf for ar3.sim1")
acf(ar3.sim2, lwd = 3, main = "acf for ar3.sim2")

#plotting PACF
par(mfrow=c(1,2))
pacf(ar3.sim1, lwd = 3, main = "pacf for ar3.sim1")
pacf(ar3.sim2, lwd = 3, main = "pacf for ar3.sim2")

Comment:

  • Plot observations: Both time plots are roughly horizontal, fluctuating around a constant level with no predictable long-term pattern. This is consistent with stationarity: constant mean, constant variance, and autocorrelation that depends only on the lag between observations, not on the time at which they occur.

  • ACF plot: The ACFs decay geometrically, with a roughly sinusoidal pattern (especially for the first simulation), gradually tailing off toward 0 rather than cutting off at a fixed lag.

  • PACF plot: The PACFs, in contrast, cut off sharply after lag 3, as expected for an AR(3) process, although a couple of later lags slightly exceed the significance bounds; a theoretical cross-check follows below.
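
The visual impression can be cross-checked against the theoretical ACF and PACF, which stats::ARMAacf computes exactly; a minimal sketch using the coefficients of the first simulation:

#theoretical ACF/PACF for the AR(3) coefficients of ar3.sim1
ARMAacf(ar = c(0.1, 0.2, 0.4), lag.max = 10)               #tails off gradually
ARMAacf(ar = c(0.1, 0.2, 0.4), lag.max = 10, pacf = TRUE)  #exactly 0 beyond lag 3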

b: Two sets of simulated stationary data under MA(3) model by specifying coefficients of your choice

#MA(3)
set.seed(9)
ma3.sim1 = arima.sim(list(order = c(0,0,3), ma = c(0.9, 0.6, 0.3)), n = 200)
ma3.sim2 = arima.sim(list(order = c(0,0,3), ma = c(0.4, -0.4, -0.2)), n = 200)

#plotting observations
par(mfrow=c(1,2))
plot.ts(ma3.sim1)
plot.ts(ma3.sim2)

#plotting ACF
par(mfrow=c(1,2))
acf(ma3.sim1, lwd = 3, main = "acf for ma3.sim1")
acf(ma3.sim2, lwd = 3, main = "acf for ma3.sim2")

#plotting PACF
par(mfrow=c(1,2))
pacf(ma3.sim1, lwd = 3, main = "pacf for ma3.sim1")
pacf(ma3.sim2, lwd = 3, main = "pacf for ma3.sim2")

Comment:

  • Plot observations: As with the AR(3) simulations, both series fluctuate around a constant level with no predictable long-term pattern, consistent with constant mean, constant variance, and autocorrelation that depends only on the lag.

  • ACF plot: For a moving average model we expect the opposite behavior from the ACF: rather than tailing off gradually, it cuts off sharply after lag 3, matching the MA(3) order.

  • PACF plot: Here it is the PACF that shows the geometric, sinusoidal-like behavior, gradually tailing off toward 0 instead of cutting off; the theoretical values sketched below confirm this.
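
As a cross-check, the theoretical ACF of an MA(3) process is exactly 0 beyond lag 3, while its PACF tails off; a quick sketch with stats::ARMAacf using the coefficients of the first simulation:

#theoretical ACF for ma3.sim1: exactly 0 beyond lag 3
ARMAacf(ma = c(0.9, 0.6, 0.3), lag.max = 6)
#theoretical PACF: tails off rather than cutting off
ARMAacf(ma = c(0.9, 0.6, 0.3), lag.max = 6, pacf = TRUE)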


c: Two sets of simulated stationary data under ARMA(2,2) by specifying coefficients of your choice

#ARMA(2,2)
set.seed(9)
arma22.sim1 <- arima.sim(list(order = c(2,0,2), ar = c(0.4, 0.2), ma = c(0.4, 0.1)), n = 200)
arma22.sim2 <- arima.sim(list(order = c(2,0,2), ar = c(0.4, 0.1), ma = c(0.2, 0.4)), n = 200)

#plotting observations
par(mfrow=c(1,2))
plot.ts(arma22.sim1)
plot.ts(arma22.sim2)

#plotting ACF
par(mfrow=c(1,2))
acf(arma22.sim1, lwd=3)
acf(arma22.sim2, lwd=3)

#plotting PACF
par(mfrow=c(1,2))
pacf(arma22.sim1, lwd=3)
pacf(arma22.sim2, lwd=3)

Comment:

  • Plot observations: As before, both series are roughly horizontal with no predictable long-term pattern, consistent with constant mean, constant variance, and autocorrelation that depends only on the lag.

  • ACF plots: Both ACF plots show geometric/sinusoidal behavior, tapering gradually toward 0 at later lags, the same tailing-off behavior seen for the AR(3) simulations in part (a).

  • PACF plots: Note that R's PACF plot starts at lag 1 rather than lag 0. The PACFs appear to cut off after the second significant lag, with no obvious geometric/sinusoidal pattern, although in theory the PACF of a mixed ARMA model tails off as well; a check that the chosen coefficients give a stationary, invertible process follows below.
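
Because the coefficients were chosen by hand, it is worth verifying that they define a stationary, invertible process; a small sketch checking that all roots of the AR and MA polynomials of arma22.sim1 lie outside the unit circle:

#moduli of the roots of phi(z) = 1 - 0.4z - 0.2z^2 and theta(z) = 1 + 0.4z + 0.1z^2
#(all moduli should exceed 1 for stationarity/invertibility)
Mod(polyroot(c(1, -0.4, -0.2)))  #AR polynomial of arma22.sim1
Mod(polyroot(c(1, 0.4, 0.1)))    #MA polynomial of arma22.sim1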


d: Two sets of simulated stationary data under ARMA(3,3) by specifying coefficients of your choice

#ARMA(3,3)
set.seed(9)
arma33.sim1 <- arima.sim(list(order = c(3,0,3), ar = c(0.5, 0.2, 0.1), ma = c(0.5, 0.5, 0.1)), n = 200)
arma33.sim2 <- arima.sim(list(order = c(3,0,3), ar = c(0.1, 0.1, 0.3), ma = c(0.1, 0.2, 0.5)), n = 200)

#plotting observations
par(mfrow=c(1,2))
plot.ts(arma33.sim1)
plot.ts(arma33.sim2)

#plotting ACF
par(mfrow=c(1,2))
acf(arma33.sim1, lwd=3)
acf(arma33.sim2, lwd=3)

#plotting PACF
par(mfrow=c(1,2))
pacf(arma33.sim1, lwd=3)
pacf(arma33.sim2, lwd=3)

Comment:

  • Plot observations: As in the previous parts, both series are roughly horizontal with no predictable long-term pattern, consistent with constant mean, constant variance, and autocorrelation that depends only on the lag.

  • ACF plots: The ARMA(3,3) ACFs again show geometric/sinusoidal behavior, tapering gradually toward 0 at later lags, just like the ARMA(2,2) and AR(3) cases above.

  • PACF plots: The PACFs appear to cut off around the third significant lag, with no clear geometric/sinusoidal pattern, though, as with ARMA(2,2), the theoretical PACF of a mixed model tails off rather than cutting off exactly.


Exercise 2: ACF and PACF from data examples

For the data examples listed below, check whether the data look stationary using the plotted observations and the ACF/PACF plots. Explain your reasoning.

a: Use the data cmort in astsa package – weekly cardiovascular mortality in Los Angeles County

#fetching data (cmort lives in the astsa package)
library(astsa)
data("cmort")

#plotted observations
plot.ts(cmort)

#plotting ACF and PACF
par(mfrow=c(1,2))
acf(cmort, main = "acf")
pacf(cmort, main = "pacf")

Comment: The plotted observations suggest the series is non-stationary: although it is not dramatic, there is a visible trend along with a recurring seasonal pattern. The ACF supports this, decaying only gradually as the lag increases, which indicates that the mean of the series depends on the time at which it is observed. In the PACF, only the first two lags stand out as clearly significant.
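
The visual check could be supplemented with a formal unit-root test, for example the augmented Dickey-Fuller test; a sketch assuming the tseries package is installed (note that the ADF test targets unit roots specifically, so it may not flag deterministic trend or seasonality):

#augmented Dickey-Fuller test; H0: the series has a unit root (non-stationary)
library(tseries)
adf.test(cmort)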


b: Use the data gtemp in astsa package – annual global mean land-ocean temperature deviations

#plotting observations
plot.ts(gtemp)

#plotting ACF/PACF
par(mfrow=c(1,2))
acf(gtemp, main = "acf")
pacf(gtemp, main = "pacf")

Comment: The plotted observations show that this series is non-stationary: there is a clearly increasing trend over time. Moreover, the ACF remains large even at long lags and, although it decreases gradually, it does not tail off to 0. This means the mean of the series depends on the time at which it is observed. As with the previous dataset, only the first two lags of the PACF stand out as significant.


c: Based on the data gtemp in astsa package – annual global mean land-ocean temperature deviations, same data as (b), use the 1 order differenced series

#first order difference (deterministic, so no seed is needed)
diff.gtemp = diff(gtemp, lag=1)
plot.ts(diff.gtemp)

Comment: After first-order differencing, the gtemp series looks stationary: the plot is roughly horizontal with no predictable long-term pattern, consistent with constant mean, constant variance, and autocorrelation that depends only on the lag. A rough numerical check is sketched below.
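
As a rough numerical check on the constant mean and variance, one can compare the two halves of the differenced series (a sketch; similar values across halves are consistent with stationarity):

#compare mean and variance across the two halves of diff.gtemp
half <- floor(length(diff.gtemp) / 2)
c(mean(diff.gtemp[1:half]), mean(diff.gtemp[-(1:half)]))  #means of each half
c(var(diff.gtemp[1:half]), var(diff.gtemp[-(1:half)]))    #variances of each half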


d: Use the data electricity in electricity.RData – monthly US electricity generated

load("electricity.RData")

#plotted observations
plot.ts(electricity)

#plotting ACF and PACF
par(mfrow=c(1,2))
acf(electricity, main = "acf")
pacf(electricity, main = "pacf")

Comment: The plotted observations show that this series is non-stationary: there is a clearly increasing trend over time, and the seasonal swings grow with the level of the series. Moreover, the ACF remains large even at long lags and does not tail off to 0, again indicating that the mean depends on the time at which the series is observed.


e: Based on the data electricity in electricity.RData – monthly US electricity generated, use the log transformed series.

plot.ts(log(electricity), main = "log transformed series")

Comment: The log transformation stabilizes the variance of the electricity series. In the original series the seasonal fluctuations grow with the level, which ranges from roughly 150,000 to 350,000; in the log-transformed series, which ranges only from about 12 to 12.8, the fluctuations are roughly constant in size. Note that the log transform addresses the growing variance, while the trend in the mean is handled by differencing in part (f).
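
A side-by-side plot makes the variance stabilization easy to see:

#original vs. log-transformed series, side by side
par(mfrow=c(1,2))
plot.ts(electricity, main = "original")
plot.ts(log(electricity), main = "log transformed")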


f: Based on the data electricity in electricity.RData – monthly US electricity generated, use 1 order differenced log transformed series

diff.log.electricity = diff(log(electricity), lag=1)
plot(diff.log.electricity)

Comment: After taking first differences of the log-transformed series, both the mean and the variance look much more stable: rather than increasing over time, the series now fluctuates around a roughly constant level near 0 with approximately constant spread. The remaining correlation structure can be examined as sketched below.
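
Monthly data often retain seasonal correlation after a single difference, so it is worth inspecting the correlograms of the transformed series (a quick sketch):

#check the remaining autocorrelation structure of the differenced log series
par(mfrow=c(1,2))
acf(diff.log.electricity, main = "acf")
pacf(diff.log.electricity, main = "pacf")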


Exercise 3: Model Fit

Use the data cmort in astsa package examined in Exercise 2

a: By visually checking, decide what ARMA model is appropriate, i.e., decide p, q in ARMA(p,q). Explain your answer.

#plotting ACF and PACF
par(mfrow=c(1,2))
acf(cmort, main = "acf")
pacf(cmort, main = "pacf")

Comment: The PACF cuts off sharply after lag 2, while the ACF tails off with a sinusoidal pattern. This combination is the signature of an autoregressive process, so AR(2), i.e., ARMA(2,0), seems an appropriate model.
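
If the visual identification feels ambiguous, comparing AIC over a small grid of candidate orders can support the choice; a sketch using stats::arima (lower AIC is better; some orders may emit convergence warnings):

#AIC over a small grid of candidate ARMA(p,q) orders for cmort
for (p in 0:2) {
  for (q in 0:2) {
    fit <- arima(cmort, order = c(p, 0, q))
    cat(sprintf("ARMA(%d,%d): AIC = %.1f\n", p, q, AIC(fit)))
  }
}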


b: Fit the model(s) which you chose in part (a) using sarima function in R and interpret the result

sarima(cmort, p=2, d=0, q=0)
## initial  value 2.300097 
## iter   2 value 2.148325
## iter   3 value 1.756362
## iter   4 value 1.751728
## iter   5 value 1.737837
## iter   6 value 1.737835
## iter   7 value 1.737834
## iter   8 value 1.737832
## iter   9 value 1.737831
## iter  10 value 1.737830
## iter  11 value 1.737829
## iter  12 value 1.737824
## iter  13 value 1.737818
## iter  14 value 1.737810
## iter  15 value 1.737805
## iter  16 value 1.737804
## iter  17 value 1.737804
## iter  18 value 1.737804
## iter  18 value 1.737804
## iter  18 value 1.737804
## final  value 1.737804 
## converged
## initial  value 1.740035 
## iter   2 value 1.740030
## iter   3 value 1.740016
## iter   4 value 1.740012
## iter   5 value 1.740006
## iter   6 value 1.740000
## iter   7 value 1.739986
## iter   8 value 1.739968
## iter   9 value 1.739959
## iter  10 value 1.739956
## iter  11 value 1.739954
## iter  12 value 1.739953
## iter  13 value 1.739952
## iter  14 value 1.739952
## iter  15 value 1.739952
## iter  16 value 1.739951
## iter  17 value 1.739951
## iter  18 value 1.739951
## iter  19 value 1.739950
## iter  20 value 1.739950
## iter  21 value 1.739949
## iter  22 value 1.739949
## iter  23 value 1.739949
## iter  24 value 1.739948
## iter  24 value 1.739948
## iter  24 value 1.739948
## final  value 1.739948 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans, 
##     fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2    xmean
##       0.4301  0.4424  88.8538
## s.e.  0.0397  0.0398   1.9407
## 
## sigma^2 estimated as 32.37:  log likelihood = -1604.71,  aic = 3217.43
## 
## $degrees_of_freedom
## [1] 505
## 
## $ttable
##       Estimate     SE t.value p.value
## ar1     0.4301 0.0397 10.8404       0
## ar2     0.4424 0.0398 11.1245       0
## xmean  88.8538 1.9407 45.7839       0
## 
## $AIC
## [1] 6.333521
## 
## $AICc
## [1] 6.333615
## 
## $BIC
## [1] 6.366832

Comment: The estimated coefficients are 0.43 for ar1 and 0.44 for ar2, which are quite close to each other, with an estimated mean of about 88.85. The p-values for ar1 and ar2 are essentially 0, so both coefficients are significantly different from 0 and the AR(2) specification looks reasonable; approximate confidence intervals are sketched below.
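
The significance can be made concrete with approximate 95% confidence intervals built from the reported estimates and standard errors (estimate plus or minus 1.96 times the SE); a quick sketch using the values in the table above:

#approximate 95% CIs for the AR coefficients; neither interval covers 0
est <- c(ar1 = 0.4301, ar2 = 0.4424)
se  <- c(0.0397, 0.0398)
cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)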


c: Check model diagnostics and explain if your chosen model seems appropriate.

Comment: For the model diagnostics we focus mainly on the ACF of the residuals and the Ljung-Box p-values.

  • ACF of Residuals: The residual autocorrelations are insignificant at all lags, with every spike falling inside the blue dashed significance bounds.

  • Ljung-Box: All the p-values are larger than 0.05, which is consistent with the residuals being white noise. In other words, the fitted values capture the systematic, correlated part of Y_t, so the model looks adequate; the same statistic can be computed by hand, as sketched below.
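
The Ljung-Box statistic shown in the diagnostic panel can also be computed directly with stats::Box.test, setting fitdf to the number of estimated ARMA coefficients; a sketch (assuming a version of astsa whose sarima accepts details = FALSE to suppress the trace):

#manual Ljung-Box test on the residuals at lag 12; fitdf = 2 for the two AR terms
fit <- sarima(cmort, p = 2, d = 0, q = 0, details = FALSE)
Box.test(resid(fit$fit), lag = 12, type = "Ljung-Box", fitdf = 2)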


d: Make a prediction for next 4 values based on your final model.

#forecasting
sarima.for(cmort, n.ahead=4, p=2, d=0, q=0)

## $pred
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1] 87.66207 86.85311 87.46615 87.37190
## 
## $se
## Time Series:
## Start = c(1979, 41) 
## End = c(1979, 44) 
## Frequency = 52 
## [1] 5.689543 6.193387 7.148343 7.612531

Comment: The next 4 predicted values for cmort based on the AR(2) model are 87.66, 86.85, 87.47, and 87.37, with standard errors of 5.69, 6.19, 7.15, and 7.61, respectively.
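
The standard errors translate into approximate 95% prediction intervals (prediction plus or minus 1.96 times the SE); a quick sketch from the forecast output:

#approximate 95% prediction intervals for the 4 forecasts
fore <- sarima.for(cmort, n.ahead = 4, p = 2, d = 0, q = 0)
cbind(lower = fore$pred - 1.96 * fore$se, upper = fore$pred + 1.96 * fore$se)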


Exercise 4: Model Fit

Use the data gtemp in astsa package examined in Exercise 2

a: By visually checking, decide what ARMA model is appropriate for 1 order differenced series. Explain your answer. If you see multiple ARMA models look appropriate, try all of them, but at most three models.

#first order difference, as in Exercise 2(c)
diff.gtemp = diff(gtemp, lag=1)
plot.ts(diff.gtemp)

#plotting ACF and PACF
par(mfrow=c(1,2))
acf(diff.gtemp, main = "acf")
pacf(diff.gtemp, main = "pacf")

Comment: Both plots point to more than one plausible model. In the ACF, the last clearly significant spike occurs at lag 9, after which the plot cuts off, so MA(9) is one candidate. The PACF shows a sinusoidal pattern and cuts off after lag 5, so AR(5) is another. We fit both models below.


b: Fit the model(s) which you chose in part (a) using sarima function and interpret the result. If you have multiple models, report all of them.

#AR(5) on the first-differenced series, i.e., ARIMA(5,1,0)
sarima(gtemp, p=5, d=1, q=0)
## initial  value -2.197587 
## iter   2 value -2.302454
## iter   3 value -2.338428
## iter   4 value -2.355919
## iter   5 value -2.356806
## iter   6 value -2.357706
## iter   7 value -2.358177
## iter   8 value -2.358216
## iter   9 value -2.358216
## iter   9 value -2.358216
## iter   9 value -2.358216
## final  value -2.358216 
## converged
## initial  value -2.369568 
## iter   2 value -2.369639
## iter   3 value -2.369649
## iter   4 value -2.369685
## iter   5 value -2.369688
## iter   6 value -2.369688
## iter   6 value -2.369688
## iter   6 value -2.369688
## final  value -2.369688 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5  constant
##       -0.4592  -0.4483  -0.3914  -0.0844  -0.1955    0.0064
## s.e.   0.0862   0.0952   0.0971   0.0949   0.0862    0.0032
## 
## sigma^2 estimated as 0.008693:  log likelihood = 122.65,  aic = -231.29
## 
## $degrees_of_freedom
## [1] 123
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1       -0.4592 0.0862 -5.3243  0.0000
## ar2       -0.4483 0.0952 -4.7114  0.0000
## ar3       -0.3914 0.0971 -4.0290  0.0001
## ar4       -0.0844 0.0949 -0.8898  0.3753
## ar5       -0.1955 0.0862 -2.2664  0.0252
## constant   0.0064 0.0032  1.9751  0.0505
## 
## $AIC
## [1] -1.792972
## 
## $AICc
## [1] -1.787634
## 
## $BIC
## [1] -1.637788

Comment: Looking at the p-values, all AR coefficients are significantly different from 0 except ar4 (p = 0.3753); the constant is borderline at p = 0.0505. The AIC is -1.79 and the BIC is -1.64.

#MA(9) on the first-differenced series, i.e., ARIMA(0,1,9)
sarima(gtemp, p=0, d=1, q=9)
## initial  value -2.213853 
## iter   2 value -2.349486
## iter   3 value -2.373034
## iter   4 value -2.379587
## iter   5 value -2.381940
## iter   6 value -2.383877
## iter   7 value -2.384270
## iter   8 value -2.384889
## iter   9 value -2.384960
## iter  10 value -2.384974
## iter  11 value -2.384975
## iter  12 value -2.384976
## iter  13 value -2.384976
## iter  13 value -2.384976
## iter  13 value -2.384976
## final  value -2.384976 
## converged
## initial  value -2.381100 
## iter   2 value -2.381135
## iter   3 value -2.381178
## iter   4 value -2.381205
## iter   5 value -2.381214
## iter   6 value -2.381226
## iter   7 value -2.381229
## iter   8 value -2.381230
## iter   9 value -2.381230
## iter   9 value -2.381230
## iter   9 value -2.381230
## final  value -2.381230 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1      ma2      ma3     ma4      ma5      ma6     ma7     ma8
##       -0.4660  -0.2519  -0.1097  0.2261  -0.0536  -0.0043  -0.011  0.1591
## s.e.   0.0872   0.0944   0.1030  0.1010   0.1020   0.1186   0.086  0.1222
##           ma9  constant
##       -0.1855    0.0063
## s.e.   0.1020    0.0025
## 
## sigma^2 estimated as 0.008475:  log likelihood = 124.14,  aic = -226.27
## 
## $degrees_of_freedom
## [1] 119
## 
## $ttable
##          Estimate     SE t.value p.value
## ma1       -0.4660 0.0872 -5.3450  0.0000
## ma2       -0.2519 0.0944 -2.6688  0.0087
## ma3       -0.1097 0.1030 -1.0649  0.2891
## ma4        0.2261 0.1010  2.2378  0.0271
## ma5       -0.0536 0.1020 -0.5255  0.6002
## ma6       -0.0043 0.1186 -0.0359  0.9714
## ma7       -0.0110 0.0860 -0.1281  0.8983
## ma8        0.1591 0.1222  1.3021  0.1954
## ma9       -0.1855 0.1020 -1.8184  0.0715
## constant   0.0063 0.0025  2.4973  0.0139
## 
## $AIC
## [1] -1.75404
## 
## $AICc
## [1] -1.739588
## 
## $BIC
## [1] -1.51018

Comment: Looking at the p-values, the only MA coefficients significantly different from 0 are ma1, ma2, and ma4 (p-value < 0.05); ma9 is borderline (p = 0.0715). The AIC is -1.75 and the BIC is -1.51, both higher (i.e., worse) than for the AR(5) model.


c: Check model diagnostics and explain if your chosen model seems appropriate. If you have multiple models, choose the final one based on AIC and diagnostics plot

Comment: For the model diagnostics we again focus on the ACF of the residuals and the Ljung-Box p-values. The two models give broadly similar diagnostic plots.

  • ACF of Residuals: For both models the residual autocorrelations are insignificant at all lags, with every spike falling inside the blue dashed significance bounds.

  • Ljung-Box: All the p-values are larger than 0.05, consistent with white-noise residuals. The fitted values capture the systematic, correlated part of Y_t for both models, so both pass the diagnostics.

  • Model Selection: The more appropriate model here is AR(5), since it has lower AIC and BIC than the MA(9) model and uses fewer coefficients, making it the simpler choice; the comparison can be pulled directly from the fits, as sketched below.
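
The information criteria used for this choice can be extracted programmatically from the two fits; a sketch (again assuming sarima accepts details = FALSE):

#side-by-side AIC/BIC for the two candidate models
fit.ar5 <- sarima(gtemp, p = 5, d = 1, q = 0, details = FALSE)
fit.ma9 <- sarima(gtemp, p = 0, d = 1, q = 9, details = FALSE)
rbind(AR5 = c(AIC = fit.ar5$AIC, BIC = fit.ar5$BIC),
      MA9 = c(AIC = fit.ma9$AIC, BIC = fit.ma9$BIC))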


d: Make a prediction for next 4 values based on your final model.

sarima.for(gtemp, n.ahead=4, p=5, d=1, q=0)

## $pred
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] 0.5525449 0.5821039 0.5439333 0.5869987
## 
## $se
## Time Series:
## Start = 2010 
## End = 2013 
## Frequency = 1 
## [1] 0.09323566 0.10599813 0.10970600 0.11172644

Comment: The next 4 predicted values for gtemp based on the final ARIMA(5,1,0) model are 0.55, 0.58, 0.54, and 0.59, with standard errors of 0.09, 0.11, 0.11, and 0.11, respectively.