a: Two sets of simulated stationary data under AR(3) model by specifying coefficients of your choice
#AR(3)
set.seed(9)
ar3.sim1 = arima.sim(list(order = c(3,0,0), ar = c(0.1, 0.2, 0.4)), n = 200)
ar3.sim2 = arima.sim(list(order = c(3,0,0), ar = c(0.2, 0.3, 0.4)), n = 200)
#plotting observations
par(mfrow=c(1,2))
plot.ts(ar3.sim1)
plot.ts(ar3.sim2)
#plotting ACF
par(mfrow=c(1,2))
acf(ar3.sim1, lwd = 3, main = "acf for ar3.sim1")
acf(ar3.sim2, lwd = 3, main = "acf for ar3.sim2")
#plotting PACF
par(mfrow=c(1,2))
pacf(ar3.sim1, lwd = 3, main = "pacf for ar3.sim1")
pacf(ar3.sim2, lwd = 3, main = "pacf for ar3.sim2")
Comment:
Plot observations: Both series fluctuate around a roughly horizontal level with no predictable long-term pattern. This is consistent with stationarity: constant mean, constant variance, and autocorrelation that depends only on the lag rather than on time.
ACF plot: Both ACF plots tail off gradually toward 0 in a geometric, somewhat sinusoidal fashion (especially for the first simulation), rather than cutting off at a fixed lag.
PACF plot: The PACF plots, by contrast, drop off sharply after lag 3, matching the AR(3) order, although a few later spikes marginally exceed the significance bands.
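As an optional cross-check (not part of the exercise), the theoretical ACF and PACF implied by the chosen coefficients can be computed with stats::ARMAacf; the theoretical ACF tails off, while the theoretical PACF of an AR(3) is exactly 0 beyond lag 3:
#theoretical ACF/PACF for the first AR(3) specification
ARMAacf(ar = c(0.1, 0.2, 0.4), lag.max = 10)
ARMAacf(ar = c(0.1, 0.2, 0.4), lag.max = 10, pacf = TRUE)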
b: Two sets of simulated stationary data under MA(3) model by specifying coefficients of your choice
#MA(3)
set.seed(9)
ma3.sim1 = arima.sim(list(order = c(0,0,3), ma = c(0.9, 0.6, 0.3)), n = 200)
ma3.sim2 = arima.sim(list(order = c(0,0,3), ma = c(0.4, -0.4, -0.2)), n = 200)
#plotting observations
par(mfrow=c(1,2))
plot.ts(ma3.sim1)
plot.ts(ma3.sim2)
#plotting ACF
par(mfrow=c(1,2))
acf(ma3.sim1, lwd = 3, main = "acf for ma3.sim1")
acf(ma3.sim2, lwd = 3, main = "acf for ma3.sim2")
#plotting PACF
par(mfrow=c(1,2))
pacf(ma3.sim1, lwd = 3, main = "pacf for ma3.sim1")
pacf(ma3.sim2, lwd = 3, main = "pacf for ma3.sim2")
Comment:
Plot observations: As with the AR(3) simulations, both series fluctuate around a roughly horizontal level with no predictable long-term pattern, again consistent with stationarity (constant mean, constant variance, lag-dependent autocorrelation).
ACF plot: For a moving average model we expect the reverse behavior: rather than tailing off gradually to 0, the ACF cuts off sharply after lag 3, matching the MA(3) order.
PACF plot: Here it is the PACF that shows the geometric, somewhat sinusoidal behavior, tailing off gradually toward 0.
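The mirror-image check (again optional): for an MA(3), ARMAacf gives a theoretical ACF that is exactly 0 beyond lag 3, while the theoretical PACF tails off:
#theoretical ACF/PACF for the first MA(3) specification
ARMAacf(ma = c(0.9, 0.6, 0.3), lag.max = 10)
ARMAacf(ma = c(0.9, 0.6, 0.3), lag.max = 10, pacf = TRUE)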
c: Two sets of simulated stationary data under ARMA(2,2) by specifying coefficients of your choice
#ARMA(2,2)
set.seed(9)
arma22.sim1 <- arima.sim(list(order = c(2,0,2), ar = c(0.4, 0.2), ma = c(0.4, 0.1)), n = 200)
arma22.sim2 <- arima.sim(list(order = c(2,0,2), ar = c(0.4, 0.1), ma = c(0.2, 0.4)), n = 200)
#plotting observations
par(mfrow=c(1,2))
plot.ts(arma22.sim1)
plot.ts(arma22.sim2)
#plotting ACF
par(mfrow=c(1,2))
acf(arma22.sim1, lwd=3)
acf(arma22.sim2, lwd=3)
#plotting PACF
par(mfrow=c(1,2))
pacf(arma22.sim1, lwd=3)
pacf(arma22.sim2, lwd=3)
Comment:
Plot observations: As before, both series fluctuate around a roughly horizontal level with no predictable long-term pattern, consistent with stationarity (constant mean, constant variance, lag-dependent autocorrelation).
ACF plots: Both ACF plots of the ARMA(2,2) model tail off gradually toward 0 in a geometric/sinusoidal fashion, much like the AR(3) case in part (a).
PACF plots: Note that R's pacf starts at lag 1, whereas acf starts at lag 0. The sample PACF appears to cut off after lag 2 with no clear geometric/sinusoidal pattern, although in theory the PACF of a mixed ARMA model tails off as well.
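Because the coefficients are our own choice, it is worth confirming that they define a stationary and invertible model. arima.sim already rejects a non-stationary ar part, but the roots of the AR and MA polynomials can be checked explicitly with polyroot (all moduli should exceed 1); the same check applies to the ARMA(3,3) coefficients in part (d):
#roots of the AR part 1 - 0.4z - 0.2z^2 and MA part 1 + 0.4z + 0.1z^2 of arma22.sim1
Mod(polyroot(c(1, -0.4, -0.2))) #stationarity: all moduli > 1
Mod(polyroot(c(1, 0.4, 0.1))) #invertibility: all moduli > 1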
d: Two sets of simulated stationary data under ARMA(3,3) by specifying coefficients of your choice
#ARMA(3,3)
set.seed(9)
arma33.sim1 <- arima.sim(list(order = c(3,0,3), ar = c(0.5, 0.2, 0.1), ma = c(0.5, 0.5, 0.1)), n = 200)
arma33.sim2 <- arima.sim(list(order = c(3,0,3), ar = c(0.1, 0.1, 0.3), ma = c(0.1, 0.2, 0.5)), n = 200)
#plotting observations
par(mfrow=c(1,2))
plot.ts(arma33.sim1)
plot.ts(arma33.sim2)
#plotting ACF
par(mfrow=c(1,2))
acf(arma33.sim1, lwd=3)
acf(arma33.sim2, lwd=3)
#plotting PACF
par(mfrow=c(1,2))
pacf(arma33.sim1, lwd=3)
pacf(arma33.sim2, lwd=3)
Comment:
Plot observations: As before, both series fluctuate around a roughly horizontal level with no predictable long-term pattern, consistent with stationarity (constant mean, constant variance, lag-dependent autocorrelation).
ACF plots: The ARMA(3,3) ACF plots again tail off gradually toward 0 in a geometric/sinusoidal fashion, as in the ARMA(2,2) and AR(3) cases.
PACF plots: The sample PACFs show a sharp drop after the third significant lag, again without a clear geometric/sinusoidal pattern, though in theory the PACF of a mixed model also tails off.
a: Use the data cmort in astsa package – weekly cardiovascular mortality in Los Angeles County
#fetching data (requires the astsa package)
library(astsa)
data("cmort")
#plotted observations
plot.ts(cmort)
#plotting ACF and PACF
par(mfrow=c(1,2))
acf(cmort, main = "acf")
pacf(cmort, main = "pacf")
Comment: The plotted observations suggest that cmort is non-stationary. Although the trend is not dramatic, the series shows a slowly changing level together with a recurring seasonal pattern. The ACF supports this: it decays only gradually as the lag increases, which indicates that the mean of the series depends on the time of observation. In the PACF, only the first two lags are clearly significant.
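A formal complement to the visual check (assuming the tseries package is installed) is the augmented Dickey-Fuller test, whose null hypothesis is a unit root; with trended, seasonal data like cmort its verdict should be read alongside the plots rather than instead of them:
#unit-root test as a complement to the visual diagnosis
library(tseries)
adf.test(cmort)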
b: Use the data gtemp in astsa package – annual global mean land-ocean temperature deviations
#plotting observations
plot.ts(gtemp)
#plotting ACF/PACF
par(mfrow=c(1,2))
acf(gtemp, main = "acf")
pacf(gtemp, main = "pacf")
Comment: The plotted observations show that gtemp is non-stationary: there is a clear increasing trend over time. Its ACF stays large even at long lags and, although it decreases gradually, it does not die out to 0 within the plotted window. Again, this indicates that the mean of the series depends on the time of observation. As with the previous dataset, only the first two lags of the PACF are clearly significant.
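Alternatively, if the forecast package is available, ndiffs estimates how many differences are needed for stationarity; its answer can be compared with the single difference applied in part (c):
#estimated number of differences required (KPSS test by default)
library(forecast)
ndiffs(gtemp)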
c: Based on the data gtemp in astsa package – annual global mean land-ocean temperature deviations, same data as (b), use the 1 order differenced series
#first order difference (deterministic, so no seed is needed)
diff.gtemp = diff(gtemp, lag=1)
plot.ts(diff.gtemp)
Comment: After first order differencing, the gtemp series looks stationary: the plot fluctuates around a roughly horizontal level with no predictable long-term pattern, consistent with constant mean, constant variance, and lag-dependent autocorrelation.
d: Use the data electricity in electricity.RData – monthly US electricity generated
load("electricity.RData")
#plotted observations
plot.ts(electricity)
#plotting ACF and PACF
par(mfrow=c(1,2))
acf(electricity, main = "acf")
pacf(electricity, main = "pacf")
Comment: The plotted observations show that electricity is non-stationary: there is a clear increasing trend over time, with seasonal swings that grow along with the level. Its ACF stays large even at long lags and does not tail off to 0, indicating that the mean of the series depends on the time of observation.
e: Based on the data electricity in electricity.RData – monthly US electricity generated, use the log transformed series.
plot.ts(log(electricity), main = "log transformed series")
Comment: The log transformation stabilizes the variance of the electricity series: on the original scale the seasonal fluctuations grow with the level, and taking logs makes them roughly constant. (Stabilizing the mean, i.e., removing the trend, is the job of the differencing in part (f).) The original observations range from about 150000 to 350000, while the log transformed series ranges only from about 12 to 12.8.
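A rough numerical check of the variance stabilization (a sketch, with the arbitrary choice of comparing the first and last 60 months): the spread grows with the level on the raw scale but much less so on the log scale:
#compare early vs late spread on the raw and log scales
early = window(electricity, end = time(electricity)[60])
late = window(electricity, start = time(electricity)[length(electricity) - 59])
c(raw = sd(late)/sd(early), log = sd(log(late))/sd(log(early)))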
f: Based on the data electricity in electricity.RData – monthly US electricity generated, use 1 order differenced log transformed series
diff.log.electricity = diff(log(electricity), lag=1)
plot.ts(diff.log.electricity)
Comment: After first order differencing of the log transformed series, both the mean and the variance look much more stable: the plot fluctuates around a horizontal level with roughly constant variation, and the increasing trend is gone.
cmort in astsa package examined in Exercise 2a: By visually checking, decide what ARMA model is appropriate, i.e., decide p, q in ARMA(p,q). Explain your answer.
#plotting ACF and PACF
par(mfrow=c(1,2))
acf(cmort, main = "acf")
pacf(cmort, main = "pacf")
Comment: The PACF cuts off sharply after lag 2, while the ACF tails off in a sinusoidal pattern. Together these point to an AR(2) model, i.e., ARMA(2,0).
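As an optional cross-check on the visual choice, a small grid search over low ARMA orders by AIC (a sketch; tryCatch guards against occasional non-convergence):
#AIC over a small grid of ARMA(p,q) orders for cmort
orders = expand.grid(p = 0:3, q = 0:3)
orders$aic = mapply(function(p, q)
  tryCatch(AIC(arima(cmort, order = c(p, 0, q))), error = function(e) NA),
  orders$p, orders$q)
orders[which.min(orders$aic), ]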
b: Fit the model(s) which you chose in part (a) using sarima function in R and interpret the result
sarima(cmort, p=2, d=0, q=0)
## initial value 2.300097
## (intermediate optimizer iterations omitted)
## final value 1.737804
## converged
## initial value 1.740035
## (intermediate optimizer iterations omitted)
## final value 1.739948
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 xmean
## 0.4301 0.4424 88.8538
## s.e. 0.0397 0.0398 1.9407
##
## sigma^2 estimated as 32.37: log likelihood = -1604.71, aic = 3217.43
##
## $degrees_of_freedom
## [1] 505
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.4301 0.0397 10.8404 0
## ar2 0.4424 0.0398 11.1245 0
## xmean 88.8538 1.9407 45.7839 0
##
## $AIC
## [1] 6.333521
##
## $AICc
## [1] 6.333615
##
## $BIC
## [1] 6.366832
Comment: The estimated coefficients are 0.43 for ar1 and 0.44 for ar2, which are quite similar in magnitude. The p-values for both are essentially 0, so both coefficients are significantly different from 0 and the AR(2) fit looks reasonable.
c: Check model diagnostics and explain if your chosen model seems appropriate.
Comment: In the diagnostic plots we focus mainly on the ACF of the residuals and on the Ljung-Box statistics.
ACF of Residuals: The residual autocorrelations are insignificant at all lags; the sample values stay inside the blue dashed significance bands.
Ljung-Box: All p-values are above 0.05, so there is no evidence against the residuals being white noise. In other words, the fitted values capture the correlated, systematic part of the series, and the model appears adequate.
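The Ljung-Box check can also be reproduced by hand at a single lag (a sketch; fit.ar2 is a hypothetical refit, and fitdf deducts the 2 estimated AR parameters from the degrees of freedom):
#manual Ljung-Box test on the AR(2) residuals at lag 12
fit.ar2 = arima(cmort, order = c(2, 0, 0))
Box.test(residuals(fit.ar2), lag = 12, type = "Ljung-Box", fitdf = 2)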
d: Make a prediction for next 4 values based on your final model.
#forecasting
sarima.for(cmort, n.ahead=4, p=2, d=0, q=0)
## $pred
## Time Series:
## Start = c(1979, 41)
## End = c(1979, 44)
## Frequency = 52
## [1] 87.66207 86.85311 87.46615 87.37190
##
## $se
## Time Series:
## Start = c(1979, 41)
## End = c(1979, 44)
## Frequency = 52
## [1] 5.689543 6.193387 7.148343 7.612531
Comment: The 4 predicted values for cmort based on AR(2) are 87.66, 86.85, 87.47, and 87.37; the corresponding standard errors are 5.69, 6.19, 7.15, and 7.61.
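Approximate 95% prediction intervals can be assembled from the returned pred and se components (a sketch using the usual pred ± 1.96·SE normal approximation):
#point forecasts with approximate 95% prediction intervals
fc = sarima.for(cmort, n.ahead = 4, p = 2, d = 0, q = 0)
cbind(pred = fc$pred, lower = fc$pred - 1.96*fc$se, upper = fc$pred + 1.96*fc$se)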
gtemp in astsa package examined in Exercise 2b: By visually checking, decide what ARMA model is appropriate for the 1 order differenced series. Explain your answer. If multiple ARMA models look appropriate, try all of them, but at most three models.
diff.gtemp = diff(gtemp, lag=1)
plot.ts(diff.gtemp)
#plotting ACF and PACF
par(mfrow=c(1,2))
acf(diff.gtemp, main = "acf")
pacf(diff.gtemp, main = "pacf")
Comment: Both plots suggest more than one plausible model. In the ACF, the last clearly significant spike is at lag 9, followed by a sharp drop, so MA(9) is one candidate. The PACF shows a sinusoidal pattern with a cutoff after lag 5, so AR(5) is another reasonable candidate.
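As an optional cross-check, R's ar function selects an AR order for the differenced series automatically by AIC; its choice need not agree with the visual one:
#automatic AR order selection (Yule-Walker) for the differenced series
ar(diff.gtemp, order.max = 10)$order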
b: Fit the model(s) which you chose in part (a) using sarima function and interpret the result. If you have multiple models, report all of them.
#AR(5)
sarima(gtemp, p=5, d=1, q=0)
## initial value -2.197587
## (intermediate optimizer iterations omitted)
## final value -2.358216
## converged
## initial value -2.369568
## (intermediate optimizer iterations omitted)
## final value -2.369688
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 constant
## -0.4592 -0.4483 -0.3914 -0.0844 -0.1955 0.0064
## s.e. 0.0862 0.0952 0.0971 0.0949 0.0862 0.0032
##
## sigma^2 estimated as 0.008693: log likelihood = 122.65, aic = -231.29
##
## $degrees_of_freedom
## [1] 123
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.4592 0.0862 -5.3243 0.0000
## ar2 -0.4483 0.0952 -4.7114 0.0000
## ar3 -0.3914 0.0971 -4.0290 0.0001
## ar4 -0.0844 0.0949 -0.8898 0.3753
## ar5 -0.1955 0.0862 -2.2664 0.0252
## constant 0.0064 0.0032 1.9751 0.0505
##
## $AIC
## [1] -1.792972
##
## $AICc
## [1] -1.787634
##
## $BIC
## [1] -1.637788
Comment: From the p-values, all AR coefficients are significantly different from 0 except ar4 (p = 0.38); the constant is borderline (p ≈ 0.05). The AIC is -1.79 and the BIC is -1.64.
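Since ar4 is insignificant, one possible refinement (a sketch, not required by the exercise) is to refit with that coefficient constrained to 0 via arima's fixed argument; transform.pars must be FALSE when individual AR terms are fixed, and this plain arima call omits the constant that sarima adds, so its AIC is not directly comparable to sarima's scaled version:
#restricted AR(5) with ar4 fixed at 0 (hypothetical refinement)
fit.restricted = arima(gtemp, order = c(5, 1, 0), fixed = c(NA, NA, NA, 0, NA), transform.pars = FALSE)
fit.restricted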
#MA(9)
sarima(gtemp, p=0, d=1, q=9)
## initial value -2.213853
## (intermediate optimizer iterations omitted)
## final value -2.384976
## converged
## initial value -2.381100
## (intermediate optimizer iterations omitted)
## final value -2.381230
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## -0.4660 -0.2519 -0.1097 0.2261 -0.0536 -0.0043 -0.011 0.1591
## s.e. 0.0872 0.0944 0.1030 0.1010 0.1020 0.1186 0.086 0.1222
## ma9 constant
## -0.1855 0.0063
## s.e. 0.1020 0.0025
##
## sigma^2 estimated as 0.008475: log likelihood = 124.14, aic = -226.27
##
## $degrees_of_freedom
## [1] 119
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.4660 0.0872 -5.3450 0.0000
## ma2 -0.2519 0.0944 -2.6688 0.0087
## ma3 -0.1097 0.1030 -1.0649 0.2891
## ma4 0.2261 0.1010 2.2378 0.0271
## ma5 -0.0536 0.1020 -0.5255 0.6002
## ma6 -0.0043 0.1186 -0.0359 0.9714
## ma7 -0.0110 0.0860 -0.1281 0.8983
## ma8 0.1591 0.1222 1.3021 0.1954
## ma9 -0.1855 0.1020 -1.8184 0.0715
## constant 0.0063 0.0025 2.4973 0.0139
##
## $AIC
## [1] -1.75404
##
## $AICc
## [1] -1.739588
##
## $BIC
## [1] -1.51018
Comment: From the p-values, the only MA coefficients significantly different from 0 are ma1, ma2, and ma4 (p < 0.05); the constant is also significant, and ma9 is borderline (p = 0.07). The AIC is -1.75 and the BIC is -1.51, both higher (worse) than for the AR(5) model.
c: Check model diagnostics and explain if your chosen model seems appropriate. If you have multiple models, choose the final one based on AIC and diagnostics plot
Comment: In the diagnostic plots we again focus mainly on the ACF of the residuals and on the Ljung-Box statistics. Both models give essentially the same diagnostics.
ACF of Residuals: The residual autocorrelations are insignificant at all lags; the sample values stay inside the blue dashed significance bands.
Ljung-Box: All p-values are above 0.05, consistent with white-noise residuals, so both models fit adequately.
Model Selection: AR(5) is the more appropriate model: it has lower AIC and BIC than MA(9), and it uses fewer coefficients, making it the simpler model.
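The criteria used in this comparison can also be collected programmatically (a sketch; sarima's details = FALSE argument suppresses the fitting trace):
#collecting the information criteria of both candidates side by side
m.ar5 = sarima(gtemp, 5, 1, 0, details = FALSE)
m.ma9 = sarima(gtemp, 0, 1, 9, details = FALSE)
rbind(AR5 = c(AIC = m.ar5$AIC, BIC = m.ar5$BIC), MA9 = c(AIC = m.ma9$AIC, BIC = m.ma9$BIC))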
d: Make a prediction for next 4 values based on your final model.
sarima.for(gtemp, n.ahead=4, p=5, d=1, q=0)
## $pred
## Time Series:
## Start = 2010
## End = 2013
## Frequency = 1
## [1] 0.5525449 0.5821039 0.5439333 0.5869987
##
## $se
## Time Series:
## Start = 2010
## End = 2013
## Frequency = 1
## [1] 0.09323566 0.10599813 0.10970600 0.11172644
Comment: The 4 predicted values for gtemp based on AR(5) are 0.55, 0.58, 0.54, and 0.59; the corresponding standard errors are 0.09, 0.11, 0.11, and 0.11.