Data: Built-in data in library “astsa” and “electricity.RData”.
# install packages as needed
#install.packages("astsa")
# setup
library(astsa)
setwd("C:/Users/Emily/Dropbox/My PC (DESKTOP-39IBD6A)/Desktop/Brit School/Spring 2021/Data Decision/hw1")
load(file = "electricity.RData")
# is.na(electricity) No missing values..
set.seed(100)
# pull data from astsa package
data(cmort)
data(gtemp)
ACF and PACF from simulated data under ARMA(p,q) model For simulated stationary data (n=200) under different ARMA models listed below, (i) plot observations, (ii) plot its ACF and PACF plots, and (iii) describe what you observe from ACF and PACF.
NOTE: for the case of nonstationary model, R will give a warning message.
AR(3)
ar3.sim <- arima.sim(list(order =c(3,0,0), ar=c(0.5, -0.3, .08)), n = 200)
ar3.sim2 <- arima.sim(list(order =c(3,0,0), ar=c(0.8, -0.5, .06)), n = 200)
par(mfrow=c(2, 3))
# sim 1
plot.ts(ar3.sim, main = "sim 1")
acf(ar3.sim, lwd= 3, main = "acf - Sim 1")
pacf(ar3.sim, lwd = 3, main = "pcf - sim 1")
# sim 2
plot.ts(ar3.sim2, main = "sim 2")
acf(ar3.sim2, lwd = 3, main = "acf")
pacf(ar3.sim2, lwd = 3, main = "pcf")
Both simulations look similar in their PCF, but the raw data looks different. The 2 ACFs also look slightly different at the 2nd lag.
MA(3)
ma3.sim <- arima.sim(list(order =c(0,0,3), ma = c(0.5, 0.8, 0.3)), n = 200)
ma3.sim2 <- arima.sim(list(order =c(0,0,3), ma = c(0.5, -0.8, 0.4)), n = 200)
par(mfrow=c(2,3))
#sim 1
plot.ts(ma3.sim, main = "sim 1")
acf(ma3.sim, lwd = 3, main = "acf - sim 1")
pacf(ma3.sim, lwd = 3, main = "pcf - sim 1")
# sim 2
plot.ts(ma3.sim2, main = "sim 2")
acf(ma3.sim2, lwd = 3, main = "acf - sim 2")
pacf(ma3.sim2, lwd = 3, main = "pcf - sim 2")
The 2 simulated datasets look very different here, even though they are both MA(3). While both ACFs gradually plateua, they do so in very different patterns. The pattern of the 2 PACFs look almost opposite.
ARMA(2,2)
arma22.sim1 <- arima.sim(list(order =c(2,0,2), ar=c(0.5, 0.3), ma = c(-0.5, 0.8)), n = 200)
arma22.sim2 <- arima.sim(list(order =c(2,0,2), ar=c(0.3, 0.5), ma = c(-0.3, 0.6)), n = 200)
par(mfrow=c(2,3))
# sim 1
plot.ts(arma22.sim1, main = "sim 1")
acf(arma22.sim1, lwd = 3, main = "acf - sim 1")
pacf(arma22.sim1, lwd = 3, main = "pcf - sim 1")
# sim 2
plot.ts(arma22.sim2, main = "sim 2")
acf(arma22.sim2, lwd = 3, main = "acf - sim 2")
pacf(arma22.sim2, lwd = 3, main = "pcf - sim 2")
These 2 simulated timeseries look very similar in raw data, ACF and PACF. That makes sense since I did not change the coefficients by much for the 2 simulations.
ARMA(3,3)
arma33.sim1 <- arima.sim(list(order =c(3,0,3), ar=c(0.5, -0.3, -0.5), ma = c(0.5, 0.3, 0.8)), n = 200)
arma33.sim2 <- arima.sim(list(order =c(3,0,3), ar=c(0.3, 0.3, -0.8), ma = c(-0.5, 0.4, 0.6)), n = 200)
par(mfrow=c(2, 3))
# sim 1
plot.ts(arma33.sim1, main = "sim 1")
acf(arma33.sim1, lwd = 3, main = "acf - sim 1")
pacf(arma33.sim1, lwd = 3, main = "pcf - sim 1")
# sim 2
plot.ts(arma33.sim2, main = "sim 2")
acf(arma33.sim2, lwd = 3, main = "acf - sim 2")
pacf(arma33.sim2, lwd = 3, main = "pcf - sim 2")
The 2 simluated timeseries here look very different in their ACFs, with some back and forth, grouped together pattern in sim 1 with more immediately back and forth in sim 2. The PACFs seem to be nearly opposite.
For data examples listed below, check whether the data look stationary using plotted observations or ACF/ PACF plots. Expain the reason.
par(mfrow=c(1, 3))
plot.ts(cmort)
acf(cmort)
pacf(cmort)
The data does not look stationary. There appears to be a linear trend
par(mfrow=c(1, 3))
plot.ts(gtemp)
acf(gtemp)
pacf(gtemp)
The data is not stationary. There is definately a linear trend that begins increases around 1980.
par(mfrow=c(1, 3))
gtemp.diff = diff(gtemp, differences = 1)
plot.ts(gtemp.diff)
acf(gtemp.diff)
pacf(gtemp.diff)
After taking the first difference, the data appears to be stationary. There is no pattern I can see, it seems random around 0.
par(mfrow=c(1, 3))
plot.ts(electricity)
acf(electricity)
pacf(electricity)
The data is not stationary, there is a clear linear trend increasing.
par(mfrow=c(1, 3))
electricity.log = log(electricity)
plot.ts(electricity.log)
acf(electricity.log)
pacf(electricity.log)
The data is not stationary. There is a clear linear trend weven with the log transformation.
par(mfrow=c(1, 3))
electricity.log.diff = diff(electricity.log, differences = 1)
plot.ts(electricity.log.diff)
acf(electricity.log.diff)
pacf(electricity.log.diff)
The data is stationary. The data appears to be random around 0.
Use the data “cmort” in astsa package examined in Exercise 2 a) By visually checking, decide what ARMA model is appropriate, i.e., decide p, q in ARMA(p,q). Explain your answer.
plot.ts(cmort)
From the plot above, the data is not stationary. So the first thing I want to do is apply 1 difference to see if the data becomes stationary after that.
cmort.diff = diff(cmort, differences = 1)
plot.ts(cmort.diff)
The data now appears to be stationary, so I’ll plot the ACF and PACF to try and decide p,q
acf(cmort.diff)
pacf(cmort.diff)
Based on the ACF, I’d chose p = 2 from the spikes in the ACF. From the PACF, I think q = 1 will work.
But I’d like to explore p = 1 as well. Next I’ll fit a ARMA(2,1) model and an ARMA(1,1) model to the data that has taken the 1st difference.
arma21 = sarima(cmort.diff, 2, 0, 1)
## initial value 1.907405
## iter 2 value 1.856514
## iter 3 value 1.764037
## iter 4 value 1.758706
## iter 5 value 1.756237
## iter 6 value 1.756227
## iter 7 value 1.756194
## iter 8 value 1.756122
## iter 9 value 1.755948
## iter 10 value 1.755652
## iter 11 value 1.755529
## iter 12 value 1.755462
## iter 13 value 1.755461
## iter 14 value 1.755459
## iter 15 value 1.755453
## iter 16 value 1.755443
## iter 17 value 1.755433
## iter 18 value 1.755426
## iter 19 value 1.755425
## iter 19 value 1.755425
## final value 1.755425
## converged
## initial value 1.756350
## iter 2 value 1.756343
## iter 3 value 1.756340
## iter 4 value 1.756340
## iter 4 value 1.756340
## iter 4 value 1.756340
## final value 1.756340
## converged
arma11 = sarima(cmort.diff, 1, 0, 1)
## initial value 1.908706
## iter 2 value 1.864950
## iter 3 value 1.758481
## iter 4 value 1.758244
## iter 5 value 1.758051
## iter 6 value 1.757827
## iter 7 value 1.757772
## iter 8 value 1.757764
## iter 8 value 1.757764
## final value 1.757764
## converged
## initial value 1.757729
## iter 2 value 1.757725
## iter 3 value 1.757720
## iter 4 value 1.757715
## iter 5 value 1.757713
## iter 6 value 1.757713
## iter 6 value 1.757713
## iter 6 value 1.757713
## final value 1.757713
## converged
arma21$AIC
## [1] 6.37028
arma11$AIC
## [1] 6.369082
Looking at the AIC, I will go with ARMA(1,1) because ARMA(2,1) had an AIC of 6.37028 and ARMA(1,1) had AIC of 6.369082
sarima(cmort.diff, 1, 0, 1)
## initial value 1.908706
## iter 2 value 1.864950
## iter 3 value 1.758481
## iter 4 value 1.758244
## iter 5 value 1.758051
## iter 6 value 1.757827
## iter 7 value 1.757772
## iter 8 value 1.757764
## iter 8 value 1.757764
## final value 1.757764
## converged
## initial value 1.757729
## iter 2 value 1.757725
## iter 3 value 1.757720
## iter 4 value 1.757715
## iter 5 value 1.757713
## iter 6 value 1.757713
## iter 6 value 1.757713
## iter 6 value 1.757713
## final value 1.757713
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 xmean
## -0.3777 -0.1719 -0.0269
## s.e. 0.0882 0.0967 0.1549
##
## sigma^2 estimated as 33.61: log likelihood = -1610.56, aic = 3229.12
##
## $degrees_of_freedom
## [1] 504
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.3777 0.0882 -4.2842 0.0000
## ma1 -0.1719 0.0967 -1.7770 0.0762
## xmean -0.0269 0.1549 -0.1735 0.8623
##
## $AIC
## [1] 6.369082
##
## $AICc
## [1] 6.369176
##
## $BIC
## [1] 6.402443
The model seems appropriate based on the standardized residuals and QQ plot, but the graph of p values for Ljung-Box statistic do not seem to suggest a good fit for the model.
predict(arima(cmort.diff, order = c(1, 0, 1)), n.ahead = 4)$pred
## Time Series:
## Start = c(1979, 41)
## End = c(1979, 44)
## Frequency = 52
## [1] 1.06291191 -0.43853155 0.12863308 -0.08561123
Use the data “gtemp” in astsa package examined in Exercise 2
gtemp.diff = diff(gtemp, differences = 1)
acf(gtemp.diff)
pacf(gtemp.diff)
I want to try ARMA(1, 3) model for this data. There is only 1 spike in the ACF and 3 spikes in the PACF.
sarima(gtemp.diff, 1, 0, 3)
## initial value -2.211269
## iter 2 value -2.328791
## iter 3 value -2.335632
## iter 4 value -2.337057
## iter 5 value -2.344885
## iter 6 value -2.345040
## iter 7 value -2.345467
## iter 8 value -2.345676
## iter 9 value -2.346966
## iter 10 value -2.349997
## iter 11 value -2.352061
## iter 12 value -2.352665
## iter 13 value -2.355933
## iter 14 value -2.358148
## iter 15 value -2.359904
## iter 16 value -2.360401
## iter 17 value -2.360972
## iter 18 value -2.363412
## iter 19 value -2.363529
## iter 20 value -2.363654
## iter 21 value -2.363678
## iter 22 value -2.363681
## iter 22 value -2.363681
## final value -2.363681
## converged
## initial value -2.365998
## iter 2 value -2.366003
## iter 3 value -2.366007
## iter 4 value -2.366009
## iter 5 value -2.366010
## iter 5 value -2.366010
## iter 5 value -2.366010
## final value -2.366010
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 ma2 ma3 xmean
## -0.9376 0.4846 -0.6341 -0.2857 0.0065
## s.e. 0.0946 0.1165 0.0955 0.0866 0.0025
##
## sigma^2 estimated as 0.008755: log likelihood = 122.17, aic = -232.34
##
## $degrees_of_freedom
## [1] 124
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.9376 0.0946 -9.9090 0.0000
## ma1 0.4846 0.1165 4.1612 0.0001
## ma2 -0.6341 0.0955 -6.6373 0.0000
## ma3 -0.2857 0.0866 -3.2988 0.0013
## xmean 0.0065 0.0025 2.6171 0.0100
##
## $AIC
## [1] -1.80112
##
## $AICc
## [1] -1.797338
##
## $BIC
## [1] -1.668105
arma13 = sarima(gtemp.diff, 1, 0, 3)
## initial value -2.211269
## iter 2 value -2.328791
## iter 3 value -2.335632
## iter 4 value -2.337057
## iter 5 value -2.344885
## iter 6 value -2.345040
## iter 7 value -2.345467
## iter 8 value -2.345676
## iter 9 value -2.346966
## iter 10 value -2.349997
## iter 11 value -2.352061
## iter 12 value -2.352665
## iter 13 value -2.355933
## iter 14 value -2.358148
## iter 15 value -2.359904
## iter 16 value -2.360401
## iter 17 value -2.360972
## iter 18 value -2.363412
## iter 19 value -2.363529
## iter 20 value -2.363654
## iter 21 value -2.363678
## iter 22 value -2.363681
## iter 22 value -2.363681
## final value -2.363681
## converged
## initial value -2.365998
## iter 2 value -2.366003
## iter 3 value -2.366007
## iter 4 value -2.366009
## iter 5 value -2.366010
## iter 5 value -2.366010
## iter 5 value -2.366010
## final value -2.366010
## converged
arma13$fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 ma2 ma3 xmean
## -0.9376 0.4846 -0.6341 -0.2857 0.0065
## s.e. 0.0946 0.1165 0.0955 0.0866 0.0025
##
## sigma^2 estimated as 0.008755: log likelihood = 122.17, aic = -232.34
The p values for the Ljung-Box statistic don’t look close to 0 after lag 5.
I’m also going to look at ARMA(1, 2) just to see what it looks like
sarima(gtemp.diff, 1, 0, 2)
## initial value -2.211269
## iter 2 value -2.319512
## iter 3 value -2.330118
## iter 4 value -2.336819
## iter 5 value -2.338492
## iter 6 value -2.340501
## iter 7 value -2.340763
## iter 8 value -2.341026
## iter 9 value -2.341472
## iter 10 value -2.341919
## iter 11 value -2.341964
## iter 12 value -2.342001
## iter 13 value -2.342041
## iter 14 value -2.342062
## iter 15 value -2.342072
## iter 16 value -2.342077
## iter 17 value -2.342084
## iter 18 value -2.342096
## iter 19 value -2.342101
## iter 20 value -2.342102
## iter 21 value -2.342102
## iter 22 value -2.342102
## iter 22 value -2.342102
## iter 22 value -2.342102
## final value -2.342102
## converged
## initial value -2.346486
## iter 2 value -2.346529
## iter 3 value -2.346569
## iter 4 value -2.346631
## iter 5 value -2.346686
## iter 6 value -2.346869
## iter 7 value -2.346936
## iter 8 value -2.346943
## iter 9 value -2.346943
## iter 9 value -2.346943
## iter 9 value -2.346943
## final value -2.346943
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 ma2 xmean
## 0.0318 -0.5419 -0.1738 0.0065
## s.e. 0.2770 0.2644 0.1753 0.0025
##
## sigma^2 estimated as 0.009109: log likelihood = 119.71, aic = -229.43
##
## $degrees_of_freedom
## [1] 125
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.0318 0.2770 0.1147 0.9089
## ma1 -0.5419 0.2644 -2.0496 0.0425
## ma2 -0.1738 0.1753 -0.9917 0.3233
## xmean 0.0065 0.0025 2.5553 0.0118
##
## $AIC
## [1] -1.778491
##
## $AICc
## [1] -1.77599
##
## $BIC
## [1] -1.667645
arma12 = sarima(gtemp.diff, 1, 0, 2)
## initial value -2.211269
## iter 2 value -2.319512
## iter 3 value -2.330118
## iter 4 value -2.336819
## iter 5 value -2.338492
## iter 6 value -2.340501
## iter 7 value -2.340763
## iter 8 value -2.341026
## iter 9 value -2.341472
## iter 10 value -2.341919
## iter 11 value -2.341964
## iter 12 value -2.342001
## iter 13 value -2.342041
## iter 14 value -2.342062
## iter 15 value -2.342072
## iter 16 value -2.342077
## iter 17 value -2.342084
## iter 18 value -2.342096
## iter 19 value -2.342101
## iter 20 value -2.342102
## iter 21 value -2.342102
## iter 22 value -2.342102
## iter 22 value -2.342102
## iter 22 value -2.342102
## final value -2.342102
## converged
## initial value -2.346486
## iter 2 value -2.346529
## iter 3 value -2.346569
## iter 4 value -2.346631
## iter 5 value -2.346686
## iter 6 value -2.346869
## iter 7 value -2.346936
## iter 8 value -2.346943
## iter 9 value -2.346943
## iter 9 value -2.346943
## iter 9 value -2.346943
## final value -2.346943
## converged
arma12$fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 ma2 xmean
## 0.0318 -0.5419 -0.1738 0.0065
## s.e. 0.2770 0.2644 0.1753 0.0025
##
## sigma^2 estimated as 0.009109: log likelihood = 119.71, aic = -229.43
The Ljung-Box statistic looks much better for this model, but the AIC is slightly higher than the ARMA(1,3) model.
sarima(gtemp.diff, 1, 0, 2)
## initial value -2.211269
## iter 2 value -2.319512
## iter 3 value -2.330118
## iter 4 value -2.336819
## iter 5 value -2.338492
## iter 6 value -2.340501
## iter 7 value -2.340763
## iter 8 value -2.341026
## iter 9 value -2.341472
## iter 10 value -2.341919
## iter 11 value -2.341964
## iter 12 value -2.342001
## iter 13 value -2.342041
## iter 14 value -2.342062
## iter 15 value -2.342072
## iter 16 value -2.342077
## iter 17 value -2.342084
## iter 18 value -2.342096
## iter 19 value -2.342101
## iter 20 value -2.342102
## iter 21 value -2.342102
## iter 22 value -2.342102
## iter 22 value -2.342102
## iter 22 value -2.342102
## final value -2.342102
## converged
## initial value -2.346486
## iter 2 value -2.346529
## iter 3 value -2.346569
## iter 4 value -2.346631
## iter 5 value -2.346686
## iter 6 value -2.346869
## iter 7 value -2.346936
## iter 8 value -2.346943
## iter 9 value -2.346943
## iter 9 value -2.346943
## iter 9 value -2.346943
## final value -2.346943
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
## fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 ma2 xmean
## 0.0318 -0.5419 -0.1738 0.0065
## s.e. 0.2770 0.2644 0.1753 0.0025
##
## sigma^2 estimated as 0.009109: log likelihood = 119.71, aic = -229.43
##
## $degrees_of_freedom
## [1] 125
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.0318 0.2770 0.1147 0.9089
## ma1 -0.5419 0.2644 -2.0496 0.0425
## ma2 -0.1738 0.1753 -0.9917 0.3233
## xmean 0.0065 0.0025 2.5553 0.0118
##
## $AIC
## [1] -1.778491
##
## $AICc
## [1] -1.77599
##
## $BIC
## [1] -1.667645
Even though the AIC of ARMA(1,3) is slightly better, I’m going to choose ARMA(1,2) because of the better looking Ljung-Box statistic plot. It seems to suggest that ARMA(1,2) is a better fit for the data than ARMA(1,3)
predict(arima(gtemp.diff, order = c(1, 0, 2)), n.ahead = 4)$pred
## Time Series:
## Start = 2010
## End = 2013
## Frequency = 1
## [1] -0.008370590 -0.007246593 0.006029457 0.006451301