library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
All three plots indicate white noise because none of the autocorrelations exceed the critical values. The plots differ because the variance of the sample autocorrelations, and therefore the width of the critical bounds, shrinks as the number of observations grows.
The p = .05 critical value is defined as \[ \pm \frac{1.96}{\sqrt{T-d}} \] where T is the sample size and d is the amount of differencing applied. Consequently, as the sample size increases the critical value decreases, which explains the difference between the three plots above.
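As a quick numeric sketch of how the bound shrinks (the series lengths 36, 360 and 1,000 are illustrative placeholders, to be replaced by the actual lengths used above):
T.sizes <- c(36, 360, 1000)   # illustrative sample sizes
1.96 / sqrt(T.sizes)          # ACF critical bounds: approx. 0.327, 0.103, 0.062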
## Series: x
## ARIMA(0,1,0)
##
## sigma^2 estimated as 52.62: log likelihood=-1251.37
## AIC=2504.74 AICc=2504.75 BIC=2508.64
Both the ACF and PACF plots show significant autocorrelation, with spikes at many lags beyond the critical values. Additionally, the data does not appear to be normally distributed.
Here a Box-Cox transformation is inappropriate because the variance appears to be constant.
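A sketch of how the diagnostics discussed above can be produced, assuming the series has already been assigned to x as in the auto.arima() output:
par(mfrow = c(2,2))
acf(x)                   # autocorrelations
pacf(x)                  # partial autocorrelations
qqnorm(x); qqline(x)     # rough normality check
plot(x, main = 'x')      # level and variance check for the Box-Cox question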
## Series: x
## ARIMA(2,1,2) with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 drift
## -1.3032 -0.4332 1.5284 0.8340 66.1585
## s.e. 0.2122 0.2084 0.1417 0.1185 7.5595
##
## sigma^2 estimated as 2262: log likelihood=-283.34
## AIC=578.67 AICc=580.46 BIC=590.61
The plots show that, except for an initial spike in the ACF, none of the values exceed the critical value. Additionally, increasing the differencing order to 2 increases the errors at various lags. auto.arima() selects an ARIMA(2,1,2) model with drift as the best fit. The spikes at lags 1 and 12 were removed with differencing.
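A sketch of the differencing checks described above (again assuming the series is stored in x):
acf(diff(x))                    # ACF after a first difference
acf(diff(diff(x, lag = 12)))    # ACF after a seasonal (lag 12) difference plus a first difference
acf(diff(x, differences = 2))   # ACF after second-order differencing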
x <- usgdp
par(mfrow = c(2,2))
plot(x, main = 'x')
transformed <- BoxCox(x, lambda = BoxCox.lambda(x))
qqnorm(transformed, main = "Transformed Data")
qqline(transformed)
acf(transformed, main = "Box Cox Data")
data.diff <- diff(transformed, 1)
acf(data.diff, main = "Lag 1")
auto.arima(x)
## Series: x
## ARIMA(2,2,2)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## -0.1228 0.3106 -0.5835 -0.3669
## s.e. 0.2869 0.0872 0.3004 0.2862
##
## sigma^2 estimated as 1604: log likelihood=-1199.57
## AIC=2409.13 AICc=2409.39 BIC=2426.43
The auto.arima() function says the optimal differencing order is 2.
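As a sanity check (a small sketch), the forecast package's unit-root helper can be asked for a suggested number of differences directly:
ndiffs(transformed)   # number of regular differences suggested by a unit-root test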
x <- mcopper
par(mfrow = c(2,2))
plot(x, main = 'x')
transformed <- BoxCox(x, lambda = BoxCox.lambda(x))
qqnorm(transformed, main = "Transformed Data")
qqline(transformed)
data.diff <- diff(transformed, 1)
auto.arima(data.diff)
## Series: data.diff
## ARIMA(0,0,1) with zero mean
##
## Coefficients:
## ma1
## 0.3720
## s.e. 0.0388
##
## sigma^2 estimated as 0.04997: log likelihood=45.05
## AIC=-86.1 AICc=-86.08 BIC=-77.43
acf(transformed)
acf(data.diff)
This series is best modelled with first-order differencing; on the differenced data, auto.arima() selects an MA(1) model with zero mean.
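A sketch of a residual check on that MA(1) fit, refitting with Arima() so the diagnostics are easy to pull out:
fit <- Arima(data.diff, order = c(0, 0, 1), include.mean = FALSE)
checkresiduals(fit)   # residual time plot, ACF and Ljung-Box test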
x <- enplanements
par(mfrow = c(2,2))
plot(x, main = 'x')
transformed <- BoxCox(x, lambda = BoxCox.lambda(x))
qqnorm(transformed, main = "Transformed Data")
qqline(transformed)
data.diff <- diff(transformed, 1)   # first difference, used in the acf() call below
auto.arima(transformed)
## Series: transformed
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4012 -0.7479
## s.e. 0.0683 0.1174
##
## sigma^2 estimated as 0.0003159: log likelihood=698.48
## AIC=-1390.95 AICc=-1390.86 BIC=-1380.17
acf(transformed)
acf(data.diff)
This data benefits from first-order differencing; auto.arima() also applies a seasonal difference, selecting ARIMA(0,1,1)(0,1,1)[12].
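The combination of one seasonal and one regular difference can be checked with the forecast package's helpers (a quick sketch):
nsdiffs(transformed)                  # suggested number of seasonal differences
ndiffs(diff(transformed, lag = 12))   # regular differences needed after the seasonal one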
x <- visitors
par(mfrow = c(2,2))
plot(x, main = 'x')
transformed <- BoxCox(x, lambda = BoxCox.lambda(x))
qqnorm(transformed, main = "Transformed Data")
qqline(transformed)
acf(transformed)
auto.arima(transformed)
## Series: transformed
## ARIMA(0,1,1)(2,1,1)[12]
##
## Coefficients:
## ma1 sar1 sar2 sma1
## -0.3111 -0.2852 -0.1860 -0.5353
## s.e. 0.0684 0.1501 0.1123 0.1558
##
## sigma^2 estimated as 0.06172: log likelihood=-9.07
## AIC=28.14 AICc=28.41 BIC=45.27
data.diff <- diff(transformed, 1)
acf(data.diff)
First order differencing greatly improves this model.

## Problem 8.5
curl::curl_download("https://otexts.com/fpp2/extrafiles/retail.xlsx", "retail.xlsx")
retail <- readxl::read_excel("retail.xlsx", skip = 1)
ts <- ts(retail[,"A3349872X"], frequency = 12, start = c(1982,4))
autoplot(ts)
x <- ts
par(mfrow = c(2,2))
plot(x, main = 'x')
transformed <- BoxCox(x, lambda = BoxCox.lambda(x))
qqnorm(transformed, main = "Transformed Data")
qqline(transformed)
acf(transformed)
auto.arima(transformed)
## Series: transformed
## ARIMA(3,1,2)(0,1,2)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 sma1 sma2
## -0.0095 0.2964 0.3182 -0.4350 -0.4255 -0.6319 -0.1803
## s.e. 0.1628 0.1003 0.0642 0.1635 0.1268 0.0570 0.0565
##
## sigma^2 estimated as 0.0008774: log likelihood=770.42
## AIC=-1524.85 AICc=-1524.44 BIC=-1493.58
diff.data <- diff(transformed, lag = 2)   # difference at lag 2
diff.data <- diff(diff.data, lag = 1)     # then difference again at lag 1
acf(diff.data)
autoplot(diff.data)
Differencing twice, at lags 2 and 1, removes most of the lag spikes, and a Box-Cox transformation normalizes the data.
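A sketch of a residual check on the seasonal ARIMA model selected above (re-running the same auto.arima() call on the transformed series):
fit <- auto.arima(transformed)
checkresiduals(fit)   # residual ACF plus a Ljung-Box test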
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]   # AR(1) with phi1 = 0.6
plot(y)
z <- ts(numeric(100))
for(i in 2:100)
  z[i] <- 0.9*z[i-1] + e[i]   # AR(1) with phi1 = 0.9, reusing the same errors
plot(z)
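To compare the two simulations directly, a small sketch plotting them side by side:
par(mfrow = c(1, 2))
plot(y, main = "phi1 = 0.6")
plot(z, main = "phi1 = 0.9")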
Increasing \(\phi_1\) makes the series more persistent: successive values track each other more closely and the series wanders further from zero, rather than simply rescaling the contribution of \(\epsilon_t\).

### c
z <- ts(numeric(100))
for(i in 2:100)
z[i] <- e[i] + 0.6*e[i-1]   # MA(1) with theta1 = 0.6
plot(z)
z <- ts(numeric(100))
for(i in 2:100)
z[i] <- e[i] + 0.9*e[i-1]   # MA(1) with theta1 = 0.9
plot(z)
Increasing \(\theta_1\) mainly changes the scale of the series rather than its overall pattern.

### e
z1 <- ts(numeric(100))
for(i in 2:100){
z1[i] <- 0.6*z1[i-1] + 0.6*e[i-1] + e[i]   # ARMA(1,1) with phi1 = 0.6, theta1 = 0.6
}
plot(z1)
z <- ts(numeric(100))
for(i in 3:100){
  z[i] <- -0.8*z[i-1] + 0.3*z[i-2] + e[i]   # AR(2) with phi1 = -0.8, phi2 = 0.3
}
plot(z)
par(mfrow = c(2,2))
plot(z1)
plot(z)
acf(z1)
acf(z)
plot(z,z1)
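The difference in variability also shows up numerically (a quick sketch):
var(z1)   # ARMA(1,1) series
var(z)    # AR(2) series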
The AR(2) series is noticeably noisier, oscillating with growing amplitude, while the ARMA(1,1) series stays comparatively stable.

## Problem 8.7
x <- wmurders
plot(x)
transformed <- BoxCox(x, lambda = BoxCox.lambda(x))
par(mfrow = c(1,2))
qqnorm(x, main = "Data")
qqline(x)
qqnorm(transformed, main = "Transformed Data")
qqline(transformed)
acf(x)
auto.arima(x)
## Series: x
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
data.diff <- diff(transformed, differences = 2)   # second-order differencing, matching d = 2 in the ARIMA(1,2,1) fit
acf(data.diff)
This data benefits most from second-order differencing combined with first-order AR and MA terms, i.e. an ARIMA(1,2,1) model.
The constant is close to the mean of the differenced data. Because that number is small, the model is unlikely to be affected much whether or not the constant is included. (With two orders of differencing, a non-zero constant would also push the long-run forecasts toward a quadratic trend, which we don't want here.)
mean(data.diff)
## [1] 0.003160471
\[ (1-\phi_1 B)(1-B)^2 y_t = c + (1+\theta_1 B)\varepsilon_t \]
model <- Arima(transformed, order = c(1,2,1))
plot(residuals(model))
model
## Series: transformed
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.3006 -0.786
## s.e. 0.1529 0.119
##
## sigma^2 estimated as 0.002851: log likelihood=80.37
## AIC=-154.74 AICc=-154.25 BIC=-148.83
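Beyond eyeballing the residual plot, a sketch of a more formal check on this fit:
checkresiduals(model)   # residual ACF, histogram and Ljung-Box test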
The residuals seem sufficiently small.

### e
predictions <- predict(model, n.ahead = 3)
predictions
## $pred
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 0.8727642 0.8394207 0.8050389
##
## $se
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 0.05339028 0.07231055 0.09498005
ts.plot(transformed, predictions$pred)
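The same three-step forecast can be shown with prediction intervals via forecast() (an equivalent sketch):
autoplot(forecast(model, h = 3))   # point forecasts with 80% and 95% intervals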
### g

Yes. My model is the same as the one auto.arima() selects, and the forecasts below match those above.
model.arima <- auto.arima(transformed)
predictions <- predict(model.arima, n.ahead = 3)
predictions
## $pred
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 0.8727642 0.8394207 0.8050389
##
## $se
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 0.05339028 0.07231055 0.09498005
ts.plot(transformed, predictions$pred)