1.Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a.Explain the differences among these figures. Do they all indicate that the data are white noise? Yes, all three indicate white noise: at every lag the sample autocorrelation (on a scale of -1 to 1) lies within the dashed blue critical bounds, which is the indicator we look for.
b.Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values sit at different distances from the mean of zero because they depend on the sample size: the bounds are approximately ±1.96/√T, so as T grows from 36 to 360 to 1,000 the critical region shrinks toward zero. The autocorrelations differ across the figures because each is computed from a different random sample; for white noise the sample autocorrelations are themselves random, and they scatter less as T increases.
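A minimal sketch that reproduces the critical bounds for the three sample sizes (assuming the T values from the figure):
# white-noise ACF bounds: approximately +/- 1.96/sqrt(T)
for (T in c(36, 360, 1000)) {
  cat("T =", T, " bound = +/-", round(1.96/sqrt(T), 3), "\n")
}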
2.A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(readr)
library(ggplot2)
library(forecast)
library(GGally)
library(fpp2)
library(seasonal)
library(psych)
library(dplyr)
#install.packages("kableExtra")
library(kableExtra)
ibmclose #369 daily closing prices; output truncated
## Time Series:
## Start = 1
## End = 369
## Frequency = 1
## [1] 460 457 452 459 462 459 463 479 493 490 492 498 499 497 496 490 489 478
## ...
autoplot(ibmclose) + ylab("Closing Price for IBM") + xlab("Day")
plot(acf(ibmclose)) #the ACF decays very slowly and every lag is strongly positive: a clear sign of a trended, non-stationary series
plot(pacf(ibmclose)) #the PACF has a single dominant spike at lag 1 with correlation near 1, so one first difference should be enough
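A quick confirmation (a sketch using forecast::ndiffs, which estimates the number of first differences required):
ndiffs(ibmclose)          #estimated order of differencing
autoplot(diff(ibmclose))  #the differenced series fluctuates around a stable level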
3.For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelec usgdp mcopper enplanements visitors
summary(usnetelec)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 296.1 889.0 2040.9 1972.1 3002.7 3858.5
summary(usgdp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1568 2632 4552 5168 7130 11404
summary(mcopper)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 216.6 566.0 949.2 997.8 1262.5 4306.0
summary(enplanements)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.14 27.18 34.88 35.67 42.78 56.14
summary(visitors)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75.4 189.2 303.1 288.2 378.7 593.1
plot(BoxCox(usnetelec, lambda=0), main="usnetelec:US Net Electricity Generation")
plot(BoxCox(usgdp, lambda=0.5), main="usgdp: US GDP")
plot(BoxCox(mcopper, lambda=1/3), main="mcopper:Monthly Copper Prices")
plot(BoxCox(enplanements, lambda=0), main="Enplanements")
plot(BoxCox(visitors, lambda=0), main="visitors")
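Rather than guessing λ and the differencing order for each series, both can be estimated (a sketch using BoxCox.lambda(), nsdiffs() and ndiffs() from forecast; for the seasonal series a seasonal difference would usually be taken first):
series <- list(usnetelec=usnetelec, usgdp=usgdp, mcopper=mcopper,
               enplanements=enplanements, visitors=visitors)
for (nm in names(series)) {
  x   <- series[[nm]]
  lam <- BoxCox.lambda(x)                            #data-driven transformation
  D   <- if (frequency(x) > 1) nsdiffs(x) else 0     #seasonal differences needed
  d   <- ndiffs(BoxCox(x, lam))                      #regular differences needed
  cat(nm, ": lambda =", round(lam, 2), ", D =", D, ", d =", d, "\n")
}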
4.For the enplanements data, write down the differences you chose above using backshift operator notation.
#Apply a log transformation to the enplanements data
log.enp = log(enplanements)
log.enp #monthly values for 1979-2002; output truncated
## Jan Feb Mar Apr May Jun Jul Aug
## 1979 3.050220 3.132010 3.254243 3.193763 3.153163 3.289148 3.292126 3.357594
## ...
acf(log.enp)
acf(BoxCox(log.enp, lambda=0))
pacf(BoxCox(log.enp, lambda=0))
acf(BoxCox(log.enp, lambda=1/3))
pacf(BoxCox(log.enp, lambda=1/3))
acf(BoxCox(log.enp, lambda=1/2))
pacf(BoxCox(log.enp, lambda=1/2))
#Overall the ACF plots barely change while the PACF plots shift slightly as lambda varies; these BoxCox calls are applied on top of the already-logged series, so little difference is expected. The slow decay and seasonal pattern in the ACF show the logged series still needs differencing.
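In backshift notation, the chosen log transformation followed by one seasonal difference and one first difference (the usual orders for this monthly series, assumed here and checked below) gives (1 - B)(1 - B^12) log(y_t). A sketch:
nsdiffs(log(enplanements))                 #seasonal differences needed (D)
ndiffs(diff(log(enplanements), lag=12))    #regular differences after seasonal differencing (d)
stat.enp = diff(diff(log(enplanements), lag=12), differences=1)
autoplot(stat.enp)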
5.For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
#str(retail)
#retail <- read.table("~/Desktop/retail.xlsx", header=TRUE, quote="\"", stringsAsFactors=TRUE)
#retail.ts = ts(retail)
#retail.bc = BoxCox(retail.ts, lambda = 0)
#summary(retail.bc)
##Not sure why, but when I knit the document, an error is thrown for the retail dataset; a possible fix is sketched below.
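A sketch of one way the file could load cleanly (assuming the textbook's retail.xlsx layout, where the header sits on the second row and A3349873A is one example series column; the path and column name are assumptions to adjust for your file — read.table cannot parse .xlsx, which may explain the error):
retail <- readxl::read_excel("~/Desktop/retail.xlsx", skip = 1)
myts <- ts(retail[["A3349873A"]], frequency = 12, start = c(1982, 4))
myts.bc <- BoxCox(myts, BoxCox.lambda(myts))
nsdiffs(myts)     #seasonal differences needed
ndiffs(myts.bc)   #regular differences needed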
6.Use R to simulate and plot some data from simple ARIMA models.
y = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y[i] = 0.6*y[i-1] + e[i]
#Time plot
autoplot(y)
y.new = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y.new[i] = 0.1*y.new[i-1] + e[i]
plot(y.new)
#If we decrease phi1 (the AR(1) coefficient), the series stays closer to zero and mean-reverts faster
#The same simulation with phi1 = 0.6
y = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
plot(y)
#phi1 = 0.3 and phi1 = 1.2
y1 = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y1[i] <- 0.3*y1[i-1] + e[i]
plot(y1) #as noted earlier, the series clusters more tightly around zero when phi1 decreases
y2 = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y2[i] <- 1.2*y2[i-1] + e[i]
plot(y2) #with phi1 = 1.2 (|phi1| > 1) the process is non-stationary: it explodes instead of mean-reverting
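A small helper (a sketch) tidies these repeated simulations:
#simulate an AR(1) series y_t = phi*y_{t-1} + e_t with sigma^2 = 1
sim_ar1 <- function(phi, n = 100) {
  y <- ts(numeric(n))
  e <- rnorm(n)
  for (i in 2:n) y[i] <- phi * y[i-1] + e[i]
  y
}
autoplot(sim_ar1(0.3))
autoplot(sim_ar1(1.2)) #explosive, matching the plot above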
#with ϕ1=0.6, θ1=0.6 and σ^2=1; first regenerate the AR(1) part for reference
y = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
arima(y)
##
## Call:
## arima(x = y)
##
## Coefficients:
## intercept
## 0.1452
## s.e. 0.1215
##
## sigma^2 estimated as 1.475: log likelihood = -161.34, aic = 326.68
#now the ARMA(1,1) simulation itself
y.AR = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
y.AR[i] <- 0.6*y.AR[i-1] + 0.6*e[i-1] + e[i]
#plot(y.AR)
arima(y.AR)
##
## Call:
## arima(x = y.AR)
##
## Coefficients:
## intercept
## -0.1789
## s.e. 0.1731
##
## sigma^2 estimated as 2.998: log likelihood = -196.78, aic = 397.57
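arima() without an order argument estimates only an intercept (an ARIMA(0,0,0) model), which is why no phi or theta coefficients appear in the output above; a sketch that specifies the order so the generating parameters can be recovered:
Arima(y.AR, order = c(1, 0, 1), include.mean = FALSE) #estimates should land near phi1 = 0.6, theta1 = 0.6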
#with ϕ1=−0.8, ϕ2=0.3 and σ^2=1 (AR(2): ϕ2 multiplies y[i-2], so the loop starts at i = 3)
y.AR1 = ts(numeric(100))
e = rnorm(100)
for(i in 3:100)
y.AR1[i] <- -0.8*y.AR1[i-1] + 0.3*y.AR1[i-2] + e[i]
plot(y.AR)
plot(y.AR1)
#Here we see that y.AR1 oscillates with growing amplitude: these AR(2) parameters violate the stationarity conditions, so the series is non-stationary
7.Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. a.By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q ) model for these data. b.Should you include a constant in the model? Explain. c.Write this model in terms of the backshift operator. d.Fit the model using R and examine the residuals. Is the model satisfactory? e.Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated. f.Create a plot of the series with forecasts and prediction intervals for the next three periods shown. g.Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
summary(wmurders)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.233 2.609 3.463 3.393 4.055 4.492
describe(wmurders)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 55 3.39 0.75 3.46 3.4 0.97 2.23 4.49 2.26 -0.14 -1.57 0.1
autoplot(wmurders) + ylab("Rate of women murdered") + xlab("Year")
#ETS model
wm.ets <- ets(wmurders, model="ZZZ", damped=NULL, alpha=NULL, beta=NULL,
gamma=NULL, phi=NULL, lambda=NULL, biasadj=FALSE,
additive.only=FALSE, restrict=TRUE,
allow.multiplicative.trend=FALSE)
wm.ets
## ETS(M,N,N)
##
## Call:
##  ets(y = wmurders, model = "ZZZ", damped = NULL, alpha = NULL,
##      beta = NULL, gamma = NULL, phi = NULL, additive.only = FALSE,
##      lambda = NULL, biasadj = FALSE, restrict = TRUE, allow.multiplicative.trend = FALSE)
##
## Smoothing parameters:
## alpha = 0.9599
##
## Initial states:
## l = 2.4177
##
## sigma: 0.0617
##
## AIC AICc BIC
## 49.36113 49.83172 55.38313
#With an ARIMA model (the seasonal part is ignored because wmurders is annual, hence the plain ARIMA(0,1,1) in the output)
wm.arima = wmurders %>%
Arima(order=c(0,1,1), seasonal=c(0,1,1))
wm.arima
## Series: .
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.0527
## s.e. 0.1070
##
## sigma^2 estimated as 0.04628: log likelihood=6.85
## AIC=-9.7 AICc=-9.47 BIC=-5.72
checkresiduals(wm.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 13.122, df = 9, p-value = 0.1572
##
## Model df: 1. Total lags used: 10
wm.arima2 = arima(wmurders, order=c(3,1,1))
wm.arima2
##
## Call:
## arima(x = wmurders, order = c(3, 1, 1))
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 0.0683 0.3011 -0.0608 -0.1179
## s.e. 1.7595 0.1635 0.5328 1.7569
##
## sigma^2 estimated as 0.04104: log likelihood = 9.5, aic = -8.99
checkresiduals(wm.arima2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 9.3026, df = 6, p-value = 0.1573
##
## Model df: 4. Total lags used: 10
#Auto Arima
wm.auarimta = auto.arima(wmurders)
wm.auarimta
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
fc1 = forecast(wm.auarimta,h = 3 )
fc1
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
plot(fc1)
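In backshift notation the auto-selected ARIMA(1,2,1) with no constant is (1 - ϕ1*B)(1 - B)^2 y_t = (1 + θ1*B)e_t. Expanding gives y_t = (2 + ϕ1)y[t-1] - (1 + 2ϕ1)y[t-2] + ϕ1*y[t-3] + e_t + θ1*e[t-1], so the first forecast can be checked by hand (a sketch):
phi <- coef(wm.auarimta)["ar1"]
theta <- coef(wm.auarimta)["ma1"]
y <- as.numeric(wmurders)
n <- length(y)
e.last <- tail(residuals(wm.auarimta), 1)
#one-step forecast by hand; should reproduce the 2005 point forecast above (about 2.47)
unname((2 + phi)*y[n] - (1 + 2*phi)*y[n-1] + phi*y[n-2] + theta*e.last)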
#compare with the default forecast() call, which fits an ETS model automatically rather than an ARIMA
fc2 = forecast(wmurders,h = 3 )
fc2
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.592549 2.387657 2.797442 2.279194 2.905905
## 2006 2.592549 2.308273 2.876826 2.157786 3.027313
## 2007 2.592549 2.246455 2.938644 2.063243 3.121855
plot(fc2)
kable(accuracy(fc1))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | -0.0106596 | 0.2072523 | 0.1528734 | -0.2149476 | 4.335214 | 0.9400996 | 0.0217634 |
kable(accuracy(fc2))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0033121 | 0.211224 | 0.1601012 | -0.0556575 | 4.653724 | 0.9845473 | -0.0340256 |
# with a lower RMSE, the auto.arima model fits better
8.Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.
a.Use auto.arima() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. b.Plot forecasts from an ARIMA(0,1,1) model with no drift and compare these to part a. Remove the MA term and plot again. c.Plot forecasts from an ARIMA(2,1,3) model with drift. Remove the constant and see what happens. d.Plot forecasts from an ARIMA(0,0,1) model with a constant. Remove the MA term and plot again. e.Plot forecasts from an ARIMA(0,2,1) model with no constant.
#Not sure why austa didn't load in the proper format, so I used austourists, a similar built-in series of international visitors to Australia that starts in 1999 rather than 1980
#Auto.Arima
austa= auto.arima(austourists)
checkresiduals(austa)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(1,1,0)[4] with drift
## Q* = 4.0937, df = 5, p-value = 0.536
##
## Model df: 3. Total lags used: 8
plot(forecast(austa,h=10))
#ARIMA(0,1,1) with no drift
austa.arima = arima(austourists, order=c(0,1,1))
plot(forecast(austa.arima,h=10))
checkresiduals(austa.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 145.5, df = 7, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 8
#ARIMA(2,1,3); note that arima() does not include drift when d = 1, so this is the no-drift version (Arima(..., include.drift=TRUE) would add it)
austa.arima1 = arima(austourists, order=c(2,1,3))
plot(forecast(austa.arima1,h=10))
checkresiduals(austa.arima1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,3)
## Q* = 231.4, df = 3, p-value < 2.2e-16
##
## Model df: 5. Total lags used: 8
#(0,0,1) with a constant (arima() includes a mean term by default when d = 0)
austa.arima2 = arima(austourists, order=c(0,0,1))
plot(forecast(austa.arima2,h=10))
checkresiduals(austa.arima2)
#without the MA term: arima() with no order fits ARIMA(0,0,0), i.e. just a constant mean
austa.arima3 = arima(austourists)
plot(forecast(austa.arima3,h=10))
checkresiduals(austa.arima3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 184.38, df = 7, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 8
#(0,2,1)
austa.arima4 = arima(austourists, order=c(0,2,1))
plot(forecast(austa.arima4,h=10))
checkresiduals(austa.arima4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 185.57, df = 7, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 8
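For reference, the austa series does ship with fpp2, so the exercise as written should run; a sketch using the package copy explicitly (austa was reused as a variable name above):
fit.austa = auto.arima(fpp2::austa) #annual international visitors, 1980-2015
checkresiduals(fit.austa)
plot(forecast(fit.austa, h=10))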
9.For the usgdp series:
a.if necessary, find a suitable Box-Cox transformation for the data; b.fit a suitable ARIMA model to the transformed data using auto.arima(); c.try some other plausible models by experimenting with the orders chosen; d.choose what you think is the best model and check the residual diagnostics; e.produce forecasts of your fitted model. Do the forecasts look reasonable? f.compare the results with what you would obtain using ets() (with no transformation).
plot(BoxCox(usgdp, lambda=0), main="US GDP") #chose this one: the log transformation
plot(BoxCox(usgdp, lambda=1/2), main="US GDP")
plot(BoxCox(usgdp, lambda=1/3), main="US GDP")
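The λ choice can also be automated rather than picked by eye (a sketch):
BoxCox.lambda(usgdp) #data-driven lambda, to compare against the log (lambda = 0) used for usgdp.new below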
usgdp.new = BoxCox(usgdp, lambda=0)
auto.arima(usgdp.new)
## Series: usgdp.new
## ARIMA(2,1,2) with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 drift
## 1.3594 -0.7732 -1.1002 0.6102 0.0084
## s.e. 0.1489 0.1774 0.2197 0.2285 0.0007
##
## sigma^2 estimated as 8.503e-05: log likelihood=773.76
## AIC=-1535.52 AICc=-1535.15 BIC=-1514.73
checkresiduals(auto.arima(usgdp.new))
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,2) with drift
## Q* = 6.3724, df = 3, p-value = 0.09483
##
## Model df: 5. Total lags used: 8
#With just arima
arima(usgdp.new,order=c(0,2,1))
##
## Call:
## arima(x = usgdp.new, order = c(0, 2, 1))
##
## Coefficients:
## ma1
## -0.9994
## s.e. 0.1382
##
## sigma^2 estimated as 9.781e-05: log likelihood = 748.7, aic = -1493.4
arima(usgdp.new,order=c(0,0,1))
##
## Call:
## arima(x = usgdp.new, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 1.0000 8.3967
## s.e. 0.0186 0.0373
##
## sigma^2 estimated as 0.08289: log likelihood = -43.93, aic = 93.87
kable(accuracy(arima(usgdp.new,order=c(0,2,1)))) #this one is much better, with the lowest RMSE; I will forecast with it
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | -0.0004536 | 0.0098713 | 0.0072944 | -0.0049525 | 0.089257 | 0.6876135 | 0.3302236 |
kable(accuracy(arima(usgdp.new,order=c(0,0,1))))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0012828 | 0.28791 | 0.2473493 | -0.2202569 | 2.979135 | 23.31646 | 0.9590697 |
fc = forecast(arima(usgdp.new,order=c(0,2,1)),h = 25)
fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2006 Q2 9.350083 9.337386 9.362781 9.330664 9.369503
## 2006 Q3 9.358482 9.340487 9.376478 9.330961 9.386004
## 2006 Q4 9.366882 9.344795 9.388968 9.333103 9.400660
## 2007 Q1 9.375281 9.349724 9.400837 9.336195 9.414366
## 2007 Q2 9.383680 9.355046 9.412313 9.339889 9.427470
## 2007 Q3 9.392079 9.360647 9.423510 9.344008 9.440149
## 2007 Q4 9.400478 9.366457 9.434498 9.348448 9.452507
## 2008 Q1 9.408877 9.372432 9.445321 9.353140 9.464614
## 2008 Q2 9.417276 9.378541 9.456011 9.358036 9.476516
## 2008 Q3 9.425675 9.384761 9.466589 9.363102 9.488247
## 2008 Q4 9.434074 9.391075 9.477073 9.368313 9.499835
## 2009 Q1 9.442473 9.397471 9.487475 9.373648 9.511298
## 2009 Q2 9.450872 9.403937 9.497806 9.379092 9.522652
## 2009 Q3 9.459271 9.410466 9.508076 9.384631 9.533911
## 2009 Q4 9.467670 9.417051 9.518289 9.390255 9.545085
## 2010 Q1 9.476069 9.423685 9.528453 9.395955 9.556183
## 2010 Q2 9.484468 9.430364 9.538572 9.401724 9.567213
## 2010 Q3 9.492867 9.437084 9.548650 9.407555 9.578180
## 2010 Q4 9.501266 9.443841 9.558691 9.413442 9.589090
## 2011 Q1 9.509665 9.450632 9.568698 9.419382 9.599948
## 2011 Q2 9.518064 9.457455 9.578674 9.425370 9.610759
## 2011 Q3 9.526463 9.464306 9.588621 9.431402 9.621525
## 2011 Q4 9.534862 9.471184 9.598540 9.437475 9.632250
## 2012 Q1 9.543261 9.478088 9.608435 9.443587 9.642936
## 2012 Q2 9.551661 9.485014 9.618307 9.449734 9.653587
autoplot(fc)
#compare with ets
fc2 = forecast(ets(usgdp.new),h=25)
fc2
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2006 Q2 9.350070 9.337326 9.362815 9.330580 9.369561
## 2006 Q3 9.358478 9.340492 9.376465 9.330970 9.385987
## 2006 Q4 9.366886 9.344871 9.388901 9.333217 9.400555
## 2007 Q1 9.375294 9.349881 9.400707 9.336428 9.414160
## 2007 Q2 9.383702 9.355294 9.412111 9.340255 9.427149
## 2007 Q3 9.392110 9.360993 9.423227 9.344521 9.439699
## 2007 Q4 9.400518 9.366910 9.434127 9.349118 9.451918
## 2008 Q1 9.408926 9.372998 9.444854 9.353979 9.463873
## 2008 Q2 9.417334 9.379227 9.455441 9.359054 9.475614
## 2008 Q3 9.425742 9.385573 9.465911 9.364309 9.487175
## 2008 Q4 9.434150 9.392020 9.476280 9.369718 9.498582
## 2009 Q1 9.442558 9.398554 9.486562 9.375260 9.509856
## 2009 Q2 9.450966 9.405164 9.496768 9.380918 9.521014
## 2009 Q3 9.459374 9.411842 9.506906 9.386680 9.532068
## 2009 Q4 9.467782 9.418580 9.516984 9.392534 9.543030
## 2010 Q1 9.476190 9.425372 9.527007 9.398471 9.553908
## 2010 Q2 9.484598 9.432215 9.536981 9.404485 9.564711
## 2010 Q3 9.493006 9.439102 9.546909 9.410567 9.575444
## 2010 Q4 9.501414 9.446031 9.556796 9.416713 9.586114
## 2011 Q1 9.509821 9.452998 9.566645 9.422917 9.596726
## 2011 Q2 9.518229 9.460000 9.576459 9.429175 9.607284
## 2011 Q3 9.526637 9.467035 9.586240 9.435484 9.617791
## 2011 Q4 9.535045 9.474101 9.595990 9.441839 9.628252
## 2012 Q1 9.543453 9.481196 9.605711 9.448238 9.638668
## 2012 Q2 9.551861 9.488317 9.615406 9.454679 9.649044
autoplot(fc2)
#The ARIMA and ETS point forecasts and intervals are nearly identical here
10.Consider austourists, the quarterly visitor nights (in millions) spent by international tourists to Australia for the period 1999–2015. Describe the time plot. What can you learn from the ACF graph? What can you learn from the PACF graph? Produce plots of the seasonally differenced data
(1 - B^4)y_t. What model do these graphs suggest? Does auto.arima() give the same model that you chose? If not, which model do you think is better? Write the model in terms of the backshift operator, then without using the backshift operator.
summary(austourists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.15 31.14 38.60 40.93 49.64 73.26
autoplot(austourists)
acf(austourists)
pacf(austourists)
#The time plot shows a strong upward trend with pronounced quarterly seasonality. The ACF decays slowly with peaks at the seasonal lags (4, 8, 12, ...), while the PACF shows the correlation remaining at each lag once the intermediate lags are accounted for, with large spikes at the first few lags.
#plots of seasonally diff. data
plot(diff(austourists, lag=4, differences=1), xlab="Year", ylab="Number of Night Visitors", main="International Tourists to Australia, 1999-2015", col="purple")
#since the trend in the seasonally differenced data no longer looks strong, I will let auto.arima() pick the remaining orders
auto.arima(austourists)
## Series: austourists
## ARIMA(1,0,0)(1,1,0)[4] with drift
##
## Coefficients:
## ar1 sar1 drift
## 0.4705 -0.5305 0.5489
## s.e. 0.1154 0.1122 0.0864
##
## sigma^2 estimated as 5.15: log likelihood=-142.48
## AIC=292.97 AICc=293.65 BIC=301.6
#Write the model in terms of the backshift operator:
#(1 - phi1*B)(1 - Phi1*B^4)(1 - B^4)*y_t = c + e_t, with phi1 = 0.4705 and Phi1 = -0.5305
#Without the backshift operator:
#y_t = c + phi1*y[t-1] + (1 + Phi1)*y[t-4] - phi1*(1 + Phi1)*y[t-5] - Phi1*y[t-8] + phi1*Phi1*y[t-9] + e_t
12.For the mcopper data: a.if necessary, find a suitable Box-Cox transformation for the data; b.fit a suitable ARIMA model to the transformed data using auto.arima(); c.try some other plausible models by experimenting with the orders chosen; d.choose what you think is the best model and check the residual diagnostics; e.produce forecasts of your fitted model. Do the forecasts look reasonable? f.compare the results with what you would obtain using ets() (with no transformation).
autoplot(mcopper) + xlab("Year")
mccopper.new = BoxCox(mcopper, lambda=0)
autoplot(mccopper.new)
mccopper.new1 = BoxCox(mcopper, lambda=1/2)
autoplot(mccopper.new1)
mccopper.new2 = BoxCox(mcopper, lambda=1/3)
autoplot(mccopper.new2)
#with auto.arima
mc1 = auto.arima(mccopper.new)
autoplot(mc1)
#try arima
mc2=arima(mccopper.new, order=c(1,1,1))
mc2
##
## Call:
## arima(x = mccopper.new, order = c(1, 1, 1))
##
## Coefficients:
## ar1 ma1
## -0.0083 0.3825
## s.e. 0.1007 0.0912
##
## sigma^2 estimated as 0.003628: log likelihood = 782.85, aic = -1559.69
mc3=arima(mccopper.new, order=c(1,1,1)) #same order as mc2, so the fit below is identical
mc3
##
## Call:
## arima(x = mccopper.new, order = c(1, 1, 1))
##
## Coefficients:
## ar1 ma1
## -0.0083 0.3825
## s.e. 0.1007 0.0912
##
## sigma^2 estimated as 0.003628: log likelihood = 782.85, aic = -1559.69
kable(accuracy(forecast(mc1)))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0033419 | 0.0601786 | 0.0431071 | 0.0469853 | 0.6404858 | 0.1997745 | -0.0040969 |
kable(accuracy(forecast(mc2)))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0033531 | 0.0601783 | 0.0431101 | 0.047132 | 0.6405084 | 0.1997883 | -0.0028409 |
kable(accuracy(forecast(mc3)))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0033531 | 0.0601783 | 0.0431101 | 0.047132 | 0.6405084 | 0.1997883 | -0.0028409 |
#mc2 and mc3 are the same model, so their accuracy matches exactly; both are marginally better than mc1
autoplot(forecast(mc2))
#Compare with applying ets
mc4=ets(mccopper.new)
mc4
## ETS(A,Ad,N)
##
## Call:
## ets(y = mccopper.new)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.119
## phi = 0.8775
##
## Initial states:
## l = 5.5644
## b = -0.0051
##
## sigma: 0.0637
##
## AIC AICc BIC
## 474.2865 474.4373 500.2968
kable(accuracy(forecast(mc4)))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0024387 | 0.0634376 | 0.045512 | 0.0350968 | 0.6759264 | 0.2109194 | 0.2349072 |
autoplot(forecast(mc4))
#Again, the results are very similar, except the ETS forecast carries a slight downward trend that the ARIMA forecast does not
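Since every model above was fitted to log prices, forecasts should be back-transformed to the original scale for interpretation; a sketch letting Arima() handle this via its lambda argument:
mc.bt = Arima(mcopper, order=c(1,1,1), lambda=0) #lambda=0 applies the log and back-transforms the forecasts
autoplot(forecast(mc.bt)) #forecasts shown on the original price scale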
14.For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. The stlf() function will make the calculations easy (with method=“arima”). Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?
sales.stlf = stlf(auscafe, method="arima")
sales.stlf.fc = forecast(sales.stlf, h=24)
autoplot(sales.stlf.fc)
checkresiduals(sales.stlf.fc)
## Warning in checkresiduals(sales.stlf.fc): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ARIMA(2,2,3)
## Q* = 77.202, df = 19, p-value = 5.618e-09
##
## Model df: 5. Total lags used: 24
kable(accuracy(sales.stlf.fc))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0023181 | 0.0322502 | 0.0233357 | 0.1893512 | 1.649092 | 0.2233375 | -0.0094788 |
# I think the STL approach works a bit better here: both MASE and RMSE improve over the earlier models
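The previous exercise's model is not reproduced in this document, but the comparison can be regenerated directly (a sketch):
sales.arima.fc = forecast(auto.arima(auscafe), h=24) #seasonal ARIMA fitted to the raw series
kable(accuracy(sales.arima.fc)) #compare with the STL + ARIMA accuracy table above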
15.For your retail time series (Exercise 5 above):
develop an appropriate seasonal ARIMA model; compare the forecasts with those you obtained in earlier chapters; Obtain up-to-date retail data from the ABS website (Cat 8501.0, Table 11), and compare your forecasts with the actual numbers. How good were the forecasts from the various models?
#retail.ts = ts(retail)
#summary(retail.ts)
#auto.arima(retail.ts)
## not sure why, but auto.arima() errors on the retail dataset when knitting; it is the same loading problem as in Question 5, so the sketch below assumes the fix sketched there
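Assuming the series loads as in the Question 5 sketch (myts is the hypothetical monthly ts built there), the seasonal ARIMA workflow would be:
retail.fit = auto.arima(myts, lambda = BoxCox.lambda(myts))
checkresiduals(retail.fit)
autoplot(forecast(retail.fit, h=24)) #compare against the earlier-chapter forecasts and the ABS actuals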
16.Consider sheep, the sheep population of England and Wales from 1867–1939.
Produce a time plot of the time series.
Assume you decide to fit the following model:
autoplot(sheep) + ylab("sheep population ") + xlab("Year") + ggtitle("sheep population of England and Wales from 1867–1939")
auto.arima(sheep)
## Series: sheep
## ARIMA(3,1,0)
##
## Coefficients:
## ar1 ar2 ar3
## 0.4210 -0.2018 -0.3044
## s.e. 0.1193 0.1363 0.1243
##
## sigma^2 estimated as 4991: log likelihood=-407.56
## AIC=823.12 AICc=823.71 BIC=832.22
sheep.auto = auto.arima(sheep)
fc1 = forecast(sheep.auto)
fc2 = forecast(arima(sheep)) #compare with a plain arima() call; with no order given it fits only a mean, so it is a weak baseline
kable(accuracy(fc1))#Much better result
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | -6.566385 | 68.68715 | 53.43974 | -0.4246247 | 2.977279 | 0.7962876 | -0.0614987 |
kable(accuracy(fc2))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0 | 221.2673 | 170.5344 | -1.512368 | 9.5306 | 2.541076 | 0.9128629 |
#acf(fc1)
#pacf(fc1)
checkresiduals(fc1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,0)
## Q* = 3.4035, df = 7, p-value = 0.8453
##
## Model df: 3. Total lags used: 10
## I think this model is appropriate for prediction based on the residual checks: the Ljung-Box p-value is large, so the residuals look like white noise
The last five values of the series are given in the exercise. The estimated parameters are ϕ1=0.42, ϕ2=−0.20, and ϕ3=−0.30. The fitted model is ARIMA(3,1,0), which in backshift notation is (1 - ϕ1*B - ϕ2*B^2 - ϕ3*B^3)(1 - B)y_t = e_t. Without the backshift operator: y_t = y[t-1] + ϕ1*(y[t-1] - y[t-2]) + ϕ2*(y[t-2] - y[t-3]) + ϕ3*(y[t-3] - y[t-4]) + e_t. Setting the future errors to zero and iterating this recursion gives the forecasts for 1940-1942.
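A sketch of the hand calculation, reading the last four observations from the series so nothing has to be retyped:
phi <- c(0.42, -0.20, -0.30)
y <- as.numeric(tail(sheep, 4)) #observations for 1936-1939
for (h in 1:3) { #forecasts for 1940-1942
  k <- length(y)
  y.hat <- y[k] + phi[1]*(y[k]-y[k-1]) + phi[2]*(y[k-1]-y[k-2]) + phi[3]*(y[k-2]-y[k-3])
  cat(1939 + h, ":", round(y.hat, 2), "\n")
  y <- c(y, y.hat) #append so later horizons use earlier forecasts
}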
#obtain the forecasts using forecast() directly
sheep.ts = ts(sheep) #note: this drops the original years, and forecast() on a bare series picks a model automatically, hence the flat forecasts below
forecast(sheep.ts)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 74 1796.999 1689.519 1904.479 1632.623 1961.376
## 75 1796.999 1645.007 1948.991 1564.548 2029.451
## 76 1796.999 1610.851 1983.148 1512.310 2081.689
## 77 1796.999 1582.056 2011.943 1468.271 2125.728
## 78 1796.999 1556.686 2037.313 1429.472 2164.527
## 79 1796.999 1533.750 2060.249 1394.394 2199.604
## 80 1796.999 1512.658 2081.340 1362.137 2231.861
## 81 1796.999 1493.027 2100.972 1332.113 2261.886
## 82 1796.999 1474.588 2119.411 1303.914 2290.085
## 83 1796.999 1457.148 2136.850 1277.242 2316.757
kable(accuracy(forecast(sheep.ts)))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | -5.561613 | 82.71026 | 66.19504 | -0.3826213 | 3.622046 | 0.9863499 | 0.3571961 |
##based on the accuracy measures, this automatically chosen model is not as good as the auto.arima fit above
17.The annual bituminous coal production in the United States from 1920 to 1968 is in data set bicoal.
#bicoal
autoplot(bicoal) + ylab("Bituminous Coal Production ") + xlab("Year") + ggtitle ("bituminous coal production in the United States from 1920 to 1968")
bicoal.auto = auto.arima(bicoal)
bicoal.auto
## Series: bicoal
## ARIMA(4,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 mean
## 0.8334 -0.3443 0.5525 -0.3780 481.5221
## s.e. 0.1366 0.1752 0.1733 0.1414 21.0591
##
## sigma^2 estimated as 2795: log likelihood=-262.05
## AIC=536.1 AICc=538.1 BIC=547.45
bicoal.auto.fc = forecast(bicoal.auto)
acf(bicoal)
pacf(bicoal)
#obtain forecasts from a plain arima() fit for comparison (no order given, so only a mean is estimated)
bicoal.ts = ts(bicoal)
bicoal.ts.fc = forecast(arima(bicoal.ts))
checkresiduals(bicoal.ts.fc)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 74.888, df = 9, p-value = 1.663e-12
##
## Model df: 1. Total lags used: 10
checkresiduals(bicoal.auto.fc)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,0,0) with non-zero mean
## Q* = 4.852, df = 5, p-value = 0.4342
##
## Model df: 5. Total lags used: 10
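The textbook version of this exercise also asks for next year's forecast by hand. With the AR(4)-with-mean model above, ŷ[T+1] = μ + ϕ1*(y[T] - μ) + ... + ϕ4*(y[T-3] - μ); a sketch using the fitted coefficients:
co  <- coef(bicoal.auto)
phi <- co[paste0("ar", 1:4)]
mu  <- co["mean"]
y   <- rev(as.numeric(tail(bicoal, 4))) #y[T], y[T-1], y[T-2], y[T-3]
unname(mu + sum(phi * (y - mu))) #one-step forecast for 1969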
18.Before doing this exercise, you will need to install the Quandl package in R using
#install.packages("Quandl")
#library(Quandl)
#y = Quandl("?????", api_key="?????", type="ts")
##Getting an error message:
#Error in format.code(code) : Codes are comprised of capital letters, numbers and underscores only.
# Hence I picked the usmelec dataset to complete this problem
autoplot(usmelec) + ylab("total net generation of electricity") + xlab("Year")
usmelec.auto = auto.arima(usmelec)
checkresiduals(usmelec.auto) #pass the fitted model, not the raw series, so the Ljung-Box test gets proper degrees of freedom
usmelec.fc = forecast(arima(usmelec)) #arima() with no order fits ARIMA(0,0,0), so this is only a mean baseline
autoplot(usmelec.fc)
# Compare with ETS
usmele.ets = ets(usmelec, model = "MAN")
checkresiduals(usmele.ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,Ad,N)
## Q* = 1827.2, df = 19, p-value < 2.2e-16
##
## Model df: 5. Total lags used: 24
usmele.ets.fc = forecast(usmele.ets)
autoplot(usmele.ets.fc)
kable(accuracy(usmelec.fc))
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0 | 68.80362 | 59.54107 | -7.778652 | 25.42039 | 6.61219 | 0.9356744 |
kable(accuracy(usmele.ets)) ## I prefer the ETS model here: every accuracy measure improves substantially, though note the ARIMA baseline above was only a mean model
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.4032889 | 23.86919 | 19.27969 | -0.2014986 | 7.291456 | 2.14106 | 0.1300418 |
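For a fairer ARIMA-vs-ETS comparison, the seasonal model chosen by auto.arima should be scored as well (a sketch):
kable(accuracy(forecast(usmelec.auto, h=24))) #the auto-selected seasonal ARIMA fitted above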