DATA 624 Homework 6

Question 8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers, and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Image Question 8.1

The above graph show us correlation between various lags of the series. It shows the ACFs for 36, 360 and 1000 random numbers. The Y axis seem to be on the same scale for 36, 360 and 1000 data points and the X axis shows the lags getting longer. It indicates the data are white noise. Most of the lags do not cross the blue line that represent bounds. We expect atlease 95% of lags to be within the boundaries

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

As the number of data increases the value of the significant coorelation decreases. Also as the observation count increases, the number of large outliers from the mean also decreases. This is the law of large numbers.

Question 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

library(fpp2)
library(urca)
library(readxl)

data("ibmclose")
autoplot(ibmclose)

ggAcf(ibmclose)

ggPacf(ibmclose)

From the above autoplot charts we can see the time series is not stationary. It also doesn’t seem to exhibit seasonal patterns. The data need differencing to get stationary data. The ACF plots show high correlation for the different lags. The plot reveals that 95% of spikes are within the bound.

Question 8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Transformations like logarithms can help to stabilize the variance of a time series. Differencing can help stabilize the mean of a time series by removing changes in the level of a time series, and therefore eliminating (or reducing) trend and seasonality. And that’s the requisite for obtaining stationary data.

a. usnetelec

ggtsdisplay(usnetelec, main = "Annual US Net Electricity Generation", ylab = "Billion kWh", 
    xlab = "Year")

usnetelec_bc <- BoxCox(usnetelec, lambda=BoxCox.lambda(usnetelec))
ndiffs(usnetelec_bc)

## [1] 2

usnetelec_bc_diff <- diff(diff(usnetelec_bc))
ndiffs(usnetelec_bc_diff)

## [1] 0

ggtsdisplay(usnetelec_bc_diff)

Data is stationary and trend is removed.

b. usgdp

ggtsdisplay(usgdp, main = "Quarterly US GDP", ylab = "GDP", xlab = "Year")

usgdp_bc <- BoxCox(usgdp, lambda=BoxCox.lambda(usgdp))
ndiffs(usgdp_bc)

## [1] 1

usgdp_bc_diff <- diff(usgdp_bc)
ndiffs(usgdp_bc_diff)

## [1] 0

ggtsdisplay(usgdp_bc_diff)

There is 95% spikes but are within the bounds after one order differencing, data is stationary.

c. mcopper

ggtsdisplay(mcopper, main = "Monthly Copper Prices", ylab = "Price", xlab = "Year")

mcopper_bc <- BoxCox(mcopper, lambda=BoxCox.lambda(mcopper))
ndiffs(mcopper_bc)

## [1] 1

mcopper_bc_diff <- diff(mcopper_bc)
ndiffs(mcopper_bc_diff)

## [1] 0

ggtsdisplay(mcopper_bc_diff)

After BoxCox and first differencing the timeseries is stationary.

d. enplanements

ggtsdisplay(enplanements, main = "Monthly US Domestic Enplanements", ylab = "million", 
    xlab = "Year")

enplanements_bc <- BoxCox(enplanements, lambda=BoxCox.lambda(enplanements))
ndiffs(enplanements_bc)

## [1] 1

nsdiffs(enplanements_bc)

## [1] 1

enplanements_bc_diff <- diff(enplanements_bc)
enplanements_bc_diff1 <- diff(enplanements_bc_diff, lag=12)
ndiffs(enplanements_bc_diff1)

## [1] 0

nsdiffs(enplanements_bc_diff1)

## [1] 0

ggtsdisplay(enplanements_bc_diff1)

Time plot here shows the stationarity and 95% of the spikes are within the bounds.

e. visitors

ggtsdisplay(visitors, main = "Monthly Australian Short-Term Overseas Vistors", ylab = "Visitors in Thousands", 
    xlab = "Year")

visitors_bc <- BoxCox(visitors, lambda=BoxCox.lambda(visitors))
ndiffs(visitors_bc)

## [1] 1

nsdiffs(visitors_bc)

## [1] 1

visitors_bc_diff <- diff(visitors_bc)
visitors_bc_diff1 <- diff(visitors_bc_diff, lag=12)

ggtsdisplay(visitors_bc_diff1)

This time series after BoxCox transformation is very much like the original timeseries. There doesn’t look like a major differences between the 2 ACFs the transformation

Question 8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- read_excel("data/retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349398A"], frequency = 12, start = c(1982, 4))
autoplot(myts) + ylab("Retail Food Sales") + ggtitle("New South Wales - Food Sales")

myts %>% ggtsdisplay()

BoxCox.lambda(myts)

## [1] 0.1231563

myts.bc = BoxCox(myts, BoxCox.lambda(myts))
autoplot(myts.bc, main = "New South Wales - Food Sales (BoxCox transformed)")

The low-value BoxCox (lambda 0.12) reduces the variations in the data

# Number of differences required for a stationary series
ndiffs(myts.bc)

## [1] 1

# Number of differences required for a seasonally stationary series
nsdiffs(myts.bc)

## [1] 1

# KPSS Unit Root Test to test if we have stationary data
myts.bc %>% diff %>% ur.kpss %>% summary

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0117 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The results show that it is not in the critical region. The test fails to reject the null hypotheses. After differencing there is a stationary data.

Question 8.6

Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

#library(ggplot2)
library(gridExtra)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
p1 <- autoplot(y) +ggtitle("phi=0.6")


for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
p2<- autoplot(y)+ggtitle(" phi=0.1")




for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
p3<- autoplot(y)+ggtitle("phi= 1.0")

grid.arrange(p1,p2,p3)

c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

d. Produce a time plot for the series. How does the plot change as you change θ1?

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
p1 <- autoplot(y) +ggtitle("theta= 0.6")



y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*e[i-1] + e[i]

p2 <- autoplot(y) +ggtitle(" theta=0.1")



y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*e[i-1] + e[i]
p3 <- autoplot(y) +ggtitle("theta= 1.0")


grid.arrange(p1,p2,p3)

e. Generate data from an ARMA(1,1) model with ϕ1=0.6, theta1=0.6, and σ2=1.

# e arima(1)
y_ar1 <- ts(numeric(100))
e <- rnorm(100)

#phi =0.6, theta =0.6, sigma^2 = 1
for(i in 2:100)
  y_ar1[i] <- 0.6*y_ar1[i-1]+ 0.6*e[i-1] + e[i]

f. Generate data from an AR(2) model with θ1=−0.8, θ2=0.3, and σ2=1. (Note that these parameters will give a non-stationary series).

y_ar2 <- ts(numeric(100))
e <- rnorm(100)


for(i in 3:100)
  y_ar2[i] <- -0.8*y_ar2[i-1]+ 0.3*y_ar2[i-2] + e[i]

g. Graph the latter two series and compare them.

ggtsdisplay(y_ar1)

ggtsdisplay(y_ar2)

Question 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

By studying appropriate graphs of the series in R, find an appropriate ARMIA(p,d,q) model for these data.

autoplot(wmurders)

wmurders %>% diff() %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.4697 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggtsdisplay(wmurders)

Should you include a constant in the model? Explain.

After we differenced the function, constant should not be included in this model. Arima function will set the constant to 0 when d is greater than 0

Write this model in terms of the backshift operator.

\[(1-\phi_1B)(1-B)y_t = c + \epsilon_t\]

Fit the model using R and examine the residuals. Is the model satisfactory?

Yes the model is satisfactory. The p value is not significant and the autocorrelations for lags are in acceptable range.

fit <- Arima(wmurders, c(1,1,0))
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,0)
## Q* = 13.281, df = 9, p-value = 0.1503
## 
## Model df: 1.   Total lags used: 10

Forecast three times ahead. Check your forecasts by hand to make sure that you know periods shown.

fit

## Series: wmurders 
## ARIMA(1,1,0) 
## 
## Coefficients:
##           ar1
##       -0.0841
## s.e.   0.1346
## 
## sigma^2 estimated as 0.04616:  log likelihood=6.92
## AIC=-9.85   AICc=-9.61   BIC=-5.87

forecast(fit, h=3)

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.595506 2.320177 2.870835 2.174427 3.016585
## 2006       2.594992 2.221624 2.968359 2.023976 3.166007
## 2007       2.595035 2.143387 3.046682 1.904300 3.285770

f1 = wmurders[55] + (-0.0841 * (wmurders[55] - wmurders[54])) - 0.0000025
f2 = f1 + (-0.0841 * (f1 - wmurders[55])) 
f3 = f2 + (-0.0841 * (f2 - f1))
print(paste('forecast 1:', f1))

## [1] "forecast 1: 2.5955066804"

print(paste('forecast 2:', f2))

## [1] "forecast 2: 2.59499167887836"

print(paste('forecast 3:', f3))

## [1] "forecast 3: 2.59503499050633"

for(i in 2:51){
    theta = 1
    test <- wmurders %>% diff()
    test[i] <- test[i - 1] + theta*e[i-1] + e[i]}
test

## Time Series:
## Start = 1951 
## End = 2004 
## Frequency = 1 
##  [1] -0.066051  0.010941 -0.078785  0.034196 -0.096699  0.145162 -0.055508
##  [8]  0.093885  0.081643  0.081254  0.001387  0.048453 -0.047440  0.139087
## [15]  0.123834  0.300088  0.288182 -0.074626  0.104202  0.184722  0.343738
## [22] -0.065378  0.565260  0.000579 -0.094974 -0.412076  0.130620 -0.096573
## [29]  0.182569  0.166352 -0.139356 -0.082214 -0.359198  0.040682 -0.006997
## [36]  0.204758  0.044996  0.035790 -0.066893  0.074951  0.296692 -0.300084
## [43]  0.247114 -0.398660 -0.076133 -0.358104 -0.275239 -0.109680 -0.105621
## [50] -0.229222 -2.687428 -0.506770 -0.135470 -0.072844

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(fit, h=3))

g . Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

Auto.arima does not give the same as the one selected. Auto arima model projects decreasing rates over the next three years while my model projects a constant rate.

fit_auto <- auto.arima(wmurders, stepwise=FALSE, approximation=FALSE)
autoplot(forecast(fit_auto,3))

After comparing both models AICc the manual AICc is -9.61 while auto.arima() has an AICc of -6.7. The AICc of Manual ARIMA model is smaller and therefore better.

fit

## Series: wmurders 
## ARIMA(1,1,0) 
## 
## Coefficients:
##           ar1
##       -0.0841
## s.e.   0.1346
## 
## sigma^2 estimated as 0.04616:  log likelihood=6.92
## AIC=-9.85   AICc=-9.61   BIC=-5.87

fit_auto

## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35