8.1, 8.2, 8.3, 8.5, 8.6, 8.7

8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a.

Explain the differences among these figures. Do they all indicate that the data are white noise?
Each panel shows the ACF of a random (white noise) time series. All three indicate white noise: the autocorrelations at every lag fall within the critical-value bounds marked by the dashed blue lines, and their signs and magnitudes show no discernible pattern. As the number of observations increases, the sample autocorrelations shrink toward zero, because correlations arising purely by chance become smaller, and the bounds narrow accordingly.

b.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values lie at ±1.96/√T, where T is the length of the series, so they sit at different distances from zero because each series has a different number of observations; longer series give narrower bounds. The autocorrelations differ across the figures because each is computed from a different random sample: sampling variation alone produces non-zero estimates, and that variation shrinks as T grows.
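For reference, here is a minimal sketch that regenerates comparable ACF plots and makes the ±1.96/√T bounds visible at each length (it assumes the fpp package, which loads forecast and the data sets used in the rest of these exercises):

library(fpp)      # loads forecast (ggAcf, autoplot, etc.) and the exercise data sets
library(ggplot2)  # for ggtitle

# white noise of increasing length; the dashed bounds narrow as T grows
set.seed(30)
ggAcf(ts(rnorm(36))) + ggtitle("White noise, T = 36")
ggAcf(ts(rnorm(360))) + ggtitle("White noise, T = 360")
ggAcf(ts(rnorm(1000))) + ggtitle("White noise, T = 1000")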

8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The ACF shows large positive autocorrelations that decay only very gradually toward zero, taking roughly 100 lags to die out, which is a classic sign of non-stationarity. The PACF adds information here: once the effects of shorter lags are removed, a single significant spike remains at lag 1. A stationary series can therefore be obtained by differencing the data once. The remaining lags look like white noise, suggesting that seasonal differencing is not necessary.

ggtsdisplay(ibmclose)
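If a single difference is sufficient, the differenced series should look like white noise. A quick sketch to confirm:

# time plot, ACF and PACF of the first-differenced closing prices
ggtsdisplay(diff(ibmclose))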

8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

The first panel shows the original time series, which has a clear trend. The second shows the Box-Cox transformed series (lambda = 0.5, a square root transformation), which stabilizes the variance. The third panel shows the differenced transformed series (differencing order 1), which is now stationary.

# find box cox transformation 
lambda <- BoxCox.lambda(usnetelec) %>% round(1)

# plot original data, transformed data, and differenced/transformed data
cbind("Electricity generation" = usnetelec,
      "Transformed electricity generation" = BoxCox(usnetelec, lambda),
      "Annual change in Transformed electricity generation" = diff(BoxCox(usnetelec, lambda),1)) %>%
  autoplot(facets=TRUE) 

usgdp

A Box-Cox transformation with lambda = 0.4 stabilizes the variance, and a first difference (differencing order 1) makes the time series stationary.

# find box cox transformation 
lambda <- BoxCox.lambda(usgdp) %>% round(1)

#find differencing
autoplot(usgdp)

ggseasonplot(usgdp)

# plot original data, transformed data, and differenced/transformed data
cbind("US Gross Domestic PRoduct" = usgdp,
      "Transformed US Gross Domestic PRoduct" = BoxCox(usgdp, lambda),
      "Annual change in Transformed US Gross Domestic PRoduct" = diff(BoxCox(usgdp, lambda),1)) %>%
  autoplot(facets=TRUE) 

mcopper

A Box-Cox transformation with lambda = 0.2 stabilizes the variance, and a first difference (differencing order 1) makes the time series stationary.

# find box cox transformation 
lambda <- BoxCox.lambda(mcopper) %>% round(1)

#find differencing
pacf(mcopper)

ggseasonplot(mcopper)

# plot original data, transformed data, and differenced/transformed data
cbind("Monthly copper proce" = mcopper,
      "Transformed copper proce" = BoxCox(mcopper, lambda),
      "Annual change in Transformed copper proce" = diff(BoxCox(mcopper, lambda),1)) %>%
  autoplot(facets=TRUE) 

enplanements

This time series has a seasonal pattern, so in addition to a Box-Cox transformation (lambda = -0.2), a seasonal difference is needed: the data are monthly, so the seasonal lag is twelve. An additional first difference is then taken to end up with stationary data.

str(enplanements)
##  Time-Series [1:282] from 1979 to 2002: 21.1 22.9 25.9 24.4 23.4 ...
# find box cox transformation 
lambda <- BoxCox.lambda(enplanements) %>% round(1)

#find differencing
autoplot(enplanements)

ggseasonplot(enplanements, polar = TRUE)

# plot original data, transformed data, and differenced/transformed data
cbind("Electricity generation" = enplanements,
      "Transformed electricity generation" = BoxCox(enplanements, lambda),
      "Annual change in Transformed electricity generation" = diff(BoxCox(enplanements, lambda),12)) %>% diff(1) %>% 
  autoplot(facets=TRUE) 

visitors

A Box-Cox transformation with lambda = 0.3 followed by a seasonal difference (lag 12) should achieve stationarity. Additional first differencing does not appear necessary.

# find box cox transformation 
lambda <- BoxCox.lambda(visitors) %>% round(1)

#find differencing
autoplot(visitors)

# seasonal patterns
ggseasonplot(visitors, polar = TRUE)

# plot original data, transformed data, and differenced/transformed data
cbind("Electricity generation" = visitors,
      "Transformed electricity generation" = BoxCox(visitors, lambda),
      "Annual change in Transformed electricity generation" = diff(BoxCox(visitors, lambda),12)) %>%
  autoplot(facets=TRUE) 
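The differencing choices above can be cross-checked with the unit-root-test helpers in the forecast package. A sketch (nsdiffs() applies to the two monthly series, ndiffs() to the transformed non-seasonal ones):

# suggested number of seasonal differences for the monthly series
nsdiffs(enplanements)
nsdiffs(visitors)

# suggested number of first differences for the transformed series
ndiffs(BoxCox(usnetelec, BoxCox.lambda(usnetelec)))
ndiffs(BoxCox(usgdp, BoxCox.lambda(usgdp)))
ndiffs(BoxCox(mcopper, BoxCox.lambda(mcopper)))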

8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

This sales data has a seasonal pattern, so a seasonal difference should be taken first, followed by an additional first difference.

# read in retail data from excel spreadsheet
retaildata <- readxl::read_excel("retail.xlsx", skip=1)

# create time series object
myts <- ts(retaildata[,"A3349627V"],
  frequency=12, start=c(1982,4))

# plot original time series 
autoplot(myts) + 
  ggtitle("Orignal liquor sales time series ")

# seasonal plots
ggseasonplot(myts, polar = TRUE)

# check autocorrelations
acf(myts, lag.max = 20)

pacf(myts, lag.max =25)

# seasonally differenced time series
ts_diff <- diff(myts,12) 

ts_diff <- diff(ts_diff,1)
ggtsdisplay(ts_diff)
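As a formal check, a KPSS test (kpss.test() from the tseries package, which fpp loads) can confirm that no further differencing is needed; a sketch:

# null hypothesis is stationarity; a large p-value means we cannot reject
# stationarity for the seasonally and first-differenced series
kpss.test(ts_diff)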

8.6

Use R to simulate and plot some data from simple ARIMA models.

a.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.

# generalized function to produce an AR(1) model with a given coefficient
AR <- function(phi){
  y <- ts(numeric(100))
  e <- rnorm(100) # random normal numbers with sd =1 
  for(i in 2:100){
    y[i] <- phi*y[i-1] + e[i]
  }
  return(y)
}

b.

Produce a time plot for the series. How does the plot change as you change ϕ1?

Keeping ϕ1 between 0 and 1: values near 0 make the series resemble white noise, fluctuating rapidly around zero with little persistence, while values near 1 give a smoother, more persistent series with larger, longer-lasting departures from the mean (the process variance σ2/(1−ϕ1^2) grows as ϕ1 approaches 1).

# phi = 0.6
AR(0.6) %>% autoplot()

# phi = 0.2
AR(0.2) %>% autoplot()

# phi= 0.9
AR(0.9) %>% autoplot()

c. 

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

MA1 <- function(theta){
  y <- ts(numeric(100))
  e <- rnorm(100) # random normal numbers with sd = 1
  y[1] <- e[1]    # with e[0] taken as zero, the first observation is just e[1]
  for(i in 2:100){
    y[i] <- theta*e[i-1] + e[i]
  }
  return(y)
}

d.

Produce a time plot for the series. How does the plot change as you change θ1?

As θ1 moves toward 1, consecutive values become more positively correlated and the series looks smoother; near 0 it is essentially white noise. The effect is milder than for the AR(1) case, since the dependence extends only one lag back.

# theta = 0.6
MA1(0.6) %>% autoplot()

# theta = 0.1
MA1(0.1) %>% autoplot()

# theta = 0.9
MA1(0.9) %>% autoplot()

e.

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

ARMA11 <- function(theta, phi){
  y <- ts(numeric(100))
  e <- rnorm(100) # random normal numbers with sd =1 
  for(i in 2:100){
    y[i] <- phi*y[i-1] + e[i] + theta*e[i-1] 
  }
  return(y)
}
ARMA11(0.6, 0.6) %>%  autoplot()

f.

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

AR2 <- function(phi1, phi2){
  y <- ts(numeric(100))
  e <- rnorm(100) # random normal numbers with sd =1 
  for(i in 3:100){
    y[i] <- phi1*y[i-1] + phi2*y[i-2] + e[i] 
  }
  return(y)
}
AR2(-0.8, 0.3) %>%  autoplot()

g.

Graph the latter two series and compare them.

The AR(2) series oscillates, alternating in sign because ϕ1 is negative, and its swings compound in magnitude over time: the parameters violate the stationarity conditions (here ϕ2 − ϕ1 = 1.1 > 1), so the variance grows without bound. The ARMA(1,1) series, by contrast, is stationary: with ϕ1 = 0.6 its level and variance remain stable, and the moving average term only adds short-run correlation.
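A faceted plot of the two simulated series makes the comparison direct. A sketch, with a fixed seed so both panels are reproducible:

# simulate and plot the ARMA(1,1) and AR(2) series together
set.seed(123)
cbind("ARMA(1,1)" = ARMA11(0.6, 0.6),
      "AR(2)" = AR2(-0.8, 0.3)) %>%
  autoplot(facets=TRUE)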

8.7

a.

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

Differencing once or twice makes the data more stationary. If differenced twice, the ACF has a significant spike at lag 1, suggesting q = 1; likewise the PACF has a significant spike at lag 1, suggesting p = 1. An appropriate model is therefore ARIMA(1,2,1). An argument could be made for limiting the differencing to order 1, but in that case neither the ACF nor the PACF shows a significant spike, which would leave only a random walk model. The data are aggregated annually, so seasonal considerations are not applicable.

library(fpp)
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: tseries
# plot time series, ACF and PACF
autoplot(wmurders)

# second-order differencing (d = 2) makes the series stationary; note that
# diff(wmurders, 2) would take a lag-2 difference, so differences = 2 is
# needed to difference twice
diff(wmurders, differences = 2) %>% autoplot()

# the ACF spike at lag 1 suggests MA(1); the PACF spike at lag 1 suggests AR(1)
diff(wmurders, differences = 2) %>% acf()

diff(wmurders, differences = 2) %>% pacf()

b.

Should you include a constant in the model? Explain.

No. With d = 2, including a constant would induce a quadratic trend in the long-term forecasts, which is not plausible for this series.

c.

Write this model in terms of the backshift operator.

(1 − ϕ1B)(1 − B)^2 yt = (1 + θ1B)et

d. 

Fit the model using R and examine the residuals. Is the model satisfactory?

The residuals suggest the model is satisfactory: the ACF shows no significant autocorrelations and the histogram is approximately normal, consistent with white noise. The Ljung-Box test (p = 0.13) confirms what the plots show.

fit1 <- arima(wmurders, order=c(1,2,1))
checkresiduals(fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

e.

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

# forecast 3 steps ahead
forecast(fit1, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.470660 2.200091 2.741229 2.056861 2.884459
## 2006       2.363106 1.993529 2.732684 1.797886 2.928327
## 2007       2.252833 1.774677 2.730989 1.521557 2.984110
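To check these by hand, expand the backshift form (1 − ϕ1B)(1 − B)^2 yt = (1 + θ1B)et into yt = (2 + ϕ1)yt−1 − (1 + 2ϕ1)yt−2 + ϕ1yt−3 + et + θ1et−1, then set future errors to their expected value of zero. A sketch that reproduces the point forecasts from the fitted coefficients:

# manual h = 1, 2, 3 forecasts from the ARIMA(1,2,1) recursion
phi   <- coef(fit1)["ar1"]
theta <- coef(fit1)["ma1"]
e     <- residuals(fit1)
n     <- length(wmurders)
y     <- c(wmurders, numeric(3))   # append slots for the three forecasts
for (h in 1:3) {
  t <- n + h
  last_e <- if (h == 1) e[n] else 0  # e[T] is known; later errors have mean zero
  y[t] <- (2 + phi)*y[t-1] - (1 + 2*phi)*y[t-2] + phi*y[t-3] + theta*last_e
}
y[n + 1:3]  # should match the point forecasts above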

f. 

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(fit1, h=3))

g.

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

auto.arima() gives ARIMA(0,2,3), which differs from the model I chose. It has a slightly lower AIC, and its forecasts decline less steeply than those from my model; since the murder rate cannot fall below zero, forecasts that decrease more slowly, with prediction intervals that stay clear of negative values, are preferable. The residual plots for my model were marginally better, so the two models are close on diagnostics.

fit2 <- auto.arima(wmurders, stepwise = FALSE, approximation = FALSE)

checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10
autoplot(forecast(fit2, h=3))
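To make the information-criterion comparison concrete, the manual model can be refitted with forecast::Arima() so that AICc values are directly comparable; a sketch:

# lower AICc indicates the preferred model
fit1b <- Arima(wmurders, order = c(1, 2, 1))
c("ARIMA(1,2,1)" = fit1b$aicc, "ARIMA(0,2,3)" = fit2$aicc)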