DATA 624: Home Work 06

Libraries
Exercise 8.1
Exercise 8.2
Exercise 8.3
Exercise 8.5
Exercise 8.6
Question 8.7

Libraries

library(ggplot2)
library(fpp2)
library(plotly)
library(kableExtra)
library(gridExtra)
library(tseries)
library(readxl)

Exercise 8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

All three figures show ACFs, which are used to assess whether a time series depends on its own past values. The three series differ only in length, and the length determines how wide the critical bounds are. A time series that shows no autocorrelation is called white noise. For a white noise series, we expect each autocorrelation to be close to zero, though not exactly zero because of random variation. More precisely, we expect 95% of the spikes in the ACF to lie within $\pm 2/\sqrt{T}$, where $T$ is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines above).

If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise. In each of the three figures, about 95% of the spikes lie within the bounds, so we can conclude that all three series are white noise.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distances from zero because they are $\pm 2/\sqrt{T}$, which shrinks as the length $T$ of the series grows. The autocorrelations differ between figures because each one is a sample estimate computed from a different set of random numbers; shorter series give noisier estimates, so the spikes are larger for $T = 36$ than for $T = 1{,}000$. All three still refer to white noise because about 95% of the spikes lie within $\pm 2/\sqrt{T}$ in each case.
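To make the dependence on series length concrete, the approximate 95% bounds $\pm 2/\sqrt{T}$ can be computed for the three lengths in Figure 8.31. This is a small base-R sketch; the variable names are illustrative only.

```r
# Approximate 95% white-noise bounds, +/- 2 / sqrt(T), for each series length
series_lengths <- c(36, 360, 1000)
acf_bounds <- 2 / sqrt(series_lengths)

# One row per series length; the bound shrinks as T grows
data.frame(T = series_lengths, bound = round(acf_bounds, 3))
```

The bounds come out at roughly ±0.33, ±0.11 and ±0.06, which is why the dashed lines sit closer to zero in the right-hand panel.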

Exercise 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggplotly(autoplot(ibmclose))

ggtsdisplay(ibmclose)

The book defines a stationary time series as one whose properties do not depend on the time at which the series is observed. Thus, time series with trends or with seasonality are not stationary, because the trend and seasonality affect the value of the series at different times. On the other hand, a white noise series is stationary: it does not matter when you observe it, it should look much the same at any point in time.

The time plot shows that the level of the series changes over time, with extended upward and downward movements rather than fluctuation around a constant mean, which makes it non-stationary. (Being daily stock prices, it shows no clear seasonality.) The ACF decreases very slowly and its spikes lie far outside the $\pm 2/\sqrt{T}$ bounds, where $T$ is the length of the time series, which is typical of non-stationary data. The PACF shows a very strong correlation between the series and its lag-1 values. Thus, each IBM closing price is largely predicted by the previous day's price, as in a random walk, and the series should be differenced.

Differencing can help stabilise the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend and seasonality. So, we can make the above series stationary by differencing it. The code below also takes logs first, which helps stabilise the variance.

ggtsdisplay(ibmclose %>% log()  %>% diff(1))
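The effect of differencing can also be seen on simulated data. The sketch below uses a simulated random walk as a stand-in for a stock price (an assumption for illustration, not the `ibmclose` data) and compares the lag-1 sample autocorrelation before and after differencing, using base R only.

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
walk <- cumsum(rnorm(500))         # simulated random walk (non-stationary)
steps <- diff(walk)                # first difference recovers the white-noise steps

# Lag-1 sample autocorrelation before and after differencing
acf_walk  <- acf(walk,  plot = FALSE)$acf[2]
acf_steps <- acf(steps, plot = FALSE)$acf[2]
c(original = acf_walk, differenced = acf_steps)
```

The undifferenced walk should show a lag-1 autocorrelation close to 1, while the differenced series should be close to 0, mirroring what the ACF plots show for the IBM series.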

Exercise 8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. `usnetelec`

ggplotly(autoplot(usnetelec))

The time series has a clear upward linear trend, so differencing alone should be enough to make it stationary. We can use the `ndiffs()` function to get the order of differencing.

# Stack a time plot on top of side-by-side ACF and PACF plots
gg_ts_plot <- function(ts, title){
  grid.arrange(
    autoplot(ts) +
      ggtitle(title) +
      theme(axis.title = element_blank()),
    # arrangeGrob() builds the bottom row without drawing it separately
    arrangeGrob(
      ggAcf(ts) + ggtitle(NULL),
      ggPacf(ts) + ggtitle(NULL),
      ncol = 2),
    nrow = 2)
}

gg_ts_plot(diff(usnetelec, differences = ndiffs(usnetelec)), paste0("Differenced US Net Electricity Generation (d=", ndiffs(usnetelec), ")"))

The time series needs one order of differencing (d = 1). Let’s check whether the differenced data are stationary.

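# KPSS test: the null hypothesis is that the series is stationary,
# so a p-value above 0.05 means we do not reject stationarity.
is_stationary <- function(data){
  results <- kpss.test(data)
  if (results$p.value > 0.05){
    "Data IS Stationary"
  } else {
    "Data IS NOT Stationary"
  }
}
cat(is_stationary(diff(usnetelec, differences = ndiffs(usnetelec))))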

## Warning in kpss.test(data): p-value greater than printed p-value

## Data IS Stationary

b. `usgdp`

ggplotly(autoplot(usgdp))

The time series has a clear upward trend. We should be able to make the series stationary using a Box-Cox transformation followed by differencing.

usgdp_lambda <- BoxCox.lambda(usgdp)
boxcox_usgdp <- BoxCox(usgdp, lambda = usgdp_lambda)
gg_ts_plot(diff(boxcox_usgdp, differences = ndiffs(boxcox_usgdp)), paste0("Differenced Quarterly US GDP (d=", ndiffs(boxcox_usgdp), ", lambda=",round(usgdp_lambda,2),")"))

A Box-Cox transformation with a lambda of 0.37 and first-order differencing (d = 1) were applied.

Let’s check if the series became stationary:

cat(is_stationary(diff(boxcox_usgdp, differences = ndiffs(boxcox_usgdp))))

## Warning in kpss.test(data): p-value greater than printed p-value

## Data IS Stationary
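For reference, the transformation applied in this exercise can be sketched by hand. The helper below, `box_cox_manual()`, is a hypothetical name and implements the textbook Box-Cox formula for strictly positive data; it is assumed, not verified here, to agree with `BoxCox()` on such data.

```r
# Textbook Box-Cox transformation for strictly positive data:
#   log(y)                   if lambda == 0
#   (y^lambda - 1) / lambda  otherwise
box_cox_manual <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100)
box_cox_manual(y, lambda = 0)     # plain log transform
box_cox_manual(y, lambda = 1)     # only shifts the data down by 1
```

Values of lambda between 0 and 1, such as the 0.37 chosen for `usgdp`, compress large values more than a shift but less than a log.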

c. `mcopper`

ggplotly(autoplot(mcopper))

The time series shows a sharp spike in the late 2000s.

mcopper_lambda <- BoxCox.lambda(mcopper)
boxcox_mcopper <- BoxCox(mcopper, lambda = mcopper_lambda)
gg_ts_plot(diff(boxcox_mcopper, differences = ndiffs(boxcox_mcopper)), paste0("Differenced Monthly Grade A Copper Prices (d=", ndiffs(boxcox_mcopper), ", lambda=",round(mcopper_lambda,2),")"))

A Box-Cox transformation with a lambda of 0.19 and first-order differencing (d = 1) were applied.

Let’s check if the series became stationary:

cat(is_stationary(diff(boxcox_mcopper, differences = ndiffs(boxcox_mcopper))))

## Warning in kpss.test(data): p-value greater than printed p-value

## Data IS Stationary

d. `enplanements`

ggplotly(autoplot(enplanements))

We can see from the plot above that the variance is not constant. We will need a Box-Cox transformation and differencing to make the series stationary.

enplanements_lambda <- BoxCox.lambda(enplanements)
boxcox_enplanements <- BoxCox(enplanements, lambda = enplanements_lambda)
gg_ts_plot(diff(boxcox_enplanements, differences = ndiffs(boxcox_enplanements)), paste0("Differenced Monthly US Domestic Enplanements (d=", ndiffs(boxcox_enplanements), ", lambda=",round(enplanements_lambda,2),")"))

A Box-Cox transformation with a lambda of -0.23 and first-order differencing (d = 1) were applied.

Let’s check if the series became stationary:

cat(is_stationary(diff(boxcox_enplanements, differences = ndiffs(boxcox_enplanements))))

## Warning in kpss.test(data): p-value greater than printed p-value

## Data IS Stationary

e. `visitors`

ggplotly(autoplot(visitors))

We can see from the plot above that the time series has seasonality with an upward trend. We will need a Box-Cox transformation and differencing to make it stationary.

visitors_lambda <- BoxCox.lambda(visitors)
boxcox_visitors <- BoxCox(visitors, lambda = visitors_lambda)
gg_ts_plot(diff(boxcox_visitors, differences = ndiffs(boxcox_visitors)), paste0("Differenced Monthly Australian Overseas Visitors (d=", ndiffs(boxcox_visitors), ", lambda=",round(visitors_lambda,2),")"))

A Box-Cox transformation with a lambda of 0.28 and first-order differencing (d = 1) were applied.

Let’s check if the series became stationary:

cat(is_stationary(diff(boxcox_visitors, differences = ndiffs(boxcox_visitors))))

## Warning in kpss.test(data): p-value greater than printed p-value

## Data IS Stationary
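Because `enplanements` and `visitors` are monthly series with visible seasonality, a seasonal difference may also be worth considering alongside the first difference. The sketch below is an illustration only: it uses `AirPassengers`, which ships with base R, as a stand-in monthly series (an assumption, not one of the exercise data sets), and a log transform in place of an estimated lambda.

```r
# AirPassengers ships with base R (datasets package); used here only as a stand-in
log_air <- log(AirPassengers)              # log = Box-Cox with lambda = 0

seasonal_diff <- diff(log_air, lag = 12)   # removes the yearly pattern
both_diff <- diff(seasonal_diff, differences = 1)  # then removes remaining trend

length(AirPassengers)   # 144 monthly observations
length(both_diff)       # 144 - 12 - 1 = 131
```

Note the distinction between the two arguments of `diff()`: `lag = 12` takes a seasonal difference, while `differences = 1` sets the order of ordinary differencing.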

Exercise 8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
  frequency=12, start=c(1982,4))
ggplotly(autoplot(myts))

myts_lambda <- BoxCox.lambda(myts)
boxcox_retail <- BoxCox(myts, lambda = myts_lambda)
gg_ts_plot(diff(boxcox_retail, differences = ndiffs(boxcox_retail)), paste0("Differenced Monthly Retail Sales (d=", ndiffs(boxcox_retail), ", lambda=",round(myts_lambda,2),")"))

A Box-Cox transformation with a lambda of 0.13 and first-order differencing (d = 1) were applied.

Let’s check if the series became stationary:

cat(is_stationary(diff(boxcox_retail, differences = ndiffs(boxcox_retail))))

## Warning in kpss.test(data): p-value greater than printed p-value

## Data IS Stationary

Exercise 8.6

Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with $ϕ_1 = 0.6$ and $σ^2 = 1$. The process starts with $y_1=0$.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

Let’s create a function so that we can change $ϕ_1$.

ts_AR1 <- function(phi) {
    set.seed(123)  # same shocks for every phi, so the plots are comparable
    y <- ts(numeric(100))
    e <- rnorm(100)
    for(i in 2:100)
      y[i] <- phi*y[i-1] + e[i]
    return (y)
}

b. Produce a time plot for the series. How does the plot change as you change $ϕ_1$?

Time plot for the series with $ϕ_1 = 0.6$

ts_AR1(0.6) %>% ggtsdisplay()

Time plot for the series with $ϕ_1 = 1$

ts_AR1(1) %>% ggtsdisplay()

Time plot for the series with $ϕ_1 = 0$

ts_AR1(0) %>% ggtsdisplay()

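We can see that as $ϕ_1$ increases from 0 to 1, the series wanders farther from zero and becomes smoother. With $ϕ_1 = 0$ the series is white noise, and with $ϕ_1 = 1$ it is a random walk. The autocorrelation also increases with $ϕ_1$, so the ACF decays more slowly.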
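The link between $ϕ_1$ and the autocorrelation can be checked against theory: for a stationary AR(1) process the autocorrelation at lag $k$ is $ϕ_1^k$. The sketch below uses `ARMAacf()` from base R's stats package; the chosen values of $ϕ_1$ are illustrative.

```r
# Theoretical ACF of a stationary AR(1) process is phi^k at lag k
phis <- c(0.3, 0.6, 0.9)
ar1_acf <- sapply(phis, function(phi) ARMAacf(ar = phi, lag.max = 5))
colnames(ar1_acf) <- paste0("phi=", phis)

# Rows are lags 0 to 5; larger phi means slower decay
round(ar1_acf, 3)
```

For $ϕ_1 = 0.6$ the column reads 1, 0.6, 0.36, 0.216, and so on, which is the geometric decay seen in the sample ACF.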

c. Write your own code to generate data from an MA(1) model with $θ_1=0.6$ and $σ^2=1$.

ts_MA1 <- function(theta){
  set.seed(123)
  y <- ts(numeric(100))
  e <- rnorm(100)
  e[1] <- 0
  for(i in 2:100){
    y[i] <- theta*e[i-1] + e[i]
  }
  return(y)
}

d. Produce a time plot for the series. How does the plot change as you change $θ_1$?

p <- autoplot(ts_MA1(0.6))
for(theta in seq(0.1, 0.9, 0.1)){
  p <- p + autolayer(ts_MA1(theta), series = paste(theta))
}
p +
  labs(title="The effects of changing Theta", color = "Theta") +
  theme(axis.title = element_blank(), legend.position = "bottom") +
  scale_color_brewer(palette = "Set1")

It is difficult to see the effect of changing $θ_1$ from the plot above. Looking closely, the series moves slightly farther from zero as $θ_1$ increases.
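One reason the lines look so similar is that the lag-1 autocorrelation of an MA(1) process, $θ_1/(1+θ_1^2)$, changes slowly and never exceeds 0.5. The sketch below evaluates that formula over the same grid of $θ_1$ values and cross-checks it against `ARMAacf()` from base R; the variable names are illustrative.

```r
# Theoretical lag-1 autocorrelation of an MA(1) process: theta / (1 + theta^2)
thetas <- seq(0.1, 0.9, by = 0.1)
rho1_formula <- thetas / (1 + thetas^2)

# Cross-check against stats::ARMAacf(), which returns lags 0, 1, 2, ...
rho1_armaacf <- sapply(thetas, function(theta) ARMAacf(ma = theta, lag.max = 2)[2])

round(data.frame(theta = thetas, rho1 = rho1_formula), 3)
```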

e. Generate data from an ARMA(1,1) model with $ϕ_1=0.6$, $θ_1=0.6$ and $σ^2=1$.

ts_ARMA <- function(phi, theta){
  set.seed(123)
  y <- ts(numeric(100))
  e <- rnorm(100)
  e[1] <- 0
  for(i in 2:100)
    y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
  return(y)
}

f. Generate data from an AR(2) model with $ϕ_1=−0.8$, $ϕ_2=0.3$ and $σ^2=1$. (Note that these parameters will give a non-stationary series.)

ts_AR2 <- function(phi_1, phi_2){
  set.seed(123)
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 3:100)
    y[i] <- phi_1*y[i-1] + phi_2*y[i-2] + e[i]
  return(y)
}

g. Graph the latter two series and compare them.

autoplot(ts_ARMA(0.6, 0.6), series = "ARMA(1,1)") +
  autolayer(ts_AR2(-0.8, 0.3), series = "AR(2)") +
  theme(axis.title = element_blank(), legend.position = "bottom", legend.title = element_blank()) +
  scale_color_brewer(palette = "Set1")
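The difference between the two series can be tied back to the stationarity condition: an AR process is stationary only if all roots of its characteristic polynomial $1 - ϕ_1 z - ϕ_2 z^2$ lie outside the unit circle. The sketch below checks this with base R's `polyroot()`; `ar_root_moduli()` is a hypothetical helper name introduced for illustration.

```r
# Roots of the AR characteristic polynomial 1 - phi_1 z - phi_2 z^2 - ...
# A stationary AR process needs every root to lie outside the unit circle.
ar_root_moduli <- function(phi) Mod(polyroot(c(1, -phi)))

ar_root_moduli(0.6)            # AR part of the ARMA(1,1) model
ar_root_moduli(c(-0.8, 0.3))   # AR(2) model from part f
```

The ARMA(1,1) model has a single root of modulus about 1.67, so it is stationary and its series stays bounded around zero. The AR(2) model has one root of modulus about 0.93, inside the unit circle, so its series oscillates with growing amplitude.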

Question 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

ggplotly(autoplot(wmurders) +
  ggtitle("Women Murdered in the U.S.") +
  theme(axis.title = element_blank()))

We can see from the plot above that the series trends upward and then downward, so it needs to be differenced to make it stationary before moving forward.

gg_ts_plot(diff(wmurders, differences = ndiffs(wmurders)), "Differenced Women Murdered in the U.S.")

The model fitted below uses two orders of differencing ($d = 2$). A single large spike at lag 1 in the PACF suggests $p = 1$, and a large spike at lag 1 in the ACF suggests $q = 1$, which gives an ARIMA(1,2,1) model.

arima(wmurders, order = c(1, 2, 1))$aic

## [1] -6.879768

b. Should you include a constant in the model? Explain.

No. A constant in a non-seasonal ARIMA model adds a polynomial trend of order $d$ to the forecast function. With $d = 2$ that would be a quadratic trend, which is hard to justify for these data. Thus, I wouldn’t include a constant in this case.

c. Write this model in terms of the backshift operator.

With no constant (see part b), the ARIMA(1,2,1) model is:

$(1-\phi_1B) (1-B)^2y_t = (1 + \theta_1B)e_t$

d. Fit the model using R and examine the residuals. Is the model satisfactory?

arima <- arima(wmurders, order = c(1, 2, 1))
checkresiduals(arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

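The ACF of the residuals shows the autocorrelations lying within the critical bounds, which indicates that the residuals behave like white noise. The Ljung-Box test agrees: its p-value of 0.1335 is above 0.05, so we do not reject the hypothesis that the residuals are white noise. The model is satisfactory.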
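The degrees of freedom in the Ljung-Box output above (df = 8 from 10 lags and 2 model parameters) can be reproduced with base R's `Box.test()`. The sketch below runs the test on simulated white noise as a placeholder; `stand_in_resid` is a hypothetical stand-in, not the actual residuals of the fitted model, so its p-value will differ from the one reported.

```r
set.seed(1)                  # arbitrary seed
stand_in_resid <- rnorm(50)  # placeholder residuals; NOT the real model residuals

# lag = 10 and fitdf = 2 (one AR + one MA coefficient) give df = 10 - 2 = 8
lb <- Box.test(stand_in_resid, lag = 10, type = "Ljung-Box", fitdf = 2)
lb$parameter   # degrees of freedom
lb$p.value     # above 0.05 would mean no evidence against white noise
```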

e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(arima, h=3) %>%
  kable() %>%
  kable_styling()

Year	Point Forecast	Lo 80	Hi 80	Lo 95	Hi 95
2005	2.470660	2.200091	2.741229	2.056861	2.884459
2006	2.363106	1.993529	2.732684	1.797886	2.928327
2007	2.252833	1.774677	2.730989	1.521557	2.984110
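To check the point forecasts by hand, expand the backshift form of the ARIMA(1,2,1) model without a constant into $y_t = (2+\phi_1)y_{t-1} - (1+2\phi_1)y_{t-2} + \phi_1 y_{t-3} + e_t + \theta_1 e_{t-1}$, then set future errors to zero. The function below is a minimal sketch of that recursion in base R; `forecast_arima121()` and its argument names are hypothetical and not part of any package.

```r
# Point forecasts for ARIMA(1,2,1) without a constant, by direct recursion:
#   y_t = (2 + phi) y_{t-1} - (1 + 2 phi) y_{t-2} + phi y_{t-3} + e_t + theta e_{t-1}
# Future errors are replaced by 0; only the first step uses the last residual.
forecast_arima121 <- function(y, e_last, phi, theta, h = 3) {
  n <- length(y)
  stopifnot(n >= 3)
  path <- c(y, numeric(h))
  for (k in seq_len(h)) {
    idx <- n + k
    ma_term <- if (k == 1) theta * e_last else 0
    path[idx] <- (2 + phi) * path[idx - 1] -
      (1 + 2 * phi) * path[idx - 2] +
      phi * path[idx - 3] + ma_term
  }
  path[(n + 1):(n + h)]
}

# Sanity check: with phi = theta = 0 the model extrapolates a straight line
forecast_arima121(c(1, 2, 3), e_last = 0, phi = 0, theta = 0, h = 3)
```

For the fitted model, one would pass the last three observations of `wmurders`, the last residual of the fit, and the estimated coefficients, then compare the result with the Point Forecast column above.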

f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(arima, h=3))

g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

auto.arima(wmurders)

## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97

Yes. auto.arima() gives the same ARIMA(1,2,1) model that I have chosen, with the same AIC of -6.88.
