HW6

8.1 Explain the differences among these figures.
8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
8.6 Use R to simulate and plot some data from simple ARIMA models.
8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

8.1, 8.2, 8.3, 8.5., 8.6, 8.7

8.1 Explain the differences among these figures.

Do they all indicate that the data are white noise?
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? Time series that show no autocorrelation (linear relationship between lagged values) are considered white noise.For a white noise series, we expect 95% of the spikes in the ACF to lie within ±2/√T where T is the length of the time series. If 5% or more of the spikes are outside the blue lines then it is not white noise. As the T gets larger, the space between the dotted blue lines will get smaller, therefore the first graph shows the smallest T and the third graphs shows the largest T. In addition, the graphs have less than 5% of the spikes outside the bounded blue lines. In addition, the data is stationary as the ACF drops to zero relatively quickly in all 3 graphs. The critical values are at different distances from the mean of zero since this is random data with no apparent seasonality or trend. In addition, the autocorrelations are different in each graph due to the differing random number sets (36 random, 360 random, 1000 random). Each set of random numbers will differ and have different peaks and troughs.

##8.2 A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

## ── Attaching packages ────────────────────────────────────────────────────────── fpp3 0.3 ──

## ✓ tibble      3.0.3     ✓ tsibble     0.9.2
## ✓ dplyr       1.0.2     ✓ tsibbledata 0.2.0
## ✓ tidyr       1.1.2     ✓ feasts      0.1.5
## ✓ lubridate   1.7.9     ✓ fable       0.2.1
## ✓ ggplot2     3.3.2

## ── Conflicts ───────────────────────────────────────────────────────────── fpp3_conflicts ──
## x lubridate::date()   masks base::date()
## x dplyr::filter()     masks stats::filter()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag()        masks stats::lag()

## Loading required package: forecast

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: fma

## Loading required package: expsmooth

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

autoplot(ibmclose)+
    ylab("Closing Price") + xlab("Time")

ggtsdisplay(ibmclose)

A stationary time series is one whose properties do not depend on the time at which the series is observed. A stationary time series will not show seasonality or trend and will have no predictable long term pattern such as the time series plot of IBM closing stock prices. The ACF lag plot shows non-stationary data as the ACF decreased towards 0 very gradually. Stationary data will have an ACF that drops to 0 relatively quickly. PACF charts show the relationship between yt and yt-k after removing the effects of lags 1,2,3,…,k-1. The data may follow an ARIMA( p, d ,0) model if the ACF and PACF plots of the differenced data show the following patterns: the ACF is exponentially decaying or sinusoidal; there is a significant spike at lag pin the PACF, but none beyond lag p

8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

autoplot(usnetelec)

ggtsdisplay(usnetelec)

The time series does not appear to be seasonal, but has an overall increased trend.

usnetelec%>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 1.464 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test statistic is much bigger than the 1% critical value, indicating that the null hypothesis is rejected. That is, the data are not stationary. We can difference the data, and apply the test again.

lambda <- BoxCox.lambda(usnetelec)
print(lambda)

## [1] 0.5167714

autoplot(BoxCox(usnetelec, lambda)) +
  ggtitle("BoxCox Transformation")

The box cox transformation does not appear to help at all.

usnetelec%>% diff() %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.1585 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After differencing is applied, the test statistic is now well within the range we would expect for stationary data. So we can conclude that the differenced data are stationary.

ndiffs(usnetelec)

## [1] 1

A differencing of 1 is appropriate for the usnetelec data set.

usgdp

autoplot(usgdp)

ggtsdisplay(usgdp)

lambda <- BoxCox.lambda(usgdp)
print(lambda)

## [1] 0.366352

usgdpBC <- BoxCox(usgdp, lambda)
autoplot(BoxCox(usgdp, lambda)) +
  ggtitle("BoxCox Transformation")

Performing a box cox transformation did reduce the curve in the original time series.

usgdp%>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.6556 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

usgdpBC%>% diff %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After applying a box cox transformation with lambda = 0.366352 and running the KPSS test, the test statistic is now well within the range we would expect for stationary data. So we can conclude that the differenced data with box cox transformation are now stationary.

ndiffs(BoxCox(usgdp, lambda))

## [1] 1

Using the ndiffs function, we see that the first differencing is appropriate.

mcopper

autoplot(mcopper)

ggtsdisplay(mcopper)

The ACF plot is exponentially decaying and the PACF plot shows a significant spike at lag(p) but none thereafter, therefore the data is non stationary.

mcopper%>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 5.01 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

mcopper%>% diff %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.1843 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

lambda <- BoxCox.lambda(mcopper)
print(lambda)

## [1] 0.1919047

mcopperbc <- BoxCox(mcopper, lambda)
autoplot(BoxCox(mcopper, lambda)) +
  ggtitle("BoxCox Transformation")

mcopperbc%>% diff %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After applying a box cox transformation with lambda = 0.1919047 and running the KPSS test, the test statistic is now well within the range we would expect for stationary data. So we can conclude that the differenced data with box cox transformation are now stationary.

ndiffs(BoxCox(mcopper, lambda))

## [1] 1

For mcopper data set, first differencing is appropriate.

enplanements

autoplot(enplanements)

ggtsdisplay(enplanements)

Enplanements data shows some seasonality and is therefore non stationary.

enplanements%>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 4.4423 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The KPSS unit root test confirms that the data is not stationary.

enplanements%>% diff %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0086 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

By applying differencing without a box cox transformation, we are able to make the stationary.

ndiffs(enplanements)

## [1] 1

visitors

autoplot(visitors)

ggtsdisplay(visitors)

visitors%>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.6025 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The KPSS test confirms that the visitors data is not stationary.

visitors%>% diff %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0191 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ndiffs(visitors)

## [1] 1

By applying first differencing to the visitors data set, we are able to now make the data stationary.

8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Differencing involves computing the differences between consecutive values. This makes a non stationary plot stationary.

myts <- ts(retaildata[,"A3349413L"],
  frequency=12, start=c(1982,4))

autoplot(myts)

ggtsdisplay(myts)

The ACF lag plot of the retail data shows that the data is non-stationary as the ACF decreased towards 0 very gradually and there are some apparent seasonal trends and an overall increased trend.

lambda <- BoxCox.lambda(myts)
print(lambda)

## [1] 0.1606171

autoplot(BoxCox(myts, lambda)) +
  ggtitle("BoxCox Transformation")

mytsbc <- BoxCox(myts, lambda)

autoplot(diff(myts))

library(urca)
myts %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 5.842 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

myts %>%
  diff() %>%
  ur.kpss() %>%
  summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0338 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After differencing, the retail data set is now stationary

mytsbc %>%
  diff() %>%
  ur.kpss() %>%
  summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0295 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The addition of the box cox transformation further improves the stationary model

ndiffs(mytsbc)

## [1] 1

Using the ndiff function, first order differencing of the box cox transformation with lambda = 0.1606171 is appropriate.

8.6 Use R to simulate and plot some data from simple ARIMA models.

Arima (p,d,q) models: p=order of the autoregressive part; d=degree of first differencing involved; q=order of the moving average part.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

Produce a time plot for the series. How does the plot change as you changeϕ1?

autoplot(y)

arima1 <- function(x){
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100)
    y[i] <- x*y[i-1] + e[i]
  return(y)
}
autoplot(arima1(0.1))

autoplot(arima1(0.3))

autoplot(arima1(0.9))

The lower the ϕ1 value is, the more stationary the time series becomes.

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

A moving average model uses past forecast errors in a regression-like model.

MA1 <- function(x){
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100)
    y[i] <- x*e[i-1] + e[i]
  return(y)
}
autoplot(MA1(0.6))

Produce a time plot for the series. How does the plot change as you change θ1?

autoplot(MA1(0.1))

autoplot(MA1(0.9))

Changing the parameters results in different time series patterns.

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1 =0.6 and σ2=1.

y <- ts(numeric(100))
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]

arima11 <- y

autoplot(arima11) +
  ggtitle('ARMA(1,1)')

Generate data from an AR(2) model with ϕ1=−0.8 ,ϕ2= 0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

y <- ts(numeric(100))
e <- rnorm(100)

for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]

ar2 <- y


autoplot(ar2) +
  ggtitle('AR2')

Graph the latter two series and compare them.

autoplot(arima11, series = "ARMA(1, 1)") +
  autolayer(ar2, series = "AR(2)") +
  ylab("y value") +
  guides(colour = guide_legend(title = "Model"))

ggtsdisplay(ar2)

ggtsdisplay(arima11)

THe Arima(1,1) model appears to look more like white noise than the AR(2) model based on the ACF and PACF plots.

8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q) model for these data.

autoplot(wmurders)

ggtsdisplay(wmurders)

The initial data follows similarities to an ARIMA (p,d,0) model due to: The ACF plot is exponentially decaying and the PACF plot shows a significant spike at lag(p) but none thereafter.

myts %>%
  ur.kpss() %>%
  summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 5.842 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

myts %>%
  diff() %>%
  ur.kpss() %>%
  summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0338 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Performing the KPSS test, it confirms that after differencing is applied, the model is now stationary.

With differencing applied:

wmurders %>% ndiffs()

## [1] 2

The appropriate differencing is second order.But we will also work with first order differencing.

wmurders %>% diff() %>% ggtsdisplay()

With the differencing applied the ACF plot and PACF plot now show that an ARIMA(p,d,q) model is now appropriate.

(fit <- Arima(wmurders, order = c(0,1,2)))

## Series: wmurders 
## ARIMA(0,1,2) 
## 
## Coefficients:
##           ma1     ma2
##       -0.0660  0.3712
## s.e.   0.1263  0.1640
## 
## sigma^2 estimated as 0.0422:  log likelihood=9.71
## AIC=-13.43   AICc=-12.95   BIC=-7.46

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
## 
## Model df: 2.   Total lags used: 10

An ARIMA(0,1,2) model is appropriate. The d=1 for first order differencing,

Should you include a constant in the model? Explain I would recommend the long term forecast going to a non zero constant instead of following a straight linear line. Following a straight linear line is usually not realiatic in real world data sets.
Write this model in terms of the backshift operator. Based on the text: (1−Byt)=c+(1+θ1B+θ2B2)εt
Fit the model using R and examine the residuals. Is the model satisfactory?

arima_012 <- Arima(wmurders, order=c(0,1,2))
checkresiduals(arima_012)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
## 
## Model df: 2.   Total lags used: 10

The model is satisfactory.

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(arima_012, h=3)

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.458450 2.195194 2.721707 2.055834 2.861066
## 2006       2.477101 2.116875 2.837327 1.926183 3.028018
## 2007       2.477101 1.979272 2.974929 1.715738 3.238464

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

fit %>% forecast(h=3) %>% autoplot(include=80)

*Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

fit <- auto.arima(diff(wmurders), seasonal = FALSE, stepwise = FALSE)
fit

## Series: diff(wmurders) 
## ARIMA(0,1,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35

According to the auto arima function, an ARIMA (0,1,3) model is appropriate for the Wmurders dataset.

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,3)
## Q* = 10.548, df = 7, p-value = 0.1596
## 
## Model df: 3.   Total lags used: 10

The auto arima, gives a 0,1,3 model which does not appear to be as satisfactory as the 0,1,2 model based on the normal distribution of the residuals with the checkresiduals function.

HW6

Christina Kasman

10/11/2020

8.1 Explain the differences among these figures.

8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

8.6 Use R to simulate and plot some data from simple ARIMA models.

8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.