Data 624 Week 8 Assignmnet

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Yes, they all indicate the data are white noise. Because the auto correlation are inside the threshold levels.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

I think it is because of the number of samples. As the number of sample increases, the auto correlation gets better and correct. Because each lag would get more samples/observation and gets the acf better.

  1. Left: ACF for a white noise series of 36 numbers.
  2. Middle: ACF for a white noise series of 360 numbers.
  3. Right: ACF for a white noise series of 1,000 numbers.

For e.g lets experiment with the retail data we had before.

retaildata <- readxl::read_excel('C:\\Users\\charls.joseph\\Documents\\Cuny\\Data624\\week1\\retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349397X"],
  frequency=12, start=c(1982,4))

autoplot(myts) +
  xlab('Year') +
  ylab("Turn Over")

ACF with 10 samples in timeseries

acf(myts[1:10])

ACF with 100 samples in timeseries

acf(myts[1:100])

ACF with 200 samples in timeseries

acf(myts[1:200])

ACF with 300 samples in timeseries

acf(myts)

As you see, when the samples increases, the ACF gets better and could determine the trend and seasonailty pattern in the data.

8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(ibmclose)

ACF is significant in the beginning and it is gradually decreasing. This means it has a trend. Since it doesnt have scallowed shape, It doesnt have seasonaility. This indicate this is not stationary. Non-stationary timeseries shouldn’t have seasoned or trend component.

ACF shows the correlation between lagged data as well as it indicate whether the time series is stationary or not. If the ACF is with in the threshold level, then it is stationary. This need a differencing to make it stationary to apply AR or MA models.

In the other side, PACF shows the correlation between current value and the previous values. When you model the timeseries data as as linear relation ship, it can be written as

y(t) = b0 + b1 * y(t-1) + b2 * y(t-2)

It shows the correlation between y(t) and y(t-1) & y(t) and y(t-2) independently.

From PACF, i dont see a big correlation between y(t) and y(t-1), but there is a correlation. y(t) and y(t-2), there is no correlation. There are intermittent correlation as the lags goes on.

PACF is used to determine the order for AR model. ACF is used to determine the order of MA model.

8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

ggtsdisplay(usnetelec)

This time series has a linear upward trend, hence it is not stationary. In order to make it stationary, we can apply log or box-cox transformation to reduce the variance and apply differencing as well

Lets look at usnetelec dataset.

lambda <- BoxCox.lambda(usnetelec)
usnetelec_bc <- BoxCox(usnetelec, lambda)
ggtsdisplay(diff(usnetelec_bc) )

You can see the time box cox and differening make the time series stationary.

Lets look at usgdp dataset.

ggtsdisplay(usgdp)

lambda <- BoxCox.lambda(usgdp)
usgdp_bc <- BoxCox(usgdp, lambda)
ggtsdisplay(diff(usgdp_bc) )

ACF and Pacf shows that there is a still slight correlation for the first lag. We will do a null hyposis test using kpss.

library(urca)

usgdp_bc %>% diff() %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

test stat is > .05 and we will do one more differencing.

ggtsdisplay(diff(diff(usgdp_bc) ))

usgdp_bc %>% diff() %>% diff() %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.012 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

We see that p-value is less than 0.05 and the data becomes stationary with 2 types differencing.

Lets look at mcopper dataset.

ggtsdisplay(mcopper)

lambda <- BoxCox.lambda(mcopper)
mcopper_bc <- BoxCox(mcopper, lambda)
ggtsdisplay(diff(mcopper_bc) )

Looking at the orginal plot, this data has no variance in the beginning and constant varience in the middle and a sudden change in the variance towards the end.

There is an upward trend and it is a stationary. After applying transformation and differencing, the data looks to be a stationary. But we will run the hypothesis test and confirm.

mcopper_bc %>% diff()  %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The p-value is 0.05 and we can consider this as a stationary now.

Lets look at enplanements

ggtsdisplay(enplanements)

lambda <- BoxCox.lambda(enplanements)
enplanements_bc <- BoxCox(enplanements, lambda)
ggtsdisplay(diff(enplanements_bc) )

enplanements_bc %>% diff()  %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0151 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The p-value shows that the data became stationary after the tranformation and differecing.

Lets look at the visitors

ggtsdisplay(visitors)

lambda <- BoxCox.lambda(visitors)
visitors_bc <- BoxCox(visitors, lambda)
ggtsdisplay(diff(visitors_bc) )

visitors_bc %>% diff()  %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0519 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The p-value shows the data became stationary after the transformation and differenceing.

8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The plot shows a changing variance and thus indicate a need of transformation. We will apply a tranformation and differencing

ggtsdisplay(myts)

lambda <- BoxCox.lambda(myts)
myts_bc <- BoxCox(myts, lambda)
ggtsdisplay(diff(myts_bc) )

myts_bc %>% diff()  %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0248 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The p-value shows the differencing made the data stationary.

8.6

Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with with ϕ1=0.6 and σ2=1. The process starts with y1=0.

y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
   y[i] <- 0.6*y[i-1] + e[i]
}

b. Produce a time plot for the series. How does the plot change as you change ϕ1 ?

fn1 <- function(ph1) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
   y[i] <- ph1*y[i-1] + e[i]
}
return(y)
}

autoplot(fn1(0.6)) +
  geom_line(colour = "blue") +
  autolayer(fn1(0.9)) +
  autolayer(fn1(0.4))

When ϕ1 increases the variance increases.

c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

ma_1 <- function(ph1) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
   y[i] <- ph1*e[i-1] + e[i]
}
return(y)
}

d. Produce a time plot for the series. How does the plot change as you change θ1?

autoplot(ma_1(0.6)) +
  geom_line(colour = "blue") +
  autolayer(ma_1(0.9)) +
  autolayer(ma_1(0.4))

e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

arma_1 <- function(ph1, theta1) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 2:100){
   y[i] <- ph1*y[i-1] + theta1*e[i-1] + e[i]
}
return(y)
}
data_arma_1 <- arma_1(0.6, 0.6)
autoplot(data_arma_1)

f. Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

ar_2 <- function(ph1, ph2) {
set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100, sd =1)
for(i in 3:100){
   y[i] <- ph1*y[i-1] + ph2*y[i-2] + e[i]
}
return(y)
}
data_ar_2 <- ar_2(-0.8, 0.3)
autoplot(data_ar_2)

Graph the latter two series and compare them

The AR(2) generated data are obviously non-stationary and it has seasonality as well. The ARMA(1,1) generated data does not have this apparent seasonality and it appears to be random and much more stationary than the AR(2) data.

Below is the ACF and PACF for ARMA(1,1) and AR(2)

ggtsdisplay(data_arma_1)

ggtsdisplay(data_ar_2)

8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q) model for these data.

ggtsdisplay(wmurders, main='women murdered each year in USA')

Looking at the plot, we see that there is trend, but there is no seasonality. Seeing a difference in the variance, so i will perform a box cox transformation and difference to make it stationary.

ggtsdisplay(wmurders)

lambda <- BoxCox.lambda(wmurders)
wmurders_bc <- BoxCox(wmurders, lambda)
ggtsdisplay(diff(wmurders_bc) )

wmurders_bc %>% diff()  %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.5466 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ggtsdisplay(wmurders)

lambda <- BoxCox.lambda(wmurders)
wmurders_bc <- BoxCox(wmurders, lambda)
ggtsdisplay(diff(wmurders_bc) )

wmurders_bc %>% diff() %>% diff() %>% ur.kpss %>% summary
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0532 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
wmurders_bc <- BoxCox(wmurders, lambda)
wmurders_final <- diff(diff(wmurders_bc))
ggtsdisplay(wmurders_final)

There is two significant spikes in the ACF and and one significant spike in the PACF plots. The appropriate ARIMA model can be ARIMA(1,2,2).

  1. Should you include a constant in the model? Explain.

Since it is non seasonal and d = 2. So constand is not required.

  1. Write this model in terms of the backshift operator.

(1−ϕ1B)(1−B)2yt=c+(1-θ1B -θ2B^2 )εt, where c will be 0 if no constant included.

  1. Fit the model using R and examine the residuals. Is the model satisfactory?
(fit <- Arima(wmurders, order=c(1,2,2)))
## Series: wmurders 
## ARIMA(1,2,2) 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7677  -0.2812  -0.4977
## s.e.   0.2350   0.2879   0.2762
## 
## sigma^2 estimated as 0.04552:  log likelihood=7.38
## AIC=-6.75   AICc=-5.92   BIC=1.13
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
## 
## Model df: 3.   Total lags used: 10

The ACF that the residuals are within critical limits. And the residuals histogram shows that the residuals are normally distributed and model is good to go.

  1. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
forecast(fit, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.534015 2.260584 2.807446 2.115838 2.952192
## 2006       2.404157 2.026793 2.781521 1.827029 2.981286
## 2007       2.331482 1.829669 2.833296 1.564025 3.098940
autoplot(forecast(fit, h=3))