Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers, and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
Each of these figures shows a white noise series of a different length. The longer the time series, the narrower the critical bounds for judging whether the autocorrelations are significant. The second series actually shows two lags that slightly exceed the significance threshold, but since about 5% of spikes are expected to do so by chance, all three figures are still consistent with white noise.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values differ because of the length of each time series: the bounds are approximately +/-1.96/sqrt(T), so the longer the series, the more data points there are to increase confidence and the tighter the threshold for significance. The autocorrelations themselves differ because each figure is computed from a different random sample, and these sample estimates shrink towards zero as the series gets longer.
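As a quick check of that claim (a minimal sketch using the standard +/-1.96/sqrt(T) white noise bounds):
n_obs <- c(36, 360, 1000)
# Approximate 95% significance bounds for the ACF of a white noise series
round(1.96 / sqrt(n_obs), 3)
# The bounds shrink as the series lengthens, so smaller sample
# autocorrelations become significant for longer series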
A classic example of a non-stationary series is the daily closing IBM stock price (ibmclose). Plot the daily closing prices together with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ibmclose %>% autoplot() +
  defaulttheme +
  labs(title = "IBM Closing Stock Price")
Acf(ibmclose)
Pacf(ibmclose)
The time plot shows a wandering level with no constant mean. The autocorrelation plot shows that the lags of the series are closely related to each other across the entire series: the ACF decays very slowly rather than dropping quickly to zero. However, when evaluating these lags independently using the partial autocorrelation plot, only the first lag is significant, with a spike close to 1. Together, these patterns indicate a random-walk-like, non-stationary series that should be differenced.
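As a hedged follow-up, the forecast package's unit-root helpers should confirm that a single difference is enough; a sketch:
# Number of differences suggested by a unit-root test
ndiffs(ibmclose)
# The differenced series should now resemble white noise
ggtsdisplay(diff(ibmclose))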
***For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
exercise8.3 <- function(data, name){
  # Find the Box-Cox lambda for the data
  lmbda <- round(BoxCox.lambda(data), 2)
  # Apply the Box-Cox transformation
  bcox_data <- BoxCox(data, lmbda)
  # Number of differences required to make the transformed data stationary
  diffs <- ndiffs(bcox_data)
  # Difference the transformed data accordingly
  data_bcox_diff <- diff(bcox_data, differences = diffs)
  return1 <- data.frame(dataset = name, lambda = lmbda, differences = diffs)
  p1 <- autoplot(data) +
    theme_bw() +
    labs(title = name, y = name)
  p2 <- autoplot(bcox_data) +
    theme_bw() +
    labs(title = paste0("BoxCox transformed ", name), y = name)
  p3 <- autoplot(data_bcox_diff) +
    theme_bw() +
    labs(title = paste0("BoxCox and Differenced ", name), y = name)
  return2 <- grid.arrange(p1, p2, p3, nrow = 3)
  return(list(return1, return2))
}
exercise8.3(usnetelec,"usnetelec")
[[1]]
dataset lambda differences
1 usnetelec 0.52 2
[[2]] (panel of three plots: usnetelec, BoxCox transformed usnetelec, BoxCox and Differenced usnetelec)
exercise8.3(usgdp,"usgdp")
[[1]]
dataset lambda differences
1 usgdp 0.37 1
[[2]] (panel of three plots: usgdp, BoxCox transformed usgdp, BoxCox and Differenced usgdp)
exercise8.3(mcopper,"mcopper")
[[1]]
dataset lambda differences
1 mcopper 0.19 1
[[2]] (panel of three plots: mcopper, BoxCox transformed mcopper, BoxCox and Differenced mcopper)
exercise8.3(enplanements,"enplanements")
[[1]]
dataset lambda differences
1 enplanements -0.23 1
[[2]] (panel of three plots: enplanements, BoxCox transformed enplanements, BoxCox and Differenced enplanements)
exercise8.3(visitors,"visitors")
[[1]]
dataset lambda differences
1 visitors 0.28 1
[[2]] (panel of three plots: visitors, BoxCox transformed visitors, BoxCox and Differenced visitors)
For your retail data (from exercise 3 in section 2.10), find the appropriate order of differencing (after transforming if necessary) to obtain stationary data.
library(httr)
url1<-"https://otexts.com/fpp2/extrafiles/retail.xlsx"
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
Response [https://otexts.com/fpp2/extrafiles/retail.xlsx]
Date: 2021-03-29 03:04
Status: 200
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 639 kB
<ON DISK> C:\Users\REGIST~1\AppData\Local\Temp\RtmpcBZeYu\fileae586ec2d4a.xlsx
retaildata <- readxl::read_excel(tf, skip = 1)
myts <- ts(retaildata[,"A3349873A"],
           frequency = 12, start = c(1982, 4))
exercise8.3(myts, "retail data")
[[1]]
dataset lambda differences
1 retail data 0.13 1
[[2]] (panel of three plots: retail data, BoxCox transformed retail data, BoxCox and Differenced retail data)
As shown in the table above, the retail series needs a Box-Cox transformation with lambda = 0.13 followed by one order of differencing to become stationary.
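As an optional sanity check (assuming the urca package, which the fpp2 text uses for unit-root testing), a KPSS test on the transformed and differenced retail series should produce a test statistic below the 5% critical value, consistent with stationarity:
library(urca)
# KPSS test on the Box-Cox transformed, first-differenced retail series
myts %>% BoxCox(lambda = 0.13) %>% diff() %>% ur.kpss() %>% summary()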
***Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1. The process starts with y1 = 0.
ex8.6a <- function(phi){
  # Simulate 100 observations from an AR(1) process:
  # y_t = phi*y[t-1] + e_t, starting at y1 = 0
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi*y[i-1] + e[i]
  }
  return(y)
}
Produce a time plot for the series. How does the plot change as you change phi?
Changing phi changes the persistence of the series. With a small phi the series looks close to white noise and its autocorrelations die out quickly; as phi approaches 1 the series drifts in long, smooth swings and the autocorrelations decay much more slowly, so more lags are significant.
p1a <- ex8.6a(0.6) %>% autoplot() + defaulttheme +
  labs(title = "phi = 0.6")
p1b <- ggAcf(ex8.6a(0.6)) + defaulttheme
p2a <- ex8.6a(0.3) %>% autoplot() + defaulttheme +
  labs(title = "phi = 0.3")
p2b <- ggAcf(ex8.6a(0.3)) + defaulttheme
p3a <- ex8.6a(0.9) %>% autoplot() + defaulttheme +
  labs(title = "phi = 0.9")
p3b <- ggAcf(ex8.6a(0.9)) + defaulttheme
grid.arrange(p1a, p2a, p3a, p1b, p2b, p3b, nrow = 2)
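These plots line up with theory: the ACF of a stationary AR(1) process decays geometrically, rho_k = phi^k, which can be verified with base R's ARMAacf():
# Theoretical autocorrelations (lags 0 to 5) for each phi used above
sapply(c(0.3, 0.6, 0.9), function(phi) round(ARMAacf(ar = phi, lag.max = 5), 3))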
Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1.
The function is shown below.
ex8.6c <- function(theta, sd = 1, n = 100){
  # Simulate n observations from an MA(1) process: y_t = theta*e[t-1] + e_t
  y <- ts(numeric(n))
  e <- rnorm(n, sd = sd)
  e[1] <- 0  # so the process starts with y1 = 0
  for(i in 2:n){
    y[i] <- theta*e[i-1] + e[i]
  }
  return(y)
}
Produce a time plot for the series. How does the plot change as you change theta1?
Changing theta changes the scale and the short-run correlation of the series: larger values of theta produce larger swings (the plotted scales expand), while smaller values compress the series towards plain white noise, as quantified after the plots below.
p1a <- ex8.6c(0.6) %>% autoplot() + defaulttheme +
  labs(title = "theta = 0.6")
p1b <- ggAcf(ex8.6c(0.6)) + defaulttheme
p2a <- ex8.6c(0.3) %>% autoplot() + defaulttheme +
  labs(title = "theta = 0.3")
p2b <- ggAcf(ex8.6c(0.3)) + defaulttheme
p3a <- ex8.6c(0.9) %>% autoplot() + defaulttheme +
  labs(title = "theta = 0.9")
p3b <- ggAcf(ex8.6c(0.9)) + defaulttheme
grid.arrange(p1a, p2a, p3a, p1b, p2b, p3b, nrow = 2)
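This matches the theory for an MA(1) process, which has variance (1 + theta^2)*sigma^2 and a single non-zero autocorrelation theta/(1 + theta^2) at lag 1; a quick arithmetic check:
theta <- c(0.3, 0.6, 0.9)
# Process variance relative to sigma^2 = 1: grows with theta
round(1 + theta^2, 3)
# Lag-1 autocorrelation; all later lags are zero for an MA(1)
round(theta / (1 + theta^2), 3)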
Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6, and sigma^2 = 1.
The ARMA(1,1) model is presented below
# ARMA(1,1): y_t = 0.6*y[t-1] + 0.6*e[t-1] + e_t
y1 <- ts(numeric(100))
e <- rnorm(100, sd = 1)
for(i in 2:100){
  y1[i] <- 0.6*y1[i-1] + 0.6*e[i-1] + e[i]
}
autoplot(y1) +
  ggtitle('ARMA(1,1)') +
  defaulttheme
Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3, and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)
The two series are plotted below and are quite different. The AR(2) series oscillates with steadily growing amplitude and so is non-stationary, while the ARMA(1,1) series looks stationary, with a couple of significant autocorrelations on the ACF and a few on the PACF; a root check after the plots (see the sketch below) confirms this.
# AR(2): y_t = -0.8*y[t-1] + 0.3*y[t-2] + e_t
y2 <- ts(numeric(100))
e <- rnorm(100, sd = 1)
for(i in 3:100){
  y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e[i]
}
autoplot(y2) +
  ggtitle('AR(2)') +
  defaulttheme
Graph the latter two series and compare them.
ggtsdisplay(y1, main = 'ARMA(1,1)')
ggtsdisplay(y2, main = 'AR(2)')
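A root check on the AR polynomials explains the contrast (a sketch using base R's polyroot(); stationarity requires every root of 1 - phi1*z - phi2*z^2 to lie outside the unit circle):
# ARMA(1,1): 1 - 0.6z has its root at 1/0.6 > 1, so the series is stationary
Mod(polyroot(c(1, -0.6)))
# AR(2): 1 + 0.8z - 0.3z^2 has one root inside the unit circle, so non-stationary
Mod(polyroot(c(1, 0.8, -0.3)))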
***Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
Looking at the initial series, we need to difference this non-stationary data. The data are non-seasonal, so we do not need to be concerned with any seasonal ARIMA components.
ggtsdisplay(wmurders)
After differencing this data once, we see that it is still not stationary, as shown below, with a slight downward trend and a bit of cyclicality.
diff(wmurders, differences = 1) %>% ggtsdisplay()
After a second order of differencing, we may now move to the next step with this stationary time series; the unit-root check after the plot corroborates d = 2.
diff(wmurders, differences = 2) %>% ggtsdisplay()
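A unit-root-based check should corroborate this choice of d (ndiffs() applies a KPSS test repeatedly until the series tests as stationary):
# Number of differences suggested for wmurders
ndiffs(wmurders)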
The correlograms of the twice-differenced series suggest starting with an AR term of 2: the ACF gradually decays towards 0, which points to an autoregressive signature, with the early PACF spikes suggesting the order.
We'll fit a few orders around that guess and compare their BICs. The model with the lowest BIC will be taken as the best and used to produce the forecasts.
Arima(wmurders, order = c(2,2,0))$bic
[1] 5.232945
Arima(wmurders, order = c(1,2,0))$bic
[1] 3.93715
Arima(wmurders, order = c(3,2,0))$bic
[1] 7.862588
Arima(wmurders, order = c(2,2,1))$bic
[1] 0.7628476
Arima(wmurders, order = c(1,2,1))$bic
[1] -0.9688923
Based on the above, ARIMA(1,2,1) has the lowest BIC and is the best candidate.
Should you include a constant in the model? Explain.
I do not believe a constant is necessary in this model: no drift term is needed, and with d = 2 a constant would induce a quadratic trend in the long-run forecasts, which is not plausible for this series.
Fit the model using R and examine the residuals. Is the model satisfactory?
wmurdersArima <- Arima(wmurders, order = c(1, 2, 1))
checkresiduals(wmurdersArima)
Ljung-Box test
data: Residuals from ARIMA(1,2,1)
Q* = 12.419, df = 8, p-value = 0.1335
Model df: 2. Total lags used: 10
The residuals of this model resemble white noise (the Ljung-Box p-value of 0.1335 is well above 0.05), so the model is satisfactory.
Forecast three periods ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
wmurders_forecast <- forecast(wmurdersArima, h = 3)
wmurders_forecast$model
Series: wmurders
ARIMA(1,2,1)
Coefficients:
ar1 ma1
-0.2434 -0.8261
s.e. 0.1553 0.1143
sigma^2 estimated as 0.04632: log likelihood=6.44
AIC=-6.88 AICc=-6.39 BIC=-0.97
Expanding the fitted ARIMA(1,2,1), (1 - phi*B)(1 - B)^2 y_t = (1 + theta*B) e_t, gives the recursion y_t = (2 + phi)*y[t-1] - (1 + 2*phi)*y[t-2] + phi*y[t-3] + e_t + theta*e[t-1], which we apply with future errors set to zero:
phi <- unname(coef(wmurdersArima)["ar1"])    # -0.2434
theta <- unname(coef(wmurdersArima)["ma1"])  # -0.8261
years <- length(wmurders)
e <- residuals(wmurdersArima)
# One step ahead: the last observed residual feeds the MA term
fc1 <- (2 + phi)*wmurders[years] - (1 + 2*phi)*wmurders[years - 1] +
  phi*wmurders[years - 2] + theta*e[years]
fc2 <- (2 + phi)*fc1 - (1 + 2*phi)*wmurders[years] + phi*wmurders[years - 1]
fc3 <- (2 + phi)*fc2 - (1 + 2*phi)*fc1 + phi*wmurders[years]
c(fc1, fc2, fc3)
[1] 2.470660 2.363106 2.252833
wmurders_forecast$mean
Time Series:
Start = 2005
End = 2007
Frequency = 1
[1] 2.470660 2.363106 2.252833
The hand calculations reproduce the model's forecasts, so we understand how they are computed.
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(wmurders_forecast)+defaulttheme
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
auto.arima(wmurders, approximation = F)
Series: wmurders
ARIMA(1,2,1)
Coefficients:
ar1 ma1
-0.2434 -0.8261
s.e. 0.1553 0.1143
sigma^2 estimated as 0.04632: log likelihood=6.44
AIC=-6.88 AICc=-6.39 BIC=-0.97
auto.arima() selects the same ARIMA(1,2,1) model that I have chosen.