Data 624 Homework 6

Homework Assignment 6

8.1

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Each of the ACF plots indicate the data are white noise because over 95% of the lags fall within the AFC boundaries; there are no autocorrelations lying outside the 95% limits. The difference between the plots is the lines give the values beyond which the autocorrelations are (statistically) significantly different from zero.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

In small sample conditions, the test for critical value may be overly conservative such that the null hypothesis is rejected less often than indicated by the chosen significance level. As the sample size gets larger, the critical value becomes smaller, thus the critical values are different.

8.2

plot(ibmclose)

ggtsdisplay(ibmclose)

ACF plot shows that the autocorrelation values are bigger than critical value and decrease slowly.

my_afc <- acf(ibmclose, plot=F)
my_afc

## 
## Autocorrelations of series 'ibmclose', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 1.000 0.993 0.986 0.978 0.971 0.964 0.956 0.948 0.939 0.930 0.922 0.914 
##    12    13    14    15    16    17    18    19    20    21    22    23 
## 0.906 0.897 0.889 0.881 0.872 0.863 0.853 0.843 0.832 0.821 0.810 0.799 
##    24    25 
## 0.787 0.775

Above, we can see that the autocorrelations change over time, thus the graph is non-stationary. The PACF graph shows that there is is a strong correlation between the IBM stock and the first lag. Differencing is a way to remove a “random” trend. To do this, we take the difference between a value and the one for the time period immediately previous to it.

8.3

a.

ggtsdisplay(usnetelec)

This graph shows the data to be linearly increasing. Having no seasonality to the data but having the ACF plot show that the autocorrelation values are bigger than critical value and a decrease, we know that differencing is the type of transformation needed to make this data stationary.

diff.usnetelect <- diff(diff(usnetelec))
ggtsdisplay(diff.usnetelect)

#####b.

ggtsdisplay(usgdp)

Looks similar to the usnetelec data. As above, we will just be doing differencing on this data.

diff.usgdp <- diff(diff(usgdp))
ggtsdisplay(diff.usgdp)

there appears to still be some trend in the data, let’s check

ndiffs(usgdp)

## [1] 2

let’s difference the data again.

d2usgdp<-diff(diff.usgdp)
ggtsdisplay(d2usgdp)

There still appears to be a trend. I will try a third difference.

d3usgdp <- diff(usgdp, 3)
ggtsdisplay(d3usgdp)

because I am not certain if this is correct, I will try the box.test

Box.test((d3usgdp), type = "Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  (d3usgdp)
## X-squared = 170.1, df = 1, p-value < 2.2e-16

kpss.test(d3usgdp)

## Warning in kpss.test(d3usgdp): p-value smaller than printed p-value

## 
##  KPSS Test for Level Stationarity
## 
## data:  d3usgdp
## KPSS Level = 1.8465, Truncation lag parameter = 4, p-value = 0.01

The kpss test result shows that differencing twice was enough to make the data stationary.

c.

ggtsdisplay(mcopper)

The mcopper data appears to have an increasing trend in the data. It appears that there might be slightly seasonality to the data, so I will perform a box-cox transformation on the data before differencing it.

lambda_mcopper <- BoxCox.lambda(mcopper)
lambda_mcopper_diff <- diff(BoxCox(mcopper,lambda_mcopper))
ggtsdisplay(lambda_mcopper_diff)

The data now looks mostly stationary after the first lag. I’m not sure how to work with that.

d.

ggtsdisplay(enplanements)

This dataset has clear seasonality and increase over time. This dataset will need a boxcox transformation and seasonal differencing

lambda_enplanements <- BoxCox.lambda(enplanements)

ndiffs(enplanements)

## [1] 1

nsdiffs(enplanements)

## [1] 1

lambda_enplanements <- BoxCox.lambda(enplanements)
lambda_enplanements_diff <- diff(BoxCox(enplanements,lambda_enplanements), lag = 12)
ggtsdisplay(lambda_enplanements_diff)

kpss.test(lambda_enplanements_diff)

## 
##  KPSS Test for Level Stationarity
## 
## data:  lambda_enplanements_diff
## KPSS Level = 0.36169, Truncation lag parameter = 5, p-value =
## 0.09367

e.

ggtsdisplay(visitors)

This data set is similar to the above data set (enplaments). I will preform a similar transformation to the data.

lambda_visitors <- BoxCox.lambda(visitors)
lambda_visitors_diff <- diff(BoxCox(visitors,lambda_visitors), lag = 12)
ggtsdisplay(lambda_visitors_diff)

kpss.test(lambda_visitors_diff)

## 
##  KPSS Test for Level Stationarity
## 
## data:  lambda_visitors_diff
## KPSS Level = 0.70984, Truncation lag parameter = 4, p-value =
## 0.01265

The kpss test shows that the transformation has made the data stationary.

8.5

retaildata <- readxl::read_excel('retail.xlsx', skip = 1)

## readxl works best with a newer version of the tibble package.
## You currently have tibble v1.4.2.
## Falling back to column name repair from tibble <= v1.4.2.
## Message displays once per session.

myts <- ts(retaildata[,'A3349399C'], frequency = 12, start = c(1982,4))
ggtsdisplay(myts)

lambda_myts <- BoxCox.lambda(myts)
lambda_myts_diff <- diff(BoxCox(myts,lambda_myts), lag = 12)
ggtsdisplay(lambda_myts_diff)

kpss.test(lambda_myts_diff)

## Warning in kpss.test(lambda_myts_diff): p-value greater than printed p-
## value

## 
##  KPSS Test for Level Stationarity
## 
## data:  lambda_myts_diff
## KPSS Level = 0.3465, Truncation lag parameter = 5, p-value = 0.1

8.6

a.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
   y[i] <- 0.6*y[i-1] + e[i]
}

b.

ts1 <- function(x) {
  y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
   y[i] <- x*y[i-1] + e[i]
}
return(y)
}

autoplot(ts1(0.6)) +
  geom_line(colour = "blue") +
  autolayer(ts1(0.9)) +
  autolayer(ts1(0.4))

  ylab("AR(1) models")

## $y
## [1] "AR(1) models"
## 
## attr(,"class")
## [1] "labels"

as Phi increases, the variation of y increases

c.

ts2 <- function(x2) {
  y <- ts(numeric(100))
  e <- rnorm(100, 1)
  for(i in 2:100){
    y[i] <- x2*e[i-1] + e[i]
  }
return(y)
}

d.

autoplot(ts2(0.6)) +
 geom_line(colour = "blue") +
  autolayer(ts1(0.9)) +
  autolayer(ts1(0.4))

  ylab("MA(1) models")

## $y
## [1] "MA(1) models"
## 
## attr(,"class")
## [1] "labels"

As with the first plot, as theta increases, the variation on y increases ####e.

y <- ts(numeric(100))
e <- rnorm(100,1)
for(i in 2:50)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]

f.

y2 <- ts(numeric(100))
e <- rnorm(100,1)
for(i in 3:50)
  y2[i] <- -0.8*y2[i-1] + 0.3*e[i-2] + e[i]

g.

autoplot(y) +
  geom_line(colour = "blue") +
  autolayer(y2)+
  ylab("MA(2) and ARMA(1,1) models")

8.7

a.

ggtsdisplay(wmurders)

There appears to be no seasonality to this data, so a difference will be the only transformation I make.

ndiffs(wmurders)

## [1] 2

diff_wmurders <- diff(wmurders, 2)
ggtsdisplay(diff_wmurders)

b.

c.

(1 - B)^2yt = (1 + theta1B + theta2B^2)et

d.

wmurders_rfit <- Arima(wmurders, 
                              order = c(0, 2, 2))
checkresiduals(wmurders_rfit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10

The histogram shows that the residuals of this model are mostly normally distributed. The ACF model shows us that the residuals are within the 95% confidence interval.

e.

ts3_wmurders <- forecast(
  wmurders_rfit, h = 3
)
ts3_wmurders

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.480525 2.202620 2.758430 2.055506 2.905544
## 2006       2.374890 1.985422 2.764359 1.779250 2.970531
## 2007       2.269256 1.772305 2.766206 1.509235 3.029276

ts3_wmurders$mean

## Time Series:
## Start = 2005 
## End = 2007 
## Frequency = 1 
## [1] 2.480525 2.374890 2.269256

years <- 3
e1 <- 2.480525
e2 <- 2.374890
e3 <- 2.269256

f.

autoplot(ts3_wmurders)

g.

wmurders_autoarima <- forecast(
  auto.arima(wmurders), h = 3
)

autoplot(wmurders_autoarima)

accuracy(ts3_wmurders)

##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
##                     ACF1
## Training set -0.05094066

accuracy(wmurders_autoarima)

##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

Data 624 Homework 6

Natalie Mollaghan

3/24/2019

Homework Assignment 6

8.1

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

8.2

8.3

a.

c.

d.

e.

8.5

8.6

a.

b.

c.

d.

f.

g.

8.7

a.

b.

c.

d.

e.

f.

g.