1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. Explain the differences among these figures. Do they all indicate that the data are white noise?
These plots show the autocorrelations of each series at the different lags (shown on the x-axis).
The y-axis (the autocorrelation) has the same scale in every plot, but the x-axis covers an increasing number of lags as the series gets longer.
If the data are white noise (purely random), we expect the sample autocorrelations to lie within the blue dashed bounds, which mark the threshold beyond which a correlation would be considered statistically significant.
In all three plots the autocorrelations at the lags shown stay within these bounds, so each figure is consistent with white noise.
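To see this concretely, similar ACF plots can be reproduced for freshly simulated white noise of the three lengths. The sketch below assumes the forecast and gridExtra packages; the simulated series (and seed) are illustrative, not the ones behind Figure 8.31.
library(forecast)
library(gridExtra)
set.seed(1)  # arbitrary seed, purely for reproducibility
p1 <- ggAcf(rnorm(36)) + ggtitle("White noise, 36 observations")
p2 <- ggAcf(rnorm(360)) + ggtitle("White noise, 360 observations")
p3 <- ggAcf(rnorm(1000)) + ggtitle("White noise, 1000 observations")
grid.arrange(p1, p2, p3)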
b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
A correlation is considered significant if it lies outside ±1.96/√T, where T is the length of the series. As T increases, this critical value decreases, which is why the bounds sit at different distances from zero in each figure. The autocorrelations themselves differ because each figure is computed from a different random sample; with more observations the sample autocorrelations settle closer to their true value of zero.
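For example, the bounds for the three sample sizes can be computed directly:
n_obs <- c(36, 360, 1000)
round(1.96 / sqrt(n_obs), 3)  # approximately 0.327, 0.103 and 0.062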
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
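A minimal sketch of the requested plots, assuming the fpp2 package (which provides the ibmclose data set) is loaded:
library(fpp2)
ggtsdisplay(ibmclose)  # time plot, ACF and PACF in one figure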
The ACF plot shows autocorrelations that are far larger than the critical values and that decrease only very slowly.
In particular, r1 is large (close to 1) and positive.
This indicates that the IBM stock series is non-stationary: each value depends strongly on the values immediately before it.
The PACF plot shows a single dominant spike at lag 1, i.e. a strong correlation between the series and its first lag.
In other words, each observation is essentially predicted by the previous one, which is characteristic of a non-stationary, random-walk-like series.
To obtain stationary data, the IBM stock series needs to be differenced.
Differencing helps stabilise the mean of a time series by removing changes in its level.
It therefore eliminates or reduces trend and seasonality, which can turn non-stationary data into stationary data.
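A sketch of this check in R, using diff() and the ndiffs() unit-root helper from the forecast package:
ndiffs(ibmclose)             # suggested number of first differences
ggtsdisplay(diff(ibmclose))  # time plot, ACF and PACF of the differenced series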
After taking one difference, the time plot looks stationary, and roughly 95% of the spikes in the ACF and PACF of the differenced series lie within the bounds. One order of differencing is therefore sufficient.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
ggtsdisplay(usnetelec)
usnetelec_bc <- BoxCox(usnetelec, lambda=BoxCox.lambda(usnetelec))
ndiffs(usnetelec_bc)
## [1] 2
usnetelec_bc_diff <- diff(diff(usnetelec_bc))
ndiffs(usnetelec_bc_diff)
## [1] 0
ggtsdisplay(usnetelec_bc_diff)
After the Box-Cox transformation and two orders of differencing, the trend has been removed and the data appear stationary.
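As an optional cross-check (assuming the urca package is installed), a KPSS unit-root test can be applied to the differenced series; a test statistic below the critical values supports stationarity.
library(urca)
summary(ur.kpss(usnetelec_bc_diff))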
ggtsdisplay(usgdp)
usgdp_bc <- BoxCox(usgdp, lambda=BoxCox.lambda(usgdp))
ndiffs(usgdp_bc)
## [1] 1
usgdp_bc_diff <- diff(usgdp_bc)
ndiffs(usgdp_bc_diff)
## [1] 0
ggtsdisplay(usgdp_bc_diff)
After the Box-Cox transformation and one order of differencing, roughly 95% of the ACF/PACF spikes lie within the bounds, so the data appear stationary.
ggtsdisplay(mcopper)
mcopper_bc <- BoxCox(mcopper, lambda=BoxCox.lambda(mcopper))
ndiffs(mcopper_bc)
## [1] 1
mcopper_bc_diff <- diff(mcopper_bc)
ndiffs(mcopper_bc_diff)
## [1] 0
ggtsdisplay(mcopper_bc_diff)
The series appears stationary after the Box-Cox transformation and one order of first differencing.
ggtsdisplay(enplanements)
enplanements_bc <- BoxCox(enplanements, lambda=BoxCox.lambda(enplanements))
ndiffs(enplanements_bc)
## [1] 1
nsdiffs(enplanements_bc)
## [1] 1
enplanements_bc_diff <- diff(enplanements_bc)
enplanements_bc_diff1 <- diff(enplanements_bc_diff, lag=12)
ndiffs(enplanements_bc_diff1)
## [1] 0
nsdiffs(enplanements_bc_diff1)
## [1] 0
ggtsdisplay(enplanements_bc_diff1)
After the Box-Cox transformation, one first difference and one seasonal difference, the time plot looks stationary and roughly 95% of the ACF/PACF spikes lie within the bounds.
ggtsdisplay(visitors)
visitors_bc <- BoxCox(visitors, lambda=BoxCox.lambda(visitors))
ndiffs(visitors_bc)
## [1] 1
nsdiffs(visitors_bc)
## [1] 1
visitors_bc_diff <- diff(visitors_bc)
visitors_bc_diff1 <- diff(visitors_bc_diff, lag=12)
ggtsdisplay(visitors_bc_diff1)
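For completeness, the same differencing checks used for the other series can be re-run on the transformed and twice-differenced visitors series (output omitted here):
ndiffs(visitors_bc_diff1)   # further first differences needed?
nsdiffs(visitors_bc_diff1)  # further seasonal differences needed?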
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],frequency=12, start=c(1982,4))
autoplot(myts)
myts_bc <- BoxCox(myts, lambda= BoxCox.lambda(myts))
ggtsdisplay(myts_bc)
myts_bc1 <- myts_bc %>% diff()
myts_bc1 %>% ndiffs()
## [1] 0
myts_bc1%>%ggtsdisplay()
myts_bc2 <-myts_bc1%>% diff(lag=12)
myts_bc2%>%ggtsdisplay()
ndiffs(myts_bc2)
## [1] 0
nsdiffs(myts_bc2)
## [1] 0
After the Box-Cox transformation, one first difference and one seasonal difference, the data appear stationary.
a. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ²=1. The process starts with y1=0.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
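As a cross-check (not part of the exercise code), the same kind of AR(1) series could also be simulated with stats::arima.sim; the seed below is arbitrary.
set.seed(42)  # arbitrary seed for reproducibility
y_sim <- arima.sim(model = list(ar = 0.6), n = 100, sd = 1)
autoplot(y_sim)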
b. Produce a time plot for the series. How does the plot change as you change ϕ1?
library(ggplot2)
library(gridExtra)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
p1 <- autoplot(y) + ggtitle("phi = 0.6")
for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
p2 <- autoplot(y) + ggtitle("phi = 0.1")
for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
p3 <- autoplot(y) + ggtitle("phi = 1.0")
grid.arrange(p1,p2,p3)
When ϕ1 is smaller, the series fluctuates more rapidly around zero and looks closer to white noise; as ϕ1 increases, the series becomes smoother and wanders further from its mean. With ϕ1 = 1 the process is a random walk and is no longer stationary.
c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ²=1.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
d. Produce a time plot for the series. How does the plot change as you change θ1?
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
p1 <- autoplot(y) + ggtitle("theta = 0.6")
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*e[i-1] + e[i]
p2 <- autoplot(y) + ggtitle("theta = 0.1")
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*e[i-1] + e[i]
p3 <- autoplot(y) + ggtitle("theta = 1.0")
grid.arrange(p1,p2,p3)
e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ²=1.
# e. ARMA(1,1)
y_ar1 <- ts(numeric(100))
e <- rnorm(100)
# phi1 = 0.6, theta1 = 0.6, sigma^2 = 1
for(i in 2:100)
y_ar1[i] <- 0.6*y_ar1[i-1]+ 0.6*e[i-1] + e[i]
f. Generate data from an AR(2) model with ϕ1=-0.8, ϕ2=0.3 and σ²=1. (Note that these parameters will give a non-stationary series.)
# f. AR(2)
y_ar2 <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y_ar2[i] <- -0.8*y_ar2[i-1]+ 0.3*y_ar2[i-2] + e[i]
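To see why these parameters give a non-stationary AR(2), one can check the roots of the characteristic polynomial 1 - ϕ1 z - ϕ2 z²; stationarity requires all roots to lie outside the unit circle, and here one does not.
# roots of 1 + 0.8*z - 0.3*z^2, i.e. 1 - phi1*z - phi2*z^2 with phi1 = -0.8, phi2 = 0.3
abs(polyroot(c(1, 0.8, -0.3)))  # one modulus is less than 1, so the process is non-stationary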
g. Graph the latter two series and compare them.
ggtsdisplay(y_ar1)
ggtsdisplay(y_ar2)