Homework 6

1.Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

The three plots are ACF plots for series of diferent numbers 36, 360 and 1000. The blue dashed lines are the the bounds (\[\pm \frac{\sqrt2}{T}\]), for the series to be white noise, we except 95% of spikes within the bounds. For all three cases, the 95% spikes are within the dashed lines. We expect each autocorrelation to be close to zero, and is true in all three cases, although in second plot, it seems there are significant spikes, but we ca consider them as less than 5%. All the plots indicate white noise.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each

figure when they each refer to white noise?

The critical values are defined by the formula (\[\pm \frac{\sqrt2}{T}\]) which depends on T (length of ime series). In all three plots, bound length decreases as the length of time series increases. For all three cases the critical value is different due to length of time series.

2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(ibmclose)

From the autoplot it is clear that there is increasing and decreasing trend in the time series. ACF plots reveals significant spikes all over. We will apply the unit root test to see the order of differencing needs to be applied to the data to make it stationary.

ndiffs(ibmclose)

## [1] 1

ibmclose_diff1<- diff(ibmclose)
ggtsdisplay(ibmclose_diff1)

Time plots reveals the series is stationary, in addition looking at ACF and PACF plot reveals that 95% of spikes are within the bound. The time series is stationary with one differencing.

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a.usnetelec

ggtsdisplay(usnetelec)

#Time plot shows that there is increasing trend and data is non stationary.

usnetelec_bc <- BoxCox(usnetelec, lambda=BoxCox.lambda(usnetelec))
ndiffs(usnetelec_bc)

## [1] 2

## order 2 differencing required
usnetelec_bc_diff <- diff(diff(usnetelec_bc))
ndiffs(usnetelec_bc_diff)

## [1] 0

ggtsdisplay(usnetelec_bc_diff)

Data is stationary and trend is removed.

b. usgdp

ggtsdisplay(usgdp)

#Time plot shows that there is increasing trend and data is non stationary.

usgdp_bc <- BoxCox(usgdp, lambda=BoxCox.lambda(usgdp))
ndiffs(usgdp_bc)

## [1] 1

## order 1 differencing required
usgdp_bc_diff <- diff(usgdp_bc)
ndiffs(usgdp_bc_diff)

## [1] 0

ggtsdisplay(usgdp_bc_diff)

95% of spikes are within the bounds after one order differencing, data is stationary.

c. mcopper

ggtsdisplay(mcopper)

#Time plot shows that there is  trend and data is non stationary.

mcopper_bc <- BoxCox(mcopper, lambda=BoxCox.lambda(mcopper))
ndiffs(mcopper_bc)

## [1] 1

## order 1 differencing required
mcopper_bc_diff <- diff(mcopper_bc)
ndiffs(mcopper_bc_diff)

## [1] 0

ggtsdisplay(mcopper_bc_diff)

Time series is stationary affter BOxCox and first differencing.

d. enplanements

ggtsdisplay(enplanements)

#Time plot shows that there is  trend and data is non stationary. There seems to be seasonality also in the time series.

enplanements_bc <- BoxCox(enplanements, lambda=BoxCox.lambda(enplanements))
ndiffs(enplanements_bc)

## [1] 1

## order 1 differencing required

nsdiffs(enplanements_bc)

## [1] 1

## one order seasonal differencing

enplanements_bc_diff <- diff(enplanements_bc)
enplanements_bc_diff1 <- diff(enplanements_bc_diff, lag=12)
ndiffs(enplanements_bc_diff1)

## [1] 0

nsdiffs(enplanements_bc_diff1)

## [1] 0

ggtsdisplay(enplanements_bc_diff1)

Time plot clearly shows the stationarity and 95% of the spikes are within the bounds.

e. visitors

ggtsdisplay(visitors)

## Increasing trend in the data, seasonal pattern, non-contant variance.

visitors_bc <- BoxCox(visitors, lambda=BoxCox.lambda(visitors))
ndiffs(visitors_bc)

## [1] 1

nsdiffs(visitors_bc)

## [1] 1

visitors_bc_diff <- diff(visitors_bc)
visitors_bc_diff1 <- diff(visitors_bc_diff, lag=12)

ggtsdisplay(visitors_bc_diff1)

Time series data is stationary after fist order differencing and seasonal differencing. However it is better approach o check the order of differencing, some times applying seasonal differencing does no require first differencing.

5. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("C:/Users/Gurpreet/Documents/Data624/retail.xlsx", skip=1)

myts <- ts(retaildata[,"A3349335T"],frequency=12, start=c(1982,4))

autoplot(myts)

From the autoplot it is clear that there is increasing trend in the time series. The seasonality is also evident from the plot. The variance is increasing with time. In order to make the constant variance or to make the series stationary, we need to transform the data. We will use boxcox transformation.

myts_bc <- BoxCox(myts, lambda= BoxCox.lambda(myts))
ggtsdisplay(myts_bc)

After transforming the data, we can still see from ACF plot that data is approaching zero slowly

#myts_bc%>% ur.kpss()  %>% nsdiffs()
myts_bc1 <- myts_bc %>% diff()

myts_bc1 %>% ndiffs()

## [1] 0

myts_bc1%>%ggtsdisplay()

myts_bc2 <-myts_bc1%>% diff(lag=12)

myts_bc2%>%ggtsdisplay()

ndiffs(myts_bc2)

## [1] 0

nsdiffs(myts_bc2)

## [1] 0

Data is stationary.

6. Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with ??1=0.6 and ??2=1. The process starts with y1=0.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

b. Produce a time plot for the series. How does the plot change as you change ??1?

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
p1 <- autoplot(y) +ggtitle("phi=0.6")


for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
p2<- autoplot(y)+ggtitle(" phi=0.1")




for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
p3<- autoplot(y)+ggtitle("phi= 1.0")

grid.arrange(p1,p2,p3)

There are more flucuations and spikes become more steep if we decrease the phi value, however increase in the value reverses the effect resulting in smoothing the plot a little.

c. Write your own code to generate data from an MA(1) model with ??1=0.6 and ??2=1.

#c MA(1) model
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

d. Produce a time plot for the series. How does the plot change as you change ??1?

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
p1 <- autoplot(y) +ggtitle("theta= 0.6")



y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*e[i-1] + e[i]

p2 <- autoplot(y) +ggtitle(" theta=0.1")



y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*e[i-1] + e[i]
p3 <- autoplot(y) +ggtitle("theta= 1.0")


grid.arrange(p1,p2,p3)

With the increase in theta, there are less fluctuations in the spikes.

e. Generate data from an ARMA(1,1) model with ??1=0.6, ??1=0.6 and ??2=1.

# e arima(1)
y_ar1 <- ts(numeric(100))
e <- rnorm(100)

#phi =0.6, theta =0.6, sigma^2 = 1
for(i in 2:100)
  y_ar1[i] <- 0.6*y_ar1[i-1]+ 0.6*e[i-1] + e[i]

f. Generate data from an AR(2) model with ??1=???0.8, ??2=0.3 and ??2=1. (Note that these parameters will give a non-stationary series.)

# f arima(2)
y_ar2 <- ts(numeric(100))
e <- rnorm(100)


for(i in 3:100)
  y_ar2[i] <- -0.8*y_ar2[i-1]+ 0.3*y_ar2[i-2] + e[i]

g. Graph the latter two series and compare them.

ggtsdisplay(y_ar1)

ggtsdisplay(y_ar2)

As expected model 1 AR(1) is better than both with lesser fluctuations, trends, and non-stationarity

7. Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(\(p,d,q\)) model for these data.

ggtsdisplay(wmurders)

The plot shows there is an increasing trend in the beginning and then it decreases. To eiminate teh trend we will do the differencing.

wmurders_diff1<- diff(wmurders)
ggtsdisplay(wmurders_diff1)

The series seems to be stationary. There is one significant lag in ACF and PACF plots, we can assume that 95% are inside the region and the data is a white noise. Looking at the plot it seems we do not need second differencing. We can perform check using unit root test to see if differencing is required.

ndiffs(wmurders_diff1)

## [1] 1

We need to apply second order differencing as sugggested by unit root test.

wmurders_diff2<- diff(wmurders_diff1)

ggtsdisplay(wmurders_diff2)

ACF has significant lags at one and two. PACF has significant lag at one and five. PACF is decaying and there is no significant lag in ACF after 2. Our model will be ARIMA(0,2,2)

b. Should you include a constant in the model? Explain.

No, we should not include a constant. for d= 2, long term forecast will be a quadratic trend.For d>1, no constantis suggested as quadratic or higher order trend is usually dangerous while forecasting. https://robjhyndman.com/hyndsight/arimaconstants/

c. Write this model in terms of the backshift operator.

Backward notaion

(1-\(B^2\))yt = (1+\(\theta_{1}\)B + \(\theta_{2}\)\(B^2\))\(e^2 t\)

d. Fit the model using R and examine the residuals. Is the model satisfactory?

fc<- arima(wmurders_diff2,order=c(0, 2, 2))
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 39.558, df = 8, p-value = 3.873e-06
## 
## Model df: 2.   Total lags used: 10

From ACF plot, we can assume 95% residuals lies within the region, although do not seems to be normally distributed, we can say that the model is satisfactory.

e.Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

fc_3 <- forecast(fc,h=3)

f.Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(fc_3)

g.Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

fc_auto <- forecast(auto.arima(wmurders), h = 3)

accuracy(fc_3)

##                       ME      RMSE       MAE      MPE     MAPE     MASE
## Training set 0.004495644 0.3201679 0.2327397 182.5013 182.5013 0.521467
##                    ACF1
## Training set -0.6727248

accuracy(fc_auto)

##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

auto.arima() gives a different model than one we have chosen. Based on the accuracies, auto.arima() is the best model.