Exercise 8.1

(a)

The major difference between these three figures is sample size which were 36, 360 and 1000 random numbers. ACF is biggest with 36 sample size as compared with 360 and 1000 which becomes very small. Each autocorrelation is expected to be close to 0 for white noise series. Also, there is no pattern and spikes are all within the range that’s hwy we can say that all the graphs show white noise.

(b)

Critical values has to be within range of 1.96/sq(T) and T here shows the time series length. Critical value starts getting smaller with the increase in T.

Exercise 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

# loading the ibmclose data
data(ibmclose)

# ggtsdisplay() function prints autoplot, ACF and PACF
ggtsdisplay(ibmclose, main="Daily Closing IBM Stock Price Series", ylab="Sales", xlab="Days")

If the time series has seasonality then it is not stationary and it is clear in the top graph that there is a trend in the data. Also, there is no sudden drop to 0 in ACF plot in left graph which shows it is not stationary. PACF plot shows the same thing. Hence the data is not stationary and its statistical properties will not be constant over time.

Exercise 8.3

the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.


(a) usnetelec
(b) usgdp
(c) mcopper
(d) enplanements
(e) visitors

usnetelec

ggtsdisplay(usnetelec, main="Before Transformation")

# boxcox transformation
lambda <- BoxCox.lambda(usnetelec)
print(paste0("The suggested lambda value for usnetelec is ",lambda))
## [1] "The suggested lambda value for usnetelec is 0.516771443964645"
ggtsdisplay(BoxCox(usnetelec, lambda), main="After Transformation")

# Required Differences
ndiffs(BoxCox(usnetelec, lambda))
## [1] 2
# At Second difference
ggtsdisplay(usnetelec %>% diff(2), main="At Second Difference")

There is not much of a difference after transformation and since this data does not have seasonal data that’s why there is no need to check for seasonal differencing. ndiffs function was used to see how many differences are required and it showed 2 that’s why I plotted at second difference and data is stationary now as ACF plot shows sudden drop to 0.

ggtsdisplay(usgdp, main="Before Transformation")

#boxcox transformation
lambda <- BoxCox.lambda(usgdp)
print(paste0("The suggested lambda value for usgdp is ", lambda))
## [1] "The suggested lambda value for usgdp is 0.366352049520934"
ggtsdisplay(BoxCox(usnetelec, lambda), main="After Transformation")

# Required differences
print(paste0("The suggested difference is ", ndiffs(BoxCox(usgdp, lambda))))
## [1] "The suggested difference is 1"
# Differences
ggtsdisplay(usgdp %>% diff(1), main="At First Difference")

usgdp dataset was non-stationary and after boxcox transformation it improved slighlty with boxcox transformation but using ndiffs it was identified that data will be stationary at first difference and hence I used at first difference and seems like data is stationary at first difference as per the ACF plot.

mcopper

ggtsdisplay(mcopper, main="Before Transformation")

ggseasonplot(mcopper)

#boxcox transformation
lambda <- BoxCox.lambda(mcopper)
print(paste0("The suggested lambda value for mcopper is ", lambda))
## [1] "The suggested lambda value for mcopper is 0.191904709003829"
ggtsdisplay(BoxCox(mcopper, lambda), main="After Transformation")

# Required differences
print(paste0("The suggested difference is ", ndiffs(BoxCox(mcopper, lambda))))
## [1] "The suggested difference is 1"
# Differences
ggtsdisplay(usgdp %>% diff(1), main="At First Difference")

There is no seasonality in mcopper dataset but data is non-stationary as ACF is not showing sudden drop to 0 in the plot. There is some variability in the plot which was fixed using boxcox transformation. Boxcox transformation straightened the line to good extend. After checking with ndiffs, it was found that data will be stationary at first difference which can be seen at first difference i.e. sharp drop to 0 in ACF plot and even the autoplot is white noise.

enplanements

ggtsdisplay(enplanements, main="Before Transformation")

ggseasonplot(enplanements)

#boxcox transformation
lambda <- BoxCox.lambda(enplanements)
print(paste0("The suggested lambda value for enplanement is ", lambda))
## [1] "The suggested lambda value for enplanement is -0.226946111237065"
ggtsdisplay(BoxCox(enplanements, lambda), main="After Transformation")

# Required differences
print(paste0("The suggested difference is ", ndiffs(BoxCox(enplanements, lambda))))
## [1] "The suggested difference is 1"
# Differences
ggtsdisplay(enplanements %>% diff(12) %>% diff(1), main="At First Difference")

The data contains some seasonality and therefore it has to be lagged at seasonal difference. THere is no need of boxcox transformation as the autoplot did not change significantly after transformation. ndiffs() identified that first difference will be enough to make the data stationary and we can see that ACF dropped suddenly to 0 after using at first difference.

visitors

ggtsdisplay(visitors, main="Before Transformation")

ggseasonplot(visitors)

#boxcox transformation
lambda <- BoxCox.lambda(visitors)
print(paste0("The suggested lambda value for visitors is ", lambda))
## [1] "The suggested lambda value for visitors is 0.277524910835111"
ggtsdisplay(BoxCox(visitors, lambda), main="After Transformation")

# Required differences
print(paste0("The suggested difference is ", ndiffs(BoxCox(visitors, lambda))))
## [1] "The suggested difference is 1"
# Differences
ggtsdisplay(visitors %>% diff(12) %>% diff(1), main="At First Difference")

Data was initially non-stationary as per ACF plot but at first difference it has become stationary. Initially it needed slight stability that’s why boxcox transformation was used to make it stable. There was also seasonality in the data which was also lagged in ggtsdisplay.

Exercise 8.5

For your retail data (From Exericse 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retail <- readxl::read_excel('retail.xlsx', skip=1)
myts <- ts(retail[,"A3349627V"], frequency = 12, start=c(1982,4))
ggtsdisplay(myts, main="LIquor Retail Turnover in New South Wales")

ggseasonplot(myts)

print(paste0("Is the Data Stationary? : ", is.stationary(myts)))
## [1] "Is the Data Stationary? : FALSE"

Apparently the autoplot shows that transformation will be necessary and I am going to use BoxCox transformation as per the requirement of this question. Also, there is a seasonality in the data and increasing trend overall.

lambda <- BoxCox.lambda(myts)
trans_data <- BoxCox(myts, lambda)
trans_data %>% ggtsdisplay()

print(paste0("Is the Data Stationary? : ", is.stationary(trans_data)))
## [1] "Is the Data Stationary? : FALSE"

THe BoxCox transformation straightened the line slightly but there is definitely seasonality factor as well. It is still not stationary even after transformation. Let’s check first how many differencing is required and then I’m going to use KPSS method for unit root test to check stationary after first difference.

ndiffs(trans_data)
## [1] 1
trans_data %>% diff(12) %>% diff(1) %>% ur.kpss()
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0156

The t-stat less than 1 and now the data is within the range after differencing and we can say that data is stationary. The data has become stationary at first difference. p-value of 0.0156 shows the data is stationary.

Exercise 8.6

Use R to simulate and plot some data from simple ARIMA models.

(a)

Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

(b)

Produce a time plot for the series. How does the plot change as you change ϕ1?

We will change to 0.2, 0.4 and 0.8 and see how plot looks like.

y2 <- ts(numeric(100))
y3 <- ts(numeric(100))
y4 <- ts(numeric(100))

for(i in 2:100){
    y2[i] <- 0.8*y2[i-1] + e[i]
    y3[i] <- 0.4*y3[i-1] + e[i]
    y4[i] <- 0.2*y4[i-1] + e[i]
}

autoplot(y, series="0.6 (original)") + autolayer(y2, series = "0.8") +
  scale_color_manual(name=expression(phi),
                     values=c("0.6 (original)" = "black",
                              "0.8" = "brown1"))

autoplot(y, series="0.6 (original)") + autolayer(y3, series = "0.4") +
  scale_color_manual(name=expression(phi),
                     values=c("0.6 (original)" = "black",
                              "0.4" = "blue1"))

autoplot(y, series="0.6 (original)") + autolayer(y4, series = "0.2") +
  scale_color_manual(name=expression(phi),
                     values=c("0.6 (original)" = "black",
                              "0.2" = "darkorange1"))

Seems like as we increase ϕ1, variability also increases and vice versa.

(C)

Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1.

MA1 <- function(phi, theta){
  set.seed(42)
  y <- ts(numeric(100))
  e <- rnorm(100)
  e[1] <- 0
  for(i in 2:100)
    y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
  return(y)
}

(d)

Generate data from an AR(2) model with ϕ1=−0.8 , ϕ2=0.3 and σ2=1.

MA2 <- function(phi_1, phi_2){
  set.seed(42)
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 3:100)
    y[i] <- phi_1*y[i-1] + phi_2*y[i-2] + e[i]
  return(y)
}

(e)

Graph the latter two series and compare them.

autoplot(MA1(0.6, 0.6), series = "ARMA(1,1)") +
  autolayer(MA2(-0.8, 0.3), series = "AR(2)") +
  theme(axis.title = element_blank(), legend.position = "bottom", legend.title = element_blank()) +
  scale_color_brewer(palette = "Set1")

Exercise 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

(a)

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

ggtsdisplay(wmurders, main="Total number of women murdered each year per 100,000")

According to the ACF plot, wmurders data is non-stationary and there is no seasonality in the data that’s why no transformation is required. Overall, it has upward trend until the early 1990s and then decreasing trend after that.

# Number of difference required
print(paste0("The suggested difference level is ", ndiffs(wmurders)))
## [1] "The suggested difference level is 2"
# CHecking the unit root test at second difference
wmurders %>% diff(differences=2) %>% ur.kpss()
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0458
# Plot at second difference
ggtsdisplay(diff(wmurders, differences=2), main = "At second difference")

The data has become stationary at second difference as the t-value is under 1 which we can verify in the plot above. Coming up to the question, best model for ARIMA(p, q, d) is ARIMA(1,2,2).

(b)

Should you include a constant in the model? Explain

THe model should not include a constant which would introduct a drift as there is on consistent trend.

(c)

Write this model in terms of the backshift operator.

$$

(1 - B)yt

$$

(d)

Fit the model using R and examine the residuals. Is the model satisfactory?

Yes, the model is satisfactory at ARIMA(1,2,2)

(e)

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

(f)

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

arim <- arima(wmurders, order=c(1,2,2))
autoplot(forecast(arim, h=3))

(g)

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

auto.arima(wmurders)
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97

No, auto.arima gave 1,2,1 which differs from my model which was 1,2,2.