Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a) Explain the differences among these figures. Do they all indicate that the data are white noise?
While the ACF plots do have spikes, those spikes stay within the confidence limits in all three plots. In the small sample, random variation causes the autocorrelations to fluctuate a fair amount, but as the sample size increases the confidence bounds narrow as the uncertainty decreases; by 1,000 observations the ACF is nearly flat with no significant lags. The variation at small sample sizes, though visible, is attributable to the sample size alone, so all three series still qualify as white noise.
b) Why are the critical values at different distances from the mean of zero? Why are the auto-correlations different in each figure when they each refer to white noise?
The critical values are a function of the sample size: for a white noise series we expect 95% of the autocorrelations to lie within ±1.96/sqrt(T), so the bounds move closer to zero as T increases. The sample autocorrelations are close to zero but not exactly zero, and they differ between the figures because each realisation is subject to its own random variation.
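As a quick illustration (my addition, using the standard ±1.96/sqrt(T) approximation), the bounds for the three sample sizes can be computed directly:
# 95% critical values for a white noise ACF shrink like 1/sqrt(T)
T_obs <- c(36, 360, 1000)
round(1.96 / sqrt(T_obs), 3)
## [1] 0.327 0.103 0.062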
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp3)
gafa_stock %>%
filter(Symbol == "AMZN") %>%
autoplot(Close)
We see a steady upward trend in the closing price through most of the series, followed by a sharp decline near the end of 2018. This is clearly not stationary data, given the obvious trend.
# Note: gafa_stock has an irregular (trading-day) index; feasts may warn when
# computing the ACF. Re-indexing by trading day with update_tsibble() avoids this.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
ACF(Close) %>%
autoplot()
gafa_stock %>%
filter(Symbol=="AMZN") %>%
PACF(Close) %>%
autoplot()
The ACF shows all lags well outside the confidence bounds: r1 is large and positive, and the autocorrelations decrease very slowly, consistent with non-stationary data; for a stationary series the ACF would drop to zero relatively quickly. The PACF shows a single huge spike at lag 1, indicating that the series needs to be differenced, most likely just once, to become stationary.
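As an extra check beyond what the question asks, a KPSS unit-root test should also flag the non-stationarity; a sketch, assuming unitroot_kpss is fine with the irregular trading-day index (it only uses the ordered values):
# Small p-value (reported values are capped below at 0.01) => reject stationarity
gafa_stock %>%
filter(Symbol == "AMZN") %>%
features(Close, unitroot_kpss)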
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
a) Turkish GDP from global_economy.
turkish <- global_economy %>%
filter(Country=="Turkey")
turkish %>%
autoplot(GDP)
turkish %>%
ACF(GDP) %>%
autoplot()
turkish %>%
PACF(GDP) %>%
autoplot()
lambda <- turkish %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
turkish <- turkish %>%
mutate(lambdagdp = box_cox(GDP, lambda))
turkish <- turkish %>%
mutate(diff1gdp = difference(lambdagdp))
autoplot(turkish, diff1gdp)
turkish %>%
ACF(diff1gdp) %>%
autoplot()
turkish %>%
PACF(diff1gdp) %>%
autoplot()
We were able to get the data close to stationary with a Box-Cox transformation and one round of differencing.
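This choice can be cross-checked with the unit-root features from feasts; a sketch using the turkish tsibble and columns created above:
# How many first differences does the transformed series need?
turkish %>% features(lambdagdp, unitroot_ndiffs)
# KPSS on the differenced series: a large p-value is consistent with stationarity
turkish %>% features(diff1gdp, unitroot_kpss)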
b) Accommodation takings in the state of Tasmania from aus_accommodation.
tas <- aus_accommodation %>%
filter(State == "Tasmania") %>%
select(Takings)
autoplot(tas, Takings)
Here we will have to difference to deal with the trend as well as the seasonality, after a Box-Cox transformation.
lambda <- tas %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
tas <- tas %>%
mutate(takings_bc = box_cox(Takings, lambda))
We take a first difference to handle the trend
tas <- tas %>%
mutate(diff1_takings = difference(takings_bc))
And a seasonal difference (lag 4, as the data are quarterly) to handle the seasonality
tas <- tas %>%
mutate(diff_seasonal = difference(diff1_takings, lag = 4))
autoplot(tas, diff_seasonal)
tas %>%
ACF(diff_seasonal) %>%
autoplot()
tas %>%
PACF(diff_seasonal) %>%
autoplot()
We were able to achieve near-stationary data with two rounds of differencing here.
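The differencing orders can again be confirmed with unit-root features; a sketch using the tas tsibble from above:
# Number of seasonal differences suggested for the transformed takings
tas %>% features(takings_bc, unitroot_nsdiffs)
# After the seasonal difference, how many first differences remain necessary?
tas %>% features(difference(takings_bc, lag = 4), unitroot_ndiffs)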
c) Monthly sales from souvenirs.
sou <- souvenirs %>%
select(Month, Sales)
autoplot(sou, Sales)
lambda <- sou %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
sou <- sou %>%
mutate(sales_bc = box_cox(Sales, lambda))
Seasonal differencing (lag 12, as the data are monthly)
sou <- sou %>%
mutate(diff1 = difference(sales_bc, lag = 12))
autoplot(sou, diff1)
sou %>%
ACF(diff1) %>%
autoplot()
sou %>%
PACF(diff1) %>%
autoplot()
Then a first difference for the remaining trend
sou <- sou %>%
mutate(diff2 = difference(diff1))
autoplot(sou, diff2)
sou %>%
ACF(diff2) %>%
autoplot()
sou %>%
PACF(diff2) %>%
autoplot()
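As before, the choices can be backed up with unit-root features; a sketch using the columns created above:
# One seasonal difference should be suggested for the transformed sales
sou %>% features(sales_bc, unitroot_nsdiffs)
# KPSS on the twice-differenced series: a large p-value is consistent with stationarity
sou %>% features(diff2, unitroot_kpss)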
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(90483484)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot(Turnover)
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries <- myseries %>%
mutate(turnover_bc = box_cox(Turnover, lambda))
autoplot(myseries, turnover_bc)
myseries <- myseries %>%
mutate(diff1 = difference(turnover_bc))
autoplot(myseries, diff1)
myseries %>%
ACF(diff1) %>%
autoplot()
myseries %>%
PACF(diff1) %>%
autoplot()
myseries <- myseries %>%
mutate(diff2 = difference(diff1, lag = 12))
autoplot(myseries, diff2)
myseries %>%
ACF(diff2) %>%
autoplot()
myseries %>%
PACF(diff2) %>%
autoplot()
A Box-Cox transformation, a first difference to remove the trend, and a seasonal difference (lag 12) to remove the seasonality seem to make this data close to stationary.
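A KPSS test on the twice-differenced series backs up this visual judgement; a sketch (note the reported p-value is interpolated within [0.01, 0.1], so a value of 0.1 just means "at least 0.1"):
# Large p-value => no evidence against stationarity after the two differences
myseries %>%
features(diff2, unitroot_kpss)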
a) Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim, y)
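To see the AR(1) signature (my addition; the exact spikes depend on the random draw), the ACF should decay geometrically and the PACF should show a single spike at lag 1:
sim %>% ACF(y) %>% autoplot()
sim %>% PACF(y) %>% autoplot()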
b) Produce a time plot for the series. How does the plot change as you change phi1?
y_neg <- numeric(100)
y_zero <- numeric(100)
y_1 <- numeric(100)
e <- rnorm(100)
for(i in 2:100){
y_neg[i] <- -0.6 * y_neg[i-1] + e[i]
y_zero[i] <- 0 * y_zero[i-1] + e[i]
y_1[i] <- 1 * y_1[i-1] + e[i]
}
sim <- tsibble(
idx = seq_len(100),
phi_neg = y_neg,
phi_pos = y,
phi_0 = y_zero,
phi_1 = y_1,
index = idx
)
sim %>%
autoplot(phi_neg) +
labs(title = "phi = -0.6")
sim %>%
autoplot(phi_0) +
labs(title = "phi = 0")
sim %>%
autoplot(phi_pos) +
labs(title = "phi = +0.6")
sim %>%
autoplot(phi_1) +
labs(title = "phi = +1")
We notice that as phi approaches 0 the series more closely resembles white noise: the variance stays roughly constant and the data remain stationary. As phi moves away from 0 towards 1 the series starts to resemble a random walk: the variance changes over time, trends appear to emerge, and the data become less and less stationary.
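These claims can be checked on the simulated series themselves (a sketch; results vary with the seed):
# phi = 0: the ACF should show no significant spikes (white noise)
sim %>% ACF(phi_0) %>% autoplot()
# phi = 1: the ACF of a random walk dies out very slowly
sim %>% ACF(phi_1) %>% autoplot()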
c) Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1
yma1 <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
yma1[i] <- e[i] + 0.6*e[i-1]
d) Produce a time plot for the series. How does the plot change as you change theta1?
sima1 <- tsibble(idx = seq_len(100), yma1 = yma1, index = idx)
sima1 %>%
autoplot(yma1)
n <- 100
e <- rnorm(n)
yma1_neg <- numeric(n)
yma1_0 <- numeric(n)
yma1_1 <- numeric(n)
for(i in 2:n){
yma1_neg[i] <- -0.6*e[i-1] + e[i]
yma1_0[i] <- 0*e[i-1] + e[i]
yma1_1[i] <- 1*e[i-1] + e[i]
}
sima1 <- tsibble(
idx = seq_len(n),
yma1_neg = yma1_neg,
yma1_0 = yma1_0,
yma1_1 = yma1_1,
index = idx
)
sima1 %>% autoplot(yma1_neg) + labs(title = "theta = -0.6")
sima1 %>% autoplot(yma1_0) + labs(title = "theta = 0")
sima1 %>% autoplot(yma1_1) + labs(title = "theta = 1")
As in the AR case, the series looks most like white noise when theta = 0. Unlike the AR(1), however, an MA(1) process is stationary for every value of theta: moving theta away from 0 only strengthens the lag-1 autocorrelation, making the series smoother (for positive theta) or more jagged (for negative theta), but it never becomes a random walk.
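A quick calculation supports this: the lag-1 autocorrelation of an MA(1) is theta/(1 + theta^2), which is bounded at 0.5 in absolute value:
theta <- c(-0.6, 0, 0.6, 1)
round(theta / (1 + theta^2), 3)
## [1] -0.441  0.000  0.441  0.500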
e) Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- (0.6*y[i-1]) + e[i] + (0.6*e[i-1])
simarma <- tsibble(idx = seq_len(100), y = y, index = idx)
f) Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1 (these parameters give a non-stationary series)
y <- numeric(100)
e <- rnorm(100)
for (i in 3:100)
y[i] <- (-0.8*y[i-1]) + e[i] + (0.3*y[i-2])
simar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
g) Graph the latter two series and compare them.
simarma %>%
autoplot(y)
simar2 %>%
autoplot(y)
The ARMA(1,1) series looks stationary, while the AR(2) series oscillates with steadily growing amplitude in a consistent cyclic pattern, as expected since its parameters violate the stationarity conditions.
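The non-stationarity of the AR(2) can be verified from the roots of its characteristic polynomial 1 - phi1*B - phi2*B^2 (my addition); stationarity requires all roots to lie outside the unit circle:
# Roots of 1 + 0.8*z - 0.3*z^2 (phi1 = -0.8, phi2 = 0.3)
# Moduli are roughly 0.93 and 3.59; the root inside the unit circle makes it non-stationary
Mod(polyroot(c(1, 0.8, -0.3)))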
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a) Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
An ARIMA(0,2,1) model was selected.
fit %>%
gg_tsresiduals()
The residuals look like white noise: roughly normally distributed around 0, all ACF spikes within the confidence bounds, and no discernible patterns.
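A Ljung-Box test makes this check formal; following the usual fpp3 convention of lag = 10 for non-seasonal data and dof = 1 for the single MA coefficient:
# Large p-value => no evidence of remaining autocorrelation in the residuals
augment(fit) %>%
features(.innov, ljung_box, lag = 10, dof = 1)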
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
b) Write the model in terms of the backshift operator.
An ARIMA(0,2,1) model can be written as
(1 − B)^2 y_t = c + (1 + theta1 B) e_t, where B y_t = y_{t−1}
Here theta1 = −0.8963 and c = 0, so
(1 − B)^2 y_t = (1 − 0.8963 B) e_t
c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit <- aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
The forecasts again increase linearly, but with a slope equal to the average historical growth rather than the steeper recent trend, so they rise less steeply than in part a); the prediction intervals are also narrower.
d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit_constant <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
fit_constant %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
fit_withoutconstant <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
fit_withoutconstant %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
With the constant, the forecasts are quite close to those from the automatically chosen model in part a). Without the constant, estimation fails: fable returns a null model (reporting a non-stationary AR part), so no forecasts are produced.
e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
fit %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
Including a constant when d = 2 induces a quadratic trend in the forecasts (fable warns that this is generally discouraged), so the forecast curves upward much more sharply than in part a), and the prediction intervals are narrower.
For the United States GDP series (from global_economy):
a) if necessary, find a suitable Box-Cox transformation for the data;
usa <- global_economy %>%
filter(Country == "United States")
usa %>%
autoplot(GDP)
lambda <- usa %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
usa <- usa %>%
mutate(GDP_bc = box_cox(GDP, lambda))
usa %>%
autoplot(GDP_bc)
b) fit a suitable ARIMA model to the transformed data using ARIMA();
fitusa <- usa %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fitusa)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
c) try some other plausible models by experimenting with the orders chosen;
Keeping the drift as recommended and d = 1 to account for the trend, we vary p and q:
fit1 <- usa %>%
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1,1,1)))
report(fit1)
## Series: GDP
## Model: ARIMA(1,1,1) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 ma1 constant
## 0.4736 -0.0190 114.8547
## s.e. 0.2851 0.3286 9.3486
##
## sigma^2 estimated as 5580: log likelihood=-325.32
## AIC=658.64 AICc=659.41 BIC=666.82
fit2 <- usa %>%
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0,1,1)))
report(fit2)
d) choose what you think is the best model and check the residual diagnostics;
The best model is the automatically selected one, ARIMA(1,1,0) with drift: it has the lowest AICc (657.1) and the smallest residual variance sigma^2.
fitusa %>% gg_tsresiduals()
The plots confirm this: the residuals resemble white noise, with no discernible patterns, an approximately normal distribution centred on 0, and autocorrelations well within the bounds.
e) produce forecasts of your fitted model. Do the forecasts look reasonable?
forecast_arima <- fitusa %>%
forecast(h = 10)
autoplot(forecast_arima, usa)
Given the historical data, the forecasts look very reasonable and follow the trend line.
f) compare the results with what you would obtain using ETS() (with no transformation).
fitets <- usa |>
model(ETS(GDP))
report(fitets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
forecast_ets <- fitets %>%
forecast(h=10)
autoplot(forecast_ets, usa)
The recommended ETS(M,A,N) model produces more conservative forecasts than the ARIMA(1,1,0) with drift, which follows the trend more closely. Note that the higher AIC of the ETS model is not directly comparable to the ARIMA model's: the two belong to different model classes, and the ARIMA was fitted to Box-Cox transformed data.
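Since the AIC values are not comparable, a fairer comparison is out-of-sample accuracy via time series cross-validation; a sketch, assuming the usa tsibble and lambda from above (with .init = 30 chosen arbitrarily):
# Compare ETS and ARIMA on one-step-ahead cross-validated accuracy rather than AIC
usa %>%
stretch_tsibble(.init = 30, .step = 1) %>%
model(
arima = ARIMA(box_cox(GDP, lambda)),
ets = ETS(GDP)
) %>%
forecast(h = 1) %>%
accuracy(usa) %>%
select(.model, RMSE, MAE)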