Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
As the sample size increases, the blue critical-value bounds become narrower. All three figures are consistent with white noise, but the larger samples show more clearly that the true autocorrelations are zero.
The critical values depend on the sample size. The bounds are given by \(\pm 1.96/ \sqrt{n}\), so as \(n\) increases, the bounds become narrower. The autocorrelations are different in each figure because the smaller sample sizes produce more variability and the random fluctuations may appear larger even though they are still white noise.
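As a quick check on these bounds (a minimal sketch; the values follow directly from the formula):
# critical-value bounds for each sample size
n <- c(36, 360, 1000)
round(1.96 / sqrt(n), 3)
## [1] 0.327 0.103 0.062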
A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp3)
# filter for Amazon stock
amazon <- gafa_stock %>%
filter(Symbol == "AMZN")
# plot daily closing price
amazon %>%
autoplot(Close) +
labs(
title = "Amazon Stock (AMZN) Daily Closing Prices",
x = "Date",
y = "Closing Price (USD)"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# the ACF/PACF need a regular index, but gafa_stock only includes trading
# days, so re-index by trading day first
amazon_reg <- amazon %>%
mutate(trading_day = row_number()) %>%
update_tsibble(index = trading_day, regular = TRUE)
# plot ACF & PACF
amazon_reg %>%
ACF(Close) %>%
autoplot() +
labs(
title = "ACF of Amazon Daily Closing Prices"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
amazon_reg %>%
PACF(Close) %>%
autoplot() +
labs(
title = "PACF of Amazon Daily Closing Prices"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
The ACF plot shows very high autocorrelations, close to 1, at every lag. If the series were stationary, the autocorrelations would drop towards zero quickly; instead they remain strong and consistently close to 1 at all lags.
The PACF plot has a very large spike at lag 1, dramatically higher than at all other lags: each price is essentially the previous price plus noise, the signature of a random walk. This indicates that the data are not stationary.
The above confirms that the data is non-stationary and should be differenced in order to achieve stationarity.
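As a quick confirmation (a sketch; output not shown), the ACF of the differenced closing price should look much more like white noise:
# ACF of the differenced closing price
amazon_reg %>%
mutate(diff_close = difference(Close)) %>%
ACF(diff_close) %>%
autoplot() +
labs(
title = "ACF of Differenced Amazon Closing Prices"
)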
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
# filter for turkey
turkey <- global_economy %>%
filter(Country == "Turkey")
# plot GDP
turkey %>%
autoplot(GDP) +
labs(
title="Turkey GDP",
x = "Year",
y="GDP (USD)"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
# plot ACF
turkey %>%
ACF(GDP) %>%
autoplot() +
labs(
title = "ACF Plot of Turkey GDP"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS check
turkey %>%
features(GDP, unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 1.22 0.01
The time plot shows a long-term upward trend. The ACF decays only gradually, which is characteristic of non-stationary data. The KPSS test reports a p-value of 0.01 (the smallest value the test reports), so we reject the null hypothesis of stationarity: the data are not stationary.
# find lambda
lambda_gdp <- turkey %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
# box cox transformation
turkey <- turkey %>%
mutate(
transformed_gdp = box_cox(GDP, lambda_gdp)
)
# plot transformation
turkey %>%
autoplot(transformed_gdp) +
labs(
title="Box-Cox Transformed Turkey GDP\nwith Lambda = 0.1572",
x = "Year",
y="Transformed GDP (USD)"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
The transformed data still shows an upward trend.
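Before differencing, we can check how many first differences the unit-root test suggests (a quick sketch; output not shown):
# suggested number of first differences for the transformed series
turkey %>%
features(transformed_gdp, unitroot_ndiffs)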
# apply differencing
turkey <- turkey %>%
mutate(
diff_transformed_gdp = difference(transformed_gdp)
)
# plot differenced transformed GDP
turkey %>%
autoplot(diff_transformed_gdp)
# ACF plot
turkey %>%
ACF(diff_transformed_gdp) %>%
autoplot() +
labs(
title = "ACF Plot"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# PACF plot
turkey %>%
PACF(diff_transformed_gdp) %>%
autoplot() +
labs(
title = "PACF Plot"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS check
turkey %>%
features(diff_transformed_gdp, unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 0.0889 0.1
By applying a Box-Cox transformation and a single first difference, we obtain a series that resembles white noise, as shown in the ACF plot above. The KPSS test now reports a p-value of 0.1 (the largest value the test reports), so we fail to reject stationarity: the data are now stationary.
# Filter for Tasmania
tasmania <- aus_accommodation %>%
filter(State == "Tasmania")
# plot takings
tasmania %>%
autoplot(Takings) +
labs(
title="Tasmania Accommodation Takings",
x = "Quarter",
y="GDP (AUDm))"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
# plot ACF
tasmania %>%
ACF(Takings) %>%
autoplot() +
labs(
title = "ACF Plot of Tasmania Accommodation Takings"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS check
tasmania %>%
features(Takings, unitroot_kpss)
## # A tibble: 1 × 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 1.88 0.01
The time series plot shows a long-term upward trend with possible seasonality.
The ACF plot also indicates possible seasonality.
The KPSS test reports a p-value of 0.01, which indicates that the data are not stationary.
# find lambda
lambda_tas <- tasmania %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
# box cox transformation
tasmania <- tasmania %>%
mutate(
transformed_takings = box_cox(Takings, lambda_tas)
)
# plot transformation
tasmania %>%
autoplot(transformed_takings) +
labs(
title="Box-Cox Transformed Tasmania Takings\nwith Lambda = 0.001819",
x = "Quarter",
y="Transformed Takings (AUDm)"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
The plot above shows that the transformation has stabilised the variance: the amplitude of the seasonal fluctuations is now roughly constant across the series.
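The data are quarterly, so a seasonal difference at lag 4 is applied below; unitroot_nsdiffs can confirm that one seasonal difference is suggested (a quick sketch; output not shown):
# suggested number of seasonal differences
tasmania %>%
features(transformed_takings, unitroot_nsdiffs)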
# apply differencing
tasmania <- tasmania %>%
mutate(
diff_transformed_takings = difference(transformed_takings, 4)
)
# plot differenced transformed takings
tasmania %>%
autoplot(diff_transformed_takings)
# ACF plot
tasmania %>%
ACF(diff_transformed_takings) %>%
autoplot() +
labs(
title = "ACF Plot"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# PACF plot
tasmania %>%
PACF(diff_transformed_takings) %>%
autoplot() +
labs(
title = "PACF Plot"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS check
tasmania %>%
features(diff_transformed_takings, unitroot_kpss)
## # A tibble: 1 × 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 0.198 0.1
The PACF plot shows one large spike at lag 2, after which the values drop off sharply towards zero; this indicates stationary data.
The KPSS test shows that the p-value is 0.1, indicating that the data is now stationary.
A Box-Cox transformation with \(\lambda \approx 0.0018\) followed by a single seasonal difference produces stationary data.
# plot
souvenirs %>%
autoplot(Sales) +
labs(
title = "Souvenir Shop Monthly Sales",
x = "Month",
y = "Sales (AUD)"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# plot ACF
souvenirs %>%
ACF(Sales) %>%
autoplot() +
labs(
title = "ACF Plot of Monthly Sales"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS check
souvenirs %>%
features(Sales, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.31 0.01
From the time-series plot, there is a long-term upward trend with possible seasonality. The ACF plot shows two large spikes. The KPSS test shows that the p-value is 0.01, which indicates that the data is non-stationary.
# find lambda
lambda_sales <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
# box cox transformation
souvenirs <- souvenirs %>%
mutate(
transformed_sales = box_cox(Sales, lambda_sales)
)
# plot transformation
souvenirs %>%
autoplot(transformed_sales) +
labs(
title="Box-Cox Transformed Monthly Sales\nwith Lambda = 0.002118",
x = "Month",
y="Sales (AUD)"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
The plot above shows that the transformation has stabilised the variance: the amplitude of the seasonal fluctuations is now roughly constant across the series.
# apply differencing
souvenirs <- souvenirs %>%
mutate(
diff_transformed_sales = difference(transformed_sales, 12)
)
# plot differenced transformed sales
souvenirs %>%
autoplot(diff_transformed_sales)
# ACF plot
souvenirs %>%
ACF(diff_transformed_sales) %>%
autoplot() +
labs(
title = "ACF Plot"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# PACF plot
souvenirs %>%
PACF(diff_transformed_sales) %>%
autoplot() +
labs(
title = "PACF Plot"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS check
souvenirs %>%
features(diff_transformed_sales, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.240 0.1
The lack of longer-term correlation in the PACF plot indicates that the trend and seasonality have been removed.
The KPSS test reports a p-value of 0.1, indicating that the data are stationary.
So a Box-Cox transformation with \(\lambda \approx 0.002118\) followed by a single seasonal difference produces stationary data.
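As a further check (a sketch; output not shown), unitroot_ndiffs indicates whether an additional first difference is needed after the seasonal difference:
# additional first differences suggested after seasonal differencing
souvenirs %>%
features(diff_transformed_sales, unitroot_ndiffs)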
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
# read excel file
retail <- readxl::read_excel("/Users/kristinlussi/Documents/MSDS/DATA 624/retail.xlsx", skip = 1)
# convert the A3349398A series to a tsibble (as_tsibble() derives the
# monthly index from the ts object)
myts <- ts(retail[,"A3349398A"], frequency = 12, start = c(1982, 4)) %>%
as_tsibble()
# PLOT
autoplot(myts) +
labs(
title = "Australian Retail Turnover",
x= "Month",
y = "Turnover"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
From the above time series plot, there is obvious seasonality with a long-term upward trend.
# apply box cox transformation
# find lambda
lambda_retail <- myts %>%
features(value, features = guerrero) %>%
pull(lambda_guerrero)
# box cox transformation
myts <- myts %>%
mutate(
transformed_retail = box_cox(value, lambda_retail)
)
# plot transformation
myts %>%
autoplot(transformed_retail) +
labs(
title="Box-Cox Transformed Australian Retail Turnover\nwith Lambda = 0.12316",
x = "Month",
y="Turnover"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
Applying a Box-Cox transformation stabilised the variance, so the seasonal fluctuations now appear more consistent throughout the series. The long-term upward trend and the seasonality are still present, so the next step is differencing.
# check whether seasonal differencing is needed (nsdiffs = 1 suggests one seasonal difference)
myts %>%
features(transformed_retail, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
# apply seasonal difference
myts <- myts %>%
mutate(s_difference = difference(transformed_retail, 12))
# how many non-seasonal differences are needed? (ndiffs = 1 suggests one)
myts %>%
features(s_difference, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
# apply non-seasonal difference
myts <- myts %>%
mutate(ns_difference = difference(s_difference))
# plot
myts %>%
autoplot(ns_difference) +
labs(
title = "Australian Retail Turnover Box-Cox Transformed\n + seasonal diff(12) + first diff"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
# ACF after differencing
myts %>%
ACF(ns_difference) %>%
autoplot() +
labs(
title = "ACF After Differencing"
) +
theme(
plot.title = element_text(hjust = 0.5, face ='bold')
)
# PACF after differencing
myts %>%
PACF(ns_difference) %>%
autoplot() +
labs(
title = "PACF After Differencing"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
# KPSS test
myts %>%
features(ns_difference, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0165 0.1
The KPSS test confirms that the data is now stationary with a p-value of 0.1.
In summary, we applied a box-cox transformation with \(\lambda \approx 0.12316\), then applied one seasonal difference (\(m=12\)), and then one non-seasonal difference to yield stationary data.
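Equivalently, the whole pipeline can be expressed in a single call (a sketch equivalent to the steps above):
# Box-Cox transform, seasonal difference, then first difference in one step
myts %>%
autoplot(
difference(difference(box_cox(value, lambda_retail), 12))
)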
Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Produce a time plot for the series. How does the plot change as you change \(\phi_{1}\)?
set.seed(1999)
e <- rnorm(100)
# data frame to hold results
all_sims <- tibble()
# loop through different phi values
for (phi in seq(0, 1, by = 0.1)) {
y <- numeric(100)
for (i in 2:100) {
y[i] <- phi * y[i - 1] + e[i]
}
sim <- tibble(idx = 1:100, y = y, phi = phi)
all_sims <- bind_rows(all_sims, sim)
}
# convert to tsibble for plotting
sim_ts <- as_tsibble(all_sims, index = idx, key = phi)
# plot
autoplot(sim_ts, y) +
labs(
title = "AR(1) Simulations for Different Phi Values",
x = "Time",
y = "y"
) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold")
)
As \(\phi_1\) approaches 1, the series wanders more and more, and at \(\phi_1 = 1\) it becomes a random walk, which is non-stationary. As \(\phi_1\) approaches 0, the series looks more like white noise.
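This matches the theory: a stationary AR(1) process has variance \(\sigma^2/(1-\phi_1^2)\), which grows without bound as \(\phi_1 \to 1\).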
Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^{2} =1\).
sigma <- 1
theta <- 0.6
n <- 100 # number of observations
# generate white noise
e <- rnorm(n, mean = 0, sd = sqrt(sigma))
# initialize series
y <- numeric(n)
y[1] <- e[1] # first value (no prior error term)
# generate MA(1) process
for (t in 2:n){
y[t] <- e[t] + theta * e[t-1]
}
# convert to tsibble for plotting
ma1 <- tsibble(idx = 1:n, y=y, index = idx)
Produce a time plot for the series. How does the plot change as you change \(\theta_{1}\)?
all_ma_sims <- tibble()
# loop through theta values
for (theta in seq(-0.6, 0.6, by = 0.3)) {
e <- rnorm(n, mean = 0, sd = 1)
y <- numeric(n)
y[1] <- e[1]
for (t in 2:n) {
y[t] <- e[t] + theta * e[t - 1]
}
sim <- tibble(idx = 1:n, y = y, theta = theta)
all_ma_sims <- bind_rows(all_ma_sims, sim)
}
# convert to tsibble for plots
ma1 <- as_tsibble(all_ma_sims, index = idx, key = theta)
# plot
ma1 %>%
autoplot(y) +
labs(
title = "MA(1) Simulations for Different Values of Theta" ,
x = "Time",
y = expression(y[t])
) +
facet_wrap(~theta) +
theme(
plot.title = element_text(hjust = 0.5)
)
The further \(\theta_1\) is from zero, the more each value carries over from the previous shock. With \(\theta_1 > 0\) consecutive values are positively correlated and the series looks smoother than white noise; with \(\theta_1 < 0\) consecutive values tend to alternate and the series looks choppier. Near zero it resembles white noise.
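This reflects the theoretical autocorrelations of an MA(1) process, \(\rho_1 = \theta_1/(1+\theta_1^2)\) and \(\rho_k = 0\) for \(k > 1\): only the first lag is correlated, however large \(\theta_1\) is.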
Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} =0.6\), and \(\sigma^{2} = 1\).
n <- 100 # number of observations
phi <- 0.6
theta <- 0.6
sigma <- 1
# generate white noise
e <- rnorm(n, mean = 0, sd = sqrt(sigma))
# initialize series
y <- numeric(n)
y[1] <- e[1] # first value (no prior error term)
# generate ARMA(1,1) process
for (t in 2:n) {
y[t] <- phi * y[t - 1] + e[t] + theta * e[t - 1]
}
# convert to tsibble for plotting
arma11 <- tsibble(idx = 1:n, y = y, index = idx)
Generate data from an AR(2) model with \(\phi_{1} = -0.8\), \(\phi_{2} = 0.3\), and \(\sigma^{2} = 1\) (Note that these parameters will give a non-stationary series).
n <- 100 # number of observations
phi1 <- -0.8
phi2 <- 0.3
sigma <-1
# generate white noise
e <- rnorm(n, mean = 0, sd = sqrt(sigma))
# initialize the series
y <- numeric(n)
y[1] <- e[1]
y[2] <- e[2]
# generate AR(2) process
for (t in 3:n) {
y[t] <- phi1 * y[t - 1] + phi2 * y[t - 2] + e[t]
}
# convert to tsibble for plotting
ar2 <- tsibble(idx = 1:n, y = y, index = idx)
Graph the latter two series and compare them.
# plot arma11
arma11 %>%
autoplot(y) +
labs(
title = "ARMA(1,1) Simulation (phi = 0.6, theta = 0.6)",
x = "Time",
y= expression(y[t])
) +
theme(
plot.title = element_text(hjust = 0.5, face= 'bold')
)
# ACF & PACF plot
arma11 %>%
ACF(y) %>%
autoplot() +
labs(
title = "ACF Plots for ARMA(1,1) Model"
)+
theme(
plot.title = element_text(hjust = 0.5, face= 'bold')
)
arma11 %>%
PACF(y) %>%
autoplot() +
labs(
title = "PACF Plots for ARMA(1,1) Model"
)+
theme(
plot.title = element_text(hjust = 0.5, face= 'bold')
)
# plot AR(2)
ar2 %>%
autoplot(y) +
labs(
title = "AR(2) Simulation (phi_1 = -0.8, phi_2 = 0.3)",
x = "Time",
y = expression(y[t])
) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# plot ACF & PACF
ar2 %>%
ACF(y) %>%
autoplot() +
labs(
title = "ACF Plot for AR(2) Model"
)+
theme(
plot.title = element_text(hjust = 0.5, face= 'bold')
)
ar2 %>%
PACF(y) %>%
autoplot()+
labs(
title = "PACF Plot for AR(2) Model"
)+
theme(
plot.title = element_text(hjust = 0.5, face= 'bold')
)
The ACF and PACF plots for the ARMA(1,1) model confirm that the series is stationary. The ACF shows gradual exponential decay, indicating short-term correlation that dies out over time. The PACF shows a significant spike at lag 1, followed by small values near zero. Together, these plots confirm that the series is stationary and settles down quickly.
The AR(2) model produces a non-stationary series, as seen in the growing oscillations of the time plot. Its ACF decays very slowly in an oscillating pattern, while its PACF has a significant spike at lag 1; both are signs of non-stationary data.
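This is consistent with the AR(2) stationarity conditions \(-1 < \phi_2 < 1\), \(\phi_1 + \phi_2 < 1\), and \(\phi_2 - \phi_1 < 1\): here \(\phi_2 - \phi_1 = 0.3 - (-0.8) = 1.1 > 1\), so the conditions are violated and the process is non-stationary.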
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
data("aus_airpassengers")
# plot the data
aus_airpassengers %>%
autoplot(Passengers) +
labs(
title = "Australian Air Passengers (1970-2011)",
x="Year",
y = "Passengers (Millions)"
)
Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
# Fit the ARIMA model
aus_air_fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(aus_air_fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
The model that was selected was ARIMA(0,2,1).
# check residuals for white noise
aus_air_fit %>%
gg_tsresiduals()
# ljung box test
augment(aus_air_fit) %>%
features(.innov, ljung_box, lag = 24)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ARIMA(Passengers) 24.0 0.463
The ACF plot shows that all spikes are within the blue bounds, indicating no significant autocorrelation. The Ljung-Box p-value is greater than 0.05, so the residuals show no significant autocorrelation and the model fits well. From this, we can confirm that the residuals look like white noise.
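Strictly, the Ljung-Box test should account for the one estimated MA coefficient through its dof argument (a sketch; output not shown):
# adjust degrees of freedom for the single estimated coefficient
augment(aus_air_fit) %>%
features(.innov, ljung_box, lag = 24, dof = 1)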
# Forecasts
aus_air_fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(
title = "Forecasts from ARIMA Model",
y = "Passengers (Millions)"
) +
theme(
plot.title = element_text(hjust = 0.5, face='bold')
)
Write the model in terms of the backshift operator.
\((1- B)^2 y_{t} = (1-0.8963B)\epsilon_{t}\)
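Expanding the backshift polynomial \((1-B)^2 = 1 - 2B + B^2\) gives the recursive form \(y_{t} = 2y_{t-1} - y_{t-2} + \epsilon_{t} - 0.8963\epsilon_{t-1}\).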
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit_drift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(0,1,0)))
report(fit_drift)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
fit_drift %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(
title = "ARIMA(0,1,0) with Drift Forecast"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
# check residuals for white noise
fit_drift %>%
gg_tsresiduals()
The ARIMA(0,1,0) model shows a linear trend that slightly underfits the curvature of the actual data. The original ARIMA(0,2,1) model provides a smoother and more realistic long-term growth pattern.
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit_212_drift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(p=2, d=1, q=2)))
report(fit_212_drift)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit_212_drift %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(
title = "ARIMA(2,1,2) with Drift Forecast"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
# check residuals for white noise
fit_212_drift%>%
gg_tsresiduals()
This model follows the upward trend more closely than the previous models. It captures the short-term dynamics better and gives slightly narrower forecast intervals.
fit_212_drift_noconstant <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 0 + pdq(p=2, d=1, q=2)))
report(fit_212_drift_noconstant)
## Series: Passengers
## Model: NULL model
## NULL model
Removing the constant prevents ARIMA() from estimating a valid model for this specification, so it returns a NULL model.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit_021c <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_021c)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
fit_021c %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(
title = "ARIMA(0,2,1) with a Constant Forecast"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
# check residuals for white noise
fit_021c %>%
gg_tsresiduals()
The model produces a warning because including a constant with \(d = 2\) induces a quadratic polynomial trend, which is generally discouraged. As a result, the forecasts curve upward over time instead of increasing linearly.
For the United States GDP series (from global_economy):
us <- global_economy %>%
filter(Country == "United States")
us %>%
autoplot(GDP) +
labs(
title = "United States GDP",
y = "GDP (USD)",
x = "Year"
) +
theme(
plot.title = element_text(hjust = 0.5, face='bold')
)
if necessary, find a suitable Box-Cox transformation for the data;
# find lambda
lambda_us <- us %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
# apply box cox transformation
us <- us %>%
mutate(
transformed_gdp = box_cox(GDP, lambda_us)
)
# plot after transformation
us %>%
autoplot(transformed_gdp) +
labs(
title = "United States GDP (Box-Cox Transformation)",
y = "GDP (USD)",
x = "Year"
) +
theme(
plot.title = element_text(hjust = 0.5, face='bold')
)
fit a suitable ARIMA model to the transformed data using ARIMA();
# Fit the ARIMA model
us_gdp_fit <- us %>%
model(ARIMA(transformed_gdp))
report(us_gdp_fit)
## Series: transformed_gdp
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
try some other plausible models by experimenting with the orders chosen;
fits <- us %>%
model(
auto_partc = ARIMA(transformed_gdp),
fit021c_partc = ARIMA(transformed_gdp ~ 1 + pdq(0,2,1)),
fit010_partc = ARIMA(transformed_gdp ~ pdq(0,1,0))
)
# glance() summarises a mable of several models (report() is for a single model)
glance(fits) %>%
arrange(AICc)
## # A tibble: 3 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States fit021c_partc 6115. -323. 652. 652. 658. <cpl> <cpl>
## 2 United States auto_partc 5479. -325. 657. 657. 663. <cpl> <cpl>
## 3 United States fit010_partc 6774. -332. 668. 668. 672. <cpl> <cpl>
choose what you think is the best model and check the residual diagnostics;
The ARIMA(0,2,1) model with a constant provided the best fit of the three models, with the lowest AICc.
best_fit <- us %>%
model(fit021c_partc = ARIMA(transformed_gdp ~ 1 + pdq(0,2,1)))
best_fit %>%
gg_tsresiduals()
The time plot shows that the residuals fluctuate randomly around zero with no obvious trend.
The ACF plot shows that the spikes are within the blue bounds, confirming there is no significant autocorrelation and residuals seem like white noise.
The histogram is roughly centred at zero with a slight left skew.
These residual diagnostics confirm that the ARIMA(0,2,1) model with a constant fits the data well.
produce forecasts of your fitted model. Do the forecasts look reasonable?
# forecast
fc <- best_fit %>% forecast(h = 10)
# plot forecasts
autoplot(fc, us) +
labs(
title = "ARIMA(0,2,1) with a Constant Forecast",
x = "Year",
y = "GDP (USD)"
) +
theme(
plot.title = element_text(hjust=0.5, face='bold')
)
I think that the forecasts look reasonable for the data. They extend the long-term upward trend without any unrealistic jumps or decreases. The prediction intervals widen over time.
compare the results with what you would obtain using ETS() (with no transformation).
# fit ETS model with no transformation
ets_fit <- us %>%
model(ETS(GDP))
# plot forecasts
ets_fit %>%
forecast(h = 10) %>%
autoplot(us) +
labs(
title = "ETS Forecasts for U.S. GDP",
y = "GDP (USD)"
) +
theme(
plot.title = element_text(hjust = 0.5, face = 'bold')
)
The ETS forecasts show a more dramatic upward trend over time, while the ARIMA(0,2,1)-with-constant forecasts increase more modestly. The ETS prediction intervals are also much wider than the ARIMA intervals.
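To compare the two approaches on equal footing, one option is a holdout comparison (a sketch under an assumed training cutoff of 2007; not part of the original analysis):
# hold out the last 10 years, fit both models on the training set,
# and compare forecast accuracy over the test period
train <- us %>% filter(Year <= 2007)
train %>%
model(
arima = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0, 2, 1)),
ets = ETS(GDP)
) %>%
forecast(h = 10) %>%
accuracy(us)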