library(forecast)
library(tseries)
library(fpp3)
Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman.
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
knitr::include_graphics("/Users/ulianaplotnikova/Desktop/Untitled 6.png")
a. Explain the differences among these figures. Do they all indicate that the data are white noise?
The differences are due to sample size. With only 36 observations, the sample ACF shows much more variability and larger deviations from zero, while the larger samples (360 and 1,000 observations) produce autocorrelations much closer to zero, as expected for white noise. The critical values lie further from zero for smaller samples because sample autocorrelations fluctuate more in small samples. All three figures are consistent with white noise; the larger samples simply provide more precise evidence.
b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values lie at different distances from zero because they are computed as \(\pm 1.96/\sqrt{T}\), where \(T\) is the length of the series. As \(T\) increases, the critical values shrink towards zero. The autocorrelations differ across figures because each sample ACF is itself a random quantity: in small samples the estimates fluctuate widely around zero, while in larger samples they settle closer to the true value of zero for white noise.
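These bounds can be verified directly (a quick sketch using the sample sizes quoted in the exercise):
# Approximate 95% bounds for the ACF of white noise: +/- 1.96/sqrt(T)
n_obs <- c(36, 360, 1000)
round(1.96 / sqrt(n_obs), 3)
# gives roughly 0.327, 0.103, 0.062 - the bounds tighten as the sample grows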
A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
amazon <- gafa_stock[gafa_stock$Symbol == "AMZN", ]
amazon <- amazon[order(amazon$Date), ]
# Plot the daily closing prices
plot(amazon$Date, amazon$Close, type = "l",
main = "Amazon Daily Closing Prices",
xlab = "Date", ylab = "Closing Price")
# Plot the ACF and PACF to assess stationarity
par(mfrow = c(1, 2)) # Set layout for two plots in one row
acf(amazon$Close, main = "ACF of Amazon Closing Prices")
pacf(amazon$Close, main = "PACF of Amazon Closing Prices")
par(mfrow = c(1, 1)) # Reset layout to default
Explanation: The time plot shows a clear upward trend with no constant mean, the hallmark of non-stationarity. The ACF decays very slowly, indicating strong autocorrelation persisting across many lags. The PACF has a single large spike close to 1 at lag 1, as expected for a random-walk-like series. Together these patterns indicate that the series should be differenced to remove the trend and stabilise its behaviour before further modelling.
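As a sketch of the remedy, take first differences of the closing prices and re-examine the ACF (reusing the amazon data frame created above):
# First differences remove the stochastic trend in prices
close_diff <- diff(amazon$Close)
plot(amazon$Date[-1], close_diff, type = "l",
     main = "Differenced Amazon Closing Prices",
     xlab = "Date", ylab = "Daily change")
acf(close_diff, main = "ACF of Differenced Closing Prices")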
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy.
turkish_gdp <- subset(global_economy, Country == "Turkey")$GDP
# Check if data has NA values
if(length(turkish_gdp) == 0 || any(is.na(turkish_gdp))) {
stop("Turkish GDP data is missing or incomplete. Please check your dataset.")
}
plot(turkish_gdp, main = "Turkish GDP")
# Determine the Box-Cox transformation parameter lambda
lambda <- BoxCox.lambda(turkish_gdp)
print(paste("Optimal lambda:", lambda))
## [1] "Optimal lambda: 0.157180448620967"
# Apply the Box-Cox transformation
gdp_transformed <- BoxCox(turkish_gdp, lambda)
# Plot the transformed data and ACF to check for stationarity
plot(gdp_transformed, main = "Box-Cox Transformed Turkish GDP")
par(mfrow = c(1, 2)) # Set layout for two plots in one row
Acf(gdp_transformed, main = "ACF of Transformed Turkish GDP")
Pacf(gdp_transformed, main = "PACF of Transformed Turkish GDP")
Perform differencing
gdp_diff <- diff(gdp_transformed, differences = 1)
gdp_diff
## [1] -20.9580371 3.8641127 5.5237878 2.8834936 2.5318469 6.5107950
## [7] 4.1214595 4.4693683 4.3752831 -5.3465183 -2.0138979 9.3711896
## [13] 9.7928765 14.4276069 10.4860555 6.6236731 6.5690427 5.1964561
## [19] 16.2489133 -13.5123755 1.6305665 -4.8307459 -2.2651547 -1.3753064
## [25] 5.6881283 6.0441482 7.2987645 2.1751346 8.8179840 18.9712393
## [31] -0.2463746 3.1357320 7.4708845 -18.4018017 14.8262872 4.0001846
## [37] 2.6590696 22.7894523 -4.6754318 4.0371263 -18.9720831 10.5713250
## [43] 16.8352456 17.0634912 14.5320468 6.7468324 14.3438261 8.9964705
## [49] -12.3957591 13.1228665 5.6185139 3.6469969 6.3706087 -1.3260642
## [55] -6.2756711 0.3423482 -1.0660121
# Plot the differenced data and ACF
plot(gdp_diff, main = "Differenced Box-Cox Transformed Turkish GDP")
par(mfrow = c(1, 2)) # Set layout for two plots in one row
Acf(gdp_diff, main = "ACF of Differenced Transformed Turkish GDP")
Pacf(gdp_diff, main = "PACF of Differenced Transformed Turkish GDP")
Perform Augmented Dickey-Fuller test to check for stationarity
adf_test_result <- adf.test(gdp_diff)
## Warning in adf.test(gdp_diff): p-value smaller than printed p-value
print(adf_test_result)
##
## Augmented Dickey-Fuller Test
##
## data: gdp_diff
## Dickey-Fuller = -4.3707, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
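As a cross-check in the fpp3 style, unitroot_ndiffs applied to the transformed series reports the required order of differencing directly (a sketch, reusing the lambda estimated above):
global_economy |>
  filter(Country == "Turkey") |>
  features(box_cox(GDP, lambda), unitroot_ndiffs)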
Accommodation takings in the state of Tasmania from aus_accommodation.
aus_accommodation |>
filter(State == "Tasmania") |>
autoplot(Takings)
lambda <- aus_accommodation |>
filter(State == "Tasmania") |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
aus_accommodation |>
filter(State == "Tasmania") |>
features(box_cox(Takings, lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
After the Box-Cox transformation, the data need to be differenced once to become stationary.
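Since Takings is quarterly and strongly seasonal, it is also worth checking whether a seasonal difference is needed (a sketch using unitroot_nsdiffs):
aus_accommodation |>
  filter(State == "Tasmania") |>
  features(box_cox(Takings, lambda), unitroot_nsdiffs)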
Monthly sales from souvenirs
head(souvenirs)
## # A tsibble: 6 x 2 [1M]
## Month Sales
## <mth> <dbl>
## 1 1987 Jan 1665.
## 2 1987 Feb 2398.
## 3 1987 Mar 2841.
## 4 1987 Apr 3547.
## 5 1987 May 3753.
## 6 1987 Jun 3715.
souvenirs %>%
gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Monthly Souvenir Sales")
# calculate lambda
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.002118221
souvenirs %>%
mutate(Sales_bc = box_cox(Sales, lambda)) %>%
mutate(Sales_bc_diff = difference(Sales_bc, 12)) %>%
gg_tsdisplay(Sales_bc_diff, plot_type='partial', lag = 36) +
labs(title = paste0("Monthly souvenir sales differenced with lambda = ", round(lambda, 4)))
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
head(aus_retail)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurants… A3349849A 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurants… A3349849A 1982 Sep 4.2
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
gg_tsdisplay(myseries, plot_type = "partial")+
labs(title = "Non-transformed Australia Retail Turnover")
## Plot variable not specified, automatically selected `y = Turnover`
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.08303631
myseries %>%
gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
labs(title = paste0("Differenced Australia Retail Turnover = ", round(lambda,2)))
Simulate and plot some data from simple ARIMA models.
a. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\).
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
library(tsibble)
library(ggplot2)
library(latex2exp)
# Function to generate and plot an AR(1) model
plot_ar1_model <- function(phi, n = 100, seed = NULL) {
if (!is.null(seed)) set.seed(seed)
e <- rnorm(n)
y <- numeric(n)
y[1] <- e[1] # Initial value
for(i in 2:n) {
y[i] <- phi * y[i-1] + e[i]
}
tsibble(idx = seq_len(n), y = y, index = idx) %>%
autoplot(y) +
labs(title = TeX(sprintf("AR(1) Model with $\\phi$ = %s", phi)),
subtitle = "Simulation with Gaussian white noise",
y = "Value",
x = "Index") +
theme_minimal()
}
plot_ar1_model(-0.6, seed = 123)
plot_ar1_model(0, seed = 456)
plot_ar1_model(1, seed = 789)
plot_ar1_model(0.6, seed = 101)
Changing \(\phi_1\) produces distinct patterns in the time series. At \(\phi_1 = 0\) the series behaves like white noise, while at \(\phi_1 = 1\) it becomes a random walk. A negative \(\phi_1\) makes the series oscillate around its mean, alternating in sign from one observation to the next. As \(|\phi_1|\) approaches 1, the variance of the series increases, producing larger and more persistent swings.
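The link between \(\phi_1\) and variability can be checked against theory: a stationary AR(1) has variance \(\sigma^2/(1-\phi_1^2)\). A minimal sketch comparing theoretical and simulated standard deviations (the choice \(\phi_1 = 0.9\) is for illustration only):
set.seed(42)
phi <- 0.9
n <- 5000
e <- rnorm(n)
y <- numeric(n)
for (i in 2:n) y[i] <- phi * y[i-1] + e[i]
c(theoretical_sd = 1 / sqrt(1 - phi^2), empirical_sd = sd(y))
# theoretical sd = 1/sqrt(1 - 0.81), approximately 2.29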
c. Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
ma.1 <- function(theta, sigma, n){
  y <- ts(numeric(n))
  e <- rnorm(n, mean = 0, sd = sigma)  # note: rnorm(n, sigma) would set the mean, not the sd
  y[1] <- e[1]
  for(i in 2:n)
    y[i] <- theta*e[i-1] + e[i]
  return(y)
}
d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
theta = c(-0.6, 0, 0.6)
sigma = 1
n = 100
for (i in 1:3){
y = ma.1(theta[i], sigma, n)
p = autoplot(y) + labs(title = sprintf("theta = %0.1f", theta[i]))
acf = ggAcf(y) + labs(title = sprintf("theta = %0.1f", theta[i]))
gridExtra::grid.arrange(p,acf, ncol = 2)
}
As \(\theta_1\) changes, the dependence on the previous shock changes. With larger \(|\theta_1|\), the short-term smoothing is more pronounced, while \(\theta_1 = 0\) reduces the series to white noise.
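Theory gives the MA(1) lag-1 autocorrelation as \(\rho_1 = \theta_1/(1+\theta_1^2)\), with all higher lags zero, which matches the single spike seen in the ACF plots:
theta <- c(-0.6, 0, 0.6)
round(theta / (1 + theta^2), 3)  # lag-1 autocorrelations: -0.441, 0.000, 0.441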
e. Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\), and \(\sigma^2 = 1\).
set.seed(525)
phi <- 0.6
theta <- 0.6
sigma <- 1
y1 <- ts(numeric(100))
e <- rnorm(100, mean = 0, sd = sigma)  # length must match the series; sd = sigma
y1[1] <- e[1]
for(i in 2:100)
  y1[i] <- phi*y1[i-1] + theta*e[i-1] + e[i]
p1 = autoplot(y1) + labs(y = "y", title = expression(paste("ARMA(1,1): ", phi[1], "= 0.6, ", theta[1], "= 0.6")))
acf1 = ggAcf(y1) + labs(y = "y", title = expression(paste("ARMA(1,1): ", phi[1], "= 0.6, ", theta[1], "= 0.6")))
gridExtra::grid.arrange(p1, acf1, ncol = 2)
f. Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
set.seed(300)
phi_1 <- -0.8
phi_2 <- 0.3
sigma <- 1
y2 <- ts(numeric(100))
e <- rnorm(100, mean = 0, sd = sigma)
y2[1] <- e[1]; y2[2] <- e[2]
for(i in 3:100)
  y2[i] <- phi_1*y2[i-1] + phi_2*y2[i-2] + e[i]
p2 = autoplot(y2) + labs(y = "y", title = expression(paste("AR(2): ", phi[1], "= -0.8, ", phi[2], "= 0.3")))
acf2 = ggAcf(y2) + labs(y = "y", title = expression(paste("AR(2): ", phi[1], "= -0.8, ", phi[2], "= 0.3")))
gridExtra::grid.arrange(p2, acf2, ncol = 2)
g. Graph the latter two series and compare them.
ggtsdisplay(y1, main = "ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6, sigma^2 = 1")
ggtsdisplay(y2, main = "AR(2) model with phi1 = -0.8, phi2 = 0.3, sigma^2 = 1")
The ARMA(1,1) series is stationary: it fluctuates around zero and its autocorrelations decay quickly. The AR(2) series with these parameters is non-stationary: because \(\phi_1\) is negative it oscillates in sign, and the oscillations grow steadily in amplitude over time.
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
ts_data <- as.ts(aus_airpassengers)  # coerce the tsibble to a ts object for auto.arima()
# auto.arima() (from the forecast package) selects the model automatically
model_auto <- auto.arima(ts_data)
summary(model_auto) # displays the selected model
## Series: ts_data
## ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 = 4.308: log likelihood = -97.02
## AIC=198.04 AICc=198.32 BIC=201.65
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.3453292 2.008183 1.215008 0.9528709 4.809966 0.699953 -0.1743841
# Check residuals for white noise
checkresiduals(model_auto)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 6.5774, df = 8, p-value = 0.5828
##
## Model df: 1. Total lags used: 9
# Forecast the next 10 periods from the selected model
forecast_auto <- forecast(model_auto, h = 10)
plot(forecast_auto, main = "Forecast from Auto ARIMA Model")
Write the model in terms of the backshift operator.
In backshift notation the fitted ARIMA(0,2,1) model is \((1-B)^2 y_t = (1 + \theta_1 B)\,\varepsilon_t\), with \(\hat{\theta}_1 = -0.8963\).
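Expanding the backshift polynomial gives the equivalent difference-equation form, which makes the double differencing explicit:
\[
y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + \hat{\theta}_1\,\varepsilon_{t-1}
\]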
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
# Fit an ARIMA(0,1,0) model with drift to the years before 2012
adjusted_model <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))  # the `1` includes the drift term explicitly
adjusted_model %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Forecasts of Australian air traffic with ARIMA(0,1,0)", y = "Passengers (in millions)")
# Residual analysis of the model
adjusted_model %>%
gg_tsresiduals() +
labs(title = "Residual analysis for the ARIMA(0,1,0) model")
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c.
adjusted_model2 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))  # the `1` includes the drift term explicitly
adjusted_model2 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Forecasts of Australian air traffic with ARIMA(2,1,2)", y = "Passengers (in millions)")
# Residual analysis of the model
adjusted_model2 %>%
gg_tsresiduals() +
labs(title = "Residual analysis for the ARIMA(2,1,2) model")
Remove the constant and see what happens.
adjusted_model3 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))  # the `0` removes the constant
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(adjusted_model3)
## Series: Passengers
## Model: NULL model
## NULL model
With the constant removed, this ARIMA(2,1,2) specification cannot be estimated: the CSS optimisation yields a non-stationary AR part, so fable returns a NULL model and no forecasts can be produced.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
adjusted_model4 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))  # the `1` forces a constant with d = 2
adjusted_model4 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Forecasts of Australian air traffic with ARIMA(0,2,1)", y = "Passengers (in millions)")
# Residual analysis of the model
adjusted_model4 %>%
gg_tsresiduals() +
labs(title = "Residual analysis for the ARIMA(0,2,1) model")
Including a constant when d = 2 induces a quadratic trend, so the forecast slope keeps steepening. fable warns that the specification induces a higher-order polynomial trend, which is generally discouraged, and suggests removing the constant or reducing the number of differences.
For the United States GDP series (from global_economy):
a. if necessary, find a suitable Box-Cox transformation for the data;
us_economy <- global_economy %>%
filter(Code == "USA")
us_economy %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "US GDP Time Series Plot")
lambda_usa <- us_economy %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
lambda_usa
## [1] 0.2819443
b. fit a suitable ARIMA model to the transformed data using ARIMA();
fit_arima <- us_economy %>%
model(ARIMA(box_cox(GDP, lambda_usa)))
report(fit_arima)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_usa)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
c. try some other plausible models by experimenting with the orders chosen;
us_economy %>%
gg_tsdisplay(box_cox(GDP, lambda_usa), plot_type='partial') +
labs(title = "Box-Cox Transformed US GDP Time Series Plot")
usa_models <- us_economy %>%
model(
arima110 = ARIMA(box_cox(GDP, lambda_usa) ~ pdq(1,1,0)),
arima111 = ARIMA(box_cox(GDP, lambda_usa) ~ pdq(1,1,1)),
arima120 = ARIMA(box_cox(GDP, lambda_usa) ~ pdq(1,2,0)),
arima210 = ARIMA(box_cox(GDP, lambda_usa) ~ pdq(2,1,0)),
arima212 = ARIMA(box_cox(GDP, lambda_usa) ~ pdq(2,1,2))
)
glance(usa_models) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima120 6780. -326. 656. 656. 660.
## 2 arima110 5479. -325. 657. 657. 663.
## 3 arima111 5580. -325. 659. 659. 667.
## 4 arima210 5580. -325. 659. 659. 667.
## 5 arima212 5734. -325. 662. 664. 674.
Based on the AIC and AICc values, the ARIMA(1,2,0) model is the best-fitting of the five models considered.
d. choose what you think is the best model and check the residual diagnostics;
usa_models %>%
select(arima120) %>%
gg_tsresiduals() +
ggtitle("Residual Diagnostics for ARIMA(1,2,0) Model")
e. produce forecasts of your fitted model. Do the forecasts look reasonable?
usa_models %>%
forecast(h=10) %>%
filter(.model=='arima120') %>%
autoplot(global_economy)
Yes, the forecasts look reasonable: they continue the historical trend at a similar slope.
f. compare the results with what you would obtain using ETS() (with no transformation).
fit_ets_usa <- us_economy %>%
model(ETS(GDP))
report(fit_ets_usa)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets_usa %>%
forecast(h=10) %>%
autoplot(global_economy)
The ETS() model reports a much higher AICc, but AICc values are not directly comparable between ETS and ARIMA models, nor between the raw and Box-Cox-transformed scales. To compare the two models fairly, their out-of-sample forecast accuracy should be evaluated.
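A minimal sketch of such a comparison, holding out the years after 2010 (the cutoff is an arbitrary choice for illustration, and h = 7 assumes the series runs to 2017):
us_train <- us_economy %>% filter(Year <= 2010)
us_train %>%
  model(
    arima120 = ARIMA(box_cox(GDP, lambda_usa) ~ pdq(1,2,0)),
    ets = ETS(GDP)
  ) %>%
  forecast(h = 7) %>%
  accuracy(us_economy)  # compare RMSE/MAE on the held-out years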