The difference between these figures lies in the height of the spikes
and how wide the confidence bands are. As the sample size grows, the
spikes become smaller, and the bands get tighter because the bounds
depend on ±1.96 divided by the square root of the sample size, T. While
the second and third plots show all spikes comfortably within the bounds, the first plot has a couple of spikes that approach or just cross the lines. Even under white noise, though, about 5% of spikes are expected to fall outside the bounds by chance, so all three plots are consistent with white noise.
The critical values are at different distances from zero because they depend on the sample size. As the length of the time series increases, the critical values get closer to zero since the bounds are calculated as ±1.96 divided by the square root of the sample size. The autocorrelations look different in each figure because random white noise will still produce different patterns in finite samples. Small datasets tend to show more variation just by chance. Even though all plots represent white noise, smaller samples will show more extreme spikes and larger samples will appear flatter and more stable.
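To make the shrinking bounds concrete, here is a minimal sketch (assuming the fpp3 package stack used throughout this write-up; the series lengths 36, 360, and 1000 are illustrative assumptions) that generates white noise of three lengths and plots the ACFs:
library(fpp3)
set.seed(1)
# white noise of three lengths; the ACF bounds are +/- 1.96 / sqrt(T)
wn <- bind_rows(
tibble(series = "T = 36", idx = 1:36, y = rnorm(36)),
tibble(series = "T = 360", idx = 1:360, y = rnorm(360)),
tibble(series = "T = 1000", idx = 1:1000, y = rnorm(1000))) %>%
as_tsibble(index = idx, key = series)
wn %>%
ACF(y) %>%
autoplot() +
labs(title = "White noise ACFs: bounds shrink as 1.96/sqrt(T)")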
Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(Close, plot_type='partial') +
labs(title = "Amazon Closing Stock Price")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
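The ndiffs output below is shown without the code that produced it; a minimal sketch of the likely call, assuming the feasts unitroot_ndiffs feature used later in this write-up:
gafa_stock %>%
filter(Symbol == "AMZN") %>%
features(Close, unitroot_ndiffs)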
## # A tibble: 1 × 2
## Symbol ndiffs
## <chr> <int>
## 1 AMZN 1
The time series plot of Amazon’s closing stock price shows a strong
upward trend, which is a sign of non-stationarity because the mean is
not constant over time. The ACF plot displays very slow decay, with
autocorrelations remaining high even at large lags, which further
suggests non-stationarity. The PACF has a significant spike at lag 1 and
quickly drops off, which is typical for a series that needs
differencing. Finally, the ndiffs
output confirms that one
difference is needed to make the series stationary. Together, these
plots and the test clearly indicate that the data should be differenced
before modeling.
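As a quick confirmation (a sketch; this plot is not part of the original output), displaying the differenced closing price should remove the trend and leave an ACF with little remaining structure:
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(difference(Close), plot_type = 'partial') +
labs(title = "Differenced Amazon Closing Stock Price")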
Turkish GDP from global_economy.
# plot
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed Turkish GDP")
# calculate lambda
lambda <- global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
global_economy %>%
filter(Country == "Turkey") %>%
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(difference(box_cox(GDP, lambda)), plot_type = 'partial') +
labs(title = TeX(paste0("Differenced Turkish GDP with $\\lambda$ = ", round(lambda, 2))))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Turkish GDP is a non-stationary and non-seasonal series. A Box-Cox
transformation with λ = 0.16 was applied to stabilize the variance, and
one difference was needed to remove the trend. After differencing once,
the series appears stationary as seen in the transformed and differenced
plot, with both ACF and PACF showing no significant
autocorrelations.
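As a further check (a sketch; output not shown), a KPSS test on the transformed and differenced series should fail to reject stationarity:
global_economy %>%
filter(Country == "Turkey") %>%
features(difference(box_cox(GDP, lambda)), unitroot_kpss)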
Accommodation takings in the state of Tasmania from aus_accommodation.
# plot
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(Takings, plot_type='partial') +
labs(title = "Non-transformed Tasmania Accomodation Takings")
# calculate lambda
lambda <- aus_accommodation %>%
filter(State == "Tasmania") %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(box_cox(Takings,lambda), unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
# transformed plot
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(difference(box_cox(Takings,lambda), 4), plot_type='partial') +
labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
The accommodation takings series for Tasmania is non-stationary with an upward trend and strong seasonality. A Box-Cox transformation with λ = 0 (a log transformation) was applied to stabilize the variance, and the unit root test indicated one seasonal difference. After applying the seasonal (lag-4) difference, the trend is largely removed, but the ACF still shows some significant spikes and the PACF has a strong spike at lag 1 along with some seasonal pattern. So the log transformation and one seasonal difference address most of the non-stationarity, though an additional first difference may be worth checking, as the sketch below does.
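A minimal sketch of that check (output not shown), applying unitroot_ndiffs to the seasonally differenced series:
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(difference(box_cox(Takings, lambda), 4), unitroot_ndiffs)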
Monthly sales from souvenirs.
# plot
souvenirs %>%
gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Monthly Souvenir Sales")
# calculate lambda
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
souvenirs %>%
features(box_cox(Sales,lambda), unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
# transformed plot
souvenirs %>%
gg_tsdisplay(difference(box_cox(Sales,lambda), 12), plot_type='partial', lag = 36) +
labs(title = TeX(paste0("Differenced Monthly Souvenir Sales with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
The monthly souvenir sales series is highly non-stationary, showing both a strong trend and strong seasonality. A Box-Cox transformation with λ = 0 (a log transformation) was applied to stabilize the variance, and the unit root test indicated one seasonal difference. After applying the seasonal (lag-12) difference, the trend appears to be removed, but the ACF still shows significant spikes at seasonal lags and the PACF has a strong spike at lag 1 with smaller spikes following. So the log transformation and one seasonal difference handle most of the non-stationarity, but an additional first difference could be considered next; the sketch below checks this.
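The same check for the souvenir series (a sketch; output not shown):
souvenirs %>%
features(difference(box_cox(Sales, lambda), 12), unitroot_ndiffs)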
set.seed(1234)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# plot
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Retail Turnover")
# lambda calculation
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
myseries %>%
features(box_cox(Turnover, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Tasmania Cafes, restaurants and takeaway food services 1
# transformed plot
myseries %>%
gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
The retail turnover series is clearly non-stationary because of the strong upward trend and growing variance over time. I applied a Box-Cox transformation with λ = 0 to stabilize the variance, and after taking one seasonal (lag-12) difference, the trend is mostly removed. However, when I look at the ACF and PACF plots, I still see significant spikes at seasonal lags, which suggests remaining autocorrelation. So, while the log transformation and seasonal difference make the data more stable, I would also consider adding a first difference to fully achieve stationarity, as in the sketch below.
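A sketch of that extra step (output not shown), adding a first difference on top of the seasonal one:
myseries %>%
gg_tsdisplay(difference(difference(box_cox(Turnover, lambda), 12)),
plot_type = 'partial', lag = 36) +
labs(title = "Seasonally and First-differenced Retail Turnover")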
Use R to simulate and plot some data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
set.seed(123) # For reproducibility
# Generate noise
e <- rnorm(100)
# Function to simulate AR(1) series given phi
simulate_ar1 <- function(phi, e) {
y <- numeric(100)
for (i in 2:100) {
y[i] <- phi * y[i - 1] + e[i]
}
return(y)
}
# Simulate AR(1) for different phi values
phi_values <- c(0.6, 1, 0, -0.6)
data_list <- lapply(phi_values, function(phi) {
tibble(
idx = 1:100,
y = simulate_ar1(phi, e),
phi_label = paste0("phi==", phi) # Prepare label for parsing
)
})
# Combine into one data frame
combined_data <- bind_rows(data_list)
# Plot using ggplot2 and facet_wrap with parsed labels
ggplot(combined_data, aes(x = idx, y = y)) +
geom_line() +
facet_wrap(~ phi_label, scales = "free_y", ncol = 2, labeller = label_parsed) +
labs(
title = "AR(1) Models with Different φ₁ Values",
x = "Index",
y = "Value"
) +
theme_minimal()
As I change \(\phi_1\), the pattern of
the time series changes too. When \(\phi_1 =
0.6\), the series shows some correlation over time but still
stays around the same range. When \(\phi_1 =
1\), the series looks like a random walk, drifting up and down
without returning to a mean — so it’s non-stationary. When \(\phi_1 = 0\), it’s just white noise,
totally random with no clear pattern. And when \(\phi_1 = -0.6\), the series bounces up and
down more sharply because of the negative correlation, creating that
zig-zag effect.
Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
set.seed(123) # For reproducibility
# Function to generate MA(1) series
generate_ma1 <- function(theta, n = 100) {
e <- rnorm(n)
y <- e + theta * lag(e, default = 0)
tibble(idx = 1:n, y = y, theta_label = paste0("theta == ", theta))
}
# Generate series for different theta values
theta_values <- c(-0.6, 0, 0.6, 0.9)
ma1_list <- lapply(theta_values, generate_ma1)
# Combine all into one data frame
combined_ma1 <- bind_rows(ma1_list)
# Plot with facets
ggplot(combined_ma1, aes(x = idx, y = y)) +
geom_line() +
facet_wrap(~ theta_label, labeller = label_parsed) +
labs(title = "MA(1) Models with Different $\\theta_1$ Values", x = "Index", y = "Value") +
theme_minimal()
As \(\theta_1\) decreases toward negative values, the series tends to switch direction more often, leading to more frequent fluctuations, because consecutive values become negatively correlated. The overall size of the spikes stays roughly the same (the variance of an MA(1) process is \(\sigma^2(1+\theta_1^2)\), so it changes only modestly across these values), but with lower \(\theta_1\) the series hovers closer around the mean, without long runs in one direction.
Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\), and \(\sigma^2 = 1\).
Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
Graph the latter two series and compare them.
library(tsibble)
library(ggplot2)
library(patchwork) # for arranging plots side by side
# First series: ARMA(1,1) with phi = 0.6, theta = 0.6
set.seed(123)
e1 <- rnorm(100)
y1 <- numeric(100)
phi <- 0.6
theta <- 0.6
for (i in 2:100) {
y1[i] <- phi * y1[i - 1] + e1[i] + theta * e1[i - 1]
}
arma11_series <- tsibble(idx = 1:100, y = y1, index = idx)
# Second series: AR(2)
set.seed(1234)
e2 <- rnorm(100)
y2 <- numeric(100)
for (i in 3:100) {
y2[i] <- -0.8 * y2[i - 1] + 0.3 * y2[i - 2] + e2[i]
}
ar2_series <- tsibble(idx = 1:100, y = y2, index = idx)
# Plotting
p1 <- ggplot(arma11_series, aes(x = idx, y = y)) +
geom_line(color = "steelblue") +
labs(title = "AR(1) Series (φ=0.6)", x = "Time", y = "Value") +
theme_minimal()
p2 <- ggplot(ar2_series, aes(x = idx, y = y)) +
geom_line(color = "darkred") +
labs(title = "AR(2) Series (φ1=-0.8, φ2=0.3)", x = "Time", y = "Value") +
theme_minimal()
# Correct way to arrange side by side
p1 | p2 # Horizontal layout
When comparing the two series, there are some clear differences in their behavior. The ARMA(1,1) series, with \(\phi_1 = 0.6\) and \(\theta_1 = 0.6\), appears stable and stationary. It fluctuates smoothly around zero, with values staying within a reasonable range. There's no obvious trend, and the ups and downs look natural and balanced, showing a consistent pattern over time.
On the other hand, the AR(2) series, with coefficients of -0.8 and 0.3,
behaves very differently. It starts off small but quickly becomes
explosive, with values growing larger and larger over time. This creates
sharp, exaggerated swings that keep getting worse as time goes on. The
series is clearly unstable and non-stationary, meaning it doesn’t settle
around a constant average or variance. While the ARMA(1,1) model looks like something we might see in real-world data because of its stability, the AR(2) model would need adjustments to its parameters to avoid this explosive behavior and be usable in practice.
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
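The report below refers to a fit object that is never defined in the text; a minimal sketch of the call that presumably produced it, letting ARIMA() choose the order automatically:
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)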
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")
Using the ARIMA() function, the chosen model is ARIMA(0,2,1). This means the data was differenced twice to handle the trend, and a moving average term of order 1 was included. Looking at the residual plots, the residuals behave like white noise: there's no clear pattern, and the autocorrelations are within the confidence bands. This
suggests the model fits well. The forecast for the next 10 periods shows
a continued upward trend, with reasonable prediction intervals that
widen over time to reflect uncertainty. Overall, ARIMA(0,2,1) looks like
a good choice for this data.
Written in terms of the backshift operator, the ARIMA(0,2,1) model is:
\[ (1 - B)^2 y_t = (1 - 0.8756 B) e_t \]
fit2 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
fit2 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")
The ARIMA(0,1,0) with drift model gives a more steady and gradual
forecast compared to the ARIMA(0,2,1) model. Its forecast line goes up smoothly and the prediction intervals are narrower, meaning there's less uncertainty. On the other hand, the ARIMA(0,2,1) forecast grows faster
and shows much wider intervals, which means more uncertainty. Overall,
ARIMA(0,1,0) with drift looks more stable and realistic for forecasting
future passenger numbers.
fit3 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
fit3 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")
# removing the constant
fit4 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
## Series: Passengers
## Model: NULL model
## NULL model
The ARIMA(2,1,2) model with drift gives a forecast that looks similar to part (c) — it shows a steady upward trend that continues the growth in passenger numbers. The residuals also look like white noise, which suggests that the model fits the data well.
When I remove the constant (drift), the model becomes a null model, and the forecast flattens out. This happens because without the drift, the model can only include a polynomial trend of order d−1, which in this case is 0. So, instead of capturing the upward trend, the forecast stays flat and no longer reflects the growth in passengers. This shows that including the drift is important for a more realistic forecast.
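The fit5 object used below is never defined in the text; a minimal sketch of the call that presumably produced it (and the warning that follows), forcing a constant into a twice-differenced model:
fit5 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))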
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit5 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")
When I plot forecasts from an ARIMA(0,2,1) model with a constant, the
forecast grows way too fast and looks unrealistic. This happens because
adding a constant to a twice-differenced model creates a quadratic
trend, which isn’t usually recommended. I also get a warning that
suggests removing the constant or using fewer differences. So overall,
including a constant in this model makes the forecast explode, and it’s
better to leave it out.
For the United States GDP series (from global_economy):
# plot
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed United States GDP")
# calculate lambda
lambda <- global_economy %>%
filter(Code == "USA") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
The GDP data shows a strong upward trend, and the ACF decays very slowly, which means the series is non-stationary. A Box-Cox transformation, with λ chosen by the Guerrero method, is a good choice to stabilize the variance.
Fit a suitable ARIMA model to the transformed data using ARIMA();
fit6 <- global_economy %>%
filter(Code == "USA") %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fit6)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
I fit an ARIMA model to the transformed GDP data, and the best model is ARIMA(1,1,0) with drift. This means the data was differenced once to handle the trend, and it includes an AR term and a constant. A Box-Cox transformation was also used to stabilize the variance. Both the AR term and the drift are significant, so this model seems like a good fit for the data.
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
labs(title = "Transformed United States GDP")
# unit root test
global_economy %>%
filter(Code == "USA") %>%
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
usa_fit <- global_economy %>%
filter(Code == "USA") %>%
model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))
glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima120 6780. -326. 656. 656. 660.
## 2 arima110 5479. -325. 657. 657. 663.
## 3 arima111 5580. -325. 659. 659. 667.
## 4 arima210 5580. -325. 659. 659. 667.
## 5 arima212 5734. -325. 662. 664. 674.
The unit root test suggests that the data only needs to be differenced once. The ACF decays slowly, and the PACF has a significant spike at lag 1, which points to an AR(1) structure, or ARIMA(1,1,0). This is consistent with what I found in part b. The AICc values for the models are close, with ARIMA(1,2,0) having the lowest, but ARIMA(1,1,0) right behind it.
I think ARIMA(1,1,0) with drift is the best model. Even though ARIMA(1,2,0) had a slightly lower AICc, the residual diagnostics for ARIMA(1,1,0) look better. The residuals for ARIMA(1,2,0) show some large spikes and outliers, and the histogram is a bit skewed. The ACF also shows some small but noticeable autocorrelations. So overall, ARIMA(1,1,0) with drift gives a good fit and cleaner residuals, making it the better choice.
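The forecasts described next are not shown in code; a minimal sketch of how they could be produced from the chosen model (the selected column name and the 10-year horizon are assumptions):
usa_fit %>%
select(arima110) %>%
forecast(h = 10) %>%
autoplot(global_economy %>% filter(Code == "USA")) +
labs(title = "US GDP Forecasts from ARIMA(1,1,0) with Drift")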
I produced forecasts from my fitted model, and they look reasonable. The forecast continues the upward trend of GDP, which makes sense based on the past data. The prediction intervals also widen over time, showing more uncertainty in the long run. Overall, the forecasts seem to align well with the pattern of the data.
Compare the results with what you would obtain using ETS() (with no transformation).
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
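The ETS report above is shown without the code that produced it; a minimal sketch of the likely call (the object name is an assumption):
ets_fit <- global_economy %>%
filter(Code == "USA") %>%
model(ETS(GDP))
report(ets_fit)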
When I fit the ETS model without any transformation, the forecasts still follow the upward trend in GDP, but they seem a bit more flexible and responsive to recent data compared to the ARIMA model. The prediction intervals are wider, showing more uncertainty, but the general shape of the forecast looks reasonable.
Also, the AIC, AICc, and BIC values from the ETS model are much higher than those from the ARIMA model, but these criteria are not directly comparable across model classes (and here the ARIMA was fitted to Box-Cox transformed data), so they can't be used on their own to choose between ETS and ARIMA. Both models give reasonable forecasts; a time series cross-validation comparison would be a fairer way to decide between them.