Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a) Explain the differences among these figures. Do they all indicate that the data are white noise?
We know they all indicate white noise; what varies is the sample size. With a larger sample the ACF estimates become more reliable and the critical-value bounds shrink toward 0, making the white-noise pattern easier to recognize.
Given the variability the left graphic shows, and the small sample size, it is slightly harder to conclude that it is only white noise.
The middle and right graphics show, with much more certainty, bars well within the confidence bounds and very close to 0.
b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
Since the formula for the critical values (the confidence bounds) is \(\pm 1.96/\sqrt{T}\), where \(T\) is the sample size, a bigger sample means a larger denominator and therefore smaller critical values, shrinking the confidence bounds.
Since white noise should in theory show no correlation at all, with smaller samples random fluctuation in the data can more easily give the impression of correlation where there is none. As the sample size increases, the variability of the autocorrelation estimates decreases, making the process look more like true white noise.
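Since the bounds are just \(\pm 1.96/\sqrt{T}\), we can sketch the computation for the three sample sizes in the figure:

# 95% critical values for the three sample sizes
sizes <- c(36, 360, 1000)
round(1.96 / sqrt(sizes), 3)
#> [1] 0.327 0.103 0.062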
9.2
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
Load needed libraries
library(tidyverse)
library(fpp3)  # tsibble, feasts and fable: ACF(), PACF(), ARIMA(), ETS(), etc.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
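The amazon_stock object used below needs to be created first; a minimal sketch, re-indexing by trading day so the tsibble is regular (stock data skips weekends and holidays):

# Extract Amazon closing prices and re-index by trading day
amazon_stock <- gafa_stock |>
  filter(Symbol == "AMZN") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

autoplot(amazon_stock, Close) +
  labs(title = "Amazon Daily Closing Price", y = "Close (USD)")

amazon_stock |> ACF(Close) |> autoplot() + labs(title = "ACF: Amazon Close")
amazon_stock |> PACF(Close) |> autoplot() + labs(title = "PACF: Amazon Close")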
The ACF values remain high across many lags and decrease very slowly instead of quickly dropping, meaning that past values have a strong influence on future values.
The PACF has a large spike at lag 1 and then quickly falls near zero. This is typical of a random walk, meaning each observation is highly correlated with the previous one, but not beyond that.
# DIFFERENCING
amazon_stock <- amazon_stock |>
  mutate(Close_diff = difference(Close)) |>  # First-order differencing
  drop_na()                                  # Remove NA from differencing

# AFTER DIFFERENCING
ACF_amazon_diff <- amazon_stock |>
  ACF(Close_diff) |>
  autoplot() +
  labs(title = "ACF Plot After Differencing")

PACF_amazon_diff <- amazon_stock |>
  PACF(Close_diff) |>
  autoplot() +
  labs(title = "PACF Plot After Differencing")

ACF_amazon_diff
PACF_amazon_diff
The ACF now drops to near zero from the beginning, indicating that the trend has been successfully removed; some small spikes occur at certain lags, but these are within expected randomness. Also, the PACF has smaller spikes and no clear pattern beyond lag 1.
9.3
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
a) Turkish GDP from global_economy.
First load the data and plot it
turkey_gdp <- global_economy |>
  filter(Country == "Turkey") |>
  select(Year, GDP) |>
  as_tsibble(index = Year)

autoplot(turkey_gdp, GDP) +
  labs(title = "Turkish GDP (Raw Data)", x = "Year", y = "GDP") +
  theme_minimal()
The plot of raw data for Turkey shows a strong upward trend, which confirms non-stationarity. Also, the variance appears to increase over time, suggesting that a transformation is necessary before differencing.
So the next step is finding lambda and applying a Box-Cox transformation:
lambda <- turkey_gdp |>
  features(GDP, features = guerrero)  # Guerrero method finds best lambda
lambda
# A tibble: 1 × 1
lambda_guerrero
<dbl>
1 0.157
# Applying Box-Cox transformation with optimal lambda
turkey_gdp <- turkey_gdp |>
  mutate(GDP_transformed = box_cox(GDP, lambda$lambda_guerrero))

# Plot the transformed series
autoplot(turkey_gdp, GDP_transformed) +
  labs(title = "Box-Cox Transformed Turkish GDP", x = "Year", y = "Transformed GDP") +
  theme_minimal()
We find that the optimal lambda is 0.157. Being close to zero, this is near a logarithmic transformation. After applying the Box-Cox transformation the series has a more stable variance, but it still shows an upward trend and therefore requires differencing.
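A sketch of the code behind the ACF/PACF plots described next (using the GDP_transformed column created above):

turkey_gdp |> ACF(GDP_transformed) |> autoplot() + labs(title = "ACF: Transformed Turkish GDP")
turkey_gdp |> PACF(GDP_transformed) |> autoplot() + labs(title = "PACF: Transformed Turkish GDP")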
The ACF shows a slow decay, which indicates a non-stationary series. And the PACF has a strong spike at lag 1 and cuts off after that.
But let’s say that we couldn’t easily confirm visually that the data is non-stationary and needs differencing. Applying a KPSS test is a quick way to get a quantitative confirmation or rejection of stationarity.
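A sketch of that test, using the unit-root features from feasts (the null hypothesis of the KPSS test is stationarity):

turkey_gdp |> features(GDP_transformed, unitroot_kpss)    # H0: series is stationary
turkey_gdp |> features(GDP_transformed, unitroot_ndiffs)  # suggested number of differences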
Finally, the ACF plot after differencing shows that autocorrelation quickly drops off, confirming stationarity. And the new PACF plot has some small spikes, but no strong patterns.
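A sketch of the differencing step behind those plots:

turkey_gdp <- turkey_gdp |>
  mutate(GDP_diff = difference(GDP_transformed)) |>  # first difference
  drop_na()

turkey_gdp |> ACF(GDP_diff) |> autoplot() + labs(title = "ACF: Differenced Turkish GDP")
turkey_gdp |> PACF(GDP_diff) |> autoplot() + labs(title = "PACF: Differenced Turkish GDP")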
b) Accommodation takings in the state of Tasmania from aus_accommodation.
Load and plot the data
# Load and prepare the dataset
tas_accommodation <- aus_accommodation |>
  filter(State == "Tasmania") |>
  select(Date, Takings) |>
  as_tsibble(index = Date)  # Using "Date" instead of "Month"

# ---------------- PLOT RAW DATA ---------------- #
autoplot(tas_accommodation, Takings) +
  labs(title = "Tasmania Accommodation Takings (Raw Data)", x = "Year", y = "Takings") +
  theme_minimal()
The initial plot shows both a trend and strong seasonality. So we look for optimal lambda next.
lambda_tas <- tas_accommodation |>
  features(Takings, features = guerrero)
lambda_tas
# A tibble: 1 × 1
lambda_guerrero
<dbl>
1 0.00182
And apply a Box-Cox transformation according to this lambda (close to 0, so effectively logarithmic):
tas_accommodation <- tas_accommodation |>
  mutate(Takings_transformed = box_cox(Takings, lambda_tas$lambda_guerrero))

autoplot(tas_accommodation, Takings_transformed) +
  labs(title = "Box-Cox Transformed Tasmania Accommodation Takings", x = "Year", y = "Transformed Takings") +
  theme_minimal()
The transformed data still shows an upward trend, meaning differencing is still needed. We can confirm this with ACF and PACF plots.
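A sketch of those plots:

tas_accommodation |> ACF(Takings_transformed) |> autoplot() + labs(title = "ACF: Transformed Takings")
tas_accommodation |> PACF(Takings_transformed) |> autoplot() + labs(title = "PACF: Transformed Takings")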
The ACF plot shows slow decay across many lags, confirming the presence of a trend, and repeating spikes every 4 lags (one year of quarterly data) suggest seasonality.
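The first-difference step that the next paragraph and the seasonal differencing below rely on is not shown; a sketch (the column name Takings_diff matches the later code):

tas_accommodation <- tas_accommodation |>
  mutate(Takings_diff = difference(Takings_transformed)) |>  # first difference to remove the trend
  drop_na()

tas_accommodation |> ACF(Takings_diff) |> autoplot() + labs(title = "ACF: First Differenced Takings")
tas_accommodation |> PACF(Takings_diff) |> autoplot() + labs(title = "PACF: First Differenced Takings")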
After first differencing, the ACF still shows strong seasonal spikes (every 4 quarters) and the PACF still has a strong seasonal lag-4 spike. So we apply seasonal differencing to remove the seasonal pattern.
tas_accommodation <- tas_accommodation |>
  mutate(Takings_seasonal_diff = difference(Takings_diff, lag = 4)) |>  # Seasonal differencing
  drop_na()                                                             # Remove NA from differencing

autoplot(tas_accommodation, Takings_seasonal_diff) +
  labs(title = "Seasonally Differenced Tasmania Accommodation Takings", x = "Year", y = "Seasonally Differenced Takings") +
  theme_minimal()
The resulting plot shows no evident trend, and the fluctuations appear roughly constant around a stable mean.
The ACF no longer exhibits slow decay, which suggests the trend has been removed, and the PACF no longer has a strong lag-1 spike, confirming that the first differencing worked.
No further differencing is needed at this point.
c) Monthly sales from souvenirs.
Load and plot the data
souvenirs_ts <- souvenirs |>
  mutate(Month = yearmonth(Month)) |>  # Ensure Month is in year-month format
  as_tsibble(index = Month)            # Convert to time series tibble

autoplot(souvenirs_ts, Sales) +
  labs(title = "Souvenir Monthly Sales (Raw Data)", x = "Year", y = "Sales") +
  theme_minimal()
The initial plot shows a strong upward trend in sales over time, with recurring peaks likely around the same months each year. The variance also grows with the level, indicating heteroskedasticity. We can find the optimal lambda and apply a Box-Cox transformation.
lambda_souvenirs <- souvenirs_ts |>
  features(Sales, features = guerrero)
lambda_souvenirs
# A tibble: 1 × 1
lambda_guerrero
<dbl>
1 0.00212
souvenirs_ts <- souvenirs_ts |>
  mutate(Sales_transformed = box_cox(Sales, lambda_souvenirs$lambda_guerrero))

autoplot(souvenirs_ts, Sales_transformed) +
  labs(title = "Box-Cox Transformed Souvenir Sales", x = "Year", y = "Transformed Sales") +
  theme_minimal()
The transformation makes fluctuations more stable but a trend is still present, meaning we still need differencing.
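A sketch of the pre-differencing ACF/PACF plots described next:

souvenirs_ts |> ACF(Sales_transformed) |> autoplot() + labs(title = "ACF: Transformed Sales")
souvenirs_ts |> PACF(Sales_transformed) |> autoplot() + labs(title = "PACF: Transformed Sales")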
The ACF plot before differencing shows slow decay and strong positive correlations at high lags. Also, the PACF has a significant spike at lag 1 and seasonal spikes at lag 12.
souvenirs_ts <- souvenirs_ts |>
  mutate(Sales_diff = difference(Sales_transformed)) |>  # First differencing
  drop_na()                                              # Remove NA values after differencing

# Plot first differenced series
autoplot(souvenirs_ts, Sales_diff) +
  labs(title = "First Differenced Souvenir Sales", x = "Year", y = "Differenced Sales") +
  theme_minimal()
The first-differenced plot removes the trend, but a seasonal pattern is still visible, indicating that seasonal differencing is needed.
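A sketch of that step (lag 12 for monthly data):

souvenirs_ts <- souvenirs_ts |>
  mutate(Sales_seasonal_diff = difference(Sales_diff, lag = 12)) |>  # seasonal differencing
  drop_na()

souvenirs_ts |> ACF(Sales_seasonal_diff) |> autoplot() + labs(title = "ACF: Seasonally Differenced Sales")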
And the ACF now decays quickly, meaning stationarity has been achieved.
9.5
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Let’s load our data yet again.
# seed
set.seed(86863)

# Use same random time series from aus_retail
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Convert to tsibble
myseries <- myseries %>%
  as_tsibble(index = Month)

glimpse(myseries)
Rows: 441
Columns: 5
Key: State, Industry [1]
$ State <chr> "New South Wales", "New South Wales", "New South Wales", "…
$ Industry <chr> "Takeaway food services", "Takeaway food services", "Takea…
$ `Series ID` <chr> "A3349792X", "A3349792X", "A3349792X", "A3349792X", "A3349…
$ Month <mth> 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep…
$ Turnover <dbl> 85.4, 84.8, 80.7, 82.4, 80.7, 82.1, 87.3, 87.4, 97.2, 93.0…
autoplot(myseries, Turnover) +
  labs(title = "Monthly Retail Turnover",
       x = "Year", y = "Turnover") +
  theme_minimal()
The initial plot of monthly retail shows a clear upward trend over time, indicating non-stationarity. There is also some visible seasonality in the data, meaning that differencing might be required both for trend and seasonal components.
lambda <- myseries %>%
  features(Turnover, features = guerrero)
lambda
# A tibble: 1 × 3
State Industry lambda_guerrero
<chr> <chr> <dbl>
1 New South Wales Takeaway food services 0.00214
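The transformation step that creates Turnover_transformed (used by the differencing code below) is not shown; a sketch, together with the ACF/PACF plots described next:

# Apply the Box-Cox transformation with the estimated lambda
myseries <- myseries %>%
  mutate(Turnover_transformed = box_cox(Turnover, lambda$lambda_guerrero))

myseries %>% ACF(Turnover_transformed) %>% autoplot() + labs(title = "ACF: Transformed Turnover")
myseries %>% PACF(Turnover_transformed) %>% autoplot() + labs(title = "PACF: Transformed Turnover")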
The ACF plot shows strong positive autocorrelations at all lags, confirming that the series is still non-stationary. Also, the PACF plot has a large significant spike at lag 1, indicating a strong trend component.
# Apply first differencing
myseries <- myseries %>%
  mutate(Turnover_diff = difference(Turnover_transformed)) %>%
  drop_na()

# Plot first differenced series
autoplot(myseries, Turnover_diff) +
  labs(title = "First Differenced Retail Turnover",
       x = "Year", y = "Differenced Turnover") +
  theme_minimal()
# ACF & PACF after first differencing
acf_plot_diff1 <- myseries %>%
  ACF(Turnover_diff) %>%
  autoplot() +
  labs(title = "ACF: First Differenced Retail Turnover")

pacf_plot_diff1 <- myseries %>%
  PACF(Turnover_diff) %>%
  autoplot() +
  labs(title = "PACF: First Differenced Retail Turnover")

acf_plot_diff1
pacf_plot_diff1
After differencing, the time series plot appears more stable, with fluctuations around a constant mean. However, seasonal patterns are still visible: the ACF plot now shows a significant spike at lag 12, confirming the presence of seasonality, and the PACF plot also has a spike at lag 12. This tells us we need an additional seasonal differencing step.
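A sketch of that seasonal-differencing step (lag 12 for monthly data; it creates the Turnover_seasonal_diff column tested below):

myseries <- myseries %>%
  mutate(Turnover_seasonal_diff = difference(Turnover_diff, lag = 12)) %>%
  drop_na()

autoplot(myseries, Turnover_seasonal_diff) +
  labs(title = "Seasonally Differenced Retail Turnover",
       x = "Year", y = "Value") +
  theme_minimal()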
The final time series plot appears stationary, with no obvious trends. The ACF plot has only minor autocorrelation at some lags, and the PACF plot also shows no strong signs of additional differencing required.
# KPSS test to check stationarity
kpss_test <- myseries %>%
  features(Turnover_seasonal_diff, unitroot_kpss)
kpss_test
# A tibble: 1 × 4
State Industry kpss_stat kpss_pvalue
<chr> <chr> <dbl> <dbl>
1 New South Wales Takeaway food services 0.0164 0.1
Since the p-value > 0.05, we fail to reject the null hypothesis of stationarity, meaning the series is now stationary and further differencing is not needed.
9.6
Simulate and plot some data from simple ARIMA models.
a) Use the following R code to generate data from an AR(1) model with \(ϕ_1 = 0.6\) and \(σ^2 = 1\). The process starts with \(y_1 = 0\).
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
b) Produce a time plot for the series. How does the plot change as you change \(ϕ_1\)?
Let’s wrap the provided code in a function and loop over several manually chosen values of phi, keeping the initial value of 0.6 among them.
n <- 100  # number of observations
phi_values <- c(-0.9, -0.3, 0, 0.3, 0.6, 0.9)  # different values of phi

# Store data of each phi
generate_ar1 <- function(phi, n) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) {
    y[i] <- phi * y[i-1] + e[i]
  }
  tibble(idx = seq_len(n), y = y, phi = paste("Phi =", phi))
}

# Combining the datasets
sim_data <- bind_rows(lapply(phi_values, generate_ar1, n = n))

# Converting to a tsibble
sim_data <- as_tsibble(sim_data, index = idx, key = phi)

ggplot(sim_data, aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~phi, ncol = 2) +  # Arrange in a 2-column grid
  labs(title = "AR(1) Process for Different Phi Values",
       x = "Time", y = "Value") +
  theme_minimal()
Depending on the value of phi, we can see that high positive values produce smoother, more persistent series, while large negative values create oscillatory behavior. When phi is close to 0 there does not seem to be any correlation; the series resembles white noise.
c) Write your own code to generate data from an MA(1) model with \(θ_1=0.6\) and \(σ^2=1\)
simulate_MA1 <- function(theta, sigma2 = 1, n = 100) {
  e <- rnorm(n, mean = 0, sd = sqrt(sigma2))  # White noise
  y <- numeric(n)
  for (i in 2:n) {
    y[i] <- e[i] + theta * e[i-1]  # MA(1) process
  }
  return(tibble(idx = seq_len(n), y = y, theta = factor(theta)))
}

# Define different theta values to explore
theta_values <- c(-0.84, -0.2, 0, 0.35, 0.6, 0.99)

# Generate data for each theta and combine into a single tibble
simulated_data <- bind_rows(lapply(theta_values, simulate_MA1))
d) Produce a time plot for the series. How does the plot change as you change \(θ_1\)?
ggplot(simulated_data, aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~theta, ncol = 2) +
  labs(title = "MA(1) Process for Different Theta Values",
       x = "Time", y = "Value") +
  theme_minimal()
Similar to the previous plot: with theta near 0 the data look like white noise. As theta moves toward -1, adjacent values tend to move in opposite directions, making the series choppier, while as theta moves toward +1, consecutive values share a noise term and the series looks smoother.
e) Generate data from an ARMA(1,1) model with \(ϕ_1 = 0.6\), \(θ_1 = 0.6\) and \(σ^2 = 1\).
n <- 100
phi <- 0.6    # AR(1) coefficient
theta <- 0.6  # MA(1) coefficient
sigma <- 1    # Standard deviation of white noise

# White noise
e <- rnorm(n, mean = 0, sd = sigma)

# Series
y <- numeric(n)
y[1] <- e[1]  # Start process with first noise term

# Simulate ARMA(1,1) process
for (i in 2:n) {
  y[i] <- phi * y[i-1] + e[i] + theta * e[i-1]
}

# Convert to tsibble
sim_data <- tsibble(idx = seq_len(n), y = y, index = idx)
f) Generate data from an AR(2) model with \(ϕ_1 = −0.8\), \(ϕ_2 = 0.3\) and \(σ^2 = 1\). (Note that these parameters will give a non-stationary series.)
n <- 100      # Length of time series
phi1 <- -0.8  # AR(1) coefficient
phi2 <- 0.3   # AR(2) coefficient
sigma <- 1    # Standard deviation of white noise

# White noise
e <- rnorm(n, mean = 0, sd = sigma)

# y series (named y_ar2 throughout so it does not overwrite the ARMA(1,1) series above)
y_ar2 <- numeric(n)
y_ar2[1] <- e[1]  # Start process with first noise term
y_ar2[2] <- e[2]  # Second value is also white noise

# Simulate AR(2)
for (i in 3:n) {
  y_ar2[i] <- phi1 * y_ar2[i-1] + phi2 * y_ar2[i-2] + e[i]
}

sim_data_ar2 <- tsibble(idx = seq_len(n), y = y_ar2, index = idx)
g) Graph the latter two series and compare them
# ARMA(1,1)
ggplot(sim_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "Simulated ARMA(1,1) Process (ϕ1=0.6, θ1=0.6, σ²=1)",
       x = "Time",
       y = "Value") +
  theme_minimal()
# AR(2)
ggplot(sim_data_ar2, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "Simulated AR(2) Process (ϕ1=-0.8, ϕ2=0.3, σ²=1)",
       x = "Time",
       y = "Value") +
  theme_minimal()
The ARMA(1,1) plot shows a relatively stable time series, fluctuating around zero and remaining within a stable range.
The AR(2) plot shows the amplitude of the oscillations increasing over time: the series is non-stationary and, in this case, diverging.
9.7
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a) Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
We can start by plotting the data to explore
# Plot the raw data
aus_airpassengers %>%
  autoplot(Passengers) +
  labs(title = "Australian Air Passengers (1970-2011)",
       y = "Passengers (millions)")
We can easily observe an upward trend. So let’s fit a model.
fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))
report(fit)
The selected model is ARIMA(0,2,1): p = 0 means no autoregressive terms; d = 2 means second-order differencing was required before the series became stationary; and q = 1 means a first-order moving average (MA) component was selected. The MA coefficient of -0.8963 indicates a strong influence of past forecast errors.
# Checking residuals
fit %>% gg_tsresiduals()
Most residuals are centered around zero, but there are some fluctuations. The ACF of residuals does not show significant autocorrelation. The histogram of residuals is almost normal but with some outliers.
Now we forecast the next 10 periods and plot the forecasts
fc <- fit %>% forecast(h = 10)

fc %>%
  autoplot(aus_airpassengers) +
  labs(title = "10-Period Forecasts for Australian Air Passengers",
       y = "Passengers (millions)")
b) Write the model in terms of the backshift operator.
We know that the backshift operator can be expressed as
\(B^ky_t=y_{t−k}\)
And differencing is represented by \((1-B) y_t= y_t-y_{t-1}\)
Since the model is ARIMA(0,2,1) and d = 2, we apply differencing twice by squaring (1-B).
So we get:
\[
(1-B)^2y_t = (1-2B+B^2)y_t
\]
Multiplying \(y_t\) by 2B shifts it back one time period and scales it by 2; multiplying by \(B^2\) shifts it back two time periods.
Because the model has a moving average component of order 1, defined as
\[
y_t = e_t + \theta_1e_{t-1}
\]
Everything results in:
\[
y_t - 2y_{t-1} + y_{t-2} = e_t + \theta_1 e_{t-1}
\]
If we substitute \(\theta_1\) with -0.8963 we get
\[
y_t - 2y_{t-1} + y_{t-2} = e_t - 0.8963 e_{t-1}
\]
Or, substituting earlier in the equation, we can write
\[
(1 - 2B + B^2)y_t = (1 - 0.8963B)e_t
\]
c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
data_aus_p <- aus_airpassengers

# Fix pdq(0,1,0) and include a constant (~ 1) to get ARIMA(0,1,0) with drift
fit <- data_aus_p %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
report(fit)
fc <- fit %>% forecast(h = 10)

fc %>%
  autoplot(data_aus_p) +
  labs(title = "Forecasts of ARIMA(0,1,0) Model with Drift",
       y = "Passengers (millions)", x = "Year") +
  theme_minimal()
The ARIMA(0,1,0) model with drift assumes a linear trend, so its forecast looks like a straight line, while the ARIMA(0,2,1) forecast is slightly curvier. At the same time, ARIMA(0,2,1) has wider prediction intervals.
d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit_arima212 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

fc_arima212 <- fit_arima212 %>% forecast(h = 10)

fc_arima212 %>%
  autoplot(aus_airpassengers) +
  labs(title = "Forecasts from ARIMA(2,1,2) Model with Drift",
       y = "Passengers (millions)", x = "Year") +
  theme_minimal()
This plot has slightly narrower prediction intervals than ARIMA(0,2,1), perhaps indicating better model stability. Since ARIMA(0,2,1) assumes an accelerating trend, it leads to a more aggressive forecast, while ARIMA(2,1,2) with drift seems more balanced and captures short-term fluctuations better.
# ARIMA(2,1,2) without drift
fit_no_drift <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
[1] non-stationary AR part from CSS
fc_no_drift <- fit_no_drift |> forecast(h = 10)

fc_no_drift |>
  autoplot(aus_airpassengers) +
  labs(title = "Forecasts from ARIMA(2,1,2) Without Drift",
       y = "Passengers (millions)", x = "Year")
Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
-Inf
Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
-Inf
Warning: Removed 10 rows containing missing values or values outside the scale range
(`geom_line()`).
When setting the constant to 0 we get the warning “non-stationary AR part from CSS”, which suggests that the ARIMA(2,1,2) model without drift is numerically unstable or non-stationary. The issue arises because the estimated AR coefficients violate the stationarity condition, so the model cannot be fitted and no forecasts are produced (hence the follow-up warnings about missing values).
e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
# ARIMA(0,2,1) with a constant
fit_021 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(0,2,1) + 1))
Warning: Model specification induces a quadratic or higher order polynomial trend.
This is generally discouraged, consider removing the constant or reducing the number of differences.
fc_021 <- fit_021 |> forecast(h = 10)

fc_021 |>
  autoplot(aus_airpassengers) +
  labs(title = "Forecasts from ARIMA(0,2,1) Model with a Constant",
       y = "Passengers (millions)", x = "Year")
When a constant is included in a model with two differences (d = 2), the forecasts follow a quadratic growth pattern. Since most real-life situations don’t follow such a trend indefinitely, the forecasts become increasingly unrealistic as time goes on.
9.8
For the United States GDP series (from global_economy):
a) if necessary, find a suitable Box-Cox transformation for the data;
Load and plot the original data
us_gdp <- global_economy |>
  filter(Country == "United States") |>
  select(Year, GDP)

autoplot(us_gdp, GDP) +
  labs(title = "United States GDP", y = "GDP (USD)", x = "Year")
The initial plot shows strong exponential growth. We can find optimal lambda next.
lambda_bc <- us_gdp |>
  features(GDP, features = guerrero)
lambda_bc
# A tibble: 1 × 1
lambda_guerrero
<dbl>
1 0.282
This lambda indicates that the transformation will be something between a log transformation and a square root transformation.
us_gdp <- us_gdp |>
  mutate(GDP_transformed = box_cox(GDP, lambda_bc$lambda_guerrero))

autoplot(us_gdp, GDP_transformed) +
  labs(title = "Box-Cox Transformed US GDP", y = "Transformed GDP", x = "Year")
The transformed series is less exponential now, with more stable variance.
b) fit a suitable ARIMA model to the transformed data using ARIMA();
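The fitting step itself is not shown; a minimal sketch using automatic order selection (the name fit_arima110 matches the forecast code further down):

fit_arima110 <- us_gdp |>
  model(ARIMA(GDP_transformed))
report(fit_arima110)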
The chosen model was ARIMA(1,1,0) with drift: a single difference was required, no moving average term was needed, and the drift (constant) of 118.1822 reflects a steady upward trend in the transformed series.
c) try some other plausible models by experimenting with the orders chosen;
# ARIMA(2,1,0) with constant
fit_arima_210 <- us_gdp |>
  model(ARIMA(GDP_transformed ~ pdq(2,1,0) + 1))
report(fit_arima_210)
Other than ARIMA(1,1,0) with drift, which ARIMA() chose automatically, the next best option seemed to be ARIMA(2,1,0) with drift, as it had only a slightly higher AIC. However, it adds complexity without significantly improving the fit.
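As a sketch, a few candidate orders can also be compared side by side with glance(); the extra orders here are illustrative choices, not models from the original analysis:

us_gdp |>
  model(
    arima110 = ARIMA(GDP_transformed ~ pdq(1,1,0) + 1),
    arima210 = ARIMA(GDP_transformed ~ pdq(2,1,0) + 1),
    arima011 = ARIMA(GDP_transformed ~ pdq(0,1,1) + 1)
  ) |>
  glance() |>
  select(.model, AIC, AICc, BIC)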
d) choose what you think is the best model and check the residual diagnostics;
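A sketch of the diagnostics those observations are based on (the Ljung-Box test is an extra check; dof = 1 accounts for the single AR coefficient):

fit_arima110 |> gg_tsresiduals()

# Ljung-Box test on the innovation residuals
augment(fit_arima110) |>
  features(.innov, ljung_box, lag = 10, dof = 1)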
The residuals appear to fluctuate over time, but there are some large spikes, suggesting that the model does not completely capture all variations in the data. Also, there are some significant autocorrelations in the residuals at certain lags, indicating that the model might not fully capture the structure in the data. Finally, the distribution of residuals appears slightly skewed, suggesting potential non-normality.
e) produce forecasts of your fitted model. Do the forecasts look reasonable?
# Forecast for the next 10 years
forecast_arima110 <- fit_arima110 |> forecast(h = 10)

# Plot forecasts
forecast_arima110 |>
  autoplot(us_gdp) +
  labs(title = "Forecasts from ARIMA(1,1,0) Model with Drift",
       y = "Transformed GDP", x = "Year")
The forecast follows a linear increasing trend. Since the model includes drift, the future values are expected to continue increasing.
The forecast appears reasonable in terms of the general increasing pattern of GDP, but since the residuals are not completely white noise, it suggests the model might not be the best fit.
This model assumes a constant trend in GDP growth, which might be a reasonable assumption for short-term forecasting. However, the residual diagnostics indicate potential autocorrelation issues, meaning the model does not fully capture the time series structure.
f) compare the results with what you would obtain using ETS() (with no transformation).
us_gdp <- global_economy |>
  filter(Country == "United States") |>
  select(Year, GDP)

# ETS model with automatic selection
fit_ets <- us_gdp |>
  model(ETS(GDP))

# Display model summary
report(fit_ets)
# Forecast for the next 10 periods
forecast_ets <- fit_ets |> forecast(h = 10)

# Plot the forecasts
forecast_ets |>
  autoplot(us_gdp) +
  labs(title = "ETS Model Forecast for U.S. GDP",
       y = "GDP (USD)", x = "Year")
The ETS model uses a multiplicative error structure with an additive trend, and its forecast prediction intervals are very wide, indicating that the model projects high uncertainty in the future. Note that the AIC values are not directly comparable between the ETS and ARIMA models here, since the ARIMA model was fitted to Box-Cox-transformed data; a fair comparison would use out-of-sample forecast accuracy instead.