DATA624 - Homework 6

Author

Anthony Josue Roman

Exercise 9.1

9.1 A

The three ACF plots differ because of the number of observations: 36, 360, and 1,000. With only 36 observations, the sample autocorrelations fluctuate more and show larger spikes purely from sampling variability. As the number of observations increases, the sample autocorrelations settle closer to zero at all lags.

Yes, all three plots are consistent with white noise. The theoretical autocorrelation of white noise is zero at every nonzero lag, so the small nonzero values in the plots reflect sampling variability rather than real structure.

9.1 B

The critical values are at different distances from zero because they depend on the sample size. The bounds are approximately \(\pm 1.96/\sqrt{n}\), since the standard error of a sample autocorrelation under white noise is approximately \(1/\sqrt{n}\). For small \(n\), the standard error is larger, so the bounds are wider. For large \(n\), the standard error is smaller, so the bounds are narrower.

The autocorrelation values are different for each figure because each figure is based on a different random sample of white noise. Even though the three series are white noise in theory, the autocorrelation values will be different for each series due to random chance. For larger values of the sample size \(n\), the autocorrelation values will be closer to zero.
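
As a small numeric check of the bounds discussed above, the sketch below computes approximate 95% critical values, assuming the usual \(\pm 1.96/\sqrt{n}\) rule, for the three sample sizes. The function name `acf_bounds` is illustrative and not part of the original analysis.

```python
import numpy as np

def acf_bounds(n, z=1.96):
    """Approximate 95% white-noise bounds for a sample ACF based on n observations."""
    return z / np.sqrt(n)

for n in (36, 360, 1000):
    print(f"n = {n:>4}: +/- {acf_bounds(n):.3f}")
```

The bounds come out at roughly 0.327, 0.103, and 0.062, which matches the narrowing seen across the three plots.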

Exercise 9.2

Code
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

df = pd.read_csv("gafa_stock.csv")

df["ds"] = pd.to_datetime(df["ds"])

amazon = df[df["unique_id"] == "AMZN_Close"].copy()

amazon = amazon.sort_values("ds")

plt.figure(figsize=(10, 4))
plt.plot(amazon["ds"], amazon["y"])
plt.title("Amazon Daily Closing Prices")
plt.xlabel("Date")
plt.ylabel("Price")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

plot_acf(amazon["y"], lags=50)
plt.title("ACF - Amazon Stock")
plt.show()

plot_pacf(amazon["y"], lags=50)
plt.title("PACF - Amazon Stock")
plt.show()

The time plot of Amazon’s closing prices shows a strong upward trend with fluctuations around it. Because the mean changes over time, the series is not stationary.

The ACF plot also suggests that the process is not stationary because the autocorrelations are very large, positive, and slow to decay. For a stationary process, the autocorrelations drop toward zero relatively quickly.

The PACF plot shows a very large spike at lag 1, with all other values very small. This pattern is typical of a series that behaves like a random walk, which again points to non-stationarity.

The three plots suggest that Amazon’s closing prices are not stationary and hence should be differenced before fitting an ARIMA model.
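
The effect of differencing can be illustrated on simulated data. The sketch below uses a random walk with drift as a stand-in for a price series (it is not the Amazon data), and the helper `lag1_autocorr` is a hypothetical name introduced here. It compares the lag-1 sample autocorrelation before and after taking a first difference.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

rng = np.random.default_rng(624)                 # arbitrary seed
steps = rng.normal(loc=0.1, scale=1.0, size=1000)
walk = np.cumsum(steps)                          # random walk with drift
diffed = np.diff(walk)                           # first difference (the steps, minus the first)

r1_levels = lag1_autocorr(walk)
r1_diff = lag1_autocorr(diffed)
print(f"lag-1 ACF, levels:      {r1_levels:.3f}")
print(f"lag-1 ACF, differences: {r1_diff:.3f}")
```

The levels should show a lag-1 autocorrelation close to 1, while the differenced series should be close to 0, which is the pattern expected when one difference is enough.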

Exercise 9.3

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from scipy.stats import boxcox_normmax
from scipy.special import boxcox1p
from statsmodels.graphics.tsaplots import plot_acf

9.3 A

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

global_economy = pd.read_csv("global_economy.csv")

turkey = (
    global_economy[global_economy["unique_id"] == "Turkey"]
    .copy()
    .sort_values("ds")
)

turkey["ds"] = pd.to_datetime(turkey["ds"], format="%Y")

turkey = turkey[["ds", "GDP"]].dropna()

turkey["y"] = np.log(turkey["GDP"])

turkey["diff1"] = turkey["y"].diff()

fig, axes = plt.subplots(3, 1, figsize=(10, 8))

axes[0].plot(turkey["ds"], turkey["GDP"])
axes[0].set_title("Turkey GDP")
axes[0].set_xlabel("")
axes[0].tick_params(axis="x", labelbottom=False)

axes[1].plot(turkey["ds"], turkey["y"])
axes[1].set_title("Log Transformed GDP")
axes[1].set_xlabel("")
axes[1].tick_params(axis="x", labelbottom=False)

axes[2].plot(turkey["ds"], turkey["diff1"])
axes[2].set_title("First Difference")
axes[2].set_xlabel("Year")

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

plot_acf(turkey["y"].dropna(), lags=20, ax=axes[0])
axes[0].set_title("ACF - Log GDP")

plot_acf(turkey["diff1"].dropna(), lags=20, ax=axes[1])
axes[1].set_title("ACF - Differenced GDP")

plt.tight_layout()
plt.show()

Turkish GDP shows a strong trend, and its variation increases with the level of the series, so a variance-stabilizing transformation is required. A log transformation is used here to make the variation roughly constant.

The log-transformed series still has a trend. Taking one regular difference is enough to remove it and leaves a series that appears stationary. The ACF of the differenced series supports this, as the autocorrelations are small and lie within the confidence bounds.

Therefore, in the case of the Turkish GDP, a log transformation is required, followed by one regular difference. Thus, \(d=1\).
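
One way to see why a log transformation followed by one difference works for a series that grows roughly exponentially is that the differenced logs are approximately growth rates. The sketch below uses a made-up series growing at exactly 5% per period, not the Turkish data.

```python
import numpy as np

# Hypothetical series growing at exactly 5% per period (not the Turkish data)
gdp = 100.0 * 1.05 ** np.arange(10)

raw_diff = np.diff(gdp)            # changes grow with the level of the series
log_diff = np.diff(np.log(gdp))    # constant: log(1.05) in every period

print(raw_diff.round(2))
print(log_diff.round(4))
```

The raw differences keep increasing, whereas the log differences are constant, so the log-differenced series has a stable mean.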

9.3 B

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

accom = pd.read_csv("aus_accomodation.csv")

tas = (
    accom[accom["unique_id"] == "Tasmania"]
    .copy()
    .sort_values("ds")
)

tas["ds"] = pd.to_datetime(tas["ds"])

tas = tas[["ds", "Takings"]].dropna()

tas["y"] = np.log(tas["Takings"])

tas["diff_seasonal"] = tas["y"].diff(4)

tas["diff_both"] = tas["diff_seasonal"].diff()

fig, axes = plt.subplots(4, 1, figsize=(10, 10))

axes[0].plot(tas["ds"], tas["Takings"])
axes[0].set_title("Tasmania Accommodation Takings")
axes[0].tick_params(axis="x", labelbottom=False)

axes[1].plot(tas["ds"], tas["y"])
axes[1].set_title("Log Transformed")
axes[1].tick_params(axis="x", labelbottom=False)

axes[2].plot(tas["ds"], tas["diff_seasonal"])
axes[2].set_title("Seasonal Difference (lag 4)")
axes[2].tick_params(axis="x", labelbottom=False)

axes[3].plot(tas["ds"], tas["diff_both"])
axes[3].set_title("Seasonal + First Difference")
axes[3].set_xlabel("Date")

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

plot_acf(tas["y"].dropna(), lags=20, ax=axes[0])
axes[0].set_title("ACF - Log Series")

plot_acf(tas["diff_seasonal"].dropna(), lags=20, ax=axes[1])
axes[1].set_title("ACF - Seasonal Diff")

plot_acf(tas["diff_both"].dropna(), lags=20, ax=axes[2])
axes[2].set_title("ACF - Seasonal + First Diff")

plt.tight_layout()
plt.show()

The series of accommodation takings in Tasmania appears to have an increasing trend and strong seasonal effects, suggesting that the data are not stationary. Also, the variability of the data appears to increase with the level of the data, suggesting that a log transformation is appropriate.

After the log transformation, a seasonal difference at lag 4 removes most of the seasonal effect, but some trend remains. An additional first difference removes the remaining trend, leaving a series that is approximately stationary. The ACF of the doubly differenced series supports this conclusion, with all the autocorrelations within the confidence bounds.

The appropriate transformation appears to be a log transformation with one seasonal difference and one regular difference, suggesting that \(D = 1\) and \(d = 1\).
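
The mechanics of the two differencing steps can be checked on a toy quarterly series. The sketch below assumes a hypothetical series made of a linear trend plus a fixed seasonal pattern. It is not the Tasmanian data, where the seasonally differenced series still drifts and the extra first difference does real work.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly series: linear trend plus a fixed seasonal pattern
t = np.arange(24)
season = np.tile([3.0, -1.0, -4.0, 2.0], 6)
y = pd.Series(10.0 + 0.5 * t + season)

d_seasonal = y.diff(4)          # y_t - y_{t-4}: removes the repeating pattern
d_both = d_seasonal.diff()      # first difference of the seasonally differenced series

print(d_seasonal.dropna().unique())   # constant 4 * 0.5 = 2.0
print(d_both.dropna().unique())       # 0.0
print(d_seasonal.isna().sum(), d_both.isna().sum())  # 4 and 5 leading NaNs
```

Each differencing step costs observations at the start of the series (4 for the seasonal difference, one more for the first difference), which is why the ACF calls use `.dropna()`.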

9.3 C

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

souvenirs = pd.read_csv("souvenirs.csv")

souvenirs["Month"] = pd.to_datetime(souvenirs["Month"])
souvenirs = souvenirs.sort_values("Month")
souvenirs = souvenirs[["Month", "Sales"]].dropna()

souvenirs["y"] = np.log(souvenirs["Sales"])

souvenirs["diff_seasonal"] = souvenirs["y"].diff(12)

souvenirs["diff_both"] = souvenirs["diff_seasonal"].diff()

fig, axes = plt.subplots(4, 1, figsize=(10, 10))

axes[0].plot(souvenirs["Month"], souvenirs["Sales"])
axes[0].set_title("Souvenirs Sales")
axes[0].tick_params(axis="x", labelbottom=False)

axes[1].plot(souvenirs["Month"], souvenirs["y"])
axes[1].set_title("Log Transformed")
axes[1].tick_params(axis="x", labelbottom=False)

axes[2].plot(souvenirs["Month"], souvenirs["diff_seasonal"])
axes[2].set_title("Seasonal Difference (lag 12)")
axes[2].tick_params(axis="x", labelbottom=False)

axes[3].plot(souvenirs["Month"], souvenirs["diff_both"])
axes[3].set_title("Seasonal + First Difference")
axes[3].set_xlabel("Date")

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

plot_acf(souvenirs["y"].dropna(), lags=24, ax=axes[0])
axes[0].set_title("ACF - Log Series")

plot_acf(souvenirs["diff_seasonal"].dropna(), lags=24, ax=axes[1])
axes[1].set_title("ACF - Seasonal Diff")

plot_acf(souvenirs["diff_both"].dropna(), lags=24, ax=axes[2])
axes[2].set_title("ACF - Seasonal + First Diff")

plt.tight_layout()
plt.show()

The souvenirs series has a strong trend and seasonality, indicating non-stationary data. The size of the seasonal variation is proportional to the level of the series, so a log transformation is suitable.

After the log transformation, a seasonal difference at lag 12 eliminates most of the seasonality, but the series still has a trend component. A first difference eliminates the trend, and the resulting series appears to be stationary. The ACF supports this, as most values lie within the confidence bounds after the two differences.

Thus, the suitable transformation is a log transformation with one seasonal difference and one regular difference, so \(D = 1\) and \(d = 1\).
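
The claim that a log transformation suits seasonal variation proportional to the level can be illustrated with a multiplicative toy series. In the sketch below, the level, the seasonal multipliers, and the helper `yearly_range` are all hypothetical. It compares the within-year range before and after taking logs.

```python
import numpy as np

# Hypothetical monthly series: exponential level times a fixed seasonal factor
months = np.arange(60)
level = 100.0 * 1.02 ** months
factor = np.tile(np.linspace(0.7, 1.6, 12), 5)   # illustrative seasonal multipliers
sales = level * factor

def yearly_range(x):
    """Max minus min within each block of 12 observations."""
    blocks = np.asarray(x).reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

raw_ranges = yearly_range(sales)
log_ranges = yearly_range(np.log(sales))
print(raw_ranges.round(1))    # grows year over year
print(log_ranges.round(4))    # the same in every year
```

On the raw scale the seasonal swing grows with the level. On the log scale it is the same every year, so seasonal differencing can remove it cleanly.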

Exercise 9.5

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

retail = pd.read_csv("aus_retail.csv")

np.random.seed(12345678)
series_id = np.random.choice(retail["Series ID"].unique())

myseries = (
    retail[retail["Series ID"] == series_id]
    .copy()
    .sort_values("Month")
)

myseries["Month"] = pd.to_datetime(myseries["Month"])

myseries = myseries[["Month", "Turnover"]].dropna()

plt.figure(figsize=(10, 4))
plt.plot(myseries["Month"], myseries["Turnover"])
plt.title(f"Retail Series: {series_id}")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

myseries["y"] = np.log(myseries["Turnover"])

myseries["diff_seasonal"] = myseries["y"].diff(12)

myseries["diff_both"] = myseries["diff_seasonal"].diff()

fig, axes = plt.subplots(4, 1, figsize=(10, 10))

axes[0].plot(myseries["Month"], myseries["Turnover"])
axes[0].set_title("Original Series")
axes[0].tick_params(axis="x", labelbottom=False)

axes[1].plot(myseries["Month"], myseries["y"])
axes[1].set_title("Log Transformed")
axes[1].tick_params(axis="x", labelbottom=False)

axes[2].plot(myseries["Month"], myseries["diff_seasonal"])
axes[2].set_title("Seasonal Difference (lag 12)")
axes[2].tick_params(axis="x", labelbottom=False)

axes[3].plot(myseries["Month"], myseries["diff_both"])
axes[3].set_title("Seasonal + First Difference")
axes[3].set_xlabel("Date")

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

plot_acf(myseries["y"].dropna(), lags=24, ax=axes[0])
axes[0].set_title("ACF - Log")

plot_acf(myseries["diff_seasonal"].dropna(), lags=24, ax=axes[1])
axes[1].set_title("ACF - Seasonal Diff")

plot_acf(myseries["diff_both"].dropna(), lags=24, ax=axes[2])
axes[2].set_title("ACF - Seasonal + First Diff")

plt.tight_layout()
plt.show()

The retail series has a strong trend and seasonal effects, which indicate that it is non-stationary. The variability of the series also increases with its level, so a log transformation is applied to stabilize the variance.

After the log transformation, a seasonal difference at lag 12 eliminates the seasonal effects, but the ACF still decays slowly, which indicates remaining non-stationarity. After one additional first difference, the trend is removed and the series appears to be stationary, since most ACF values lie within the confidence bounds.

Thus, the appropriate transformation is taking the logarithm, one seasonal difference, and one first difference, which implies \(D = 1\) and \(d = 1\).

Exercise 9.6

9.6 A

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(624)

y = np.zeros(100)
e = np.random.normal(size=100)

for i in range(1, 100):
    y[i] = 0.6 * y[i-1] + e[i]

plt.figure(figsize=(10, 4))
plt.plot(y)
plt.title("AR(1) Simulation (phi = 0.6)")
plt.show()

9.6 B

As the value of \(\phi_1\) increases, the series appears to become more persistent and smoother because the current value is heavily dependent on the previous value. If the value of \(\phi_1\) is close to 0, the series appears to behave more like white noise. When the value of \(\phi_1\) approaches 1, the series appears to become very persistent and can behave almost like a random walk.
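
The link between \(\phi_1\) and persistence can be checked numerically, since the lag-1 autocorrelation of a stationary AR(1) process equals \(\phi_1\). The sketch below reuses the simulation loop from part A with longer series. The function names and the seed are illustrative choices.

```python
import numpy as np

def simulate_ar1(phi, n, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal errors."""
    e = rng.normal(size=n)
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = phi * y[i - 1] + e[i]
    return y

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

rng = np.random.default_rng(624)   # arbitrary seed
r1 = {phi: lag1_autocorr(simulate_ar1(phi, 20_000, rng)) for phi in (0.1, 0.6, 0.95)}
for phi, value in r1.items():
    print(f"phi = {phi:.2f}: sample lag-1 autocorrelation = {value:.3f}")
```

With long simulated series the sample values should land close to 0.1, 0.6, and 0.95, moving from nearly white noise toward random-walk-like behavior.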

9.6 C

Code
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(624)

e = np.random.normal(size=100)
y = np.zeros(100)

for i in range(1, 100):
    y[i] = e[i] + 0.6 * e[i-1]

plt.figure(figsize=(10, 4))
plt.plot(y)
plt.title("MA(1) Simulation ($\\theta_1 = 0.6$)")
plt.xlabel("Time")
plt.ylabel("Value")
plt.tight_layout()
plt.show()

9.6 D

As \(\theta_1\) moves away from 0, the series shows more short-term dependence. If \(\theta_1\) is near 0, the series is essentially white noise with no structure. For larger \(\theta_1\), consecutive values are more strongly correlated, but the dependence is short-lived: an MA(1) process has nonzero autocorrelation only at lag 1.
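
For an MA(1) process \(y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}\), the theoretical lag-1 autocorrelation is \(\theta_1 / (1 + \theta_1^2)\), and all higher lags are zero. The sketch below evaluates this formula for a few values. The function name `ma1_lag1_autocorr` is introduced here for illustration.

```python
def ma1_lag1_autocorr(theta):
    """Theoretical lag-1 autocorrelation of y_t = e_t + theta * e_{t-1}."""
    return theta / (1.0 + theta ** 2)

for theta in (0.0, 0.3, 0.6, 1.0):
    print(f"theta = {theta:.1f}: rho_1 = {ma1_lag1_autocorr(theta):.3f}")
```

For \(\theta_1 = 0.6\) this gives about 0.441, and the value never exceeds 0.5, so even a large \(\theta_1\) produces only moderate one-step dependence.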

9.6 E

Code
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(624)

e = np.random.normal(size=100)
y = np.zeros(100)

for i in range(1, 100):
    y[i] = 0.6 * y[i-1] + e[i] + 0.6 * e[i-1]

plt.figure(figsize=(10, 4))
plt.plot(y)
plt.title("ARMA(1,1) Simulation ($\\phi_1=0.6$, $\\theta_1=0.6$)")
plt.xlabel("Time")
plt.ylabel("Value")
plt.tight_layout()
plt.show()

9.6 F

Code
np.random.seed(624)

e = np.random.normal(size=100)
y2 = np.zeros(100)

for i in range(2, 100):
    y2[i] = -0.8 * y2[i-1] + 0.3 * y2[i-2] + e[i]

plt.figure(figsize=(10, 4))
plt.plot(y2)
plt.title("AR(2) Simulation ($\\phi_1=-0.8$, $\\phi_2=0.3$)")
plt.xlabel("Time")
plt.ylabel("Value")
plt.tight_layout()
plt.show()

9.6 G

The ARMA(1,1) series is seen to be stable and fluctuating around a constant mean value. This is characteristic of a stationary series.

On the other hand, the AR(2) series is seen to be explosive, where the magnitude of the fluctuations is increasing at a rapid rate. This is characteristic of a non-stationary series.

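The difference comes from the parameter values. The ARMA(1,1) model has \(\phi_1 = 0.6\), which satisfies \(|\phi_1| < 1\), so it generates a stable series. The AR(2) parameters violate the stationarity condition \(\phi_2 - \phi_1 < 1\), since \(0.3 - (-0.8) = 1.1\), so that model generates an unstable series.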
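
Stationarity of the autoregressive part can also be checked from the roots of the AR polynomial \(1 - \phi_1 z - \dots - \phi_p z^p\), which must all lie outside the unit circle. The helper below, `ar_is_stationary`, is a hypothetical name and a minimal sketch of that check using `numpy.roots`.

```python
import numpy as np

def ar_is_stationary(phis):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    phis = np.asarray(phis, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = np.r_[-phis[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

print(ar_is_stationary([0.6]))         # AR part of the ARMA(1,1) model
print(ar_is_stationary([-0.8, 0.3]))   # AR(2) model from part F
```

This prints `True` for the ARMA(1,1) parameters and `False` for the AR(2) parameters, because the AR(2) polynomial has a root of about \(-0.93\), inside the unit circle.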

Exercise 9.7

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from pmdarima import auto_arima

air = pd.read_csv("aus_airpassengers.csv")

air["Year"] = pd.to_datetime(air["Year"], format="%Y")
air = air.sort_values("Year")

y = air["Passengers"]

plt.figure(figsize=(10,4))
plt.plot(air["Year"], y)
plt.title("Air Passengers")
plt.show()

Code
model_auto = auto_arima(
    y,
    seasonal=False,
    trace=True,
    stepwise=True
)

print(model_auto.summary())
Performing stepwise search to minimize aic
 ARIMA(2,2,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.03 sec
 ARIMA(0,2,0)(0,0,0)[0] intercept   : AIC=228.811, Time=0.00 sec
 ARIMA(1,2,0)(0,0,0)[0] intercept   : AIC=214.665, Time=0.00 sec
 ARIMA(0,2,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.01 sec
 ARIMA(0,2,0)(0,0,0)[0]             : AIC=226.831, Time=0.00 sec
 ARIMA(2,2,0)(0,0,0)[0] intercept   : AIC=210.407, Time=0.00 sec
 ARIMA(3,2,0)(0,0,0)[0] intercept   : AIC=211.393, Time=0.01 sec
 ARIMA(2,2,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.02 sec
 ARIMA(1,2,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.01 sec
 ARIMA(3,2,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.02 sec
 ARIMA(2,2,0)(0,0,0)[0]             : AIC=208.450, Time=0.00 sec
 ARIMA(1,2,0)(0,0,0)[0]             : AIC=212.710, Time=0.00 sec
 ARIMA(3,2,0)(0,0,0)[0]             : AIC=209.436, Time=0.00 sec
 ARIMA(2,2,1)(0,0,0)[0]             : AIC=201.177, Time=0.01 sec
 ARIMA(1,2,1)(0,0,0)[0]             : AIC=199.244, Time=0.01 sec
 ARIMA(0,2,1)(0,0,0)[0]             : AIC=198.038, Time=0.00 sec
 ARIMA(0,2,2)(0,0,0)[0]             : AIC=199.176, Time=0.00 sec
 ARIMA(1,2,2)(0,0,0)[0]             : AIC=inf, Time=0.02 sec

Best model:  ARIMA(0,2,1)(0,0,0)[0]          
Total fit time: 0.163 seconds
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                   47
Model:               SARIMAX(0, 2, 1)   Log Likelihood                 -97.019
Date:                Sun, 22 Mar 2026   AIC                            198.038
Time:                        20:58:30   BIC                            201.651
Sample:                             0   HQIC                           199.385
                                 - 47                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ma.L1         -0.8963      0.114     -7.842      0.000      -1.120      -0.672
sigma2         4.2120      0.420     10.023      0.000       3.388       5.036
===================================================================================
Ljung-Box (L1) (Q):                   1.48   Jarque-Bera (JB):               104.43
Prob(Q):                              0.22   Prob(JB):                         0.00
Heteroskedasticity (H):              20.02   Skew:                             1.61
Prob(H) (two-sided):                  0.00   Kurtosis:                         9.73
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

9.7 A

Code
plt.figure(figsize=(10,4))
plt.plot(model_auto.resid())
plt.title("Residuals (AutoARIMA)")
plt.show()

The auto_arima procedure selected an ARIMA(0,2,1) model as the one with the minimum AIC (198.038). This means the series is differenced twice to make it stationary, which is consistent with the strong trend in the series.

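The Ljung-Box test at lag 1 (Q = 1.48, p = 0.22) does not reject the hypothesis of no autocorrelation, so the residuals are consistent with white noise. However, the Jarque-Bera (p = 0.00) and heteroskedasticity (H = 20.02, p = 0.00) statistics in the summary indicate that the residuals are not normally distributed and do not have constant variance.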

9.7 B

The ARIMA(0,2,1) model can be written using the backshift operator as:

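\((1 - B)^2 y_t = (1 + \theta_1 B)\varepsilon_t\), with \(\hat{\theta}_1 = -0.8963\) from the fitted model (the `ma.L1` coefficient in the summary above).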
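
To make the left-hand side concrete: \((1 - B)^2 y_t\) expands to \(y_t - 2y_{t-1} + y_{t-2}\), which is the twice-differenced series. The sketch below checks this identity numerically on a few made-up values (not the passenger data).

```python
import numpy as np

y = np.array([7.3, 7.6, 8.1, 8.9, 9.4, 10.6])    # hypothetical values

second_diff = np.diff(y, n=2)                    # (1 - B)^2 y_t
expanded = y[2:] - 2.0 * y[1:-1] + y[:-2]        # y_t - 2 y_{t-1} + y_{t-2}

print(np.allclose(second_diff, expanded))
```

With the fitted coefficient, the model can therefore be read as \(y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t - 0.8963\,\varepsilon_{t-1}\).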

9.7 C

Code
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
import matplotlib.pyplot as plt

air["Year"] = pd.to_datetime(air["Year"], format="%Y")
air = air.sort_values("Year")

y_rw = air["Passengers"]

model_rw = ARIMA(y_rw, order=(0,1,0), trend="t").fit()

steps = 10
forecast_rw = model_rw.forecast(steps=steps)

last_year = air["Year"].iloc[-1]
future_years = pd.date_range(
    start=last_year + pd.offsets.YearEnd(1),
    periods=steps,
    freq="YE"
)

plt.figure(figsize=(10,4))
plt.plot(air["Year"], y_rw, label="Observed")
plt.plot(future_years, forecast_rw, label="RW with drift")
plt.legend()
plt.title("ARIMA(0,1,0) Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.tight_layout()
plt.show()

The ARIMA(0,1,0) model with drift produces a linear forecast that continues the upward trend. However, unlike the ARIMA(0,2,1) model from part A, it does not capture any other autocorrelation structure that may be present in the series. The result is a straight-line forecast.

This is a baseline model that assumes changes will continue at a constant average rate. While it captures the overall trend of the series, it is not flexible enough to capture short-term variations.
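
The forecast from a random walk with drift has a simple closed form: the last observation plus \(h\) times the average historical change. The sketch below implements that textbook formula directly. The function name `drift_forecast` is illustrative, and because statsmodels estimates the drift by maximum likelihood, its fitted values may differ slightly.

```python
import numpy as np

def drift_forecast(y, steps):
    """Random walk with drift: last value plus h times the average historical change."""
    y = np.asarray(y, dtype=float)
    drift = (y[-1] - y[0]) / (len(y) - 1)   # mean of the first differences
    return y[-1] + drift * np.arange(1, steps + 1)

print(drift_forecast([1.0, 3.0, 4.0, 7.0], steps=3))   # drift = 2, so 9, 11, 13
```

Only the first and last observations enter the point forecasts, which is why this model cannot react to short-term patterns.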

9.7 D

While more complicated ARIMA models can be designed to detect more features of the data, over-parametrization can cause instability. The ARIMA(0,2,1) model achieves a balance between capturing the essential trend without over-complicating the process.

The removal of the constant term impairs the model’s capacity to detect long-run growth.

9.7 E

The ARIMA(0,2,1) model with a constant can produce unstable or unrealistic forecasts. With \(d = 2\), a constant induces a quadratic trend in the forecasts, which can exaggerate the trend and increase forecast uncertainty.
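
The long-run shape of forecasts from a twice-differenced model with a constant can be sketched without fitting anything. If the forecast of \((1 - B)^2 y_t\) equals a constant \(c\) at every horizon, integrating twice gives a quadratic path. The values of `c` and the starting points below are hypothetical.

```python
import numpy as np

c = 0.5                                       # hypothetical constant in the twice-differenced series
second_diffs = np.full(10, c)                 # forecast of (1 - B)^2 y_t at each horizon
first_diffs = 1.0 + np.cumsum(second_diffs)   # integrate once: linear in h
path = 50.0 + np.cumsum(first_diffs)          # integrate twice: quadratic in h

# A quadratic has constant second differences and zero third differences
print(np.diff(path, n=2))
print(np.allclose(np.diff(path, n=3), 0.0))
```

The forecast path curves upward (or downward, for negative \(c\)) indefinitely, which is rarely plausible over long horizons.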

Exercise 9.8

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

gdp = pd.read_csv("global_economy.csv")

us = gdp[gdp["Code"] == "USA"].copy()
us["ds"] = pd.to_datetime(us["ds"], format="%Y")
us = us.sort_values("ds")

us = us[["ds", "GDP"]].dropna()

plt.figure(figsize=(10,4))
plt.plot(us["ds"], us["GDP"])
plt.title("United States GDP")
plt.xlabel("Year")
plt.ylabel("GDP")
plt.tight_layout()
plt.show()

9.8 A

Code
us["log_GDP"] = np.log(us["GDP"])

plt.figure(figsize=(10,4))
plt.plot(us["ds"], us["log_GDP"])
plt.title("Log Transformed United States GDP")
plt.xlabel("Year")
plt.ylabel("log(GDP)")
plt.tight_layout()
plt.show()

On the log scale, United States GDP shows an increasing trend that is approximately linear rather than curved. This is because the log transformation turns the roughly exponential growth of the original GDP values into linear growth, and it also stabilizes the variance over time.

The trend still exists, but using the log transformation makes the values more appropriate for modeling, especially for ARIMA models.

9.8 B

Code
us["log_GDP"] = np.log(us["GDP"])

if "ds" in us.columns:
    us["ds"] = pd.to_datetime(us["ds"], format="%Y")
    us = us.set_index("ds")

# Use an annual PeriodIndex so statsmodels can infer the frequency
if not isinstance(us.index, pd.PeriodIndex):
    us.index = pd.DatetimeIndex(us.index).to_period("Y")

y = us["log_GDP"]

model_111 = ARIMA(y, order=(1,1,1)).fit()
model_211 = ARIMA(y, order=(2,1,1)).fit()
model_110 = ARIMA(y, order=(1,1,0)).fit()
model_011 = ARIMA(y, order=(0,1,1)).fit()

print("ARIMA(1,1,1) AIC:", model_111.aic)
print("ARIMA(2,1,1) AIC:", model_211.aic)
print("ARIMA(1,1,0) AIC:", model_110.aic)
print("ARIMA(0,1,1) AIC:", model_011.aic)
ARIMA(1,1,1) AIC: -276.83631896650616
ARIMA(2,1,1) AIC: -275.43290294103826
ARIMA(1,1,0) AIC: -266.09774782883244
ARIMA(0,1,1) AIC: -191.75093222332464

9.8 C

Three alternatives to ARIMA(1,1,1) were fitted to the log-transformed United States GDP: ARIMA(2,1,1), ARIMA(1,1,0), and ARIMA(0,1,1). The models were compared using the Akaike Information Criterion (AIC).

The ARIMA(1,1,1) model had the lowest AIC (-276.84) among the models considered, so it is selected as the best model for forecasting the series.
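
The selection step can be written out explicitly. The sketch below copies the AIC values from the output reported in part B into a dictionary, picks the minimum, and reports each model's distance from the best one. The variable names are illustrative.

```python
# AIC values copied from the output reported in part B
aic = {
    "ARIMA(1,1,1)": -276.83631896650616,
    "ARIMA(2,1,1)": -275.43290294103826,
    "ARIMA(1,1,0)": -266.09774782883244,
    "ARIMA(0,1,1)": -191.75093222332464,
}

best = min(aic, key=aic.get)
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name}: AIC = {value:.2f}, difference from best = {value - aic[best]:.2f}")
```

ARIMA(2,1,1) is only about 1.4 AIC units behind ARIMA(1,1,1), so the two are close, while ARIMA(0,1,1) is far worse.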

9.8 D

Code
resid = model_111.resid.iloc[1:].copy()
resid.index = resid.index.to_timestamp()

plt.figure(figsize=(10,4))
plt.plot(resid.index, resid.values)
plt.title("Residuals of ARIMA(1,1,1)")
plt.xlabel("Year")
plt.ylabel("Residuals")
plt.tight_layout()
plt.show()

The residuals of the ARIMA(1,1,1) process appear to vary randomly around zero with no apparent trend or pattern. The variation seems constant over time, with no apparent structures remaining in the residuals.

The result suggests that the process has successfully captured the underlying structure of the log-transformed United States GDP data, with the residuals resembling white noise.
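
A visual check of the residuals can be backed by a portmanteau statistic. The sketch below is a NumPy-only implementation of the Ljung-Box Q statistic, applied to simulated white noise and a simulated random walk rather than to the model residuals. The function name, the seed, and the choice of 10 lags are assumptions. For residuals from a fitted ARIMA model, the degrees of freedom are usually reduced by the number of estimated parameters, which lowers the critical value.

```python
import numpy as np

def ljung_box_q(x, lags=10):
    """Ljung-Box Q statistic: n (n + 2) * sum_k r_k^2 / (n - k), k = 1..lags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d * d)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(d[k:] * d[:-k]) / denom    # lag-k sample autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(624)            # arbitrary seed
noise = rng.normal(size=200)                # stand-in for well-behaved residuals
walk = np.cumsum(noise)                     # strongly autocorrelated series

CHI2_95_DF10 = 18.307                       # 95% chi-squared critical value, 10 df
print(f"white noise Q = {ljung_box_q(noise):.2f} (critical value {CHI2_95_DF10})")
print(f"random walk Q = {ljung_box_q(walk):.2f}")
```

A Q value below the critical value is consistent with white noise, whereas the random walk produces a value far above it.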

9.8 E

Code
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
import matplotlib.pyplot as plt

us = gdp[gdp["Code"] == "USA"].copy()

us["ds"] = pd.to_datetime(us["ds"], format="%Y")
us = us.sort_values("ds")

y = np.log(us["GDP"])

model_111 = ARIMA(y, order=(1,1,1)).fit()

steps = 10
forecast = model_111.get_forecast(steps=steps)
forecast_mean = forecast.predicted_mean
conf_int = forecast.conf_int()

last_date = us["ds"].iloc[-1]
forecast_index = pd.date_range(
    start=last_date + pd.offsets.YearEnd(1),
    periods=steps,
    freq="YE"
)

plt.figure(figsize=(10,4))
plt.plot(us["ds"], y, label="Observed")
plt.plot(forecast_index, forecast_mean, label="Forecast")

plt.fill_between(
    forecast_index,
    conf_int.iloc[:, 0],
    conf_int.iloc[:, 1],
    alpha=0.2
)

plt.title("ARIMA(1,1,1) Forecast with Confidence Intervals")
plt.xlabel("Year")
plt.ylabel("log(GDP)")
plt.legend()
plt.tight_layout()
plt.show()

The ARIMA(1,1,1) forecasts follow a smooth rising path that continues the long-term growth pattern of the log-transformed GDP. The intervals widen with the forecast horizon, which reflects growing uncertainty further into the future.

The underlying trend is well represented by this model. Although they widen with the horizon, the intervals remain relatively narrow, which indicates a stable growth pattern for GDP.
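
The forecasts and intervals above are on the log scale. A minimal sketch of how they could be mapped back to the GDP scale is shown below, using made-up log-scale numbers rather than the fitted model's output. Exponentiating each bound keeps the coverage but makes the interval asymmetric around the point forecast.

```python
import numpy as np

# Hypothetical log-scale forecasts and 95% bounds (not taken from the fitted model)
log_mean = np.array([30.60, 30.64, 30.68])
half_width = np.array([0.04, 0.07, 0.10])
log_lower = log_mean - half_width
log_upper = log_mean + half_width

# Exponentiating each bound gives an interval on the original GDP scale
point = np.exp(log_mean)
lower, upper = np.exp(log_lower), np.exp(log_upper)

# Symmetric on the log scale, but wider above the point forecast than below it
print((upper - point) > (point - lower))
```

The back-transformed point forecast corresponds to the median of the forecast distribution on the original scale, not the mean.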

9.8 F

Code
from statsmodels.tsa.holtwinters import ExponentialSmoothing

ets_model = ExponentialSmoothing(
    us["GDP"],
    trend="add",
    seasonal=None
).fit()

ets_forecast = ets_model.forecast(10)

gdp_plot = us.copy()
gdp_plot["ds"] = pd.to_datetime(gdp_plot["ds"], format="%Y")
gdp_plot = gdp_plot.sort_values("ds")

arima_forecast = np.exp(forecast_mean)

last_date = gdp_plot["ds"].iloc[-1]
future_dates = pd.date_range(
    start=last_date + pd.offsets.YearEnd(1),
    periods=10,
    freq="YE"
)

plt.figure(figsize=(10,4))
plt.plot(gdp_plot["ds"], gdp_plot["GDP"], label="Observed")
plt.plot(future_dates, arima_forecast, label="ARIMA Forecast")
plt.plot(future_dates, ets_forecast, label="ETS Forecast")

plt.title("ARIMA vs ETS Forecasts - US GDP")
plt.xlabel("Year")
plt.ylabel("GDP")
plt.legend()
plt.tight_layout()
plt.show()

The ARIMA and ETS models produce similar forecasts for United States GDP, both projecting continued upward growth. The ARIMA model shows slightly higher forecasts, indicating a more aggressive growth trajectory, while the ETS model provides a smoother and more conservative estimate.

The similarity between the forecasts suggests that both models effectively capture the long-term trend in GDP. However, the ARIMA model may better account for underlying autocorrelation in the data, while the ETS model focuses on smoothing the trend.