knitr::opts_chunk$set(echo = TRUE)
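The analyses below assume a setup chunk, not echoed in the original, that loads the packages whose functions appear throughout. Based on usage, the assumed set is roughly:
# Assumed setup (hypothetical; inferred from the functions used below).
library(fpp3)      # tsibble, fable, feasts, plus gafa_stock, aus_retail, global_economy, ...
library(forecast)  # BoxCox.lambda, ggAcf, ggPacf, auto.arima, checkresiduals, Arima, ets
library(gridExtra) # grid.arrange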
## 9.1
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
library(magick)
## Warning: package 'magick' was built under R version 4.3.3
## Linking to ImageMagick 6.9.12.98
## Enabled features: cairo, freetype, fftw, ghostscript, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fontconfig, x11
url <- "https://github.com/sokkarbishoy/DATA624-Public/raw/39c2dbd957bde26a3234b84978e7089610ce907e/9.223.png"
img <- image_read(url)
print(img)
## format width height colorspace matte filesize density
## 1 PNG 1344 403 sRGB FALSE 26883 76x76
The three plots differ in their ACF values, which measure how strongly a time series is correlated with itself at each lag. For the first plot (36 numbers), the ACF values vary widely across lags and several approach the critical bounds. As the sample size increases, the ACF values show less variation at each lag, with nearly all values inside the confidence bounds. All three plots are consistent with white noise, but the evidence strengthens with sample size: as the sample grows, the ACF values cluster more tightly around zero.
In the plots above, the critical values (the dashed lines) vary in their distance from zero because the critical bounds for the ACF depend on the sample size. Even though the samples were chosen randomly, the autocorrelations look different across the plots due to sampling variability. The larger the sample, the better the sample ACF approximates the true autocorrelation at each lag.
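These bounds are approximately ±1.96/√T for a white-noise series of length T, so they shrink as the sample grows; a quick hypothetical check (not part of the original output):
# 95% critical bounds for a white-noise ACF are +/- 1.96 / sqrt(T).
round(1.96 / sqrt(c(36, 360, 1000)), 3)
# approximately 0.327, 0.103, 0.062 -- matching the narrowing dashed lines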
## 9.2
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
amzn <- gafa_stock %>%
filter(Symbol == "AMZN") %>%
arrange(Date)
#Convert Date column to Date type if it's not already
amzn$Date <- as.Date(amzn$Date)
amzn_ts <- ts(amzn$Close, frequency = 365)
#Plot the daily closing prices
p1 <- ggplot(amzn, aes(x = Date, y = Close)) +
geom_line(color = "blue") +
labs(title = "Amazon Daily Closing Prices",
x = "Date",
y = "Closing Price (USD)") +
theme_minimal()
p1
# Plot the Autocorrelation Function (ACF)
p2 <- ggAcf(amzn_ts, lag.max = 50) +
ggtitle("ACF of Amazon Closing Prices") +
theme_minimal()
p2
p3 <- ggPacf(amzn_ts, lag.max = 50) +
ggtitle("PACF of Amazon Closing Prices") +
theme_minimal()
p3
The plots above indicate that the Amazon daily closing price series is non-stationary. The time plot shows a clear upward trend, so the mean is not constant over time. The ACF plot shows large autocorrelations that decay very slowly as the lag increases, rather than dropping off quickly, which is another hallmark of non-stationarity. The PACF plot has a dominant spike at lag 1, with the other lags much lower but some still significant, further supporting non-stationarity and the need for differencing.
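As a quick follow-up (a sketch, not run in the original document), first-differencing the closing prices should remove most of this structure:
# Hypothetical check: difference the closes and re-inspect the ACF.
# For a random-walk-like series the differenced ACF should show
# few significant spikes.
amzn_diff <- diff(amzn_ts)
ggAcf(amzn_diff, lag.max = 50) +
ggtitle("ACF of Differenced Amazon Closing Prices") +
theme_minimal()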
## 9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
turkish_GDP <- global_economy %>%
filter(Country == "Turkey")
autoplot(turkish_GDP, GDP) +
ggtitle("Turkish GDP over Time") +
ylab("GDP") +
xlab("Year")
The data is from 1960 to 2017.
# We use the BoxCox.lambda function to find the optimal lambda
lambda <- BoxCox.lambda(turkish_GDP$GDP)
turkish_GDP$GDP_transformed <- BoxCox(turkish_GDP$GDP, lambda)
autoplot(turkish_GDP, GDP_transformed) +
ggtitle("Box-Cox Transformed Turkish GDP") +
ylab("Transformed GDP") +
xlab("Year")
ggAcf(turkish_GDP$GDP_transformed, lag.max = 50) +
ggtitle("ACF of Transformed Turkish GDP")
ggPacf(turkish_GDP$GDP_transformed, lag.max = 50) +
ggtitle("PACF of Transformed Turkish GDP")
BoxCoxedData <- BoxCox(turkish_GDP$GDP, lambda = BoxCox.lambda(turkish_GDP$GDP))
DifferencedData <- diff(BoxCoxedData)
# Explore the data after BoxCoxing and differencing to check for improvement.
ggtsdisplay(DifferencedData)
# Step 2: Determine the appropriate order of differencing
d <- ndiffs(turkish_GDP$GDP_transformed)
d
## [1] 1
The order of differencing needed is one: after the Box-Cox transformation, a single first difference yields a series that looks stationary.
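A unit-root test gives a more formal check than inspecting plots; a minimal sketch using the feasts KPSS feature (hypothetical, assuming the fpp3 workflow used elsewhere in this document):
# Hypothetical: KPSS test on the differenced, Box-Cox transformed GDP.
# A large kpss_pvalue is consistent with stationarity after differencing.
turkish_GDP %>%
features(difference(box_cox(GDP, lambda)), unitroot_kpss)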
Tasmania_takings <- aus_accommodation %>%
filter(State == "Tasmania")
Tasmania_takings
## # A tsibble: 74 x 5 [1Q]
## # Key: State [1]
## Date State Takings Occupancy CPI
## <qtr> <chr> <dbl> <dbl> <dbl>
## 1 1998 Q1 Tasmania 28.7 67 67
## 2 1998 Q2 Tasmania 19.0 45 67.4
## 3 1998 Q3 Tasmania 16.1 39 67.5
## 4 1998 Q4 Tasmania 25.9 56 67.8
## 5 1999 Q1 Tasmania 28.4 66 67.8
## 6 1999 Q2 Tasmania 20.1 48 68.1
## 7 1999 Q3 Tasmania 17.3 41 68.7
## 8 1999 Q4 Tasmania 24.3 56 69.1
## 9 2000 Q1 Tasmania 30.0 66.2 69.7
## 10 2000 Q2 Tasmania 21.7 49.7 70.2
## # ℹ 64 more rows
autoplot(Tasmania_takings, Takings) +
ggtitle("Tasmania Takings over Time") +
ylab("Takings") +
xlab("Year")
# Determine the optimal Box-Cox transformation (lambda) to stabilize the variance
lambda <- BoxCox.lambda(Tasmania_takings$Takings)
cat("Optimal Box-Cox lambda:", lambda, "\n")
## Optimal Box-Cox lambda: -0.005076712
Tasmania_takings <- Tasmania_takings %>%
mutate(Takings_transformed = BoxCox(Takings, lambda))
# Compute the required order of differencing before applying it.
d <- ndiffs(Tasmania_takings$Takings_transformed)
cat("Order of differencing required:", d, "\n")
## Order of differencing required: 1
Tasmania_takings_diff <- Tasmania_takings %>%
mutate(Takings_diff = c(NA, diff(Takings_transformed, differences = d)))
autoplot(na.omit(Tasmania_takings_diff), Takings_diff) +
ggtitle("Differenced and Transformed Tasmania Takings") +
ylab("Differenced Transformed Takings") +
xlab("Year")
The differenced, transformed series has a more consistent variance and mean than the original data, but a clear seasonal pattern remains. Stationarity requires that no predictable pattern, including seasonality, is left in the data, so a seasonal difference is also needed here.
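A seasonal difference at lag 4 (the quarterly period) would address the remaining pattern; a minimal sketch, reusing the transformed column created above:
# Hypothetical: lag-4 seasonal difference followed by a first difference.
Tasmania_seasdiff <- diff(diff(Tasmania_takings$Takings_transformed, lag = 4))
ggtsdisplay(Tasmania_seasdiff)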
souvenirs %>%
gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Monthly Souvenir Sales")
# calculate lambda
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
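The chunk above computes the Guerrero lambda but stops there; a sketch of the remaining steps (hypothetical, following the same pattern as the other series) would transform, take a seasonal difference at lag 12 for monthly data, then a first difference:
# Hypothetical completion: transform, seasonally difference, first difference.
souvenirs %>%
gg_tsdisplay(difference(difference(box_cox(Sales, lambda), 12)),
plot_type = 'partial', lag = 36) +
labs(title = "Doubly Differenced, Transformed Souvenir Sales")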
## 9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(242423)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries, Turnover) +
ggtitle("Selected Retail Series over Time") +
ylab("Turnover") +
xlab("Year")
lambda <- BoxCox.lambda(myseries$Turnover)
cat("Optimal Box-Cox lambda:", lambda, "\n")
## Optimal Box-Cox lambda: -0.231131
myseries <- myseries %>%
mutate(Turnover_transformed = BoxCox(Turnover, lambda))
d <- ndiffs(myseries$Turnover_transformed)
cat("Order of regular differencing required:", d, "\n")
## Order of regular differencing required: 1
myseries_diff <- myseries %>%
mutate(Turnover_diff = c(NA, diff(Turnover_transformed, differences = d)))
autoplot(na.omit(myseries_diff), Turnover_diff) +
ggtitle("Differenced and Transformed Series") +
ylab("Differenced Transformed Turnover") +
xlab("Year")
ggAcf(na.omit(myseries_diff$Turnover_diff), lag.max = 50) +
ggtitle("ACF of Differenced and Transformed Series")
ggPacf(na.omit(myseries_diff$Turnover_diff), lag.max = 50) +
ggtitle("PACF of Differenced and Transformed Series")
The original series plot shows a clear upward trend and variance that increases over time, both signs of non-stationarity. After applying a Box-Cox transformation and first-order differencing, the differenced series has a much more stable mean, but it still exhibits regular fluctuations, and the ACF suggests some seasonal structure may remain.
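Whether a seasonal difference is also required can be checked formally; a hypothetical check (not run in the original) in the same forecast-package style as the ndiffs call above:
# Hypothetical: suggested number of seasonal differences for the series.
nsdiffs(ts(myseries$Turnover_transformed, frequency = 12))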
## 9.6 Simulate and plot some data from simple ARIMA models.
a) Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ² = 1. The process starts with y1 = 0.
set.seed(2434243)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
ggplot(sim, aes(x = idx, y = y)) +
geom_line() +
ggtitle("Simulated AR(1) Process with φ1 = 0.6") +
xlab("Time") +
ylab("y")
set.seed(244243)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.9*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
ggplot(sim2, aes(x = idx, y = y)) +
geom_line() +
ggtitle("Simulated AR(1) Process with φ1 = 0.9") +
xlab("Time") +
ylab("y")
b) Produce a time plot for the series. How does the plot change as you change ϕ1? As ϕ1 increases, the series becomes more persistent: it retains memory of past values for longer and wanders further from zero. As ϕ1 decreases, consecutive values show less autocorrelation and the memory of past shocks decays faster, so the series looks closer to white noise; the ACF comparison sketched below makes this visible.
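A side-by-side look at the ACFs of the two simulations shows the difference in decay (hypothetical sketch, reusing sim and sim2 from above):
# Hypothetical: autocorrelation decays faster for φ1 = 0.6 than for φ1 = 0.9.
grid.arrange(
ggAcf(sim$y) + ggtitle("ACF of AR(1), φ1 = 0.6"),
ggAcf(sim2$y) + ggtitle("ACF of AR(1), φ1 = 0.9"),
ncol = 2)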
set.seed(2434243)
theta1 <- 0.6
n <- 100
e <- rnorm(n, mean = 0, sd = 1) # White noise with variance σ^2 = 1
# Initialize y as a numeric vector
y <- numeric(n)
y[1] <- e[1] # Start the process
# Generate the MA(1) series
for(i in 2:n) {
y[i] <- e[i] + theta1 * e[i - 1]
}
sim_ma1 <- tsibble(idx = seq_len(n), y = y, index = idx)
# Plot the MA(1) series
ggplot(sim_ma1, aes(x = idx, y = y)) +
geom_line() +
ggtitle("Simulated MA(1) Process with θ1 = 0.6 and σ^2 = 1") +
xlab("Time") +
ylab("y")
d) As θ1 approaches 1, the series exhibits stronger correlation between consecutive terms, giving a smoother, more oscillatory pattern, since each term is more heavily influenced by the previous noise shock. When θ1 is close to 0, the correlation between consecutive terms diminishes and the series resembles white noise, with little or no dependence between terms; a second simulation illustrating this follows.
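A minimal sketch reusing the loop above with θ1 = 0.9 (hypothetical, not part of the original output):
# Hypothetical: regenerate the MA(1) with a larger theta1 for comparison.
theta1_alt <- 0.9
y_alt <- numeric(n)
y_alt[1] <- e[1]
for(i in 2:n) {
y_alt[i] <- e[i] + theta1_alt * e[i - 1]
}
ggplot(tsibble(idx = seq_len(n), y = y_alt, index = idx), aes(x = idx, y = y)) +
geom_line() +
ggtitle("Simulated MA(1) Process with θ1 = 0.9")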
e) Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6 and σ² = 1.
f) Generate data from an AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3 and σ² = 1. (Note that these parameters will give a non-stationary series.)
g) Graph the latter two series and compare them.
set.seed(2434243)
# the ARMA(1,1) model
phi1_arma <- 0.6
theta1_arma <- 0.6
n <- 100
e <- rnorm(n, mean = 0, sd = 1)
# ARMA(1,1) series
y_arma <- numeric(n)
y_arma[1] <- e[1]
# Generate the ARMA(1,1) series
for(i in 2:n) {
y_arma[i] <- phi1_arma * y_arma[i - 1] + e[i] + theta1_arma * e[i - 1]
}
# Convert ARMA(1,1) series to a tsibble for plotting
sim_arma11 <- tsibble(idx = seq_len(n), y = y_arma, index = idx)
# Parameters for the AR(2) model
phi1_ar2 <- -0.8
phi2_ar2 <- 0.3
# Initialize the AR(2) series
y_ar2 <- numeric(n)
y_ar2[1:2] <- e[1:2]
# AR(2) series (non-stationary)
for(i in 3:n) {
y_ar2[i] <- phi1_ar2 * y_ar2[i - 1] + phi2_ar2 * y_ar2[i - 2] + e[i]
}
sim_ar2 <- tsibble(idx = seq_len(n), y = y_ar2, index = idx)
p1 <- ggplot(sim_arma11, aes(x = idx, y = y)) +
geom_line() +
ggtitle("Simulated ARMA(1,1) Process with φ1 = 0.6, θ1 = 0.6") +
xlab("Time") +
ylab("y")
p2 <- ggplot(sim_ar2, aes(x = idx, y = y)) +
geom_line() +
ggtitle("Simulated AR(2) Process with φ1 = -0.8, φ2 = 0.3 (Non-stationary)") +
xlab("Time") +
ylab("y")
grid.arrange(p1, p2, ncol = 2)
The two series behave very differently. The ARMA(1,1) series is stationary: it fluctuates around zero with roughly constant variance. The AR(2) series with ϕ1 = -0.8 and ϕ2 = 0.3 violates the stationarity conditions, so it oscillates from one sign to the other (because ϕ1 is negative) with an amplitude that grows over time.
## 9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a) We use auto.arima to find the best-fitting ARIMA model for the aus_airpassengers data, then check whether the residuals resemble white noise.
b) In terms of the backshift operator, the fitted ARIMA(0,2,1) model is (1 − B)²yt = (1 − 0.8963B)ϵt, i.e. the twice-differenced series follows an MA(1) with θ1 = −0.8963.
c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
air_ts_fit1 <- aus_airpassengers %>%
model(arima1 = ARIMA(Passengers ~ pdq(0,1,0)))
report(air_ts_fit1)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
air_ts_fit1 %>%
gg_tsresiduals()
air_ts_fit1 %>%
forecast(h=12) %>%
autoplot(aus_airpassengers)
data(aus_airpassengers)
model_a <- auto.arima(aus_airpassengers)
print(model_a)
## Series: aus_airpassengers
## ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 = 4.308: log likelihood = -97.02
## AIC=198.04 AICc=198.32 BIC=201.65
checkresiduals(model_a)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 6.5774, df = 8, p-value = 0.5828
##
## Model df: 1. Total lags used: 9
summary(model_a)
## Series: aus_airpassengers
## ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 = 4.308: log likelihood = -97.02
## AIC=198.04 AICc=198.32 BIC=201.65
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.3453292 2.008183 1.215008 0.9528709 4.809966 0.699953 -0.1743841
data("aus_airpassengers")
air_passengers <- aus_airpassengers %>%
filter(Year >= 1970, Year <= 2011) %>%
summarise(Total_Passengers = sum(Passengers))
# Convert to time series object
air_ts <- ts(air_passengers$Total_Passengers, start = 1970, frequency = 1)
# Part a: Fit an ARIMA model
fit_arima <- auto.arima(air_ts)
print(fit_arima)
## Series: air_ts
## ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 = 4.671: log likelihood = -87.8
## AIC=179.61 AICc=179.93 BIC=182.99
# Check residuals to see if they resemble white noise
checkresiduals(fit_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,1)
## Q* = 5.7075, df = 7, p-value = 0.5743
##
## Model df: 1. Total lags used: 8
# Plot forecast for the next 10 periods
forecast_arima <- forecast(fit_arima, h = 10)
autoplot(forecast_arima) +
ggtitle("ARIMA Model Forecasts for Australian Air Passengers") +
xlab("Year") + ylab("Passengers (millions)")
The residuals oscillate around zero without any clear pattern or trend, a good indication that the model has removed the systematic structure from the data. The ACF plot shows that almost all autocorrelations fall within the 95% confidence bounds (the dashed blue lines), so little or no significant autocorrelation remains in the residuals, a key characteristic of white noise; the Ljung-Box p-value of 0.57 supports the same conclusion. Overall, the residuals from this ARIMA(0,2,1) model look close to white noise, indicating that the model has effectively captured the patterns in the data.
## 9.8 For the United States GDP series (from global_economy):
If necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA().
us_gdp <- global_economy %>%
filter(Code == 'USA') %>%
select(Country, GDP)
lambda_value <- us_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
us_gdp_transformed <- us_gdp %>%
mutate(GDP_transformed = box_cox(GDP, lambda_value))
fit <- us_gdp %>%
model(
arima = ARIMA(GDP, stepwise = FALSE, approx = FALSE))
report(fit)
## Series: GDP
## Model: ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -0.4206 -0.3048
## s.e. 0.1197 0.1078
##
## sigma^2 estimated as 2.615e+22: log likelihood=-1524.08
## AIC=3054.15 AICc=3054.61 BIC=3060.23
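Note that the model above was fit to the raw GDP series even though the transformed column was prepared; fable can apply the transformation inside the model formula so that forecasts are back-transformed automatically. A sketch (hypothetical, not run here):
# Hypothetical: run the ARIMA search on the Box-Cox scale directly.
fit_bc <- us_gdp %>%
model(arima_bc = ARIMA(box_cox(GDP, lambda_value), stepwise = FALSE, approx = FALSE))
report(fit_bc)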
Try some other plausible models by experimenting with the orders chosen:
us_gdp_ts <- ts(us_gdp_transformed$GDP_transformed, start = 1960, frequency = 1) # global_economy covers 1960-2017
# Experimenting with different ARIMA models
# ARIMA(1,1,1)
model_111 <- Arima(us_gdp_ts, order = c(1, 1, 1))
# ARIMA(1,1,2)
model_112 <- Arima(us_gdp_ts, order = c(1, 1, 2))
# ARIMA(2,1,1)
model_211 <- Arima(us_gdp_ts, order = c(2, 1, 1))
# ARIMA(0,1,1)
model_011 <- Arima(us_gdp_ts, order = c(0, 1, 1))
# Compare models based on AIC values
models <- list(model_111, model_112, model_211, model_011)
aic_values <- sapply(models, AIC)
best_model <- models[[which.min(aic_values)]]
print(best_model)
## Series: us_gdp_ts
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.9944 -0.4986 -0.2366
## s.e. 0.0082 0.1314 0.1261
##
## sigma^2 = 5821: log likelihood = -327.69
## AIC=663.37 AICc=664.14 BIC=671.54
# Residual diagnostics for the best model
checkresiduals(best_model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2)
## Q* = 4.0278, df = 7, p-value = 0.7766
##
## Model df: 3. Total lags used: 10
# Plotting the ACF and PACF of residuals for white noise check
acf(best_model$residuals, main="ACF of Residuals")
pacf(best_model$residuals, main="PACF of Residuals")
# Additional tests for residuals (Ljung-Box test)
Box.test(best_model$residuals, lag = 10, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: best_model$residuals
## X-squared = 4.0278, df = 10, p-value = 0.9461
Choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? Compare the results with what you would obtain using ETS() (with no transformation).
arima_forecast <- forecast(best_model, h = 10) # Forecast 10 periods ahead
plot(arima_forecast, main = "Forecast from Best ARIMA Model")
# Forecast summary for inspection
print(arima_forecast)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2018 19978.52 19880.74 20076.30 19828.98 20128.06
## 2019 20167.39 19991.45 20343.33 19898.32 20436.46
## 2020 20355.21 20109.62 20600.80 19979.61 20730.80
## 2021 20541.98 20227.58 20856.37 20061.15 21022.80
## 2022 20727.71 20343.36 21112.06 20139.90 21315.52
## 2023 20912.41 20456.20 21368.61 20214.70 21610.11
## 2024 21096.08 20565.82 21626.34 20285.11 21907.05
## 2025 21278.73 20672.09 21885.37 20350.95 22206.51
## 2026 21460.36 20774.99 22145.73 20412.17 22508.54
## 2027 21640.98 20874.55 22407.41 20468.82 22813.14
# Fit an ETS model for comparison. Note that us_gdp_ts is the Box-Cox
# transformed series, so the exercise's 'no transformation' comparison
# would strictly use the raw GDP series instead.
ets_model <- ets(us_gdp_ts)
ets_forecast <- forecast(ets_model, h = 10)
plot(ets_forecast, main = "Forecast from ETS Model")
# Display both ARIMA and ETS forecast results for comparison
print(ets_forecast)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2018 19972.82 19821.34 20124.30 19741.14 20204.49
## 2019 20170.52 19912.32 20428.71 19775.64 20565.39
## 2020 20368.22 19996.79 20739.65 19800.17 20936.27
## 2021 20565.92 20072.20 21059.64 19810.84 21321.00
## 2022 20763.62 20138.22 21389.03 19807.14 21720.10
## 2023 20961.33 20194.98 21727.67 19789.29 22133.36
## 2024 21159.03 20242.75 22075.31 19757.70 22560.35
## 2025 21356.73 20281.81 22431.65 19712.78 23000.67
## 2026 21554.43 20312.43 22796.42 19654.96 23453.90
## 2027 21752.13 20334.86 23169.40 19584.60 23919.66
# Plot ARIMA and ETS forecasts together for direct comparison
plot(arima_forecast, main = "ARIMA vs ETS Forecast Comparison")
lines(ets_forecast$mean, col = "blue", lty = 2) # ETS forecast as blue dashed line
legend("topleft", legend = c("ARIMA Forecast", "ETS Forecast"), col = c("black", "blue"), lty = c(1, 2))
Both the ARIMA and ETS models capture the upward trend of the GDP data, and their point forecasts are nearly identical over the 10-year horizon. The main difference is uncertainty: the ETS prediction intervals are noticeably wider than the ARIMA intervals, so the ARIMA model expresses more confidence in the long-term growth pattern it has extracted from the historical data.