Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
knitr::include_graphics("C:/Users/dkbs0/OneDrive/Desktop/Data 624/Week 7/Figure 9.32 .png")
Left Plot (36 Numbers): This ACF plot exhibits considerable variability, with several spikes exceeding the confidence intervals represented by the dashed blue lines. Due to the limited sample size of 36 observations, identifying a clear pattern is more challenging, and random fluctuations may be more pronounced. Some lag values may appear significant even if the data is white noise, simply due to the small sample.
Middle Plot (360 Numbers): This plot appears more stable than the left one, with most autocorrelation values near zero and within the confidence intervals. The larger sample size of 360 observations provides a more accurate representation of the underlying white noise process, resulting in fewer false signals of autocorrelation.
Right Plot (1,000 Numbers): This plot shows exceptional stability, with nearly all points close to zero and well within the confidence intervals. The substantial sample size of 1,000 observations further reinforces the indication that the data is white noise. The randomness is more evenly distributed, reducing the occurrence of spurious spikes in autocorrelation.
White Noise for 3 plots: In all three plots, despite some apparent spikes in the smaller samples (left and middle), the data likely reflects white noise overall, as the ACF values remain close to zero without any systematic autocorrelation pattern. The deviations observed in the left plot are probably due to the limited sample size.
As the sample size increases, the ACF plots smooth out and the autocorrelation values consistently approach zero, indicating a stronger alignment with the characteristics of white noise.
The critical values (confidence intervals) vary in distance from the mean of zero due to the influence of sample size on their width. For smaller samples (e.g., 36 numbers), confidence intervals are wider, reflecting greater uncertainty in autocorrelation estimates. Conversely, larger samples (e.g., 360 or 1,000 numbers) produce narrower intervals, indicating more reliable estimates. This relationship is described by the formula ±1.96/sqrt(N), where N is the number of observations. Thus, as the sample size increases, the critical values get closer to zero.
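These bounds are easy to verify directly; the short sketch below simply evaluates ±1.96/sqrt(N) for the three sample sizes used in the figure.
# approximate 95% critical values for the ACF of white noise at each sample size
n <- c(36, 360, 1000)
data.frame(n = n, bound = round(1.96 / sqrt(n), 3))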
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(ggplot2)
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.2 ✔ fable 0.4.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(urca)
library(ggfortify)
## Registered S3 methods overwritten by 'ggfortify':
## method from
## autoplot.Arima forecast
## autoplot.acf forecast
## autoplot.ar forecast
## autoplot.bats forecast
## autoplot.decomposed.ts forecast
## autoplot.ets forecast
## autoplot.forecast forecast
## autoplot.stl forecast
## autoplot.ts forecast
## fitted.ar forecast
## fortify.ts forecast
## residuals.ar forecast
library(dplyr)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
#head(gafa_stock)
am_stock <- gafa_stock %>%
filter(Symbol == "AMZN") %>%
dplyr::select(Date, Close)
am_stock %>%
autoplot(Close) +
ggtitle("AMZN Daily prices")
# ACF
am_stock %>%
ACF(Close) %>%
autoplot() + ggtitle("ACF: AMZN Stock Prices")
#PACF
am_stock %>%
PACF(Close) %>%
autoplot() + ggtitle("PACF: AMZN Stock Prices")
Plot of Daily Closing Prices:
The closing prices trend strongly upward over most of the sample period. This trend means the mean of the series changes over time, which is a hallmark of non-stationarity.
ACF Plot: The ACF decays very slowly, with large, significant autocorrelations persisting across many lags. This shows that past values have a persistent influence on future values, which is characteristic of a non-stationary series.
PACF Plot: The PACF has a dominant spike near 1 at lag 1 and little beyond it, reinforcing the idea that the series behaves like a random walk and does not revert to a constant mean.
The trend in the price plot, the slow ACF decay, and the large lag-1 PACF spike together suggest that the Amazon stock prices are non-stationary and should be differenced to stabilize the mean before further analysis.
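As an optional follow-up, differencing the closing prices and re-examining the ACF shows what differencing buys us: the autocorrelations of the differenced series should drop back towards zero.
# first-difference the closing prices and look at the ACF again
am_stock %>%
  mutate(diff_close = difference(Close)) %>%
  ACF(diff_close) %>%
  autoplot() + ggtitle("ACF: Differenced AMZN Close")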
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
#filter Turkish GDP, take a look
turkey_econ <- global_economy %>%
filter(Country == 'Turkey')
autoplot(turkey_econ)
## Plot variable not specified, automatically selected `.vars = GDP`
# find lambda and transform
BoxCox.lambda(turkey_econ$GDP)
## [1] 0.1571804
#box cox
turkey_econ %>%
autoplot(BoxCox(GDP, BoxCox.lambda(turkey_econ$GDP)), colour = "blue")
The Box-Cox transformation (lambda = 0.157) reduces the growth in variation and makes the relationship between GDP and time more nearly linear. Let's use ndiffs to find the number of differences needed to achieve stationary data.
ndiffs(turkey_econ$GDP)
## [1] 2
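Since ndiffs() above was run on the untransformed GDP, a rough cross-check is to run the unit-root differencing feature on the Box-Cox transformed series (using the lambda found above); the suggested order may or may not match.
# unit-root based differencing suggestion on the transformed series
turkey_econ %>%
  features(box_cox(GDP, 0.1571804), unitroot_ndiffs)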
#Accommodation takings in the state of Tasmania from aus_accommodation.
tas_acc <- aus_accommodation %>%
filter(State == 'Tasmania')
tas_acc %>% autoplot(Takings, colour="#00008B")
#lambda
BoxCox.lambda(tas_acc$Takings)
## [1] -0.005076712
#box-cox & plot
tas_acc %>%
autoplot(BoxCox(Takings, BoxCox.lambda(tas_acc$Takings)), colour="#00008B")
#differences
ndiffs(tas_acc$Takings)
## [1] 1
The Box-Cox transformation (lambda = -0.0051) effectively stabilizes variability and creates a more linear relationship between Takings and time. The differencing analysis indicates that only one difference is needed to achieve stationarity.
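Because Takings is quarterly and strongly seasonal, it is also worth checking whether a seasonal difference is suggested; the unitroot_nsdiffs feature from feasts does this on the transformed series.
# check whether a seasonal (lag-4) difference is recommended
tas_acc %>%
  features(box_cox(Takings, -0.005076712), unitroot_nsdiffs)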
#Monthly sales from souvenirs.
souvenirs %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Sales`
#lambda
BoxCox.lambda(souvenirs$Sales)
## [1] -0.2444328
#plot boxcox and compare to above image
souvenirs %>%
  autoplot(BoxCox(Sales, BoxCox.lambda(souvenirs$Sales)))
ndiffs(souvenirs$Sales)
## [1] 1
With the transformation applied, a lambda of roughly -0.24 acts much like a log: the seasonal peaks become far more even in size than in the original plot, and one difference is indicated to reach stationarity.
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(200)
aus_series <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
aus_series %>%
gg_tsdisplay(Turnover, plot_type = 'partial', lag_max = 36) +
labs(title= "Australian retail turnover", y = NULL)
aus_series %>%
transmute(
`Turnover` = Turnover,
`log(Turnover)` = log(Turnover),
`log(Turnover) | Ann. Change` = difference(log(Turnover), 12),
`DD log(Turnover)` =
difference(difference(log(Turnover), 12), 1)
)%>%
pivot_longer(-Month, names_to="Type", values_to="Turnover") %>%
mutate(
Type = factor(Type, levels = c(
"Turnover",
"log(Turnover)",
"log(Turnover) | Ann. Change",
"DD log(Turnover)"))
) %>%
ggplot(aes(x = Month, y = Turnover)) +
geom_line() +
facet_grid(vars(Type), scales = "free_y") +
labs(title= "Monthly Australian retail turnover", y = NULL)
Overall, the turnover series shows an upward trend with strong seasonal fluctuations, so it is clearly not stationary. The log transformation stabilizes the growing seasonal variance while the upward trend remains; the seasonal (lag-12) difference removes most of the trend and seasonality, and the additional first difference leaves a series that looks stationary.
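As a quick visual confirmation, gg_tsdisplay() of the doubly differenced log series should show no remaining trend and mostly insignificant autocorrelations.
# display the seasonally differenced then first-differenced log turnover
aus_series %>%
  gg_tsdisplay(difference(difference(log(Turnover), 12), 1),
               plot_type = "partial", lag_max = 36) +
  labs(title = "Doubly differenced log(Turnover)", y = NULL)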
Simulate and plot some data from simple ARIMA models.
#a. Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ^2 =1. The process starts with y1=0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
head(sim)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 0.433
## 3 3 0.818
## 4 4 0.550
## 5 5 0.216
## 6 6 -0.891
#b. Produce a time plot for the series. How does the plot change as you change ϕ1?
sim %>% autoplot(y) +
labs(title = expression("AR(1) model with" ~ phi[1] ~ "=" ~ 0.6 ~ "," ~ sigma^2 ~ "=" ~ 1 ~ "," ~ y[1] ~ "=" ~ 0))
set.seed(432)
y2 <- numeric(100)
y3 <- numeric(100)
e2 <- rnorm(100)
e3 <- rnorm(100)
for(i in 2:100){
y2[i] <- 0.1*y2[i-1] + e2[i]
y3[i] <- 0.9*y3[i-1] + e3[i]
}
sim <- tsibble(idx = seq_len(100), y = y, y2 = y2, y3 = y3, index = idx)
plot1 <- sim %>% autoplot(y2) + labs( title = "Phi = 0.1")
plot2 <- sim %>% autoplot(y) + labs( title = "Phi = 0.6")
plot3 <- sim %>% autoplot(y3) + labs( title = "Phi = 0.9")
grid.arrange(plot1, plot2, plot3, nrow = 2)
As ϕ1 increases toward 1, the series becomes smoother and wanders further from zero, with larger swings and longer cycles; as ϕ1 decreases toward 0, the series looks more and more like white noise.
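The effect of ϕ1 is even clearer in the ACFs: AR(1) autocorrelations decay geometrically, so they die out almost immediately when ϕ1 = 0.1 but persist over many lags when ϕ1 = 0.9. A small optional sketch:
# compare sample ACFs of the two extreme AR(1) simulations
acf_low <- sim %>% ACF(y2) %>% autoplot() + labs(title = "ACF, phi = 0.1")
acf_high <- sim %>% ACF(y3) %>% autoplot() + labs(title = "ACF, phi = 0.9")
grid.arrange(acf_low, acf_high, ncol = 2)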
#c: Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ^2 = 1.
set.seed(400)
y4 <- numeric(100)
e4 <- rnorm(100)
for(i in 2:100)
y4[i] <- 0.6*e4[i-1] + e4[i]
sim2 <- tsibble(idx = seq_len(100), y4 = y4, index = idx)
head(sim2)
## # A tsibble: 6 x 2 [1]
## idx y4
## <int> <dbl>
## 1 1 0
## 2 2 -0.00665
## 3 3 1.84
## 4 4 0.201
## 5 5 -1.01
## 6 6 -1.71
#d:Produce a time plot for the series. How does the plot change as you change θ1?
set.seed(400)
y5 <- numeric(100)
y6 <- numeric(100)
e5 <- rnorm(100)
e6 <- rnorm(100)
for(i in 2:100){
y5[i] <- 0.1*e5[i-1] + e5[i]
y6[i] <- 0.9*e6[i-1] + e6[i]
}
sim2 <- tsibble(idx = seq_len(100), y4 = y4, y5 = y5, y6 = y6, index = idx)
plot4 <- sim2 %>% autoplot(y5) + labs( title = "Theta = 0.1")
plot5 <- sim2 %>% autoplot(y4) + labs( title = "Theta = 0.6")
plot6 <- sim2 %>% autoplot(y6) + labs( title = "Theta = 0.9")
grid.arrange(plot4, plot5, plot6, nrow = 2)
As θ1 changes, the time plots do not vary much: the variance, the range, and the length of the swings stay fairly similar. This is expected, because an MA(1) process only carries the previous shock forward for one period, so changing θ1 mainly affects how alike consecutive observations are rather than the overall shape of the series.
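One place where θ1 does make a visible difference is the ACF: the theoretical ACF of an MA(1) process cuts off after lag 1, and the size of that single lag-1 spike grows with θ1. A quick optional check:
# sample ACF of the MA(1) simulation with theta = 0.9; only lag 1 should stand out
sim2 %>% ACF(y6) %>% autoplot() + labs(title = "ACF of simulated MA(1), theta = 0.9")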
#e: Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1
# reuse e from part a; y[1] stays at 0
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
sim_ARMA <- tsibble(idx = seq_len(100), y = y, index = idx)
head(sim_ARMA)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 0.568
## 3 3 1.16
## 4 4 1.09
## 5 5 0.575
## 6 6 -0.744
#f: Generate data from an AR(2) model with ϕ1=−0.8,ϕ2=0.3 and σ^2=1. (Note that these parameters will give a non-stationary series.)
# reuse y and e again; the recursion starts at i = 3 because AR(2) needs two lagged values
p_1 <- -0.8
p_2 <- 0.3
for(i in 3:100)
y[i] <- p_1*y[i-1] + p_2*y[i-2] + e[i]
AR_sim <- tsibble(idx = seq_len(100), y = y, index = idx)
#g: Graph the latter two series and compare them.
plt_ARMA <- sim_ARMA %>% autoplot(y)+
labs(title = expression("ARMA:" ~ phi[1] ~ "=" ~ 0.6 ~ "," ~ theta[1] ~ "=" ~ 0.6 ~ "," ~ sigma^2 ~ "=" ~ 1))
plt_AR <- AR_sim %>% autoplot(y)+
labs(title = expression("AR(2):" ~ phi[1] ~ "=" ~ -0.8 ~ "," ~ phi[2] ~ "=" ~ 0.3 ~ "," ~ sigma^2 ~ "=" ~ 1))
gridExtra::grid.arrange(plt_ARMA, plt_AR, ncol = 1, nrow = 2)
These outputs reveal clear differences. The ARMA(1,1) series is stationary, fluctuating around zero with no trend, while the AR(2) series with ϕ1 = -0.8 and ϕ2 = 0.3 is non-stationary: it oscillates in sign with an amplitude that grows sharply in the later indices. Seeing a stationary and a non-stationary simulation side by side highlights why non-stationarity must be removed (for example by differencing) before fitting ARIMA models.
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
* a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
* b. Write the model in terms of the backshift operator.
* c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
* d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
* e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
#a Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers~pdq(0,1,0)))
fit %>% forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(y = "passengers", title = "Australia: Air Travel")
fit %>% gg_tsresiduals()
report(fit)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
I selected the ARIMA(0,1,0) model, which is fitted with drift. Judging by the residual ACF from gg_tsresiduals(), the residuals resemble white noise: no autocorrelation spikes fall outside the significance bounds.
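As an optional extra check, a Ljung-Box test on the innovation residuals backs up the visual impression; a large p-value is consistent with white-noise residuals.
# portmanteau test on the innovation residuals of the fitted model
augment(fit) %>%
  features(.innov, ljung_box, lag = 10)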
#b: Write the model in terms of the backshift operator.
For ARIMA(0,1,0) with drift the backshift form is (1 − B)Yt = 1.4191 + εt, which expands to Yt = Yt−1 + 1.4191 + εt.
#c: Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
No separate comparison is needed here, since the ARIMA(0,1,0) with drift is exactly the model fitted and forecast in part a.
#d: Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit_2 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(2,1,2) + PDQ(0,0,0) + 1))
report(fit_2)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
#remove the constant (fitting a reduced ARIMA(0,1,2) specification without a constant)
fit_2.1 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(0,1,2) + PDQ(0,0,0) + 0))
report(fit_2.1)
## Series: Passengers
## Model: ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## 0.1555 0.2913
## s.e. 0.1402 0.1429
##
## sigma^2 estimated as 5.613: log likelihood=-104.02
## AIC=214.05 AICc=214.62 BIC=219.53
Comparing the reports, the ARIMA(2,1,2) with drift achieves a slightly smaller sigma^2 than the ARIMA(0,1,0) with drift from parts a and c, but a higher AIC (204.46 versus 200.31). With the constant removed and the reduced ARIMA(0,1,2) specification, the fit deteriorates further: sigma^2 rises to 5.613 and AIC to 214.05.
#e: Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit3 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(0,2,1)))
report(fit3)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit3 %>% forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(y = "Passengers", title = "Travel: AUS")
This is arguably the best-fitting model of those tried here, with the lowest AIC (198.04).
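As an optional side-by-side sketch, the drift models from parts a and d and the ARIMA(0,2,1) from part e can be fitted in a single mable and their 10-period forecasts overlaid (the model names below are just labels):
# fit the three candidate models together and compare their forecasts
aus_airpassengers %>%
  model(
    arima010_drift = ARIMA(Passengers ~ 1 + pdq(0,1,0)),
    arima212_drift = ARIMA(Passengers ~ pdq(2,1,2) + PDQ(0,0,0) + 1),
    arima021 = ARIMA(Passengers ~ pdq(0,2,1))
  ) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers, level = NULL) +
  labs(y = "Passengers", title = "Forecast comparison")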
For the United States GDP series (from global_economy):
* a. if necessary, find a suitable Box-Cox transformation for the data;
* b. fit a suitable ARIMA model to the transformed data using ARIMA();
* c. try some other plausible models by experimenting with the orders chosen;
* d. choose what you think is the best model and check the residual diagnostics;
* e. produce forecasts of your fitted model. Do the forecasts look reasonable?
* f. compare the results with what you would obtain using ETS() (with no transformation).
#a if necessary, find a suitable Box-Cox transformation for the data;
usa_econ <- global_economy %>%
filter(Code == "USA")
# Plotting US GDP with actual values in trillions
usa_econ %>%
autoplot(GDP) +
labs(title = "US GDP",
y = "GDP (in Trillions of USD)") +
scale_y_continuous(labels = function(x) paste0(x / 1e12, " T")) + # Format y-axis labels in trillions
theme_minimal()
The series trends upward smoothly with no seasonality and no obvious changes in variability around the trend, so I proceed without a transformation.
#b fit a suitable ARIMA model to the transformed data using ARIMA();
fit <- usa_econ %>%
model(ARIMA(GDP))
report(fit)
## Series: GDP
## Model: ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -0.4206 -0.3048
## s.e. 0.1197 0.1078
##
## sigma^2 estimated as 2.615e+22: log likelihood=-1524.08
## AIC=3054.15 AICc=3054.61 BIC=3060.23
ARIMA() automatically selected an ARIMA(0,2,2) model.
#c try some other plausible models by experimenting with the orders chosen;
ndiffs(usa_econ$GDP)
## [1] 2
The number of differences, 2, justifies keeping d=2 in our next ARIMA selections.
fit_other1<- usa_econ %>%
model(ARIMA(GDP ~ pdq(1,2,2)))
fit_other2<- usa_econ %>%
model(ARIMA(GDP ~ pdq(2,2,0)))
fit_other3<- usa_econ %>%
model(ARIMA(GDP ~ pdq(2,2,2)))
report(fit_other1)
## Series: GDP
## Model: ARIMA(1,2,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.2053 -0.5912 -0.1928
## s.e. 0.3008 0.2886 0.2102
##
## sigma^2 estimated as 2.646e+22: log likelihood=-1523.86
## AIC=3055.72 AICc=3056.51 BIC=3063.82
report(fit_other2)
## Series: GDP
## Model: ARIMA(2,2,0)
##
## Coefficients:
## ar1 ar2
## -0.2088 -0.2059
## s.e. 0.1320 0.1317
##
## sigma^2 estimated as 2.984e+22: log likelihood=-1527.51
## AIC=3061.02 AICc=3061.48 BIC=3067.09
report(fit_other3)
## Series: GDP
## Model: ARIMA(2,2,2)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 1.3764 -0.4780 -1.9659 1.0000
## s.e. 0.1216 0.1354 0.0723 0.0719
##
## sigma^2 estimated as 2.283e+22: log likelihood=-1521.14
## AIC=3052.27 AICc=3053.47 BIC=3062.4
Comparing the reports, lowering the MA order (q) weakens the fit and raises the AIC: ARIMA(2,2,0) has the highest AIC (3061.02), while ARIMA(2,2,2) has the lowest (3052.27) of the models tried.
#d choose what you think is the best model and check the residual diagnostics;
fit_other3 %>% gg_tsresiduals()
The residual diagnostics suggest the ARIMA(2,2,2) model is adequate, so I'll move on to producing forecasts.
#e produce forecasts of your fitted model. Do the forecasts look reasonable?
fit_other3 %>% forecast(h=10) %>%
autoplot(usa_econ) +
labs(y = "GDP")+
scale_y_continuous(labels = scales::comma) + # Format y-axis labels as comma-separated
theme_minimal()
The forecasts track with existing annual trends.
#f compare the results with what you would obtain using ETS() (with no transformation).
ets_comparison1 <- usa_econ %>%
model(ETS(GDP))
ets_comparison1 %>% forecast(h = 10) %>%
autoplot(usa_econ)
ets_comparison1 %>% gg_tsresiduals()
After comparing my model and the ETS fit, I think ARIMA(2,2,2) performs a bit better.
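Because AIC values are not comparable across ARIMA and ETS model classes, a rough way to compare them is through training-set accuracy measures computed from a single mable:
# fit both models together and compare training-set accuracy
usa_econ %>%
  model(
    arima222 = ARIMA(GDP ~ pdq(2,2,2)),
    ets = ETS(GDP)
  ) %>%
  accuracy()
Training-set RMSE and MAE only give a rough indication; a hold-out set or time series cross-validation would be a stronger comparison.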