a. Explain the differences among
these figures. Do they all indicate that the data are white noise?
They do all indicate the data is white noise. Although it might be hard to tell on the right one since its showing series of 1,000. The main difference between these 3 is the number of samples present, with less numbers the ACF estimate will seem to have a higher variance especially in comparision to the middle and right visual.
The the blue lines, or critical values are smaller and closer to zero with more data, and wider in the first visual with less data of 36 numbers. The true autocorrelation for a white noise process is zero, so its different in each in each figure due to the change in sample size.
The autplot, on top, shows tha=e variance changing and that there is a trend, which makes the series non-stationary. The ACF plot is decreasing very slowly which indicates that it is non-staionary. Additionally, the PACF shows a large autocorrelation on lag 1 and then drops quickly which indicates that the series is non-stationary. In order to make a non-stationary series stationary, I can transform the series to stabilize the variance, and might still need to complete differencing to stabilize the mean. The order of differencing needed can be found using the KPSS test.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.0
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
gafa_stock %>% filter(Symbol == "AMZN") %>%
gg_tsdisplay(Close, plot_type='partial') +
labs(title = "Amazon Closing Price")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
3. For the following series, find an appropriate Box-Cox transformation
and order of differencing in order to obtain stationary data.
EDIT
library(latex2exp)
turkey <- global_economy %>%
filter(Country == "Turkey")
lambda <- global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
turkey_transformed <- box_cox(turkey$GDP, lambda)
turkey_tsibble <- turkey %>%
as_tsibble(index = Year, key = Country) %>%
mutate(Transformed_GDP = turkey_transformed)
gg_tsdisplay(turkey_tsibble, GDP, plot_type = 'partial') +
labs(title = "Original Turkish GDP")
gg_tsdisplay(turkey_tsibble, plot_type = 'partial') +
labs(title = TeX(paste0("Transformed Turkish GDP with $\\lambda$ = ",
round(lambda, 2))))
## Plot variable not specified, automatically selected `y = GDP`
#still needs to be differenced
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
turkey_tsibble %>%
features(Transformed_GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
turkey_tsibble %>%
mutate(Differenced_GDP = Transformed_GDP %>% difference(1)) %>%
gg_tsdisplay(Differenced_GDP, plot_type = 'partial')
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Tasmania <- aus_accommodation %>%
filter(State == "Tasmania")
gg_tsdisplay(Tasmania, Takings, plot_type = 'partial') +
labs(title = "Original Tasmania Takings")
Tasmania %>%
features(Takings, unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
Tasmania %>%
features(Takings, unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
lambda <- Tasmania %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
t_transformed <- box_cox(Tasmania$Takings, lambda)
# lag 4 = seasonal
Tasmania %>% gg_tsdisplay(difference(t_transformed, 4), plot_type='partial') +
labs(title = TeX(paste0("Seasonal Differenced Tasmania with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
#still needs one more order of difference
Tasmania %>%
gg_tsdisplay(difference(difference(t_transformed, 4)), plot_type = 'partial') +
labs(title = TeX(paste0("Seasonal + one order Differenced Tasmania with $\\lambda$ = ", round(lambda, 2))))
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).
** does it matter if unitroot_nsdiffs if done before or after
transformation? ** internet says after transformation
souvenirs %>%
gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Monthly Souvenir Sales")
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
souvenirs %>%
features(box_cox(Sales,lambda), unitroot_nsdiffs) #one seasonal differencing
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
souvenirs %>%
features(box_cox(Sales,lambda), unitroot_ndiffs) #one differencing
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
souvenirs %>%
gg_tsdisplay(difference(difference(box_cox(Sales,lambda), 12)), plot_type='partial', lag = 36) +
labs(title = TeX(paste0("Differenced Monthly Souvenir Sales with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) #monthly
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Retail")
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries <- myseries %>%
mutate(Transformed_Turnover = box_cox(Turnover, lambda))
myseries %>%
features(Transformed_Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
myseries %>%
features(Transformed_Turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
myseries %>%
gg_tsdisplay(difference(difference(box_cox(Turnover,lambda), 12)), plot_type='partial', lag = 36) +
labs(title = TeX(paste0("Differenced Monthly Souvenir Sales with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
6. Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Changing the parameter changes the patterns in the series. Value are required to be under 1 to be considered stationary. Negative values of phi cause the series to oscillates around zero, on negative and positive values. Plot with phi of 0.9 shows a more clear trend; it is close to 1 which is typically random walk. Phi of 0.1 is closes to white noise, this plot has more fluctuations and change quickly from previous value. When phi = 0.6 it is more moderate compared to .1 and more volatile compared to .9.
set.seed(123)
sim %>% autoplot(.vars=y)+
labs(title = "AR(1) phi = 0.6")
for(i in 2:100)
y[i] <- 0.1*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim1 %>% autoplot(.vars=y)+
labs(title = "AR(1) phi = 0.1")
for(i in 2:100)
y[i] <- 0.9*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 %>% autoplot(.vars=y)+
labs(title = "AR(1) phi = 0.9")
for(i in 2:100)
y[i] <- -0.6*y[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim3 %>% autoplot(.vars=y)+
labs(title = "AR(1) phi = -.6")
e <- rnorm(100, mean = 0, sd = 1) #variance =1
# MA(1)
for (i in 2:100) {
y[i] <- e[i] + 0.6 * e[i - 1] # θ1 = 0.6
}
sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim4 %>% autoplot(.vars=y)+
labs(title = "MA(1) pheta = 0.6")
Changing pheta has almost the opposite effect as phi; the plot with 0.8 has more fluctuations as the previous error terms has more influence. When pheta is .2 the plot has more fluctuations and changes quickly. Similarly negative value of pheta also has an oscillatory pattern as seen in the plot of pheta = -.5.
for (i in 2:100) {
y[i] <- e[i] + 0.2 * e[i - 1] }
sim5 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim5 %>% autoplot(.vars=y)+
labs(title = "MA(1) pheta = 0.2")
for (i in 2:100) {
y[i] <- e[i] + 0.8 * e[i - 1] }
sim6 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim6 %>% autoplot(.vars=y)+
labs(title = "MA(1) pheta = 0.8")
for (i in 2:100) {
y[i] <- e[i] + -0.5 * e[i - 1] }
sim7 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim7 %>% autoplot(.vars=y)+
labs(title = "MA(1) pheta = -.5")
e. Generate data from an ARMA(1,1) model with 𝜙1=0.6, 𝜃1=0.6, and
𝜎^2=1.
arma11 <- function(phi, theta, n = 100, sigma = 1) {
e <- rnorm(n, mean = 0, sd = sigma)
y <- numeric(n)
for (i in 2:n) {
y[i] <- phi * y[i - 1] + e[i] + theta * e[i - 1]
}
tsibble(idx = seq_len(n), y = y, index = idx)
}
phi1 <- 0.6
theta1 <- 0.6
arma_series <- arma11(phi1, theta1)
ar2 <- function(phi1, phi2, n = 100, sigma = 1) {
e <- rnorm(n, mean = 0, sd = sigma)
y <- numeric(n)
for (i in 3:n) { # 2 phetas means start at 3rd
y[i] <- phi1 * y[i - 1] + phi2 * y[i - 2] + e[i]
}
tsibble(idx = seq_len(n), y = y, index = idx)
}
phi1 <- -0.8
phi2 <- 0.3
ar2_series <- ar2(phi1, phi2)
ARMA (1,1) has residuals that appear to be white noise; the autoplot look like its stationary, with no trends, and fluctuates between 4 and -2. The AR(2) plot has data on zero about half the plot then it begins to exponentially increases and decreases. AR(2) does not appear to be stationary, ACF doesnt show white noise and oscillates, the PACF only has one lag present. Something seems off and this is because ϕ2-ϕ1 =0.3−(−0.8) = 1.1 and is supposed to be under 1 for AR to be stationary.
arma_series%>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = TeX("ARMA(1,1) with ϕ1 = 0.6 and θ1 = 0.6"))
ar2_series %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = TeX("AR(2) with ϕ1 = -0.8 and ϕ2 = 0.3"))
The ARIMA (0,2,1) was selected. The residuals in both the ACF and PACF look like white noise.
fit <- aus_airpassengers |>
filter(Year < 2008) %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8810
## s.e. 0.0795
##
## sigma^2 estimated as 2.464: log likelihood=-67.55
## AIC=139.1 AICc=139.47 BIC=142.27
residuals <- fit %>% residuals()
gg_tsdisplay(residuals, plot_type = "partial") +
labs(title = "Residuals of the ARIMA Model")
## Plot variable not specified, automatically selected `y = .resid`
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
b. Write the model in terms of the backshift operator.
\[ y_t = -0.87 \epsilon_{t-1} + \epsilon_t \]
\[ (1 - B)^2 y_t = (1 - 0.87 B) \epsilon_t \]
Model in a: ARIMA(2,1,0) has a better looking forecast than the ARIMA (0,1,0) in c, which looks like the forecasts fall below the actual points, or the prediction interval is too narrow. The ACF and PACF residuals similar but ARIMA(2,1,0) has more negative lags and this model ARIMA (0,1,0) has more positive lags.
fit2 <-aus_airpassengers %>%
filter(Year < 2008) %>%
model(ARIMA(Passengers ~ pdq(0,1,0) +1 )) # Passengers ~ drift() or +1
fit2 %>%
forecast(h=15) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")
residuals <- fit2 %>% residuals()
gg_tsdisplay(residuals, plot_type = "partial") +
labs(title = "Residuals for ARIMA(0,1,0) + drift")
## Plot variable not specified, automatically selected `y = .resid`
The lags in this model are much shorter than a and c. The residuals also suggest this model is stationary. When removing the constant, errors come up and then residuals appear to have NA values preventing plotting. Removing the constant has many potential impact- on the mean, and for example, removing the constant results in a trend of order 0.
fit3 <-aus_airpassengers %>%
filter(Year < 2008) %>%
model(ARIMA(Passengers ~ pdq(2,1,2) +1 )) # Passengers ~ drift() or +1
fit3 %>%
forecast(h=15) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")
residuals <- fit3 %>% residuals()
gg_tsdisplay(residuals, plot_type = "partial") +
labs(title = "Residuals for ARIMA(2,1,2) + drift")
## Plot variable not specified, automatically selected `y = .resid`
sum(is.na(aus_airpassengers))
## [1] 0
fit3 <- aus_airpassengers %>%
filter(Year < 2008) %>%
model(ARIMA(Passengers ~ pdq(2,1,2) + 0)) # Using +0 instead of +1
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS
residuals <- fit3 %>% residuals()
sum(is.na(residuals))
## [1] 38
if (all(!is.na(residuals))) {
gg_tsdisplay(residuals, plot_type = "partial") +
labs(title = "Residuals for ARIMA(2,1,2) without constant")
} else {
message("Residuals contain NA values; cannot plot.")
}
## Residuals contain NA values; cannot plot.
There is a warning: “Model specification induces a quadratic or higher order polynomial trend”.The Forecast looks like a better match than some of the other models. The slope also changes a bit. This is the auto model R gave us in a but with the constant added.
fit3 <-aus_airpassengers %>%
filter(Year < 2008) %>%
model(ARIMA(Passengers ~ pdq(0,2,1) +1 )) # Passengers ~ drift() or +1
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit3 %>%
forecast(h=15) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Passengers with ARIMA(0,2,1))", y = "Passengers (in millions)")
residuals <- fit3 %>% residuals()
gg_tsdisplay(residuals, plot_type = "partial") +
labs(title = "Residuals for ARIMA(0,2,1)) + drift")
## Plot variable not specified, automatically selected `y = .resid`
us_gdp <- global_economy %>% filter(Country == "United States")
gg_tsdisplay(us_gdp, plot_type = "partial") #not stationary
## Plot variable not specified, automatically selected `y = GDP`
a. if necessary, find a suitable Box-Cox transformation for the
data;
lambda <-us_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
us_gdp_transformed <- us_gdp %>%
mutate(Transformed_GDP = box_cox(GDP, lambda))
us_gdp_transformed %>%
features(Transformed_GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
us_gdp_transformed <- us_gdp_transformed %>%
mutate(Differenced_GDP = difference(Transformed_GDP))
best_fit <- us_gdp_transformed %>%
model(auto_arima = ARIMA(Differenced_GDP))
report(best_fit)
## Series: Differenced_GDP
## Model: ARIMA(1,0,0) w/ mean
##
## Coefficients:
## ar1 constant
## 0.4586 118.1789
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5381: log likelihood=-325.32
## AIC=656.65 AICc=657.09 BIC=662.83
usa_fit <- global_economy %>%
filter(Country == "United States") %>%
model(arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))
glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 3 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima120 6780. -326. 656. 656. 660.
## 2 arima111 5580. -325. 659. 659. 667.
## 3 arima210 5580. -325. 659. 659. 667.
# the AICc value of a model from c, plausible models, is lower than the autoplot
usa_fit %>%
select(arima120) %>%
gg_tsresiduals() +
ggtitle("Residuals for ARIMA(1,2,0)")
#residuals prove to be white noise and non
usa_fit <- us_gdp_transformed %>%
model(arima120 = ARIMA(box_cox(GDP, lambda) ~ pdq(1, 2, 0)))
forecasts <- usa_fit %>%
forecast(h = 8)
autoplot(forecasts, us_gdp_transformed) +
ggtitle("Forecast for US GDP with ARIMA(1,2,0)") +
labs(y = "GDP (Transformed)", x = "Year")
accuracy(usa_fit)
## # A tibble: 1 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United St… arima… Trai… -1.84e9 1.74e11 1.05e11 0.130 1.64 0.308 0.427 0.0397
Visually the forecast for the ARIMA model has a larger slope than ETS. The AICc cannot be be used to compare the models since ARIMA has differencing and ETS does not. The RMSE value is needed. However, I’m unsure why the RMSE values for both models are so larger, I may have made an error. Actually, I did not invert the scale bak after transfroming the data for the AIRMA model but that does not explain it doe ETS since there was no transformation. Based on these results though the ETS model has the lower RMSE value. There is a lot more improvements I can make but I encounter more errors as I tried. So i will just submit this and hope for the best.
ets_fit <- us_gdp %>%
model(ETS_Model = ETS(GDP))
ets_forecasts <- ets_fit %>%
forecast(h = 8)
autoplot(us_gdp, GDP) +
autolayer(ets_forecasts, series = "ETS Forecasts") +
ggtitle("ETS Forecasts for US GDP")
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
accuracy(ets_fit)
## # A tibble: 1 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United Sta… ETS_M… Trai… 2.10e10 1.67e11 1.05e11 0.484 1.91 0.307 0.409 0.153