#Q1-A/B
#In the 3 examples of our ACF graphs, they all show a white noise pattern as there is no decay pattern. The bounds of each graph are limited by the function used to calculate bounds being 1/sqrt(T) where T is our sample size. As long as our ACF falls within the absolute value of this function, we consider it noise. In these graphs we see as the bounds shrink the function still fall within our bounds implying they are simply noise. The critical values change for this same reason as the sample size is in the denominator of our function for calculating the standard error. This also brings about the idea that there is a diminishing return on the error as sample size grows. For the values given the difference is substantial and visible, but sample size must grow exponentially to continue having the same impact. This is also why even as the sample size grows, the autocorrelations show no pattern and do not jump outside the bounds which implies that this is still just referring to white noise in each.
#Q2
amazon_stock <- gafa_stock |>
filter(Symbol == "AMZN")
#Daily closing prices
autoplot(amazon_stock, Close) +
labs(
title = "Amazon Daily Closing Prices",
x = "Date",
y = "Closing Price"
)

amazon_stock |>
mutate(diff_close = difference(Close)) |>
gg_tsdisplay(diff_close, plot_type = "partial") +
labs(title = "Differenced Amazon Closing Prices with ACF and PACF")
## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

#First as far as having having a changing mean and variance I think that is visually clear about our stock data as it does no have a long term mean. When comparing the pacf and acf graphs they are extremely similar at almost every value. With some exceptions here, most autocorrelations are within the significance dashed lines and around lags 7-20 the pacf is lower for just about every value even though they follow the same pattern. This similarity betwenn the differenced data and the differenced data behaves very similar to the noise which is also why this data after differencing is actually stationary about the mean.
#Q3
# A.Turkish GDP
#US GDP data would typically have seasonal patterns as I Imagine turkish does as well but because this dataset is yearly we don't have to consider that. Clearly from our graph there is very strong growth and increasing variance, for this reason we use a log transformation (lambda~0) and as yearly observations we will use 1 as our differencing value. Clealry by differencing our log values we get a much more stationary looking dataset.
turkey_gdp <- global_economy |>
filter(Country == "Turkey") |>
select(Year, GDP)
autoplot(turkey_gdp, GDP)

turkey_gdp |>
mutate(log_GDP = log(GDP)) |>
autoplot(log_GDP)

turkey_gdp |>
mutate(diff_log_GDP = difference(log(GDP))) |>
autoplot(diff_log_GDP)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

# B.Tasmania accommodation takings
#This dataset is very different from the previous as we can clearly see the seasonal trend showing a spike at Q1 for every year of the dataset. We have a clear growth that seems like multiplicative seasonality and growth so I would assume the log transformation again would be best. I calculated the guerrero method lambda value which came out to .0018 so the log transformation is confirmed. We then again take the differences at lag 4 for the quarterly data and get an interesting looking graph that is stationary.
tas_accom <- aus_accommodation |>
filter(State == "Tasmania") |>
mutate(Quarter = yearquarter(Date)) |>
select(Quarter, Takings)
autoplot(tas_accom, Takings)

lambda_tas <- tas_accom|>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
tas_accom |>
mutate(log_Takings = log(Takings)) |>
autoplot(log_Takings)

tas_accom |>
mutate(seas_diff = difference(log(Takings), lag = 4)) |>
autoplot(seas_diff)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

# C. Souvenirs
#This final dataset of sales data shows an interesting pattern with a big spike in December of every eyar with a big drop in January so there is a seasonal pattern. We also see a very strong pattern of growth and increasing variance which is why I assumed would be a log() transform and this is confirmed by our guerrero lambda value of .0021. I chose a lag of 12 for the yearly pattern and then we can see after log transformation we have a strong seasonal pattern For this reason I did first differencing on the seasonally differenced dataset to obtain a dataset that is stationary.
autoplot(souvenirs)
## Plot variable not specified, automatically selected `.vars = Sales`

souvenirs |>
mutate(log_sales = log(Sales)) |>
autoplot(log_sales)

souvenirs |>
mutate(seas_diff = difference(log(Sales), lag = 12)) |>
autoplot(seas_diff)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda_souv <- souvenirs|>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
souvenirs |>
mutate(final_diff = difference(difference(log(Sales), lag = 12))) |>
autoplot(final_diff)
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

#Q5
#Here is my dataset from the requested exercise. We can see a clear pattern of growth but there is also some obvious seasnoality with December spikes and January drops. This is why I again will use a log transformation and a lag of 12 similar to our Souveniers dataset from question 3. I differenced both the lag for seasonal with an additional final differencing in order to get my final stationary dataset. The ACF also shows data mostly within our bounds along with our plot being centered around a stable mean.
set.seed(111)
my_retail <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(my_retail, Turnover) +
labs(title = "Retail turnover series", y = "Turnover")

my_retail <- my_retail |>
mutate(
log_turnover = log(Turnover),
seas_diff = difference(log_turnover, lag = 12),
final_diff = difference(seas_diff)
)
autoplot(my_retail, final_diff) +
labs(
title = "Stationary transformed retail series",
y = "Differenced log turnover"
)
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

my_retail |>
ACF(final_diff, na.action = na.omit) |>
autoplot()

#Q6
#A.
set.seed(1001)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim, y) +
labs(
title = "Simulated AR(1) series with phi1 = 0.6",
x = "Time",
y = "y"
)

#B.
#Here we can see that as we decrease the phi value our data behaves more like noise as opposed to the 0.9 value which appears to have much more of a direction. Below zero we see our data more similar to the 0.2 value but behaving even more like an oscillaitng dataset since the negative causes it to turn direction more frequently.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.2*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.9*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- -0.6*y[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim1, y) + labs(title = "AR(1): phi1 = 0.2")

autoplot(sim2, y) + labs(title = "AR(1): phi1 = 0.9")

autoplot(sim3, y) + labs(title = "AR(1): phi1 = -0.6")

#C
#adjusted formula to be using e for our MA model as opposed to AR model with y[i-1]
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.6*e[i-1]
sim_ma1 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_ma1, y) +
labs(
title = "Simulated MA(1) series with theta1 = 0.6",
x = "Time",
y = "y"
)

#D
#Similarly here we see our dataset theta around zero behave more like noise, but didn't noticeably have the most variance. For our 0.9 value of theta we see the smoothest looking trends while our negative value again seems to take the most erratic turns compared to our other values.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.2*e[i-1]
sim_ma1a <- tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.9*e[i-1]
sim_ma1b <- tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] - 0.6*e[i-1]
sim_ma1c <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_ma1a, y) + labs(title = "MA(1): theta1 = 0.2")

autoplot(sim_ma1b, y) + labs(title = "MA(1): theta1 = 0.9")

autoplot(sim_ma1c, y) + labs(title = "MA(1): theta1 = -0.6")

#E.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i] + 0.6*e[i-1]
sim_arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_arma11, y) +
labs(
title = "Simulated ARMA(1,1) series",
x = "Time",
y = "y"
)

#F.
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_ar2, y) +
labs(
title = "Simulated AR(2) series",
x = "Time",
y = "y"
)

#G
#When we look at the ARMA(1,1) we see very stationary behavior from the moderate values around 0.Very similar variablity throughout but we do observe a few big turns in direction, our variance is constant throughout. In our other model there is clearly increasing variance that seems to be a consistent oscillaitng pattern that grows more over time. This is consistent with the big oscillations we have seen from negative values in this exercise to give us a clearly non-stationary dataset due to the variance.
arma_tbl <- sim_arma11 |>
as_tibble() |>
mutate(series = "ARMA(1,1)")
ar2_tbl <- sim_ar2 |>
as_tibble() |>
mutate(series = "AR(2)")
both <- bind_rows(arma_tbl, ar2_tbl)
ggplot(both, aes(x = idx, y = y)) +
geom_line() +
facet_wrap(~series, scales = "free_y") +
labs(
title = "Simulated ARMA(1,1) and AR(2) series",
x = "Time",
y = "y"
)

#Q7
#A
#FIrst we use the auto ARIMA to find an appropriate model and then we use the Ljung-Box test to further confirm that the residuals behave like white noise with a p-value of 0.46 suggesting there is no autocorrelation left over in our residuals.
fit_auto <- aus_airpassengers |>
model(auto = ARIMA(Passengers))
report(fit_auto)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
glance(fit_auto)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 auto 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
gg_tsresiduals(fit_auto)
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

augment(fit_auto) |>
features(.innov, ljung_box, lag = 24, dof = 0)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 auto 24.0 0.463
fc_auto <- fit_auto |>
forecast(h = 10)
autoplot(aus_airpassengers, Passengers) +
autolayer(fc_auto, level = 95) +
labs(
title = "Forecasts from automatically selected ARIMA model",
y = "Passengers (millions)",
x = "Year"
)

#B.
#Our selected model was ARIMA(0,2,1) which is why we would use second order differencing as well as a single MA term. If we use the backshift operator then (1-2B+B^2)yt=(1-0.8963B)et is our notation with our suggested ARIMA coefficient using second order differencing.
#C
#Here is aan example of a (0,1,0) which is a random walk with drift model. We can see the graph is giving us a much more linear forecast than the selected model. The selected model gives an upward change instead of leveled off and seems to have a wider bounds of uncertainty as well. This is confirming the autoselected model is more appropriate.
fit_rw_drift <- aus_airpassengers |>
model(rw_drift = ARIMA(Passengers ~ pdq(0,1,0)))
report(fit_rw_drift)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
fc_rw_drift <- fit_rw_drift |>
forecast(h = 10)
autoplot(aus_airpassengers, Passengers) +
autolayer(fc_auto, series = "Auto ARIMA", level = 95) +
autolayer(fc_rw_drift, series = "ARIMA(0,1,0) with drift", level = 95) +
labs(
title = "Auto ARIMA vs ARIMA(0,1,0) with drift",
y = "Passengers (millions)",
x = "Year"
)
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.

#D
#First we see interestingly that the no drift model fails as NULL suggesting that a drift model is needed and the ma2 value of 1.00 is a confirmation of this. The reason being that type of value suggests a dataset may be numerically unstable. It seems here that drift variable is actually an important component of viewing this as stationary data.
fit_212_drift <- aus_airpassengers |>
model(arima212_drift = ARIMA(Passengers ~ 1 + pdq(2,1,2)))
fit_212_nodrift <- aus_airpassengers |>
model(arima212_nodrift = ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for arima212_nodrift
## [1] non-stationary AR part from CSS
report(fit_212_drift)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
report(fit_212_drift)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
report(fit_212_nodrift)
## Series: Passengers
## Model: NULL model
## NULL model
fit_models <- aus_airpassengers |>
model(
`Auto ARIMA` = ARIMA(Passengers),
`ARIMA(0,1,0) with drift` = ARIMA(Passengers ~ 1 + pdq(0,1,0)),
`ARIMA(2,1,2) with drift` = ARIMA(Passengers ~ 1 + pdq(2,1,2)),
`ARIMA(2,1,2) no constant` = ARIMA(Passengers ~ 0 + pdq(2,1,2))
)
## Warning: 1 error encountered for ARIMA(2,1,2) no constant
## [1] non-stationary AR part from CSS
report(fit_models)
## Warning in report.mdl_df(fit_models): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 Auto ARIMA 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
## 2 ARIMA(0,1,0) with drift 4.27 -98.2 200. 201. 204. <cpl [0]> <cpl [0]>
## 3 ARIMA(2,1,2) with drift 4.03 -96.2 204. 207. 215. <cpl [2]> <cpl [2]>
fc_models <- fit_models |>
forecast(h = 10)
autoplot(aus_airpassengers, Passengers) +
autolayer(
filter(fc_models, .model %in% c(
"Auto ARIMA",
"ARIMA(0,1,0) with drift",
"ARIMA(2,1,2) with drift"
)),
level = 95
) +
labs(
title = "Forecast comparison",
y = "Passengers (millions)",
x = "Year"
)

autoplot(aus_airpassengers, Passengers) +
autolayer(
filter(fc_models, .model %in% c(
"ARIMA(2,1,2) with drift",
"ARIMA(2,1,2) no constant"
)),
level = 95
) +
labs(
title = "ARIMA(2,1,2): with drift vs no constant",
y = "Passengers (millions)",
x = "Year"
)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#E
#This constant allows the resulting graph to show a positive upward trend instead of leveling off and prevents the error being produced from the previous example when trying to remove the constant by placing our zero first.
fit_021_const <- aus_airpassengers |>
model(arima021_const = ARIMA(Passengers ~ pdq(0,2,1)))
report(fit_021_const)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fc_021_const <- fit_021_const |>
forecast(h = 10)
autoplot(aus_airpassengers, Passengers) +
autolayer(fc_021_const, series = "ARIMA(0,2,1) with constant", level = 95) +
labs(
title = "Forecasts from ARIMA(0,2,1) with constant",
y = "Passengers (millions)",
x = "Year"
)
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`

#Q8
#A
us_gdp <- global_economy |>
filter(Code == "USA") |>
select(Year, GDP)
autoplot(us_gdp, GDP) +
labs(title = "United States GDP", y = "GDP", x = "Year")

# Box-Cox lambda
lambda_us <- us_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
lambda_us
## [1] 0.2819443
#B/C
#Here we can see that the best arima model is (1,1,0) as it is Identical to the autoarima model
# Fit ARIMA models
fit_model_options <- us_gdp |>
model(
auto_arima = ARIMA(box_cox(GDP, lambda_us)),
arima011 = ARIMA(box_cox(GDP, lambda_us) ~ pdq(0,1,1)),
arima110 = ARIMA(box_cox(GDP, lambda_us) ~ pdq(1,1,0)),
arima111 = ARIMA(box_cox(GDP, lambda_us) ~ pdq(1,1,1)),
arima210 = ARIMA(box_cox(GDP, lambda_us) ~ pdq(2,1,0)),
arima012 = ARIMA(box_cox(GDP, lambda_us) ~ pdq(0,1,2))
)
glance(fit_model_options) |>
arrange(AICc)
## # A tibble: 6 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 auto_arima 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 arima110 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 3 arima011 5689. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
## 4 arima111 5580. -325. 659. 659. 667. <cpl [1]> <cpl [1]>
## 5 arima210 5580. -325. 659. 659. 667. <cpl [2]> <cpl [0]>
## 6 arima012 5634. -326. 659. 660. 667. <cpl [0]> <cpl [2]>
#D
#When we look at the AICc we see that the auto arima model was identical to the AACc of our (1,1,0) which had the lowest value of 657.1 which is why this is the best candidate.
# Choose best model
best_fit <- us_gdp |>
model(
best_arima = ARIMA(box_cox(GDP, lambda_us))
)
report(best_fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_us)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
# Residual diagnostics
gg_tsresiduals(best_fit)

augment(best_fit) |>
features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 best_arima 3.81 0.923
#E
#Yes I think this looks like a reasonable forecast without a substantially large interval.
# Forecasts
fc_arima <- best_fit |>
forecast(h = 10)
autoplot(us_gdp, GDP) +
autolayer(fc_arima, level = 95) +
labs(title = "ARIMA forecasts for US GDP", y = "GDP", x = "Year")

#F
# ETS comparison
fit_ets <- us_gdp |>
model(
ets = ETS(GDP)
)
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fc_ets <- fit_ets |>
forecast(h = 10)
autoplot(us_gdp, GDP) +
autolayer(fc_ets, level = 95) +
labs(title = "ETS forecasts for US GDP", y = "GDP", x = "Year")

#Projections of our Methods show a smaller confidence interval for our ARIMA than our ets model with the addition of some additional movement of that projection. ARIMA stabilizes the variance and had stronger residuals which is why I believe it is the preferred model. I think an interesting test here would be to see how these projections succeed from today over the next ten years to see if our comparison was correct.
fc_compare <- bind_rows(
fc_arima |> mutate(Model = "ARIMA"),
fc_ets |> mutate(Model = "ETS")
)
autoplot(us_gdp, GDP) +
autolayer(fc_compare, level = 95) +
labs(title = "ARIMA vs ETS forecasts for US GDP", y = "GDP", x = "Year")
