#Import needed libraries
library(tsibbledata) # time series datasets used in the exercises
library(tsibble) # tsibble objects and as_tsibble()
library(tibble) # view() function
library(ggplot2)
library(feasts) # graphics functions such as autoplot() and gg_tsdisplay()
library(readr) # read_csv()
library(dplyr) # filter(), mutate(), arrange(), etc.
library(tidyr) # pivot_longer()
library(fpp3) # datasets such as us_gasoline
library(seasonal) # X-13ARIMA-SEATS decomposition
library(fable) # ARIMA() and pdq()
library(corrplot)
Answer:
A convenient way to produce a time plot, ACF plot and PACF plot in one command is to use the gg_tsdisplay() function with plot_type = "partial".
The time plot shows that the closing price has a positive increasing trend over time, so the mean is not constant. There is no obvious seasonality or changing variation. Hence the time series is not stationary.
The ACF plot shows positive autocorrelations that decay only slowly rather than dropping quickly to 0, and r1 is large and positive, indicating strong autocorrelation across the lags. The data are not white noise: far more than 5% of the spikes lie outside the blue dashed bounds (±1.96/√T). Hence the closing price should be differenced.
The PACF plot shows a strong autocorrelation at lag 1. For white noise, the PACF would be close to 0 at all lags, which again suggests the series is not stationary.
The very small KPSS p-value (0.01) means we reject the null hypothesis that the series is stationary. The data are not stationary.
After running the differencing checks on the closing price, unitroot_ndiffs indicates that only first-order differencing is needed.
# Data preparation
amazon_2015 <- gafa_stock |>
filter(Symbol == "AMZN", year(Date) == 2015)
# Original Plot
amazon_2015 |>
autoplot(Close) + labs(subtitle = "Amazon daily closing stock price")
# Time plot with ACF and PACF
amazon_2015 |>
gg_tsdisplay(Close, plot_type='partial') + labs(subtitle = "Amazon daily closing stock price")
# p value
amazon_2015 |>
features(Close, unitroot_kpss)
## # A tibble: 1 × 3
## Symbol kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 AMZN 4.02 0.01
# number of differences needed
amazon_2015 |>
features(Close, unitroot_ndiffs)
## # A tibble: 1 × 2
## Symbol ndiffs
## <chr> <int>
## 1 AMZN 1
# Differenced plot
amazon_2015 |> ACF(difference(Close)) |>
autoplot() + labs(subtitle = "Changes in Amazon closing stock price (differenced)")
# Ljung-Box test (white noise check) on the differenced series
amazon_2015 |>
mutate(diff_close = difference(Close)) |>
features(diff_close, ljung_box, lag = 10)
## # A tibble: 1 × 3
## Symbol lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 AMZN 19.7 0.0324
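To double-check that one difference is enough, we can also re-run the KPSS test on the differenced series; a quick sketch, reusing the amazon_2015 tsibble from above (the exact statistic depends on the data):
# KPSS test on the differenced closing price; a p-value reported as 0.1
# (the table maximum) means we can no longer reject stationarity
amazon_2015 |>
mutate(diff_close = difference(Close)) |>
features(diff_close, unitroot_kpss)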
Answer:
The time plot shows a positive, exponentially increasing trend over time. There is no seasonality or changing variation. The time series is not stationary.
The lambda value of 0.157 is close to 0, indicating that a strong transformation such as a log transformation (λ = 0) would make the data more suitable for analysis and forecasting.
Based on the unitroot_ndiffs and unitroot_nsdiffs functions, only first-order non-seasonal differencing is needed.
turkey_gdp <- global_economy |>
filter(Code == "TUR")
turkey_gdp |>
autoplot(GDP) +
labs(y = " GDP", title = "Turkish GDP")
# box cox transformation
lambda <- turkey_gdp |> # find the lambda needed for the Box-Cox formula
features(GDP, features = guerrero) |> # guerrero selects the lambda that best stabilizes the variance
pull(lambda_guerrero)
turkey_gdp |>
autoplot(box_cox(GDP, lambda)) +
labs(y = "", title = paste("Turkish GDP, Box-Cox transformation λ =", round(lambda, 4)))
# number of differences needed
turkey_gdp |>
features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
# number of seasonal differences needed
turkey_gdp |>
features(GDP, unitroot_nsdiffs)
## # A tibble: 1 × 2
## Country nsdiffs
## <fct> <int>
## 1 Turkey 0
# Differenced plot
turkey_gdp |> autoplot(difference(log(GDP))) + labs(subtitle = "Changes in Turkish GDP (log transformed and differenced)")
Answer:
The time plot shows no strong trend over time. There is clear seasonality and increasing variation over time. The time series is not stationary.
The lambda value of 0.0018 is essentially 0, indicating that a log transformation (λ = 0) is appropriate to stabilize the variance.
Based on the unitroot_ndiffs function, first-order differencing is needed.
Based on the unitroot_nsdiffs function, first-order seasonal differencing is also needed.
tasmania_acc <- aus_accommodation |>
filter(State == "Tasmania")
tasmania_acc |>
autoplot(Takings) +
labs(y = "Takings", title = "Tasmania Accommodation Takings")
# box cox transformation
lambda <- tasmania_acc |> # find the lambda needed for the Box-Cox formula
features(Takings, features = guerrero) |> # guerrero selects the lambda that best stabilizes the variance
pull(lambda_guerrero)
tasmania_acc |>
autoplot(box_cox(Takings, lambda)) +
labs(y = "Takings", title = paste("Tasmania Accommodation Takings, log transformation λ =", round(lambda, 4)))
# number of differences needed
tasmania_acc |>
features(Takings, unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
# number of seasonal differences needed
tasmania_acc |>
features(Takings, unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
# Applying seasonal differencing first and then regular differencing
tasmania_acc |> autoplot(difference(difference(log(Takings), 4))) + labs(subtitle = "Tasmania Accommodation Takings (log transformed, seasonally differenced, then first differenced)")
Answer:
The time plot shows an increasing trend over time. There is seasonality (a spike every December) and increasing variation over time. The time series is not stationary.
The lambda value of 0.0021 is essentially 0, indicating that a log transformation (λ = 0) is appropriate to stabilize the variance.
Based on the unitroot_ndiffs function, first-order differencing is needed.
Based on the unitroot_nsdiffs function, first-order seasonal differencing is also needed.
souvenirs |>
autoplot(Sales) +
labs(y = "Monthly Sales", title = "Monthly Sale of souvenirs")
# box cox transformation
lambda <- souvenirs |> # find the lambda needed for the Box-Cox formula
features(Sales, features = guerrero) |> # guerrero selects the lambda that best stabilizes the variance
pull(lambda_guerrero)
souvenirs |>
autoplot(box_cox(Sales, lambda)) +
labs(y = "Monthly Sales", title = paste("Monthly sales of souvenirs, log transformation λ =", round(lambda, 4)))
# number of differences needed
souvenirs |>
features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
# number of seasonal differences needed
souvenirs |>
features(Sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
# Applying seasonal differencing first and then regular differencing
souvenirs |> autoplot(difference(difference(log(Sales), 12))) + labs(subtitle = "Monthly sales of souvenirs (log transformed, seasonally differenced, then first differenced)")
Answer:
The complete transformation in backshift operator notation would be:
(1-B)(1-B^12)(log Y_t)
where:
Y_t is the original souvenirs sales series
log Y_t is the Box-Cox transformed series (since λ ≈ 0 is effectively a log transformation)
(1-B) represents the regular first difference
(1-B^12) represents the seasonal difference with period 12 (for monthly data)
This notation indicates that we first apply the Box-Cox (log) transformation to stabilize variance, then apply both seasonal and regular differencing to eliminate trend and seasonality, resulting in a stationary series.
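As a quick check (a sketch, not part of the required answer), the backshift expression can be verified in R by comparing difference() with the same quantity built from explicit lags:
# (1-B)(1-B^12) log(Y_t) expands to
# log(Y_t) - log(Y_{t-1}) - log(Y_{t-12}) + log(Y_{t-13});
# the two columns below should match (apart from the initial NAs)
souvenirs |>
mutate(
via_difference = difference(difference(log(Sales), 12)),
via_backshift = log(Sales) - lag(log(Sales), 1) -
lag(log(Sales), 12) + lag(log(Sales), 13)
)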
Answer:
The time plot shows an increasing trend over time. There is seasonality and increasing variation over time. The time series is not stationary.
The lambda value of 0.083 is close to 0, indicating that a log transformation (λ = 0) is appropriate to stabilize the variance.
Based on the unitroot_ndiffs function, first-order differencing is needed.
Based on the unitroot_nsdiffs function, first-order seasonal differencing is also needed.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
autoplot(Turnover) +
labs(y = "Retail turnover in $Million AUD ", title = "Turnover: Australian retail industry")
# box cox transformation
lambda <- myseries |> # find the lambda needed for the Box-Cox formula
features(Turnover, features = guerrero) |> # guerrero selects the lambda that best stabilizes the variance
pull(lambda_guerrero)
myseries |>
autoplot(box_cox(Turnover, lambda)) +
labs(y = "Retail turnover in $Million AUD", title = paste("Australian retail turnover, log transformation λ =", round(lambda, 4)))
# number of differences needed
myseries |>
features(Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
# number of seasonal differences needed
myseries |>
features(Turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
# Applying seasonal differencing first and then regular differencing
myseries |> autoplot(difference(difference(log(Turnover), 12))) + labs(subtitle = "Australian retail turnover (log transformed, seasonally differenced, then first differenced)")
# data preparation
set.seed(123)
# Simulate 100 observations from an AR(1) model: y_t = phi_1 * y_{t-1} + e_t
phi_1_func <- function(phi_1) {
y <- numeric(100)
e <- rnorm(100) # white noise errors
for(i in 2:100)
y[i] <- phi_1 * y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
return(sim)
}
Answer:
When I change the phi_1 parameter in the AR(1) model, the behavior of the time series changes significantly.
When φ₁ = 0: the series is pure white noise. There is no autocorrelation between observations, and the plot shows random fluctuations around zero with no pattern.
When 0 < φ₁ < 1 (like our φ₁ = 0.6): the series shows positive autocorrelation. Values tend to persist in the same direction for several periods, creating “local trends” or apparent cycles (see the ACF sketch after this list).
When φ₁ = 1: the series becomes a random walk. It can wander far from its starting point, has no tendency to return to a mean value, and shows strong persistence and pronounced trends.
When φ₁ = −0.5: the series shows negative autocorrelation. Values tend to alternate between positive and negative, creating a “choppy”, oscillating appearance.
When φ₁ = −1: the series alternates between positive and negative values, creating a strong oscillating pattern.
When |φ₁| > 1 (e.g. φ₁ = 2): the series becomes non-stationary and explosive. Values grow exponentially in magnitude over time, and the plot shows extreme values that increase without bound.
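These behaviors also show up in the autocorrelations: for a stationary AR(1), the theoretical ACF decays geometrically as ρ_k = φ₁^k. A small sketch comparing the sample ACF of the simulated series with that theoretical decay (sampling error means the match is only approximate):
# sample ACF of the phi_1 = 0.6 simulation vs the theoretical 0.6^k decay
phi_1_func(0.6) |>
ACF(y, lag_max = 10) |>
mutate(theoretical = 0.6^row_number())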
phi_1_func(0.6) |>
autoplot(y) + labs( title = "Random Series with ϕ1= 0.6 (Moderate Persistence)")
phi_1_func(0) |>
autoplot(y) + labs( title = "Random Series with ϕ1= 0 (White Noise)")
phi_1_func(1) |>
autoplot(y) + labs( title = "Random Series with ϕ1= 1 (Random Walk)")
phi_1_func(-0.5) |>
autoplot(y) + labs( title = "Random Series with ϕ1= -0.5 (Oscillating)")
phi_1_func(-1) |>
autoplot(y) + labs( title = "Random Series with ϕ1= -1 (Strong Oscillating)")
phi_1_func(2) |>
autoplot(y) + labs( title = "Random Series with ϕ1= 2 (Explosive)")
set.seed(124)
# Simulate 100 observations from an MA(1) model: y_t = theta * e_{t-1} + e_t
ma_1_func <- function(theta) {
y <- numeric(100)
e <- rnorm(100) # white noise errors
for(i in 2:100)
y[i] <- theta * e[i-1] + e[i]
sim_own <- tsibble(idx = seq_len(100), y = y, index = idx)
return(sim_own)
}
Answer:
As mentioned earlier, when θ1 is close to 0 the series resembles white noise; as θ1 approaches 1, consecutive values become positively correlated at lag 1 and the series looks smoother and more persistent. When θ1 is negative, consecutive values are negatively correlated and the plot shows a choppy, oscillating pattern. Unlike the AR(1) case, an MA(1) process is stationary for every value of θ1 (invertibility, however, requires |θ1| < 1).
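One way to see the contrast with the AR(1) case is through the theoretical ACF of an MA(1), which cuts off after lag 1 with ρ₁ = θ₁/(1 + θ₁²); a sketch checking this on the simulated series (using the ma_1_func defined above):
# for theta = 0.6 the theoretical lag-1 autocorrelation is
# 0.6 / (1 + 0.6^2) ≈ 0.44, and all higher lags should be near zero
ma_1_func(0.6) |>
ACF(y, lag_max = 10) |>
autoplot() + labs(title = "Sample ACF of simulated MA(1), θ1 = 0.6")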
ma_1_func(0.6) |>
autoplot(y) + labs( title = "Random Series with θ1 = 0.6 (Moderate Persistence)")
ma_1_func(0) |>
autoplot(y) + labs( title = "Random Series with θ1 = 0 (White Noise)")
ma_1_func(1) |>
autoplot(y) + labs( title = "Random Series with θ1 = 1 (Smooth, Persistent)")
ma_1_func(-0.5) |>
autoplot(y) + labs( title = "Random Series with θ1 = -0.5 (Oscillating)")
ma_1_func(-1) |>
autoplot(y) + labs( title = "Random Series with θ1 = -1 (Strong Oscillating)")
ma_1_func(2) |>
autoplot(y) + labs( title = "Random Series with θ1 = 2 (Non-invertible)")
set.seed(123)
# Simulate an ARMA(1,1): y_t = 0.6*y_{t-1} + 0.6*e_{t-1} + e_t
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
}
arma_1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(123)
# Simulate an AR(2): y_t = -0.8*y_{t-1} + 0.3*y_{t-2} + e_t
yy <- numeric(100)
ee <- rnorm(100)
for(i in 3:100){
yy[i] <- -0.8 * yy[i-1] + 0.3 * yy[i-2] + ee[i]
}
ar_2 <- tsibble(idx = seq_len(100), yy = yy, index = idx)
Answer:
ARMA(1,1) with ϕ1 = 0.6, θ1 = 0.6 combines autoregressive and moving average components. It shows smoother patterns with moderate persistence from the AR component, plus short-term correction effects from the MA component. It is stationary since |φ₁| = 0.6 < 1.
AR(2) with ϕ1 = −0.8, ϕ2 = 0.3 is a pure autoregressive model with two lags. The negative φ₁ creates oscillatory behavior with alternating values, and the positive φ₂ adds some longer-term persistence. It is not stationary: the AR(2) stationarity conditions require φ₁ + φ₂ < 1, φ₂ − φ₁ < 1 and |φ₂| < 1, and here φ₂ − φ₁ = 0.3 − (−0.8) = 1.1 > 1, so the second condition is violated and the simulated series oscillates with growing amplitude.
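These stationarity conditions can also be checked numerically: an AR model is stationary when all roots of 1 − φ₁z − … − φ_p z^p lie outside the unit circle. A minimal sketch in base R, assuming only the coefficients used above:
# roots of the AR characteristic polynomial; stationarity needs all |root| > 1
ar_roots <- function(phi) polyroot(c(1, -phi))
abs(ar_roots(0.6)) # ARMA(1,1) AR part: 1/0.6 ≈ 1.67 > 1, stationary
abs(ar_roots(c(-0.8, 0.3))) # AR(2): one root ≈ 0.93 < 1, not stationary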
autoplot(arma_1_1) + labs(title = "ARMA(1,1) with ϕ1 = 0.6, θ1 = 0.6")
autoplot(ar_2) + labs(title = "AR(2) with ϕ1 = −0.8, ϕ2 = 0.3")
Answer:
The time plot shows a positive, increasing trend with no seasonality or changing variation over time. The lambda value of -0.176 is close to 0, suggesting a log transformation. The ndiffs value of 2 indicates that second-order differencing is needed, and the nsdiffs value of 0 confirms that no seasonal differencing is needed.
The ACF of the differenced series suggests an MA(1) component, so a candidate model is ARIMA(0,2,1). The automated ARIMA() selection also returns an ARIMA(0,2,1), which is consistent with what I observed.
The residual diagnostics show no significant autocorrelation (all ACF spikes lie within the blue bounds) and approximately normally distributed residuals, so the residuals resemble white noise.
The forecasts for the next 10 years continue the steep upward trend of the series.
aus_airpassengers |>
autoplot(Passengers) + labs(title = "Australian air passengers", y = "Total number of passengers (in millions)")
# number of differences needed
aus_airpassengers |>
features(Passengers, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 2
# number of seasonal differences needed
aus_airpassengers |>
features(Passengers, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 0
# Applying second order differencing
aus_airpassengers_new <- aus_airpassengers |>
mutate(Passengers_new = difference(difference(Passengers)))
# Review the timeplot, ACF and PACF plot
aus_airpassengers_new |>
gg_tsdisplay(Passengers_new, plot_type='partial')+ labs(subtitle = "Australian air passengers (second order differenced)")
# Checking the fit
aus_airpassengers_fit <- aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers))
report(aus_airpassengers_fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
# Review the residuals
aus_airpassengers_fit |>
gg_tsresiduals()
# plot forecasts
fc <- aus_airpassengers_fit |>
forecast(h="10 years")
fc |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecase for Australian Air Passengers")+
scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))
Answer:
θ1 is −0.8756 as seen above, so the MA(1) component is:
y[t] = e[t] − 0.8756·e[t−1]
With d = 2 (second-order differencing), the backshift operator notation is:
(1−B)^2 y[t] = (1−0.8756B) e[t]
Expanding the left-hand side to isolate y[t]:
(1−2B+B^2) y[t] = (1−0.8756B) e[t]
y[t] − 2y[t−1] + y[t−2] = e[t] − 0.8756·e[t−1]
y[t] = 2y[t−1] − y[t−2] + e[t] − 0.8756·e[t−1]
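This recursion can be sanity-checked against the fitted model (a sketch reusing aus_airpassengers_fit from above; the reconstruction is only approximate for the first few observations because of model initialisation):
# rebuild each observation from y[t] = 2y[t-1] - y[t-2] + e[t] - 0.8756 e[t-1]
# using the model's innovation residuals; 'gap' should be close to 0
augment(aus_airpassengers_fit) |>
mutate(
reconstructed = 2 * lag(Passengers, 1) - lag(Passengers, 2) +
.innov - 0.8756 * lag(.innov, 1),
gap = Passengers - reconstructed
)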
Answer:
I fitted an ARIMA(0,1,0) model (which on its own is a random walk without drift) to the Passengers variable.
The + 1 adds the constant term, making it a random walk with drift.
The residual diagnostics show no significant autocorrelation (all ACF spikes lie within the blue bounds) and approximately normal residuals, so the residuals resemble white noise.
Comparing the forecasts of a) and c), I observe that a) tracks the actuals until 2013 and then rises above them along a near-exponential path, whereas c) forecasts below the actual series throughout; c) therefore produces flatter, dampened forecasts.
# Checking the fit
aus_airpassengers_fit2 <- aus_airpassengers |>
filter(Year < 2012) |>
model(arima010 = ARIMA(Passengers ~ 1+ pdq(0,1,0)))
report(aus_airpassengers_fit2)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.3669
## s.e. 0.3319
##
## sigma^2 estimated as 4.629: log likelihood=-89.08
## AIC=182.17 AICc=182.48 BIC=185.59
# Review the residuals
aus_airpassengers_fit2 |>
gg_tsresiduals()
# forecast for next 10 years
fc2 <- aus_airpassengers_fit2 |>
forecast(h="10 years")
# plot the forecasts
fc2 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecase for Australian Air Passengers")+
scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))
I observe that a) tracks the actuals until 2013 and then forecasts higher than the actual series, c) forecasts below the actual series throughout, and d) forecasts below the actual series until 2013 and then above it.
When I remove the constant, ARIMA() fails to estimate the model and returns a NULL model, so the specification without a constant is invalid here.
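A minimal way to reproduce this (assuming the same training filter as above) is to drop the constant with ~ 0; ARIMA() then warns and returns a NULL model:
# ARIMA(2,1,2) without a constant: the fit fails and a NULL model is returned
aus_airpassengers |>
filter(Year < 2012) |>
model(arima212_nc = ARIMA(Passengers ~ 0 + pdq(2,1,2)))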
# Checking the fit
aus_airpassengers_fit3 <- aus_airpassengers |>
filter(Year < 2012) |>
model(arima212 = ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(aus_airpassengers_fit3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## 1.4694 -0.5103 -1.5736 0.6780 0.0650
## s.e. 0.3780 0.3558 0.3081 0.2688 0.0294
##
## sigma^2 estimated as 4.748: log likelihood=-87.74
## AIC=187.47 AICc=189.94 BIC=197.75
# Review the residuals
aus_airpassengers_fit3 |>
gg_tsresiduals()
# forecast for next 10 years
fc3 <- aus_airpassengers_fit3 |>
forecast(h="10 years")
# plot the forecasts
fc3 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecase for Australian Air Passengers") +
scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))
Answer:
I fitted an ARIMA(0,2,1) model to the Passengers variable.
The + 1 adds a constant term; combined with d = 2 this induces a quadratic trend (a polynomial of order 2) in the forecasts, which is why fable reports the model as "w/ poly".
The constant term is 0.0721, which represents acceleration in the trend rather than a simple drift term.
The forecast plot accordingly shows a quadratic trend.
# Checking the fit
aus_airpassengers_fit4 <- aus_airpassengers |>
filter(Year < 2012) |>
model(arima021 = ARIMA(Passengers ~ 1 + pdq(0,2,1)))
report(aus_airpassengers_fit4)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0721
## s.e. 0.0709 0.0260
##
## sigma^2 estimated as 4.086: log likelihood=-85.74
## AIC=177.48 AICc=178.15 BIC=182.55
# Review the residuals
aus_airpassengers_fit4 |>
gg_tsresiduals()
# forecast for next 10 years
fc4 <- aus_airpassengers_fit4 |>
forecast(h="10 years")
# plot the forecasts
fc4 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecase for Australian Air Passengers") +
scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))
The United States GDP time series shows an exponential trend with no seasonality or changing variation.
The lambda value of 0.2819 lies between 0 and 0.5, suggesting a transformation between a log (λ = 0) and a square root (λ = 0.5) to stabilize the series before forecasting.
usa_gdp <- global_economy |>
filter(Code == "USA") |>
select(Year,GDP)
usa_gdp |>
autoplot(GDP) +
labs(y = "GDP", title = "USA GDP")
# box cox transformation
lambda <- usa_gdp |> # find the lambda needed for the Box-Cox formula
features(GDP, features = guerrero) |> # guerrero selects the lambda that best stabilizes the variance
pull(lambda_guerrero)
usa_gdp |>
autoplot(box_cox(GDP, lambda)) +
labs(y = "", title = paste("USA GDP, Box-Cox transformation λ =", round(lambda, 4)))
Answer: The fit shows an ARIMA(1,1,0) w/ drift.
fit <- usa_gdp|>
model(ARIMA(box_cox(GDP, lambda)))
report(fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
unitroot_ndiffs() shows the need for first-order differencing of the transformed series. Several other plausible models were then tried; the automatic selection returns ARIMA(1,1,0) w/ drift, but the AICc table below shows that ARIMA(1,2,1) attains the lowest AICc.
# number of differences needed
usa_gdp |>
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
usa_gdp |>
gg_tsdisplay(box_cox(GDP,lambda),plot_type='partial')
fit <- usa_gdp |>
select(GDP, Year) |>
model(
arima011 = ARIMA(box_cox(GDP,lambda) ~ pdq(0,1,1)),
arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)),
arima010 = ARIMA(box_cox(GDP,lambda) ~ pdq(0,1,0)),
arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima121 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,1)),
auto = ARIMA(box_cox(GDP,lambda), stepwise = FALSE, approx = FALSE)
)
fit |> pivot_longer(everything(), names_to = "Model name",
values_to = "Orders")
## # A mable: 8 x 2
## # Key: Model name [8]
## `Model name` Orders
## <chr> <model>
## 1 arima011 <ARIMA(0,1,1) w/ drift>
## 2 arima110 <ARIMA(1,1,0) w/ drift>
## 3 arima111 <ARIMA(1,1,1) w/ drift>
## 4 arima010 <ARIMA(0,1,0) w/ drift>
## 5 arima212 <ARIMA(2,1,2) w/ drift>
## 6 arima120 <ARIMA(1,2,0)>
## 7 arima121 <ARIMA(1,2,1)>
## 8 auto <ARIMA(1,1,0) w/ drift>
glance(fit) |> arrange(AICc) |> select(.model:BIC)
## # A tibble: 8 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima121 5761. -321. 648. 649. 655.
## 2 arima120 6780. -326. 656. 656. 660.
## 3 arima110 5479. -325. 657. 657. 663.
## 4 auto 5479. -325. 657. 657. 663.
## 5 arima011 5689. -326. 659. 659. 665.
## 6 arima111 5580. -325. 659. 659. 667.
## 7 arima212 5734. -325. 662. 664. 674.
## 8 arima010 6774. -332. 668. 668. 672.
Answer: The innovation residuals are centered around 0. The ACF plot shows no significant autocorrelation: all spikes lie within the blue bounds. Hence the residuals resemble white noise.
The large Ljung-Box p-value (0.63) confirms that the residuals are indistinguishable from white noise.
Hence ARIMA(1,2,1) is the preferred model.
fit |>
select(arima121) |>
gg_tsresiduals()
augment(fit) |>
filter(.model == "arima121") |>
features(.innov, ljung_box, lag=10, dof=4)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 arima121 4.36 0.629
Yes, the forecast looks reasonable and continues a similar exponential trend.
forecast(fit, h=10) |>
filter(.model=='arima121') |>
autoplot(usa_gdp)+
labs(title = "Forecasted United States GDP using ARIMA(1,2,1)")
As the trend is exponential with no seasonality, I will also try an ETS(M,M,N) model. The ARIMA(1,2,1) model has a far lower AICc, but AICc values are not directly comparable across model classes (and here the ARIMA is fitted to Box-Cox transformed data), so this comparison is only indicative; a test-set comparison is sketched after the forecast plot below.
fit_compare <- usa_gdp |>
select(GDP, Year) |>
model(
arima121 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,1)),
ETS = ETS(GDP),
ETSMMN = ETS(GDP ~ error("M") + trend("M") + season("N"))
)
fit_compare |> pivot_longer(everything(), names_to = "Model name",
values_to = "Orders")
## # A mable: 3 x 2
## # Key: Model name [3]
## `Model name` Orders
## <chr> <model>
## 1 arima121 <ARIMA(1,2,1)>
## 2 ETS <ETS(M,A,N)>
## 3 ETSMMN <ETS(M,M,N)>
glance(fit_compare) |> arrange(AICc) |> select(.model:BIC)
## # A tibble: 3 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima121 5761. -321. 648. 649. 655.
## 2 ETSMMN 0.000611 -1588. 3186. 3187. 3196.
## 3 ETS 0.000678 -1590. 3191. 3192. 3201.
fit_compare |>
forecast(h = 10) |>
autoplot(usa_gdp, level = NULL) +
ggtitle("US GDP Forecast")+
scale_x_continuous(breaks = seq(min(usa_gdp$Year), max(usa_gdp$Year), by = 5))
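Since the AICc values are not comparable across model classes, a fairer check (a sketch, holding out the last 10 years of usa_gdp as a test set) is out-of-sample accuracy:
# refit on a training set and compare test-set accuracy; lower RMSE/MAE is better
train <- usa_gdp |> filter(Year <= max(Year) - 10)
train |>
model(
arima121 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,2,1)),
ets = ETS(GDP)
) |>
forecast(h = 10) |>
accuracy(usa_gdp)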