library(tidyverse)
library(fpp3)
library(seasonal)
library(fable)
The first key difference between these figures is the critical bounds (blue dashed lines) of the autocorrelation plots. As the length of the series increases (from series 1, with 36 values, to series 3, with 1000 values), the bounds move closer and closer to 0. For a series to be white noise, we expect roughly 95% of the spikes to fall between the two bounds. Aside from one spike in the first series and two spikes in the second series, which may be false positives, the autocorrelations stay within bounds. For this reason we can say that all three series indicate white noise.
The critical values lie at different distances from the mean of 0 because of how the bounds are calculated. The formula for the bounds is \(\pm \frac{1.96}{\sqrt{T}}\), where \(T\) is the length of the time series. As the length of the series increases, the denominator becomes larger, giving us smaller bounds.
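As a quick illustration, the bounds can be computed directly. This is a sketch; the length of series 2 (taken here as 360) is an assumption, since only the lengths of series 1 and 3 are stated.
# Critical ACF bounds +/- 1.96/sqrt(T) for each series length
T_values <- c(36, 360, 1000) # 360 for series 2 is an assumed length
tibble(T = T_values, bound = 1.96 / sqrt(T_values))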
gafa_stock %>% filter(Symbol == "AMZN") %>% gg_tsdisplay(Close, plot_type = "partial", lag_max = 30) + labs(title = "Amazon Closing Stock Price", y = "USD", x = "Date")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
The time series of Amazon's closing price is not stationary, as all three panels above show. Most obvious is the time plot, which shows the trend in the closing price; a stationary series is roughly horizontal with constant variability around its mean. The ACF and PACF show further characteristics of non-stationary data: highly correlated lags that decrease very slowly in the ACF, and multiple significant spikes after the first lag in the PACF plot.
With stationary data we should see a rapid, exponential decay of the autocorrelations in the ACF plot, and there should typically not be any significant spikes beyond the first lag in the PACF plot.
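As an optional sketch (not part of the exercise), differencing the closing price removes most of the trend, so the same display should look much closer to stationary:
# Difference the AMZN closing price and re-check the ACF/PACF display
# (the irregular-interval warning will appear again here)
gafa_stock %>% filter(Symbol == "AMZN") %>% mutate(diff_Close = difference(Close)) %>% gg_tsdisplay(diff_Close, plot_type = "partial", lag_max = 30)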
# Inspect the data
turkish_gdp <- global_economy %>% filter(Country == "Turkey")
head(turkish_gdp)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Turkey TUR 1960 13995067818. NA 0.0000540 3.67 2.06 27472331
## 2 Turkey TUR 1961 8022222222. 1.16 0.0000557 6.79 5.12 28146893
## 3 Turkey TUR 1962 8922222222. 5.57 0.0000578 7.97 5.60 28832805
## 4 Turkey TUR 1963 10355555556. 9.07 0.0000615 6.97 4.18 29531342
## 5 Turkey TUR 1964 11177777778. 5.46 0.0000622 5.47 4.47 30244232
## 6 Turkey TUR 1965 11944444444. 2.82 0.0000650 5.40 4.56 30972965
# Plot time series
turkish_gdp %>% autoplot(GDP) + labs(title = "Turkey GDP")
# Box-Cox transformation
lambda <- turkish_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
turkish_gdp_transformed <- turkish_gdp %>% mutate(GDP_transformed = box_cox(GDP, lambda))
turkish_gdp_transformed %>% autoplot(GDP_transformed) +
labs(y = "", title = paste0("Transformed Turkish GDP with lambda = ", round(lambda, 4)))
# Determine order of differencing using unit-root tests
turkish_gdp_transformed %>% features(GDP_transformed, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
For the Turkish GDP series, the most appropriate Box-Cox transformation parameter was found to be \(\lambda = 0.157\), and the appropriate order of differencing was 1.
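As a quick confirmation sketch, applying that single difference to the transformed series and re-running the KPSS test should indicate stationarity (p-value above 0.05):
# KPSS test on the differenced, transformed GDP
turkish_gdp_transformed %>% features(difference(GDP_transformed), unitroot_kpss)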
# Inspect the data
tasmania_accom <- aus_accommodation %>% filter(State == "Tasmania")
head(tasmania_accom)
## # A tsibble: 6 x 5 [1Q]
## # Key: State [1]
## Date State Takings Occupancy CPI
## <qtr> <chr> <dbl> <dbl> <dbl>
## 1 1998 Q1 Tasmania 28.7 67 67
## 2 1998 Q2 Tasmania 19.0 45 67.4
## 3 1998 Q3 Tasmania 16.1 39 67.5
## 4 1998 Q4 Tasmania 25.9 56 67.8
## 5 1999 Q1 Tasmania 28.4 66 67.8
## 6 1999 Q2 Tasmania 20.1 48 68.1
# Plot the time series
tasmania_accom %>% autoplot(Takings) + labs(title = "Accommodation Takings in Tasmania")
# Box-Cox transformation
lambda <- tasmania_accom %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)
tasmania_accom_transformed <- tasmania_accom %>% mutate(Takings_transformed = box_cox(Takings, lambda))
tasmania_accom_transformed %>% autoplot(Takings_transformed) +
labs(y = "", title = paste0("Transformed Takings with lambda = ", round(lambda, 4)))
# Examine trend and seasonal strength using STL features
tasmania_accom_transformed %>% features(Takings_transformed, feat_stl)
## # A tibble: 1 × 10
## State trend_strength seasonal_strength_year seasonal_peak_year
## <chr> <dbl> <dbl> <dbl>
## 1 Tasmania 0.997 0.992 1
## # ℹ 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
## # linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
# Check if a seasonal difference is needed
tasmania_accom_transformed %>% features(Takings_transformed, unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
# Seasonally difference the transformed takings and save it to a new column
tasmania_accom_transformed <- tasmania_accom_transformed %>% mutate(sdiff_Takings_transformed = difference(Takings_transformed, 4))
# Plot the seasonally differenced data
tasmania_accom_transformed %>% autoplot(sdiff_Takings_transformed) + labs(title = "Seasonal Differenced Takings")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Check if another order of difference is needed after a seasonal difference is applied
tasmania_accom_transformed %>% features(sdiff_Takings_transformed, unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 0
# Check whether the differenced data resembles white noise using ACF/PACF plots
tasmania_accom_transformed %>% gg_tsdisplay(sdiff_Takings_transformed, plot_type = "partial", lag_max = 12)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Check if data is stationary using KPSS test
tasmania_accom_transformed %>% features(sdiff_Takings_transformed, unitroot_kpss)
## # A tibble: 1 × 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 0.198 0.1
According to the KPSS test, the p-value is above 0.05, so we fail to reject the null hypothesis that the data are stationary and we do not have to difference again. For the Tasmania accommodation data the most appropriate Box-Cox transformation parameter was found to be \(\lambda = 0.0018\) and the appropriate order of differencing was one seasonal difference.
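A small sanity-check sketch: after the seasonal difference, unitroot_nsdiffs should report that no further seasonal differences are needed (the leading NAs created by differencing are dropped first):
# Expect nsdiffs = 0 after one seasonal difference
tasmania_accom_transformed %>% filter(!is.na(sdiff_Takings_transformed)) %>% features(sdiff_Takings_transformed, unitroot_nsdiffs)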
# Inspect data
head(souvenirs)
## # A tsibble: 6 x 2 [1M]
## Month Sales
## <mth> <dbl>
## 1 1987 Jan 1665.
## 2 1987 Feb 2398.
## 3 1987 Mar 2841.
## 4 1987 Apr 3547.
## 5 1987 May 3753.
## 6 1987 Jun 3715.
# Plot time series
souvenirs %>% autoplot(Sales) + labs(title = "Monthly Souvenirs Sales")
# Box-Cox transformation
lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)
souvenirs_transformed <- souvenirs %>% mutate(Sales_transformed = box_cox(Sales, lambda))
souvenirs_transformed %>% autoplot(Sales_transformed) +
labs(y = "", title = paste0("Souvenirs Sales with lambda = ", round(lambda, 4)))
# Determine the order of seasonal differencing using unit-root tests
souvenirs_transformed %>% features(Sales_transformed, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
# Apply the order of seasonal differencing
souvenirs_transformed <- souvenirs_transformed %>% mutate(Sales_sdiff = difference(Sales_transformed, 12))
# Determine order of differencing using unit-root tests and check KPSS
souvenirs_transformed %>% features(Sales_sdiff, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
souvenirs_transformed %>% features(Sales_sdiff, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.240 0.1
# Plot the transformed/differenced Sales data with both ACF and PACF plots
souvenirs_transformed %>% gg_tsdisplay(Sales_sdiff, plot_type = "partial", lag_max = 12)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
souvenirs_transformed %>% mutate(Sales_diff = difference(Sales_sdiff, 1)) %>% gg_tsdisplay(Sales_diff, plot_type = "partial", lag_max = 12)
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
Just out of curiosity, I looked at the plots after differencing a second time to visualize the time series along with its ACF and PACF. The series after second-order differencing looks a little more like white noise, as does its ACF. However, the unit-root tests after the first-order seasonal difference indicate that we would not need to difference again.
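As a hedged complement to the visual inspection, a Ljung-Box test on the seasonally differenced series gives a formal white-noise check (leading NAs from differencing are dropped first):
# Formal white-noise check on the seasonally differenced sales
souvenirs_transformed %>% filter(!is.na(Sales_sdiff)) %>% features(Sales_sdiff, ljung_box, lag = 24)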
# Filter data to get Household goods retailing in Victoria
aus_houshold_goods <- aus_retail %>% filter(State == "Victoria" & Industry == "Household goods retailing")
# View first few rows
head(aus_houshold_goods)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Household goods retailing A3349643V 1982 Apr 173.
## 2 Victoria Household goods retailing A3349643V 1982 May 180.
## 3 Victoria Household goods retailing A3349643V 1982 Jun 167.
## 4 Victoria Household goods retailing A3349643V 1982 Jul 174.
## 5 Victoria Household goods retailing A3349643V 1982 Aug 178.
## 6 Victoria Household goods retailing A3349643V 1982 Sep 180.
aus_houshold_goods %>% autoplot(Turnover) + labs(title = "Household Goods Retailing Turnover in Victoria, AUS", y = "Millions (AUD)")
lambda <- aus_houshold_goods %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
aus_houshold_goods_transformed <- aus_houshold_goods %>% mutate(Turnover_transformed = box_cox(Turnover, lambda))
aus_houshold_goods_transformed %>% autoplot(Turnover_transformed) + labs(title = paste0("Household Goods Retailing Turnover in Victoria, AUS with lambda = ", round(lambda, 4)), y = "Millions (AUD)")
aus_houshold_goods_transformed %>% features(Turnover_transformed, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Victoria Household goods retailing 1
aus_houshold_goods_transformed <- aus_houshold_goods_transformed %>% mutate(Turnover_sdiff = difference(Turnover_transformed, 12))
aus_houshold_goods_transformed %>% gg_tsdisplay(Turnover_sdiff, plot_type = "partial")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Check unit-root test to evaluate whether any more differencing is needed
aus_houshold_goods_transformed %>% features(Turnover_sdiff, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Victoria Household goods retailing 0
# Check KPSS for stationarity
aus_houshold_goods_transformed %>% features(Turnover_sdiff, unitroot_kpss)
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Household goods retailing 0.0819 0.1
We would only need to difference the data once, seasonally, in order to obtain stationarity.
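As with the earlier series, a quick sanity-check sketch confirms that no further seasonal differencing should be indicated:
# Expect nsdiffs = 0 after the seasonal difference
aus_houshold_goods_transformed %>% filter(!is.na(Turnover_sdiff)) %>% features(Turnover_sdiff, unitroot_nsdiffs)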
# Simulate 100 observations from an AR(1) model with phi = 0.6
y <- numeric(100)
# e holds the white-noise errors
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) + labs(title = "Simulated Time Series - AR(1)", y = "value", x = "id")
# Create plots for different values of phi
# values of phi
phi_values <- c(-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1)
sim_ar1 <- function(phi, e){
  # Build the series from scratch so each call is self-contained
  y <- numeric(length(e))
  for(i in 2:length(e))
    y[i] <- phi*y[i-1] + e[i]
  sim <- tsibble(idx = seq_len(length(e)), y = y, index = idx)
  sim %>%
    autoplot(y) +
    labs(title = paste0("Simulated Time Series with phi: ", phi),
         y = "value",
         x = "id")
}
# Generate a plot for each value of phi (same errors e for comparability)
for(val in phi_values)
  print(sim_ar1(val, e))
As \(\phi_1\) goes from 0 to 1, the time series moves from white noise (\(\phi_1 = 0\)) to a random walk (\(\phi_1 = 1\)). The larger \(\phi_1\) is, the more each value depends on the previous one, so the series becomes smoother and wanders further from its mean.
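To make the effect of \(\phi_1\) concrete, here is a small sketch (acf_ar1 is a helper introduced only for this comparison, reusing the errors e from above): for small \(\phi_1\) the ACF spikes die out almost immediately, while for \(\phi_1\) near 1 they persist over many lags.
# Compare ACFs of AR(1) series with small vs large phi
acf_ar1 <- function(phi, e){
  y <- numeric(length(e))
  for(i in 2:length(e))
    y[i] <- phi*y[i-1] + e[i]
  tsibble(idx = seq_len(length(e)), y = y, index = idx) %>%
    ACF(y) %>%
    autoplot() + labs(title = paste0("ACF of AR(1) with phi: ", phi))
}
print(acf_ar1(0.2, e))
print(acf_ar1(0.8, e))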
When \(\sigma^2 = 1\), each error term has the same variance, so the equation we would use is
\[y_t = \epsilon_t + 0.6\epsilon_{t-1}\]
# Simulate 100 data points
y <- numeric(100)
#e represents the past errors
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) + labs(title = "Simulated Time Series - MA(1)", y = "value", x = "id")
# Visualize the autocorrelation plot
sim %>% ACF(y) %>% autoplot()
# Create a function that looks at different values of theta
# Select theta values
theta_values <- c(-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1)
sim_ma1 <- function(theta, e){
  # Build the series from scratch so each call is self-contained
  y <- numeric(length(e))
  for(i in 2:length(e))
    y[i] <- theta*e[i-1] + e[i]
  sim <- tsibble(idx = seq_len(length(e)), y = y, index = idx)
  sim %>%
    autoplot(y) +
    labs(title = paste0("Simulated MA(1) Time Series with theta: ", theta),
         y = "value",
         x = "id")
}
# Run the function to generate a plot of the time series for each value of theta
for(val in theta_values)
  print(sim_ma1(val, e))
As \(\theta\) increases toward 1, the series appears smoother because consecutive values become positively correlated. The variance of the series actually increases with \(\theta\) (for an MA(1), \(\text{Var}(y_t) = (1 + \theta^2)\sigma^2\)), but the series remains stationary around the mean of 0.
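A quick numeric sketch backs this up (assuming \(\sigma^2 = 1\); e_check is a fresh error series introduced only for this check). The sample variances should sit near \(1 + \theta^2\):
# Sample variance of an MA(1) for several theta values
e_check <- rnorm(10000)
sapply(c(0, 0.4, 0.8), function(theta){
  y <- theta * dplyr::lag(e_check, default = 0) + e_check
  var(y)
})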
# Simulate 100 data points
y <- numeric(100)
# e represents the past errors
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
sim_arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)
# View data
head(sim_arma11)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 1.36
## 3 3 0.108
## 4 4 2.99
## 5 5 4.06
## 6 6 2.38
# Generate an AR(2) model
y <- numeric(100)
e <- rnorm(100)
# Simulate 100 values
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
# sim_ar2_b <- arima.sim(model = list(order = c(2, 0, 0), ar = c(-0.8, 0.3)), n = 100)
# Note: par(mfrow) does not affect ggplot objects, so the two plots print separately
sim_arma11 %>% autoplot(y) + labs(title = "ARMA(1,1)")
sim_ar2 %>% autoplot(y) + labs(title = "AR(2)")
In the AR(2) model the values oscillate between positive and negative, which is what we expect when \(\phi_1 < 0\). To restrict autoregressive models to stationary data, constraints on the parameters must be met:
For p = 1: \(-1 < \phi_1 < 1\).
For p = 2: \(-1 < \phi_2 < 1\), \(\phi_1 + \phi_2 < 1\), and \(\phi_2 - \phi_1 < 1\). The AR(2) model above violates the last constraint, since \(\phi_2 - \phi_1 = 0.3 - (-0.8) = 1.1 > 1\).
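These conditions are easy to verify numerically for the parameters used above (a small sketch):
# Check the AR(2) stationarity conditions for phi1 = -0.8, phi2 = 0.3
phi1 <- -0.8
phi2 <- 0.3
c(abs_phi2 = abs(phi2) < 1,
  sum_cond = phi1 + phi2 < 1,
  diff_cond = phi2 - phi1 < 1) # diff_cond is FALSE: 0.3 - (-0.8) = 1.1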
head(aus_airpassengers, 20)
## # A tsibble: 20 x 2 [1Y]
## Year Passengers
## <dbl> <dbl>
## 1 1970 7.32
## 2 1971 7.33
## 3 1972 7.80
## 4 1973 9.38
## 5 1974 10.7
## 6 1975 11.1
## 7 1976 10.9
## 8 1977 11.3
## 9 1978 12.1
## 10 1979 13.0
## 11 1980 13.6
## 12 1981 13.2
## 13 1982 13.2
## 14 1983 12.6
## 15 1984 13.2
## 16 1985 14.4
## 17 1986 15.5
## 18 1987 16.9
## 19 1988 18.8
## 20 1989 15.1
aus_airpassengers %>% autoplot(Passengers) + labs(title = "Australian Air Carrier Passengers")
# Use ARIMA function to auto generate a model
fit <- aus_airpassengers %>% model(ARIMA(Passengers, stepwise = F))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
The model that was generated was an ARIMA(0,2,1) model.
# Check that the residuals are white noise
fit %>% gg_tsresiduals()
Visualizing the residuals across time, we can see that the variability changes around 1988, with greater fluctuations after that year. Comparing this to the distribution of the residuals, we see one major outlier that might be influencing the model. In the ACF plot no correlations pass the significance bounds, so I think we can consider the residuals close enough to white noise.
augment(fit) %>% features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ARIMA(Passengers, stepwise = F) 6.70 0.669
We can run a Ljung-Box test to evaluate whether the residuals are distinguishable from white noise. Since the result is not significant (p = 0.669), we conclude that the residuals are indistinguishable from a white noise series.
# Build forecast for the next 10 years
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers) + labs(y = "Number of Passengers", title = "Australian Air Carrier Passengers")
There are no autoregressive terms, so in backshift notation this model becomes
\[(1 - B)^2y_t = (1 + \theta_1B)\epsilon_t\]
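Substituting the estimated coefficient from the report above (\(\theta_1 = -0.8963\)) gives
\[(1 - B)^2y_t = (1 - 0.8963B)\epsilon_t\]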
fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ pdq(0, 1, 0)))
# Get model report
fit %>% report()
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
# Generate and plot the forecasts
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers) + labs(y = "Number of Passengers", title = "ARIMA_010: Australian Air Carrier Passengers")
The ARIMA(0,2,1) from part a had an AICc of 198.32, whereas the ARIMA(0,1,0) from part b had an AICc of 200.59. This suggests that adding the MA term (q = 1) gives a better fit. The forecasts look fairly similar, but the prediction intervals in part a widen faster as the forecasts go further into the future.
# Forecast with constant
fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ 1 + pdq(2, 1, 2)))
fit %>% report()
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers) + labs(y = "Number of Passengers", title = "ARIMA_212 With Constant: Australian Air Carrier Passengers")
# Fit and compare three ARIMA models from part a, c, d
aus_airpassengers %>% model(a_ARIMA_021 = ARIMA(Passengers ~ pdq(0, 2, 1)),
b_ARIMA_010 = ARIMA(Passengers ~ pdq(0, 1, 0)),
c_ARIMA_212 = ARIMA(Passengers ~ 1 + pdq(2, 1, 2))) %>%
report()
## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 a_ARIMA_021 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
## 2 b_ARIMA_010 4.27 -98.2 200. 201. 204. <cpl [0]> <cpl [0]>
## 3 c_ARIMA_212 4.03 -96.2 204. 207. 215. <cpl [2]> <cpl [2]>
Fitting the data with an ARIMA(2,1,2) model with drift and forecasting the next 10 years, we see a higher AICc than for both models in part a and part b. The plot does show small fluctuations in the number of passengers, which reflects the underlying data, and the prediction intervals cover similar ranges to those in part b. Note that the AICc may not be an appropriate metric for comparing models b and c with model a, as model a used a second difference.
# Forecast without constant
fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ 0 + pdq(2, 1, 2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
fit %>% report()
## Series: Passengers
## Model: NULL model
## NULL model
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers) + labs(y = "Number of Passengers", title = "ARIMA_212 No constant: Australian Air Carrier Passengers")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
When the constant is removed from the ARIMA(2,1,2), estimation fails with the "non-stationary AR part from CSS" error shown above, so fable returns a NULL model and the forecast plot is empty.
fit <- aus_airpassengers %>% model(ARIMA(Passengers ~ 1 + pdq(0, 2, 1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers) + labs(y = "Number of Passengers", title = "ARIMA_021 With constant: Australian Air Carrier Passengers")
# Get report
fit %>% report()
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
Just as expected, with a difference order of 2 and a non-zero constant, the long-term forecasts follow a quadratic trend.
germany <- global_economy %>% filter(Country == "Germany")
head(germany)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Germany DEU 1960 NA NA 24.6 NA NA 72814900
## 2 Germany DEU 1961 NA NA 25.2 NA NA 73377632
## 3 Germany DEU 1962 NA NA 25.9 NA NA 74025784
## 4 Germany DEU 1963 NA NA 26.7 NA NA 74714353
## 5 Germany DEU 1964 NA NA 27.3 NA NA 75318337
## 6 Germany DEU 1965 NA NA 28.2 NA NA 75963695
germany %>% autoplot(GDP) + labs(title = "Germany GDP", y = "USD")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Determine order of differencing using unit-root tests and check KPSS
germany %>% features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Germany 1
germany %>% features(difference(GDP), unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Germany 0.0536 0.1
# Perform differencing and save the differenced data in a new column
germany <- germany %>% mutate(GDP_diff = difference(GDP))
# Generate ACF and PACF plots
germany %>% ACF(GDP_diff) %>% autoplot() + labs(title = "ACF Plot")
germany %>% PACF(GDP_diff) %>% autoplot() + labs(title = "PACF Plot")
In the ACF plot the autocorrelations appear to follow a sinusoidal wave, and the PACF has a significant spike at lag 3, after which all lags but one (lag 7) are insignificant. Based on this information I would consider an AR(3) model.
The ARIMA() function can select a model automatically in two different ways: the default stepwise search, and a more exhaustive search with stepwise = FALSE.
# Stepwise
germany %>% model(ARIMA(GDP_diff)) %>% report()
## Series: GDP_diff
## Model: ARIMA(0,0,3)
##
## Coefficients:
## ma1 ma2 ma3
## 0.1223 -0.1215 0.3618
## s.e. 0.1809 0.1139 0.2049
##
## sigma^2 estimated as 3.07e+22: log likelihood=-1287.45
## AIC=2582.91 AICc=2583.66 BIC=2591.15
# Search
germany %>% model(ARIMA(GDP_diff, stepwise = FALSE)) %>% report()
## Series: GDP_diff
## Model: ARIMA(3,0,0)
##
## Coefficients:
## ar1 ar2 ar3
## 0.2349 -0.2219 0.4270
## s.e. 0.1313 0.1321 0.1371
##
## sigma^2 estimated as 2.941e+22: log likelihood=-1286.51
## AIC=2581.01 AICc=2581.77 BIC=2589.25
The ARIMA() function also chose an AR(3) when stepwise was set to FALSE. We can compare these two models using the corrected Akaike's Information Criterion (AICc).
# Fit 2 models
fit <- germany %>% model(arima003 = ARIMA(GDP_diff ~ pdq(0, 0, 3)),
arima300 = ARIMA(GDP_diff ~ pdq(3, 0, 0))) %>%
pivot_longer(!Country, names_to = "Model name",
values_to = "Orders"
)
# Check AICc
glance(fit) %>% arrange(AICc) %>% select(`Model name`:BIC)
## # A tibble: 2 × 7
## `Model name` .model sigma2 log_lik AIC AICc BIC
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima300 Orders 2.94e22 -1287. 2581. 2582. 2589.
## 2 arima003 Orders 3.07e22 -1287. 2583. 2584. 2591.
The first-order differenced GDP data determined in part b was used to fit the ARIMA models, so no differencing term was required in the ARIMA() call. For the backshift notation, however, the first-order difference should be included. The backshift notation for an ARIMA(3,1,0) is as follows:
\[
\begin{aligned}
&\phi_1 = 0.2349, \quad \phi_2 = -0.2219, \quad \phi_3 = 0.4270 \\
&\text{First-order difference: } (1 - B) \\
&\text{AR(3): } (1 - \phi_1B - \phi_2B^2 - \phi_3B^3)y_t = \epsilon_t \\
&\text{ARIMA(3,1,0): } (1 - B)(1 - 0.2349B + 0.2219B^2 - 0.4270B^3)y_t = \epsilon_t
\end{aligned}
\]
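As a cross-check (a sketch, not part of the original exercise), fitting ARIMA(3,1,0) directly on the undifferenced GDP should recover similar AR coefficients, with the difference now handled by d = 1 (a drift constant may also be estimated; the leading missing GDP values are dropped first):
# Fit ARIMA(3,1,0) on the raw GDP series as a cross-check
germany %>% filter(!is.na(GDP)) %>% model(ARIMA(GDP ~ pdq(3, 1, 0))) %>% report()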