Assignment: Exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Answer: The Autocorrelation Function (ACF) measures the correlation of a time series with its past values and it is very much affected by the sample size. As a result, the various figures do not all indicate (with certainty) that the data are white noise.
The ACF of the small sample (36 numbers) shows some random spikes, but, given the small sample size, these spikes could be due to random variation rather than any underlying pattern. I would therefore be wary to conclude to a white noise for such a small sample size.
The middle ACF plot (with 360 numbers) more reliably indicates white noise. This is because a larger sample size (as 360) typically leads to more reliable estimates of autocorrelation. Also, the ACF plot shows fewer random spikes compared to the figure (with 36 numbers), and most of the values remain close to zero.
Lastly, with 1,000 observations, the ACF plot becomes more stable and provides a clearer picture of the series.The third ACF plot demonstrates values very close to zero, with minimal random fluctuations.The confidence intervals is also narrower, and there are no spikes crossing these intervals, which confirms the white noise nature of the series.
Answer: The differences in the critical values at different distances from the mean of zero and the variations in auto correlations in the ACF plots of white noise series can be attributed to several factors related to sample size, statistical significance, and the nature of random processes. Indeed, the differences in critical values across samples arise from the sample size and its impact on the standard error of the ACF estimates. Larger samples yield smaller critical values.As sample sizes increase, the stability of the ACF improves, leading to values that more consistently reflect the true white noise characteristics (close to zero with fewer significant spikes).
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ℹ 5,022 more rows
stock_prices <-gafa_stock|>
autoplot(Close) +
labs(y = "Stocks closing prices in ($US)")
stock_prices
amazon_prices <-gafa_stock|>
filter(Symbol == "AMZN")|>
autoplot(Close) +
labs(y = "Amazon Daily closing prices ($US)")
amazon_prices
amazon_2018 <-gafa_stock|>
filter(Symbol == "AMZN", year(Date) == 2018)|>
autoplot(Close) +
labs(y = "Amazon Daily closing prices ($US)")
amazon_2018
The ACF and PACF plots are evidence that the amazon series is non -stationary. The ACF (which measures the correlation of a time series with its past values) reveals a very slow decrease and the value of r1 is large and positive.
amazon_2018 <-gafa_stock|>
filter(Symbol == "AMZN", year (Date) == 2018)|>
PACF(Close) |>
autoplot() + labs(subtitle ="Amazon Daily closing prices ($US)")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
amazon_2018
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Load the data
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
library(fabletools)
library(dplyr)
library(ggplot2)
# Plot of Turkey GDP (per capita)
gdp_turkey <- global_economy |>
filter(Country == "Turkey") |>
mutate(GDP_per_capita = GDP / Population)
# Plot the GDP per capita
gdp_turkey_plot <- autoplot(gdp_turkey, GDP_per_capita) +
labs(subtitle = "Turkish GDP per Capita ($US)")
gdp_turkey_plot
## Log transformation of GDP per capita
gdp_turkey <- gdp_turkey |>
mutate(log_GDP_per_capita = log(GDP_per_capita))
# Plot the log-transformed GDP per capita
gdp_turkey_log_plot <- autoplot(gdp_turkey, log_GDP_per_capita) +
labs(subtitle = "Log of Turkish GDP per Capita ($US)")
gdp_turkey_log_plot
data are still non-stationary
# Convert gdp_turkey to a tsibble
gdp_turkey_tsibble <- gdp_turkey |>
as_tsibble(index = Year, key = Country) # Use Year as the index and Country as the key
# Ensure the log_GDP_per_capita is a numeric vector and apply seasonal differencing
gdp_diff <- gdp_turkey_tsibble |>
mutate(Diffed_log_GDP = log_GDP_per_capita - lag(log_GDP_per_capita, 12))
# Plot the differenced series
gdp_diff |>
autoplot(Diffed_log_GDP) +
ggtitle("Seasonally Differenced Log GDP per Capita for Turkey") +
labs(y = "Differenced Log GDP per Capita", x = "Year") +
theme_minimal()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
# First-order differencing
log_GDP_diff1 <- diff(log(gdp_turkey$GDP_per_capita))
# Perform second-order differencing on log_GDP_per_capita
log_GDP_diff2 <- diff(log_GDP_diff1)
# Create a new dataframe with the differenced values
gdp_turkey_diff2 <- data.frame(
Year = gdp_turkey$Year[-(1:2)], # Adjust Year accordingly
Diff2 = log_GDP_diff2
)
# Check the resulting dataset
print(head(gdp_turkey_diff2))
## Year Diff2
## 1 1962 0.66299987
## 2 1963 0.04278701
## 3 1964 -0.07248857
## 4 1965 -0.01002184
## 5 1966 0.10120085
## 6 1967 -0.06363837
# Perform the KPSS test on the second-order differenced data
kpss_result <- kpss.test(gdp_turkey_diff2$Diff2)
## Warning in kpss.test(gdp_turkey_diff2$Diff2): p-value greater than printed
## p-value
# Print the KPSS test result
print(kpss_result)
##
## KPSS Test for Level Stationarity
##
## data: gdp_turkey_diff2$Diff2
## KPSS Level = 0.15237, Truncation lag parameter = 3, p-value = 0.1
# Load necessary libraries
library(tidyverse)
library(tsibble)
library(fable)
library(ggfortify)
# Create a new tibble for gg_tsdisplay
gdp_turkey_diff2_ts <- gdp_turkey_diff2 %>%
as_tsibble(index = Year)
# Check for stationarity using gg_tsdisplay
gdp_turkey_diff2_ts %>%
gg_tsdisplay(Diff2, plot_type = "partial")
# ACF and PACF calculations
acf_values <- acf(gdp_turkey_diff2$Diff2, plot = FALSE)
pacf_values <- pacf(gdp_turkey_diff2$Diff2, plot = FALSE)
# Convert ACF and PACF to data frames for ggplot
acf_df <- data.frame(
Lag = acf_values$lag,
ACF = acf_values$acf
)
pacf_df <- data.frame(
Lag = pacf_values$lag,
PACF = pacf_values$acf
)
# Plot ACF
acf_plot <- ggplot(acf_df, aes(x = Lag, y = ACF)) +
geom_bar(stat = "identity", fill = "blue") +
ggtitle("ACF of Second-Order Differenced log_GDP_per_capita") +
labs(x = "Lag", y = "ACF") +
theme_minimal()
print(acf_plot)
# Plot PACF
pacf_plot <- ggplot(pacf_df, aes(x = Lag, y = PACF)) +
geom_bar(stat = "identity", fill = "red") +
ggtitle("PACF of Second-Order Differenced log_GDP_per_capita") +
labs(x = "Lag", y = "PACF") +
theme_minimal()
print(pacf_plot)
### Summary of the differenced data
# Summary of the differenced data
summary(gdp_turkey_diff2$Diff2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.5785559 -0.0898084 -0.0009105 0.0098423 0.0963592 0.6629999
# Load the aus_accommodation dataset
aus_accommodation
## # A tsibble: 592 x 5 [1Q]
## # Key: State [8]
## Date State Takings Occupancy CPI
## <qtr> <chr> <dbl> <dbl> <dbl>
## 1 1998 Q1 Australian Capital Territory 24.3 65 67
## 2 1998 Q2 Australian Capital Territory 22.3 59 67.4
## 3 1998 Q3 Australian Capital Territory 22.5 58 67.5
## 4 1998 Q4 Australian Capital Territory 24.4 59 67.8
## 5 1999 Q1 Australian Capital Territory 23.7 58 67.8
## 6 1999 Q2 Australian Capital Territory 25.4 61 68.1
## 7 1999 Q3 Australian Capital Territory 28.2 66 68.7
## 8 1999 Q4 Australian Capital Territory 25.8 60 69.1
## 9 2000 Q1 Australian Capital Territory 27.3 60.9 69.7
## 10 2000 Q2 Australian Capital Territory 30.1 64.7 70.2
## # ℹ 582 more rows
glimpse(aus_accommodation)
## Rows: 592
## Columns: 5
## Key: State [8]
## $ Date <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q…
## $ State <chr> "Australian Capital Territory", "Australian Capital Territor…
## $ Takings <dbl> 24.27889, 22.32354, 22.52939, 24.39096, 23.72143, 25.38245, …
## $ Occupancy <dbl> 65.0, 59.0, 58.0, 59.0, 58.0, 61.0, 66.0, 60.0, 60.9, 64.7, …
## $ CPI <dbl> 67.0, 67.4, 67.5, 67.8, 67.8, 68.1, 68.7, 69.1, 69.7, 70.2, …
tas_data <- aus_accommodation |>
dplyr::filter(State == "Tasmania")
# Find the appropriate Box-Cox transformation
lambda <- forecast::BoxCox.lambda(tas_accommodation$Takings) # Use forecast:: to specify the function
## Registered S3 methods overwritten by 'forecast':
## method from
## autoplot.Arima ggfortify
## autoplot.acf ggfortify
## autoplot.ar ggfortify
## autoplot.bats ggfortify
## autoplot.decomposed.ts ggfortify
## autoplot.ets ggfortify
## autoplot.forecast ggfortify
## autoplot.stl ggfortify
## autoplot.ts ggfortify
## fitted.ar ggfortify
## fortify.ts ggfortify
## residuals.ar ggfortify
lambda
## [1] -0.005076712
# Step 1: Create first differenced column (Diff_Takings)
tas_accommodation2 <- tas_accommodation |>
mutate(Diff_Takings = c(NA, diff(Takings))) # First differencing
# Step 2: Create second differenced column (Diff2_Takings)
tas_accommodation2 <- tas_accommodation2 |>
mutate(Diff2_Takings = c(NA, diff(Diff_Takings))) # Second differencing
# Display the first few rows of the dataset
head(tas_accommodation2, n = 3)
## # A tsibble: 3 x 8 [1Q]
## # Key: State [1]
## Date State Takings Occupancy CPI Transformed_Takings Diff_Takings
## <qtr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1998 Q1 Tasmania 28.7 67 67 3.33 NA
## 2 1998 Q2 Tasmania 19.0 45 67.4 2.92 -9.68
## 3 1998 Q3 Tasmania 16.1 39 67.5 2.76 -2.91
## # ℹ 1 more variable: Diff2_Takings <dbl>
# Step 2: Convert to tsibble format (ensure the tsibble is created from the updated dataset)
tas_accommodation_ts <- tas_accommodation2 |>
as_tsibble(index = Date)
# Step 3: Check structure to ensure Diff2_Takings exists
str(tas_accommodation_ts)
## tbl_ts [74 × 8] (S3: tbl_ts/tbl_df/tbl/data.frame)
## $ Date : qtr [1:74] 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 ...
## ..@ fiscal_start: num 1
## $ State : chr [1:74] "Tasmania" "Tasmania" "Tasmania" "Tasmania" ...
## $ Takings : num [1:74] 28.7 19 16.1 25.9 28.4 ...
## $ Occupancy : num [1:74] 67 45 39 56 66 48 41 56 66.2 49.7 ...
## $ CPI : num [1:74] 67 67.4 67.5 67.8 67.8 68.1 68.7 69.1 69.7 70.2 ...
## $ Transformed_Takings: num [1:74] 3.33 2.92 2.76 3.23 3.32 ...
## ..- attr(*, "lambda")= num -0.00508
## $ Diff_Takings : num [1:74] NA -9.68 -2.91 9.78 2.47 ...
## $ Diff2_Takings : num [1:74] NA NA 6.77 12.69 -7.32 ...
## - attr(*, "key")= tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
## ..$ State: chr "Tasmania"
## ..$ .rows: list<int> [1:1]
## .. ..$ : int [1:74] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
## - attr(*, "index")= chr "Date"
## ..- attr(*, "ordered")= logi TRUE
## - attr(*, "index2")= chr "Date"
## - attr(*, "interval")= interval [1:1] 1Q
## ..@ .regular: logi TRUE
# Step 4: Plot time series and ACF using gg_tsdisplay
gg_tsdisplay(tas_accommodation_ts, Diff2_Takings, plot_type = "partial")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
library(tsibble)
library(fable)
library(fabletools)
library(fable)
library(ggplot2)
library(ggfortify)
library(tidyverse)
library(fpp3)
souvenirs
## # A tsibble: 84 x 2 [1M]
## Month Sales
## <mth> <dbl>
## 1 1987 Jan 1665.
## 2 1987 Feb 2398.
## 3 1987 Mar 2841.
## 4 1987 Apr 3547.
## 5 1987 May 3753.
## 6 1987 Jun 3715.
## 7 1987 Jul 4350.
## 8 1987 Aug 3566.
## 9 1987 Sep 5022.
## 10 1987 Oct 6423.
## # ℹ 74 more rows
head(souvenirs)
## # A tsibble: 6 x 2 [1M]
## Month Sales
## <mth> <dbl>
## 1 1987 Jan 1665.
## 2 1987 Feb 2398.
## 3 1987 Mar 2841.
## 4 1987 Apr 3547.
## 5 1987 May 3753.
## 6 1987 Jun 3715.
# Glimpse of the souvenirs dataset
glimpse(souvenirs)
## Rows: 84
## Columns: 2
## $ Month <mth> 1987 Jan, 1987 Feb, 1987 Mar, 1987 Apr, 1987 May, 1987 Jun, 1987…
## $ Sales <dbl> 1664.81, 2397.53, 2840.71, 3547.29, 3752.96, 3714.74, 4349.61, 3…
library(forecast)
## Warning: package 'forecast' was built under R version 4.3.3
# Find the appropriate Box-Cox transformation lambda
lambda <- BoxCox.lambda(souvenirs$Sales)
lambda
## [1] -0.2444328
# Apply the Box-Cox transformation
souvenirs <- souvenirs %>%
mutate(Sales_bc = BoxCox(Sales, lambda))
# Apply first differencing
souvenirs <- souvenirs %>%
mutate(Diff_Sales = c(NA, diff(Sales)))
# Apply second differencing if necessary
souvenirs <- souvenirs %>%
mutate(Diff2_Sales = c(NA, diff(Diff_Sales)))
# View first few rows to check the differencing results
head(souvenirs, n = 3)
## # A tsibble: 3 x 5 [1M]
## Month Sales Sales_bc Diff_Sales Diff2_Sales
## <mth> <dbl> <dbl> <dbl> <dbl>
## 1 1987 Jan 1665. 3.42 NA NA
## 2 1987 Feb 2398. 3.48 733. NA
## 3 1987 Mar 2841. 3.51 443. -290.
# Visualize second differenced series
gg_tsdisplay(souvenirs, Diff2_Sales, plot_type = "partial")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
aus_retail
## # A tsibble: 64,532 x 5 [1M]
## # Key: State, Industry [152]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurant… A3349849A 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Sep 4.2
## 7 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Oct 4.8
## 8 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Nov 5.4
## 9 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Dec 6.9
## 10 Australian Capital Territory Cafes, restaurant… A3349849A 1983 Jan 3.8
## # ℹ 64,522 more rows
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
set.seed(131017)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(myseries, n=3)
## # A tsibble: 3 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Northern Territory Food retailing A3349527K 1988 Apr 25.6
## 2 Northern Territory Food retailing A3349527K 1988 May 26.7
## 3 Northern Territory Food retailing A3349527K 1988 Jun 27.7
# Filter out a subset for simplicity (e.g., a specific state and industry)
myseries_vic <- aus_retail |>
filter(State == "Victoria", Industry == "Cafes, restaurants and catering services")
head(myseries_vic, n =3)
## # A tsibble: 3 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Cafes, restaurants and catering servic… A3349640L 1982 Apr 36.4
## 2 Victoria Cafes, restaurants and catering servic… A3349640L 1982 May 36.2
## 3 Victoria Cafes, restaurants and catering servic… A3349640L 1982 Jun 35.7
autoplot(myseries_vic, Turnover) +
labs(title = "Retail Turnover Time Series")
This plot shows that the series needs some log transformation to
stabilize the variance
# log transformation for stabilizing variance)
lambda <- myseries_vic |> features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.1734096
Lambda is not equal to 1, so we need to apply a Box-Cox transformation. ### Apply Box-Cox transformation (since lambda is not equal to 1).
# Box-Cox transformation
if (lambda != 1) { myseries_vic <- myseries_vic |> mutate(Turnover = box_cox(Turnover, lambda)) }
transformed <- TRUE
if (transformed) {
autoplot(myseries_vic, Turnover) +
labs(title = "Transformed Retail Turnover Time Series")
} else {
autoplot(myseries_vic, Turnover) +
labs(title = "Retail Turnover Time Series")
}
# Check for stationarity using a time series plot and ACF plot
gg_tsdisplay(myseries_vic, Turnover, plot_type = "partial")
library(dplyr)
library(tsibble)
library(lubridate)
# Apply first differencing to 'Turnover'
myseries_diff1 <- myseries_vic %>%
mutate(Diff1 = c(NA, diff(Turnover))) # Prepend NA to keep the length consistent
# View the resulting dataset
head(myseries_diff1)
## # A tsibble: 6 x 6 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover Diff1
## <chr> <chr> <chr> <mth> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and caterin… A3349640L 1982 Apr 4.99 NA
## 2 Victoria Cafes, restaurants and caterin… A3349640L 1982 May 4.98 -0.0103
## 3 Victoria Cafes, restaurants and caterin… A3349640L 1982 Jun 4.95 -0.0259
## 4 Victoria Cafes, restaurants and caterin… A3349640L 1982 Jul 4.89 -0.0580
## 5 Victoria Cafes, restaurants and caterin… A3349640L 1982 Aug 4.78 -0.115
## 6 Victoria Cafes, restaurants and caterin… A3349640L 1982 Sep 4.86 0.0774
gg_tsdisplay(myseries_diff1, Diff1, plot_type = "partial")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
# Apply second differencing to 'Turnover'
myseries_diff2 <- myseries_diff1 %>%
mutate(Diff2 = c(NA, NA, diff(Turnover, differences = 2))) # Prepend two NAs to keep the length consistent
head(myseries_diff2)
## # A tsibble: 6 x 7 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover Diff1 Diff2
## <chr> <chr> <chr> <mth> <dbl> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and… A3349640L 1982 Apr 4.99 NA NA
## 2 Victoria Cafes, restaurants and… A3349640L 1982 May 4.98 -0.0103 NA
## 3 Victoria Cafes, restaurants and… A3349640L 1982 Jun 4.95 -0.0259 -0.0156
## 4 Victoria Cafes, restaurants and… A3349640L 1982 Jul 4.89 -0.0580 -0.0321
## 5 Victoria Cafes, restaurants and… A3349640L 1982 Aug 4.78 -0.115 -0.0571
## 6 Victoria Cafes, restaurants and… A3349640L 1982 Sep 4.86 0.0774 0.193
gg_tsdisplay(myseries_diff2, Diff2, plot_type = "partial")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
library(fabletools)
library(dplyr)
# KPSS test for the original Turnover
kpss_original <- myseries_vic |>
features(Turnover, unitroot_kpss)
kpss_original
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and catering services 7.34 0.01
library(fabletools)
library(dplyr)
# KPSS test for first differencing
kpss_diff1 <- myseries_diff1 |>
features(Diff1, unitroot_kpss)
kpss_diff1
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and catering services 0.0205 0.1
library(fabletools)
library(dplyr)
# KPSS test for second differencing
kpss_diff2 <- myseries_diff2 |>
features(Diff2, unitroot_kpss)
kpss_diff2
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and catering services 0.00617 0.1
library(fabletools)
library(dplyr)
# Use the KPSS test to confirm stationarity
kpss_result <- myseries_diff1 |>
features(Diff1, unitroot_kpss)
kpss_result
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Cafes, restaurants and catering services 0.0205 0.1
# KPSS test on Turnover
kpss_turnover <- kpss.test(myseries_diff2$Turnover)
## Warning in kpss.test(myseries_diff2$Turnover): p-value smaller than printed
## p-value
print(kpss_turnover)
##
## KPSS Test for Level Stationarity
##
## data: myseries_diff2$Turnover
## KPSS Level = 7.3383, Truncation lag parameter = 5, p-value = 0.01
# KPSS test on first differencing
kpss_diff1 <- kpss.test(myseries_diff2$Diff1, null = "Trend")
## Warning in kpss.test(myseries_diff2$Diff1, null = "Trend"): p-value greater
## than printed p-value
print(kpss_diff1)
##
## KPSS Test for Trend Stationarity
##
## data: myseries_diff2$Diff1
## KPSS Trend = 0.020508, Truncation lag parameter = 5, p-value = 0.1
# KPSS test on second differencing
kpss_diff2 <- kpss.test(myseries_diff2$Diff2, null = "Trend")
## Warning in kpss.test(myseries_diff2$Diff2, null = "Trend"): p-value greater
## than printed p-value
print(kpss_diff2)
##
## KPSS Test for Trend Stationarity
##
## data: myseries_diff2$Diff2
## KPSS Trend = 0.005508, Truncation lag parameter = 5, p-value = 0.1
Results interpretation:
Turnover: Non-stationary. The KPSS test for level stationarity indicates that the series is non-stationary. Since the p-value is significantly smaller than 0.05, we reject the null hypothesis, suggesting that the Turnover series is not stationary.
First Differencing (Diff1): Likely stationary (borderline). The KPSS test for trend stationarity indicates that the series may be stationary. The p-value (0.1) is greater than the common significance level of 0.05 but less than 0.1, suggesting that we cannot reject the null hypothesis of stationarity at the 0.05 level, but we may consider it borderline. An additional tests for confirmation might be useful.
Second Differencing (Diff2): Likely stationary (borderline). Similar to the first differencing, the KPSS test for trend stationarity indicates that the second differenced series may also be stationary. Again, the p-value (0.1) is at the borderline for the 0.05 level, suggesting that we cannot reject the null hypothesis of stationarity.
Overall, we can proceed with modeling using either Diff1 or Diff2 based on the requirements of our forecasting approach.
and σ2 = 1. The process starts with y1 = 0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
The code provides a comprehensive way to visualize and analyze the impact of changing parameters on ARIMA model behavior.
# simulate data from an AR(1) model with ϕ1=0.6 ϕ and σ2=1.
library(tsibble)
library(ggplot2)
# AR(1) Simulation
set.seed(1310) # For reproducibility
y_ar1 <- numeric(100)
e_ar1 <- rnorm(100)
for(i in 2:100) {
y_ar1[i] <- 0.6 * y_ar1[i-1] + e_ar1[i]
}
# Create a tsibble
ar1_data <- tsibble(idx = seq_len(100), y = y_ar1, index = idx)
# Plot the AR(1) series
ggplot(ar1_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "AR(1) Model: φ1 = 0.6", x = "Time", y = "Value") +
theme_minimal()
Now, Since we can modify the coefficient, let’s see how the plot changes with different values of ϕ1:
# AR(1) Simulation with φ1 = 0.9
y_ar1_high <- numeric(100)
for(i in 2:100) {
y_ar1_high[i] <- 0.9 * y_ar1_high[i-1] + e_ar1[i]
}
ar1_high_data <- tsibble(idx = seq_len(100), y = y_ar1_high, index = idx)
# Plot
ggplot(ar1_high_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "AR(1) Model: φ1 = 0.9", x = "Time", y = "Value") +
theme_minimal()
# AR(1) Simulation with φ1 = -0.6
y_ar1_neg <- numeric(100)
for(i in 2:100) {
y_ar1_neg[i] <- -0.6 * y_ar1_neg[i-1] + e_ar1[i]
}
ar1_neg_data <- tsibble(idx = seq_len(100), y = y_ar1_neg, index = idx)
# Plot
ggplot(ar1_neg_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "AR(1) Model: φ1 = -0.6", x = "Time", y = "Value") +
theme_minimal()
θ1 = 0.6
# MA(1) Simulation
set.seed(1310) # For reproducibility
y_ma1 <- numeric(100)
e_ma1 <- rnorm(100)
for(i in 2:100) {
y_ma1[i] <- e_ma1[i] + 0.6 * e_ma1[i-1]
}
# Create a tsibble
ma1_data <- tsibble(idx = seq_len(100), y = y_ma1, index = idx)
# Plot the MA(1) series
ggplot(ma1_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "MA(1) Model: θ1 = 0.6", x = "Time", y = "Value") +
theme_minimal()
Let see how the plot changes with different values of ϕ1:
# MA(1) Simulation with θ1 = 0.9
y_ma1_high <- numeric(100)
for(i in 2:100) {
y_ma1_high[i] <- e_ma1[i] + 0.9 * e_ma1[i-1]
}
ma1_high_data <- tsibble(idx = seq_len(100), y = y_ma1_high, index = idx)
# Plot
ggplot(ma1_high_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "MA(1) Model: θ1 = 0.9", x = "Time", y = "Value") +
theme_minimal()
# MA(1) Simulation with θ1 = -0.6
y_ma1_neg <- numeric(100)
for(i in 2:100) {
y_ma1_neg[i] <- e_ma1[i] - 0.6 * e_ma1[i-1]
}
ma1_neg_data <- tsibble(idx = seq_len(100), y = y_ma1_neg, index = idx)
# Plot
ggplot(ma1_neg_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "MA(1) Model: θ1 = -0.6", x = "Time", y = "Value") +
theme_minimal()
Conclusion: One notices that while the ARMA(1,1) model shows a more stationary pattern, the AR(2) model shows non-stationary behavior due to its parameter values.
# ARMA(1,1) Simulation
set.seed(1310) # For reproducibility
y_arma <- numeric(100)
e_arma <- rnorm(100)
for(i in 2:100) {
y_arma[i] <- 0.6 * y_arma[i-1] + e_arma[i] + 0.6 * e_arma[i-1]
}
# Create a tsibble
arma_data <- tsibble(idx = seq_len(100), y = y_arma, index = idx)
# Plot the ARMA(1,1) series
ggplot(arma_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "ARMA(1,1) Model: φ1 = 0.6, θ1 = 0.6", x = "Time", y = "Value") +
theme_minimal()
= − 0.8 and ϕ2 = 0.3
# AR(2) Simulation
set.seed(1310) # For reproducibility
y_ar2 <- numeric(100)
for(i in 3:100) {
y_ar2[i] <- -0.8 * y_ar2[i-1] + 0.3 * y_ar2[i-2] + e_ar1[i]
}
# Create a tsibble
ar2_data <- tsibble(idx = seq_len(100), y = y_ar2, index = idx)
# Plot the AR(2) series
ggplot(ar2_data, aes(x = idx, y = y)) +
geom_line() +
labs(title = "AR(2) Model: φ1 = -0.8, φ2 = 0.3", x = "Time", y = "Value") +
theme_minimal()
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. Write the model in terms of the backshift operator. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
# Load required libraries
library(fable)
library(fabletools)
library(tsibble)
library(dplyr)
library(ggplot2)
library(tsibbledata)
# Load the dataset
aus_airpassengers
## # A tsibble: 47 x 2 [1Y]
## Year Passengers
## <dbl> <dbl>
## 1 1970 7.32
## 2 1971 7.33
## 3 1972 7.80
## 4 1973 9.38
## 5 1974 10.7
## 6 1975 11.1
## 7 1976 10.9
## 8 1977 11.3
## 9 1978 12.1
## 10 1979 13.0
## # ℹ 37 more rows
glimpse(aus_airpassengers)
## Rows: 47
## Columns: 2
## $ Year <dbl> 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979,…
## $ Passengers <dbl> 7.3187, 7.3266, 7.7956, 9.3846, 10.6647, 11.0551, 10.8643, …
library(urca)
# a. Fit an ARIMA model to the data
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
# Instead of summary, use tidy() to get model coefficients
model_summary <- tidy(fit)
print(model_summary)
## # A tibble: 1 × 6
## .model term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(Passengers) ma1 -0.896 0.0594 -15.1 3.24e-19
# Load required libraries
library(fable)
library(fabletools)
library(tsibble)
library(dplyr)
library(ggplot2)
library(tsibbledata)
library(gridExtra)
# Load the dataset
data("aus_airpassengers")
# a. Fit an ARIMA model to the data
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
# Summary of the model coefficients
model_summary <- tidy(fit)
print(model_summary)
## # A tibble: 1 × 6
## .model term estimate std.error statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(Passengers) ma1 -0.896 0.0594 -15.1 3.24e-19
# Extract residuals using augment()
residuals_fit <- fit %>%
augment() %>% # Augment the fitted model with residuals
pull(.resid) # Pull the residuals column
# Set up a 2x2 plotting area for diagnostics
par(mfrow = c(1, 1))
# Residuals vs. Time
plot.ts(residuals_fit, main = "Residuals", ylab = "Residuals", xlab = "Time")
abline(h = 0, col = "red")
# Histogram of residuals
hist(residuals_fit, main = "Histogram of Residuals", xlab = "Residuals", breaks = 10)
# Q-Q plot using ggplot2
qq_plot_data <- data.frame(residuals = residuals_fit)
ggplot(qq_plot_data, aes(sample = residuals)) +
geom_qq() +
geom_qq_line() +
ggtitle("Q-Q Plot of Residuals") +
xlab("Theoretical Quantiles") +
ylab("Sample Quantiles")
# ACF of residuals
acf(residuals_fit, main = "ACF of Residuals")
# Reset plotting parameters to default
par(mfrow = c(1, 1))
# Plot forecast for the next 10 periods
forecasts <- fit %>%
forecast(h = "10 years")
# Plot the forecasts
autoplot(forecasts) +
ggtitle("ARIMA Model Forecast for Air Passengers") +
labs(y = "Passengers (in millions)")
# Fit ARIMA(0,1,0) with drift
fit_drift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1))
# Forecast for the next 10 periods
forecasts_drift <- fit_drift %>%
forecast(h = "10 years")
# Plot the forecasts
autoplot(forecasts_drift) +
ggtitle("ARIMA(0,1,0) Model with Drift Forecast for Air Passengers") +
labs(y = "Passengers (in millions)")
Comparison: We can compare the forecast plots visually by plotting them
on the same graph.
###d. ARIMA(2,1,2) Model with Drift: Fit an ARIMA(2,1,2) model with drift and plot forecasts.
# Load the dataset
data("aus_airpassengers")
# Fit ARIMA(2,1,2) with a constant term
fit_arima212 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(2, 1, 2) + PDQ(0, 0, 0))) # Specify ARIMA(2,1,2) explicitly
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + PDQ(0, 0, 0))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
# Forecast for the next 10 periods
forecasts_arima212 <- fit_arima212 %>%
forecast(h = "10 years")
# Plot the forecasts
autoplot(aus_airpassengers, Passengers) +
autolayer(forecasts_arima212, level = NULL) + # Add forecast with no confidence interval
ggtitle("ARIMA(2,1,2) Model with Constant Term Forecast for Air Passengers") +
labs(y = "Passengers")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
You can fit the model again without the constant and observe the results.
# Load the dataset
data("aus_airpassengers")
# Fit ARIMA(2,1,2) without a constant term
fit_arima212_no_const <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(2, 1, 2))) # Removed the constant
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
# Forecast for the next 10 periods
forecasts_arima212_no_const <- fit_arima212_no_const %>%
forecast(h = "10 years")
# Plot the forecasts
autoplot(aus_airpassengers, Passengers) +
autolayer(forecasts_arima212_no_const, level = NULL) + # Add forecast with no confidence interval
ggtitle("ARIMA(2,1,2) Model Without Constant Term Forecast for Air Passengers") +
labs(y = "Passengers")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
### Combined Plot of Both ARIMA Models
# Load necessary libraries
library(tsibble)
library(fable)
library(ggplot2)
library(dplyr)
# Load the dataset
data("aus_airpassengers")
# Convert the dataset to a tsibble
aus_airpassengers_tsibble <- as_tsibble(aus_airpassengers, index = Year)
# Fit ARIMA(2,1,2) with a constant term (using 1 to indicate constant)
fit_arima212 <- aus_airpassengers_tsibble %>%
model(ARIMA(Passengers ~ 1 + pdq(2, 1, 2))) # Model with constant
# Forecast for the next 10 years with constant
forecasts_arima212 <- fit_arima212 %>%
forecast(h = "10 years") %>%
mutate(Type = "With Constant") # Add type column
# Fit ARIMA(2,1,2) without a constant term (using 0 to exclude constant)
fit_arima212_no_const <- aus_airpassengers_tsibble %>%
model(ARIMA(Passengers ~ 0 + pdq(2, 1, 2))) # Model without constant
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
# Forecast for the next 10 years without constant
forecasts_arima212_no_const <- fit_arima212_no_const %>%
forecast(h = "10 years") %>%
mutate(Type = "Without Constant") # Add type column
# Combine the forecasts for plotting
forecasts_combined <- bind_rows(
forecasts_arima212,
forecasts_arima212_no_const
)
# Plot both forecasts side by side using facet_wrap
ggplot() +
geom_line(data = aus_airpassengers_tsibble, aes(x = Year, y = Passengers), color = "black", size = 1, linetype = "dashed") + # Original data
geom_line(data = forecasts_combined, aes(x = Year, y = .mean, color = Type), size = 1) +
facet_wrap(~ Type, scales = "free_y") + # Create facets for each forecast type
ggtitle("ARIMA(2,1,2) Model Forecasts for Air Passengers") +
labs(y = "Passengers", x = "Year") +
scale_color_manual(values = c("With Constant" = "blue", "Without Constant" = "red")) +
theme(legend.title = element_blank())
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
With Constant, the model includes a drift term that allows for a trend
in the data, resulting in a forecast that may continue to grow or
decline depending on the trend. Without Constant, the model assumes the
differenced series has no average change over time, leading to forecasts
that eventually flatten out. Removing the constant is appropriate when
we believe that the long-term behavior of the series should not exhibit
a linear trend. Overall, if we need to model a series with a persistent
trend, it’s usually better to include the constant.
For the United States GDP series (from global_economy):
global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # ℹ 15,140 more rows
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
# extract US GDP series
us_gdp <- global_economy |>
filter(Country == "United States") |>
dplyr:: select(GDP)
head(us_gdp)
## # A tsibble: 6 x 2 [1Y]
## GDP Year
## <dbl> <dbl>
## 1 543300000000 1960
## 2 563300000000 1961
## 3 605100000000 1962
## 4 638600000000 1963
## 5 685800000000 1964
## 6 743700000000 1965
# Find the ideal lambda
gdp_lambda <- us_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
gdp_lambda
## [1] 0.2819443
#Box-Cox transformation
us_gdp <- us_gdp |>
mutate(GDP_transformed = box_cox(GDP, gdp_lambda))
head(us_gdp, n = 3)
## # A tsibble: 3 x 3 [1Y]
## GDP Year GDP_transformed
## <dbl> <dbl> <dbl>
## 1 543300000000 1960 7215.
## 2 563300000000 1961 7289.
## 3 605100000000 1962 7438.
library(urca)
gdp_arima <- us_gdp |>
model(ARIMA(GDP_transformed))
report(gdp_arima)
## Series: GDP_transformed
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
##Compare results with ETS (no transformation):
gdp_ets <- us_gdp %>%
model(ETS(GDP))
gdp_ets_forecast <- gdp_ets %>%
forecast(h = "5 years")
gdp_ets
## # A mable: 1 x 1
## `ETS(GDP)`
## <model>
## 1 <ETS(M,A,N)>
# Plot the GDP series with the 10-step-ahead forecast
autoplot(us_gdp, GDP) +
autolayer(gdp_ets_forecast, GDP, series = "Forecast", color = "blue") +
ggtitle("GDP Forecasts using ETS Model") +
ylab("GDP") +
xlab("Year") +
theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
gdp_arima <- us_gdp |>
model(ARIMA(GDP_transformed))
# Generate forecast for 5 periods ahead
gdp_forecast <- gdp_arima |>
forecast(h = 5)
# Plot the forecast
gdp_forecast |>
autoplot(us_gdp) +
labs(title = "GDP Forecast using ARIMA Model",
y = "Transformed GDP",
x = "Year")
# Fit ARIMA models
gdp_arima <- us_gdp |>
model(ARIMA(GDP_transformed))
gdp_arima2 <- us_gdp |>
model(ARIMA(GDP_transformed ~ pdq(1,1,0) + PDQ(0,1,1)))
gdp_arima
## # A mable: 1 x 1
## `ARIMA(GDP_transformed)`
## <model>
## 1 <ARIMA(1,1,0) w/ drift>
# Compare AIC values and choose the best model
aic_arima1 <- glance(gdp_arima) %>% pull(AIC)
aic_arima2 <- glance(gdp_arima2) %>% pull(AIC)
best_model <- ifelse(aic_arima1 < aic_arima2, gdp_arima, gdp_arima2)
best_model
## [[1]]
## <lst_mdl[1]>
## [1] <ARIMA(1,1,0) w/ drift>
# Convert to mable to use gg_tsresiduals
best_model <- if(aic_arima1 < aic_arima2) {
gdp_arima
} else {
gdp_arima2
}
best_model |>
gg_tsresiduals() +
ggtitle("Residuals of the Best ARIMA Model")
# Produce forecasts
gdp_forecast <- best_model |>
forecast(h = "5 years")
# Produce forecasts
gdp_forecast <- best_model |>
forecast(h = "5 years")
gdp_forecast
## # A fable: 5 x 4 [1Y]
## # Key: .model [1]
## .model Year GDP_transformed .mean
## <chr> <dbl> <dist> <dbl>
## 1 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,… 2018 N(19996, 5479) 19996.
## 2 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,… 2019 N(20215, 17136) 20215.
## 3 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,… 2020 N(20434, 32397) 20434.
## 4 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,… 2021 N(20652, 49472) 20652.
## 5 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,… 2022 N(20871, 67414) 20871.
autoplot(us_gdp, GDP) +
autolayer(gdp_forecast, .fitted, color = "red") +
ggtitle("GDP Forecasts")
Forecasts showing a stable linear upward trend for variables like GDP
are unrealistic. They fail to account for real-world disruptions such as
international conflicts, pandemics, or economic recessions, which often
cause unexpected fluctuations.
# Compare results with ETS (no transformation)
gdp_ets <- us_gdp |>
model(ETS(GDP))
gdp_ets
## # A mable: 1 x 1
## `ETS(GDP)`
## <model>
## 1 <ETS(M,A,N)>
gdp_ets_forecast <- gdp_ets |>
forecast(h = "5 years")
gdp_ets_forecast
## # A fable: 5 x 4 [1Y]
## # Key: .model [1]
## .model Year GDP .mean
## <chr> <dbl> <dist> <dbl>
## 1 ETS(GDP) 2018 N(2e+13, 2.7e+23) 2.01e13
## 2 ETS(GDP) 2019 N(2.1e+13, 9.1e+23) 2.07e13
## 3 ETS(GDP) 2020 N(2.1e+13, 2.1e+24) 2.14e13
## 4 ETS(GDP) 2021 N(2.2e+13, 3.9e+24) 2.21e13
## 5 ETS(GDP) 2022 N(2.3e+13, 6.7e+24) 2.28e13
# Plotting ETS forecasts
autoplot(us_gdp, GDP) +
autolayer(gdp_ets_forecast, .fitted, color = "blue") +
ggtitle("GDP Forecasts using ETS")
Once again, forecasts showing a stable linear upward trend for variables
like GDP are unrealistic. They fail to account for real-world
disruptions such as international conflicts, pandemics, or economic
recessions, which often cause unexpected fluctuations.