Assignment: Exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman

Load required libraries

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a.)Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer: The Autocorrelation Function (ACF) measures the correlation of a time series with its past values and it is very much affected by the sample size. As a result, the various figures do not all indicate (with certainty) that the data are white noise.

The ACF of the small sample (36 numbers) shows some random spikes, but, given the small sample size, these spikes could be due to random variation rather than any underlying pattern. I would therefore be wary to conclude to a white noise for such a small sample size.

The middle ACF plot (with 360 numbers) more reliably indicates white noise. This is because a larger sample size (as 360) typically leads to more reliable estimates of autocorrelation. Also, the ACF plot shows fewer random spikes compared to the figure (with 36 numbers), and most of the values remain close to zero.

Lastly, with 1,000 observations, the ACF plot becomes more stable and provides a clearer picture of the series.The third ACF plot demonstrates values very close to zero, with minimal random fluctuations.The confidence intervals is also narrower, and there are no spikes crossing these intervals, which confirms the white noise nature of the series.

b.) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer: The differences in the critical values at different distances from the mean of zero and the variations in auto correlations in the ACF plots of white noise series can be attributed to several factors related to sample size, statistical significance, and the nature of random processes. Indeed, the differences in critical values across samples arise from the sample size and its impact on the standard error of the ACF estimates. Larger samples yield smaller critical values.As sample sizes increase, the stability of the ACF improves, leading to values that more consistently reflect the true white noise characteristics (close to zero with fewer significant spikes).

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

Load the data

gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key:       Symbol [4]
##    Symbol Date        Open  High   Low Close Adj_Close    Volume
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
##  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
##  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
##  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
##  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
##  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
##  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
##  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
##  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
##  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
## 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
## # ℹ 5,022 more rows

Plot of Stock prices at closing

stock_prices <-gafa_stock|>
  autoplot(Close) + 
  labs(y = "Stocks closing prices in ($US)")
stock_prices 

Plot of Amazon stock prices

amazon_prices <-gafa_stock|>
  filter(Symbol == "AMZN")|>
  autoplot(Close) + 
  labs(y = "Amazon Daily closing prices ($US)")
amazon_prices

Plot of Amazon stock prices in 2018

amazon_2018 <-gafa_stock|>
  filter(Symbol == "AMZN", year(Date) == 2018)|>
  autoplot(Close) + 
  labs(y = "Amazon Daily closing prices ($US)")
amazon_2018

The ACF and PACF plots are evidence that the amazon series is non -stationary. The ACF (which measures the correlation of a time series with its past values) reveals a very slow decrease and the value of r1 is large and positive.

Looking at Amazon PACF

amazon_2018 <-gafa_stock|>
  filter(Symbol == "AMZN", year (Date) == 2018)|>
 PACF(Close) |>
autoplot() + labs(subtitle ="Amazon Daily closing prices ($US)")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
amazon_2018

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a.) Turkish GDP from global_economy.

Load the data

head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

Look at the data

library(fabletools)
library(dplyr)
library(ggplot2)

# Plot of Turkey GDP (per capita)
gdp_turkey <- global_economy |> 
  filter(Country == "Turkey") |> 
  mutate(GDP_per_capita = GDP / Population)

# Plot the GDP per capita
gdp_turkey_plot <- autoplot(gdp_turkey, GDP_per_capita) + 
  labs(subtitle = "Turkish GDP per Capita ($US)")
gdp_turkey_plot

Apply a log transformation to stabilize the variance

## Log transformation of GDP per capita
gdp_turkey <- gdp_turkey |> 
  mutate(log_GDP_per_capita = log(GDP_per_capita))

# Plot the log-transformed GDP per capita
gdp_turkey_log_plot <- autoplot(gdp_turkey, log_GDP_per_capita) + 
  labs(subtitle = "Log of Turkish GDP per Capita ($US)")
gdp_turkey_log_plot

data are still non-stationary

Let’s take a seasonal component

# Convert gdp_turkey to a tsibble
gdp_turkey_tsibble <- gdp_turkey |> 
  as_tsibble(index = Year, key = Country)  # Use Year as the index and Country as the key

# Ensure the log_GDP_per_capita is a numeric vector and apply seasonal differencing
gdp_diff <- gdp_turkey_tsibble |> 
  mutate(Diffed_log_GDP = log_GDP_per_capita - lag(log_GDP_per_capita, 12))

# Plot the differenced series
gdp_diff |> 
  autoplot(Diffed_log_GDP) + 
  ggtitle("Seasonally Differenced Log GDP per Capita for Turkey") + 
  labs(y = "Differenced Log GDP per Capita", x = "Year") + 
  theme_minimal()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

# First-order differencing

log_GDP_diff1 <- diff(log(gdp_turkey$GDP_per_capita))
# Perform second-order differencing on log_GDP_per_capita

log_GDP_diff2 <- diff(log_GDP_diff1)  
# Create a new dataframe with the differenced values
gdp_turkey_diff2 <- data.frame(
  Year = gdp_turkey$Year[-(1:2)],  # Adjust Year accordingly
  Diff2 = log_GDP_diff2
)

# Check the resulting dataset
print(head(gdp_turkey_diff2))
##   Year       Diff2
## 1 1962  0.66299987
## 2 1963  0.04278701
## 3 1964 -0.07248857
## 4 1965 -0.01002184
## 5 1966  0.10120085
## 6 1967 -0.06363837

KPSS test on the second-order differenced data

# Perform the KPSS test on the second-order differenced data
kpss_result <- kpss.test(gdp_turkey_diff2$Diff2)
## Warning in kpss.test(gdp_turkey_diff2$Diff2): p-value greater than printed
## p-value
# Print the KPSS test result
print(kpss_result)
## 
##  KPSS Test for Level Stationarity
## 
## data:  gdp_turkey_diff2$Diff2
## KPSS Level = 0.15237, Truncation lag parameter = 3, p-value = 0.1

ACF and PACF calculations

# ACF and PACF calculations using gg_tsdisplay()

# Load necessary libraries
library(tidyverse)
library(tsibble)
library(fable)
library(ggfortify)  

# Create a new tibble for gg_tsdisplay
gdp_turkey_diff2_ts <- gdp_turkey_diff2 %>%
  as_tsibble(index = Year)

# Check for stationarity using gg_tsdisplay
gdp_turkey_diff2_ts %>%
  gg_tsdisplay(Diff2, plot_type = "partial")

# ACF and PACF calculations using ggplot2

# ACF and PACF calculations
acf_values <- acf(gdp_turkey_diff2$Diff2, plot = FALSE)
pacf_values <- pacf(gdp_turkey_diff2$Diff2, plot = FALSE)

# Convert ACF and PACF to data frames for ggplot
acf_df <- data.frame(
  Lag = acf_values$lag,
  ACF = acf_values$acf
)

pacf_df <- data.frame(
  Lag = pacf_values$lag,
  PACF = pacf_values$acf
)

# Plot ACF
acf_plot <- ggplot(acf_df, aes(x = Lag, y = ACF)) +
  geom_bar(stat = "identity", fill = "blue") +
  ggtitle("ACF of Second-Order Differenced log_GDP_per_capita") +
  labs(x = "Lag", y = "ACF") +
  theme_minimal()
print(acf_plot)

# Plot PACF
pacf_plot <- ggplot(pacf_df, aes(x = Lag, y = PACF)) +
  geom_bar(stat = "identity", fill = "red") +
  ggtitle("PACF of Second-Order Differenced log_GDP_per_capita") +
  labs(x = "Lag", y = "PACF") +
  theme_minimal()

print(pacf_plot)

### Summary of the differenced data

# Summary of the differenced data
summary(gdp_turkey_diff2$Diff2)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.5785559 -0.0898084 -0.0009105  0.0098423  0.0963592  0.6629999

b.) Accommodation takings in the state of Tasmania from aus_accommodation

Load dataset

# Load the aus_accommodation dataset
aus_accommodation
## # A tsibble: 592 x 5 [1Q]
## # Key:       State [8]
##       Date State                        Takings Occupancy   CPI
##      <qtr> <chr>                          <dbl>     <dbl> <dbl>
##  1 1998 Q1 Australian Capital Territory    24.3      65    67  
##  2 1998 Q2 Australian Capital Territory    22.3      59    67.4
##  3 1998 Q3 Australian Capital Territory    22.5      58    67.5
##  4 1998 Q4 Australian Capital Territory    24.4      59    67.8
##  5 1999 Q1 Australian Capital Territory    23.7      58    67.8
##  6 1999 Q2 Australian Capital Territory    25.4      61    68.1
##  7 1999 Q3 Australian Capital Territory    28.2      66    68.7
##  8 1999 Q4 Australian Capital Territory    25.8      60    69.1
##  9 2000 Q1 Australian Capital Territory    27.3      60.9  69.7
## 10 2000 Q2 Australian Capital Territory    30.1      64.7  70.2
## # ℹ 582 more rows
glimpse(aus_accommodation)
## Rows: 592
## Columns: 5
## Key: State [8]
## $ Date      <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q…
## $ State     <chr> "Australian Capital Territory", "Australian Capital Territor…
## $ Takings   <dbl> 24.27889, 22.32354, 22.52939, 24.39096, 23.72143, 25.38245, …
## $ Occupancy <dbl> 65.0, 59.0, 58.0, 59.0, 58.0, 61.0, 66.0, 60.0, 60.9, 64.7, …
## $ CPI       <dbl> 67.0, 67.4, 67.5, 67.8, 67.8, 68.1, 68.7, 69.1, 69.7, 70.2, …
tas_data <- aus_accommodation |> 
    dplyr::filter(State == "Tasmania")

Apply appropriate Box-Cox transformation

# Find the appropriate Box-Cox transformation
lambda <- forecast::BoxCox.lambda(tas_accommodation$Takings)  # Use forecast:: to specify the function
## Registered S3 methods overwritten by 'forecast':
##   method                 from     
##   autoplot.Arima         ggfortify
##   autoplot.acf           ggfortify
##   autoplot.ar            ggfortify
##   autoplot.bats          ggfortify
##   autoplot.decomposed.ts ggfortify
##   autoplot.ets           ggfortify
##   autoplot.forecast      ggfortify
##   autoplot.stl           ggfortify
##   autoplot.ts            ggfortify
##   fitted.ar              ggfortify
##   fortify.ts             ggfortify
##   residuals.ar           ggfortify
lambda 
## [1] -0.005076712

Apply differencing

# Step 1: Create first differenced column (Diff_Takings)
tas_accommodation2 <- tas_accommodation |> 
  mutate(Diff_Takings = c(NA, diff(Takings)))  # First differencing

# Step 2: Create second differenced column (Diff2_Takings)
tas_accommodation2 <- tas_accommodation2 |> 
  mutate(Diff2_Takings = c(NA, diff(Diff_Takings)))  # Second differencing

# Display the first few rows of the dataset
head(tas_accommodation2, n = 3)
## # A tsibble: 3 x 8 [1Q]
## # Key:       State [1]
##      Date State    Takings Occupancy   CPI Transformed_Takings Diff_Takings
##     <qtr> <chr>      <dbl>     <dbl> <dbl>               <dbl>        <dbl>
## 1 1998 Q1 Tasmania    28.7        67  67                  3.33        NA   
## 2 1998 Q2 Tasmania    19.0        45  67.4                2.92        -9.68
## 3 1998 Q3 Tasmania    16.1        39  67.5                2.76        -2.91
## # ℹ 1 more variable: Diff2_Takings <dbl>

Check for stationarity using gg_tsdisplay()

# Step 2: Convert to tsibble format (ensure the tsibble is created from the updated dataset)
tas_accommodation_ts <- tas_accommodation2 |> 
  as_tsibble(index = Date)

# Step 3: Check structure to ensure Diff2_Takings exists
str(tas_accommodation_ts)
## tbl_ts [74 × 8] (S3: tbl_ts/tbl_df/tbl/data.frame)
##  $ Date               : qtr [1:74] 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 ...
##    ..@ fiscal_start: num 1
##  $ State              : chr [1:74] "Tasmania" "Tasmania" "Tasmania" "Tasmania" ...
##  $ Takings            : num [1:74] 28.7 19 16.1 25.9 28.4 ...
##  $ Occupancy          : num [1:74] 67 45 39 56 66 48 41 56 66.2 49.7 ...
##  $ CPI                : num [1:74] 67 67.4 67.5 67.8 67.8 68.1 68.7 69.1 69.7 70.2 ...
##  $ Transformed_Takings: num [1:74] 3.33 2.92 2.76 3.23 3.32 ...
##   ..- attr(*, "lambda")= num -0.00508
##  $ Diff_Takings       : num [1:74] NA -9.68 -2.91 9.78 2.47 ...
##  $ Diff2_Takings      : num [1:74] NA NA 6.77 12.69 -7.32 ...
##  - attr(*, "key")= tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ State: chr "Tasmania"
##   ..$ .rows: list<int> [1:1] 
##   .. ..$ : int [1:74] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
##  - attr(*, "index")= chr "Date"
##   ..- attr(*, "ordered")= logi TRUE
##  - attr(*, "index2")= chr "Date"
##  - attr(*, "interval")= interval [1:1] 1Q
##   ..@ .regular: logi TRUE
# Step 4: Plot time series and ACF using gg_tsdisplay
gg_tsdisplay(tas_accommodation_ts, Diff2_Takings, plot_type = "partial")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

c.) Monthly sales from souvenirs.

Load the Data

library(tsibble)
library(fable)
library(fabletools)
library(fable)
library(ggplot2)
library(ggfortify)
library(tidyverse)
library(fpp3)

souvenirs
## # A tsibble: 84 x 2 [1M]
##       Month Sales
##       <mth> <dbl>
##  1 1987 Jan 1665.
##  2 1987 Feb 2398.
##  3 1987 Mar 2841.
##  4 1987 Apr 3547.
##  5 1987 May 3753.
##  6 1987 Jun 3715.
##  7 1987 Jul 4350.
##  8 1987 Aug 3566.
##  9 1987 Sep 5022.
## 10 1987 Oct 6423.
## # ℹ 74 more rows
head(souvenirs)
## # A tsibble: 6 x 2 [1M]
##      Month Sales
##      <mth> <dbl>
## 1 1987 Jan 1665.
## 2 1987 Feb 2398.
## 3 1987 Mar 2841.
## 4 1987 Apr 3547.
## 5 1987 May 3753.
## 6 1987 Jun 3715.
# Glimpse of the souvenirs dataset
glimpse(souvenirs)
## Rows: 84
## Columns: 2
## $ Month <mth> 1987 Jan, 1987 Feb, 1987 Mar, 1987 Apr, 1987 May, 1987 Jun, 1987…
## $ Sales <dbl> 1664.81, 2397.53, 2840.71, 3547.29, 3752.96, 3714.74, 4349.61, 3…

Find the Box-Cox Transformation

library(forecast) 
## Warning: package 'forecast' was built under R version 4.3.3
# Find the appropriate Box-Cox transformation lambda
lambda <- BoxCox.lambda(souvenirs$Sales)

lambda
## [1] -0.2444328
# Apply the Box-Cox transformation
souvenirs <- souvenirs %>%
  mutate(Sales_bc = BoxCox(Sales, lambda))

Determine the Order of Differencing

# Apply first differencing
souvenirs <- souvenirs %>%
  mutate(Diff_Sales = c(NA, diff(Sales)))

# Apply second differencing if necessary
souvenirs <- souvenirs %>%
  mutate(Diff2_Sales = c(NA, diff(Diff_Sales)))

# View first few rows to check the differencing results
head(souvenirs, n = 3)
## # A tsibble: 3 x 5 [1M]
##      Month Sales Sales_bc Diff_Sales Diff2_Sales
##      <mth> <dbl>    <dbl>      <dbl>       <dbl>
## 1 1987 Jan 1665.     3.42        NA          NA 
## 2 1987 Feb 2398.     3.48       733.         NA 
## 3 1987 Mar 2841.     3.51       443.       -290.

Check Stationarity

# Visualize second differenced series
gg_tsdisplay(souvenirs, Diff2_Sales, plot_type = "partial")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

9.5 For your retail data (from Exercise 7 in Section 2.10 on time series graphics), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data

Load the dataset

aus_retail
## # A tsibble: 64,532 x 5 [1M]
## # Key:       State, Industry [152]
##    State                        Industry           `Series ID`    Month Turnover
##    <chr>                        <chr>              <chr>          <mth>    <dbl>
##  1 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Apr      4.4
##  2 Australian Capital Territory Cafes, restaurant… A3349849A   1982 May      3.4
##  3 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Jun      3.6
##  4 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Jul      4  
##  5 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Aug      3.6
##  6 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Sep      4.2
##  7 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Oct      4.8
##  8 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Nov      5.4
##  9 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Dec      6.9
## 10 Australian Capital Territory Cafes, restaurant… A3349849A   1983 Jan      3.8
## # ℹ 64,522 more rows

Applied code

set.seed(131017)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(myseries, n=3)
## # A tsibble: 3 x 5 [1M]
## # Key:       State, Industry [1]
##   State              Industry       `Series ID`    Month Turnover
##   <chr>              <chr>          <chr>          <mth>    <dbl>
## 1 Northern Territory Food retailing A3349527K   1988 Apr     25.6
## 2 Northern Territory Food retailing A3349527K   1988 May     26.7
## 3 Northern Territory Food retailing A3349527K   1988 Jun     27.7
# Filter out a subset for simplicity (e.g., a specific state and industry)
myseries_vic <-  aus_retail  |>
  filter(State == "Victoria", Industry == "Cafes, restaurants and catering services")
head(myseries_vic, n =3)
## # A tsibble: 3 x 5 [1M]
## # Key:       State, Industry [1]
##   State    Industry                                `Series ID`    Month Turnover
##   <chr>    <chr>                                   <chr>          <mth>    <dbl>
## 1 Victoria Cafes, restaurants and catering servic… A3349640L   1982 Apr     36.4
## 2 Victoria Cafes, restaurants and catering servic… A3349640L   1982 May     36.2
## 3 Victoria Cafes, restaurants and catering servic… A3349640L   1982 Jun     35.7

Plot the series to visualize graphics and see if transformations are needed.

autoplot(myseries_vic, Turnover) +
labs(title = "Retail Turnover Time Series")

This plot shows that the series needs some log transformation to stabilize the variance

Apply log transformation to the series using the lambda_guerrero function

# log transformation for stabilizing variance)
lambda <- myseries_vic |> features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.1734096

Lambda is not equal to 1, so we need to apply a Box-Cox transformation. ### Apply Box-Cox transformation (since lambda is not equal to 1).

# Box-Cox transformation
if (lambda != 1) { myseries_vic <- myseries_vic |> mutate(Turnover = box_cox(Turnover, lambda)) }
transformed <- TRUE

if (transformed) {
  autoplot(myseries_vic, Turnover) + 
    labs(title = "Transformed Retail Turnover Time Series")
} else {
  autoplot(myseries_vic, Turnover) + 
    labs(title = "Retail Turnover Time Series")
}

Check for stationarity using a time series plot and ACF plot

# Check for stationarity using a time series plot and ACF plot 
gg_tsdisplay(myseries_vic, Turnover, plot_type = "partial")

Apply differencing to make the series stationary

First differencing

library(dplyr)
library(tsibble)
library(lubridate)

# Apply first differencing to 'Turnover'
myseries_diff1 <- myseries_vic %>%
  mutate(Diff1 = c(NA, diff(Turnover)))  # Prepend NA to keep the length consistent

# View the resulting dataset
head(myseries_diff1)
## # A tsibble: 6 x 6 [1M]
## # Key:       State, Industry [1]
##   State    Industry                        `Series ID`    Month Turnover   Diff1
##   <chr>    <chr>                           <chr>          <mth>    <dbl>   <dbl>
## 1 Victoria Cafes, restaurants and caterin… A3349640L   1982 Apr     4.99 NA     
## 2 Victoria Cafes, restaurants and caterin… A3349640L   1982 May     4.98 -0.0103
## 3 Victoria Cafes, restaurants and caterin… A3349640L   1982 Jun     4.95 -0.0259
## 4 Victoria Cafes, restaurants and caterin… A3349640L   1982 Jul     4.89 -0.0580
## 5 Victoria Cafes, restaurants and caterin… A3349640L   1982 Aug     4.78 -0.115 
## 6 Victoria Cafes, restaurants and caterin… A3349640L   1982 Sep     4.86  0.0774

Plot ACF and PACF of the first differenced series

gg_tsdisplay(myseries_diff1, Diff1, plot_type = "partial")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Check for higher-order differencing (second differencing)

# Apply second differencing to 'Turnover'
myseries_diff2 <- myseries_diff1 %>%
  mutate(Diff2 = c(NA, NA, diff(Turnover, differences = 2)))  # Prepend two NAs to keep the length consistent

head(myseries_diff2)
## # A tsibble: 6 x 7 [1M]
## # Key:       State, Industry [1]
##   State    Industry                `Series ID`    Month Turnover   Diff1   Diff2
##   <chr>    <chr>                   <chr>          <mth>    <dbl>   <dbl>   <dbl>
## 1 Victoria Cafes, restaurants and… A3349640L   1982 Apr     4.99 NA      NA     
## 2 Victoria Cafes, restaurants and… A3349640L   1982 May     4.98 -0.0103 NA     
## 3 Victoria Cafes, restaurants and… A3349640L   1982 Jun     4.95 -0.0259 -0.0156
## 4 Victoria Cafes, restaurants and… A3349640L   1982 Jul     4.89 -0.0580 -0.0321
## 5 Victoria Cafes, restaurants and… A3349640L   1982 Aug     4.78 -0.115  -0.0571
## 6 Victoria Cafes, restaurants and… A3349640L   1982 Sep     4.86  0.0774  0.193

Plot ACF and PACF of the second differenced series

gg_tsdisplay(myseries_diff2, Diff2, plot_type = "partial")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

Confirm stationarity with the KPSS test

library(fabletools)
library(dplyr)
# KPSS test for the original Turnover
kpss_original <- myseries_vic |> 
  features(Turnover, unitroot_kpss)
kpss_original
## # A tibble: 1 × 4
##   State    Industry                                 kpss_stat kpss_pvalue
##   <chr>    <chr>                                        <dbl>       <dbl>
## 1 Victoria Cafes, restaurants and catering services      7.34        0.01
library(fabletools)
library(dplyr)

# KPSS test for first differencing
kpss_diff1 <- myseries_diff1 |> 
  features(Diff1, unitroot_kpss)
kpss_diff1
## # A tibble: 1 × 4
##   State    Industry                                 kpss_stat kpss_pvalue
##   <chr>    <chr>                                        <dbl>       <dbl>
## 1 Victoria Cafes, restaurants and catering services    0.0205         0.1
library(fabletools)
library(dplyr)

# KPSS test for second differencing
kpss_diff2 <- myseries_diff2 |> 
  features(Diff2, unitroot_kpss)
kpss_diff2
## # A tibble: 1 × 4
##   State    Industry                                 kpss_stat kpss_pvalue
##   <chr>    <chr>                                        <dbl>       <dbl>
## 1 Victoria Cafes, restaurants and catering services   0.00617         0.1
library(fabletools)
library(dplyr)
# Use the KPSS test to confirm stationarity 
kpss_result <- myseries_diff1 |>
features(Diff1, unitroot_kpss)
kpss_result
## # A tibble: 1 × 4
##   State    Industry                                 kpss_stat kpss_pvalue
##   <chr>    <chr>                                        <dbl>       <dbl>
## 1 Victoria Cafes, restaurants and catering services    0.0205         0.1
# KPSS test on Turnover
kpss_turnover <- kpss.test(myseries_diff2$Turnover)
## Warning in kpss.test(myseries_diff2$Turnover): p-value smaller than printed
## p-value
print(kpss_turnover)
## 
##  KPSS Test for Level Stationarity
## 
## data:  myseries_diff2$Turnover
## KPSS Level = 7.3383, Truncation lag parameter = 5, p-value = 0.01
# KPSS test on first differencing
kpss_diff1 <- kpss.test(myseries_diff2$Diff1, null = "Trend")
## Warning in kpss.test(myseries_diff2$Diff1, null = "Trend"): p-value greater
## than printed p-value
print(kpss_diff1)
## 
##  KPSS Test for Trend Stationarity
## 
## data:  myseries_diff2$Diff1
## KPSS Trend = 0.020508, Truncation lag parameter = 5, p-value = 0.1
# KPSS test on second differencing
kpss_diff2 <- kpss.test(myseries_diff2$Diff2, null = "Trend")
## Warning in kpss.test(myseries_diff2$Diff2, null = "Trend"): p-value greater
## than printed p-value
print(kpss_diff2)
## 
##  KPSS Test for Trend Stationarity
## 
## data:  myseries_diff2$Diff2
## KPSS Trend = 0.005508, Truncation lag parameter = 5, p-value = 0.1

Results interpretation:

Turnover: Non-stationary. The KPSS test for level stationarity indicates that the series is non-stationary. Since the p-value is significantly smaller than 0.05, we reject the null hypothesis, suggesting that the Turnover series is not stationary.

First Differencing (Diff1): Likely stationary (borderline). The KPSS test for trend stationarity indicates that the series may be stationary. The p-value (0.1) is greater than the common significance level of 0.05 but less than 0.1, suggesting that we cannot reject the null hypothesis of stationarity at the 0.05 level, but we may consider it borderline. An additional tests for confirmation might be useful.

Second Differencing (Diff2): Likely stationary (borderline). Similar to the first differencing, the KPSS test for trend stationarity indicates that the second differenced series may also be stationary. Again, the p-value (0.1) is at the borderline for the 0.05 level, suggesting that we cannot reject the null hypothesis of stationarity.

Overall, we can proceed with modeling using either Diff1 or Diff2 based on the requirements of our forecasting approach.

9.6 Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with Ï• 1 = 0.6

and σ2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

The code provides a comprehensive way to visualize and analyze the impact of changing parameters on ARIMA model behavior.

b. Simulating and Plotting an AR(1) Model. Produce a time plot for the series. How does the plot change as you change ϕ1 ?

# simulate data from an AR(1) model with ϕ1=0.6 ϕ and σ2=1.

library(tsibble)
library(ggplot2)

# AR(1) Simulation
set.seed(1310) # For reproducibility
y_ar1 <- numeric(100)
e_ar1 <- rnorm(100)

for(i in 2:100) {
  y_ar1[i] <- 0.6 * y_ar1[i-1] + e_ar1[i]
}

# Create a tsibble
ar1_data <- tsibble(idx = seq_len(100), y = y_ar1, index = idx)

# Plot the AR(1) series
ggplot(ar1_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "AR(1) Model: φ1 = 0.6", x = "Time", y = "Value") +
  theme_minimal()

Now, Since we can modify the coefficient, let’s see how the plot changes with different values of ϕ1:

For ϕ1= 0.9 (more persistence)

# AR(1) Simulation with φ1 = 0.9
y_ar1_high <- numeric(100)
for(i in 2:100) {
  y_ar1_high[i] <- 0.9 * y_ar1_high[i-1] + e_ar1[i]
}
ar1_high_data <- tsibble(idx = seq_len(100), y = y_ar1_high, index = idx)

# Plot
ggplot(ar1_high_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "AR(1) Model: φ1 = 0.9", x = "Time", y = "Value") +
  theme_minimal()

For ϕ1 = −0.6 (oscillating behavior):

# AR(1) Simulation with φ1 = -0.6
y_ar1_neg <- numeric(100)
for(i in 2:100) {
  y_ar1_neg[i] <- -0.6 * y_ar1_neg[i-1] + e_ar1[i]
}
ar1_neg_data <- tsibble(idx = seq_len(100), y = y_ar1_neg, index = idx)

# Plot
ggplot(ar1_neg_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "AR(1) Model: φ1 = -0.6", x = "Time", y = "Value") +
  theme_minimal()

e. Simulating and Plotting an MA(1) Model

generating data from an MA(1) model with

θ1 = 0.6

# MA(1) Simulation
set.seed(1310) # For reproducibility
y_ma1 <- numeric(100)
e_ma1 <- rnorm(100)

for(i in 2:100) {
  y_ma1[i] <- e_ma1[i] + 0.6 * e_ma1[i-1]
}

# Create a tsibble
ma1_data <- tsibble(idx = seq_len(100), y = y_ma1, index = idx)

# Plot the MA(1) series
ggplot(ma1_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "MA(1) Model: θ1 = 0.6", x = "Time", y = "Value") +
  theme_minimal()

Let see how the plot changes with different values of ϕ1:

For ϕ1= 0.9:

# MA(1) Simulation with θ1 = 0.9
y_ma1_high <- numeric(100)
for(i in 2:100) {
  y_ma1_high[i] <- e_ma1[i] + 0.9 * e_ma1[i-1]
}
ma1_high_data <- tsibble(idx = seq_len(100), y = y_ma1_high, index = idx)

# Plot
ggplot(ma1_high_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "MA(1) Model: θ1 = 0.9", x = "Time", y = "Value") +
  theme_minimal()

For θ1 = 0.6

# MA(1) Simulation with θ1 = -0.6
y_ma1_neg <- numeric(100)
for(i in 2:100) {
  y_ma1_neg[i] <- e_ma1[i] - 0.6 * e_ma1[i-1]
}
ma1_neg_data <- tsibble(idx = seq_len(100), y = y_ma1_neg, index = idx)

# Plot
ggplot(ma1_neg_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "MA(1) Model: θ1 = -0.6", x = "Time", y = "Value") +
  theme_minimal()

Conclusion: One notices that while the ARMA(1,1) model shows a more stationary pattern, the AR(2) model shows non-stationary behavior due to its parameter values.

3. Simulating and Plotting an ARMA(1,1) Model

Let’s generate data from an ARMA(1,1) model with θ1 = 0.6 and θ1 = 0.6

# ARMA(1,1) Simulation
set.seed(1310) # For reproducibility
y_arma <- numeric(100)
e_arma <- rnorm(100)

for(i in 2:100) {
  y_arma[i] <- 0.6 * y_arma[i-1] + e_arma[i] + 0.6 * e_arma[i-1]
}

# Create a tsibble
arma_data <- tsibble(idx = seq_len(100), y = y_arma, index = idx)

# Plot the ARMA(1,1) series
ggplot(arma_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "ARMA(1,1) Model: φ1 = 0.6, θ1 = 0.6", x = "Time", y = "Value") +
  theme_minimal()

4. Simulating and Plotting an AR(2) Model

let’s generate data from an AR(2) model with ϕ1

= − 0.8 and ϕ2 = 0.3

# AR(2) Simulation
set.seed(1310) # For reproducibility
y_ar2 <- numeric(100)
for(i in 3:100) {
  y_ar2[i] <- -0.8 * y_ar2[i-1] + 0.3 * y_ar2[i-2] + e_ar1[i]
}

# Create a tsibble
ar2_data <- tsibble(idx = seq_len(100), y = y_ar2, index = idx)

# Plot the AR(2) series
ggplot(ar2_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = "AR(2) Model: φ1 = -0.8, φ2 = 0.3", x = "Time", y = "Value") +
  theme_minimal()

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. Write the model in terms of the backshift operator. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

# Load required libraries
library(fable)
library(fabletools)
library(tsibble)
library(dplyr)
library(ggplot2)
library(tsibbledata)

# Load the dataset
aus_airpassengers
## # A tsibble: 47 x 2 [1Y]
##     Year Passengers
##    <dbl>      <dbl>
##  1  1970       7.32
##  2  1971       7.33
##  3  1972       7.80
##  4  1973       9.38
##  5  1974      10.7 
##  6  1975      11.1 
##  7  1976      10.9 
##  8  1977      11.3 
##  9  1978      12.1 
## 10  1979      13.0 
## # ℹ 37 more rows
glimpse(aus_airpassengers)
## Rows: 47
## Columns: 2
## $ Year       <dbl> 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979,…
## $ Passengers <dbl> 7.3187, 7.3266, 7.7956, 9.3846, 10.6647, 11.0551, 10.8643, …

a. Find an Appropriate ARIMA Model: Use the ARIMA() function to find an appropriate ARIMA model.

library(urca)
# a. Fit an ARIMA model to the data
fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))

# Instead of summary, use tidy() to get model coefficients
model_summary <- tidy(fit)
print(model_summary)
## # A tibble: 1 × 6
##   .model            term  estimate std.error statistic  p.value
##   <chr>             <chr>    <dbl>     <dbl>     <dbl>    <dbl>
## 1 ARIMA(Passengers) ma1     -0.896    0.0594     -15.1 3.24e-19
# Load required libraries
library(fable)
library(fabletools)
library(tsibble)
library(dplyr)
library(ggplot2)
library(tsibbledata)
library(gridExtra)

# Load the dataset
data("aus_airpassengers")

# a. Fit an ARIMA model to the data
fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))

# Summary of the model coefficients
model_summary <- tidy(fit)
print(model_summary)
## # A tibble: 1 × 6
##   .model            term  estimate std.error statistic  p.value
##   <chr>             <chr>    <dbl>     <dbl>     <dbl>    <dbl>
## 1 ARIMA(Passengers) ma1     -0.896    0.0594     -15.1 3.24e-19
# Extract residuals using augment()
residuals_fit <- fit %>% 
  augment() %>%         # Augment the fitted model with residuals
  pull(.resid)         # Pull the residuals column

# Set up a 2x2 plotting area for diagnostics
par(mfrow = c(1, 1))

# Residuals vs. Time
plot.ts(residuals_fit, main = "Residuals", ylab = "Residuals", xlab = "Time")
abline(h = 0, col = "red")

# Histogram of residuals
hist(residuals_fit, main = "Histogram of Residuals", xlab = "Residuals", breaks = 10)

# Q-Q plot using ggplot2
qq_plot_data <- data.frame(residuals = residuals_fit)
ggplot(qq_plot_data, aes(sample = residuals)) +
  geom_qq() +
  geom_qq_line() +
  ggtitle("Q-Q Plot of Residuals") +
  xlab("Theoretical Quantiles") +
  ylab("Sample Quantiles")

# ACF of residuals
acf(residuals_fit, main = "ACF of Residuals")

# Reset plotting parameters to default
par(mfrow = c(1, 1))

# Plot forecast for the next 10 periods
forecasts <- fit %>%
  forecast(h = "10 years")

# Plot the forecasts
autoplot(forecasts) +
  ggtitle("ARIMA Model Forecast for Air Passengers") +
  labs(y = "Passengers (in millions)")

b. Write the Model in Terms of the Backshift Operator: The backshift operator can be used to express the ARIMA model. For example, if the model selected is ARIMA(1,1,1), it can be written as: \[(1−ϕ1​B)(1−B)(yt​−c)=(1+θ1​B)et\]

c. ARIMA(0,1,0) Model with Drift: Fit an ARIMA(0,1,0) model with drift and plot forecasts.

# Fit ARIMA(0,1,0) with drift
fit_drift <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1))

# Forecast for the next 10 periods
forecasts_drift <- fit_drift %>%
  forecast(h = "10 years")

# Plot the forecasts
autoplot(forecasts_drift) +
  ggtitle("ARIMA(0,1,0) Model with Drift Forecast for Air Passengers") +
  labs(y = "Passengers (in millions)")

Comparison: We can compare the forecast plots visually by plotting them on the same graph.

###d. ARIMA(2,1,2) Model with Drift: Fit an ARIMA(2,1,2) model with drift and plot forecasts.

# Load the dataset
data("aus_airpassengers")

# Fit ARIMA(2,1,2) with a constant term
fit_arima212 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2, 1, 2) + PDQ(0, 0, 0)))  # Specify ARIMA(2,1,2) explicitly
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + PDQ(0, 0, 0))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
# Forecast for the next 10 periods
forecasts_arima212 <- fit_arima212 %>%
  forecast(h = "10 years")

# Plot the forecasts
autoplot(aus_airpassengers, Passengers) + 
  autolayer(forecasts_arima212, level = NULL) +  # Add forecast with no confidence interval
  ggtitle("ARIMA(2,1,2) Model with Constant Term Forecast for Air Passengers") + 
  labs(y = "Passengers")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

Remove the Constant and See What Happens

You can fit the model again without the constant and observe the results.

# Load the dataset
data("aus_airpassengers")

# Fit ARIMA(2,1,2) without a constant term
fit_arima212_no_const <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2, 1, 2)))  # Removed the constant
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
# Forecast for the next 10 periods
forecasts_arima212_no_const <- fit_arima212_no_const %>%
  forecast(h = "10 years")

# Plot the forecasts
autoplot(aus_airpassengers, Passengers) + 
  autolayer(forecasts_arima212_no_const, level = NULL) +  # Add forecast with no confidence interval
  ggtitle("ARIMA(2,1,2) Model Without Constant Term Forecast for Air Passengers") + 
  labs(y = "Passengers")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

### Combined Plot of Both ARIMA Models

# Load necessary libraries
library(tsibble)
library(fable)
library(ggplot2)
library(dplyr)

# Load the dataset
data("aus_airpassengers")

# Convert the dataset to a tsibble
aus_airpassengers_tsibble <- as_tsibble(aus_airpassengers, index = Year)

# Fit ARIMA(2,1,2) with a constant term (using 1 to indicate constant)
fit_arima212 <- aus_airpassengers_tsibble %>%
  model(ARIMA(Passengers ~ 1 + pdq(2, 1, 2)))  # Model with constant

# Forecast for the next 10 years with constant
forecasts_arima212 <- fit_arima212 %>%
  forecast(h = "10 years") %>%
  mutate(Type = "With Constant")  # Add type column

# Fit ARIMA(2,1,2) without a constant term (using 0 to exclude constant)
fit_arima212_no_const <- aus_airpassengers_tsibble %>%
  model(ARIMA(Passengers ~ 0 + pdq(2, 1, 2)))  # Model without constant
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
# Forecast for the next 10 years without constant
forecasts_arima212_no_const <- fit_arima212_no_const %>%
  forecast(h = "10 years") %>%
  mutate(Type = "Without Constant")  # Add type column

# Combine the forecasts for plotting
forecasts_combined <- bind_rows(
  forecasts_arima212,
  forecasts_arima212_no_const
)

# Plot both forecasts side by side using facet_wrap
ggplot() + 
  geom_line(data = aus_airpassengers_tsibble, aes(x = Year, y = Passengers), color = "black", size = 1, linetype = "dashed") +  # Original data
  geom_line(data = forecasts_combined, aes(x = Year, y = .mean, color = Type), size = 1) +
  facet_wrap(~ Type, scales = "free_y") +  # Create facets for each forecast type
  ggtitle("ARIMA(2,1,2) Model Forecasts for Air Passengers") + 
  labs(y = "Passengers", x = "Year") +
  scale_color_manual(values = c("With Constant" = "blue", "Without Constant" = "red")) +
  theme(legend.title = element_blank())
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

With Constant, the model includes a drift term that allows for a trend in the data, resulting in a forecast that may continue to grow or decline depending on the trend. Without Constant, the model assumes the differenced series has no average change over time, leading to forecasts that eventually flatten out. Removing the constant is appropriate when we believe that the long-term behavior of the series should not exhibit a linear trend. Overall, if we need to model a series with a persistent trend, it’s usually better to include the constant.

9.8

For the United States GDP series (from global_economy):

  1. if necessary, find a suitable Box-Cox transformation for the data;
  2. fit a suitable ARIMA model to the transformed data using ARIMA();
  3. try some other plausible models by experimenting with the orders chosen;
  4. choose what you think is the best model and check the residual diagnostics;
  5. produce forecasts of your fitted model. Do the forecasts look reasonable?
  6. compare the results with what you would obtain using ETS() (with no transformation).

Load the dataset

global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

Extract US GDP series

# extract US GDP series
us_gdp <- global_economy |>
  filter(Country == "United States") |>
 dplyr:: select(GDP)
head(us_gdp)
## # A tsibble: 6 x 2 [1Y]
##            GDP  Year
##          <dbl> <dbl>
## 1 543300000000  1960
## 2 563300000000  1961
## 3 605100000000  1962
## 4 638600000000  1963
## 5 685800000000  1964
## 6 743700000000  1965

a. Box-Cox transformation using lambda_guerrero

# Find the ideal lambda
 gdp_lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)
gdp_lambda
## [1] 0.2819443
#Box-Cox transformation 
us_gdp <- us_gdp |>
  mutate(GDP_transformed = box_cox(GDP, gdp_lambda))
head(us_gdp, n = 3)
## # A tsibble: 3 x 3 [1Y]
##            GDP  Year GDP_transformed
##          <dbl> <dbl>           <dbl>
## 1 543300000000  1960           7215.
## 2 563300000000  1961           7289.
## 3 605100000000  1962           7438.

b. Fit a suitable ARIMA model to the transformed data using ARIMA()

library(urca)
gdp_arima <- us_gdp |> 
  model(ARIMA(GDP_transformed))
report(gdp_arima)
## Series: GDP_transformed 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

c. Try some other plausible models by experimenting with the orders chosen

Compare results with ETS (with no transformation)

##Compare results with ETS (no transformation):
gdp_ets <- us_gdp %>% 
  model(ETS(GDP))
gdp_ets_forecast <- gdp_ets %>% 
  forecast(h = "5 years")
gdp_ets 
## # A mable: 1 x 1
##     `ETS(GDP)`
##        <model>
## 1 <ETS(M,A,N)>

# Plotting ETS forecasts

# Plot the GDP series with the 10-step-ahead forecast
autoplot(us_gdp, GDP) +
  autolayer(gdp_ets_forecast, GDP, series = "Forecast", color = "blue") +
  ggtitle("GDP Forecasts using ETS Model") +
  ylab("GDP") +
  xlab("Year") +
  theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`

gdp_arima <- us_gdp |>
  model(ARIMA(GDP_transformed))

# Generate forecast for 5 periods ahead
gdp_forecast <- gdp_arima |>
  forecast(h = 5)
# Plot the forecast
gdp_forecast |>
  autoplot(us_gdp) +
  labs(title = "GDP Forecast using ARIMA Model",
       y = "Transformed GDP",
       x = "Year")

d.choose what you think is the best model and check the residual diagnostics

# Fit ARIMA models
gdp_arima <- us_gdp |> 
  model(ARIMA(GDP_transformed))
gdp_arima2 <- us_gdp |> 
  model(ARIMA(GDP_transformed ~ pdq(1,1,0) + PDQ(0,1,1)))
gdp_arima
## # A mable: 1 x 1
##   `ARIMA(GDP_transformed)`
##                    <model>
## 1  <ARIMA(1,1,0) w/ drift>

Compare AIC values and choose the best model

# Compare AIC values and choose the best model
aic_arima1 <- glance(gdp_arima) %>% pull(AIC)
aic_arima2 <- glance(gdp_arima2) %>% pull(AIC)
best_model <- ifelse(aic_arima1 < aic_arima2, gdp_arima, gdp_arima2)
best_model
## [[1]]
## <lst_mdl[1]>
## [1] <ARIMA(1,1,0) w/ drift>

Check residual diagnostics

# Convert to mable to use gg_tsresiduals
best_model <- if(aic_arima1 < aic_arima2) {
  gdp_arima
} else {
  gdp_arima2
}
best_model |>
  gg_tsresiduals() + 
  ggtitle("Residuals of the Best ARIMA Model")

# Produce forecasts
gdp_forecast <- best_model |>
  forecast(h = "5 years")

e. Produce forecasts of your fitted model. Do the forecasts look reasonable?

# Produce forecasts
gdp_forecast <- best_model |>
  forecast(h = "5 years")
gdp_forecast
## # A fable: 5 x 4 [1Y]
## # Key:     .model [1]
##   .model                                             Year GDP_transformed  .mean
##   <chr>                                             <dbl>          <dist>  <dbl>
## 1 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,…  2018  N(19996, 5479) 19996.
## 2 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,…  2019 N(20215, 17136) 20215.
## 3 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,…  2020 N(20434, 32397) 20434.
## 4 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,…  2021 N(20652, 49472) 20652.
## 5 ARIMA(GDP_transformed ~ pdq(1, 1, 0) + PDQ(0, 1,…  2022 N(20871, 67414) 20871.
autoplot(us_gdp, GDP) + 
  autolayer(gdp_forecast, .fitted, color = "red") + 
  ggtitle("GDP Forecasts")

Forecasts showing a stable linear upward trend for variables like GDP are unrealistic. They fail to account for real-world disruptions such as international conflicts, pandemics, or economic recessions, which often cause unexpected fluctuations.

f. compare the results with what you would obtain using ETS() (with no transformation)

# Compare results with ETS (no transformation)
gdp_ets <- us_gdp |> 
  model(ETS(GDP))
gdp_ets
## # A mable: 1 x 1
##     `ETS(GDP)`
##        <model>
## 1 <ETS(M,A,N)>
gdp_ets_forecast <- gdp_ets |>
  forecast(h = "5 years")
gdp_ets_forecast 
## # A fable: 5 x 4 [1Y]
## # Key:     .model [1]
##   .model    Year                 GDP   .mean
##   <chr>    <dbl>              <dist>   <dbl>
## 1 ETS(GDP)  2018   N(2e+13, 2.7e+23) 2.01e13
## 2 ETS(GDP)  2019 N(2.1e+13, 9.1e+23) 2.07e13
## 3 ETS(GDP)  2020 N(2.1e+13, 2.1e+24) 2.14e13
## 4 ETS(GDP)  2021 N(2.2e+13, 3.9e+24) 2.21e13
## 5 ETS(GDP)  2022 N(2.3e+13, 6.7e+24) 2.28e13
# Plotting ETS forecasts
autoplot(us_gdp, GDP) + 
  autolayer(gdp_ets_forecast, .fitted, color = "blue") + 
  ggtitle("GDP Forecasts using ETS")

Once again, forecasts showing a stable linear upward trend for variables like GDP are unrealistic. They fail to account for real-world disruptions such as international conflicts, pandemics, or economic recessions, which often cause unexpected fluctuations.