In this assignment, problems 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, and 9.8 from the Hyndman book have been solved (book link: https://otexts.com/fpp3/toolbox-exercises.html).
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
The figures display ACF plots for white noise series with 36, 360, and 1,000 data points. In the first plot (36 data points), several spikes of around ±0.25 are visible, which I think result from random variation; this is typical of small samples. Despite these fluctuations, the ACF values fall within the expected range for white noise. As the sample size increases to 360 and 1,000 data points, the spikes become smaller and the ACF values lie very close to zero, indicating reduced random variation in larger samples.
All three figures indicate that the data are white noise. While the smaller sample shows more pronounced fluctuations, they remain within the expected bounds; the larger samples confirm that the data exhibit no significant correlation, trends, or seasonal patterns, which is consistent with the characteristics of white noise.
The critical values depend on the sample size: as the sample size increases, the bounds become narrower. For the smaller dataset (36 data points), the critical values are farther from zero (±1.96/sqrt(36) ≈ ±0.33), allowing more random fluctuation. For larger datasets (360 or 1,000 data points), the critical values move closer to zero (±1.96/sqrt(360) ≈ ±0.10 and ±1.96/sqrt(1000) ≈ ±0.06), because larger samples provide more precise estimates of the autocorrelations and leave less room for random variation.
Although all the figures refer to white noise, the autocorrelations differ due to random variation and sample size. In smaller samples (36 data points), there is more random fluctuation, which results in larger spikes in the ACF. As the sample size increases, the random variation diminishes and the autocorrelation values get closer to zero. This indicates that larger samples give a more accurate representation of the true characteristics of white noise: no significant autocorrelation at any lag.
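To illustrate (this code is an addition, not part of the original assignment output; the seed and plot titles are arbitrary), the three white noise series and their ACF plots can be reproduced along the following lines:
# Sketch: simulate white noise of three lengths and compare the ACF plots.
set.seed(30)
wn <- function(n) tsibble(idx = seq_len(n), y = rnorm(n), index = idx)
p36 <- wn(36) |> ACF(y) |> autoplot() + labs(title = "White noise, n = 36")
p360 <- wn(360) |> ACF(y) |> autoplot() + labs(title = "White noise, n = 360")
p1000 <- wn(1000) |> ACF(y) |> autoplot() + labs(title = "White noise, n = 1000")
grid.arrange(p36, p360, p1000, nrow = 2)
# The +/-1.96/sqrt(T) critical bounds shrink as T grows:
c(1.96 / sqrt(36), 1.96 / sqrt(360), 1.96 / sqrt(1000))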
A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
#head(gafa_stock)
#view(gafa_stock)
amzn_close <- gafa_stock |> filter(Symbol == "AMZN", year(Date) == 2015)
#head(amzn_close)
amzn_close|>autoplot(Close)
amzn_close |> ACF(Close) |>
autoplot() + labs(subtitle = "Amazon closing stock price")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
amzn_close |> ACF(difference(Close)) |>
autoplot() + labs(subtitle = "Changes in Amazon closing stock price")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
gg_tsdisplay(amzn_close, Close, plot_type = "partial")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
In the time series plot above, an upward trend with a changing level is observed, indicating that the series is non-stationary.
The ACF plot shows that the autocorrelations decrease slowly as the lag increases rather than dropping quickly to zero, which indicates that the series is non-stationary. Also, the r1 value here is large and positive, another sign of non-stationarity in this time series.
The PACF plot above shows a single large spike at lag 1, which is also a clear indication of non-stationarity in this time series.
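As a quick numerical check (an addition to the original output, using the unitroot_ndiffs feature from feasts):
# Sketch: unit-root based estimate of the required order of differencing;
# a value of 1 would agree with the graphical evidence above.
amzn_close |> features(Close, unitroot_ndiffs)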
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
#view(global_economy)
turkish_gdp <- global_economy |> filter(Country == "Turkey")
gg_tsdisplay(turkish_gdp, GDP, plot_type = "partial")
From the time series plot of Turkish GDP and the accompanying ACF and PACF plots, it is clearly seen that the series is non-stationary.
Finding lambda for the Box-Cox transformation:
lambda1 <- turkish_gdp |>
features(GDP,features=guerrero)|>
pull(lambda_guerrero)
lambda1
## [1] 0.1572187
The suggested value of lambda for the transformation is found to be 0.16.
turkish_gdp |>
features(box_cox(GDP, lambda1), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
The appropriate differencing order is found to be 1 after the transformation.
plot1 <- turkish_gdp |>
autoplot(difference(box_cox(GDP, lambda1), 1)) +
labs(title="Turkkish GDP After Transformation & Differencing",y="GDP")
plot2 <- turkish_gdp |>
ACF(difference(box_cox(GDP, lambda1), 1)) |>
autoplot() + labs(title=" ACF of Turkish GDP After Transformation & Differencing")
plot3<- turkish_gdp |>
PACF(difference(box_cox(GDP, lambda1), 1)) |>
autoplot() + labs(title="PACF of Turkish GDP After Transformation & Differencing")
grid.arrange(plot1, plot2, plot3, nrow=2)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
The time series exhibits no trends or seasonality, and the ACF and PACF plots indicate that all spikes fall within the significance bounds. These observations suggest that the data has become stationary.
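As an additional check (not part of the original output), a KPSS test can be applied to the transformed and differenced series:
# Sketch: KPSS unit-root test; a small statistic with a p-value reported at
# the feature's upper bound of 0.1 would support stationarity.
turkish_gdp |>
  features(difference(box_cox(GDP, lambda1), 1), unitroot_kpss)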
#view(aus_accommodation)
tasmania_acom <- aus_accommodation |>
filter(State == "Tasmania")
plot4 <- tasmania_acom |>
autoplot(Takings) +
labs(title="Tasmania Accommodation Takings")
plot5 <- tasmania_acom|>
ACF(Takings) |>
autoplot() + labs(title="ACF of Accommodation Takings in Tasmania ")
plot6<- tasmania_acom|>
PACF(Takings) |>
autoplot() + labs(title="PACF of Accommodation Takings in Tasmania ")
grid.arrange(plot4, plot5, plot6, nrow = 2)
The time series plot of accommodation takings in Tasmania, together with its ACF and PACF plots, shows that the series is non-stationary, with a clear trend and quarterly seasonality.
Finding lambda:
lambda2 <- tasmania_acom|>
features(Takings,features=guerrero)|>
pull(lambda_guerrero)
lambda2
## [1] 0.001819643
The lambda value is found to be 0.002 for the Box-Cox transformation; a value this close to zero makes it essentially a log transformation.
tasmania_acom |>
features(box_cox(Takings, lambda2), unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
After the transformation, it is found that the data needs one order of seasonal differencing.
tasmania_acom |>
features(difference(box_cox(Takings, lambda2),4), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 0
The number of additional differences is found to be zero (0), suggesting that no further differencing is needed after the seasonal difference.
plot7 <- tasmania_acom |>
autoplot(difference(box_cox(Takings, lambda2), 4)) +
labs(title="Tasmania Accommodation Takings After Transformation & Differencing")
plot8<- tasmania_acom |>
ACF(difference(box_cox(Takings, lambda2), 4)) |>
autoplot() + labs(title="ACF of Accommodation Takings After Tranformation & Differencing ")
plot9 <- tasmania_acom |>
PACF(difference(box_cox(Takings, lambda2), 4)) |>
autoplot() + labs(title="PACF of Accommodation Takings After Transformation & Differencing ")
grid.arrange(plot7, plot8, plot9, nrow = 2)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
There are no trends or seasonality present in the time series, and the ACF and PACF plots show no repeating cycles or trending spikes at different lags, suggesting that the series has become stationary. Additionally, the number of spikes outside the bounds in the ACF plot has decreased from 13 to 3, and in the PACF plot, it has reduced from 4 to 2, further indicating that the data has become stationary.
#view(souvenirs)
plot10 <- souvenirs|>
autoplot(Sales) +
labs(title="Monthly Sales from Souvenirs")
plot11 <- souvenirs|>
ACF(Sales) |>
autoplot() + labs(title="ACF of Monthly Sales ")
plot12 <- souvenirs|>
PACF(Sales) |>
autoplot() + labs(title="PACF of Monthly Sales ")
grid.arrange(plot10, plot11, plot12, nrow = 2)
The time series, ACF, and PACF plots reveal that the data may have seasonality, indicating a non-stationary series. I will apply seasonal differencing to see whether it is appropriate for making the data stationary.
Finding lambda:
lambda3 <- souvenirs|>
features(Sales,features=guerrero)|>
pull(lambda_guerrero)
lambda3
## [1] 0.002118221
souvenirs|>
features(box_cox(Sales, lambda3), unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
The suggested lambda for the Box-Cox transformation is found to be 0.00212, again essentially a log transformation. The transformed data may require one seasonal difference to become stationary.
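Since lambda is this close to zero, box_cox() is nearly identical to a plain log transform; a quick sketch (an addition, not in the original output) makes the point:
# Sketch: the largest gap between the Box-Cox and log transforms of Sales is
# tiny relative to the scale of the transformed series.
souvenirs |>
  mutate(bc = box_cox(Sales, lambda3), lg = log(Sales)) |>
  as_tibble() |>
  summarise(max_abs_diff = max(abs(bc - lg)))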
souvenirs|>
features(difference(box_cox(Sales, lambda3), 12), unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
It is found that no additional differencing is required here.
plot12 <- souvenirs|>
autoplot(difference(box_cox(Sales, lambda3), 12)) +
labs(title="Monthly Sales After Transformation & Differencing")
plot13 <- souvenirs|>
ACF(difference(box_cox(Sales, lambda3), 12)) |>
autoplot() + labs(title="ACF of Monthly Sales After Tranformation & Differencing")
plot14<- souvenirs |>
PACF(difference(box_cox(Sales, lambda3), 12)) |>
autoplot() + labs(title="PACF of Monthly Sales After Transformation & Differencing")
grid.arrange(plot12, plot13, plot14, nrow = 2)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
The time series above shows no trend or seasonality, and the number of spikes outside the bounds has been reduced to 2 in both the ACF and PACF plots. The spike patterns at the various lags show no evidence of trend or seasonality either, suggesting that the data has become stationary.
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Randomly select one Australian retail time series:
set.seed(1357)
myseries <- aus_retail|>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Get time series, ACF and PACF plots:
plot15 <- myseries |>
autoplot(Turnover) +
labs(title="Monthly Turnover of Retail Market")
plot16 <- myseries |>
ACF(Turnover) |>
autoplot() + labs(title="ACF of Monthly Turnover of Retail Market")
plot17 <- myseries |>
PACF(Turnover) |>
autoplot() + labs(title="PACF of Monthly Turnover of Retail Market")
grid.arrange(plot15, plot16, plot17, nrow = 2)
The upward trend and seasonality in the time series, together with the characteristics of the ACF and PACF plots, reveal that the data is not stationary.
Get lambda:
lambda4 <- myseries |>
features(Turnover,features=guerrero)|>
pull(lambda_guerrero)
lambda4
## [1] -0.3426586
The suggested value of lambda is found to be -0.343.
myseries |>
features(box_cox(Turnover, lambda4), unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 South Australia Other specialised food retailing 1
After the transformation, it is found that the data needs one order of seasonal differencing to make it stationary.
myseries |>
features(difference(box_cox(Turnover, lambda4), 12), unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 South Australia Other specialised food retailing 1
After applying the seasonal difference, it is seen that the data still needs one additional difference to become stationary. I will apply an ordinary (first) difference.
myseries |>
features(difference(difference(box_cox(Turnover, lambda4), 12), 1), unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 South Australia Other specialised food retailing 0
Now it is seen that no further differencing is required: the data has already become stationary.
Get time series, ACF and PACF plots after applying transformation and differencing:
plot18 <- myseries |>
autoplot(difference(difference(box_cox(Turnover, lambda4), 12), 1)) +
labs(title="Monthly Turnover After Transformation & Differencing")
plot19 <- myseries |>
ACF(difference(difference(box_cox(Turnover, lambda4), 12), 1)) |>
autoplot() + labs(title="ACF of Monthly Turnover After Tranformation & Differencing")
plot20 <- myseries |>
PACF(difference(difference(box_cox(Turnover, lambda4), 12), 1)) |>
autoplot() + labs(title="PACF of Monthly Turnover After Transformation & Differencing ")
grid.arrange(plot18, plot19, plot20, nrow = 2)
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
The time series shows no trend or seasonality, with both the mean and variance appearing nearly constant, suggesting that the data has become stationary. The ACF and PACF plots do not display any seasonal spike patterns or trends. While the autocorrelations in the ACF plot were previously much higher, they have now decreased. However, as the lags increase, the decay in the autocorrelations is somewhat inconsistent, indicating that the data has not achieved perfect stationarity; rather, the series has become mostly stationary.
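To put the "mostly stationary" impression on a numerical footing (an added check, not part of the original output):
# Sketch: KPSS test on the transformed, seasonally and first-differenced
# turnover series; a p-value at the 0.1 upper bound would support stationarity.
myseries |>
  features(difference(difference(box_cox(Turnover, lambda4), 12), 1),
           unitroot_kpss)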
Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Produce time plot for the series above:
autoplot(sim) + labs(title="Time Plot with ϕ1=0.6 ")
## Plot variable not specified, automatically selected `.vars = y`
Get time plots for different phi values:
set.seed(246)
y2 <- numeric(100)
y3 <- numeric(100)
e2 <- rnorm(100)
e3 <- rnorm(100)
for(i in 2:100){
y2[i] <- 0.9*y2[i-1] + e2[i]
y3[i] <- 0.1*y3[i-1] + e3[i]
}
sim <- tsibble(idx = seq_len(100),y2 = y2, y3 = y3, index = idx)
time_plot1 <- sim %>% autoplot(y2) + labs( title = "Time plot with ϕ1=0.9")
time_plot2 <- sim %>% autoplot(y3) + labs( title = "Time plot with ϕ1=0.1")
grid.arrange(time_plot1, time_plot2, nrow = 2)
It is seen that as ϕ1 approaches 1, yt starts to behave like a random walk. On the other hand, as ϕ1 approaches 0, yt starts to behave like white noise.
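The limiting case can be made explicit (an added sketch, not part of the original output; the seed is arbitrary): with ϕ1 = 1 the AR(1) process is exactly a random walk.
# Sketch: AR(1) with phi1 = 1 reduces to y_t = y_{t-1} + e_t, i.e. cumsum(e).
set.seed(135)
rw <- tsibble(idx = seq_len(100), y = cumsum(rnorm(100)), index = idx)
rw |> autoplot(y) + labs(title = "Time plot with ϕ1 = 1 (random walk)")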
set.seed(357)
y1 <- numeric(100)
e1 <- rnorm(100)
for(i in 2:100)
y1[i] <- 0.6*e1[i-1] + e1[i]
sim1 <- tsibble(idx = seq_len(100), y1 = y1, index = idx)
plot03<- sim1|> autoplot(y1) + labs( title = " Time plot with θ = 0.6")
set.seed(357)
y4 <- numeric(100)
y5 <- numeric(100)
e4 <- rnorm(100)
e5 <- rnorm(100)
for(i in 2:100){
y4[i] <- 0.1*e4[i-1] + e4[i]
y5[i] <- 0.9*e5[i-1] + e5[i]
}
sim2 <- tsibble(idx = seq_len(100), y4 = y4, y5 = y5, index = idx)
plot04 <- sim2|> autoplot(y4) + labs( title = " Time plot with θ = 0.1")
plot05 <- sim2|>autoplot(y5) + labs( title = "Time plot with θ = 0.9")
grid.arrange(plot03, plot04, plot05, nrow = 2)
The pattern of the series changes as θ changes. As θ gets smaller, the series looks more like white noise: the fluctuations become smaller in magnitude and higher in frequency. A larger θ increases the lag-1 autocorrelation, which makes the series somewhat smoother and increases its variance.
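The same effect shows up in the sample ACFs (an added sketch, not part of the original output): for an MA(1) process the theoretical lag-1 autocorrelation is θ/(1 + θ²), which grows with θ over this range.
# Sketch: compare the sample ACFs of the two simulated MA(1) series.
acf_small <- sim2 |> ACF(y4) |> autoplot() + labs(title = "ACF with θ = 0.1")
acf_large <- sim2 |> ACF(y5) |> autoplot() + labs(title = "ACF with θ = 0.9")
grid.arrange(acf_small, acf_large, nrow = 2)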
set.seed(579)
y06 <- (numeric(100))
e06 <- rnorm(100)
for(i in 2:100)
y06[i] <- 0.6*y06[i-1] + 0.6*e06[i-1] + e06[i]
sim06 <- tsibble(idx = seq_len(100), y06 = y06, index = idx)
head(sim06)
## # A tsibble: 6 x 2 [1]
## idx y06
## <int> <dbl>
## 1 1 0
## 2 2 0.610
## 3 3 2.30
## 4 4 1.84
## 5 5 1.53
## 6 6 -0.637
set.seed(3579)
y07 <- (numeric(100))
e07 <- rnorm(100)
for(i in 3:100)
y07[i] <- -0.8*y07[i-1] + 0.3*y07[i-2] + e07[i]
sim07 <- tsibble(idx = seq_len(100), y07 = y07, index = idx)
head(sim07)
## # A tsibble: 6 x 2 [1]
## idx y07
## <int> <dbl>
## 1 1 0
## 2 2 0
## 3 3 0.245
## 4 4 -0.0295
## 5 5 -0.519
## 6 6 -0.586
plot06<- sim06 |> autoplot(y06) + labs( title = "ARMA(1,1) with ϕ = 0.6 and θ1 = 0.6")
plot07 <- sim07 |> autoplot(y07) + labs( title = "AR(2) with ϕ1 = -0.8 and ϕ2 = 0.3")
grid.arrange(plot06, plot07, nrow = 2)
The ARMA(1,1) series with ϕ1 = 0.6 and θ1 = 0.6 shows no trend or seasonality, only random fluctuations, and appears to be nearly stationary. On the other hand, the AR(2) series with ϕ1 = -0.8 and ϕ2 = 0.3 is flat up to a certain time index in its early part, suggesting little change in the values initially. In the later part of the series, oscillations around the mean appear, and as time progresses their amplitude increases. This behavior arises because the parameters violate the stationarity conditions for an AR(2) model (which require -1 < ϕ2 < 1, ϕ1 + ϕ2 < 1, and ϕ2 - ϕ1 < 1; here ϕ2 - ϕ1 = 1.1 > 1), while the negative ϕ1 causes the series to alternate between positive and negative values, producing the oscillation. It is also noted that the y-axis scale is much larger for the AR(2) series than for the ARMA(1,1) series, reflecting the explosive growth of the AR(2) series over time.
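The non-stationarity can also be checked from the AR characteristic polynomial (an added sketch, not part of the original output): stationarity requires every root of 1 - ϕ1·z - ϕ2·z² to lie outside the unit circle.
# Sketch: moduli of the roots of 1 + 0.8z - 0.3z^2 (phi1 = -0.8, phi2 = 0.3);
# one modulus is below 1, confirming this AR(2) process is non-stationary.
Mod(polyroot(c(1, 0.8, -0.3)))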
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
#view(aus_airpassengers)
aus_airpas <- aus_airpassengers |> filter(Year <= 2011)
#view(aus_airpas)
Get model:
fit <- aus_airpas|>model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
The ARIMA(0,2,1) model was selected.
Get residuals:
fit |> gg_tsresiduals()
From the ACF plot, it is seen that all the autocorrelations are within the threshold limits, indicating that the residuals are behaving like white noise.
Get forecasts:
fit|>forecast(h=10)|>autoplot(aus_airpas)
In backshift notation, the selected ARIMA(0,2,1) model is (1 - B)^2 yt = (1 - 0.8756B) et, where et denotes white noise.
fit1 <- aus_airpas |>model(ARIMA(Passengers ~ pdq(0,1,0)))
report(fit1)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.3669
## s.e. 0.3319
##
## sigma^2 estimated as 4.629: log likelihood=-89.08
## AIC=182.17 AICc=182.48 BIC=185.59
fit1|>forecast(h=10)|>autoplot(aus_airpas)
The forecast line appears slightly steeper for the ARIMA(0,2,1) model than for the ARIMA(0,1,0) model. Considering the trend of the series, the forecasts produced by the ARIMA(0,2,1) model seem slightly better, although the forecasts from both models are quite similar. Note that the AICc value for the ARIMA(0,2,1) model is 179.93, while it is 182.48 for the ARIMA(0,1,0) model with drift, indicating that the ARIMA(0,1,0) model with drift performs worse than the ARIMA(0,2,1) model on this dataset.
fit2 <- aus_airpas |>model(ARIMA(Passengers ~ pdq(2,1,2)))
report(fit2)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## 1.4694 -0.5103 -1.5736 0.6780 0.0650
## s.e. 0.3780 0.3558 0.3081 0.2688 0.0294
##
## sigma^2 estimated as 4.748: log likelihood=-87.74
## AIC=187.47 AICc=189.94 BIC=197.75
fit2|>forecast(h=10)|>autoplot(aus_airpas)
The forecasts produced by the new ARIMA(2,1,2) model with drift appear very similar to those from parts a and c. However, its AICc value (189.94) is larger than those of the previous two models.
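For convenience (an added sketch, not part of the original output), the three models considered so far can be fitted in one model() call and compared directly:
# Sketch: refit the candidate models side by side and rank them by AICc.
aus_airpas |>
  model(
    auto = ARIMA(Passengers),
    drift_010 = ARIMA(Passengers ~ pdq(0,1,0)),
    arma_212 = ARIMA(Passengers ~ pdq(2,1,2))
  ) |>
  glance() |>
  arrange(AICc) |>
  select(.model:BIC)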
fit3 <- aus_airpas|> model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit3)
## Series: Passengers
## Model: NULL model
## NULL model
After removing the constant, an error stating “non-stationary AR part from CSS” was returned, and the report reflects that no model was fitted: without the constant, the CSS (conditional sum of squares) estimation step could not find a stationary AR part for the ARIMA(2,1,2) specification.
fit4 <- aus_airpas|> model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit4)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0721
## s.e. 0.0709 0.0260
##
## sigma^2 estimated as 4.086: log likelihood=-85.74
## AIC=177.48 AICc=178.15 BIC=182.55
fit4|> forecast(h=10) |> autoplot(aus_airpas)
This time, I received a warning stating that the model specification induces a quadratic or higher-order polynomial trend in the forecasts, which is generally discouraged; the message suggests removing the constant or reducing the number of differences. The forecast line here is steeper than before, giving higher values than the previous forecasts.
For the United States GDP series (from global_economy):
#view(global_economy)
us_gdp <- global_economy|> filter(Country == "United States")
autoplot(us_gdp)
## Plot variable not specified, automatically selected `.vars = GDP`
The time series plot for the US GDP data exhibits curvature. Therefore, applying a Box-Cox transformation may help to straighten the trend into a more linear form.
lambda6 <- us_gdp |> features(GDP, features = guerrero) |> pull(lambda_guerrero)
lambda6
## [1] 0.2819443
us_gdp |> autoplot(box_cox(GDP, lambda6))
It is clear that the transformation helps reduce the curvature of the time series, making it a straighter line to some extent.
fit_us_gdp <- us_gdp |> model(ARIMA(box_cox(GDP, lambda6)))
report(fit_us_gdp)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda6)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
An ARIMA(1,1,0) model with drift has been selected by the ARIMA() function.
fit_new_model <- us_gdp |>
model(
"ARIMA(0,1,1)" = ARIMA(box_cox(GDP, lambda6) ~ 1 + pdq(0,1,1)),
"ARIMA(0,1,2)" = ARIMA(box_cox(GDP, lambda6) ~ 1 + pdq(0,1,2)),
"ARIMA(2,1,0)" = ARIMA(box_cox(GDP, lambda6) ~ 1 + pdq(2,1,0)),
"ARIMA(2,1,1)" = ARIMA(box_cox(GDP, lambda6) ~ 1 + pdq(2,1,1)),
"ARIMA(1,1,1)" = ARIMA(box_cox(GDP, lambda6) ~ 1 + pdq(1,1,1))
)
glance(fit_new_model) |> arrange(AIC) |> select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(1,1,1) 5580. -325. 659. 659. 667.
## 2 ARIMA(2,1,0) 5580. -325. 659. 659. 667.
## 3 ARIMA(0,1,1) 5689. -326. 659. 659. 665.
## 4 ARIMA(0,1,2) 5634. -326. 659. 660. 667.
## 5 ARIMA(2,1,1) 5647. -325. 660. 661. 671.
Among the five new models, the ARIMA(1,1,1) model appears to be the best performing, with the lowest AICc value of 659.41.
When comparing the best new model, ARIMA(1,1,1), with the automatically selected model, ARIMA(1,1,0) with drift, the previous model performs better, as its AICc of 657.1 is lower than the 659.41 of the ARIMA(1,1,1) model.
Check residual diagnostics of ARIMA(1,1,0):
fit_us_gdp |> gg_tsresiduals()
The autocorrelations are all within the threshold limits in the ACF plot, indicating that the residuals behave like white noise.
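A formal check (an added sketch, not part of the original output) is the Ljung-Box test on the innovation residuals; taking dof = 1 for the single AR coefficient is an assumption:
# Sketch: Ljung-Box test; a large p-value would be consistent with
# white-noise residuals.
fit_us_gdp |>
  augment() |>
  features(.innov, ljung_box, lag = 10, dof = 1)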
fit_us_gdp|>forecast(h=10)|>autoplot(us_gdp)
The forecasted line is trending upward, reflecting alignment with the overall trend of the time series. Therefore, the forecasts seem reasonable here.
fit_ets_model <- us_gdp |> model(ETS(GDP))
report(fit_ets_model)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets_model |> forecast(h=10) |> autoplot(us_gdp)
The forecast line produced by the ETS model is less steep than the one produced by the ARIMA(1,1,0) model, so it misses some of the steepness of the overall trend, and the prediction intervals for the ETS model are wider than those of the ARIMA(1,1,0) model. Note, however, that the AICc values of the two models (3191.94 for ETS versus 657.1 for ARIMA) are not directly comparable: the models belong to different classes, and the ARIMA model was fitted to Box-Cox transformed data while the ETS model was fitted to the raw series. A fair comparison should instead be based on out-of-sample forecast accuracy, for example via time series cross-validation (sketched below). On the visual evidence, the ARIMA(1,1,0) forecasts appear more reasonable here.
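A sketch of such a comparison (an addition, not part of the original assignment; the .init value and one-step horizon are arbitrary choices):
# Sketch: time series cross-validation comparing one-step forecast accuracy
# of the ARIMA(1,1,0) and ETS specifications chosen above.
us_gdp |>
  stretch_tsibble(.init = 20, .step = 1) |>
  model(
    arima = ARIMA(box_cox(GDP, lambda6) ~ 1 + pdq(1,1,0)),
    ets = ETS(GDP)
  ) |>
  forecast(h = 1) |>
  accuracy(us_gdp)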