9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Yes, all the data look like white noise, with no autocorrelations outside of the limits (the dashed lines).

b.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Taking the second question first: the autocorrelations differ in scale but not in essential character; each figure shows small spikes scattered randomly around zero.

That leads to the first question: the critical values sit at different distances from zero because they are computed as ±1.96/sqrt(N), where N is the sample size. As N grows (36 -> 360 -> 1000), the bounds shrink, and the estimated autocorrelations themselves also cluster more tightly around zero.
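As a quick check, the bounds for the three sample sizes can be computed directly (a minimal sketch in base R):

# white-noise significance bounds +/-1.96/sqrt(N)
N <- c(36, 360, 1000)
data.frame(N = N, bound = 1.96 / sqrt(N))

The bound shrinks from about 0.33 at N = 36 to about 0.06 at N = 1000, matching the narrowing dashed lines in the figure.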

9.2

A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

amzn <- gafa_stock |>
  filter(Symbol == 'AMZN')

amzn |> autoplot(Close)

Here we see a clear trend, which indicates the data are not stationary.

amzn |> 
  ACF(Close) |> 
  autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

Here the autocorrelations decay to 0 very slowly, staying large and positive for many lags, which indicates the data are not stationary.

amzn |> 
  PACF(Close) |> 
  autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

And here we see the large (≈ 1) first spike in the PACF, which again indicates the data are not stationary; taken together, the plots suggest the series should be differenced.
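A unit root test makes the same point formally. As a sketch (output not shown), a KPSS test on the closing prices should reject stationarity, and unitroot_ndiffs should suggest at least one difference:

# KPSS test and the suggested number of first differences
amzn |>
  features(Close, unitroot_kpss)
amzn |>
  features(Close, unitroot_ndiffs)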

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a.

Turkish GDP from global_economy.

t <- global_economy |>
  filter(Country == 'Turkey')

pt1<-autoplot(t)+
  scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-9))
## Plot variable not specified, automatically selected `.vars = GDP`
pt2<-t |> 
  ACF(GDP) |> 
  autoplot()

pt3<-t |> 
  PACF(GDP) |> 
  autoplot()

ggarrange(pt1,pt2,pt3)

t |>
  features(GDP, unitroot_kpss)
## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey       1.22        0.01

The p-value is reported as 0.01 (the smallest value the test reports, so it may in fact be smaller), indicating the data are not stationary. We first stabilise the variance with a Box-Cox transformation, then difference the data and test again.

t_lmbda <- t|>
  features(GDP, features= guerrero) |>
  pull(lambda_guerrero)
t_lmbda
## [1] 0.1572187
t |>
  features(box_cox(GDP, t_lmbda), unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1
pt4<-t |>
  autoplot(difference(box_cox(GDP,t_lmbda),1))

pt5<-t |> 
  ACF(difference(box_cox(GDP,t_lmbda),1)) |> 
  autoplot()

pt6<-t |> 
  PACF(difference(box_cox(GDP,t_lmbda),1)) |> 
  autoplot()

ggarrange(pt4,pt5,pt6)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

t |>
  mutate(t_diff=difference(box_cox(GDP,t_lmbda),1)) |>
  features(t_diff, unitroot_kpss)
## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey     0.0889         0.1

This time, the p-value is reported as 0.1 (and so it could be larger than that). We can conclude that the differenced data appear stationary.

b.

Accommodation takings in the state of Tasmania from aus_accommodation.

a <- aus_accommodation |>
  filter(State == 'Tasmania')

pa1<-autoplot(a)
## Plot variable not specified, automatically selected `.vars = Takings`
pa2<-a |> 
  ACF(Takings) |> 
  autoplot()

pa3<-a |> 
  PACF(Takings) |> 
  autoplot()

ggarrange(pa1,pa2,pa3)

a |>
  features(Takings, unitroot_kpss)
## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania      1.88        0.01

Again the p-value is reported as 0.01 (so it may be smaller), indicating the data are not stationary. We transform, difference, and test again.

a_lmbda <- a |>
  features(Takings, features= guerrero) |>
  pull(lambda_guerrero)
a_lmbda
## [1] 0.001819643
# a_lmbda is close to 0, so the transformation is nearly a log;
# note unitroot_nsdiffs(), since the series is clearly seasonal
a |>
  features(box_cox(Takings, a_lmbda), unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1
# seasonal lag 4 because the data are quarterly

pa4<-a |>
  autoplot(difference(box_cox(Takings,a_lmbda),4))

pa5<-a |> 
  ACF(difference(box_cox(Takings,a_lmbda),4)) |> 
  autoplot()

pa6<-a |> 
  PACF(difference(box_cox(Takings,a_lmbda),4)) |> 
  autoplot()

ggarrange(pa4,pa5,pa6)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

a |>
  mutate(a_diff=difference(box_cox(Takings,a_lmbda),4)) |>
  features(a_diff, unitroot_kpss)
## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania     0.198         0.1

This time, the p-value is reported as 0.1 (and so it could be larger than that). We can conclude that the differenced data appear stationary.

c. 

Monthly sales from souvenirs.

s <- souvenirs 

ps1<-autoplot(s)
## Plot variable not specified, automatically selected `.vars = Sales`
ps2<-s |> 
  ACF(Sales) |> 
  autoplot()

ps3<-s |> 
  PACF(Sales) |> 
  autoplot()

ggarrange(ps1,ps2,ps3)

s |>
  features(Sales, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.31        0.01

Once more the p-value is reported as 0.01, indicating the data are not stationary. We transform, difference, and test again.

s_lmbda <- s |>
  features(Sales, features= guerrero) |>
  pull(lambda_guerrero)
s_lmbda
## [1] 0.002118221
# s_lmbda is again close to 0 (nearly a log transformation);
# note unitroot_nsdiffs(), since the series is seasonal
s |>
  features(box_cox(Sales, s_lmbda), unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1
# seasonal lag 12 because the data are monthly

ps4<-s |>
  autoplot(difference(box_cox(Sales,s_lmbda),12))

ps5<-s |> 
  ACF(difference(box_cox(Sales,s_lmbda),12)) |> 
  autoplot()

ps6<-s |> 
  PACF(difference(box_cox(Sales,s_lmbda),12)) |> 
  autoplot()

ggarrange(ps4,ps5,ps6)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

s |>
  mutate(s_diff=difference(box_cox(Sales,s_lmbda),12)) |>
  features(s_diff, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.240         0.1

This time, the p-value is reported as 0.1 (and so it could be larger than that). We can conclude that the differenced data appear stationary.

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

myseries <- aus_retail |>
  filter(`Series ID` == 'A3349849A')

pr1<-myseries |> autoplot(Turnover)

pr2<-myseries |> 
  ACF(Turnover) |> 
  autoplot()

pr3<-myseries |> 
  PACF(Turnover) |> 
  autoplot()

ggarrange(pr1,pr2,pr3)

myseries |>
  features(Turnover, unitroot_kpss)
## # A tibble: 1 × 4
##   State                        Industry                    kpss_stat kpss_pvalue
##   <chr>                        <chr>                           <dbl>       <dbl>
## 1 Australian Capital Territory Cafes, restaurants and cat…      6.54        0.01

The p-value is again reported as 0.01, so the series is not stationary. We can transform and difference the data.

r_lmbda <- myseries |>
  features(Turnover, features= guerrero) |>
  pull(lambda_guerrero)
r_lmbda
## [1] 0.5054374
# Note unitroot_nsdiffs(), since the series appears seasonal
myseries |>
  features(box_cox(Turnover, r_lmbda), unitroot_nsdiffs)
## # A tibble: 1 × 3
##   State                        Industry                                 nsdiffs
##   <chr>                        <chr>                                      <int>
## 1 Australian Capital Territory Cafes, restaurants and catering services       0

Here nsdiffs is 0, where it was 1 in the prior examples, even though the series looks strongly seasonal. We can also check ndiffs.

# Even though there seems to be seasonality, trying ndiffs
myseries |>
  features(box_cox(Turnover, r_lmbda), unitroot_ndiffs)
## # A tibble: 1 × 3
##   State                        Industry                                 ndiffs
##   <chr>                        <chr>                                     <int>
## 1 Australian Capital Territory Cafes, restaurants and catering services      1

ndiffs is 1, so one first difference is warranted. The apparent contradiction with the seasonal look of the series has an explanation: unitroot_nsdiffs() is not a unit root test but is based on the strength of the seasonal component from an STL decomposition (a seasonal difference is suggested only when that strength exceeds 0.64), while unitroot_ndiffs() applies the KPSS test repeatedly. The two can disagree on a series like this one.
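The seasonal strength behind unitroot_nsdiffs() can be inspected directly. A sketch (output not shown); seasonal_strength_year is the relevant feature for monthly data:

# seasonal strength from an STL decomposition of the transformed series
myseries |>
  features(box_cox(Turnover, r_lmbda), feat_stl) |>
  select(seasonal_strength_year)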

# seasonal lag 12 because the data are monthly

pr4<-myseries |>
  autoplot(difference(box_cox(Turnover,r_lmbda),12))

pr5<-myseries |> 
  ACF(difference(box_cox(Turnover,r_lmbda),12)) |> 
  autoplot()

pr6<-myseries |> 
  PACF(difference(box_cox(Turnover,r_lmbda),12)) |> 
  autoplot()

ggarrange(pr4,pr5,pr6)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

myseries |>
  mutate(r_diff=difference(box_cox(Turnover,r_lmbda),12)) |>
  features(r_diff, unitroot_kpss)
## # A tibble: 1 × 4
##   State                        Industry                    kpss_stat kpss_pvalue
##   <chr>                        <chr>                           <dbl>       <dbl>
## 1 Australian Capital Territory Cafes, restaurants and cat…    0.0560         0.1

The p-value is reported as 0.1 (it could be larger), so after the Box-Cox transformation and one seasonal difference the series appears stationary.

9.6

Simulate and plot some data from simple ARIMA models.

a.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ²=1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
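Part b repeats this loop for several values of ϕ1, so a small helper keeps things tidy. This is a sketch; sim_ar1 is a name introduced here, not part of the exercise:

# simulate an AR(1): y[t] = phi * y[t-1] + e[t], starting from y[1] = 0
sim_ar1 <- function(phi, e) {
  y <- numeric(length(e))
  for (i in 2:length(e))
    y[i] <- phi * y[i - 1] + e[i]
  tsibble(idx = seq_along(e), y = y, index = idx)
}

With the same e as above, sim_ar1(0.6, e) reproduces sim.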

b.

Produce a time plot for the series. How does the plot change as you change ϕ1?

for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)

for(i in 2:100)
  y[i] <- 0.4*y[i-1] + e[i]
sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)

for(i in 2:100)
  y[i] <- 0.8*y[i-1] + e[i]
sim8 <- tsibble(idx = seq_len(100), y = y, index = idx)

s1<-autoplot(sim1)+ 
  ggtitle("ϕ1=.1")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
s2<-autoplot(sim4)+ 
  ggtitle("ϕ1=.4")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
s3<-autoplot(sim)+ # this is the one from part a
  ggtitle("ϕ1=.6 (the example given)")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
s4<-autoplot(sim8)+ 
  ggtitle("ϕ1=.8")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
ggarrange(s1,s2,s3,s4)

When ϕ1 is close to 0, the series looks close to white noise; as ϕ1 approaches 1, the series becomes smoother, with larger swings and longer runs on one side of zero.
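This matches the theory: a stationary AR(1) process has variance σ²/(1 − ϕ1²), so a larger ϕ1 means larger excursions from the mean. A quick calculation (a sketch):

# stationary standard deviation of an AR(1) with sigma^2 = 1
phi <- c(0.1, 0.4, 0.6, 0.8)
data.frame(phi = phi, stationary_sd = 1 / sqrt(1 - phi^2))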

c. 

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ²=1.

y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]  # for an MA(1), the first observation is just the first error
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
simMA <- tsibble(idx = seq_len(100), y = y, index = idx)
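As with the AR(1) simulations, a small vectorised helper avoids repeating the loop in part d. A sketch; sim_ma1 is a name introduced here:

# simulate an MA(1): y[t] = theta * e[t-1] + e[t], with y[1] = e[1]
sim_ma1 <- function(theta, e) {
  y <- c(e[1], theta * head(e, -1) + tail(e, -1))
  tsibble(idx = seq_along(e), y = y, index = idx)
}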

d. 

Produce a time plot for the series. How does the plot change as you change θ1?

for(i in 2:100)
  y[i] <- 0.1*e[i-1] + e[i]
simMA1 <- tsibble(idx = seq_len(100), y = y, index = idx)

for(i in 2:100)
  y[i] <- 0.4*e[i-1] + e[i]
simMA4 <- tsibble(idx = seq_len(100), y = y, index = idx)

for(i in 2:100)
  y[i] <- 0.8*e[i-1] + e[i]
simMA8 <- tsibble(idx = seq_len(100), y = y, index = idx)

sMA1<-autoplot(simMA1)+ 
  ggtitle("θ1=.1")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
sMA2<-autoplot(simMA4)+ 
  ggtitle("θ1=.4")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
sMA3<-autoplot(simMA)+ # this is the one from part c
  ggtitle("θ1=.6")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
sMA4<-autoplot(simMA8)+ 
  ggtitle("θ1=.8")+
  theme(plot.title = element_text(hjust = 0.5))
## Plot variable not specified, automatically selected `.vars = y`
ggarrange(sMA1,sMA2,sMA3,sMA4)

As θ1 increases away from 0, the series becomes smoother. The peaks and valleys stay in the same places, because every panel reuses the same errors e, but the line is less jagged.
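Again this matches the theory: an MA(1) process has lag-1 autocorrelation θ1/(1 + θ1²) and no autocorrelation beyond lag 1, so a larger θ1 smooths adjacent observations without moving the peaks. A quick calculation (a sketch):

# lag-1 autocorrelation of an MA(1)
theta <- c(0.1, 0.4, 0.6, 0.8)
data.frame(theta = theta, acf_lag1 = theta / (1 + theta^2))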

e.

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ²=1.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  0.6*y[i-1]+0.6*e[i-1] + e[i]
ARMA11 <- tsibble(idx = seq_len(100), y = y, index = idx)

f. 

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ²=1. (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <-  -0.8*y[i-1]+0.3*y[i-2] + e[i]
AR2 <- tsibble(idx = seq_len(100), y = y, index = idx)

g.

Graph the latter two series and compare them.

pARMA<-autoplot(ARMA11)+ 
  ggtitle("ARMA(1,1) ϕ1=0.6, θ1=0.6 and σ2=1")+
  theme(plot.title = element_text(hjust = 0.5, size=8))
## Plot variable not specified, automatically selected `.vars = y`
pAR<-autoplot(AR2)+ 
  ggtitle("AR(2) ϕ1=−0.8, ϕ2=0.3 and σ2=1")+
  theme(plot.title = element_text(hjust = 0.5, size=8))
## Plot variable not specified, automatically selected `.vars = y`
ggarrange(pARMA, pAR)

The ARMA(1,1) on the left looks stationary, wandering around a constant mean. The AR(2) on the right oscillates, flipping sign each period (ϕ1 is negative) with an amplitude that grows over time, which is the non-stationarity the exercise warned about.
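The non-stationarity can be confirmed from the characteristic polynomial 1 − ϕ1z − ϕ2z²: stationarity requires every root to lie outside the unit circle. A sketch in base R:

# roots of 1 - phi1*z - phi2*z^2; polyroot() takes coefficients in
# increasing order, so pass c(1, -phi1, -phi2) with phi1 = -0.8, phi2 = 0.3
abs(polyroot(c(1, 0.8, -0.3)))

One root has modulus of roughly 0.93 < 1, so the AR(2) is non-stationary; by contrast the ARMA(1,1) is stationary, since |ϕ1| = 0.6 < 1.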

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

pass <- aus_airpassengers |>
  filter(Year <= 2011)

a.

Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit<-pass |>
  model(ARIMA(Passengers))
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99
fit |> gg_tsresiduals()

  • The time plot of the residuals shows roughly constant variation across the historical data, apart from about three larger residuals, so the residual variance can be treated as constant.
  • The ACF shows all autocorrelations within the significance bounds.
  • The histogram of the residuals is roughly centred on zero, although outliers on either side keep it from looking perfectly normal.

b.

Write the model in terms of the backshift operator.

(1 − B)² y_t = (1 + θ1B) ε_t, with θ1 = −0.8756 from the fitted ARIMA(0,2,1) model.
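Expanding (1 − B)² = 1 − 2B + B² gives the equivalent recursive form

y_t = 2y_{t−1} − y_{t−2} + ε_t + θ1 ε_{t−1}

which makes explicit that each new value continues the most recent change in the series, plus a moving-average correction.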

c. 

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2 <- pass |>
  model(ARIMA(Passengers ~pdq(0,1,0)))

# part a's forecasts were not plotted above, so plot them here for comparison
# also fixing the y-axis scales so the differences are easier to see
pa<-fit |>
  forecast(h=10) |>
  autoplot(pass) +
  ggtitle("part a: ARIMA(0,2,1)")+
  theme(plot.title = element_text(hjust = 0.5, size=10)) +
  theme(axis.text = element_text(size = 6))+ 
  theme(legend.position = "none")  +
  scale_y_continuous(limits = c(0, 100))


pc<-fit2 |>
  forecast(h=10) |>
  autoplot(pass) +
  ggtitle("part c: ARIMA(0,1,1)")+
  theme(plot.title = element_text(hjust = 0.5, size=10)) +
  theme(axis.text = element_text(size = 6)) + 
  theme(legend.position = "none")  + 
    theme(axis.title.y = element_blank()) +
  scale_y_continuous(limits = c(0, 100))

ggarrange(pa,pc)

As you can see, the forecasts from part a climb much more steeply: with two differences, the ARIMA(0,2,1) extrapolates the recent growth in the slope, while the ARIMA(0,1,0) with drift projects the average historical change per year.

d. 

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit3 <- pass |>
  model(ARIMA(Passengers ~pdq(2,1,2)))

pd<-fit3|>
  forecast(h=10) |>
  autoplot(pass) +
  ggtitle("part d: ARIMA(2,1,2)")+
  theme(plot.title = element_text(hjust = 0.5, size=10)) +
  theme(axis.text = element_text(size = 6)) + 
  theme(legend.position = "none")  + 
    theme(axis.title.y = element_blank()) +
  scale_y_continuous(limits = c(0, 100))

ggarrange(pa,pc,pd)

Part d looks more like part a, but with somewhat narrower prediction intervals.

fit3_1 <- pass |>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit3_1)
## Series: Passengers 
## Model: NULL model 
## NULL model

Removing the constant breaks estimation: as the warning says, the CSS starting values imply a non-stationary AR part, so ARIMA() returns a NULL model instead of a fit.

e.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit4 <- pass |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
pe<-fit4|>
  forecast(h=10) |>
  autoplot(pass) +
  ggtitle("part e: ARIMA(0,2,1) with constant")+
  theme(plot.title = element_text(hjust = 0.5, size=10)) +
  theme(axis.text = element_text(size = 6)) + 
  theme(legend.position = "none")  + 
    theme(axis.title.y = element_blank()) +
  scale_y_continuous(limits = c(0, 100))

ggarrange(pa,pc,pd,pe)

The warning says it plainly: “This is generally discouraged.” :)
One can still plot it, though. Because a constant combined with d=2 induces a quadratic trend, these forecasts are the steepest yet, with prediction intervals that look very narrow relative to the trend.

9.8

For the United States GDP series (from global_economy):

usgdp <- global_economy |>
  filter(Country == 'United States')

pus1<-autoplot(usgdp)+
  scale_y_continuous(labels = label_number(suffix = "B", scale = 1e-12))
## Plot variable not specified, automatically selected `.vars = GDP`
pus2<-usgdp |> 
  ACF(GDP) |> 
  autoplot()

pus3<-usgdp |> 
  PACF(GDP) |> 
  autoplot()

ggarrange(pus1,pus2,pus3)

usgdp |>
  features(GDP, unitroot_kpss)
## # A tibble: 1 × 3
##   Country       kpss_stat kpss_pvalue
##   <fct>             <dbl>       <dbl>
## 1 United States      1.49        0.01

The p-value is shown as 0.01, indicating the data are not stationary. We can transform and difference the data.

a.

if necessary, find a suitable Box-Cox transformation for the data;

us_lmbda <- usgdp|>
  features(GDP, features= guerrero) |>
  pull(lambda_guerrero)
us_lmbda
## [1] 0.2819443
usgdp |>
  features(box_cox(GDP, us_lmbda), unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1
pus4<-usgdp |>
  autoplot(difference(box_cox(GDP,us_lmbda),1))

pus5<-usgdp |> 
  ACF(difference(box_cox(GDP,us_lmbda),1)) |> 
  autoplot()

pus6<-usgdp |> 
  PACF(difference(box_cox(GDP,us_lmbda),1)) |> 
  autoplot()

ggarrange(pus4,pus5,pus6)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

usgdp |>
  mutate(us_diff=difference(box_cox(GDP,us_lmbda),1)) |>
  features(us_diff, unitroot_kpss)
## # A tibble: 1 × 3
##   Country       kpss_stat kpss_pvalue
##   <fct>             <dbl>       <dbl>
## 1 United States     0.208         0.1

This time, the p-value is reported as 0.1 (and so it could be larger than that). We can conclude that the differenced data appear stationary.

b.

fit a suitable ARIMA model to the transformed data using ARIMA();

usfit<-usgdp |>
  model(ARIMA(box_cox(GDP,us_lmbda)))

report(usfit)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, us_lmbda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

c. 

try some other plausible models by experimenting with the orders chosen;

oth_fit <- usgdp |>
  model(suggested110 = ARIMA(box_cox(GDP,us_lmbda) ~ pdq(1,1,0)),
        oth210 = ARIMA(box_cox(GDP,us_lmbda) ~ pdq(2,1,0)),
        oth013 = ARIMA(box_cox(GDP,us_lmbda) ~ pdq(0,1,3)),
        oth021 = ARIMA(box_cox(GDP,us_lmbda) ~ pdq(0,2,1)),
        oth212 = ARIMA(box_cox(GDP,us_lmbda) ~ pdq(2,1,2)),
        oth111 = ARIMA(box_cox(GDP,us_lmbda) ~ pdq(1,2,1)),
        search = ARIMA(box_cox(GDP,us_lmbda), stepwise=FALSE))

oth_fit |> pivot_longer(!Country, names_to = "Model name",
                         values_to = "Orders")
## # A mable: 7 x 3
## # Key:     Country, Model name [7]
##   Country       `Model name`                  Orders
##   <fct>         <chr>                        <model>
## 1 United States suggested110 <ARIMA(1,1,0) w/ drift>
## 2 United States oth210       <ARIMA(2,1,0) w/ drift>
## 3 United States oth013       <ARIMA(0,1,3) w/ drift>
## 4 United States oth021                <ARIMA(0,2,1)>
## 5 United States oth212       <ARIMA(2,1,2) w/ drift>
## 6 United States oth111                <ARIMA(1,2,1)>
## 7 United States search       <ARIMA(1,1,0) w/ drift>
glance(oth_fit) |> arrange(AICc) |> select(.model:BIC)
## # A tibble: 7 × 6
##   .model       sigma2 log_lik   AIC  AICc   BIC
##   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 oth111        5761.   -321.  648.  649.  655.
## 2 oth021        6020.   -323.  650.  650.  654.
## 3 suggested110  5479.   -325.  657.  657.  663.
## 4 search        5479.   -325.  657.  657.  663.
## 5 oth210        5580.   -325.  659.  659.  667.
## 6 oth013        5708.   -325.  661.  662.  671.
## 7 oth212        5734.   -325.  662.  664.  674.

The models have broadly similar AICc values. Of the models fitted, the ARIMA(1,2,1) (labelled oth111) gives the lowest AICc, followed by the ARIMA(0,2,1). The automated stepwise selection identified an ARIMA(1,1,0) with drift, which has the third-lowest AICc, and the full search (stepwise=FALSE) arrived at the same ARIMA(1,1,0) with drift. One caveat: AICc values are only strictly comparable between models with the same order of differencing, so the d=2 models should not really be ranked against the d=1 models this way.

d. 

choose what you think is the best model and check the residual diagnostics;

oth_fit |>
  select(oth111) |>
  gg_tsresiduals()

The ACF plot of the residuals from the ARIMA(1,2,1) model shows that all autocorrelations are within the threshold limits, indicating that the residuals behave like white noise.
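A portmanteau test complements the ACF plot. As a sketch (output not shown), a Ljung-Box test on the innovation residuals, with dof set to the two estimated ARMA coefficients, should return a large p-value:

# Ljung-Box test on the ARIMA(1,2,1) residuals
oth_fit |>
  select(oth111) |>
  augment() |>
  features(.innov, ljung_box, lag = 10, dof = 2)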

e.

produce forecasts of your fitted model. Do the forecasts look reasonable?

oth_fit |>
  forecast(h=10) |>
  filter(.model=='oth111') |>
  autoplot(usgdp)+
  scale_y_continuous(labels = label_number(suffix = "B", scale = 1e-12))

Yes: the forecasts continue the historical growth pattern, and the prediction intervals widen at a plausible rate.

f. 

compare the results with what you would obtain using ETS() (with no transformation).

usETS <-usgdp |>
  model(ETS(GDP)) 

usETS|>
  forecast(h = 10) |>
  autoplot(usgdp)+
  scale_y_continuous(labels = label_number(suffix = "B", scale = 1e-12))

With the ETS model, the forecast slope is not as steep and the prediction intervals are wider.

report(usETS)
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

The AICc with ETS is 3192, whereas the ARIMA(1,2,1) model gave 648, but the two values are not comparable: the ARIMA model was fitted to the Box-Cox transformed data while the ETS model used the raw series, so the likelihoods are on different scales.
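A fairer comparison is out-of-sample accuracy. A minimal sketch, assuming we hold out the last ten years (train and comp are names introduced here; output not shown):

# refit both models on data through 2007 and score 10-year-ahead forecasts
train <- usgdp |> filter(Year <= 2007)
comp <- train |>
  model(arima121 = ARIMA(box_cox(GDP, us_lmbda) ~ pdq(1,2,1)),
        ets = ETS(GDP))
comp |>
  forecast(h = 10) |>
  accuracy(usgdp)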

usETS |> gg_tsresiduals()

The ACF plot of the residuals from the ETS model shows that all autocorrelations are within the threshold limits, indicating that the residuals are behaving like white noise.