DATA624 HW6

Do exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman.

Ex 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a.Explain the differences among these figures. Do they all indicate that the data are white noise?

A white noise series is stationary — it does not matter when you observe it, it should look much the same at any point in time. For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Also, for non-stationary data, the value of r1 is often large and positive. The more the random numbers, the closer the value to zero. They are all white noise.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? The formula for critical values are +/-1.96*sigma/sqrt(n), the more the random number (n), the smaller the critical values.

Ex 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be difference.

library(fpp3)
library(ggplot2)
library(feasts)

AMZN<-gafa_stock|>filter(Symbol=='AMZN')|>select(Close)

autoplot(AMZN, Close) +
  ggtitle("Daily Closing Prices for Amazon stock") +
  theme(plot.title = element_text(hjust = 0.5))

acf(AMZN, main='ACF Plot')

pacf(AMZN, main='PACF Plot')

The time series show a upward trend and non-stationary data.

ACF measures the overall correlation at various lags, while PACF measures the direct correlation between a variable and its lagged values, after accounting for the influence of intervening lags.

The ACF is approaching zero very slowly. Moreover, the ACF values are much higher than the critical value in blue line. There is a slight tail off at ACF plot.

The PACF is also outside the critical limit for the first lag value. There is a tail off at PACF plot too.

They all show non-stationary series.

Ex 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

# a Turkish GDP
turkey_gdp<-global_economy|>filter(Country=='Turkey')|>select(GDP)
gg_tsdisplay(turkey_gdp)

# Lambda
lambda_gdp <- turkey_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_gdp

## [1] 0.1572187

#Order of differencing
turkey_gdp|>mutate(GDP=box_cox(GDP, lambda_gdp))|>features(GDP, unitroot_ndiffs)

Lambda is 0.157 whilst order of differencing is 1.

Accommodation takings in the state of Tasmania from aus_accommodation.

# a Tasmania accomodation takings
ac_tasmania<-aus_accommodation|>filter(State=='Tasmania')|>select(Takings)
gg_tsdisplay(ac_tasmania, Takings)

# Lambda
lambda_takings <- ac_tasmania %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_takings

## [1] 0.001819643

#Order of differencing
ac_tasmania|>mutate(Takings=box_cox(Takings, lambda_gdp))|>features(Takings, unitroot_ndiffs)

Lambda is 0 whilst order of differencing is 1.

Monthly sales from souvenirs.

# Monthly sales
gg_tsdisplay(souvenirs, Sales)

# Lambda
lambda_sales <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_sales

## [1] 0.002118221

#Order of differencing
souvenirs|>mutate(Sales=box_cox(Sales, lambda_sales))|>features(Sales, unitroot_ndiffs)

Lambda is 0.002 whilst order of differencing is 1.

Ex 9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

#my series
set.seed(123456)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

# Monthly sales
gg_tsdisplay(myseries, Turnover)

# Lambda
lambda_to <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_to

## [1] -0.03391672

#Order of differencing
myseries|>mutate(Turnover=box_cox(Turnover, lambda_to))|>features(Turnover, unitroot_ndiffs)

Lambda is -0.0339 whilst order of differencing is 1.

Ex 9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change ϕ1?

sim|>autoplot()+ 
  ggtitle("Time plot with ϕ1 = 0.6")

for(ph in c(-1, -0.5, 0, 0.5, 1))
{
  for(i in 2:100)
  {
    y[i] <- ph*y[i-1] + e[i]
    sim <- tsibble(idx = seq_len(100), y = y, index = idx)
  }
  #print(autoplot(sim)+ggtitle(paste0("Time plot with ϕ1 =", ph)))
  
}

As $changes from -1 to 1, the cycle “frequency” decrease with the increase of $

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1

y_MA <- numeric(100)
e_MA <- rnorm(100)

for(i in 2:100)
{
  y_MA[i] <- e_MA[i]+0.6*e_MA[i-1] 
  sim_MA <- tsibble(idx = seq_len(100), y = y_MA, index = idx)
}

sim_MA|>autoplot()+ 
  ggtitle("MA Time plot with θ1 = 0.6")

## Plot variable not specified, automatically selected `.vars = y`

Produce a time plot for the series. How does the plot change as you change θ1

for(theta in c(-1, -0.5, 0, 0.5, 1))
{
  for(i in 2:100)
  {
  y_MA[i] <- e_MA[i] + theta*e_MA[i-1] 
  sim_MA <- tsibble(idx = seq_len(100), y = y_MA, index = idx)
  }
  print(autoplot(sim_MA, y)+ ggtitle(paste0("Time plot with θ1 =", theta)))
}

The time series doesn’t change much with the change of θ1

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1

y_ARMA <- numeric(100)
e_ARMA <- rnorm(100)

for(i in 2:100)
{
  y_ARMA[i] <- 0.6*y_ARMA[i-1]+ 0.6*e_ARMA[i-1] + e_ARMA[i]
  sim_ARMA <- tsibble(idx = seq_len(100), y = y_ARMA, index = idx)
}

sim_ARMA|>autoplot(y)+ 
  ggtitle("ARMA(1,1) model time plot with ϕ1=0.6, θ1 = 0.6 and σ2=1")

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

y_AR2 <- numeric(100)

for(i in 3:100)
{
  y_AR2[i] <- -0.8*y_AR2[i-1] + 0.3*y_AR2[i-2] + e_ARMA[i]
  sim_AR2 <- tsibble(idx = seq_len(100), y = y_AR2, index = idx)
}

sim_AR2|>autoplot(y)+ 
  ggtitle("AR(2) model time plot with ϕ1=-0.8, ϕ2=0.3 and σ2=1")

Graph the latter two series and compare them. The graph found above are very different, the ARMA(1,1) model looks like a random time series without much pattern. The AR(2) model looks like a time series with an expanding magnitude.

Ex 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

library(urca)

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

fit|>gg_tsresiduals()+
  labs(title = 'Australian Air Passengers Residual Checking')

fit|>forecast(h=10)|>autoplot(aus_airpassengers)+labs(title = 'Australian Air Passengers with 10 year forecast')

This is an ARIMA(0,2,1) model.

Write the model in terms of the backshift operator. yt=−0.8963∗ϵt−1+ϵt
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2<-aus_airpassengers|>
  model(ARIMA(Passengers~pdq(0,1,0)))

fit2|>forecast(h=10)|>autoplot(aus_airpassengers)+labs(title = 'Australian Air Passengers with 10 year forecast from an ARIMA(0,1,0) model')

It is similar to the last model (ARIMA(0,2,1)). The value at year 10 is less than the previous model.

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit3a<-aus_airpassengers|>
  model(ARIMA(Passengers~1+pdq(2,1,2)))

fit3a|>forecast(h=10)|>autoplot(aus_airpassengers)+labs(title = 'Australian Air Passengers with 10 year forecast from an ARIMA(2,1,2) model')

#Without the constant
fit3b<-aus_airpassengers|>
  model(ARIMA(Passengers~0+pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] CSS里有非平穩的AR部分

fit3b|>forecast(h=10)|>autoplot(aus_airpassengers)+labs(title = 'Australian Air Passengers with 10 year forecast from an ARIMA(2,1,2) model without the constant')

## Warning in max(ids, na.rm = TRUE): max里所有的参数都不存在；回覆-Inf

## Warning in max(ids, na.rm = TRUE): max里所有的参数都不存在；回覆-Inf

## Warning: Removed 10 rows containing missing values (`()`).

The model is similar to previous one. Removing the constant bring the model to infinity.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit4<-aus_airpassengers|>
  model(ARIMA(Passengers~1+pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

fit4|>forecast(h=10)|>autoplot(aus_airpassengers)+labs(title = 'Australian Air Passengers with 10 year forecast from an ARIMA(0,2,1) model')

There is an error showing model specification induces a quadratic or higher order polynomial trend.

Ex 9.8

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

us_gdp<-global_economy|>
  filter(Country=="United States")|>
  select(GDP)

lambda_us_gdp <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_us_gdp

## [1] 0.2819443

us_gdp|>autoplot(GDP)

us_gdp|>autoplot(box_cox(GDP,lambda_gdp))

The plot is almost linear after box_cox transformation.

fit a suitable ARIMA model to the transformed data using ARIMA();

us_gdp_bc<-us_gdp|>mutate(GDP=box_cox(GDP, lambda_gdp))

fit_us_gdp<-us_gdp_bc|>
  model(ARIMA(GDP))

report(fit_us_gdp)

## Series: GDP 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.6534
## s.e.   0.1266
## 
## sigma^2 estimated as 3.881:  log likelihood=-117.16
## AIC=238.31   AICc=238.54   BIC=242.36

The best fit is ARIMA(0,2,1) model

try some other plausible models by experimenting with the orders chosen;

us_gdp_bc|>features(GDP, unitroot_ndiffs)

us_gdp_bc|>gg_tsdisplay(GDP, plot_type = 'partial')

Using the result of part b and the output of feature, try the following:

fit_us_gdp1<-us_gdp_bc|>
  model(ARIMA(GDP~pdq(2,2,1)))

report(fit_us_gdp1)

## Series: GDP 
## Model: ARIMA(2,2,1) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       0.2310  -0.0871  -0.7454
## s.e.  0.1966   0.1611   0.1575
## 
## sigma^2 estimated as 3.865:  log likelihood=-116.02
## AIC=240.05   AICc=240.83   BIC=248.15

for(p in 2:0)
  for(d in 2:0)
    for(q in 1:0)
      fit_us_gdpn<-us_gdp_bc|>model(ARIMA(1+GDP~pdq(p,d,q)))

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(1 + GDP ~ pdq(p, d, q))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

## Warning in sqrt(diag(best$var.coef)): 产生了NaNs

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(1 + GDP ~ pdq(p, d, q))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(1 + GDP ~ pdq(p, d, q))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(1 + GDP ~ pdq(p, d, q))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

      print(paste0('P=',p,'D=',d,'Q=',q))

## [1] "P=0D=0Q=0"

      report(fit_us_gdpn)

## Series: GDP 
## Model: ARIMA(0,0,0) w/ mean 
## Transformation: 1 + GDP 
## 
## Coefficients:
##       constant
##       619.1675
## s.e.   13.9025
## 
## sigma^2 estimated as 11407:  log likelihood=-352.71
## AIC=709.42   AICc=709.64   BIC=713.54

I tried to run all combinations, but there are onl valid model for ARIMA(0,2,1), ARIMA(0,0,0) and ARIMA(2,2,1). ARIMA(0,2,1) model has the smallest AICC and the best fit.

choose what you think is the best model and check the residual diagnostics;

fit_us_gdp|>gg_tsresiduals()+
  labs(title='US GDP ARIMA(0,2,1) Model')

The ACF plot resembles white noise and the residual is left skewed with the max. at one.

produce forecasts of your fitted model. Do the forecasts look reasonable?

fit_us_gdp|>forecast(h=15)|>autoplot(us_gdp_bc)+
  labs(title='US GDP Box-Cox Transformed with ARIMA(0,2,1) model')

compare the results with what you would obtain using ETS() (with no transformation).

fit_ets<-us_gdp|>model(ETS(GDP))

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

By comparing the AIC of both model, ARIMA(0,2,1) model with box-cox transformation is much better than the ETS model.