hw6

exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?
- The differences among the figures are all have different size of the height in each lag, and they indicate to have white noises since they are inside the two blue lines.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
- The critical values at different distances from the mean of zero because they ard calculated with standard deviation (1.96/sqrt(T)), where when T (time) increases, the critical value decreases. The autocorrelatons are different in each fgure when they each refer to white noise because of larger series sizes of random numbers, leading to chances of autocorrelation to decrease.

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The time series increases until around 2019, the decreases, but overall increasing trend, so non-stationary. ACF plot is decreasing and has a large height that goes beyond the blue line, even though stationary time series goes to zero quickly. PACF plot starts off with 1.00, shows its non-stationary. It should be differenced because it will help the mean be stabilized.

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

gafa_stock |>
  filter(Symbol == "AMZN") |>
  gg_tsdisplay(Close, plot_type='partial')

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
1. Turkish GDP from global_economy.
  1. Turkish GDP is both non seasonal and non stationary. The differencing once is recommended by kpss test, and is stationary.
2. Accommodation takings in the state of Tasmania from aus_accommodation.
  1. Accommodation takings in the state of Tasmania is seasonal but non stationary. The differencing is recommended only once with kpss test
3. Monthly sales from souvenirs.
  1. They are seasonal and non stationary, and wll only do differencing once according to kpss testing.

global_economy |>
  filter(Country == "Turkey") |>
  gg_tsdisplay(GDP, plot_type='partial')

lambda <- global_economy |>
  filter(Country == "Turkey") |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

global_economy |>
  filter(Country == "Turkey") |>
  features(box_cox(GDP,lambda), unitroot_ndiffs)

global_economy |>
  filter(Country == "Turkey") |>
  gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial')

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

aus_accommodation |>
  filter(State == "Tasmania") |>
  gg_tsdisplay(Takings, plot_type='partial')

lambda <-aus_accommodation |>
  filter(State == "Tasmania") |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

aus_accommodation |>
  filter(State == "Tasmania") |>
  features(box_cox(Takings,lambda), unitroot_nsdiffs)

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The data is both seasonal and non stationary, and the differencing is only suggested once, by the KPSS unit root testing. ACF and PACF is around zero, so it should be stationary.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

myseries |>
  gg_tsdisplay(Turnover, plot_type='partial', lag = 50)

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries |>
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)

myseries |>
  gg_tsdisplay(difference(box_cox(Turnover,lambda), 15), plot_type='partial', lag = 50)

## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).

9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model witho1=0.6 and p2=1. The process starts with y1=0
Produce a time plot for the series. How does the plot change as you change o1

Different o1 has different plot, like as o1 decrease, the plot becomes bigger in height, negative value makes the plotting even larger.
Write your own code to generate data from an MA(1) model with x1=0.6 and p2=1
Produce a time plot for the series. How does the plot change as you change 01

It seems to have the opposite effect with 01, the more the value is, the larger the height, and vice versa, helps the centering around the mean.

Generate data from an ARMA(1,1) model with o1=0.6, x1=0.6 and p2=1

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

set.seed(1111)
y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

Generate data from an AR(2) model with o1=-0.8 o2=0.3 and p2=1

set.seed(1111)
y <- numeric(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

Graph the latter two series and compare them.

ARMA(1,1) looks to be stationary and random. ACF plot seems to have alot of up and down and only the last value goes beyond the blue line, and PACF is the same. AR model doesn’t seem stationary, ACF is up and down as well but the size consistently decreases overtime. PACF plot has only 1 negative value.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim |>
  autoplot(y)

for(i in 2:100)
  y[i] <- 0.2*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

for(i in 2:100)
  y[i] <- -0.8*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |>
  autoplot(y)

for(i in 2:100)
  y[i] <- e[i] + 0.2 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

for(i in 2:100)
  y[i] <- e[i] + -0.8 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

for(i in 2:100)
  y[i] <- e[i] + 0 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  autoplot(y)

set.seed(1111)
y <- numeric(100)

tsibble(idx = seq_len(100), y = y, index = idx) |>
  gg_tsdisplay(y, plot_type='partial')

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_segment()`).
## Removed 20 rows containing missing values or values outside the scale range
## (`geom_segment()`).

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) |>
  gg_tsdisplay(y, plot_type='partial')

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

Residuals look to be ARMA(0,2,1)
Write the model in terms of the backshift operator.
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The model has higher forcasting in comparison, also the slope looks more slow.
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

The model is more similar other parts, and they seem to be white noise. Removing the constant makes it not stationary.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The slope is steeper after applying the model.

nextperiods <- aus_airpassengers |>
  filter(Year <= 2011) |>
  model(ARIMA(Passengers)) 
report(nextperiods)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

nextperiods |>
  gg_tsresiduals()

nextperiodss <-aus_airpassengers |>
  filter(Year <= 2011) |>
  model(ARIMA(Passengers ~ pdq(0,1,0)))

nextperiodss |>
  forecast(h=15) |>
  autoplot(aus_airpassengers)

nextperiodss |>
  gg_tsresiduals()

nextperiodsss <-aus_airpassengers |>
  filter(Year <= 2011) |>
  model(ARIMA(Passengers ~ pdq(2,1,2)))

nextperiodsss |>
  forecast(h=15) |>
  autoplot(aus_airpassengers)

nextperiodsss |>
  gg_tsresiduals()

nextperiodssss <-aus_airpassengers |>
  filter(Year <= 2011) |>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

nextperiodsssss <-aus_airpassengers |>
  filter(Year <= 2011) |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

nextperiodsssss |>
  forecast(h=10) |>
  autoplot(aus_airpassengers)

nextperiodsssss |> 
  gg_tsresiduals()

9.8

For the United States GDP series (from global_economy):
1. if necessary, find a suitable Box-Cox transformation for the data;
  
  GDP has increasing trend and is not stationary, and suitable box-cos is 0.2819443
2. fit a suitable ARIMA model to the transformed data using ARIMA();
```
ARIMA(1,1,0)
```
3. try some other plausible models by experimenting with the orders chosen;
  
  The difference is only once with unit root test. The ACF plot has a decreasing trend and only one value in PACF plot, also with ARIMA(1,2,0) having the lowest.
4. choose what you think is the best model and check the residual diagnostics;
  
  The lowest ARIMA’s residuals seems to be white noise.
5. produce forecasts of your fitted model. Do the forecasts look reasonable?
  
  The forecasts seem reasonable. The trend is has a consistent slope.
6. compare the results with what you would obtain using ETS() (with no transformation).
  
  The model has a larger AICc, so this model is worse.

global_economy |>
  filter(Code == "USA") |>
  gg_tsdisplay(GDP, plot_type='partial')

lambda <-global_economy |>
  filter(Code == "USA") |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

lambda

## [1] 0.2819443

fit <- global_economy |>
  filter(Code == "USA") |>
  model(ARIMA(box_cox(GDP, lambda))) 

report(fit)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

global_economy |>
  filter(Code == "USA") |>
  gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial')

global_economy |>
  filter(Code == "USA") |>
  features(box_cox(GDP,lambda), unitroot_ndiffs)

fit <- global_economy |>
  filter(Code == "USA") |>
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))

glance(fit)

fit |>
  select(arima120) |>
  gg_tsresiduals()

fit |>
  forecast(h=10) |>
  filter(.model=='arima120') |>
  autoplot(global_economy)

ets <- global_economy |>
  filter(Code == "USA") |>
  model(ETS(GDP))

report(ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

ets |>
  forecast(h=10) |>
  autoplot(global_economy)

hw6

Ali Ahmed

2025-03-14

9.1

9.2

9.3

9.5

9.6

9.7

9.8