data624 assignment6

library(fpp3)

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.1.6     ✔ tsibble     1.1.3
## ✔ dplyr       1.1.0     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.0
## ✔ lubridate   1.8.0     ✔ fable       0.3.2
## ✔ ggplot2     3.4.1     ✔ fabletools  0.3.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ readr   2.1.2     ✔ stringr 1.5.0
## ✔ purrr   1.0.1     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date()        masks base::date()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ tsibble::intersect()     masks lubridate::intersect(), base::intersect()
## ✖ tsibble::interval()      masks lubridate::interval()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ tsibble::setdiff()       masks lubridate::setdiff(), base::setdiff()
## ✖ tsibble::union()         masks lubridate::union(), base::union()

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(ggplot2)
library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## 
## Attaching package: 'forecast'

## The following objects are masked from 'package:fabletools':
## 
##     accuracy, forecast

library(quantmod)

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following object is masked from 'package:tsibble':
## 
##     index

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

## Loading required package: TTR

library(ggfortify)

## Registered S3 methods overwritten by 'ggfortify':
##   method                 from    
##   autoplot.Arima         forecast
##   autoplot.acf           forecast
##   autoplot.ar            forecast
##   autoplot.bats          forecast
##   autoplot.decomposed.ts forecast
##   autoplot.ets           forecast
##   autoplot.forecast      forecast
##   autoplot.stl           forecast
##   autoplot.ts            forecast
##   fitted.ar              forecast
##   fortify.ts             forecast
##   residuals.ar           forecast

library(tseries)
library(knitr)
library(latex2exp)
library(ggpubr)

## 
## Attaching package: 'ggpubr'

## The following object is masked from 'package:forecast':
## 
##     gghistogram

Q 9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

ans: a white noise series is stationary, the ACF bars of all 3 of them are inside of the dashline which indicate the point of statistical significance. This mean they are not statistically significant. However, the strong correlation is at different lag.and all plot look like Figure 2.23: Autocorrelation function for the white noise series.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

ans: it is because all each autocorrelation out of the 3 plot is close to zero, also there is no spikes are outside of the dashline which is the bounds.

Q 9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

there is no seasonal pattern, and it is neg trend in 2018 which is the end of the plot, and r value is high.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close) +
  labs(title = "AMZ Closing Stock Price")

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

the lambda is .157 and ndiff is 1.

turkey <- global_economy %>% 
  filter(Country=='Turkey') %>% 
  select(Country, GDP)

turkey

## # A tsibble: 58 x 3 [1Y]
## # Key:       Country [1]
##    Country          GDP  Year
##    <fct>          <dbl> <dbl>
##  1 Turkey  13995067818.  1960
##  2 Turkey   8022222222.  1961
##  3 Turkey   8922222222.  1962
##  4 Turkey  10355555556.  1963
##  5 Turkey  11177777778.  1964
##  6 Turkey  11944444444.  1965
##  7 Turkey  14122222222.  1966
##  8 Turkey  15666666667.  1967
##  9 Turkey  17500000000   1968
## 10 Turkey  19466666667.  1969
## # … with 48 more rows

lambda <- turkey %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.1572187

turkey  %>%
  mutate(GDP = box_cox(GDP, lambda)) %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Not transformed Turkish GDP")

global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial') +
  labs(title = "Turkish GDP with lambda ")

## Warning: Removed 1 row containing missing values (`geom_line()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).

Accommodation takings in the state of Tasmania from aus_accommodation.

Tasmania <- aus_accommodation %>% 
  filter(State=='Tasmania') %>% 
  select(State, Takings)


lambda <- Tasmania %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.001819643

Tasmania  %>%
  mutate(Takings = box_cox(Takings, lambda)) %>%
  features(Takings, unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Not transformed Tasmania Accomodation Takings")

aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(difference(box_cox(Takings,lambda), 4), plot_type='partial') +
  labs(title = "Differenced Tasmania Accomodation Takings with lambda ")

## Warning: Removed 4 rows containing missing values (`geom_line()`).

## Warning: Removed 4 rows containing missing values (`geom_point()`).

Monthly sales from souvenirs.

lambda_souvenirs <- souvenirs %>% 
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)


souvenirs %>%
  mutate(Sales = box_cox(Sales, lambda_souvenirs)) %>%
  features(Sales, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

souvenirs %>%
  gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
  labs(title = "Non-transformed Monthly Souvenir Sales")

souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales,lambda_souvenirs), 12), plot_type='partial', lag = 36) +
  labs(title = "Differenced souvenirs sales with lambda ")

## Warning: Removed 12 rows containing missing values (`geom_line()`).

## Warning: Removed 12 rows containing missing values (`geom_point()`).

9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

myseries %>%
  gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
  labs(title = "Not transformed Retail Turnover")

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State    Industry                                      nsdiffs
##   <chr>    <chr>                                           <int>
## 1 Tasmania Cafes, restaurants and takeaway food services       1

myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
  labs(title = "Differenced Tasmania Accomodation Takings with lambda ")

## Warning: Removed 12 rows containing missing values (`geom_line()`).

## Warning: Removed 12 rows containing missing values (`geom_point()`).

9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b.Produce a time plot for the series. How does the plot change as you change

for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx) 

sim2 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with phi i = 1 ")

for(i in 2:100)
  y[i] <- 2*y[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim3 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with phi i = 2 ")

for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim4 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with phi i = -0.6 ")

Write your own code to generate data from an MA(1) model with
Produce a time plot for the series. How does the plot change as you change

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim9c1 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim9c1 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with theta i = .06 ")

for(i in 2:100)
  y[i] <- e[i] + 1 * e[i-1]
sim9c2 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim9c2 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with theta i = 1 ")

for(i in 2:100)
  y[i] <- e[i] + 2 * e[i-1]

sim9c3 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim9c3 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with theta i = 2 ")

for(i in 2:100)
  y[i] <- e[i] + -.06  * e[i-1]

sim9c4 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim9c4 %>% 
  autoplot(y)+
  labs(title = "AR(1) model with theta i = -.06 ")

e. Generate data from an ARMA(1,1) model with

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]

q9e <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with

set.seed(1234)
y <- numeric(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

q9f <- tsibble(idx = seq_len(100), y = y, index = idx)

q9e %>%
  gg_tsdisplay(y, plot_type='partial')

q9f %>%
  gg_tsdisplay(y, plot_type='partial')

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

forecast is not working.

aus_airpassengers2 <- aus_airpassengers
autoplot(aus_airpassengers2)

## Plot variable not specified, automatically selected `.vars = Passengers`

fit <- aus_airpassengers2 %>%
  model(ARIMA(Passengers)) 

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

fit %>% gg_tsresiduals()

b. Write the model in terms of the backshift operator.

\((1-\phi_1B) (1-B)^2y_t = c + (1 + \theta_1B)e_t\)

\(y_t = -0.8963*e_t-1 + e_t\)

c.Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))


fit2 %>% gg_tsresiduals()

d.Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

no model is found.

fit3 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

e.Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

no model is found.

fit3 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

9.8

a the line look smooth, so it does not look like Box-Cox transformation is needed.

usgdp <- global_economy %>% 
  filter(Country=="United States")%>%
  select(Country, GDP)

usgdp %>% autoplot(GDP) +
  labs(title = "US GDP")

it looks like 0,2,2 fit the best.

fit <- usgdp %>%
  model(
    arima = ARIMA(GDP, stepwise = FALSE, approx = FALSE))

report(fit)

## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 2.615e+22:  log likelihood=-1524.08
## AIC=3054.15   AICc=3054.61   BIC=3060.23

fit 222 has the lower value of AIC, AICc and BIC which is more ideal to use.

fit222 <- usgdp %>%
  model(ARIMA(GDP ~ pdq(2,2,2)))

report(fit222)

## Series: GDP 
## Model: ARIMA(2,2,2) 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       1.3764  -0.4780  -1.9659  1.0000
## s.e.  0.1216   0.1354   0.0723  0.0719
## 
## sigma^2 estimated as 2.283e+22:  log likelihood=-1521.14
## AIC=3052.27   AICc=3053.47   BIC=3062.4

#p=1
fit122 <- usgdp %>%
  model(ARIMA(GDP ~ pdq(1,2,2)))

report(fit122)

## Series: GDP 
## Model: ARIMA(1,2,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.2053  -0.5912  -0.1928
## s.e.  0.3008   0.2886   0.2102
## 
## sigma^2 estimated as 2.646e+22:  log likelihood=-1523.86
## AIC=3055.72   AICc=3056.51   BIC=3063.82

Like we saw, fit222 has lower value of all 3 items, so it is a better model overall.

fit222 %>% gg_tsresiduals() +
  labs(title = "US GDP 10 Year Forecast",
       subtitle = "ARIMA(2,2,2)")

forecast gave me error messages

same thing, fit222 and 0,2,2 has lower value than fitets.

fitets <- usgdp %>% 
  model(ETS(GDP))

report(fitets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

data624 assignment6

Jay Lee

3/18/2023