9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman
9.1 Explain the differences among these figures. Do they all indicate that the data are white noise? Each figure is based on a different number of observations, and as the series length increases the ACF bounds (critical values) become narrower. All three figures are consistent with white noise, since none of the autocorrelations fall outside the bounds. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? The critical values sit at roughly +/- 1.96/sqrt(T), so larger sample sizes place them closer to zero; with more observations, a smaller deviation from zero is enough to count as significant autocorrelation. The sample autocorrelations also shrink toward zero as the sample size grows, because the estimates become less variable. Since each figure comes from a different random sample, the ACF patterns differ, but in every case no spike exceeds the critical values at any lag.
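A quick way to see this (a sketch, not part of the original figures) is to simulate white noise series of increasing length and compare the ACF bounds, which sit at roughly +/- 1.96/sqrt(T); the helper name and the chosen lengths are arbitrary.
set.seed(30)
wn_acf <- function(n) {
  #simulate n observations of white noise and plot the ACF with its bounds
  tsibble(idx = seq_len(n), y = rnorm(n), index = idx) |>
    ACF(y) |>
    autoplot() +
    labs(subtitle = paste0("White noise, T = ", n,
                           ", bounds ~ +/-", round(1.96 / sqrt(n), 2)))
}
wn_acf(36)    # bounds ~ +/-0.33
wn_acf(360)   # bounds ~ +/-0.10
wn_acf(1000)  # bounds ~ +/-0.06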
9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp3)
library(tidyverse)
library(readr)
library(latex2exp)
library(seasonal)
library(GGally)
library(mlbench)
library(AppliedPredictiveModeling)
library(e1071)
library(Hmisc)
library(caret)
library(dplyr)
#select relevant rows and columns and graph
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB" "GOOG"
gafa_stock |> filter(Symbol == "AMZN") |> select(Date, Close, Symbol) |> autoplot() + labs(subtitle = "Amazon Close")
## Plot variable not specified, automatically selected `.vars = Close`
amazonstock <- gafa_stock |> filter(Symbol == "AMZN") |> select(Date, Close, Symbol)
#graph ACF
amazonstock |> ACF(Close) |>
autoplot() + labs(subtitle = "Amazon closing stock price")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
#graph PACF
amazonstock |>
PACF(Close) |>
autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
#graph the difference
amazonstock |>
mutate(diff_close = difference(Close)) |>
autoplot(diff_close) + labs(title = "Difference in closing Amazon")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
The plot of Amazon's daily closing price shows the series is non-stationary: there is a clear trend, a long rise followed by a decline, rather than fluctuation around a constant mean. The ACF is large, positive and decays very slowly across the lags, and the PACF shows a very large spike at lag 1 (with a few smaller significant spikes at later lags), both classic signs of a non-stationary series. After taking a first difference between each observation and the one before it, the series resembles white noise and looks stationary.
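As a cross-check (a sketch, not run in the original), the unit-root features from feasts should point to the same conclusion: the raw closing price should fail the KPSS test, and a single first difference should be enough.
#KPSS on the raw closing price: a small p-value suggests non-stationarity
amazonstock |> features(Close, unitroot_kpss)
#number of first differences suggested (expected to be 1)
amazonstock |> features(Close, unitroot_ndiffs)
#KPSS on the differenced series: a large p-value suggests stationarity
amazonstock |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, unitroot_kpss)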
9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data:
Turkish GDP from global_economy; accommodation takings in the state of Tasmania from aus_accommodation; monthly sales from souvenirs.
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
turkeygdp <- global_economy |> dplyr::select(Country, Year, GDP) |> filter(Country == "Turkey")
turkeygdp |> autoplot()
## Plot variable not specified, automatically selected `.vars = GDP`
turkeygdp$GDP <- as.numeric(turkeygdp$GDP)
#boxturkey <- BoxCoxTrans(turkeygdp$GDP)
lambda <- turkeygdp |> #test to see if any lambda would be helpful
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
turkeygdp |>
autoplot(box_cox(GDP, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed GDP with $\\lambda$ = ",
round(lambda,2))))
The Box-Cox transformation alone is not enough, because a trend remains, so a first difference is taken next.
#turkeybox <- turkeygdp |> box_cox(GDP, lambda = lambda)
#graph the difference
turkeygdp2 <- turkeygdp |>
mutate(diff_boxGDP = difference(box_cox(GDP, lambda = lambda)))
turkeygdp |>
mutate(diff_boxGDP = difference(box_cox(GDP, lambda = lambda))) |>
autoplot(diff_boxGDP) + labs(title = "Turkey GDP Box Cox then difference")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
turkeygdp2 |>
features(diff_boxGDP, unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 0.0889 0.1
Now the data look stationary. The null hypothesis of the KPSS test is that the series is stationary; with a p-value of 0.1 (above 0.05) we fail to reject the null, so the differenced series can be treated as stationary.
#graph ACF
turkeygdp2 |> ACF(diff_boxGDP) |>
autoplot() + labs(subtitle = "TurkeyGDP box cox then difference")
#graph PACF
turkeygdp2 |>
PACF(diff_boxGDP) |>
autoplot() + labs(subtitle = "TurkeyGDP box cox then difference")
No significant autocorrelations remain in the ACF or PACF, so a Box-Cox transformation followed by one first difference is sufficient for the Turkish GDP series.
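The automated unit-root features agree (a sketch; this check was not run in the original):
#unitroot_ndiffs should suggest one first difference for the transformed GDP
turkeygdp |>
  mutate(bc_gdp = box_cox(GDP, lambda)) |>
  features(bc_gdp, unitroot_ndiffs)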
Accommodation takings in the state of Tasmania from aus_accommodation.
head(aus_accommodation)
## # A tsibble: 6 x 5 [1Q]
## # Key: State [1]
## Date State Takings Occupancy CPI
## <qtr> <chr> <dbl> <dbl> <dbl>
## 1 1998 Q1 Australian Capital Territory 24.3 65 67
## 2 1998 Q2 Australian Capital Territory 22.3 59 67.4
## 3 1998 Q3 Australian Capital Territory 22.5 58 67.5
## 4 1998 Q4 Australian Capital Territory 24.4 59 67.8
## 5 1999 Q1 Australian Capital Territory 23.7 58 67.8
## 6 1999 Q2 Australian Capital Territory 25.4 61 68.1
#select tasmania takings
tastak <- aus_accommodation |> select(Date, State, Takings) |> filter(State == "Tasmania")
tastak |> autoplot() + labs(subtitle = "Tasmania Takings")
## Plot variable not specified, automatically selected `.vars = Takings`
The data are seasonal with an upward trend, and the presence of a trend means the series is not stationary.
#boxcox it
lambda <- tastak |> #test to see if any lambda would be helpful
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
#plot with the found lambda
tastak |>
autoplot(box_cox(Takings, lambda )) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tasmania Takings with $\\lambda$ = ",
round(lambda,5))))
The small lambda evens out the variance, but the series is still not centred around a constant mean (a trend remains), so a difference is needed.
#add the box cox difference to the dataframe
tastak2 <- tastak |>
mutate(diff_boxtak = difference(box_cox(Takings, lambda = lambda)))
#plot the Box-Cox transformed and differenced series
tastak2 |>
autoplot(diff_boxtak) + labs(title = "Tasmania Takings: Box-Cox then difference")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
tastak2 |> ACF(diff_boxtak) |>
autoplot() + labs(subtitle = "Tasmania Takings box cox then difference")
#graph PACF
tastak2 |>
PACF(diff_boxtak) |>
autoplot() + labs(subtitle = "TurkeyGDP box cox then difference")
tastak2 |>
features(diff_boxtak, unitroot_kpss)
## # A tibble: 1 × 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 0.258 0.1
The transformed and differenced series is centred around zero, but the ACF still shows strong autocorrelation at the seasonal lags. The KPSS test fails to reject the null hypothesis, suggesting the series is stationary; nevertheless one more difference is attempted below to see whether it helps.
head(tastak2)
## # A tsibble: 6 x 4 [1Q]
## # Key: State [1]
## Date State Takings diff_boxtak
## <qtr> <chr> <dbl> <dbl>
## 1 1998 Q1 Tasmania 28.7 NA
## 2 1998 Q2 Tasmania 19.0 -0.414
## 3 1998 Q3 Tasmania 16.1 -0.167
## 4 1998 Q4 Tasmania 25.9 0.477
## 5 1999 Q1 Tasmania 28.4 0.0915
## 6 1999 Q2 Tasmania 20.1 -0.345
#drop the leading NA created by the first difference
tastak2 <- tastak2 |>
filter(!is.na(diff_boxtak))
tastak3 <- tastak2 |>
mutate(diff_boxtak2 = difference(diff_boxtak))
#plot the twice-differenced series
tastak3 |>
autoplot(diff_boxtak2) + labs(title = "Tasmania Takings: Box-Cox, difference order = 2")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
tastak3 |> ACF(diff_boxtak2) |>
autoplot() + labs(subtitle = "Tasmania Takings box cox difference = 2")
#graph PACF
tastak3 |>
PACF(diff_boxtak2) |>
autoplot() + labs(subtitle = "Tasmania Takings box cox then difference = 2")
tastak3 |>
features(diff_boxtak2, unitroot_kpss)
## # A tibble: 1 × 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 0.444 0.0581
Taking an additional difference does not help (the KPSS p-value actually falls), so it is best to stick with the single first difference after the Box-Cox transformation. Given the strong seasonal autocorrelation, a seasonal difference may be the more appropriate next step; a sketch of that check follows.
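Because the takings are quarterly and strongly seasonal, a seasonal difference is worth checking; this is a sketch (not run in the original), using lag = 4 for quarterly data.
#number of seasonal differences suggested for the transformed series
tastak |>
  mutate(bc_tak = box_cox(Takings, lambda)) |>
  features(bc_tak, unitroot_nsdiffs)
#ordinary differences still needed after one seasonal difference
tastak |>
  mutate(sdiff_tak = difference(box_cox(Takings, lambda), lag = 4)) |>
  features(sdiff_tak, unitroot_ndiffs)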
Monthly sales from souvenirs.
head(souvenirs)
## # A tsibble: 6 x 2 [1M]
## Month Sales
## <mth> <dbl>
## 1 1987 Jan 1665.
## 2 1987 Feb 2398.
## 3 1987 Mar 2841.
## 4 1987 Apr 3547.
## 5 1987 May 3753.
## 6 1987 Jun 3715.
#plot souvenir sales
souvenirs |> autoplot() + labs(subtitle = "souvenir sales")
## Plot variable not specified, automatically selected `.vars = Sales`
#boxcox it
lambda <- souvenirs |> #test to see if any lambda would be helpful
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
#plot with the found lambda
souvenirs |>
autoplot(box_cox(Sales, lambda )) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Souvenir sales with $\\lambda$ = ",
round(lambda,5))))
#add the box cox difference to the dataframe
souvenirs2 <- souvenirs |>
mutate(diff_boxsale = difference(box_cox(Sales, lambda = lambda)))
#plot the Box-Cox transformed and differenced series
souvenirs2 |>
autoplot(diff_boxsale) + labs(title = "souvenir Box Cox then difference")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
souvenirs2 |> ACF(diff_boxsale) |>
autoplot() + labs(subtitle = "souvenir box cox then difference")
#graph PACF
souvenirs2 |>
PACF(diff_boxsale) |>
autoplot() + labs(subtitle = "souvenir box cox then difference")
souvenirs2 |>
features(diff_boxsale, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0631 0.1
#attempting second difference
#remove any nas
souvenirs2 <- souvenirs2 |>
filter(!is.na(diff_boxsale))
souvenirs3 <- souvenirs2 |>
mutate(diff_boxsale2 = difference(diff_boxsale))
#plot the twice-differenced series
souvenirs3 |>
autoplot(diff_boxsale2) + labs(title = "souvenir Box Cox difference = 2")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
souvenirs3 |> ACF(diff_boxsale2) |>
autoplot() + labs(subtitle = "souvenirbox cox difference = 2")
#graph PACF
souvenirs3 |>
PACF(diff_boxsale2) |> autoplot() + labs(subtitle = "souvenir box cox then difference = 2")
souvenirs3 |>
features(diff_boxsale2, unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0419 0.1
The souvenir series has an upward trend with increasing spread. The Box-Cox transformation evens out the spread, and differencing centres the series around zero. The PACF shows fewer significant autocorrelations after the first difference than after the second, and the KPSS p-value for the first difference is already above 0.05, so one first difference after the Box-Cox transformation is preferred.
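Souvenir sales are monthly and strongly seasonal, so the same seasonal-difference check applies (a sketch, using lag = 12 for monthly data; not run in the original).
#number of seasonal differences suggested for the transformed sales
souvenirs |>
  mutate(bc_sales = box_cox(Sales, lambda)) |>
  features(bc_sales, unitroot_nsdiffs)
#ordinary differences still needed after one seasonal difference
souvenirs |>
  mutate(sdiff_sales = difference(box_cox(Sales, lambda), lag = 12)) |>
  features(sdiff_sales, unitroot_ndiffs)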
9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(987654321)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |> autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Other recreational goods retailing A3349415T 1982 Apr 15.6
## 2 Victoria Other recreational goods retailing A3349415T 1982 May 15.8
## 3 Victoria Other recreational goods retailing A3349415T 1982 Jun 15.2
## 4 Victoria Other recreational goods retailing A3349415T 1982 Jul 15.2
## 5 Victoria Other recreational goods retailing A3349415T 1982 Aug 14.5
## 6 Victoria Other recreational goods retailing A3349415T 1982 Sep 15.1
A Box-Cox transformation is applied first to even out the spread of the data.
#boxcox it
lambda <- myseries |> #test to see if any lambda would be helpful
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
#plot with the found lambda
myseries |>
autoplot(box_cox(Turnover, lambda )) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Turnover with $\\lambda$ = ",
round(lambda,5))))
The spread of the data is now more even; the series still needs to be differenced so that it is centred around a constant mean.
#add the box cox difference to the dataframe
myseries2 <- myseries |>
mutate(diff_boxturn = difference(box_cox(Turnover, lambda = lambda)))
#plot the Box-Cox transformed and differenced series
myseries2 |>
autoplot(diff_boxturn) + labs(title = "Turnover: Box-Cox then difference")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
myseries2 |> ACF(diff_boxturn) |>
autoplot() + labs(subtitle = "turnover box cox then difference")
#graph PACF
myseries2 |>
PACF(diff_boxturn) |>
autoplot() + labs(subtitle = "turnover box cox then difference")
#KPSS test: the null hypothesis is that the series is stationary
myseries2 |>
features(diff_boxturn, unitroot_kpss)
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Other recreational goods retailing 0.0418 0.1
myseries2 |> features(diff_boxturn, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Victoria Other recreational goods retailing 1
A first-order difference of the Box-Cox transformed series already looks stationary, and the KPSS p-value of 0.1 (above 0.05) supports this. Note that unitroot_nsdiffs applied to the differenced series still suggests one seasonal difference. A second-order difference is checked below for comparison.
#difference the difference
myseries3 <- myseries2 |>
mutate(diff_boxturn2 = difference(diff_boxturn))
#plot the twice-differenced series
myseries3 |>
autoplot(diff_boxturn2) + labs(title = "turnover Box Cox difference = 2")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
myseries3 |> ACF(diff_boxturn2) |>
autoplot() + labs(subtitle = " turnover box cox difference = 2")
#graph PACF
myseries3 |>
PACF(diff_boxturn2) |> autoplot() + labs(subtitle = "turnover box cox then difference = 2")
myseries3 |>
features(diff_boxturn2, unitroot_kpss)
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Victoria Other recreational goods retailing 0.0148 0.1
The first- and second-order differences look very similar, so I would stick with the first-order difference. Because unitroot_nsdiffs still suggests one seasonal difference, a seasonal difference is likely a better next step than additional ordinary differencing; a sketch of that check follows.
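Since unitroot_nsdiffs suggests one seasonal difference, a natural follow-up (sketched here, not run in the original) is to take the seasonal difference first, then at most one ordinary difference, and re-check stationarity.
#seasonal difference (lag 12) followed by a first difference, then KPSS
myseries |>
  mutate(d_turn = difference(difference(box_cox(Turnover, lambda), lag = 12))) |>
  features(d_turn, unitroot_kpss)
#ordinary differences still needed after only the seasonal difference
myseries |>
  mutate(sdiff_turn = difference(box_cox(Turnover, lambda), lag = 12)) |>
  features(sdiff_turn, unitroot_ndiffs)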
9.6 Simulate and plot some data from simple ARIMA models.
a. Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1. The process starts with y1 = 0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
b. Produce a time plot for the series. How does the plot change as you change phi1?
c. Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1.
d. Produce a time plot for the series. How does the plot change as you change theta1?
e. Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1.
f. Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)
g. Graph the latter two series and compare them.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |> autoplot() + labs(subtitle = "phi 0.6")
## Plot variable not specified, automatically selected `.vars = y`
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.1*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |> autoplot() + labs(subtitle = "phi 0.1")
## Plot variable not specified, automatically selected `.vars = y`
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |> autoplot() + labs(subtitle = "phi 1")
## Plot variable not specified, automatically selected `.vars = y`
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 2*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |> autoplot() + labs(subtitle = "phi 2")
## Plot variable not specified, automatically selected `.vars = y`
How does the plot change as you change phi1? As phi1 increases, the series wanders further from its mean and shows longer, trend-like excursions, so it looks less stationary. For stationarity |phi1| must be less than 1: with phi1 = 1 the series becomes a random walk, and with phi1 = 2 it diverges. A side-by-side comparison of several phi1 values is sketched below.
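To compare several values of phi1 directly, the simulation above can be wrapped in a small helper (a sketch, not part of the assignment code; the function name and the chosen phi1 values are arbitrary).
#simulate an AR(1) with a given phi using the same errors each time
simulate_ar1 <- function(phi, n = 100) {
  set.seed(1)                     # shared errors, so differences come from phi alone
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) y[i] <- phi * y[i - 1] + e[i]
  tibble(idx = seq_len(n), phi = paste0("phi = ", phi), y = y)
}
purrr::map_dfr(c(0.1, 0.6, 0.95), simulate_ar1) |>
  as_tsibble(index = idx, key = phi) |>
  autoplot(y) +
  labs(title = "AR(1) simulations with shared errors, varying phi")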
c. Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1.
set.seed(6)
#MA uses theta * e(i-1) + e(i) instead of theta* y(i-1) for AR
y <- numeric(100)
e <- rnorm(100, sd = 1)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
#attempting to fit a MA(1) model to the simulated MA(1)
fit <- sim |>
model(
ma01 = ARIMA(y ~ pdq(0,0,1) )
)
fitted_data <- fit %>%
augment() %>%
mutate(idx = sim$idx)
#
fitted_data |>
ggplot(aes(x = idx, y = .fitted)) + # fitted is from the model
geom_line() +
geom_line(data = sim, aes(x = idx, y = y), color = "blue") + #colour set outside aes() so the simulated series is actually drawn in blue
labs(title = "fit ARIMA pdq(0,0,1) model for MA(1) ")
An MA(1) model (ARIMA(0,0,1)) was fitted to the simulated data and its fitted values are plotted over the series. d. Produce a time plot for the series. How does the plot change as you change theta1?
#copy from above, changing theta1 to 0.9
set.seed(6)
#copied the simulated
y <- numeric(100)
e <- rnorm(100, sd = 1)
for(i in 2:100)
y[i] <- .9*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
head(sim)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 -0.387
## 3 3 0.302
## 4 4 2.51
## 5 5 1.58
## 6 6 0.390
#then fit a model to the simulated
fit <- sim |>
model(
ma01 = ARIMA(y ~ pdq(0,0,1) )
)
fitted_data <- fit %>%
augment() %>%
mutate(idx = sim$idx)
#
fitted_data |>
ggplot(aes(x = idx, y = .fitted)) + # fitted is from the model
geom_line() +
geom_line(data = sim, aes(x=idx,y=y), color = "blue")+ #colour set outside aes()
labs(title = "fit ARIMA pdq(0,0,1) model for MA(1), theta1 = 0.9")
#changing theta1 to 0.2
set.seed(6)
#copied the simulated
y <- numeric(100)
e <- rnorm(100, sd = 1)
for(i in 2:100)
y[i] <- .2*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
head(sim)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 -0.576
## 3 3 0.743
## 4 4 1.90
## 5 5 0.370
## 6 6 0.373
#then fit a model to the simulated
fit <- sim |>
model(
ma01 = ARIMA(y ~ pdq(0,0,1) )
)
fitted_data <- fit %>%
augment() %>%
mutate(idx = sim$idx)
#
fitted_data |>
ggplot(aes(x = idx, y = .fitted)) + # fitted is from the model
geom_line() +
geom_line(data = sim, aes(x=idx,y=y), color = "blue")+ #simulated series in blue
labs(title = "fit ARIMA pdq(0,0,1) model for MA(1), theta1 = 0.2")
As theta1 decreases, the spread of the series decreases and it stays closer to zero, because the lagged error term carries a smaller weight and so has less impact.
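One additional check (a sketch, not in the original): for an MA(1) process the ACF should cut off after lag 1, with the lag-1 autocorrelation near theta1/(1 + theta1^2), about 0.44 when theta1 = 0.6.
set.seed(6)
n <- 200                          # a longer series makes the sample ACF clearer
e <- rnorm(n, sd = 1)
y <- numeric(n)
for (i in 2:n) y[i] <- 0.6 * e[i - 1] + e[i]
tsibble(idx = seq_len(n), y = y, index = idx) |>
  ACF(y) |>
  autoplot() +
  labs(subtitle = "Simulated MA(1), theta1 = 0.6: ACF should cut off after lag 1")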
e. Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1.
set.seed(6)
y <- numeric(100)
e <- rnorm(100, sd = 1)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 |> autoplot(y)
6f. Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)
#AR(2): y(t) = phi1*y(t-1) + phi2*y(t-2) + e(t)
set.seed(6)
y <- numeric(100)
e <- rnorm(100, sd = 1)
for(i in 3:100)
y[i] <- e[i] - 0.8 * y[i-1] + 0.3 * y[i-2]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim3 |> autoplot(y)
The series does indeed look non-stationary.
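The non-stationarity can also be confirmed algebraically (a sketch, not in the original): an AR(2) is stationary only if all roots of 1 - phi1*B - phi2*B^2 lie outside the unit circle, and with phi1 = -0.8, phi2 = 0.3 one root falls inside it.
#roots of 1 - phi1*B - phi2*B^2 with phi1 = -0.8, phi2 = 0.3
Mod(polyroot(c(1, 0.8, -0.3)))
#one modulus is about 0.93 (< 1), confirming the process is non-stationary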
ggplot() +
geom_line(data = sim2 |> filter(idx<50), aes(x = idx, y = y, color = "ARMA(1,1)")) +
geom_line(data = sim3|> filter(idx<50), aes(x = idx, y = y, color = "AR(2)")) +
labs(title = "ARMA(1,1) vs AR(2)", color = "Model") +
scale_color_manual(values = c("ARMA(1,1)" = "blue", "AR(2)" = "red"))
The range of the AR(2) series grows over time, while the ARMA(1,1) series has a much smaller spread. The AR(2) series also settles into a very regular alternating oscillation whose amplitude keeps increasing, which is the non-stationary behaviour these parameters produce.
9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
head(aus_airpassengers)
## # A tsibble: 6 x 2 [1Y]
## Year Passengers
## <dbl> <dbl>
## 1 1970 7.32
## 2 1971 7.33
## 3 1972 7.80
## 4 1973 9.38
## 5 1974 10.7
## 6 1975 11.1
#select the relevant years
air <- aus_airpassengers |> filter(Year<2012) |> filter(Year>1969)
aus_airpassengers |> autoplot()
## Plot variable not specified, automatically selected `.vars = Passengers`
#fit a model
air_fit <- air |>
model(ARIMA(Passengers, stepwise=FALSE))
air_fit |> glance()
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(Passengers, stepwise… 4.67 -87.8 180. 180. 183. <cpl> <cpl>
air_fit |> report()
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
# dof = p + q
#augment to see features
air_fit |> augment() |>
features(.innov, ljung_box, lag = 1, dof = 1)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ARIMA(Passengers, stepwise = FALSE) 2.38 0
#check residuals
air_fit |>
gg_tsresiduals()
#look at forecast
air_fit |>
forecast(h=10) |>
autoplot(aus_airpassengers)
The automatically selected ARIMA(0,2,1) model has residuals that resemble white noise: no autocorrelations fall outside the critical values, and the histogram of the residuals is roughly Gaussian.
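With lag = 1 and dof = 1, the Ljung-Box test above has zero degrees of freedom, which is why its p-value is reported as 0; re-running it with a longer lag (a sketch, output not shown) gives a usable test on the same residuals.
#Ljung-Box over 10 lags; dof = 1 accounts for the single estimated MA coefficient
air_fit |> augment() |>
  features(.innov, ljung_box, lag = 10, dof = 1)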
b. Write the model in terms of the backshift operator. The selected model is ARIMA(0,2,1), so p = 0, d = 2 and q = 1. The second-order difference is (yt - y(t-1)) - (y(t-1) - y(t-2)) = (1 - 2B + B^2)yt = (1 - B)^2 yt, and the MA(1) part is et + theta1*e(t-1) = (1 + theta1*B)et. Putting the two together gives (1 - B)^2 yt = (1 + theta1*B)et, with theta1 estimated as -0.88.
c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
#comparing the automatically found model (0,2,1) and the 0,1,0
allfit <- air |>
model(
arima021 = ARIMA(Passengers, stepwise=FALSE),
arima010 = ARIMA(Passengers ~ pdq(0,1,0) )
#, arima212 = ARIMA(Passengers ~ pdq(2,1,2) )
# , drift1 = NAIVE(Passengers~ drift())
)
#forecast for both 021 and 010
forecast(allfit, h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Air Passengers",
y="Passengers")
#021 and 212 comparison
allfit2 <- air |>
model(
arima021 = ARIMA(Passengers, stepwise=FALSE),
# arima010 = ARIMA(Passengers ~ pdq(0,1,0) ) ,
arima212 = ARIMA(Passengers ~ pdq(2,1,2) )
# , drift1 = NAIVE(Passengers~ drift())
)
forecast(allfit2, h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Air Passengers",
y="Passengers")
#setting y ~ 0 removes the constant
fitrm <- air |>
model(
arima212 = ARIMA(Passengers ~ 1+ pdq(2,1,2) ),
arima212000 = ARIMA(Passengers ~ 0+ pdq(2,1,2) )
)
## Warning: 1 error encountered for arima212000
## [1] non-stationary AR part from CSS
forecast(fitrm, h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Air Passengers constant removed",
y="Passengers")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
The ARIMA(0,1,0) model with drift produces lower forecasts than the ARIMA(0,2,1) model, since it projects the average historical increment rather than the more recent, steeper growth. The ARIMA(0,2,1) and ARIMA(2,1,2) forecasts match each other very closely compared with the (0,1,0). Removing the constant from the ARIMA(2,1,2) model results in an error (a non-stationary AR part), so that specification produces no forecasts.
9.8 For the United States GDP series (from global_economy):
if necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA(); try some other plausible models by experimenting with the orders chosen; choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? compare the results with what you would obtain using ETS() (with no transformation).
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
#unique(global_economy$Country) was checked to find the exact country name ('United States')
usgdp <- global_economy |> filter(Country == 'United States') |> select(Country, Year, GDP)
usgdp |> autoplot()
## Plot variable not specified, automatically selected `.vars = GDP`
Initially, a Box-Cox transformation does not look essential: the series shows no seasonality or cycles at this time scale, so there is no obvious changing variance to stabilise, and the spread appears confined to a smooth curve. A Box-Cox transformation was tested anyway.
lambda <- usgdp |> #test to see if any lambda would be helpful
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
usgdp |>
autoplot(box_cox(GDP, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed GDP with $\\lambda$ = ",
round(lambda,2))))
usgdp <- usgdp |> mutate(boxgdp = box_cox(usgdp$GDP, lambda = lambda))
The Box-Cox transformation brings the series much closer to a straight line.
#fit a model
us_fit <- usgdp |>
model(ARIMA(boxgdp, stepwise=FALSE, approx = FALSE))
us_fit |> glance()
## # A tibble: 1 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States ARIMA(boxgdp… 5479. -325. 657. 657. 663. <cpl> <cpl>
us_fit |> report()
## Series: boxgdp
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
# dof = p + q
#augment to see features
us_fit |> augment() |>
features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States ARIMA(boxgdp, stepwise = FALSE, approx = FALS… 3.81 0.923
#check residuals
us_fit |>
gg_tsresiduals()
#look at forecast
us_fit |>
forecast(h=10) |>
autoplot(data = usgdp)
No autocorrelations exceed the critical values in the residual ACF, the residual histogram is close to (though not perfectly) Gaussian, and the Ljung-Box test does not reject the hypothesis that the residuals are uncorrelated.
usgdp |> select(Year, boxgdp)|> gg_lag()
## Plot variable not specified, automatically selected `y = boxgdp`
usgdp |>select( boxgdp) |> gg_tsdisplay(boxgdp, plot_type = 'partial')
#ARIMA boxcox (0,0,14)
usgdp |>select( boxgdp) |> gg_tsdisplay(difference(boxgdp), plot_type = 'partial')
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
# for the differenced data, the PACF suggests ARIMA(1,1,0) and the ACF suggests ARIMA(0,1,1)
An ARIMA(0,1,1), suggested by the ACF of the differenced Box-Cox transformed series, will be tried alongside the ARIMA(1,1,0) suggested by the PACF, which matches the automatically selected model.
us_fit2 <- usgdp |>
model( arima110 = ARIMA(boxgdp, stepwise=FALSE, approx = FALSE),
arima011 = ARIMA(boxgdp ~ pdq(p=0,d=1,q=1)))
us_fit2 |> glance()
## # A tibble: 2 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States arima110 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 United States arima011 5689. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
us_fit2 |> report()
## Warning in report.mdl_df(us_fit2): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 2 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States arima110 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 United States arima011 5689. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
# dof = p + q
#augment to see features
us_fit2 |> augment() |>
features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 2 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States arima011 6.22 0.717
## 2 United States arima110 3.81 0.923
#check residuals
us_fit2 |> select(arima011) |>
gg_tsresiduals()
#look at forecast
forecast(us_fit2, h=10) |> autoplot(data = usgdp)
Compared with the ARIMA(0,1,1), the ARIMA(1,1,0) has slightly wider prediction intervals, but its Ljung-Box p-value is higher (its residuals look more like white noise) and its AIC/AICc are slightly lower. The two models are nevertheless very comparable and produce similar forecasts. Of the two, the ARIMA(1,1,0) would be selected, and both sets of forecasts look reasonable.
etsfit <- usgdp |> model(ETS(GDP))
forecast(etsfit, h =10) |> autoplot(usgdp)
etsfit |> glance()
## # A tibble: 1 × 10
## Country .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United States ETS(GD… 6.78e-4 -1590. 3191. 3192. 3201. 2.79e22 1.20e23 0.0191
etsfit |> report()
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
#augment to see features
etsfit |> augment() |>
features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States ETS(GDP) 3.82 0.923
#check residuals
etsfit |>
gg_tsresiduals()
The ETS model performs well; at first glance it has wider prediction intervals than the ARIMA models fitted to the Box-Cox transformed data. The ETS residuals behave like white noise, so it may be an accurate model. The ARIMA models offer tighter predictions, while the ETS model's wider intervals are more likely to contain the true future values.
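A more direct comparison (a sketch, not run in the original) is time-series cross-validation, which puts both models on the same GDP scale by letting fable back-transform the Box-Cox fit; the .init, .step and h values below are arbitrary choices.
#stretching windows over the yearly series, refitting both models on each window
usgdp_cv <- usgdp |> stretch_tsibble(.init = 30, .step = 5)
usgdp_cv |>
  model(
    arima110 = ARIMA(box_cox(GDP, lambda) ~ pdq(1, 1, 0)),
    ets      = ETS(GDP)
  ) |>
  forecast(h = 5) |>
  accuracy(usgdp)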