9.1

A:

Each subsequent lag plot appears as though a round of differencing was applied. Though they are simply representations of the distribution of white noise for increasingly larger sample sizes, they resemble what a lag plot would look like for a dataset that underwent differencing. The spread in the acf values is tighter distributed above and below the axis in the following two plots. The first plot shows quite a bit of noise, but the third plot shows very limited noise.

B:

The critical values are at different distances because there is a lower probability for impactful outliers as the sample size increases. As the confidence level increases, the distribution of probabilities will decrease in magnitude.

The autocorrelations are different in each figure because the confidence level is higher for larger sample sizes, which leads to a tighter confidence interval.

9.2

amzn <- subset(gafa_stock,Symbol == "AMZN")

as_tsibble(amzn,index = Date,key = Symbol)

autoplot(amzn,Close)

amzn %>%
  gg_tsdisplay(Close,plot_type = "partial")

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

First, the data is non-stationary as evidenced by the plot of the Closing prices. Because there is a substantial difference between the closing prices in 2016 to the closing prices in 2019, at least one round of differencing is necessary to flatten the time series plot.

This is further shown in the ACF, as each lag is exceedingly outside the confidence interval. The PACF shows that the first lag is an outlier, so it may only require one round of differencing.

First Order Differencing

The first order differencing shows a much flatter time series plot, and the ACF contains very few outliers. The PACF falls within the confidence interval for the first six lags.

amzn_diff <- amzn %>%
  mutate(Close_d1 = c(0,diff(Close)))

amzn_diff %>%
  gg_tsdisplay(Close_d1,plot_type = "partial")

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

Second Order Differencing

The second order differencing overcorrected, so only one round of differencing is required.

amzn_diff <- amzn_diff %>%
  mutate(Close_d2 = c(0,diff(Close_d1)))

amzn_diff %>%
  gg_tsdisplay(Close_d2,plot_type = "partial")

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

9.3

A

turkey <- subset(global_economy,Country=="Turkey")

autoplot(turkey,GDP)

lambda <- turkey %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

transform <- box_cox(turkey$GDP,lambda)

ggarrange(autoplot(turkey,GDP)+labs(title="Time Series of Turkey's GDP")+theme(plot.title = element_text(hjust = 0.5)),autoplot(turkey,transform)+labs(title="Box-Cox Transformed GDP Series")+theme(plot.title = element_text(hjust = 0.5)),ncol=1)

turkey_diff <- turkey %>%
  mutate(GDP_d1 = c(0,diff(transform)))


turkey_diff <- turkey_diff %>%
  mutate(GDP_d2 = c(0,diff(GDP_d1)))


ggarrange(autoplot(turkey,transform)+labs(title="Box-Cox Transformed GDP Series")+theme(plot.title = element_text(hjust = 0.5)),autoplot(turkey_diff,GDP_d1)+labs(title="First Order Differencing")+theme(plot.title = element_text(hjust = 0.5)),autoplot(turkey_diff,GDP_d2)+labs(title="Second Order Differencing")+theme(plot.title = element_text(hjust = 0.5)),ncol = 1)

Response:

After the first order differencing, the successive rounds of differencing lead to an overfit, so only one round of differencing is needed with a Box-Cox Transformation with a lambda of 0.1572.

B

tasmania <- subset(aus_accommodation,State=="Tasmania")


autoplot(tasmania,Takings)

lambda <- tasmania %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

transform <- box_cox(tasmania$Takings,lambda)

tasmania$tf <- transform

ggarrange(autoplot(tasmania,Takings)+labs(title="Time Series of Tasmania's Accommodation Takings")+theme(plot.title = element_text(hjust = 0.5)),autoplot(tasmania,transform)+labs(title="Box-Cox Transformed Takings Series")+theme(plot.title = element_text(hjust = 0.5)),ncol=1)

tasmania %>% 
  gg_tsdisplay(difference(Takings,4),plot_type='partial',lag = 12)+
  labs(title = "Seasonally Differenced Takings Series",y="")

## Warning: Removed 4 row(s) containing missing values (geom_path).

## Warning: Removed 4 rows containing missing values (geom_point).

tasmania %>% 
  gg_tsdisplay(difference(Takings,4) %>% difference(),plot_type='partial',lag = 12)+
  labs(title = "Second-Order Seasonally Differenced Takings Series",y="")

## Warning: Removed 5 row(s) containing missing values (geom_path).

## Warning: Removed 5 rows containing missing values (geom_point).

tasmania %>% 
  gg_tsdisplay(difference(tf,4),plot_type='partial',lag = 12)+
  labs(title = "Seasonally Differenced Box-Cox Transformed Takings Series",y="")

## Warning: Removed 4 row(s) containing missing values (geom_path).

## Warning: Removed 4 rows containing missing values (geom_point).

tasmania %>% 
  gg_tsdisplay(difference(Takings,4) %>% difference(),plot_type='partial',lag = 12)+
  labs(title = "Second-Order Seasonally Differenced Box-Cox Transformed Takings Series",y="")

## Warning: Removed 5 row(s) containing missing values (geom_path).

## Warning: Removed 5 rows containing missing values (geom_point).

Response:

The seasonally differenced series required two rounds of differencing to be made stationary. Thus, two rounds of seasonal differencing is needed for the Tasmania dataset. Similarly, the Box-Cox transformed (lambda 0.0018) Takings series needed two rounds of differencing to be made stationary.

C

souvenirs <- souvenirs

autoplot(souvenirs,Sales)

lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

transform <- box_cox(souvenirs$Sales,lambda)

souvenirs$tf <- transform

ggarrange(autoplot(souvenirs,Sales)+labs(title="Time Series of Monthly Souvenir Sales")+theme(plot.title = element_text(hjust = 0.5)),autoplot(souvenirs,transform)+labs(title="Box-Cox Transformed Souvenir Series")+theme(plot.title = element_text(hjust = 0.5)),ncol=1)

souvenirs %>% 
  gg_tsdisplay(difference(Sales,12),plot_type='partial',lag = 36)+
  labs(title = "Seasonally Differenced Souvenirs Series",y="")

## Warning: Removed 12 row(s) containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_point).

souvenirs %>% 
  gg_tsdisplay(difference(Sales,12) %>% difference(),plot_type='partial',lag = 36)+
  labs(title = "Second-Order Seasonally Differenced Souvenirs Series",y="")

## Warning: Removed 13 row(s) containing missing values (geom_path).

## Warning: Removed 13 rows containing missing values (geom_point).

souvenirs %>% 
  gg_tsdisplay(difference(tf,12),plot_type='partial',lag = 36)+
  labs(title = "Seasonally Differenced Box-Cox Transformed Souvenirs Series",y="")

## Warning: Removed 12 row(s) containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_point).

souvenirs %>% 
  gg_tsdisplay(difference(tf,12) %>% difference(),plot_type='partial',lag = 36)+
  labs(title = "Second-Order Seasonally Differenced Box-Cox Transformed Souvenirs Series",y="")

## Warning: Removed 13 row(s) containing missing values (geom_path).

## Warning: Removed 13 rows containing missing values (geom_point).

### Response

The greatest stationarity is achieved when the Box-Cox transformation with a lambda of 0.0021 undergoes two rounds of differencing.

9.5

set.seed(34)

aus_retail_series <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(aus_retail_series)

## Plot variable not specified, automatically selected `.vars = Turnover`

#gg_season(aus_retail_series)
#gg_subseries(aus_retail_series)
#gg_lag(aus_retail_series)

lambda <- aus_retail_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

transform <- box_cox(aus_retail_series$Turnover,lambda)

aus_retail_series$tf <- transform

aus_retail_series %>% 
  gg_tsdisplay(difference(tf,12),plot_type='partial',lag = 36)+
  labs(title = "Seasonally Differenced Box-Cox Transformed Austrailian Retail Series",y="")

## Warning: Removed 12 row(s) containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_point).

aus_retail_series %>% 
  gg_tsdisplay(difference(tf,12) %>% difference(),plot_type='partial',lag = 36)+
  labs(title = "Second-Order Seasonally Differenced Box-Cox Transformed Austrailian Retail Series",y="")

## Warning: Removed 13 row(s) containing missing values (geom_path).

## Warning: Removed 13 rows containing missing values (geom_point).

For the Australian Retail Series, a Box-Cox transformation with a lambda of 0.1705 requires two rounds of seasonal differencing to become stationary.

9.6

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

#glimpse(sim)

#autoplot(sim)

for(i in 2:100)
  y[i] <- 0.8*y[i-1] + e[i]
sim_08 <- tsibble(idx = seq_len(100), y = y, index = idx)
#autoplot(sim_08)

for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
sim_01 <- tsibble(idx = seq_len(100), y = y, index = idx)
#autoplot(sim_01)

for(i in 2:100)
  y[i] <- -0.8*y[i-1] + e[i]
sim_neg08 <- tsibble(idx = seq_len(100), y = y, index = idx)
#autoplot(sim_neg08)

ggarrange(autoplot(sim)+labs(title="Phi = 0.6")+theme(plot.title = element_text(hjust = 0.5)),autoplot(sim_08)+labs(title="Phi = 0.8")+theme(plot.title = element_text(hjust = 0.5)),autoplot(sim_01)+labs(title="Phi = 0.1")+theme(plot.title = element_text(hjust = 0.5)),autoplot(sim_neg08)+labs(title="Phi = -0.8")+theme(plot.title = element_text(hjust = 0.5)))

## Plot variable not specified, automatically selected `.vars = y`
## Plot variable not specified, automatically selected `.vars = y`
## Plot variable not specified, automatically selected `.vars = y`
## Plot variable not specified, automatically selected `.vars = y`

Response for Part B:

As Phi increases, the noise level decreases. With a negative phi value, the time series plot is very noisy and offers limited insights.

C

theta <- 0.6
y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
#Big change from ARIMA to MA is that e is multiplied by theta rather than y
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

#glimpse(sim)

#autoplot(sim)

theta <- 0.8
for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
sim_08 <- tsibble(idx = seq_len(100), y = y, index = idx)
#autoplot(sim_08)

theta <- 0.1
for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
sim_01 <- tsibble(idx = seq_len(100), y = y, index = idx)
#autoplot(sim_01)

theta <- -0.8
for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
sim_neg08 <- tsibble(idx = seq_len(100), y = y, index = idx)
#autoplot(sim_neg08)

ggarrange(autoplot(sim)+labs(title="Theta = 0.6")+theme(plot.title = element_text(hjust = 0.5)),autoplot(sim_08)+labs(title="Theta = 0.8")+theme(plot.title = element_text(hjust = 0.5)),autoplot(sim_01)+labs(title="Theta = 0.1")+theme(plot.title = element_text(hjust = 0.5)),autoplot(sim_neg08)+labs(title="Theta = -0.8")+theme(plot.title = element_text(hjust = 0.5)))

## Plot variable not specified, automatically selected `.vars = y`
## Plot variable not specified, automatically selected `.vars = y`
## Plot variable not specified, automatically selected `.vars = y`
## Plot variable not specified, automatically selected `.vars = y`

Similarly, the MA model reveals a much more noisy plot when the value of \(\theta\) decreases.

E

theta = 0.6
phi = 0.6
for(i in 2:100)
  y[i] <- e[i] + theta*e[i-1] + phi*y[i-1]


sim <- tsibble(idx = seq_len(100), y = y, index = idx)

autoplot(sim)+labs(title="ARMA: Phi = 0.6, Theta = 0.6")+theme(plot.title = element_text(hjust = 0.5))

## Plot variable not specified, automatically selected `.vars = y`

F

phi1 <- -0.8
phi2 <- 0.3
y <- numeric(100)
e <- rnorm(100)

for(i in 3:100)
  y[i] <- e[i] + phi1*y[i-1] + phi2*y[i-2]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

autoplot(sim)+labs(title="Phi_1 = -0.8, Phi_2 = 0.3")+theme(plot.title = element_text(hjust = 0.5))

## Plot variable not specified, automatically selected `.vars = y`

The AR series follows a much more clearly defined pattern than the ARMA series. The range of the AR series is expanding, while the ARMA series appears relatively stationary.

9.7

aus_air <- subset(aus_airpassengers,Year < 2012)

autoplot(aus_air)

## Plot variable not specified, automatically selected `.vars = Passengers`

lambda <- aus_air %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

transform <- box_cox(aus_air$Passengers,lambda)


aus_air_diff <- aus_air %>%
  mutate(Passengers_d1 = c(0,diff(transform)))

#autoplot(aus_retail_diff,Turnover_d1)

aus_air_diff <- aus_air_diff %>%
  mutate(Passengers_d2 = c(0,diff(Passengers_d1)))

#autoplot(aus_retail_diff,Turnover_d2)

ggarrange(autoplot(aus_air,transform)+labs(title="Box-Cox Transformed Series")+theme(plot.title = element_text(hjust = 0.5)),autoplot(aus_air_diff,Passengers_d1)+labs(title="First Order Differencing")+theme(plot.title = element_text(hjust = 0.5)),autoplot(aus_air_diff,Passengers_d2)+labs(title="Second Order Differencing")+theme(plot.title = element_text(hjust = 0.5)),ncol = 1)

aus_air_diff %>%
  gg_tsdisplay(Passengers_d1,plot_type = "partial")

fit <- aus_air %>%
  model(arima012 = ARIMA(Passengers ~ pdq(0,1,2)),
        arima010 = ARIMA(Passengers ~ pdq(0,1,0)),
        stepwise = ARIMA(Passengers),
        search = ARIMA(Passengers, stepwise=FALSE))


fit_summary <- data.frame(rbind(c("arima012",0,1,2),c("arima010",0,1,0),c("stepwise",0,2,1),c("search",0,2,1)))

colnames(fit_summary) <- c("Model","p","d","q")

kbl(fit_summary, longtable = T, booktabs = T, caption = "Parameters of Each Model") %>%
  kable_styling(latex_options = c("repeat_header"))

Parameters of Each Model
Model	d	q
arima012	1	2
arima010	1	0
stepwise	2	1
search	2	1

glance(fit) %>%
  arrange(AICc) %>%
  select(.model:BIC)

aus_air %>%
  model(arima012 = ARIMA(Passengers ~ pdq(0,1,2)),
        arima010 = ARIMA(Passengers ~ pdq(0,1,0)),
        stepwise = ARIMA(Passengers),
        search = ARIMA(Passengers, stepwise=FALSE)) %>%
  forecast(h=10) %>%
  autoplot(aus_air)

B

The optimal model is ARIMA(0,2,1).

\(y_t = -0.896{\epsilon}_{t-1} + \epsilon_t\)

C

The ARIMA(0,1,0) forecast is less specific and the confidence interval is broader than ARIMA(0,2,1).

aus_air %>%
  model(arima010 = ARIMA(Passengers ~ pdq(0,1,0))) %>%
  forecast(h=10) %>%
  autoplot(aus_air)+
  labs(title = "ARIMA(0,1,0)")+theme(plot.title = element_text(hjust = 0.5))

aus_air %>%
  model(arima010 = ARIMA(Passengers ~ pdq(2,1,2))) %>%
  forecast(h=10) %>%
  autoplot(aus_air)+
  labs(title = "ARIMA(2,1,2)")+theme(plot.title = element_text(hjust = 0.5))

ARIMA(2,1,2) appears to actually predict changes in the growth over time, whereas the ARIMA(0,1,0) model appears to predict a steady increase in Passenger growth. Without the constant, the model will not yield any predictions.

E

aus_air %>%
  model(arima021 = ARIMA(Passengers ~ 1 + pdq(0,2,1))) %>%
  forecast(h=10) %>%
  autoplot(aus_air)+
  labs(title = "ARIMA(0,2,1)")+theme(plot.title = element_text(hjust = 0.5))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

The ARIMA(0,2,1) model with a coefficient forecasts a steady and sharper increase in Passenger count over the next 10 years.

9.8

us <- subset(global_economy,Code == "USA")

lambda <- us %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

transform <- box_cox(us$GDP,lambda)

us_diff <- us %>%
  mutate(GDP_d1 = c(0,diff(transform)))

#autoplot(aus_retail_diff,Turnover_d1)

us_diff <- us_diff %>%
  mutate(GDP_d2 = c(0,diff(GDP_d1)))

#autoplot(aus_retail_diff,Turnover_d2)

ggarrange(autoplot(us,transform)+labs(title="Box-Cox Transformed US GDP Series")+theme(plot.title = element_text(hjust = 0.5)),autoplot(us_diff,GDP_d1)+labs(title="First Order Differencing")+theme(plot.title = element_text(hjust = 0.5)),autoplot(us_diff,GDP_d2)+labs(title="Second Order Differencing")+theme(plot.title = element_text(hjust = 0.5)),ncol = 1)

A Box-Cox Transformation with a lambda of 0.282 was needed.

us %>%
  model(ARIMA(GDP)) %>%
  forecast(h=10) %>%
  autoplot(us)+
  labs(title = "ARIMA(0,2,2)")

fit <- us %>%
  model(ARIMA(GDP))


us %>%
  model(arima012 = ARIMA(GDP ~ pdq(0,1,2)),
        arima110 = ARIMA(GDP ~ pdq(1,1,0)),
        stepwise = ARIMA(GDP),
        search = ARIMA(GDP, stepwise=FALSE)) %>%
  forecast(h=10) %>%
  autoplot(us)

The confidence intervals for ARIMA(0,1,2) and ARIMA(1,1,0) are much wider than that of the “search” method’s ARIMA(0,2,2) model, which is indicative of a more conservative forecast.

fit <- us %>%
  model(arima012 = ARIMA(GDP ~ pdq(0,1,2)),
        arima110 = ARIMA(GDP ~ pdq(1,1,0)),
        stepwise = ARIMA(GDP),
        search = ARIMA(GDP, stepwise=FALSE))

fit_summary <- data.frame(rbind(c("arima012",0,1,2),c("arima110",1,1,0),c("stepwise",0,2,2),c("search",0,2,2)))

colnames(fit_summary) <- c("Model","p","d","q")

kbl(fit_summary, longtable = T, booktabs = T, caption = "Parameters of Each Model") %>%
  kable_styling(latex_options = c("repeat_header"))

Parameters of Each Model
Model	p	d	q
arima012	0	1	2
arima110	1	1	0
stepwise	0	2	2
search	0	2	2

us_diff %>%
  gg_tsdisplay(GDP_d1,plot_type = "partial")

us_diff %>%
  gg_tsdisplay(GDP_d2,plot_type = "partial")

The data becomes much more stationary with two rounds of differencing. Because none of the lags are outside of the significance range, p can be zero.

ets <- us %>% 
  model(ETS(GDP))

The ETS method yields a much more distant van than the standard ARIMA method.

DATA 624 Assignment 6

Shane Hylton

2023-03-20

9.1

A:

B:

9.2

First Order Differencing

Second Order Differencing

9.3

A

Response:

B

Response:

C

9.5

9.6

Response for Part B:

C

E

F

9.7

B

C

E

9.8