Data624_homework6

(1) Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

The figures show that as the number of observations increases, the critical-value bounds become tighter around zero. All three plots are consistent with white noise, since at least 95% of the autocorrelations in each plot fall within these bounds.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

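The critical values are drawn at approximately ±1.96/√T, where T is the length of the series, so they move closer to zero as the sample size grows. The autocorrelations differ between the figures because each one is a sample estimate computed from a different random sample. With more observations the estimates have less sampling variability and sit closer to the true value of zero.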
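As a rough check of the point about critical values, the bounds for the three series lengths can be computed directly. This is a minimal sketch in base R; the variable names and the seed are illustrative assumptions, not part of the exercise.

```r
# Approximate 95% critical values for the ACF of white noise: +/- 1.96 / sqrt(T)
series_lengths <- c(36, 360, 1000)
critical_values <- 1.96 / sqrt(series_lengths)
round(critical_values, 3)  # 0.327 0.103 0.062

# Sample autocorrelations of simulated white noise are not exactly zero
set.seed(1)  # arbitrary seed, used only for reproducibility
white_noise <- rnorm(36)
sample_acf <- acf(white_noise, plot = FALSE)$acf[-1]  # drop lag 0

# Share of lags that fall inside the bounds for T = 36
share_inside <- mean(abs(sample_acf) < critical_values[1])
share_inside
```

Rerunning the simulation with a different seed gives different sample autocorrelations, which is why the three figures do not look alike even though each series is white noise.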

(2) A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

library(dplyr)

library(forecast)

library(magrittr)
library(fpp3)

library(tsibble)

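# Keep only Amazon; gafa_stock also contains Apple, Facebook and Google
amzn <- filter(gafa_stock, Symbol == "AMZN")
ggtsdisplay(amzn$Close)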

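In the ACF plot, most of the lags lie well outside the critical bounds and decay only slowly, so the series is very unlikely to be white noise. In the PACF plot, a single large spike at lag 1 is the typical signature of a series that behaves like a random walk. The time plot does not fluctuate around a constant level; it wanders and includes a sharp drop. Together these indicate that the series is non-stationary and should be differenced.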
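The visual reading of the plots can be backed up with a unit root test. The sketch below uses the KPSS features from the feasts package (loaded by fpp3); the object names `amzn` and `n_diffs` are illustrative, and no particular test result is assumed here.

```r
library(fpp3)

# Amazon closing prices plus their first differences
amzn <- gafa_stock |>
  filter(Symbol == "AMZN") |>
  mutate(diff_close = difference(Close))

# KPSS test on the raw prices: a small p-value suggests non-stationarity
amzn |> features(Close, unitroot_kpss)

# The same test on the differenced prices
amzn |> features(diff_close, unitroot_kpss)

# Number of first differences suggested by repeated KPSS tests
n_diffs <- amzn |> features(Close, unitroot_ndiffs)
n_diffs
```

Comparing the two KPSS p-values shows whether one round of differencing is enough to remove the evidence of non-stationarity.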

(3) For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy. Accommodation takings in the state of Tasmania from aus_accommodation. Monthly sales from souvenirs.

df <- filter(global_economy, Country == "Turkey")
ggtsdisplay(df$GDP)

# Determine the optimal lambda value for Box-Cox transformation
lambda <- BoxCox.lambda(df$GDP)

# Apply Box-Cox transformation
BoxCoxedData <- BoxCox(df$GDP, lambda)

# Apply differencing
DiffData <- diff(BoxCoxedData)

ggtsdisplay(DiffData)

dftas <- filter(aus_accommodation, State == "Tasmania")
ggtsdisplay(dftas$Takings)

lambda <- BoxCox.lambda(dftas$Takings)

# Apply Box-Cox transformation
BoxCoxedData <- BoxCox(dftas$Takings, lambda)

# Apply differencing
DiffData <- diff(BoxCoxedData)

# Plot the time series data
ggtsdisplay(DiffData)

dfsou <- souvenirs$Sales
ggtsdisplay(dfsou)

# Determine the optimal lambda value for Box-Cox transformation
lambda <- BoxCox.lambda(dfsou)

# Apply Box-Cox transformation
BoxCoxedData <- BoxCox(dfsou, lambda)

# Apply differencing
DiffData <- diff(BoxCoxedData)

# Plot the time series data
ggtsdisplay(DiffData)
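Each series above is transformed and then differenced once. For seasonal series such as the monthly souvenir sales, a seasonal difference may also be needed, and the unit root features in feasts offer one way to check. The following is a sketch under that assumption, using the Guerrero method for lambda rather than `BoxCox.lambda()`; the column names `bc_sales` and `bc_sdiff` are made up for illustration.

```r
library(fpp3)

# Guerrero estimate of the Box-Cox parameter for the souvenirs series
lambda <- souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

transformed <- souvenirs |>
  mutate(bc_sales = box_cox(Sales, lambda))

# Suggested number of seasonal differences
transformed |> features(bc_sales, unitroot_nsdiffs)

# Suggested number of first differences after one seasonal difference (lag 12)
transformed |>
  mutate(bc_sdiff = difference(bc_sales, lag = 12)) |>
  features(bc_sdiff, unitroot_ndiffs)
```

The same pattern should carry over to the Tasmanian accommodation takings with `lag = 4`, since that series is quarterly.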

(5) For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(123)  # assumed seed so that the sampled series is reproducible
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries_train <- myseries |>
  filter(year(Month) < 2011)
autoplot(myseries, Turnover)

# Determine the optimal lambda value for Box-Cox transformation
lambda <- BoxCox.lambda(myseries_train$Turnover)

# Apply Box-Cox transformation
BoxCoxedData <- BoxCox(myseries$Turnover, lambda)

# Apply differencing
DiffData <- diff(BoxCoxedData)

# Plot the time series data after transformation and differencing
plot(DiffData, type = "l", xlab = "Time", ylab = "Differenced Data", main = "Differenced Time Series Data")

(6) Simulate and plot some data from simple ARIMA models.

# Simulate an AR(1) process by hand: y[t] = 0.6 * y[t-1] + e[t]
y <- numeric(100)
e <- rnorm(100)
for (i in 2:100) {
  y[i] <- 0.6 * y[i - 1] + e[i]
}
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim, y)

set.seed(123)  # for reproducibility
phi <- 0.6
sigma <- 1
y_ar1 <- arima.sim(model = list(ar = phi), n = 100, sd = sigma)

# Create tsibble object (as.numeric() drops the ts attributes)
sim_ar1 <- tsibble(idx = seq_len(100), y = as.numeric(y_ar1), index = idx)

# Produce a time plot; the axis label is set with labs(), not an autoplot() argument
autoplot(sim_ar1, y) + labs(y = "AR(1) Series")

set.seed(123)  # for reproducibility
theta <- 0.6
sigma <- 1
y_ma1 <- arima.sim(model = list(ma = theta), n = 100, sd = sigma)

# Create tsibble object (as.numeric() drops the ts attributes)
sim_ma1 <- tsibble(idx = seq_len(100), y = as.numeric(y_ma1), index = idx)

# Produce a time plot; the axis label is set with labs(), not an autoplot() argument
autoplot(sim_ma1, y) + labs(y = "MA(1) Series")

set.seed(123)  # for reproducibility
phi <- 0.6
theta <- 0.6
sigma <- 1
y_arma11 <- arima.sim(model = list(ar = phi, ma = theta), n = 100, sd = sigma)

# Create tsibble object (as.numeric() drops the ts attributes)
sim_arma11 <- tsibble(idx = seq_len(100), y = as.numeric(y_arma11), index = idx)

# Produce a time plot; the axis label is set with labs(), not an autoplot() argument
autoplot(sim_arma11, y) + labs(y = "ARMA(1,1) Series")
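For reference, the three simulated processes can be written out as equations. This uses the sign convention of `arima.sim()`, in which the moving average term enters with a plus sign; the symbols $\phi_1$, $\theta_1$ and $\varepsilon_t$ are notation introduced here, with $\phi_1 = 0.6$, $\theta_1 = 0.6$ and $\sigma = 1$ matching the code.

```latex
\begin{align*}
\text{AR(1):}     \quad & y_t = \phi_1 y_{t-1} + \varepsilon_t \\
\text{MA(1):}     \quad & y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
\text{ARMA(1,1):} \quad & y_t = \phi_1 y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
& \varepsilon_t \sim N(0, \sigma^2)
\end{align*}
```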

(8) For the United States GDP series (from global_economy):

a. if necessary, find a suitable Box-Cox transformation for the data;
b. fit a suitable ARIMA model to the transformed data using ARIMA();
c. try some other plausible models by experimenting with the orders chosen;
d. choose what you think is the best model and check the residual diagnostics;
e. produce forecasts of your fitted model. Do the forecasts look reasonable?
f. compare the results with what you would obtain using ETS() (with no transformation).

# Filter data for the USA and select the GDP column
df <- filter(global_economy, Code == "USA") %>%
  select(GDP)

# Plot the time series data; the axis label is set with labs()
autoplot(df, GDP) + labs(y = "GDP")

# Determine the optimal lambda value for Box-Cox transformation
lambda <- BoxCox.lambda(df$GDP)

# Apply Box-Cox transformation
BoxCoxedData <- BoxCox(df$GDP, lambda)

# Apply differencing
DiffData <- diff(BoxCoxedData)

# Plot the time series data after transformation and differencing
plot(DiffData, type = "l", xlab = "Time", ylab = "Differenced GDP", main = "Differenced Time Series Data")
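The code above covers only the transformation and differencing steps. One possible way to carry out the remaining steps (fit an ARIMA model, check residuals, forecast, compare with ETS) is sketched below with the fable interface. The automatic order selection, the Guerrero lambda, the 10-year horizon and the names `us_gdp`, `fit` and `fc` are all assumptions made for illustration; no particular model or result is implied.

```r
library(fpp3)

us_gdp <- global_economy |>
  filter(Code == "USA") |>
  select(GDP)

# Box-Cox parameter chosen by the Guerrero method
lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

# Automatic ARIMA on the transformed data, and ETS on the original scale
fit <- us_gdp |>
  model(
    arima = ARIMA(box_cox(GDP, lambda)),
    ets = ETS(GDP)
  )

# Residual diagnostics for the ARIMA model
fit |> select(arima) |> gg_tsresiduals()

# Forecasts from both models (back-transformed automatically for the ARIMA)
fc <- fit |> forecast(h = 10)
fc |> autoplot(us_gdp, level = NULL)
```

Other plausible ARIMA orders can be tried by adding models such as `ARIMA(box_cox(GDP, lambda) ~ pdq(1, 1, 0))` to the `model()` call and comparing them with `glance()`.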

2024-03-28