Assignment 6

library(png)
library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.1

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'lubridate' was built under R version 4.4.1

## Warning: package 'tsibble' was built under R version 4.4.1

## Warning: package 'feasts' was built under R version 4.4.1

## Warning: package 'fabletools' was built under R version 4.4.1

## Warning: package 'fable' was built under R version 4.4.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(dplyr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(fabletools)
library(fable)
library(ggplot2)
library(gridExtra)

## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise? The difference in the three plots would be the sample size, the 360 number size show a autocorrelation near zero for the non-zero lags, and the 1,000 size would be the most reliable plot to indicate the prescence of white noise as the non zero lags are very close to zero. Over all, the three plots show there is white noise.

#Figure 9.32
img <- readPNG("~/Downloads/Screenshot2.png")
plot(1:2, type="n", ann=FALSE, axes = FALSE)
rasterImage(img, 1, 1, 2, 2)

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? The critical values are different due to the sample sizes, as the sample sizes increases the more narrow the critical value becomes. Critical value= ± 1.96 x 1/sqrt(t), as t increases, which is the time series size increase the more the critical value will decrease. The autocorrelation decreases as the higher the sample size is as there is less chance of uncertainty.

9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The time series for amazon daily closing price has an upwards trend meaning the data is non-stationary. The ACF plot shows the values are decreasing slowly, hinting non-stationary and the partial ACF the first value was a large positive value also hinting non-stationary data.

Amazon<-gafa_stock|>
  filter(Symbol=="AMZN")

Amazon|>
  autoplot(Close)+
  labs(y="Closing Price(USD$)", title= "Amazon Stock Closing Prices (USD$)")

Amazon|>
  ACF(Close)|>
  autoplot()+
  labs(title = "Amazon Closing Prices")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

Amazon|>
  PACF(Close)|>
  autoplot()+
  labs(title = "Amazon Closing Prices")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a.Turkish GDP from global_economy. Lambda of .157 suggesting a log transformation, as the lambda is nearly zero. Based on the KPSS p value of 0.01 the data is not stationary. No seasonal difference, only one first order difference needed to be applied to the data.

TurkGDP<-global_economy|>
  filter(Country=="Turkey")

TurkGDP|>
  autoplot(GDP)+
  labs(title = "Turish GDP")

#Pull lambda for box cox
TurkGDPLam<-TurkGDP|>
  features(GDP, features = guerrero)|>
  pull(lambda_guerrero)

#Box-cox transformation plot
TurkGDP|>
  autoplot(box_cox(GDP, TurkGDPLam))+
  labs(title = paste0("Box-cox Transformed Turkish GDP lambda=", round(TurkGDPLam,3)))

#Check order in difference and if data is stationary

TurkGDP|>#KPSS Test
  features(GDP, unitroot_kpss)

## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey       1.22        0.01

#Check how many season difference
TurkGDP |> 
  features(GDP, unitroot_nsdiffs)

## # A tibble: 1 × 2
##   Country nsdiffs
##   <fct>     <int>
## 1 Turkey        0

TurkGDP |> #Check how many first order differences are needed
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

#Plot seasonal of difference
TurkGDP|>
  autoplot(difference(GDP))+
  labs(title ="Difference Turkish GDP lambda")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

b.Accommodation takings in the state of Tasmania from aus_accommodation. This data has a p-value of .01, meaning the data is not stationary. Based on the lambda log transformation can be done with box-cox, and requires a seasonal difference.

Tasmania<-aus_accommodation|>
  filter(State=="Tasmania")

Tasmania|>
  autoplot(Takings)+
  labs(title = "Tasmania Takings")

#Pull lambda for box cox
Tas_lambda<-Tasmania|>
  features(Takings, features = guerrero)|>
  pull(lambda_guerrero)

#Box-cox transformation plot
Tasmania|>
  autoplot(box_cox(Takings, Tas_lambda))+
  labs(title = paste0("Box-cox Transformed Tasmania Takings lambda=", round(Tas_lambda,3)))

#Check order in difference and if data is stationary

Tasmania|>#KPSS Test
  features(Takings, unitroot_kpss)

## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania      1.88        0.01

Tasmania |> #Check if seasonal differences are needed
  mutate(diff_Takings=box_cox(Takings, Tas_lambda))|>
  features(Takings, unitroot_nsdiffs)

## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

Tasmania |> #Check how many first order differences are needed
  mutate(diff_Takings=difference(box_cox(Takings, Tas_lambda)))|>
  features(diff_Takings, unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

Tasmania|>
  autoplot(difference(box_cox(Takings, Tas_lambda)))+
  labs(title ="Tasmania Takings seasonal difference")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

c.Monthly sales from souvenirs.

SS<-souvenirs
SS|>
  autoplot(Sales)+
  labs(title = "Monthly souvenir sales")

#Pull lambda for box cox
SSL<-SS|>
  features(Sales, features = guerrero)|>
  pull(lambda_guerrero)

#Box-cox transformation plot
SS|>
  autoplot(box_cox(Sales, SSL))+
  labs(title = paste0("Box-cox Transformed Souvenirs Sale lambda=", round(SSL,3)))

#Check order in difference and if data is stationary

SS|>#KPSS Test
  features(Sales, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.31        0.01

#Check how many season difference
SS |> 
  mutate(ssales=box_cox(Sales, SSL))|>
  features(ssales, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

SS |> #Check how many first order differences are needed
  mutate(diff_sales=difference(box_cox(Sales, SSL),12))|>
  features(diff_sales, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

#Plot seasonal of difference
SS |>
  autoplot(difference(box_cox(Sales, SSL), 12))+
  labs(title ="Seasonal Difference Souvenirs Sale lambda")

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries, Turnover)+
  labs(title="Turnover from Australia")

AusLam<-myseries|>
  features(Turnover, features = guerrero)|>
  pull(lambda_guerrero)

myseries|>
  autoplot(box_cox(Turnover, AusLam))+
  labs(title = paste0("Box-cox Transformed Australia Turnover lambda=", round(AusLam,3)))

myseries|>#KPSS Test
  features(Turnover, unitroot_kpss)

## # A tibble: 1 × 4
##   State              Industry                              kpss_stat kpss_pvalue
##   <chr>              <chr>                                     <dbl>       <dbl>
## 1 Northern Territory Clothing, footwear and personal acce…      5.72        0.01

myseries|> #Check if seasonal differences are needed
  mutate(STurnover=box_cox(Turnover,AusLam))|>
  features(STurnover, unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State              Industry                                            nsdiffs
##   <chr>              <chr>                                                 <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing       1

myseries|> #Check how many first order differences are needed
  mutate(diff_Turnover=difference(box_cox(Turnover,AusLam)))|>
  features(diff_Turnover, unitroot_ndiffs)

## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      0

myseries|>
  autoplot(difference(box_cox(Turnover,AusLam)))+
  labs(title ="Australia Turnover seasonal difference")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

9.6 Simulate and plot some data from simple ARIMA models. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.

Produce a time plot for the series. How does the plot change as you change 𝜙1? Changing the autoregression coefficient to a lower value the plot become more fluctuations but the larger the autoregression coefficient created a smoother plot.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim|>
  autoplot(y)+
  labs(title="AR(1) model with phi=.6")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim|>
  autoplot(y)+
  labs(title="AR(1) model with phi=1.0")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.5*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim|>
  autoplot(y)+
  labs(title="AR(1) model with phi=.5")

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

y_MA <- numeric(100)
e_MA <- rnorm(100)

for(i in 2:100)
  y_MA[i] <- 0.6*e_MA[i-1] + e_MA[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim|>
  autoplot(y_MA)+
  labs(title="MA(1) model with Theta=.6")

Produce a time plot for the series. How does the plot change as you change θ1 ? The higher the θ1 the more smoother the plot would be and with fewer patterns of changes.

y_MA <- numeric(100)
e_MA <- rnorm(100)

for(i in 2:100)
  y_MA[i] <- 0.2*e_MA[i-1] + e_MA[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim|>
  autoplot(y_MA)+
  labs(title="MA(1) model with Theta=0.2")

y_MA <- numeric(100)
e_MA <- rnorm(100)

for(i in 2:100)
  y_MA[i] <- 1.0*e_MA[i-1] + e_MA[i]
sim <- tsibble(idx = seq_len(100), y_MA = y_MA, index = idx)

sim|>
  autoplot(y_MA)+
  labs(title="MA(1) model with Theta=1.0")

Generate data from an ARMA(1,1) model with ϕ1=0.6 , θ1=0.6 and σ2=1.

 y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1]+ e[i]
sim_ARMA <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with ϕ1=−0.8 , ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

 y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2]+ e[i]
sim_AR2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them. The AR2 plot has an increasing variance while the ARMA model seems have more constant variance through out.

p1<-sim_ARMA|>
  autoplot(y)+
  labs(title="ARMA(1,1) model with phi=.6 and theta= 0.6")

p2<-sim_AR2|>
  autoplot(y)+
  labs(title="AR2 model with phi1=.8 and phi2= 0.3")

grid.arrange(p1,p2, ncol=2)

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. The selected model was ARIMA (0,2,1). Checked on white noise, and there noise present due to the residual plot spikes.

Write the model in terms of the backshift operator. (1-B)^2yt=(1-0.8963)ϵt.

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens? Removing the constant caused the model to have a steeper line.

head(aus_airpassengers)

## # A tsibble: 6 x 2 [1Y]
##    Year Passengers
##   <dbl>      <dbl>
## 1  1970       7.32
## 2  1971       7.33
## 3  1972       7.80
## 4  1973       9.38
## 5  1974      10.7 
## 6  1975      11.1

aus_airpassengers|>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Passengers`

AA_fit<-aus_airpassengers|>
  model(ARIMA(Passengers)) #auto ARIMA

report(AA_fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

AA_fit |> forecast(h=10) |>
  autoplot(aus_airpassengers) +
  labs(y = "Passengers", title = "Australia Passenger")

#check for white noise
aus_airpassengers |>
  model(ARIMA(Passengers))|>
  gg_tsresiduals()

AA2_fit<-aus_airpassengers|>
  model(arima010 = ARIMA(Passengers ~ 1+pdq(0,1,0)),
        arima021 = ARIMA(Passengers ~ pdq(0,2,1)),
        )

AA2_fit|>
  forecast(h=10)|>
  autoplot(aus_airpassengers, level=NULL)

AA3_fit<-aus_airpassengers|>
  model(arima010 = ARIMA(Passengers ~ 1+ pdq(0,1,0)),
        arima021 = ARIMA(Passengers ~ pdq(0,2,1)),
        arima212 = ARIMA(Passengers ~ 1+ pdq(2,1,2)),
        nocon=ARIMA(Passengers~0+pdq(2,1,2)))

## Warning: 1 error encountered for nocon
## [1] non-stationary AR part from CSS

AA3_fit|>
  forecast(h=10)|>
  autoplot(aus_airpassengers, level=NULL)

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#ARIMA 021 with constant
AA_Cfit<-aus_airpassengers|>
  model(ARIMA021 = ARIMA(Passengers ~ 1+ pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

AA_Cfit|>
  forecast(h=10)|>
  autoplot(aus_airpassengers, level=NULL)+
  labs(title = "Passengers ARIMA021 with constant")

AA_Nfit<-aus_airpassengers|>
  model(ARIMA021 = ARIMA(Passengers ~ 0+ pdq(0,2,1)))

AA_Nfit|>
  forecast(h=10)|>
  autoplot(aus_airpassengers, level=NULL)+
  labs(title = "Passengers ARIMA021 No constant")

9.8 For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA(); try some other plausible models by experimenting with the orders chosen; choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? I chose ARIMA(1,1,0) as the best model. The forecast for ARMIA 1,1,0 seemed to be reasonable as it followed the curved pattern of the plot.

compare the results with what you would obtain using ETS() (with no transformation).

The ARIMA model was better as it had a lower RMSE, meaning a better fit model.

US_GDP<-global_economy|>
  filter(Code=="USA")|>
  select(Year, GDP)

US_GDP|>
  autoplot(GDP) #Shows no seasonality but there is an upwards trend, so transformation is needed.

#find lambda
US_Lam<-US_GDP|>
  features(GDP, features = guerrero)|>
  pull(lambda_guerrero)

US_GDP|>
  autoplot(box_cox(GDP, US_Lam))+
  labs(title = paste0("Box-cox Transformed USA GDP with lambda=", round(US_Lam,3)))

US_GDP<-US_GDP|> #Check if seasonal differences are needed
  mutate(SGDP=box_cox(GDP,US_Lam))

US_fit<-US_GDP|>
  model(ARIMA(SGDP))
report(US_fit)

## Series: SGDP 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

US_fit2<-US_GDP|>
  model(arima010 = ARIMA(GDP ~ pdq(0,1,0)),
        arima110 = ARIMA(GDP ~ pdq(1,1,0)))

## Warning in sqrt(diag(best$var.coef)): NaNs produced

US_fit2|>
  forecast(h=10)|>
  autoplot(US_GDP, level=NULL)

#check for white noise chose model ARIMA (1,1,0)
US_fit |>
  gg_tsresiduals()

US_fit|>
  forecast(h=10)|>
  autoplot(US_GDP, level=NULL)+
  labs(title = "US GDP ARIMA110")

#ETS Vs ARIMA
US_GDP |>
  slice(-n()) |>
  stretch_tsibble(.init = 10) |>
  model(
    ETS(GDP),
    ARIMA(GDP)
  ) |>
  forecast(h = 1) |>
  accuracy(US_GDP) |>
  select(.model, RMSE:MAPE)

## # A tibble: 2 × 5
##   .model              RMSE           MAE   MPE  MAPE
##   <chr>              <dbl>         <dbl> <dbl> <dbl>
## 1 ARIMA(GDP) 271483432886. 158769768673. 2.62   3.80
## 2 ETS(GDP)   190604101250. 126812323851. 0.594  1.73

US_GDP|>
  model(ETS(GDP))|>
  forecast(h=10)|>
  autoplot(US_GDP) +
  labs(title = "USA GDP",
       y = "GDP")

Assignment 6

Andreina A

2025-03-16