HW 6 624 Arima LSzydziak 101221

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(latex2exp)
library(ggplot2)
library(fabletools)
library(fpp3)

## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --

## v tibble      3.1.4      v tsibbledata 0.3.0 
## v tidyr       1.1.3      v feasts      0.2.2 
## v lubridate   1.7.10     v fable       0.3.1 
## v tsibble     1.0.1

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
1. Explain the differences among these figures. Do they all indicate that the data are white noise?

Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

b.  Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

DISCUSSION:

From the text:  "Time series that show no autocorrelation are called white noise.  For white noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly equal to zero as there is some random variation. For a white noise series, we expect 95% of the spikes in the ACF to lie within  ±2/√T  where T  is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines above).

So, the bounds for the three are computed =/-2/√T:

2/sqrt(36)

## [1] 0.3333333

2/sqrt(360)

## [1] 0.1054093

2/sqrt(1000)

## [1] 0.06324555

DISCUSSION:

This explains why the ACFS are different with respect to the bounds. They are indicate white noise because no spikes are outside of these bounds. These were a series of random numbers so we would expect positive and negative values.

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
filter(Symbol=="AMZN")%>%
autoplot(Close)+labs(y="$", title="Amazon CLOSE")

gafa_stock %>%
filter(Symbol=="AMZN")%>%
  ACF(Close)%>%autoplot()

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

gafa_stock %>%
filter(Symbol=="AMZN")%>%
  PACF(Close)%>%autoplot()

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

DISCUSSION:

The autoplot shows a trend upward than downward suggesting non-stationarity.

The ACF plot indicates autocorrelation with a downward decreasing lag without a scalloped pattern(no seasonality) with all lags exhibiting correlations significantly different from zero. This needs addressing.

The PCAF also lags exhibiting partial correlations significantly different from zero. This needs addressing as well.

All three plots suggest to 1st difference of the series.

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
1. Turkish GDP from global_economy.

global_economy %>%
filter(Country=="Turkey")%>%
autoplot(GDP)+labs(y="$", title="TURKEY GDP")

Turkey<-global_economy %>%
filter(Country=="Turkey")

lambda<-Turkey%>%
  features(GDP,features=guerrero)%>%
  pull(lambda_guerrero)

Turkey%>% mutate(newG=GDP^lambda)

## # A tsibble: 58 x 10 [1Y]
## # Key:       Country [1]
##    Country Code   Year       GDP Growth     CPI Imports Exports Population  newG
##    <fct>   <fct> <dbl>     <dbl>  <dbl>   <dbl>   <dbl>   <dbl>      <dbl> <dbl>
##  1 Turkey  TUR    1960   1.40e10  NA    5.40e-5    3.67    2.06   27472331  39.4
##  2 Turkey  TUR    1961   8.02e 9   1.16 5.57e-5    6.79    5.12   28146893  36.1
##  3 Turkey  TUR    1962   8.92e 9   5.57 5.78e-5    7.97    5.60   28832805  36.7
##  4 Turkey  TUR    1963   1.04e10   9.07 6.15e-5    6.97    4.18   29531342  37.5
##  5 Turkey  TUR    1964   1.12e10   5.46 6.22e-5    5.47    4.47   30244232  38.0
##  6 Turkey  TUR    1965   1.19e10   2.82 6.50e-5    5.40    4.56   30972965  38.4
##  7 Turkey  TUR    1966   1.41e10  11.2  7.05e-5    5.66    4.09   31717477  39.4
##  8 Turkey  TUR    1967   1.57e10   4.73 8.04e-5    4.96    4.11   32477961  40.1
##  9 Turkey  TUR    1968   1.75e10   6.78 8.53e-5    5.08    3.68   33256432  40.8
## 10 Turkey  TUR    1969   1.95e10   4.08 8.95e-5    4.74    3.60   34055361  41.5
## # ... with 48 more rows

  Turkey%>%
  autoplot(box_cox(GDP, lambda))+
             labs(y="",
                  title=latex2exp::TeX(paste0(
                    "Transformed GDP with $$\\lambda$ =",
                    round(lambda,2))))

  lambda

## [1] 0.1572187

Turkey<-Turkey%>% mutate(newG=GDP^lambda)

Turkey%>%autoplot(newG)

#plot after box plot transformation

Turkey%>%autoplot(newG)

#test if uncorrelated with itself.
Turkey%>% features(newG,ljung_box, lag=10)

## # A tibble: 1 x 3
##   Country lb_stat lb_pvalue
##   <fct>     <dbl>     <dbl>
## 1 Turkey     341.         0

#reject based on p value, need differencing

Turkey %>%
  mutate(diff_close = difference(newG)) %>%
  features(diff_close, ljung_box, lag = 10)

## # A tibble: 1 x 3
##   Country lb_stat lb_pvalue
##   <fct>     <dbl>     <dbl>
## 1 Turkey     5.84     0.829

#diff ok look at ACF and PACF

Turkey%>%
  ACF(difference(newG))%>%autoplot()

Turkey%>%
  PACF(difference(newG))%>%
  autoplot()

DISCUSSION:

Turkey dataset - stationarity met after boxcox and 1st difference.  Validated by the ACF, PACF all within interval.

b. Accommodation takings in the state of Tasmania from aus_accommodation.

aus_accommodation %>%
filter(State=="Tasmania")%>%
autoplot(Takings)+labs(y="$", title="Tasmania Takings")

Tasmania<-aus_accommodation %>%
filter(State=="Tasmania")

lambda1<-Tasmania%>%
  features(Takings,features=guerrero)%>%
  pull(lambda_guerrero)

lambda1

## [1] -0.04884781

Tasmania<-Tasmania%>% mutate(newT=Takings^lambda1)


  Tasmania%>%
  autoplot(box_cox(Takings, lambda1))+
             labs(y="",
                  title=latex2exp::TeX(paste0(
                    "Transformed Takings with $$\\lambda1$ =",
                    round(lambda1,2))))

 #plot after box plot transformation 
Tasmania%>%autoplot(newT)

#test if uncorrelated with itself.
Tasmania%>% features(newT,unitroot_kpss)

## # A tibble: 1 x 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania      1.83        0.01

#reject based on p value, not stationary need differencing

Tasmania %>%
  mutate(diff_Taking = difference(newT))%>%
  features(diff_Taking, unitroot_kpss)

## # A tibble: 1 x 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania     0.256         0.1

#1st difference ok

Tasmania<-Tasmania %>%
  mutate(diff_Taking = difference(newT))

Tasmania %>%
  features(diff_Taking, unitroot_ndiffs)

## # A tibble: 1 x 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

#diff ok look at ACF and PACF

Tasmania%>%
  ACF((diff_Taking))%>%autoplot()

Tasmania%>%
  PACF((diff_Taking))%>%
  autoplot()

Tasmania %>%
  
  features(diff_Taking, unitroot_nsdiffs)

## # A tibble: 1 x 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

# this suggests to take 1 seasonal difference


Tasmania<-Tasmania %>%
  mutate(diff_Taking2 = difference(diff_Taking,12))

Tasmania%>%
  ACF((diff_Taking2))%>%
  autoplot()

Tasmania%>%
  PACF((diff_Taking2))%>%
  autoplot()

Tasmania%>% autoplot(diff_Taking2)

## Warning: Removed 13 row(s) containing missing values (geom_path).

DISCUSSION:

First transforming with box cox, then taking the first difference followed by taking a seasonal difference greatly improved the ACF and the PACF plots.  There is still an area of concern with repect to these plots at 12, however, the autoplot shows an improvement toward stationarity.

c.  Monthly sales from souvenirs.

souvenirs %>%
autoplot(Sales)+labs(y="$", title="Souvenir sales")

lambda2<-souvenirs%>%
  features(Sales,features=guerrero)%>%
  pull(lambda_guerrero)

lambda2

## [1] 0.002118221

SouvSales<-souvenirs%>% mutate(newS=Sales^lambda2)


  SouvSales%>%
  autoplot(box_cox(Sales, lambda2))+
             labs(y="",
                  title=latex2exp::TeX(paste0(
                    "Transformed Sales with $$\\lambda2$ =",
                    round(lambda2,2))))

 #plot after box plot transformation 
SouvSales%>%autoplot(newS)

#test if uncorrelated with itself.
SouvSales%>% features(newS,unitroot_kpss)

## # A tibble: 1 x 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.79        0.01

#reject based on p value, not stationary need differencing

SouvSales %>%
  mutate(diff_Sales = difference(newS))%>%
  features(diff_Sales, unitroot_kpss)

## # A tibble: 1 x 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0631         0.1

#1st difference ok

SouvSales <-SouvSales  %>%
  mutate(diff_Sales = difference(newS))

SouvSales %>%
  features(diff_Sales, unitroot_ndiffs)

## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      0

#diff ok look at ACF and PACF

SouvSales%>%
  ACF((diff_Sales))%>%autoplot()

SouvSales%>%
  PACF((diff_Sales))%>%
  autoplot()

SouvSales %>%
  
  features(diff_Sales, unitroot_nsdiffs)

## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       1

# this suggests to take 1 seasonal difference


SouvSales<-SouvSales %>%
  mutate(diff_Sales2 = difference(diff_Sales,12))

SouvSales%>%
  ACF((diff_Sales2))%>%
  autoplot()

SouvSales%>%
  PACF((diff_Sales2))%>%
  autoplot()

SouvSales%>% autoplot(diff_Sales2)

## Warning: Removed 13 row(s) containing missing values (geom_path).

DISCUSSION:

This series was transformed with a boxcox transformation, 1st differenced and then a seasonality difference.  While there is a marked improvement towards stationary, the model still needs more work as displayed in the ACF and PACF plots exceeding the blue boundaries.  

The model may be more complex and needs more analysis.

5.  For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

aus_retail %>% distinct(Industry)

## # A tibble: 20 x 1
##    Industry                                                         
##    <chr>                                                            
##  1 Cafes, restaurants and catering services                         
##  2 Cafes, restaurants and takeaway food services                    
##  3 Clothing retailing                                               
##  4 Clothing, footwear and personal accessory retailing              
##  5 Department stores                                                
##  6 Electrical and electronic goods retailing                        
##  7 Food retailing                                                   
##  8 Footwear and other personal accessory retailing                  
##  9 Furniture, floor coverings, houseware and textile goods retailing
## 10 Hardware, building and garden supplies retailing                 
## 11 Household goods retailing                                        
## 12 Liquor retailing                                                 
## 13 Newspaper and book retailing                                     
## 14 Other recreational goods retailing                               
## 15 Other retailing                                                  
## 16 Other retailing n.e.c.                                           
## 17 Other specialised food retailing                                 
## 18 Pharmaceutical, cosmetic and toiletry goods retailing            
## 19 Supermarket and grocery stores                                   
## 20 Takeaway food services

set.seed(1111111)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>% distinct(Industry)

## # A tibble: 1 x 1
##   Industry                    
##   <chr>                       
## 1 Newspaper and book retailing

head(myseries)

## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State             Industry                     `Series ID`    Month Turnover
##   <chr>             <chr>                        <chr>          <mth>    <dbl>
## 1 Western Australia Newspaper and book retailing A3349822A   1982 Apr      9.7
## 2 Western Australia Newspaper and book retailing A3349822A   1982 May     11  
## 3 Western Australia Newspaper and book retailing A3349822A   1982 Jun     10.7
## 4 Western Australia Newspaper and book retailing A3349822A   1982 Jul      9  
## 5 Western Australia Newspaper and book retailing A3349822A   1982 Aug      9.1
## 6 Western Australia Newspaper and book retailing A3349822A   1982 Sep     10

str(myseries)

## tbl_ts [441 x 5] (S3: tbl_ts/tbl_df/tbl/data.frame)
##  $ State    : chr [1:441] "Western Australia" "Western Australia" "Western Australia" "Western Australia" ...
##  $ Industry : chr [1:441] "Newspaper and book retailing" "Newspaper and book retailing" "Newspaper and book retailing" "Newspaper and book retailing" ...
##  $ Series ID: chr [1:441] "A3349822A" "A3349822A" "A3349822A" "A3349822A" ...
##  $ Month    : mth [1:441] 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep...
##  $ Turnover : num [1:441] 9.7 11 10.7 9 9.1 10 7.7 8.4 11.8 7.4 ...
##  - attr(*, "key")= tibble [1 x 3] (S3: tbl_df/tbl/data.frame)
##   ..$ State   : chr "Western Australia"
##   ..$ Industry: chr "Newspaper and book retailing"
##   ..$ .rows   : list<int> [1:1] 
##   .. ..$ : int [1:441] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
##  - attr(*, "index")= chr "Month"
##   ..- attr(*, "ordered")= logi TRUE
##  - attr(*, "index2")= chr "Month"
##  - attr(*, "interval")= interval [1:1] 1M
##   ..@ .regular: logi TRUE

autoplot(myseries,.vars=Turnover)

gg_season(myseries,y=Turnover)

gg_subseries(myseries,y=Turnover)

gg_lag(myseries,y=Turnover)

DISCUSSION:

A quick look at this timeseries shows an upward trend, with possible yearly seasonality and also possible a 10 year cyclicity. Let’s take a closer look at seasonality. Note the downwardtrend in 2010.
 There is a yearly seasonality marked by a sharp increase in retail trade turnover in December.
 This subseries plot also depicts the seasonal changes, increase in December. You can see the downward trend starting in 2010 as well.
 The lag plot shows positive relationship at lag 1, and lag2, which reflect the seasonality.  Note the decrease in ACF as the lags increase due to the trend and the scallop due to the yearly seasonality.
 
 Let's tranform due to increased variability...

lambda3<-myseries%>%
  features(Turnover,features=guerrero)%>%
  pull(lambda_guerrero)

lambda3

## [1] -0.03519692

myseries<-myseries%>% mutate(newT=Turnover^lambda3)


  myseries%>%
  autoplot(box_cox(Turnover, lambda3))+
             labs(y="",
                  title=latex2exp::TeX(paste0(
                    "Transformed Turnover with $$\\lambda3$ =",
                    round(lambda3,2))))

 #plot after box plot transformation 
myseries%>%autoplot(newT)

#interestingly the graph seemed to invert



#test if uncorrelated with itself.
myseries%>% features(newT,unitroot_kpss)

## # A tibble: 1 x 4
##   State             Industry                     kpss_stat kpss_pvalue
##   <chr>             <chr>                            <dbl>       <dbl>
## 1 Western Australia Newspaper and book retailing      6.05        0.01

#reject based on p value, not stationary need differencing

myseries %>%
  mutate(diff_Turn = difference(newT))%>%
  features(diff_Turn, unitroot_kpss)

## # A tibble: 1 x 4
##   State             Industry                     kpss_stat kpss_pvalue
##   <chr>             <chr>                            <dbl>       <dbl>
## 1 Western Australia Newspaper and book retailing    0.0187         0.1

#1st difference ok

myseries <-myseries  %>%
  mutate(diff_Turn = difference(newT))

myseries %>%
  features(diff_Turn, unitroot_ndiffs)

## # A tibble: 1 x 3
##   State             Industry                     ndiffs
##   <chr>             <chr>                         <int>
## 1 Western Australia Newspaper and book retailing      0

#diff ok look at ACF and PACF

myseries%>%
  ACF((diff_Turn))%>%autoplot()

myseries%>%
  PACF((diff_Turn))%>%
  autoplot()

#AF and PACF show autocorrelations and partial autocorrelations, needs more work

myseries %>%
  
  features(diff_Turn, unitroot_nsdiffs)

## # A tibble: 1 x 3
##   State             Industry                     nsdiffs
##   <chr>             <chr>                          <int>
## 1 Western Australia Newspaper and book retailing       1

# this suggests to take 1 seasonal difference


myseries<-myseries %>%
  mutate(diff_Turn2 = difference(diff_Turn,12))

myseries%>%
  ACF((diff_Turn2))%>%
  autoplot()

myseries%>%
  PACF((diff_Turn2))%>%
  autoplot()

myseries%>% autoplot(diff_Turn2)

## Warning: Removed 13 row(s) containing missing values (geom_path).

DISCUSSION:

 My series does seem to approach stationary however, the ACF and the PACF plot reveal issues.  We may be leaving some information on the table.

Simulate and plot some data from simple ARIMA models.
1. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change ϕ1 ?

sim%>%
  autoplot(y)

DISCUSSION:

Let’s try phi1=.1 phi2=.9

set.seed(6543221)
y2 <- ts(numeric(100))
y3 <- ts(numeric(100))
e2 <- rnorm(100)
e3 <- rnorm(100)

for(i in 2:100){
    y2[i] <- 0.1*y2[i-1] + e2[i]
    y3[i] <- 0.9*y3[i-1] + e3[i]
    
}

sim <- tsibble(idx = seq_len(100), y = y, y2 = y2, y3 = y3, index = idx)


plot1 <- sim %>% autoplot(y) + labs( title = "phi = 0.6")
plot2 <- sim %>% autoplot(y2) + labs( title = "phi = 0.1")
plot3 <- sim %>% autoplot(y3) + labs( title = "phi = 0.9")

plot1

plot2

plot3

DISCUSSION:

As phi approaches 0, seems to approach stationarity, white noise.

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

set.seed(12345566)
y4 <- numeric(100)
e4 <- rnorm(100)

for(i in 2:100)
  y4[i] <- 0.6*e4[i-1] + e4[i]
sim <- tsibble(idx = seq_len(100), y4 = y4, index = idx)

Produce a time plot for the series. How does the plot change as you change θ1 ?

sim %>% autoplot(y4) + labs(title="MA(1) 0.6 ")

DISCUSSION: Changing theta:

Let’s try theta1=.1 theta2=.9

set.seed(55667788)

y5 <- ts(numeric(100))
e5 <- rnorm(100)

y6 <- ts(numeric(100))
e6 <- rnorm(100)


for(i in 2:100){
    y5[i] <- 0.1*e5[i-1] + e5[i]
    y6[i] <- 0.9*e6[i-1] + e6[i]
    
}

sim <- tsibble(idx = seq_len(100), y4 = y4, y5 = y5, y6 = y6, index = idx)


plot1a <- sim %>% autoplot(y4) + labs( title = "theta = 0.6")
plot2a <- sim %>% autoplot(y5) + labs( title = "theta = 0.1")
plot3a <- sim %>% autoplot(y6) + labs( title = "theta = 0.9")

plot1a

plot2a

plot3a

DISCUSSION:

Changing theta changes the scale of the series.

Generate data from an ARMA(1,1) model with
ϕ1=0.6, θ1=0.6 and σ2=1.

set.seed(222222)
a7 <- ts(numeric(100))
e7 <- rnorm(100)


for(i in 2:100){
  a7[i] <- 0.6*a7[i] + 0.6*e7[i-1] + e7[i] 
}


sim7 <- tsibble(idx = seq_len(100), a7 = a7, index = idx)


plot7 <- sim7 %>% autoplot(a7) + labs( title = "ARMA(1,1) theta=.6, phi=.6")

plot7

Generate data from an AR(2) model with
ϕ1=−0.8 ,ϕ2=0.3 and σ2=1 . (Note that these parameters will give a non-stationary series.)

set.seed(333333)
y8 <- ts(numeric(100))
e8 <- rnorm(100)


for(i in 3:100){
  y8[i] <- -0.8*y8[i-1] + 0.3*y8[i-2] + e8[i] 
}


sim8 <- tsibble(idx = seq_len(100), y8 = y8, index = idx)


plot8 <- sim8 %>% autoplot(y8) + labs( title = "AR(2) phi=-.8,.3")

plot8

g.  Graph the latter two series and compare them.

plot7

plot8

DISCUSSION: ARMA(1,1) a7[i] <- 0.6a7[i] + 0.6e7[i-1] + e7[i]

phi - .6

AR(2) y8[i] <- -0.8y8[i-1] + 0.3y8[i-2] + e8[i]

ARMA(1,1) similar to AR(1) with additional error term, appears approaching stationarity.

AR(2) has negative coefficient of -.8 which will cause the first term to alternatively be +,-,+,-. The magnitude of the first coefficient as compared to the second coefficient dominates.

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

head(aus_airpassengers)

## # A tsibble: 6 x 2 [1Y]
##    Year Passengers
##   <dbl>      <dbl>
## 1  1970       7.32
## 2  1971       7.33
## 3  1972       7.80
## 4  1973       9.38
## 5  1974      10.7 
## 6  1975      11.1

autoplot(aus_airpassengers)

## Plot variable not specified, automatically selected `.vars = Passengers`

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit<-aus_airpassengers %>%
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

DISCUSSION: Plotting the residuals of ARIMA(0,2,1)

fit2<-aus_airpassengers %>%
  model(arima021 =ARIMA(Passengers ~ pdq(0,2,1)))

fit2 %>%
  select(arima021) %>%
  gg_tsresiduals()

augment(fit2) %>% features(.innov, ljung_box, lag = 10)

## # A tibble: 1 x 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima021    6.70     0.754

DISCUSSION:

acf and hist good, ljung_box not significant so random amount of change

  fit2 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers)

Write the model in terms of the backshift operator.

DISCUSSION: ARIMA(0,2,1) \(((1−B)^2)Yt=(1+(\Theta 1)B)\epsilon t\)

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit3<-aus_airpassengers %>%
  model(arima010 =ARIMA(Passengers ~ 1 + pdq(0,1,0)))

report(fit3)

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

fit3 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers)

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

#ARIMA(2,1,2) with drift
fit4<-aus_airpassengers %>%
  model(arima212 =ARIMA(Passengers ~ 1 + pdq(2,1,2)))

report(fit4)

## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  constant
##       -0.5518  -0.7327  0.5895  1.0000    3.2216
## s.e.   0.1684   0.1224  0.0916  0.1008    0.7224
## 
## sigma^2 estimated as 4.031:  log likelihood=-96.23
## AIC=204.46   AICc=206.61   BIC=215.43

fit4 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers)

DISCUSSION:

ARIMA(2,1,2) without constant

#ARIMA(2,1,2) 
fit5<-aus_airpassengers %>%
  model(arima212n =ARIMA(Passengers ~ 0+pdq(2,1,2)))

## Warning: 1 error encountered for arima212n
## [1] non-stationary AR part from CSS

report(fit5)

## Series: Passengers 
## Model: NULL model 
## NULL model

fit5 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 row(s) containing missing values (geom_path).

DISCUSSION:

When removing constant, Warning message: 1 error encountered for arima212n [1] non-stationary AR part from CSS.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

#ARIMA(0,2,1) 
fit6<-aus_airpassengers %>%
  model(arima021 =ARIMA(Passengers ~ 1 +pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

report(fit6)

## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0571
## s.e.   0.0585    0.0213
## 
## sigma^2 estimated as 3.855:  log likelihood=-95.1
## AIC=196.21   AICc=196.79   BIC=201.63

fit6 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers)

DISCUSSION:

The forecasts are different, but similarly trending upward with the ARIMA(0,2,1) with the steepest increase.

For the United States GDP series (from global_economy):

US<-global_economy %>%
  filter(Country =="United States")
  
US %>%
  autoplot(GDP) + 
  labs(y = "GDP", title = "United States GDP")

if necessary, find a suitable Box-Cox transformation for the data;

DISCUSSION:

From the plot, we see no increasing or decreasing variance, so a box-plot is not necessary. The curvature indicates variable change in slope, however, so let’s run the boxplot.

lambda_U<-US%>%
  features(GDP,features=guerrero)%>%
  pull(lambda_guerrero)

lambda_U

## [1] 0.2819443

US<-US%>% mutate(GDP_U=GDP^lambda_U)


  US%>%
  autoplot(box_cox(GDP, lambda_U))+
             labs(y="",
                  title=latex2exp::TeX(paste0(
                    "Transformed Sales with $$\\lambda_U$ =",
                    round(lambda_U,2))))

 #plot after box plot transformation 
US%>%autoplot(GDP_U)

DISCUSSION:

The boxplot transformation removed the curvature.

fit a suitable ARIMA model to the transformed data using ARIMA();

fit_U<-
  US %>%
  model(ARIMA(GDP_U))
report(fit_U)

## Series: GDP_U 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586   33.3208
## s.e.  0.1198    2.6798
## 
## sigma^2 estimated as 435.6:  log likelihood=-253.16
## AIC=512.32   AICc=512.77   BIC=518.45

try some other plausible models by experimenting with the orders chosen;

US%>%
  features(GDP_U,unitroot_kpss)

## # A tibble: 1 x 3
##   Country       kpss_stat kpss_pvalue
##   <fct>             <dbl>       <dbl>
## 1 United States      1.55        0.01

US%>%
  features(GDP_U,unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

US%>%
  features(GDP_U,unitroot_nsdiffs)

## # A tibble: 1 x 2
##   Country       nsdiffs
##   <fct>           <int>
## 1 United States       0

#this suggests not stationary and we need to apply 1 difference.


USFit<-US%>%
  model(arima110= ARIMA(GDP_U ~pdq(1,1,0)),
        arima120= ARIMA(GDP_U ~pdq(1,2,0)),
        arima111= ARIMA(GDP_U ~pdq(1,1,1)),
        stepwise = ARIMA(GDP_U),
        serach = ARIMA(GDP_U, stepwise=FALSE))

USFit %>%
  pivot_longer(!Country, names_to ="Model name", values_to ="Orders")

## # A mable: 5 x 3
## # Key:     Country, Model name [5]
##   Country       `Model name`                  Orders
##   <fct>         <chr>                        <model>
## 1 United States arima110     <ARIMA(1,1,0) w/ drift>
## 2 United States arima120              <ARIMA(1,2,0)>
## 3 United States arima111     <ARIMA(1,1,1) w/ drift>
## 4 United States stepwise     <ARIMA(1,1,0) w/ drift>
## 5 United States serach       <ARIMA(1,1,0) w/ drift>

glance(USFit) %>%
  arrange (AICc) %>%
  select(.model:BIC)

## # A tibble: 5 x 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima110   436.   -253.  512.  513.  518.
## 2 stepwise   436.   -253.  512.  513.  518.
## 3 serach     436.   -253.  512.  513.  518.
## 4 arima120   539.   -255.  514.  514.  518.
## 5 arima111   444.   -253.  514.  515.  522.

DISCUSSION:

The model with the lowest AICc is the arima(1,1,0) as the algorithm indicated.

choose what you think is the best model and check the residual diagnostics;

DISCUSSION:

Let’s look at the residuals for the best candidate model ARIMA(1,1,0)

US%>%
  gg_tsdisplay(difference(GDP_U), plot_type='partial')

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

augment(USFit)%>%
  filter(.model=='arima110')%>%
  features(.innov, ljung_box, lag=10, dof=3)

## # A tibble: 1 x 4
##   Country       .model   lb_stat lb_pvalue
##   <fct>         <chr>      <dbl>     <dbl>
## 1 United States arima110    3.81     0.801

I not particularly happy with the first residual plot. However, ljung_box test ok.

produce forecasts of your fitted model. Do the forecasts look reasonable?

USFit %>%
  forecast(h=5) %>%
  filter(.model=='arima110') %>%
  autoplot(US)

compare the results with what you would obtain using ETS() (with no transformation).

fit_exp<-US %>%
  model(ETS(GDP))

fc<- fit_exp %>%
  forecast(h=10) %>%
  autoplot(US)
fc

report(fit_exp)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

# recall arima(1,1,0 plot)
glance(USFit) %>%
  arrange (AICc) %>%
  select(.model:BIC)

## # A tibble: 5 x 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima110   436.   -253.  512.  513.  518.
## 2 stepwise   436.   -253.  512.  513.  518.
## 3 serach     436.   -253.  512.  513.  518.
## 4 arima120   539.   -255.  514.  514.  518.
## 5 arima111   444.   -253.  514.  515.  522.

DISCUSSION:

AICc for ETS 3191.9 AICc for Arima(1,1,0)= 512

Arima (1,1,0 ) is the better model.

HW 6 624 Arima LSzydziak 101221

Lisa Szydziak

10/12/2021