library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(latex2exp)
library(ggplot2)
library(fabletools)
library(fpp3)
## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --
## v tibble 3.1.4 v tsibbledata 0.3.0
## v tidyr 1.1.3 v feasts 0.2.2
## v lubridate 1.7.10 v fable 0.3.1
## v tsibble 1.0.1
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
DISCUSSION:
From the text: "Time series that show no autocorrelation are called white noise. For white noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly equal to zero as there is some random variation. For a white noise series, we expect 95% of the spikes in the ACF to lie within ±2/√T where T is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines above).
So, the bounds for the three are computed =/-2/√T:
2/sqrt(36)
## [1] 0.3333333
2/sqrt(360)
## [1] 0.1054093
2/sqrt(1000)
## [1] 0.06324555
DISCUSSION:
This explains why the ACFs have different critical-value bounds. All three series indicate white noise because no spikes fall outside these bounds. Since these are series of random numbers, we expect a mix of small positive and negative autocorrelations.
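As a minimal sketch (not part of the original figure code), white noise of each length can be simulated to reproduce the idea behind Figure 9.32; the seed and the helper wn() are illustrative assumptions.
# simulate a white-noise tsibble of length T and plot its ACF;
# the +/- 2/sqrt(T) bounds shrink as T grows
set.seed(123)
wn <- function(T) tsibble(idx = seq_len(T), y = rnorm(T), index = idx)
wn(36)   %>% ACF(y) %>% autoplot() + labs(title = "White noise, T = 36")
wn(360)  %>% ACF(y) %>% autoplot() + labs(title = "White noise, T = 360")
wn(1000) %>% ACF(y) %>% autoplot() + labs(title = "White noise, T = 1000")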
gafa_stock %>%
filter(Symbol=="AMZN")%>%
autoplot(Close)+labs(y="$", title="Amazon CLOSE")
gafa_stock %>%
filter(Symbol=="AMZN")%>%
ACF(Close)%>%autoplot()
## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.
gafa_stock %>%
filter(Symbol=="AMZN")%>%
PACF(Close)%>%autoplot()
## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.
DISCUSSION:
The autoplot shows a trend upward and then downward, suggesting non-stationarity.
The ACF plot shows autocorrelations that decay slowly with increasing lag and no scalloped pattern (no seasonality), with all lags significantly different from zero. This needs addressing.
The PACF also has lags with partial autocorrelations significantly different from zero, which needs addressing as well.
All three plots suggest first-differencing the series.
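As a quick check (a sketch, not part of the original output), the first difference can be taken directly and its ACF examined; it should look much closer to white noise.
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  mutate(diff_close = difference(Close)) %>%  # first difference of the closing price
  ACF(diff_close) %>%                          # the irregular-interval warning will appear again
  autoplot()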
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
global_economy %>%
filter(Country=="Turkey")%>%
autoplot(GDP)+labs(y="$", title="TURKEY GDP")
Turkey<-global_economy %>%
filter(Country=="Turkey")
lambda<-Turkey%>%
features(GDP,features=guerrero)%>%
pull(lambda_guerrero)
Turkey%>% mutate(newG=GDP^lambda)
## # A tsibble: 58 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population newG
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Turkey TUR 1960 1.40e10 NA 5.40e-5 3.67 2.06 27472331 39.4
## 2 Turkey TUR 1961 8.02e 9 1.16 5.57e-5 6.79 5.12 28146893 36.1
## 3 Turkey TUR 1962 8.92e 9 5.57 5.78e-5 7.97 5.60 28832805 36.7
## 4 Turkey TUR 1963 1.04e10 9.07 6.15e-5 6.97 4.18 29531342 37.5
## 5 Turkey TUR 1964 1.12e10 5.46 6.22e-5 5.47 4.47 30244232 38.0
## 6 Turkey TUR 1965 1.19e10 2.82 6.50e-5 5.40 4.56 30972965 38.4
## 7 Turkey TUR 1966 1.41e10 11.2 7.05e-5 5.66 4.09 31717477 39.4
## 8 Turkey TUR 1967 1.57e10 4.73 8.04e-5 4.96 4.11 32477961 40.1
## 9 Turkey TUR 1968 1.75e10 6.78 8.53e-5 5.08 3.68 33256432 40.8
## 10 Turkey TUR 1969 1.95e10 4.08 8.95e-5 4.74 3.60 34055361 41.5
## # ... with 48 more rows
Turkey%>%
autoplot(box_cox(GDP, lambda))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed GDP with $$\\lambda$ =",
round(lambda,2))))
lambda
## [1] 0.1572187
Turkey<-Turkey%>% mutate(newG=GDP^lambda)
#plot after Box-Cox transformation
Turkey%>%autoplot(newG)
#Ljung-Box test for autocorrelation
Turkey%>% features(newG,ljung_box, lag=10)
## # A tibble: 1 x 3
## Country lb_stat lb_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 341. 0
#reject the null of no autocorrelation (p ~ 0); differencing needed
Turkey %>%
mutate(diff_close = difference(newG)) %>%
features(diff_close, ljung_box, lag = 10)
## # A tibble: 1 x 3
## Country lb_stat lb_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 5.84 0.829
#diff ok look at ACF and PACF
Turkey%>%
ACF(difference(newG))%>%autoplot()
Turkey%>%
PACF(difference(newG))%>%
autoplot()
DISCUSSION:
Turkey GDP: stationarity is achieved after the Box-Cox transformation and one first difference, validated by the ACF and PACF with all spikes inside the significance bounds.
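An optional confirmation (not in the original output): the KPSS-based ndiffs feature should also point to a single difference for the transformed series.
# KPSS-based check of how many first differences are required
Turkey %>% features(newG, unitroot_ndiffs)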
b. Accommodation takings in the state of Tasmania from aus_accommodation.
aus_accommodation %>%
filter(State=="Tasmania")%>%
autoplot(Takings)+labs(y="$", title="Tasmania Takings")
Tasmania<-aus_accommodation %>%
filter(State=="Tasmania")
lambda1<-Tasmania%>%
features(Takings,features=guerrero)%>%
pull(lambda_guerrero)
lambda1
## [1] -0.04884781
Tasmania<-Tasmania%>% mutate(newT=Takings^lambda1)
Tasmania%>%
autoplot(box_cox(Takings, lambda1))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed Takings with $$\\lambda1$ =",
round(lambda1,2))))
#plot after Box-Cox transformation
Tasmania%>%autoplot(newT)
#KPSS test for stationarity
Tasmania%>% features(newT,unitroot_kpss)
## # A tibble: 1 x 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 1.83 0.01
#reject stationarity based on the p-value (0.01); differencing needed
Tasmania %>%
mutate(diff_Taking = difference(newT))%>%
features(diff_Taking, unitroot_kpss)
## # A tibble: 1 x 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 0.256 0.1
#1st difference ok
Tasmania<-Tasmania %>%
mutate(diff_Taking = difference(newT))
Tasmania %>%
features(diff_Taking, unitroot_ndiffs)
## # A tibble: 1 x 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 0
#diff ok look at ACF and PACF
Tasmania%>%
ACF((diff_Taking))%>%autoplot()
Tasmania%>%
PACF((diff_Taking))%>%
autoplot()
Tasmania %>%
features(diff_Taking, unitroot_nsdiffs)
## # A tibble: 1 x 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
# this suggests taking 1 seasonal difference; aus_accommodation is quarterly, so the seasonal lag is 4
Tasmania<-Tasmania %>%
mutate(diff_Taking2 = difference(diff_Taking,4))
Tasmania%>%
ACF((diff_Taking2))%>%
autoplot()
Tasmania%>%
PACF((diff_Taking2))%>%
autoplot()
Tasmania%>% autoplot(diff_Taking2)
DISCUSSION:
Transforming with Box-Cox, then taking a first difference followed by a seasonal difference, greatly improved the ACF and PACF plots. A few spikes remain a mild concern, but the plot of the doubly differenced series shows a clear improvement toward stationarity.
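A further optional check (not in the original output): after the seasonal difference, the unit-root features should suggest no additional ordinary or seasonal differencing.
# confirm no further differences are suggested for the doubly differenced Takings
Tasmania %>% features(diff_Taking2, unitroot_ndiffs)
Tasmania %>% features(diff_Taking2, unitroot_nsdiffs)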
c. Monthly sales from souvenirs.
souvenirs %>%
autoplot(Sales)+labs(y="$", title="Souvenir sales")
lambda2<-souvenirs%>%
features(Sales,features=guerrero)%>%
pull(lambda_guerrero)
lambda2
## [1] 0.002118221
SouvSales<-souvenirs%>% mutate(newS=Sales^lambda2)
SouvSales%>%
autoplot(box_cox(Sales, lambda2))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed Sales with $$\\lambda2$ =",
round(lambda2,2))))
#plot after Box-Cox transformation
SouvSales%>%autoplot(newS)
#KPSS test for stationarity
SouvSales%>% features(newS,unitroot_kpss)
## # A tibble: 1 x 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.79 0.01
#reject stationarity based on the p-value (0.01); differencing needed
SouvSales %>%
mutate(diff_Sales = difference(newS))%>%
features(diff_Sales, unitroot_kpss)
## # A tibble: 1 x 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0631 0.1
#1st difference ok
SouvSales <-SouvSales %>%
mutate(diff_Sales = difference(newS))
SouvSales %>%
features(diff_Sales, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
#diff ok look at ACF and PACF
SouvSales%>%
ACF((diff_Sales))%>%autoplot()
SouvSales%>%
PACF((diff_Sales))%>%
autoplot()
SouvSales %>%
features(diff_Sales, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
# this suggests taking 1 seasonal difference (monthly data, so lag 12)
SouvSales<-SouvSales %>%
mutate(diff_Sales2 = difference(diff_Sales,12))
SouvSales%>%
ACF((diff_Sales2))%>%
autoplot()
SouvSales%>%
PACF((diff_Sales2))%>%
autoplot()
SouvSales%>% autoplot(diff_Sales2)
## Warning: Removed 13 row(s) containing missing values (geom_path).
DISCUSSION:
This series was transformed with a Box-Cox transformation, first-differenced, and then seasonally differenced. While there is a marked improvement toward stationarity, the model still needs more work, as shown by the ACF and PACF spikes that exceed the blue significance bounds.
The underlying model may be more complex and needs more analysis.
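One possible follow-up (an assumption, not part of the original analysis) is to let ARIMA() search for a seasonal model on the Box-Cox transformed sales; fable applies the required differencing internally.
# automatic ARIMA search on the transformed sales (lambda2 computed above)
souvenirs %>%
  model(ARIMA(box_cox(Sales, lambda2))) %>%
  report()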
5. For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
aus_retail %>% distinct(Industry)
## # A tibble: 20 x 1
## Industry
## <chr>
## 1 Cafes, restaurants and catering services
## 2 Cafes, restaurants and takeaway food services
## 3 Clothing retailing
## 4 Clothing, footwear and personal accessory retailing
## 5 Department stores
## 6 Electrical and electronic goods retailing
## 7 Food retailing
## 8 Footwear and other personal accessory retailing
## 9 Furniture, floor coverings, houseware and textile goods retailing
## 10 Hardware, building and garden supplies retailing
## 11 Household goods retailing
## 12 Liquor retailing
## 13 Newspaper and book retailing
## 14 Other recreational goods retailing
## 15 Other retailing
## 16 Other retailing n.e.c.
## 17 Other specialised food retailing
## 18 Pharmaceutical, cosmetic and toiletry goods retailing
## 19 Supermarket and grocery stores
## 20 Takeaway food services
set.seed(1111111)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>% distinct(Industry)
## # A tibble: 1 x 1
## Industry
## <chr>
## 1 Newspaper and book retailing
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Western Australia Newspaper and book retailing A3349822A 1982 Apr 9.7
## 2 Western Australia Newspaper and book retailing A3349822A 1982 May 11
## 3 Western Australia Newspaper and book retailing A3349822A 1982 Jun 10.7
## 4 Western Australia Newspaper and book retailing A3349822A 1982 Jul 9
## 5 Western Australia Newspaper and book retailing A3349822A 1982 Aug 9.1
## 6 Western Australia Newspaper and book retailing A3349822A 1982 Sep 10
str(myseries)
## tbl_ts [441 x 5] (S3: tbl_ts/tbl_df/tbl/data.frame)
## $ State : chr [1:441] "Western Australia" "Western Australia" "Western Australia" "Western Australia" ...
## $ Industry : chr [1:441] "Newspaper and book retailing" "Newspaper and book retailing" "Newspaper and book retailing" "Newspaper and book retailing" ...
## $ Series ID: chr [1:441] "A3349822A" "A3349822A" "A3349822A" "A3349822A" ...
## $ Month : mth [1:441] 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep...
## $ Turnover : num [1:441] 9.7 11 10.7 9 9.1 10 7.7 8.4 11.8 7.4 ...
## - attr(*, "key")= tibble [1 x 3] (S3: tbl_df/tbl/data.frame)
## ..$ State : chr "Western Australia"
## ..$ Industry: chr "Newspaper and book retailing"
## ..$ .rows : list<int> [1:1]
## .. ..$ : int [1:441] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
## - attr(*, "index")= chr "Month"
## ..- attr(*, "ordered")= logi TRUE
## - attr(*, "index2")= chr "Month"
## - attr(*, "interval")= interval [1:1] 1M
## ..@ .regular: logi TRUE
autoplot(myseries,.vars=Turnover)
gg_season(myseries,y=Turnover)
gg_subseries(myseries,y=Turnover)
gg_lag(myseries,y=Turnover)
DISCUSSION:
A quick look at this time series shows an upward trend, possible yearly seasonality, and possibly a roughly 10-year cycle. Let's take a closer look at the seasonality. Note the downward trend starting around 2010.
There is yearly seasonality marked by a sharp increase in retail turnover each December.
The subseries plot also depicts the seasonal pattern (the December increase), and the downward trend from about 2010 is visible as well.
The lag plots show a strong positive relationship at lags 1 and 2. Note the decrease in the ACF as the lag increases, due to the trend, and the scalloped shape due to the yearly seasonality.
Let's transform because of the increasing variability...
lambda3<-myseries%>%
features(Turnover,features=guerrero)%>%
pull(lambda_guerrero)
lambda3
## [1] -0.03519692
myseries<-myseries%>% mutate(newT=Turnover^lambda3)
myseries%>%
autoplot(box_cox(Turnover, lambda3))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed Turnover with $$\\lambda3$ =",
round(lambda3,2))))
#plot after Box-Cox transformation
myseries%>%autoplot(newT)
#note: the plot inverts because lambda3 is negative, so Turnover^lambda3 decreases as Turnover increases
#KPSS test for stationarity
myseries%>% features(newT,unitroot_kpss)
## # A tibble: 1 x 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Western Australia Newspaper and book retailing 6.05 0.01
#reject stationarity based on the p-value (0.01); differencing needed
myseries %>%
mutate(diff_Turn = difference(newT))%>%
features(diff_Turn, unitroot_kpss)
## # A tibble: 1 x 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Western Australia Newspaper and book retailing 0.0187 0.1
#1st difference ok
myseries <-myseries %>%
mutate(diff_Turn = difference(newT))
myseries %>%
features(diff_Turn, unitroot_ndiffs)
## # A tibble: 1 x 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Western Australia Newspaper and book retailing 0
#diff ok look at ACF and PACF
myseries%>%
ACF((diff_Turn))%>%autoplot()
myseries%>%
PACF((diff_Turn))%>%
autoplot()
#ACF and PACF still show significant autocorrelations and partial autocorrelations; needs more work
myseries %>%
features(diff_Turn, unitroot_nsdiffs)
## # A tibble: 1 x 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Western Australia Newspaper and book retailing 1
# this suggests taking 1 seasonal difference (monthly data, so lag 12)
myseries<-myseries %>%
mutate(diff_Turn2 = difference(diff_Turn,12))
myseries%>%
ACF((diff_Turn2))%>%
autoplot()
myseries%>%
PACF((diff_Turn2))%>%
autoplot()
myseries%>% autoplot(diff_Turn2)
## Warning: Removed 13 row(s) containing missing values (geom_path).
DISCUSSION:
The series does appear to approach stationarity; however, the ACF and PACF plots still reveal significant spikes, so we may be leaving some information on the table.
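A compact way to revisit this (a sketch, not part of the original output) is gg_tsdisplay(), which shows the doubly differenced series alongside its ACF and PACF.
# series, ACF and PACF of the transformed, doubly differenced turnover
myseries %>%
  gg_tsdisplay(diff_Turn2, plot_type = "partial")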
Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim%>%
autoplot(y)
DISCUSSION:
Let's try AR(1) models with phi = 0.1 and phi = 0.9.
set.seed(6543221)
y2 <- ts(numeric(100))
y3 <- ts(numeric(100))
e2 <- rnorm(100)
e3 <- rnorm(100)
for(i in 2:100){
y2[i] <- 0.1*y2[i-1] + e2[i]
y3[i] <- 0.9*y3[i-1] + e3[i]
}
sim <- tsibble(idx = seq_len(100), y = y, y2 = y2, y3 = y3, index = idx)
plot1 <- sim %>% autoplot(y) + labs( title = "phi = 0.6")
plot2 <- sim %>% autoplot(y2) + labs( title = "phi = 0.1")
plot3 <- sim %>% autoplot(y3) + labs( title = "phi = 0.9")
plot1
plot2
plot3
DISCUSSION:
As phi approaches 0 the series looks more like white noise; as phi approaches 1 the series wanders more, resembling a random walk. All three series are stationary since |phi| < 1.
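To see the same effect in the autocorrelations, here is a small sketch (not part of the original output) comparing the ACFs of the two simulated series.
# ACFs of the AR(1) simulations: near-zero spikes for phi = 0.1,
# slowly decaying spikes for phi = 0.9
sim %>% ACF(y2) %>% autoplot() + labs(title = "ACF, phi = 0.1")
sim %>% ACF(y3) %>% autoplot() + labs(title = "ACF, phi = 0.9")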
set.seed(12345566)
y4 <- numeric(100)
e4 <- rnorm(100)
for(i in 2:100)
y4[i] <- 0.6*e4[i-1] + e4[i]
sim <- tsibble(idx = seq_len(100), y4 = y4, index = idx)
sim %>% autoplot(y4) + labs(title="MA(1) 0.6 ")
DISCUSSION: Changing theta:
Let's try MA(1) models with theta = 0.1 and theta = 0.9.
set.seed(55667788)
y5 <- ts(numeric(100))
e5 <- rnorm(100)
y6 <- ts(numeric(100))
e6 <- rnorm(100)
for(i in 2:100){
y5[i] <- 0.1*e5[i-1] + e5[i]
y6[i] <- 0.9*e6[i-1] + e6[i]
}
sim <- tsibble(idx = seq_len(100), y4 = y4, y5 = y5, y6 = y6, index = idx)
plot1a <- sim %>% autoplot(y4) + labs( title = "theta = 0.6")
plot2a <- sim %>% autoplot(y5) + labs( title = "theta = 0.1")
plot3a <- sim %>% autoplot(y6) + labs( title = "theta = 0.9")
plot1a
plot2a
plot3a
DISCUSSION:
Increasing theta increases the variance of the series (for an MA(1), Var(y) = (1 + theta^2) * sigma^2) and strengthens the lag-1 autocorrelation, but the series remains stationary for any theta.
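A quick sketch (not in the original) comparing the sample variances of the two simulated series illustrates this scale effect.
# sample variances of the simulated MA(1) series (sigma^2 = 1 here)
var(y5)  # theta = 0.1: theoretical variance about 1.01
var(y6)  # theta = 0.9: theoretical variance about 1.81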
set.seed(222222)
a7 <- ts(numeric(100))
e7 <- rnorm(100)
for(i in 2:100){
a7[i] <- 0.6*a7[i-1] + 0.6*e7[i-1] + e7[i]
}
sim7 <- tsibble(idx = seq_len(100), a7 = a7, index = idx)
plot7 <- sim7 %>% autoplot(a7) + labs( title = "ARMA(1,1) theta=.6, phi=.6")
plot7
set.seed(333333)
y8 <- ts(numeric(100))
e8 <- rnorm(100)
for(i in 3:100){
y8[i] <- -0.8*y8[i-1] + 0.3*y8[i-2] + e8[i]
}
sim8 <- tsibble(idx = seq_len(100), y8 = y8, index = idx)
plot8 <- sim8 %>% autoplot(y8) + labs( title = "AR(2) phi=-.8,.3")
plot8
g. Graph the latter two series and compare them.
plot7
plot8
DISCUSSION:
ARMA(1,1): a7[i] <- 0.6*a7[i-1] + 0.6*e7[i-1] + e7[i] (phi = 0.6, theta = 0.6)
AR(2): y8[i] <- -0.8*y8[i-1] + 0.3*y8[i-2] + e8[i]
The ARMA(1,1) series behaves like a stationary AR(1) with an additional moving-average error term.
The AR(2) series oscillates between positive and negative values because of the negative phi1 = -0.8, and its amplitude grows over time: with phi1 = -0.8 and phi2 = 0.3 the stationarity condition phi2 - phi1 < 1 is violated, so the process is non-stationary.
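A quick way to verify the non-stationarity claim (a sketch, not in the original) is to check the roots of the AR characteristic polynomial 1 - phi1*B - phi2*B^2; stationarity requires all roots to lie outside the unit circle.
# roots of 1 + 0.8*B - 0.3*B^2 (phi1 = -0.8, phi2 = 0.3)
abs(polyroot(c(1, 0.8, -0.3)))  # one modulus is below 1, so the AR(2) is not stationary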
head(aus_airpassengers)
## # A tsibble: 6 x 2 [1Y]
## Year Passengers
## <dbl> <dbl>
## 1 1970 7.32
## 2 1971 7.33
## 3 1972 7.80
## 4 1973 9.38
## 5 1974 10.7
## 6 1975 11.1
autoplot(aus_airpassengers)
## Plot variable not specified, automatically selected `.vars = Passengers`
fit<-aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
DISCUSSION: Plotting the residuals of ARIMA(0,2,1)
fit2<-aus_airpassengers %>%
model(arima021 =ARIMA(Passengers ~ pdq(0,2,1)))
fit2 %>%
select(arima021) %>%
gg_tsresiduals()
augment(fit2) %>% features(.innov, ljung_box, lag = 10)
## # A tibble: 1 x 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 arima021 6.70 0.754
DISCUSSION:
The residual ACF and histogram look good, and the Ljung-Box test is not significant, so the residuals resemble white noise.
fit2 %>% forecast(h=10) %>%
autoplot(aus_airpassengers)
DISCUSSION: ARIMA(0,2,1) in backshift notation: \((1 - B)^2 y_t = (1 + \theta_1 B)\varepsilon_t\)
fit3<-aus_airpassengers %>%
model(arima010 =ARIMA(Passengers ~ 1 + pdq(0,1,0)))
report(fit3)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
fit3 %>% forecast(h=10) %>%
autoplot(aus_airpassengers)
#ARIMA(2,1,2) with drift
fit4<-aus_airpassengers %>%
model(arima212 =ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(fit4)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit4 %>% forecast(h=10) %>%
autoplot(aus_airpassengers)
DISCUSSION:
ARIMA(2,1,2) without constant
#ARIMA(2,1,2)
fit5<-aus_airpassengers %>%
model(arima212n =ARIMA(Passengers ~ 0+pdq(2,1,2)))
## Warning: 1 error encountered for arima212n
## [1] non-stationary AR part from CSS
report(fit5)
## Series: Passengers
## Model: NULL model
## NULL model
fit5 %>% forecast(h=10) %>%
autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 row(s) containing missing values (geom_path).
DISCUSSION:
When the constant is removed, estimation fails with "1 error encountered for arima212n: non-stationary AR part from CSS", so a NULL model is returned and the forecasts are empty.
#ARIMA(0,2,1) with a constant
fit6<-aus_airpassengers %>%
model(arima021 =ARIMA(Passengers ~ 1 +pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit6)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
fit6 %>% forecast(h=10) %>%
autoplot(aus_airpassengers)
DISCUSSION:
The forecasts differ, but all trend upward; the ARIMA(0,2,1) with a constant shows the steepest increase because a constant with d = 2 induces a quadratic trend.
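To put the candidates side by side (a sketch, not part of the original output; the model names are illustrative), they can be fitted in a single mable and ranked by AICc.
# fit the candidate models together and rank them by AICc
aus_airpassengers %>%
  model(
    arima021  = ARIMA(Passengers ~ pdq(0, 2, 1)),
    arima010c = ARIMA(Passengers ~ 1 + pdq(0, 1, 0)),
    arima212c = ARIMA(Passengers ~ 1 + pdq(2, 1, 2)),
    arima021c = ARIMA(Passengers ~ 1 + pdq(0, 2, 1))
  ) %>%
  glance() %>%
  arrange(AICc) %>%
  select(.model, AIC, AICc, BIC)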
US<-global_economy %>%
filter(Country =="United States")
US %>%
autoplot(GDP) +
labs(y = "GDP", title = "United States GDP")
DISCUSSION:
From the plot, we see no strongly increasing or decreasing variance, so a Box-Cox transformation may not be strictly necessary. The curvature indicates a changing slope, however, so let's run the Guerrero/Box-Cox selection anyway.
lambda_U<-US%>%
features(GDP,features=guerrero)%>%
pull(lambda_guerrero)
lambda_U
## [1] 0.2819443
US<-US%>% mutate(GDP_U=GDP^lambda_U)
US%>%
autoplot(box_cox(GDP, lambda_U))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed Sales with $$\\lambda_U$ =",
round(lambda_U,2))))
#plot after Box-Cox transformation
US%>%autoplot(GDP_U)
DISCUSSION:
The Box-Cox transformation removed the curvature.
fit_U<-
US %>%
model(ARIMA(GDP_U))
report(fit_U)
## Series: GDP_U
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 33.3208
## s.e. 0.1198 2.6798
##
## sigma^2 estimated as 435.6: log likelihood=-253.16
## AIC=512.32 AICc=512.77 BIC=518.45
US%>%
features(GDP_U,unitroot_kpss)
## # A tibble: 1 x 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 United States 1.55 0.01
US%>%
features(GDP_U,unitroot_ndiffs)
## # A tibble: 1 x 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
US%>%
features(GDP_U,unitroot_nsdiffs)
## # A tibble: 1 x 2
## Country nsdiffs
## <fct> <int>
## 1 United States 0
#the KPSS test and ndiffs indicate the series is not stationary and needs one first difference (no seasonal difference)
USFit<-US%>%
model(arima110= ARIMA(GDP_U ~pdq(1,1,0)),
arima120= ARIMA(GDP_U ~pdq(1,2,0)),
arima111= ARIMA(GDP_U ~pdq(1,1,1)),
stepwise = ARIMA(GDP_U),
search = ARIMA(GDP_U, stepwise=FALSE))
USFit %>%
pivot_longer(!Country, names_to ="Model name", values_to ="Orders")
## # A mable: 5 x 3
## # Key: Country, Model name [5]
## Country `Model name` Orders
## <fct> <chr> <model>
## 1 United States arima110 <ARIMA(1,1,0) w/ drift>
## 2 United States arima120 <ARIMA(1,2,0)>
## 3 United States arima111 <ARIMA(1,1,1) w/ drift>
## 4 United States stepwise <ARIMA(1,1,0) w/ drift>
## 5 United States search <ARIMA(1,1,0) w/ drift>
glance(USFit) %>%
arrange (AICc) %>%
select(.model:BIC)
## # A tibble: 5 x 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima110 436. -253. 512. 513. 518.
## 2 stepwise 436. -253. 512. 513. 518.
## 3 search 436. -253. 512. 513. 518.
## 4 arima120 539. -255. 514. 514. 518.
## 5 arima111 444. -253. 514. 515. 522.
DISCUSSION:
The model with the lowest AICc is the ARIMA(1,1,0) with drift, which matches the automatic selection.
DISCUSSION:
Let’s look at the residuals for the best candidate model ARIMA(1,1,0)
US%>%
gg_tsdisplay(difference(GDP_U), plot_type='partial')
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
augment(USFit)%>%
filter(.model=='arima110')%>%
features(.innov, ljung_box, lag=10, dof=3)
## # A tibble: 1 x 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States arima110 3.81 0.801
DISCUSSION: I'm not particularly happy with the plot of the differenced series above; however, the Ljung-Box test on the ARIMA(1,1,0) residuals is fine (p = 0.80), so the residuals are consistent with white noise.
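For completeness (a sketch, not part of the original output), the residual diagnostics of the selected model can be displayed directly.
# residual plot, ACF and histogram for the ARIMA(1,1,0) w/ drift model
USFit %>%
  select(arima110) %>%
  gg_tsresiduals()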
USFit %>%
forecast(h=5) %>%
filter(.model=='arima110') %>%
autoplot(US)
fit_exp<-US %>%
model(ETS(GDP))
fc<- fit_exp %>%
forecast(h=10) %>%
autoplot(US)
fc
report(fit_exp)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
# recall the AICc values for the ARIMA candidates
glance(USFit) %>%
arrange (AICc) %>%
select(.model:BIC)
## # A tibble: 5 x 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima110 436. -253. 512. 513. 518.
## 2 stepwise 436. -253. 512. 513. 518.
## 3 search 436. -253. 512. 513. 518.
## 4 arima120 539. -255. 514. 514. 518.
## 5 arima111 444. -253. 514. 515. 522.
DISCUSSION:
AICc for the ETS(M,A,N) model is 3191.9 versus about 513 for the ARIMA(1,1,0), so the ARIMA appears preferable. Note, however, that AICc values are not comparable between ETS and ARIMA models (and the ARIMA here was fitted to the Box-Cox transformed series), so a test-set or cross-validation comparison would be a fairer check.
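A fairer check (a sketch under the assumption of a simple hold-out split; not part of the original analysis) is to fit both models to a training period and compare test-set accuracy; lambda_U is the Guerrero value computed above.
# hold out 2011-2017, fit both models on the earlier years, compare accuracy
train <- US %>% filter(Year <= 2010)
fc_cmp <- train %>%
  model(
    arima110 = ARIMA(box_cox(GDP, lambda_U) ~ pdq(1, 1, 0)),
    ets      = ETS(GDP)
  ) %>%
  forecast(h = 7)
fc_cmp %>% accuracy(US)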