All three ACF plots are consistent with white noise: the autocorrelations stay well within the confidence bounds (blue dotted lines). As the sample size grows (36 -> 360 -> 1000), the sample autocorrelations cluster more tightly around 0 and the bounds narrow.
The critical values sit at different distances from zero because they are computed as +/- 1.96/sqrt(T), where T is the length of the series, so the bounds shrink as T grows. In addition, the smaller the dataset, the more influence any single unusual observation has, which is why the smaller samples show larger deviations from zero. With more data we can be more confident that the autocorrelations of a white noise series will stay inside the bounds.
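As a quick check on this intuition, the bounds can be computed directly (a minimal sketch, using the three sample sizes above):

# The ACF significance bounds are +/- 1.96 / sqrt(T); they tighten as T grows.
tibble(T = c(36, 360, 1000)) %>%
  mutate(bound = 1.96 / sqrt(T))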
The AMZN closing price series is not stationary: the ACF decays very slowly, which is characteristic of a trended series, and the PACF shows a single large spike at lag 1, consistent with a random-walk-like (non-stationary) series.
amzn <- gafa_stock %>%
  filter(Symbol == 'AMZN')

amzn %>%
  autoplot(Close) + labs(title = 'AMZN Stock Closing Price')

amzn %>%
  ACF(Close) %>%
  autoplot() + labs(title = 'AMZN ACF')
## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.
amzn %>%
  PACF(Close) %>%
  autoplot() + labs(title = 'AMZN PACF')
## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.
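As a complementary check on the visual diagnosis, a KPSS unit-root test can be run (a minimal sketch using the unitroot_kpss feature from feasts):

# KPSS test: a small kpss_pvalue rejects the null of stationarity,
# supporting the conclusion that the closing prices are non-stationary.
amzn %>%
  features(Close, unitroot_kpss)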
# Helper: estimate the Box-Cox lambda for a given column using the
# Guerrero method.
get_lambda <- function(data, feature) {
  data %>%
    features(!!sym(feature), features = guerrero) %>%
    pull(lambda_guerrero)
}
Turkish GDP
Before transformation, the ACF plot of Turkish GDP shows the slow decay typical of a trended series. After applying a Box-Cox transformation and first-order differencing, the series looks stationary (a unit-root check follows the plots).
turkish_gdp <- global_economy %>%
  filter(Country == 'Turkey') %>%
  select(GDP)

turkish_gdp %>%
  ACF(GDP) %>%
  autoplot() + labs(title = 'ACF Turkish GDP')

turkish_gdp %>%
  mutate(diff_gdp = difference(box_cox(GDP, get_lambda(turkish_gdp, 'GDP')))) %>%
  ACF(diff_gdp) %>%
  autoplot() + labs(title = 'ACF First Differencing')
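As a check on the choice of a single difference, we can ask the KPSS-based unit-root feature how many first differences it suggests (a sketch, reusing get_lambda from above):

# Number of first differences suggested for the transformed series.
turkish_gdp %>%
  features(box_cox(GDP, get_lambda(turkish_gdp, 'GDP')), unitroot_ndiffs)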
Tasmanian Takings
Before differencing, the ACF plot of Takings shows a clear seasonal pattern. Since the data has a quarterly seasonal component, we apply a seasonal difference at lag 4. Combined with a Box-Cox transformation, this yields an ACF plot consistent with stationary data.
tas_taking <- aus_accommodation %>%
  filter(State == "Tasmania") %>%
  select(Takings)

tas_taking %>%
  ACF(Takings) %>%
  autoplot() + labs(title = 'ACF Tasmanian Takings')

tas_taking %>%
  model(classical_decomposition(Takings)) %>%
  components() %>%
  autoplot(seasonal)

tas_taking %>%
  mutate(dif_take = difference(box_cox(Takings, get_lambda(tas_taking, 'Takings')), 4)) %>%
  ACF(dif_take) %>%
  autoplot() + labs(title = 'ACF Seasonal Differencing')
Monthly Sales
The initial ACF plot shows a strong seasonal pattern in the sales data. Since the data has a monthly seasonal component, we apply a seasonal difference at lag 12. This, along with a Box-Cox transformation, produces an ACF plot consistent with stationary data.
souvenirs %>%
  ACF(Sales) %>%
  autoplot() + labs(title = 'ACF Monthly Sales')

souvenirs %>%
  model(classical_decomposition(Sales)) %>%
  components() %>%
  autoplot(seasonal)

souvenirs %>%
  ACF(difference(box_cox(Sales, get_lambda(souvenirs, 'Sales')), 12)) %>%
  autoplot() + labs(title = 'ACF Seasonal Differencing')
Since the retail data is monthly, we first apply a seasonal difference at lag 12. This removes the seasonality but is not enough to make the data stationary, as a clear trend remains in the ACF. After a further first-order difference, the series looks stationary (see the unit-root sketch after the plots below).
set.seed(8009)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries %>%
  ACF(Turnover) %>%
  autoplot() + labs(title = 'ACF Turnover')

first_dif <- myseries %>%
  mutate(first_dif = difference(box_cox(Turnover, get_lambda(myseries, 'Turnover')), 12))

first_dif %>%
  ACF(first_dif) %>%
  autoplot() + labs(title = 'ACF First Differencing')

first_dif %>%
  ACF(difference(first_dif)) %>%
  autoplot() + labs(title = 'ACF Second Differencing')
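As promised above, a sketch of the unit-root check (reusing get_lambda; unitroot_nsdiffs suggests the number of seasonal differences and unitroot_ndiffs the number of first differences):

# Differences suggested for the Box-Cox-transformed turnover series.
myseries %>%
  features(box_cox(Turnover, get_lambda(myseries, 'Turnover')),
           list(unitroot_nsdiffs, unitroot_ndiffs))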
# Simulate 100 observations from an AR(1) process with parameter phi,
# starting from y[1] = 0 and standard normal errors.
get_AR1 <- function(phi) {
  y <- numeric(100)
  e <- rnorm(100)
  for (i in 2:100)
    y[i] <- phi * y[i - 1] + e[i]
  tsibble(idx = seq_len(100), y = y, index = idx)
}
# AR(1) with phi1 = 0.6 (same construction as get_AR1 above)
y <- numeric(100)
e <- rnorm(100)
for (i in 2:100)
  y[i] <- 0.6 * y[i - 1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot(y)
When phi1 is small, the series is dominated by the noise term and looks close to white noise. As phi1 approaches 1, successive values are more strongly correlated, so the series becomes smoother and starts to wander like a random walk (which is exactly the phi1 = 1 case).
get_AR1(0.1) %>%
  autoplot(y)

get_AR1(1) %>%
  autoplot(y)
# Simulate 100 observations from an MA(1) process with parameter theta
# and standard normal errors.
get_MA1 <- function(theta) {
  y <- numeric(100)
  e <- rnorm(100)
  for (i in 2:100)
    y[i] <- theta * e[i - 1] + e[i]
  tsibble(idx = seq_len(100), y = y, index = idx)
}
As theta gets smaller, the previous error term e[i-1] carries less weight, so the series looks closer to pure white noise; as theta grows, adjacent values become more correlated and the series appears smoother.
set.seed(12345)

get_MA1(0.1) %>%
  autoplot(y)

get_MA1(0.5) %>%
  autoplot(y)

get_MA1(1) %>%
  autoplot(y)
# Simulate 100 observations from an ARMA(1,1) process with AR parameter
# phi and MA parameter theta.
get_arima <- function(phi, theta) {
  y <- numeric(100)
  e <- rnorm(100)
  for (i in 2:100)
    y[i] <- phi * y[i - 1] + theta * e[i - 1] + e[i]
  tsibble(idx = seq_len(100), y = y, index = idx)
}
arima <- get_arima(0.6, 0.6)
head(arima)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 0.323
## 3 3 -0.241
## 4 4 0.791
## 5 5 1.08
## 6 6 0.614
# Simulate 100 observations from an AR(2) process with parameters
# phi1 and phi2 (the first two values are left at 0).
get_AR2 <- function(phi1, phi2) {
  y <- numeric(100)
  e <- rnorm(100)
  for (i in 3:100)
    y[i] <- phi1 * y[i - 1] + phi2 * y[i - 2] + e[i]
  tsibble(idx = seq_len(100), y = y, index = idx)
}
AR2 <- get_AR2(-0.8, 0.3)
head(AR2)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 0
## 3 3 0.284
## 4 4 -1.23
## 5 5 0.452
## 6 6 0.0982
The AR(2) series has a clear pattern: it oscillates with exponentially increasing amplitude, because the chosen parameters violate the AR(2) stationarity conditions. The ARMA(1,1) series (from get_arima) shows no such pattern; it remains stationary and retains the noise of the underlying errors. A quick check of the stationarity conditions follows the plots.
arima %>%
  autoplot(y)

AR2 %>%
  autoplot(y)
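As noted, the textbook stationarity conditions for an AR(2) process require phi1 + phi2 < 1, phi2 - phi1 < 1, and |phi2| < 1. A quick check for the parameters used here:

# All three conditions must hold for stationarity.
phi1 <- -0.8
phi2 <- 0.3
c(phi1 + phi2 < 1, phi2 - phi1 < 1, abs(phi2) < 1)
# phi2 - phi1 = 1.1 > 1: the second condition fails, so the process is
# non-stationary, which explains the exploding oscillation.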
arima_model <- aus_airpassengers %>%
  model(ARIMA(Passengers))
ARIMA(0,2,1) was selected automatically. The residuals look like white noise and are centred around 0.
arima_model %>%
  report()
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
arima_model %>%
  gg_tsresiduals()

arima_model %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + labs(title = 'ARIMA(0,2,1)')
In backshift notation, the fitted ARIMA(0,2,1) model (which has no constant) is (1 - B)^2 y_t = (1 + θ_1 B) ε_t, with θ_1 = -0.8963.
aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,1,0))) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + labs(title = 'ARIMA(0,1,0)')
Removing the constant from the ARIMA(2,1,2) specification produces a null model (fable cannot estimate it), so the constant is kept in the fit below.
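A sketch of how the no-constant variant can be reproduced (the 0 in the formula forces the constant out; in this case fable fails to estimate the model and returns a NULL model):

aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2))) %>%
  report()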
aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2,1,2) + 1)) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + labs(title = 'ARIMA(2,1,2)')
Introducing the constant makes the forecasts curve upward more than those of the original model.
aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,1,2) + 1)) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + labs(title = 'ARIMA(0,1,2)+c')
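For comparison, a sketch of the d = 2 variant with a constant. With double differencing, a constant induces a quadratic trend in the forecasts (fable warns about this), which is where pronounced curvature comes from:

aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1))) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + labs(title = 'ARIMA(0,2,1)+c')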
usa_gdp <- global_economy %>%
  filter(Country == 'United States') %>%
  select(GDP)

usa_gdp %>%
  autoplot(GDP)
We apply a Box-Cox transformation to GDP to make the trend more linear.
usa_gdp <- usa_gdp %>%
  mutate(trans_gdp = box_cox(GDP, get_lambda(usa_gdp, 'GDP')))

usa_gdp %>%
  autoplot(trans_gdp)
The automatically selected model is ARIMA(1,1,0) with drift.
arima.usa_gdp <- usa_gdp %>%
  model(ARIMA(trans_gdp))

arima.usa_gdp %>%
  report()
## Series: trans_gdp
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
arima.models <- usa_gdp %>%
  model(
    "ARIMA(000)" = ARIMA(trans_gdp ~ pdq(0,0,0)),
    "ARIMA(212)" = ARIMA(trans_gdp ~ pdq(2,1,2)),
    "ARIMA(110)" = ARIMA(trans_gdp ~ pdq(1,1,0)),
    "ARIMA(021)" = ARIMA(trans_gdp ~ pdq(0,2,1))
  )
Based on AICc, ARIMA(021) is the best of the four models (note that information criteria are only strictly comparable between models with the same order of differencing, so this comparison is indicative).
Its residuals are mostly centred around 0 with a few spikes; the histogram is slightly skewed, and the residual ACF suggests the spikes are consistent with white noise.
arima.models %>%
  report()
## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 4 x 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(000) 16049564. -563. 1130. 1130. 1134. <cpl [0]> <cpl [0]>
## 2 ARIMA(212) 5734. -325. 662. 664. 674. <cpl [2]> <cpl [2]>
## 3 ARIMA(110) 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 4 ARIMA(021) 6020. -323. 650. 650. 654. <cpl [0]> <cpl [1]>
arima.models %>%
  select('ARIMA(021)') %>%
  gg_tsresiduals()
The forecasts look reasonable: they continue the historical trend.
ARIMA.021 <- usa_gdp %>%
  model(ARIMA(trans_gdp ~ pdq(0,2,1)))

ARIMA.021 %>%
  forecast(h = 10) %>%
  autoplot(usa_gdp) + labs(title = 'USA GDP (Box-Cox) forecast')
The ETS forecasts have much wider prediction intervals than those of the ARIMA(021) model: there is far greater forecast uncertainty.
ETS_model <- usa_gdp %>%
  model(ETS(GDP))

ETS_model %>%
  forecast(h = 10) %>%
  autoplot(usa_gdp) + labs(title = 'USA GDP Forecast (ETS)')
At first glance the ARIMA(021) model's RMSE looks far smaller than the ETS model's. However, the two values are not directly comparable: the ETS model was fitted to raw GDP, while the ARIMA model was fitted to the Box-Cox-transformed series, so its errors are on the transformed scale. A sketch of a fairer same-scale comparison follows the output below.
ETS_model %>%
  accuracy() %>%
  select(RMSE)
## # A tibble: 1 x 1
## RMSE
## <dbl>
## 1 166906260796.
ARIMA.021 %>%
  accuracy() %>%
  select(RMSE)
## # A tibble: 1 x 1
## RMSE
## <dbl>
## 1 75.6
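As flagged above, a sketch of a same-scale comparison: specifying the Box-Cox transformation inside the model formula lets fable back-transform the fitted values automatically, so both RMSEs are reported on the original GDP scale (reusing get_lambda from above):

usa_gdp %>%
  model(
    ets   = ETS(GDP),
    arima = ARIMA(box_cox(GDP, get_lambda(usa_gdp, 'GDP')) ~ pdq(0,2,1))
  ) %>%
  accuracy() %>%
  select(.model, RMSE)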