Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.
A. Explain the differences among these figures. Do they all indicate that the data are white noise?
I believe the main difference among the figures is the length of each time series: the longer the series, the narrower the ACF critical bounds become. All three graphs indicate the data are white noise, since the spikes remain within the bounds.
knitr::include_graphics('D:/CUNY SPS/Spring 2022/DATA 624/9.11.JPG')
Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
B. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
According to the text, the critical values are defined by the formula ±2/√T, which depends on the series length T. In the plots, the width of the bounds decreases as the length of the time series increases, so the critical values sit at different distances from zero in each case. The autocorrelations themselves differ because each figure is computed from a different random sample; shorter series have more sampling variation, so their sample autocorrelations stray further from zero even though every series is white noise.
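As a quick arithmetic check, we can compute the bound 2/√T for each of the three series lengths and confirm the narrowing:
# critical ACF bounds ±2/sqrt(T) for series of length 36, 360 and 1,000
round(2 / sqrt(c(36, 360, 1000)), 3)
## [1] 0.333 0.105 0.063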
I feel that the autoplot of the AMZN closing stock price shows a clear trend; the plot is far from horizontal. Looking at the ACF graph, the autocorrelations of the closing price extend well beyond the bounds, meaning the data are not white noise. If the data were stationary, the spikes would stay almost entirely within the bounds.
gafa_stock %>%
filter(Symbol == 'AMZN') %>%
autoplot(Close) +
labs(title='Amazon Closing Stock Prices')
gafa_stock %>%
filter(Symbol == 'AMZN') %>%
gg_tsdisplay(Close, plot_type = 'partial')
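To back up the visual impression, a Ljung-Box test should strongly reject white noise for the closing prices. A quick sketch (the choice of lag = 10 here is my own, not from the text):
# Ljung-Box test on the AMZN closing prices; the null hypothesis is white noise
amzn_close <- gafa_stock %>% filter(Symbol == 'AMZN') %>% pull(Close)
Box.test(amzn_close, lag = 10, type = 'Ljung-Box')  # expect a tiny p-value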
# difference the AMZN closing prices only (gafa_stock contains four symbols)
amzn_diff <- diff(gafa_stock %>% filter(Symbol == 'AMZN') %>% pull(Close))
ggtsdisplay(amzn_diff)
A. Turkish GDP from global_economy.
TurkishGDP <- global_economy %>% filter(Country == "Turkey")
ggtsdisplay(TurkishGDP$GDP)
X_BoxCox <- BoxCox(TurkishGDP$GDP, lambda = BoxCox.lambda(TurkishGDP$GDP))
ndiffs(X_BoxCox)
## [1] 1
X <- diff(X_BoxCox)
ggtsdisplay(X)
Tsm1 <- aus_accommodation %>% filter(State == "Tasmania")
ggtsdisplay(Tsm1$Takings)
T_BxCx <- BoxCox(Tsm1$Takings, lambda = BoxCox.lambda(Tsm1$Takings))
ndiffs(T_BxCx)
## [1] 1
Tsmdif <- diff(T_BxCx)
ggtsdisplay(Tsmdif)
Sv1 <- souvenirs
ggtsdisplay(Sv1$Sales)
Sv_BxCx <- BoxCox(Sv1$Sales, lambda = BoxCox.lambda(Sv1$Sales))
ndiffs(Sv_BxCx)
## [1] 1
Svdif <- diff(Sv_BxCx)
ggtsdisplay(Svdif)
set.seed(888)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
ggtsdisplay(myseries$Turnover)
Given our familiarity with this series, the variance that increases with the level of the series, and the strong autocorrelation, it is evident that the series is non-stationary and needs a Box-Cox transformation.
trs_series <- myseries
trs_series$Turnover <- box_cox(myseries$Turnover, BoxCox.lambda(myseries$Turnover))
ggtsdisplay(trs_series$Turnover)
Now we should examine the KPSS statistic of the transformed series and the order of differencing it requires.
unitroot_kpss(trs_series$Turnover)
## kpss_stat kpss_pvalue
## 6.128458 0.010000
Since we note a high KPSS statistic (the p-value is capped at 0.01), we reject the null hypothesis of stationarity: the series is NOT stationary.
unitroot_ndiffs(trs_series$Turnover)
## ndiffs
## 1
ggtsdisplay(diff(trs_series$Turnover))
unitroot_kpss(diff(trs_series$Turnover))
## kpss_stat kpss_pvalue
## 0.01054472 0.10000000
So we see that after first-order differencing, the series produces a KPSS statistic that is small (p-value 0.1), in the range of acceptance for a stationary series.
Calculating the first-order difference of a time series is useful for converting a non-stationary series into a stationary one. It is calculated by replacing the i-th data point Y_i with Y'_i = Y_i − Y_{i−1}.
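In R, diff() performs exactly this lagged subtraction; a minimal illustration on a made-up toy series:
y_demo <- c(2, 5, 4, 9)                # hypothetical toy series
diff(y_demo)                           # 3 -1  5
y_demo[-1] - y_demo[-length(y_demo)]   # same values, computed by hand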
A. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.
y <- numeric(100)   # simulated series; y[1] stays 0 as the starting value
e <- rnorm(100)     # white noise errors with sigma^2 = 1
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]   # AR(1) recursion with phi1 = 0.6
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
B. Produce a time plot for the series. How does the plot change as you change ϕ1?
#ϕ=0.6, σ^2 = 1, y=0
sim %>% autoplot(y)
It appears that increasing ϕ1 introduces trend-like behaviour and reduces the apparent randomness.
#increase ϕ --> ϕ=1
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]
sim_h <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_h %>% autoplot(y)
#decrease ϕ --> ϕ=0
for(i in 2:100)
y[i] <- 0*y[i-1] + e[i]
sim_l <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_l %>% autoplot(y)
We can see that reducing ϕ reduces the magnitude of the series; with ϕ = 0 the model collapses to pure white noise.
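This matches the theory: a stationary AR(1) with σ^2 = 1 has variance 1/(1 − ϕ1^2), so smaller ϕ1 means a smaller-magnitude series, while ϕ1 = 1 is a random walk whose variance grows without bound. A quick check (the value 0.95 is my own illustration):
phi <- c(0, 0.6, 0.95)
round(1 / (1 - phi^2), 2)   # stationary variance when sigma^2 = 1
## [1]  1.00  1.56 10.26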
C. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1
#MA(1) model with θ = 0.6 and σ^2 = 1
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
sim_m1 <- tsibble(idx = seq_len(100), y = y, index = idx)
#θ=0.6, σ^2 = 1
sim_m1 %>% autoplot(y)
D. Produce a time plot for the series. How does the plot change as you change θ1?
#decrease θ --> θ=0
for(i in 2:100)
y[i] <- 0*e[i-1] + e[i]
sim_m1_l <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_m1_l %>% autoplot(y)
#increase θ --> θ=0.9
for(i in 2:100)
y[i] <- 0.9*e[i-1] + e[i]
sim_m1_h <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_m1_h %>% autoplot(y)
I think that reducing θ reduces the magnitude and increases the frequency of the spikes, since more weight falls on the most recent error. Increasing θ to 0.9, on the other hand, has no noticeable effect on the magnitude, and the plot closely resembles the original MA(1) plot.
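This is consistent with the theory: an MA(1) process has a theoretical lag-1 autocorrelation of θ1/(1 + θ1^2), which changes very little between θ = 0.6 and θ = 0.9:
theta <- c(0, 0.6, 0.9)
round(theta / (1 + theta^2), 3)   # theoretical lag-1 autocorrelation
## [1] 0.000 0.441 0.497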
E. Generate data from an ARMA(1,1) model with ϕ1=0.6 , θ1=0.6 and σ2=1
#ARMA(1,1) model with ϕ = 0.6, θ = 0.6 and σ^2 = 1
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i] #combo AR-MA models
sim_ar <- tsibble(idx = seq_len(100), y = y, index = idx)
head(sim_ar)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 -0.197
## 3 3 0.0422
## 4 4 -1.80
## 5 5 -2.32
## 6 6 -3.71
#AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3 and σ^2 = 1
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i] #pure AR(2): two autoregressive lags
sim_ardos <- tsibble(idx = seq_len(100), y = y, index = idx)
head(sim_ardos)
## # A tsibble: 6 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 -0.197
## 3 3 -0.120
## 4 4 -1.62
## 5 5 1.01
## 6 6 -3.46
sim_ardos %>%
gg_tsdisplay(y, plot_type = 'partial')
sim_ar %>%
gg_tsdisplay(y, plot_type = 'partial')
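Comparing the two: the ARMA(1,1) series fluctuates around zero and looks stationary, while the AR(2) series oscillates with growing amplitude because ϕ1 = −0.8, ϕ2 = 0.3 violate the stationarity conditions. As a sketch, we can verify this by checking the roots of the AR polynomial 1 − ϕ1 B − ϕ2 B^2, which must all lie outside the unit circle for stationarity:
# roots of 1 + 0.8 B - 0.3 B^2 (i.e. phi1 = -0.8, phi2 = 0.3)
abs(polyroot(c(1, 0.8, -0.3)))   # moduli approx. 0.927 and 3.594; one root lies
# inside the unit circle, so the AR(2) process is non-stationary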
The ARIMA() function selects an ARIMA(0,2,1) model, with AIC = 198.04 and AICc = 198.32.
autoplot(aus_airpassengers)
## Plot variable not specified, automatically selected `.vars = Passengers`
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
It is helpful to view our residuals:
fit %>% gg_tsresiduals()
The residuals suggest white noise. Let's see the forecast now:
fit %>% forecast(h=10) %>% autoplot(aus_airpassengers)
b. Write the model in terms of the backshift operator. ARIMA(0,2,1) in terms of the backshift operator: (1 − B)^2 y_t = (1 + θ1 B) ε_t, where θ1 = −0.8963.
c.Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit2 <- Arima(aus_airpassengers$Passengers, order = c(0,1,0), include.drift = TRUE)
fit2 %>% forecast(h=10) %>% autoplot()
When we compare to part (a), the ACF lags look slightly more positive, and the AIC is a bit higher: AIC = 200.31 and AICc = 200.59, versus 198.04 and 198.32. The residuals still appear to be white noise, and the prediction intervals for part (a) are slightly wider than the ones in this plot.
fit2
## Series: aus_airpassengers$Passengers
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 1.4191
## s.e. 0.3014
##
## sigma^2 = 4.271: log likelihood = -98.16
## AIC=200.31 AICc=200.59 BIC=203.97
checkresiduals(fit2$residuals)
fit3 <- Arima(aus_airpassengers$Passengers, order = c(2,1,2), include.drift = TRUE)
fit3 %>% forecast(h=10) %>% autoplot()
fit3
## Series: aus_airpassengers$Passengers
## ARIMA(2,1,2) with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 drift
## -0.5518 -0.7327 0.5895 1.0000 1.4101
## s.e. 0.1684 0.1224 0.0916 0.1008 0.3162
##
## sigma^2 = 4.031: log likelihood = -96.23
## AIC=204.46 AICc=206.61 BIC=215.43
AIC = 204.46, AICc = 206.61
checkresiduals(fit3$residuals)
d. Our plot with the constant not involved: removing the drift from the ARIMA(2,1,2) model makes the fit throw an error (likely a non-stationary AR part once the drift is dropped), so the code is left commented out.
#fit4 <- Arima(aus_airpassengers$Passengers, order = c(2,1,2), include.drift = FALSE)
#fit4 %>% forecast(h=10) %>% autoplot()
# e. With d = 2, forecast::Arima() drops the constant (a constant combined with
# two differences would imply a quadratic trend), so this fits plain ARIMA(0,2,1)
fit4 <- Arima(aus_airpassengers$Passengers, order = c(0,2,1), include.constant = TRUE)
fit4 %>% forecast(h=10) %>% autoplot()
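For reference, a fable sketch of the same part-(e) model makes the issue explicit: specifying a constant together with d = 2 induces a quadratic polynomial trend, which fable warns is generally discouraged:
# fable version of ARIMA(0,2,1) with a constant; expect a warning about
# inducing a higher-order polynomial trend
aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1))) %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)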
us_gdp <- global_economy %>% filter(Country == "United States")
lambda <- BoxCox.lambda(us_gdp$GDP)
us_gdp <- us_gdp %>% mutate(boxGDP = box_cox(GDP, lambda = lambda))
us_gdp %>% gg_tsdisplay(boxGDP, plot_type = "partial") +
labs(title = glue("US GDP, Box-Cox lambda: {lambda}"))
us_gdp_model <- us_gdp %>% model(ARIMA(boxGDP))
report(us_gdp_model)
## Series: boxGDP
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.2764
## s.e. 0.1198 9.5123
##
## sigma^2 estimated as 5488: log likelihood=-325.37
## AIC=656.74 AICc=657.19 BIC=662.87
# annual data, so the seasonal PDQ() specification has no effect
# (the fitted model below is the non-seasonal ARIMA(1,1,1))
us_gdp_model <- us_gdp %>% model(ARIMA(boxGDP ~ 0 + pdq(1,1,1) + PDQ(0,1,1)))
report(us_gdp_model)
## Series: boxGDP
## Model: ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.9905 -0.6583
## s.e. 0.0126 0.1460
##
## sigma^2 estimated as 6094: log likelihood=-329.45
## AIC=664.9 AICc=665.35 BIC=671.02
us_gdp_model <- us_gdp %>% model(ARIMA(boxGDP ~ 1 + pdq(1,0,1) + PDQ(0,0,1)))
report(us_gdp_model)
## Series: boxGDP
## Model: ARIMA(1,0,1) w/ mean
##
## Coefficients:
## ar1 ma1 constant
## 1 0.7673 0.0192
## s.e. 0 0.0668 0.0342
##
## sigma^2 estimated as 23800: log likelihood=-373.96
## AIC=755.92 AICc=756.67 BIC=764.16
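A tidier way to compare the candidates is to fit them in one model() call and rank them by AICc (a sketch; the model labels are my own):
us_gdp %>%
model(auto = ARIMA(boxGDP),
arima111 = ARIMA(boxGDP ~ 0 + pdq(1,1,1)),
arima101 = ARIMA(boxGDP ~ 1 + pdq(1,0,1))) %>%
glance() %>%
select(.model, AIC, AICc)
From the reports above, the automatically selected ARIMA(1,1,0) w/ drift has the lowest AICc (657.19), so we keep it and check its residuals: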
us_gdp_model <- us_gdp %>% model(ARIMA(boxGDP))
us_gdp_model[[2]][[1]]$fit %>% checkresiduals()   # pull the fitted model out of the mable
fit <- us_gdp_model %>% forecast(h = 10)
fit %>% autoplot(us_gdp)
us_gdp_model <- us_gdp %>% model(ets = ETS(GDP))
fit <- us_gdp_model %>% forecast(h = 10)
fit %>% autoplot(us_gdp)
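The ARIMA forecasts above are on the Box-Cox scale, while ETS models GDP directly, so the two fans are not directly comparable. A sketch of a fairer comparison (reusing lambda from above) puts the transformation inside the model formula so fable back-transforms the ARIMA forecasts onto the GDP scale:
# both sets of point forecasts on the original GDP scale
us_gdp %>%
model(arima = ARIMA(box_cox(GDP, lambda)),
ets = ETS(GDP)) %>%
forecast(h = 10) %>%
autoplot(us_gdp, level = NULL)   # point forecasts only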