In class today we furthered our discussion of ARIMA models. We focused mainly on how to account for autoregression and seasonal changes. We began by defining examples of ARIMA models with specific parameters:

ARIMA(1,0,0): \[y_t = \phi y_{t-1} + \epsilon_t\]

ARIMA(0,0,1): \[y_t = \theta \epsilon_{t-1} + \epsilon_t\]

# The Ljung-Box Statistic

The test is applied to the residuals of a time series after fitting our ARIMA model. If our model is good, then the residuals should be small and the Ljung-Box statistic will be small too.
H0: the model is adequate
Ha: the model is inadequate
If we reject the null, we can look at the autocorrelation function (ACF) and partial autocorrelation function (PACF) to try to improve our model.
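As a minimal sketch (the fitted model fit and the lag choice of 20 are illustrative, not from the notes above), the Ljung-Box test can be run on a model's residuals with base R's Box.test:

fit <- arima(log(AirPassengers), order = c(1, 0, 0))  # hypothetical AR(1) fit, used only to illustrate the test
# Ljung-Box test on the residuals; a large p-value means we fail to reject H0 (model adequate)
# fitdf = 1 accounts for the one estimated AR coefficient
Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 1)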
Our ARIMA model can include the seasonal trend as well: ARIMA(p,d,q)x(P,D,Q)S. The capital letters represent the seasonal component, and S is the seasonal period, which tells the model how often the seasonal pattern repeats. We will work with an example:
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")  # monthly birth data
birthstimeseries <- ts(births, frequency=12, start=c(1946,1))  # monthly series starting January 1946
TScomp <- decompose(log(birthstimeseries))  # split into trend, seasonal, and random components
plot(TScomp)
The plot of the decomposition is a great way to separate a time series into its different components. It shows us the observed values alongside the trend, any seasonal variation, and the random error terms.
We can also remove the seasonal component from the series, or plot it on its own.
noseason <- birthstimeseries - TScomp$seasonal
plot(noseason)
plot(TScomp$seasonal)
When the model is built, we essentially break it down into its seasonal and nonseasonal components. But we now have two different "difference" factors that need to be handled.

Seasonal differences. For monthly data: \[y_t - y_{t-12}\] For quarterly data: \[y_t - y_{t-4}\]

Nonseasonal differences. This is the same as what we had before in our normal model: \[y_t - y_{t-1}\]

D: # of differences needed to remove the seasonal pattern
d: # of differences needed to remove the upward trend
We do the D difference first because there is a chance it will also eliminate the upward/downward trend. We can use the diff function to take the differences; the default lag = 1 gives the nonseasonal first difference.
diffs <- diff(log(birthstimeseries))
plot(diffs)
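For comparison, a seasonal difference of the same monthly series can be taken with diff by setting lag = 12, matching the monthly formula above (a quick sketch):

seasdiffs <- diff(log(birthstimeseries), lag = 12)  # y_t - y_{t-12}, one seasonal difference
plot(seasdiffs)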
The ADF (Augmented Dickey-Fuller) test:

library(tseries)  # provides adf.test
adf.test(birthstimeseries)
## Warning in adf.test(birthstimeseries): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: birthstimeseries
## Dickey-Fuller = -5.9547, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
So this ADF test reports a p-value against the alternative hypothesis that the series is stationary. With a p-value below .01, we reject the null hypothesis and conclude that our data is stationary, so no differencing is needed: \[D=0, d=0\]
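Had the test failed to reject, the usual next step would be to difference the series and test again; a sketch using the nonseasonal difference from earlier:

adf.test(diff(log(birthstimeseries)))  # re-test for stationarity after one difference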
In the case that we would like to choose our own inputs (p,d,q) and (P,D,Q), we can use the Acf and Pacf functions from the forecast package. The Acf plot can tell us whether q should be nonzero, while the Pacf plot will signal whether p should be nonzero.
library(forecast)  # provides Acf and Pacf
Acf(births)
Pacf(births)
What we want to look for is whether the spikes rise above the dashed significance line plotted on the graph; if so, it indicates that we may want to add order terms to our model. Looking at the Acf plot, the spikes are above the line at every lag, which indicates that we should include an AR order p, while the Pacf shows high levels at the first two lags. Thus, we should use a p of order 2.

# Creating our Model

We can then focus on building our model. This is done with the arima function, in which we input the time series name, our p/d/q/P/D/Q values, and the seasonal period:
mod1 <- arima(log(AirPassengers), order = c(2, 0, 4), seasonal = list(order = c(0, 1, 1), period = 1))
summary(mod1)
##
## Call:
## arima(x = log(AirPassengers), order = c(2, 0, 4), seasonal = list(order = c(0,
## 1, 1), period = 1))
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 ma4 sma1
## -0.5348 -0.4647 0.3337 0.2989 -0.3381 -0.5928 0.4068
## s.e. 0.1812 0.0977 0.1607 0.0863 0.1147 0.1259 0.2835
##
## sigma^2 estimated as 0.007733: log likelihood = 143, aic = -270.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01960085 0.08763261 0.06967392 0.3419897 1.257728 0.7691502
## ACF1
## Training set -0.03873496
You can continue to adjust the model, but we can then use the information here to develop the test statistic. In the last learning log, we discussed how to create an ARIMA model using the auto.arima function. That function is closely related to what we did here: this is simply a manual way of building an optimal ARIMA model, whereas auto.arima does it automatically.
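As a closing sketch (the lag choice of 24 is illustrative), we can apply the Ljung-Box test from earlier to the residuals of mod1, and compare our manual fit against the automatic one:

# Ljung-Box test on our manual model's residuals;
# fitdf = 7 counts the estimated coefficients (2 AR + 4 MA + 1 seasonal MA)
Box.test(residuals(mod1), lag = 24, type = "Ljung-Box", fitdf = 7)

# for comparison, let auto.arima choose the orders automatically
auto.arima(log(AirPassengers))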