library(forecast)
This is forecast 8.3
Crossvalidated is a great place to get help on forecasting issues:
http://stats.stackexchange.com/tags/forecasting.
library(quantmod)
Loading required package: xts
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Attaching package: ‘xts’
The following objects are masked from ‘package:gdata’:
first, last
Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
Learn from a quantmod author: https://www.datacamp.com/courses/importing-and-managing-financial-data-in-r
library(tseries)
‘tseries’ version: 0.10-44
‘tseries’ is a package for time series analysis and computational finance.
See ‘library(help="tseries")’ for details.
library(timeSeries)
Loading required package: timeDate
Attaching package: ‘timeSeries’
The following object is masked from ‘package:zoo’:
time<-
library(forecast)
library(xts)
data("wineind")
head(wineind)
Jan Feb Mar Apr May Jun
1980 15136 16733 20016 17708 18019 19227
plot.ts(wineind)
automod<-auto.arima(wineind)
diff1<-diff(wineind, lag=12)
adf.test(diff1) #stationary
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: diff1
Dickey-Fuller = -4.458, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
We create an arima model with our data set and used a lag of 12 for (monthly data). We plot the model to see if we need to remove any noice such as seasonality.
The function Acf computes (and by default plots) an estimate of the autocorrelation function of a (possibly multivariate) time series. Function Pacf computes (and by default plots) an estimate of the partial autocorrelation function of a (possibly multivariate) time series. Function Ccf computes the cross-correlation or cross-covariance of two univariate series.
Acf(diff1)
Pacf(diff1)
We have Acf of 3 and Pacf of 4. The Ljung box test is used to test the “overall” randomness based on a number of lags. The null hypothesis is the data are independently distributed. The alternative is the data are not independently distributed; they exhibit serial correlation.
summary(automod)
Series: wineind
ARIMA(1,1,2)(0,1,1)[12]
Coefficients:
ar1 ma1 ma2 sma1
0.4299 -1.4673 0.5339 -0.6600
s.e. 0.2984 0.2658 0.2340 0.0799
sigma^2 estimated as 5399312: log likelihood=-1497.05
AIC=3004.1 AICc=3004.48 BIC=3019.57
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -170.7415 2208.571 1619.979 -1.403046 6.55824 0.8234203 0.01025396
Box.test(automod$residuals, type = "Ljung")
Box-Ljung test
data: automod$residuals
X-squared = 0.018823, df = 1, p-value = 0.8909
A p-value of .8909 means we accept the null hypothesis so our data is independently distributed.
Once we have the model, we can use it to forecast values of our interest so we know how many bottles of wine to produce.
forecast(automod,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Sep 1994 25248.61 22270.74 28226.49 20694.35 29802.88
plot(forecast(automod,h=10)) #10 months
From the forecast, we want to have 29,803 bottles of wine prepared for next month. We can also plot the forecast and R will display the interval so we can see it visually.
moda <- arima(wineind, order = c(1,0,1), season = list( order = c( 1,1,1), period=12))
modb <- arima(wineind, order = c(2,0,2), season = list( order = c( 1,1,1), period=12))
moda
Call:
arima(x = wineind, order = c(1, 0, 1), seasonal = list(order = c(1, 1, 1), period = 12))
Coefficients:
ar1 ma1 sar1 sma1
0.9922 -0.8978 0.1994 -0.7746
s.e. 0.0109 0.0323 0.1299 0.1055
sigma^2 estimated as 5295670: log likelihood = -1505.84, aic = 3021.67
modb
Call:
arima(x = wineind, order = c(2, 0, 2), seasonal = list(order = c(1, 1, 1), period = 12))
Coefficients:
ar1 ar2 ma1 ma2 sar1 sma1
0.0962 0.8891 0.0917 -0.8969 0.2023 -0.7744
s.e. 0.0472 0.0463 0.0814 0.0747 0.1323 0.1090
sigma^2 estimated as 5062547: log likelihood = -1503.66, aic = 3021.32
mydfa <- length(coef(modb)) - length(coef(moda))
(teststat1<- -2*as.numeric(logLik(moda) - logLik(modb)))
[1] 4.348894
pchisq(teststat1, df = mydfa, lower.tail = FALSE)
[1] 0.113671
Accet the null which says we should go with the simplier model.