library(tseries)
library(readr)
library(ggplot2)
beer1 <- read_csv("C:/My Work/Data Analytics/GL/TSF/beer1.csv")
Parsed with column specification:
cols(
OzBeer = col_double()
)
head(beer1)
ggplot(ts,aes(x=beer1$OzBeer,y=c(1:72)))+geom_line()
ts<-ts(beer1$OzBeer,frequency = 4,start=c(1990,1))
plot(ts,col="red")
cnt_ma12=ma(beer1$OzBeer,12)
cnt_ma6=ma(beer1$OzBeer,6)
cnt_ma4=ma(beer1$OzBeer,4)
ggplot() +
geom_line(data = beer1, aes(x = c(1:72), y = ts, colour = "Actual Sales")) +
geom_line(data = beer1, aes(x = c(1:72), y = cnt_ma12, colour = "Monthly Moving Average")) +
geom_line(data = beer1, aes(x = c(1:72), y = cnt_ma6, colour = "Hal-yearly Moving Average")) +
geom_line(data = beer1, aes(x = c(1:72), y = cnt_ma4, colour = "Quarterly Moving Average")) +
ylab('Beer Sales')
As we can see that the maximum smoothening is achived using ma(4).
kpss.test(ts)
p-value smaller than printed p-value
KPSS Test for Level Stationarity
data: ts
KPSS Level = 3.0456, Truncation lag parameter = 1, p-value = 0.01
So the series is not stationary
ndiffs(ts)
[1] 1
dif = diff(ts,differences = 1)
kpss.test(dif)
p-value greater than printed p-value
KPSS Test for Level Stationarity
data: dif
KPSS Level = 0.036853, Truncation lag parameter = 1, p-value = 0.1
After differenciating it once, we see that the series is stationary.
plot(dif,col = "blue")
abline(lm(dif~time(dif)),col = "red")
decom<-decompose(ts)
plot(decom,col = "blue")
Another test for stationarity, Dickey-Fuller test
adf.test(ts)
Augmented Dickey-Fuller Test
data: ts
Dickey-Fuller = -0.73068, Lag order = 4, p-value = 0.9633
alternative hypothesis: stationary
hence we accept NULL hypothesis
adf.test(dif)
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: dif
Dickey-Fuller = -6.2957, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
this proves that d = 1.
ma4<-ma(dif,4)
par(mfrow=c(2,1))
Acf(ma4, main='')
Pacf(ma4, main='')
from the above charts we can conclude that p = 1 and q = 3. but we must test the same with our model
auto.arima(ma4, seasonal=TRUE)
Series: ma4
ARIMA(0,1,1)(0,0,1)[4] with drift
Coefficients:
ma1 sma1 drift
-0.7671 -0.6583 0.0636
s.e. 0.1187 0.1073 0.0177
sigma^2 estimated as 2.123: log likelihood=-120.82
AIC=249.65 AICc=250.27 BIC=258.64
fit<-arima(ma4,order =c(0,1,1),seasonal = list(order =c(0,0,1),period =4))
fit
Call:
arima(x = ma4, order = c(0, 1, 1), seasonal = list(order = c(0, 0, 1), period = 4))
Coefficients:
ma1 sma1
-0.5858 -0.5674
s.e. 0.1229 0.1095
sigma^2 estimated as 2.392: log likelihood = -123.49, aic = 252.98
tsdisplay(residuals(fit), lag.max=15, main='Seasonal Model Residuals')
fcast <- forecast(fit, h=8)
plot(fcast)