Quarterly beer sales data has been provided in the beer.csv files.
Part A) Using the Winter-Holts methods and model the data and predict for the next 2 years.
Part B) Using the ARIMA method model the data and predict for the next 2 years.
library(tseries)
library(astsa)
library(ggplot2)
library(fpp2)
library(TSA)
beer_data <- read.csv("beer.csv",header = TRUE)
attach(beer_data)
names(beer_data)
## [1] "OzBeer"
str(beer_data)
## 'data.frame': 72 obs. of 1 variable:
## $ OzBeer: num 284 213 227 308 262 ...
print(beer_data)
## OzBeer
## 1 284.4
## 2 212.8
## 3 226.9
## 4 308.4
## 5 262.0
## 6 227.9
## 7 236.1
## 8 320.4
## 9 271.9
## 10 232.8
## 11 237.0
## 12 313.4
## 13 261.4
## 14 226.8
## 15 249.9
## 16 314.3
## 17 286.1
## 18 226.5
## 19 260.4
## 20 311.4
## 21 294.7
## 22 232.6
## 23 257.2
## 24 339.2
## 25 279.1
## 26 249.8
## 27 269.8
## 28 345.7
## 29 293.8
## 30 254.7
## 31 277.5
## 32 363.4
## 33 313.4
## 34 272.8
## 35 300.1
## 36 369.5
## 37 330.8
## 38 287.8
## 39 305.9
## 40 386.1
## 41 335.2
## 42 288.0
## 43 308.3
## 44 402.3
## 45 352.8
## 46 316.1
## 47 324.9
## 48 404.8
## 49 393.0
## 50 318.9
## 51 327.0
## 52 442.3
## 53 383.1
## 54 331.6
## 55 361.4
## 56 445.9
## 57 386.6
## 58 357.2
## 59 373.6
## 60 466.2
## 61 409.6
## 62 369.8
## 63 378.6
## 64 487.0
## 65 419.2
## 66 376.7
## 67 392.8
## 68 506.1
## 69 458.4
## 70 387.4
## 71 426.9
## 72 525.0
length(beer_data)
## [1] 1
summary(beer_data)
## OzBeer
## Min. :212.8
## 1st Qu.:272.6
## Median :317.5
## Mean :329.9
## 3rd Qu.:379.7
## Max. :525.0
To predict the next 2 years beer sales, it is required to convert beer data into ts object(time series).
beer_ts <- ts(beer_data, start = 2000, frequency = 4, end=2018)
is.ts(beer_ts)
## [1] TRUE
summary(beer_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 212.8 272.8 316.1 329.3 378.6 525.0
str(beer_ts)
## Time-Series [1:73] from 2000 to 2018: 284 213 227 308 262 ...
ts.plot(beer_ts, col="blue" ,main="Beer Sales in past 18 years", xlab="Years", ylab="Sales", type="b")
Overall Upward positive trend shows increase in sales with time.
Seasonality foreseen: Regular pattern which increases slowly and constantly.
There is Heteroscadasticity: Variation from one season to another seems to increase over time.
deltat(beer_ts)
## [1] 0.25
frequency(beer_ts)
## [1] 4
cycle(beer_ts)
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 1 2 3 4
## 2001 1 2 3 4
## 2002 1 2 3 4
## 2003 1 2 3 4
## 2004 1 2 3 4
## 2005 1 2 3 4
## 2006 1 2 3 4
## 2007 1 2 3 4
## 2008 1 2 3 4
## 2009 1 2 3 4
## 2010 1 2 3 4
## 2011 1 2 3 4
## 2012 1 2 3 4
## 2013 1 2 3 4
## 2014 1 2 3 4
## 2015 1 2 3 4
## 2016 1 2 3 4
## 2017 1 2 3 4
## 2018 1
There is a presence of seasonality in pattern.
annual_beer <- aggregate(beer_ts)
str(annual_beer)
## Time-Series [1:18] from 2000 to 2017: 1032 1046 1055 1052 1084 ...
summary(annual_beer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1032 1094 1283 1320 1512 1798
plot(annual_beer,col="green", main="Annual consumption of beer YOY")
The annual plot also shows the positive time over the time pace.
avg_beer <- aggregate(beer_ts,FUN=mean)
summary(avg_beer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 258.1 273.6 320.8 329.9 377.9 449.4
str(avg_beer)
## Time-Series [1:18] from 2000 to 2017: 258 262 264 263 271 ...
plot(avg_beer,col="red", main="Average consumption of beer YOY")
This Average consumption is also showing overall same as annual plot.
ggsubseriesplot(beer_ts,facets=TRUE)
Quarterly based sub-series plot shows overall in quarter 4 and 1, beer sales are high. Other two quarter decreases with time.
boxplot(beer_ts~cycle(beer_ts), main="Variation between quarters and years")
Same pattern foreseen as subseries plot.
gglagplot(beer_ts)
Lag1 and Lag5, Lag2 and Lag6, Lag3 and Lag7, Lag4 and Lag8 shows almost same in pattern. This shows seasonality.
ggseasonplot(beer_ts,facets=TRUE,polar = TRUE)
Least sales seems to be in quarter2. Descending order in sales: Quarter 4 < Quarter 1 < Quarter 3 < Quarter 2
acf(beer_ts)
This above code helps in finding the correlation between past observations. This plot tails off gradually over time. With time, the effect of past observation decreases. The non-seasonal component reflects AR model.
pacf(beer_ts)
This plot shows cuts off at lag2 itself. Where, it continues but later it completely becomes zero. There seems presence of both AR(1) and MA(2) model and even presence of some error.
adf.test(beer_ts)
##
## Augmented Dickey-Fuller Test
##
## data: beer_ts
## Dickey-Fuller = -2.2708, Lag order = 4, p-value = 0.4651
## alternative hypothesis: stationary
This above code is called Augmented Dickey-Fuller Test.
kpss.test(beer_ts)
##
## KPSS Test for Level Stationarity
##
## data: beer_ts
## KPSS Level = 2.944, Truncation lag parameter = 1, p-value = 0.01
This above code is called Kwiatkowski-Phillips-Schmidt-Shin Test.
periodogram(beer_ts)
There seems a presence of seasonal components.
To Detrend the pattern and removing seasonality.
diff_beer <- diff(beer_ts,lag=frequency(beer_ts))
kpss.test(diff_beer)
##
## KPSS Test for Level Stationarity
##
## data: diff_beer
## KPSS Level = 0.13187, Truncation lag parameter = 1, p-value = 0.1
adf.test(diff_beer)
##
## Augmented Dickey-Fuller Test
##
## data: diff_beer
## Dickey-Fuller = -3.1977, Lag order = 4, p-value = 0.09544
## alternative hypothesis: stationary
BoxCox.lambda(diff_beer)
## [1] 1.154986
plot(diff_beer, col="RED", main="Detrend and Deseasonality")
acf(diff_beer)
pacf(diff_beer)
There is no much of the difference in both acf and pacf plot.
trend_beer = ma(beer_ts,order = 4,centre = T)
plot(beer_ts, col="blue", main ="Quartely Beer Sales in Australia")
lines(trend_beer, col="green")
Trend line shows in upward direction.
plot(trend_beer,col="red",main="TrendLine in Beer Sales series")
detrend_beer = beer_ts - trend_beer
plot(detrend_beer, col="blue", main = "Time Series after removing the Trend Component")
After removing the trend, this beer_ts seems to be stable.
m_beer = t(matrix(data = detrend_beer, nrow = 4))
m_beer
## [,1] [,2] [,3] [,4]
## [1,] NA NA -28.4250 53.9875
## [2,] 4.5500 -32.2000 -26.7375 55.7125
## [3,] 6.4875 -31.8500 -25.4625 53.0000
## [4,] 0.1375 -36.1875 -16.2875 45.0625
## [5,] 15.5875 -44.9625 -11.7750 37.3875
## [6,] 20.3250 -44.8500 -21.7750 60.0250
## [7,] -3.8000 -35.4875 -18.1375 55.3125
## [8,] 1.8375 -40.4375 -22.3000 58.8875
## [9,] 3.8000 -40.3875 -16.0250 49.3250
## [10,] 8.0250 -37.7750 -22.3000 57.3250
## [11,] 6.1000 -43.4250 -27.3500 60.9375
## [12,] 5.8500 -33.2375 -29.7750 44.7500
## [13,] 32.3375 -46.7125 -42.0625 72.8875
## [14,] 7.8000 -48.4500 -19.5375 61.3250
## [15,] -2.7000 -36.1625 -25.1750 62.9750
## [16,] 4.1750 -38.8500 -33.8500 72.4875
## [17,] 2.0500 -44.6125 -35.8000 71.2625
## [18,] 17.9625 -59.6625 -0.7750 NA
## [19,] NA NA NA -28.4250
seasonal_beer = colMeans(m_beer, na.rm = T)
seasonal_beer
## [1] 7.677941 -40.897059 -23.530556 52.456944
seasonal = rep(seasonal_beer,18)
plot.ts(seasonal, col="blue", main="Seasonal Component of Time Series")
Additive seasonality is reflected as the changes remains constant over time.
random = detrend_beer-seasonal
plot(random,col="violet", main = "Random Noise")
This randomized pattern shows the behavior of White Noise series pattern.
reconstruct = trend_beer + seasonal + random
plot(reconstruct,col="Blue", main = "Getting the original time series")
Time_series = Trend + Season + Random
decompose_beer <- decompose(beer_ts)
plot(decompose_beer)
The same series as that of original time seies is seen in this decomposed beer series.
fit <- stl(beer_ts, s.window="period")
plot(fit)
autoplot(diff_beer)
The growth in sales of beer is somewhat constant, not much in variation. Therefore, seasonality is additive and even trend is upward.
As there is a presence of trend and seasonality, So will use hw() method.
holt_beer <- hw(beer_ts,seasonal = "additive")
summary(holt_beer)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = beer_ts, seasonal = "additive")
##
## Smoothing parameters:
## alpha = 0.0436
## beta = 0.0234
## gamma = 1e-04
##
## Initial states:
## l = 258.441
## b = 1.0453
## s = 57.5941 -23.3313 -37.9635 3.7006
##
## sigma: 24.1468
##
## AIC AICc BIC
## 787.6165 790.4736 808.2306
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1894768 22.78531 11.43097 -0.2476881 3.649031 0.7204395
## ACF1
## Training set -0.07767902
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2018 Q2 404.5985 373.6531 435.5439 357.2716 451.9254
## 2018 Q3 419.9567 388.9419 450.9714 372.5237 467.3896
## 2018 Q4 501.6079 470.4674 532.7484 453.9827 549.2331
## 2019 Q1 448.4274 417.0888 479.7660 400.4992 496.3557
## 2019 Q2 407.4879 375.8632 439.1126 359.1222 455.8536
## 2019 Q3 422.8461 390.8343 454.8579 373.8882 471.8039
## 2019 Q4 504.4973 471.9844 537.0102 454.7731 554.2215
## 2020 Q1 451.3168 418.1783 484.4553 400.6358 501.9978
Here, Beta is small so slope changes slowly.
This is for additive trend and additive seasonality.
autoplot(forecast(holt_beer,n.ahead=8))
For next 2 years, it seems sales in beer will decrease in comparison to past years.
states=holt_beer$model$states[,1:3]
colnames(states)= cbind('level','trend','seasonality')
plot(states,col="red",main = "Decomposing the forecast")
Overall level increases but there is a drop in trend though the seasonality remains same.
checkresiduals(holt_beer)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 2.9783, df = 3, p-value = 0.395
##
## Model df: 8. Total lags used: 11
ets_beer <- ets(beer_ts)
ets_beer
## ETS(M,A,A)
##
## Call:
## ets(y = beer_ts)
##
## Smoothing parameters:
## alpha = 0.0589
## beta = 0.0588
## gamma = 1e-04
##
## Initial states:
## l = 257.0572
## b = 0.8197
## s = 55.4221 -22.076 -39.1088 5.7627
##
## sigma: 0.0575
##
## AIC AICc BIC
## 749.2171 752.0742 769.8312
This above method is an automated method of Winter Holt’s. This precise defines Type of Error, Trend and Seasonality. i.e ETS(M,A,A)
M = Multiplicative Error
A = Additive Trend
A = Additive Seasonality
Forecasting for next 2 years.
ets_beer %>% forecast(h=8) %>% autoplot()
80 and 95 percent level shows, beyond this shaded area, sales will rise or drop. High chances of 80% level.
arima_beer = arima((beer_ts), c(1,0, 2),seasonal = list(order = c(0, 1, 2), period = 4, include.constant=FALSE))
arima_beer
##
## Call:
## arima(x = (beer_ts), order = c(1, 0, 2), seasonal = list(order = c(0, 1, 2),
## period = 4, include.constant = FALSE))
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2
## 0.9421 -1.1161 0.4079 -1.1885 0.8452
## s.e. 0.0643 0.2827 0.1893 0.1808 0.2796
##
## sigma^2 estimated as 442.5: log likelihood = -313.91, aic = 637.82
AIC(arima_beer)
## [1] 639.8218
BIC(arima_beer)
## [1] 653.2264
According to the least value of AIC, ARIMA(1,0,2)x(0,1,2)[4] is highly preferrable.
auto.arima(beer_ts,stepwise = FALSE)
## Series: beer_ts
## ARIMA(1,0,2)(0,1,2)[4]
##
## Coefficients:
## ar1 ma1 ma2 sma1 sma2
## 0.9421 -1.1161 0.4079 -1.1885 0.8452
## s.e. 0.0643 0.2827 0.1893 0.1808 0.2796
##
## sigma^2 estimated as 477.1: log likelihood=-313.91
## AIC=639.82 AICc=641.18 BIC=653.23
The above chunk is the advanced method in arima() function. This auto.arima() function helps in finding the lowest AIC value based model.
auto.arima(beer_ts,stepwise=FALSE) %>% forecast(h=8) %>% autoplot()
According to this model, year 2019 shows decrease in sales consumption but again 2020 goes high but quarter2 of 2020 drops down.
checkresiduals(arima_beer)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2)(0,1,2)[4]
## Q* = 1.1202, df = 3, p-value = 0.7722
##
## Model df: 5. Total lags used: 8
fets <- function(x,h)
{
forecast(ets(x),h=h)
}
farima <- function(x,h)
{
forecast(auto.arima(x),h=h)
}
e1 <- tsCV(beer_ts,fets,h=1)
e2 <- tsCV(beer_ts,farima,h=1)
mean(e1^2, na.rm=TRUE)
## [1] 959.2144
mean(e2^2, na.rm=TRUE)
## [1] 808.0356
From MSE values, seems ARIMA model is the best fit.
According to Winter Holt’s method and ARIMA model, though ARIMA model is best fit model but both the models reflects almost same that beer consumption in year 2019. Even Quarter4 also goes down which affects to rest of the quarters too. Again in 2020, quarter 4 beer consumption will increase but quarter2 will again go down beyond earlier one.
2018 sales of quarter2 is least according to past 18 years.
In 2019, Quarter4(Oct-Dec) sales sightly will goes down but will remain steady in 2020.
In 2019 and 2020, Quarter2(Apr-June) sales will gradually goes down with time.
Quarter3(July-Sept) may get affected/influenced due to quarter2 sales in upcoming years.
Quarter1(Jan-Mar) will maintain same as it is more affected/influenced with Quarter4 sales.