Problem Statement:

Quarterly beer sales data has been provided in the beer.csv files.

Part A) Using the Winter-Holts methods and model the data and predict for the next 2 years.

Part B) Using the ARIMA method model the data and predict for the next 2 years.

library(tseries)
library(astsa)
library(ggplot2)
library(fpp2)
library(TSA)
beer_data <- read.csv("beer.csv",header = TRUE)
attach(beer_data)

Exploratory Data analysis

names(beer_data)
## [1] "OzBeer"
str(beer_data)
## 'data.frame':    72 obs. of  1 variable:
##  $ OzBeer: num  284 213 227 308 262 ...
print(beer_data)
##    OzBeer
## 1   284.4
## 2   212.8
## 3   226.9
## 4   308.4
## 5   262.0
## 6   227.9
## 7   236.1
## 8   320.4
## 9   271.9
## 10  232.8
## 11  237.0
## 12  313.4
## 13  261.4
## 14  226.8
## 15  249.9
## 16  314.3
## 17  286.1
## 18  226.5
## 19  260.4
## 20  311.4
## 21  294.7
## 22  232.6
## 23  257.2
## 24  339.2
## 25  279.1
## 26  249.8
## 27  269.8
## 28  345.7
## 29  293.8
## 30  254.7
## 31  277.5
## 32  363.4
## 33  313.4
## 34  272.8
## 35  300.1
## 36  369.5
## 37  330.8
## 38  287.8
## 39  305.9
## 40  386.1
## 41  335.2
## 42  288.0
## 43  308.3
## 44  402.3
## 45  352.8
## 46  316.1
## 47  324.9
## 48  404.8
## 49  393.0
## 50  318.9
## 51  327.0
## 52  442.3
## 53  383.1
## 54  331.6
## 55  361.4
## 56  445.9
## 57  386.6
## 58  357.2
## 59  373.6
## 60  466.2
## 61  409.6
## 62  369.8
## 63  378.6
## 64  487.0
## 65  419.2
## 66  376.7
## 67  392.8
## 68  506.1
## 69  458.4
## 70  387.4
## 71  426.9
## 72  525.0
length(beer_data)
## [1] 1
summary(beer_data)
##      OzBeer     
##  Min.   :212.8  
##  1st Qu.:272.6  
##  Median :317.5  
##  Mean   :329.9  
##  3rd Qu.:379.7  
##  Max.   :525.0

To predict the next 2 years beer sales, it is required to convert beer data into ts object(time series).

beer_ts <- ts(beer_data, start = 2000, frequency = 4, end=2018)
is.ts(beer_ts)
## [1] TRUE
summary(beer_ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   212.8   272.8   316.1   329.3   378.6   525.0
str(beer_ts)
##  Time-Series [1:73] from 2000 to 2018: 284 213 227 308 262 ...
ts.plot(beer_ts, col="blue" ,main="Beer Sales in past 18 years", xlab="Years", ylab="Sales", type="b")

  1. Overall Upward positive trend shows increase in sales with time.

  2. Seasonality foreseen: Regular pattern which increases slowly and constantly.

  3. There is Heteroscadasticity: Variation from one season to another seems to increase over time.

deltat(beer_ts)
## [1] 0.25
frequency(beer_ts)
## [1] 4
cycle(beer_ts)
##      Qtr1 Qtr2 Qtr3 Qtr4
## 2000    1    2    3    4
## 2001    1    2    3    4
## 2002    1    2    3    4
## 2003    1    2    3    4
## 2004    1    2    3    4
## 2005    1    2    3    4
## 2006    1    2    3    4
## 2007    1    2    3    4
## 2008    1    2    3    4
## 2009    1    2    3    4
## 2010    1    2    3    4
## 2011    1    2    3    4
## 2012    1    2    3    4
## 2013    1    2    3    4
## 2014    1    2    3    4
## 2015    1    2    3    4
## 2016    1    2    3    4
## 2017    1    2    3    4
## 2018    1

There is a presence of seasonality in pattern.

annual_beer <- aggregate(beer_ts)
str(annual_beer)
##  Time-Series [1:18] from 2000 to 2017: 1032 1046 1055 1052 1084 ...
summary(annual_beer)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1032    1094    1283    1320    1512    1798
plot(annual_beer,col="green", main="Annual consumption of beer YOY")

The annual plot also shows the positive time over the time pace.

avg_beer <- aggregate(beer_ts,FUN=mean)
summary(avg_beer)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   258.1   273.6   320.8   329.9   377.9   449.4
str(avg_beer)
##  Time-Series [1:18] from 2000 to 2017: 258 262 264 263 271 ...
plot(avg_beer,col="red", main="Average consumption of beer YOY")

This Average consumption is also showing overall same as annual plot.

ggsubseriesplot(beer_ts,facets=TRUE)

Quarterly based sub-series plot shows overall in quarter 4 and 1, beer sales are high. Other two quarter decreases with time.

boxplot(beer_ts~cycle(beer_ts), main="Variation between quarters and years")

Same pattern foreseen as subseries plot.

gglagplot(beer_ts)

Lag1 and Lag5, Lag2 and Lag6, Lag3 and Lag7, Lag4 and Lag8 shows almost same in pattern. This shows seasonality.

ggseasonplot(beer_ts,facets=TRUE,polar = TRUE)

Least sales seems to be in quarter2. Descending order in sales: Quarter 4 < Quarter 1 < Quarter 3 < Quarter 2

Autocorrelation Analysis

acf(beer_ts)

This above code helps in finding the correlation between past observations. This plot tails off gradually over time. With time, the effect of past observation decreases. The non-seasonal component reflects AR model.

pacf(beer_ts)

This plot shows cuts off at lag2 itself. Where, it continues but later it completely becomes zero. There seems presence of both AR(1) and MA(2) model and even presence of some error.

To check whether beer_ts series is stationary or non-stationary.

adf.test(beer_ts)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  beer_ts
## Dickey-Fuller = -2.2708, Lag order = 4, p-value = 0.4651
## alternative hypothesis: stationary

This above code is called Augmented Dickey-Fuller Test.

  1. Here, p-value > 0.05. Therefore, null hypothesis is accepted. As, this series is not stationary.
kpss.test(beer_ts)
## 
##  KPSS Test for Level Stationarity
## 
## data:  beer_ts
## KPSS Level = 2.944, Truncation lag parameter = 1, p-value = 0.01

This above code is called Kwiatkowski-Phillips-Schmidt-Shin Test.

  1. p-value < 0.05. Therefore, beer_ts series is not stationary.

To detect the seasonality.

periodogram(beer_ts)

  1. There seems a presence of seasonal components.

  2. To Detrend the pattern and removing seasonality.

diff_beer <- diff(beer_ts,lag=frequency(beer_ts))
kpss.test(diff_beer)
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff_beer
## KPSS Level = 0.13187, Truncation lag parameter = 1, p-value = 0.1
  1. p-value > 0.05. Therefore, now the beer_series is stationary.
adf.test(diff_beer)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff_beer
## Dickey-Fuller = -3.1977, Lag order = 4, p-value = 0.09544
## alternative hypothesis: stationary
  1. p-value > 0.05 means stationary.

To check variance of stability

BoxCox.lambda(diff_beer)
## [1] 1.154986
  1. 1.15 closes to 1, which means highly stable.
plot(diff_beer, col="RED", main="Detrend and Deseasonality")

  • To find the autocorrelation
acf(diff_beer)

  1. Seasonality is not seen. More of stability.
pacf(diff_beer)

There is no much of the difference in both acf and pacf plot.

Analyzing trend,seasonality and randomization in beer time series.

  • Visualizing the trend
trend_beer = ma(beer_ts,order = 4,centre = T)
plot(beer_ts, col="blue", main ="Quartely Beer Sales in Australia")
lines(trend_beer, col="green")

Trend line shows in upward direction.

  1. Beer trend line
plot(trend_beer,col="red",main="TrendLine in Beer Sales series")

  1. Detrend beer time series
detrend_beer = beer_ts - trend_beer
plot(detrend_beer, col="blue", main = "Time Series after removing the Trend Component")

After removing the trend, this beer_ts seems to be stable.

m_beer = t(matrix(data = detrend_beer, nrow = 4))
m_beer
##          [,1]     [,2]     [,3]     [,4]
##  [1,]      NA       NA -28.4250  53.9875
##  [2,]  4.5500 -32.2000 -26.7375  55.7125
##  [3,]  6.4875 -31.8500 -25.4625  53.0000
##  [4,]  0.1375 -36.1875 -16.2875  45.0625
##  [5,] 15.5875 -44.9625 -11.7750  37.3875
##  [6,] 20.3250 -44.8500 -21.7750  60.0250
##  [7,] -3.8000 -35.4875 -18.1375  55.3125
##  [8,]  1.8375 -40.4375 -22.3000  58.8875
##  [9,]  3.8000 -40.3875 -16.0250  49.3250
## [10,]  8.0250 -37.7750 -22.3000  57.3250
## [11,]  6.1000 -43.4250 -27.3500  60.9375
## [12,]  5.8500 -33.2375 -29.7750  44.7500
## [13,] 32.3375 -46.7125 -42.0625  72.8875
## [14,]  7.8000 -48.4500 -19.5375  61.3250
## [15,] -2.7000 -36.1625 -25.1750  62.9750
## [16,]  4.1750 -38.8500 -33.8500  72.4875
## [17,]  2.0500 -44.6125 -35.8000  71.2625
## [18,] 17.9625 -59.6625  -0.7750       NA
## [19,]      NA       NA       NA -28.4250
  • Seasonality in beer
seasonal_beer = colMeans(m_beer, na.rm = T)
seasonal_beer
## [1]   7.677941 -40.897059 -23.530556  52.456944
seasonal = rep(seasonal_beer,18)
plot.ts(seasonal, col="blue", main="Seasonal Component of Time Series")

Additive seasonality is reflected as the changes remains constant over time.

  • Randomization in beer
random = detrend_beer-seasonal
plot(random,col="violet", main = "Random Noise")

This randomized pattern shows the behavior of White Noise series pattern.

  • Reconstruct the original time series
reconstruct = trend_beer + seasonal + random
plot(reconstruct,col="Blue", main = "Getting the original time series")

Time_series = Trend + Season + Random

Compare with original time series:

  • Decompose the beer_ts time series
decompose_beer <- decompose(beer_ts)
plot(decompose_beer)

The same series as that of original time seies is seen in this decomposed beer series.

  • Seasonal decomposition
fit <- stl(beer_ts, s.window="period")
plot(fit)

Modeling Techniques

Winter Holt’s method

autoplot(diff_beer)

  1. The growth in sales of beer is somewhat constant, not much in variation. Therefore, seasonality is additive and even trend is upward.

  2. As there is a presence of trend and seasonality, So will use hw() method.

holt_beer <- hw(beer_ts,seasonal = "additive")
summary(holt_beer)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = beer_ts, seasonal = "additive") 
## 
##   Smoothing parameters:
##     alpha = 0.0436 
##     beta  = 0.0234 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 258.441 
##     b = 1.0453 
##     s = 57.5941 -23.3313 -37.9635 3.7006
## 
##   sigma:  24.1468
## 
##      AIC     AICc      BIC 
## 787.6165 790.4736 808.2306 
## 
## Error measures:
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.1894768 22.78531 11.43097 -0.2476881 3.649031 0.7204395
##                     ACF1
## Training set -0.07767902
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2018 Q2       404.5985 373.6531 435.5439 357.2716 451.9254
## 2018 Q3       419.9567 388.9419 450.9714 372.5237 467.3896
## 2018 Q4       501.6079 470.4674 532.7484 453.9827 549.2331
## 2019 Q1       448.4274 417.0888 479.7660 400.4992 496.3557
## 2019 Q2       407.4879 375.8632 439.1126 359.1222 455.8536
## 2019 Q3       422.8461 390.8343 454.8579 373.8882 471.8039
## 2019 Q4       504.4973 471.9844 537.0102 454.7731 554.2215
## 2020 Q1       451.3168 418.1783 484.4553 400.6358 501.9978
  1. Here, Beta is small so slope changes slowly.

  2. This is for additive trend and additive seasonality.

  • Predict for next 2 years.
autoplot(forecast(holt_beer,n.ahead=8))

For next 2 years, it seems sales in beer will decrease in comparison to past years.

  • Beer sales decompose
states=holt_beer$model$states[,1:3]
colnames(states)= cbind('level','trend','seasonality')
plot(states,col="red",main = "Decomposing the forecast")

Overall level increases but there is a drop in trend though the seasonality remains same.

  • Analyzing the residuals
checkresiduals(holt_beer)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 2.9783, df = 3, p-value = 0.395
## 
## Model df: 8.   Total lags used: 11
  1. p-value > 0.05 shows residuals series is a WN series. Therefore, there is no past and present effects on future.

Automated method in Exponential smoothing methods: ets()

ets_beer <- ets(beer_ts)
ets_beer
## ETS(M,A,A) 
## 
## Call:
##  ets(y = beer_ts) 
## 
##   Smoothing parameters:
##     alpha = 0.0589 
##     beta  = 0.0588 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 257.0572 
##     b = 0.8197 
##     s = 55.4221 -22.076 -39.1088 5.7627
## 
##   sigma:  0.0575
## 
##      AIC     AICc      BIC 
## 749.2171 752.0742 769.8312

This above method is an automated method of Winter Holt’s. This precise defines Type of Error, Trend and Seasonality. i.e ETS(M,A,A)

  • M = Multiplicative Error

  • A = Additive Trend

  • A = Additive Seasonality

  • Forecasting for next 2 years.

ets_beer %>% forecast(h=8) %>% autoplot()

80 and 95 percent level shows, beyond this shaded area, sales will rise or drop. High chances of 80% level.

ARIMA model

arima_beer = arima((beer_ts), c(1,0, 2),seasonal = list(order = c(0, 1, 2), period = 4, include.constant=FALSE))
arima_beer
## 
## Call:
## arima(x = (beer_ts), order = c(1, 0, 2), seasonal = list(order = c(0, 1, 2), 
##     period = 4, include.constant = FALSE))
## 
## Coefficients:
##          ar1      ma1     ma2     sma1    sma2
##       0.9421  -1.1161  0.4079  -1.1885  0.8452
## s.e.  0.0643   0.2827  0.1893   0.1808  0.2796
## 
## sigma^2 estimated as 442.5:  log likelihood = -313.91,  aic = 637.82
AIC(arima_beer)
## [1] 639.8218
BIC(arima_beer)
## [1] 653.2264

According to the least value of AIC, ARIMA(1,0,2)x(0,1,2)[4] is highly preferrable.

Automated ARIMA model: auto.arima()

auto.arima(beer_ts,stepwise = FALSE)
## Series: beer_ts 
## ARIMA(1,0,2)(0,1,2)[4] 
## 
## Coefficients:
##          ar1      ma1     ma2     sma1    sma2
##       0.9421  -1.1161  0.4079  -1.1885  0.8452
## s.e.  0.0643   0.2827  0.1893   0.1808  0.2796
## 
## sigma^2 estimated as 477.1:  log likelihood=-313.91
## AIC=639.82   AICc=641.18   BIC=653.23

The above chunk is the advanced method in arima() function. This auto.arima() function helps in finding the lowest AIC value based model.

  • Forecasting using ARIMA model()
auto.arima(beer_ts,stepwise=FALSE) %>% forecast(h=8) %>% autoplot()

According to this model, year 2019 shows decrease in sales consumption but again 2020 goes high but quarter2 of 2020 drops down.

  • Analyzing the residuals
checkresiduals(arima_beer)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,2)(0,1,2)[4]
## Q* = 1.1202, df = 3, p-value = 0.7722
## 
## Model df: 5.   Total lags used: 8
  1. p-value > 0.05. So, Residuals are WN series.
  • To set up forecast function for ETS and ARIMA model in order to find best fit model.
fets <- function(x,h)
{
  forecast(ets(x),h=h)
}

farima <- function(x,h)
{
  forecast(auto.arima(x),h=h)
}
  • Compute CV errors for Winter Holt’s as e1 and ARIMA as e2
e1 <- tsCV(beer_ts,fets,h=1)
e2 <- tsCV(beer_ts,farima,h=1)
  • Find MSE of each model class
mean(e1^2, na.rm=TRUE)
## [1] 959.2144
mean(e2^2, na.rm=TRUE)
## [1] 808.0356

From MSE values, seems ARIMA model is the best fit.

Interpretation

Conclusion

  1. In 2019, Quarter4(Oct-Dec) sales sightly will goes down but will remain steady in 2020.

  2. In 2019 and 2020, Quarter2(Apr-June) sales will gradually goes down with time.

  3. Quarter3(July-Sept) may get affected/influenced due to quarter2 sales in upcoming years.

  4. Quarter1(Jan-Mar) will maintain same as it is more affected/influenced with Quarter4 sales.