American brewing history has seen numerous ups and downs, from obvious causes like Prohibition to less visible but equally crucial ones like the introduction of refrigeration, which completely transformed how breweries in the United States operated. Beer is a temperamental beverage: unlike wine and spirits, it cannot be stored for long periods. For these and various other reasons, beer production in the US has fluctuated considerably over time.
According to the statistics, the beer production market peaked in the late 1800s: in 1871, the United States had 4,131 operational breweries. After the dark period of Prohibition the industry contracted sharply, and by 1970 only 89 operational breweries remained, run by just 42 companies.
From 1970 onward, the US beer market began to rise slowly again, driven by factors such as the legalization of home brewing. In this project, we analyze the production of beer in the US from 1970 to 1977.
Using statistical methods, we perform a time series analysis to uncover trends and patterns in the production of beer in the United States from 1970 to 1977.
We start by analyzing the time series plot of the data, looking for features such as seasonality and changing variance, and then deal with the features we find. Finally, we fit a SARIMA model to the data, with proper diagnostics and interpretation.
Source of the data: the data is taken from the CensusAtSchool New Zealand website, supported by Stats NZ and The University of Auckland.
The data has 96 observations and 2 variables, of which “beerproductionusa” is our main variable of interest; the other variable is a monthly indicator.
library(tidyverse)
library(TSA)
library(tseries)
library(lmtest)
library(forecast)
# Data Import
setwd("/Users/zuaibshaikh/Desktop/SEM 3/Time Series Analysis/Final Project")
df <- read.csv("USA Beer production.csv")
head(df)
## Month beerproductionusa
## 1 1970M01 13.092
## 2 1970M02 11.978
## 3 1970M03 11.609
## 4 1970M04 10.812
## 5 1970M05 8.536
## 6 1970M06 9.617
# Time Series Conversion
df_TS <- ts(as.vector(df$beerproductionusa), start = 1970, end = 1977, frequency = 12)
time(df_TS)
## Jan Feb Mar Apr May Jun Jul Aug
## 1970 1970.000 1970.083 1970.167 1970.250 1970.333 1970.417 1970.500 1970.583
## 1971 1971.000 1971.083 1971.167 1971.250 1971.333 1971.417 1971.500 1971.583
## 1972 1972.000 1972.083 1972.167 1972.250 1972.333 1972.417 1972.500 1972.583
## 1973 1973.000 1973.083 1973.167 1973.250 1973.333 1973.417 1973.500 1973.583
## 1974 1974.000 1974.083 1974.167 1974.250 1974.333 1974.417 1974.500 1974.583
## 1975 1975.000 1975.083 1975.167 1975.250 1975.333 1975.417 1975.500 1975.583
## 1976 1976.000 1976.083 1976.167 1976.250 1976.333 1976.417 1976.500 1976.583
## 1977 1977.000
## Sep Oct Nov Dec
## 1970 1970.667 1970.750 1970.833 1970.917
## 1971 1971.667 1971.750 1971.833 1971.917
## 1972 1972.667 1972.750 1972.833 1972.917
## 1973 1973.667 1973.750 1973.833 1973.917
## 1974 1974.667 1974.750 1974.833 1974.917
## 1975 1975.667 1975.750 1975.833 1975.917
## 1976 1976.667 1976.750 1976.833 1976.917
## 1977
# TS Plot
plot(window(df_TS,start=c(1970,1)),ylab='Beer Production',
main="Figure 1. Time series Plot of Beer Production")
plot(df_TS, type = "o", xlab="Year", ylab="Beer Production",
main="Figure 2. Change in Beer production from 1970 to 1977")
### Analysis & Interpretation
Trend - Beer production is increasing over time; the series clearly follows an upward trend.
Seasonality - A repeating cycle or pattern can be observed in the plot, so we can infer that the data exhibit seasonality.
Behaviour - Both auto-regressive and moving-average behaviour are observed.
Changing Variance - A change in variance is present.
Intervention Points - No major intervention points are observed in the data.
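To visualize these components separately, a quick aside (a sketch, not part of the original workflow; decompose comes from base R's stats package):
# Sketch: classical decomposition of the series into trend, seasonal and
# random components, complementing the visual reading above.
plot(decompose(df_TS))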
# Pearson's Coefficient and correlation scatter plot
y = df_TS
x = zlag(df_TS)
index = 2:length(x)
cor(y[index], x[index])
## [1] 0.7909867
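Since the plots suggest an annual pattern, the same check at seasonal lag 12 (a sketch, an addition using TSA's zlag with a 12-month shift) would be:
# Sketch: lag-12 autocorrelation, probing the suspected annual seasonality.
x12 <- zlag(df_TS, 12)
idx <- 13:length(x12)
cor(df_TS[idx], x12[idx])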
#Scatter Plot
plot(y = df_TS, x = zlag(df_TS),ylab="Beer Production",
xlab="Beer Production in the previous month",
main="Figure 3. Scatter plot to show the autocorrelation
between Beer production of the previous month and the current month")
### Interpretation
par(mfrow=c(1,2))
acf(df_TS, main = "Figure 4. ACF plot of the Beer production time series")
pacf(df_TS, main = "Figure 5. PACF plot of the Beer production time series")
par(mfrow=c(1,1))
In the ACF plot, we can see strong correlations between the elements of the time series; the lags shown are correlations of the observations with observations at earlier time stamps. The significant lags at 12, 24, 36, … let us conclude that seasonality is present in the time series data.
The PACF plot shows the partial autocorrelation between the series and its lags.
acf(df_TS, lag.max = 100, main="Figure 6. ACF for the monthly beer production.")
pacf(df_TS, lag.max = 100, main="Figure 7. PACF for the monthly beer production.")
### Observation & Interpretation
Autocorrelation - The ACF clearly shows that the series is not white noise: significant lags are present, so there is significant dependence in the stochastic component. The significant lags at 12, 24, 36, … again indicate seasonality in the time series data.
A strong correlation can be observed in the PACF plot, and a slowly decaying pattern is visible in the ACF plot; together these observations signal the presence of a trend and non-stationarity.
#Hypothesis Test (ADF)
par(mfrow=c(1,1))
adf.test(df_TS, alternative = c("stationary"))
## Warning in adf.test(df_TS, alternative = c("stationary")): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: df_TS
## Dickey-Fuller = -8.1869, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
From the above output, the reported p-value is 0.01 (the warning notes the true p-value is even smaller), which is less than 0.05; hence the test suggests the series is stationary at the 5% level of significance.
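As an optional cross-check (an addition, not part of the original analysis), the KPSS test from the already-loaded tseries package, whose null hypothesis is stationarity, complements the ADF test:
# Sketch: KPSS test; a small p-value here would argue against stationarity.
kpss.test(df_TS)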
As there is an obvious change in variance, a transformation should be applied in order to stabilize the variance in the data.
# Box-cox Transformation
df_TS_BC = BoxCox.ar(df_TS)
title(main ='Figure 8. Lambda values & Log Likelihood')
# Confidence Interval
df_TS_BC$ci
## [1] -0.5 1.6
lambda = df_TS_BC$lambda[which(max(df_TS_BC$loglike) == df_TS_BC$loglike)]
lambda
## [1] 0.6
# Transformed Data
# lambda = 0.6 maximises the log-likelihood (and is close to the CI midpoint)
lambda = 0.6
# Box-Cox data
df_TS_BC = (df_TS^lambda-1)/lambda
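As an aside, the same transformation of this positive-valued series can be obtained from the already-loaded forecast package (a sketch, equivalent to the formula above):
# Sketch: forecast::BoxCox with the same lambda.
df_TS_BC_alt <- BoxCox(df_TS, lambda = 0.6)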
# TS Plot
plot(df_TS_BC, type = "o", xlab="Year", ylab="Beer Production",
main="Figure 9. Change in Beer production from 1970 to 1977(Box-Cox)")
# QQ Plot
qqnorm(df_TS_BC, main ='Figure 10. Box-Cox Transformation- QQ plot')
qqline(df_TS_BC, col = 2)
#ACF Plot
acf(df_TS_BC, lag.max = 100, main="Figure 11. ACF for the monthly beer production.")
#PACF
pacf(df_TS_BC, lag.max = 100, main="Figure 12. PACF for the monthly beer production.")
Setting lambda = 0.6, the value that maximises the log-likelihood (close to the midpoint of the confidence interval), the data are Box-Cox transformed. Note that the confidence interval (-0.5, 1.6) contains 1, which already hints that a transformation may not be necessary.
The time series plot of the transformed data does not show any decrease in the variation.
In the QQ plot, the tails deviate from normality.
The ACF & PACF plots do not show any major effect of the transformation: the trend and seasonality are still the same, and the presence of autocorrelation is still clearly visible.
Considering all the above observations, it is clear that the Box-Cox transformation did not improve the data.
Let's try classical differencing techniques.
# First Ordinary Difference
df_TS.diff = diff(df_TS_BC)
#TS Plot
plot(df_TS.diff,ylab='First Difference of Beer Production',xlab='Time',main = "Figure 13. T.S plot of the first differences of Beer Production units.")
## Quantile-Quantile Plot of the First-Differenced Data
# QQ Plot
qqnorm(df_TS.diff, main ='Figure 14. First Difference Transformation- QQ plot')
qqline(df_TS.diff, col = 2)
#ACF(1st Ordinary Differencing)
acf(df_TS.diff,lag.max=100,main="Figure 15. ACF of the first differences of Beer production units")
#PACF(1st Ordinary Differencing)
pacf(df_TS.diff,lag.max=100,main="Figure 16. PACF of the first differences
of Beer production units")
The time series plot of the differenced data shows that the trend is reduced, but seasonality is still present.
The QQ plot of the first-differenced data has also improved, moving towards normality.
Looking at the ACF and PACF plots, the presence of trend has faded a bit, but the decaying pattern remains, so seasonality is still present.
Considering all the above observations, first differencing brings a minor improvement, but we have not achieved the desired result, as the seasonality is yet to be dealt with.
Let's try the second classical approach, i.e., first ordinary differencing combined with first seasonal differencing.
# D=1 & d=1
df_TS.diff.Diff = diff(df_TS.diff, lag=12)
# TS Plot
plot(df_TS.diff.Diff,ylab='One ordinary diff & one seasonal diff',xlab='Time',main = "Figure 17. T.S plot with one ordinary diff & one seasonal diff.")
# QQ Plot
qqnorm(df_TS.diff.Diff, main ='Figure 18. d=1 & D=1 Transformation- QQ plot')
qqline(df_TS.diff.Diff, col = 2)
# ACF Plot
acf(df_TS.diff.Diff,lag.max=100,main="Figure 19. ACF of the first ordinary difference
and one seasonal difference of Beer production units")
#PACF
pacf(df_TS.diff.Diff,lag.max=100,main="Figure 20. PACF of the first ordinary difference
and one seasonal difference of Beer production units")
The time series plot of the transformed data shows no signs of trend, and the seasonality is almost zero.
The QQ plot of the data also looks good, with only three points deviating from the QQ line.
Looking at the ACF and PACF plots, the seasonality is almost gone from the data.
Considering all the above observations, with first ordinary and first seasonal differencing we have almost got rid of the seasonality and achieved the desired result.
From the above ACF and PACF plots of one ordinary difference and one seasonal difference, we have
d = 1, D = 1, p = 1, 2, 3, 4, P = 0, q = 1, Q = 0.
Therefore, the candidate SARIMA models are:
SARIMA(1,1,1) x (0,1,0)_12, SARIMA(2,1,1) x (0,1,0)_12, SARIMA(3,1,1) x (0,1,0)_12, SARIMA(4,1,1) x (0,1,0)_12
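As a compact screen of these four candidates (a sketch added here; the report fits each model individually below, and this loop assumes df_TS_BC from the Box-Cox step), one could fit them all and compare AICs:
# Sketch: fit the four classical candidates and collect their AICs.
orders <- list(c(1,1,1), c(2,1,1), c(3,1,1), c(4,1,1))
aics <- sapply(orders, function(ord)
  AIC(arima(df_TS_BC, order = ord,
            seasonal = list(order = c(0,1,0), period = 12))))
names(aics) <- sapply(orders, paste, collapse = ",")
aics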
In the residual approach, we display time series, ACF and PACF plots of the model residuals. As observed in the classical approach, both seasonality and a trend were present in the time series.
# TS Plot
m1.temp = arima(df_TS_BC,order=c(0,0,0),
seasonal=list(order=c(0,1,0), period=12))
res.m1 = rstandard(m1.temp)
plot(res.m1,xlab='Time',ylab='Residuals',
main="Figure 21. Time series plot of the residuals.")
#ACF Plot
acf(res.m1, lag.max = 100, main = "Figure 22. ACF for m1 residual")
pacf(res.m1, lag.max = 100, main = "Figure 23. PACF for m1 residual")
We fit a SARIMA(0,0,0) x (0,1,0)_12 model and display the ACF & PACF plots of its residuals.
In Figure 21, we can see the time series plot of the residuals after the seasonal difference (D=1). The trend has disappeared once the differencing is applied; however, a small amount of seasonality remains.
When we look at the ACF and PACF plots, the presence of trend has decreased and a decaying pattern is visible, so seasonality is still present.
After considering all the plots, applying the first seasonal difference gives a small improvement.
From the above model, we take p=1 and q=1 and place these values in model 2.
m2.temp = arima(df_TS_BC,order=c(1,0,1),
seasonal=list(order=c(0,1,0), period=12))
res.m2 = rstandard(m2.temp)
plot(res.m2,xlab='Time',ylab='Residuals',
main="Figure 24. Time series plot of the residuals.")
acf(res.m2, lag.max = 100, main = "Figure 25. ACF for m2 residual")
pacf(res.m2, lag.max = 100, main = "Figure 26. PACF for m2 residual")
We fit a SARIMA(1,0,1) x (0,1,0)_12 model and display the ACF & PACF plots of its residuals.
In Figure 24, we can see the time series plot of the residuals with the seasonal difference (D=1) and with 'p' and 'q' both equal to one. The trend has disappeared after the first seasonal difference; however, a little seasonality is still visible.
When we look at the ACF and PACF plots, the presence of trend has decreased and a decaying pattern is seen, so some seasonality remains.
After considering all the plots, with the first seasonal difference and 'p' and 'q' equal to 1, the seasonality has decreased and the decaying pattern has weakened; however, further improvement is needed.
From the above ACF & PACF plots, we take q=1 and p=1 and place these values in model 3.
m3.temp = arima(df_TS_BC,order=c(1,1,1),
seasonal=list(order=c(0,1,0), period=12))
res.m3 = rstandard(m3.temp)
plot(res.m3,xlab='Time',ylab='Residuals',
main="Figure 27. Time series plot of the residuals.")
acf(res.m3, lag.max = 100, main = "Figure 28. ACF for m3 residual")
pacf(res.m3, lag.max = 100, main = "Figure 29. PACF for m3 residual")
We fit a SARIMA(1,1,1) x (0,1,0)_12 model and display the ACF & PACF plots of its residuals.
In Figure 27, we can see the time series plot of the residuals with the seasonal difference (D=1), the ordinary difference (d=1), and 'p' and 'q' both equal to one. The trend has disappeared and the seasonality has also fairly decreased.
When we look at the ACF and PACF plots, no trend is seen, and only a decaying pattern is visible.
After considering all the plots, applying the first seasonal difference and the first ordinary difference has fairly reduced the seasonality, and the trend is no longer visible.
From the above ACF & PACF plots, we get q=0 and p=0.
From the above analysis, the model we found best is: SARIMA(1,1,1) x (0,1,0)_12
ACF and PACF plots are effective for identifying pure MA(q) and AR(p) models. However, for mixed ARMA and ARIMA models the theoretical ACF and PACF have infinitely many non-zero values, so it is difficult to identify mixed models from sample ACF & PACF plots alone.
The EACF method is used to identify the orders of the auto-regressive and moving-average components of ARMA or ARIMA models.
The EACF method filters the AR component out of an ARMA or ARIMA model, leaving a pure MA(q) or IMA(q,d) process, which enjoys the cutoff property in its ACF.
eacf(res.m3)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o o o o o o o o o o o
## 1 x o o o o o o o o o o o o o
## 2 x x o o o o o o o o o o o o
## 3 o x o o o o o o o o o o o o
## 4 o x x o o o o o o o o o o o
## 5 x x o o o o o o o o o o o o
## 6 x x o o o o o o o o o o o o
## 7 x x o o o o o o o o o o o o
The possible models from the EACF are:
SARIMA(0,1,0) x (0,1,0)_12, SARIMA(0,1,1) x (0,1,0)_12, SARIMA(1,1,1) x (0,1,0)_12
In the figure below, each shaded cell in a row of the BIC table corresponds to a term included in an ARMA model.
The models are sorted according to their BIC score; the model with the smallest BIC score is considered the best.
res = armasubsets(y=res.m3 , nar=5 , nma=5, y.name='test', ar.method='ols')
plot(res)
The model with the smallest BIC score contains error lag 4, so a model containing error lag 4 is considered the best model.
The second-best model contains lag 2 of the time series and error lag 4, so a model containing time series lag 2 and error lag 4 is considered the second-best model.
Here: (p=4, q=0) and (p=4, q=2).
The possible set of models from the BIC table is:
SARIMA(4,1,0) x (0,1,0)_12, SARIMA(4,1,2) x (0,1,0)_12
Overall, the models from the residual approach are: SARIMA(1,1,1) x (0,1,0)_12, SARIMA(0,1,0) x (0,1,0)_12, SARIMA(0,1,1) x (0,1,0)_12, SARIMA(4,1,0) x (0,1,0)_12, SARIMA(4,1,2) x (0,1,0)_12
Models from the classical approach: SARIMA(1,1,1) x (0,1,0)_12, SARIMA(2,1,1) x (0,1,0)_12, SARIMA(3,1,1) x (0,1,0)_12, SARIMA(4,1,1) x (0,1,0)_12
In model fitting, we fit the above models for parameter estimation.
We use Maximum Likelihood (ML), Conditional Sum of Squares (CSS), and CSS-ML, which uses CSS to find starting values and then refines them by ML.
m1.temp.ML <-arima(df_TS_BC,order=c(1,1,1),seasonal=list(order=c(0,1,0),
period=12),method = "ML")
m1.tempCSS <-arima(df_TS_BC,order=c(1,1,1),seasonal=list(order=c(0,1,0),period=12),method = "CSS")
# Note: unlike the two fits above (Box-Cox series), this fit uses the raw series.
m1.temp.CSS.ML = Arima(df_TS, order=c(1,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m1.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.098906 0.147457 0.6707 0.5024
## ma1 -0.887540 0.094148 -9.4271 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m1.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.046298 0.146338 0.3164 0.7517
## ma1 -0.803322 0.082151 -9.7786 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m1.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.101836 0.145930 0.6978 0.4853
## ma1 -0.891450 0.091295 -9.7645 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m3.temp.ML = Arima(df_TS, order=c(0,1,1), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m3.tempCSS = Arima(df_TS, order=c(0,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m3.temp.CSS.ML = Arima(df_TS, order=c(0,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m3.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.852056 0.084761 -10.052 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m3.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.784327 0.071186 -11.018 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m3.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.852062 0.084759 -10.053 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m4.temp.ML = Arima(df_TS, order=c(1,1,1), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m4.tempCSS = Arima(df_TS, order=c(1,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m4.temp.CSS.ML = Arima(df_TS, order=c(1,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m4.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.101833 0.145930 0.6978 0.4853
## ma1 -0.891444 0.091298 -9.7641 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m4.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.054595 0.145861 0.3743 0.7082
## ma1 -0.812017 0.080103 -10.1371 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m4.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.101836 0.145930 0.6978 0.4853
## ma1 -0.891450 0.091295 -9.7645 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m5.temp.ML = Arima(df_TS, order=c(4,1,0), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m5.tempCSS = Arima(df_TS, order=c(4,1,0), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m5.temp.CSS.ML = Arima(df_TS, order=c(4,1,0), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m5.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.64565 0.11200 -5.7647 8.183e-09 ***
## ar2 -0.41872 0.13251 -3.1598 0.001579 **
## ar3 -0.26044 0.13548 -1.9224 0.054553 .
## ar4 -0.31234 0.11670 -2.6764 0.007442 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m5.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.65710 0.10900 -6.0284 1.656e-09 ***
## ar2 -0.42864 0.12873 -3.3297 0.0008695 ***
## ar3 -0.26248 0.13248 -1.9813 0.0475569 *
## ar4 -0.30993 0.11318 -2.7384 0.0061746 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m5.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.64565 0.11200 -5.7646 8.183e-09 ***
## ar2 -0.41872 0.13251 -3.1598 0.001579 **
## ar3 -0.26044 0.13548 -1.9224 0.054552 .
## ar4 -0.31234 0.11670 -2.6764 0.007442 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the above output, under the Maximum Likelihood method the coefficients ar1, ar2 and ar4 are significant, as their Pr(>|z|) is less than 0.05.
Under the Conditional Sum of Squares method, all four coefficients ar1, ar2, ar3 and ar4 are significant, as their Pr(>|z|) is less than 0.05.
Under the CSS-ML method, the coefficients ar1, ar2 and ar4 are significant, as their Pr(>|z|) is less than 0.05.
m6.temp.ML = Arima(df_TS, order=c(4,1,2), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m6.tempCSS = Arima(df_TS, order=c(4,1,2), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m6.temp.CSS.ML = Arima(df_TS, order=c(4,1,2), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m6.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.52043 0.30624 1.6994 0.08924 .
## ar2 -0.15987 0.14152 -1.1297 0.25860
## ar3 -0.14339 0.13261 -1.0813 0.27957
## ar4 -0.29213 0.13685 -2.1347 0.03278 *
## ma1 -1.34120 0.30373 -4.4158 1.006e-05 ***
## ma2 0.60195 0.29796 2.0202 0.04336 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m6.tempCSS)
## Warning in sqrt(diag(se)): NaNs produced
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.309067 NaN NaN NaN
## ar2 0.059497 0.064176 0.9271 0.35388
## ar3 0.012672 0.088980 0.1424 0.88676
## ar4 -0.223599 0.094943 -2.3551 0.01852 *
## ma1 -0.725282 NaN NaN NaN
## ma2 -0.393998 NaN NaN NaN
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m6.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.52026 0.30662 1.6967 0.08974 .
## ar2 -0.15992 0.14154 -1.1298 0.25855
## ar3 -0.14337 0.13262 -1.0811 0.27967
## ar4 -0.29219 0.13684 -2.1353 0.03274 *
## ma1 -1.34105 0.30410 -4.4099 1.034e-05 ***
## ma2 0.60181 0.29839 2.0169 0.04371 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the above output, under the Maximum Likelihood method the coefficients ar4, ma1 and ma2 are significant, as their Pr(>|z|) is less than 0.05.
Under the Conditional Sum of Squares method, ar4 is the only significant coefficient.
Under the CSS-ML method, the coefficients ar4, ma1 and ma2 are significant, as their Pr(>|z|) is less than 0.05.
m7.temp.ML = Arima(df_TS, order=c(2,1,1), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m7.tempCSS = Arima(df_TS, order=c(2,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m7.temp.CSS.ML = Arima(df_TS, order=c(2,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m7.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.121125 0.149578 0.8098 0.4181
## ar2 0.070518 0.139373 0.5060 0.6129
## ma1 -0.917010 0.101552 -9.0299 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m7.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.071104 0.149367 0.476 0.6340
## ar2 0.053897 0.141106 0.382 0.7025
## ma1 -0.837699 0.087052 -9.623 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m7.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.121123 0.149578 0.8098 0.4181
## ar2 0.070515 0.139372 0.5060 0.6129
## ma1 -0.917010 0.101552 -9.0300 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m8.temp.ML = Arima(df_TS, order=c(3,1,1), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m8.tempCSS = Arima(df_TS, order=c(3,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m8.temp.CSS.ML = Arima(df_TS, order=c(3,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m8.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.1236722 0.1570934 0.7873 0.4311
## ar2 0.0721947 0.1430399 0.5047 0.6138
## ar3 0.0075263 0.1392080 0.0541 0.9569
## ma1 -0.9200172 0.1163784 -7.9054 2.671e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m8.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.077011 0.155625 0.4948 0.6207
## ar2 0.055864 0.144208 0.3874 0.6985
## ar3 0.017269 0.141847 0.1217 0.9031
## ma1 -0.852218 0.098724 -8.6323 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m8.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.1236723 0.1570937 0.7873 0.4311
## ar2 0.0721943 0.1430401 0.5047 0.6138
## ar3 0.0075262 0.1392081 0.0541 0.9569
## ma1 -0.9200157 0.1163787 -7.9054 2.672e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m9.temp.ML = Arima(df_TS, order=c(4,1,1), seasonal=list(order=c(0,1,0), period=12),method = "ML")
m9.tempCSS = Arima(df_TS, order=c(4,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS")
m9.temp.CSS.ML = Arima(df_TS, order=c(4,1,1), seasonal=list(order=c(0,1,0), period=12),method = "CSS-ML")
coeftest(m9.temp.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.046368 0.170788 -0.2715 0.78601
## ar2 -0.055513 0.146377 -0.3792 0.70450
## ar3 -0.080227 0.133902 -0.5991 0.54908
## ar4 -0.303839 0.128961 -2.3561 0.01847 *
## ma1 -0.739880 0.156017 -4.7423 2.113e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m9.tempCSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.042835 0.143911 -0.2976 0.7660
## ar2 -0.039907 0.131099 -0.3044 0.7608
## ar3 -0.048134 0.128339 -0.3751 0.7076
## ar4 -0.264109 0.125198 -2.1095 0.0349 *
## ma1 -0.854483 0.160296 -5.3307 9.785e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(m9.temp.CSS.ML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.046314 0.170775 -0.2712 0.78624
## ar2 -0.055503 0.146372 -0.3792 0.70454
## ar3 -0.080235 0.133903 -0.5992 0.54904
## ar4 -0.303857 0.128964 -2.3561 0.01847 *
## ma1 -0.739948 0.155983 -4.7438 2.098e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the above output, under the Maximum Likelihood, Conditional Sum of Squares and CSS-ML methods, ar4 and ma1 are the only significant coefficients, as their Pr(>|z|) is less than 0.05.
Now from the above analysis, we have the following information:
| Model | Significant coefficients | Insignificant coefficients | Difference (significant minus insignificant) |
|---|---|---|---|
| m1.ML | 1 | 1 | 0 |
| m1.CSS | 1 | 1 | 0 |
| m1.CSS-ML | 1 | 1 | 0 |
| m3.ML | 1 | 0 | 1 |
| m3.CSS | 1 | 0 | 1 |
| m3.CSS-ML | 1 | 0 | 1 |
| m4.ML | 1 | 1 | 0 |
| m4.CSS | 1 | 1 | 0 |
| m4.CSS-ML | 1 | 1 | 0 |
| m5.ML | 3 | 1 | 2 |
| m5.CSS | 4 | 0 | 4 |
| m5.CSS-ML | 3 | 1 | 2 |
| m6.ML | 2 | 1 | 1 |
| m6.CSS | 2 | 1 | 1 |
| m6.CSS-ML | 2 | 1 | 1 |
| m7.ML | 1 | 2 | -1 |
| m7.CSS | 1 | 2 | -1 |
| m7.CSS-ML | 1 | 2 | -1 |
| m8.ML | 1 | 3 | -2 |
| m8.CSS | 1 | 3 | -2 |
| m8.CSS-ML | 1 | 3 | -2 |
| m9.ML | 2 | 3 | -1 |
| m9.CSS | 2 | 3 | -1 |
| m9.CSS-ML | 2 | 3 | -1 |
From the above table, it is clear that m5.CSS has the highest number of significant coefficients, followed by m5.ML and m5.CSS-ML.
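The counts in the table were compiled by hand from the coeftest outputs above; a small helper (a sketch; count_sig is a hypothetical name, reusing the lmtest::coeftest call already employed) could automate the counting:
# Sketch: count significant and insignificant coefficients at the 5% level.
count_sig <- function(model, alpha = 0.05) {
  p <- coeftest(model)[, "Pr(>|z|)"]
  c(significant = sum(p < alpha), insignificant = sum(p >= alpha))
}
count_sig(m5.tempCSS)  # e.g. SARIMA(4,1,0) x (0,1,0)_12 fitted by CSS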
Now we perform the residual analysis for the above models.
LBQPlot <- function(e, lagNumber)
{
pValue <- rep(0, lagNumber)
# Ljung-Box p-value at each lag; the loop starts at 1 because lag = 0 is not
# meaningful for Box.test and an assignment to pValue[0] is silently dropped.
for(j in 1:lagNumber){
pValue[j] <- Box.test(e, lag = j, type = "Ljung-Box", fitdf = 0)$p.value
}
plot(pValue, ylim = c(0,1), type = "n", ylab = "P-values", xlab = "Lags", main = "Ljung Box Test")
points(pValue)
abline(h = 0.05, col = "red", lty = "dotted")
}
residual_analysis_stochastic <- function(model) {
par(mfrow=c(3,2))
st.res = rstandard(model)
plot(st.res,xlab='Time', type = "o", ylab='Standardised Residuals', main="Time series plot of residuals")
hist(st.res)
qqnorm(st.res)
qqline(st.res, col = 2)
acf(st.res, lag.max = 48, main = "ACF of the residuals.")
pacf(st.res, lag.max = 48, main = "PACF of the residuals.")
LBQPlot(st.res, 30)
par(mfrow=c(1,1))
return(shapiro.test(st.res))
}
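As an aside (an addition, not part of the original workflow), forecast::checkresiduals bundles a similar time plot, residual ACF and Ljung-Box summary into a single call:
# Sketch: built-in residual diagnostics from the forecast package.
checkresiduals(m5.temp.ML)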
residual_analysis_stochastic(m1.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.9828, p-value = 0.3175
residual_analysis_stochastic(m1.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98474, p-value = 0.4163
residual_analysis_stochastic(m1.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98281, p-value = 0.3179
residual_analysis_stochastic(m3.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98063, p-value = 0.2308
residual_analysis_stochastic(m3.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98118, p-value = 0.2507
residual_analysis_stochastic(m3.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98118, p-value = 0.2507
residual_analysis_stochastic(m4.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98045, p-value = 0.2248
residual_analysis_stochastic(m4.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98281, p-value = 0.3179
residual_analysis_stochastic(m4.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98281, p-value = 0.3179
residual_analysis_stochastic(m5.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.96904, p-value = 0.03865
residual_analysis_stochastic(m5.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.97235, p-value = 0.06451
residual_analysis_stochastic(m5.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.97235, p-value = 0.06451
residual_analysis_stochastic(m6.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.94073, p-value = 0.0006997
residual_analysis_stochastic(m6.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98293, p-value = 0.3233
residual_analysis_stochastic(m6.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98293, p-value = 0.3233
residual_analysis_stochastic(m7.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.9786, p-value = 0.1698
residual_analysis_stochastic(m7.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98154, p-value = 0.2643
residual_analysis_stochastic(m7.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98154, p-value = 0.2643
residual_analysis_stochastic(m8.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.97917, p-value = 0.1852
residual_analysis_stochastic(m8.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98167, p-value = 0.2696
residual_analysis_stochastic(m8.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98167, p-value = 0.2696
residual_analysis_stochastic(m9.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.9806, p-value = 0.2298
residual_analysis_stochastic(m9.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98467, p-value = 0.4121
residual_analysis_stochastic(m9.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98466, p-value = 0.4119
The plots above represent the residual analysis of all the obtained models under the three estimation methods. Moving further, let's analyse and interpret the best models among them.
residual_analysis_stochastic(m5.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.96904, p-value = 0.03865
Time series plot - In the time series plot, we cannot see any trend; however, there is an intervention point during the year 1974.
The histogram of the residuals looks approximately normal and symmetric.
The normal Q-Q plot exhibits normality and has only one outlier.
The ACF plot is consistent with white noise.
The PACF plot exhibits two significant lags.
Ljung-Box test - All the points lie above the reference line, which is a good indication for the model and confirms that the model does not show any lack of fit.
From the Shapiro-Wilk test, we find the p-value is less than 0.05; hence, we reject the null hypothesis of normality of the residuals.
residual_analysis_stochastic(m5.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.97235, p-value = 0.06451
Time series plot - In the time series plot, we cannot see any trend; however, there is an intervention point between the years 1973 & 1974.
The histogram of the residuals looks approximately normal and symmetric.
The normal Q-Q plot exhibits normality and has only one outlier.
The ACF plot is consistent with white noise.
The PACF plot exhibits one significant lag.
Ljung-Box test - All the points lie above the reference line, which is a good indication for the model and confirms that the model does not show any lack of fit.
From the Shapiro-Wilk test, we find the p-value is greater than 0.05; hence, we do not reject normality of the residuals.
residual_analysis_stochastic(m5.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.97235, p-value = 0.06451
Time series plot - In the time series plot, we cannot see any trend; however, there is an intervention point between the years 1973 & 1974.
The histogram of the residuals looks approximately normal and symmetric.
The normal Q-Q plot exhibits normality and has only one outlier.
The ACF plot is consistent with white noise.
The PACF plot exhibits one significant lag.
Ljung-Box test - All the points lie above the reference line, which is a good indication for the model and confirms that the model does not show any lack of fit.
From the Shapiro-Wilk test, we find the p-value is greater than 0.05; hence, we do not reject normality of the residuals.
residual_analysis_stochastic(m1.temp.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98474, p-value = 0.4163
Time series plot - In the time series plot, we cannot see any trend; however, there is an intervention point between the years 1973 & 1974.
The histogram of the residuals looks approximately normal and symmetric.
The normal Q-Q plot exhibits normality.
The ACF plot is consistent with white noise.
The PACF plot exhibits no significant lags.
Ljung-Box test - All the points lie above the reference line, which is a good indication for the model and confirms that the model does not show any lack of fit.
From the Shapiro-Wilk test, we find the p-value is greater than 0.05; hence, we do not reject normality of the residuals.
residual_analysis_stochastic(m1.tempCSS)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.9828, p-value = 0.3175
Time series plot - In the time series plot, we cannot see any trend; however, there is an intervention point between the years 1973 & 1974.
The histogram of the residuals looks approximately normal and symmetric.
The normal Q-Q plot exhibits normality.
The ACF plot is consistent with white noise.
The PACF plot exhibits no significant lags.
Ljung-Box test - All the points lie above the reference line, which is a good indication for the model and confirms that the model does not show any lack of fit.
From the Shapiro-Wilk test, we find the p-value is greater than 0.05; hence, we do not reject normality of the residuals.
residual_analysis_stochastic(m1.temp.CSS.ML)
##
## Shapiro-Wilk normality test
##
## data: st.res
## W = 0.98281, p-value = 0.3179
Time series plot - In the time series plot, we cannot see any trend; however, there is an intervention point between the years 1973 & 1974.
The histogram of the residuals looks approximately normal and symmetric.
The normal Q-Q plot exhibits normality.
The ACF plot is consistent with white noise.
The PACF plot exhibits one significant lag.
Ljung-Box test - All the points lie above the reference line, which is a good indication for the model and confirms that the model does not show any lack of fit.
From the Shapiro-Wilk test, we find the p-value is greater than 0.05; hence, we do not reject normality of the residuals.
So far we have taken the best 3 models based on coefficient significance. Of these, the best 2 models after residual analysis are m5.temp.ML and m5.temp.CSS.ML.
sort.score <- function(x, score = c("bic", "aic")){
  score <- match.arg(score)  # resolves the default vector to a single valid choice
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}
sort.score(AIC(m1.temp.ML, m1.temp.CSS.ML,
m3.temp.ML, m3.temp.CSS.ML,
m4.temp.ML, m4.temp.CSS.ML,
m5.temp.ML, m5.temp.CSS.ML,
m6.temp.ML, m6.temp.CSS.ML,
m7.temp.ML, m7.temp.CSS.ML,
m8.temp.ML, m8.temp.CSS.ML,
m9.temp.ML, m9.temp.CSS.ML), score = "aic")
## df AIC
## m1.temp.ML 3 -8.092865
## m3.temp.CSS.ML 2 136.352072
## m3.temp.ML 2 136.352072
## m4.temp.ML 3 137.865541
## m1.temp.CSS.ML 3 137.865541
## m4.temp.CSS.ML 3 137.865541
## m9.temp.CSS.ML 6 139.056399
## m9.temp.ML 6 139.056399
## m7.temp.CSS.ML 4 139.605233
## m7.temp.ML 4 139.605233
## m6.temp.ML 7 140.522288
## m6.temp.CSS.ML 7 140.522289
## m8.temp.CSS.ML 5 141.602302
## m8.temp.ML 5 141.602302
## m5.temp.ML 5 144.504494
## m5.temp.CSS.ML 5 144.504494
sort.score(BIC(m1.temp.ML, m1.temp.CSS.ML,
m3.temp.ML, m3.temp.CSS.ML,
m4.temp.ML, m4.temp.CSS.ML,
m5.temp.ML, m5.temp.CSS.ML,
m6.temp.ML, m6.temp.CSS.ML,
m7.temp.ML, m7.temp.CSS.ML,
m8.temp.ML, m8.temp.CSS.ML,
m9.temp.ML, m9.temp.CSS.ML),score ="bic")
## df BIC
## m1.temp.ML 3 -1.262867
## m3.temp.CSS.ML 2 140.905404
## m3.temp.ML 2 140.905404
## m4.temp.ML 3 144.695540
## m1.temp.CSS.ML 3 144.695540
## m4.temp.CSS.ML 3 144.695540
## m7.temp.CSS.ML 4 148.711897
## m7.temp.ML 4 148.711897
## m9.temp.CSS.ML 6 152.716396
## m9.temp.ML 6 152.716396
## m8.temp.CSS.ML 5 152.985633
## m8.temp.ML 5 152.985633
## m5.temp.ML 5 155.887824
## m5.temp.CSS.ML 5 155.887824
## m6.temp.ML 7 156.458951
## m6.temp.CSS.ML 7 156.458951
frc = forecast(m5.temp.CSS.ML, h=24)
frc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 1977 14.88108 14.073754 15.68840 13.646384 16.11577
## Mar 1977 13.21153 12.355019 14.06803 11.901612 14.52144
## Apr 1977 12.67301 11.770467 13.57556 11.292688 14.05334
## May 1977 11.30102 10.351936 12.25010 9.849523 12.75251
## Jun 1977 12.36845 11.403916 13.33299 10.893322 13.84358
## Jul 1977 12.64569 11.597663 13.69371 11.042873 14.24850
## Aug 1977 11.96746 10.875872 13.05906 10.298019 13.63691
## Sep 1977 12.06018 10.927560 13.19280 10.327987 13.79237
## Oct 1977 13.82506 12.647016 15.00310 12.023399 15.62671
## Nov 1977 15.34030 14.133015 16.54758 13.493917 17.18668
## Dec 1977 15.94060 14.689787 17.19141 14.027647 17.85355
## Jan 1978 16.67561 15.387723 17.96351 14.705954 18.64527
## Feb 1978 15.05436 13.354388 16.75433 12.454478 17.65424
## Mar 1978 13.36760 11.564165 15.17104 10.609483 16.12572
## Apr 1978 12.83158 10.937403 14.72575 9.934688 15.72846
## May 1978 11.46994 9.481875 13.45800 8.429457 14.51042
## Jun 1978 12.52299 10.478049 14.56792 9.395525 15.65045
## Jul 1978 12.80990 10.643990 14.97581 9.497427 16.12237
## Aug 1978 12.12798 9.876410 14.37955 8.684502 15.57145
## Sep 1978 12.21954 9.888023 14.55106 8.653792 15.78529
## Oct 1978 13.98868 11.573348 16.40402 10.294746 17.68262
## Nov 1978 15.49960 13.017731 17.98146 11.703911 19.29528
## Dec 1978 16.10236 13.541242 18.66348 12.185467 20.01926
## Jan 1979 16.83685 14.203130 19.47056 12.808924 20.86477
In the above steps, we used both the classical and the residual approaches to find the best-fitting model.
Note that m1.temp.ML was fitted to the Box-Cox transformed series while most other models were fitted to the raw series, so its AIC/BIC values are on a different scale and are not directly comparable in the tables above.
Based on coefficient significance and the residual diagnostics, we selected the SARIMA(4,1,0) x (0,1,0)_12 model from the residual approach as the best-fitting model.
We will use this model to predict the series for the next two years.
The figure below shows the prediction of United States beer production for the next two years. The prediction limits get wider and wider as we move further in time, because forecast uncertainty grows with the horizon.
plot(frc, main = "Figure 30. Forecast for the next two years (1977-1979)",
xlab = "Years", ylab = "Beer Production")
We performed a time series analysis to observe trends and patterns in the production of beer in the United States from 1970 to 1977.
After performing all the necessary steps, we predicted the production of beer in the United States for the next two years.
In the prediction, we can see the limits getting wider, reflecting the uncertainty in the predictions.
There is an upward trend in the predicted series. This might be due to an increase in the number of breweries, the legalization of home brewing, better storage facilities, and numerous other reasons.
A seasonal ARIMA model was used to study and analyze the trends in beer production in the United States. The data exhibit an upward trend and seasonality. We cannot use a deterministic model for such data: deterministic models do not explain the seasonal behaviour of such a time series, and the residuals from a seasonal-means plus linear-time-trend model remain highly correlated at many lags (a sketch illustrating this appears after the list below). A seasonal model works well with such a time series. The following steps were taken to model the data:
Descriptive Analysis
Model Specification
Classical Approach
Residual Approach
Model Fitting
Diagnostic Check/Residual Analysis
Prediction/Forecasting
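As noted above, a minimal sketch (an addition; it assumes df_TS and the TSA package loaded earlier) of the deterministic seasonal-means plus linear-trend fit shows the strong autocorrelation left in its residuals:
# Sketch: deterministic seasonal means + linear trend, and its residual ACF.
month <- season(df_TS)          # TSA::season: month-of-year factor
tt <- as.numeric(time(df_TS))   # linear time index
det.fit <- lm(df_TS ~ month + tt)
acf(rstudent(det.fit), lag.max = 48,
    main = "ACF of residuals: seasonal means + linear trend")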
Hyndman, R. J. (2021). Time series. Retrieved 13 June 2021, from https://robjhyndman.com/categories/time-series/
Explore the data - CensusAtSchool New Zealand. (2021). Retrieved 10 June 2021, from https://new.censusatschool.org.nz/explore/