1 Introduction

In this report, we will conduct the Model-Building Strategy to find the best fitting model for the data of monthly retail sales in millions of dollars for recreational goods in New Zealand from 1995 to 2010. After this, the forecast for monthly retail sales for recreational goods in New Zealand for the next 10 units of time will be given at the end of the report.

1.1 Target Feature

This series gives the monthly retail sales in millions of dollars for four categories of goods in New Zealand from 1995 to 2010. (There are many more categories available at infoshare). And we have selected one of the variable, Rec_goods as our target feature.

  • Rec_goods is recreational goods, and includes sport and camping equipment, toys and games, books and stationery, photographic equipment and marine equipment.

1.2 Preparing Data

Time Rec_goods
1995M05 103.2
1995M06 95.9
1995M07 98.0
1995M08 101.2
1995M09 103.6
1995M10 109.4

1.3 Data Visualization

1.3.1 Time Series Plot

By observing Figure.1 as below, we can say this time series has a little upwards trend (Increasing mean), a bit of changing variation and seasonality.



1.3.2 Scatter Plot

The scatter plot indicatess that there is a high positive correlation between the retail sales of previous year and current year.



1.4 Check Seasonality

There is a slow decaying pattern in the ACF and very high correlation on the first lag in the PACF implies the existence of a trend and non-stationarity.

1.5 Check Stationarity

ADF Test Assumtions:

  • \(H_0\): The given series is non-stationary
  • \(H_A\): Stationary
## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 14
##   STATISTIC:
##     Dickey-Fuller: 1.978
##   P VALUE:
##     0.9884 
## 
## Description:
##  Tue Jun 30 07:26:40 2020 by user:

Furthermore, with the p-value from the ADF-test of 0.9884, clearly it is greater than 0.05. this indicates the it is not significant enough to reject the null hypothesis. Hence, this implies that the series is non-stationary.



1.6 Normality Test

From the results, the Q-Q plot reveals that it did not meet the requirements of normality. It is further confirmed by the p-value = 5.065e-10, which is less than 0.05. Hence, this implies that the null hypothesis is rejected showing that the data is not normal.

## 
##  Shapiro-Wilk normality test
## 
## data:  retail.ts
## W = 0.89695, p-value = 5.065e-10



2 Model Specification

2.1 Check Stationarity & Seasonality

According to Figure 3 & 4, there is the presence of seasonality and non-stationarity as we can see some significant correlations at lag1, lag2 in ACF & PACF. So, in order to get rid of these characteristics, we apply seasonal lag.

2.2 Seasonality Lag

We will add the SARMA(1,1,1) component and see if we get rid of seasonal component.

2.3 Fitting in SARMA(1,1,1)

We observe a significant correlation at lag1 in ACF, a significant correration at lag1 and a slightly significant correlation at lag2 in PACF .This is due to the changing point and existing trend. We can move forward with ordinary differencing to get rid of the ordinary part.

m2.retail_ts = arima(retail.ts,order=c(0,0,0),seasonal=list(order=c(1,1,1), period=12))
m2_residual = residuals(m2.retail_ts)

2.4 Transformations

Before dealing with the behaviour of the series, at first we have to deal with the trend of the series followed by the changing variance .

2.4.1 Log Tranformation

The p-value = 0.99 with is greater than 0.05 implies that there is no statistical evidence to reject the null hypothesis. Hence, the log transformed series is still non-stationary.

lambda = 0 
retail.log = log(retail.ts)
plot(log(retail.ts), type="l",main="Figure.12: Time series plot of log Retail Sales \n for recreational goods")

order=ar(diff(retail.log))$order
adfTest(retail.log, lags = order)
## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 14
##   STATISTIC:
##     Dickey-Fuller: 2.1218
##   P VALUE:
##     0.99 
## 
## Description:
##  Tue Jun 30 07:26:41 2020 by user:

2.4.2 Box-Cox Transformation

The Box-cox transformed series still shows some incremented trend and the ACF and PACF shows some non-stationarity in the series with some changing variance. Hence, we need to perform the ADF test to confirm if the BOX-COX transformed series is stationary.

p=BoxCox.ar(retail.ts)

Lambda is found to be 0.35

lambda=0
lambda <- (as.numeric(p$ci[1]) + as.numeric(p$ci[2])) / 2
lambda
## [1] 0.35
retail.ts.BC = ((retail.ts^lambda)-1)/lambda

order=ar(diff(retail.ts.BC))$order
adfTest(retail.ts.BC, lags = order)
## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 14
##   STATISTIC:
##     Dickey-Fuller: 2.0909
##   P VALUE:
##     0.99 
## 
## Description:
##  Tue Jun 30 07:26:46 2020 by user:

The p-value = 0.99 with is greater than 0.05 implies that there is no statistical evidence to reject the null hypothesis. Hence, the series is non-stationary. But at this point, we will keep the Box-Cox transformation since it still helps with stabilizing the variance.

2.5 Differencing

In order to de-trend the series, the differencing is required.

2.5.1 * Applying First Differencing

From ACF and PACF below we consider {SARIMA(2,1,3)x(1,1,1)_12} as one of the candidate model.

diff.retail = arima(retail.ts.BC,order=c(0,1,0),seasonal=list(order=c(1,1,1), period=12))
m4_retail = residuals(diff.retail) 

2.6 EACF

From the EACF, the models that we selected are:

  • {SARIMA(0,1,3)X(1,1,1)_12}
  • {SARIMA(1,1,2)X(1,1,1)_12}
  • {SARIMA(1,1,3)X(1,1,1)_12}
  • {SARIMA(2,1,3)X(1,1,1)_12}
  • {SARIMA(2,1,4)X(1,1,1)_12}
  • {SARIMA(3,1,4)X(1,1,1)_12}
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x x o o o o o x x o  o  o  o 
## 1 x x o o o o o o o o o  o  o  o 
## 2 x x x o o o o o o o o  o  o  o 
## 3 x x x x o o o o o o o  o  o  o 
## 4 x x x x o o o o o o o  o  o  o 
## 5 x x x o o x o o o o o  o  o  o 
## 6 x x x x o x o o o o o  o  o  o 
## 7 x x x x o x x x o o o  o  o  o
res = armasubsets(y=m4_retail,nar=10,nma=10,y.name='test',ar.method='ols')
## Reordering variables and trying again:
plot(res)
title('Figure.20: BIC table for recreational goods sales ',line=5)

  • From BIC table set of probable models are: {SARIMA(1,1,0)X(1,1,1)_12}, {SARIMA(0,1,10)X(1,1,1)_12}

  • We consider only ARIMA(1,1,0) but not ARIMA(0,1,10) because it is a large model and violates principle of parsimony

  • Final set of candidate model: {SARIMA(0,1,3)X(1,1,1)_12}, {SARIMA(1,1,0)X(1,1,1)_12}, {SARIMA(1,1,2)X(1,1,1)_12}, {SARIMA(1,1,3)X(1,1,1)_12}, {SARIMA(2,1,3)X(1,1,1)_12}, {SARIMA(2,1,4)X(1,1,1)_12}, {SARIMA(3,1,4)X(1,1,1)_12}



3 Model Fitting

3.1 Parameter Estimation

We use Maximum likelihood estimate (ML) for parameter estimation.



3.1.1 {SARIMA(0,1,3)X(1,1,1)_12}

The residuals of {SARIMA(0,1,3)X(1,1,1)_12} are white noise.

## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ma1  -0.590409   0.076518 -7.7159 1.201e-14 ***
## ma2  -0.190141   0.096594 -1.9685   0.04901 *  
## ma3   0.117708   0.076810  1.5325   0.12541    
## sar1  0.248498   0.103094  2.4104   0.01594 *  
## sma1 -0.929483   0.140349 -6.6226 3.528e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.2 {SARIMA(1,1,0)X(1,1,1)_12}

In spite of P-value < 0.05 and {SARIMA(1,1,0)X(1,1,1)_12} found to be significant, the residuals are not white noise.

## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1  -0.387029   0.070684 -5.4755 4.364e-08 ***
## sar1  0.299620   0.102639  2.9192   0.00351 ** 
## sma1 -0.920715   0.121113 -7.6021 2.913e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.3 {SARIMA(1,1,2)X(1,1,1)_12}

The residuals are white noise

## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1  -0.647609   0.247858 -2.6128   0.00898 ** 
## ma1   0.061457   0.227838  0.2697   0.78736    
## ma2  -0.552131   0.136139 -4.0556 5.000e-05 ***
## sar1  0.251990   0.105428  2.3902   0.01684 *  
## sma1 -0.920779   0.128107 -7.1876 6.594e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.4 {SARIMA(1,1,3)X(1,1,1)_12}

The residuals are white noise

## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1  -0.465547   0.398271 -1.1689   0.24244    
## ma1  -0.133731   0.396471 -0.3373   0.73589    
## ma2  -0.454120   0.221744 -2.0479   0.04057 *  
## ma3   0.067840   0.099852  0.6794   0.49688    
## sar1  0.243115   0.104268  2.3316   0.01972 *  
## sma1 -0.920220   0.128289 -7.1730 7.335e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.5 {SARIMA(2,1,3)X(1,1,1)_12}

The residuals are not white noise

## 
## z test of coefficients:
## 
##        Estimate Std. Error   z value  Pr(>|z|)    
## ar1  -1.1450932  0.0076072 -150.5268 < 2.2e-16 ***
## ar2  -0.9975053  0.0045820 -217.6988 < 2.2e-16 ***
## ma1   0.5309463  0.0693785    7.6529 1.965e-14 ***
## ma2   0.2553908  0.0832273    3.0686  0.002151 ** 
## ma3  -0.6373133  0.0690047   -9.2358 < 2.2e-16 ***
## sar1  0.1383738  0.1103198    1.2543  0.209734    
## sma1 -0.8371407  0.0939882   -8.9069 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.6 {SARIMA(2,1,4)X(1,1,1)_12}

The residuals are not white noise.

## 
## z test of coefficients:
## 
##        Estimate Std. Error   z value  Pr(>|z|)    
## ar1  -1.1463891  0.0065142 -175.9843 < 2.2e-16 ***
## ar2  -0.9989487  0.0026056 -383.3900 < 2.2e-16 ***
## ma1   0.6075968  0.0800136    7.5937  3.11e-14 ***
## ma2   0.2315085  0.0795120    2.9116  0.003596 ** 
## ma3  -0.6991960  0.0748511   -9.3412 < 2.2e-16 ***
## ma4  -0.1251409  0.0820637   -1.5249  0.127278    
## sar1  0.1366982  0.1099528    1.2432  0.213778    
## sma1 -0.8408126  0.0946122   -8.8869 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.7 {SARIMA(3,1,4)X(1,1,1)_12}

The residuals are not white noise

## Warning in log(s2): NaNs produced
## Warning in stats::arima(x = x, order = order, seasonal = seasonal, xreg =
## xreg, : possible convergence problem: optim gave code = 1
## 
## z test of coefficients:
## 
##        Estimate Std. Error  z value  Pr(>|z|)    
## ar1  -0.2778385  0.2474655  -1.1227 0.2615497    
## ar2  -0.0040954  0.2833534  -0.0145 0.9884684    
## ar3   0.8654624  0.2463806   3.5127 0.0004436 ***
## ma1  -0.3164778  0.3003973  -1.0535 0.2920979    
## ma2  -0.2184214  0.1234635  -1.7691 0.0768744 .  
## ma3  -0.8834205  0.0517910 -17.0574 < 2.2e-16 ***
## ma4   0.5141942  0.2353213   2.1851 0.0288835 *  
## sar1  0.1445903  0.1118284   1.2930 0.1960229    
## sma1 -0.8316779  0.0956888  -8.6915 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We have 2 candidate models with the residuals being white noise: * {SARIMA(0,1,3)X(1,1,1)_12} * {SARIMA(1,1,2)X(1,1,1)_12} * {SARIMA(1,1,3)X(1,1,1)_12}

3.2 AIC & BIC Score:

We will compare AIC and BIC score for the candidate models. The model with the least scores will be selected as the most competative model.

##            df      AIC
## retail_112  6 7.194547
## retail_013  6 7.909108
## retail_113  7 8.818008
##            df      BIC
## retail_112  6 26.07951
## retail_013  6 26.79408
## retail_113  7 30.85047

Since AIC and BIC score of {SARIMA(1,1,2)X(1,1,1)_12} model are both the least, {SARIMA(1,1,2)X(1,1,1)_12} is the best selected model for forecasting.

3.3 Check Overfitting

{SARIMA(2,1,2)}X(1,1,1)_12} and {SARIMA(1,1,3)}X(1,1,1)_12} are considered to be checked for overfitting {SARIMA(1,1,2)X(1,1,1)_12}. For {SARIMA(1,1,3)}X(1,1,1)_12} we we have performed the analysis above before.

3.3.1 {SARIMA(2,1,2)}X(1,1,1)_12}

The coefficients in the specified model and in overfitted model are very different. We calculate AIC and BIC to compare the results.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  -0.54398    0.28588 -1.9028   0.05706 .  
## ar2  -0.11679    0.14098 -0.8284   0.40744    
## ma1  -0.05935    0.28202 -0.2104   0.83332    
## ma2  -0.38788    0.24115 -1.6084   0.10774    
## sar1  0.23839    0.10446  2.2822   0.02248 *  
## sma1 -0.91957    0.12790 -7.1895 6.503e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##            df      AIC
## retail_112  6 7.194547
## retail_212  7 8.646553
## retail_113  7 8.818008
##            df      BIC
## retail_112  6 26.07951
## retail_212  7 30.67901
## retail_113  7 30.85047

{SARIMA(1,1,2)}X(1,1,1)_12} is considered to be the best fit with the lowest AIC and BIC score.



4 Model Diagnostics

4.1 Residuals Analysis

\(SARIMA(1,1,2)X(1,1,1)_{12}\) is proven to be normal so we can proceed with the forecasting

residual.analysis(model = retail_112) #Normal
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.99364, p-value = 0.6087

  • Residual plots: The scattering on the Time series plot of standardised residuals seems to show it is randomly distributed over time.
  • There are no longer any changing variance and no trend that are present in the data. Hence, this support the the model \(SARIMA(1,1,2)X(1,1,1)_{12}\).
  • Histogram: The histogram for the stardarized residuals looks normally distributed.
  • ACF and PACF: ACF plot shows that it has no significant lags to begin with.
  • Ljung-Box Test: The points shown are all above the alpha which suggests that the null-hyperpothesis is not rejected and the data are independently distributed. The argument SquaredQ = FALSE, to test for regular autocorrelations.



5 Forecasting

\(SARIMA(1,1,2)X(1,1,1)_{12}\) is selected for forecasting 10 units ahead.

The prediction of the next 10 units is shown in Figure 42. Clearly, it reflects on the seasonality. We notice that the forecast’s limits become bigger as long as the prediction is made for longer durations. That is because the uncertainty level in the forecast will increase due to the seasonality, the ordinal, and autocorrelation characteristics. In Figure 42, we can see that the next 10 units of sales of recreational goods will fluctuate in the blue region (80% confidence interval) and the grey region (95% confidence interval).

prediction = Arima(retail.ts.BC,order=c(1,1,2),seasonal=list(order=c(1,1,1), period=12))
preds1 = forecast(prediction, h = 10)
plot(preds1,
     ylab='Sales', 
     xlab='Years', main ="Figure.42: Forecast with model {SARIMA(1,1,2)X(1,1,1)_12}")





6 Summary

Firstly, the recreational goods sales series is not normal and nonstationary. As a result, we applied Box-Cox transformation to improve the stability of variance. Also, after the first seasonal and ordinary differencing, it turns out to be stationary.

In model specification, we have listed a set of the candidate models which is {SARIMA(0,1,3)X(1,1,1)_12}, {SARIMA(1,1,0)X(1,1,1)_12}, {SARIMA(1,1,2)X(1,1,1)_12}, {SARIMA(1,1,3)X(1,1,1)_12}, {SARIMA(2,1,3)X(1,1,1)_12}, {SARIMA(2,1,4)X(1,1,1)_12}, {SARIMA(3,1,4)X(1,1,1)_12}

After the model fitting, we found the most competative model is {SARIMA(1,1,2)X(1,1,1)_12} model. Through model diagnostics, it can be proved that {SARIMA(1,1,2)X(1,1,1)_12} is a reliable and promising model to predict.

In the end, the prediction of sales of recreational goods for the next 10 units of time was performed with Box-Cox transformed {SARIMA(1,1,2)X(1,1,1)_12} model in Figure.42.



7 Reference



8 Appendix

8.1 Packages

The necessary packages has been installed and load below. The required codes and funtions for the projects are as follows;

library(TSA)
library(tseries)
library(forecast)
library(FitAR)
library(lmtest)
library(fUnitRoots)
library(readr)


#*Box Cox Search Function

BoxCoxSearch = function(y, lambda=seq(-3,3,0.01), 
                        m= c("sf", "sw","ad" ,"cvm", "pt", "lt", "jb"), plotit = T, verbose = T){
  N = length(m)
  BC.y = array(NA,N)
  BC.lam = array(NA,N)
  for (i in 1:N){
    if (m[i] == "sf"){
      wrt = "Shapiro-Francia Test"
    } else if (m[i] == "sw"){
      wrt = "Shapiro-Wilk  Test"
    } else if (m[i] == "ad"){
      wrt = "Anderson-Darling Test"
    } else if (m[i] == "cvm"){
      wrt = "Cramer-von Mises Test"
    } else if (m[i] == "pt"){
      wrt = "Pearson Chi-square Test"
    } else if (m[i] == "lt"){
      wrt = "Lilliefors Test"
    } else if (m[i] == "jb"){
      wrt = "Jarque-Bera Test"
    } 
    
    print(paste0("------------- ",wrt," -------------"))
    out = tryCatch({boxcoxnc(y, method = m[i], lam = lambda, lambda2 = NULL, plot = plotit, alpha = 0.05, verbose = verbose)
      BC.lam[i] = as.numeric(out$lambda.hat)}, 
      error = function(e) print("No results for this test!"))
    
  }
  return(list(lambda = BC.lam,p.value = BC.y))
}


# Residual Analysis Function

residual.analysis <- function(model, std = TRUE,start = 2, class = c("ARIMA","GARCH","ARMA-GARCH")[1]){
  # If you have an output from arima() function use class = "ARIMA"
  # If you have an output from garch() function use class = "GARCH"
  # If you have an output from ugarchfit() function use class = "ARMA-GARCH"
  
  if (class == "ARIMA"){
    if (std == TRUE){
      res.model = rstandard(model)
    }else{
      res.model = residuals(model)
    }
  }else if (class == "GARCH"){
    res.model = model$residuals[start:model$n.used]
  }else if (class == "ARMA-GARCH"){
    res.model = model@fit$residuals
  }else {
    stop("The argument 'class' must be either 'ARIMA' or 'GARCH' ")
  }
  par(mfrow=c(3,2))
  plot(res.model,type='o',ylab='Standardised residuals', main="Time series plot of standardised residuals")
  abline(h=0)
  hist(res.model,main="Histogram of standardised residuals")
  acf(res.model,main="ACF of standardised residuals")
  pacf(res.model,main="PACF of standardised residuals")
  qqnorm(res.model,main="QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  print(shapiro.test(res.model))
  k=0
  #Ljung Box independence test for every lag H0:independent, H1: series is correlated at lags
  LBQPlot(res.model, lag.max =  length(model$residuals)-1, StartLag = k + 1, k = 0, SquaredQ = FALSE)
}

# Sort AIC and BIC Values Function
sort.score <- function(x, score = c("bic", "aic")){
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}

ret<- read_csv("RetailNZTS4.csv")
retail = ret[,c("Time","Rec_goods")]
knitr::kable(head(retail))
retail.ts = ts(retail$Rec_goods,start=c(1995,5),end=c(2010,9),frequency = 12)
plot(retail.ts, 
     type='o', 
     ylab='Sales', 
     xlab='Years', 
     main='Figure.1: Time series plot of Retail Sales of Recreational Goods')
plot(y=retail.ts, x=zlag(retail.ts), ylab='Previous year of retail sales', xlab='Current year of retail sales',main="Figure.2: Scatter plot between the retail sales \n of previous year and current year.")
par(mar=c(5,4,4.5,1),mfrow=c(1,2), cex.main=1, cex.lab=1, cex.axis=1) 
#c(bottom, left, top, right)
acf(retail.ts, lag.max = 60,main="Figure.3: ACF plot of Retail Sales of \n Recreational Goods")
pacf(retail.ts, lag.max = 60,main="Figure.4: PACF plot of Retail Sales of \n Recreational Goods")
order=ar(diff(retail.ts))$order
adfTest(retail.ts, lags = order)
qqnorm(retail.ts,main ="Figure.5: Q-Q plot of Retail Sales of Recreational Goods")
qqline(retail.ts, col=2, lty=2, lw=2)
shapiro.test(retail.ts)
m1.retail_ts = arima(retail.ts,order=c(0,0,0),seasonal=list(order=c(0,1,0), period=12))
m1_residual = residuals(m1.retail_ts);  
par(mfrow=c(1,1))
plot(m1_residual,xlab='Time',ylab='Residuals',main="Figure.6: Time series plot of the residuals")

par(mar=c(5,4,4.5,1),mfrow=c(1,2), cex.main=1, cex.lab=1, cex.axis=1) 
#c(bottom, left, top, right)
acf(m1_residual, lag.max = 60, main = "Figure.7: The sample ACF of \n the residuals")
pacf(m1_residual, lag.max = 60, main = "Figure.8: The sample PACF of \n the residuals")


m2.retail_ts = arima(retail.ts,order=c(0,0,0),seasonal=list(order=c(1,1,1), period=12))
m2_residual = residuals(m2.retail_ts)
plot(m2_residual, xlab='Time',ylab='Residuals',main="Figure.9: Time series plot of the residuals")

par(mar=c(5,4,4.5,1),mfrow=c(1,2),cex.main=1, cex.lab=1, cex.axis=1) #c(bottom, left, top, right)
acf(m2_residual, lag.max = 60, main = "Figure.10: The sample ACF of \n the residuals")
pacf(m2_residual, lag.max = 60, main = "Figure.11: The sample PACF of \n the residuals")

lambda = 0 
retail.log = log(retail.ts)
plot(log(retail.ts), type="l",main="Figure.12: Time series plot of log Retail Sales \n for recreational goods")
order=ar(diff(retail.log))$order
adfTest(retail.log, lags = order)
p=BoxCox.ar(retail.ts)
lambda=0
lambda <- (as.numeric(p$ci[1]) + as.numeric(p$ci[2])) / 2
lambda
retail.ts.BC = ((retail.ts^lambda)-1)/lambda
plot(retail.ts.BC, ylab='Retail Price',
     xlab='Year',
     type='l',
     col = c("#003366"),
     lwd=2,
     main = "Figure.13: Time Series Plot of \n Transformed  Retail Sales of Recreational Goods")
par(mar=c(5,4,4.5,1),mfrow=c(1,2), cex.main=1, cex.lab=1, cex.axis=1) #c(bottom, left, top, right)
acf(retail.log, main = "Figure.14: ACF plot for \n tranformed Retail Sales")
pacf(retail.log, main = "Figure.15: PACF plot for \n transformed Retail Sales")
order=ar(diff(retail.ts.BC))$order
adfTest(retail.ts.BC, lags = order)
diff.retail = arima(retail.ts.BC,order=c(0,1,0),seasonal=list(order=c(1,1,1), period=12))
m4_retail = residuals(diff.retail) 

par(mfrow=c(1,1))
plot(m4_retail,xlab='Year',ylab='Residuals',main="Figure.16: Time series plot of the residuals")
abline(h=0, lty=2, lwd=2, col=2)
par(mar=c(5,4,4.5,1),mfrow=c(1,2), cex.main=1, cex.lab=1, cex.axis=1) 
acf(m4_retail, main = "Figure.17: The sample ACF of \n the residuals")
pacf(m4_retail, main = "Figure.18: The sample PACF of \n the residuals")
eacf(m4_retail)
res = armasubsets(y=m4_retail,nar=10,nma=10,y.name='test',ar.method='ols')
plot(res)
title('Figure.20: BIC table for recreational goods sales ',line=5)
retail_013 = arima(retail.ts.BC,order=c(0,1,3),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_013)
res_013 = residuals(retail_013);  
par(mfrow=c(1,1))
plot(res_013,xlab='Time',ylab='Residuals',main="Figure.21: Time series plot of the residuals of \n {SARIMA(0,1,3)X(1,1,1)_12}")
par(mfrow=c(1,2))
acf(res_013, lag.max = 60, main = "Figure.22: The sample ACF of \n the residuals")
pacf(res_013, lag.max = 60, main = "Figure.23: The sample PACF of \n the residuals")

retail_110 = arima(retail.ts.BC,order=c(1,1,0),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_110)
res_110 = residuals(retail_110);  
par(mfrow=c(1,1))
plot(res_110,xlab='Time',ylab='Residuals',main="Figure.21: Time series plot of the residuals of \n {SARIMA(1,1,0)X(1,1,1)_12}")
par(mfrow=c(1,2))
acf(res_110, lag.max = 60, main = "Figure.22: The sample ACF of \n the residuals")
pacf(res_110, lag.max = 60, main = "Figure.23: The sample PACF of \n the residuals")

retail_112 = arima(retail.ts.BC,order=c(1,1,2),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_112)
res_112 = residuals(retail_112);  
par(mfrow=c(1,1))
plot(res_112,xlab='Time',ylab='Residuals',main="Figure.24: Time series plot of the residuals of \n {SARIMA(1,1,2)X(1,1,1)_12}")
par(mar=c(5,4,4.5,1),mfrow=c(1,2), cex.main=1, cex.lab=1, cex.axis=1)
acf(res_112, lag.max = 60, main = "Figure.25: The sample ACF of \n the residuals")
pacf(res_112, lag.max = 60, main = "Figure.26: The sample PACF of \n the residuals")
retail_113 = arima(retail.ts.BC,order=c(1,1,3),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_113)
res_113 = residuals(retail_113);  
par(mfrow=c(1,1))
plot(res_113,xlab='Time',ylab='Residuals',main="Figure.27: Time series plot of the residuals of \n {SARIMA(1,1,3)X(1,1,1)_12}")
par(mfrow=c(1,2))
acf(res_113, lag.max = 60, main = "Figure.28: The sample ACF of \n the residuals")
pacf(res_113, lag.max = 60, main = "Figure.29: The sample PACF of \n the residuals")
retail_213 = arima(retail.ts.BC,order=c(2,1,3),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_213)
res_213 = residuals(retail_213); 
par(mfrow=c(1,1))
plot(res_213,xlab='Time',ylab='Residuals',main="Figure.30: Time series plot of the residuals of \n {SARIMA(2,1,3)X(1,1,1)_12}")
par(mfrow=c(1,2))
acf(res_213, lag.max = 60, main = "Figure.31: The sample ACF of \n the residuals")
pacf(res_213, lag.max = 60, main = "Figure.32: The sample PACF of \n the residuals")

retail_214 = arima(retail.ts.BC,order=c(2,1,4),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_214)
res_214 = residuals(retail_214); 
par(mfrow=c(1,1))
plot(res_214,xlab='Time',ylab='Residuals',main="Figure.33: Time series plot of the residuals of \n {SARIMA(2,1,4)X(1,1,1)_12}")
par(mfrow=c(1,2))
acf(res_214, lag.max = 60, main = "Figure.34: The sample ACF of \n the residuals")
pacf(res_214, lag.max = 60, main = "Figure.35: The sample PACF of \n the residuals")

retail_314 = arima(retail.ts.BC,order=c(3,1,4),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_314)
res_314 = residuals(retail_314);  
par(mfrow=c(1,1))
plot(res_314,xlab='Time',ylab='Residuals',main="Figure.36: Time series plot of the residuals of \n {SARIMA(3,1,4)X(1,1,1)_12}")
par(mfrow=c(1,2))
acf(res_314, lag.max = 60, main = "Figure.37: The sample ACF of \n the residuals")
pacf(res_314, lag.max = 60, main = "Figure.38: The sample PACF of \n the residuals")
sort.score(AIC(retail_013,retail_112,retail_113),score ="aic")
sort.score(BIC(retail_013,retail_112,retail_113),score ="bic")
retail_212 = arima(retail.ts.BC,order=c(2,1,2),seasonal=list(order=c(1,1,1), period=12), method = "ML")
coeftest(retail_212)
res_212 = residuals(retail_212);  
par(mfrow=c(1,1))
plot(res_212,xlab='Time',ylab='Residuals',main="Figure.39: Time series plot of the residuals")
par(mfrow=c(1,2))
acf(res_212, lag.max = 60, main = "Figure.40: The sample ACF of \n the residuals")
pacf(res_212, lag.max = 60, main = "Figure.41: The sample PACF of \n the residuals")
sort.score(AIC(retail_112,retail_113,retail_212),score ="aic")
sort.score(BIC(retail_112,retail_113,retail_212),score ="bic")
residual.analysis(model = retail_112) #Normal
prediction = Arima(retail.ts.BC,order=c(1,1,2),seasonal=list(order=c(1,1,1), period=12))
preds1 = forecast(prediction, h = 10)
plot(preds1,
     ylab='Sales', 
     xlab='Years', main ="Figure.42: Forecast with model {SARIMA(1,1,2)X(1,1,1)_12}")