Analysis objective

The objective of this analysis is to forecast daily USA Advanced Retail Sales for the 24 months following the last available observation in 2018. The series is built from monthly USA Advanced Retail Sales and runs from 1992 to the latest data available in 2018.

This analysis could be particularly relevant for medium-sized and large multinational enterprises (MNEs) that base their sales and procurement forecasts on this advanced economic indicator. Our data form a conventional univariate time series with two columns, date and daily sales. The steps applied in this analysis can be reused in other forecasting analyses.

Preparative pre R upload

Before loading the data into R, we used Excel to convert the monthly USA sales into daily sales and to express them in constant 2018 dollars. We plan to use a judgemental-adjustment forecasting approach, and we start the analysis with preliminary observations of the data.
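The Excel preparation step can be sketched in R for readers who want to reproduce it. This is a minimal sketch under assumptions: daily sales are taken as monthly sales divided by the number of calendar days in the month, and constant dollars are obtained by rescaling with a price index. The helper names `days_in_month()` and `to_daily_real()`, the index values and the sales figures are all hypothetical, since the document does not say which deflator was used.

```r
# Hypothetical sketch: the real conversion was done in Excel, and the deflator is an assumption.
days_in_month <- function(year, month) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  next_first <- seq(first, by = "month", length.out = 2)[2]
  as.integer(next_first - first)
}

# Convert monthly nominal sales to average daily sales in base-period dollars.
to_daily_real <- function(monthly_sales, year, month, price_index, base_index) {
  days <- mapply(days_in_month, year, month)
  daily <- monthly_sales / days            # average sales per calendar day
  daily * base_index / price_index         # rescale to base-period (e.g. 2018) prices
}

# Hypothetical values: Jan and Feb 2018 sales (millions) and a made-up price index.
sales <- c(310, 280)
price_idx <- c(200, 200)
daily_real <- to_daily_real(sales, c(2018, 2018), c(1, 2), price_idx, base_index = 250)
print(daily_real)   # 12.5 12.5
```

Dividing by the actual number of days removes the artificial jump between 28-day and 31-day months, which would otherwise look like a seasonal effect.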

Models used: explanation

We plan to use three classic forecasting models (seasonal naïve, exponential smoothing and ARIMA) and to experiment with one neural network model.

In this forecasting approach, we gradually increase the complexity of the models we use.

The first and most basic model is the seasonal naïve model, which can perform well when the data are highly seasonal. It forecasts each period with the last observed value from the same season; for example, each month of next year is forecast with the value of the same month in the previous year.

The second model is the exponential smoothing state-space method, fitted as an ETS model, where ETS stands for error, trend and seasonality. This model tends to perform well for short-term forecasts of univariate time series. It uses an exponentially weighted moving average (EWMA) to "smooth" the series and reduce the effect of random fluctuations. The smoothing constant α (0 < α ≤ 1) sets the weight given to the most recent observation, and its complement (1 − α) acts as a damping factor: each older observation receives (1 − α) times the weight of the one after it. The ETS components can be additive or multiplicative, and R's ets() function selects the most suitable form automatically.
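The smoothing recursion behind exponential smoothing fits in a few lines of base R. This is a minimal sketch assuming the simplest case (no trend, no seasonality) with the level initialised at the first observation; `ses_level()` and `ses_weights()` are hypothetical helpers, not the internals of ets(). The weights show how (1 − α) makes older observations count geometrically less.

```r
# Simple exponential smoothing (no trend, no seasonality): l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
ses_level <- function(y, alpha, l0 = y[1]) {
  stopifnot(alpha > 0, alpha <= 1)
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level  # final level = flat forecast for every future period
}

# Weight given to the observation j periods back: alpha * (1 - alpha)^j
ses_weights <- function(alpha, n) alpha * (1 - alpha)^(0:(n - 1))

ses_level(c(10, 12, 11), alpha = 0.5)  # 11
ses_weights(0.5, 4)                    # 0.5 0.25 0.125 0.0625
```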

The third model is the autoregressive integrated moving average (ARIMA) model, which is far more complex than the two previous models, largely because of the fitting algorithm behind it in R. ARIMA combines two models, applied to a differenced (integrated) series. First, the autoregressive model AR(p) forecasts the variable of interest using a linear combination of its past values, where p is the number of lags. Second, the moving average model MA(q) is a linear regression of the current value of the series on current and previous white noise error terms, or random shocks.

The fourth model is a neural network (NN) model, the most complex model used in this analysis. Neural networks can capture nonlinear relationships and tend to perform best on large data sets. Because our data contain a significant shock due to the 2008 market crash, we decided to test the predictive capacity of this model. A neural network is organised in multiple layers. The simplest networks contain no hidden layers and are equivalent to linear regressions: the coefficients attached to the predictors are called "weights", and the forecasts are obtained by a linear combination of the inputs. The weights are selected within the neural network framework by a "learning algorithm".

Analysis outline

Preliminary data manipulation: in Excel, we compute the sales per day and express them in 2018 dollars to remove the impact of inflation on sales.

1. Install R packages, load the data and declare this data series as a time series.
2. Preliminary data observations.
3. Data decomposition: stationarity and lag specification, seasonal component, cycle component, trend component.
4. Finding the most accurate model:

4.1. Seasonal naive method
4.2. ETS method, exponential smoothing models
4.3. ARIMA model
4.4. Neural network model

5. Make the forecast for the next 24 months.
6. Conclusion.
7. Create a report with Markdown.

1. Packages loaded for this analysis

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth
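The outline's step of declaring the data as a time series can be sketched with base R's `ts()`. This assumes monthly data starting in January 1992 and ending in July 2018 (the month before the first forecast, August 2018); the values below are made up and only stand in for the prepared file, which is not shown.

```r
# Hypothetical stand-in for the daily sales column loaded from the prepared file.
n_obs <- (2018 - 1992) * 12 + 7                     # Jan 1992 to Jul 2018 = 319 months
daily_sales <- 9000 + 15 * seq_len(n_obs)           # made-up values, for illustration only

# Declare a monthly time series: frequency = 12, starting January 1992.
RSXFSNtimeseries <- ts(daily_sales, start = c(1992, 1), frequency = 12)
start(RSXFSNtimeseries)   # 1992 1
end(RSXFSNtimeseries)     # 2018 7
```

Setting `frequency = 12` is what lets the seasonal functions used later (seasonal plots, `snaive()`, seasonal differencing) recognise a 12-month cycle.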

2. Data Observations

First, we graph the time series to observe its main characteristics. Second, we assess the magnitude of characteristics such as stationarity, seasonality and any visible shock or trend.

Time Plot

We observe that this univariate time series has a strong positive trend and appears to be seasonal. There also seems to be one shock in 2008, potentially related to the subprime market crash. Even from this first look, we can intuitively conclude that the series is not stationary. In the next section, we investigate these potential problems in order to correct the series and to select the most appropriate forecasting model.

3. Preliminary data decomposition

Why do we investigate stationarity? (unit root test)

A time series is stationary if its statistical properties, such as mean, variance and covariance, are time invariant: that is, they do not change over time. Non-stationarity typically shows up as strong, slowly decaying autocorrelation, which we discuss below.

We use the augmented Dickey-Fuller (ADF) test, adf.test() from the tseries package, to check the stationarity of the data.

adf.test(RSXFSNtimeseries, alternative = "stationary", k=12)  # ADF test with 12 lagged differences

## 
##  Augmented Dickey-Fuller Test
## 
## data:  RSXFSNtimeseries
## Dickey-Fuller = -2.0592, Lag order = 12, p-value = 0.5515
## alternative hypothesis: stationary

The p-value (0.5515) does not allow us to reject the null hypothesis of non-stationarity (a unit root). This result is consistent with the series not being stationary.

To correct the non-stationarity problem, we take the first difference and run the Dickey-Fuller test again.

DS <- diff(RSXFSNtimeseries)                     # first difference: x_t - x_{t-1}
adf.test(DS, alternative = "stationary", k=12)   # same test on the differenced series

## 
##  Augmented Dickey-Fuller Test
## 
## data:  DS
## Dickey-Fuller = -3.7092, Lag order = 12, p-value = 0.0239
## alternative hypothesis: stationary

With this small p-value (0.0239 < 0.05), the Dickey-Fuller test rejects the null hypothesis, and we conclude that the differenced series is stationary. Taking the first difference therefore corrects the non-stationarity of the original series.

We can plot the differenced data to see the impact of the first difference on this time series.

We take the first difference to remove the trend, so that we can work with the series without the trend driving our forecasts.

There are two main reasons why we need stationarity. First, the tools used in time series analysis and forecasting assume stationarity.

When building models such as ARIMA to forecast time series data, we start by differencing the data (computing \(x_t - x_{t-1}\) repeatedly) until the series is stationary. These models account for oscillations but not for trends, so removing the trend by differencing allows us to use them.
Second, stationarity helps to identify driving factors. When we detect a change in a time series, we may be able to infer a correlation with another series, but both series need to be stationary (no trend and no seasonality); otherwise, the correlation we find may be misleading.
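The effect of first differencing on a trend can be checked on a simulated series. This is a minimal sketch assuming a hypothetical linear trend plus noise rather than the retail data: after `diff()`, the trend becomes a constant mean equal to the slope.

```r
set.seed(1)
time_index <- 1:120
# Hypothetical monthly series: linear trend of 50 per month plus random noise.
y <- ts(1000 + 50 * time_index + rnorm(120, sd = 20), start = c(1992, 1), frequency = 12)

dy <- diff(y)              # x_t - x_{t-1}; the differenced series is one observation shorter
length(dy)                 # 119
mean(dy)                   # close to 50: the trend has become a constant mean
diff(1000 + 50 * 1:5)      # a pure linear trend differences to a constant: 50 50 50 50
```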

Why do we investigate autocorrelation?

Autocorrelation means that the errors are correlated with each other or, more generally, that a series is correlated with itself lagged by a number of time units. In a regression setting,

for \[ Y_i = \alpha + \beta X_i + u_i, \] autocorrelation means that \[ \operatorname{Cov}(u_i, u_s) \neq 0 \quad \text{for some } i \neq s. \] Autocorrelation measures the linear relationship between lagged values of a time series. We will look for dependence in the data across a range of lag values.
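The sample autocorrelation at lag k, \( r_k = \sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y}) / \sum_{t=1}^{T}(y_t-\bar{y})^2 \), can be computed directly and compared with base R's `acf()`. This sketch assumes the built-in `AirPassengers` series as a stand-in for the retail series; `sample_acf()` is a hypothetical helper.

```r
# Sample autocorrelation at lag k, using the same normalisation as stats::acf().
sample_acf <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  d <- y - mean(y)                               # deviations from the sample mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

r_manual <- sample_acf(AirPassengers, 12)
r_stats  <- acf(AirPassengers, lag.max = 12, plot = FALSE)$acf[13]   # element 1 is lag 0
c(manual = r_manual, acf = r_stats)                                  # identical values
```

A large value at lag 12 in monthly data is the numerical counterpart of the seasonal spikes seen in an ACF plot.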

This autocorrelation plot shows that the data are strongly non-random, which suggests that an autoregressive model might be appropriate.

We can also check for autocorrelation by plotting the residuals and the standardized residuals of a regression against time: if both plots show a systematic pattern rather than random scatter, this is a sign of autocorrelation.

Autocorrelation can also come from a model that omits necessary lags. For example, when consumers do not change their consumption habits readily, current consumption depends on past consumption; if we neglect the lagged term, the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.

Why do we investigate seasonality?

Seasonality is a pattern that occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. We need to detect seasonality in order to make the necessary adjustments or to choose appropriate models. Seasonal adjustment serves three main purposes: 1. to aid short-term forecasting; 2. to help relate the time series to other series or to extreme events; 3. to allow the series to be compared from year to year, month to month or day to day.

The seasonal behaviour of a time series comprises three main types of systematic calendar-related influences: 1. seasonal influences (winter, spring, summer, fall); 2. trading-day influences; 3. moving-holiday influences.

We use two different plots to identify the seasonality.

We observe that the 27 coloured lines (one per year, from 1992 to 2018) all follow a similar pattern. These lines point to the presence of seasonal cycles in this time series.

Next, this other seasonal plot (a seasonal subseries plot) isolates the variation for one month at a time.

The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.
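The monthly means drawn as horizontal lines in this kind of plot can be computed directly. This sketch again uses the built-in `AirPassengers` series as a stand-in for the retail data; base R's `monthplot()` and forecast's `ggsubseriesplot()` draw such plots, but the code below only computes the means.

```r
y <- AirPassengers                            # built-in monthly series, stand-in for the retail data
month_means <- tapply(y, cycle(y), mean)      # cycle() gives the month (1 to 12) of each observation
names(month_means) <- month.abb
round(month_means, 1)                         # one mean per month: the horizontal lines in the plot
```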

Why do we investigate trend?

A trend describes the historical direction of the data and is often used to anticipate future behaviour. In other words, a positive trend suggests that growth is likely to continue, while a negative trend may point to an economic slowdown or declining sales.

A trend exists when there is a long-term increase or decrease in the data, and it tells us the direction of the series.

We estimate the trend with a centred moving average of order m = 11 (we chose 11 because it produces a smooth line that shows the trend clearly). With m = 2k + 1, here k = 5:

\[\hat{T}_t=\frac{1}{m}\sum_{j=-k}^{k} y_{t+j}. \]
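A centred moving average of odd order can be computed with `stats::filter()`. This is a minimal sketch assuming an odd order m (so k = (m − 1)/2 values on each side of t) and the built-in `AirPassengers` series as a stand-in; forecast's `ma()` offers the same smoother. `centred_ma()` is a hypothetical helper.

```r
centred_ma <- function(y, m) {
  stopifnot(m %% 2 == 1)                                   # odd order: k = (m - 1) / 2 points each side
  as.numeric(stats::filter(y, rep(1 / m, m), sides = 2))   # NA for the first and last k points
}

centred_ma(1:7, 3)                     # NA 2 3 4 5 6 NA
trend_11 <- centred_ma(AirPassengers, 11)
sum(!is.na(trend_11))                  # 144 - 10 = 134 trend estimates
```

The larger m is, the smoother the trend line and the more observations are lost at each end of the series.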

Conclusion of preliminary data decomposition

After this first analysis, we observe that our time series has a positive trend, is affected by seasonality and shows autocorrelation. The series also has a shock around 2008, which can also be interpreted as the end of a cycle.

4. In this next step, we will try to find the most accurate model

We forecast with several models, starting with a simple benchmark method.

4.1 Fit with seasonal naive method

First, let’s use the seasonal naïve method as our benchmark.

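The naïve method uses the most recent observation as the forecast and is the most basic forecasting model. Because a naïve forecast is optimal when the data follow a random walk, naïve forecasts are also called random walk forecasts. Since we observe seasonality in the data, we apply the seasonal naïve method, which is useful for highly seasonal data like ours: each forecast equals the last observed value from the same season, where m is the seasonal period (m = 12 for monthly data) and k is the integer part of (h − 1)/m, i.e. the number of complete years in the forecast horizon before T + h.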

\[\hat{y}_{T+h|T}=y_{T+h-m(k+1)} \]
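The seasonal naïve formula translates directly into code. This sketch computes point forecasts only (no prediction intervals); `snaive_forecast()` is a hypothetical helper, and `k` is the integer part of (h − 1)/m, the number of complete seasonal cycles before T + h.

```r
# Seasonal naive point forecasts: yhat_{T+h|T} = y_{T+h-m(k+1)}, k = floor((h - 1) / m)
snaive_forecast <- function(y, h, m) {
  n <- length(y)
  steps <- seq_len(h)
  k <- (steps - 1) %/% m                 # complete seasonal cycles within the horizon
  y[n + steps - m * (k + 1)]             # last observed value from the same season
}

snaive_forecast(c(1, 2, 3, 4, 5, 6, 7, 8), h = 6, m = 4)   # 5 6 7 8 5 6
```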

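fit_SN <- snaive(DS)       # seasonal naive method fitted to the differenced series DS
checkresiduals(fit_SN)     # residual plot, residual ACF and Ljung-Box test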

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 527.36, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

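The residual RMSE is 290.55 (see accuracy(fit_SN) below): forecasting each month from the same month of the previous year leads to a typical error of about 290.55 million dollars per day. Note that this benchmark is fitted to the differenced series DS, whereas the ETS and ARIMA models below are fitted to the original series, so its error measures are not directly comparable with theirs; for example, its MAPE of 225.68 is computed relative to month-to-month changes, which can be close to zero.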

The ACF plot shows autocorrelation in the residuals, so the model is not a good predictive model. Residual interpretation: forecasts from a model with autocorrelated errors are still unbiased, and so are not wrong, but they usually have wider prediction intervals than necessary. Therefore, we should always look at the ACF plot of the residuals.

The Ljung-Box test supports the autocorrelation analysis by testing whether a group of residual autocorrelations is jointly different from zero.

Test results interpretation: the null hypothesis is that the residuals are independent (no autocorrelation up to the tested lag). If the p-value is larger than 0.05, we do not have enough statistical evidence to reject it. Here the p-value is below 2.2e-16, so we reject independence: the seasonal naïve residuals are autocorrelated, which confirms the ACF plot.
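The Ljung-Box statistic can be reproduced from the residual autocorrelations, \( Q^* = T(T+2)\sum_{k=1}^{h} r_k^2/(T-k) \), and checked against base R's `Box.test()`. This sketch runs on simulated white noise (an assumption, not the model residuals); `ljung_box()` is a hypothetical helper. `checkresiduals()` reduces the degrees of freedom by the number of model parameters, which is why the ETS output reports df = 24 − 17 = 7; the `fitdf` argument plays that role here.

```r
ljung_box <- function(e, h, fitdf = 0) {
  e <- as.numeric(e)
  n <- length(e)
  r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]        # r_1, ..., r_h (drop lag 0)
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))        # Q* statistic
  c(Q = q, df = h - fitdf, p.value = pchisq(q, h - fitdf, lower.tail = FALSE))
}

set.seed(42)
white_noise <- rnorm(200)                               # independent by construction
lb <- ljung_box(white_noise, h = 24)
lb
Box.test(white_noise, lag = 24, type = "Ljung-Box")    # same Q* statistic
```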

Discussion

The naive and seasonal naive models are fundamental models. Some businesses rely on such basic forecasting models, perhaps because they lack internal resources. Yet inaccurate forecasts are costly: producing or holding extra stock is a cost for the company and creates inefficiency. We therefore continue by testing the forecasting performance of other models.

4.2. Fit ETS method, exponential smoothing method

Second, we apply the ETS (Error, Trend, Seasonal) model. Its flexibility lies in its ability to combine trend and seasonal components of different types. The ets() function automatically selects the model form, by minimising an information criterion (AICc by default), and estimates the necessary parameters. Below we present the additive-error and multiplicative-error forms with a damped trend (seasonal terms omitted for readability).

Assuming a damped trend with damping parameter \(\phi\):

\[ \mu_t = \hat{y}_t = l_{t-1} + \phi b_{t-1} \]

ETS additive error, with \( \varepsilon_t = y_t - \mu_t \):

\[ y_t = l_{t-1} + \phi b_{t-1} + \varepsilon_t \]
\[ l_t = l_{t-1} + \phi b_{t-1} + \alpha\varepsilon_t \]
\[ b_t = \phi b_{t-1} + \beta^*(l_t - l_{t-1} - \phi b_{t-1}) = \phi b_{t-1} + \alpha\beta^*\varepsilon_t \]

ETS multiplicative error, with \( \varepsilon_t = (y_t - \mu_t)/\mu_t \):

\[ y_t = (l_{t-1} + \phi b_{t-1})(1 + \varepsilon_t) \]
\[ l_t = (l_{t-1} + \phi b_{t-1})(1 + \alpha\varepsilon_t) \]
\[ b_t = \phi b_{t-1} + \beta(l_{t-1} + \phi b_{t-1})\varepsilon_t \]
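The additive damped-trend recursions can be run by hand to see how the level and trend are updated. This is a minimal sketch assuming fixed parameters and initial states (ets() estimates these) and omitting the seasonal component; the helpers `ets_aadn()` and `ets_aadn_forecast()` are hypothetical. With \(\phi = 1\) the model reduces to Holt's linear trend, so a perfectly linear series is fitted without error.

```r
# Additive-error, additive damped-trend recursions (no seasonality).
ets_aadn <- function(y, alpha, beta_star, phi, l0, b0) {
  level <- l0
  trend <- b0
  fitted <- numeric(length(y))
  for (i in seq_along(y)) {
    mu <- level + phi * trend                        # mu_t = l_{t-1} + phi * b_{t-1}
    eps <- y[i] - mu                                 # additive error eps_t = y_t - mu_t
    fitted[i] <- mu
    level <- mu + alpha * eps                        # l_t = l_{t-1} + phi b_{t-1} + alpha eps_t
    trend <- phi * trend + alpha * beta_star * eps   # b_t = phi b_{t-1} + alpha beta* eps_t
  }
  list(fitted = fitted, level = level, trend = trend)
}

# h-step-ahead forecast: l_T + (phi + phi^2 + ... + phi^h) * b_T
ets_aadn_forecast <- function(state, phi, h) {
  state$level + sum(phi^seq_len(h)) * state$trend
}

y <- 10 + 2 * (1:5)                                  # an exactly linear series 12, 14, ..., 20
state <- ets_aadn(y, alpha = 0.3, beta_star = 0.1, phi = 1, l0 = 10, b0 = 2)
state$fitted                                         # 12 14 16 18 20: zero errors on a perfect trend
ets_aadn_forecast(state, phi = 1, h = 2)             # 20 + 2 * 2 = 24
```

With \(\phi < 1\) the trend contribution to the forecast flattens out as h grows, which is why a damped trend is often safer for longer horizons.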

fit_ets <- ets(RSXFSNtimeseries)  # ets() selects the error, trend and seasonal components automatically
checkresiduals(fit_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 319.37, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24

The residual RMSE is 215.19 (see accuracy(fit_ets) below), lower than for the seasonal naïve model, which corresponds to a typical error of about 215 million dollars per day.

So this model increases precision and offers a better fit, but the ACF plot shows that some autocorrelation remains: several bars extend beyond the blue dashed 95% confidence bounds.

Ljung-Box test interpretation: with a p-value below 2.2e-16, we again reject the hypothesis that the residuals are independent.

Discussion

Using a slightly more complex forecasting model already increases accuracy, which can translate into significant cost savings for a company. We continue our analysis with one of the most powerful classical forecasting models, ARIMA.

4.3. Fit on ARIMA model

The ARIMA model can be viewed as a generalized random walk model, fine-tuned to eliminate residual autocorrelation. It also generalizes exponential smoothing models and can incorporate long-term trends and seasonality.

AR(p) model

\[ (1-\sum^p_{k=1}\alpha_kL^k)X_t = \varepsilon_t \]

MA(q) model

\[ X_t = (1+\sum^q_{k=1}\beta_kL^k)\varepsilon_t \]

Integration: the first difference operator, delta, is defined as

\[ \Delta X_t = X_t - X_{t-1} = (1-L)X_t, \]

where \[ Y_t = (1-L)X_t. \]

ARIMA(p, d, q) full model

\[ (1-\sum^p_{k=1}\alpha_kL^k)(1-L)^dX_t = (1+\sum^q_{k=1}\beta_kL^k)\varepsilon_t \]
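The differencing used by the ARIMA fit below (d = 1 and D = 1 for monthly data) corresponds to applying \((1-L)(1-L^{12})\) to the series. This sketch uses a hypothetical noise-free series with a linear trend and a fixed seasonal pattern: one seasonal difference removes the pattern and leaves a constant, and one further ordinary difference leaves zeros.

```r
# Hypothetical monthly series: linear trend + fixed seasonal pattern (no noise).
seasonal_pattern <- c(-3, -2, 0, 1, 2, 4, 5, 4, 2, 0, -2, -4)
n_months <- 60
y <- ts(100 + 0.5 * seq_len(n_months) + rep(seasonal_pattern, length.out = n_months),
        start = c(1992, 1), frequency = 12)

seasonal_diff <- diff(y, lag = 12)      # (1 - L^12) y_t: constant 0.5 * 12 = 6
w <- diff(seasonal_diff, lag = 1)       # (1 - L)(1 - L^12) y_t: all zeros
length(w)                               # 60 - 12 - 1 = 47
```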

ARIMAX model (ARIMA with exogenous regressors X) \[\Delta y_t=\alpha_0+\sum_{j}\alpha_j \Delta y_{t-j}+\sum_h\gamma_h\varepsilon_{t-h}+X\beta+\varepsilon_t \]

fit_ARIMA <- auto.arima(RSXFSNtimeseries, d=1, D=1, stepwise = FALSE, approximation = FALSE)  # d = 1 first difference, D = 1 seasonal difference, full model search
checkresiduals(fit_ARIMA)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,1)(0,1,2)[12]
## Q* = 69.64, df = 19, p-value = 1.056e-07
## 
## Model df: 5.   Total lags used: 24

The residual RMSE is 196.19 (see accuracy(fit_ARIMA) below), which corresponds to a typical error of about 196 million dollars per day.

At this stage, we can conclude that the ARIMA model offers the best fit based on the residual RMSE.

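Ljung-Box test interpretation: the p-value (1.056e-07) is still below 0.05, so we reject the hypothesis that the residuals are independent. However, the Q* statistic (69.64) is much smaller than for the seasonal naive (527.36) and ETS (319.37) models, so the ARIMA model leaves far less residual autocorrelation than the two previous models.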

Discussion

The capacity to apply and understand complex models is an asset for any business or forecasting team. Procurement of manufacturing materials, distribution and strategic investments all depend on the capacity to forecast sales. Short-term forecast accuracy has a critical impact on all operational planning, and long-term forecast accuracy supports major investment decisions. In the end, both short-term and long-term forecast accuracy have a significant impact on a company's ability to optimize its growth, and poor forecasting can limit its capacity to expand.

4.4 Neural network models

In this section, we test the predictive capacity of an artificial neural network architecture. Such a model allows complex nonlinear relationships between the response variable and its predictors. Our time series is mostly linear, so a neural network may not be the most appropriate model, but we compare its performance against the seasonal naive, ETS and ARIMA models.

In a neural network, the inputs to each node are combined using a weighted linear combination. For example, with four inputs, the inputs into hidden neuron j are combined linearly to give \[ z_j = b_j+\sum_{i=1}^4 w_{i,j}x_i, \] and in the hidden layer this is then modified using a nonlinear function such as a sigmoid, \[ s(z) = \frac{1}{1+e^{-z}}. \]
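The computation performed by a single hidden neuron can be sketched in base R. All inputs, weights and the bias below are hypothetical. For scale, the NNAR(15,1,8)[12] model fitted later is a 15-8-1 network: (15 + 1) × 8 weights into the hidden layer plus (8 + 1) into the output gives the 137 weights reported in its summary.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

hidden_neuron <- function(x, w, b) {
  z <- b + sum(w * x)   # z_j = b_j + sum_i w_{i,j} x_i  (linear combination)
  sigmoid(z)            # nonlinear activation s(z_j)
}

x <- c(0.2, -0.1, 0.4, 0.3)              # four hypothetical (scaled) inputs, e.g. lagged values
w <- c(0.5, 0.5, -0.25, 1)               # hypothetical weights w_{1,j}, ..., w_{4,j}
out <- hidden_neuron(x, w, b = -0.05)    # z = 0.2, so s(z) is about 0.5498
out
```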

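# nnetar() averages 20 networks started from random weights; call set.seed() first for reproducible results
NNL <- nnetar(RSXFSNtimeseries)                  # NNAR model on the original scale
NN <- nnetar(RSXFSNtimeseries, lambda="auto")    # NNAR model with an automatically selected Box-Cox transformation
checkresiduals(NNL)                              # residual plots and ACF for the untransformed model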

accuracy(NN)

##                      ME     RMSE      MAE         MPE      MAPE     MASE
## Training set -0.2510618 94.05648 72.79515 -0.01320968 0.6059007 0.192971
##                   ACF1
## Training set -0.020541

On the training data, this very complex model shows a dramatic improvement in performance, which we discuss further below. Judging only from the ACF plot, the NN model performs relatively best: almost all bars fall inside the blue 95% confidence bounds. However, we observe strong autocorrelation at lag 12, which suggests a remaining seasonal pattern.

Discussion

Neural networks and deep learning have recently become mature technologies, ready to be implemented in many industries. Deep learning could be an excellent tool for forecasting, and this analysis shows the strong in-sample performance of such tools. The main challenge is to understand these models deeply enough to know how they work and where they can be applied. Deep-learning experts are scarce, so it will be essential to invest time in understanding how this technology applies to forecasting.

4.6 Comparative forecast performance

We compare the performance of these forecasting models using two strategies.

First strategy: we compare the measures returned by the accuracy() function when the whole time series is used as the training set.

Second strategy: as in the first strategy, we compare the measures returned by accuracy(), but after splitting the data set into a training set and a test set.
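A train/test split for a monthly series can be sketched with `window()`. This sketch assumes the built-in `AirPassengers` series (January 1949 to December 1960) and a seasonal naive benchmark rather than the models above; choosing `end = c(1958, 12)` and `start = c(1959, 1)` keeps the two sets from sharing an observation.

```r
y <- AirPassengers                                   # built-in stand-in (Jan 1949 to Dec 1960)
train <- window(y, end = c(1958, 12))                # training set: 1949 to 1958
test  <- window(y, start = c(1959, 1))               # test set: the last 24 months
# Seasonal naive benchmark: repeat the last observed year over the test horizon
fc <- rep(tail(as.numeric(train), 12), length.out = length(test))
rmse_test <- sqrt(mean((as.numeric(test) - fc)^2))   # out-of-sample RMSE
c(train = length(train), test = length(test), rmse = round(rmse_test, 2))
```

Errors measured on the test set reflect genuine forecasting performance, whereas training-set errors only measure how closely a model fits data it has already seen.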

It is important to use the right tool for an accurate analysis and to understand how the results change when a different strategy is used.

We base our model performance comparison on the following measures, which we define first.

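Forecast error \[ e_{t}={y}_{t}-\hat{y}_{t|N} \]

Mean absolute error: MAE \[ MAE=\operatorname{mean}(|e_{t}|) \]

Root mean squared error: RMSE \[ RMSE=\sqrt{\operatorname{mean}(e^2_{t})} \]

Mean absolute percentage error: MAPE \[ p_{t}=100e_{t}/y_{t} \] \[ MAPE=\operatorname{mean}(|p_{t}|) \]

Mean absolute scaled error: MASE \[ q_{t}= e_{t}/Q \] where Q is a scaling statistic computed on the training data (for seasonal data, the in-sample mean absolute error of the seasonal naive method, which is why the seasonal naive model's MASE below equals 1). \[ MASE=\operatorname{mean}(|q_{t}|) \]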
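The measures can be computed by hand to make the definitions concrete. This sketch uses hypothetical numbers; `accuracy_measures()` is a hypothetical helper, not forecast's `accuracy()`. The scaling statistic Q is assumed to be the in-sample mean absolute error of the naive method with lag m (m = 1 for non-seasonal data, m = 12 for monthly seasonal data).

```r
accuracy_measures <- function(actual, predicted, train, m = 1) {
  e <- actual - predicted                     # e_t = y_t - yhat_t
  p <- 100 * e / actual                       # percentage errors p_t
  q_scale <- mean(abs(diff(train, lag = m)))  # Q: in-sample MAE of the (seasonal) naive method
  c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)),
    MPE = mean(p), MAPE = mean(abs(p)), MASE = mean(abs(e)) / q_scale)
}

train <- c(10, 12, 14, 16, 18)                # hypothetical training data
res <- accuracy_measures(actual = c(20, 22), predicted = c(19, 24), train = train, m = 1)
round(res, 4)   # ME -0.5, RMSE 1.5811, MAE 1.5, MPE -2.0455, MAPE 7.0455, MASE 0.75
```

RMSE penalises large misses more than MAE, while MAPE and MASE are unit-free and can be compared across series.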

Autocorrelation of errors at lag 1 (ACF1): the autocorrelation function \[ ACF(h)=\frac{\operatorname{Cov}(x_{t},x_{t-h})}{\operatorname{Var}(x_{t})} \] evaluated at h = 1 on the errors.

First strategy results: comparison tables for the four models

#Seasonal Naive model
accuracy(fit_SN)

##                     ME     RMSE      MAE       MPE     MAPE MASE
## Training set 0.9864121 290.5499 233.6249 0.9691464 225.6828    1
##                    ACF1
## Training set -0.4594836

# ETS model
accuracy(fit_ets)

##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 19.55438 215.1894 172.3841 0.155552 1.462503 0.4569691
##                     ACF1
## Training set -0.03983657

# ARIMA model
accuracy(fit_ARIMA)

##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -1.902829 196.1934 148.8584 -0.03227272 1.233662 0.3946053
##                    ACF1
## Training set 0.00667003

# Neural Network
accuracy(NN)

##                      ME     RMSE      MAE         MPE      MAPE     MASE
## Training set -0.2510618 94.05648 72.79515 -0.01320968 0.6059007 0.192971
##                   ACF1
## Training set -0.020541

Based on the error measures ME, MAE and RMSE, and on MAPE, which has the advantage of being unit-free, the neural network model performs best: it reaches the lowest value on every indicator except ACF1. RMSE is also widely used to compare forecast models; it can be read as the typical size of a forecast miss, and it decreases from 290.55 (seasonal naive) to 215.19 (ETS), 196.19 (ARIMA) and 94.06 (NN). On the lag-1 autocorrelation of the errors (ACF1), the ARIMA model performs best (0.0067), but the NN model also has an acceptable value (-0.0205). The NN model still has a notable autocorrelation problem at lag 12, which we could potentially correct.

Second strategy results:

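# Training set: January 1992 to January 2004 (end=c(2004) means the first month of 2004)
Daily_sales_model <- window(x = RSXFSNtimeseries, start=c(1992), end=c(2004))
# Test set: from January 2004; the 24-month forecasts start in February 2004, after the training set ends
Daily_sales_test <- window(x = RSXFSNtimeseries, start=c(2004))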
# ETS model
fit_ets_plus <- ets(Daily_sales_model)
Daily_sales_ETS_fc <- forecast(fit_ets_plus, h=24)
accuracy(Daily_sales_ETS_fc, Daily_sales_test)

##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -6.659537 190.2759 151.3922 -0.06390919 1.409688 0.4389858
## Test set     95.681251 248.1498 197.6170  0.67377130 1.505821 0.5730220
##                     ACF1 Theil's U
## Training set -0.13743327        NA
## Test set      0.01927045 0.2059301

# ARIMA model
fit_ARIMA_plus <- auto.arima(Daily_sales_model, d=1, D=1, stepwise = FALSE, approximation = FALSE)
Daily_sales_ARIMA_fc <- forecast(fit_ARIMA_plus, h=24)
accuracy(Daily_sales_ARIMA_fc, Daily_sales_test)

##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  -2.469742 183.6882 138.3438 -0.0338364 1.255896 0.4011497
## Test set     166.689943 280.8468 229.5354  1.2259530 1.739611 0.6655743
##                     ACF1 Theil's U
## Training set  0.01970801        NA
## Test set     -0.06924541 0.2334502

# NN model
Daily_sales_nnetar_auto <- nnetar(Daily_sales_model)
Daily_sales_nnetar_fc <- forecast(Daily_sales_nnetar_auto, h=24)
accuracy(Daily_sales_nnetar_fc, Daily_sales_test)

##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set   1.051784 216.0604 173.1405 -0.03331723 1.555876 0.5020484
## Test set     518.654100 623.5177 521.2436  3.91961891 3.942675 1.5114287
##                    ACF1 Theil's U
## Training set 0.26727869        NA
## Test set     0.03453924 0.5221844

By applying this second comparative strategy, we observe that splitting a time series into a training set and a test set and comparing the accuracy output can lead to a different conclusion, and a time series with a significant shock can distort the accuracy results of a single split. On the 24-month test window, the ARIMA model performs better overall than the NN model (test RMSE 280.85 against 623.52), and ETS has the lowest test errors of the three (RMSE 248.15). The NN model has the smallest test ACF1 in absolute value (0.035, against -0.069 for ARIMA).

5. Make the forecast for the next 2 years

Finally, we forecast the next 24 months with the most accurate model. Based on the first (full-sample) comparison, we chose the NN model.

knitr::opts_chunk$set(echo = TRUE)
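NN <- nnetar(RSXFSNtimeseries, lambda="auto")       # refit on the full series (random starting weights)
fcast2 <- forecast(NN, h=24, PI=TRUE, npaths=100)   # 24 months ahead; intervals from 100 simulated paths
autoplot(fcast2, include = 60)                      # plot the last 60 observed months with the forecast

summary(fcast2)                                     # model information, error measures and forecasts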

## 
## Forecast method: NNAR(15,1,8)[12]
## 
## Model Information:
## 
## Average of 20 networks, each of which is
## a 15-8-1 network with 137 weights
## options were - linear output units 
## 
## Error measures:
##                    ME     RMSE      MAE          MPE      MAPE      MASE
## Training set 1.047128 91.98235 72.23767 -0.002223413 0.5975567 0.1914932
##                   ACF1
## Training set 0.0137776
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Aug 2018       14743.54 14554.60 14899.93 14479.52 14996.86
## Sep 2018       14322.19 14134.96 14455.40 14076.85 14579.33
## Oct 2018       14149.01 13912.26 14264.25 13809.45 14462.15
## Nov 2018       15524.45 15269.20 15726.40 15068.87 15843.50
## Dec 2018       16692.71 16423.32 16900.75 16295.78 17020.29
## Jan 2019       13011.42 12802.31 13217.13 12549.44 13305.77
## Feb 2019       13999.69 13705.79 14243.43 13473.22 14324.08
## Mar 2019       14725.98 14293.49 15036.28 14046.54 15213.96
## Apr 2019       14367.14 13997.27 14569.71 13707.94 14744.16
## May 2019       15325.98 14824.77 15639.26 14482.78 15799.60
## Jun 2019       14941.82 14486.19 15252.40 14209.25 15373.82
## Jul 2019       14418.99 13929.08 14735.79 13587.65 14882.55
## Aug 2019       14710.95 14267.86 15086.36 13717.27 15286.78
## Sep 2019       14242.45 13603.96 14578.98 13019.41 14804.51
## Oct 2019       14041.97 13178.53 14491.27 12628.54 14637.69
## Nov 2019       15264.08 14029.64 15979.65 12855.25 16099.60
## Dec 2019       16405.88 15154.30 16853.11 14165.15 17041.93
## Jan 2020       12797.66 11676.90 13334.76 10985.88 13501.50
## Feb 2020       13620.47 12323.81 14198.89 11560.87 14401.33
## Mar 2020       14128.51 12005.63 14893.59 11139.82 15189.65
## Apr 2020       13980.82 12209.46 14617.35 11445.65 14817.39
## May 2020       14751.80 12107.26 15693.93 11608.82 15991.52
## Jun 2020       14284.38 11983.45 15103.15 11637.22 15335.24
## Jul 2020       13802.47 11545.93 14628.03 11341.15 14831.27

6. Conclusion

This analysis compared the performance of several forecasting models widely used across industries, including a neural network model. Every modelling detail can have a major impact on the forecast and on the decisions a company makes, and an accurate forecast helps any company realise its growth potential.

7. Markdown report

Creating a Markdown report with RStudio makes it possible to present the results in a convenient format to working partners who have no specific knowledge of forecasting but who will use the forecast output to make decisions.

USA Advanced Retail Sales 24 month forecasting analysis

Viktor Alexy

2018-08-21
