Problem Statement

The provided spreadsheet shows monthly sales (in 10,000 rupees) of a certain company (Company X) from January 2011 to May 2017. The ultimate objective of this exercise is to predict sales for the period June 2017 to December 2018.

Read the data as a time series object in R. Plot the data. What are the major features you notice in this series? Use a moving average to smooth the data. Plot the series and the MA smoother. Is there a linear trend in the data? Using multiplicative seasonality estimate seasonal indices. In which month(s) do you see higher sales and which month(s) you see lower sales? Do the residuals form a stationary series? Do a formal test for stationarity writing down the null and alternative hypothesis. What is your conclusion? For the above parts, wherever applicable, you can do calculations in MS Excel or you may use R. The final output must be presented in MS Excel under the following column headings. Year, Month, Sales, MA, Detrended Data, Seasonal Index, Random Component

For the parts below you need to use R. MAPE calculation may be done in R by writing your own code, or you can copy the forecasted values in MS Excel and do the calculation.

Use the last 17 months as hold-out sample (Jan 2016 - May 2017). Fit a suitable exponential smoothing model to the rest of the data and calculate MAPE. You may try out different models and use the one which has minimum MAPE. Note that the forecasted values are saved in the component $mean. Use the ‘best’ model obtained from above to forecast for the period June 2017 to December 2018. Provide forecasted values as well as their upper and lower confidence limits.

Solution

Read the data

library(readxl)
SalesData <- read_excel("./Sales.xlsx")

Observations of the time series

library(smooth)
## Warning: package 'smooth' was built under R version 3.3.3
## This is package "smooth", v2.1.0
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
library(graphics)
library(datasets)
library(tseries)
## Warning: package 'tseries' was built under R version 3.3.3
# Convert to a timeseries object explicitly stating start and end points and declaring monthly 
# data
SalesTS <- ts(SalesData[,c(-1,-2)], start=c(2011,1), end=c(2017,5), frequency=12)
plot(SalesTS)

  • Trend - Visual Inspection - Minor Up
  • Seasonality: Peaks towards the end of the year, Nov specifically; lows towards the 2nd quarter of the year.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
HighMnths <- SalesData %>%
                filter(Year != 2017) %>%
                group_by(Year) %>%
                filter(Sales == max(Sales))
HighMnths
## Source: local data frame [6 x 3]
## Groups: Year [6]
## 
##    Year Month Sales
##   <dbl> <chr> <dbl>
## 1  2011   Nov   298
## 2  2012   Nov   375
## 3  2013   Nov   488
## 4  2014   Nov   612
## 5  2015   Nov   711
## 6  2016   Nov   895
library(dplyr)
LowMnths <- SalesData %>%
        filter(Year != 2017) %>%
        group_by(Year) %>%
        filter(Sales == min(Sales))
LowMnths
## Source: local data frame [6 x 3]
## Groups: Year [6]
## 
##    Year Month Sales
##   <dbl> <chr> <dbl>
## 1  2011   May    36
## 2  2012   May    78
## 3  2013   May    75
## 4  2014   Apr   141
## 5  2015   Jun   186
## 6  2016   May   189
  • Outliers - None
  • Long Term Cycle - None
  • Constant Variance - Not Considered
  • Abrupt Change - None

Trend using Moving Average

As there is seasonality in the time series we will use a 12 period moving average.

SalesMA12 <- ma(SalesTS, order=12) 
ts.plot(SalesTS, SalesMA12, lty=c(1:2), col=c('black','red'))

Decomposition of the time series

This is a multiplicative time series as there is both up trend along with seasonality.

# Decompose time series into multiplicative components #
Sales.mult <- decompose(SalesTS, type="multiplicative")
plot(Sales.mult)

plot(Sales.mult$figure, type="l") # Plots seasonality indices

plot(Sales.mult$trend, type="l") # Plots trend

plot(Sales.mult$random, type="l") # Plots random component

Write to file

Write the decomposed components to a file.

SalesData$MA <- Sales.mult$trend
SalesData$DetrendedData <- SalesData$Sales / Sales.mult$trend 
SeasonalIndex <- rep(Sales.mult$figure, 6)
SeasonalIndex[73:77] <- Sales.mult$figure[1:5]
SalesData$SeasonalIndex <- SeasonalIndex
SalesData$RandomComponent <- Sales.mult$random
#SalesData$Sales / ( SalesData$MA * SalesData$SeasonalIndex)

library(xlsx) #load the package
## Loading required package: rJava
## Loading required package: xlsxjars
write.xlsx(x = as.data.frame(SalesData), file = "TSAssignment_Gr4.xlsx",
           sheetName = "TestSheet", row.names = FALSE)

Test of Stationarity

We need to ensure that the time series is stationary before we can use the series for prediction. We will use the Augmented Dickey-Fuller Test to test stationarity, - Null hypothesis: Series non-stationary - Alternative hypothesis: Series stationary

adf.test(na.omit(Sales.mult$random)) 
## Warning in adf.test(na.omit(Sales.mult$random)): p-value smaller than
## printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(Sales.mult$random)
## Dickey-Fuller = -4.1984, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

As the p-value is less than 0.05, the null hypothesis is rejected, thus the series is stationary.

Modeling | Holt Winters Model

We will use all the combinations between 0.1 to 0.9 for alpha, beta and gamma each. Thus, in total we will train 9x9x9 = 729 models to find the combination that reduces the error metric(=MAPE) the most.

SalesTrain <- window(SalesTS, start=c(2011,1), end=c(2015,12))
SalesTest <- window(SalesTS, start=c(2016,1), end=c(2017,5))
plot(SalesTrain)

grid <- as.data.frame(matrix(0,729,4))
colnames(grid) <- c("alpha","beta" ,"gamma","MAPE")

counter <- 1

for(a in 1:9){
        for(b in 1:9){
                for (g in 1:9){
                
                grid$alpha[counter] <- a/10
                grid$beta[counter] <- b/10 
                grid$gamma[counter] <- g/10
                
                Sales.fit <- HoltWinters(SalesTrain, alpha= a/10, beta= b/10, gamma= g/10,
                                         seasonal = "mult")
                Sales.forecast <- forecast(Sales.fit, 17)
                Vec1<- cbind(SalesTest,Sales.forecast$mean)
                grid$MAPE[counter]<- mean(abs(Vec1[,1]-Vec1[,2])/Vec1[,1])                
                
                 
                counter <- counter + 1
                }
        }
}
  • Best Fitted Model
grid[which.min(grid$MAPE),]
##   alpha beta gamma       MAPE
## 2   0.1  0.1   0.2 0.09669882
  • The values of alpha = 0.1, beta = 0.1 and gamma = 0.2 results in the lowest MAPE
  • Let’s build the model using these values.
  • Lets visualize the predicted values against the actual values in the hold out period.
Sales.fit <- HoltWinters(SalesTrain, alpha= 0.1, beta= 0.1, gamma= 0.2, seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 17)
Vec1<- cbind(SalesTest,Sales.forecast$mean)
ts.plot(Vec1, col=c("blue", "red"))

The fitted values do not seem to be biased and follow the hold out series well.

Predictions

Let’s now make predictions for the period of June 2017 to December 2018, a period of 19 months.

SalesTrain <- window(SalesTS, start=c(2011,1), end=c(2017,5))
Sales.fit <- HoltWinters(SalesTrain, alpha= 0.1, beta= 0.1, gamma= 0.2, seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 19)
plot(Sales.forecast)