Solution

Read the data

library(readxl)
SalesData <- read_excel("./Sales.xlsx")

Observations of the time series

library(smooth)

## Warning: package 'smooth' was built under R version 3.3.3

## This is package "smooth", v2.1.0

library(forecast)

## Warning: package 'forecast' was built under R version 3.3.3

library(graphics)
library(datasets)
library(tseries)

## Warning: package 'tseries' was built under R version 3.3.3

# Convert to a timeseries object explicitly stating start and end points and declaring monthly 
# data
SalesTS <- ts(SalesData[,c(-1,-2)], start=c(2011,1), end=c(2017,5), frequency=12)
plot(SalesTS)

Trend - Visual Inspection - Minor Up
Seasonality: Peaks towards the end of the year, Nov specifically; lows towards the 2nd quarter of the year.

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.3.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

HighMnths <- SalesData %>%
                filter(Year != 2017) %>%
                group_by(Year) %>%
                filter(Sales == max(Sales))
HighMnths

## Source: local data frame [6 x 3]
## Groups: Year [6]
## 
##    Year Month Sales
##   <dbl> <chr> <dbl>
## 1  2011   Nov   298
## 2  2012   Nov   375
## 3  2013   Nov   488
## 4  2014   Nov   612
## 5  2015   Nov   711
## 6  2016   Nov   895

library(dplyr)
LowMnths <- SalesData %>%
        filter(Year != 2017) %>%
        group_by(Year) %>%
        filter(Sales == min(Sales))
LowMnths

## Source: local data frame [6 x 3]
## Groups: Year [6]
## 
##    Year Month Sales
##   <dbl> <chr> <dbl>
## 1  2011   May    36
## 2  2012   May    78
## 3  2013   May    75
## 4  2014   Apr   141
## 5  2015   Jun   186
## 6  2016   May   189

Outliers - None
Long Term Cycle - None
Constant Variance - Not Considered
Abrupt Change - None

Trend using Moving Average

As there is seasonality in the time series we will use a 12 period moving average.

SalesMA12 <- ma(SalesTS, order=12) 
ts.plot(SalesTS, SalesMA12, lty=c(1:2), col=c('black','red'))

Decomposition of the time series

This is a multiplicative time series as there is both up trend along with seasonality.

# Decompose time series into multiplicative components #
Sales.mult <- decompose(SalesTS, type="multiplicative")
plot(Sales.mult)

plot(Sales.mult$figure, type="l") # Plots seasonality indices

plot(Sales.mult$trend, type="l") # Plots trend

plot(Sales.mult$random, type="l") # Plots random component

Write to file

Write the decomposed components to a file.

SalesData$MA <- Sales.mult$trend
SalesData$DetrendedData <- SalesData$Sales / Sales.mult$trend 
SeasonalIndex <- rep(Sales.mult$figure, 6)
SeasonalIndex[73:77] <- Sales.mult$figure[1:5]
SalesData$SeasonalIndex <- SeasonalIndex
SalesData$RandomComponent <- Sales.mult$random
#SalesData$Sales / ( SalesData$MA * SalesData$SeasonalIndex)

library(xlsx) #load the package

## Loading required package: rJava

## Loading required package: xlsxjars

write.xlsx(x = as.data.frame(SalesData), file = "TSAssignment_Gr4.xlsx",
           sheetName = "TestSheet", row.names = FALSE)

Test of Stationarity

We need to ensure that the time series is stationary before we can use the series for prediction. We will use the Augmented Dickey-Fuller Test to test stationarity, - Null hypothesis: Series non-stationary - Alternative hypothesis: Series stationary

adf.test(na.omit(Sales.mult$random))

## Warning in adf.test(na.omit(Sales.mult$random)): p-value smaller than
## printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(Sales.mult$random)
## Dickey-Fuller = -4.1984, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

As the p-value is less than 0.05, the null hypothesis is rejected, thus the series is stationary.

Modeling | Holt Winters Model

We will use all the combinations between 0.1 to 0.9 for alpha, beta and gamma each. Thus, in total we will train 9x9x9 = 729 models to find the combination that reduces the error metric(=MAPE) the most.

SalesTrain <- window(SalesTS, start=c(2011,1), end=c(2015,12))
SalesTest <- window(SalesTS, start=c(2016,1), end=c(2017,5))
plot(SalesTrain)

grid <- as.data.frame(matrix(0,729,4))
colnames(grid) <- c("alpha","beta" ,"gamma","MAPE")

counter <- 1

for(a in 1:9){
        for(b in 1:9){
                for (g in 1:9){
                
                grid$alpha[counter] <- a/10
                grid$beta[counter] <- b/10 
                grid$gamma[counter] <- g/10
                
                Sales.fit <- HoltWinters(SalesTrain, alpha= a/10, beta= b/10, gamma= g/10,
                                         seasonal = "mult")
                Sales.forecast <- forecast(Sales.fit, 17)
                Vec1<- cbind(SalesTest,Sales.forecast$mean)
                grid$MAPE[counter]<- mean(abs(Vec1[,1]-Vec1[,2])/Vec1[,1])                
                
                 
                counter <- counter + 1
                }
        }
}

Best Fitted Model

grid[which.min(grid$MAPE),]

##   alpha beta gamma       MAPE
## 2   0.1  0.1   0.2 0.09669882

The values of alpha = 0.1, beta = 0.1 and gamma = 0.2 results in the lowest MAPE
Let’s build the model using these values.
Lets visualize the predicted values against the actual values in the hold out period.

Sales.fit <- HoltWinters(SalesTrain, alpha= 0.1, beta= 0.1, gamma= 0.2, seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 17)
Vec1<- cbind(SalesTest,Sales.forecast$mean)
ts.plot(Vec1, col=c("blue", "red"))

The fitted values do not seem to be biased and follow the hold out series well.

Predictions

Let’s now make predictions for the period of June 2017 to December 2018, a period of 19 months.

SalesTrain <- window(SalesTS, start=c(2011,1), end=c(2017,5))
Sales.fit <- HoltWinters(SalesTrain, alpha= 0.1, beta= 0.1, gamma= 0.2, seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 19)
plot(Sales.forecast)

TimeSeries | Decomposition

Saurabh Sindwani

14 August 2017

Problem Statement