The provided spreadsheet shows monthly sales (in 10,000 rupees) of a certain company (Company X) from January 2011 to May 2017. The ultimate objective of this exercise is to predict sales for the period June 2017 to December 2018.
Read the data as a time series object in R. Plot the data. What are the major features you notice in this series? Use a moving average to smooth the data. Plot the series and the MA smoother. Is there a linear trend in the data? Using multiplicative seasonality estimate seasonal indices. In which month(s) do you see higher sales and which month(s) you see lower sales? Do the residuals form a stationary series? Do a formal test for stationarity writing down the null and alternative hypothesis. What is your conclusion? For the above parts, wherever applicable, you can do calculations in MS Excel or you may use R. The final output must be presented in MS Excel under the following column headings. Year, Month, Sales, MA, Detrended Data, Seasonal Index, Random Component
For the parts below you need to use R. MAPE calculation may be done in R by writing your own code, or you can copy the forecasted values in MS Excel and do the calculation.
Use the last 17 months as hold-out sample (Jan 2016 - May 2017). Fit a suitable exponential smoothing model to the rest of the data and calculate MAPE. You may try out different models and use the one which has minimum MAPE. Note that the forecasted values are saved in the component $mean. Use the ‘best’ model obtained from above to forecast for the period June 2017 to December 2018. Provide forecasted values as well as their upper and lower confidence limits.
library(readxl)
SalesData <- read_excel("./Sales.xlsx")
library(smooth)
## Warning: package 'smooth' was built under R version 3.3.3
## This is package "smooth", v2.1.0
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
library(graphics)
library(datasets)
library(tseries)
## Warning: package 'tseries' was built under R version 3.3.3
# Convert to a timeseries object explicitly stating start and end points and declaring monthly
# data
SalesTS <- ts(SalesData[,c(-1,-2)], start=c(2011,1), end=c(2017,5), frequency=12)
plot(SalesTS)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
HighMnths <- SalesData %>%
filter(Year != 2017) %>%
group_by(Year) %>%
filter(Sales == max(Sales))
HighMnths
## Source: local data frame [6 x 3]
## Groups: Year [6]
##
## Year Month Sales
## <dbl> <chr> <dbl>
## 1 2011 Nov 298
## 2 2012 Nov 375
## 3 2013 Nov 488
## 4 2014 Nov 612
## 5 2015 Nov 711
## 6 2016 Nov 895
library(dplyr)
LowMnths <- SalesData %>%
filter(Year != 2017) %>%
group_by(Year) %>%
filter(Sales == min(Sales))
LowMnths
## Source: local data frame [6 x 3]
## Groups: Year [6]
##
## Year Month Sales
## <dbl> <chr> <dbl>
## 1 2011 May 36
## 2 2012 May 78
## 3 2013 May 75
## 4 2014 Apr 141
## 5 2015 Jun 186
## 6 2016 May 189
As there is seasonality in the time series we will use a 12 period moving average.
SalesMA12 <- ma(SalesTS, order=12)
ts.plot(SalesTS, SalesMA12, lty=c(1:2), col=c('black','red'))
This is a multiplicative time series as there is both up trend along with seasonality.
# Decompose time series into multiplicative components #
Sales.mult <- decompose(SalesTS, type="multiplicative")
plot(Sales.mult)
plot(Sales.mult$figure, type="l") # Plots seasonality indices
plot(Sales.mult$trend, type="l") # Plots trend
plot(Sales.mult$random, type="l") # Plots random component
Write the decomposed components to a file.
SalesData$MA <- Sales.mult$trend
SalesData$DetrendedData <- SalesData$Sales / Sales.mult$trend
SeasonalIndex <- rep(Sales.mult$figure, 6)
SeasonalIndex[73:77] <- Sales.mult$figure[1:5]
SalesData$SeasonalIndex <- SeasonalIndex
SalesData$RandomComponent <- Sales.mult$random
#SalesData$Sales / ( SalesData$MA * SalesData$SeasonalIndex)
library(xlsx) #load the package
## Loading required package: rJava
## Loading required package: xlsxjars
write.xlsx(x = as.data.frame(SalesData), file = "TSAssignment_Gr4.xlsx",
sheetName = "TestSheet", row.names = FALSE)
We need to ensure that the time series is stationary before we can use the series for prediction. We will use the Augmented Dickey-Fuller Test to test stationarity, - Null hypothesis: Series non-stationary - Alternative hypothesis: Series stationary
adf.test(na.omit(Sales.mult$random))
## Warning in adf.test(na.omit(Sales.mult$random)): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(Sales.mult$random)
## Dickey-Fuller = -4.1984, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
As the p-value is less than 0.05, the null hypothesis is rejected, thus the series is stationary.
We will use all the combinations between 0.1 to 0.9 for alpha, beta and gamma each. Thus, in total we will train 9x9x9 = 729 models to find the combination that reduces the error metric(=MAPE) the most.
SalesTrain <- window(SalesTS, start=c(2011,1), end=c(2015,12))
SalesTest <- window(SalesTS, start=c(2016,1), end=c(2017,5))
plot(SalesTrain)
grid <- as.data.frame(matrix(0,729,4))
colnames(grid) <- c("alpha","beta" ,"gamma","MAPE")
counter <- 1
for(a in 1:9){
for(b in 1:9){
for (g in 1:9){
grid$alpha[counter] <- a/10
grid$beta[counter] <- b/10
grid$gamma[counter] <- g/10
Sales.fit <- HoltWinters(SalesTrain, alpha= a/10, beta= b/10, gamma= g/10,
seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 17)
Vec1<- cbind(SalesTest,Sales.forecast$mean)
grid$MAPE[counter]<- mean(abs(Vec1[,1]-Vec1[,2])/Vec1[,1])
counter <- counter + 1
}
}
}
grid[which.min(grid$MAPE),]
## alpha beta gamma MAPE
## 2 0.1 0.1 0.2 0.09669882
Sales.fit <- HoltWinters(SalesTrain, alpha= 0.1, beta= 0.1, gamma= 0.2, seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 17)
Vec1<- cbind(SalesTest,Sales.forecast$mean)
ts.plot(Vec1, col=c("blue", "red"))
The fitted values do not seem to be biased and follow the hold out series well.
Let’s now make predictions for the period of June 2017 to December 2018, a period of 19 months.
SalesTrain <- window(SalesTS, start=c(2011,1), end=c(2017,5))
Sales.fit <- HoltWinters(SalesTrain, alpha= 0.1, beta= 0.1, gamma= 0.2, seasonal = "mult")
Sales.forecast <- forecast(Sales.fit, 19)
plot(Sales.forecast)