I am working in a company that selling Agriculture Equipment, we have a product that marked as the highest sales since the beginning (about 9 years ago), i want to predict the sales for the next year in order to prepare how many stock that we need in order to fulfil the demand.
First, lets prepare the libraries.
library(tidyverse)
library(dplyr)
library(lubridate)
library(forecast)
library(MLmetrics)
library(tseries)
library(ggplot2)
library(plotly)
library(padr)
library(zoo)Next, we read the data. (I have remove product name in the data so it’s only shows Date and Quantity :) )
To make a timeseries data, first we need to have a sequential date (tanggal) in order and fill the empty QTY with average value, We are going to use pad and na.fill command.
sales <- sales %>%
mutate(TANGGAL = as.POSIXct(TANGGAL)) %>%
pad() %>% #extend the date range
mutate(QTY=na.fill(QTY, fill = "extend")) %>% #fill the NA column with average between notNA
select(TANGGAL, QTY)
head(sales, 5)Now we have a proper timeseries data, lets explore the data with HoltWinters method.
Creating timeseries the data shown are daily sales so i am using frequency 7*4 to get a better trend grasp.
visualize
It shows slightly sales increasement on late 2019, lets see a better detail with decompose() with multiplicative type
After we get the information, lets separate the data into train and test
test <- tail(sales_ts, 3) #using 3 days as test data
train <- head(sales_ts, length(sales_ts)-length(test))Model Fitting with HoltWinters method
Forecast Model
model_forecast <- forecast(object = model_sales, h = 24) #forecast for the next 24 days
sum(model_forecast$mean)## [1] 325.7463
I did compared this output with actual sales number, its about the same amount which is good, bet let see the visualisation, can the model predict it nicely?
Visualize the model
train %>% autoplot()+
autolayer(test, series = "test data")+
autolayer(model_forecast$mean, series = "forecast")accuracy test
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 3.484933 4.112942 3.484933 18.80892 18.80892 0 0.8861136
After the calculation steps above, it shows that the result for the next 24 days would be:
## [1] 325.7463
with MAPE 18,80%, I am quite confident with the result despite i have done cross-validation with the actual sales
One of my Mentor in my school tells that we can grasp a better pattern with Multi Seasonal Time Series (MSTS), its a method that can learn the-not-quite-tidy-time-series-data, we can add multiple frequency on reading the data, so let’s see the comparison.
For this i am going to use frequency of a week, a month and a year.
I am going to use only the data from 2011 to 2016 because it shows some unusual sales activity during 2017 to 2018 that can disrupt my model pattern.
sales.msts <- msts(sales$QTY, seasonal.periods = c(30, 30*3, 30*6), start = c(2011, 01,05), end = c(2016, 31,12))
sales.msts %>% mstl() %>% autoplot() From the visual above, it shows a better trend pattern so it means that we can grasp the data-pattern quite good compared to standard timeseries.
Cross-Validation
cv_train <- function(x){
# 30*12*3 is simply means daily time times 30 days times 12 months and 3 years
train <- head(x, length(x) - 30*6)
}
cv_test <- function(x){
test <- tail(x,30*6)
}
train_1 <- cv_train(sales.msts)
test_1 <- cv_test(sales.msts)Forecast& Evaluation
#forecast
forecast_1 <- forecast(object = mod_1,h = length(test_1))
#evaluation
eval_1 <- accuracy(as.vector(forecast_1$mean),test_1)
data.frame(eval_1)Visual 1
forecast_plot <- ggplot() +
geom_line(aes(x = (sales$TANGGAL %>% tail(30*6)),y = (sales$QTY %>% tail(30*6)),color = "actual")) +
geom_line(aes(x = (sales$TANGGAL %>% tail(30*6)),y = forecast_1$mean, color = "forecasted")) +
theme_minimal() +
labs(title = "Forecast using MSTS ARIMA",
x = "Date time", y = "Count")
ggplotly(forecast_plot)Lets try with log model
# forecast
forecast_1_adt <- forecast(object = mod_1_adt,h = length(test_1))
# evaluation
eval_1_adt <- data.frame(accuracy(as.vector(exp(forecast_1_adt$mean)),test_1)) %>%
mutate(Model = "ARIMA_2_adt")
eval_1_adtforecast_plot2 <- ggplot() +
geom_line(aes(x = (sales$TANGGAL %>% tail(30*12*5)),y = (sales$QTY %>% tail(30*12*5)),color = "actual")) +
geom_line(aes(x = (sales$TANGGAL %>% tail(30*6)),y = exp(forecast_1_adt$mean), color = "forecasted")) +
theme_minimal() +
labs(title = "Forecast using MSTS ARIMA",
x = "Date time", y = "Count")
ggplotly(forecast_plot2)## [1] 330.3879
By Using Multi-Seasonal Time Series, we’re able to grasp the data-pattern pretty well (by looking at the decomposed trend visual). Despite of the MAPE value are still quite high (45ish %), it’s still acceptable based on the business wise. I also checked with the forecast value that shows 330.38, it can precisely forecast with current sales condition.