In the previous step we found a linear model that predicts demand for Takis from a weekly trend plus a flag for whether that week had a payday. Since supermarkets as a channel showed a trend and seasonality very similar to Takis, we'll now apply the same model to predict sales for this channel:

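library(readr)
library(dplyr)
library(schoolmath)  # is.even(), used to build the payday flag
library(caret)       # train(), postResample()
library(ggplot2)     # ggplot(), used for the error plots below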
# train <- readRDS(file = "train.rds")
# supermarkets <- train[train$Canal_ID == 2, ]
# supermarkets$payday <- ifelse(is.even(supermarkets$Semana), 0, 1)
# View(supermarkets %>%
#   select(Semana, payday) %>%
#   arrange(Semana) %>%
#   group_by(Semana) %>%
#   summarise(
#     payday = mean(payday)
#   ))

# supermarketstrain <- supermarkets[1:482537,]
# saveRDS(supermarketstrain, file = "supermarketstrain.rds")
supermarketstrain <- readRDS(file = "supermarketstrain.rds")

# supermarketstest <- supermarkets[482538:nrow(supermarkets),]
# saveRDS(supermarketstest, file = "supermarketstest.rds")
supermarketstest <- readRDS(file = "supermarketstest.rds")

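# Fit a linear model of demand on week number (trend) and the payday flag
fit <- train(Demanda_uni_equil ~ Semana + payday, data = supermarketstrain, method = "lm", na.action = na.pass)
# Predict on the held-out rows, then report RMSE, R-squared and MAE
predicteddemand <- predict(fit, supermarketstest)
postResample(pred = predicteddemand, obs = supermarketstest$Demanda_uni_equil)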
##         RMSE     Rsquared          MAE 
## 9.290222e+01 5.533190e-08 5.494210e+01
# coef(lm(demanda ~ Semana + payday, data = train))
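
For reference, the model fitted by `train()` above can be written out as an equation. This is only a restatement of the formula `Demanda_uni_equil ~ Semana + payday`; the coefficient names $\beta_0, \beta_1, \beta_2$ are notation introduced here, not values taken from the fit.

```latex
\widehat{\text{Demanda\_uni\_equil}}
  = \beta_0 + \beta_1 \cdot \text{Semana} + \beta_2 \cdot \text{payday},
\qquad \text{payday} \in \{0, 1\}
```

Here `Semana` carries the trend and `payday` shifts the prediction by $\beta_2$ in payday weeks.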

# test
# predicteddemand
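# Attach the predictions to the test rows
lmpayday_error <- cbind(supermarketstest, predicteddemand)
# error = actual - predicted, so a negative error means the model over-predicted
lmpayday_error$error <- lmpayday_error$Demanda_uni_equil - lmpayday_error$predicteddemand
# Percentage error relative to actual demand (not finite if actual demand is 0)
lmpayday_error$error_perc <- (lmpayday_error$error / lmpayday_error$Demanda_uni_equil)*100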

# lmpayday_error
ggplot(lmpayday_error, aes(x = error_perc)) + geom_histogram()

lmpayday_error$error_perc <- round(lmpayday_error$error_perc, digits = 0)

lmpayday_error$error_bin <- cut(x = lmpayday_error$error_perc, breaks = c(-Inf, -2000, -100, -30, 30, 100, Inf))

ggplot(lmpayday_error, aes(x = error_bin)) + geom_bar()

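Although some predictions are acceptable (within ±30% of actual demand), most of them are well above the actual demand. The R-squared of roughly 5.5e-08 reported above points the same way: week number and the payday flag explain almost none of the variation in the supermarket test rows. This could mean that there is a “low-performing” product category or client that we are not capturing.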
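
One way to check that hypothesis is to summarise the prediction error per product (or per client) and see whether the over-prediction is concentrated in a few groups. The sketch below is only illustrative: it uses a small made-up data frame in place of `lmpayday_error`, and the grouping column `Producto_ID` is an assumption about what the data contains. It uses base R only.

```r
# Toy stand-in for lmpayday_error; Producto_ID and all values are made up
errors <- data.frame(
  Producto_ID       = c(1, 1, 1, 2, 2, 2),
  Demanda_uni_equil = c(50, 60, 55, 2, 3, 1),
  predicteddemand   = c(52, 52, 52, 52, 52, 52)
)

# Same convention as above: error = actual - predicted
errors$error <- errors$Demanda_uni_equil - errors$predicteddemand

# Mean error (signed) and mean absolute error per product
mean_error <- tapply(errors$error, errors$Producto_ID, mean)
mae        <- tapply(abs(errors$error), errors$Producto_ID, mean)

by_product <- data.frame(
  Producto_ID = names(mean_error),
  mean_error  = as.numeric(mean_error),
  mae         = as.numeric(mae)
)

# Most over-predicted products first (most negative mean error)
worst <- by_product[order(by_product$mean_error), ]
print(worst)
```

In this toy data the second product sells far less than the model predicts, so it sorts to the top. On the real data, the same summary grouped by client instead of product would show whether a handful of clients drive the error.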