Background

Introduction

A product backorder is an order for a good or service that cannot be filled at the current time due to a lack of available supply1. The item may not be held in the company’s available inventory but could still be in production, or the company may need to still manufacture more of the product. A backorder is a sign that the supply is limited and cannot accomodate the high demand from the customer.

What is the consequence of a product backorder? We are talking about unfulfilled demand, which means that we have a potential customers, potential market share and most importantly, potential revenues. Studies show that backorders can erode profits and incur have high costs of USD 15 to USD 20 per backorder fulfillment2. We will lost a great deal of revenues and sales because we cannot afford to give them their desired product. This can, in turn, lead to churn of customers to competitors, decreasing our market share and perhaps can damage our brand reputation. Therefore, predicting a product backorder is essential in day-to-day business. Several methods have been done in the past to manage the inventory and avoid back order, such as using Economic Order Quantitiy (EOQ)3.

With the rise of data science and their potential benefit for various industries, can we apply a machine learning model to predict a product backorder? This article will guide you on how to apply data science skills toward inventory management. The step-by-step explanation will be accompanied by R codes along the way if you wish to reproduce the result.

Library and Setup

Here are the following packages if you wish to reproduce the result.

Dataset

The data is acquired from Kaggle Competition: Can You Predict Product Backorders? However, the page has been taken down. If you wish to acquire the dataset, you can visit here .

The data consists of more than 1.9 millions of observations with 23 column/variables. The data contains historical records for the 8 weeks prior to the week we are trying to predict. The data were taken as weekly snapshots at the start of each week. We will predict if an item would go to backorder based on the present variables.

## Observations: 1,929,935
## Variables: 23
## $ sku               <int> 1026827, 1043384, 1043696, 1043852, 1044048, 1044...
## $ national_inv      <int> 0, 2, 2, 7, 8, 13, 1095, 6, 140, 4, 0, 20, 18, 29...
## $ lead_time         <int> NA, 9, NA, 8, NA, 8, NA, 2, NA, 8, 2, NA, NA, NA,...
## $ in_transit_qty    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ forecast_3_month  <int> 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, ...
## $ forecast_6_month  <int> 0, 0, 0, 0, 0, 0, 0, 0, 114, 0, 0, 0, 0, 0, 0, 0,...
## $ forecast_9_month  <int> 0, 0, 0, 0, 0, 0, 0, 0, 152, 0, 0, 0, 0, 0, 0, 0,...
## $ sales_1_month     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sales_3_month     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sales_6_month     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sales_9_month     <int> 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ min_bank          <int> 0, 0, 0, 1, 2, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ potential_issue   <fct> No, No, No, No, No, No, No, No, No, No, No, No, N...
## $ pieces_past_due   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ perf_6_month_avg  <dbl> -99.00, 0.99, -99.00, 0.10, -99.00, 0.82, -99.00,...
## $ perf_12_month_avg <dbl> -99.00, 0.99, -99.00, 0.13, -99.00, 0.87, -99.00,...
## $ local_bo_qty      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ deck_risk         <fct> No, No, Yes, No, Yes, No, Yes, Yes, No, No, No, Y...
## $ oe_constraint     <fct> No, No, No, No, No, No, No, No, No, No, No, No, N...
## $ ppap_risk         <fct> No, No, No, No, No, No, No, Yes, No, No, No, No, ...
## $ stop_auto_buy     <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes,...
## $ rev_stop          <fct> No, No, No, No, No, No, No, No, No, No, No, No, N...
## $ went_on_backorder <fct> No, No, No, No, No, No, No, No, No, No, No, No, N...

Data Description:

  • sku : Stock Keeping Unit, ID for the product
  • national_inv : Current inventory level for the part
  • lead_time : Transit time for product (if available)
  • in_transit_qty – Amount of product in transit from source
  • forecast_3_month : Forecast sales for the next 3 months
  • forecast_6_month : Forecast sales for the next 6 months
  • forecast_9_month : Forecast sales for the next 9 months
  • sales_1_month : Sales quantity for the prior 1 month time period
  • sales_3_month : Sales quantity for the prior 3 month time period
  • sales_6_month : Sales quantity for the prior 6 month time period
  • sales_9_month : Sales quantity for the prior 9 month time period
  • min_bank : Minimum recommend amount to stock
  • potential_issue : Source issue for part identified
  • pieces_past_due : Parts overdue from source
  • perf_6_month_avg : Source performance for prior 6 month period
  • perf_12_month_avg : Source performance for prior 12 month period
  • local_bo_qty : Amount of stock orders overdue
  • deck_risk : Part risk flag
  • oe_constraint : Part risk flag
  • ppap_risk : Part risk flag
  • stop_auto_buy : Part risk flag
  • rev_stop : Part risk flag
  • went_on_backorder : Product Went on backorder

Modeling Workflow

Data Cleansing

Let’s see how many unique ID (sku) in this dataset.

## [1] 1929935

The number of unique ID is exactly the same as the number of rows. Therefore, we can remove them from our data, or we can make them as the rowname instad.

Let’s check the number of missing values on each variables.

##      national_inv         lead_time    in_transit_qty  forecast_3_month 
##                 0            115617                 0                 0 
##  forecast_6_month  forecast_9_month     sales_1_month     sales_3_month 
##                 0                 0                 0                 0 
##     sales_6_month     sales_9_month          min_bank   potential_issue 
##                 0                 0                 0                 0 
##   pieces_past_due  perf_6_month_avg perf_12_month_avg      local_bo_qty 
##                 0                 0                 0                 0 
##         deck_risk     oe_constraint         ppap_risk     stop_auto_buy 
##                 0                 0                 0                 0 
##          rev_stop went_on_backorder 
##                 0                 0

lead_time has a lot of missing values. But as we watch closely, the NAs didn’t really signify a missing values, but it represent a lack of transit time. Therefore, we can impute/change the value by 0 (no transit time).

Exploratory Data Analysis

Let’s check the class proportion of the target variable

## 
##         Yes          No 
## 0.007244285 0.992755715

A backorder is actually rarely happened. This pose a challenge since the classes are highly imbalanced. Let’s explore further by applying PCA to see if the backordered observation belongs to an outlier.

Dimension 1 and 2 can explain more than 60% of the total variances from the data. Now we will visualize the individual observations to see if the parts that went backorder is outliers.

Let’s also check the variable plot

Based on the result, most of the positive class (backordered items) is positioned on the axis of the first dimensions, indicating that they are most likely have low values on almost everything since most of variables are only contributing toward the first dimension (PC1). The individual plot also suggests that the positive class is not outliers because they are located closely to the negative class. Therefore, anomaly detection may not the best choice for this problem. We will go with generic classification problem and try to use Random Forest.

Cross-Validation

We will split the data into data training and data testing, with 80% of the data will belong to the data training.

## <1543949/385986/1929935>

Since the data is highly imbalance, with the positive class only consists of less than 1% of the total data, doing upsampling may be an unwise decision. We must also consider the computation burden. With more than 1 million of observations, cutting down the number of sample with downsampling is more approriate. Another approach to class imbalance is by creating a synthetic data via SMOTE. However, SMOTE only works for data that consists purely of numerical features and cannot handle categorical features.

We will downsample the data train to get a balanced class.

## 
## Yes  No 
## 0.5 0.5

Model Fitting

We will create a Random Forest model to predict the probability of an item to be went on backorder.

## Random Forest 
## 
## 22372 samples
##    21 predictor
##     2 classes: 'Yes', 'No' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 20135, 20134, 20135, 20135, 20134, 20135, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.8725033  0.7450064
##   11    0.9018406  0.8036814
##   21    0.8987863  0.7975728
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 11.

The Random Forest model has Out of Bag Error Rate of 9.7%, so the accuracy of our model towards unseen data is around 90% based on the out of bag samples. However, since the target classes is highly imbalanced, we cannot take the accuracy metric at face value. We also need to check the other metrics, especially those that deals with the positive class: Recall and Precision.

Model Evaluation

Confusion Matrix

First, we inspect the confusion matrix. Confusion matrix shows how good our model in predicting the target (wheter the product went on back order or not).

##           Actual
## Prediction    Yes     No
##        Yes   2574  44913
##        No     221 338278

We can translate the confusion matrix into 4 metrics as follows.

The model has good performance in terms of recall, which means that the model can correctly predict positive class 90% of the time. This can also be interpreted that the model has low False Negative prediction: falsely predict a backorder case (positive) to be a negative one. This is great because if an item that should be on backorder but predicted as negative (no need to backorder), then we will loss revenues since our initial goal is to correctly identifiy if a product should went on backorder in order to fulfill the market demand. However, the Precision of our model is pretty low. This means that we have a lot of False Positive case (negative case predicted as need to backorder) compared to the True Positive (correctly predict that a product need a backorder). Therefore, we will end up ordering a lot of products and potentially we will have a huge inventory cost because most of the products will not sold quickly. This is an interesting dilemma. If the store owner wish to avoid any backorder at all cost (not accounting or undermining the inventory cost), then this model is great since it has high Recall. However, if the store owner want to consider the inventory cost and any other cost associated with ordering items, then we will need a better model or improve this one. We will choose the second option and see what we can do in order to minimize the damage of the cost associated with the False Positive case.

ROC Curve

ROC Curve can be used to evaluate the model ability to distinguish the positive class with the negative class and indicated by the True Positive Rate (Recall) and False Positive Rate (1 - Specificity). A great model should have TPR value close to 1 (perfect Recall) and False Positive Rate close to 0.

Based on the Curve, the model has satisying result. The low False Positive Rate means that our model have a low False Positive case compared to the True Negative case. Let’s check the Area Under ROC Curve (ROC) of our model if we wish to compare the Random Forest with other model.

Precision-Recall Curve

Another curve that we can use to observe the behaviour of our model is the Precision-Recall Curve. It visualizes the trade-off between Precision and Recall. We wish to see the equilibrium (middle point) that can give us good Precision and Recall.

As we can see, the curve is not pretty because it shows us that in order to increase the Precision, we must sacrifice a lot by dropping the Recall value. Let’s say we wish to have a Precision of 60%, we will have Recall value less than 20% as the result. Our model is not flexible enough at transitioning between Precision and Recall.

Profit Curve

Another approach is by calculating the expected profit given the cost and benefit. Based on the expected profit, we can see how many instance that we should predict as a positive case in order to maximize the profit. In order to analyze the profit, we can create a curve, called the Profit Curve and illustrated as the following graph4.

There are 3 different classifier models and a random classifier. The best model, according to the Profit Curve, is classifier 2 because it can produce maximum profit by predicting top 50% of items based on the probability to belong to the positive class. So, in Profit Curve, instead of dealing with the label class, we will use the probability.

The expected profit can be formulated as follows:

\[Expected\ Profit = P(TP)\ Benefit(TP) + P(TN)\ Benefit(TN) + P(FP)\ Benefit(FP) + P(FN)\ Benefit(TN) \]

The \(P(TP)\) indicate the probability of the prediction to be a True Positve, while the \(Benefit(TP)\) indicate how many benefit that we can get if it’s true. For example, if the cost of ordering a product is 100 and the revenue is 150, let’s try to calculate the data based on the confusion matrix of our model.

##           Actual
## Prediction   Yes    No
##        Yes 0.007 0.116
##        No  0.001 0.876

Let’s break down the calculation into 4 points:

  • If the product belong to the True Positive case, then we would get a revenue of USD 150 and a cost of USD 10. The benefit would be: 500-10 = USD 490.
  • If the product belong to the True Negative case, then we would not get any revenue and don’t have any cost because we don’t order any items. The benefit would be 0.
  • If the product belong to the False Positive case, then we would not get any revenue but we get a cost of 20. Then the benefit would be -20.
  • If the product belong to the False Negative case, then we would not get any revenue and don’t have any cost as well. The benefit would be 0. However, we may want to consider any opportunity cost associated with the loss revenue due to unable to fulfill the demand. We can add opportunity cost of 50 so the benefit would be -50.
  • Calculate the expected profit:

\[Expected\ Profit = 0.07\ 490 + 0.876\ 0 + 0.116\ (-20) + 0.001\ (-50) = 1.06\]

Based on the calculation above, we will get an expected profit of -17.36, which means that we will be more likely to have a lost and gain no profit.

I hope the illustration is clear enough before we go further. Now we will draw the Profit Curve. Since the data consists of millions of items, the cost and profit may vary greatly. However, for simplicity, I will use a uniform cost and profit value. The benefit will represent a cost and profit per unit.

First, we create the function to calculate the expected profit.

The next step is to draw the Profit Curve.

profit_value <- matrix(nrow = nrow(pred_test), ncol = 3)

# sort data
pred_test <- pred_test %>% arrange(desc(Yes))

# Sample 100 different thresholds
for (i in seq(1, nrow(pred_test), nrow(pred_test)*0.01)) {
  
 # set threshold
 thres <- pred_test$Yes[i]
 profit_value[i, 1] <- thres

 # classify
 pred_test <- pred_test %>% 
   mutate(predicted = ifelse(Yes < thres, "No", "Yes"))
 profit_value[i, 2] <- pred_test %>% filter(predicted == "Yes") %>% nrow()/nrow(pred_test)
 
 # expected profit
 confuse <- table(Prediction = pred_test$predicted, Actual = pred_test$actual) %>% 
   prop.table() %>% 
  rbind(matrix(0L, nrow = 2, ncol = 2) %>% `rownames<-`(c("No", "Yes"))) %>% 
  unique()
 
 profit_value[i, 3] <- expect_profit(confuse)
}

profit_value <- profit_value %>% 
  as.data.frame() %>% 
  `colnames<-`(c("threshold", "percent_yes", "expected_profit"))

# Get threshold with maximum profit
max_profit <- profit_value %>% 
  arrange(desc(expected_profit)) %>% 
  slice(1)

current_profit <- profit_value %>% 
  arrange(threshold) %>% 
  filter(threshold >= 0.5) %>% 
  slice(1)

# Visualize the result
profit_value %>% 
  ggplot(aes(threshold, expected_profit)) +
  geom_line(color = "firebrick") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "skyblue4") +
  geom_vline(xintercept = 0.5, linetype = "dashed", color = "skyblue4") +
  geom_vline(xintercept = max_profit$threshold, linetype = "dashed", color = "green") +
  geom_point(data = max_profit, size = 3) +
  ggrepel::geom_text_repel(data= max_profit, box.padding = 3, nudge_x = 0.1,
                           aes(label = glue::glue("Max Profit
                                                  {comma(expected_profit, accuracy = 0.01)}"))
                               ) +
  geom_point(data = current_profit, size = 3) +
  ggrepel::geom_text_repel(data= current_profit, box.padding = 2, nudge_x = 0.1,
                           aes(label = glue::glue("Current Profit
                                                  {comma(expected_profit, accuracy = 0.01)}"))
                               ) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank()) +
  labs(title = "Profit Curve",
       x = "Threshold", y = "Expected Profit")

Based on the Profit Curve, we will get a maximum expected value of only USD 1.72. We will expected a negative profit with threshold value of 0 to around 0.4 and threshold closing to 1. The maximum expected profit may be not good enough for us since the value is so little. We need to improve our model. However, remember that the cost and profit is just an assumption. You may want to apply the method with the real profit and cost of each item.

Model Improvement

The following sections constis of steps to improve our model by simply changing the resample scheme.

Data Preprocessing

On our first try, we directly use downsample in order to achieve a balanced class with 50:50 ratio for each class. On this occasion, we will use the combination of upsampling and downsampling together. First, we will downsample the data with the proportion of 1:5 with most of the data is the majority class. Then, we upsample the data so that the proportion would be of 50:50 for the both class. We can directly change the composition of the full dataset. However, that would cause a data leakage which will lead to a useless model in real time application. The resampling method should be done on the training dataset only.

## 
## Yes  No 
## 0.5 0.5

Model Fitting

We fit the model the same way as the previous occasion.

## Random Forest 
## 
## 111860 samples
##     18 predictor
##      2 classes: 'Yes', 'No' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 100674, 100674, 100674, 100674, 100674, 100674, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9164968  0.8329936
##   10    0.9684993  0.9369986
##   18    0.9663806  0.9327612
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 10.

We get an Out of Bag Error of only 3%, implying that we can achieve 97% of accuracy in the unseen data.

Model Evaluation

Now, we will observe how our new scheme can improve the Precision of the model. In our previous model, we can achieve 90% on Recall but only 5% on Precision.

As we can see, the Recall drop significantly from 90% to 81%. However, the Precision only rise up to 10%.

Let’s also check the AUC of the current model and compare them with the previous model.

Our previous model only has AUC of 0.955 while our current model has AUC up to 0.959. This shows that our new improved model can perform slightly better.

Now for the main part, let’s check the Profit Curve as well.

profit_value <- matrix(nrow = nrow(pred_test), ncol = 3)

# sort data
pred_test <- pred_test %>% arrange(desc(Yes))

# Sample 100 different thresholds
for (i in seq(1, nrow(pred_test), nrow(pred_test)*0.01)) {
  
 # set threshold
 thres <- pred_test$Yes[i]
 profit_value[i, 1] <- thres

 # classify
 pred_test <- pred_test %>% 
   mutate(predicted = ifelse(Yes < thres, "No", "Yes"))
 profit_value[i, 2] <- pred_test %>% filter(predicted == "Yes") %>% nrow()/nrow(pred_test)
 
 # expected profit
 confuse <- table(Prediction = pred_test$predicted, Actual = pred_test$actual) %>% 
   prop.table() %>% 
  rbind(matrix(0L, nrow = 2, ncol = 2) %>% `rownames<-`(c("No", "Yes"))) %>% 
  unique()
 
 profit_value[i, 3] <- expect_profit(confuse)
}

profit_value <- profit_value %>% 
  as.data.frame() %>% 
  `colnames<-`(c("threshold", "percent_yes", "expected_profit"))

# Get threshold with maximum profit
max_profit <- profit_value %>% 
  arrange(desc(expected_profit)) %>% 
  slice(1)

current_profit <- profit_value %>% 
  arrange(threshold) %>% 
  filter(threshold >= 0.5) %>% 
  slice(1)

# Visualize the result
profit_value %>% 
  ggplot(aes(threshold, expected_profit)) +
  geom_line(color = "firebrick") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "skyblue4") +
  geom_vline(xintercept = 0.5, linetype = "dashed", color = "skyblue4") +
  geom_vline(xintercept = max_profit$threshold, linetype = "dashed", color = "green") +
  geom_point(data = max_profit, size = 3) +
  ggrepel::geom_text_repel(data= max_profit, box.padding = 3, nudge_x = 0.1,
                           aes(label = glue::glue("Max Profit
                                                  {comma(expected_profit, accuracy = 0.01)}"))
                               ) +
  geom_point(data = current_profit, size = 3) +
  ggrepel::geom_text_repel(data= current_profit, box.padding = 2, nudge_x = 0.1,
                           aes(label = glue::glue("Current Profit
                                                  {comma(expected_profit, accuracy = 0.01)}"))
                               ) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank()) +
  labs(title = "Profit Curve",
       x = "Threshold", y = "Expected Profit")

With our current threshold (0.5), we can achieve an expected proft around USD 1.8. If we wish to get a maximum profit of USD 1.84, we can adjust the threshold to be around 0.55. The expected profit didn’t increase much from the previous model.

Conclusion

Based on this long steps that we’ve been through, we can conclude that the problem pose a great challenge to solve due to the unique characteristic of the backordered items. We cannot handle them with anomaly detection method since they are overallped with observations that didn’t went to backorder. On the other hand, we also have proven that Random Forest is good enough at classifying the target (based on the AUC value). The model is suitable if the management or store owner is only concerned with correctly predicting backordered item as much as possible when the inventory or any other cost associated with backorder is quite cheap and less risky compared to the benefit of backordering items.

Reference