This analysis aims to understand key factors influencing COD (Cash on Delivery) success using delivery task data. Insights from this study can support operational decisions to optimise success rates and resource allocation. The analysis involved several steps:

  • Exploratory Data Analysis (EDA): To understand data distributions, task patterns, and COD success rates.

  • Feature Engineering: Creating variables such as hour_created, origin_dest, and taskDuration to capture relevant delivery characteristics.

  • Machine Learning Modelling: Applying Random Forest with hyperparameter tuning to predict COD success probability.

  • Evaluation: Using metrics such as AUC, confusion matrix, and SHAP analysis to interpret model performance and feature contributions.

  • Insights Extraction: Translating model outputs into actionable business recommendations.

A. Packages

In this analysis, several libraries were loaded to support the workflow: tidyverse (which includes dplyr) for data manipulation, lubridate for date-time processing, listviewer for exploring nested JSON structures, and caret for data partitioning and modeling preparation. Meanwhile, h2o served as the main machine learning library, covering environment initialization, conversion of data to H2OFrames, training of various algorithms, hyperparameter tuning with grid search, and model performance evaluation within a single framework.

lapply(c("tidyverse","dplyr","lubridate","listviewer","caret","h2o"),library,character.only=T)[[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"
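
Since as.h2o() and h2o.grid() communicate with an H2O cluster, the cluster has to be started before any data conversion or model training. A minimal initialization sketch is shown below; the nthreads and max_mem_size values are illustrative assumptions rather than the settings used in the original run.

# Start (or connect to) a local H2O cluster; nthreads = -1 uses all available cores
# (nthreads and max_mem_size are illustrative, adjust to the available hardware)
h2o.init(nthreads = -1, max_mem_size = "4G")
# Optionally suppress the progress bars that as.h2o()/h2o.grid() print in knitted output
h2o.no_progress()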

B. Data

This dataset contains comprehensive information about delivery tasks, including task ID, creation and completion times, assigned worker, completion coordinates, and task type (flow). Additionally, it includes COD details such as payment amount and receipt status, as well as additional data (UserVar) covering delivery status, origin and destination branches, and package weight. This information is crucial for analyzing delivery patterns and predicting COD success.

B1. Preprocessing

At the preprocessing stage, the taskCreatedTime and taskCompletedTime columns were parsed as datetimes in the Asia/Jakarta timezone, and their dates and hours were extracted. Several new variables were then created: taskDuration (task duration in minutes), hour_created (the hour at which the task was created), origin_dest (combining origin and destination branches), weekday (the day of task creation), and is_weekend (an indicator of whether the task was created on a weekend). These steps prepare the relevant features for the subsequent modeling and analysis.

#Raw Data
data<-jsonlite::fromJSON("C:/Users/ACERIndonesiaTest/OneDrive/Documents/data-task-sample-main/data-sample.json")

#Mutate taskCreated/Completed Date & Hour from both task time columns
Data<-data %>%
  mutate(
    taskCreatedTime_parsed = ymd_hms(taskCreatedTime, tz = "Asia/Jakarta"),
    taskCompletedTime_parsed = ymd_hms(taskCompletedTime, tz = "Asia/Jakarta"),
    taskCreatedDate = as.Date(taskCreatedTime_parsed),
    taskCreatedHour = format(taskCreatedTime_parsed, "%H:%M:%S"),
    taskCompletedDate = as.Date(taskCompletedTime_parsed),
    taskCompletedHour = format(taskCompletedTime_parsed, "%H:%M:%S")
  )

# Add New Columns (taskDuration, hour_created, origin_dest, weekday, is_weekend)
Data$taskDuration <- as.numeric(difftime(Data$taskCompletedTime_parsed,
                                         Data$taskCreatedTime_parsed, units = "mins"))
Data$hour_created <- lubridate::hour(Data$taskCreatedTime_parsed)  # creation hour in Asia/Jakarta time
Data$origin_dest <- paste(Data$UserVar$branch_origin, Data$UserVar$branch_dest, sep = "_")
Data$weekday <- wday(Data$taskCreatedTime_parsed, label = TRUE)
Data$is_weekend <- ifelse(Data$weekday %in% c("Sat","Sun"), 1, 0);listviewer::jsonedit(Data)
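
Because ymd_hms() silently returns NA for timestamps it cannot parse, a few quick sanity checks on the derived columns are worthwhile before building the modeling dataset; the sketch below only uses the columns created above.

# Timestamps that failed to parse (ymd_hms returns NA on malformed input)
sum(is.na(Data$taskCreatedTime_parsed)); sum(is.na(Data$taskCompletedTime_parsed))
# Distribution of task duration in minutes; negative values would indicate
# a completion time recorded before the creation time
summary(Data$taskDuration)
# Weekday and weekend breakdown of task creation
table(Data$weekday); table(Data$is_weekend)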

B2. Cleaned Data

The cleaned dataset contains key variables for modeling, including taskAssignedTo (courier, converted to factor), UserVar.weight (package weight as numeric), hour.created (task creation hour), origin.dest (combination of origin and destination branches as factor), is.weekend (weekend indicator as factor), and cod (COD receipt status converted to factor). This preprocessing ensures that all variables have appropriate data types for analysis and subsequent machine learning modeling.

Dat<-data.frame(taskAssignedTo=Data$taskAssignedTo%>%as.factor(),
           UserVar.weight=Data$UserVar$weight%>%as.numeric(),
           hour.created=Data$hour_created,
           origin.dest=Data$origin_dest%>%as.factor(),
           is.weekend=Data$is_weekend%>%as.factor(),
           cod=Data$cod$received%>%as.factor());Dat

C. Splitting

Data were split into 75% for training and 25% for testing using createDataPartition, which stratifies the split on the cod outcome, with a fixed seed to ensure reproducibility, preparing datasets for model training and evaluation.

set.seed(123);i<-createDataPartition(Dat$cod,p=0.75,list=F)
dtr<-Dat[i,]
dts<-Dat[-i,]
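
Because createDataPartition() stratifies on the outcome passed to it, the COD class proportions should be nearly identical in both subsets; the quick check below is a sketch and not part of the original pipeline.

# Compare COD class proportions in the full, training, and testing sets
round(rbind(full  = prop.table(table(Dat$cod)),
            train = prop.table(table(dtr$cod)),
            test  = prop.table(table(dts$cod))), 3)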

Training Set

trh<-as.h2o(dtr);trh%>%as.data.frame()

Testing Set

tsh<-as.h2o(dts);tsh%>%as.data.frame()

D. Hyperparameter Tuning

D1. Hyperparameter Grid

Hyperparameter tuning for Random Forest was performed using grid search to identify the optimal parameter combination that produced the best performance based on Area Under Curve (AUC). Parameters tested included ntrees (number of trees) [200, 500, 1000], max_depth (maximum tree depth) [10, 20, 30, 40], min_rows (minimum rows per leaf) [1, 5, 10, 20], and sample_rate (proportion of data used per tree) [0.7 to 1.0]. Variations of mtries (number of variables considered at each split) ranged from default (-1) to fractions based on the total number of predictors, and col_sample_rate_per_tree (proportion of features used per tree) was tested at [0.6, 0.8, 1.0]. This approach ensured comprehensive model exploration to capture the most effective hyperparameter combinations for COD prediction.

p <- length(setdiff(names(trh), "cod"))  # number of predictor columns
hyper_params <- list(
  ntrees = c(200, 500, 1000),                  # number of trees
  max_depth = c(10, 20, 30, 40),               # maximum tree depth
  min_rows = c(1, 5, 10, 20),                  # minimum observations per leaf
  sample_rate = c(0.7, 0.8, 0.9, 1.0),         # row sampling rate per tree
  mtries = c(-1, sqrt(p), p/3, p/2),           # predictors tried per split (-1 = h2o default)
  col_sample_rate_per_tree = c(0.6, 0.8, 1.0)  # column sampling rate per tree
);listviewer::jsonedit(hyper_params)
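
For reference, the grid above defines far more combinations than the 100 models evaluated in the next step; the size of the full Cartesian grid can be computed directly from the hyper_params list.

# Total number of parameter combinations in the full Cartesian grid
prod(lengths(hyper_params))   # 3 * 4 * 4 * 4 * 4 * 3 = 2304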

D2. Search Criteria

Hyperparameter tuning used the RandomDiscrete search strategy with max_models = 100, meaning up to 100 randomly chosen models were evaluated out of all possible parameter combinations, which is far more efficient than an exhaustive (Cartesian) search. This strategy enabled broad exploration of the parameter space at a lower computational cost, while seed = 1234 ensured reproducible results.

search_criteria <- list(
  strategy = "RandomDiscrete",
  max_models = 100,
  seed = 1234
);listviewer::jsonedit(search_criteria)

D3. Grid Search Hyperparameter Tuning

Hyperparameter tuning was performed using grid search with the predefined search criteria and parameters to obtain the best model based on AUC value.

rf_grid <- h2o.grid(
  algorithm = "randomForest",
  grid_id = "rf11gd",
  x = setdiff(names(trh), "cod"),
  y = "cod",
  training_frame = trh,
  validation_frame = tsh,
  hyper_params = hyper_params,
  search_criteria = search_criteria
)

E. Results

E1. The Grid Results Sorted by AUC

Grid search results showed that the best Random Forest model was obtained with col_sample_rate_per_tree = 0.6, max_depth = 40, min_rows = 1, mtries = -1, ntrees = 500, and sample_rate = 1, achieving an AUC of 0.9959. This model was selected from the 100 randomly sampled hyperparameter combinations evaluated, with AUC values across all models ranging from 0.98 to 0.9959, indicating that most models already performed very well. These findings suggest that although the best model had a specific hyperparameter combination, performance was stable across combinations, allowing trade-offs between performance and model complexity to be considered if needed.

# Reload previously saved grid-search results (cached from an earlier run)
rf_grid<-read_rds("C:/Users/ACERIndonesiaTest/OneDrive/Documents/DTA Grid Search Hyperpar RF1.rds")
# Retrieve the grid from the H2O cluster and sort its models by AUC (best first)
sorted_grid <- h2o.getGrid("rf11gd", sort_by = "AUC", decreasing = T)
bm<-sorted_grid@summary_table%>%as.data.frame()%>%column_to_rownames("model_ids");bm
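
The stability claim can be checked by summarising the AUC column of the sorted summary table; the sketch below assumes the metric column is named auc, which may vary across h2o versions.

# Spread of validation AUC across all models tried by the random grid search
# (the metric column name is assumed to be "auc")
bm$auc %>% as.numeric() %>% summary()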

E2. Best Model Details

The best Random Forest model consisted of 500 trees with a maximum depth of 40, using min_rows = 1 so that splits could continue down to the most granular level. col_sample_rate_per_tree = 0.6 meant only 60% of features were considered per tree to enhance generalisation, sample_rate = 1 meant each tree was trained on all rows, and mtries = -1 used h2o's default number of features per split (the square root of the number of predictors for classification). The model predicts COD (Cash on Delivery) receipt from taskAssignedTo (courier), UserVar.weight (package weight), hour.created (delivery hour), origin.dest (route), and is.weekend (delivery day). This combination shows the model was designed to capture complex patterns in the COD data with high predictive potential.

Main Parameters

best_rf <- h2o.getModel(sorted_grid@model_ids[[1]]);best_rf@parameters%>%listviewer::jsonedit()

All Parameters

best_rf@allparameters%>%listviewer::jsonedit()

E3. Variable Importance of Best Model

Results showed origin.dest (route) was the most influential factor for COD success (52%), indicating origin-destination combinations significantly affect package acceptance. taskAssignedTo (25%) highlighted significant performance differences among couriers, suggesting opportunities for improvement through evaluation and training. hour.created (15%) also had an impact, showing certain delivery times had higher acceptance rates. Meanwhile, UserVar.weight (7%) had a small but relevant influence on COD preferences based on package weight, and is.weekend (2%) had almost no impact, indicating stable COD behaviour between weekdays and weekends.

Insight:

  • Prioritise routes with high success rates or evaluate strategies for routes with low success rates.

  • Schedule COD deliveries at optimal times for operational efficiency.

  • Improve courier competency and service standards through data-driven coaching.

h2o.varimp(best_rf)%>%ggplot(aes(x=variable%>%reorder(percentage),y=percentage*100,fill=percentage*100))+
  geom_bar(stat="identity",width=0.7)+scale_fill_gradient(low="steelblue",high="navy")+
  theme_minimal()+theme(panel.grid = element_blank(),axis.title = element_blank(),
                        axis.text = element_text(size=11),legend.title = element_blank(),
                        plot.title = element_text(hjust=0.5,face="bold"))+
  geom_label(aes(label=scales::percent(round(percentage,2))),
                            size=4,col="white")+coord_flip()+labs(title="Variable Importance Plot of Best RF Model")

E4. Evaluate the Best Model on Test Set

Evaluation on the test data shows that the model achieved a very high AUC of 0.9959, indicating an almost perfect ability to distinguish between classes. The confusion matrix shows high accuracy with only 13 errors out of 588 data points (2.2% error rate), where specificity reached 99.8% (414/415) and recall for the TRUE class was 93% (161/173). Additionally, the highest F1 score was 0.96 at a threshold of 0.64, and maximum accuracy was 97.8%. These results demonstrate that the model has excellent predictive performance and stability in classifying COD status on the test dataset.

AUC

perf_best <- h2o.performance(best_rf, newdata = tsh);perf_best@metrics$AUC[[1]]
## [1] 0.9958772

Confusion Matrix

perf_best@metrics$cm$table

Max Criteria and Metric Scores

perf_best@metrics$max_criteria_and_metric_scores
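
The same figures can also be retrieved through h2o's metric accessor functions rather than reaching into the @metrics slots; a brief sketch on the performance object above, assuming h2o.F1() returns the full threshold/F1 table when no threshold is supplied.

# Equivalent accessors on the H2OModelMetrics object
h2o.auc(perf_best)                # AUC on the test set
h2o.confusionMatrix(perf_best)    # confusion matrix at the default threshold
h2o.F1(perf_best) %>% head()      # F1 across candidate thresholds (assumed full table)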

E5. How Features Influence COD Predictions (SHAP Analysis)

SHAP explains the magnitude and direction of each factor’s influence on model predictions, where positive values increase COD acceptance probability and negative values decrease it. In the plot, blue indicates data points with low factor values, while red indicates high factor values, making it easier to interpret the impact of low or high values on predictions.

Results showed origin.dest (route combinations) had the largest influence, meaning certain routes significantly increased or decreased COD success likelihood. taskAssignedTo (assigned courier) also had a large impact with both positive and negative contributions, indicating performance differences among couriers affecting COD outcomes. For hour.created (delivery hour), deliveries in early morning hours (blue) tended to decrease COD acceptance, while midday to evening hours (red) increased it. UserVar.weight (package weight) had a small impact, with heavier packages slightly reducing COD acceptance, while is.weekend (delivery day) had almost no influence, indicating stable COD acceptance between weekdays and weekends. In summary, route, courier, and delivery time are key factors that can be optimised to improve COD success rates.

hsh<-h2o.shap_summary_plot(best_rf, tsh) 
hsh + labs(title = "SHAP Summary Plot – Best RF Model", y = "SHAP Contribution", x = "Feature") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5), axis.text = element_text(color = "black", size = 9),
        legend.title = element_text(face = "bold"), panel.grid = element_blank())
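
Beyond the summary plot, per-delivery SHAP values can be extracted as a frame for further analysis, for example to aggregate contributions by route; the sketch below uses h2o's contribution predictor for tree-based models.

# Per-row SHAP contributions (one column per feature plus a BiasTerm column)
shap_vals <- h2o.predict_contributions(best_rf, tsh) %>% as.data.frame()
head(shap_vals)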

F. Insights and Recommendations

F1. Insights

The analysis revealed that route combinations (origin.dest), courier assignments (taskAssignedTo), and delivery hours (hour.created) are the most influential factors determining COD success. Specifically, certain routes significantly increased or decreased acceptance rates, suggesting that operational strategies should prioritise or adjust these routes. Courier performance varied considerably, highlighting opportunities for targeted coaching to standardise service quality and improve success rates across the team. Delivery time also played a key role, with morning deliveries generally having lower acceptance, while midday to evening deliveries showed higher COD success, indicating potential for scheduling optimisation.

Package weight was found to have a small negative effect, with heavier packages slightly reducing acceptance probability, which could inform decisions on payment method options for specific weight categories. Additionally, day of the week (is.weekend) did not significantly impact COD success, suggesting operational scheduling can remain flexible across weekdays and weekends without affecting outcomes.

Overall, the Random Forest model demonstrated outstanding predictive performance (AUC 0.9959) with high accuracy and stability, confirming that the selected features effectively capture patterns associated with COD success. These insights provide clear directions for operational improvements, courier training, and strategic decision-making to optimise delivery efficiency and customer experience.

F2. Recommendations

Based on these findings, several practical recommendations are proposed to optimise operational efficiency and improve COD success rates.

  • Prioritise routes with high success rates and re-evaluate strategies for routes with consistently low acceptance.

  • Schedule deliveries during midday to evening hours to maximise COD success rates.

  • Implement coaching programs for couriers to address performance gaps and standardise service quality.

  • Consider payment method strategies for heavier packages to improve acceptance.

  • Integrate model predictions into operational planning tools to enable data-driven decision-making at scale.
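
As a starting point for the last recommendation, the trained model can be persisted and used to score incoming tasks before dispatch. The sketch below is a minimal illustration, assuming new tasks arrive in a hypothetical new_tasks data frame with the same predictors and factor levels as the training data.

# Persist the best model so an operational scoring job can reload it later
model_path <- h2o.saveModel(best_rf, path = "models", force = TRUE)

# Score newly created tasks (new_tasks is a hypothetical data frame with the same
# predictors: taskAssignedTo, UserVar.weight, hour.created, origin.dest, is.weekend)
scored <- h2o.predict(h2o.loadModel(model_path), as.h2o(new_tasks)) %>% as.data.frame()
head(scored)   # predicted class and class probabilities for each task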