Develop a model that illustrates how a product is dispatched from a warehouse and delivered to the customer after it has been purchased through an e-commerce website.
This project simulates the last-mile delivery process used by modern
e-commerce companies such as Amazon. A discrete-event simulation is
developed in R using the simmer package to
model the complete flow from order arrival through warehouse picking
and packing to courier-based delivery.
The goal is to measure performance (lead time, SLA compliance) and
optimize the number of couriers needed to achieve a 95%
on-time delivery target while minimizing cost.
E-commerce companies rely heavily on efficient warehouse operations
and delivery logistics.
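Once a customer places an order on a website, multiple processes
occur: the order arrives at the warehouse, its items are picked, the
order is packed, and a courier delivers it to the customer.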
This simulation replicates these steps and evaluates different staffing levels.
This model is inspired by:
The final objective is to:
✔ Evaluate system performance under different courier staffing
levels
✔ Meet SLA target: 95% of deliveries within 120
minutes
✔ Minimize cost per delivered order
Order Arrival
Orders arrive according to a Poisson process: λ = 20
orders/hour.
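A Poisson arrival process is equivalent to independent exponential gaps between orders. The minimal sketch below illustrates that link for the stated rate; the sample size, the seed, and the variable names (`gaps`, `arrival_times`, `orders_per_hour`) are illustrative assumptions, not part of the model.

```r
# Sketch: inter-arrival times for a Poisson process with rate 20 orders/hour.
set.seed(1)                                  # arbitrary seed for this sketch only
lambda_per_min <- 20 / 60                    # arrival rate in orders per minute
gaps <- rexp(10000, rate = lambda_per_min)   # exponential gaps between orders (minutes)
arrival_times <- cumsum(gaps)                # clock time of each arrival (minutes)

mean_gap <- mean(gaps)                       # theoretical value: 1 / lambda = 3 minutes
orders_per_hour <- length(gaps) / (max(arrival_times) / 60)
print(c(mean_gap = mean_gap, orders_per_hour = orders_per_hour))
```

The simulation works in minutes, which is why the rate is converted from 20 per hour to 20/60 per minute before being passed to `rexp()`.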
Picking Process
Item retrieval time is modeled as an exponential distribution (mean = 5
minutes).
Packing Process
Packaging time is exponential (mean = 3 minutes).
Courier Dispatch & Delivery
Travel time is modeled as a normal distribution with mean 30 min and SD
10 min, truncated between 5 and 60 min to avoid unrealistic extreme
values. In the code, draws outside this range are clamped to the nearest bound.
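The model code bounds the travel time with `pmin()` and `pmax()`, which clamps out-of-range draws onto 5 or 60 rather than redrawing them. The sketch below contrasts that with a rejection-sampled truncated normal, using the same mean, SD, and bounds. The helper names `clamp_travel` and `rtrunc_travel` are hypothetical and exist only for this comparison.

```r
# Clamping: out-of-range draws are moved onto the nearest bound (as in the model).
clamp_travel <- function(n, mean = 30, sd = 10, lo = 5, hi = 60) {
  pmax(pmin(rnorm(n, mean, sd), hi), lo)
}

# Rejection sampling: out-of-range draws are discarded and redrawn.
rtrunc_travel <- function(n, mean = 30, sd = 10, lo = 5, hi = 60) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- rnorm(n, mean, sd)
    out <- c(out, x[x >= lo & x <= hi])
  }
  out[seq_len(n)]
}

set.seed(1)  # arbitrary seed for this sketch only
clamped   <- clamp_travel(1e5)
truncated <- rtrunc_travel(1e5)

# Share of draws sitting exactly on a bound: small but positive when clamping,
# zero when truncating by rejection.
share_clamped   <- mean(clamped == 5 | clamped == 60)
share_truncated <- mean(truncated == 5 | truncated == 60)
print(c(clamped = share_clamped, truncated = share_truncated))
```

With bounds 2.5 SD below and 3 SD above the mean, fewer than 1% of draws are affected, so the two approaches should give very similar results here.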
Each order must be delivered within 120 minutes of its arrival.
#Libraries
library(simmer)
library(dplyr)
library(ggplot2)
library(simmer.plot)
set.seed(123)
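params <- list(
  lambda_orders  = 20/60,     # arrival rate (orders per minute)
  pick_mean      = 5,         # mean picking time (minutes)
  pack_mean      = 3,         # mean packing time (minutes)
  travel_mean    = 30,        # mean courier travel time (minutes)
  travel_sd      = 10,        # SD of courier travel time (minutes)
  sla_minutes    = 120,       # SLA threshold on lead time (minutes)
  horizon_min    = 7*24*60,   # simulated horizon: one week (minutes)
  warmup_min     = 24*60,     # warm-up period: one day (minutes)
  wage_per_hr    = 20,        # courier wage ($ per hour)
  vehicle_per_hr = 8,         # vehicle cost ($ per hour)
  stage_mean     = 2          # staging time; not used in the current model
)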
simulate_last_mile <- function(num_couriers, params) {
  env <- simmer("warehouse_delivery")

  # Each order is picked, packed, and then delivered by a courier
  order_path <- trajectory("order") %>%
    seize("picker", 1) %>%
    timeout(function() rexp(1, 1/params$pick_mean)) %>%
    release("picker", 1) %>%
    seize("packer", 1) %>%
    timeout(function() rexp(1, 1/params$pack_mean)) %>%
    release("packer", 1) %>%
    seize("courier", 1) %>%
    # Normal travel time, clamped to the range [5, 60] minutes
    timeout(function() pmax(pmin(rnorm(1, params$travel_mean, params$travel_sd),60), 5)) %>%
    release("courier", 1)

  env %>%
    add_resource("picker", 3) %>%
    add_resource("packer", 2) %>%
    ##add_resource("stager", 1) %>%
    add_resource("courier", num_couriers) %>%
    add_generator("order",
                  order_path,
                  function() rexp(1, rate = params$lambda_orders))

  env %>% run(until = params$horizon_min)

  # Completed orders only, excluding those that finished during warm-up
  arr <- get_mon_arrivals(env) %>%
    filter(end_time > params$warmup_min) %>%
    mutate(
      lead_time = end_time - start_time,
      within_sla = lead_time <= params$sla_minutes
    )

  # Summary metrics
  mean_lead <- mean(arr$lead_time)
  p_sla <- mean(arr$within_sla)
  n_orders <- nrow(arr)

  # Courier cost accrues over the full horizon (warm-up included),
  # while n_orders counts only post-warm-up completions
  hours_total <- params$horizon_min / 60
  total_cost <- num_couriers * hours_total * (params$wage_per_hr + params$vehicle_per_hr)
  cost_order <- total_cost / n_orders

  tibble(
    num_couriers = num_couriers,
    mean_lead = mean_lead,
    p_sla = p_sla,
    cost_order = cost_order,
    n_orders = n_orders
  )
}
courier_range <- 3:15

run_experiment <- function(couriers, reps = 10, params) {
  out <- lapply(couriers, function(c) {
    replicate(reps, simulate_last_mile(c, params), simplify = FALSE) %>%
      bind_rows()
  })
  bind_rows(out)
}

results <- run_experiment(courier_range, 10, params)

summary_results <- results %>%
  group_by(num_couriers) %>%
  summarise(
    avg_lead = mean(mean_lead),
    sla_rate = mean(p_sla),
    cost = mean(cost_order)
  )
summary_results
## # A tibble: 13 × 4
## num_couriers avg_lead sla_rate cost
## <int> <dbl> <dbl> <dbl>
## 1 3 4048. 0 16.3
## 2 4 3424. 0 16.3
## 3 5 2885. 0 16.3
## 4 6 2350. 0 16.4
## 5 7 1707. 0 16.3
## 6 8 1184. 0 16.4
## 7 9 605. 0.0217 16.3
## 8 10 173. 0.357 16.4
## 9 11 53.5 0.992 17.9
## 10 12 44.1 1.00 19.6
## 11 13 42.0 1 21.2
## 12 14 41.2 1 22.8
## 13 15 40.7 1 24.7
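A rough offered-load check is consistent with the sharp change between 10 and 11 couriers in the table above. The sketch below multiplies the arrival rate by each stage's mean service time to get the workload in Erlangs, then divides by the number of servers. It uses the nominal 30-minute travel mean and ignores the small effect of clamping, so it is an approximation rather than part of the simulation. The variable names are illustrative.

```r
# Offered load per stage = arrival rate x mean service time (in Erlangs).
lambda_per_hour <- 20
service_mean_min <- c(picker = 5, packer = 3, courier = 30)  # nominal means (minutes)
servers <- c(picker = 3, packer = 2, courier = 11)           # courier count under test

offered_load <- lambda_per_hour * service_mean_min / 60      # 1.67, 1, 10
utilization  <- offered_load / servers                       # 0.56, 0.50, 0.91

# Smallest courier count that keeps courier utilization strictly below 1.
min_couriers <- floor(offered_load[["courier"]]) + 1
print(round(utilization, 2))
print(min_couriers)
```

With 10 couriers the courier stage runs at about 100% utilization, so queues grow without settling; 11 is the first count with spare capacity, which matches the point where lead time falls to roughly 54 minutes.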
ggplot(summary_results, aes(num_couriers, avg_lead)) +
  geom_line(color="blue") +
  geom_point() +
  geom_hline(yintercept = params$sla_minutes, linetype="dashed", color="red") +
  labs(title="Average Delivery Lead Time vs Couriers",
       x="Number of Couriers", y="Lead Time (minutes)")

ggplot(summary_results, aes(num_couriers, sla_rate)) +
  geom_line(color="darkgreen") +
  geom_point() +
  geom_hline(yintercept = 0.95, linetype="dashed", color="red") +
  scale_y_continuous(labels=scales::percent) +
  labs(title="SLA Compliance vs Couriers",
       x="Couriers", y="Probability of On-Time Delivery")

ggplot(summary_results, aes(num_couriers, cost)) +
  geom_line(color="purple") +
  geom_point() +
  labs(title="Cost per Order vs Couriers",
       x="Couriers", y="Cost per Order ($)")
Goal:
SLA ≥ 95%
Minimize cost per order
best <- summary_results %>%
  filter(sla_rate >= 0.95) %>%
  arrange(cost) %>%
  slice(1)
best
## # A tibble: 1 × 4
## num_couriers avg_lead sla_rate cost
## <int> <dbl> <dbl> <dbl>
## 1 11 53.5 0.992 17.9
In the base scenario, staffing 11 couriers results in an average lead time of about 54 minutes, SLA compliance of 99.2%, and an estimated cost of about $17.90 per order. This is the smallest courier count that meets the 95% SLA requirement, and because cost per order rises with every additional courier beyond it, it is also the lowest-cost option that meets the target.
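The reported cost figures can be approximately reproduced by hand from the cost formula in `simulate_last_mile`. The sketch below is a back-of-the-envelope check, not a rerun of the simulation: it assumes every post-warm-up arrival is delivered when staffing is sufficient, and that each courier completes about two deliveries per hour when the system is saturated. The helper names are hypothetical.

```r
hours_total   <- 7 * 24          # full horizon in hours (cost accrues over all of it)
hourly_rate   <- 20 + 8          # wage plus vehicle cost, $ per courier-hour
counted_hours <- 7 * 24 - 24     # hours after the one-day warm-up

# Cost per order as defined in the model: total courier cost / completed orders.
cost_per_order <- function(num_couriers, n_orders) {
  num_couriers * hours_total * hourly_rate / n_orders
}

# Sufficient staffing: roughly 20 orders/hour are completed after warm-up.
expected_orders <- 20 * counted_hours            # 2880
cost_11 <- cost_per_order(11, expected_orders)   # about 17.97

# Saturated staffing: throughput is capped near 2 deliveries per courier-hour,
# so the courier count cancels and cost per order is flat.
cost_saturated <- function(num_couriers) {
  cost_per_order(num_couriers, num_couriers * 2 * counted_hours)
}
print(round(c(cost_11 = cost_11, saturated = cost_saturated(5)), 2))
```

Under these assumptions the check gives about $17.97 for 11 couriers and about $16.33 for any understaffed level, close to the 17.9 and 16.3 to 16.4 reported in the results table. The flat cost at low staffing reflects orders that are never delivered within the horizon, not an efficient operating point.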
The simulation model represents warehouse staff and couriers as limited resources and studies how courier staffing affects the overall last-mile delivery process.
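The following observations can be made based on the simulation results. Couriers are the most effective lever for improving lead time and SLA compliance: with 10 or fewer couriers, average lead time ranges from about 170 to over 4,000 minutes and SLA compliance stays below 36%, whereas 11 couriers bring the average lead time down to about 54 minutes with 99.2% compliance.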
These observations are relevant to real-world operations at Amazon and other e-commerce companies. In particular, robotics is being used to increase picking rates in warehouses, packing stations are often a bottleneck, and courier availability is a major driver of on-time delivery and therefore of customer experience.
The model abstracts away several real-world complexities, such as traffic variability, dynamic routing, peak-hour demand, and driver heterogeneity. Adding these features is a natural way to extend the model.
The model shows that, for a given service target, there is an efficient operating point that balances service level against cost, which supports data-informed staffing decisions.