DATA 604 - Final_Simulation_Project

Bikash Bhowmik - 23 May 2026

Instructions

Develop a simulation model that represents the process of fulfilling a customer order, where a product is picked from a warehouse and delivered to the customer after being purchased through an e-commerce platform.

Introduction

This project simulates the last-mile delivery process used by modern e-commerce companies such as Amazon. A discrete-event simulation is developed in R using the simmer package to model the complete flow from order arrival through warehouse picking and packing to courier-based delivery. The goal is to measure performance (lead time, SLA compliance) and find the number of couriers needed to achieve a 95% on-time delivery target at the lowest cost.

E-commerce companies rely heavily on efficient warehouse operations and delivery logistics. Once a customer places an order on a website, multiple processes occur:

  1. Order enters system
  2. Warehouse worker picks items
  3. Items are packed
  4. A courier loads the package
  5. Delivery is made to the customer

This simulation replicates these steps and evaluates different staffing levels.

This model is inspired by:

  • Amazon fulfillment center robotics
  • Optimization strategies described in last-mile delivery research
  • Warehouse process engineering references provided in class

The final objective is to:

✔ Evaluate system performance under different courier staffing levels
✔ Meet SLA target: 95% of deliveries within 120 minutes
✔ Minimize cost per delivered order

Conceptual Model

The conceptual model defines the structure of the simulation system by identifying its core components and their interactions.

Entities

  • Customer orders entering the system

Resources

  • Pickers (warehouse workers retrieving items)
  • Packers (staff responsible for packaging)
  • Couriers (drivers delivering orders)

Queues

  • Orders waiting for picking
  • Orders waiting for packing
  • Orders waiting for courier dispatch

Processes

  1. Order arrival into the system
  2. Item picking from warehouse
  3. Order packing
  4. Courier delivery to customer

Performance Metrics

  • Average lead time (order to delivery)
  • SLA compliance (delivery within 120 minutes)
  • Cost per order

System Flow
Order → Picking → Packing → Delivery

This diagram visually represents the flow of orders through the system, including processing stages and queues between resources.

Figure: Conceptual flow of the last-mile delivery system

This conceptual structure ensures that the simulation captures the essential operations of an e-commerce last-mile delivery system.

The interaction between system components follows a structured dependency across stages. Orders first interact with pickers through the picking queue, where each order must secure a picker before processing begins. After picking, orders move to the packing stage, which an order can enter only once picking is complete, establishing a sequential dependency. Finally, couriers serve as downstream resources, interacting only with fully packed orders. Because each resource is released before the next one is requested and queues have no capacity limit, a courier shortage does not block picking or packing. Instead, packed orders accumulate in the courier queue, and that waiting time dominates total lead time.

Key Assumptions

  • Orders are independent and identically distributed
    This simplifies the model and is reasonable because customer orders typically arrive randomly and independently in large e-commerce systems.

  • No order cancellations or returns
    This keeps the model focused on the forward fulfillment process and avoids reverse logistics complexity.

  • FIFO queue discipline at all stages
    FIFO reflects standard warehouse operations where orders are processed in arrival order to ensure fairness.

  • Resources are always available (no breakdowns)
    This removes downtime variability and allows focus on system capacity and flow performance.

  • Travel times are independent of traffic conditions
    This approximates average delivery behavior without requiring complex real-time traffic modeling.

  • No batching or routing optimization is considered
    This isolates the impact of resource levels, especially couriers, without introducing routing complexity.

Process Logic

The simulation follows a discrete-event process flow where each order moves sequentially through the system.

  • Orders arrive based on a Poisson process, meaning interarrival times follow an exponential distribution

  • Each order enters the picking stage, where it seizes a picker if one is available and otherwise waits in a queue

  • After picking, the order moves to packing, following the same logic

  • Once packed, the order proceeds to the delivery stage, where it needs an available courier; if none is free, it waits in the delivery queue
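The arrival assumption in the first bullet can be illustrated with a small base-R sketch. It assumes the rate of 20 orders per hour stated in the model parameters later in this report; the seed and sample size are arbitrary choices made only for this illustration.

```r
# Sketch: order arrival times from a Poisson process (base R only).
set.seed(604)                  # arbitrary seed, for repeatability of this sketch only
lambda_per_min <- 20 / 60      # 20 orders/hour expressed per minute
n_orders <- 5000               # illustrative sample size

# Exponential gaps between consecutive orders; cumulative sums give arrival times
interarrival <- rexp(n_orders, rate = lambda_per_min)
arrival_time <- cumsum(interarrival)

# The average gap should be close to 1 / lambda = 3 minutes,
# and the realised arrival rate close to 20 orders per hour
mean_gap <- mean(interarrival)
orders_per_hour <- n_orders / (max(arrival_time) / 60)
```

The same `rexp()` call, drawing one gap at a time, is what the generator in the model uses to schedule the next order.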

Queue Discipline

  • First-In-First-Out (FIFO) is assumed for all stages

Resource Constraints

  • Limited number of pickers, packers, and couriers
  • Couriers are varied (3–15) to test system performance

Processing Times

  • Picking and packing follow exponential distributions
  • Delivery time follows a truncated normal distribution

Key Assumption

  • No priority orders or interruptions (non-preemptive system)

This logic replicates real-world warehouse operations where orders must pass through each stage before completion.

Each event in the simulation corresponds to a discrete transition such as an order arrival, the start of service when a resource is seized, or the completion of service when the resource is released. These events trigger state changes in the system, including queue updates and resource allocation, which collectively define the dynamic behavior of the process over time.
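To make the seize, hold, and release cycle concrete, here is a minimal base-R sketch of a single FIFO stage with several identical servers. It is not how simmer is implemented internally; `simulate_stage` is a hypothetical helper written for this illustration, and it assumes arrivals are already sorted by time.

```r
# Sketch: one FIFO stage with `servers` identical resources (base R only).
# Under FIFO, each order starts service at the later of its arrival time
# and the time the earliest server becomes free.
simulate_stage <- function(arrival, service, servers) {
  free_at <- rep(0, servers)            # time at which each server next becomes idle
  start   <- numeric(length(arrival))
  finish  <- numeric(length(arrival))
  for (i in seq_along(arrival)) {
    k <- which.min(free_at)                   # server that frees up first
    start[i]   <- max(arrival[i], free_at[k]) # "seize": wait if no server is free
    finish[i]  <- start[i] + service[i]       # "timeout": hold the server
    free_at[k] <- finish[i]                   # "release": server becomes free again
  }
  data.frame(arrival, start, finish, wait = start - arrival)
}

# Three orders, one server, 5 minutes of service each
toy <- simulate_stage(arrival = c(0, 1, 2), service = c(5, 5, 5), servers = 1)
```

In the toy case the second and third orders wait because the single server is still busy; with `servers = 3` none of them would wait.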

Process Explanation

When a customer places an order, it enters the system and begins its journey through the warehouse and delivery network.

First, the order joins the picking queue. If a picker is available, the item retrieval process starts immediately; otherwise, the order waits. After picking is completed, the order moves to the packing stage, where it is prepared for shipment.

Once packed, the order enters the delivery queue, where it must wait for an available courier. This stage is critical because courier availability directly impacts delivery time. If couriers are limited, orders accumulate in the queue, increasing delays.

After a courier is assigned, the order is delivered to the customer, completing the process. The total time from arrival to delivery is recorded as the lead time, which determines whether the order meets the Service Level Agreement (SLA).

The simulation shows that while picking and packing are important, the delivery stage is the primary bottleneck, as it has the greatest impact on overall system performance and customer satisfaction.

This behavior is consistent with real-world e-commerce operations such as Amazon, where delays in courier availability often lead to late deliveries, even when warehouse processing is efficient.

Model Description

Note: This simulation model is implemented using the simmer package in R. While Simio is commonly used for discrete-event simulation, simmer provides equivalent functionality including event scheduling, resource allocation, and queue-based process modeling. The same simulation logic and structure applied in Simio are replicated here programmatically.

Mapping to Simio Objects

  • add_generator() → Source (order arrivals)
  • seize() → Server (resource allocation)
  • timeout() → Processing time
  • release() → Resource release
  • Queues are automatically managed before each seize step

In Simio terminology, each stage in the process (picking, packing, delivery) behaves like a Server where entities are processed. The seize() function represents the entity entering a Server and occupying a resource, while release() corresponds to completion of service and freeing the Server. The order arrival created by add_generator() is equivalent to a Source, and the completed delivery represents a Sink where entities exit the system. The movement between stages conceptually follows Paths connecting Servers, even though movement is abstracted in this implementation.

This mapping ensures that the same discrete-event logic used in Simio is preserved.

In addition to this mapping, the simulation implicitly represents several other Simio constructs. The queues that form before each seize() step function as input buffers, where entities wait until resources become available. Resource capacities defined in add_resource() correspond to Server capacity in Simio, limiting how many entities can be processed simultaneously and therefore bounding system throughput. Together, these elements ensure that both congestion effects and resource constraints are captured.

System Flow (Modeled After Amazon Warehouse)

  1. Order Arrival
    Orders arrive according to a Poisson process: λ = 20 orders/hour.

  2. Picking Process
    Item retrieval time is modeled as an exponential distribution (mean = 5 minutes).

  3. Packing Process
    Packaging time is exponential (mean = 3 minutes).

  4. Courier Dispatch & Delivery
    Travel time is drawn from a normal distribution with mean 30 min and SD 10 min, then limited to the range 5 to 60 min to avoid unrealistic extreme values. The code clamps out-of-range draws to the nearest bound rather than resampling them.
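The delivery-time rule in step 4 can be implemented in two ways that are easy to confuse. The sketch below, in base R with an arbitrary seed and sample size, contrasts clamping (what `pmax(pmin(x, 60), 5)` does) with truncation by rejection. It is an illustration under these assumptions, not a change to the model.

```r
# Sketch: clamping versus truncating a Normal(30, 10) travel time to [5, 60] minutes.
set.seed(604)                 # arbitrary seed, for this illustration only
n   <- 100000
raw <- rnorm(n, mean = 30, sd = 10)

# Clamping: out-of-range draws are moved to the nearest bound,
# which creates small point masses at exactly 5 and 60
clamped <- pmax(pmin(raw, 60), 5)

# Truncation by rejection: out-of-range draws are discarded instead,
# so fewer than n values remain and none sit exactly on a bound
truncated <- raw[raw >= 5 & raw <= 60]

# Share of draws affected, compared with the normal tail probabilities
share_at_bounds   <- mean(clamped == 5 | clamped == 60)
theoretical_share <- pnorm(5, 30, 10) + (1 - pnorm(60, 30, 10))
```

With these parameters fewer than 1% of draws fall outside the bounds, so the two approaches should give very similar lead times here.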

SLA Target

  • Each order must be delivered within 120 minutes.

Simulation Parameters

  • Horizon: 1 week (7 days)
  • Warm-up removed: first 24 hours
  • Couriers tested: 3 to 15
  • Costs: driver wage $20/hour, vehicle cost $8/hour
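Before running anything, the parameters above allow a quick back-of-the-envelope capacity check. The sketch below computes the offered load (arrival rate times mean service time) for each stage. It assumes the mean delivery time is about 30 minutes, ignoring the small effect of limiting the range, and uses 11 couriers purely as an example from the tested range.

```r
# Sketch: offered load and utilization per stage (base R only).
lambda_per_hr <- 20
stages <- data.frame(
  stage            = c("picker", "packer", "courier"),
  mean_service_min = c(5, 3, 30),
  servers          = c(3, 2, 11)   # 11 couriers is an example level, not a given
)

# Offered load = average number of busy servers needed to keep up with demand
stages$offered_load <- lambda_per_hr * (stages$mean_service_min / 60)
stages$utilization  <- stages$offered_load / stages$servers

# A stage can only keep up in the long run if servers > offered load
min_couriers <- floor(lambda_per_hr * (30 / 60)) + 1
```

Under these assumptions the courier stage needs more than 10 busy couriers on average, so queues would be expected to keep growing over the whole horizon whenever 10 or fewer couriers are staffed.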

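library(simmer)   # discrete-event simulation engine
library(dplyr)    # filter(), mutate(), summarise(), bind_rows(), tibble()
library(ggplot2)  # plots in the Results section

params <- list(
  lambda_orders  = 20 / 60,     # arrival rate: 20 orders/hour, in orders per minute
  pick_mean      = 5,           # mean picking time (minutes)
  pack_mean      = 3,           # mean packing time (minutes)
  travel_mean    = 30,          # mean delivery time (minutes)
  travel_sd      = 10,          # SD of delivery time (minutes)
  sla_minutes    = 120,         # SLA threshold (minutes)
  horizon_min    = 7 * 24 * 60, # simulated horizon: 1 week
  warmup_min     = 24 * 60,     # warm-up period: first 24 hours
  wage_per_hr    = 20,          # driver wage ($/hour)
  vehicle_per_hr = 8,           # vehicle cost ($/hour)
  stage_mean     = 2            # not used by the model (the staging step is commented out)
)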
simulate_last_mile <- function(num_couriers, params) {

  env <- simmer("warehouse_delivery")

  # Each order seizes, holds, and releases one unit of each resource in turn
  order_path <- trajectory("order") %>%
    # Picking: exponential service time
    seize("picker", 1) %>%
    timeout(function() rexp(1, 1 / params$pick_mean)) %>%
    release("picker", 1) %>%
    # Packing: exponential service time
    seize("packer", 1) %>%
    timeout(function() rexp(1, 1 / params$pack_mean)) %>%
    release("packer", 1) %>%
    # Delivery: normal travel time clamped to [5, 60] minutes
    seize("courier", 1) %>%
    timeout(function() pmax(pmin(rnorm(1, params$travel_mean, params$travel_sd), 60), 5)) %>%
    release("courier", 1)

  env %>%
    add_resource("picker", 3) %>%
    add_resource("packer", 2) %>%
    # add_resource("stager", 1) %>%   # staging step disabled
    add_resource("courier", num_couriers) %>%
    add_generator("order",
                  order_path,
                  function() rexp(1, rate = params$lambda_orders))

  env %>% run(until = params$horizon_min)

  # get_mon_arrivals() returns completed orders only by default;
  # keep those that finished after the warm-up period
  arr <- get_mon_arrivals(env) %>%
    filter(end_time > params$warmup_min) %>%
    mutate(
      lead_time  = end_time - start_time,
      within_sla = lead_time <= params$sla_minutes
    )

  # Summary metrics
  mean_lead <- mean(arr$lead_time)
  p_sla     <- mean(arr$within_sla)
  n_orders  <- nrow(arr)

  # Note: courier cost is charged over the full horizon (warm-up included),
  # while n_orders counts only orders delivered after the warm-up
  hours_total <- params$horizon_min / 60
  total_cost  <- num_couriers * hours_total * (params$wage_per_hr + params$vehicle_per_hr)
  cost_order  <- total_cost / n_orders

  tibble(
    num_couriers = num_couriers,
    mean_lead    = mean_lead,
    p_sla        = p_sla,
    cost_order   = cost_order,
    n_orders     = n_orders
  )

}
courier_range <- 3:15

# Run `reps` independent replications for each courier level and stack the results
run_experiment <- function(couriers, reps = 10, params) {
  out <- lapply(couriers, function(k) {   # k avoids shadowing base::c
    replicate(reps, simulate_last_mile(k, params), simplify = FALSE) %>%
      bind_rows()
  })
  bind_rows(out)
}

results <- run_experiment(courier_range, 10, params)

Model Validation

The model was validated through multiple checks:

  • Logical validation: The process flow correctly follows order → picking → packing → delivery.
  • Bottleneck behavior: As expected, courier capacity becomes the limiting factor at low staffing levels.
  • Replication stability: Results were averaged over multiple replications to ensure consistency.
  • Face validity: Output metrics such as lead time and SLA performance align with realistic expectations for e-commerce delivery systems.

Each scenario was simulated with 10 replications to reduce stochastic variability and ensure stable performance estimates.


Results

Summary Table

summary_results <- results %>%
  group_by(num_couriers) %>%
  summarise(
    avg_lead = mean(mean_lead),
    sla_rate = mean(p_sla),
    cost = mean(cost_order)
  )

library(gt)

summary_results %>%
  gt() %>%
  fmt_number(
    columns = c(avg_lead, sla_rate, cost),
    decimals = 2
  ) %>%
  cols_label(
    num_couriers = "Couriers",
    avg_lead = "Avg Lead Time",
    sla_rate = "SLA Rate",
    cost = "Cost per Order"
  ) %>%
  tab_header(
    title = "Courier Performance Summary"
  )
Courier Performance Summary

Couriers   Avg Lead Time (min)   SLA Rate   Cost per Order ($)
       3              4,048.31       0.00                16.33
       4              3,424.24       0.00                16.33
       5              2,884.75       0.00                16.33
       6              2,350.39       0.00                16.38
       7              1,706.95       0.00                16.30
       8              1,183.81       0.00                16.36
       9                605.42       0.02                16.33
      10                172.69       0.36                16.41
      11                 53.48       0.99                17.91
      12                 44.13       1.00                19.57
      13                 41.98       1.00                21.19
      14                 41.22       1.00                22.78
      15                 40.70       1.00                24.71

Lead Time Plot

ggplot(summary_results, aes(num_couriers, avg_lead)) +
  geom_line(color = "blue") +
  geom_point() +
  geom_hline(yintercept = params$sla_minutes, linetype = "dashed", color = "red") +
  labs(title = "Average Delivery Lead Time vs Couriers",
       x = "Number of Couriers", y = "Lead Time (minutes)")

Average lead time falls steeply as couriers are added, from about 4,048 minutes with 3 couriers to about 53 minutes with 11, which shows that delivery capacity is the main bottleneck in the system. With 10 or fewer couriers, orders spend most of their time waiting in the delivery queue. Beyond 11 couriers the gains flatten: going from 11 to 15 couriers lowers the average by only about 13 minutes (53.48 to 40.70), indicating diminishing returns. Adding couriers is therefore most effective up to this threshold.

SLA Compliance Plot

ggplot(summary_results, aes(num_couriers, sla_rate)) +
  geom_line(color = "darkgreen") +
  geom_point() +
  geom_hline(yintercept = 0.95, linetype = "dashed", color = "red") +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "SLA Compliance vs Couriers",
       x = "Couriers", y = "Probability of On-Time Delivery")

SLA compliance behaves like a threshold rather than a gradual improvement: it is essentially zero for 3 to 8 couriers, reaches only 2% at 9 and 36% at 10, and then jumps to 99% at 11 couriers, where the system first meets the 95% target. Beyond this point, additional couriers provide only marginal improvement (100% from 12 couriers onward). This confirms that courier availability is the key driver of on-time delivery performance.

SLA by courier count

ggplot(summary_results, aes(x = factor(num_couriers), y = sla_rate)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 0.95, linetype = "dashed", color = "red") +
  labs(
    title = "SLA Compliance by Number of Couriers",
    x = "Number of Couriers",
    y = "SLA Rate"
  )

The bar chart shows how SLA performance varies across courier levels and highlights the sharp improvement between 10 and 11 couriers (36% to 99%). The 95% SLA threshold is first exceeded at 11 couriers, which marks the smallest staffing level that meets the target. This representation makes the decision boundary easier to see than the line plot.

Cost per Order Plot

ggplot(summary_results, aes(num_couriers, cost)) +
  geom_line(color = "purple") +
  geom_point() +
  labs(title = "Cost per Order vs Couriers",
       x = "Couriers", y = "Cost per Order ($)")

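Cost per order is nearly flat at about $16.3 for 3 to 10 couriers. In that range every courier is fully busy, so the number of delivered orders grows in proportion to the number of couriers and the two effects cancel. From 11 couriers onward, deliveries are limited by demand rather than courier capacity, and cost per order rises almost linearly, from $17.91 at 11 couriers to $24.71 at 15. While more couriers improve service levels, they also raise total system cost, which creates a clear trade-off between cost and service quality.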
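The shape of the cost curve can be approximated without simulation. The sketch below is a deterministic approximation, not the simulation itself. It assumes each busy courier completes about 2 deliveries per hour (mean trip of 30 minutes), that delivered orders are capped by the arrival rate of 20 per hour, and, as in the model code, that courier cost is charged over the full 168-hour horizon while orders are counted over the 144 hours after warm-up.

```r
# Sketch: deterministic approximation of cost per delivered order (base R only).
couriers                  <- 3:15
lambda_per_hr             <- 20
deliveries_per_courier_hr <- 60 / 30   # assumes a mean trip of about 30 minutes
cost_per_courier_hr       <- 20 + 8    # wage plus vehicle cost ($/hour)
horizon_hr                <- 7 * 24    # hours for which couriers are paid
counted_hr                <- horizon_hr - 24   # hours in which deliveries are counted

# Throughput is capped by courier capacity or by demand, whichever is smaller
throughput_per_hr <- pmin(lambda_per_hr, couriers * deliveries_per_courier_hr)

approx_cost <- couriers * horizon_hr * cost_per_courier_hr /
  (throughput_per_hr * counted_hr)

cost_table <- data.frame(couriers, throughput_per_hr,
                         approx_cost = round(approx_cost, 2))
```

The approximation ignores randomness and end-of-run effects, so it should be read as a rough check on the simulated costs rather than a replacement for them.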

Sensitivity Analysis: Impact of Pickers

simulate_with_pickers <- function(num_pickers, params) {

  env <- simmer("warehouse_delivery")

  order_path <- trajectory("order") %>%
    seize("picker", 1) %>%
    timeout(function() rexp(1, 1/params$pick_mean)) %>%
    release("picker", 1) %>%

    seize("packer", 1) %>%
    timeout(function() rexp(1, 1/params$pack_mean)) %>%
    release("packer", 1) %>%

    seize("courier", 1) %>%
    timeout(function() pmax(pmin(rnorm(1, params$travel_mean, params$travel_sd),60), 5)) %>%
    release("courier", 1)

  env %>%
    add_resource("picker", num_pickers) %>%
    add_resource("packer", 2) %>%
    add_resource("courier", 11) %>%  # keep optimal couriers fixed
    add_generator("order", order_path,
                  function() rexp(1, rate = params$lambda_orders))

  env %>% run(until = params$horizon_min)

  arr <- get_mon_arrivals(env) %>%
    filter(end_time > params$warmup_min) %>%
    mutate(lead_time = end_time - start_time)

  tibble(
    pickers = num_pickers,
    mean_lead = mean(arr$lead_time)
  )
}

picker_results <- bind_rows(lapply(2:6, simulate_with_pickers, params = params))

ggplot(picker_results, aes(pickers, mean_lead)) +
  geom_line() +
  geom_point() +
  labs(title="Impact of Pickers on Lead Time",
       x="Number of Pickers", y="Lead Time")

Increasing the number of pickers reduces lead time initially, but the improvement is relatively small compared to courier changes. This indicates that picking is not the primary bottleneck in the system. Once a minimum capacity is reached, additional pickers provide diminishing returns, which supports the finding that delivery resources dominate system performance. Note that each picker level is simulated only once here, so small differences between levels may reflect run-to-run noise.

SLA with 95% Confidence Intervals

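ci_results <- results %>%
  group_by(num_couriers) %>%
  summarise(
    mean_sla = mean(p_sla),
    sd_sla = sd(p_sla),
    n = n(),
    se = sd_sla / sqrt(n),
    # With only 10 replications, use the t quantile instead of the normal 1.96
    t_crit = qt(0.975, df = n - 1),
    lower = mean_sla - t_crit * se,
    upper = mean_sla + t_crit * se
  )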

ggplot(ci_results, aes(num_couriers, mean_sla)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.3) +
  labs(title="SLA with 95% Confidence Intervals",
       x="Couriers", y="SLA")

The confidence intervals show the statistical reliability of the SLA estimates across simulation replications. The relatively narrow intervals indicate low variability in system performance.

This suggests that the simulation results are stable and not driven by random fluctuations. The optimal solution at 11 couriers remains consistent within the confidence bounds, reinforcing the robustness of the decision.

The observed system behavior is consistent with queueing theory principles, where increasing service capacity reduces waiting time in a nonlinear manner. In particular, as the number of couriers increases, system utilization decreases, leading to shorter queues and improved service levels. This behavior resembles a multi-server queueing system, where delays grow rapidly as utilization approaches capacity and diminish once sufficient service capacity is introduced. The alignment between simulation results and theoretical expectations provides additional validation of the model’s correctness.
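As a rough theoretical reference for the courier stage, the Erlang C formula for an M/M/c queue gives the probability that an order has to wait and the mean waiting time. This is only an approximation for this model: delivery times here are clamped normal rather than exponential, and orders reach the courier stage after passing through picking and packing. `erlang_c` is a helper written for this sketch, not part of simmer.

```r
# Sketch: Erlang C waiting probability and mean wait for an M/M/c queue (base R only).
erlang_c <- function(servers, offered_load) {
  stopifnot(servers > offered_load)     # formula is valid only for a stable queue
  b <- 1
  for (k in seq_len(servers)) {
    b <- offered_load * b / (k + offered_load * b)   # Erlang B recursion
  }
  rho <- offered_load / servers
  b / (1 - rho * (1 - b))               # convert Erlang B to Erlang C
}

lambda_per_min   <- 20 / 60             # arrival rate (orders per minute)
mean_service_min <- 30                  # assumed mean courier time per order
offered_load     <- 20 * (30 / 60)      # 10 Erlangs

couriers <- 11:15
p_wait <- sapply(couriers, erlang_c, offered_load = offered_load)

# Mean time in queue for M/M/c: P(wait) / (c * mu - lambda)
mean_wait_min <- p_wait / (couriers / mean_service_min - lambda_per_min)
```

Because the simulated delivery times are less variable than exponential ones, the waits in the simulation would be expected to be somewhat shorter than these M/M/c values.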

Animation

🛒 Order ➡️ 📦 Picking ➡️ 📋 Packing ➡️ 🚚 Delivery ➡️ 💰 Completed

E-Commerce Order Fulfillment Animation

The animation illustrates the end-to-end flow of the e-commerce order fulfillment process modeled in this simulation. Each stage represents a key operational step in the system: order placement, warehouse picking, packing, courier dispatch, and final delivery.

The process begins when a customer order enters the system. The order then moves sequentially through the warehouse operations, where it is first picked by a warehouse worker and then packed for shipment. After packing, the order waits for an available courier before being dispatched for last-mile delivery.

This visual representation reflects the discrete-event simulation logic implemented using the simmer package in R, where each stage corresponds to a service node with resource constraints and queueing behavior.

The animation helps demonstrate how orders progress through the system and highlights the dependency between stages. In particular, when couriers are scarce, packed orders accumulate in the delivery queue, which increases overall lead time and reduces SLA performance.

This aligns with real-world e-commerce systems such as Amazon, where the delivery phase is typically the primary bottleneck affecting customer satisfaction and on-time delivery performance.

Optimization

Goal:

  • SLA ≥ 95%
  • Minimize cost per order

best <- summary_results %>%
  filter(sla_rate >= 0.95) %>%
  arrange(cost) %>%
  slice(1)
best
# A tibble: 1 × 4
  num_couriers avg_lead sla_rate  cost
         <int>    <dbl>    <dbl> <dbl>
1           11     53.5    0.992  17.9

Cost vs SLA Tradeoff Curve

ggplot(summary_results, aes(cost, sla_rate)) +
  geom_point(size = 3) +
  geom_line() +
  geom_hline(yintercept = 0.95, linetype = "dashed", color = "red") +
  labs(title = "Cost vs SLA Tradeoff Curve",
       x = "Cost per Order ($)",
       y = "SLA Compliance")

The cost vs. SLA tradeoff curve clearly illustrates the relationship between service quality and operational expense. As the number of couriers increases, SLA compliance improves but at a higher cost per order. The optimal staffing level occurs at 11 couriers, where the system first satisfies the 95% SLA requirement while maintaining the lowest possible delivery cost.

In the base scenario, staffing 11 couriers results in an average lead time of about 53 minutes, SLA compliance of 99.2%, and an estimated cost of $17.91 per order. This is the smallest courier level that meets the 95% SLA requirement, and therefore the cheapest one that does so.

The variability across replications highlights the stochastic nature of the system, particularly due to random arrivals and service times. Despite this uncertainty, the optimal staffing level remains stable, reinforcing the robustness of the 11-courier solution.

Conclusion

The simulation model represents warehouse staff and couriers and studies how their capacity affects the overall last-mile delivery process.

The following observations can be made based on the simulation results. Couriers are the most effective lever to improve the lead time and the overall SLA for the delivery service:

  • With a higher number of couriers, the lead time is lower and SLA is higher.
  • Cost per order is roughly flat (about $16.3) for 3 to 10 couriers, while couriers are saturated, and then rises almost linearly from 11 couriers upward, which establishes a clear cost–service trade-off.
  • The effective staffing level is 11 couriers, which provides a sufficiently high SLA (≥ 95%) at the lowest cost per order among the levels that meet it (about $17.91).
  • With lower staffing, the lack of couriers is the performance bottleneck, as long delivery queues cause most orders to miss the SLA deadline.
  • Picking and packing resources can also become bottlenecks when their capacities are low; however, their effect on system performance is weaker than that of courier capacity.

The observations are relevant to the actual settings of Amazon and e-commerce operations. In particular, robotics is being employed to increase the picking rates in the warehouses, the packing stations are often the bottleneck in the system, and the availability of the couriers is a major driver of customer experience in on-time delivery service.

The model abstracts away several real-world complexities, such as traffic variability, dynamic routing, peak-hour demand, and driver heterogeneity. Adding these features would be a natural extension of the model.

The model demonstrates the existence of an optimal operating point for a given level of service and cost that is helpful for data-informed decision-making.