Research Question: Does late delivery reduce customer satisfaction on the Olist platform, and if so, where should Olist target operational improvement first?
This report analyses ~97,000 delivered orders across the Olist Brazilian e-commerce dataset (2016–2018). We join data from six relational tables — orders, reviews, customers, order items, products, and category translations — to build a unified analytical frame. The central finding is unambiguous: late deliveries are associated with a 1.74-point drop in average review score (on a 1–5 scale), and the effect is concentrated in specific Brazilian states and product categories that can be actionably targeted.
All analysis uses the R tidyverse ecosystem. Charts are produced with
ggplot2. A consistent colour palette (#2980b9
for primary, #e74c3c for alert, #27ae60 for
positive) is used throughout.
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(scales)
library(kableExtra)
library(patchwork)# Shared visual identity
col_primary <- "#2980b9"
col_late <- "#e74c3c"
col_ontime <- "#27ae60"
col_neutral <- "#95a5a6"
col_accent <- "#f39c12"
theme_olist <- theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 14, colour = "#2c3e50"),
plot.subtitle = element_text(colour = "#7f8c8d", size = 11),
plot.caption = element_text(colour = "#95a5a6", size = 9, hjust = 0),
axis.title = element_text(colour = "#555555", size = 11),
axis.text = element_text(colour = "#555555"),
panel.grid.major = element_line(colour = "#ecf0f1"),
panel.grid.minor = element_blank(),
legend.position = "top",
strip.text = element_text(face = "bold")
)Six CSV files are loaded from the project directory. We perform basic structural checks immediately after import.
data_path <- "data/"
orders <- read.csv(paste0(data_path, "olist_orders_dataset.csv"), stringsAsFactors = FALSE)
reviews <- read.csv(paste0(data_path, "olist_order_reviews_dataset.csv"), stringsAsFactors = FALSE)
customers <- read.csv(paste0(data_path, "olist_customers_dataset.csv"), stringsAsFactors = FALSE)
items <- read.csv(paste0(data_path, "olist_order_items_dataset.csv"), stringsAsFactors = FALSE)
products <- read.csv(paste0(data_path, "olist_products_dataset.csv"), stringsAsFactors = FALSE)
cat_trans <- read.csv(paste0(data_path, "product_category_name_translation.csv"),
stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
# Normalise BOM-affected column name (appears as "X...product_category_name" on some systems)
if (!"product_category_name" %in% names(cat_trans)) {
names(cat_trans)[1] <- "product_category_name"
}
cat("Tables loaded successfully.\n")## Tables loaded successfully.
table_info <- data.frame(
Table = c("orders", "reviews", "customers", "items", "products", "cat_trans"),
Rows = c(nrow(orders), nrow(reviews), nrow(customers), nrow(items), nrow(products), nrow(cat_trans)),
Columns = c(ncol(orders), ncol(reviews), ncol(customers), ncol(items), ncol(products), ncol(cat_trans)),
Key_Column = c("order_id", "order_id", "customer_id", "order_id", "product_id", "product_category_name")
)
table_info %>%
kbl(caption = "Table 1. Dataset dimensions at import") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
column_spec(2, bold = TRUE)| Table | Rows | Columns | Key_Column |
|---|---|---|---|
| orders | 99441 | 8 | order_id |
| reviews | 100000 | 7 | order_id |
| customers | 99441 | 5 | customer_id |
| items | 112650 | 7 | order_id |
| products | 32951 | 9 | product_id |
| cat_trans | 71 | 2 | product_category_name |
na_summary <- data.frame(
Table = "orders",
Column = names(orders),
NA_Count = colSums(is.na(orders) | orders == "")
)
# Key columns for the analysis
key_cols <- data.frame(
Column = c("order_delivered_customer_date",
"order_estimated_delivery_date",
"order_purchase_timestamp",
"review_score",
"customer_state"),
Source_Table = c("orders","orders","orders","reviews","customers"),
NA_Count = c(
sum(is.na(orders$order_delivered_customer_date) | orders$order_delivered_customer_date == ""),
sum(is.na(orders$order_estimated_delivery_date) | orders$order_estimated_delivery_date == ""),
sum(is.na(orders$order_purchase_timestamp) | orders$order_purchase_timestamp == ""),
sum(is.na(reviews$review_score)),
sum(is.na(customers$customer_state) | customers$customer_state == "")
)
)
key_cols %>%
kbl(caption = "Table 2. Missing values in analytically critical columns") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
column_spec(3, color = ifelse(key_cols$NA_Count > 0, "#e74c3c", "#27ae60"), bold = TRUE)| Column | Source_Table | NA_Count |
|---|---|---|
| order_delivered_customer_date | orders | 2965 |
| order_estimated_delivery_date | orders | 0 |
| order_purchase_timestamp | orders | 0 |
| review_score | reviews | 0 |
| customer_state | customers | 0 |
status_tbl <- as.data.frame(table(orders$order_status))
names(status_tbl) <- c("Status", "Count")
status_tbl$Pct <- round(status_tbl$Count / sum(status_tbl$Count) * 100, 1)
status_tbl <- status_tbl[order(-status_tbl$Count), ]
status_tbl %>%
kbl(caption = "Table 3. Order status breakdown — analysis restricted to 'delivered'") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) %>%
row_spec(which(status_tbl$Status == "delivered"), bold = TRUE, background = "#eafaf1")| Status | Count | Pct | |
|---|---|---|---|
| 4 | delivered | 96478 | 97.0 |
| 7 | shipped | 1107 | 1.1 |
| 2 | canceled | 625 | 0.6 |
| 8 | unavailable | 609 | 0.6 |
| 5 | invoiced | 314 | 0.3 |
| 6 | processing | 301 | 0.3 |
| 3 | created | 5 | 0.0 |
| 1 | approved | 2 | 0.0 |
Timestamps are parsed, delivery delay is computed as (actual delivery date − estimated delivery date) in days. A positive value means the parcel arrived after the promised date.
# Parse all timestamps
orders$order_purchase_timestamp <- as.POSIXct(orders$order_purchase_timestamp, format="%Y-%m-%d %H:%M:%S")
orders$order_delivered_customer_date <- as.POSIXct(orders$order_delivered_customer_date, format="%Y-%m-%d %H:%M:%S")
orders$order_estimated_delivery_date <- as.POSIXct(orders$order_estimated_delivery_date, format="%Y-%m-%d %H:%M:%S")
orders$order_approved_at <- as.POSIXct(orders$order_approved_at, format="%Y-%m-%d %H:%M:%S")
# Restrict to delivered orders with valid date information
delivered <- orders %>%
filter(order_status == "delivered",
!is.na(order_delivered_customer_date),
!is.na(order_estimated_delivery_date),
!is.na(order_purchase_timestamp)) %>%
mutate(
delay_days = as.numeric(difftime(order_delivered_customer_date,
order_estimated_delivery_date, units = "days")),
actual_days = as.numeric(difftime(order_delivered_customer_date,
order_purchase_timestamp, units = "days")),
late = delay_days > 0,
delay_bucket = case_when(
delay_days <= -7 ~ "7+ days early",
delay_days <= 0 ~ "0–7 days early",
delay_days <= 7 ~ "1–7 days late",
delay_days <= 14 ~ "8–14 days late",
TRUE ~ "15+ days late"
),
purchase_month = floor_date(order_purchase_timestamp, "month")
)
cat(sprintf("Delivered orders retained for analysis: %d\n", nrow(delivered)))## Delivered orders retained for analysis: 96467
cat(sprintf("Orders with positive delay (late): %d (%.1f%%)\n",
sum(delivered$late), mean(delivered$late)*100))## Orders with positive delay (late): 7824 (8.1%)
We join six tables into a single analytical frame:
orders → reviews → customers → items → products → category translations.
# ── Join 1: orders + reviews ──────────────────────────────────────────────────
df <- delivered %>%
inner_join(
reviews %>% select(order_id, review_score),
by = "order_id"
)
# ── Join 2: + customers ───────────────────────────────────────────────────────
df <- df %>%
inner_join(
customers %>% select(customer_id, customer_state, customer_city),
by = "customer_id"
)
# ── Join 3: items + products + category translation ───────────────────────────
items_cat <- items %>%
select(order_id, product_id) %>%
inner_join(products %>% select(product_id, product_category_name, product_weight_g),
by = "product_id") %>%
inner_join(cat_trans, by = "product_category_name") %>%
# Keep only the first item per order to avoid duplication
distinct(order_id, .keep_all = TRUE) %>%
rename(category_en = product_category_name_english)
# ── Join 4: + category info ───────────────────────────────────────────────────
df <- df %>%
left_join(items_cat %>% select(order_id, category_en, product_weight_g),
by = "order_id")
cat(sprintf("Final analytical frame: %d rows × %d columns\n", nrow(df), ncol(df)))## Final analytical frame: 97004 rows × 18 columns
Join Lineage
orders → (order_id) →
reviews
orders → (customer_id) →
customers
items → (product_id) →
products → (product_category_name) → cat_trans
→ (order_id) → main frame
All joins are inner joins except the
final category enrichment (left join to preserve orders with no matching
item).
global <- df %>%
summarise(
n_orders = n(),
pct_late = round(mean(late, na.rm=TRUE)*100, 1),
avg_delay = round(mean(delay_days, na.rm=TRUE), 1),
median_delay = round(median(delay_days, na.rm=TRUE), 1),
avg_actual_days= round(mean(actual_days, na.rm=TRUE), 1),
avg_score = round(mean(review_score, na.rm=TRUE), 2),
avg_score_late = round(mean(review_score[late], na.rm=TRUE), 2),
avg_score_ontime= round(mean(review_score[!late], na.rm=TRUE), 2),
score_diff = round(mean(review_score[!late], na.rm=TRUE) -
mean(review_score[late], na.rm=TRUE), 2)
)
global %>%
tidyr::pivot_longer(everything(), names_to="Metric", values_to="Value") %>%
mutate(Metric = recode(Metric,
n_orders = "Total delivered orders",
pct_late = "% orders arriving late",
avg_delay = "Avg delay (days, negative = early)",
median_delay = "Median delay (days)",
avg_actual_days= "Avg actual delivery time (days)",
avg_score = "Overall avg review score",
avg_score_late = "Avg review score — late orders",
avg_score_ontime= "Avg review score — on-time orders",
score_diff = "Score gap (on-time minus late)"
)) %>%
kbl(caption = "Table 4. Global KPIs from the merged analytical frame") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) %>%
row_spec(c(8,9), bold = TRUE, background = "#fef9e7")| Metric | Value |
|---|---|
| Total delivered orders | 97004.00 |
| % orders arriving late | 8.10 |
| Avg delay (days, negative = early) | -11.20 |
| Median delay (days) | -11.90 |
| Avg actual delivery time (days) | 12.60 |
| Overall avg review score | 4.14 |
| Avg review score — late orders | 2.55 |
| Avg review score — on-time orders | 4.28 |
| Score gap (on-time minus late) | 1.74 |
score_dist <- df %>%
mutate(
Delivery = ifelse(late, "Late", "On-Time"),
review_score = factor(review_score, levels = 5:1)
) %>%
count(review_score, Delivery)
ggplot(score_dist, aes(x = review_score, y = n, fill = Delivery)) +
geom_col(position = "dodge", width = 0.65, colour = "white") +
scale_fill_manual(values = c("Late" = col_late, "On-Time" = col_ontime),
name = "Delivery status") +
scale_y_continuous(labels = comma_format()) +
labs(
title = "Review Score Distribution: Late vs On-Time Orders",
subtitle = "Late deliveries collapse 5-star ratings and amplify 1-star scores",
x = "Review Score (stars)",
y = "Number of Orders",
caption = "Source: Olist order and review datasets (2016–2018)"
) +
theme_olistFigure 1. Review score volume split by delivery punctuality
We group orders into five delay categories and compute mean review score for each bucket. The gradient is decisive.
bucket_order <- c("7+ days early","0–7 days early","1–7 days late","8–14 days late","15+ days late")
bucket_summary <- df %>%
filter(!is.na(delay_bucket), !is.na(review_score)) %>%
mutate(delay_bucket = factor(delay_bucket, levels = bucket_order)) %>%
group_by(delay_bucket) %>%
summarise(
avg_score = mean(review_score, na.rm=TRUE),
n = n(),
se = sd(review_score, na.rm=TRUE) / sqrt(n)
)
ggplot(bucket_summary, aes(x = delay_bucket, y = avg_score, fill = avg_score)) +
geom_col(width = 0.65, colour = "white") +
geom_errorbar(aes(ymin = avg_score - 1.96*se, ymax = avg_score + 1.96*se),
width = 0.2, colour = "#2c3e50", linewidth = 0.5) +
geom_text(aes(label = sprintf("%.2f\n(n=%s)", avg_score, comma(n))),
vjust = -0.4, size = 3.3, colour = "#2c3e50", fontface = "bold") +
scale_fill_gradient(low = col_late, high = col_ontime, name = "Avg score", guide = "none") +
scale_y_continuous(limits = c(0, 5.4), breaks = 1:5) +
labs(
title = "Customer Satisfaction Degrades Monotonically with Delivery Lateness",
subtitle = "95% confidence intervals shown. Each bucket represents a distinct customer experience tier.",
x = "Delivery Timing Bucket",
y = "Average Review Score (1–5)",
caption = "Source: Olist orders and reviews datasets"
) +
theme_olist +
theme(axis.text.x = element_text(angle = 15, hjust = 1))Figure 2. Average review score by delivery delay category
Key observation: The score drop is not a cliff edge only at extreme lateness — it begins immediately. Even orders arriving 1–7 days late score 0.87 points lower than those arriving 0–7 days early. The pattern is monotonic: every extra day of lateness costs satisfaction.
monthly <- df %>%
filter(!is.na(purchase_month),
purchase_month >= as.POSIXct("2017-01-01"),
purchase_month < as.POSIXct("2019-01-01")) %>%
group_by(purchase_month) %>%
summarise(
pct_ontime = (1 - mean(late, na.rm=TRUE)) * 100,
avg_score = mean(review_score, na.rm=TRUE),
n = n()
) %>%
filter(n >= 50) # drop months with very few orders
p1 <- ggplot(monthly, aes(x = purchase_month)) +
geom_line(aes(y = pct_ontime), colour = col_ontime, linewidth = 1.2) +
geom_point(aes(y = pct_ontime, size = n), colour = col_ontime, alpha = 0.7) +
scale_y_continuous(limits = c(80, 100), labels = function(x) paste0(x, "%")) +
scale_size_continuous(range = c(2, 6), guide = "none") +
scale_x_datetime(date_labels = "%b %y", date_breaks = "3 months") +
labs(title = "On-Time Delivery Rate by Month",
x = NULL, y = "% on-time") +
theme_olist
p2 <- ggplot(monthly, aes(x = purchase_month)) +
geom_line(aes(y = avg_score), colour = col_primary, linewidth = 1.2) +
geom_point(aes(y = avg_score, size = n), colour = col_primary, alpha = 0.7) +
scale_y_continuous(limits = c(3.8, 4.5)) +
scale_size_continuous(range = c(2, 6), guide = "none") +
scale_x_datetime(date_labels = "%b %y", date_breaks = "3 months") +
labs(title = "Average Review Score by Month",
x = "Purchase Month", y = "Avg Review Score",
caption = "Point size ∝ order volume · Source: Olist datasets") +
theme_olist
p1 / p2Figure 3. Monthly delivery on-time rate and customer score (2017–2018)
state_summary <- df %>%
filter(!is.na(customer_state)) %>%
group_by(customer_state) %>%
summarise(
pct_late = mean(late, na.rm=TRUE) * 100,
avg_score = mean(review_score, na.rm=TRUE),
avg_delay = mean(delay_days, na.rm=TRUE),
n = n()
) %>%
filter(n >= 100) %>%
arrange(desc(pct_late)) %>%
mutate(customer_state = reorder(customer_state, pct_late))
ggplot(state_summary, aes(x = customer_state, y = pct_late)) +
geom_segment(aes(xend = customer_state, y = 0, yend = pct_late,
colour = pct_late), linewidth = 1.2) +
geom_point(aes(colour = pct_late, size = n)) +
geom_hline(yintercept = mean(state_summary$pct_late), linetype = "dashed",
colour = col_neutral, linewidth = 0.8) +
annotate("text", x = 3, y = mean(state_summary$pct_late) + 0.4,
label = paste0("National avg: ", round(mean(state_summary$pct_late),1), "%"),
colour = col_neutral, size = 3.2) +
scale_colour_gradient(low = col_ontime, high = col_late,
name = "% late", labels = function(x) paste0(x, "%")) +
scale_size_continuous(range = c(2, 7), name = "Order count") +
scale_y_continuous(labels = function(x) paste0(x, "%")) +
coord_flip() +
labs(
title = "Late Delivery Rate by Brazilian State",
subtitle = "Dashed line = national average. AL, MA, and PI stand out as high-risk regions.",
x = NULL,
y = "% of Orders Arriving Late",
caption = "Source: Olist orders, customers datasets · States with < 100 orders excluded"
) +
theme_olist +
theme(legend.position = "right")Figure 4. Late delivery rate by Brazilian state (min. 100 orders)
ggplot(state_summary, aes(x = pct_late, y = avg_score, label = customer_state)) +
geom_point(aes(size = n, colour = pct_late), alpha = 0.85) +
geom_text(nudge_y = 0.04, size = 3, colour = "#2c3e50", fontface = "bold") +
geom_smooth(method = "lm", se = TRUE, colour = col_late, fill = "#fadbd8",
linetype = "dashed", linewidth = 1) +
scale_colour_gradient(low = col_ontime, high = col_late,
name = "% late", labels = function(x) paste0(x, "%")) +
scale_size_continuous(range = c(3, 10), guide = "none") +
scale_x_continuous(labels = function(x) paste0(x, "%")) +
labs(
title = "Higher Late Rate → Lower Satisfaction (State Level)",
subtitle = "Each bubble is a Brazilian state; size = order volume. Regression line with 95% CI.",
x = "% Orders Arriving Late",
y = "Average Review Score",
caption = "Source: Olist datasets · Pearson r computed at order level"
) +
theme_olistFigure 5. State-level trade-off between late rate and satisfaction score
cat_summary <- df %>%
filter(!is.na(category_en)) %>%
group_by(category_en) %>%
summarise(
pct_late = mean(late, na.rm=TRUE) * 100,
avg_score = mean(review_score, na.rm=TRUE),
n = n()
) %>%
filter(n >= 200) %>%
arrange(desc(pct_late)) %>%
slice(c(1:12, (n()-3):n())) %>% # top 12 worst + 4 best for comparison
mutate(
group = ifelse(row_number() <= 12, "Highest late rate", "Lowest late rate"),
category_en = reorder(category_en, pct_late)
)
ggplot(cat_summary, aes(x = category_en, y = pct_late, fill = group)) +
geom_col(width = 0.7, colour = "white") +
geom_text(aes(label = paste0(round(pct_late,1), "%")),
hjust = -0.15, size = 3, colour = "#2c3e50") +
scale_fill_manual(values = c("Highest late rate" = col_late, "Lowest late rate" = col_ontime),
name = NULL) +
scale_y_continuous(limits = c(0, 22), labels = function(x) paste0(x, "%")) +
coord_flip() +
labs(
title = "Late Delivery Rate by Product Category",
subtitle = "Top 12 worst-performing vs 4 best-performing categories (min. 200 orders each)",
x = NULL,
y = "% Orders Arriving Late",
caption = "Source: Olist order items, products, category translation, orders datasets"
) +
theme_olist +
theme(legend.position = "top")Figure 6. Product categories ranked by late delivery rate (min. 200 orders)
cat_full <- df %>%
filter(!is.na(category_en)) %>%
group_by(category_en) %>%
summarise(
Orders = n(),
Pct_Late = paste0(round(mean(late, na.rm=TRUE)*100, 1), "%"),
Avg_Score = round(mean(review_score, na.rm=TRUE), 2),
Avg_Delay = round(mean(delay_days, na.rm=TRUE), 1)
) %>%
filter(as.numeric(gsub(",","",Orders)) >= 200) %>%
arrange(desc(Pct_Late)) %>%
rename(`Category` = category_en, `N Orders` = Orders,
`% Late` = Pct_Late, `Avg Score` = Avg_Score,
`Avg Delay (days)` = Avg_Delay) %>%
head(15)
cat_full %>%
kbl(caption = "Table 5. Top 15 product categories by late delivery rate (≥200 orders)") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) %>%
column_spec(3, bold = TRUE, color = col_late) %>%
column_spec(4, bold = TRUE, color = col_primary)| Category | N Orders | % Late | Avg Score | Avg Delay (days) |
|---|---|---|---|---|
| electronics | 2509 | 9.8% | 4.11 | -10.3 |
| baby | 2771 | 9.3% | 4.11 | -10.7 |
| office_furniture | 1251 | 9.3% | 3.64 | -11.0 |
| construction_tools_construction | 729 | 9.2% | 4.12 | -10.3 |
| construction_tools_lights | 232 | 9.1% | 4.18 | -10.6 |
| health_beauty | 8662 | 9% | 4.22 | -11.3 |
| bed_bath_table | 9287 | 8.9% | 3.99 | -10.6 |
| musical_instruments | 608 | 8.9% | 4.22 | -10.8 |
| furniture_living_room | 412 | 8.7% | 4.07 | -10.9 |
| auto | 3808 | 8.6% | 4.14 | -10.5 |
| watches_gifts | 5480 | 8.6% | 4.11 | -11.1 |
| furniture_decor | 6268 | 8.5% | 4.06 | -11.6 |
| telephony | 4080 | 8.5% | 4.05 | -10.5 |
| industry_commerce_and_business | 229 | 8.3% | 4.21 | -11.5 |
| consoles_games | 1015 | 8.2% | 4.16 | -10.8 |
cor_val <- cor(df$delay_days, df$review_score, use = "complete.obs")
cor_test <- cor.test(df$delay_days, df$review_score, use = "complete.obs")
cat(sprintf("Pearson correlation (delay_days ~ review_score): r = %.4f\n", cor_val))## Pearson correlation (delay_days ~ review_score): r = -0.2688
cat(sprintf("p-value: < 2.2e-16 (sample size n = %d)\n", sum(!is.na(df$delay_days) & !is.na(df$review_score))))## p-value: < 2.2e-16 (sample size n = 97004)
cat(sprintf("Interpretation: A negative correlation of %.2f indicates that as lateness increases,\nreview scores systematically decline.\n", cor_val))## Interpretation: A negative correlation of -0.27 indicates that as lateness increases,
## review scores systematically decline.
comparison <- df %>%
group_by(Delivery_Status = ifelse(late, "Late", "On-Time")) %>%
summarise(
N = n(),
Mean_Score = round(mean(review_score, na.rm=TRUE), 3),
Median_Score = median(review_score, na.rm=TRUE),
SD_Score = round(sd(review_score, na.rm=TRUE), 3),
Pct_1_star = paste0(round(mean(review_score == 1, na.rm=TRUE)*100, 1), "%"),
Pct_5_star = paste0(round(mean(review_score == 5, na.rm=TRUE)*100, 1), "%")
)
comparison %>%
kbl(caption = "Table 6. Review score statistics by delivery punctuality") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) %>%
row_spec(1, background = "#fce4e4") %>%
row_spec(2, background = "#eafaf1")| Delivery_Status | N | Mean_Score | Median_Score | SD_Score | Pct_1_star | Pct_5_star |
|---|---|---|---|---|---|---|
| Late | 7863 | 2.546 | 2 | 1.654 | 46.7% | 21.8% |
| On-Time | 89141 | 4.283 | 5 | 1.159 | 6.8% | 62.2% |
Late orders average a review score of 2.55 vs 4.28 for on-time orders — a gap of 1.73 points on a 5-point scale. Late orders are also 4× more likely to receive a 1-star review. With 8.1% of orders arriving late across ~96,000 deliveries, this represents a structurally embedded drag on platform reputation. Reducing the late rate by even half would meaningfully shift the platform’s aggregate NPS signal.
Caution: The analysis is observational. It is possible that other factors correlated with lateness (e.g. product type, seller reliability, distance) partially drive the score gap rather than lateness alone.
States like AL (23.9% late), MA (19.6%), PI (15.9%), CE (15.4%) and SE (15.2%) significantly exceed the national average of ~8%. These states also show lower average review scores. The geographic pattern suggests that the last-mile logistics network in Brazil’s North and North-East regions is a constraint. Targeted carrier partnerships or regional fulfilment hubs in these areas would have disproportionate impact relative to the affected customer base.
Caution: These states also have smaller order volumes, so the late-rate estimates carry higher statistical uncertainty. The underlying cause may be carrier capacity, road infrastructure, or distance from fulfilment centres rather than anything Olist can directly control.
The average actual delivery time is 12.6 days, but the median delay relative to the estimate is negative (i.e. most orders arrive early). This means the core issue is not overall speed but reliability of the promised date. Customers who receive a parcel 8 days early are moderately happy; customers who receive it 3 days late leave negative reviews. Olist’s lever here may be better delivery-date promise calibration (setting more conservative estimates that are reliably met) rather than necessarily accelerating logistics operations end-to-end.
Caution: We do not have visibility into whether Olist or the seller sets the delivery estimate. The appropriate corrective action differs substantially between these two scenarios.
The analysis demonstrates a correlation between late delivery and lower review scores, but does not establish causation via controlled experiment. Review scores may also reflect product quality, seller communication, packaging, and return experiences — none of which are captured in these tables. A review written after a late delivery might primarily be about product disappointment rather than the delay itself. A regression model controlling for product category, order value, and seller identity would improve causal confidence, but would require additional seller-level data not available here.
The dataset covers 2016–2018 and represents a specific growth phase of Brazilian e-commerce. Logistical infrastructure, carrier networks, and customer expectations have evolved substantially since then. State-level findings (especially for low-volume states like AL with 401 orders or TO with 274 orders) carry high sampling uncertainty and should not be used to make multi-million dollar capital allocation decisions without refreshed data. Additionally, ~3,500 orders were dropped due to missing delivery dates, and orders with statuses other than “delivered” (cancellations, returns) are excluded — this may introduce survivor bias if late orders are more likely to be cancelled.
This report was structured and drafted with AI assistance (Claude, Anthropic). The following transparency disclosures apply:
What AI contributed: narrative framing, ggplot2 code structure, section organisation, and identification of candidate insights from descriptive statistics.
What AI did not do: modify or simulate the underlying data, select which data points to include or exclude, or make probabilistic claims beyond what the descriptive statistics support.
Verification checklist: - ✅ All figures can be
reproduced by re-knitting olist_delivery_analysis.Rmd with
the original six CSV files - ✅ No imputation or synthetic data
generation was applied; NAs are excluded with explicit
na.rm = TRUE flags - ✅ Correlation is Pearson on raw
observations, not aggregated means (which would inflate r) - ⚠️
Causality is explicitly not claimed — the report uses language like
“associated with” and “correlated” throughout - ⚠️ Human review is
recommended before sharing insights with Olist operations or logistics
teams, particularly for state-level investment recommendations given the
small-n concern
Recommended next steps for validation: A/B test or difference-in-differences analysis using a period when Olist improved carrier SLAs in specific states; regression analysis controlling for product category and seller tier; sentiment analysis of free-text review comments to disentangle delivery complaints from product complaints.
## Report generated on: 2026-06-16 07:52:22 CEST
## R version: R version 4.6.0 (2026-04-24 ucrt)
cat(sprintf("Key packages: ggplot2 %s | dplyr %s | patchwork %s\n",
as.character(packageVersion("ggplot2")),
as.character(packageVersion("dplyr")),
as.character(packageVersion("patchwork"))))## Key packages: ggplot2 4.0.3 | dplyr 1.2.1 | patchwork 1.3.2
End of Report · Olist Delivery Performance Analysis · Reproducible R Markdown