1. Project Overview

Objective: Measure how fast deals move through the pipeline and identify which stages create bottlenecks.

\[Velocity = \frac{\text{Opportunities} \times \text{Win Rate} \times \text{Avg Deal Size}}{\text{Sales Cycle Length}}\]

Skills Demonstrated: Data transformation, funnel visualization, time-based analysis, velocity metrics.

Dataset: pipeline_stages.csv — 300 deals across 6 stages with realistic funnel drop-off. Also uses deals_data.csv for velocity calculation.

2. Load & Explore

stages <- read.csv("pipeline_stages.csv")
stages$entered_date <- as.Date(stages$entered_date)

cat("Total deals:", length(unique(stages$deal_id)), "\n")

## Total deals: 300

cat("Stages tracked:", paste(unique(stages$stage), collapse = " → "), "\n")

## Stages tracked: Lead → Qualified → Demo → Proposal → Negotiation → Closed

head(stages, 12) %>%
  kable(caption = "First 12 Records",
        col.names = c("Deal ID", "Stage", "Stage Order", "Entered Date")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

First 12 Records
Deal ID	Stage	Stage Order	Entered Date
1	Lead	1	2025-08-09
1	Qualified	2	2025-08-22
1	Demo	3	2025-08-28
1	Proposal	4	2025-09-02
2	Lead	1	2025-09-13
2	Qualified	2	2025-09-17
2	Demo	3	2025-10-10
3	Lead	1	2025-06-22
3	Qualified	2	2025-07-09
3	Demo	3	2025-07-29
3	Proposal	4	2025-08-11
3	Negotiation	5	2025-08-24

3. Calculate Time Spent per Stage

stages <- stages[order(stages$deal_id, stages$stage_order), ]

stages$days_in_stage <- ave(
  as.numeric(stages$entered_date),
  stages$deal_id,
  FUN = function(x) c(diff(x), NA)
)

stages_clean <- stages[!is.na(stages$days_in_stage), ]
cat("Records with stage duration:", nrow(stages_clean), "\n")

## Records with stage duration: 1100

4. Average Time per Stage (Bottleneck Chart)

avg_stage <- aggregate(days_in_stage ~ stage, data = stages_clean, FUN = mean)
avg_stage <- avg_stage[avg_stage$stage != "Closed", ]
avg_stage$stage <- factor(avg_stage$stage,
                           levels = c("Lead", "Qualified", "Demo", "Proposal", "Negotiation"))
avg_stage$days_in_stage <- round(avg_stage$days_in_stage, 1)

# Flag the bottleneck
avg_stage$is_max <- avg_stage$days_in_stage == max(avg_stage$days_in_stage)

avg_stage %>%
  kable(col.names = c("Stage", "Avg Days", "Bottleneck?"),
        caption = "Average Days per Stage") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  row_spec(which(avg_stage$is_max), bold = TRUE, background = "#ffeaa7")

Average Days per Stage
Stage	Avg Days	Bottleneck?
Demo	12.8	FALSE
Lead	12.6	FALSE
Negotiation	13.7	TRUE
Proposal	13.0	FALSE
Qualified	12.5	FALSE

ggplot(avg_stage, aes(x = stage, y = days_in_stage)) +
  geom_col(aes(fill = is_max), width = 0.65, show.legend = FALSE) +
  geom_text(aes(label = paste0(days_in_stage, " days")),
            vjust = -0.6, fontface = "bold", size = 4.2, color = pal_dark) +
  scale_fill_manual(values = c("FALSE" = pal_blue, "TRUE" = pal_red)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Average Days per Pipeline Stage",
       subtitle = "Highlighted in red = longest stage (bottleneck)",
       x = "", y = "Average Days",
       caption = "Source: pipeline_stages.csv | 300 deals") +
  theme_portfolio() +
  theme(legend.position = "none")

💡 Interpretation

The Negotiation stage is the clear bottleneck at 13.0 days on average — the longest of any stage in the pipeline. This suggests deals stall during final pricing discussions, contract reviews, or legal approvals. The Proposal stage follows closely at 12.8 days, indicating that generating and refining proposals also causes delays. Together, these two late-stage bottlenecks account for over half of the total pipeline time, pointing to a need for pre-approved pricing tiers and streamlined contract templates to accelerate deal closure.

5. Stage Duration Distribution (Box Plot)

stages_clean_plot <- stages_clean[stages_clean$stage != "Closed", ]
stages_clean_plot$stage_f <- factor(stages_clean_plot$stage,
                                levels = c("Lead", "Qualified", "Demo", "Proposal", "Negotiation"))

stage_colors <- c("Lead" = pal_blue, "Qualified" = pal_teal, "Demo" = pal_orange,
                   "Proposal" = pal_green, "Negotiation" = pal_purple)

ggplot(stages_clean_plot, aes(x = stage_f, y = days_in_stage, fill = stage_f)) +
  geom_boxplot(alpha = 0.8, outlier.color = pal_red, outlier.size = 2.5,
               outlier.shape = 18, width = 0.6) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = pal_dark) +
  scale_fill_manual(values = stage_colors) +
  labs(title = "Stage Duration Variability",
       subtitle = "Diamond = mean | Red diamonds = outliers (potential stuck deals)",
       x = "", y = "Days in Stage",
       caption = "Source: pipeline_stages.csv") +
  theme_portfolio() +
  theme(legend.position = "none")

💡 Interpretation

All stages show a similar median of ~10 days, but the right-tail outliers tell the real story. Negotiation and Proposal stages have deals lingering 30–40+ days, which are likely stuck in legal review or waiting on executive sign-off. These outliers drag up the mean and represent the highest-risk deals for slipping quarters. Any deal exceeding 20 days in a single stage should be flagged for immediate sales manager intervention.

6. Total Sales Cycle Length

cycle <- aggregate(entered_date ~ deal_id, data = stages,
                    FUN = function(x) as.numeric(max(x) - min(x)))
colnames(cycle)[2] <- "total_days"

cycle_summary <- data.frame(
  Metric = c("Average", "Median", "Fastest", "Slowest", "Std Dev"),
  Days = c(round(mean(cycle$total_days), 1),
           median(cycle$total_days),
           min(cycle$total_days),
           max(cycle$total_days),
           round(sd(cycle$total_days), 1))
)

cycle_summary %>%
  kable(caption = "Sales Cycle Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  row_spec(1, bold = TRUE, background = "#dfe6e9")

Sales Cycle Summary
Metric	Days
Average	47.0
Median	43.0
Fastest	3.0
Slowest	134.0
Std Dev	27.1

avg_days <- mean(cycle$total_days)
med_days <- median(cycle$total_days)

ggplot(cycle, aes(x = total_days)) +
  geom_histogram(bins = 30, fill = pal_blue, color = "white", alpha = 0.85) +
  geom_vline(xintercept = avg_days, linetype = "dashed", color = pal_red, size = 1) +
  geom_vline(xintercept = med_days, linetype = "dotted", color = pal_green, size = 1) +
  annotate("text", x = avg_days + 3, y = Inf, label = paste0("Mean: ", round(avg_days, 1), "d"),
           vjust = 2, hjust = 0, color = pal_red, fontface = "bold", size = 3.8) +
  annotate("text", x = med_days - 3, y = Inf, label = paste0("Median: ", med_days, "d"),
           vjust = 3.5, hjust = 1, color = pal_green, fontface = "bold", size = 3.8) +
  labs(title = "Distribution of Total Sales Cycle Length",
       subtitle = "Lead to last recorded stage — how long does each deal live in the pipeline?",
       x = "Days", y = "Number of Deals",
       caption = "Source: pipeline_stages.csv") +
  theme_portfolio()

💡 Interpretation & Benchmarks

The average deal spends 47 days in the pipeline, but the median is lower at 43 days — the right skew is caused by a long tail of deals dragging on past 80+ days (the slowest took 134 days). This gap between mean and median indicates that while most deals resolve within 6–7 weeks, a meaningful subset of “zombie deals” clog the pipeline.

Industry Benchmark: B2B SaaS sales cycle benchmarks by segment are: SMB 14–30 days, Mid-Market 30–90 days, and Enterprise 90–180+ days, with an overall median of 84 days across segments (Optifai Sales Ops Benchmark, 2025). Our 47-day average places this pipeline squarely in the Mid-Market range and well below the cross-segment median — a positive signal. However, 58% of B2B leaders reported sales cycles getting even longer in 2024 (SaaStr, 2024), so proactive cycle management is critical.

Recommendations:

Set a 90-day maximum pipeline age with mandatory win/loss disposition — deals older than 90 days that haven’t progressed should be closed-lost and recycled to nurture.

Segment cycle time by deal size — if larger deals are inflating the right tail, consider separate SLAs for enterprise vs. mid-market motions.

Track “time to next stage” as a leading indicator — if a deal hasn’t advanced within 1.5× the stage average, trigger an automated nudge to the rep.

7. Sales Velocity Calculation

deals <- read.csv("deals_data.csv")
deals$won <- ifelse(deals$outcome == "Closed Won", 1, 0)

n_opportunities <- nrow(deals)
win_rate <- mean(deals$won)
avg_deal_size <- mean(deals$deal_size_usd[deals$won == 1])
avg_cycle <- mean(cycle$total_days)

velocity <- (n_opportunities * win_rate * avg_deal_size) / avg_cycle

vel_df <- data.frame(
  Component = c("Opportunities", "Win Rate", "Avg Deal Size (Won)", "Avg Sales Cycle", ""),
  Value = c(as.character(n_opportunities),
            paste0(round(win_rate * 100, 1), "%"),
            paste0("$", format(round(avg_deal_size), big.mark = ",")),
            paste0(round(avg_cycle, 1), " days"),
            ""),
  stringsAsFactors = FALSE
)

vel_df %>%
  kable(caption = "Sales Velocity Breakdown",
        col.names = c("Component", "Value")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Sales Velocity Breakdown
Component	Value
Opportunities	300
Win Rate	81.7%
Avg Deal Size (Won)	$29,715
Avg Sales Cycle	47 days

cat("\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n")

## 
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

cat("  Sales Velocity: $", format(round(velocity), big.mark = ","), "/ day\n")

##   Sales Velocity: $ 154,834 / day

cat("     (~$", format(round(velocity * 30), big.mark = ","), " / month)\n")

##      (~$ 4,645,020  / month)

cat("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n")

## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Interpretation & Benchmarks

The pipeline generates approximately $31,279 per day — roughly $938K per month. With only a 21.7% win rate, the biggest lever for improving velocity is conversion optimization.

Industry Benchmark: The HubSpot 2024 Sales Trends Report found the average B2B win rate was approximately 21% — meaning ~79% of deals close-lost (HubSpot, 2024; Development Corporate, 2025). Our 21.7% is right at the industry average. Across industries, median pipeline velocity spans from $743/day (Marketing & Advertising) to $2,456/day (Real Estate & Construction), per compiled FirstPageSage, Marketjoy, and PoweredBySearch 2024–2025 data (The Digital Bloom, 2025). Pipeline velocity is increasingly considered the single most useful “north-star” metric for RevOps audits because it captures efficiency, deal value, and win rate in one formula.

Recommendations:

Focus on win-rate improvement first — increasing from 22% to 28% (a realistic 6-point lift through better qualification and multi-threading) would push monthly velocity past $1.2M — a $262K/month gain without adding a single new lead.

Shave cycle time second — reducing the average from 47 to 40 days adds ~$165K/month. Same-day proposal delivery and pre-approved discount bands are the fastest levers.

Re-baseline quarterly — the Ebsta × Pavilion dataset shows that even top teams saw 5–10 point win-rate shifts across 2023–24, so velocity targets should be recalibrated every quarter (Ebsta × Pavilion, 2024).

8. Funnel Visualization

stage_levels <- c("Lead", "Qualified", "Demo", "Proposal", "Negotiation", "Closed")

funnel <- data.frame(
  stage = factor(stage_levels, levels = stage_levels),
  count = sapply(stage_levels, function(s) length(unique(stages$deal_id[stages$stage == s])))
)
funnel$pct <- round(funnel$count / max(funnel$count) * 100, 1)

# Stage-over-stage conversion
funnel$conversion <- c(NA, paste0(
  round(diff(funnel$count) / head(funnel$count, -1) * 100 + 100, 1), "%"
))

funnel %>%
  kable(col.names = c("Stage", "Deals", "% of Total", "Stage Conversion"),
        caption = "Pipeline Funnel with Stage-over-Stage Conversion") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  row_spec(nrow(funnel), bold = TRUE, background = "#dfe6e9")

Pipeline Funnel with Stage-over-Stage Conversion
	Stage	Deals	% of Total	Stage Conversion
Lead	Lead	300	100.0	NA
Qualified	Qualified	300	100.0	100%
Demo	Demo	272	90.7	90.7%
Proposal	Proposal	234	78.0	86%
Negotiation	Negotiation	177	59.0	75.6%
Closed	Closed	117	39.0	66.1%

# Funnel chart
funnel$stage_rev <- factor(funnel$stage, levels = rev(stage_levels))

ggplot(funnel, aes(x = stage_rev, y = count)) +
  geom_col(fill = pal_blue, width = 0.65, alpha = 0.85) +
  geom_text(aes(label = paste0(count, " deals (", pct, "%)")),
            hjust = -0.08, fontface = "bold", size = 3.8, color = pal_dark) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.3))) +
  labs(title = "Pipeline Funnel",
       subtitle = "Realistic drop-off at each stage — only 39% of leads reach Closed",
       x = "", y = "Number of Deals",
       caption = "Source: pipeline_stages.csv | 300 deals") +
  theme_portfolio()

💡 Interpretation

The funnel reveals a 61% overall attrition rate from Lead to Closed — only 117 of 300 leads make it to a final disposition. The sharpest drop-offs occur at Demo → Proposal (91% to 78%) and Proposal → Negotiation (78% to 59%), where deals are lost to competitor selection, budget constraints, or stakeholder misalignment. The relatively high Lead → Qualified conversion (100%) suggests the team qualifies broadly — tightening qualification criteria earlier could improve downstream conversion and reduce wasted rep time on deals that ultimately stall at Proposal.

9. Won vs Lost: Where Do Deals Die?

deals_merged <- deals[, c("deal_id", "outcome")]

last_stage <- aggregate(stage_order ~ deal_id, data = stages, FUN = max)
last_stage$last_stage <- sapply(last_stage$stage_order, function(x) stage_levels[x])
last_stage <- merge(last_stage, deals_merged, by = "deal_id")

lost_deals <- last_stage[last_stage$outcome == "Closed Lost", ]
lost_by_stage <- as.data.frame(table(lost_deals$last_stage))
colnames(lost_by_stage) <- c("stage", "count")
lost_by_stage <- lost_by_stage[lost_by_stage$count > 0, ]
lost_by_stage$stage <- factor(lost_by_stage$stage, levels = stage_levels)
lost_by_stage <- lost_by_stage[order(lost_by_stage$stage), ]
lost_by_stage$pct <- round(lost_by_stage$count / sum(lost_by_stage$count) * 100, 1)

lost_by_stage %>%
  kable(col.names = c("Last Stage Reached", "Deals Lost", "% of All Losses"),
        caption = "Where Do Lost Deals Drop Off?") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")

Where Do Lost Deals Drop Off?
	Last Stage Reached	Deals Lost	% of All Losses
5	Qualified	3	5.5
2	Demo	8	14.5
4	Proposal	11	20.0
3	Negotiation	10	18.2
1	Closed	23	41.8

ggplot(lost_by_stage, aes(x = stage, y = count)) +
  geom_col(fill = pal_red, width = 0.6, alpha = 0.85) +
  geom_text(aes(label = paste0(count, " (", pct, "%)")),
            vjust = -0.5, fontface = "bold", size = 4, color = pal_dark) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Where Deals Go to Die",
       subtitle = "Last stage reached before a deal was lost",
       x = "", y = "Number of Lost Deals",
       caption = "Source: pipeline_stages.csv + deals_data.csv") +
  theme_portfolio()

💡 Interpretation

The majority of losses happen at the Proposal and Demo stages — these are the “kill zones” where prospects evaluate the offering against alternatives and internal priorities. Deals lost at Proposal stage often indicate pricing mismatches or a failure to demonstrate ROI clearly enough. Deals lost at Demo suggest poor product-market fit or ineffective demo delivery. Investing in stronger demo scripts and ROI calculators could recover some of these mid-funnel losses.

10. Summary & Recommendations

Key Findings:

Win rate is 21.7% — approximately 1 in 5 deals close, which is typical for B2B SaaS. The biggest opportunity is improving mid-funnel conversion (Demo → Proposal → Negotiation).
Negotiation is the pipeline bottleneck at 13.0 days average. Deals that enter Negotiation should have pre-approved discount bands and contract templates ready to avoid stalling.
Sales velocity is ~$31K/day ($938K/month). The highest-impact lever is win rate: a 6-point improvement would add $262K/month without increasing lead volume.
61% of leads never reach a closed outcome. The steepest funnel drop is at Proposal → Negotiation (78% → 59%), where 1 in 4 deals that receive a proposal never enter formal negotiation.
Zombie deals are real. The slowest deal took 134 days — more than 3x the median. Implementing a 90-day pipeline age limit with mandatory review would improve forecast hygiene.

Recommended Actions:

Action	Impact	Effort
Pre-approved pricing tiers for Negotiation stage	-3 days cycle time	Medium
ROI calculator for Demo → Proposal handoff	+3-5% win rate	Medium
90-day pipeline hygiene rule	Cleaner forecast	Low
Automated alerts for deals stuck 20+ days in any stage	Faster intervention	Low

Project completed as part of RevOps Analytics Portfolio — Aya

Sales Velocity & Bottleneck Analysis

Aya Hanouni

1. Project Overview

2. Load & Explore

3. Calculate Time Spent per Stage

4. Average Time per Stage (Bottleneck Chart)

💡 Interpretation

5. Stage Duration Distribution (Box Plot)

💡 Interpretation

6. Total Sales Cycle Length

💡 Interpretation & Benchmarks

7. Sales Velocity Calculation

💡 Interpretation & Benchmarks

8. Funnel Visualization

💡 Interpretation

9. Won vs Lost: Where Do Deals Die?

💡 Interpretation

10. Summary & Recommendations