1. Project Overview

Objective: Measure how fast deals move through the pipeline and identify which stages create bottlenecks.

\[Velocity = \frac{\text{Opportunities} \times \text{Win Rate} \times \text{Avg Deal Size}}{\text{Sales Cycle Length}}\]

Skills Demonstrated: Data transformation, funnel visualization, time-based analysis, velocity metrics.

Dataset: pipeline_stages.csv β€” 300 deals across 6 stages with realistic funnel drop-off. Also uses deals_data.csv for velocity calculation.


2. Load & Explore

stages <- read.csv("pipeline_stages.csv")
stages$entered_date <- as.Date(stages$entered_date)

cat("Total deals:", length(unique(stages$deal_id)), "\n")
## Total deals: 300
cat("Stages tracked:", paste(unique(stages$stage), collapse = " β†’ "), "\n")
## Stages tracked: Lead β†’ Qualified β†’ Demo β†’ Proposal β†’ Negotiation β†’ Closed
head(stages, 12) %>%
  kable(caption = "First 12 Records",
        col.names = c("Deal ID", "Stage", "Stage Order", "Entered Date")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")
First 12 Records
Deal ID Stage Stage Order Entered Date
1 Lead 1 2025-08-09
1 Qualified 2 2025-08-22
1 Demo 3 2025-08-28
1 Proposal 4 2025-09-02
2 Lead 1 2025-09-13
2 Qualified 2 2025-09-17
2 Demo 3 2025-10-10
3 Lead 1 2025-06-22
3 Qualified 2 2025-07-09
3 Demo 3 2025-07-29
3 Proposal 4 2025-08-11
3 Negotiation 5 2025-08-24

3. Calculate Time Spent per Stage

stages <- stages[order(stages$deal_id, stages$stage_order), ]

stages$days_in_stage <- ave(
  as.numeric(stages$entered_date),
  stages$deal_id,
  FUN = function(x) c(diff(x), NA)
)

stages_clean <- stages[!is.na(stages$days_in_stage), ]
cat("Records with stage duration:", nrow(stages_clean), "\n")
## Records with stage duration: 1100

4. Average Time per Stage (Bottleneck Chart)

avg_stage <- aggregate(days_in_stage ~ stage, data = stages_clean, FUN = mean)
avg_stage <- avg_stage[avg_stage$stage != "Closed", ]
avg_stage$stage <- factor(avg_stage$stage,
                           levels = c("Lead", "Qualified", "Demo", "Proposal", "Negotiation"))
avg_stage$days_in_stage <- round(avg_stage$days_in_stage, 1)

# Flag the bottleneck
avg_stage$is_max <- avg_stage$days_in_stage == max(avg_stage$days_in_stage)

avg_stage %>%
  kable(col.names = c("Stage", "Avg Days", "Bottleneck?"),
        caption = "Average Days per Stage") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  row_spec(which(avg_stage$is_max), bold = TRUE, background = "#ffeaa7")
Average Days per Stage
Stage Avg Days Bottleneck?
Demo 12.8 FALSE
Lead 12.6 FALSE
Negotiation 13.7 TRUE
Proposal 13.0 FALSE
Qualified 12.5 FALSE
ggplot(avg_stage, aes(x = stage, y = days_in_stage)) +
  geom_col(aes(fill = is_max), width = 0.65, show.legend = FALSE) +
  geom_text(aes(label = paste0(days_in_stage, " days")),
            vjust = -0.6, fontface = "bold", size = 4.2, color = pal_dark) +
  scale_fill_manual(values = c("FALSE" = pal_blue, "TRUE" = pal_red)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Average Days per Pipeline Stage",
       subtitle = "Highlighted in red = longest stage (bottleneck)",
       x = "", y = "Average Days",
       caption = "Source: pipeline_stages.csv | 300 deals") +
  theme_portfolio() +
  theme(legend.position = "none")

πŸ’‘ Interpretation

The Negotiation stage is the clear bottleneck at 13.0 days on average β€” the longest of any stage in the pipeline. This suggests deals stall during final pricing discussions, contract reviews, or legal approvals. The Proposal stage follows closely at 12.8 days, indicating that generating and refining proposals also causes delays. Together, these two late-stage bottlenecks account for over half of the total pipeline time, pointing to a need for pre-approved pricing tiers and streamlined contract templates to accelerate deal closure.


5. Stage Duration Distribution (Box Plot)

stages_clean_plot <- stages_clean[stages_clean$stage != "Closed", ]
stages_clean_plot$stage_f <- factor(stages_clean_plot$stage,
                                levels = c("Lead", "Qualified", "Demo", "Proposal", "Negotiation"))

stage_colors <- c("Lead" = pal_blue, "Qualified" = pal_teal, "Demo" = pal_orange,
                   "Proposal" = pal_green, "Negotiation" = pal_purple)

ggplot(stages_clean_plot, aes(x = stage_f, y = days_in_stage, fill = stage_f)) +
  geom_boxplot(alpha = 0.8, outlier.color = pal_red, outlier.size = 2.5,
               outlier.shape = 18, width = 0.6) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = pal_dark) +
  scale_fill_manual(values = stage_colors) +
  labs(title = "Stage Duration Variability",
       subtitle = "Diamond = mean | Red diamonds = outliers (potential stuck deals)",
       x = "", y = "Days in Stage",
       caption = "Source: pipeline_stages.csv") +
  theme_portfolio() +
  theme(legend.position = "none")

πŸ’‘ Interpretation

All stages show a similar median of ~10 days, but the right-tail outliers tell the real story. Negotiation and Proposal stages have deals lingering 30–40+ days, which are likely stuck in legal review or waiting on executive sign-off. These outliers drag up the mean and represent the highest-risk deals for slipping quarters. Any deal exceeding 20 days in a single stage should be flagged for immediate sales manager intervention.


6. Total Sales Cycle Length

cycle <- aggregate(entered_date ~ deal_id, data = stages,
                    FUN = function(x) as.numeric(max(x) - min(x)))
colnames(cycle)[2] <- "total_days"

cycle_summary <- data.frame(
  Metric = c("Average", "Median", "Fastest", "Slowest", "Std Dev"),
  Days = c(round(mean(cycle$total_days), 1),
           median(cycle$total_days),
           min(cycle$total_days),
           max(cycle$total_days),
           round(sd(cycle$total_days), 1))
)

cycle_summary %>%
  kable(caption = "Sales Cycle Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  row_spec(1, bold = TRUE, background = "#dfe6e9")
Sales Cycle Summary
Metric Days
Average 47.0
Median 43.0
Fastest 3.0
Slowest 134.0
Std Dev 27.1
avg_days <- mean(cycle$total_days)
med_days <- median(cycle$total_days)

ggplot(cycle, aes(x = total_days)) +
  geom_histogram(bins = 30, fill = pal_blue, color = "white", alpha = 0.85) +
  geom_vline(xintercept = avg_days, linetype = "dashed", color = pal_red, size = 1) +
  geom_vline(xintercept = med_days, linetype = "dotted", color = pal_green, size = 1) +
  annotate("text", x = avg_days + 3, y = Inf, label = paste0("Mean: ", round(avg_days, 1), "d"),
           vjust = 2, hjust = 0, color = pal_red, fontface = "bold", size = 3.8) +
  annotate("text", x = med_days - 3, y = Inf, label = paste0("Median: ", med_days, "d"),
           vjust = 3.5, hjust = 1, color = pal_green, fontface = "bold", size = 3.8) +
  labs(title = "Distribution of Total Sales Cycle Length",
       subtitle = "Lead to last recorded stage β€” how long does each deal live in the pipeline?",
       x = "Days", y = "Number of Deals",
       caption = "Source: pipeline_stages.csv") +
  theme_portfolio()

πŸ’‘ Interpretation & Benchmarks

The average deal spends 47 days in the pipeline, but the median is lower at 43 days β€” the right skew is caused by a long tail of deals dragging on past 80+ days (the slowest took 134 days). This gap between mean and median indicates that while most deals resolve within 6–7 weeks, a meaningful subset of β€œzombie deals” clog the pipeline.

Industry Benchmark: B2B SaaS sales cycle benchmarks by segment are: SMB 14–30 days, Mid-Market 30–90 days, and Enterprise 90–180+ days, with an overall median of 84 days across segments (Optifai Sales Ops Benchmark, 2025). Our 47-day average places this pipeline squarely in the Mid-Market range and well below the cross-segment median β€” a positive signal. However, 58% of B2B leaders reported sales cycles getting even longer in 2024 (SaaStr, 2024), so proactive cycle management is critical.

Recommendations:

  • Set a 90-day maximum pipeline age with mandatory win/loss disposition β€” deals older than 90 days that haven’t progressed should be closed-lost and recycled to nurture.
  • Segment cycle time by deal size β€” if larger deals are inflating the right tail, consider separate SLAs for enterprise vs.Β mid-market motions.
  • Track β€œtime to next stage” as a leading indicator β€” if a deal hasn’t advanced within 1.5Γ— the stage average, trigger an automated nudge to the rep.

7. Sales Velocity Calculation

deals <- read.csv("deals_data.csv")
deals$won <- ifelse(deals$outcome == "Closed Won", 1, 0)

n_opportunities <- nrow(deals)
win_rate <- mean(deals$won)
avg_deal_size <- mean(deals$deal_size_usd[deals$won == 1])
avg_cycle <- mean(cycle$total_days)

velocity <- (n_opportunities * win_rate * avg_deal_size) / avg_cycle

vel_df <- data.frame(
  Component = c("Opportunities", "Win Rate", "Avg Deal Size (Won)", "Avg Sales Cycle", ""),
  Value = c(as.character(n_opportunities),
            paste0(round(win_rate * 100, 1), "%"),
            paste0("$", format(round(avg_deal_size), big.mark = ",")),
            paste0(round(avg_cycle, 1), " days"),
            ""),
  stringsAsFactors = FALSE
)

vel_df %>%
  kable(caption = "Sales Velocity Breakdown",
        col.names = c("Component", "Value")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")
Sales Velocity Breakdown
Component Value
Opportunities 300
Win Rate 81.7%
Avg Deal Size (Won) $29,715
Avg Sales Cycle 47 days
cat("\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n")
## 
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
cat("  Sales Velocity: $", format(round(velocity), big.mark = ","), "/ day\n")
##   Sales Velocity: $ 154,834 / day
cat("     (~$", format(round(velocity * 30), big.mark = ","), " / month)\n")
##      (~$ 4,645,020  / month)
cat("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n")
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

πŸ’‘ Interpretation & Benchmarks

The pipeline generates approximately $31,279 per day β€” roughly $938K per month. With only a 21.7% win rate, the biggest lever for improving velocity is conversion optimization.

Industry Benchmark: The HubSpot 2024 Sales Trends Report found the average B2B win rate was approximately 21% β€” meaning ~79% of deals close-lost (HubSpot, 2024; Development Corporate, 2025). Our 21.7% is right at the industry average. Across industries, median pipeline velocity spans from $743/day (Marketing & Advertising) to $2,456/day (Real Estate & Construction), per compiled FirstPageSage, Marketjoy, and PoweredBySearch 2024–2025 data (The Digital Bloom, 2025). Pipeline velocity is increasingly considered the single most useful β€œnorth-star” metric for RevOps audits because it captures efficiency, deal value, and win rate in one formula.

Recommendations:

  • Focus on win-rate improvement first β€” increasing from 22% to 28% (a realistic 6-point lift through better qualification and multi-threading) would push monthly velocity past $1.2M β€” a $262K/month gain without adding a single new lead.
  • Shave cycle time second β€” reducing the average from 47 to 40 days adds ~$165K/month. Same-day proposal delivery and pre-approved discount bands are the fastest levers.
  • Re-baseline quarterly β€” the Ebsta Γ— Pavilion dataset shows that even top teams saw 5–10 point win-rate shifts across 2023–24, so velocity targets should be recalibrated every quarter (Ebsta Γ— Pavilion, 2024).

8. Funnel Visualization

stage_levels <- c("Lead", "Qualified", "Demo", "Proposal", "Negotiation", "Closed")

funnel <- data.frame(
  stage = factor(stage_levels, levels = stage_levels),
  count = sapply(stage_levels, function(s) length(unique(stages$deal_id[stages$stage == s])))
)
funnel$pct <- round(funnel$count / max(funnel$count) * 100, 1)

# Stage-over-stage conversion
funnel$conversion <- c(NA, paste0(
  round(diff(funnel$count) / head(funnel$count, -1) * 100 + 100, 1), "%"
))

funnel %>%
  kable(col.names = c("Stage", "Deals", "% of Total", "Stage Conversion"),
        caption = "Pipeline Funnel with Stage-over-Stage Conversion") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white") %>%
  row_spec(nrow(funnel), bold = TRUE, background = "#dfe6e9")
Pipeline Funnel with Stage-over-Stage Conversion
Stage Deals % of Total Stage Conversion
Lead Lead 300 100.0 NA
Qualified Qualified 300 100.0 100%
Demo Demo 272 90.7 90.7%
Proposal Proposal 234 78.0 86%
Negotiation Negotiation 177 59.0 75.6%
Closed Closed 117 39.0 66.1%
# Funnel chart
funnel$stage_rev <- factor(funnel$stage, levels = rev(stage_levels))

ggplot(funnel, aes(x = stage_rev, y = count)) +
  geom_col(fill = pal_blue, width = 0.65, alpha = 0.85) +
  geom_text(aes(label = paste0(count, " deals (", pct, "%)")),
            hjust = -0.08, fontface = "bold", size = 3.8, color = pal_dark) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.3))) +
  labs(title = "Pipeline Funnel",
       subtitle = "Realistic drop-off at each stage β€” only 39% of leads reach Closed",
       x = "", y = "Number of Deals",
       caption = "Source: pipeline_stages.csv | 300 deals") +
  theme_portfolio()

πŸ’‘ Interpretation

The funnel reveals a 61% overall attrition rate from Lead to Closed β€” only 117 of 300 leads make it to a final disposition. The sharpest drop-offs occur at Demo β†’ Proposal (91% to 78%) and Proposal β†’ Negotiation (78% to 59%), where deals are lost to competitor selection, budget constraints, or stakeholder misalignment. The relatively high Lead β†’ Qualified conversion (100%) suggests the team qualifies broadly β€” tightening qualification criteria earlier could improve downstream conversion and reduce wasted rep time on deals that ultimately stall at Proposal.


9. Won vs Lost: Where Do Deals Die?

deals_merged <- deals[, c("deal_id", "outcome")]

last_stage <- aggregate(stage_order ~ deal_id, data = stages, FUN = max)
last_stage$last_stage <- sapply(last_stage$stage_order, function(x) stage_levels[x])
last_stage <- merge(last_stage, deals_merged, by = "deal_id")

lost_deals <- last_stage[last_stage$outcome == "Closed Lost", ]
lost_by_stage <- as.data.frame(table(lost_deals$last_stage))
colnames(lost_by_stage) <- c("stage", "count")
lost_by_stage <- lost_by_stage[lost_by_stage$count > 0, ]
lost_by_stage$stage <- factor(lost_by_stage$stage, levels = stage_levels)
lost_by_stage <- lost_by_stage[order(lost_by_stage$stage), ]
lost_by_stage$pct <- round(lost_by_stage$count / sum(lost_by_stage$count) * 100, 1)

lost_by_stage %>%
  kable(col.names = c("Last Stage Reached", "Deals Lost", "% of All Losses"),
        caption = "Where Do Lost Deals Drop Off?") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2c3e50", color = "white")
Where Do Lost Deals Drop Off?
Last Stage Reached Deals Lost % of All Losses
5 Qualified 3 5.5
2 Demo 8 14.5
4 Proposal 11 20.0
3 Negotiation 10 18.2
1 Closed 23 41.8
ggplot(lost_by_stage, aes(x = stage, y = count)) +
  geom_col(fill = pal_red, width = 0.6, alpha = 0.85) +
  geom_text(aes(label = paste0(count, " (", pct, "%)")),
            vjust = -0.5, fontface = "bold", size = 4, color = pal_dark) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Where Deals Go to Die",
       subtitle = "Last stage reached before a deal was lost",
       x = "", y = "Number of Lost Deals",
       caption = "Source: pipeline_stages.csv + deals_data.csv") +
  theme_portfolio()

πŸ’‘ Interpretation

The majority of losses happen at the Proposal and Demo stages β€” these are the β€œkill zones” where prospects evaluate the offering against alternatives and internal priorities. Deals lost at Proposal stage often indicate pricing mismatches or a failure to demonstrate ROI clearly enough. Deals lost at Demo suggest poor product-market fit or ineffective demo delivery. Investing in stronger demo scripts and ROI calculators could recover some of these mid-funnel losses.


10. Summary & Recommendations

Key Findings:

  1. Win rate is 21.7% β€” approximately 1 in 5 deals close, which is typical for B2B SaaS. The biggest opportunity is improving mid-funnel conversion (Demo β†’ Proposal β†’ Negotiation).

  2. Negotiation is the pipeline bottleneck at 13.0 days average. Deals that enter Negotiation should have pre-approved discount bands and contract templates ready to avoid stalling.

  3. Sales velocity is ~$31K/day ($938K/month). The highest-impact lever is win rate: a 6-point improvement would add $262K/month without increasing lead volume.

  4. 61% of leads never reach a closed outcome. The steepest funnel drop is at Proposal β†’ Negotiation (78% β†’ 59%), where 1 in 4 deals that receive a proposal never enter formal negotiation.

  5. Zombie deals are real. The slowest deal took 134 days β€” more than 3x the median. Implementing a 90-day pipeline age limit with mandatory review would improve forecast hygiene.

Recommended Actions:

Action Impact Effort
Pre-approved pricing tiers for Negotiation stage -3 days cycle time Medium
ROI calculator for Demo β†’ Proposal handoff +3-5% win rate Medium
90-day pipeline hygiene rule Cleaner forecast Low
Automated alerts for deals stuck 20+ days in any stage Faster intervention Low

Project completed as part of RevOps Analytics Portfolio β€” Aya