Objective: Determine how discounting affects win rates and deal revenue. Find the optimal discount range that maximizes wins without leaving money on the table.
Skills Demonstrated: Logistic regression, hypothesis testing, binned analysis, data visualization.
Dataset: deals_data.csv — 300 deals
with discount percentage and win/loss outcome.
deals <- read.csv("deals_data.csv")
deals$won <- ifelse(deals$outcome == "Closed Won", 1, 0)
cat("Discount range:", min(deals$discount_pct), "% to", max(deals$discount_pct), "%\n")
## Discount range: 0 % to 32.5 %
cat("Mean discount:", round(mean(deals$discount_pct), 1), "%\n")
## Mean discount: 13.2 %
cat("Overall win rate:", round(mean(deals$won) * 100, 1), "%\n")
## Overall win rate: 81.7 %
ggplot(deals, aes(x = discount_pct)) +
geom_histogram(bins = 20, fill = pal_blue, color = "white", alpha = 0.85) +
labs(title = "Distribution of Discounts Given",
subtitle = "How broadly are discounts being applied?",
x = "Discount %", y = "Count") +
theme_portfolio()
model <- glm(won ~ discount_pct, data = deals, family = "binomial")
summary(model)
##
## Call:
## glm(formula = won ~ discount_pct, family = "binomial", data = deals)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.72793 0.30025 5.755 8.66e-09 ***
## discount_pct -0.01726 0.01874 -0.921 0.357
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 285.85 on 299 degrees of freedom
## Residual deviance: 285.00 on 298 degrees of freedom
## AIC: 289
##
## Number of Fisher Scoring iterations: 4
coef_discount <- coef(model)["discount_pct"]
cat("Coefficient for discount_pct:", round(coef_discount, 4), "\n")
## Coefficient for discount_pct: -0.0173
cat("Odds ratio:", round(exp(coef_discount), 4), "\n")
## Odds ratio: 0.9829
if (coef_discount < 0) {
cat("=> Higher discounts are associated with LOWER win probability\n")
} else {
cat("=> Higher discounts are associated with HIGHER win probability\n")
}
## => Higher discounts are associated with LOWER win probability
Interpretation: The logistic regression tests whether discount percentage alone predicts deal outcome. A positive coefficient means bigger discounts are associated with higher close rates; a negative coefficient means the opposite. The odds ratio quantifies the magnitude — an odds ratio of 1.03, for example, means each additional 1% discount increases the odds of winning by 3%. Check the p-value: if p < 0.05, the relationship is statistically significant and worth acting on.
deals$discount_bucket <- cut(deals$discount_pct,
breaks = c(0, 5, 10, 15, 20, 25, 30, 40),
include.lowest = TRUE)
wr <- aggregate(won ~ discount_bucket, data = deals, FUN = function(x) c(rate = mean(x), n = length(x)))
wr_df <- data.frame(
bucket = wr$discount_bucket,
win_rate = round(wr$won[, "rate"] * 100, 1),
count = wr$won[, "n"]
)
wr_df %>%
kable(col.names = c("Discount Range", "Win Rate (%)", "Deals"),
align = "lrr", caption = "Win Rate by Discount Bucket") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE, position = "left") %>%
row_spec(which.max(wr_df$win_rate), bold = TRUE, background = "#d5f5e3") %>%
row_spec(which.min(wr_df$win_rate), bold = TRUE, background = "#fadbd8")
| Discount Range | Win Rate (%) | Deals |
|---|---|---|
| [0,5] | 88.2 | 51 |
| (5,10] | 81.0 | 58 |
| (10,15] | 83.6 | 73 |
| (15,20] | 75.0 | 56 |
| (20,25] | 80.5 | 41 |
| (25,30] | 82.4 | 17 |
| (30,40] | 75.0 | 4 |
ggplot(wr_df, aes(x = bucket, y = win_rate)) +
geom_col(fill = pal_orange, alpha = 0.85) +
geom_text(aes(label = paste0(win_rate, "%\nn=", count)), vjust = -0.3, size = 3.2) +
labs(title = "Win Rate by Discount Range",
subtitle = "Finding the sweet spot — highest win-rate bucket highlighted",
x = "Discount %", y = "Win Rate (%)") +
theme_portfolio()
Interpretation: This chart reveals the “sweet spot.” Look for the discount bucket with the highest win rate — that’s the range where discounts meaningfully help close deals. Buckets beyond the peak show diminishing or even negative returns, meaning you’re giving away margin without improving close rates.
ggplot(deals, aes(x = discount_pct, y = won)) +
geom_jitter(height = 0.05, alpha = 0.3, color = pal_dark) +
geom_smooth(method = "glm", method.args = list(family = "binomial"),
se = TRUE, color = pal_red, fill = pal_red, alpha = 0.15) +
labs(title = "Discount % vs Win Probability (Logistic Fit)",
subtitle = "Each dot is a deal — the curve shows the modeled probability of closing",
x = "Discount %", y = "P(Win)") +
theme_portfolio()
Interpretation: The logistic curve visualizes the model from Section 3. A flat curve means discount level has little effect on closing. An upward curve means bigger discounts genuinely help. A curve that rises then flattens (or dips) signals a point of diminishing returns — the inflection point is your practical discount cap.
won_deals <- deals[deals$won == 1, ]
rev_by_bucket <- aggregate(deal_size_usd ~ discount_bucket, data = won_deals, FUN = mean)
rev_by_bucket$deal_size_usd <- round(rev_by_bucket$deal_size_usd)
rev_by_bucket %>%
kable(col.names = c("Discount Range", "Avg Won Deal Size ($)"),
align = "lr", caption = "Average Revenue per Won Deal by Discount Bucket") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE, position = "left") %>%
row_spec(which.max(rev_by_bucket$deal_size_usd), bold = TRUE, background = "#d5f5e3") %>%
row_spec(which.min(rev_by_bucket$deal_size_usd), bold = TRUE, background = "#fadbd8")
| Discount Range | Avg Won Deal Size ($) |
|---|---|
| [0,5] | 31613 |
| (5,10] | 23502 |
| (10,15] | 28615 |
| (15,20] | 30016 |
| (20,25] | 31323 |
| (25,30] | 39643 |
| (30,40] | 52764 |
ggplot(rev_by_bucket, aes(x = discount_bucket, y = deal_size_usd)) +
geom_col(fill = pal_purple, alpha = 0.85) +
geom_text(aes(label = paste0("$", format(deal_size_usd, big.mark = ","))), vjust = -0.3, size = 3.2) +
labs(title = "Average Won Deal Size by Discount Bucket",
subtitle = "Are heavily discounted deals worth less?",
x = "Discount %", y = "Avg Deal Size ($)") +
theme_portfolio()
Interpretation: This answers a critical question: even if heavy discounts close more deals, are those deals smaller? If average deal size drops sharply in higher discount buckets, the extra wins may not compensate for the lost revenue per deal.
deals$high_discount <- ifelse(deals$discount_pct > median(deals$discount_pct), "High", "Low")
# Win rate comparison
cat("Win rate - Low discount:", round(mean(deals$won[deals$high_discount == "Low"]) * 100, 1), "%\n")
## Win rate - Low discount: 84.7 %
cat("Win rate - High discount:", round(mean(deals$won[deals$high_discount == "High"]) * 100, 1), "%\n")
## Win rate - High discount: 78.7 %
# Proportion test
low <- deals[deals$high_discount == "Low", ]
high <- deals[deals$high_discount == "High", ]
prop_test <- prop.test(
x = c(sum(low$won), sum(high$won)),
n = c(nrow(low), nrow(high))
)
print(prop_test)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(sum(low$won), sum(high$won)) out of c(nrow(low), nrow(high))
## X-squared = 1.4249, df = 1, p-value = 0.2326
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.03397413 0.15397413
## sample estimates:
## prop 1 prop 2
## 0.8466667 0.7866667
Interpretation: The two-proportion z-test checks whether the win-rate difference between high- and low-discount deals is statistically significant (p < 0.05) or just random noise. If significant, it confirms that discount level meaningfully impacts outcomes. If not, other factors (meetings, deal source, etc.) likely matter more than discount alone.
model_full <- glm(won ~ discount_pct + deal_size_usd + num_meetings +
competitor_mentioned + deal_source,
data = deals, family = "binomial")
summary(model_full)
##
## Call:
## glm(formula = won ~ discount_pct + deal_size_usd + num_meetings +
## competitor_mentioned + deal_source, family = "binomial",
## data = deals)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.460e-01 5.586e-01 0.798 0.424606
## discount_pct -1.665e-02 2.016e-02 -0.826 0.408943
## deal_size_usd -2.038e-06 6.024e-06 -0.338 0.735146
## num_meetings 3.554e-01 9.306e-02 3.819 0.000134 ***
## competitor_mentioned -4.030e-01 3.210e-01 -1.256 0.209283
## deal_sourceOutbound 8.259e-01 4.899e-01 1.686 0.091829 .
## deal_sourcePartner -2.967e-01 3.950e-01 -0.751 0.452527
## deal_sourceReferral 7.184e-01 4.697e-01 1.529 0.126162
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 285.85 on 299 degrees of freedom
## Residual deviance: 257.02 on 292 degrees of freedom
## AIC: 273.02
##
## Number of Fisher Scoring iterations: 5
Interpretation: The multivariate model adds deal size, meetings, competitor presence, and deal source alongside discount. If discount remains significant here, its effect is real and independent. If it loses significance, the earlier result was confounded — e.g., reps may offer larger discounts on deals that were already at risk, making the discount look less effective than it truly is.
# Build a concise summary table
optimal_bucket <- wr_df$bucket[which.max(wr_df$win_rate)]
optimal_rate <- max(wr_df$win_rate)
worst_bucket <- wr_df$bucket[which.min(wr_df$win_rate)]
worst_rate <- min(wr_df$win_rate)
p_val_simple <- summary(model)$coefficients["discount_pct", "Pr(>|z|)"]
p_val_full <- summary(model_full)$coefficients["discount_pct", "Pr(>|z|)"]
summary_df <- data.frame(
Metric = c("Overall win rate",
"Optimal discount range",
"Best bucket win rate",
"Worst discount range",
"Worst bucket win rate",
"Simple model p-value",
"Full model p-value"),
Value = c(paste0(round(mean(deals$won) * 100, 1), "%"),
as.character(optimal_bucket),
paste0(optimal_rate, "%"),
as.character(worst_bucket),
paste0(worst_rate, "%"),
formatC(p_val_simple, format = "g", digits = 3),
formatC(p_val_full, format = "g", digits = 3))
)
summary_df %>%
kable(align = "lr", caption = "Discount Elasticity — Key Findings") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE, position = "left") %>%
row_spec(c(2, 3), bold = TRUE, background = "#d5f5e3")
| Metric | Value |
|---|---|
| Overall win rate | 81.7% |
| Optimal discount range | [0,5] |
| Best bucket win rate | 88.2% |
| Worst discount range | (15,20] |
| Worst bucket win rate | 75% |
| Simple model p-value | 0.357 |
| Full model p-value | 0.409 |
Takeaways:
- The optimal discount range is the bucket shown in green above — it delivers the highest win rate.
- Discounts beyond that range show diminishing returns and smaller average deal sizes.
- Check the p-values: if the full-model p-value is < 0.05, discount has a real, independent effect on close rates.
- Recommendation: Cap standard discounts at the upper bound of the optimal bucket and require manager approval above that threshold.
- Revenue per deal tends to drop at higher discount levels — monitor net revenue, not just win rate.
Project completed as part of RevOps Analytics Portfolio — Aya Hanouni