1. Project Overview

Objective: Determine how discounting affects win rates and deal revenue. Find the optimal discount range that maximizes wins without leaving money on the table.

Skills Demonstrated: Logistic regression, hypothesis testing, binned analysis, data visualization.

Dataset: deals_data.csv — 300 deals with discount percentage and win/loss outcome.

2. Load & Explore

deals <- read.csv("deals_data.csv")
deals$won <- ifelse(deals$outcome == "Closed Won", 1, 0)

cat("Discount range:", min(deals$discount_pct), "% to", max(deals$discount_pct), "%\n")

## Discount range: 0 % to 32.5 %

cat("Mean discount:", round(mean(deals$discount_pct), 1), "%\n")

## Mean discount: 13.2 %

cat("Overall win rate:", round(mean(deals$won) * 100, 1), "%\n")

## Overall win rate: 81.7 %

ggplot(deals, aes(x = discount_pct)) +
  geom_histogram(bins = 20, fill = pal_blue, color = "white", alpha = 0.85) +
  labs(title = "Distribution of Discounts Given",
       subtitle = "How broadly are discounts being applied?",
       x = "Discount %", y = "Count") +
  theme_portfolio()

3. Logistic Regression — Discount vs Win

model <- glm(won ~ discount_pct, data = deals, family = "binomial")
summary(model)

## 
## Call:
## glm(formula = won ~ discount_pct, family = "binomial", data = deals)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.72793    0.30025   5.755 8.66e-09 ***
## discount_pct -0.01726    0.01874  -0.921    0.357    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 285.85  on 299  degrees of freedom
## Residual deviance: 285.00  on 298  degrees of freedom
## AIC: 289
## 
## Number of Fisher Scoring iterations: 4

coef_discount <- coef(model)["discount_pct"]
cat("Coefficient for discount_pct:", round(coef_discount, 4), "\n")

## Coefficient for discount_pct: -0.0173

cat("Odds ratio:", round(exp(coef_discount), 4), "\n")

## Odds ratio: 0.9829

if (coef_discount < 0) {
  cat("=> Higher discounts are associated with LOWER win probability\n")
} else {
  cat("=> Higher discounts are associated with HIGHER win probability\n")
}

## => Higher discounts are associated with LOWER win probability

Interpretation: The logistic regression tests whether discount percentage alone predicts deal outcome. A positive coefficient means bigger discounts are associated with higher close rates; a negative coefficient means the opposite. The odds ratio quantifies the magnitude — an odds ratio of 1.03, for example, means each additional 1% discount increases the odds of winning by 3%. Check the p-value: if p < 0.05, the relationship is statistically significant and worth acting on.

4. Win Rate by Discount Bucket

deals$discount_bucket <- cut(deals$discount_pct,
                              breaks = c(0, 5, 10, 15, 20, 25, 30, 40),
                              include.lowest = TRUE)

wr <- aggregate(won ~ discount_bucket, data = deals, FUN = function(x) c(rate = mean(x), n = length(x)))
wr_df <- data.frame(
  bucket = wr$discount_bucket,
  win_rate = round(wr$won[, "rate"] * 100, 1),
  count = wr$won[, "n"]
)

wr_df %>%
  kable(col.names = c("Discount Range", "Win Rate (%)", "Deals"),
        align = "lrr", caption = "Win Rate by Discount Bucket") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "left") %>%
  row_spec(which.max(wr_df$win_rate), bold = TRUE, background = "#d5f5e3") %>%
  row_spec(which.min(wr_df$win_rate), bold = TRUE, background = "#fadbd8")

Win Rate by Discount Bucket
Discount Range	Win Rate (%)	Deals
[0,5]	88.2	51
(5,10]	81.0	58
(10,15]	83.6	73
(15,20]	75.0	56
(20,25]	80.5	41
(25,30]	82.4	17
(30,40]	75.0	4

ggplot(wr_df, aes(x = bucket, y = win_rate)) +
  geom_col(fill = pal_orange, alpha = 0.85) +
  geom_text(aes(label = paste0(win_rate, "%\nn=", count)), vjust = -0.3, size = 3.2) +
  labs(title = "Win Rate by Discount Range",
       subtitle = "Finding the sweet spot — highest win-rate bucket highlighted",
       x = "Discount %", y = "Win Rate (%)") +
  theme_portfolio()

Interpretation: This chart reveals the “sweet spot.” Look for the discount bucket with the highest win rate — that’s the range where discounts meaningfully help close deals. Buckets beyond the peak show diminishing or even negative returns, meaning you’re giving away margin without improving close rates.

5. Scatter + Logistic Curve

ggplot(deals, aes(x = discount_pct, y = won)) +
  geom_jitter(height = 0.05, alpha = 0.3, color = pal_dark) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"),
              se = TRUE, color = pal_red, fill = pal_red, alpha = 0.15) +
  labs(title = "Discount % vs Win Probability (Logistic Fit)",
       subtitle = "Each dot is a deal — the curve shows the modeled probability of closing",
       x = "Discount %", y = "P(Win)") +
  theme_portfolio()

Interpretation: The logistic curve visualizes the model from Section 3. A flat curve means discount level has little effect on closing. An upward curve means bigger discounts genuinely help. A curve that rises then flattens (or dips) signals a point of diminishing returns — the inflection point is your practical discount cap.

6. Revenue Impact of Discounts

won_deals <- deals[deals$won == 1, ]

rev_by_bucket <- aggregate(deal_size_usd ~ discount_bucket, data = won_deals, FUN = mean)
rev_by_bucket$deal_size_usd <- round(rev_by_bucket$deal_size_usd)

rev_by_bucket %>%
  kable(col.names = c("Discount Range", "Avg Won Deal Size ($)"),
        align = "lr", caption = "Average Revenue per Won Deal by Discount Bucket") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "left") %>%
  row_spec(which.max(rev_by_bucket$deal_size_usd), bold = TRUE, background = "#d5f5e3") %>%
  row_spec(which.min(rev_by_bucket$deal_size_usd), bold = TRUE, background = "#fadbd8")

Average Revenue per Won Deal by Discount Bucket
Discount Range	Avg Won Deal Size ($)
[0,5]	31613
(5,10]	23502
(10,15]	28615
(15,20]	30016
(20,25]	31323
(25,30]	39643
(30,40]	52764

ggplot(rev_by_bucket, aes(x = discount_bucket, y = deal_size_usd)) +
  geom_col(fill = pal_purple, alpha = 0.85) +
  geom_text(aes(label = paste0("$", format(deal_size_usd, big.mark = ","))), vjust = -0.3, size = 3.2) +
  labs(title = "Average Won Deal Size by Discount Bucket",
       subtitle = "Are heavily discounted deals worth less?",
       x = "Discount %", y = "Avg Deal Size ($)") +
  theme_portfolio()

Interpretation: This answers a critical question: even if heavy discounts close more deals, are those deals smaller? If average deal size drops sharply in higher discount buckets, the extra wins may not compensate for the lost revenue per deal.

7. Hypothesis Test — High vs Low Discount

deals$high_discount <- ifelse(deals$discount_pct > median(deals$discount_pct), "High", "Low")

# Win rate comparison
cat("Win rate - Low discount:", round(mean(deals$won[deals$high_discount == "Low"]) * 100, 1), "%\n")

## Win rate - Low discount: 84.7 %

cat("Win rate - High discount:", round(mean(deals$won[deals$high_discount == "High"]) * 100, 1), "%\n")

## Win rate - High discount: 78.7 %

# Proportion test
low  <- deals[deals$high_discount == "Low", ]
high <- deals[deals$high_discount == "High", ]

prop_test <- prop.test(
  x = c(sum(low$won), sum(high$won)),
  n = c(nrow(low), nrow(high))
)
print(prop_test)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(sum(low$won), sum(high$won)) out of c(nrow(low), nrow(high))
## X-squared = 1.4249, df = 1, p-value = 0.2326
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.03397413  0.15397413
## sample estimates:
##    prop 1    prop 2 
## 0.8466667 0.7866667

Interpretation: The two-proportion z-test checks whether the win-rate difference between high- and low-discount deals is statistically significant (p < 0.05) or just random noise. If significant, it confirms that discount level meaningfully impacts outcomes. If not, other factors (meetings, deal source, etc.) likely matter more than discount alone.

8. Multivariate — Controlling for Other Factors

model_full <- glm(won ~ discount_pct + deal_size_usd + num_meetings +
                    competitor_mentioned + deal_source,
                  data = deals, family = "binomial")
summary(model_full)

## 
## Call:
## glm(formula = won ~ discount_pct + deal_size_usd + num_meetings + 
##     competitor_mentioned + deal_source, family = "binomial", 
##     data = deals)
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           4.460e-01  5.586e-01   0.798 0.424606    
## discount_pct         -1.665e-02  2.016e-02  -0.826 0.408943    
## deal_size_usd        -2.038e-06  6.024e-06  -0.338 0.735146    
## num_meetings          3.554e-01  9.306e-02   3.819 0.000134 ***
## competitor_mentioned -4.030e-01  3.210e-01  -1.256 0.209283    
## deal_sourceOutbound   8.259e-01  4.899e-01   1.686 0.091829 .  
## deal_sourcePartner   -2.967e-01  3.950e-01  -0.751 0.452527    
## deal_sourceReferral   7.184e-01  4.697e-01   1.529 0.126162    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 285.85  on 299  degrees of freedom
## Residual deviance: 257.02  on 292  degrees of freedom
## AIC: 273.02
## 
## Number of Fisher Scoring iterations: 5

Interpretation: The multivariate model adds deal size, meetings, competitor presence, and deal source alongside discount. If discount remains significant here, its effect is real and independent. If it loses significance, the earlier result was confounded — e.g., reps may offer larger discounts on deals that were already at risk, making the discount look less effective than it truly is.

9. Summary & Recommendations

# Build a concise summary table
optimal_bucket <- wr_df$bucket[which.max(wr_df$win_rate)]
optimal_rate   <- max(wr_df$win_rate)
worst_bucket   <- wr_df$bucket[which.min(wr_df$win_rate)]
worst_rate     <- min(wr_df$win_rate)

p_val_simple   <- summary(model)$coefficients["discount_pct", "Pr(>|z|)"]
p_val_full     <- summary(model_full)$coefficients["discount_pct", "Pr(>|z|)"]

summary_df <- data.frame(
  Metric = c("Overall win rate",
             "Optimal discount range",
             "Best bucket win rate",
             "Worst discount range",
             "Worst bucket win rate",
             "Simple model p-value",
             "Full model p-value"),
  Value = c(paste0(round(mean(deals$won) * 100, 1), "%"),
            as.character(optimal_bucket),
            paste0(optimal_rate, "%"),
            as.character(worst_bucket),
            paste0(worst_rate, "%"),
            formatC(p_val_simple, format = "g", digits = 3),
            formatC(p_val_full, format = "g", digits = 3))
)

summary_df %>%
  kable(align = "lr", caption = "Discount Elasticity — Key Findings") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "left") %>%
  row_spec(c(2, 3), bold = TRUE, background = "#d5f5e3")

Discount Elasticity — Key Findings
Metric	Value
Overall win rate	81.7%
Optimal discount range	[0,5]
Best bucket win rate	88.2%
Worst discount range	(15,20]
Worst bucket win rate	75%
Simple model p-value	0.357
Full model p-value	0.409

Takeaways:

The optimal discount range is the bucket shown in green above — it delivers the highest win rate.

Discounts beyond that range show diminishing returns and smaller average deal sizes.

Check the p-values: if the full-model p-value is < 0.05, discount has a real, independent effect on close rates.

Recommendation: Cap standard discounts at the upper bound of the optimal bucket and require manager approval above that threshold.

Revenue per deal tends to drop at higher discount levels — monitor net revenue, not just win rate.

Project completed as part of RevOps Analytics Portfolio — Aya Hanouni

Discount Elasticity & Price Sensitivity Analysis

Aya Hanouni