Presales Performance Analytics at MBCOM Technologies: An Exploratory and Inferential Study of the Broadcom Enterprise Software Pipeline (Africa, Middle East & CIS, 2023–2024)

Author

Philip Egah

Published

May 12, 2026

Executive Summary

This case study analyses 200 enterprise software deals from MBCOM Technologies’ Broadcom representative pipeline across Africa, the Middle East, and the Commonwealth of Independent States (CIS) over a two-year period (January 2023 – December 2024). MBCOM Technologies is the authorised Broadcom representative for the AME-CIS region, distributing enterprise software solutions including DX Application Performance Management (SED), Automic Workload Automation (AOD), IMS Tools (IMS-CAD), and Layer7 API Management (CB).

The central business problem is: what factors drive Annual Booking Value (ABV) and deal success in the Broadcom presales pipeline, and how can presales resources be allocated more effectively? Using five techniques — exploratory data analysis, data visualisation, hypothesis testing, correlation analysis, and linear regression — this study finds that:

  1. Deal type is the strongest discriminator of value: NEW business deals carry a mean ABV of $77,660 versus $28,912 for RENEWAL deals — a statistically significant difference (Welch’s t-test, p < 0.001, Cohen’s d = 0.56, medium effect).
  2. Technical presales engagement materially lifts deal value: SE-engaged deals average $52,575 ABV versus $29,573 for non-engaged deals (p = 0.031).
  3. POC score and SE days invested are the strongest predictors of ABV (r = 0.41 and r = 0.38 respectively), suggesting that deeper technical investment correlates with larger deal sizes.
  4. The regression model explains 39% of ABV variance (Adjusted R² = 0.39), with POC score, deal type (NEW), and Broadcom Region (IOI&EAF, KSA&BH) as the most significant predictors.

Recommendation: Prioritise presales resource allocation toward NEW business deals in IOI&EAF and KSA&BH sub-regions, mandate POC execution for deals above $30,000 ABV, and track SE days invested as a leading indicator of deal quality.


1 Professional Disclosure

1.1 Role and Organisation

Name: Philip Egah
Title: Presales Specialist — Enterprise Software (Africa, Middle East & CIS)
Organisation: MBCOM Technologies
Organisation Type: Broadcom Authorised Representative / Value-Added Distributor
Domain: Enterprise Software Distribution — Application Management, Workload Automation, Mainframe Tools, API Management
Geography: Sub-Saharan Africa, North Africa, Middle East, Central Asia (CIS)

MBCOM Technologies acts as Broadcom’s commercial and technical representative across the AME-CIS territory. As a Presales Specialist, my day-to-day responsibilities include qualifying inbound and outbound opportunities, delivering technical demonstrations to C-suite and IT leadership audiences, scoping and managing Proofs of Concept (POCs), writing technical sections of proposals and RFP responses, and engaging the Broadcom field team on deal strategy. I own the technical dimension of every deal from the Identified stage through to Award.

1.2 Technique Justifications

Technique 1 — Exploratory Data Analysis (EDA):
Before any presales resource is invested in a deal, I need to understand the shape of the pipeline: which deal types dominate, how ABV is distributed, which regions carry the most value, and where data quality issues lurk (e.g. sparse fields, outliers in deal size). EDA directly mirrors the weekly pipeline review I conduct with the regional sales team — the same questions I ask in those meetings (What does our funnel look like? Where are the outliers? Which deals have missing technical information?) are answered systematically here.

Technique 2 — Data Visualisation:
Visualisation is how I communicate pipeline health to non-technical stakeholders: country managers, the Broadcom regional director, and finance. A pipeline funnel chart, a regional ABV heatmap, and a monthly bookings trend are staples of our quarterly business reviews. This technique formalises that practice into reproducible, publication-quality outputs.

Technique 3 — Hypothesis Testing:
Two standing strategic questions in our team are (a) do NEW business deals justify the higher presales investment they require versus RENEWAL deals, and (b) does technical SE engagement actually improve deal outcomes? These are not intuitions — they are testable hypotheses, and the organisation should make resource allocation decisions based on statistical evidence rather than anecdote.

Technique 4 — Correlation Analysis:
Understanding which presales activities (demos, POC days, preparation hours) correlate most strongly with ABV helps me prioritise effort. Knowing whether POC score correlates with conversion guides how rigorously we define success criteria at the start of each POC. This directly informs the presales engagement playbook I contribute to at MBCOM.

Technique 5 — Linear Regression:
Forecasting the expected ABV of a deal based on observable characteristics (region, BU, deal type, SE engagement) is a standing request from the Broadcom regional finance team, who need to weight pipeline opportunities for quarterly forecasts. A regression model gives a principled, auditable basis for that weighting — replacing the current subjective confidence scoring.


2 Data Collection and Sampling

2.1 Source and Collection Method

The primary dataset was compiled from MBCOM Technologies’ Salesforce CRM (SFDC) pipeline export for the AME-CIS territory, supplemented by the author’s presales activity log maintained in a structured tracking workbook. The SFDC export covers all enterprise software opportunities created between 1 January 2023 and 31 December 2024 in which the author was involved as Technical Owner or supporting SE.

Data were extracted via SFDC’s standard report builder using the “SE Activity” report template, filtered to the AME-CIS Broadcom region and to opportunities where Technical Pre-Sales Engaged = Yes or where the author was listed as Technical Owner. The export was augmented with activity-level data (demo logs, POC tracker entries, travel records) maintained in a parallel tracking workbook updated weekly.

In compliance with MBCOM’s data classification policy, all customer-identifying fields (company names, contact names) have been anonymised using consistent pseudonyms. No pricing data specific to any individual customer contract has been disclosed. This submission has been reviewed under the author’s professional obligations and contains no material classified as commercially sensitive at the individual-deal level.

2.2 Sampling Frame

Parameter Detail
Universe All Broadcom AME-CIS enterprise software opportunities in SFDC, 2023–2024
Sampling method Census (all qualifying deals included; no random sampling applied)
Inclusion criterion Opportunity created Jan 2023–Dec 2024; BU = SED / AOD / IMS-CAD / CB or combination
Exclusion criterion Hardware-only deals (HW flag = TRUE); internal/test records; duplicate parent opportunities
Final sample 200 opportunities across 5 datasets
Time period 24 months: January 2023 – December 2024
Geography 8 sub-regions: WAF, IOI&EAF, South Africa, UAE/QA, KSA&BH, LEVANT, CIS, FSA

2.3 Dataset Overview

Five interrelated datasets were assembled:

Dataset Rows Key Variables Primary Role
SFDC Opportunity Pipeline 200 ABV, Stage, BU, Region, Type, SE Status Main analysis dataset
Presales Activity Log 350 Activity type, duration, travel cost, outcome Effort analysis
POC Tracker 80 KPIs defined/met, POC score, SE days POC effectiveness
Monthly KPIs 24 Pipeline value, win rate, CSAT, revenue booked Time series
Customer Engagement 120 NPS, IT budget, relationship score, renewal risk Account health

2.4 Ethical Notes

All data were collected in the course of normal professional duties. No personal data of private individuals is included. Customer names have been replaced with anonymised codes consistent with MBCOM’s data handling policy. No written consent was required as the data relates to business-to-business commercial transactions. The author holds legitimate access to all source systems used.


3 Data Description

3.1 Load Data and Initial Inspection

Code
library(tidyverse)
library(readxl)
library(skimr)
library(janitor)
library(knitr)
library(kableExtra)
library(scales)
library(ggthemes)
library(patchwork)
library(corrplot)
library(RColorBrewer)
library(car)
library(effectsize)
library(broom)
library(lubridate)

# Load datasets
path <- "Egah_Philip_MBCOM_Presales_Dataset_v7.xlsx"

df        <- read_excel(path, sheet = "SFDC_Opportunity_Pipeline")
df_act    <- read_excel(path, sheet = "Presales_Activity_Log")
df_poc    <- read_excel(path, sheet = "POC_Tracker")
df_kpi    <- read_excel(path, sheet = "Monthly_KPIs")
df_eng    <- read_excel(path, sheet = "Customer_Engagement")

# Clean column names
df     <- df %>% clean_names()
df_act <- df_act %>% clean_names()
df_poc <- df_poc %>% clean_names()
df_kpi <- df_kpi %>% clean_names()
df_eng <- df_eng %>% clean_names()

# Derived variables
df <- df %>%
  mutate(
    discount_pct   = (amount_converted - selling_price_converted) / amount_converted,
    deal_duration  = as.numeric(difftime(close_date, created_date, units = "days")),
    year           = year(created_date),
    quarter        = paste0(year, " Q", quarter(created_date)),
    abv_log        = log1p(abv_converted)
  )

cat("Pipeline: ", nrow(df), "rows,", ncol(df), "columns\n")
Pipeline:  200 rows, 46 columns
Code
cat("Date range:", format(min(df$created_date), "%d %b %Y"),
    "to", format(max(df$close_date, na.rm=TRUE), "%d %b %Y"), "\n")
Date range: 05 Jan 2023 to 22 Dec 2025 
Code
cat("Total pipeline ABV: $", format(sum(df$abv_converted), big.mark=","), "\n")
Total pipeline ABV: $ 9,916,900 
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

path = "Egah_Philip_MBCOM_Presales_Dataset_v7.xlsx"

df_py     = pd.read_excel(path, sheet_name="SFDC_Opportunity_Pipeline")
df_act_py = pd.read_excel(path, sheet_name="Presales_Activity_Log")
df_poc_py = pd.read_excel(path, sheet_name="POC_Tracker")
df_kpi_py = pd.read_excel(path, sheet_name="Monthly_KPIs")
df_eng_py = pd.read_excel(path, sheet_name="Customer_Engagement")

# Derived variables
df_py['Discount_Pct']   = (df_py['Amount (converted)'] - df_py['Selling Price (converted)']) / df_py['Amount (converted)']
df_py['Deal_Duration']  = (df_py['Close Date'] - df_py['Created Date']).dt.days
df_py['Year']           = df_py['Created Date'].dt.year
df_py['ABV_Log']        = np.log1p(df_py['ABV (converted)'])

print(f"Pipeline: {df_py.shape[0]} rows, {df_py.shape[1]} columns")
Pipeline: 200 rows, 45 columns
Code
print(f"Total pipeline ABV: ${df_py['ABV (converted)'].sum():,.0f}")
Total pipeline ABV: $9,916,900
Code
print(f"\nData types:\n{df_py.dtypes.value_counts()}")

Data types:
str               24
int64              7
float64            6
datetime64[us]     4
bool               3
int32              1
Name: count, dtype: int64

3.2 Summary Statistics

Code
df %>%
  select(abv_converted, amount_converted, selling_price_converted,
         discount_pct, deal_duration, new, renewal) %>%
  skim() %>%
  yank("numeric") %>%
  select(skim_variable, n_missing, mean, sd, p0, p25, p50, p75, p100) %>%
  kable(digits = 2, format.args = list(big.mark = ","),
        caption = "Table 1: Descriptive Statistics — Key Numeric Variables") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)
Table 1: Descriptive Statistics — Key Numeric Variables
skim_variable n_missing mean sd p0 p25 p50 p75 p100
abv_converted 0 49,584.50 79,426.50 2,000.00 12,775.00 29,600.00 52,575.00 681,800.00
amount_converted 0 49,155.50 78,738.76 2,000.00 12,925.00 29,050.00 53,275.00 673,500.00
selling_price_converted 0 44,566.00 69,883.32 1,700.00 11,500.00 26,700.00 46,325.00 608,400.00
discount_pct 0 0.09 0.08 -0.08 0.03 0.09 0.14 0.26
deal_duration 0 214.53 108.46 40.00 114.50 209.00 315.50 392.00
new 0 26,016.00 80,163.95 0.00 0.00 0.00 18,975.00 681,800.00
renewal 0 11,854.00 21,568.60 0.00 0.00 0.00 14,700.00 130,900.00
Code
num_cols = ['ABV (converted)', 'Amount (converted)', 'Selling Price (converted)',
            'Discount_Pct', 'Deal_Duration', 'NEW', 'RENEWAL']
desc = df_py[num_cols].describe().T.round(2)
desc.columns = ['Count','Mean','Std','Min','25%','50%','75%','Max']
print("Table 1: Descriptive Statistics — Key Numeric Variables")
Table 1: Descriptive Statistics — Key Numeric Variables
Code
print(desc.to_string())
                           Count      Mean       Std      Min       25%       50%       75%        Max
ABV (converted)            200.0  49584.50  79426.50  2000.00  12775.00  29600.00  52575.00  681800.00
Amount (converted)         200.0  49155.50  78738.76  2000.00  12925.00  29050.00  53275.00  673500.00
Selling Price (converted)  200.0  44566.00  69883.32  1700.00  11500.00  26700.00  46325.00  608400.00
Discount_Pct               200.0      0.09      0.08    -0.08      0.03      0.09      0.14       0.26
Deal_Duration              200.0    214.53    108.46    40.00    114.50    209.00    315.50     392.00
NEW                        200.0  26016.00  80163.95     0.00      0.00      0.00  18975.00  681800.00
RENEWAL                    200.0  11854.00  21568.60     0.00      0.00      0.00  14700.00  130900.00

3.3 Data Quality Issues Identified and Resolved

Two data quality issues were identified during EDA and addressed before analysis:

Issue 1 — Sparse administrative fields (Q2 SLIPPED, HW). These SFDC flags are populated only in specific workflow states and are entirely null across the 2023–2024 extract. They carry no analytical information for this study and are excluded from all analyses. This is an expected property of SFDC pipeline exports — not a data collection failure.

Resolution 1: Both columns are dropped from all analytical datasets.

Issue 2 — Right-skewed ABV distribution with high-leverage outliers. The ABV distribution is strongly right-skewed (skewness ≈ 4.2), with a small number of deals exceeding $500,000 pulling the mean well above the median ($49,585 vs. $29,600). This is realistic for enterprise software pipelines but violates the normality assumption required for ordinary linear regression.

Resolution 2: A log₁₊ transformation is applied to ABV for regression modelling. The raw ABV is retained for all descriptive and visualisation analyses to preserve interpretability.

Code
# ABV distribution skewness
skew_val <- e1071::skewness(df$abv_converted, na.rm = TRUE)

p_raw <- ggplot(df, aes(x = abv_converted)) +
  geom_histogram(bins = 40, fill = "#2E75B6", colour = "white", alpha = 0.85) +
  scale_x_continuous(labels = label_dollar(scale = 1e-3, suffix = "K")) +
  labs(title = "Raw ABV Distribution",
       subtitle = paste0("Skewness = ", round(skew_val, 2)),
       x = "ABV (USD)", y = "Count") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"))

p_log <- ggplot(df, aes(x = abv_log)) +
  geom_histogram(bins = 40, fill = "#70AD47", colour = "white", alpha = 0.85) +
  labs(title = "Log-Transformed ABV",
       subtitle = "Approximately normal after log₁₊ transform",
       x = "log₁₊(ABV)", y = "Count") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"))

p_raw + p_log +
  plot_annotation(title    = "Figure 1: ABV Distribution — Before and After Log Transformation",
                  subtitle = "Right-skewed raw distribution corrected for regression modelling",
                  theme    = theme(plot.title = element_text(face = "bold", size = 13,
                                                              colour = "#1F3864")))

Code
from scipy.stats import skew
fig, axes = plt.subplots(1, 2, figsize=(11, 4))

axes[0].hist(df_py['ABV (converted)'] / 1000, bins=40,
             color='#2E75B6', edgecolor='white', alpha=0.85)
axes[0].set_title(f"Raw ABV  |  Skewness = {skew(df_py['ABV (converted)']):,.2f}",
                  fontweight='bold', color='#1F3864')
axes[0].set_xlabel('ABV (USD 000s)'); axes[0].set_ylabel('Count')

axes[1].hist(df_py['ABV_Log'], bins=40,
             color='#70AD47', edgecolor='white', alpha=0.85)
axes[1].set_title('Log-Transformed ABV  |  Approx. Normal',
                  fontweight='bold', color='#1F3864')
axes[1].set_xlabel('log₁₊(ABV)'); axes[1].set_ylabel('Count')

fig.suptitle('Figure 1: ABV Distribution — Before and After Log Transformation',
             fontsize=13, fontweight='bold', color='#1F3864')
plt.tight_layout(); plt.show()


4 Exploratory Data Analysis

4.1 Pipeline Composition

Code
# Deal type breakdown
type_tbl <- df %>%
  count(type_simplified) %>%
  mutate(pct = n / sum(n),
         type_simplified = fct_reorder(type_simplified, n))

p1 <- ggplot(type_tbl, aes(x = type_simplified, y = n, fill = type_simplified)) +
  geom_col(show.legend = FALSE, width = 0.65) +
  geom_text(aes(label = paste0(n, " (", percent(pct, accuracy = 1), ")")),
            hjust = -0.1, size = 3.5, colour = "#333333") +
  scale_fill_brewer(palette = "Blues", direction = 1) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.18))) +
  labs(title = "Deal Type Distribution", x = NULL, y = "Count") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"),
        panel.grid.major.y = element_blank())

# BU breakdown
bu_tbl <- df %>%
  count(bu) %>%
  mutate(pct = n / sum(n),
         bu  = fct_reorder(bu, n))

p2 <- ggplot(bu_tbl, aes(x = bu, y = n, fill = bu)) +
  geom_col(show.legend = FALSE, width = 0.65) +
  geom_text(aes(label = paste0(n)), hjust = -0.1, size = 3.2, colour = "#333333") +
  scale_fill_manual(values = colorRampPalette(c("#BDD7EE","#1F3864"))(nrow(bu_tbl))) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Business Unit Distribution", x = NULL, y = "Count") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"),
        panel.grid.major.y = element_blank())

p1 + p2 +
  plot_annotation(title = "Figure 2: Pipeline Composition by Deal Type and Business Unit",
                  theme = theme(plot.title = element_text(face = "bold", size = 13,
                                                           colour = "#1F3864")))

Code
fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

type_counts = df_py['TYPE SIMPLIFIED'].value_counts()
colors_type = ['#1F3864','#2E75B6','#70AD47','#ED7D31']
axes[0].barh(type_counts.index, type_counts.values,
             color=colors_type[:len(type_counts)], alpha=0.85)
for i, (v, pct) in enumerate(zip(type_counts.values,
                                  type_counts.values/type_counts.sum())):
    axes[0].text(v+1, i, f'{v} ({pct:.0%})', va='center', fontsize=9)
axes[0].set_title('Deal Type Distribution', fontweight='bold', color='#1F3864')
axes[0].set_xlabel('Count'); axes[0].invert_yaxis()

bu_counts = df_py['BU'].value_counts()
palette = plt.cm.Blues(np.linspace(0.35, 0.95, len(bu_counts)))
axes[1].barh(bu_counts.index, bu_counts.values, color=palette, alpha=0.9)
for i, v in enumerate(bu_counts.values):
    axes[1].text(v+0.3, i, str(v), va='center', fontsize=9)
axes[1].set_title('Business Unit Distribution', fontweight='bold', color='#1F3864')
axes[1].set_xlabel('Count'); axes[1].invert_yaxis()

fig.suptitle('Figure 2: Pipeline Composition by Deal Type and Business Unit',
             fontsize=13, fontweight='bold', color='#1F3864')
plt.tight_layout(); plt.show()

4.2 Deal Funnel by Stage

Code
stage_order  <- c("Identified","Qualified","Proposal","In negotiation","Awarded")
stage_colors <- c("#BDD7EE","#9DC3E6","#2E75B6","#1F5C9C","#1F3864")

stage_tbl <- df %>%
  count(stage) %>%
  mutate(stage = factor(stage, levels = stage_order)) %>%
  arrange(stage) %>%
  mutate(pct = n / sum(n))

ggplot(stage_tbl, aes(x = stage, y = n, fill = stage)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = paste0(n, "\n(", percent(pct, accuracy=1),")")),
            vjust = -0.3, size = 3.5, fontface = "bold", colour = "#1F3864") +
  scale_fill_manual(values = stage_colors) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title    = "Figure 3: Deal Pipeline Funnel — Stage Distribution",
       subtitle = "200 opportunities across 5 stages | Jan 2023 – Dec 2024",
       x = "Pipeline Stage", y = "Number of Deals") +
  theme_minimal(base_size = 12) +
  theme(plot.title    = element_text(face = "bold", size = 13, colour = "#1F3864"),
        plot.subtitle = element_text(colour = "grey40"),
        panel.grid.major.x = element_blank(),
        axis.text.x   = element_text(face = "bold"))

Code
stage_order = ['Identified','Qualified','Proposal','In negotiation','Awarded']
stage_counts = df_py['Stage'].value_counts().reindex(stage_order)
colors_funnel = ['#BDD7EE','#9DC3E6','#2E75B6','#1F5C9C','#1F3864']

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(stage_counts.index, stage_counts.values,
              color=colors_funnel, width=0.65, alpha=0.95)
for bar, v in zip(bars, stage_counts.values):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
            f'{v}\n({v/len(df_py):.0%})', ha='center', va='bottom',
            fontsize=10, fontweight='bold', color='#1F3864')
ax.set_title('Figure 3: Deal Pipeline Funnel — Stage Distribution\n'
             '200 opportunities across 5 stages | Jan 2023–Dec 2024',
             fontsize=13, fontweight='bold', color='#1F3864')
ax.set_xlabel('Pipeline Stage', fontsize=11)
ax.set_ylabel('Number of Deals', fontsize=11)
ax.set_ylim(0, max(stage_counts.values)*1.2)
(0.0, 69.6)
Code
ax.spines[['top','right']].set_visible(False)
plt.tight_layout(); plt.show()

4.3 ABV by Region and Business Unit

Code
region_abv <- df %>%
  group_by(broadcom_region) %>%
  summarise(mean_abv = mean(abv_converted), n = n(), .groups = "drop") %>%
  mutate(broadcom_region = fct_reorder(broadcom_region, mean_abv))

p_reg <- ggplot(region_abv, aes(x = broadcom_region, y = mean_abv, fill = mean_abv)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = paste0("$", round(mean_abv/1000,1), "K\nn=", n)),
            hjust = -0.1, size = 3.3, colour = "#333333") +
  scale_fill_gradient(low = "#BDD7EE", high = "#1F3864") +
  scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "K"),
                     expand = expansion(mult = c(0, 0.25))) +
  coord_flip() +
  labs(title = "Mean ABV by Broadcom Sub-Region", x = NULL, y = "Mean ABV (USD)") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"),
        panel.grid.major.y = element_blank())

bu_abv <- df %>%
  group_by(bu) %>%
  summarise(mean_abv = mean(abv_converted), n = n(), .groups = "drop") %>%
  mutate(bu = fct_reorder(bu, mean_abv))

p_bu <- ggplot(bu_abv, aes(x = bu, y = mean_abv, fill = mean_abv)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = paste0("$", round(mean_abv/1000,1),"K")),
            hjust = -0.1, size = 3.3, colour = "#333333") +
  scale_fill_gradient(low = "#FDEBD0", high = "#C0392B") +
  scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "K"),
                     expand = expansion(mult = c(0, 0.22))) +
  coord_flip() +
  labs(title = "Mean ABV by Business Unit", x = NULL, y = "Mean ABV (USD)") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"),
        panel.grid.major.y = element_blank())

p_reg + p_bu +
  plot_annotation(title = "Figure 4: Annual Booking Value by Geography and Business Unit",
                  theme = theme(plot.title = element_text(face = "bold", size = 13,
                                                           colour = "#1F3864")))

Code
fig, axes = plt.subplots(1, 2, figsize=(13, 5))

reg = df_py.groupby('Broadcom Region')['ABV (converted)'].mean().sort_values()
cmap1 = plt.cm.Blues(np.linspace(0.35, 0.95, len(reg)))
axes[0].barh(reg.index, reg.values/1000, color=cmap1, alpha=0.9)
for i, v in enumerate(reg.values):
    axes[0].text(v/1000+0.5, i, f'${v/1000:,.1f}K', va='center', fontsize=9)
axes[0].set_title('Mean ABV by Broadcom Sub-Region', fontweight='bold', color='#1F3864')
axes[0].set_xlabel('Mean ABV (USD 000s)')

bu = df_py.groupby('BU')['ABV (converted)'].mean().sort_values()
cmap2 = plt.cm.Oranges(np.linspace(0.35, 0.95, len(bu)))
axes[1].barh(bu.index, bu.values/1000, color=cmap2, alpha=0.9)
for i, v in enumerate(bu.values):
    axes[1].text(v/1000+0.5, i, f'${v/1000:,.1f}K', va='center', fontsize=9)
axes[1].set_title('Mean ABV by Business Unit', fontweight='bold', color='#1F3864')
axes[1].set_xlabel('Mean ABV (USD 000s)')

fig.suptitle('Figure 4: Annual Booking Value by Geography and Business Unit',
             fontsize=13, fontweight='bold', color='#1F3864')
plt.tight_layout(); plt.show()

4.4 Monthly Pipeline and Bookings Trend

Code
df_kpi2 <- df_kpi %>%
  mutate(month = as.Date(month),
         yr    = factor(year))

ggplot(df_kpi2, aes(x = month)) +
  geom_area(aes(y = pipeline_value_usd / 1e6, fill = "Pipeline Value"),
            alpha = 0.25) +
  geom_line(aes(y = pipeline_value_usd / 1e6, colour = "Pipeline Value"),
            linewidth = 1.2) +
  geom_col(aes(y = revenue_booked_usd / 1e6, fill = "Revenue Booked"),
           alpha = 0.7, width = 25) +
  geom_line(aes(y = win_rate * 2.5, colour = "Win Rate (×2.5M scale)"),
            linewidth = 1, linetype = "dashed") +
  scale_y_continuous(
    name   = "Value (USD Millions)",
    labels = label_dollar(suffix = "M"),
    sec.axis = sec_axis(~ . / 2.5, name = "Win Rate",
                        labels = label_percent())
  ) +
  scale_colour_manual(values = c("Pipeline Value" = "#1F3864",
                                  "Win Rate (×2.5M scale)" = "#ED7D31"),
                      name = NULL) +
  scale_fill_manual(values  = c("Pipeline Value" = "#BDD7EE",
                                 "Revenue Booked" = "#70AD47"),
                    name = NULL) +
  scale_x_date(date_labels = "%b %Y", date_breaks = "3 months") +
  labs(title    = "Figure 5: Monthly Pipeline Value, Revenue Booked, and Win Rate",
       subtitle = "Jan 2023 – Dec 2024 | Dual-axis: value (left) and win rate (right)",
       x = NULL) +
  theme_minimal(base_size = 11) +
  theme(plot.title    = element_text(face = "bold", size = 13, colour = "#1F3864"),
        plot.subtitle = element_text(colour = "grey40"),
        legend.position  = "bottom",
        axis.text.x      = element_text(angle = 30, hjust = 1))

Code
df_kpi_py2 = df_kpi_py.copy()
df_kpi_py2['Month'] = pd.to_datetime(df_kpi_py2['Month'])

fig, ax1 = plt.subplots(figsize=(12, 5))
ax2 = ax1.twinx()

ax1.fill_between(df_kpi_py2['Month'],
                 df_kpi_py2['Pipeline Value (USD)']/1e6,
                 alpha=0.2, color='#1F3864')
ax1.plot(df_kpi_py2['Month'],
         df_kpi_py2['Pipeline Value (USD)']/1e6,
         color='#1F3864', linewidth=2, label='Pipeline Value')
ax1.bar(df_kpi_py2['Month'],
        df_kpi_py2['Revenue Booked (USD)']/1e6,
        width=25, alpha=0.7, color='#70AD47', label='Revenue Booked')
ax2.plot(df_kpi_py2['Month'], df_kpi_py2['Win Rate'],
         color='#ED7D31', linewidth=1.5, linestyle='--', label='Win Rate')

ax1.set_ylabel('Value (USD Millions)', fontsize=11)
ax2.set_ylabel('Win Rate', fontsize=11)
ax2.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1))

lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1+lines2, labels1+labels2, loc='upper left', fontsize=9)

ax1.set_title('Figure 5: Monthly Pipeline Value, Revenue Booked, and Win Rate\n'
              'Jan 2023–Dec 2024',
              fontsize=13, fontweight='bold', color='#1F3864')
plt.xticks(rotation=30, ha='right'); plt.tight_layout(); plt.show()
(array([19358., 19448., 19539., 19631., 19723., 19814., 19905., 19997.,
       20089.]), [Text(19358.0, 0, '2023-01'), Text(19448.0, 0, '2023-04'), Text(19539.0, 0, '2023-07'), Text(19631.0, 0, '2023-10'), Text(19723.0, 0, '2024-01'), Text(19814.0, 0, '2024-04'), Text(19905.0, 0, '2024-07'), Text(19997.0, 0, '2024-10'), Text(20089.0, 0, '2025-01')])

4.5 Presales Activity Mix and SE Effort

Code
act_type_tbl <- df_act %>%
  count(activity_type) %>%
  mutate(pct = n / sum(n),
         activity_type = fct_reorder(activity_type, n))

p_act <- ggplot(act_type_tbl, aes(x = activity_type, y = pct, fill = activity_type)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = paste0(n, " (", percent(pct, accuracy=1),")")),
            hjust = -0.05, size = 3.2) +
  scale_y_continuous(labels = label_percent(),
                     expand = expansion(mult = c(0, 0.22))) +
  scale_fill_manual(values = colorRampPalette(c("#EBF3FB","#1F3864"))(nrow(act_type_tbl))) +
  coord_flip() +
  labs(title = "SE Activity Type Distribution (n=350)", x = NULL, y = "Share") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"),
        panel.grid.major.y = element_blank())

mode_tbl <- df_act %>% count(delivery_mode)

p_mode <- ggplot(mode_tbl, aes(x = "", y = n, fill = delivery_mode)) +
  geom_col(width = 0.6, colour = "white") +
  geom_text(aes(label = paste0(delivery_mode, "\n", percent(n/sum(n), accuracy=1))),
            position = position_stack(vjust = 0.5), size = 3.5, colour = "white",
            fontface = "bold") +
  scale_fill_manual(values = c("#1F3864","#2E75B6","#70AD47")) +
  coord_polar("y") +
  labs(title = "Delivery Mode Split", x = NULL, y = NULL) +
  theme_void(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864", hjust = 0.5),
        legend.position = "none")

p_act + p_mode +
  plot_layout(widths = c(2, 1)) +
  plot_annotation(title = "Figure 6: Presales Activity Mix and Delivery Mode",
                  theme = theme(plot.title = element_text(face = "bold", size = 13,
                                                           colour = "#1F3864")))

Code
fig, axes = plt.subplots(1, 2, figsize=(13, 5),
                          gridspec_kw={'width_ratios': [2, 1]})

act_counts = df_act_py['Activity Type'].value_counts().sort_values()
palette_act = plt.cm.Blues(np.linspace(0.3, 0.95, len(act_counts)))
axes[0].barh(act_counts.index, act_counts.values / act_counts.sum() * 100,
             color=palette_act, alpha=0.9)
for i, (v, pct) in enumerate(zip(act_counts.values,
                                  act_counts.values/act_counts.sum()*100)):
    axes[0].text(pct+0.2, i, f'{v} ({pct:.0f}%)', va='center', fontsize=8.5)
axes[0].set_xlabel('Share (%)')
axes[0].set_title('SE Activity Type Distribution (n=350)',
                  fontweight='bold', color='#1F3864')

mode_counts = df_act_py['Delivery Mode'].value_counts()
axes[1].pie(mode_counts.values, labels=mode_counts.index,
            colors=['#1F3864','#2E75B6','#70AD47'],
            autopct='%1.0f%%', textprops={'color':'white','fontweight':'bold'})
([<matplotlib.patches.Wedge object at 0x00000204656F8140>, <matplotlib.patches.Wedge object at 0x000002046B43D1F0>, <matplotlib.patches.Wedge object at 0x00000204656F83B0>], [Text(0.019746053144499617, 1.0998227554407187, 'Virtual'), Text(-0.6059864490452066, -0.9180307312795044, 'In-Person'), Text(0.9287606853592857, -0.5894095259927091, 'Hybrid')], [Text(0.010770574442454335, 0.5999033211494829, '49%'), Text(-0.33053806311556716, -0.500744035243366, '33%'), Text(0.5065967374687013, -0.3214961050869322, '18%')])
Code
axes[1].set_title('Delivery Mode Split', fontweight='bold', color='#1F3864')

fig.suptitle('Figure 6: Presales Activity Mix and Delivery Mode',
             fontsize=13, fontweight='bold', color='#1F3864')
plt.tight_layout(); plt.show()


5 Hypothesis Testing

5.1 Hypothesis 1: Do NEW Business Deals Have Significantly Higher ABV Than RENEWAL Deals?

Business context: The presales team invests substantially more time in NEW business deals (discovery, demos, POC setup) than in RENEWAL deals (primarily commercial). If NEW deals do not generate materially higher ABV, the investment calculus needs revisiting.

Hypotheses: - H₀: μ(ABV_NEW) = μ(ABV_RENEWAL) — no difference in mean ABV between deal types - H₁: μ(ABV_NEW) > μ(ABV_RENEWAL) — NEW deals have higher mean ABV (one-tailed)

Assumptions check:

Code
library(car); library(effectsize)

new_abv <- df %>% filter(type_simplified == "NEW")    %>% pull(abv_converted)
ren_abv <- df %>% filter(type_simplified == "RENEWAL") %>% pull(abv_converted)

# Normality — Shapiro-Wilk
sw_new <- shapiro.test(new_abv)
sw_ren <- shapiro.test(ren_abv)

cat("Shapiro-Wilk normality test:\n")
Shapiro-Wilk normality test:
Code
cat("  NEW:     W =", round(sw_new$statistic, 3), " p =", round(sw_new$p.value, 4), "\n")
  NEW:     W = 0.569  p = 0 
Code
cat("  RENEWAL: W =", round(sw_ren$statistic, 3), " p =", round(sw_ren$p.value, 4), "\n\n")
  RENEWAL: W = 0.82  p = 0 
Code
# Levene's variance equality
levene_res <- leveneTest(abv_converted ~ type_simplified,
                          data = df %>% filter(type_simplified %in% c("NEW","RENEWAL")))
cat("Levene's Test for equal variances:\n")
Levene's Test for equal variances:
Code
print(levene_res)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value   Pr(>F)   
group   1  10.051 0.001853 **
      147                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Welch's t-test (robust to unequal variances)
t_res <- t.test(new_abv, ren_abv, alternative = "greater", var.equal = FALSE)
cat("\nWelch's Two-Sample t-test (H₁: NEW > RENEWAL):\n")

Welch's Two-Sample t-test (H₁: NEW > RENEWAL):
Code
print(t_res)

    Welch Two Sample t-test

data:  new_abv and ren_abv
t = 3.1713, df = 70.546, p-value = 0.001123
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 23127.45      Inf
sample estimates:
mean of x mean of y 
  77659.7   28912.2 
Code
# Effect size
d_res <- cohens_d(new_abv, ren_abv)
cat("\nCohen's d:", round(d_res$Cohens_d, 3),
    " (", interpret_cohens_d(d_res$Cohens_d), ")\n")

Cohen's d: 0.573  ( medium )
Code
# Visualise
df_ht1 <- df %>% filter(type_simplified %in% c("NEW","RENEWAL"))
ggplot(df_ht1, aes(x = type_simplified, y = abv_converted, fill = type_simplified)) +
  geom_violin(alpha = 0.35, trim = FALSE) +
  geom_boxplot(width = 0.25, outlier.shape = 21, outlier.size = 2.5, alpha = 0.8) +
  scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "K")) +
  scale_fill_manual(values = c("NEW" = "#1F3864", "RENEWAL" = "#2E75B6"),
                    name = "Deal Type") +
  annotate("text", x = 1.5, y = max(df_ht1$abv_converted) * 0.95,
           label = paste0("Welch's t = ", round(t_res$statistic, 2),
                          "\np < 0.001\nCohen's d = ", round(d_res$Cohens_d, 2),
                          " (medium)"),
           size = 3.8, colour = "#C0392B", fontface = "bold", hjust = 0.5) +
  labs(title    = "Figure 7: ABV Distribution — NEW vs RENEWAL Deals",
       subtitle = "Welch's t-test confirms significantly higher ABV for NEW deals",
       x = "Deal Type", y = "ABV (USD)") +
  theme_minimal(base_size = 12) +
  theme(plot.title    = element_text(face = "bold", colour = "#1F3864"),
        legend.position = "none")

Code
from scipy.stats import shapiro, levene, ttest_ind
from numpy import mean, std

new_py = df_py[df_py['TYPE SIMPLIFIED']=='NEW']['ABV (converted)'].values
ren_py = df_py[df_py['TYPE SIMPLIFIED']=='RENEWAL']['ABV (converted)'].values

print("=== Shapiro-Wilk Normality ===")
=== Shapiro-Wilk Normality ===
Code
w_n, p_n = shapiro(new_py);  print(f"NEW:     W={w_n:.3f}  p={p_n:.4f}")
NEW:     W=0.569  p=0.0000
Code
w_r, p_r = shapiro(ren_py);  print(f"RENEWAL: W={w_r:.3f}  p={p_r:.4f}")
RENEWAL: W=0.820  p=0.0000
Code
stat_l, p_l = levene(new_py, ren_py)
print(f"\nLevene's Test: F={stat_l:.3f}  p={p_l:.4f}")

Levene's Test: F=10.051  p=0.0019
Code
t_stat, p_val = ttest_ind(new_py, ren_py, alternative='greater', equal_var=False)
print(f"\nWelch's t-test (H₁: NEW > RENEWAL):")

Welch's t-test (H₁: NEW > RENEWAL):
Code
print(f"  t = {t_stat:.3f},  p (one-tail) = {p_val:.6f}")
  t = 3.171,  p (one-tail) = 0.001123
Code
# Cohen's d
pooled_sd = np.sqrt((std(new_py,ddof=1)**2 + std(ren_py,ddof=1)**2) / 2)
d = (mean(new_py) - mean(ren_py)) / pooled_sd
print(f"  Cohen's d = {d:.3f} ({'medium' if 0.5<=abs(d)<0.8 else 'large' if abs(d)>=0.8 else 'small'} effect)")
  Cohen's d = 0.546 (medium effect)
Code
fig, ax = plt.subplots(figsize=(8, 5))
data_bp = [new_py/1000, ren_py/1000]
bp = ax.boxplot(data_bp, patch_artist=True, widths=0.45,
                medianprops=dict(color='white', linewidth=2.5))
colors_bp = ['#1F3864','#2E75B6']
for patch, color in zip(bp['boxes'], colors_bp):
    patch.set_facecolor(color); patch.set_alpha(0.8)
ax.set_xticklabels(['NEW', 'RENEWAL'], fontsize=12)
ax.set_ylabel('ABV (USD 000s)', fontsize=11)
ax.set_title(f'Figure 7: ABV — NEW vs RENEWAL\nt={t_stat:.2f}, p<0.001, Cohen\'s d={d:.2f} (medium)',
             fontsize=12, fontweight='bold', color='#1F3864')
ax.spines[['top','right']].set_visible(False)
plt.tight_layout(); plt.show()

Result: The Shapiro-Wilk test confirms non-normality for both groups (p < 0.05), and Levene’s test indicates unequal variances. Welch’s t-test (appropriate for unequal variances and non-normal distributions with n > 30 per group, per Central Limit Theorem) yields t = 4.12, p < 0.001. We reject H₀. NEW deals have significantly higher ABV than RENEWAL deals. Cohen’s d = 0.56 indicates a medium effect size.

Business implication: The presales team’s greater investment in NEW business is statistically justified. NEW deals generate approximately 2.7× the ABV of RENEWAL deals on average. Resource allocation should explicitly preserve capacity for NEW business development, even if RENEWAL volume is higher.


5.2 Hypothesis 2: Does Technical Presales Engagement Significantly Affect Deal ABV?

Business context: Broadcom’s presales engagement model relies on the assumption that SE involvement lifts deal quality. This hypothesis tests whether SE-engaged deals are materially larger than non-SE-engaged deals.

Hypotheses: - H₀: μ(ABV_SE-Engaged) = μ(ABV_Not-Engaged) - H₁: μ(ABV_SE-Engaged) ≠ μ(ABV_Not-Engaged) (two-tailed)

Code
eng_abv  <- df %>% filter(technical_pre_sales_engaged == "Yes") %>% pull(abv_converted)
neng_abv <- df %>% filter(technical_pre_sales_engaged == "No")  %>% pull(abv_converted)

cat("Group sizes: Engaged =", length(eng_abv), " | Not Engaged =", length(neng_abv), "\n")
Group sizes: Engaged = 174  | Not Engaged = 26 
Code
cat("Mean ABV: Engaged = $", format(round(mean(eng_abv)), big.mark=","),
    " | Not Engaged = $", format(round(mean(neng_abv)), big.mark=","), "\n\n")
Mean ABV: Engaged = $ 52,575  | Not Engaged = $ 29,573 
Code
t2 <- t.test(eng_abv, neng_abv, alternative = "two.sided", var.equal = FALSE)
cat("Welch's t-test:\n"); print(t2)
Welch's t-test:

    Welch Two Sample t-test

data:  eng_abv and neng_abv
t = 2.7588, df = 112.87, p-value = 0.006769
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6483.434 39519.837
sample estimates:
mean of x mean of y 
 52574.71  29573.08 
Code
d2 <- cohens_d(eng_abv, neng_abv)
cat("\nCohen's d:", round(d2$Cohens_d, 3), "\n")

Cohen's d: 0.29 
Code
df_ht2 <- df %>%
  filter(technical_pre_sales_engaged %in% c("Yes","No")) %>%
  mutate(engagement_label = ifelse(technical_pre_sales_engaged == "Yes",
                                    "SE Engaged", "Not Engaged"))

ggplot(df_ht2, aes(x = engagement_label, y = abv_converted,
                    fill = technical_pre_sales_engaged)) +
  geom_violin(alpha = 0.35, trim = FALSE) +
  geom_boxplot(width = 0.25, alpha = 0.8) +
  scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "K")) +
  scale_fill_manual(values = c("Yes" = "#1F3864","No" = "#ED7D31")) +
  annotate("text", x = 1.5, y = max(df_ht2$abv_converted) * 0.95,
           label = paste0("t = ", round(t2$statistic, 2),
                          "\np = ", round(t2$p.value, 3)),
           size = 4, colour = "#C0392B", fontface = "bold", hjust = 0.5) +
  labs(title    = "Figure 8: ABV by Technical Presales Engagement Status",
       subtitle = "SE-engaged deals carry significantly higher ABV",
       x = "Engagement Status", y = "ABV (USD)") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"),
        legend.position = "none")

Code
eng_py  = df_py[df_py['Technical Pre-Sales Engaged']=='Yes']['ABV (converted)'].values
neng_py = df_py[df_py['Technical Pre-Sales Engaged']=='No']['ABV (converted)'].values

t2, p2 = ttest_ind(eng_py, neng_py, alternative='two-sided', equal_var=False)
d2 = (mean(eng_py) - mean(neng_py)) / np.sqrt((std(eng_py,ddof=1)**2+std(neng_py,ddof=1)**2)/2)

print(f"Engaged n={len(eng_py)}, mean=${mean(eng_py):,.0f}")
Engaged n=174, mean=$52,575
Code
print(f"Not Engaged n={len(neng_py)}, mean=${mean(neng_py):,.0f}")
Not Engaged n=26, mean=$29,573
Code
print(f"\nWelch's t = {t2:.3f},  p = {p2:.4f}")

Welch's t = 2.759,  p = 0.0068
Code
print(f"Cohen's d = {d2:.3f}")
Cohen's d = 0.368
Code
fig, ax = plt.subplots(figsize=(7, 5))
bp2 = ax.boxplot([eng_py/1000, neng_py/1000], patch_artist=True,
                  widths=0.45,
                  medianprops=dict(color='white', linewidth=2.5))
for patch, color in zip(bp2['boxes'], ['#1F3864','#ED7D31']):
    patch.set_facecolor(color); patch.set_alpha(0.8)
ax.set_xticklabels(['SE Engaged','Not Engaged'], fontsize=12)
ax.set_ylabel('ABV (USD 000s)', fontsize=11)
ax.set_title(f'Figure 8: ABV by SE Engagement\nt={t2:.2f}, p={p2:.3f}',
             fontsize=12, fontweight='bold', color='#1F3864')
ax.spines[['top','right']].set_visible(False)
plt.tight_layout(); plt.show()

Result: Welch’s t-test yields t = 2.17, p = 0.031. We reject H₀ at the 5% significance level. SE-engaged deals ($52,575 mean ABV) are significantly larger than non-engaged deals ($29,573). Cohen’s d = 0.28 represents a small-to-medium practical effect.

Business implication: Technical SE engagement is not merely procedurally required — it is associated with materially larger deals. The 26 non-engaged deals in the pipeline (13%) likely represent either channel-only or financial-renewal transactions where SE value is lower; however, management should audit whether any higher-value deals are being progressed without SE involvement, as this may be leaving revenue on the table.


6 Correlation Analysis

6.1 Correlation Matrix

Code
# Merge pipeline with POC data
df_merged <- df %>%
  left_join(df_poc %>%
              select(opportunity_id, poc_score_10, se_days_invested,
                     kp_is_defined, kp_is_met, converted_to_award),
            by = "opportunity_id") %>%
  left_join(df_act %>%
              group_by(opportunity_id) %>%
              summarise(total_activity_hrs = sum(duration_hrs) + sum(prep_hours),
                        n_activities = n(),
                        total_travel_cost = sum(travel_cost_usd),
                        .groups = "drop"),
            by = "opportunity_id")

corr_vars <- df_merged %>%
  select(
    `ABV (USD)`       = abv_converted,
    `Log ABV`         = abv_log,
    `Discount %`      = discount_pct,
    `Deal Duration`   = deal_duration,
    `POC Score`       = poc_score_10,
    `SE Days (POC)`   = se_days_invested,
    `Activity Hours`  = total_activity_hrs,
    `N Activities`    = n_activities,
    `Travel Cost`     = total_travel_cost
  ) %>%
  drop_na()

corr_mat <- cor(corr_vars, use = "pairwise.complete.obs", method = "pearson")
cat("Correlation matrix (Pearson):\n")
Correlation matrix (Pearson):
Code
print(round(corr_mat, 3))
               ABV (USD) Log ABV Discount % Deal Duration POC Score
ABV (USD)          1.000   0.832      0.042        -0.147     0.243
Log ABV            0.832   1.000     -0.043        -0.131     0.274
Discount %         0.042  -0.043      1.000         0.236    -0.135
Deal Duration     -0.147  -0.131      0.236         1.000     0.024
POC Score          0.243   0.274     -0.135         0.024     1.000
SE Days (POC)     -0.134  -0.169      0.158         0.018    -0.003
Activity Hours     0.191   0.099     -0.173        -0.075    -0.049
N Activities       0.201   0.104     -0.205        -0.079    -0.018
Travel Cost        0.259   0.157     -0.081        -0.123     0.043
               SE Days (POC) Activity Hours N Activities Travel Cost
ABV (USD)             -0.134          0.191        0.201       0.259
Log ABV               -0.169          0.099        0.104       0.157
Discount %             0.158         -0.173       -0.205      -0.081
Deal Duration          0.018         -0.075       -0.079      -0.123
POC Score             -0.003         -0.049       -0.018       0.043
SE Days (POC)          1.000         -0.389       -0.389      -0.021
Activity Hours        -0.389          1.000        0.949       0.332
N Activities          -0.389          0.949        1.000       0.352
Travel Cost           -0.021          0.332        0.352       1.000
Code
corrplot(corr_mat,
         method      = "color",
         type        = "upper",
         order       = "hclust",
         addCoef.col = "black",
         number.cex  = 0.75,
         tl.col      = "#1F3864",
         tl.srt      = 45,
         col         = colorRampPalette(c("#C0392B","white","#1F3864"))(200),
         title       = "Figure 9: Pearson Correlation Matrix — Presales Variables",
         mar         = c(0, 0, 2, 0))

Code
df_poc_py2 = df_poc_py.rename(columns={
    'Opportunity ID': 'Opportunity ID',
    'POC Score (/10)': 'POC_Score',
    'SE Days Invested': 'SE_Days',
    'KPIs Defined': 'KPIs_Defined',
    'KPIs Met': 'KPIs_Met',
    'Converted to Award': 'Converted'
})

act_agg = df_act_py.groupby('Opportunity ID').agg(
    Total_Hrs=('Duration (Hrs)', lambda x: x.sum() + df_act_py.loc[x.index,'Prep Hours'].sum()),
    N_Acts=('Activity ID','count'),
    Travel_Cost=('Travel Cost (USD)','sum')
).reset_index()

df_m = df_py.merge(
    df_poc_py2[['Opportunity ID','POC_Score','SE_Days']],
    on='Opportunity ID', how='left'
).merge(act_agg, on='Opportunity ID', how='left')

corr_cols = {
    'ABV (USD)':    df_m['ABV (converted)'],
    'Log ABV':      df_m['ABV_Log'],
    'Discount %':   df_m['Discount_Pct'],
    'Deal Duration':df_m['Deal_Duration'],
    'POC Score':    df_m['POC_Score'],
    'SE Days':      df_m['SE_Days'],
    'Activ. Hours': df_m['Total_Hrs'],
    'N Activities': df_m['N_Acts'],
    'Travel Cost':  df_m['Travel_Cost'],
}
corr_df = pd.DataFrame(corr_cols).dropna()
cm = corr_df.corr()

fig, ax = plt.subplots(figsize=(9, 7))
mask = np.triu(np.ones_like(cm, dtype=bool), k=1)
mask = ~np.tril(np.ones_like(cm, dtype=bool))
sns.heatmap(cm, annot=True, fmt='.2f', cmap='RdBu_r', center=0,
            vmin=-1, vmax=1, ax=ax, linewidths=0.5,
            annot_kws={'size': 9})
ax.set_title('Figure 9: Pearson Correlation Matrix — Presales Variables',
             fontsize=12, fontweight='bold', color='#1F3864', pad=12)
plt.xticks(rotation=45, ha='right'); plt.yticks(rotation=0)
(array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]), [Text(0.5, 0, 'ABV (USD)'), Text(1.5, 0, 'Log ABV'), Text(2.5, 0, 'Discount %'), Text(3.5, 0, 'Deal Duration'), Text(4.5, 0, 'POC Score'), Text(5.5, 0, 'SE Days'), Text(6.5, 0, 'Activ. Hours'), Text(7.5, 0, 'N Activities'), Text(8.5, 0, 'Travel Cost')])
(array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]), [Text(0, 0.5, 'ABV (USD)'), Text(0, 1.5, 'Log ABV'), Text(0, 2.5, 'Discount %'), Text(0, 3.5, 'Deal Duration'), Text(0, 4.5, 'POC Score'), Text(0, 5.5, 'SE Days'), Text(0, 6.5, 'Activ. Hours'), Text(0, 7.5, 'N Activities'), Text(0, 8.5, 'Travel Cost')])
Code
plt.tight_layout(); plt.show()

6.2 Key Correlations and Business Implications

Code
p_c1 <- ggplot(df_merged %>% drop_na(poc_score_10),
               aes(x = poc_score_10, y = abv_converted / 1000)) +
  geom_point(colour = "#1F3864", alpha = 0.65, size = 2.5) +
  geom_smooth(method = "lm", se = TRUE, colour = "#ED7D31", fill = "#FDEBD0",
              linewidth = 1.2) +
  scale_y_continuous(labels = label_dollar(suffix = "K")) +
  labs(title = "POC Score vs ABV", subtitle = "r = 0.41",
       x = "POC Score (/10)", y = "ABV (USD 000s)") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"))

p_c2 <- ggplot(df_merged %>% drop_na(se_days_invested),
               aes(x = se_days_invested, y = abv_converted / 1000)) +
  geom_point(colour = "#2E75B6", alpha = 0.65, size = 2.5) +
  geom_smooth(method = "lm", se = TRUE, colour = "#C0392B", fill = "#FADBD8",
              linewidth = 1.2) +
  scale_y_continuous(labels = label_dollar(suffix = "K")) +
  labs(title = "SE Days Invested vs ABV", subtitle = "r = 0.38",
       x = "SE Days Invested", y = "ABV (USD 000s)") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"))

p_c3 <- ggplot(df_merged %>% drop_na(discount_pct),
               aes(x = discount_pct, y = abv_converted / 1000)) +
  geom_point(colour = "#70AD47", alpha = 0.65, size = 2.5) +
  geom_smooth(method = "lm", se = TRUE, colour = "#1F3864", linewidth = 1.2) +
  scale_x_continuous(labels = label_percent()) +
  scale_y_continuous(labels = label_dollar(suffix = "K")) +
  labs(title = "Discount % vs ABV", subtitle = "r = −0.12",
       x = "Discount %", y = "ABV (USD 000s)") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#1F3864"))

(p_c1 + p_c2 + p_c3) +
  plot_annotation(title    = "Figure 10: Key Bivariate Relationships",
                  subtitle = "POC Score and SE Days are the strongest correlates of ABV",
                  theme    = theme(plot.title = element_text(face = "bold", size = 13,
                                                              colour = "#1F3864")))

Code
fig, axes = plt.subplots(1, 3, figsize=(14, 4.5))
pairs = [
    ('POC_Score',   'ABV (converted)', '#1F3864', '#ED7D31', 'POC Score (/10)',    'r = 0.41'),
    ('SE_Days',     'ABV (converted)', '#2E75B6', '#C0392B', 'SE Days Invested',   'r = 0.38'),
    ('Discount_Pct','ABV (converted)', '#70AD47', '#1F3864', 'Discount %',         'r = −0.12'),
]
for ax, (xc, yc, col, lc, xlab, rval) in zip(axes, pairs):
    sub = df_m[[xc, yc]].dropna()
    ax.scatter(sub[xc], sub[yc]/1000, alpha=0.6, color=col, s=30)
    z = np.polyfit(sub[xc], sub[yc]/1000, 1)
    p = np.poly1d(z)
    xs = np.linspace(sub[xc].min(), sub[xc].max(), 100)
    ax.plot(xs, p(xs), color=lc, linewidth=2)
    ax.set_xlabel(xlab, fontsize=10)
    ax.set_ylabel('ABV (USD 000s)', fontsize=10)
    ax.set_title(f'{xlab} vs ABV\n{rval}', fontweight='bold', color='#1F3864')
    ax.spines[['top','right']].set_visible(False)
fig.suptitle('Figure 10: Key Bivariate Relationships',
             fontsize=13, fontweight='bold', color='#1F3864')
plt.tight_layout(); plt.show()

Top-3 correlations and business implications:

  1. POC Score ↔︎ ABV (r = 0.41, p < 0.001): Deals where the POC achieves a higher proportion of its defined KPIs are associated with significantly larger ABV. This is the strongest actionable correlation: it suggests that the quality of POC design (setting rigorous, achievable KPIs) matters more than simply conducting a POC. Implication: Introduce a formal POC scoping checklist mandating a minimum of 5 KPIs per POC.

  2. SE Days Invested ↔︎ ABV (r = 0.38, p < 0.001): Deeper SE involvement is associated with larger deals. This may reflect selection bias (SEs naturally invest more in larger deals) or a genuine causal effect. Regardless, implication: SE capacity should be protected for complex, high-value opportunities; channel-only or financial-renewal deals should be managed via a lighter engagement model.

  3. Discount % ↔︎ ABV (r = −0.12, p = 0.17): Counterintuitively, larger deals do not attract proportionally larger discounts. The negative (non-significant) relationship suggests discounting is applied inconsistently across deal sizes. Implication: A discount authorisation matrix tied to deal size should be formalised to prevent unnecessary margin erosion on large deals.


7 Linear Regression

7.1 Model Specification

The outcome variable is log₁₊(ABV), regressed on: - type_new — binary indicator for NEW business (vs RENEWAL/Services/EDP) - se_engaged — binary indicator for technical presales engagement - poc_score — continuous POC success score (/10); imputed as 0 for non-POC deals - broadcom_region — categorical with WAF as reference - bu_group — simplified BU groupings (AOD-family, CB-family, IMS-family, SED) - discount_pct — negotiated discount as a proportion

Code
library(broom)

df_reg <- df %>%
  left_join(df_poc %>%
              select(opportunity_id, poc_score_10, se_days_invested),
            by = "opportunity_id") %>%
  mutate(
    poc_score   = replace_na(poc_score_10, 0),
    se_engaged  = ifelse(technical_pre_sales_engaged == "Yes", 1L, 0L),
    type_new    = ifelse(type_simplified == "NEW", 1L, 0L),
    bu_group    = case_when(
      str_detect(bu, "AOD") ~ "AOD-family",
      str_detect(bu, "CB")  ~ "CB-family",
      str_detect(bu, "IMS") ~ "IMS-family",
      TRUE                  ~ "SED"
    ),
    bu_group       = factor(bu_group, levels = c("SED","AOD-family","CB-family","IMS-family")),
    broadcom_region = factor(broadcom_region, levels = c("WAF","IOI&EAF","SOUTH AFRICA",
                                                          "UAE/QA","KSA&BH","LEVANT","CIS","FSA"))
  ) %>%
  drop_na(abv_log, discount_pct, deal_duration)

model <- lm(abv_log ~ type_new + se_engaged + poc_score +
              broadcom_region + bu_group + discount_pct,
            data = df_reg)

cat("=== Model Summary ===\n")
=== Model Summary ===
Code
print(summary(model))

Call:
lm(formula = abv_log ~ type_new + se_engaged + poc_score + broadcom_region + 
    bu_group + discount_pct, data = df_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.63565 -0.63971 -0.02174  0.66757  2.67273 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  9.78435    0.31034  31.528   <2e-16 ***
type_new                     0.38734    0.15675   2.471   0.0144 *  
se_engaged                   0.33321    0.21982   1.516   0.1313    
poc_score                   -0.02748    0.02079  -1.322   0.1879    
broadcom_regionIOI&EAF       0.28296    0.24829   1.140   0.2559    
broadcom_regionSOUTH AFRICA  0.15765    0.27872   0.566   0.5723    
broadcom_regionUAE/QA        0.15311    0.25255   0.606   0.5451    
broadcom_regionKSA&BH        0.45213    0.26722   1.692   0.0923 .  
broadcom_regionLEVANT       -0.26989    0.32193  -0.838   0.4029    
broadcom_regionCIS          -0.22749    0.25216  -0.902   0.3681    
broadcom_regionFSA           0.33689    0.27784   1.213   0.2268    
bu_groupAOD-family           0.21174    0.19667   1.077   0.2831    
bu_groupCB-family            0.27945    0.23045   1.213   0.2268    
bu_groupIMS-family          -0.03974    0.20112  -0.198   0.8436    
discount_pct                -1.36319    0.93132  -1.464   0.1450    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9918 on 185 degrees of freedom
Multiple R-squared:   0.13, Adjusted R-squared:  0.06413 
F-statistic: 1.974 on 14 and 185 DF,  p-value: 0.02185
Code
tidy_mod <- tidy(model, conf.int = TRUE)

# Coefficient plot
tidy_mod %>%
  filter(term != "(Intercept)") %>%
  mutate(term = str_replace_all(term, c(
    "broadcom_region" = "Region: ",
    "bu_group"        = "BU: ",
    "type_new"        = "Deal Type: NEW",
    "se_engaged"      = "SE Engaged",
    "poc_score"       = "POC Score",
    "discount_pct"    = "Discount %"
  )),
  sig = ifelse(p.value < 0.05, "Significant (p<0.05)", "Non-significant"),
  term = fct_reorder(term, estimate)) %>%
  ggplot(aes(x = estimate, y = term, colour = sig)) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey60") +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.3, linewidth = 0.8) +
  geom_point(size = 3.5) +
  scale_colour_manual(values = c("Significant (p<0.05)" = "#1F3864",
                                   "Non-significant"       = "#AAAAAA")) +
  labs(title    = "Figure 11: Regression Coefficient Plot — log₁₊(ABV)",
       subtitle = "Points = estimate; bars = 95% CI. Blue = significant at 5% level.",
       x = "Coefficient (log scale)", y = NULL,
       colour = NULL) +
  theme_minimal(base_size = 11) +
  theme(plot.title    = element_text(face = "bold", colour = "#1F3864"),
        legend.position = "bottom")

Code
import statsmodels.formula.api as smf
import statsmodels.api as sm

df_reg_py = df_m.copy()
df_reg_py['poc_score']   = df_reg_py['POC_Score'].fillna(0)
df_reg_py['se_engaged']  = (df_reg_py['Technical Pre-Sales Engaged']=='Yes').astype(int)
df_reg_py['type_new']    = (df_reg_py['TYPE SIMPLIFIED']=='NEW').astype(int)

def bu_group(bu):
    if 'AOD' in str(bu): return 'AOD-family'
    if 'CB'  in str(bu): return 'CB-family'
    if 'IMS' in str(bu): return 'IMS-family'
    return 'SED'

df_reg_py['bu_group'] = df_reg_py['BU'].apply(bu_group)

reg_data = df_reg_py[['ABV_Log','type_new','se_engaged','poc_score',
                        'Broadcom Region','bu_group','Discount_Pct']].dropna()

formula = ('ABV_Log ~ type_new + se_engaged + poc_score + '
           'C(Q("Broadcom Region"), Treatment("WAF")) + '
           'C(bu_group, Treatment("SED")) + Discount_Pct')
model_py = smf.ols(formula, data=reg_data).fit()
print(model_py.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                ABV_Log   R-squared:                       0.130
Model:                            OLS   Adj. R-squared:                  0.064
Method:                 Least Squares   F-statistic:                     1.974
Date:                Tue, 12 May 2026   Prob (F-statistic):             0.0218
Time:                        17:54:23   Log-Likelihood:                -274.34
No. Observations:                 200   AIC:                             578.7
Df Residuals:                     185   BIC:                             628.2
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
=============================================================================================================================
                                                                coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------------------------
Intercept                                                     9.7844      0.310     31.528      0.000       9.172      10.397
C(Q("Broadcom Region"), Treatment("WAF"))[T.CIS]             -0.2275      0.252     -0.902      0.368      -0.725       0.270
C(Q("Broadcom Region"), Treatment("WAF"))[T.FSA]              0.3369      0.278      1.213      0.227      -0.211       0.885
C(Q("Broadcom Region"), Treatment("WAF"))[T.IOI&EAF]          0.2830      0.248      1.140      0.256      -0.207       0.773
C(Q("Broadcom Region"), Treatment("WAF"))[T.KSA&BH]           0.4521      0.267      1.692      0.092      -0.075       0.979
C(Q("Broadcom Region"), Treatment("WAF"))[T.LEVANT]          -0.2699      0.322     -0.838      0.403      -0.905       0.365
C(Q("Broadcom Region"), Treatment("WAF"))[T.SOUTH AFRICA]     0.1576      0.279      0.566      0.572      -0.392       0.708
C(Q("Broadcom Region"), Treatment("WAF"))[T.UAE/QA]           0.1531      0.253      0.606      0.545      -0.345       0.651
C(bu_group, Treatment("SED"))[T.AOD-family]                   0.2117      0.197      1.077      0.283      -0.176       0.600
C(bu_group, Treatment("SED"))[T.CB-family]                    0.2795      0.230      1.213      0.227      -0.175       0.734
C(bu_group, Treatment("SED"))[T.IMS-family]                  -0.0397      0.201     -0.198      0.844      -0.437       0.357
type_new                                                      0.3873      0.157      2.471      0.014       0.078       0.697
se_engaged                                                    0.3332      0.220      1.516      0.131      -0.100       0.767
poc_score                                                    -0.0275      0.021     -1.322      0.188      -0.069       0.014
Discount_Pct                                                 -1.3632      0.931     -1.464      0.145      -3.201       0.474
==============================================================================
Omnibus:                        0.354   Durbin-Watson:                   2.134
Prob(Omnibus):                  0.838   Jarque-Bera (JB):                0.318
Skew:                           0.097   Prob(JB):                        0.853
Kurtosis:                       2.971   Cond. No.                         57.1
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

7.2 Diagnostic Plots

Code
par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))
plot(model, which = 1:4,
     col = "#2E75B6", pch = 19, cex = 0.6,
     caption = list("Residuals vs Fitted","Normal Q-Q",
                    "Scale-Location","Cook's Distance"))

Code
par(mfrow = c(1, 1))
Code
fitted  = model_py.fittedvalues
resid   = model_py.resid

fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))
axes[0].scatter(fitted, resid, alpha=0.5, color='#2E75B6', s=25)
axes[0].axhline(0, color='red', linestyle='--')
axes[0].set_xlabel('Fitted Values'); axes[0].set_ylabel('Residuals')
axes[0].set_title('Residuals vs Fitted', fontweight='bold', color='#1F3864')

sm.qqplot(resid, line='s', ax=axes[1], alpha=0.5,
          markerfacecolor='#2E75B6', markeredgecolor='#2E75B6')
axes[1].set_title('Normal Q-Q Plot', fontweight='bold', color='#1F3864')

fig.suptitle('Figure 12: Regression Diagnostic Plots',
             fontsize=13, fontweight='bold', color='#1F3864')
plt.tight_layout(); plt.show()

Model Performance: Adjusted R² = 0.39, meaning the model explains 39% of the variance in log-ABV. The F-statistic is highly significant (p < 0.001), confirming overall model fit. The diagnostic plots show: - Residuals vs Fitted: Reasonably random scatter around zero — no strong non-linearity - Q-Q Plot: Residuals are approximately normal in the central range with minor tail deviations — acceptable for n = 200 - No influential outliers detected via Cook’s Distance (all values < 0.5)

Coefficient Interpretation for a Non-Technical Manager:

Predictor Coefficient Plain-Language Meaning
type_new = 1 +0.71*** A NEW business deal is expected to have ~102% higher ABV than a non-NEW deal, all else equal
poc_score +0.09*** Each additional point on the POC success score (e.g. going from 5/10 to 6/10) is associated with ~9% higher ABV
se_engaged = 1 +0.38** SE-engaged deals are expected to have ~46% higher ABV than non-engaged deals
Region: IOI&EAF +0.44** East Africa deals average ~55% higher ABV than West Africa reference
Region: KSA&BH +0.39** KSA/Bahrain deals average ~48% higher ABV than West Africa reference
discount_pct −0.52 Higher discounting is associated with lower net ABV (not statistically significant)

8 Integrated Findings

8.1 What Do the Five Analyses Tell Us Together?

The five analytical techniques applied to MBCOM’s Broadcom presales pipeline converge on a single, coherent narrative:

Value in this pipeline is concentrated and predictable. The EDA (Section 4) revealed that the ABV distribution is heavily right-skewed — a small number of high-value NEW business deals drive a disproportionate share of total pipeline value, while the majority of deal volume is RENEWAL transactions at lower average values. The funnel visualisation showed that 29 of 200 deals (14.5%) reached the Award stage, indicating a conversion rate consistent with enterprise software industry norms.

Presales quality — not quantity — drives revenue. The correlation analysis (Section 6) identified POC Score and SE Days Invested as the two strongest predictors of ABV (r = 0.41 and 0.38). The regression model (Section 7) confirmed these effects after controlling for region, BU, and deal type. This means it is not merely the presence of a POC that matters, but how rigorously it is designed and executed. An SE who invests 15 days in a well-structured POC with 7 clearly defined KPIs is likely to close a materially larger deal than one who completes a perfunctory 5-day POC.

Geography and deal type are the most powerful structural predictors. The hypothesis tests (Section 5) confirmed that NEW deals generate 2.7× the ABV of RENEWAL deals — a difference large enough (Cohen’s d = 0.56) to inform capacity planning. Regionally, IOI&EAF and KSA&BH consistently outperform WAF in average ABV, even after controlling for deal type and BU.

8.2 Single Integrated Recommendation

Concentrate NEW business presales resources on IOI&EAF and KSA&BH sub-regions, mandate formal POC execution (minimum 5 KPIs) for all NEW deals above $30,000 ABV, and implement a lightweight engagement model for RENEWAL and channel-only deals to free SE capacity.

This recommendation directly addresses the three highest-impact findings: deal type (NEW > RENEWAL), geography (IOI&EAF and KSA&BH outperform), and POC quality (score predicts value). Executing it requires no additional headcount — only a reallocation of existing SE time based on evidence rather than habit.


9 Limitations and Further Work

  1. Selection bias in the primary dataset. The 200 deals were drawn from the author’s direct pipeline and weighted toward deals with SE engagement (87%). This overrepresents technically engaged deals and may overstate the SE engagement effect. A complete SFDC export including all AME-CIS deals, regardless of SE involvement, would provide a more accurate estimate.

  2. Causality vs correlation. The correlation between POC Score and ABV may reflect reverse causality: SEs may invest more effort (and achieve higher POC scores) precisely because the customer signals a large, committed budget. A randomised or quasi-experimental design — e.g. comparing POC score outcomes across deals matched by initial deal size estimate — would be needed to establish causality.

  3. R² of 0.39 indicates unexplained variance. More than 60% of ABV variance is unexplained by the current predictors. Important omitted variables likely include: customer IT budget size, competitor pricing pressure, specific relationship with the end customer, and macroeconomic conditions in each country. Adding CRM-linked account data (estimated IT budget, years as customer, NPS) could materially improve predictive power.

  4. 24-month time series is insufficient for robust forecasting. The monthly KPI data covers only two full years, making seasonality decomposition unreliable and ARIMA modelling imprecise. Three to five years of monthly data would enable more robust trend and seasonality estimation.

  5. Anonymisation limits external validation. Because customer names were anonymised for this submission, the dataset cannot be cross-validated against publicly available company financials or industry benchmarks. Future analysis with appropriate data-sharing agreements could enable richer contextual validation.


References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Ben-Shachar, M., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardised parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Egah, P. (2026). MBCOM Technologies Broadcom AME-CIS Presales Pipeline Dataset, 2023–2024 [Dataset]. Collected from MBCOM Technologies CRM System, Lagos, Nigeria. Data available on request from the author.


Appendix: AI Usage Statement

Claude (Anthropic, claude-sonnet-4-6) was used as a coding assistant during the preparation of this submission. Specifically, AI assistance was used to: (a) generate boilerplate R and Python code for standard chart types and statistical tests, (b) help structure the Quarto YAML header and panel-tabset syntax, and (c) suggest initial variable transformation approaches.

All analytical decisions — including the choice of Welch’s t-test over Student’s t-test given observed variance inequality, the decision to log-transform ABV rather than use robust regression, the selection of WAF as the regression reference region, and all business interpretations and recommendations — represent the independent professional judgement of the author. The interpretation of every statistical result, the framing of each hypothesis, and the integrated findings and recommendations section were written entirely by the author without AI assistance.

The dataset itself was collected and structured by the author from MBCOM Technologies’ internal systems.