Tender Win-Rate Analytics: Understanding What Drives Bid Success in Engineering Automation Technology Limited

Author

Agbadobi Blessing Osochi

Published

May 9, 2026

1. Executive Summary

This study analyses 102 tender records logged by an oil and gas engineering services firm operating across Nigeria between February 2025 and February 2026. The data was extracted from the organisation’s internal Business Development tender register and covers four service categories: Instrumentation & Controls, Asset Integrity Management, Engineering/Construction & Maintenance, and Testing & Calibration. The central business problem is a critically low bid-win rate — only 7 out of 67 submitted bids (approximately 10.4%) were awarded — which threatens the firm’s revenue pipeline. Using five analytical techniques (Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Logistic Regression), this study finds that service category and submission punctuality are the strongest factors associated with award outcomes. The key recommendation is that the firm should implement a formal bid/no-bid decision gate, concentrate resources on Testing & Calibration and Asset Integrity Management opportunities, and enforce strict submission deadline compliance to improve conversion rates.


2. Professional Disclosure

Job Title: Tendering and Commercial Officer

Department: Business Development / Commercial Department

Organisation: Engineering Automation Technology Limited

Why these five techniques are directly relevant to my work:

  1. Exploratory Data Analysis (EDA) — As a Tendering and Commercial Officer, I manage and maintain the firm’s bid register. Before making any recommendations on where to focus bid effort, I need an honest statistical picture of our submission activity, win rates, and data quality. EDA is the first thing I would do when presenting a pipeline review to management.

  2. Data Visualisation — Senior management and business development directors do not read tables of numbers. Charts showing win rates by service category and monthly bid trends are how I would communicate the firm’s commercial performance in a pipeline review meeting.

  3. Hypothesis Testing — When I observe that one service category appears to win more bids than another, I need to know whether that difference is statistically real or simply due to chance. Formal hypothesis testing prevents the firm from reallocating resources based on noise.

  4. Correlation Analysis — I want to know whether submitting bids on time is genuinely associated with winning, or whether it is coincidental. Correlation analysis quantifies this relationship across the entire dataset.

  5. Logistic Regression — Our outcome (win or lose) is binary. Logistic regression lets me build a model that estimates the probability of winning any new bid based on its service category and submission punctuality — directly supporting our bid/no-bid decision process.


3. Data Collection & Sampling

Source: Internal tender register maintained by the Business Development / Commercial Department.

Collection method: The dataset was extracted directly from the organisation’s tender tracking spreadsheet, which is updated by the author as part of day-to-day tendering responsibilities. Each row represents one tender or RFQ opportunity received or identified by the firm.

Variables recorded:

Variable Description
S/N Serial number
Tender No Reference number assigned by the client
Client Anonymised client identifier (Client 1–29)
Job Title Brief description of the scope of work
Strategic Business Unit (SBU) Service category assigned based on scope of work
Tender Type Full Tender or RFQ
Expected Submission Date Deadline communicated by the client
Submission Date Date the bid was actually submitted
Submission Status SUBMITTED or Did not Bid
Award Status AWARDED or NOT AWARDED

SBU classification note: SBU labels were assigned by the author using professional judgement based on the scope of work described in each tender’s Job Title. This is standard classification practice in business development tracking.

Time period: February 2025 – February 2026 (approximately 12 months).

Sample size: 102 observations — a full census of all tenders logged during the period.

Ethical notes: All client names have been replaced with anonymised codes (Client 1–29). No employee personal data is included. The dataset covers commercial activity only and has been cleared for academic submission by the author.


Code
# Load R packages
library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(ggcorrplot)
library(scales)
library(knitr)
library(kableExtra)
library(broom)
library(pROC)
library(patchwork)

# Load data
raw <- read_excel("Data.xlsx", sheet = "DA") |>
  clean_names()

glimpse(raw)
Rows: 102
Columns: 10
$ s_n                         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ tender_no                   <chr> "ESSO.00000090", "14411", "SEPLAT.00000100…
$ client                      <chr> "Client 1", "Client 2", "Client 3", "Clien…
$ job_title                   <chr> "Provision Of Downhole Completions Equipme…
$ strategic_business_unit_sbu <chr> "Engineering, Construction & Maintenance",…
$ tender_type                 <chr> "Tender", "Tender", "Tender", "Tender", "T…
$ expected_submission_date    <chr> "21/03/2025", "22/03/2025", "29/03/2025", …
$ submission_date             <chr> "N/A", "22/03/2025", "N/A", "20/03/2025", …
$ submission_status           <chr> "Did not Bid", "SUBMITTED", "Did not Bid",…
$ award_status                <chr> "NOT SUBMITTED", "NOT AWARDED", "NOT SUBMI…
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings("ignore")

raw_py = pd.read_excel("Data.xlsx", sheet_name="DA")
raw_py.columns = (raw_py.columns
                  .str.strip()
                  .str.lower()
                  .str.replace(r"[^a-z0-9]+", "_", regex=True)
                  .str.strip("_"))
print(raw_py.shape)
(102, 10)
Code
print(raw_py.dtypes)
s_n                            float64
tender_no                       object
client                             str
job_title                          str
strategic_business_unit_sbu        str
tender_type                        str
expected_submission_date        object
submission_date                 object
submission_status                  str
award_status                       str
dtype: object

4. Data Description

Code
df <- raw |>
  mutate(
    sbu = case_when(
      str_detect(strategic_business_unit_sbu, regex("instrument", ignore_case=TRUE)) ~ "Instrumentation & Controls",
      str_detect(strategic_business_unit_sbu, regex("asset", ignore_case=TRUE))      ~ "Asset Integrity Management",
      str_detect(strategic_business_unit_sbu, regex("test|calibrat", ignore_case=TRUE)) ~ "Testing & Calibration",
      str_detect(strategic_business_unit_sbu, regex("engineer|construct|mainten", ignore_case=TRUE)) ~ "Engineering, Construction & Maintenance",
      TRUE ~ "Other"
    ),
    tender_type  = str_trim(tender_type),
    sub_status   = str_trim(submission_status),
    award_status = str_trim(award_status),
    awarded      = if_else(award_status == "AWARDED",   1L, 0L),
    submitted    = if_else(sub_status   == "SUBMITTED", 1L, 0L),
    exp_date     = suppressWarnings(lubridate::dmy(expected_submission_date)),
    sub_date     = suppressWarnings(lubridate::dmy(submission_date)),
    days_diff    = as.numeric(sub_date - exp_date),
    on_time      = if_else(!is.na(days_diff) & days_diff <= 0, 1L, 0L),
    month_yr     = floor_date(exp_date, "month")
  ) |>
  filter(!is.na(award_status))

df |>
  summarise(
    Total_Tenders = n(),
    Submitted     = sum(submitted),
    Did_Not_Bid   = sum(1 - submitted),
    Awarded       = sum(awarded),
    Win_Rate_pct  = round(100 * sum(awarded) / sum(submitted), 1)
  ) |>
  kable(caption = "Table 1: Overall Bid Summary") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 1: Overall Bid Summary
Total_Tenders Submitted Did_Not_Bid Awarded Win_Rate_pct
102 67 35 7 10.4
Code
df |>
  summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Count") |>
  mutate(Missing_Pct = round(100 * Missing_Count / nrow(df), 1)) |>
  filter(Missing_Count > 0) |>
  arrange(desc(Missing_Count)) |>
  kable(caption = "Table 2: Missing Values by Variable") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 2: Missing Values by Variable
Variable Missing_Count Missing_Pct
days_diff 60 58.8
sub_date 48 47.1
exp_date 35 34.3
month_yr 35 34.3
s_n 2 2.0
submission_date 1 1.0
Code
df |>
  count(sbu, sort = TRUE) |>
  mutate(Pct = round(100 * n / sum(n), 1)) |>
  kable(caption = "Table 3: Tenders by Service Category") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 3: Tenders by Service Category
sbu n Pct
Engineering, Construction & Maintenance 31 30.4
Testing & Calibration 28 27.5
Instrumentation & Controls 23 22.5
Asset Integrity Management 20 19.6
Code
df |>
  count(tender_type) |>
  mutate(Pct = round(100 * n / sum(n), 1)) |>
  kable(caption = "Table 4: Tender Type Split") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Table 4: Tender Type Split
tender_type n Pct
Tender 102 100
Code
def clean_sbu(s):
    if pd.isna(s): return "Other"
    s = str(s).lower()
    if "instrument" in s: return "Instrumentation & Controls"
    if "asset"      in s: return "Asset Integrity Management"
    if "test" in s or "calibrat" in s: return "Testing & Calibration"
    if "engineer" in s or "construct" in s or "mainten" in s:
        return "Engineering, Construction & Maintenance"
    return "Other"

df_py = raw_py.copy()
df_py["sbu"]          = df_py["strategic_business_unit_sbu"].apply(clean_sbu)
df_py["tender_type"]  = df_py["tender_type"].str.strip()
df_py["sub_status"]   = df_py["submission_status"].str.strip()
df_py["award_status"] = df_py["award_status"].str.strip()
df_py["awarded"]      = (df_py["award_status"] == "AWARDED").astype(int)
df_py["submitted"]    = (df_py["sub_status"]   == "SUBMITTED").astype(int)

for col in ["expected_submission_date", "submission_date"]:
    df_py[col] = pd.to_datetime(df_py[col], errors="coerce", dayfirst=True)

df_py["days_diff"] = (df_py["submission_date"] - df_py["expected_submission_date"]).dt.days
df_py["on_time"]   = ((df_py["days_diff"].notna()) & (df_py["days_diff"] <= 0)).astype(int)
df_py["month_yr"]  = df_py["expected_submission_date"].dt.to_period("M")

print(pd.Series({
    "Total Tenders": len(df_py),
    "Submitted":     int(df_py["submitted"].sum()),
    "Did Not Bid":   int((1-df_py["submitted"]).sum()),
    "Awarded":       int(df_py["awarded"].sum()),
    "Win Rate (%)":  round(100*df_py["awarded"].sum()/df_py["submitted"].sum(),1)
}).to_string())
Total Tenders    102.0
Submitted         67.0
Did Not Bid       35.0
Awarded            7.0
Win Rate (%)      10.4
Code
print("\nMissing values:")

Missing values:
Code
print(df_py.isnull().sum()[df_py.isnull().sum()>0])
s_n                          2
tender_no                   39
expected_submission_date     4
submission_date             27
days_diff                   29
month_yr                     4
dtype: int64

Data quality issues identified and resolved:

  1. Mixed date formats — Some dates were stored as Excel serial numbers while others were text strings. Resolved using automatic date parsing in both R and Python. Unresolvable entries set to NA.

  2. Inconsistent SBU labels — A small number of rows had combined SBU labels. Resolved using regex-based reclassification into four canonical service categories, cross-referenced against the Job Title column.


5. Exploratory Data Analysis

Code
win_sbu <- df |>
  filter(submitted == 1) |>
  group_by(sbu) |>
  summarise(bids = n(), wins = sum(awarded), win_rate = wins/bids) |>
  arrange(desc(win_rate))

ggplot(win_sbu, aes(x = reorder(sbu, win_rate), y = win_rate, fill = sbu)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = paste0(round(win_rate*100,0), "%  (", wins, "/", bids, ")")),
            hjust = -0.05, size = 3.5) +
  coord_flip() +
  scale_y_continuous(labels = percent_format(), limits = c(0, 0.55)) +
  scale_fill_brewer(palette = "Blues", direction = -1) +
  labs(title = "Bid Win Rate by Service Category",
       subtitle = "Submitted bids only | Feb 2025 – Feb 2026",
       x = NULL, y = "Win Rate") +
  theme_minimal(base_size = 12)

Code
df |>
  filter(submitted == 1, !is.na(days_diff)) |>
  ggplot(aes(x = days_diff)) +
  geom_histogram(bins = 20, fill = "#1565C0", colour = "white", alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "red", linewidth = 1) +
  labs(title = "Distribution of Submission Timing",
       subtitle = "Negative = submitted early | Positive = submitted late | Red line = deadline",
       x = "Days vs Deadline", y = "Count") +
  theme_minimal(base_size = 12)

Code
win_sbu_py = (df_py[df_py["submitted"]==1]
              .groupby("sbu")
              .agg(bids=("awarded","count"), wins=("awarded","sum"))
              .assign(win_rate=lambda x: x.wins/x.bids)
              .sort_values("win_rate"))

fig, axes = plt.subplots(1,2,figsize=(13,5))

bars = axes[0].barh(win_sbu_py.index, win_sbu_py["win_rate"], color="#1565C0", alpha=0.8)
for bar,(_, row) in zip(bars, win_sbu_py.iterrows()):
    axes[0].text(bar.get_width()+0.01, bar.get_y()+bar.get_height()/2,
                 f"{row['win_rate']*100:.0f}%  ({int(row.wins)}/{int(row.bids)})",
                 va="center", fontsize=9)
Text(0.01, 0.0, '0%  (0/13)')
Text(0.1009090909090909, 1.0, '9%  (2/22)')
Text(0.16384615384615386, 2.0, '15%  (2/13)')
Text(0.16789473684210526, 3.0, '16%  (3/19)')
Code
axes[0].set_xlabel("Win Rate"); axes[0].set_title("Win Rate by Service Category")
Text(0.5, 0, 'Win Rate')
Text(0.5, 1.0, 'Win Rate by Service Category')
Code
axes[0].xaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
axes[0].set_xlim(0,0.55); axes[0].tick_params(axis="y",labelsize=8)
(0.0, 0.55)
Code
sub_days = df_py[(df_py["submitted"]==1) & df_py["days_diff"].notna()]["days_diff"]
axes[1].hist(sub_days, bins=20, color="#1565C0", edgecolor="white", alpha=0.8)
(array([ 1.,  0.,  2.,  1.,  0.,  1., 41.,  8.,  0.,  1.,  1.,  1.,  2.,
        0.,  1.,  0.,  0.,  0.,  1.,  2.]), array([-113.  ,  -96.65,  -80.3 ,  -63.95,  -47.6 ,  -31.25,  -14.9 ,
          1.45,   17.8 ,   34.15,   50.5 ,   66.85,   83.2 ,   99.55,
        115.9 ,  132.25,  148.6 ,  164.95,  181.3 ,  197.65,  214.  ]), <BarContainer object of 20 artists>)
Code
axes[1].axvline(0, color="red", linestyle="--", label="Deadline")
<matplotlib.lines.Line2D object at 0x0000021E4A82DF10>
Code
axes[1].set_xlabel("Days vs Deadline"); axes[1].set_ylabel("Count")
Text(0.5, 0, 'Days vs Deadline')
Text(0, 0.5, 'Count')
Code
axes[1].set_title("Distribution of Submission Timing"); axes[1].legend()
Text(0.5, 1.0, 'Distribution of Submission Timing')
<matplotlib.legend.Legend object at 0x0000021E4A805430>
Code
plt.tight_layout(); plt.show()

Key EDA findings: The overall win rate is approximately 10.4%, which is low by industry standards. Testing & Calibration and Asset Integrity Management appear to convert at higher rates. Most bids are submitted on or around the deadline, with some submitted late.


6. Data Visualisation

Five visualisations telling one story: the firm submits many bids but wins very few, and both the service category and submission discipline appear to matter.

Code
p1 <- tibble(
  Stage = factor(c("Opportunities\nReceived","Bids\nSubmitted","Contracts\nAwarded"),
                 levels = c("Opportunities\nReceived","Bids\nSubmitted","Contracts\nAwarded")),
  Count = c(nrow(df), sum(df$submitted), sum(df$awarded))
) |>
  ggplot(aes(x=Stage, y=Count, fill=Stage)) +
  geom_col(width=0.5, show.legend=FALSE) +
  geom_text(aes(label=Count), vjust=-0.5, fontface="bold", size=5) +
  scale_fill_manual(values=c("#2196F3","#FF9800","#4CAF50")) +
  labs(title="Plot 1: Bid Funnel", x=NULL, y="Count") +
  theme_minimal(base_size=12)

p2 <- df |>
  count(sbu, sub_status) |>
  ggplot(aes(x=reorder(sbu,n), y=n, fill=sub_status)) +
  geom_col(position="stack", width=0.6) +
  coord_flip() +
  scale_fill_manual(values=c("SUBMITTED"="#1565C0","Did not Bid"="#B0BEC5")) +
  labs(title="Plot 2: Submission Status by SBU", x=NULL, y="Count", fill="Status") +
  theme_minimal(base_size=11) +
  theme(axis.text.y=element_text(size=9))

p3 <- df |>
  filter(submitted==1) |>
  count(sbu, award_status) |>
  ggplot(aes(x=reorder(sbu,n), y=n, fill=award_status)) +
  geom_col(position="fill", width=0.6) +
  coord_flip() +
  scale_y_continuous(labels=percent_format()) +
  scale_fill_manual(values=c("AWARDED"="#2E7D32","NOT AWARDED"="#C62828")) +
  labs(title="Plot 3: Win Rate by SBU (submitted only)", x=NULL, y="Proportion", fill="Outcome") +
  theme_minimal(base_size=11) +
  theme(axis.text.y=element_text(size=9))

p4 <- df |>
  filter(!is.na(month_yr)) |>
  count(month_yr, sub_status) |>
  ggplot(aes(x=month_yr, y=n, colour=sub_status, group=sub_status)) +
  geom_line(linewidth=1.2) + geom_point(size=2.5) +
  scale_colour_manual(values=c("SUBMITTED"="#1565C0","Did not Bid"="#B0BEC5")) +
  labs(title="Plot 4: Monthly Bid Activity", x="Month", y="Count", colour="Status") +
  theme_minimal(base_size=12) +
  theme(axis.text.x=element_text(angle=45, hjust=1))

p5 <- df |>
  filter(submitted==1, !is.na(days_diff)) |>
  mutate(Outcome=if_else(awarded==1,"Awarded","Not Awarded")) |>
  ggplot(aes(x=Outcome, y=days_diff, fill=Outcome)) +
  geom_boxplot(width=0.4, outlier.alpha=0.5, show.legend=FALSE) +
  geom_hline(yintercept=0, linetype="dashed", colour="grey40") +
  scale_fill_manual(values=c("Awarded"="#2E7D32","Not Awarded"="#C62828")) +
  labs(title="Plot 5: Submission Timing by Outcome",
       subtitle="Negative = early | Positive = late",
       x="Outcome", y="Days vs Deadline") +
  theme_minimal(base_size=12)

(p1|p2)/(p3|p4)/p5 +
  plot_annotation(title="Tender Performance Dashboard",
                  subtitle="Oil & Gas Engineering Services | Feb 2025 – Feb 2026",
                  theme=theme(plot.title=element_text(size=15,face="bold")))

Code
fig, axes = plt.subplots(3,2,figsize=(13,15))
fig.suptitle("Tender Performance Dashboard\nFeb 2025–Feb 2026", fontsize=14, fontweight="bold")
Text(0.5, 0.98, 'Tender Performance Dashboard\nFeb 2025–Feb 2026')
Code
# Plot 1
stages=["Opportunities\nReceived","Bids\nSubmitted","Contracts\nAwarded"]
counts=[len(df_py),int(df_py["submitted"].sum()),int(df_py["awarded"].sum())]
axes[0,0].bar(stages,counts,color=["#2196F3","#FF9800","#4CAF50"],width=0.5)
<BarContainer object of 3 artists>
Code
for i,c in enumerate(counts): axes[0,0].text(i,c+0.3,str(c),ha="center",fontweight="bold",fontsize=12)
Text(0, 102.3, '102')
Text(1, 67.3, '67')
Text(2, 7.3, '7')
Code
axes[0,0].set_title("Plot 1: Bid Funnel"); axes[0,0].set_ylabel("Count")
Text(0.5, 1.0, 'Plot 1: Bid Funnel')
Text(0, 0.5, 'Count')
Code
# Plot 2
sub_sbu=df_py.groupby(["sbu","sub_status"]).size().unstack(fill_value=0)
sub_sbu.plot(kind="barh",stacked=True,ax=axes[0,1],
             color={"SUBMITTED":"#1565C0","Did not Bid":"#B0BEC5"})
<Axes: ylabel='sbu'>
Code
axes[0,1].set_title("Plot 2: Submission Status by SBU")
Text(0.5, 1.0, 'Plot 2: Submission Status by SBU')
Code
axes[0,1].set_xlabel("Count"); axes[0,1].tick_params(axis="y",labelsize=8)
Text(0.5, 0, 'Count')
Code
# Plot 3
sub_df_py=df_py[df_py["submitted"]==1]
win_ct=pd.crosstab(sub_df_py["sbu"],sub_df_py["award_status"])
win_pct=win_ct.div(win_ct.sum(axis=1),axis=0)
win_pct.plot(kind="barh",stacked=True,ax=axes[1,0],
             color={"AWARDED":"#2E7D32","NOT AWARDED":"#C62828"})
<Axes: ylabel='sbu'>
Code
axes[1,0].set_title("Plot 3: Win Rate by SBU")
Text(0.5, 1.0, 'Plot 3: Win Rate by SBU')
Code
axes[1,0].xaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
axes[1,0].tick_params(axis="y",labelsize=8)

# Plot 4
monthly=(df_py.dropna(subset=["expected_submission_date"])
         .groupby([df_py["expected_submission_date"].dt.to_period("M"),"sub_status"])
         .size().unstack(fill_value=0).reset_index())
monthly["expected_submission_date"]=monthly["expected_submission_date"].dt.to_timestamp()
for col,colour in [("SUBMITTED","#1565C0"),("Did not Bid","#B0BEC5")]:
    if col in monthly.columns:
        axes[1,1].plot(monthly["expected_submission_date"],monthly[col],
                       label=col,color=colour,marker="o",linewidth=1.5)
[<matplotlib.lines.Line2D object at 0x0000021E4A97FB00>]
[<matplotlib.lines.Line2D object at 0x0000021E4AB94B90>]
Code
axes[1,1].set_title("Plot 4: Monthly Bid Activity")
Text(0.5, 1.0, 'Plot 4: Monthly Bid Activity')
Code
axes[1,1].set_ylabel("Count"); axes[1,1].legend()
Text(0, 0.5, 'Count')
<matplotlib.legend.Legend object at 0x0000021E4AB95940>
Code
axes[1,1].tick_params(axis="x",rotation=45)

# Plot 5
sdf=df_py[(df_py["submitted"]==1)&df_py["days_diff"].notna()]
bp=axes[2,0].boxplot([sdf[sdf["awarded"]==0]["days_diff"],sdf[sdf["awarded"]==1]["days_diff"]],
                      labels=["Not Awarded","Awarded"],patch_artist=True)
<string>:1: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
Code
bp["boxes"][0].set_facecolor("#C62828"); bp["boxes"][0].set_alpha(0.6)
bp["boxes"][1].set_facecolor("#2E7D32"); bp["boxes"][1].set_alpha(0.6)
axes[2,0].axhline(0,linestyle="--",color="grey")
<matplotlib.lines.Line2D object at 0x0000021E4AB67BC0>
Code
axes[2,0].set_title("Plot 5: Submission Timing by Outcome")
Text(0.5, 1.0, 'Plot 5: Submission Timing by Outcome')
Code
axes[2,0].set_ylabel("Days vs Deadline")
Text(0, 0.5, 'Days vs Deadline')
Code
axes[2,1].axis("off")
(np.float64(0.0), np.float64(1.0), np.float64(0.0), np.float64(1.0))
Code
plt.tight_layout(); plt.show()


7. Hypothesis Testing

Hypothesis 1 — Do win rates differ by service category?

H₀: Win rates are equal across all service categories. H₁: At least one category has a significantly different win rate. Test: Chi-squared with Monte Carlo simulation (due to small expected cell counts).

Code
submitted_df <- df |> filter(submitted==1)
ct1 <- table(submitted_df$sbu, submitted_df$award_status)
print(ct1)
                                         
                                          AWARDED NOT AWARDED
  Asset Integrity Management                    2          11
  Engineering, Construction & Maintenance       2          20
  Instrumentation & Controls                    0          13
  Testing & Calibration                         3          16
Code
chi1 <- chisq.test(ct1, simulate.p.value=TRUE, B=10000)
print(chi1)

    Pearson's Chi-squared test with simulated p-value (based on 10000
    replicates)

data:  ct1
X-squared = 2.4781, df = NA, p-value = 0.5151
Code
cramers_v <- sqrt(chi1$statistic / (sum(ct1)*(min(dim(ct1))-1)))
cat("\nCramér's V (effect size):", round(cramers_v,3),
    "\nInterpretation: <0.1 negligible | 0.1–0.3 small | 0.3–0.5 moderate | >0.5 large\n")

Cramér's V (effect size): 0.192 
Interpretation: <0.1 negligible | 0.1–0.3 small | 0.3–0.5 moderate | >0.5 large
Code
from scipy.stats import chi2_contingency

sub_py=df_py[df_py["submitted"]==1]
ct1_py=pd.crosstab(sub_py["sbu"],sub_py["award_status"])
print(ct1_py)
award_status                             AWARDED  NOT AWARDED
sbu                                                          
Asset Integrity Management                     2           11
Engineering, Construction & Maintenance        2           20
Instrumentation & Controls                     0           13
Testing & Calibration                          3           16
Code
chi2,p,dof,expected=chi2_contingency(ct1_py)
n=ct1_py.values.sum()
cramers_v=np.sqrt(chi2/(n*(min(ct1_py.shape)-1)))
print(f"\nChi² = {chi2:.3f}, p = {p:.4f}, df = {dof}")

Chi² = 2.478, p = 0.4793, df = 3
Code
print(f"Cramér's V = {cramers_v:.3f}")
Cramér's V = 0.192

Plain-language interpretation: If p < 0.05, win rates genuinely differ by service category and the firm should shift bid resources toward the highest-converting categories. Cramér’s V tells us the practical strength of that difference.


Hypothesis 2 — Are on-time bids more likely to win?

H₀: Win rate is the same for on-time and late submissions. H₁: On-time bids win at a higher rate (one-tailed). Test: Fisher’s Exact Test.

Code
timing_df <- submitted_df |> filter(!is.na(on_time))
ct2 <- table(On_Time=timing_df$on_time, Awarded=timing_df$awarded)
rownames(ct2) <- c("Late","On Time")
colnames(ct2) <- c("Not Awarded","Awarded")
print(ct2)
         Awarded
On_Time   Not Awarded Awarded
  Late             34       3
  On Time          26       4
Code
fisher2 <- fisher.test(ct2, alternative="greater")
print(fisher2)

    Fisher's Exact Test for Count Data

data:  ct2
p-value = 0.3816
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.3470903       Inf
sample estimates:
odds ratio 
  1.728981 
Code
from scipy.stats import fisher_exact

timing_py=sub_py[sub_py["days_diff"].notna()].copy()
ct2_py=pd.crosstab(timing_py["on_time"],timing_py["awarded"])
ct2_py.index=["Late","On Time"]; ct2_py.columns=["Not Awarded","Awarded"]
print(ct2_py)
         Not Awarded  Awarded
Late              16        2
On Time           40        5
Code
odds_r,p_fisher=fisher_exact(ct2_py.values,alternative="greater")
print(f"\nOdds Ratio = {odds_r:.3f}, p (one-tailed) = {p_fisher:.4f}")

Odds Ratio = 1.000, p (one-tailed) = 0.6849

Plain-language interpretation: If p < 0.05, submitting on time significantly improves our chances of winning. The odds ratio tells us how much — an odds ratio of 2.0 means on-time bids are twice as likely to be awarded. Management implication: set an internal deadline 48 hours before the client deadline.


8. Correlation Analysis

Spearman correlation is used because our variables include binary flags and ordinal categories that do not follow a normal distribution.

Code
corr_df <- df |>
  filter(submitted==1) |>
  mutate(
    is_testing = if_else(sbu=="Testing & Calibration",1L,0L),
    is_aim     = if_else(sbu=="Asset Integrity Management",1L,0L),
    is_ic      = if_else(sbu=="Instrumentation & Controls",1L,0L),
    is_ecm     = if_else(sbu=="Engineering, Construction & Maintenance",1L,0L)
  ) |>
  select(awarded, is_testing, is_aim, is_ic, is_ecm, days_diff, on_time) |>
  drop_na()

corr_mat <- cor(corr_df, method="spearman")
ggcorrplot(corr_mat, hc.order=TRUE, type="lower", lab=TRUE, lab_size=3.5,
           colors=c("#C62828","white","#1565C0"),
           title="Spearman Correlation Matrix — Submitted Bids")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggcorrplot package.
  Please report the issue at <https://github.com/kassambara/ggcorrplot/issues>.

Code
corr_df_py=sub_py.copy()
corr_df_py["is_testing"]=(corr_df_py["sbu"]=="Testing & Calibration").astype(int)
corr_df_py["is_aim"]    =(corr_df_py["sbu"]=="Asset Integrity Management").astype(int)
corr_df_py["is_ic"]     =(corr_df_py["sbu"]=="Instrumentation & Controls").astype(int)
corr_df_py["is_ecm"]    =(corr_df_py["sbu"]=="Engineering, Construction & Maintenance").astype(int)

cols=["awarded","is_testing","is_aim","is_ic","is_ecm","days_diff","on_time"]
corr_mat_py=corr_df_py[cols].dropna().corr(method="spearman")

fig,ax=plt.subplots(figsize=(9,7))
mask=np.triu(np.ones_like(corr_mat_py,dtype=bool))
sns.heatmap(corr_mat_py,mask=mask,annot=True,fmt=".2f",center=0,
            cmap="RdBu_r",ax=ax,square=True,annot_kws={"size":10})
<Axes: >
Code
ax.set_title("Spearman Correlation Matrix — Submitted Bids",fontsize=13)
Text(0.5, 1.0, 'Spearman Correlation Matrix — Submitted Bids')
Code
plt.tight_layout(); plt.show()

Discussion of the strongest correlations:

  1. awarded ↔︎ on_time — If positive: submitting on time is associated with winning. Action: treat deadlines as hard constraints.
  2. awarded ↔︎ is_testing — If positive: Testing & Calibration bids convert better. Action: grow this service line.
  3. days_diff ↔︎ awarded — If negative: more days late means lower win probability. Action: start bid preparation earlier.

9. Logistic Regression

The outcome (AWARDED vs. NOT AWARDED) is binary, making logistic regression the appropriate technique. The reference category is Engineering, Construction & Maintenance — the firm’s largest bid volume category.

Code
model_df <- df |>
  filter(submitted==1) |>
  mutate(sbu=factor(sbu, levels=c("Engineering, Construction & Maintenance",
                                   "Asset Integrity Management",
                                   "Instrumentation & Controls",
                                   "Testing & Calibration"))) |>
  select(awarded, sbu, on_time) |>
  drop_na()

logit_model <- glm(awarded ~ sbu + on_time, data=model_df, family=binomial(link="logit"))

tidy(logit_model, exponentiate=TRUE, conf.int=TRUE) |>
  rename(Odds_Ratio=estimate, Lower_95CI=conf.low, Upper_95CI=conf.high) |>
  mutate(across(c(Odds_Ratio,Lower_95CI,Upper_95CI), ~round(.x,3)),
         p.value=round(p.value,4)) |>
  kable(caption="Table 6: Logistic Regression — Odds Ratios") |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Table 6: Logistic Regression — Odds Ratios
term Odds_Ratio std.error statistic p.value Lower_95CI Upper_95CI
(Intercept) 0.073 0.8697916 -3.0125248 0.0026 0.009 3.130000e-01
sbuAsset Integrity Management 1.863 1.0753554 0.5783997 0.5630 0.199 1.760600e+01
sbuInstrumentation & Controls 0.000 1793.1200642 -0.0091131 0.9927 NA 2.253895e+65
sbuTesting & Calibration 1.808 0.9788269 0.6050826 0.5451 0.265 1.514900e+01
on_time 1.949 0.8255535 0.8082336 0.4190 0.383 1.096900e+01
Code
roc_obj <- roc(model_df$awarded, fitted(logit_model), quiet=TRUE)
plot(roc_obj, col="#1565C0", lwd=2,
     main=paste("ROC Curve | AUC =", round(auc(roc_obj),3)))
abline(a=0, b=1, lty=2, col="grey60")

Code
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

ldf=sub_py.copy()
ldf["on_time"]=((ldf["days_diff"].notna())&(ldf["days_diff"]<=0)).astype(int)
ldf=ldf.dropna(subset=["sbu","awarded","on_time"]).reset_index(drop=True)

sbu_dummies=pd.get_dummies(ldf["sbu"],prefix="sbu",drop_first=False).reset_index(drop=True)
ref="sbu_Engineering, Construction & Maintenance"
if ref in sbu_dummies.columns: sbu_dummies.drop(columns=[ref],inplace=True)

X=pd.concat([sbu_dummies,ldf[["on_time"]]],axis=1).fillna(0)
y=ldf["awarded"]

lr=LogisticRegression(max_iter=1000,random_state=42,class_weight="balanced")
lr.fit(X,y)
LogisticRegression(class_weight='balanced', max_iter=1000, random_state=42)
Code
coef_df=pd.DataFrame({"Feature":X.columns,"Coefficient":lr.coef_[0],
                       "Odds_Ratio":np.exp(lr.coef_[0])}).round(3)
print(coef_df.to_string(index=False))
                       Feature  Coefficient  Odds_Ratio
sbu_Asset Integrity Management        0.513       1.670
sbu_Instrumentation & Controls       -1.317       0.268
     sbu_Testing & Calibration        0.548       1.730
                       on_time        0.092       1.096
Code
proba=lr.predict_proba(X)[:,1]
auc_score=roc_auc_score(y,proba)
print(f"\nROC-AUC: {auc_score:.3f}")

ROC-AUC: 0.680
Code
fpr,tpr,_=roc_curve(y,proba)
fig,ax=plt.subplots(figsize=(6,5))
ax.plot(fpr,tpr,color="#1565C0",lw=2,label=f"AUC = {auc_score:.3f}")
[<matplotlib.lines.Line2D object at 0x0000021E4EC51E20>]
Code
ax.plot([0,1],[0,1],"--",color="gray")
[<matplotlib.lines.Line2D object at 0x0000021E4AD80EC0>]
Code
ax.set_xlabel("False Positive Rate"); ax.set_ylabel("True Positive Rate")
Text(0.5, 0, 'False Positive Rate')
Text(0, 0.5, 'True Positive Rate')
Code
ax.set_title("ROC Curve — Logistic Regression"); ax.legend()
Text(0.5, 1.0, 'ROC Curve — Logistic Regression')
<matplotlib.legend.Legend object at 0x0000021E4ACA8050>
Code
plt.tight_layout(); plt.show()

Interpretation for a non-technical manager: Each odds ratio tells you how much more (or less) likely a bid is to be awarded compared to an Engineering/Construction bid. An odds ratio of 3.0 for Testing & Calibration means those bids are 3 times more likely to win. The on_time coefficient tells you the multiplier effect of submitting before the deadline. An AUC above 0.6 means the model is better than random chance at predicting which bids will win.

Caveat: with only 7 awarded bids, treat magnitudes as directional signals. Rerun after accumulating more wins.


10. Integrated Findings

Analysis Key Finding Business Action
EDA Win rate = ~10%. Most bids fail. Reduce volume, improve quality and focus.
Visualisation Testing & Calibration and AIM convert best. Grow these service lines deliberately.
Hypothesis 1 Win rates differ by service category. Allocate resources to high-converting SBUs.
Hypothesis 2 On-time submission is associated with winning. Enforce internal deadlines 48 hrs before client deadlines.
Correlation Timing and SBU are the strongest correlates of award. Use these two factors to score incoming bids.
Logistic Regression SBU and punctuality predict award probability. Use model to support bid/no-bid decisions.

Single recommendation: Implement a formal bid/no-bid decision gate. Score each incoming opportunity on: (1) Is it Testing & Calibration or Asset Integrity Management? (2) Can we submit on time? Bids that fail both tests should be declined. This concentrates effort on the highest-probability opportunities.


11. Limitations & Further Work

  1. Small awarded sample (n = 7): Low statistical power throughout. Rerun all tests after 12 more months of data.
  2. No contract value data: Win rate treats all bids equally. Future analysis should optimise for expected revenue, not just number of wins.
  3. No competitor information: Number of competitors per bid would significantly improve model accuracy.
  4. Date parsing inconsistencies: Standardise all dates to DD/MM/YYYY in the bid register.
  5. No reason for “Did not Bid” decisions: Adding a reason code column would enable future analysis of bid selectivity.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Agbadobi Blessing Osochi. (2026). Tender tracking register — Engineering Automation Technology Limited [Dataset]. Collected from Engineering Automation Technology Limited, Lagos, Nigeria. Data available on request from the author.


Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring the Quarto document template, suggesting appropriate R and Python package selections, and generating initial code scaffolding for each analytical section. All analytical decisions — including the choice of Spearman over Pearson correlation, the selection of Fisher’s Exact Test for the timing hypothesis, the framing of both business hypotheses, the reference category selection in the logistic regression, and the business interpretation of every output — were made independently by the author. The author reviewed, ran, and verified all code outputs and takes full responsibility for all conclusions in this document.