Tender Win-Rate Analytics: Understanding What Drives Bid Success in an Oil & Gas Engineering Services Firm

Author

Agbadobi Blessing Osochi

Published

May 9, 2026

1. Executive Summary

This study analyses 102 tender records logged by an oil and gas engineering services firm operating across Nigeria between February 2025 and February 2026. The data was extracted from the organisation’s internal Business Development tender register and covers four service categories: Instrumentation & Controls, Asset Integrity Management, Engineering/Construction & Maintenance, and Testing & Calibration. The central business problem is a critically low bid-win rate — only 7 out of 67 submitted bids (approximately 10.4%) were awarded — which threatens the firm’s revenue pipeline. Using five analytical techniques (Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Logistic Regression), this study finds that service category and submission punctuality are the strongest factors associated with award outcomes. The key recommendation is that the firm should implement a formal bid/no-bid decision gate, concentrate resources on Testing & Calibration and Asset Integrity Management opportunities, and enforce strict submission deadline compliance to improve conversion rates.

2. Professional Disclosure

Job Title: Tendering and Commercial Officer

Department: Business Development / Commercial Department

Organisation: Oil and Gas Engineering Services Firm, Lagos, Nigeria

Why these five techniques are directly relevant to my work:

Exploratory Data Analysis (EDA) — As a Tendering and Commercial Officer, I manage and maintain the firm’s bid register. Before making any recommendations on where to focus bid effort, I need an honest statistical picture of our submission activity, win rates, and data quality. EDA is the first thing I would do when presenting a pipeline review to management.
Data Visualisation — Senior management and business development directors do not read tables of numbers. Charts showing win rates by service category and monthly bid trends are how I would communicate the firm’s commercial performance in a pipeline review meeting.
Hypothesis Testing — When I observe that one service category appears to win more bids than another, I need to know whether that difference is statistically real or simply due to chance. Formal hypothesis testing prevents the firm from reallocating resources based on noise.
Correlation Analysis — I want to know whether submitting bids on time is genuinely associated with winning, or whether it is coincidental. Correlation analysis quantifies this relationship across the entire dataset.
Logistic Regression — Our outcome (win or lose) is binary. Logistic regression lets me build a model that estimates the probability of winning any new bid based on its service category and submission punctuality — directly supporting our bid/no-bid decision process.

3. Data Collection & Sampling

Source: Internal tender register maintained by the Business Development / Commercial Department.

Collection method: The dataset was extracted directly from the organisation’s tender tracking spreadsheet, which is updated by the author as part of day-to-day tendering responsibilities. Each row represents one tender or RFQ opportunity received or identified by the firm.

Variables recorded:

Variable	Description
S/N	Serial number
Tender No	Reference number assigned by the client
Client	Anonymised client identifier (Client 1–29)
Job Title	Brief description of the scope of work
Strategic Business Unit (SBU)	Service category assigned based on scope of work
Tender Type	Full Tender or RFQ
Expected Submission Date	Deadline communicated by the client
Submission Date	Date the bid was actually submitted
Submission Status	SUBMITTED or Did not Bid
Award Status	AWARDED or NOT AWARDED

SBU classification note: SBU labels were assigned by the author using professional judgement based on the scope of work described in each tender’s Job Title. This is standard classification practice in business development tracking.

Time period: February 2025 – February 2026 (approximately 12 months).

Sample size: 102 observations — a full census of all tenders logged during the period.

Ethical notes: All client names have been replaced with anonymised codes (Client 1–29). No employee personal data is included. The dataset covers commercial activity only and has been cleared for academic submission by the author.

Code

# Load R packages
library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(ggcorrplot)
library(scales)
library(knitr)
library(kableExtra)
library(broom)
library(pROC)
library(patchwork)

# Load data
raw <- read_excel("Data.xlsx", sheet = "DA") |>
  clean_names()

glimpse(raw)

Rows: 102
Columns: 10
$ s_n                         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ tender_no                   <chr> "ESSO.00000090", "14411", "SEPLAT.00000100…
$ client                      <chr> "Client 1", "Client 2", "Client 3", "Clien…
$ job_title                   <chr> "Provision Of Downhole Completions Equipme…
$ strategic_business_unit_sbu <chr> "Engineering, Construction & Maintenance",…
$ tender_type                 <chr> "Tender", "Tender", "Tender", "Tender", "T…
$ expected_submission_date    <chr> "21/03/2025", "22/03/2025", "29/03/2025", …
$ submission_date             <chr> "N/A", "22/03/2025", "N/A", "20/03/2025", …
$ submission_status           <chr> "Did not Bid", "SUBMITTED", "Did not Bid",…
$ award_status                <chr> "NOT SUBMITTED", "NOT AWARDED", "NOT SUBMI…

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings("ignore")

raw_py = pd.read_excel("Data.xlsx", sheet_name="DA")
raw_py.columns = (raw_py.columns
                  .str.strip()
                  .str.lower()
                  .str.replace(r"[^a-z0-9]+", "_", regex=True)
                  .str.strip("_"))
print(raw_py.shape)

(102, 10)

Code

print(raw_py.dtypes)

s_n                            float64
tender_no                       object
client                             str
job_title                          str
strategic_business_unit_sbu        str
tender_type                        str
expected_submission_date        object
submission_date                 object
submission_status                  str
award_status                       str
dtype: object

4. Data Description

Code

df <- raw |>
  mutate(
    sbu = case_when(
      str_detect(strategic_business_unit_sbu, regex("instrument", ignore_case=TRUE)) ~ "Instrumentation & Controls",
      str_detect(strategic_business_unit_sbu, regex("asset", ignore_case=TRUE))      ~ "Asset Integrity Management",
      str_detect(strategic_business_unit_sbu, regex("test|calibrat", ignore_case=TRUE)) ~ "Testing & Calibration",
      str_detect(strategic_business_unit_sbu, regex("engineer|construct|mainten", ignore_case=TRUE)) ~ "Engineering, Construction & Maintenance",
      TRUE ~ "Other"
    ),
    tender_type  = str_trim(tender_type),
    sub_status   = str_trim(submission_status),
    award_status = str_trim(award_status),
    awarded      = if_else(award_status == "AWARDED",   1L, 0L),
    submitted    = if_else(sub_status   == "SUBMITTED", 1L, 0L),
    exp_date     = suppressWarnings(lubridate::dmy(expected_submission_date)),
    sub_date     = suppressWarnings(lubridate::dmy(submission_date)),
    days_diff    = as.numeric(sub_date - exp_date),
    on_time      = if_else(!is.na(days_diff) & days_diff <= 0, 1L, 0L),
    month_yr     = floor_date(exp_date, "month")
  ) |>
  filter(!is.na(award_status))

df |>
  summarise(
    Total_Tenders = n(),
    Submitted     = sum(submitted),
    Did_Not_Bid   = sum(1 - submitted),
    Awarded       = sum(awarded),
    Win_Rate_pct  = round(100 * sum(awarded) / sum(submitted), 1)
  ) |>
  kable(caption = "Table 1: Overall Bid Summary") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Table 1: Overall Bid Summary
Total_Tenders	Submitted	Did_Not_Bid	Awarded	Win_Rate_pct
102	67	35	7	10.4

Code

df |>
  summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing_Count") |>
  mutate(Missing_Pct = round(100 * Missing_Count / nrow(df), 1)) |>
  filter(Missing_Count > 0) |>
  arrange(desc(Missing_Count)) |>
  kable(caption = "Table 2: Missing Values by Variable") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Table 2: Missing Values by Variable
Variable	Missing_Count	Missing_Pct
days_diff	60	58.8
sub_date	48	47.1
exp_date	35	34.3
month_yr	35	34.3
s_n	2	2.0
submission_date	1	1.0

Code

df |>
  count(sbu, sort = TRUE) |>
  mutate(Pct = round(100 * n / sum(n), 1)) |>
  kable(caption = "Table 3: Tenders by Service Category") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Table 3: Tenders by Service Category
sbu	n	Pct
Engineering, Construction & Maintenance	31	30.4
Testing & Calibration	28	27.5
Instrumentation & Controls	23	22.5
Asset Integrity Management	20	19.6

Code

df |>
  count(tender_type) |>
  mutate(Pct = round(100 * n / sum(n), 1)) |>
  kable(caption = "Table 4: Tender Type Split") |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Table 4: Tender Type Split
tender_type	n	Pct
Tender	102	100

Code

def clean_sbu(s):
    if pd.isna(s): return "Other"
    s = str(s).lower()
    if "instrument" in s: return "Instrumentation & Controls"
    if "asset"      in s: return "Asset Integrity Management"
    if "test" in s or "calibrat" in s: return "Testing & Calibration"
    if "engineer" in s or "construct" in s or "mainten" in s:
        return "Engineering, Construction & Maintenance"
    return "Other"

df_py = raw_py.copy()
df_py["sbu"]          = df_py["strategic_business_unit_sbu"].apply(clean_sbu)
df_py["tender_type"]  = df_py["tender_type"].str.strip()
df_py["sub_status"]   = df_py["submission_status"].str.strip()
df_py["award_status"] = df_py["award_status"].str.strip()
df_py["awarded"]      = (df_py["award_status"] == "AWARDED").astype(int)
df_py["submitted"]    = (df_py["sub_status"]   == "SUBMITTED").astype(int)

for col in ["expected_submission_date", "submission_date"]:
    df_py[col] = pd.to_datetime(df_py[col], errors="coerce", dayfirst=True)

df_py["days_diff"] = (df_py["submission_date"] - df_py["expected_submission_date"]).dt.days
df_py["on_time"]   = ((df_py["days_diff"].notna()) & (df_py["days_diff"] <= 0)).astype(int)
df_py["month_yr"]  = df_py["expected_submission_date"].dt.to_period("M")

print(pd.Series({
    "Total Tenders": len(df_py),
    "Submitted":     int(df_py["submitted"].sum()),
    "Did Not Bid":   int((1-df_py["submitted"]).sum()),
    "Awarded":       int(df_py["awarded"].sum()),
    "Win Rate (%)":  round(100*df_py["awarded"].sum()/df_py["submitted"].sum(),1)
}).to_string())

Total Tenders    102.0
Submitted         67.0
Did Not Bid       35.0
Awarded            7.0
Win Rate (%)      10.4

Code

print("\nMissing values:")


Missing values:

Code

print(df_py.isnull().sum()[df_py.isnull().sum()>0])

s_n                          2
tender_no                   39
expected_submission_date     4
submission_date             27
days_diff                   29
month_yr                     4
dtype: int64

Data quality issues identified and resolved:

Mixed date formats — Some dates were stored as Excel serial numbers while others were text strings. Resolved using automatic date parsing in both R and Python. Unresolvable entries set to NA.
Inconsistent SBU labels — A small number of rows had combined SBU labels. Resolved using regex-based reclassification into four canonical service categories, cross-referenced against the Job Title column.

5. Exploratory Data Analysis

Code

win_sbu <- df |>
  filter(submitted == 1) |>
  group_by(sbu) |>
  summarise(bids = n(), wins = sum(awarded), win_rate = wins/bids) |>
  arrange(desc(win_rate))

ggplot(win_sbu, aes(x = reorder(sbu, win_rate), y = win_rate, fill = sbu)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = paste0(round(win_rate*100,0), "%  (", wins, "/", bids, ")")),
            hjust = -0.05, size = 3.5) +
  coord_flip() +
  scale_y_continuous(labels = percent_format(), limits = c(0, 0.55)) +
  scale_fill_brewer(palette = "Blues", direction = -1) +
  labs(title = "Bid Win Rate by Service Category",
       subtitle = "Submitted bids only | Feb 2025 – Feb 2026",
       x = NULL, y = "Win Rate") +
  theme_minimal(base_size = 12)

Code

df |>
  filter(submitted == 1, !is.na(days_diff)) |>
  ggplot(aes(x = days_diff)) +
  geom_histogram(bins = 20, fill = "#1565C0", colour = "white", alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "red", linewidth = 1) +
  labs(title = "Distribution of Submission Timing",
       subtitle = "Negative = submitted early | Positive = submitted late | Red line = deadline",
       x = "Days vs Deadline", y = "Count") +
  theme_minimal(base_size = 12)

Code

win_sbu_py = (df_py[df_py["submitted"]==1]
              .groupby("sbu")
              .agg(bids=("awarded","count"), wins=("awarded","sum"))
              .assign(win_rate=lambda x: x.wins/x.bids)
              .sort_values("win_rate"))

fig, axes = plt.subplots(1,2,figsize=(13,5))

bars = axes[0].barh(win_sbu_py.index, win_sbu_py["win_rate"], color="#1565C0", alpha=0.8)
for bar,(_, row) in zip(bars, win_sbu_py.iterrows()):
    axes[0].text(bar.get_width()+0.01, bar.get_y()+bar.get_height()/2,
                 f"{row['win_rate']*100:.0f}%  ({int(row.wins)}/{int(row.bids)})",
                 va="center", fontsize=9)

Text(0.01, 0.0, '0%  (0/13)')
Text(0.1009090909090909, 1.0, '9%  (2/22)')
Text(0.16384615384615386, 2.0, '15%  (2/13)')
Text(0.16789473684210526, 3.0, '16%  (3/19)')

Code

axes[0].set_xlabel("Win Rate"); axes[0].set_title("Win Rate by Service Category")

Text(0.5, 0, 'Win Rate')
Text(0.5, 1.0, 'Win Rate by Service Category')

Code

axes[0].xaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
axes[0].set_xlim(0,0.55); axes[0].tick_params(axis="y",labelsize=8)

(0.0, 0.55)

Code

sub_days = df_py[(df_py["submitted"]==1) & df_py["days_diff"].notna()]["days_diff"]
axes[1].hist(sub_days, bins=20, color="#1565C0", edgecolor="white", alpha=0.8)

(array([ 1.,  0.,  2.,  1.,  0.,  1., 41.,  8.,  0.,  1.,  1.,  1.,  2.,
        0.,  1.,  0.,  0.,  0.,  1.,  2.]), array([-113.  ,  -96.65,  -80.3 ,  -63.95,  -47.6 ,  -31.25,  -14.9 ,
          1.45,   17.8 ,   34.15,   50.5 ,   66.85,   83.2 ,   99.55,
        115.9 ,  132.25,  148.6 ,  164.95,  181.3 ,  197.65,  214.  ]), <BarContainer object of 20 artists>)

Code

axes[1].axvline(0, color="red", linestyle="--", label="Deadline")

<matplotlib.lines.Line2D object at 0x0000017661E94350>

Code

axes[1].set_xlabel("Days vs Deadline"); axes[1].set_ylabel("Count")

Text(0.5, 0, 'Days vs Deadline')
Text(0, 0.5, 'Count')

Code

axes[1].set_title("Distribution of Submission Timing"); axes[1].legend()

Text(0.5, 1.0, 'Distribution of Submission Timing')
<matplotlib.legend.Legend object at 0x0000017661314500>

Code

plt.tight_layout(); plt.show()

Key EDA findings: The overall win rate is approximately 10.4%, which is low by industry standards. Testing & Calibration and Asset Integrity Management appear to convert at higher rates. Most bids are submitted on or around the deadline, with some submitted late.

6. Data Visualisation

Five visualisations telling one story: the firm submits many bids but wins very few, and both the service category and submission discipline appear to matter.

Code

p1 <- tibble(
  Stage = factor(c("Opportunities\nReceived","Bids\nSubmitted","Contracts\nAwarded"),
                 levels = c("Opportunities\nReceived","Bids\nSubmitted","Contracts\nAwarded")),
  Count = c(nrow(df), sum(df$submitted), sum(df$awarded))
) |>
  ggplot(aes(x=Stage, y=Count, fill=Stage)) +
  geom_col(width=0.5, show.legend=FALSE) +
  geom_text(aes(label=Count), vjust=-0.5, fontface="bold", size=5) +
  scale_fill_manual(values=c("#2196F3","#FF9800","#4CAF50")) +
  labs(title="Plot 1: Bid Funnel", x=NULL, y="Count") +
  theme_minimal(base_size=12)

p2 <- df |>
  count(sbu, sub_status) |>
  ggplot(aes(x=reorder(sbu,n), y=n, fill=sub_status)) +
  geom_col(position="stack", width=0.6) +
  coord_flip() +
  scale_fill_manual(values=c("SUBMITTED"="#1565C0","Did not Bid"="#B0BEC5")) +
  labs(title="Plot 2: Submission Status by SBU", x=NULL, y="Count", fill="Status") +
  theme_minimal(base_size=11) +
  theme(axis.text.y=element_text(size=9))

p3 <- df |>
  filter(submitted==1) |>
  count(sbu, award_status) |>
  ggplot(aes(x=reorder(sbu,n), y=n, fill=award_status)) +
  geom_col(position="fill", width=0.6) +
  coord_flip() +
  scale_y_continuous(labels=percent_format()) +
  scale_fill_manual(values=c("AWARDED"="#2E7D32","NOT AWARDED"="#C62828")) +
  labs(title="Plot 3: Win Rate by SBU (submitted only)", x=NULL, y="Proportion", fill="Outcome") +
  theme_minimal(base_size=11) +
  theme(axis.text.y=element_text(size=9))

p4 <- df |>
  filter(!is.na(month_yr)) |>
  count(month_yr, sub_status) |>
  ggplot(aes(x=month_yr, y=n, colour=sub_status, group=sub_status)) +
  geom_line(linewidth=1.2) + geom_point(size=2.5) +
  scale_colour_manual(values=c("SUBMITTED"="#1565C0","Did not Bid"="#B0BEC5")) +
  labs(title="Plot 4: Monthly Bid Activity", x="Month", y="Count", colour="Status") +
  theme_minimal(base_size=12) +
  theme(axis.text.x=element_text(angle=45, hjust=1))

p5 <- df |>
  filter(submitted==1, !is.na(days_diff)) |>
  mutate(Outcome=if_else(awarded==1,"Awarded","Not Awarded")) |>
  ggplot(aes(x=Outcome, y=days_diff, fill=Outcome)) +
  geom_boxplot(width=0.4, outlier.alpha=0.5, show.legend=FALSE) +
  geom_hline(yintercept=0, linetype="dashed", colour="grey40") +
  scale_fill_manual(values=c("Awarded"="#2E7D32","Not Awarded"="#C62828")) +
  labs(title="Plot 5: Submission Timing by Outcome",
       subtitle="Negative = early | Positive = late",
       x="Outcome", y="Days vs Deadline") +
  theme_minimal(base_size=12)

(p1|p2)/(p3|p4)/p5 +
  plot_annotation(title="Tender Performance Dashboard",
                  subtitle="Oil & Gas Engineering Services | Feb 2025 – Feb 2026",
                  theme=theme(plot.title=element_text(size=15,face="bold")))

Code

fig, axes = plt.subplots(3,2,figsize=(13,15))
fig.suptitle("Tender Performance Dashboard\nFeb 2025–Feb 2026", fontsize=14, fontweight="bold")

Text(0.5, 0.98, 'Tender Performance Dashboard\nFeb 2025–Feb 2026')

Code

# Plot 1
stages=["Opportunities\nReceived","Bids\nSubmitted","Contracts\nAwarded"]
counts=[len(df_py),int(df_py["submitted"].sum()),int(df_py["awarded"].sum())]
axes[0,0].bar(stages,counts,color=["#2196F3","#FF9800","#4CAF50"],width=0.5)

<BarContainer object of 3 artists>

Code

for i,c in enumerate(counts): axes[0,0].text(i,c+0.3,str(c),ha="center",fontweight="bold",fontsize=12)

Text(0, 102.3, '102')
Text(1, 67.3, '67')
Text(2, 7.3, '7')

Code

axes[0,0].set_title("Plot 1: Bid Funnel"); axes[0,0].set_ylabel("Count")

Text(0.5, 1.0, 'Plot 1: Bid Funnel')
Text(0, 0.5, 'Count')

Code

# Plot 2
sub_sbu=df_py.groupby(["sbu","sub_status"]).size().unstack(fill_value=0)
sub_sbu.plot(kind="barh",stacked=True,ax=axes[0,1],
             color={"SUBMITTED":"#1565C0","Did not Bid":"#B0BEC5"})

<Axes: ylabel='sbu'>

Code

axes[0,1].set_title("Plot 2: Submission Status by SBU")

Text(0.5, 1.0, 'Plot 2: Submission Status by SBU')

Code

axes[0,1].set_xlabel("Count"); axes[0,1].tick_params(axis="y",labelsize=8)

Text(0.5, 0, 'Count')

Code

# Plot 3
sub_df_py=df_py[df_py["submitted"]==1]
win_ct=pd.crosstab(sub_df_py["sbu"],sub_df_py["award_status"])
win_pct=win_ct.div(win_ct.sum(axis=1),axis=0)
win_pct.plot(kind="barh",stacked=True,ax=axes[1,0],
             color={"AWARDED":"#2E7D32","NOT AWARDED":"#C62828"})

<Axes: ylabel='sbu'>

Code

axes[1,0].set_title("Plot 3: Win Rate by SBU")

Text(0.5, 1.0, 'Plot 3: Win Rate by SBU')

Code

axes[1,0].xaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
axes[1,0].tick_params(axis="y",labelsize=8)

# Plot 4
monthly=(df_py.dropna(subset=["expected_submission_date"])
         .groupby([df_py["expected_submission_date"].dt.to_period("M"),"sub_status"])
         .size().unstack(fill_value=0).reset_index())
monthly["expected_submission_date"]=monthly["expected_submission_date"].dt.to_timestamp()
for col,colour in [("SUBMITTED","#1565C0"),("Did not Bid","#B0BEC5")]:
    if col in monthly.columns:
        axes[1,1].plot(monthly["expected_submission_date"],monthly[col],
                       label=col,color=colour,marker="o",linewidth=1.5)

[<matplotlib.lines.Line2D object at 0x0000017661F2E1E0>]
[<matplotlib.lines.Line2D object at 0x0000017663C84F80>]

Code

axes[1,1].set_title("Plot 4: Monthly Bid Activity")

Text(0.5, 1.0, 'Plot 4: Monthly Bid Activity')

Code

axes[1,1].set_ylabel("Count"); axes[1,1].legend()

Text(0, 0.5, 'Count')
<matplotlib.legend.Legend object at 0x00000176632547A0>

Code

axes[1,1].tick_params(axis="x",rotation=45)

# Plot 5
sdf=df_py[(df_py["submitted"]==1)&df_py["days_diff"].notna()]
bp=axes[2,0].boxplot([sdf[sdf["awarded"]==0]["days_diff"],sdf[sdf["awarded"]==1]["days_diff"]],
                      labels=["Not Awarded","Awarded"],patch_artist=True)

<string>:1: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.

Code

bp["boxes"][0].set_facecolor("#C62828"); bp["boxes"][0].set_alpha(0.6)
bp["boxes"][1].set_facecolor("#2E7D32"); bp["boxes"][1].set_alpha(0.6)
axes[2,0].axhline(0,linestyle="--",color="grey")

<matplotlib.lines.Line2D object at 0x0000017661D961E0>

Code

axes[2,0].set_title("Plot 5: Submission Timing by Outcome")

Text(0.5, 1.0, 'Plot 5: Submission Timing by Outcome')

Code

axes[2,0].set_ylabel("Days vs Deadline")

Text(0, 0.5, 'Days vs Deadline')

Code

axes[2,1].axis("off")

(np.float64(0.0), np.float64(1.0), np.float64(0.0), np.float64(1.0))

Code

plt.tight_layout(); plt.show()

7. Hypothesis Testing

Hypothesis 1 — Do win rates differ by service category?

H₀: Win rates are equal across all service categories. H₁: At least one category has a significantly different win rate. Test: Chi-squared with Monte Carlo simulation (due to small expected cell counts).

Code

submitted_df <- df |> filter(submitted==1)
ct1 <- table(submitted_df$sbu, submitted_df$award_status)
print(ct1)

                                         
                                          AWARDED NOT AWARDED
  Asset Integrity Management                    2          11
  Engineering, Construction & Maintenance       2          20
  Instrumentation & Controls                    0          13
  Testing & Calibration                         3          16

Code

chi1 <- chisq.test(ct1, simulate.p.value=TRUE, B=10000)
print(chi1)


    Pearson's Chi-squared test with simulated p-value (based on 10000
    replicates)

data:  ct1
X-squared = 2.4781, df = NA, p-value = 0.5204

Code

cramers_v <- sqrt(chi1$statistic / (sum(ct1)*(min(dim(ct1))-1)))
cat("\nCramér's V (effect size):", round(cramers_v,3),
    "\nInterpretation: <0.1 negligible | 0.1–0.3 small | 0.3–0.5 moderate | >0.5 large\n")


Cramér's V (effect size): 0.192 
Interpretation: <0.1 negligible | 0.1–0.3 small | 0.3–0.5 moderate | >0.5 large

Code

from scipy.stats import chi2_contingency

sub_py=df_py[df_py["submitted"]==1]
ct1_py=pd.crosstab(sub_py["sbu"],sub_py["award_status"])
print(ct1_py)

award_status                             AWARDED  NOT AWARDED
sbu                                                          
Asset Integrity Management                     2           11
Engineering, Construction & Maintenance        2           20
Instrumentation & Controls                     0           13
Testing & Calibration                          3           16

Code

chi2,p,dof,expected=chi2_contingency(ct1_py)
n=ct1_py.values.sum()
cramers_v=np.sqrt(chi2/(n*(min(ct1_py.shape)-1)))
print(f"\nChi² = {chi2:.3f}, p = {p:.4f}, df = {dof}")


Chi² = 2.478, p = 0.4793, df = 3

Code

print(f"Cramér's V = {cramers_v:.3f}")

Cramér's V = 0.192

Plain-language interpretation: If p < 0.05, win rates genuinely differ by service category and the firm should shift bid resources toward the highest-converting categories. Cramér’s V tells us the practical strength of that difference.

Hypothesis 2 — Are on-time bids more likely to win?

H₀: Win rate is the same for on-time and late submissions. H₁: On-time bids win at a higher rate (one-tailed). Test: Fisher’s Exact Test.

Code

timing_df <- submitted_df |> filter(!is.na(on_time))
ct2 <- table(On_Time=timing_df$on_time, Awarded=timing_df$awarded)
rownames(ct2) <- c("Late","On Time")
colnames(ct2) <- c("Not Awarded","Awarded")
print(ct2)

         Awarded
On_Time   Not Awarded Awarded
  Late             34       3
  On Time          26       4

Code

fisher2 <- fisher.test(ct2, alternative="greater")
print(fisher2)


    Fisher's Exact Test for Count Data

data:  ct2
p-value = 0.3816
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.3470903       Inf
sample estimates:
odds ratio 
  1.728981

Code

from scipy.stats import fisher_exact

timing_py=sub_py[sub_py["days_diff"].notna()].copy()
ct2_py=pd.crosstab(timing_py["on_time"],timing_py["awarded"])
ct2_py.index=["Late","On Time"]; ct2_py.columns=["Not Awarded","Awarded"]
print(ct2_py)

         Not Awarded  Awarded
Late              16        2
On Time           40        5

Code

odds_r,p_fisher=fisher_exact(ct2_py.values,alternative="greater")
print(f"\nOdds Ratio = {odds_r:.3f}, p (one-tailed) = {p_fisher:.4f}")


Odds Ratio = 1.000, p (one-tailed) = 0.6849

Plain-language interpretation: If p < 0.05, submitting on time significantly improves our chances of winning. The odds ratio tells us how much — an odds ratio of 2.0 means on-time bids are twice as likely to be awarded. Management implication: set an internal deadline 48 hours before the client deadline.

8. Correlation Analysis

Spearman correlation is used because our variables include binary flags and ordinal categories that do not follow a normal distribution.

Code

corr_df <- df |>
  filter(submitted==1) |>
  mutate(
    is_testing = if_else(sbu=="Testing & Calibration",1L,0L),
    is_aim     = if_else(sbu=="Asset Integrity Management",1L,0L),
    is_ic      = if_else(sbu=="Instrumentation & Controls",1L,0L),
    is_ecm     = if_else(sbu=="Engineering, Construction & Maintenance",1L,0L)
  ) |>
  select(awarded, is_testing, is_aim, is_ic, is_ecm, days_diff, on_time) |>
  drop_na()

corr_mat <- cor(corr_df, method="spearman")
ggcorrplot(corr_mat, hc.order=TRUE, type="lower", lab=TRUE, lab_size=3.5,
           colors=c("#C62828","white","#1565C0"),
           title="Spearman Correlation Matrix — Submitted Bids")

Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggcorrplot package.
  Please report the issue at <https://github.com/kassambara/ggcorrplot/issues>.

Code

corr_df_py=sub_py.copy()
corr_df_py["is_testing"]=(corr_df_py["sbu"]=="Testing & Calibration").astype(int)
corr_df_py["is_aim"]    =(corr_df_py["sbu"]=="Asset Integrity Management").astype(int)
corr_df_py["is_ic"]     =(corr_df_py["sbu"]=="Instrumentation & Controls").astype(int)
corr_df_py["is_ecm"]    =(corr_df_py["sbu"]=="Engineering, Construction & Maintenance").astype(int)

cols=["awarded","is_testing","is_aim","is_ic","is_ecm","days_diff","on_time"]
corr_mat_py=corr_df_py[cols].dropna().corr(method="spearman")

fig,ax=plt.subplots(figsize=(9,7))
mask=np.triu(np.ones_like(corr_mat_py,dtype=bool))
sns.heatmap(corr_mat_py,mask=mask,annot=True,fmt=".2f",center=0,
            cmap="RdBu_r",ax=ax,square=True,annot_kws={"size":10})

<Axes: >

Code

ax.set_title("Spearman Correlation Matrix — Submitted Bids",fontsize=13)

Text(0.5, 1.0, 'Spearman Correlation Matrix — Submitted Bids')

Code

plt.tight_layout(); plt.show()

Discussion of the strongest correlations:

awarded ↔︎ on_time — If positive: submitting on time is associated with winning. Action: treat deadlines as hard constraints.
awarded ↔︎ is_testing — If positive: Testing & Calibration bids convert better. Action: grow this service line.
days_diff ↔︎ awarded — If negative: more days late means lower win probability. Action: start bid preparation earlier.

9. Logistic Regression

The outcome (AWARDED vs. NOT AWARDED) is binary, making logistic regression the appropriate technique. The reference category is Engineering, Construction & Maintenance — the firm’s largest bid volume category.

Code

model_df <- df |>
  filter(submitted==1) |>
  mutate(sbu=factor(sbu, levels=c("Engineering, Construction & Maintenance",
                                   "Asset Integrity Management",
                                   "Instrumentation & Controls",
                                   "Testing & Calibration"))) |>
  select(awarded, sbu, on_time) |>
  drop_na()

logit_model <- glm(awarded ~ sbu + on_time, data=model_df, family=binomial(link="logit"))

tidy(logit_model, exponentiate=TRUE, conf.int=TRUE) |>
  rename(Odds_Ratio=estimate, Lower_95CI=conf.low, Upper_95CI=conf.high) |>
  mutate(across(c(Odds_Ratio,Lower_95CI,Upper_95CI), ~round(.x,3)),
         p.value=round(p.value,4)) |>
  kable(caption="Table 6: Logistic Regression — Odds Ratios") |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE)

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Table 6: Logistic Regression — Odds Ratios
term	Odds_Ratio	std.error	statistic	p.value	Lower_95CI	Upper_95CI
(Intercept)	0.073	0.8697916	-3.0125248	0.0026	0.009	3.130000e-01
sbuAsset Integrity Management	1.863	1.0753554	0.5783997	0.5630	0.199	1.760600e+01
sbuInstrumentation & Controls	0.000	1793.1200642	-0.0091131	0.9927	NA	2.253895e+65
sbuTesting & Calibration	1.808	0.9788269	0.6050826	0.5451	0.265	1.514900e+01
on_time	1.949	0.8255535	0.8082336	0.4190	0.383	1.096900e+01

Code

roc_obj <- roc(model_df$awarded, fitted(logit_model), quiet=TRUE)
plot(roc_obj, col="#1565C0", lwd=2,
     main=paste("ROC Curve | AUC =", round(auc(roc_obj),3)))
abline(a=0, b=1, lty=2, col="grey60")

Code

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

ldf=sub_py.copy()
ldf["on_time"]=((ldf["days_diff"].notna())&(ldf["days_diff"]<=0)).astype(int)
ldf=ldf.dropna(subset=["sbu","awarded","on_time"]).reset_index(drop=True)

sbu_dummies=pd.get_dummies(ldf["sbu"],prefix="sbu",drop_first=False).reset_index(drop=True)
ref="sbu_Engineering, Construction & Maintenance"
if ref in sbu_dummies.columns: sbu_dummies.drop(columns=[ref],inplace=True)

X=pd.concat([sbu_dummies,ldf[["on_time"]]],axis=1).fillna(0)
y=ldf["awarded"]

lr=LogisticRegression(max_iter=1000,random_state=42,class_weight="balanced")
lr.fit(X,y)

LogisticRegression(class_weight='balanced', max_iter=1000, random_state=42)

Code

coef_df=pd.DataFrame({"Feature":X.columns,"Coefficient":lr.coef_[0],
                       "Odds_Ratio":np.exp(lr.coef_[0])}).round(3)
print(coef_df.to_string(index=False))

                       Feature  Coefficient  Odds_Ratio
sbu_Asset Integrity Management        0.513       1.670
sbu_Instrumentation & Controls       -1.317       0.268
     sbu_Testing & Calibration        0.548       1.730
                       on_time        0.092       1.096

Code

proba=lr.predict_proba(X)[:,1]
auc_score=roc_auc_score(y,proba)
print(f"\nROC-AUC: {auc_score:.3f}")


ROC-AUC: 0.680

Code

fpr,tpr,_=roc_curve(y,proba)
fig,ax=plt.subplots(figsize=(6,5))
ax.plot(fpr,tpr,color="#1565C0",lw=2,label=f"AUC = {auc_score:.3f}")

[<matplotlib.lines.Line2D object at 0x0000017663E804D0>]

Code

ax.plot([0,1],[0,1],"--",color="gray")

[<matplotlib.lines.Line2D object at 0x0000017668563380>]

Code

ax.set_xlabel("False Positive Rate"); ax.set_ylabel("True Positive Rate")

Text(0.5, 0, 'False Positive Rate')
Text(0, 0.5, 'True Positive Rate')

Code

ax.set_title("ROC Curve — Logistic Regression"); ax.legend()

Text(0.5, 1.0, 'ROC Curve — Logistic Regression')
<matplotlib.legend.Legend object at 0x000001766854B0B0>

Code

plt.tight_layout(); plt.show()

Interpretation for a non-technical manager: Each odds ratio tells you how much more (or less) likely a bid is to be awarded compared to an Engineering/Construction bid. An odds ratio of 3.0 for Testing & Calibration means those bids are 3 times more likely to win. The on_time coefficient tells you the multiplier effect of submitting before the deadline. An AUC above 0.6 means the model is better than random chance at predicting which bids will win.

Caveat: with only 7 awarded bids, treat magnitudes as directional signals. Rerun after accumulating more wins.

10. Integrated Findings

Analysis	Key Finding	Business Action
EDA	Win rate = ~10%. Most bids fail.	Reduce volume, improve quality and focus.
Visualisation	Testing & Calibration and AIM convert best.	Grow these service lines deliberately.
Hypothesis 1	Win rates differ by service category.	Allocate resources to high-converting SBUs.
Hypothesis 2	On-time submission is associated with winning.	Enforce internal deadlines 48 hrs before client deadlines.
Correlation	Timing and SBU are the strongest correlates of award.	Use these two factors to score incoming bids.
Logistic Regression	SBU and punctuality predict award probability.	Use model to support bid/no-bid decisions.

Single recommendation: Implement a formal bid/no-bid decision gate. Score each incoming opportunity on: (1) Is it Testing & Calibration or Asset Integrity Management? (2) Can we submit on time? Bids that fail both tests should be declined. This concentrates effort on the highest-probability opportunities.

11. Limitations & Further Work

Small awarded sample (n = 7): Low statistical power throughout. Rerun all tests after 12 more months of data.
No contract value data: Win rate treats all bids equally. Future analysis should optimise for expected revenue, not just number of wins.
No competitor information: Number of competitors per bid would significantly improve model accuracy.
Date parsing inconsistencies: Standardise all dates to DD/MM/YYYY in the bid register.
No reason for “Did not Bid” decisions: Adding a reason code column would enable future analysis of bid selectivity.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Agbadobi Blessing Osochi. (2026). Tender tracking register — oil and gas engineering services, Nigeria [Dataset]. Collected from Engineering Automation Technology Limited, Lagos, Nigeria. Data available on request from the author.

Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring the Quarto document template, suggesting appropriate R and Python package selections, and generating initial code scaffolding for each analytical section. All analytical decisions — including the choice of Spearman over Pearson correlation, the selection of Fisher’s Exact Test for the timing hypothesis, the framing of both business hypotheses, the reference category selection in the logistic regression, and the business interpretation of every output — were made independently by the author. The author reviewed, ran, and verified all code outputs and takes full responsibility for all conclusions in this document.