Tender Win-Rate Analytics: Understanding What Drives Bid Success in an Oil & Gas Engineering Services Firm
Author
Agbadobi Blessing Osochi
Published
May 9, 2026
1. Executive Summary
This study analyses 102 tender records logged by an oil and gas engineering services firm operating across Nigeria between February 2025 and February 2026. The data was extracted from the organisation’s internal Business Development tender register and covers four service categories: Instrumentation & Controls, Asset Integrity Management, Engineering/Construction & Maintenance, and Testing & Calibration. The central business problem is a critically low bid-win rate: only 7 of 67 submitted bids (approximately 10.4%) were awarded, which threatens the firm’s revenue pipeline. Using five analytical techniques (Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Logistic Regression), this study examines service category and submission punctuality as the factors most plausibly associated with award outcomes. With only seven awards, neither factor reaches statistical significance at the 5% level, so the patterns should be read as directional. The key recommendation is that the firm should implement a formal bid/no-bid decision gate, concentrate resources on Testing & Calibration and Asset Integrity Management opportunities, and enforce strict submission deadline compliance to improve conversion rates.
2. Professional Disclosure
Job Title: Tendering and Commercial Officer
Department: Business Development / Commercial Department
Organisation: Oil and Gas Engineering Services Firm, Lagos, Nigeria
Why these five techniques are directly relevant to my work:
Exploratory Data Analysis (EDA) — As a Tendering and Commercial Officer, I manage and maintain the firm’s bid register. Before making any recommendations on where to focus bid effort, I need an honest statistical picture of our submission activity, win rates, and data quality. EDA is the first thing I would do when presenting a pipeline review to management.
Data Visualisation — Senior management and business development directors do not read tables of numbers. Charts showing win rates by service category and monthly bid trends are how I would communicate the firm’s commercial performance in a pipeline review meeting.
Hypothesis Testing — When I observe that one service category appears to win more bids than another, I need to know whether that difference is statistically real or simply due to chance. Formal hypothesis testing prevents the firm from reallocating resources based on noise.
Correlation Analysis — I want to know whether submitting bids on time is genuinely associated with winning, or whether it is coincidental. Correlation analysis quantifies this relationship across the entire dataset.
Logistic Regression — Our outcome (win or lose) is binary. Logistic regression lets me build a model that estimates the probability of winning any new bid based on its service category and submission punctuality — directly supporting our bid/no-bid decision process.
3. Data Collection & Sampling
Source: Internal tender register maintained by the Business Development / Commercial Department.
Collection method: The dataset was extracted directly from the organisation’s tender tracking spreadsheet, which is updated by the author as part of day-to-day tendering responsibilities. Each row represents one tender or RFQ opportunity received or identified by the firm.
Variables recorded:
| Variable | Description |
| --- | --- |
| S/N | Serial number |
| Tender No | Reference number assigned by the client |
| Client | Anonymised client identifier (Client 1–29) |
| Job Title | Brief description of the scope of work |
| Strategic Business Unit (SBU) | Service category assigned based on scope of work |
| Tender Type | Full Tender or RFQ |
| Expected Submission Date | Deadline communicated by the client |
| Submission Date | Date the bid was actually submitted |
| Submission Status | SUBMITTED or Did not Bid |
| Award Status | AWARDED or NOT AWARDED |
SBU classification note: SBU labels were assigned by the author using professional judgement based on the scope of work described in each tender’s Job Title. This is standard classification practice in business development tracking.
Time period: February 2025 – February 2026 (approximately 12 months).
Sample size: 102 observations — a full census of all tenders logged during the period.
Ethical notes: All client names have been replaced with anonymised codes (Client 1–29). No employee personal data is included. The dataset covers commercial activity only and has been cleared for academic submission by the author.
Code
# Load R packages
library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(ggcorrplot)
library(scales)
library(knitr)
library(kableExtra)
library(broom)
library(pROC)
library(patchwork)
# Load data
raw <- read_excel("Data.xlsx", sheet = "DA") |>
  clean_names()
glimpse(raw)
Mixed date formats — Some dates were stored as Excel serial numbers while others were text strings. Resolved using automatic date parsing in both R and Python. Unresolvable entries set to NA.
Inconsistent SBU labels — A small number of rows had combined SBU labels. Resolved using regex-based reclassification into four canonical service categories, cross-referenced against the Job Title column.
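The two cleaning steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code used in the study: the function names, the list of text date formats, and the keyword patterns are hypothetical, and the tie-break on Job Title is only one possible reading of "cross-referenced". Excel serial numbers are converted using the conventional 1899-12-30 origin of Excel's 1900 date system.

```python
import re
from datetime import date, datetime, timedelta

EXCEL_ORIGIN = datetime(1899, 12, 30)  # conventional origin, Excel 1900 date system
TEXT_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%b-%Y")  # assumed formats, adjust to the register


def parse_mixed_date(value):
    """Return a date for an Excel serial or a text date, or None if unresolvable."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    text = str(value).strip()
    try:
        serial = float(text)
    except ValueError:
        serial = None
    if serial is not None:
        # Reject NaN and values outside a plausible serial range.
        if serial != serial or not 1 <= serial < 100000:
            return None
        return (EXCEL_ORIGIN + timedelta(days=int(serial))).date()
    for fmt in TEXT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # unresolvable entries become missing values


# Hypothetical keyword patterns for the four canonical service categories.
SBU_PATTERNS = {
    "Instrumentation & Controls": r"instrument|control",
    "Asset Integrity Management": r"integrity|inspection",
    "Engineering, Construction & Maintenance": r"engineer|construct|maintenance",
    "Testing & Calibration": r"test|calibrat",
}


def match_categories(text):
    """List every canonical category whose pattern appears in the text."""
    return [name for name, pattern in SBU_PATTERNS.items()
            if re.search(pattern, text, re.IGNORECASE)]


def canonical_sbu(label, job_title):
    """Map a raw SBU label to one canonical category, or None if ambiguous."""
    candidates = match_categories(label)
    if len(candidates) == 1:
        return candidates[0]
    # Combined label: keep only the candidates the job title also supports.
    title_matches = match_categories(job_title)
    supported = [c for c in candidates if c in title_matches]
    return supported[0] if len(supported) == 1 else None


print(parse_mixed_date(45658))
print(parse_mixed_date("14/02/2025"))
print(canonical_sbu("Instrumentation & Controls / Testing & Calibration",
                    "Calibration of flow meters"))
```

Returning None for anything that cannot be resolved keeps ambiguous rows visible for manual review instead of silently guessing a date or category.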
Code
axes[1].set_xlabel("Days vs Deadline")
axes[1].set_ylabel("Count")
axes[1].set_title("Distribution of Submission Timing")
axes[1].legend()
plt.tight_layout()
plt.show()
Key EDA findings: The overall win rate is approximately 10.4% (7 of 67 submitted bids), a critically low conversion rate. Testing & Calibration and Asset Integrity Management appear to convert at higher rates than the other two categories. Of the 63 submitted bids with a usable submission date, 45 were submitted on time and 18 were late.
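The headline rates can be reproduced from counts reported in this document. The sketch below is plain arithmetic with illustrative variable names; the 45/18 on-time split is taken from the Python contingency table printed in the timing hypothesis test.

```python
# Counts as reported in this document.
tenders_logged = 102    # all tenders in the register
bids_submitted = 67     # tenders that were actually bid
bids_awarded = 7        # submitted bids that were awarded
on_time, late = 45, 18  # submitted bids with a usable submission date

submission_rate = bids_submitted / tenders_logged
win_rate = bids_awarded / bids_submitted
on_time_share = on_time / (on_time + late)

print(f"Submission rate: {submission_rate:.1%}")
print(f"Win rate:        {win_rate:.1%}")
print(f"On-time share:   {on_time_share:.1%}")
```

This prints 65.7%, 10.4% and 71.4% respectively.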
6. Data Visualisation
Five visualisations telling one story: the firm submits many bids but wins very few, and both the service category and submission discipline appear to matter.
axes[1,0].xaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
axes[1,0].tick_params(axis="y", labelsize=8)
# Plot 4
monthly = (df_py.dropna(subset=["expected_submission_date"])
           .groupby([df_py["expected_submission_date"].dt.to_period("M"), "sub_status"])
           .size().unstack(fill_value=0).reset_index())
monthly["expected_submission_date"] = monthly["expected_submission_date"].dt.to_timestamp()
for col, colour in [("SUBMITTED", "#1565C0"), ("Did not Bid", "#B0BEC5")]:
    if col in monthly.columns:
        axes[1,1].plot(monthly["expected_submission_date"], monthly[col],
                       label=col, color=colour, marker="o", linewidth=1.5)
Hypothesis 1 — Do win rates differ by service category?
H₀: Win rates are equal across all service categories.
H₁: At least one category has a significantly different win rate.
Test: Chi-squared with Monte Carlo simulation (due to small expected cell counts).
chi2, p, dof, expected = chi2_contingency(ct1_py)
n = ct1_py.values.sum()
cramers_v = np.sqrt(chi2 / (n * (min(ct1_py.shape) - 1)))
print(f"\nChi² = {chi2:.3f}, p = {p:.4f}, df = {dof}")
Chi² = 2.478, p = 0.4793, df = 3
Code
print(f"Cramér's V = {cramers_v:.3f}")
Cramér's V = 0.192
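Plain-language interpretation: With p = 0.4793, well above the 0.05 threshold, the test does not detect a statistically significant difference in win rates between service categories, so the apparent lead of some categories may be due to chance. Cramér’s V = 0.192 measures the practical strength of the association and indicates a weak-to-moderate effect. With only seven awards the test has little power, so the category pattern is best treated as a directional signal when shifting bid resources, not as an established difference.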
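As a cross-check, both reported statistics can be recomputed from the printed chi-squared value using only the standard library. The sketch assumes the contingency table total is the 67 submitted bids and that the table is 4 categories by 2 outcomes (consistent with df = 3); the closed-form tail probability used here holds for 3 degrees of freedom only.

```python
from math import erfc, exp, pi, sqrt

chi2 = 2.478       # reported test statistic
n = 67             # submitted bids, assumed to be the table total
rows, cols = 4, 2  # four service categories by two outcomes

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v = sqrt(chi2 / (n * (min(rows, cols) - 1)))

# Upper-tail probability of a chi-squared variable with 3 degrees of freedom.
p_value = erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)

print(f"Cramér's V = {cramers_v:.3f}")
print(f"p = {p_value:.3f}")
```

Both values agree with the printed output (V = 0.192, p close to 0.479). This indicates that the Python figure is the asymptotic chi-squared p-value, not a Monte Carlo estimate.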
Hypothesis 2 — Are on-time bids more likely to win?
H₀: Win rate is the same for on-time and late submissions.
H₁: On-time bids win at a higher rate (one-tailed).
Test: Fisher’s Exact Test.
Fisher's Exact Test for Count Data
data: ct2
p-value = 0.3816
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
0.3470903 Inf
sample estimates:
odds ratio
1.728981
Code
from scipy.stats import fisher_exact
timing_py = sub_py[sub_py["days_diff"].notna()].copy()
ct2_py = pd.crosstab(timing_py["on_time"], timing_py["awarded"])
ct2_py.index = ["Late", "On Time"]
ct2_py.columns = ["Not Awarded", "Awarded"]
print(ct2_py)
Not Awarded Awarded
Late 16 2
On Time 40 5
Code
odds_r, p_fisher = fisher_exact(ct2_py.values, alternative="greater")
print(f"\nOdds Ratio = {odds_r:.3f}, p (one-tailed) = {p_fisher:.4f}")
Odds Ratio = 1.000, p (one-tailed) = 0.6849
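Plain-language interpretation: Neither run rejects H₀ at the 5% level (R: odds ratio 1.73, one-tailed p = 0.3816; Python: odds ratio 1.00, one-tailed p = 0.6849). In the Python table, 5 of 45 on-time bids and 2 of 18 late bids were awarded, about 11.1% in both groups, so this sample gives no evidence that punctual bids win more often. An odds ratio measures odds, not probability: a value of 2.0 would mean the odds of an award are twice as high for on-time bids. The R and Python estimates differ, which suggests the two contingency tables were not built from identical on-time flags; this should be reconciled before the figures are reported to management. Management implication: an internal deadline 48 hours before the client deadline remains a sensible safeguard against late submissions, even though punctuality alone does not explain award outcomes in this dataset.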
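The Python result can be verified by hand from the printed contingency table. The sketch below uses only the standard library and illustrative variable names; it computes the sample odds ratio and the one-tailed Fisher p-value from the hypergeometric distribution, which is the calculation that `alternative="greater"` performs for this table layout.

```python
from math import comb

# Python contingency table reported above (rows: Late, On Time).
late_not, late_won = 16, 2
ontime_not, ontime_won = 40, 5

# Sample odds ratio of an award for on-time versus late bids.
odds_ratio = (ontime_won * late_not) / (ontime_not * late_won)

n_late = late_not + late_won        # 18 late bids
n_ontime = ontime_not + ontime_won  # 45 on-time bids
n_won = late_won + ontime_won       # 7 awards in total
total = n_late + n_ontime           # 63 bids with a usable date

# Probability that the late group holds late_won or fewer of the awards
# if the awards were spread at random across all bids.
p_one_tailed = sum(
    comb(n_late, k) * comb(n_ontime, n_won - k) for k in range(late_won + 1)
) / comb(total, n_won)

print(f"Odds ratio = {odds_ratio:.3f}")
print(f"p (one-tailed) = {p_one_tailed:.4f}")
```

The result reproduces the printed Python output: an odds ratio of 1.000 and a one-tailed p of about 0.685.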
8. Correlation Analysis
Spearman correlation is used because our variables include binary flags and ordinal categories that do not follow a normal distribution.
awarded ↔︎ on_time — If positive: submitting on time is associated with winning. Action: treat deadlines as hard constraints.
awarded ↔︎ is_testing — If positive: Testing & Calibration bids convert better. Action: grow this service line.
days_diff ↔︎ awarded — If negative: more days late means lower win probability. Action: start bid preparation earlier.
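For two binary flags, Spearman's rank correlation reduces to the phi coefficient of the 2×2 table. The sketch below, using only the standard library and the Late/On Time counts printed in the Fisher test section, shows what the awarded ↔ on_time coefficient works out to for that table. The helper names are illustrative and this is not the study's own correlation code.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Rebuild bid-level flags from the printed table:
# Late: 16 not awarded, 2 awarded; On Time: 40 not awarded, 5 awarded.
on_time = [0] * 18 + [1] * 45
awarded = [0] * 16 + [1] * 2 + [0] * 40 + [1] * 5

rho = spearman(on_time, awarded)
print(f"Spearman rho (awarded vs on_time) = {rho:.3f}")
```

For these counts the coefficient is zero, which is consistent with the odds ratio of 1.000 reported by the Python Fisher test.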
9. Logistic Regression
The outcome (AWARDED vs. NOT AWARDED) is binary, making logistic regression the appropriate technique. The reference category is Engineering, Construction & Maintenance — the firm’s largest bid volume category.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
ldf = sub_py.copy()
# Bids with a missing days_diff are coded as not on time (0)
ldf["on_time"] = ((ldf["days_diff"].notna()) & (ldf["days_diff"] <= 0)).astype(int)
ldf = ldf.dropna(subset=["sbu", "awarded", "on_time"]).reset_index(drop=True)
# One dummy column per service category, then drop the reference category
sbu_dummies = pd.get_dummies(ldf["sbu"], prefix="sbu", drop_first=False).reset_index(drop=True)
ref = "sbu_Engineering, Construction & Maintenance"
if ref in sbu_dummies.columns:
    sbu_dummies.drop(columns=[ref], inplace=True)
X = pd.concat([sbu_dummies, ldf[["on_time"]]], axis=1).fillna(0)
y = ldf["awarded"]
# class_weight="balanced" reweights the rare awarded class
lr = LogisticRegression(max_iter=1000, random_state=42, class_weight="balanced")
lr.fit(X, y)
Interpretation for a non-technical manager: Each service-category odds ratio compares the odds of an award for that category with the odds for an Engineering, Construction & Maintenance bid. An odds ratio of 3.0 for Testing & Calibration would mean the odds of winning are three times as high, which is not the same as a threefold win probability. The on_time odds ratio is the multiplier on the odds of winning for bids submitted on or before the deadline. An AUC of 0.5 is equivalent to random guessing, so an AUC above 0.6 indicates that the model ranks winning bids ahead of losing bids modestly better than chance.
Caveat: with only 7 awarded bids, treat magnitudes as directional signals. Rerun after accumulating more wins.
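To make the odds-ratio interpretation concrete, the sketch below converts an odds ratio into a win probability. It is illustrative only: the baseline is the overall 7-in-67 win rate used as a stand-in for the reference category (an assumption, since the reference category's own win rate is not reported here), and 3.0 is the example value from the interpretation above, not a fitted coefficient.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Scale the baseline odds by an odds ratio and convert back to a probability."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


baseline = 7 / 67            # overall win rate, stand-in baseline (assumption)
illustrative_or = 3.0        # example odds ratio from the text, not a fitted value

new_probability = apply_odds_ratio(baseline, illustrative_or)
print(f"{baseline:.1%} -> {new_probability:.1%}")
```

This prints `10.4% -> 25.9%`: tripling the odds raises the win probability roughly two and a half times, not three times.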
10. Integrated Findings
| Analysis | Key Finding | Business Action |
| --- | --- | --- |
| EDA | Win rate = ~10%. Most bids fail. | Reduce volume, improve quality and focus. |
| Visualisation | Testing & Calibration and Asset Integrity Management (AIM) convert best. | Grow these service lines deliberately. |
| Hypothesis 1 | Win rates vary by service category, but the difference is not statistically significant (p = 0.4793, Cramér’s V = 0.192). | Allocate resources to high-converting SBUs, treating the ranking as directional. |
| Hypothesis 2 | On-time submission shows no significant association with winning (one-tailed p = 0.3816 in R, 0.6849 in Python). | Enforce internal deadlines 48 hrs before client deadlines as a safeguard. |
| Correlation | Timing and SBU are the strongest correlates of award. | Use these two factors to score incoming bids. |
| Logistic Regression | SBU and punctuality predict award probability. | Use model to support bid/no-bid decisions. |
Single recommendation: Implement a formal bid/no-bid decision gate. Score each incoming opportunity on: (1) Is it Testing & Calibration or Asset Integrity Management? (2) Can we submit on time? Bids that fail both tests should be declined. This concentrates effort on the highest-probability opportunities.
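The decision gate described above could be expressed as a small scoring function. This is a minimal sketch of the two-question rule as stated; the function name, return values, and the set of priority categories are illustrative assumptions, not an existing tool at the firm.

```python
# Priority service categories named in the recommendation.
PRIORITY_SBUS = {"Testing & Calibration", "Asset Integrity Management"}


def bid_gate(sbu, can_submit_on_time):
    """Score an opportunity on the two gate questions.

    Returns (tests_passed, decision). Only opportunities that fail
    both tests are declined, as the recommendation states.
    """
    passes_sbu = sbu in PRIORITY_SBUS
    tests_passed = int(passes_sbu) + int(can_submit_on_time)
    decision = "decline" if tests_passed == 0 else "bid"
    return tests_passed, decision


print(bid_gate("Testing & Calibration", True))
print(bid_gate("Instrumentation & Controls", True))
print(bid_gate("Instrumentation & Controls", False))
```

Returning the number of tests passed alongside the decision lets the register record why a bid was accepted or declined, which would also supply the missing reason code for "Did not Bid" entries.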
11. Limitations & Further Work
Small awarded sample (n = 7): Low statistical power throughout. Rerun all tests after 12 more months of data.
No contract value data: Win rate treats all bids equally. Future analysis should optimise for expected revenue, not just number of wins.
No competitor information: Number of competitors per bid would significantly improve model accuracy.
Date parsing inconsistencies: Standardise all dates to DD/MM/YYYY in the bid register.
No reason for “Did not Bid” decisions: Adding a reason code column would enable future analysis of bid selectivity.
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Agbadobi Blessing Osochi. (2026). Tender tracking register — oil and gas engineering services, Nigeria [Dataset]. Collected from Engineering Automation Technology Limited, Lagos, Nigeria. Data available on request from the author.
Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with structuring the Quarto document template, suggesting appropriate R and Python package selections, and generating initial code scaffolding for each analytical section. All analytical decisions — including the choice of Spearman over Pearson correlation, the selection of Fisher’s Exact Test for the timing hypothesis, the framing of both business hypotheses, the reference category selection in the logistic regression, and the business interpretation of every output — were made independently by the author. The author reviewed, ran, and verified all code outputs and takes full responsibility for all conclusions in this document.