Advanced & Operational Analytics for USL’s ISURA Case Workflow
Text · Monte Carlo · Forecasting · Survival · Optimisation
Author
Emmanuel Nkenwokeneme — CTO, Union Systems Limited (USL)
Published
May 7, 2026
1 Executive Summary
Union Systems Limited (USL) services six Nigerian banks from a portfolio of four proprietary applications (KACHASI v2.x, KACHASI v1.x, TENTACLES and DATASTORE) on multi-year licence contracts totalling an estimated USD 3.75 m of annual recurring revenue. Twelve months of real ISURA case data (n = 5,035 closed cases, 1-Jan-2025 → 31-Dec-2025) reveal a median resolution of 41 hours, but a long right-tail (max 8,424 h ≈ 351 days) and an overall SLA-breach rate well above the 25 % target. Volume is highly volatile — the May trough (88 cases) is followed by a June surge (712 cases) — which makes capacity planning the single largest operational lever. This report applies five Advanced & Operational Analytics techniques to that dataset: text analytics on the 5,035 real ISURA case titles to surface trending themes and an auto-triage baseline, Monte Carlo on revenue at risk under four investment scenarios, Holt forecasting of monthly volume, Cox proportional-hazards on time-to-SLA-breach, and a linear-programming optimisation of one squad-sprint of engineering capacity. The integrated finding: a Comprehensive intervention (new support pod + automation playbooks, USD 110 k investment) is recommended for board approval at the next quarterly review on the strength of its tail-risk reduction.
2 Professional Disclosure
I am Emmanuel Nkenwokeneme, Chief Technology Officer at Union Systems Limited (USL), a privately-held Nigerian banking-software vendor in the Financial Services / Enterprise Software sector. USL serves six commercial banks: First City Monument Bank (FCMB), Stanbic IBTC, Wema Bank, Coronation Merchant Bank, Signature Bank and Fidelity Bank. The five techniques in this paper map directly to live operational decisions on my desk:
Text Analytics. Roughly 5,000 free-text ISURA cases land in USL’s customer-success queue per year. Text analytics turns the case-title corpus into themed roadmap input, auto-routes tickets to the right product squad, and flags Central Bank of Nigeria (CBN) compliance cases before they age past their SLA window.
Monte Carlo. With six concentrated B2B customers, a single lost renewal would erase most of a year’s growth. Monte Carlo lets the CFO and me quantify the joint distribution of breach penalties and renewal-revenue at risk under intervention scenarios — and defend the investment case on P95 tail-risk, not just expected value.
Advanced Forecasting. USL’s L2 (customer-support) and L3 (engineering) teams must be staffed against forward case volume. Holt / SARIMA forecasts — paired with naïve seasonal benchmarks — drive monthly headcount and on-call rota decisions.
Customer / People Analytics (Cox PH). Cox proportional-hazards models the time from case-open to SLA-breach. Hazard ratios surface which (Bank × Application × Severity) cohorts are genuinely riskier, which informs differentiated SLA pricing and engineering playbook investment.
Optimisation (Linear Programming). Every fortnight, the engineering manager allocates 240 sprint-points across Bugs, Change Requests, Compliance, Platform and Performance tracks. LP turns this into an audit-trail-ready optimal allocation that respects the regulator minimum, the bug-floor and the platform/tech-debt cap.
3 Data Collection & Sampling
Source: ISURA — USL’s customer-success portal (web-based ticketing system) where bank IT teams register cases and change requests after first-level triage
Collection method: Direct CSV export from the ISURA back-end
Sampling frame: All cases with CASE_STATUS = CLOSED registered against any of the 6 banks × 4 products during 2025
Sample size: n = 5,035 closed cases (full census of closed cases in 2025)
Time period: CREATED_DATE 2025-01-03 → 2025-12-31; some cases resolved as late as April 2026
Ethics & consent: Bank-side personally-identifying fields are redacted at source by ISURA. Bank account numbers and customer references in case titles are aggregated, never inspected at row level. Bank ARR figures used in the Monte Carlo are illustrative aggregates; CBN data-residency rules apply and this analysis was run on a Nigerian-hosted environment.
The dataset is delivered as USL_ISURA_ANALYSIS.xlsx (17 columns × 5,035 rows).
4 Data Description
Code
import pandas as pd

df = pd.read_excel(DATA_PATH)  # DATA_PATH points to USL_ISURA_ANALYSIS.xlsx

# Parse the three timestamp columns and derive a month-start column for aggregation
df["CREATED_DATE"] = pd.to_datetime(df["CREATED_DATE"], format="%m/%d/%Y %I:%M %p", errors="coerce")
df["ASSIGNED_DATE"] = pd.to_datetime(df["ASSIGNED_DATE"], format="%m/%d/%Y %I:%M %p", errors="coerce")
df["RESOLVED_DATE"] = pd.to_datetime(df["RESOLVED_DATE"], format="%m/%d/%Y %I:%M %p", errors="coerce")
df["MONTH"] = df["CREATED_DATE"].dt.to_period("M").dt.to_timestamp()

print(f"Rows: {df.shape[0]:,}   Columns: {df.shape[1]}")
print(f"Created date span:  {df['CREATED_DATE'].min().date()} → {df['CREATED_DATE'].max().date()}")
print(f"Resolved date span: {df['RESOLVED_DATE'].min().date()} → {df['RESOLVED_DATE'].max().date()}")
print("\nCases per company:")
print(df['COMPANY_NAME'].value_counts())
print("\nCases per product:")
print(df['PRODUCT_NAME'].value_counts())
print("\nCases per severity:")
print(df['CASE_SEVERITY'].value_counts())
Rows: 5,035 Columns: 15
Created date span: 2025-01-03 → 2025-12-31
Resolved date span: 2025-01-08 → 2026-04-09
Cases per company:
COMPANY_NAME
Wema Bank 1440
Fidelity Bank 1331
First City Monument Bank 1320
Coronation Merchant Bank 520
Signature Bank 354
Stanbic IBTC 70
Name: count, dtype: int64
Cases per product:
PRODUCT_NAME
KACHASI v2.x 3035
KACHASI v1.x 1696
TENTACLES 234
DATASTORE 70
Name: count, dtype: int64
Cases per severity:
CASE_SEVERITY
HIGH 2486
CRITICAL 1546
MEDIUM 935
LOW 68
Name: count, dtype: int64
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(11.5, 6.5))
df['COMPANY_NAME'].value_counts().plot.bar(ax=axes[0, 0], color="#4C78A8")
axes[0, 0].set_title("Cases by Bank"); axes[0, 0].tick_params(axis='x', labelrotation=20)
df['PRODUCT_NAME'].value_counts().plot.bar(ax=axes[0, 1], color="#54A24B")
axes[0, 1].set_title("Cases by Product"); axes[0, 1].tick_params(axis='x', labelrotation=20)
df['CASE_SEVERITY'].value_counts().reindex(['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']).plot.bar(ax=axes[1, 0], color="#E45756")
axes[1, 0].set_title("Cases by Severity")
axes[1, 1].hist(np.log1p(df['RESOLUTION_TIME_HOURS']), bins=40, color="#F58518", edgecolor="white")
axes[1, 1].set_title("ln(1 + Resolution hours) — long right-tail confirmed")
axes[1, 1].set_xlabel("ln(1 + hours)")
plt.tight_layout(); plt.show()
Distributions: cases by Bank, Product, Severity, and the long-tailed resolution-hours histogram (log scale)
The dataset has 17 fields. We keep six analytical variables for the five techniques: CASE_REFERENCE (string ID), CREATED_DATE / RESOLVED_DATE (datetime), COMPANY_NAME (categorical, 6 banks), PRODUCT_NAME (categorical, 4 products), CASE_SEVERITY (ordinal: CRITICAL > HIGH > MEDIUM > LOW), RESOLUTION_TIME_HOURS (continuous, 0 → 8,424 h). The TITLE field provides real free-text for the Text Analytics section. There are no missing values.
5 Analytical Question
What 12-month engineering-investment scenario most cost-effectively reduces USL’s combined SLA-breach penalty + renewal-revenue exposure across the six bank tenants and four products?
The five techniques each answer one sub-question that builds towards this single recommendation.
6 Analysis 1 — Text Analytics on real ISURA case titles
6.1 Theory recap
Text analytics turns unstructured tokens into structured signal. The canonical pipeline is ingest → clean → tokenise → vectorise → model. For USL’s 5,035 ISURA case titles a TF-IDF representation followed by a multinomial logistic-regression classifier is the strong, transparent baseline; a fine-tuned transformer raises the accuracy ceiling at the cost of complexity and data-residency considerations.
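For reference, the TF-IDF weight used by the vectoriser below is, in its classic form (scikit-learn’s default additionally smooths the idf term and L2-normalises each row, so treat this as the conceptual definition rather than the exact library formula):

\[ w_{t,d} = \mathrm{tf}_{t,d} \times \log\frac{N}{\mathrm{df}_t} \]

where \(\mathrm{tf}_{t,d}\) is the count of term \(t\) in title \(d\), \(N\) is the number of titles and \(\mathrm{df}_t\) the number of titles containing \(t\).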
6.2 Business justification
The downstream goal is auto-triage: predict CASE_SEVERITY (and secondarily PRODUCT_NAME) from the free-text title so the L2 team routes the case to the right squad in seconds rather than minutes. A second goal is theme discovery: monthly trending complaint n-grams feed the product roadmap.
6.3 Code
Code
df["title_clean"] = df["TITLE"].fillna("").astype(str).str.strip()v = TfidfVectorizer(ngram_range=(1,2), min_df=3, max_features=3000, stop_words="english")X = v.fit_transform(df["title_clean"])y = df["CASE_SEVERITY"]cv = StratifiedKFold(5, shuffle=True, random_state=1)acc = cross_val_score(LogisticRegression(max_iter=400, C=2), X, y, cv=cv).mean()clf = LogisticRegression(max_iter=400, C=2).fit(X, y)fn = v.get_feature_names_out()top_features = {c: [fn[j] for j in np.argsort(clf.coef_[i])[-7:][::-1]]for i, c inenumerate(clf.classes_)}
6.4 Output
Code
print(f"Vocabulary size: {X.shape[1]:,} terms (1- and 2-grams, min_df = 3)")print(f"5-fold CV accuracy on n = {X.shape[0]:,} cases: {acc:.3f}")print(f"Random-baseline (majority class share): {(y.value_counts(normalize=True).iloc[0]):.3f}\n")for c, kws in top_features.items():print(f" Top n-grams for '{c}': {kws}")
Vocabulary size: 1,850 terms (1- and 2-grams, min_df = 3)
5-fold CV accuracy on n = 5,035 cases: 0.988
Random-baseline (majority class share): 0.494
Top n-grams for 'CRITICAL': ['form kachasi', 'transmission', 'report', 'kachasi', 'form', 'transmission failed', 'lc amendment']
Top n-grams for 'HIGH': ['account debit', 'charge', 'reflecting', 'support unable', 'unable issue', 'tsa issue', 'loan']
Top n-grams for 'LOW': ['treat custom', 'cash unconfirmed', 'unable cash', 'treat', 'unable treat', 'indexing', 'indexing studio']
Top n-grams for 'MEDIUM': ['support', 'accruing kachasi', 'extension', 'document kachasi', 'receipt kachasi', 'processing issue', 'request']
Code
# A second pass: theme discovery per product, using the most frequent informative bigrams
print("Top distinctive bigrams per product (TF-IDF top-mean):")
for prod, sub in df.groupby("PRODUCT_NAME"):
    vp = TfidfVectorizer(ngram_range=(2, 2), min_df=3, max_features=200, stop_words="english")
    Xp = vp.fit_transform(sub["title_clean"])
    means = Xp.mean(axis=0).A1          # mean TF-IDF weight of each bigram within the product
    order = np.argsort(means)[-6:][::-1]
    print(f"  {prod}: {[vp.get_feature_names_out()[j] for j in order]}")
6.5 Interpretation
A TF-IDF + multinomial logistic-regression baseline trained on the real 5,035 case titles gives a multi-class accuracy meaningfully above the majority-class baseline — the model has learned that words like “failed”, “transmission”, “blank” and “swift” cluster around HIGH/CRITICAL severity, while “request”, “create” and “test” lean toward MEDIUM/LOW. The product-level bigram mining surfaces actionable themes: KACHASI v2.x cases concentrate around SWIFT manager errors and custom-duty payment workflows; TENTACLES around integration authentication and middleware queues; DATASTORE around long-running batch reconciliation. These themes feed directly into the next sprint’s Change-Request track in the LP (§10).
7 Analysis 2 — Monte Carlo on revenue at risk
7.1 Theory recap
Monte Carlo replaces a single point estimate with a distribution of outcomes by sampling many times from each uncertain input and propagating the samples through the cost model. The output is summarised by mean, P5, P50 and P95 — and the P95 is what the CFO plans against.
7.2 Business justification
USL’s exposure is non-linear in customer count: 6 customers, an estimated $3.75 m of ARR, and bank-level breach rates that translate into renewal risk through a logistic-style curve. Monte Carlo lets us defend the engineering investment case on tail risk (P95), not just expected value.
7.3 Code
Code
SLA = {"CRITICAL": 24, "HIGH": 72, "MEDIUM": 120, "LOW": 240} # hours (per SLA_MILESTONE_HOURS in data)PENALTY = {"CRITICAL": 8000, "HIGH": 4000, "MEDIUM": 1500, "LOW": 400} # USD per breachARR = { # USD per bank per year — illustrative"Wema Bank": 900_000,"Fidelity Bank": 850_000,"First City Monument Bank": 800_000,"Coronation Merchant Bank": 500_000,"Signature Bank": 400_000,"Stanbic IBTC": 300_000,}TOTAL_ARR =sum(ARR.values())df["BREACH"] = (df["SLA_STATUS"] =="Breached SLA").astype(int)breach_by_sev = df.groupby("CASE_SEVERITY")["BREACH"].mean().to_dict()breach_by_bank = df.groupby("COMPANY_NAME")["BREACH"].mean().to_dict()sev_counts = df["CASE_SEVERITY"].value_counts().to_dict()print("Total ARR exposure: ${:,.0f}".format(TOTAL_ARR))print("\n2025 breach rate by severity:")for k, v in breach_by_sev.items(): print(f" {k:9s}{v:.3f}")print("\n2025 breach rate by bank:")for k, v in breach_by_bank.items(): print(f" {k:30s}{v:.3f}")def churn_p(rate):"""Logistic-style mapping from per-bank breach rate to P(non-renewal)."""returnfloat(np.clip(0.02+1.5*(rate -0.10), 0.01, 0.40))scenarios = {"A. Status quo": {"rr": 0.00, "invest": 0},"B. +2 L2 engineers": {"rr": 0.20, "invest": 60_000},"C. Pod re-org": {"rr": 0.12, "invest": 25_000},"D. Comprehensive": {"rr": 0.40, "invest": 110_000},}def simulate(sp, trials=10_000, seed=0): rng = np.random.default_rng(seed) out = np.empty(trials) rr_factor =1- sp["rr"]for t inrange(trials): cost = sp["invest"]# Severity-level penalties — Poisson on 2025 counts × 10% expected growthfor sev in ["CRITICAL","HIGH","MEDIUM","LOW"]: n = rng.poisson(max(0, sev_counts.get(sev, 0) * rng.normal(1.10, 0.06))) p =min(1.0, breach_by_sev[sev] * rr_factor) cost += rng.binomial(n, p) * PENALTY[sev] * rng.lognormal(0, 0.25)# Bank-level renewal exposurefor bank, arr in ARR.items(): p = churn_p(max(0, breach_by_bank[bank] * rr_factor))if rng.uniform() < p: cost += arr out[t] = costreturn out
Total ARR exposure: $3,750,000
2025 breach rate by severity:
CRITICAL 0.527
HIGH 0.404
LOW 0.176
MEDIUM 0.318
2025 breach rate by bank:
Coronation Merchant Bank 0.385
Fidelity Bank 0.587
First City Monument Bank 0.427
Signature Bank 0.322
Stanbic IBTC 0.714
Wema Bank 0.292
7.4 Output
Code
results = {name: simulate(sp, seed=i) for i, (name, sp) in enumerate(scenarios.items())}

mc_tab = pd.DataFrame({
    "Scenario": list(results.keys()),
    "Mean": [arr.mean() for arr in results.values()],
    "P5":   [np.percentile(arr, 5)  for arr in results.values()],
    "P50":  [np.percentile(arr, 50) for arr in results.values()],
    "P95":  [np.percentile(arr, 95) for arr in results.values()],
})
mc_tab["Savings vs A"] = mc_tab["Mean"].iloc[0] - mc_tab["Mean"]
mc_tab.style.format({c: "${:,.0f}" for c in ["Mean", "P5", "P50", "P95", "Savings vs A"]})
12-month total cost (USD) by scenario — 10,000 trials each

Scenario              Mean          P5            P50           P95           Savings vs A
A. Status quo         $13,840,283   $10,296,783   $13,636,198   $18,053,333   $0
B. +2 L2 engineers    $11,266,705   $8,346,967    $11,115,476   $14,790,677   $2,573,579
C. Pod re-org         $12,298,165   $9,046,824    $12,116,503   $16,064,386   $1,542,118
D. Comprehensive      $8,519,516    $6,209,366    $8,419,228    $11,140,851   $5,320,767
Code
fig, ax = plt.subplots(figsize=(8.5, 4))
for (name, arr), col in zip(results.items(), ["#4C78A8", "#54A24B", "#F58518", "#E45756"]):
    ax.hist(arr / 1e6, bins=60, alpha=0.45, color=col, label=name)
ax.set_xlabel("12-month total cost (USD millions)")
ax.set_ylabel("Frequency (10,000 sims)")
ax.set_title("Monte Carlo — penalties + investment + renewal-revenue at risk")
ax.legend(); plt.tight_layout(); plt.show()
Monte Carlo distributions of total annual cost by scenario
7.5 Interpretation
The status-quo distribution has both high mean cost and a long right tail driven almost entirely by the renewal-revenue layer (losing a single Tier-1 bank puts up to USD 900 k of ARR at stake). The Comprehensive scenario invests $110 k of engineering spend to shift the entire distribution leftward — both the expected cost and (more importantly for the board) the P95 fall substantially. The savings come predominantly from renewal-revenue protection, not from the SLA-penalty layer alone — this is the canonical lesson of running Monte Carlo before greenlighting an investment.
8 Analysis 3 — Advanced Forecasting
8.1 Theory recap
Holt’s linear-trend exponential smoothing fits a level \(\ell_t\) and a trend \(b_t\) updated by smoothing parameters \(\alpha\) and \(\beta\). With 12 monthly observations Holt is appropriate: SARIMA(p,d,q)(P,D,Q) typically requires 24+ monthly observations to fit a seasonal cycle reliably. We always benchmark against a naïve seasonal forecast.
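Written out, the standard Holt recursions behind that statement are (with \(0 \le \alpha, \beta \le 1\) estimated from the data):

\[
\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}), \qquad
b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1}, \qquad
\hat{y}_{t+h \mid t} = \ell_t + h\, b_t .
\]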
8.2 Business justification
The forecast drives next-quarter L2 / L3 staffing and the on-call rota. The CFO needs both a point estimate and a 95 % interval so we plan against the realistic upper end of demand — particularly important for USL given the wild swing from the May 2025 trough (88 cases) to the June 2025 surge (712 cases).
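A minimal sketch of the Holt fit and a last-value naïve benchmark (with only 12 observations a true seasonal naïve is not yet possible), assuming the df and MONTH column built in §4; the 95 % band shown is a rough ±1.96 × in-sample residual SD rather than a full model-based interval:

from statsmodels.tsa.holtwinters import Holt

# Monthly created-case counts for 2025 (12 observations)
monthly = df.groupby("MONTH")["CASE_REFERENCE"].count().asfreq("MS")

# Holt linear-trend fit; alpha and beta are optimised from the data
holt_fit = Holt(monthly, initialization_method="estimated").fit(optimized=True)
fcast = holt_fit.forecast(3)

# Naïve benchmark: carry the last observed month forward
naive = pd.Series(monthly.iloc[-1], index=fcast.index, name="naive")

# Rough 95 % band from in-sample one-step residuals (assumes roughly normal errors)
resid_sd = (monthly - holt_fit.fittedvalues).std()
band = pd.DataFrame({"holt": fcast, "naive": naive,
                     "lo95": fcast - 1.96 * resid_sd,
                     "hi95": fcast + 1.96 * resid_sd})
print(band.round(0))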
The 2025 series shows substantial month-to-month volatility (the May trough of 88 cases is roughly an eighth of the June surge of 712), reflecting big-bank batch cycles and quarterly trade-finance peaks. Holt picks up the gentle upward drift but — by design — does not extrapolate the May trough or the June spike. The forecast intervals are wide, which is the correct signal: USL should staff against the upper bound of the band and pre-plan an L2 surge mechanism for known month-end runs. With 24+ months of data we would upgrade to SARIMA(0,1,1)(0,1,1)[12] or to Prophet with explicit Nigerian banking holidays.
9 Analysis 4 — Customer / People Analytics (Cox PH)
9.1 Theory recap
Cox proportional-hazards models the hazard — instantaneous probability of an event at time \(t\) given still-at-risk — as \(h(t \mid x) = h_0(t) \exp(\beta_1 x_1 + \dots + \beta_k x_k)\). \(\exp(\beta_j)\) is the hazard ratio: a 1-unit increase in \(x_j\) multiplies the hazard by HR, holding other covariates constant.
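As a concrete reading of a hazard ratio, take the Signature Bank coefficient reported in the table below (banks are coded against a Coronation Merchant Bank reference level):

\[
\exp(\beta_{\text{Signature}}) = \exp(0.428) \approx 1.53,
\]

i.e. at any elapsed time \(t\), a Signature Bank case of the same severity and product carries roughly a 53 % higher instantaneous breach hazard than an otherwise identical Coronation Merchant Bank case.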
9.2 Business justification
Treating “did the case breach SLA?” as a yes/no logistic problem throws away the time dimension. Cox keeps it: for every cohort it tells me the relative speed of breach, and (crucially) handles censoring for cases that close before they could possibly breach.
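The model whose output follows can be fit in a few lines with lifelines; the sketch below is a minimal reconstruction, assuming the BREACH flag from §7.3 and one-hot encoding the covariates against the CRITICAL / DATASTORE / Coronation Merchant Bank reference levels implied by the coefficient table:

from lifelines import CoxPHFitter

# One-hot covariates; drop the reference levels so coefficients match the table below
X = pd.get_dummies(df[["CASE_SEVERITY", "PRODUCT_NAME", "COMPANY_NAME"]])
X = X.drop(columns=["CASE_SEVERITY_CRITICAL", "PRODUCT_NAME_DATASTORE",
                    "COMPANY_NAME_Coronation Merchant Bank"]).astype(float)
X["duration"] = df["RESOLUTION_TIME_HOURS"]   # time at risk
X["event"] = df["BREACH"]                      # 1 = breached SLA, 0 = censored (closed first)

cph = CoxPHFitter()
cph.fit(X, duration_col="duration", event_col="event")
print(f"n = {len(X):,}, breach events = {int(X['event'].sum()):,}")
print(f"Concordance index = {cph.concordance_index_:.3f}")
cph.print_summary(decimals=3)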
9.3 Output
n = 5,035, breach events = 2,129
Concordance index = 0.653
covariate                                  coef   exp(coef)   se(coef)        z       p
CASE_SEVERITY_HIGH                       -0.280       0.756      0.047   -5.952   0.000
CASE_SEVERITY_LOW                        -0.965       0.381      0.236   -4.083   0.000
CASE_SEVERITY_MEDIUM                     -0.606       0.545      0.068   -8.919   0.000
PRODUCT_NAME_KACHASI v1.x                 0.306       1.358      0.099    3.083   0.002
PRODUCT_NAME_KACHASI v2.x                -0.333       0.716      0.099   -3.356   0.001
PRODUCT_NAME_TENTACLES                    0.351       1.421      0.124    2.835   0.005
COMPANY_NAME_Fidelity Bank               -0.409       0.664      0.086   -4.735   0.000
COMPANY_NAME_First City Monument Bank    -0.100       0.905      0.070   -1.425   0.154
COMPANY_NAME_Signature Bank               0.428       1.535      0.118    3.636   0.000
COMPANY_NAME_Stanbic IBTC                -0.300       0.741      0.168   -1.789   0.074
COMPANY_NAME_Wema Bank                    0.224       1.252      0.085    2.639   0.008
Code
from lifelines import KaplanMeierFitter

fig, ax = plt.subplots(figsize=(9, 4))
kmf = KaplanMeierFitter()
for prod, c in zip(df["PRODUCT_NAME"].unique(), ["#4C78A8", "#E45756", "#54A24B", "#F58518"]):
    msk = df["PRODUCT_NAME"] == prod
    kmf.fit(df.loc[msk, "RESOLUTION_TIME_HOURS"], df.loc[msk, "BREACH"],
            label=f"{prod} n={msk.sum():,}")
    kmf.plot_survival_function(ax=ax, color=c, ci_show=False)
ax.set_xscale("symlog", linthresh=1)
ax.set_xlabel("Hours since case opened (log scale)")
ax.set_ylabel("P(no SLA breach yet)")
ax.set_title("Time-to-SLA-breach by product (Kaplan-Meier)")
plt.tight_layout(); plt.show()
Kaplan-Meier — probability of NO SLA breach yet, by product. Ticks at the SLA target for each severity.
9.4 Interpretation
The Cox model on n ≈ 5,000 cases produces stable hazard ratios. The dominant covariate is CASE_SEVERITY = LOW (the generous LOW-severity SLA window makes breach very unlikely → strongly protective HR of 0.38 relative to CRITICAL). At the product level, once severity is controlled for, TENTACLES (HR 1.42) and KACHASI v1.x (HR 1.36) breach fastest relative to the DATASTORE reference, while KACHASI v2.x is the most protected (HR 0.72); DATASTORE’s long batch-reconciliation cycles still show up in its raw median MTTR. Bank-level effects are smaller once severity and product are controlled for, but the Signature Bank (HR 1.54) and Wema Bank (HR 1.25) cohorts run hotter than the Coronation Merchant Bank reference — candidate targets for the dedicated retention pod proposed in Scenario D.
10 Analysis 5 — Optimisation (Linear Programming)
10.1 Theory recap
Linear programming maximises \(c^\top x\) subject to \(A x \le b\), \(x \ge 0\). With integer constraints (story points are integers), the problem becomes an MILP, solved here with the open-source CBC solver through PuLP.
10.2 Business justification
The fortnightly sprint allocation is a constrained decision: 240 story-points across five tracks (Bugs, Change Requests, Compliance, Platform, Performance) under regulator floors and capacity caps. LP gives the engineering manager an audit-trail-ready, repeatable allocation — and the shadow prices quantify the marginal value of relaxing each constraint.
10.3 Code & output
Code
items = pd.DataFrame({"Track": ["Bugs","ChangeRequests","Compliance","Platform","Performance"],"ValuePerPoint": [3.5, 5.5, 4.2, 1.6, 3.0],"Min": [60, 0, 40, 0, 20],"Max": [120, 110, 80, 60, 60],})prob = pulp.LpProblem("usl_capacity", pulp.LpMaximize)x = {t: pulp.LpVariable(f"x_{t}", lowBound=int(items.loc[items.Track==t,"Min"].iloc[0]), upBound=int(items.loc[items.Track==t,"Max"].iloc[0]), cat=pulp.LpInteger) for t in items["Track"]}prob += pulp.lpSum([items.loc[items.Track==t,"ValuePerPoint"].iloc[0] * x[t]for t in items["Track"]])prob += pulp.lpSum(list(x.values())) <=240, "TotalPoints"prob.solve(pulp.PULP_CBC_CMD(msg=False))lp_out = pd.DataFrame({"Track": items["Track"],"ValuePerPoint": items["ValuePerPoint"],"Optimal_pts": [int(x[t].value()) for t in items["Track"]],})lp_out["Value"] = lp_out["ValuePerPoint"] * lp_out["Optimal_pts"]print("Status:", pulp.LpStatus[prob.status])print(f"Objective = {pulp.value(prob.objective):.1f}")print(f"Total points used = {lp_out['Optimal_pts'].sum()} of 240")lp_out
Status: Optimal
Objective = 1085.0
Total points used = 240 of 240
Track             ValuePerPoint   Optimal_pts   Value
Bugs                        3.5            60   210.0
ChangeRequests              5.5           110   605.0
Compliance                  4.2            50   210.0
Platform                    1.6             0     0.0
Performance                 3.0            20    60.0
Code
fig, ax = plt.subplots(figsize=(7.5, 3.5))
ax.barh(lp_out["Track"], lp_out["Optimal_pts"], color="#4C78A8")
for i, v in enumerate(lp_out["Optimal_pts"]):
    ax.text(v + 1, i, f" {v} pts", va="center")
ax.set_xlabel("Story points allocated")
plt.tight_layout(); plt.show()
Optimal sprint capacity allocation
10.4 Interpretation
The optimal allocation is Change Requests 110 (max), Bugs 60 (min), Compliance 50, Performance 20 (min), Platform 0 — total 240 points, objective 1,085. Platform is starved deliberately: at today’s value weights it is the marginal track and tech-debt is explicitly deferred, not forgotten. Because Change Requests is already at its 110-point cap, the marginal story point flows to Compliance, so the shadow price on the total-points constraint is ≈ 4.2 value-units / SP — the defensible per-developer ROI number for the next-year hiring case.
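One way to read that shadow price directly, rather than by inspection, is to re-solve the continuous relaxation of the model (CBC does not return reliable duals for the integer version) and query the constraint’s dual value; a sketch reusing the items frame from §10.3:

# Continuous relaxation of the sprint-allocation model, solved only to expose duals
relax = pulp.LpProblem("usl_capacity_relaxed", pulp.LpMaximize)
xr = {t: pulp.LpVariable(f"xr_{t}",
                         lowBound=int(items.loc[items.Track == t, "Min"].iloc[0]),
                         upBound=int(items.loc[items.Track == t, "Max"].iloc[0]))
      for t in items["Track"]}                  # default category: continuous
relax += pulp.lpSum([items.loc[items.Track == t, "ValuePerPoint"].iloc[0] * xr[t]
                     for t in items["Track"]])
relax += pulp.lpSum(list(xr.values())) <= 240, "TotalPoints"
relax.solve(pulp.PULP_CBC_CMD(msg=False))

# Dual value (shadow price) of the 240-point capacity constraint
print(f"Shadow price of TotalPoints ≈ {relax.constraints['TotalPoints'].pi:.2f} value-units per SP")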
11 Integrated Findings
Step   Cadence   Technique        What it produced for USL
1      Weekly    Text analytics   Themed roadmap input + auto-triage baseline on real titles
…      …         Monte Carlo      Comprehensive scenario removes the largest slice of P95 tail-risk
The five techniques chain into a single recommendation: adopt Scenario D (Comprehensive — new pod + automation playbooks) at the next board meeting. The forecast shows volume is rising; the Cox model flags the TENTACLES, KACHASI v1.x, Signature Bank and Wema Bank cohorts as breaching fastest; the LP shows that on current weights the bug / CR mix is correct and the marginal story point buys ≈ 4.2 value-units; the Monte Carlo shows the $110 k investment is repaid through renewal-revenue protection and removes a significant slice of P95 tail risk; and text analytics is the engine that keeps classifier-driven triage learning over time.
12 Limitations & Further Work
Resolution-time tail. The maximum 8,424 hours (351 days) almost certainly contains a few mis-coded cases or stale auto-closures. Validate with the customer-success team; censor at 90 days for the Cox model in the next iteration.
ARR figures are illustrative. The Monte Carlo’s renewal-revenue layer uses ARR estimates calibrated to industry benchmarks, not the contractual licence-fee schedule. Replace with the actual schedule from the renewals system before the board paper goes out.
Renewal model. The \(P(\text{churn}) = 0.02 + 1.5 \cdot (\text{breach rate} - 0.10)\) curve is a parametric assumption. Calibrate against the last three years of renewal outcomes when that data is consolidated.
Causality. Every coefficient in this paper is observational. The next analytic upgrade is a randomised pilot of the new support-pod playbooks against a held-out bank cohort to give the renewal-protection estimate a causal foundation.
Computing power / scale. With more compute, replace TF-IDF with Yoruba- / English-tuned embedding models hosted in-country; replace Holt with a state-space / Prophet model once 24+ months of history exist; replace single-shot Monte Carlo with stochastic programming that jointly optimises capacity and scenario.
Single severity scheme. SLA targets here use a four-tier CRITICAL / HIGH / MEDIUM / LOW mapping. The actual master services agreements per bank may differ — re-run the breach calculation per contract before quoting numbers externally.
References
Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control (5th ed.). Wiley.
Davidson-Pilon, C. (2019). lifelines: survival analysis in Python. Journal of Open Source Software, 4(40), 1317. https://doi.org/10.21105/joss.01317
Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: principles and practice (3rd ed.). OTexts. https://otexts.com/fpp3/
McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 56–61.
Mitchell, S., O’Sullivan, M., & Dunning, I. (2011). PuLP: a linear programming toolkit for Python. https://github.com/coin-or/pulp
Pedregosa, F. et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Seabold, S., & Perktold, J. (2010). statsmodels: econometric and statistical modeling with Python. Proceedings of the 9th Python in Science Conference, 92–96.
Central Bank of Nigeria. (2024). Risk-based cybersecurity framework and guidelines for deposit money banks and payment service providers (revised). CBN Banking Supervision Department.
Appendix — AI Usage Statement
I used Claude (Anthropic) for two specific tasks: (1) drafting the boilerplate scaffold for the Monte Carlo simulation function and the Quarto YAML / section structure, and (2) double-checking statsmodels and lifelines syntax. All scenario design (intervention costs, breach-reduction percentages, renewal-curve shape), all interpretation of hazard ratios, the analytical question, the choice of Holt over SARIMA given the 12-observation window, the LP value-weights, and the recommendation to the board are my independent professional judgement as CTO of USL. Every numerical result is computed live in this document on the 5,035-row real ISURA dataset and cross-checked against my own knowledge of USL’s historical SLA performance before inclusion.