---
title: "Assure Insurance Brokerage — Portfolio Analysis"
subtitle: "Production Schedule, Transaction Log & Commission Returns (2021–2025)"
author: "Assure Analytics Team"
date: today
date-format: "D MMMM YYYY"
format:
html:
toc: true
toc-depth: 3
toc-title: "Contents"
toc-location: left
number-sections: true
theme: cosmo
code-fold: true
code-tools: true
fig-width: 10
fig-height: 6
fig-align: center
df-print: paged
highlight-style: github
embed-resources: true
pdf:
toc: true
number-sections: true
colorlinks: true
geometry:
- top=25mm
- left=25mm
- right=25mm
- bottom=25mm
execute:
echo: false
warning: false
message: false
cache: false
params:
data_path: "assure_all_data_combined.csv"
---
```{python}
#| label: setup
#| include: false
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
from IPython.display import display
# ── Load data ────────────────────────────────────────────────────────────────
df = pd.read_csv("assure_all_data_combined.csv")
# ── Tidy column names ────────────────────────────────────────────────────────
df.columns = df.columns.str.strip()
# ── Force all financial/numeric columns to proper numeric types ───────────────
numeric_cols = [
"SUM_INSURED", "SUM_ASSURED", "GROSS_PREMIUM", "COMMISSION",
"COMMISSION_RATE_PCT", "BROKERAGE", "NET_PREMIUM", "FIVE_PCT_NET",
"BALANCE", "OTHER_DEDUCTIONS", "TOTAL_RECEIVABLE", "AMOUNT_RECEIVED",
"AMOUNT_REMITTED", "AMOUNT_UNREMITTED", "YEAR", "SN",
]
for col in numeric_cols:
if col in df.columns:
df[col] = pd.to_numeric(df[col], errors="coerce")
# ── Coerce date columns ──────────────────────────────────────────────────────
date_cols = ["EFFECTIVE_DATE", "RENEWAL_DATE"]
for col in date_cols:
df[col] = pd.to_datetime(df[col], errors="coerce")
# ── Palette ──────────────────────────────────────────────────────────────────
PALETTE = sns.color_palette("Blues_d", 8)
sns.set_theme(style="whitegrid", palette=PALETTE)
```
# Executive Summary {#sec-executive-summary}
```{python}
#| label: exec-summary
total_policies = len(df)
total_premium = df["GROSS_PREMIUM"].sum()
total_commission = df["COMMISSION"].sum()
total_si = df["SUM_INSURED"].sum()
total_net = df["NET_PREMIUM"].sum()
years_covered = sorted(df["YEAR"].unique())
n_insurers = df["INSURER"].nunique()
n_risk_types = df["RISK_TYPE"].dropna().nunique()
print(f"Dataset spans {years_covered[0]}–{years_covered[-1]} across "
f"{df['SOURCE_SHEET'].nunique()} source sheets.")
print(f" • Total records : {total_policies:,}")
print(f" • Gross premium (₦) : {total_premium:,.0f}")
print(f" • Net premium (₦) : {total_net:,.0f}")
print(f" • Total commission(₦): {total_commission:,.0f}")
print(f" • Total sum insured : {total_si:,.0f}")
print(f" • Distinct insurers : {n_insurers}")
print(f" • Distinct risk types: {n_risk_types}")
```
> This report consolidates **three operational sheets** — Production Schedule,
> Transaction Log, and Commission Returns — into a single analytical view covering
> five underwriting years (2021–2025). Key performance indicators are presented
> across portfolio composition, premium income, commission earnings, insurer
> relationships, and data-quality diagnostics.
---
# Data Overview {#sec-data-overview}
## Source Sheets {#sec-source-sheets}
```{python}
#| label: source-sheets-table
sheet_summary = (
df.groupby("SOURCE_SHEET")
.agg(
Records=("POLICY_NO", "count"),
Years=("YEAR", lambda x: f"{x.min()}–{x.max()}"),
Gross_Premium=("GROSS_PREMIUM", "sum"),
Commission=("COMMISSION", "sum"),
)
.reset_index()
.rename(columns={"SOURCE_SHEET": "Source Sheet"})
)
sheet_summary["Gross_Premium"] = sheet_summary["Gross_Premium"].map("₦{:,.0f}".format)
sheet_summary["Commission"] = sheet_summary["Commission"].map("₦{:,.0f}".format)
display(sheet_summary)
```
## Column Inventory {#sec-column-inventory}
The combined dataset contains **50 columns** spanning underwriting, financial,
transaction, and remittance information. The table below summarises each column,
its data type, and completeness rate.
```{python}
#| label: column-inventory
col_info = pd.DataFrame({
"Column": df.columns,
"Dtype": df.dtypes.astype(str).values,
"Non-Null Count": df.notnull().sum().values,
"Completeness (%)": (df.notnull().mean() * 100).round(1).values,
})
display(col_info)
```
## Year Distribution {#sec-year-distribution}
```{python}
#| label: year-distribution
year_counts = df["YEAR"].value_counts().sort_index()
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(year_counts.index.astype(str), year_counts.values, color=PALETTE[2])
ax.set_xlabel("Underwriting Year")
ax.set_ylabel("Number of Records")
ax.set_title("Records by Underwriting Year")
for i, v in enumerate(year_counts.values):
ax.text(i, v + 5, str(v), ha="center", fontsize=9)
plt.tight_layout()
plt.show()
```
---
# Portfolio Composition {#sec-portfolio-composition}
## Risk Types {#sec-risk-types}
```{python}
#| label: risk-type-chart
risk_counts = (
df["RISK_TYPE"]
.dropna()
.value_counts()
.head(15)
)
fig, ax = plt.subplots(figsize=(9, 5))
risk_counts.sort_values().plot(kind="barh", ax=ax, color=PALETTE[3])
ax.set_xlabel("Number of Policies")
ax.set_title("Top 15 Risk Types")
plt.tight_layout()
plt.show()
```
## Type of Cover {#sec-type-of-cover}
```{python}
#| label: cover-type-chart
cover_counts = df["TYPE_OF_COVER"].dropna().value_counts().head(15)
fig, ax = plt.subplots(figsize=(9, 5))
cover_counts.sort_values().plot(kind="barh", ax=ax, color=PALETTE[4])
ax.set_xlabel("Number of Policies")
ax.set_title("Top 15 Types of Cover")
plt.tight_layout()
plt.show()
```
## Transaction Types {#sec-transaction-types}
```{python}
#| label: transaction-type-chart
txn_counts = df["TRANSACTION_TYPE"].dropna().value_counts()
fig, ax = plt.subplots(figsize=(6, 4))
ax.pie(
txn_counts.values,
labels=txn_counts.index,
autopct="%1.1f%%",
colors=PALETTE[:len(txn_counts)],
startangle=140,
)
ax.set_title("Transaction Type Mix")
plt.tight_layout()
plt.show()
```
---
# Premium Analysis {#sec-premium-analysis}
## Gross Premium by Year {#sec-premium-by-year}
```{python}
#| label: premium-by-year
prem_yr = df.groupby("YEAR")["GROSS_PREMIUM"].sum()
fig, ax = plt.subplots(figsize=(8, 4))
prem_yr.plot(kind="bar", ax=ax, color=PALETTE[2])
ax.set_xlabel("Year")
ax.set_ylabel("Gross Premium (₦)")
ax.set_title("Total Gross Premium by Underwriting Year")
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.tick_params(axis="x", rotation=0)
for i, v in enumerate(prem_yr.values):
ax.text(i, v + prem_yr.max() * 0.01, f"₦{v/1e6:.1f}M", ha="center", fontsize=8)
plt.tight_layout()
plt.show()
```
## Gross vs Net Premium by Year {#sec-gross-vs-net}
```{python}
#| label: gross-vs-net
gvn = df.groupby("YEAR")[["GROSS_PREMIUM", "NET_PREMIUM"]].sum()
fig, ax = plt.subplots(figsize=(9, 5))
gvn.plot(kind="bar", ax=ax, color=[PALETTE[2], PALETTE[5]])
ax.set_xlabel("Year")
ax.set_ylabel("Amount (₦)")
ax.set_title("Gross Premium vs Net Premium by Year")
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.tick_params(axis="x", rotation=0)
ax.legend(["Gross Premium", "Net Premium"])
plt.tight_layout()
plt.show()
```
## Premium by Risk Type {#sec-premium-by-risk}
```{python}
#| label: premium-by-risk
risk_prem = (
df.groupby("RISK_TYPE")["GROSS_PREMIUM"]
.sum()
.dropna()
.sort_values(ascending=False)
.head(10)
)
fig, ax = plt.subplots(figsize=(9, 5))
risk_prem.sort_values().plot(kind="barh", ax=ax, color=PALETTE[3])
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.set_xlabel("Gross Premium (₦)")
ax.set_title("Top 10 Risk Types by Gross Premium")
plt.tight_layout()
plt.show()
```
## Sum Insured Distribution {#sec-sum-insured}
```{python}
#| label: sum-insured-dist
si = df["SUM_INSURED"][df["SUM_INSURED"] > 0]
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(np.log10(si + 1), bins=30, color=PALETTE[2], edgecolor="white")
ax.set_xlabel("log₁₀(Sum Insured)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Sum Insured (log scale)")
plt.tight_layout()
plt.show()
```
---
# Commission Analysis {#sec-commission-analysis}
## Commission Income by Year {#sec-commission-by-year}
```{python}
#| label: commission-by-year
comm_yr = df.groupby("YEAR")["COMMISSION"].sum()
fig, ax = plt.subplots(figsize=(8, 4))
comm_yr.plot(kind="bar", ax=ax, color=PALETTE[4])
ax.set_xlabel("Year")
ax.set_ylabel("Commission (₦)")
ax.set_title("Total Commission by Underwriting Year")
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.tick_params(axis="x", rotation=0)
plt.tight_layout()
plt.show()
```
## Commission Rate Distribution {#sec-commission-rate}
```{python}
#| label: commission-rate-dist
cr = df["COMMISSION_RATE_PCT"].dropna()
cr = cr[(cr > 0) & (cr <= 100)]
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(cr, bins=25, color=PALETTE[3], edgecolor="white")
ax.set_xlabel("Commission Rate (%)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Commission Rates")
plt.tight_layout()
plt.show()
```
## Commission by Risk Type {#sec-commission-by-risk}
```{python}
#| label: commission-by-risk
comm_risk = (
df.groupby("RISK_TYPE")["COMMISSION"]
.sum()
.dropna()
.sort_values(ascending=False)
.head(10)
)
fig, ax = plt.subplots(figsize=(9, 5))
comm_risk.sort_values().plot(kind="barh", ax=ax, color=PALETTE[5])
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.set_xlabel("Commission (₦)")
ax.set_title("Top 10 Risk Types by Commission Earned")
plt.tight_layout()
plt.show()
```
---
# Insurer Analysis {#sec-insurer-analysis}
## Policy Count by Insurer {#sec-insurer-policy-count}
```{python}
#| label: insurer-policy-count
ins_count = df["INSURER"].value_counts().head(15)
fig, ax = plt.subplots(figsize=(9, 5))
ins_count.sort_values().plot(kind="barh", ax=ax, color=PALETTE[2])
ax.set_xlabel("Number of Policies")
ax.set_title("Top 15 Insurers by Policy Count")
plt.tight_layout()
plt.show()
```
## Premium Volume by Insurer {#sec-insurer-premium}
```{python}
#| label: insurer-premium
ins_prem = (
df.groupby("INSURER")["GROSS_PREMIUM"]
.sum()
.sort_values(ascending=False)
.head(15)
)
fig, ax = plt.subplots(figsize=(9, 5))
ins_prem.sort_values().plot(kind="barh", ax=ax, color=PALETTE[4])
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.set_xlabel("Gross Premium (₦)")
ax.set_title("Top 15 Insurers by Gross Premium Volume")
plt.tight_layout()
plt.show()
```
## Insurer Share (Pie) {#sec-insurer-share}
```{python}
#| label: insurer-share-pie
top_ins = df["INSURER"].value_counts().head(8)
other = df["INSURER"].value_counts().iloc[8:].sum()
labels = list(top_ins.index) + ["Other"]
values = list(top_ins.values) + [other]
fig, ax = plt.subplots(figsize=(8, 6))
ax.pie(values, labels=labels, autopct="%1.1f%%",
colors=sns.color_palette("Blues_d", len(labels)), startangle=140)
ax.set_title("Insurer Portfolio Share (Policy Count)")
plt.tight_layout()
plt.show()
```
---
# Temporal Trends {#sec-temporal-trends}
## Monthly Premium Trend {#sec-monthly-premium}
```{python}
#| label: monthly-premium-trend
df["EFF_MONTH"] = pd.to_datetime(df["EFFECTIVE_DATE"], errors="coerce").dt.to_period("M")
monthly = df.groupby("EFF_MONTH")["GROSS_PREMIUM"].sum().dropna()
monthly.index = monthly.index.to_timestamp()
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(monthly.index, monthly.values, color=PALETTE[3], linewidth=1.5, marker="o", markersize=3)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.set_xlabel("Month")
ax.set_ylabel("Gross Premium (₦)")
ax.set_title("Monthly Gross Premium Trend")
plt.tight_layout()
plt.show()
```
## Year-on-Year Growth {#sec-yoy-growth}
```{python}
#| label: yoy-growth
yoy = df.groupby("YEAR")["GROSS_PREMIUM"].sum()
yoy_growth = yoy.pct_change() * 100
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
yoy.plot(kind="bar", ax=axes[0], color=PALETTE[2])
axes[0].set_title("Gross Premium by Year")
axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
axes[0].tick_params(axis="x", rotation=0)
yoy_growth.dropna().plot(kind="bar", ax=axes[1], color=PALETTE[5])
axes[1].axhline(0, color="black", linewidth=0.8)
axes[1].set_title("Year-on-Year Growth Rate (%)")
axes[1].tick_params(axis="x", rotation=0)
plt.tight_layout()
plt.show()
```
## Renewal vs New Business by Year {#sec-renewal-vs-new}
```{python}
#| label: renewal-vs-new
txn_yr = (
df[df["TRANSACTION_TYPE"].isin(["RNL", "NEW"])]
.groupby(["YEAR", "TRANSACTION_TYPE"])["GROSS_PREMIUM"]
.sum()
.unstack(fill_value=0)
)
fig, ax = plt.subplots(figsize=(9, 5))
txn_yr.plot(kind="bar", ax=ax, color=[PALETTE[2], PALETTE[5]])
ax.set_xlabel("Year")
ax.set_ylabel("Gross Premium (₦)")
ax.set_title("Renewal (RNL) vs New Business Premium by Year")
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.tick_params(axis="x", rotation=0)
plt.tight_layout()
plt.show()
```
---
# Remittance & Receipts {#sec-remittance}
## Amount Remitted vs Unremitted {#sec-remitted-vs-unremitted}
```{python}
#| label: remitted-vs-unremitted
rem = df[["AMOUNT_REMITTED", "AMOUNT_UNREMITTED"]].dropna()
totals = {
"Remitted": rem["AMOUNT_REMITTED"].sum(),
"Unremitted": rem["AMOUNT_UNREMITTED"].sum(),
}
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(totals.keys(), totals.values(), color=[PALETTE[2], PALETTE[6]])
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M"))
ax.set_title("Total Remitted vs Unremitted")
plt.tight_layout()
plt.show()
```
## Bank of Lodgement Distribution {#sec-bank-lodgement}
```{python}
#| label: bank-lodgement
bank_counts = df["BANK_OF_LODGEMENT"].dropna().value_counts().head(10)
fig, ax = plt.subplots(figsize=(8, 4))
bank_counts.sort_values().plot(kind="barh", ax=ax, color=PALETTE[3])
ax.set_xlabel("Number of Transactions")
ax.set_title("Top Banks of Lodgement")
plt.tight_layout()
plt.show()
```
## Receipt Patterns {#sec-receipt-patterns}
```{python}
#| label: receipt-patterns
# Amount received distribution (non-zero)
ar = df["AMOUNT_RECEIVED"].dropna()
ar = ar[ar > 0]
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(np.log10(ar + 1), bins=25, color=PALETTE[4], edgecolor="white")
ax.set_xlabel("log₁₀(Amount Received)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Amounts Received (log scale)")
plt.tight_layout()
plt.show()
```
---
# Data Quality Assessment {#sec-data-quality}
## Completeness by Column {#sec-completeness}
```{python}
#| label: completeness-chart
completeness = df.notnull().mean().sort_values() * 100
fig, ax = plt.subplots(figsize=(9, 12))
completeness.plot(kind="barh", ax=ax, color=[
PALETTE[2] if v >= 80 else PALETTE[5] if v >= 50 else PALETTE[7]
for v in completeness
])
ax.axvline(80, color="green", linewidth=1, linestyle="--", label="80% threshold")
ax.axvline(50, color="orange", linewidth=1, linestyle="--", label="50% threshold")
ax.set_xlabel("Completeness (%)")
ax.set_title("Column Completeness (%)")
ax.legend()
plt.tight_layout()
plt.show()
```
## Missing Data Summary Table {#sec-missing-data-table}
```{python}
#| label: missing-data-table
missing = pd.DataFrame({
"Column": df.columns,
"Missing Count": df.isnull().sum().values,
"Missing (%)": (df.isnull().mean() * 100).round(1).values,
}).sort_values("Missing (%)", ascending=False)
display(missing[missing["Missing Count"] > 0])
```
## Duplicate Policy Numbers {#sec-duplicates}
```{python}
#| label: duplicates
dup = df.duplicated(subset=["POLICY_NO", "YEAR"], keep=False)
print(f"Potential duplicate (POLICY_NO + YEAR) records: {dup.sum():,}")
print(f" ({dup.sum() / len(df) * 100:.1f}% of total records)")
```
---
# Appendix {#sec-appendix}
## Full Column Reference {#sec-column-reference}
| Column | Description |
|--------|-------------|
| `SOURCE_SHEET` | Origin sheet: Production Schedule, Transaction Log, or Commission Returns |
| `YEAR` | Underwriting year |
| `SN` | Serial number |
| `POLICY_NO` | Insurance policy number |
| `UNDERWRITING_DETAILS` | Free-text underwriting notes |
| `RISK_TYPE` | Category of insured risk |
| `TYPE_OF_COVER` | Specific cover type |
| `INSURED` | Name of the insured party |
| `CLIENT_NAME` | Client name (Transaction Log) |
| `NAME_OF_ASSURED` | Assured name (Production Schedule) |
| `CUSTOMER_NAME` | Customer name (Commission Returns) |
| `INSURER` | Insurance company underwriting the risk |
| `EFFECTIVE_DATE` | Policy commencement date |
| `FROM_DATE` | Cover start date |
| `TO_DATE` | Cover end date |
| `END_DATE` | Policy expiry date |
| `RENEWAL_DATE` | Date of renewal |
| `DATE_PAID` | Date premium was paid |
| `DATE_PAID_PAYABLE` | Date premium became payable |
| `SUM_INSURED` | Total insured value |
| `SUM_ASSURED` | Sum assured (life/PA products) |
| `GROSS_PREMIUM` | Full premium before deductions |
| `COMMISSION` | Brokerage commission earned |
| `COMMISSION_RATE_PCT` | Commission rate as a percentage |
| `BROKERAGE` | Brokerage amount |
| `NET_PREMIUM` | Premium after commission deduction |
| `FIVE_PCT_NET` | 5% levy on net premium |
| `BALANCE` | Outstanding balance |
| `OTHER_DEDUCTIONS` | Any other deductions applied |
| `TOTAL_RECEIVABLE` | Total amount receivable from client |
| `TRANSACTION_TYPE` | NEW / RNL (renewal) / ADD / ADDITIONAL |
| `VOUCHER_NO` | Internal voucher reference |
| `ENDORSEMENT_NO` | Policy endorsement number |
| `LEDGER_ACCOUNT_NO` | Accounting ledger reference |
| `POLICY_TENOR` | Duration of policy |
| `DEBIT_NOTE_NO` | Debit note reference |
| `CREDIT_NOTE_NO` | Credit note reference |
| `AMOUNT_RECEIVED` | Cash received from client |
| `DATE_OF_RECEIPT` | Date cash was received |
| `RECEIPT_NO` | Receipt reference number |
| `BANK_OF_LODGEMENT` | Bank where funds were deposited |
| `DATE_OF_LODGEMENT` | Lodgement date |
| `AMOUNT_REMITTED` | Amount forwarded to insurer |
| `AMOUNT_UNREMITTED` | Amount not yet remitted |
| `DATE_TO_REMIT` | Target remittance date |
| `NAME_OF_BANK` | Insurer's bank |
| `CHEQUE_NO` | Cheque number for remittance |
| `INSURER_RECEIPT_NO` | Insurer's acknowledgement receipt |
| `OTHER_BROKER` | Co-broker reference |
| `REMARKS` | Miscellaneous notes |
## Session Information {#sec-session-info}
```{python}
#| label: session-info
import sys, platform, datetime
print(f"Report generated : {datetime.datetime.now().strftime('%Y-%m-%d %H:%M')}")
print(f"Python : {sys.version}")
print(f"Platform : {platform.platform()}")
print(f"pandas : {pd.__version__}")
print(f"numpy : {np.__version__}")
print(f"matplotlib : {plt.matplotlib.__version__}")
print(f"seaborn : {sns.__version__}")
```