Assure Insurance Brokerage — Portfolio Analysis

Production Schedule, Transaction Log & Commission Returns (2021–2025)

Author

Assure Analytics Team

Published

26 May 2026

1 Executive Summary

Dataset spans 2021–2025 across 3 source sheets.
  • Total records      : 1,250
  • Gross premium (₦)  : 1,129,107,724
  • Net premium (₦)    : 979,805,811
  • Total commission(₦): 149,251,433
  • Total sum insured  : 122,654,043,051
  • Distinct insurers  : 20
  • Distinct risk types: 98

This report consolidates three operational sheets — Production Schedule, Transaction Log, and Commission Returns — into a single analytical view covering five underwriting years (2021–2025). Key performance indicators are presented across portfolio composition, premium income, commission earnings, insurer relationships, and data-quality diagnostics.

2 Data Overview

2.1 Source Sheets

	Source Sheet	Records	Years	Gross_Premium	Commission
0	Sheet1_ProductionSchedule	445	2021–2025	₦384,758,557	₦50,899,225
1	Sheet2_TransactionLog	445	2021–2025	₦384,626,563	₦50,885,096
2	Sheet3_CommissionReturns	360	2022–2025	₦359,722,604	₦47,467,112

2.2 Column Inventory

The combined dataset contains 50 columns spanning underwriting, financial, transaction, and remittance information. The table below summarises each column, its data type, and completeness rate.

	Column	Dtype	Non-Null Count	Completeness (%)
0	SOURCE_SHEET	str	1250	100.0
1	YEAR	int64	1250	100.0
2	SN	int64	1250	100.0
3	POLICY_NO	str	1250	100.0
4	UNDERWRITING_DETAILS	str	445	35.6
5	RISK_TYPE	str	360	28.8
6	TYPE_OF_COVER	str	805	64.4
7	INSURED	str	804	64.3
8	CLIENT_NAME	str	359	28.7
9	NAME_OF_ASSURED	str	445	35.6
10	CUSTOMER_NAME	str	433	34.6
11	INSURER	str	1250	100.0
12	EFFECTIVE_DATE	datetime64[us]	1250	100.0
13	FROM_DATE	str	445	35.6
14	TO_DATE	str	445	35.6
15	END_DATE	str	445	35.6
16	RENEWAL_DATE	datetime64[us]	445	35.6
17	DATE_PAID	str	441	35.3
18	DATE_PAID_PAYABLE	str	358	28.6
19	SUM_INSURED	int64	1250	100.0
20	SUM_ASSURED	float64	445	35.6
21	GROSS_PREMIUM	int64	1250	100.0
22	COMMISSION	float64	1248	99.8
23	COMMISSION_RATE_PCT	float64	357	28.6
24	BROKERAGE	float64	443	35.4
25	NET_PREMIUM	float64	1247	99.8
26	FIVE_PCT_NET	float64	445	35.6
27	BALANCE	float64	445	35.6
28	OTHER_DEDUCTIONS	float64	360	28.8
29	TOTAL_RECEIVABLE	float64	360	28.8
30	TRANSACTION_TYPE	str	445	35.6
31	VOUCHER_NO	float64	445	35.6
32	ENDORSEMENT_NO	str	405	32.4
33	LEDGER_ACCOUNT_NO	float64	445	35.6
34	POLICY_TENOR	float64	445	35.6
35	DEBIT_NOTE_NO	str	390	31.2
36	CREDIT_NOTE_NO	str	408	32.6
37	AMOUNT_RECEIVED	float64	445	35.6
38	DATE_OF_RECEIPT	str	445	35.6
39	RECEIPT_NO	float64	337	27.0
40	BANK_OF_LODGEMENT	str	445	35.6
41	DATE_OF_LODGEMENT	str	445	35.6
42	AMOUNT_REMITTED	float64	445	35.6
43	AMOUNT_UNREMITTED	float64	445	35.6
44	DATE_TO_REMIT	float64	0	0.0
45	NAME_OF_BANK	str	442	35.4
46	CHEQUE_NO	str	405	32.4
47	INSURER_RECEIPT_NO	str	409	32.7
48	OTHER_BROKER	str	445	35.6
49	REMARKS	str	1249	99.9

2.3 Year Distribution

3 Portfolio Composition

3.1 Risk Types

3.2 Type of Cover

3.3 Transaction Types

4 Premium Analysis

4.1 Gross Premium by Year

4.2 Gross vs Net Premium by Year

4.3 Premium by Risk Type

4.4 Sum Insured Distribution

5 Commission Analysis

5.1 Commission Income by Year

5.2 Commission Rate Distribution

5.3 Commission by Risk Type

6 Insurer Analysis

6.1 Policy Count by Insurer

6.2 Premium Volume by Insurer

7 Temporal Trends

7.1 Monthly Premium Trend

7.2 Year-on-Year Growth

7.3 Renewal vs New Business by Year

8 Remittance & Receipts

8.1 Amount Remitted vs Unremitted

8.2 Bank of Lodgement Distribution

8.3 Receipt Patterns

9 Data Quality Assessment

9.1 Completeness by Column

9.2 Missing Data Summary Table

	Column	Missing Count	Missing (%)
44	DATE_TO_REMIT	1250	100.0
39	RECEIPT_NO	913	73.0
18	DATE_PAID_PAYABLE	892	71.4
23	COMMISSION_RATE_PCT	893	71.4
8	CLIENT_NAME	891	71.3
28	OTHER_DEDUCTIONS	890	71.2
29	TOTAL_RECEIVABLE	890	71.2
5	RISK_TYPE	890	71.2
35	DEBIT_NOTE_NO	860	68.8
32	ENDORSEMENT_NO	845	67.6
46	CHEQUE_NO	845	67.6
36	CREDIT_NOTE_NO	842	67.4
47	INSURER_RECEIPT_NO	841	67.3
10	CUSTOMER_NAME	817	65.4
17	DATE_PAID	809	64.7
45	NAME_OF_BANK	808	64.6
24	BROKERAGE	807	64.6
13	FROM_DATE	805	64.4
15	END_DATE	805	64.4
42	AMOUNT_REMITTED	805	64.4
41	DATE_OF_LODGEMENT	805	64.4
40	BANK_OF_LODGEMENT	805	64.4
38	DATE_OF_RECEIPT	805	64.4
34	POLICY_TENOR	805	64.4
33	LEDGER_ACCOUNT_NO	805	64.4
31	VOUCHER_NO	805	64.4
37	AMOUNT_RECEIVED	805	64.4
4	UNDERWRITING_DETAILS	805	64.4
9	NAME_OF_ASSURED	805	64.4
14	TO_DATE	805	64.4
16	RENEWAL_DATE	805	64.4
30	TRANSACTION_TYPE	805	64.4
27	BALANCE	805	64.4
20	SUM_ASSURED	805	64.4
26	FIVE_PCT_NET	805	64.4
43	AMOUNT_UNREMITTED	805	64.4
48	OTHER_BROKER	805	64.4
7	INSURED	446	35.7
6	TYPE_OF_COVER	445	35.6
22	COMMISSION	2	0.2
25	NET_PREMIUM	3	0.2
49	REMARKS	1	0.1

9.3 Duplicate Policy Numbers

Potential duplicate (POLICY_NO + YEAR) records: 1,212
  (97.0% of total records)

10 Appendix

10.1 Full Column Reference

Column	Description
`SOURCE_SHEET`	Origin sheet: Production Schedule, Transaction Log, or Commission Returns
`YEAR`	Underwriting year
`SN`	Serial number
`POLICY_NO`	Insurance policy number
`UNDERWRITING_DETAILS`	Free-text underwriting notes
`RISK_TYPE`	Category of insured risk
`TYPE_OF_COVER`	Specific cover type
`INSURED`	Name of the insured party
`CLIENT_NAME`	Client name (Transaction Log)
`NAME_OF_ASSURED`	Assured name (Production Schedule)
`CUSTOMER_NAME`	Customer name (Commission Returns)
`INSURER`	Insurance company underwriting the risk
`EFFECTIVE_DATE`	Policy commencement date
`FROM_DATE`	Cover start date
`TO_DATE`	Cover end date
`END_DATE`	Policy expiry date
`RENEWAL_DATE`	Date of renewal
`DATE_PAID`	Date premium was paid
`DATE_PAID_PAYABLE`	Date premium became payable
`SUM_INSURED`	Total insured value
`SUM_ASSURED`	Sum assured (life/PA products)
`GROSS_PREMIUM`	Full premium before deductions
`COMMISSION`	Brokerage commission earned
`COMMISSION_RATE_PCT`	Commission rate as a percentage
`BROKERAGE`	Brokerage amount
`NET_PREMIUM`	Premium after commission deduction
`FIVE_PCT_NET`	5% levy on net premium
`BALANCE`	Outstanding balance
`OTHER_DEDUCTIONS`	Any other deductions applied
`TOTAL_RECEIVABLE`	Total amount receivable from client
`TRANSACTION_TYPE`	NEW / RNL (renewal) / ADD / ADDITIONAL
`VOUCHER_NO`	Internal voucher reference
`ENDORSEMENT_NO`	Policy endorsement number
`LEDGER_ACCOUNT_NO`	Accounting ledger reference
`POLICY_TENOR`	Duration of policy
`DEBIT_NOTE_NO`	Debit note reference
`CREDIT_NOTE_NO`	Credit note reference
`AMOUNT_RECEIVED`	Cash received from client
`DATE_OF_RECEIPT`	Date cash was received
`RECEIPT_NO`	Receipt reference number
`BANK_OF_LODGEMENT`	Bank where funds were deposited
`DATE_OF_LODGEMENT`	Lodgement date
`AMOUNT_REMITTED`	Amount forwarded to insurer
`AMOUNT_UNREMITTED`	Amount not yet remitted
`DATE_TO_REMIT`	Target remittance date
`NAME_OF_BANK`	Insurer’s bank
`CHEQUE_NO`	Cheque number for remittance
`INSURER_RECEIPT_NO`	Insurer’s acknowledgement receipt
`OTHER_BROKER`	Co-broker reference
`REMARKS`	Miscellaneous notes

10.2 Session Information

Report generated : 2026-05-26 18:51
Python           : 3.14.5 (tags/v3.14.5:5607950, May 10 2026, 10:43:50) [MSC v.1944 64 bit (AMD64)]
Platform         : Windows-11-10.0.22621-SP0
pandas           : 3.0.3
numpy            : 2.4.4
matplotlib       : 3.10.9
seaborn          : 0.13.2

--- title: "Assure Insurance Brokerage — Portfolio Analysis" subtitle: "Production Schedule, Transaction Log & Commission Returns (2021–2025)" author: "Assure Analytics Team" date: today date-format: "D MMMM YYYY" format: html: toc: true toc-depth: 3 toc-title: "Contents" toc-location: left number-sections: true theme: cosmo code-fold: true code-tools: true fig-width: 10 fig-height: 6 fig-align: center df-print: paged highlight-style: github embed-resources: true pdf: toc: true number-sections: true colorlinks: true geometry: - top=25mm - left=25mm - right=25mm - bottom=25mm execute: echo: false warning: false message: false cache: false params: data_path: "assure_all_data_combined.csv" --- ```{python} #| label: setup #| include: false import pandas as pd import numpy as np import matplotlib.pyplot as plt import matplotlib.ticker as mticker import seaborn as sns from IPython.display import display # ── Load data ──────────────────────────────────────────────────────────────── df = pd.read_csv("assure_all_data_combined.csv") # ── Tidy column names ──────────────────────────────────────────────────────── df.columns = df.columns.str.strip() # ── Force all financial/numeric columns to proper numeric types ─────────────── numeric_cols = [ "SUM_INSURED", "SUM_ASSURED", "GROSS_PREMIUM", "COMMISSION", "COMMISSION_RATE_PCT", "BROKERAGE", "NET_PREMIUM", "FIVE_PCT_NET", "BALANCE", "OTHER_DEDUCTIONS", "TOTAL_RECEIVABLE", "AMOUNT_RECEIVED", "AMOUNT_REMITTED", "AMOUNT_UNREMITTED", "YEAR", "SN", ] for col in numeric_cols: if col in df.columns: df[col] = pd.to_numeric(df[col], errors="coerce") # ── Coerce date columns ────────────────────────────────────────────────────── date_cols = ["EFFECTIVE_DATE", "RENEWAL_DATE"] for col in date_cols: df[col] = pd.to_datetime(df[col], errors="coerce") # ── Palette ────────────────────────────────────────────────────────────────── PALETTE = sns.color_palette("Blues_d", 8) sns.set_theme(style="whitegrid", palette=PALETTE) ``` # Executive Summary {#sec-executive-summary} ```{python} #| label: exec-summary total_policies = len(df) total_premium = df["GROSS_PREMIUM"].sum() total_commission = df["COMMISSION"].sum() total_si = df["SUM_INSURED"].sum() total_net = df["NET_PREMIUM"].sum() years_covered = sorted(df["YEAR"].unique()) n_insurers = df["INSURER"].nunique() n_risk_types = df["RISK_TYPE"].dropna().nunique() print(f"Dataset spans {years_covered[0]}–{years_covered[-1]} across " f"{df['SOURCE_SHEET'].nunique()} source sheets.") print(f" • Total records : {total_policies:,}") print(f" • Gross premium (₦) : {total_premium:,.0f}") print(f" • Net premium (₦) : {total_net:,.0f}") print(f" • Total commission(₦): {total_commission:,.0f}") print(f" • Total sum insured : {total_si:,.0f}") print(f" • Distinct insurers : {n_insurers}") print(f" • Distinct risk types: {n_risk_types}") ``` > This report consolidates **three operational sheets** — Production Schedule, > Transaction Log, and Commission Returns — into a single analytical view covering > five underwriting years (2021–2025). Key performance indicators are presented > across portfolio composition, premium income, commission earnings, insurer > relationships, and data-quality diagnostics. --- # Data Overview {#sec-data-overview} ## Source Sheets {#sec-source-sheets} ```{python} #| label: source-sheets-table sheet_summary = ( df.groupby("SOURCE_SHEET") .agg( Records=("POLICY_NO", "count"), Years=("YEAR", lambda x: f"{x.min()}–{x.max()}"), Gross_Premium=("GROSS_PREMIUM", "sum"), Commission=("COMMISSION", "sum"), ) .reset_index() .rename(columns={"SOURCE_SHEET": "Source Sheet"}) ) sheet_summary["Gross_Premium"] = sheet_summary["Gross_Premium"].map("₦{:,.0f}".format) sheet_summary["Commission"] = sheet_summary["Commission"].map("₦{:,.0f}".format) display(sheet_summary) ``` ## Column Inventory {#sec-column-inventory} The combined dataset contains **50 columns** spanning underwriting, financial, transaction, and remittance information. The table below summarises each column, its data type, and completeness rate. ```{python} #| label: column-inventory col_info = pd.DataFrame({ "Column": df.columns, "Dtype": df.dtypes.astype(str).values, "Non-Null Count": df.notnull().sum().values, "Completeness (%)": (df.notnull().mean() * 100).round(1).values, }) display(col_info) ``` ## Year Distribution {#sec-year-distribution} ```{python} #| label: year-distribution year_counts = df["YEAR"].value_counts().sort_index() fig, ax = plt.subplots(figsize=(8, 4)) ax.bar(year_counts.index.astype(str), year_counts.values, color=PALETTE[2]) ax.set_xlabel("Underwriting Year") ax.set_ylabel("Number of Records") ax.set_title("Records by Underwriting Year") for i, v in enumerate(year_counts.values): ax.text(i, v + 5, str(v), ha="center", fontsize=9) plt.tight_layout() plt.show() ``` --- # Portfolio Composition {#sec-portfolio-composition} ## Risk Types {#sec-risk-types} ```{python} #| label: risk-type-chart risk_counts = ( df["RISK_TYPE"] .dropna() .value_counts() .head(15) ) fig, ax = plt.subplots(figsize=(9, 5)) risk_counts.sort_values().plot(kind="barh", ax=ax, color=PALETTE[3]) ax.set_xlabel("Number of Policies") ax.set_title("Top 15 Risk Types") plt.tight_layout() plt.show() ``` ## Type of Cover {#sec-type-of-cover} ```{python} #| label: cover-type-chart cover_counts = df["TYPE_OF_COVER"].dropna().value_counts().head(15) fig, ax = plt.subplots(figsize=(9, 5)) cover_counts.sort_values().plot(kind="barh", ax=ax, color=PALETTE[4]) ax.set_xlabel("Number of Policies") ax.set_title("Top 15 Types of Cover") plt.tight_layout() plt.show() ``` ## Transaction Types {#sec-transaction-types} ```{python} #| label: transaction-type-chart txn_counts = df["TRANSACTION_TYPE"].dropna().value_counts() fig, ax = plt.subplots(figsize=(6, 4)) ax.pie( txn_counts.values, labels=txn_counts.index, autopct="%1.1f%%", colors=PALETTE[:len(txn_counts)], startangle=140, ) ax.set_title("Transaction Type Mix") plt.tight_layout() plt.show() ``` --- # Premium Analysis {#sec-premium-analysis} ## Gross Premium by Year {#sec-premium-by-year} ```{python} #| label: premium-by-year prem_yr = df.groupby("YEAR")["GROSS_PREMIUM"].sum() fig, ax = plt.subplots(figsize=(8, 4)) prem_yr.plot(kind="bar", ax=ax, color=PALETTE[2]) ax.set_xlabel("Year") ax.set_ylabel("Gross Premium (₦)") ax.set_title("Total Gross Premium by Underwriting Year") ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.tick_params(axis="x", rotation=0) for i, v in enumerate(prem_yr.values): ax.text(i, v + prem_yr.max() * 0.01, f"₦{v/1e6:.1f}M", ha="center", fontsize=8) plt.tight_layout() plt.show() ``` ## Gross vs Net Premium by Year {#sec-gross-vs-net} ```{python} #| label: gross-vs-net gvn = df.groupby("YEAR")[["GROSS_PREMIUM", "NET_PREMIUM"]].sum() fig, ax = plt.subplots(figsize=(9, 5)) gvn.plot(kind="bar", ax=ax, color=[PALETTE[2], PALETTE[5]]) ax.set_xlabel("Year") ax.set_ylabel("Amount (₦)") ax.set_title("Gross Premium vs Net Premium by Year") ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.tick_params(axis="x", rotation=0) ax.legend(["Gross Premium", "Net Premium"]) plt.tight_layout() plt.show() ``` ## Premium by Risk Type {#sec-premium-by-risk} ```{python} #| label: premium-by-risk risk_prem = ( df.groupby("RISK_TYPE")["GROSS_PREMIUM"] .sum() .dropna() .sort_values(ascending=False) .head(10) ) fig, ax = plt.subplots(figsize=(9, 5)) risk_prem.sort_values().plot(kind="barh", ax=ax, color=PALETTE[3]) ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.set_xlabel("Gross Premium (₦)") ax.set_title("Top 10 Risk Types by Gross Premium") plt.tight_layout() plt.show() ``` ## Sum Insured Distribution {#sec-sum-insured} ```{python} #| label: sum-insured-dist si = df["SUM_INSURED"][df["SUM_INSURED"] > 0] fig, ax = plt.subplots(figsize=(8, 4)) ax.hist(np.log10(si + 1), bins=30, color=PALETTE[2], edgecolor="white") ax.set_xlabel("log₁₀(Sum Insured)") ax.set_ylabel("Frequency") ax.set_title("Distribution of Sum Insured (log scale)") plt.tight_layout() plt.show() ``` --- # Commission Analysis {#sec-commission-analysis} ## Commission Income by Year {#sec-commission-by-year} ```{python} #| label: commission-by-year comm_yr = df.groupby("YEAR")["COMMISSION"].sum() fig, ax = plt.subplots(figsize=(8, 4)) comm_yr.plot(kind="bar", ax=ax, color=PALETTE[4]) ax.set_xlabel("Year") ax.set_ylabel("Commission (₦)") ax.set_title("Total Commission by Underwriting Year") ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.tick_params(axis="x", rotation=0) plt.tight_layout() plt.show() ``` ## Commission Rate Distribution {#sec-commission-rate} ```{python} #| label: commission-rate-dist cr = df["COMMISSION_RATE_PCT"].dropna() cr = cr[(cr > 0) & (cr <= 100)] fig, ax = plt.subplots(figsize=(8, 4)) ax.hist(cr, bins=25, color=PALETTE[3], edgecolor="white") ax.set_xlabel("Commission Rate (%)") ax.set_ylabel("Frequency") ax.set_title("Distribution of Commission Rates") plt.tight_layout() plt.show() ``` ## Commission by Risk Type {#sec-commission-by-risk} ```{python} #| label: commission-by-risk comm_risk = ( df.groupby("RISK_TYPE")["COMMISSION"] .sum() .dropna() .sort_values(ascending=False) .head(10) ) fig, ax = plt.subplots(figsize=(9, 5)) comm_risk.sort_values().plot(kind="barh", ax=ax, color=PALETTE[5]) ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.set_xlabel("Commission (₦)") ax.set_title("Top 10 Risk Types by Commission Earned") plt.tight_layout() plt.show() ``` --- # Insurer Analysis {#sec-insurer-analysis} ## Policy Count by Insurer {#sec-insurer-policy-count} ```{python} #| label: insurer-policy-count ins_count = df["INSURER"].value_counts().head(15) fig, ax = plt.subplots(figsize=(9, 5)) ins_count.sort_values().plot(kind="barh", ax=ax, color=PALETTE[2]) ax.set_xlabel("Number of Policies") ax.set_title("Top 15 Insurers by Policy Count") plt.tight_layout() plt.show() ``` ## Premium Volume by Insurer {#sec-insurer-premium} ```{python} #| label: insurer-premium ins_prem = ( df.groupby("INSURER")["GROSS_PREMIUM"] .sum() .sort_values(ascending=False) .head(15) ) fig, ax = plt.subplots(figsize=(9, 5)) ins_prem.sort_values().plot(kind="barh", ax=ax, color=PALETTE[4]) ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.set_xlabel("Gross Premium (₦)") ax.set_title("Top 15 Insurers by Gross Premium Volume") plt.tight_layout() plt.show() ``` ## Insurer Share (Pie) {#sec-insurer-share} ```{python} #| label: insurer-share-pie top_ins = df["INSURER"].value_counts().head(8) other = df["INSURER"].value_counts().iloc[8:].sum() labels = list(top_ins.index) + ["Other"] values = list(top_ins.values) + [other] fig, ax = plt.subplots(figsize=(8, 6)) ax.pie(values, labels=labels, autopct="%1.1f%%", colors=sns.color_palette("Blues_d", len(labels)), startangle=140) ax.set_title("Insurer Portfolio Share (Policy Count)") plt.tight_layout() plt.show() ``` --- # Temporal Trends {#sec-temporal-trends} ## Monthly Premium Trend {#sec-monthly-premium} ```{python} #| label: monthly-premium-trend df["EFF_MONTH"] = pd.to_datetime(df["EFFECTIVE_DATE"], errors="coerce").dt.to_period("M") monthly = df.groupby("EFF_MONTH")["GROSS_PREMIUM"].sum().dropna() monthly.index = monthly.index.to_timestamp() fig, ax = plt.subplots(figsize=(12, 4)) ax.plot(monthly.index, monthly.values, color=PALETTE[3], linewidth=1.5, marker="o", markersize=3) ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.set_xlabel("Month") ax.set_ylabel("Gross Premium (₦)") ax.set_title("Monthly Gross Premium Trend") plt.tight_layout() plt.show() ``` ## Year-on-Year Growth {#sec-yoy-growth} ```{python} #| label: yoy-growth yoy = df.groupby("YEAR")["GROSS_PREMIUM"].sum() yoy_growth = yoy.pct_change() * 100 fig, axes = plt.subplots(1, 2, figsize=(12, 4)) yoy.plot(kind="bar", ax=axes[0], color=PALETTE[2]) axes[0].set_title("Gross Premium by Year") axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) axes[0].tick_params(axis="x", rotation=0) yoy_growth.dropna().plot(kind="bar", ax=axes[1], color=PALETTE[5]) axes[1].axhline(0, color="black", linewidth=0.8) axes[1].set_title("Year-on-Year Growth Rate (%)") axes[1].tick_params(axis="x", rotation=0) plt.tight_layout() plt.show() ``` ## Renewal vs New Business by Year {#sec-renewal-vs-new} ```{python} #| label: renewal-vs-new txn_yr = ( df[df["TRANSACTION_TYPE"].isin(["RNL", "NEW"])] .groupby(["YEAR", "TRANSACTION_TYPE"])["GROSS_PREMIUM"] .sum() .unstack(fill_value=0) ) fig, ax = plt.subplots(figsize=(9, 5)) txn_yr.plot(kind="bar", ax=ax, color=[PALETTE[2], PALETTE[5]]) ax.set_xlabel("Year") ax.set_ylabel("Gross Premium (₦)") ax.set_title("Renewal (RNL) vs New Business Premium by Year") ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.tick_params(axis="x", rotation=0) plt.tight_layout() plt.show() ``` --- # Remittance & Receipts {#sec-remittance} ## Amount Remitted vs Unremitted {#sec-remitted-vs-unremitted} ```{python} #| label: remitted-vs-unremitted rem = df[["AMOUNT_REMITTED", "AMOUNT_UNREMITTED"]].dropna() totals = { "Remitted": rem["AMOUNT_REMITTED"].sum(), "Unremitted": rem["AMOUNT_UNREMITTED"].sum(), } fig, ax = plt.subplots(figsize=(6, 4)) ax.bar(totals.keys(), totals.values(), color=[PALETTE[2], PALETTE[6]]) ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"₦{x/1e6:.1f}M")) ax.set_title("Total Remitted vs Unremitted") plt.tight_layout() plt.show() ``` ## Bank of Lodgement Distribution {#sec-bank-lodgement} ```{python} #| label: bank-lodgement bank_counts = df["BANK_OF_LODGEMENT"].dropna().value_counts().head(10) fig, ax = plt.subplots(figsize=(8, 4)) bank_counts.sort_values().plot(kind="barh", ax=ax, color=PALETTE[3]) ax.set_xlabel("Number of Transactions") ax.set_title("Top Banks of Lodgement") plt.tight_layout() plt.show() ``` ## Receipt Patterns {#sec-receipt-patterns} ```{python} #| label: receipt-patterns # Amount received distribution (non-zero) ar = df["AMOUNT_RECEIVED"].dropna() ar = ar[ar > 0] fig, ax = plt.subplots(figsize=(8, 4)) ax.hist(np.log10(ar + 1), bins=25, color=PALETTE[4], edgecolor="white") ax.set_xlabel("log₁₀(Amount Received)") ax.set_ylabel("Frequency") ax.set_title("Distribution of Amounts Received (log scale)") plt.tight_layout() plt.show() ``` --- # Data Quality Assessment {#sec-data-quality} ## Completeness by Column {#sec-completeness} ```{python} #| label: completeness-chart completeness = df.notnull().mean().sort_values() * 100 fig, ax = plt.subplots(figsize=(9, 12)) completeness.plot(kind="barh", ax=ax, color=[ PALETTE[2] if v >= 80 else PALETTE[5] if v >= 50 else PALETTE[7] for v in completeness ]) ax.axvline(80, color="green", linewidth=1, linestyle="--", label="80% threshold") ax.axvline(50, color="orange", linewidth=1, linestyle="--", label="50% threshold") ax.set_xlabel("Completeness (%)") ax.set_title("Column Completeness (%)") ax.legend() plt.tight_layout() plt.show() ``` ## Missing Data Summary Table {#sec-missing-data-table} ```{python} #| label: missing-data-table missing = pd.DataFrame({ "Column": df.columns, "Missing Count": df.isnull().sum().values, "Missing (%)": (df.isnull().mean() * 100).round(1).values, }).sort_values("Missing (%)", ascending=False) display(missing[missing["Missing Count"] > 0]) ``` ## Duplicate Policy Numbers {#sec-duplicates} ```{python} #| label: duplicates dup = df.duplicated(subset=["POLICY_NO", "YEAR"], keep=False) print(f"Potential duplicate (POLICY_NO + YEAR) records: {dup.sum():,}") print(f" ({dup.sum() / len(df) * 100:.1f}% of total records)") ``` --- # Appendix {#sec-appendix} ## Full Column Reference {#sec-column-reference} | Column | Description | |--------|-------------| | `SOURCE_SHEET` | Origin sheet: Production Schedule, Transaction Log, or Commission Returns | | `YEAR` | Underwriting year | | `SN` | Serial number | | `POLICY_NO` | Insurance policy number | | `UNDERWRITING_DETAILS` | Free-text underwriting notes | | `RISK_TYPE` | Category of insured risk | | `TYPE_OF_COVER` | Specific cover type | | `INSURED` | Name of the insured party | | `CLIENT_NAME` | Client name (Transaction Log) | | `NAME_OF_ASSURED` | Assured name (Production Schedule) | | `CUSTOMER_NAME` | Customer name (Commission Returns) | | `INSURER` | Insurance company underwriting the risk | | `EFFECTIVE_DATE` | Policy commencement date | | `FROM_DATE` | Cover start date | | `TO_DATE` | Cover end date | | `END_DATE` | Policy expiry date | | `RENEWAL_DATE` | Date of renewal | | `DATE_PAID` | Date premium was paid | | `DATE_PAID_PAYABLE` | Date premium became payable | | `SUM_INSURED` | Total insured value | | `SUM_ASSURED` | Sum assured (life/PA products) | | `GROSS_PREMIUM` | Full premium before deductions | | `COMMISSION` | Brokerage commission earned | | `COMMISSION_RATE_PCT` | Commission rate as a percentage | | `BROKERAGE` | Brokerage amount | | `NET_PREMIUM` | Premium after commission deduction | | `FIVE_PCT_NET` | 5% levy on net premium | | `BALANCE` | Outstanding balance | | `OTHER_DEDUCTIONS` | Any other deductions applied | | `TOTAL_RECEIVABLE` | Total amount receivable from client | | `TRANSACTION_TYPE` | NEW / RNL (renewal) / ADD / ADDITIONAL | | `VOUCHER_NO` | Internal voucher reference | | `ENDORSEMENT_NO` | Policy endorsement number | | `LEDGER_ACCOUNT_NO` | Accounting ledger reference | | `POLICY_TENOR` | Duration of policy | | `DEBIT_NOTE_NO` | Debit note reference | | `CREDIT_NOTE_NO` | Credit note reference | | `AMOUNT_RECEIVED` | Cash received from client | | `DATE_OF_RECEIPT` | Date cash was received | | `RECEIPT_NO` | Receipt reference number | | `BANK_OF_LODGEMENT` | Bank where funds were deposited | | `DATE_OF_LODGEMENT` | Lodgement date | | `AMOUNT_REMITTED` | Amount forwarded to insurer | | `AMOUNT_UNREMITTED` | Amount not yet remitted | | `DATE_TO_REMIT` | Target remittance date | | `NAME_OF_BANK` | Insurer's bank | | `CHEQUE_NO` | Cheque number for remittance | | `INSURER_RECEIPT_NO` | Insurer's acknowledgement receipt | | `OTHER_BROKER` | Co-broker reference | | `REMARKS` | Miscellaneous notes | ## Session Information {#sec-session-info} ```{python} #| label: session-info import sys, platform, datetime print(f"Report generated : {datetime.datetime.now().strftime('%Y-%m-%d %H:%M')}") print(f"Python : {sys.version}") print(f"Platform : {platform.platform()}") print(f"pandas : {pd.__version__}") print(f"numpy : {np.__version__}") print(f"matplotlib : {plt.matplotlib.__version__}") print(f"seaborn : {sns.__version__}") ```