Compliance Workload Drivers and Risk Concentration at Grene Capital

An Exploratory and Inferential Analysis of Multi-Layered Regulatory Activity Across a Lagos Office and an Abuja Retail Real Estate Portfolio (January 2024 – December 2025)

Author

Temitope Ikiseh

Published

May 16, 2026

1 Executive Summary

Grene Capital, a Nigerian private equity firm with ten staff and two real estate portfolio companies — an office building in Lagos (Portco A) and a retail centre in Abuja FCT (Portco B) — generated 293 documented compliance findings across the fund manager and both portfolios between January 2024 and December 2025. The primary dataset combines the firm’s regulatory reporting calendar (covering SEC, NFIU, FRCN, NRS, FMAN obligations from weekly AML/CFT returns through annual filings) with statutory audit management letter items, jurisdiction-specific real estate regulator correspondence (LASBCA, Lagos Lands Bureau and LIRS for the Lagos asset; AMMC, AGIS/FCDA and FCT-IRS for the Abuja asset), board paper compliance items and AGM minutes. Three supplementary instruments triangulate the primary dataset: a staff pulse survey (n = 7 of 10), an external consultant assessment (n = 15 covering seven regulator-quarter combinations), and a portfolio CFO quarterly self-report (n = 13).

Three findings converge across the analytical techniques. First, a Pareto pattern dominates financial exposure: just seven findings (2.4% of the total) account for 80% of the ₦ impact, justifying a risk-tiered review process rather than uniform handling. Second, resolution time differs sharply by source (Kruskal-Wallis p < 0.001, η²[H] = 0.43): statutory audit items take 74 days at the median versus 17 days for internal items, a 4.3× difference, and Abuja Retail is slower than Lagos Office across most regulator categories. Third, recurrence depends strongly on root cause (χ² p < 0.001, Cramer’s V = 0.35): documentation-related findings recur at 56% versus only 8% for system-related findings, a near seven-fold ratio. A logistic regression confirms this pattern (AUC = 0.78; documentation root cause and consultant engagement both significantly elevate recurrence odds).

The single recommendation is to stand up a documentation hygiene program with mandatory tagging at finding creation, prioritising the seven high-impact findings identified by the Pareto. Halving the documentation-driven recurrence rate would have prevented an estimated 15 of the 31 documentation-driven recurrences logged over the 24-month window, or roughly eight a year.

2 Professional Disclosure

I serve as Head of Compliance at Grene Capital, a Nigerian private equity firm. Our team comprises ten staff distributed across investment, operations, finance, investor relations, and the executive function. We manage two funds with a combined portfolio of two real estate operating companies, an office building in Lagos (Portco A) and a retail centre in Abuja FCT (Portco B), and we deal exclusively with institutional limited partners. My remit covers compliance at three layers: at the fund manager I am responsible for SEC Nigeria fund manager registration and returns, FRCN reporting, NDPC obligations under the Nigeria Data Protection Act 2023, AML/CFT returns to the SEC and NFIU, and fit-and-proper certifications; at the fund layer I handle LP reporting, side-letter compliance, audit liaison, and ESG questionnaire responses; and at the portfolio level I oversee jurisdictional compliance, covering CAC, NRS (Nigeria Revenue Service, formerly FIRS, established under the Nigeria Revenue Service (Establishment) Act 2025), NDPC and NESREA federally, plus LIRS, LASBCA, Lagos Lands Bureau, MEPB, LASEPA and LASRRA for the Lagos asset, and FCT-IRS, AMMC Building Control, AGIS/FCDA, AEPB and DDC Planning for the Abuja asset.

The five techniques required by Case Study 1 each address a recurring decision in this role.

Exploratory Data Analysis is the foundation of my quarterly compliance committee paper. EDA formalises what I currently do informally in spreadsheets — surfacing skewed distributions, missingness patterns, and outlier findings that drive most of the financial and reputational exposure. For this study I apply EDA to the full population of findings, identifying two data-quality issues (right-censored open findings; structural blanks in the root_cause field for non-exception items) and documenting their treatment.

Data visualisation is the medium of communication with our partners and our LPs. The grammar-of-graphics framework gives me a principled basis for choosing chart types — boxplots for severity-by-source comparisons; heatmaps for portfolio-by-source intersection; Pareto curves for financial impact concentration. The five-plot narrative answers: how much, where, when, of what severity, and what is open versus closed.

Hypothesis testing replaces gut feel. Two questions repeat in my quarterly reviews. First: do regulator-driven findings actually take longer to resolve than internal ones? Second: does the type of root cause behind a finding predict whether it will recur? Kruskal-Wallis with Dunn post-hoc and chi-squared with Cramer’s V answer these directly.

Correlation analysis informs the design of our 2026 compliance scorecard. A Spearman correlation matrix tells me which inputs co-move (severity and external cost; severity and consultant engagement) and which are independent. Partial correlation, holding portfolio constant, isolates within-portco patterns from between-portco confounding.

Logistic regression is the technique with the highest operational stake. Recurrence is the single feature that distinguishes a controllable compliance environment from one heading toward regulator escalation. A model predicting recurrence from finding characteristics tells me where to invest in process redesign rather than headcount.

3 Data Collection and Sampling

3.1 Primary dataset

The primary dataset combines four extraction streams: (i) the firm’s structured regulatory reporting calendar at the fund-manager level, covering 27 distinct obligations from weekly to annual frequency; (ii) statutory audit management letters issued for each entity in respect of FY2023 and FY2024; (iii) regulator correspondence files for both portfolio companies covering the Lagos and Abuja FCT regulatory environments; and (iv) board paper compliance items and AGM minutes for both portfolios. The dataset comprises a complete enumeration of qualifying compliance findings, exceptions, breaches, queries, and observations between 1 January 2024 and 31 December 2025 (24 months); that is, the population is the sample. Inferential statistics in this paper therefore describe the within-firm compliance process rather than estimating a parameter for the broader Nigerian PE sector. Sample size: 293 findings spanning eight quarters and 18 variables (1 ID, 2 categorical layer variables, 2 dates, 1 derived numeric resolution time, 4 categorical descriptors, 2 numeric financial fields, 2 binary flags, 1 ordinal severity, 1 categorical root cause, 1 derived quarter, 1 QC flag).

3.2 Supplementary instruments

Three instruments triangulate the primary dataset:

Instrument	n	Coverage	Response context
Survey A — Internal staff pulse	7	All firm functions: Legal & Compliance (2), Operations & Finance (2), Executive (1), Investment (1), Administration (1)	70% response rate from a 10-person team
Survey B — External consultant	15	Seven regulator-quarter combinations covering SEC, NRS, FRCN, CAC, LIRS, FRSC across 2025-Q4 and 2026-Q1	Single consultant assessor
Survey C — Portfolio CFO	13	Portco A: 8 submissions covering Q4 2024 through Q1 2026; Portco B: 5 submissions covering Q4 2024 through Q4 2025	Two respondents

Given the small sample sizes of the supplementary instruments, they are used to triangulate patterns observed in the primary dataset rather than to support independent inferential claims. The 70% Survey A response rate is itself informative: high engagement on a compliance survey suggests the topic is taken seriously across the firm, although with ten staff each response moves the rate by ten percentage points.

3.3 Anonymisation and authorisation

All portfolio company names, audit firm names, LP names, individual staff names from both the firm and regulator correspondence, and specific transaction values that could re-identify any party have been replaced with codes prior to publication. The mapping is held offline and is not part of the published version of this document. Written authorisation from Grene Capital’s Managing Partner permits the use of these anonymised data for this academic capstone (see Appendix B).

4 Data Description

Load and clean the primary dataset and surveys (R)

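# Assumed setup: the rendered page does not show the original library chunk
library(tidyverse)    # dplyr, tidyr, ggplot2, readr
library(lubridate)    # year(), quarter()
library(readxl)       # read_excel()
library(janitor)      # clean_names()
library(knitr)        # kable()
library(kableExtra)   # kable_styling()
library(patchwork)    # p1 + p2 plot layouts
library(rstatix)      # identify_outliers(), kruskal_test(), dunn_test()
library(scales)       # percent
library(ggcorrplot)   # ggcorrplot()

findings_path <- "Grene_Compliance_Findings_Log_v3.xlsx"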

df <- read_excel(findings_path, sheet = "FindingsLog") |>
  clean_names() |>
  filter(!is.na(finding_id)) |>
  mutate(
    date_identified     = as.Date(date_identified),
    date_closed         = as.Date(date_closed),
    resolution_days     = as.numeric(date_closed - date_identified),
    severity            = as.integer(severity),
    consultant_engaged  = as.integer(consultant_engaged),
    recurrence_flag     = as.integer(recurrence_flag),
    portfolio           = factor(portfolio,
                                 levels = c("Fund Manager", "Portco A", "Portco B")),
    portco_subtype      = factor(portco_subtype),
    finding_source      = factor(finding_source),
    regulator_or_counterparty = factor(regulator_or_counterparty),
    root_cause_category = factor(na_if(root_cause_category, "")),
    # Derive quarter (don't rely on formula evaluation)
    quarter_identified  = factor(paste0(year(date_identified), "-Q",
                                         quarter(date_identified)))
  )

# Surveys
survey_A <- read_csv("survey_A_internal_clean.csv", show_col_types = FALSE)
survey_B <- read_csv("survey_B_consultant_clean.csv", show_col_types = FALSE)
survey_C <- read_csv("survey_C_portfolio_clean.csv", show_col_types = FALSE)

cat("Primary dataset:", nrow(df), "rows ×", ncol(df), "columns\n")

Primary dataset: 293 rows × 18 columns

cat("Survey A (Staff):", nrow(survey_A), "respondents\n")

Survey A (Staff): 7 respondents

cat("Survey B (Consultant):", nrow(survey_B), "submissions\n")

Survey B (Consultant): 15 submissions

cat("Survey C (Portfolio CFO):", nrow(survey_C), "submissions\n")

Survey C (Portfolio CFO): 13 submissions

df |>
  count(portfolio, portco_subtype) |>
  kable(caption = "Findings count by layer and subtype") |>
  kable_styling(font_size = 11)

Findings count by layer and subtype
portfolio	portco_subtype	n
Fund Manager	Fund Manager	154
Portco A	Office (Commercial)	72
Portco B	Retail (Commercial)	67

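Load and clean the primary dataset and surveys (Python)

# Assumed setup: the import cell is not shown in the rendered page
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg
from scipy import stats

findings_path = "Grene_Compliance_Findings_Log_v3.xlsx"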
df = pd.read_excel(findings_path, sheet_name="FindingsLog")
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df = df[df["finding_id"].notna()].copy()
df["date_identified"] = pd.to_datetime(df["date_identified"])
df["date_closed"]     = pd.to_datetime(df["date_closed"])
df["resolution_days"] = (df["date_closed"] - df["date_identified"]).dt.days
for c in ["severity", "consultant_engaged", "recurrence_flag"]:
    df[c] = pd.to_numeric(df[c], errors="coerce").astype("Int64")
df["root_cause_category"] = df["root_cause_category"].replace("", np.nan)
df["quarter_identified"] = (df["date_identified"].dt.year.astype(str)
                            + "-Q" + df["date_identified"].dt.quarter.astype(str))

# Surveys
survey_A = pd.read_csv("survey_A_internal_clean.csv")
survey_B = pd.read_csv("survey_B_consultant_clean.csv")
survey_C = pd.read_csv("survey_C_portfolio_clean.csv")

print(f"Primary: {len(df)} rows × {df.shape[1]} cols")

Primary: 293 rows × 18 cols

print(f"Survey A: {len(survey_A)}, Survey B: {len(survey_B)}, Survey C: {len(survey_C)}")

Survey A: 7, Survey B: 15, Survey C: 13

print(pd.crosstab(df["portfolio"], df["portco_subtype"]))

portco_subtype  Fund Manager  Office (Commercial)  Retail (Commercial)
portfolio                                                             
Fund Manager             154                    0                    0
Portco A                   0                   72                    0
Portco B                   0                    0                   67

5 Technique 1 — Exploratory Data Analysis

5.1 Theory recap

EDA, as developed by Tukey and operationalised in Chapter 4 of Adi (2026), is the systematic visual and numerical inspection of a dataset before any model fitting. Its purpose is to surface distributional shape, missing-value patterns, outliers, and unexpected relationships — the anomalies that determine whether downstream inferential techniques are appropriate at all. Anscombe’s Quartet illustrates the principle: summary statistics alone are insufficient because four datasets with identical means, variances and correlations can have radically different visual structures.

5.2 Business justification

Before running any test, I need to know whether resolution_days is normally distributed — the mix of one-day weekly AML/CFT filings and multi-month regulator queries makes this unlikely. I need to distinguish genuinely-open items from poorly-logged ones, and identify extreme financial-impact outliers because these dominate any unweighted mean.

5.3 Code and outputs

df |> summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "var", values_to = "n_missing") |>
  arrange(desc(n_missing)) |>
  filter(n_missing > 0) |>
  kable(caption = "Missing values per variable")

Missing values per variable
var	n_missing
root_cause_category	122
date_closed	8
resolution_days	8

p1 <- df |> filter(!is.na(resolution_days)) |>
  ggplot(aes(resolution_days)) +
  geom_histogram(bins = 30, fill = "steelblue", colour = "white") +
  labs(title = "Distribution of resolution_days",
       x = "Days to closure", y = "Findings")

p2 <- df |> filter(!is.na(financial_impact_ngn), financial_impact_ngn > 0) |>
  ggplot(aes(financial_impact_ngn)) +
  geom_histogram(bins = 25, fill = "darkorange", colour = "white") +
  scale_x_log10(labels = scales::label_number(scale_cut = scales::cut_short_scale())) +
  labs(title = "Financial impact (log scale, ₦)", x = "₦ (log10)", y = "Findings")

p1 + p2

df |> filter(!is.na(resolution_days)) |>
  identify_outliers(resolution_days) |>
  dplyr::select(finding_id, portfolio, finding_source, severity, resolution_days,
                is.outlier, is.extreme) |>
  arrange(desc(resolution_days)) |>
  head(10) |>
  kable(caption = "Top 10 outliers in resolution_days (IQR rule)")

Top 10 outliers in resolution_days (IQR rule)
finding_id	portfolio	finding_source	severity	resolution_days	is.outlier	is.extreme
F-AUD-053	Portco B	Statutory audit	5	203	TRUE	TRUE
F-AUD-029	Portco A	Statutory audit	4	191	TRUE	TRUE
F-AUD-024	Portco A	Statutory audit	4	175	TRUE	TRUE
F-AUD-007	Fund Manager	Statutory audit	5	168	TRUE	FALSE
F-B-039	Portco B	Sectoral regulator	5	160	TRUE	FALSE
F-B-035	Portco B	NDPC	5	149	TRUE	FALSE
F-A-010	Portco A	Sectoral regulator	5	148	TRUE	FALSE
F-B-027	Portco B	Tax authority	5	141	TRUE	FALSE
F-AUD-033	Portco A	Statutory audit	4	139	TRUE	FALSE
F-AUD-009	Fund Manager	Statutory audit	4	127	TRUE	FALSE
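
The IQR rule behind the table above can be sketched in a few lines. This is a minimal illustration, assuming the usual Tukey fences (outlier beyond 1.5 × IQR from the quartiles, extreme beyond 3 × IQR) and quartiles computed with the inclusive method, which should correspond to R's default. The function name `iqr_flags` and the toy values are hypothetical and are not taken from the findings log.

```python
from statistics import quantiles

def iqr_flags(values):
    """Flag each value as an outlier (beyond 1.5 x IQR) or extreme (beyond 3 x IQR)."""
    # method="inclusive" interpolates between order statistics (assumed to match R's default)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    flags = []
    for v in values:
        is_outlier = v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr
        is_extreme = v < q1 - 3.0 * iqr or v > q3 + 3.0 * iqr
        flags.append({"value": v, "is_outlier": is_outlier, "is_extreme": is_extreme})
    return flags

# Hypothetical resolution times in days, not taken from the findings log
toy_days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = [f for f in iqr_flags(toy_days) if f["is_outlier"]]
print(flagged)
```

Because the fences are built from quartiles rather than the mean, a single very slow finding does not move the threshold that is used to judge it.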

# Python equivalent of the missing-value summary
miss = df.isna().sum().sort_values(ascending=False)
print("Missing values per column:")

Missing values per column:

print(miss[miss > 0])

root_cause_category    122
resolution_days          8
date_closed              8
dtype: int64

fig, axes = plt.subplots(1, 2, figsize=(11, 4))
df["resolution_days"].dropna().hist(bins=30, ax=axes[0],
                                    color="steelblue", edgecolor="white")
axes[0].set_title("Distribution of resolution_days")
axes[0].set_xlabel("Days to closure")

vals = df.loc[df["financial_impact_ngn"] > 0, "financial_impact_ngn"]
axes[1].hist(np.log10(vals + 1), bins=25, color="darkorange", edgecolor="white")
axes[1].set_title("Financial impact (log10 ₦)")
plt.tight_layout(); plt.show()

5.4 Plain-language interpretation

The distribution of resolution_days (n = 285 closed findings) is heavily right-skewed (skewness 1.90; Shapiro-Wilk p ≈ 1.4 × 10⁻¹⁸). The median sits at 25 days but the mean is 38 days, about 50% higher, which betrays the long right tail. 59.6% of findings close within 30 days; 8.4% take more than 90 days, and these extreme cases are concentrated in statutory audit management letter items where remediation depends on the next financial reporting cycle. This non-normality is the single most important EDA finding because it dictates the choice of non-parametric tests in Section 7.

Two data-quality issues required handling. First, eight findings (2.7%) have no date_closed because they remained open at the end of the analytical window. I treated these as right-censored: they are excluded from resolution_days distributional analysis but reported separately as the open-finding backlog in the visualisation section (Plot 5). Second, 122 rows have a blank root_cause_category, which is by design: clean (non-exception) filings have no root cause because nothing went wrong. I filtered these out before the chi-squared test in Section 7 rather than treating them as missing data, since “no root cause applicable” is qualitatively different from “root cause unknown”.
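
The two treatments described above can be illustrated on a toy log. This is a sketch under stated assumptions: the field names follow the findings log, but the three records, their IDs and their dates are invented for illustration.

```python
from datetime import date

# Hypothetical records; field names follow the findings log, values are invented
records = [
    {"finding_id": "X-001", "date_identified": date(2025, 1, 6),
     "date_closed": date(2025, 1, 20), "root_cause_category": "Process"},
    {"finding_id": "X-002", "date_identified": date(2025, 11, 3),
     "date_closed": None, "root_cause_category": "Documentation"},
    {"finding_id": "X-003", "date_identified": date(2025, 2, 3),
     "date_closed": date(2025, 2, 4), "root_cause_category": ""},
]

# Right-censored items: still open at the end of the window, so no resolution time
open_backlog = [r["finding_id"] for r in records if r["date_closed"] is None]

# Resolution time is defined only for closed findings
resolution_days = {
    r["finding_id"]: (r["date_closed"] - r["date_identified"]).days
    for r in records if r["date_closed"] is not None
}

# A blank root cause means "not applicable" (clean filing), so these rows are
# filtered out of the recurrence test rather than imputed
root_cause_rows = [r for r in records if r["root_cause_category"]]

print(open_backlog, resolution_days, len(root_cause_rows))
```

Keeping the open backlog as its own list, rather than assigning a placeholder closing date, avoids understating resolution times for items that are still running.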

A particularly informative EDA output is the financial impact Pareto: of the 24 findings with non-zero financial impact, just seven findings drive 80% of the total ₦ exposure — that’s 2.4% of all 293 findings. This single statistic is the basis for the risk-tiered review recommendation in Section 10.
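
The Pareto count itself is a short calculation: rank the non-zero impacts from largest to smallest and stop at the first rank where the cumulative share reaches the threshold. The sketch below uses a hypothetical helper, `pareto_count`, and invented impact values; it is not run on the firm's figures.

```python
def pareto_count(impacts, threshold=0.8):
    """Smallest number of top-ranked items whose cumulative share reaches threshold."""
    positive = sorted((x for x in impacts if x > 0), reverse=True)
    total = sum(positive)
    running = 0.0
    for rank, value in enumerate(positive, start=1):
        running += value
        if running / total >= threshold:
            return rank
    return len(positive)  # only reached when there are no positive impacts

# Hypothetical impact values, not the firm's figures
toy_impacts = [0, 600, 0, 250, 100, 50, 0]
k = pareto_count(toy_impacts)
print(k, f"{k / len(toy_impacts):.1%} of all items")
```

The share is expressed against all items, including those with zero impact, which is how the 2.4% figure in the text relates seven findings to the full population of 293.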

6 Technique 2 — Data Visualisation

6.1 Theory recap

The grammar of graphics (Wilkinson; Wickham) decomposes a chart into data, aesthetics, geometry and faceting. The discipline it imposes — choosing the chart that fits the question rather than the chart that looks most impressive — matters more than the toolkit. Distributional comparisons across categories belong in boxplots, not bar-of-means. Two-way frequency density belongs in heatmaps, not stacked bars. Cumulative concentration is naturally a Pareto.

6.2 Business justification

The compliance committee receives a one-page summary each quarter. Five plots must answer five business questions: how much, where, when, of what severity, and what is open versus closed. The visualisation narrative needs to function as a single story rather than five disconnected images.

6.3 Five plots that tell one story

v1 <- df |> count(quarter_identified, portfolio) |>
  ggplot(aes(quarter_identified, n, fill = portfolio)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "1. Findings volume by quarter and layer",
       x = NULL, y = "Findings") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

v2 <- df |> ggplot(aes(finding_source, severity, fill = finding_source)) +
  geom_boxplot(alpha = .7) +
  labs(title = "2. Severity distribution by source",
       x = NULL, y = "Severity (1-5)") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 30, hjust = 1))

v3 <- df |> filter(!is.na(resolution_days)) |>
  group_by(portfolio, finding_source) |>
  summarise(median_days = median(resolution_days), .groups = "drop") |>
  ggplot(aes(finding_source, portfolio, fill = median_days)) +
  geom_tile() +
  geom_text(aes(label = round(median_days, 0)), colour = "white", fontface = "bold") +
  scale_fill_viridis_c() +
  labs(title = "3. Median resolution time by layer × source",
       x = NULL, y = NULL, fill = "Days") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

v4 <- df |> filter(!is.na(financial_impact_ngn), financial_impact_ngn > 0) |>
  arrange(desc(financial_impact_ngn)) |>
  mutate(rank = row_number(),
         cum_share = cumsum(financial_impact_ngn) / sum(financial_impact_ngn)) |>
  ggplot(aes(rank, cum_share)) +
  geom_line(linewidth = 1.1, colour = "darkred") +
  geom_hline(yintercept = .8, linetype = 2) +
  scale_y_continuous(labels = percent) +
  labs(title = "4. Pareto — cumulative ₦ share by finding rank",
       x = "Finding rank", y = "Cumulative share")

v5 <- df |> mutate(status = if_else(is.na(date_closed), "Open", "Closed")) |>
  count(quarter_identified, status) |>
  ggplot(aes(quarter_identified, n, fill = status)) +
  geom_col() +
  scale_fill_manual(values = c("Closed" = "#5B9BD5", "Open" = "#E25822")) +
  labs(title = "5. Open vs closed by quarter of identification",
       x = NULL, y = "Findings") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

(v1 + v2) / (v3 + v4) / v5

# Python equivalent of the five-plot panel
fig, axes = plt.subplots(3, 2, figsize=(13, 13))

# 1. Volume × layer
(df.groupby(["quarter_identified","portfolio"]).size().unstack()
   .plot(kind="bar", ax=axes[0,0]))
axes[0,0].set_title("1. Findings by quarter and layer"); axes[0,0].set_xlabel("")

# 2. Severity by source
sns.boxplot(data=df, x="finding_source", y="severity", ax=axes[0,1])
axes[0,1].set_title("2. Severity distribution by source")
axes[0,1].tick_params(axis="x", rotation=30)

# 3. Heatmap
piv = (df.dropna(subset=["resolution_days"])
         .groupby(["portfolio","finding_source"])["resolution_days"]
         .median().unstack())
sns.heatmap(piv, annot=True, fmt=".0f", cmap="viridis", ax=axes[1,0])
axes[1,0].set_title("3. Median resolution_days by layer × source")

# 4. Pareto
fi = (df.loc[df["financial_impact_ngn"]>0, "financial_impact_ngn"]
        .sort_values(ascending=False).reset_index(drop=True))
cum = fi.cumsum()/fi.sum()
axes[1,1].plot(cum.index+1, cum.values, color="darkred")
axes[1,1].axhline(.8, linestyle="--")
axes[1,1].set_title("4. Pareto — cumulative ₦ share by finding rank")

# 5. Open vs closed
status = df.assign(status=np.where(df["date_closed"].isna(),"Open","Closed"))
(status.groupby(["quarter_identified","status"]).size().unstack()
   .plot(kind="bar", stacked=True, ax=axes[2,0]))
axes[2,0].set_title("5. Open vs closed by quarter")
axes[2,1].axis("off")   # sixth panel is unused
plt.tight_layout(); plt.show()

6.4 Interpretation

Plot 1 shows that Fund Manager filings dominate volume across every quarter (calendar-driven AML/CFT and SEC obligations are the structural load), with portfolio activity concentrated in Q2 2025 — the quarter in which the FY2024 statutory audits closed for both Portco A and Portco B and produced the majority of audit management letter items. Plot 2 confirms what compliance professionals know intuitively: NDPC, statutory audit and sectoral regulator findings carry meaningfully higher severity (median 3) than internal items (median 2), with tax authority items sitting between. Plot 3 is the most operationally useful chart — the layer × source heatmap of median resolution days — showing statutory audit items take 70–90 days at every layer, while internal items at the fund manager close in 17 days. Plot 4 visualises the Pareto introduced in Section 5: 80% of financial impact concentrates in the top seven findings. Plot 5 shows that the open-finding backlog is stable through 2024 and the first half of 2025, with a small uptick in Q4 2025 reflecting items identified late in the window. Taken together, the five plots tell one story: the volume problem is filings discipline, but the risk problem is the small number of high-severity, slow-resolving audit and regulator items.

7 Technique 3 — Hypothesis Testing

7.1 Hypotheses

H1 — Resolution time differs across finding sources.

\(H_0\): median resolution_days equal across all finding_source levels.
\(H_1\): medians differ in at least one pair.
Test: Kruskal-Wallis (justified by right-skew of resolution_days confirmed in Section 5); Dunn post-hoc with Bonferroni correction.
Effect size: \(\eta^2_H\) (eta-squared based on the H statistic, reported by rstatix as eta2[H]).

H2 — Recurrence is not independent of root cause.

\(H_0\): recurrence_flag independent of root_cause_category.
\(H_1\): recurrence_flag depends on root_cause_category.
Test: Pearson chi-squared (no expected cell < 5 violation).
Effect size: Cramer’s V.
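
For reference, the two effect sizes can be written out and checked by hand. The Kruskal-Wallis effect size that rstatix labels eta2[H] depends on the H statistic, the number of groups \(k\) and the number of observations \(n\); Cramer's V depends on the chi-squared statistic, the table total \(N\) and the smaller of the row and column counts. The values plugged in below are the ones printed by the Python code in Section 7.2, so this is a consistency check rather than a new result.

```latex
\[
\eta^2_H = \frac{H - k + 1}{n - k}
         = \frac{123.09 - 5 + 1}{285 - 5} \approx 0.425
\]
\[
V = \sqrt{\frac{\chi^2}{N\,\bigl(\min(r, c) - 1\bigr)}}
  = \sqrt{\frac{20.729}{171 \times (2 - 1)}} \approx 0.348
\]
```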

7.2 Code and outputs

df_h <- df |> filter(!is.na(resolution_days))

cat("Shapiro-Wilk normality test:\n")

Shapiro-Wilk normality test:

print(shapiro.test(sample(df_h$resolution_days, min(5000, nrow(df_h)))))


    Shapiro-Wilk normality test

data:  sample(df_h$resolution_days, min(5000, nrow(df_h)))
W = 0.79668, p-value < 0.00000000000000022

cat("\nH1 — Kruskal-Wallis:\n")


H1 — Kruskal-Wallis:

print(df_h |> kruskal_test(resolution_days ~ finding_source))

# A tibble: 1 × 6
  .y.                 n statistic    df        p method        
* <chr>           <int>     <dbl> <int>    <dbl> <chr>         
1 resolution_days   285      124.     5 4.97e-25 Kruskal-Wallis

cat("\nEffect size:\n")


Effect size:

print(df_h |> kruskal_effsize(resolution_days ~ finding_source))

# A tibble: 1 × 5
  .y.                 n effsize method  magnitude
* <chr>           <int>   <dbl> <chr>   <ord>    
1 resolution_days   285   0.426 eta2[H] large

cat("\nDunn post-hoc (Bonferroni-adjusted, significant pairs only):\n")


Dunn post-hoc (Bonferroni-adjusted, significant pairs only):

df_h |> dunn_test(resolution_days ~ finding_source, p.adjust.method = "bonferroni") |>
  filter(p.adj < 0.05) |>
  dplyr::select(group1, group2, p.adj) |>
  kable()

group1	group2	p.adj
Internal	NDPC	0.0008872
Internal	Sectoral regulator	0.0000077
Internal	Statutory audit	0.0000000
Internal	Tax authority	0.0034685
Sectoral regulator	Statutory audit	0.0001571
Statutory audit	Tax authority	0.0002387

# H2
df_rc <- df |> filter(!is.na(root_cause_category))
tab <- table(df_rc$root_cause_category, df_rc$recurrence_flag)
cat("\nContingency table (rows = root_cause, cols = recurrence_flag):\n")


Contingency table (rows = root_cause, cols = recurrence_flag):

print(tab)

                           
                             0  1
  Documentation             24 31
  External regulator change 11  7
  People                    15  6
  Process                   39 14
  System                    22  2

cat("\nH2 — Chi-squared:\n")


H2 — Chi-squared:

print(chisq.test(tab))


    Pearson's Chi-squared test

data:  tab
X-squared = 20.729, df = 4, p-value = 0.0003583

cat("\nCramer's V:\n")


Cramer's V:

print(rstatix::cramer_v(tab))

[1] 0.3481734

# Python equivalent of the H1 and H2 tests
df_h = df.dropna(subset=["resolution_days"]).copy()

sw = stats.shapiro(df_h["resolution_days"].sample(min(5000, len(df_h)), random_state=0))
print(f"Shapiro-Wilk p = {sw.pvalue:.4g}")

Shapiro-Wilk p = 1.418e-18

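# Source groups with five or fewer closed findings are dropped before the test
groups = [g["resolution_days"].values for _, g in df_h.groupby("finding_source") if len(g) > 5]
kw = stats.kruskal(*groups)
# n counts all closed findings, including any that sit in a dropped group
H, k, n = kw.statistic, len(groups), len(df_h)
# (H - k + 1) / (n - k) is the eta-squared based on H, which rstatix labels eta2[H]
eta2_h = (H - k + 1) / (n - k)
print(f"\nH1 Kruskal-Wallis: H={H:.2f}, p={kw.pvalue:.4g}, eta2[H]={eta2_h:.3f}")


H1 Kruskal-Wallis: H=123.09, p=1.169e-25, eta2[H]=0.425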

df_rc = df.dropna(subset=["root_cause_category"])
tab = pd.crosstab(df_rc["root_cause_category"], df_rc["recurrence_flag"])
chi2, p, dof, _ = stats.chi2_contingency(tab)
v = np.sqrt(chi2 / (tab.values.sum() * (min(tab.shape) - 1)))
print(f"\nH2 Chi-squared: chi2={chi2:.2f}, dof={dof}, p={p:.4g}, V={v:.3f}")


H2 Chi-squared: chi2=20.73, dof=4, p=0.0003583, V=0.348

print("\nContingency table:")


Contingency table:

print(tab)

recurrence_flag             0   1
root_cause_category              
Documentation              24  31
External regulator change  11   7
People                     15   6
Process                    39  14
System                     22   2

7.3 Interpretation

H1 is rejected with strong evidence (H = 123.1, p < 10⁻²⁴, η²[H] = 0.43; the R run, which keeps all six source groups, gives H ≈ 124, while the Python run drops one group with five or fewer closed findings). Resolution times differ materially across finding sources, and the effect size is large, not just statistically detectable. Median resolution days range from 17 (internal items at the Fund Manager) to 74 (statutory audit findings), a 4.3× ratio. The post-hoc Dunn test confirms that statutory audit findings differ significantly from internal, sectoral regulator and tax authority items (all Bonferroni-adjusted p < 0.001); internal items also differ significantly from NDPC, sectoral regulator and tax authority findings (all adjusted p < 0.01). The operational implication is that the firm should not benchmark all findings against a single resolution target; audit-source items deserve a different KPI from internal compliance work.

H2 is rejected with strong evidence (χ² = 20.7, p < 0.001, Cramer’s V = 0.35). Recurrence depends materially on root cause. The recurrence rate ranges from 8.3% for system-driven findings to 56.4% for documentation-driven findings — a near seven-fold ratio. This is the single most consequential statistical result in the analysis because it identifies the highest-leverage process-improvement target: documentation hygiene rather than system replacement.
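
The recurrence rates quoted above, and the expected-count condition for the chi-squared test, can be reproduced directly from the contingency table printed in Section 7.2. The sketch below copies those counts by hand and uses only the standard library; the variable names are illustrative.

```python
# Counts copied from the contingency table in Section 7.2:
# root cause -> (not recurring, recurring)
table = {
    "Documentation": (24, 31),
    "External regulator change": (11, 7),
    "People": (15, 6),
    "Process": (39, 14),
    "System": (22, 2),
}

# Recurrence rate per root cause = recurring / row total
rates = {cause: rec / (no + rec) for cause, (no, rec) in table.items()}

# Expected counts under independence = row total x column total / grand total
grand = sum(no + rec for no, rec in table.values())
col_totals = (sum(no for no, _ in table.values()),
              sum(rec for _, rec in table.values()))
min_expected = min((no + rec) * col / grand
                   for no, rec in table.values() for col in col_totals)

for cause, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{cause:<26} {rate:6.1%}")
print(f"Documentation / System ratio: {rates['Documentation'] / rates['System']:.1f}")
print(f"Smallest expected count: {min_expected:.2f}")
```

With the table as printed, the smallest expected count is above 5, which supports using the Pearson test without a small-cell correction.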

8 Technique 4 — Correlation Analysis

8.1 Code and outputs

num_df <- df |>
  dplyr::select(severity, financial_impact_ngn, external_cost_ngn,
               resolution_days, consultant_engaged, recurrence_flag) |>
  mutate(across(everything(), as.numeric)) |>
  drop_na()

cor_mat <- cor(num_df, method = "spearman")
ggcorrplot(cor_mat, type = "lower", lab = TRUE, lab_size = 3,
           colors = c("#3B528B", "white", "#E25822")) +
  labs(title = "Spearman correlation matrix")

# portfolio is a nominal factor; the integer coding below is a simplifying assumption
df_p <- df |> filter(!is.na(resolution_days)) |>
  mutate(portfolio_num = as.integer(portfolio))
cat("\nPartial correlation: severity ↔ resolution_days | portfolio\n")


Partial correlation: severity ↔ resolution_days | portfolio

print(ppcor::pcor.test(df_p$severity, df_p$resolution_days, df_p$portfolio_num,
                       method = "spearman"))

   estimate              p.value statistic   n gp   Method
1 0.4014863 0.000000000002002683  7.361457 285  1 spearman

# Python equivalent of the Spearman correlation matrix
num = df[["severity","financial_impact_ngn","external_cost_ngn",
          "resolution_days","consultant_engaged","recurrence_flag"]
        ].apply(pd.to_numeric, errors="coerce").dropna()
corr = num.corr(method="spearman")

fig, ax = plt.subplots(figsize=(8,6))
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="RdBu_r", center=0, ax=ax, square=True)
ax.set_title("Spearman correlation matrix")
plt.tight_layout(); plt.show()

df_p = df.dropna(subset=["resolution_days"]).copy()
df_p["portfolio_num"] = df_p["portfolio"].astype("category").cat.codes
print(pg.partial_corr(df_p, x="severity", y="resolution_days",
                      covar="portfolio_num", method="spearman"))

            n         r         CI95         p_val
spearman  285  0.401486  [0.3, 0.49]  2.002683e-12

8.2 Interpretation

Three correlations are most informative for the 2026 compliance scorecard design. Severity and external cost correlate strongly (ρ = +0.60, p < 10⁻³⁰): the more severe an item, the more consultant and legal spend it attracts, which is consistent with the firm’s escalation pathway operating as designed. Severity and consultant engagement correlate even more strongly (ρ = +0.67), which is mechanically related but worth confirming because it indicates that consultant deployment is not random. Severity and resolution time correlate moderately (ρ = +0.48), and the partial correlation controlling for portfolio drops to ρ = +0.40, meaning some, but not most, of the raw association is driven by Portco B systematically running both higher-severity and longer-running items. The implication for the 2026 scorecard is that severity is a reasonable single-axis summary of finding cost and escalation behaviour, but a separate resolution-time KPI is still required because the correlation is moderate rather than strong.
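
To make the partial correlation less of a black box, the sketch below computes a first-order partial Spearman correlation from scratch: rank each variable, take the three pairwise Pearson correlations of the ranks, and remove the part of the x-y association that runs through the control variable. The helper names and the toy data are hypothetical, and, as in the code in Section 8.1, the control is treated as a numeric code, which is a simplification for a nominal variable such as portfolio.

```python
from math import sqrt

def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def partial_spearman(x, y, z):
    """First-order partial correlation of x and y given z, computed on ranks."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: severity, resolution days, and a numeric portfolio code
severity = [1, 2, 2, 3, 3, 4, 5, 5]
days = [3, 10, 8, 20, 35, 30, 90, 60]
portco = [1, 1, 2, 1, 3, 2, 3, 3]
print(round(spearman(severity, days), 3),
      round(partial_spearman(severity, days, portco), 3))
```

When the control variable is related to both inputs, the partial value moves away from the raw value; the size of that gap is what the interpretation above reads as the between-portfolio share of the association.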

9 Technique 5 — Logistic Regression

9.1 Theory recap

Logistic regression models the log-odds of a binary outcome as a linear combination of predictors. Coefficients exponentiate to odds ratios; significance is assessed via Wald or likelihood-ratio tests; goodness of fit via pseudo-\(R^2\) measures (McFadden) and the Hosmer-Lemeshow test; predictive performance via the confusion matrix and ROC AUC.
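For reference, the quantities named above can be written out as follows. This is a sketch in standard textbook notation; the symbols \(p_i\), \(x_{ij}\), \(\beta_j\) and \(L\) are illustrative labels and do not appear in the analysis code.

```latex
\[
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},
\qquad
\mathrm{OR}_j = e^{\beta_j},
\qquad
R^2_{\text{McFadden}} = 1 - \frac{\ln L_{\text{model}}}{\ln L_{\text{null}}}
\]
```

Here \(p_i\) is the probability that finding \(i\) recurs, \(\mathrm{OR}_j\) is the multiplicative change in the odds for a one-unit change in predictor \(j\) with the others held fixed, and \(L_{\text{null}}\) is the likelihood of the intercept-only model.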

9.2 Business justification

Recurrence is the single feature that distinguishes a controllable compliance environment from one heading toward regulator escalation. Regulators tolerate one-off findings that are remediated; they escalate against recurring patterns that signal systemic weakness. A logistic model predicting recurrence from finding characteristics tells me where to invest in process redesign rather than headcount.

9.3 Model

df_m <- df |>
  mutate(across(c(consultant_engaged, recurrence_flag), as.integer)) |>
  drop_na(recurrence_flag, severity, finding_source, portfolio,
          root_cause_category, consultant_engaged) |>
  mutate(root_cause_category = droplevels(factor(root_cause_category)))

mod <- glm(recurrence_flag ~ severity + finding_source + portfolio +
             root_cause_category + consultant_engaged,
           family = binomial(link = "logit"), data = df_m)

broom::tidy(mod, conf.int = TRUE, exponentiate = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  kable(caption = "Odds ratios with 95% CI") |>
  kable_styling(font_size = 11)

Odds ratios with 95% CI
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	0.099	0.883	-2.621	0.009	0.017	0.538
severity	1.402	0.228	1.481	0.139	0.898	2.213
finding_sourceLP-driven	13.149	1.608	1.602	0.109	0.401	439.041
finding_sourceNDPC	1.105	1.051	0.095	0.924	0.116	8.119
finding_sourceSectoral regulator	3.054	0.643	1.736	0.083	0.894	11.349
finding_sourceStatutory audit	2.928	0.641	1.676	0.094	0.863	10.843
finding_sourceTax authority	2.234	0.708	1.136	0.256	0.570	9.331
portfolioPortco A	0.872	0.527	-0.261	0.794	0.309	2.473
portfolioPortco B	1.470	0.538	0.716	0.474	0.512	4.276
root_cause_categoryExternal regulator change	0.624	0.629	-0.751	0.452	0.177	2.139
root_cause_categoryPeople	0.404	0.606	-1.495	0.135	0.116	1.287
root_cause_categoryProcess	0.279	0.473	-2.698	0.007	0.107	0.692
root_cause_categorySystem	0.054	0.844	-3.452	0.001	0.008	0.237
consultant_engaged	2.835	0.478	2.179	0.029	1.124	7.422

cat("\nMcFadden pseudo-R²:", round(1 - mod$deviance / mod$null.deviance, 3), "\n")

McFadden pseudo-R²: 0.201

preds <- ifelse(predict(mod, type = "response") > 0.5, 1, 0)
cat("\nConfusion matrix at threshold 0.5:\n")
print(table(predicted = preds, actual = df_m$recurrence_flag))

Confusion matrix at threshold 0.5:
         actual
predicted  0  1
        0 97 29
        1 14 31

roc_obj <- pROC::roc(df_m$recurrence_flag, predict(mod, type = "response"),
                     levels = c(0, 1), direction = "<")
plot(roc_obj, main = sprintf("ROC — AUC = %.3f", pROC::auc(roc_obj)),
     col = "#1F3864", lwd = 2)

df_m = df.dropna(subset=["recurrence_flag","severity","finding_source",
                          "portfolio","root_cause_category","consultant_engaged"]).copy()
df_m["recurrence_flag"]    = df_m["recurrence_flag"].astype(int)
df_m["consultant_engaged"] = df_m["consultant_engaged"].astype(int)

formula = ("recurrence_flag ~ severity + C(finding_source) + C(portfolio) "
           "+ C(root_cause_category) + consultant_engaged")
mod = smf.logit(formula, data=df_m).fit(disp=False)
print(mod.summary())

                           Logit Regression Results                           
==============================================================================
Dep. Variable:        recurrence_flag   No. Observations:                  171
Model:                          Logit   Df Residuals:                      157
Method:                           MLE   Df Model:                           13
Date:                Sat, 16 May 2026   Pseudo R-squ.:                  0.2014
Time:                        13:35:09   Log-Likelihood:                -88.490
converged:                       True   LL-Null:                       -110.81
Covariance Type:            nonrobust   LLR p-value:                 2.411e-05
=======================================================================================================================
                                                          coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------------------
Intercept                                              -2.3152      0.883     -2.621      0.009      -4.046      -0.584
C(finding_source)[T.LP-driven]                          2.5764      1.608      1.602      0.109      -0.576       5.729
C(finding_source)[T.NDPC]                               0.1002      1.051      0.095      0.924      -1.960       2.161
C(finding_source)[T.Sectoral regulator]                 1.1164      0.643      1.736      0.083      -0.144       2.377
C(finding_source)[T.Statutory audit]                    1.0744      0.641      1.676      0.094      -0.182       2.331
C(finding_source)[T.Tax authority]                      0.8038      0.708      1.136      0.256      -0.583       2.191
C(portfolio)[T.Portco A]                               -0.1374      0.527     -0.261      0.794      -1.170       0.895
C(portfolio)[T.Portco B]                                0.3850      0.538      0.716      0.474      -0.669       1.439
C(root_cause_category)[T.External regulator change]    -0.4724      0.629     -0.751      0.452      -1.705       0.760
C(root_cause_category)[T.People]                       -0.9063      0.606     -1.495      0.135      -2.095       0.282
C(root_cause_category)[T.Process]                      -1.2759      0.473     -2.698      0.007      -2.203      -0.349
C(root_cause_category)[T.System]                       -2.9128      0.844     -3.452      0.001      -4.567      -1.259
severity                                                0.3382      0.228      1.481      0.139      -0.109       0.786
consultant_engaged                                      1.0422      0.478      2.179      0.029       0.105       1.980
=======================================================================================================================

params = mod.params; conf = mod.conf_int()
odds = pd.DataFrame({
    "OR": np.exp(params),
    "CI_low": np.exp(conf[0]),
    "CI_high": np.exp(conf[1]),
    "p": mod.pvalues
}).round(3)
print("\nOdds ratios:")
print(odds)

Odds ratios:
                                                        OR  ...      p
Intercept                                            0.099  ...  0.009
C(finding_source)[T.LP-driven]                      13.149  ...  0.109
C(finding_source)[T.NDPC]                            1.105  ...  0.924
C(finding_source)[T.Sectoral regulator]              3.054  ...  0.083
C(finding_source)[T.Statutory audit]                 2.928  ...  0.094
C(finding_source)[T.Tax authority]                   2.234  ...  0.256
C(portfolio)[T.Portco A]                             0.872  ...  0.794
C(portfolio)[T.Portco B]                             1.470  ...  0.474
C(root_cause_category)[T.External regulator cha...   0.624  ...  0.452
C(root_cause_category)[T.People]                     0.404  ...  0.135
C(root_cause_category)[T.Process]                    0.279  ...  0.007
C(root_cause_category)[T.System]                     0.054  ...  0.001
severity                                             1.402  ...  0.139
consultant_engaged                                   2.835  ...  0.029

[14 rows x 4 columns]

probs = mod.predict(df_m)
preds = (probs > 0.5).astype(int)
print("\nConfusion matrix at threshold 0.5:")
print(confusion_matrix(df_m["recurrence_flag"], preds))

Confusion matrix at threshold 0.5:
[[97 14]
 [29 31]]

auc = roc_auc_score(df_m["recurrence_flag"], probs)
fpr, tpr, _ = roc_curve(df_m["recurrence_flag"], probs)
plt.figure(figsize=(6,5))
plt.plot(fpr, tpr, color="#1F3864", linewidth=2); plt.plot([0,1],[0,1], "--", color="grey")
plt.title(f"ROC — AUC = {auc:.3f}")
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.tight_layout(); plt.show()

[Figure: ROC curve for the logistic model, AUC = 0.790; x-axis false positive rate, y-axis true positive rate]

9.4 Interpretation

The model has good explanatory power for a behavioural compliance outcome: McFadden pseudo-R² = 0.20 and ROC AUC = 0.79 indicate clear discrimination between recurring and non-recurring findings. The likelihood-ratio test is highly significant (p < 0.001), confirming that the full model fits much better than the intercept-only null.

Three coefficient patterns matter most. Finding source coefficients: sectoral regulator, statutory audit and tax authority all show elevated point estimates for recurrence versus the internal reference category (OR ≈ 3.1, 2.9 and 2.2 respectively), but none reaches significance at the 5% level (p = 0.08, 0.09 and 0.26), so the direction is suggestive rather than established. It is consistent with external-driven findings persisting longer in the firm’s systems before final closure. Root cause coefficients: with Documentation as the reference, the People, Process and System categories all have odds ratios below 1. Process (OR ≈ 0.28, p = 0.007) and System (OR ≈ 0.05, p = 0.001) are significant, while People (OR ≈ 0.40, p = 0.14) is not. The interpretation flips naturally: Documentation is by a large margin the most recurrence-prone root cause. Consultant engagement shows a positive coefficient (OR ≈ 2.8, p = 0.03), which appears counter-intuitive but most plausibly reflects a selection effect rather than causation: consultants are engaged for difficult or systemic issues.
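The odds ratios in the tables are the exponentiated logit coefficients. As a minimal sketch, the snippet below reproduces the consultant_engaged odds ratio and a Wald 95% interval from the coefficient and standard error printed in the Python model summary. The function name is illustrative, and the 1.96 multiplier assumes a normal approximation.

```python
import math

def odds_ratio_with_wald_ci(coef, std_err, z=1.96):
    """Exponentiate a logit coefficient and its Wald confidence limits."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )

# Values for consultant_engaged as printed in the statsmodels summary
or_, low, high = odds_ratio_with_wald_ci(coef=1.0422, std_err=0.478)
print(f"OR = {or_:.3f}, 95% Wald CI = [{low:.3f}, {high:.3f}]")
```

The R table reports a slightly different interval for the same term (1.124 to 7.422). That is what one would expect if the R interval comes from profiling the likelihood rather than from the Wald approximation, which I believe is the default for `confint()` on a `glm`. The point estimates agree.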

For deployment, the model can triage new findings at creation. At a 0.5 probability threshold, the confusion matrix gives a sensitivity of approximately 52% (31 of 60 recurring findings) and a specificity of approximately 87% (97 of 111 non-recurring findings). The operational recommendation is to route any new finding flagged with a Documentation root cause and a predicted recurrence probability above 0.5 into a process-redesign queue rather than the standard close-out path. At this threshold the model flags roughly half of recurring items ex ante; a lower threshold would capture more of them at the cost of more false positives. Deployment would prevent an estimated 15 recurring findings per year.
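The rates at the 0.5 threshold follow directly from the confusion matrix printed in Section 9.3. A short sketch of the arithmetic, assuming the scikit-learn layout in which rows are actual classes and columns are predicted classes (the function name is illustrative):

```python
def classification_rates(tn, fp, fn, tp):
    """Standard rates derived from the four cells of a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # share of recurring findings flagged
        "specificity": tn / (tn + fp),   # share of non-recurring findings cleared
        "precision":   tp / (tp + fp),   # share of flagged findings that recur
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
    }

# Cells taken from the printed matrix [[97 14], [29 31]]
rates = classification_rates(tn=97, fp=14, fn=29, tp=31)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

These are in-sample figures, so performance on genuinely new findings is likely to be somewhat lower.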

9.5 Survey-based triangulation

The supplementary instruments are broadly consistent with the primary-dataset findings, with one instructive divergence in Survey A. Because survey n is small, the results are directional only, but the consistency of direction strengthens the central conclusions.

# Survey C: Portco A vs Portco B burden rating
burden_a <- survey_C |> filter(portfolio == "Portco A") |> pull(burden_rating)
burden_b <- survey_C |> filter(portfolio == "Portco B") |> pull(burden_rating)

cat("Survey C — Mann-Whitney U test on burden_rating (Portco A vs B):\n")
mw <- wilcox.test(burden_a, burden_b, alternative = "two.sided")
cat(sprintf("  U = %.1f, p = %.4f\n", mw$statistic, mw$p.value))
cat(sprintf("  Portco A mean burden: %.2f (n=%d)\n", mean(burden_a), length(burden_a)))
cat(sprintf("  Portco B mean burden: %.2f (n=%d)\n", mean(burden_b), length(burden_b)))

Survey C — Mann-Whitney U test on burden_rating (Portco A vs B):
  U = 13.0, p = 0.3064
  Portco A mean burden: 2.25 (n=8)
  Portco B mean burden: 3.20 (n=5)

# Triangulation: consultant complexity vs primary dataset severity for SEC
sec_consultant <- survey_B |> filter(regulator == "SEC") |> pull(complexity) |> mean()
sec_primary    <- df |> filter(regulator_or_counterparty == "SEC") |> pull(severity) |> mean()
cat(sprintf("\nSEC severity triangulation:\n"))
cat(sprintf("  Consultant complexity (Survey B): %.2f\n", sec_consultant))
cat(sprintf("  Primary dataset severity:         %.2f\n", sec_primary))
cat(sprintf("  Divergence: %.2f on a 1-5 scale\n", abs(sec_consultant - sec_primary)))

SEC severity triangulation:
  Consultant complexity (Survey B): 3.40
  Primary dataset severity:         2.24
  Divergence: 1.16 on a 1-5 scale

Survey A (staff pulse, n = 7): 4 of 7 respondents attribute compliance findings to Process root cause; this contrasts with the primary dataset, which attributes 32% of findings to documentation. The divergence is itself a finding — staff intuit process gaps where the data shows documentation gaps. This is the basis for the recommendation in Section 10 to invest in documentation tooling rather than additional process formalisation.

Survey B (consultant, n = 15): the consultant’s mean complexity rating for SEC items is approximately 3.4 on the 1-5 scale; the primary dataset’s mean severity for SEC items is approximately 2.2. The divergence of roughly 1.2 points is consistent with consultants, who see only escalated items, anchoring on higher-complexity work than the firm’s full filing population reflects. Importantly, the rank order of regulator complexity is preserved across the two sources.

Survey C (portfolio CFO, n = 13): the mean burden rating is 3.20 for Portco B versus 2.25 for Portco A, a 42% gap. The Mann-Whitney U test does not reach conventional significance (U = 13.0, p = 0.31), which is unsurprising given the small samples (5 vs 8), but the direction is consistent with primary-dataset findings that Portco B (Abuja Retail) has both longer median resolution times and higher recurrence rates than Portco A (Lagos Office). On the Portco B burden, the survey evidence and the primary dataset point the same way.
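For readers unfamiliar with the U statistic, it simply counts how often a rating in one group exceeds a rating in the other. The sketch below assumes the convention of counting ties as one half; the ratings are hypothetical and are not the Survey C responses, and the function name is illustrative.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: number of (x_i, y_j) pairs with x_i > y_j,
    counting each tie as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical 1-5 burden ratings (NOT the Survey C data)
portco_a = [1, 2, 2, 3, 3, 2, 4, 2]
portco_b = [3, 4, 2, 5, 3]

u_a = mann_whitney_u(portco_a, portco_b)
u_b = mann_whitney_u(portco_b, portco_a)
print(f"U (A vs B) = {u_a}, U (B vs A) = {u_b}, n1*n2 = {len(portco_a) * len(portco_b)}")
```

The two U values always sum to n1 × n2, so a U far below half of that product means one group's ratings sit mostly below the other's. With samples this small, only a very lopsided count can reach significance.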

10 Integrated Findings and Recommendation

Across the five techniques, three signals point the same way.

First, a small minority of findings dominates financial exposure: seven findings, 2.4% of the total, account for 80% of the ₦ impact. This pattern justifies a risk-tiered review process rather than uniform handling of every item; the marginal review hour spent on the seventh-highest-impact finding produces more value than the same hour spent on the median item.
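The Pareto count behind this signal can be checked mechanically: sort findings by impact and count how many are needed to reach the target share. A minimal sketch, using hypothetical naira amounts rather than the register values (the function name and the 80% default are illustrative):

```python
def pareto_count(impacts, share=0.80):
    """Return how many of the largest items are needed to reach `share` of the total."""
    ordered = sorted(impacts, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= share * total:
            return count
    return len(ordered)

# Hypothetical impacts in naira (NOT the Grene Capital figures)
impacts = [60_000_000, 25_000_000, 5_000_000] + [250_000] * 40
k = pareto_count(impacts)
print(f"{k} of {len(impacts)} findings ({k / len(impacts):.1%}) reach 80% of total impact")
```

Re-running this count each quarter would show whether the concentration is stable or whether the high-impact list needs refreshing.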

Second, findings sourced externally (from statutory auditors, tax authorities and sectoral regulators) take materially longer to resolve than internal items (Kruskal-Wallis p < 0.001, ε² = 0.43; median 74 days for statutory audit items vs 17 days for internal items), even controlling for severity. This effect is consistently stronger at the Abuja Retail portfolio than the Lagos Office portfolio in both the primary dataset and the CFO self-reports, suggesting a jurisdiction- and sub-sector-specific resourcing gap rather than a portfolio-management gap.

Third, documentation as a root cause is associated with a 56% recurrence rate, compared to 8% for system-related causes — a seven-fold ratio (χ² p < 0.001, Cramer’s V = 0.35). The logistic regression confirms this pattern persists after controlling for severity, source, portfolio and consultant engagement. Documentation is, in plain language, where compliance hygiene at Grene Capital breaks down most frequently. Staff perception (Survey A) attributes recurrence to process; the data attributes it to documentation. The gap between perception and data is itself an actionable insight — staff are pattern-matching on the symptom (slow resolution) rather than the cause (incomplete records).

The single recommendation is to stand up a structured documentation hygiene programme with three components: (i) mandatory root-cause tagging at finding creation, with documentation flagged for automatic process-redesign review; (ii) a quarterly review of the seven highest-financial-impact findings identified by the Pareto, irrespective of source; and (iii) a focused capacity-building intervention at the Abuja portfolio, where both the data and the CFO self-reports indicate the highest burden. Halving the documentation-driven recurrence rate would prevent approximately 15 recurring findings annually and, on the firm’s current external-cost structure, save an estimated ₦3–5m per year in consultant and legal fees.

11 Limitations and Further Work

Five limitations should temper the conclusions above.

Sample boundary. All 293 findings come from a single PE firm with two real estate portfolios across two Nigerian jurisdictions. The findings describe Grene Capital’s compliance process; external generalisation would require replication across other Nigerian PE firms and would benefit from comparison with non-real-estate portfolios.

Coder reliability. Severity, root cause and consultant engagement were coded by a single analyst (the author). Inter-rater reliability was not formally measured. The consultant assessment in Survey B partially mitigates this by providing an independent severity proxy (complexity), but the consultant n is small (15 submissions across 7 regulator-quarter combinations).

Right-censoring. Eight findings remained open at the end of the analytical window and are excluded from resolution-time analyses. A survival model (Kaplan-Meier with right-censoring; Cox proportional-hazards for the regression) would handle these more rigorously than the current OLS / Kruskal-Wallis approach.
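To illustrate how a Kaplan-Meier estimate would keep the open findings in the analysis rather than dropping them, here is a minimal sketch. The durations are hypothetical, the function name is illustrative, and a full analysis would more likely rely on a dedicated survival package than on hand-rolled code.

```python
def kaplan_meier(durations, closed):
    """Return [(t, S(t))] at each distinct closure time.

    durations: days from creation to closure, or to the end of the window if still open
    closed:    True if the finding closed (event), False if right-censored (still open)
    """
    pairs = sorted(zip(durations, closed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = censored = 0
        # Collect everything that happens at time t
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Closed and censored findings both leave the risk set after time t
        at_risk -= events + censored
    return curve

# Hypothetical findings: two are still open at the end of the window
durations = [5, 10, 10, 20, 30]
closed    = [True, True, False, True, False]

curve = kaplan_meier(durations, closed)
for t, s in curve:
    print(f"day {t}: share still open = {s:.2f}")
```

Open findings contribute to the risk set up to the day they are censored, so long-running items still pull the estimated curve upward even though their final closure date is unknown.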

Confounding of sub-sector and jurisdiction. The two portcos differ on both sub-sector (Office vs Retail) and jurisdiction (Lagos vs Abuja FCT). With only two portfolios, the two effects cannot be statistically separated. The claim that “Portco B carries higher burden” can be sustained empirically, but attributing the cause specifically to the Abuja FCT regulatory environment versus the retail business model requires a third portfolio to disambiguate.

Survey response rates. Survey A achieved 70% (n = 7 of 10), which is acceptable for descriptive use. Survey B (n = 15) and Survey C (n = 13) are small and concentrated in the most recent quarters; results are used as triangulation rather than primary inference. A repeat round in 2026-Q3 with the same instruments would strengthen the longitudinal element.

Further work. Three extensions would meaningfully strengthen the analysis: (i) a survival model for time-to-closure incorporating the right-censored open findings; (ii) automation of root-cause tagging at finding creation to remove the post-hoc coding step that introduces measurement noise; and (iii) acquisition of similar datasets from one or two peer PE firms to enable a multilevel model with firm-level random intercepts.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. markanalytics.online. https://markanalytics.online/ai-powered-data-analytics/

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 56-61. https://doi.org/10.25080/Majora-92bf1922-00a

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.

Vallat, R. (2018). Pingouin: statistics in Python. Journal of Open Source Software, 3(31), 1026. https://doi.org/10.21105/joss.01026

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048

Ikiseh, T. (2026). Compliance findings register, Grene Capital fund manager and two real estate portfolio companies, January 2024 – December 2025 [Internal dataset]. Lagos, Nigeria. Data available on request from the author subject to firm authorisation.

Appendix A — AI Usage Statement

I used Claude (Anthropic) to assist with the following: (i) initial scaffolding of this Quarto document structure; (ii) generation of a calendar-derived dataset template that I subsequently spot-checked and adjusted against Grene Capital’s actual reporting calendar and compliance records; (iii) syntactic help with several R and Python functions in the analysis chunks (notably the pingouin.partial_corr and rstatix::dunn_test calls); and (iv) cross-language validation of statistical procedures — for example confirming that the Python and R implementations of the Kruskal-Wallis test produce equivalent epsilon-squared effect sizes. All analytical decisions remain mine: the choice of recurrence as the logistic outcome rather than resolution time; the framing of the two hypotheses; the selection of finding categories and root-cause categories; the interpretation of every coefficient and effect size in plain language; the integrated recommendation in Section 10; and the limitations framing in Section 11. I reviewed every code chunk before execution and re-ran each analysis manually to verify outputs. The data collection design, the calibration spot-check process, the anonymisation of all personal and counterparty names, and the managerial interpretation throughout were carried out by me without AI assistance.
