What Drives Investor Portfolio Value and Churn Risk? An Exploratory and Inferential Analysis of Client Characteristics at Coronation Asset Management

Author

Kamsiyonna Osakwe

Published

May 11, 2026

GitHub Repository: <https://github.com/kamsyosakwe/Data-Analytics-on-Coronation-Asset-Management>

1. Executive Summary

This study applies five foundational analytical techniques to 105 investor records collected from Coronation Asset Management’s client portfolio system over a 12-month period ending March 2026. As a Financial Adviser, understanding which client characteristics drive portfolio value and which predict early exit is central to my day-to-day work. The analysis reveals that AUM is strongly associated with monthly contribution behaviour (Pearson r = 0.72) and only weakly with tenure (r = 0.21), that Aggressive investors hold significantly larger portfolios than Conservative investors (Welch t-test, p < 0.001, Cohen’s d ≈ 1.0), and that churn risk is meaningfully predicted by a combination of short tenure and low AUM. The chi-squared test finds no significant association between risk profile and churn on its own (p = 0.093). The logistic regression identifies tenure and AUM as the two most statistically significant predictors of churn, with a Moderate risk profile also associated with higher churn odds once those factors are controlled for. The key recommendation is that Coronation Asset Management should prioritise early relationship investment in newly onboarded clients, particularly in the first four months, as this is the highest-risk window for client exit.

2. Professional Disclosure

Full Name: Kamsiyonna Osakwe

Job Title: Financial Adviser

Organisation: Coronation Asset Management, Lagos, Nigeria

Organisation Type: Coronation Asset Management is one of Nigeria’s leading asset management firms, offering a range of investment products including mutual funds, fixed income portfolios, and discretionary wealth management services to retail and institutional investors. The firm operates under the regulatory oversight of the Securities and Exchange Commission (SEC) Nigeria.

Technique Justifications:

Exploratory Data Analysis (EDA): As a Financial Adviser, I work with client data every day — reviewing portfolio sizes, transaction histories, and risk classifications. EDA is the first step I take when assessing a new client book or preparing for a quarterly review. Understanding the distribution of AUM across the client base, identifying outliers, and spotting data quality issues are all tasks directly relevant to my role.
Data Visualisation: Presenting data visually to clients and to senior management is a core part of my job. I regularly prepare charts and dashboards that communicate portfolio performance and client segment behaviour. This technique directly supports my ability to tell a clear, compelling story from data to a non-technical audience.
Hypothesis Testing: Investment decisions at Coronation Asset Management often hinge on whether observed differences between client groups are real or merely due to chance. For example, understanding whether conservative investors genuinely hold lower AUM than aggressive investors — and whether that difference is statistically significant — informs how we design product offerings and target communications for each risk segment.
Correlation Analysis: Understanding which client characteristics move together is critical for identifying cross-sell opportunities and retention risks. If tenure and AUM are strongly correlated, then retaining clients longer directly grows the firm’s assets under management — a finding with direct revenue implications for the business.
Logistic Regression: Predicting which clients are likely to churn allows me to prioritise my outreach calendar. A regression model that identifies the key drivers of churn gives me an evidence-based framework for deciding which clients to call first, rather than relying on intuition or recency alone.

3. Data Collection & Sampling

Source: Client records were extracted from Coronation Asset Management’s internal CRM and portfolio management system for the 12-month observation window from April 2025 to March 2026. All personally identifiable information — names, account numbers, contact details — was removed before analysis. Clients are identified only by anonymised codes (C0001–C0105).

Collection Method: Direct export from the portfolio management database. The following fields were extracted: Client_ID (anonymised), Onboarding_Date, Age, Tenure_Months (months since onboarding as at March 2026), AUM_NGN (total assets under management in Nigerian Naira), Monthly_Contribution_NGN, Risk_Profile (Conservative / Moderate / Aggressive), and Churn (1 = client exited within the observation window, 0 = client retained).

Sampling Frame: All investor accounts onboarded between April 2025 and March 2026. This cohort was selected because it represents the most recent full year of client acquisition activity and allows a clean observation of early-tenure churn behaviour — the period most critical for relationship management.

Sample Size: 105 observations, exceeding the minimum 100-observation requirement. The dataset contains 7 analytical variables (excluding Client_ID): one date variable (Onboarding_Date), four numeric variables (Age, Tenure_Months, AUM_NGN, Monthly_Contribution_NGN), one categorical variable (Risk_Profile), and one binary outcome variable (Churn), satisfying the minimum variable requirements of the assessment brief.

Time Period: April 2025 to March 2026 (12 months).

Ethical Notes: The dataset was handled in accordance with Coronation Asset Management’s internal data governance policy. No client names, account numbers, or contact details are included in this submission. Institutional approval for use of anonymised client data for internal analytical purposes was obtained from the firm’s Compliance department. The dataset is cited as: Osakwe, K. (2026). Investor client dataset — Coronation Asset Management [Dataset]. Collected from Client Portfolio Management Division, Coronation Asset Management, Lagos, Nigeria. Data available on request from the author.

4. Data Description & Exploratory Data Analysis

Theory: Exploratory Data Analysis (EDA) is the practice of summarising and visualising a dataset before formal modelling. Key tasks include computing summary statistics, identifying missing values, detecting outliers, and understanding the shape of variable distributions. As Anscombe’s Quartet famously demonstrated, datasets with identical summary statistics can have dramatically different underlying structures — making visual exploration indispensable before any formal analysis.
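
The point about Anscombe’s Quartet can be checked directly, because the quartet ships with base R as the `anscombe` data frame. The sketch below is illustrative only and is not part of the Coronation analysis: it compares the means, variances, and correlations of the four x-y pairs.

```r
# Anscombe's Quartet is built into base R (datasets package):
# columns x1..x4 and y1..y4 hold the four x-y pairs
data(anscombe)

quartet <- data.frame(
  set    = 1:4,
  mean_x = sapply(1:4, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  var_y  = sapply(1:4, function(i) var(anscombe[[paste0("y", i)]])),
  r_xy   = sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]],
                                       anscombe[[paste0("y", i)]]))
)

# The four rows of summary statistics are nearly identical
print(round(quartet, 2))
```

Plotting each pair, for example with `plot(anscombe$x1, anscombe$y1)`, shows how different the four datasets are despite the matching summaries, which is why the charts in this section accompany the summary tables.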

Business Justification: Before drawing any conclusions about client behaviour, I need to understand what the data actually contains — how AUM is distributed across the book, whether there are data quality issues, and whether any variables behave unexpectedly. This mirrors the due diligence process I follow before any client portfolio review at Coronation.

Code

pkgs <- c("tidyverse", "readxl", "ggcorrplot", "gridExtra",
          "scales", "viridis", "moments", "car")

for (p in pkgs) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p, repos = "https://cloud.r-project.org", quiet = TRUE)
  }
}

library(tidyverse)
library(readxl)
library(ggcorrplot)
library(gridExtra)
library(scales)
library(viridis)
library(moments)
library(car)

Code

df <- read_excel("coronation_investor_data.xlsx",
                 sheet = "Investor_Churn_Data")

df <- df %>%
  mutate(
    Onboarding_Date = as.Date(Onboarding_Date),
    Churn           = factor(Churn, levels = c(0, 1),
                             labels = c("Retained", "Churned")),
    Risk_Profile    = factor(Risk_Profile,
                             levels = c("Conservative", "Moderate", "Aggressive"))
  )

glimpse(df)

Rows: 105
Columns: 8
$ Client_ID                <chr> "C0001", "C0002", "C0003", "C0004", "C0005", …
$ Onboarding_Date          <date> 2026-02-22, 2025-05-28, 2025-04-13, 2025-08-…
$ Age                      <dbl> 63, 53, 39, 32, 45, 63, 43, 47, 35, 35, 48, 6…
$ Tenure_Months            <dbl> 1, 10, 12, 7, 8, 8, 10, 10, 1, 3, 11, 2, 5, 1…
$ AUM_NGN                  <dbl> 500000, 5086000, 6409000, 1759000, 3566000, 4…
$ Monthly_Contribution_NGN <dbl> 10000, 32000, 63000, 10000, 14000, 17000, 160…
$ Risk_Profile             <fct> Moderate, Moderate, Aggressive, Moderate, Con…
$ Churn                    <fct> Churned, Retained, Retained, Retained, Retain…

Code

summary(df %>% select(-Client_ID))

 Onboarding_Date           Age        Tenure_Months       AUM_NGN       
 Min.   :2025-04-04   Min.   :25.00   Min.   : 1.000   Min.   : 500000  
 1st Qu.:2025-06-23   1st Qu.:33.00   1st Qu.: 3.000   1st Qu.:2273000  
 Median :2025-09-14   Median :45.00   Median : 7.000   Median :3398000  
 Mean   :2025-09-25   Mean   :44.43   Mean   : 6.162   Mean   :3322819  
 3rd Qu.:2026-01-08   3rd Qu.:53.00   3rd Qu.: 9.000   3rd Qu.:4571000  
 Max.   :2026-03-27   Max.   :65.00   Max.   :12.000   Max.   :6964000  
 Monthly_Contribution_NGN       Risk_Profile      Churn   
 Min.   :10000            Conservative:42    Retained:79  
 1st Qu.:14000            Moderate    :38    Churned :26  
 Median :22000            Aggressive  :25                 
 Mean   :25276                                            
 3rd Qu.:34000                                            
 Max.   :63000

Code

cat("=== Missing Values Per Column ===\n")

=== Missing Values Per Column ===

Code

print(colSums(is.na(df)))

               Client_ID          Onboarding_Date                      Age 
                       0                        0                        0 
           Tenure_Months                  AUM_NGN Monthly_Contribution_NGN 
                       0                        0                        0 
            Risk_Profile                    Churn 
                       0                        0

Code

cat("\n=== Skewness of Numeric Variables ===\n")


=== Skewness of Numeric Variables ===

Code

df %>%
  select(Age, Tenure_Months, AUM_NGN, Monthly_Contribution_NGN) %>%
  summarise(across(everything(), ~round(skewness(.), 3))) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Skewness") %>%
  print()

# A tibble: 4 × 2
  Variable                 Skewness
  <chr>                       <dbl>
1 Age                         0.084
2 Tenure_Months              -0.122
3 AUM_NGN                    -0.033
4 Monthly_Contribution_NGN    0.764

Code

df %>%
  mutate(Month = floor_date(Onboarding_Date, "month")) %>%
  count(Month) %>%
  ggplot(aes(Month, n)) +
  geom_col(fill = "#1F4E79", width = 20) +
  geom_text(aes(label = n), vjust = -0.5, size = 3.5, fontface = "bold") +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  scale_y_continuous(limits = c(0, 20)) +
  labs(title = "Monthly Client Onboarding — April 2025 to March 2026",
       subtitle = "Number of new investor accounts opened per month",
       x = NULL, y = "New Clients") +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Code

df %>%
  select(Age, Tenure_Months, AUM_NGN, Monthly_Contribution_NGN) %>%
  mutate(AUM_NGN                  = AUM_NGN / 1e6,
         Monthly_Contribution_NGN = Monthly_Contribution_NGN / 1e3) %>%
  rename(`AUM (₦M)`           = AUM_NGN,
         `Contribution (₦k)`  = Monthly_Contribution_NGN) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Value") %>%
  ggplot(aes(Variable, Value, fill = Variable)) +
  geom_boxplot(show.legend = FALSE, outlier.colour = "red", outlier.size = 2) +
  scale_fill_viridis_d(option = "D") +
  facet_wrap(~Variable, scales = "free") +
  labs(title = "Boxplots — Outlier Detection Across All Numeric Variables",
       subtitle = "Red dots indicate potential outliers beyond 1.5 × IQR",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 12)

Data Quality Findings:

No missing values: the dataset is complete across all 105 records and 8 columns, reflecting disciplined data entry standards at Coronation Asset Management.
AUM is approximately symmetric: its skewness is −0.033 and the mean (₦3.32M) sits just below the median (₦3.40M). Monthly contribution is the only clearly right-skewed variable (skewness 0.764), with a mean of ₦25,276 above the median of ₦22,000, meaning a minority of clients contribute well above the typical amount.
No values fall beyond the 1.5 × IQR fences for AUM or Monthly Contribution: using the quartiles from summary(), the upper fences are about ₦8.02M and ₦64,000, above the observed maxima of ₦6.96M and ₦63,000. All observations are therefore retained in the analysis.
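
The 1.5 × IQR rule behind the boxplots can be checked by hand from the quartiles printed by summary(). The sketch below is a minimal check, assuming the boxplot hinges coincide with those quartiles; the data frame `q` and its column names are introduced here for illustration only.

```r
# Quartiles, minima and maxima copied from the summary() output above
q <- data.frame(
  variable = c("AUM_NGN", "Monthly_Contribution_NGN"),
  min      = c(500000, 10000),
  q1       = c(2273000, 14000),
  q3       = c(4571000, 34000),
  max      = c(6964000, 63000)
)

# Tukey fences: observations beyond q1 - 1.5*IQR or q3 + 1.5*IQR are flagged
q$iqr         <- q$q3 - q$q1
q$lower_fence <- q$q1 - 1.5 * q$iqr
q$upper_fence <- q$q3 + 1.5 * q$iqr
q$any_outlier <- q$min < q$lower_fence | q$max > q$upper_fence

print(q[, c("variable", "lower_fence", "upper_fence", "max", "any_outlier")])
```

On these figures the observed range of both variables sits inside the fences, so the 1.5 × IQR rule flags no observations.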

5. Data Visualisation

Theory: Visualisation translates raw numbers into patterns the human eye can interpret. The grammar of graphics, as implemented in R’s ggplot2 package, provides a principled framework for chart construction: every plot maps data variables to aesthetic properties — position, colour, size, shape — through a defined geometric object such as bars, points, or density curves (Wickham, 2016).

Business Justification: At Coronation Asset Management, I regularly prepare client-facing and management-facing charts. This section demonstrates the ability to select the right chart type for each data question and arrange multiple charts into a coherent narrative about the current client book.

Code

df %>%
  count(Churn) %>%
  mutate(Pct   = round(100 * n / sum(n), 1),
         Label = paste0(n, "\n(", Pct, "%)")) %>%
  ggplot(aes(Churn, n, fill = Churn)) +
  geom_col(width = 0.5, show.legend = FALSE) +
  geom_text(aes(label = Label), vjust = -0.3, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Retained" = "#4CAF50", "Churned" = "#F44336")) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Chart 1: Client Retention vs Churn",
       subtitle = "Coronation Asset Management — 105 clients, April 2025 to March 2026",
       x = NULL, y = "Number of Clients") +
  theme_minimal(base_size = 13)

Code

ggplot(df, aes(Risk_Profile, AUM_NGN / 1e6, fill = Risk_Profile)) +
  geom_violin(alpha = 0.6, show.legend = FALSE) +
  geom_boxplot(width = 0.15, fill = "white",
               outlier.colour = "red", outlier.size = 2,
               show.legend = FALSE) +
  scale_fill_manual(values = c("Conservative" = "#2196F3",
                               "Moderate"     = "#FF9800",
                               "Aggressive"   = "#F44336")) +
  scale_y_continuous(labels = label_number(suffix = "M")) +
  labs(title = "Chart 2: AUM Distribution by Risk Profile",
       subtitle = "Violin plot shows full distribution; boxplot shows median and IQR",
       x = "Risk Profile", y = "AUM (₦ Millions)") +
  theme_minimal(base_size = 13)

Code

ggplot(df, aes(Tenure_Months, AUM_NGN / 1e6,
               colour = Churn,
               size   = Monthly_Contribution_NGN / 1e3)) +
  geom_point(alpha = 0.65) +
  geom_smooth(method = "lm", se = TRUE,
              show.legend = FALSE, linewidth = 1) +
  scale_colour_manual(values = c("Retained" = "#4CAF50",
                                 "Churned"  = "#F44336")) +
  scale_y_continuous(labels = label_number(suffix = "M")) +
  scale_size_continuous(name = "Monthly\nContribution (₦k)") +
  labs(title = "Chart 3: Tenure vs AUM — Coloured by Churn Status",
       subtitle = "Point size = monthly contribution; trend lines fitted per group",
       x = "Tenure (Months)", y = "AUM (₦ Millions)",
       colour = "Churn Status") +
  theme_minimal(base_size = 13)

Code

df %>%
  group_by(Risk_Profile) %>%
  summarise(Total      = n(),
            Churned    = sum(Churn == "Churned"),
            Churn_Rate = round(100 * Churned / Total, 1)) %>%
  ggplot(aes(Risk_Profile, Churn_Rate, fill = Risk_Profile)) +
  geom_col(width = 0.5, show.legend = FALSE) +
  geom_text(aes(label = paste0(Churn_Rate, "%")),
            vjust = -0.5, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Conservative" = "#2196F3",
                               "Moderate"     = "#FF9800",
                               "Aggressive"   = "#F44336")) +
  scale_y_continuous(limits = c(0, 50)) +
  labs(title = "Chart 4: Churn Rate by Risk Profile",
       subtitle = "Which investor risk segment exits most frequently?",
       x = "Risk Profile", y = "Churn Rate (%)") +
  theme_minimal(base_size = 13)

Code

ggplot(df, aes(Tenure_Months, fill = Churn)) +
  geom_density(alpha = 0.55) +
  scale_fill_manual(values = c("Retained" = "#4CAF50",
                               "Churned"  = "#F44336")) +
  geom_vline(xintercept = 4, linetype = "dashed", colour = "grey30") +
  annotate("text", x = 4.3, y = 0.22,
           label = "4-month\nthreshold",
           hjust = 0, colour = "grey30", size = 3.5) +
  labs(title = "Chart 5: Tenure Distribution — Retained vs Churned Clients",
       subtitle = "Churned clients are concentrated in the earliest tenure months",
       x = "Tenure (Months)", y = "Density",
       fill = "Churn Status") +
  theme_minimal(base_size = 13)

Visualisation Narrative: The five charts together tell a single story: the earliest months of a client relationship are the highest-risk period for exit. Chart 1 establishes the baseline churn rate for the cohort (26 of 105 clients, or 24.8%). Chart 2 shows that aggressive investors tend to hold higher AUM, while conservative investors show more spread. Chart 3 reveals a positive relationship between tenure and AUM, with churned clients clustering in the low-tenure, low-AUM corner. Chart 4 shows that Moderate investors churn at roughly twice the rate of the other two segments (36.8% versus 19.0% for Conservative and 16.0% for Aggressive), although Section 6 tests whether this gap is statistically significant. Chart 5 confirms that churned clients exit predominantly within the first four months, making early intervention the single most important retention lever available to advisers.

6. Hypothesis Testing

Theory: Hypothesis testing determines whether observed differences between groups are statistically significant or merely the result of sampling variation. We state a null hypothesis (H₀: no difference) and an alternative hypothesis (H₁: a difference exists), compute a test statistic and p-value, and reject H₀ when p < 0.05. Effect sizes — Cohen’s d for continuous outcomes, Cramér’s V for categorical — quantify the practical magnitude of the difference independently of sample size.

Business Justification: Before redesigning products or communication strategies for different client segments, Coronation Asset Management needs to know whether observed differences between groups are statistically real. These tests provide the statistical foundation for evidence-based segment strategy.

Hypothesis 1: Do Conservative and Aggressive Investors Differ Significantly in AUM?

H₀: There is no significant difference in mean AUM between Conservative and Aggressive investors.

H₁: Aggressive investors hold significantly higher AUM than Conservative investors.

Code

conservative_aum <- df %>%
  filter(Risk_Profile == "Conservative") %>% pull(AUM_NGN)

aggressive_aum <- df %>%
  filter(Risk_Profile == "Aggressive") %>% pull(AUM_NGN)

cat("=== Descriptive Statistics ===\n")

=== Descriptive Statistics ===

Code

cat("Conservative — Mean AUM: ₦", format(round(mean(conservative_aum)), big.mark=","),
    " | SD: ₦", format(round(sd(conservative_aum)), big.mark=","), "\n")

Conservative — Mean AUM: ₦ 2,751,786  | SD: ₦ 1,415,618

Code

cat("Aggressive   — Mean AUM: ₦", format(round(mean(aggressive_aum)),   big.mark=","),
    " | SD: ₦", format(round(sd(aggressive_aum)),   big.mark=","), "\n\n")

Aggressive   — Mean AUM: ₦ 4,106,600  | SD: ₦ 1,264,011

Code

cat("=== Shapiro-Wilk Normality Test ===\n")

=== Shapiro-Wilk Normality Test ===

Code

cat("Conservative p-value:", round(shapiro.test(conservative_aum)$p.value, 4), "\n")

Conservative p-value: 0.1229

Code

cat("Aggressive   p-value:", round(shapiro.test(aggressive_aum)$p.value,   4), "\n\n")

Aggressive   p-value: 0.3701

Code

t_result <- t.test(aggressive_aum, conservative_aum, alternative = "greater")
cat("=== Welch Two-Sample t-Test ===\n")

=== Welch Two-Sample t-Test ===

Code

print(t_result)


    Welch Two Sample t-test

data:  aggressive_aum and conservative_aum
t = 4.0551, df = 55.202, p-value = 7.934e-05
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 795889.1      Inf
sample estimates:
mean of x mean of y 
  4106600   2751786

Code

pooled_sd <- sqrt((var(aggressive_aum) + var(conservative_aum)) / 2)
cohens_d  <- (mean(aggressive_aum) - mean(conservative_aum)) / pooled_sd
cat("\nCohen's d:", round(cohens_d, 3))


Cohen's d: 1.01

Code

cat(" —", case_when(
  abs(cohens_d) < 0.2 ~ "Negligible effect",
  abs(cohens_d) < 0.5 ~ "Small effect",
  abs(cohens_d) < 0.8 ~ "Medium effect",
  TRUE                 ~ "Large effect"), "\n")

 — Large effect

Code

df %>%
  filter(Risk_Profile %in% c("Conservative", "Aggressive")) %>%
  ggplot(aes(Risk_Profile, AUM_NGN / 1e6, fill = Risk_Profile)) +
  geom_boxplot(width = 0.4, show.legend = FALSE,
               outlier.colour = "red", outlier.size = 2) +
  scale_fill_manual(values = c("Conservative" = "#2196F3",
                               "Aggressive"   = "#F44336")) +
  scale_y_continuous(labels = label_number(suffix = "M")) +
  labs(title = "Hypothesis 1: AUM — Conservative vs Aggressive Investors",
       subtitle = "Welch t-test assesses whether the difference in means is statistically significant",
       x = "Risk Profile", y = "AUM (₦ Millions)") +
  theme_minimal(base_size = 13)

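Plain-Language Interpretation: The one-sided Welch t-test gives t = 4.06 with p ≈ 0.00008, well below 0.05, so we reject H₀: a gap this large would be very unlikely if the two groups had the same mean AUM. Aggressive investors hold about ₦1.35M more on average (₦4.11M versus ₦2.75M), and Cohen’s d = 1.01 indicates that the difference is large in practical terms. The Shapiro-Wilk p-values (0.12 and 0.37) give no evidence against normality in either group, which supports the use of a t-test. This directly informs whether Coronation should design different minimum investment thresholds and product tiers for each risk segment.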
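
To make the mechanics of the test transparent, the Welch statistic, its degrees of freedom, and Cohen’s d can be reproduced from the group summaries alone. The sketch below assumes the means and standard deviations printed above and the group sizes from the summary() output (25 Aggressive, 42 Conservative); because the printed SDs are rounded, results may differ from t.test() in the last decimal place.

```r
# Group summaries copied from the output above
m_agg <- 4106600; s_agg <- 1264011; n_agg <- 25   # Aggressive
m_con <- 2751786; s_con <- 1415618; n_con <- 42   # Conservative

# Squared standard errors of each group mean
se2_agg <- s_agg^2 / n_agg
se2_con <- s_con^2 / n_con

# Welch t statistic: unequal variances, so the SEs are not pooled
t_stat <- (m_agg - m_con) / sqrt(se2_agg + se2_con)

# Welch-Satterthwaite degrees of freedom
df_w <- (se2_agg + se2_con)^2 /
  (se2_agg^2 / (n_agg - 1) + se2_con^2 / (n_con - 1))

# One-sided p-value for H1: Aggressive mean > Conservative mean
p_one_sided <- pt(t_stat, df = df_w, lower.tail = FALSE)

# Cohen's d using the simple average of the two variances, as in the report
cohens_d <- (m_agg - m_con) / sqrt((s_agg^2 + s_con^2) / 2)

cat("t =", round(t_stat, 3), "| df =", round(df_w, 1),
    "| p =", signif(p_one_sided, 3), "| d =", round(cohens_d, 2), "\n")
```

Averaging the two variances equally is a simplification when group sizes differ; a pooled SD weighted by degrees of freedom is a common alternative and would give a slightly smaller d here.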

Hypothesis 2: Is Churn Rate Significantly Different Across Risk Profile Groups?

H₀: Churn rate is independent of risk profile — there is no association between the two.

H₁: Churn rate differs significantly across Conservative, Moderate, and Aggressive investors.

Code

churn_table <- table(df$Risk_Profile, df$Churn)
cat("=== Contingency Table: Risk Profile vs Churn ===\n")

=== Contingency Table: Risk Profile vs Churn ===

Code

print(churn_table)

              
               Retained Churned
  Conservative       34       8
  Moderate           24      14
  Aggressive         21       4

Code

cat("\n")

Code

chi_result <- chisq.test(churn_table)
cat("=== Chi-Squared Test of Independence ===\n")

=== Chi-Squared Test of Independence ===

Code

print(chi_result)


    Pearson's Chi-squared test

data:  churn_table
X-squared = 4.7428, df = 2, p-value = 0.09335

Code

n         <- sum(churn_table)
cramers_v <- sqrt(chi_result$statistic / (n * (min(dim(churn_table)) - 1)))
cat("\nCramér's V:", round(cramers_v, 3))


Cramér's V: 0.213

Code

cat(" —", case_when(
  cramers_v < 0.1 ~ "Negligible association",
  cramers_v < 0.3 ~ "Small association",
  cramers_v < 0.5 ~ "Medium association",
  TRUE             ~ "Large association"), "\n")

 — Small association

Code

df %>%
  count(Risk_Profile, Churn) %>%
  group_by(Risk_Profile) %>%
  mutate(Pct = round(100 * n / sum(n), 1)) %>%
  ggplot(aes(Risk_Profile, Pct, fill = Churn)) +
  geom_col(position = "dodge", width = 0.6) +
  geom_text(aes(label = paste0(Pct, "%")),
            position = position_dodge(width = 0.6),
            vjust = -0.4, size = 3.8, fontface = "bold") +
  scale_fill_manual(values = c("Retained" = "#4CAF50",
                               "Churned"  = "#F44336")) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Hypothesis 2: Churn Rate by Risk Profile",
       subtitle = "Chi-squared test assesses whether churn is independent of risk profile",
       x = "Risk Profile", y = "Percentage (%)",
       fill = "Churn Status") +
  theme_minimal(base_size = 13)

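Plain-Language Interpretation: The chi-squared test gives χ² = 4.74 (df = 2, p = 0.093), so at the 5% level we fail to reject H₀: the observed differences in churn rate (19.0% Conservative, 36.8% Moderate, 16.0% Aggressive) could plausibly arise from sampling variation in a cohort of this size. Cramér’s V = 0.21 indicates a small association. On this evidence, risk profile alone is not a reliable lens for retention segmentation, although the elevated Moderate churn rate is worth monitoring; the logistic regression in Section 8 revisits risk profile after adjusting for tenure and AUM.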
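
The test can be reproduced from the contingency table alone, which also makes it easy to inspect the expected counts and see which cell drives the statistic. This is a minimal sketch that re-enters the six counts printed above; the object name `churn_counts` is introduced here for illustration.

```r
# Counts copied from the contingency table above
churn_counts <- matrix(
  c(34,  8,
    24, 14,
    21,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(Risk_Profile = c("Conservative", "Moderate", "Aggressive"),
                  Churn        = c("Retained", "Churned"))
)

chi <- chisq.test(churn_counts)

# Expected counts under independence (all should exceed 5 for the test to be reliable)
print(round(chi$expected, 1))

# Standardised residuals: values near or beyond +/-2 mark the cells that depart most
print(round(chi$stdres, 2))

# Row-wise churn rates and Cramer's V
churn_rate <- round(100 * churn_counts[, "Churned"] / rowSums(churn_counts), 1)
cramers_v  <- sqrt(unname(chi$statistic) /
                     (sum(churn_counts) * (min(dim(churn_counts)) - 1)))

print(churn_rate)
cat("X-squared =", round(unname(chi$statistic), 4),
    "| p =", round(chi$p.value, 5),
    "| Cramer's V =", round(cramers_v, 3), "\n")
```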

7. Correlation Analysis

Theory: Correlation analysis quantifies the strength and direction of association between pairs of numeric variables. Pearson’s r measures linear association, and its significance tests assume approximate normality; Spearman’s ρ is rank-based, captures any monotonic relationship, and is more robust to the skewness and outliers common in financial data. Values range from −1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear (Pearson) or monotonic (Spearman) relationship. Correlation does not imply causation, so a separate discussion of plausible causal mechanisms is essential.

Business Justification: Understanding which client characteristics move together is central to portfolio growth strategy at Coronation. If tenure and AUM are strongly correlated, retaining clients longer directly grows the firm’s AUM — with direct fee income implications. If monthly contribution correlates with AUM, early contribution behaviour can be used as a leading indicator of future portfolio size.

Code

num_df <- df %>%
  select(Age, Tenure_Months, AUM_NGN, Monthly_Contribution_NGN)

pearson_cor <- cor(num_df, method = "pearson")

ggcorrplot(pearson_cor,
           method   = "circle",
           type     = "lower",
           lab      = TRUE,
           lab_size = 4,
           colors   = c("#F44336", "white", "#4CAF50"),
           title    = "Pearson Correlation Matrix — Numeric Client Variables") +
  theme_minimal(base_size = 12)

Code

spearman_cor <- cor(num_df, method = "spearman")

ggcorrplot(spearman_cor,
           method   = "circle",
           type     = "lower",
           lab      = TRUE,
           lab_size = 4,
           colors   = c("#F44336", "white", "#2196F3"),
           title    = "Spearman Correlation Matrix — Robust to Skewness and Outliers") +
  theme_minimal(base_size = 12)

Code

cor_pairs <- as.data.frame(as.table(pearson_cor)) %>%
  filter(Var1 != Var2) %>%
  mutate(Freq = round(Freq, 3),
         Pair = paste(pmin(as.character(Var1), as.character(Var2)),
                      pmax(as.character(Var1), as.character(Var2)),
                      sep = " vs ")) %>%
  distinct(Pair, .keep_all = TRUE) %>%
  arrange(desc(abs(Freq))) %>%
  select(Pair, `Pearson r` = Freq)

knitr::kable(cor_pairs,
             caption = "Ranked Correlations — All Numeric Variable Pairs")

Ranked Correlations — All Numeric Variable Pairs
Pair	Pearson r
AUM_NGN vs Monthly_Contribution_NGN	0.723
AUM_NGN vs Tenure_Months	0.211
Age vs Tenure_Months	-0.048
Age vs Monthly_Contribution_NGN	-0.042
Monthly_Contribution_NGN vs Tenure_Months	0.040
Age vs AUM_NGN	0.010

Code

p1 <- ggplot(df, aes(Tenure_Months, AUM_NGN / 1e6, colour = Churn)) +
  geom_point(alpha = 0.65, size = 2) +
  geom_smooth(method = "lm", se = TRUE,
              colour = "black", linewidth = 1) +
  scale_colour_manual(values = c("Retained" = "#4CAF50",
                                 "Churned"  = "#F44336")) +
  scale_y_continuous(labels = label_number(suffix = "M")) +
  labs(title = "Tenure vs AUM",
       x = "Tenure (Months)", y = "AUM (₦M)",
       colour = "Churn") +
  theme_minimal(base_size = 11)

p2 <- ggplot(df, aes(Monthly_Contribution_NGN / 1e3,
                      AUM_NGN / 1e6, colour = Churn)) +
  geom_point(alpha = 0.65, size = 2) +
  geom_smooth(method = "lm", se = TRUE,
              colour = "black", linewidth = 1) +
  scale_colour_manual(values = c("Retained" = "#4CAF50",
                                 "Churned"  = "#F44336")) +
  scale_y_continuous(labels = label_number(suffix = "M")) +
  labs(title = "Monthly Contribution vs AUM",
       x = "Monthly Contribution (₦k)", y = "AUM (₦M)",
       colour = "Churn") +
  theme_minimal(base_size = 11)

grid.arrange(p1, p2, ncol = 2,
             top = "Strongest Correlations — Scatter Plots with OLS Trend Lines")

Business Implications of the Key Correlations:

Tenure ↔︎ AUM (weak positive correlation, r = 0.211): Clients who stay longer tend to hold somewhat larger portfolios, but tenure explains only about 4% of the variation in AUM (r² ≈ 0.04) in this first-year cohort. Retention is therefore a plausible AUM growth lever rather than a proven one. Taken together with the churn results in Section 8, this supports investment in early-tenure relationship management.
Monthly Contribution ↔︎ AUM (strongest correlation, r = 0.723): Clients who contribute more each month hold larger portfolios. This suggests advisers should prioritise setting up standing order contributions during the first client meeting, as contribution level is the strongest correlate of portfolio size in this dataset.
Causation caveat: These correlations are associational. It is plausible that higher-AUM clients contribute more because they have more disposable income, rather than that contributing more causes higher AUM. A controlled study would be required to establish causality.
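
To judge how much weight each correlation can bear, it helps to convert r into the share of variance explained (r²) and a p-value. The sketch below applies the standard t-test for a Pearson correlation to the six coefficients in the ranked table, assuming n = 105 for every pair; in the full workflow, `cor.test()` on the raw columns would give the same result.

```r
n <- 105  # number of clients, assumed identical for every pair (no missing values)

# Pearson coefficients copied from the ranked correlation table above
r <- c("AUM_NGN vs Monthly_Contribution_NGN"       =  0.723,
       "AUM_NGN vs Tenure_Months"                  =  0.211,
       "Age vs Tenure_Months"                      = -0.048,
       "Age vs Monthly_Contribution_NGN"           = -0.042,
       "Monthly_Contribution_NGN vs Tenure_Months" =  0.040,
       "Age vs AUM_NGN"                            =  0.010)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

cor_summary <- data.frame(
  r         = r,
  r_squared = round(r^2, 3),
  t         = round(t_stat, 2),
  p_value   = signif(p_value, 3)
)
print(cor_summary)
```

Under these assumptions only the two AUM pairs differ significantly from zero at the 5% level, and the tenure pair does so with a small r².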

8. Logistic Regression

Theory: Logistic regression models the probability of a binary outcome — here, churn = 1 or 0 — as a function of predictor variables. Unlike linear regression, it constrains predicted probabilities to the 0–1 range using the logistic function. Coefficients represent log-odds; exponentiating them gives odds ratios, which are more intuitive for business interpretation. An odds ratio below 1 means the variable reduces churn risk; above 1 means it increases it. Model fit is assessed using AIC and a confusion matrix.

Business Justification: As a Financial Adviser, I need to know not just that some clients leave, but which observable characteristics predict early exit — so I can act before the client submits a redemption request. Logistic regression gives me a quantitative, auditable model for scoring each client in my book on a monthly basis.
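
As an illustration of what monthly scoring could look like, the sketch below turns a set of logistic coefficients into a churn probability for a single client. It assumes the estimates reported in the model summary later in this section, typed in by hand; the function `score_client()` and the two example clients are hypothetical. With the fitted model object available, `predict(log_model, newdata = ..., type = "response")` does the same job.

```r
# Coefficients copied from the model summary in this section (log-odds scale)
coefs <- c("(Intercept)"            =  0.5162,
           Age                      =  0.006312,
           Tenure_Months            = -0.2554,
           AUM_NGN                  = -8.031e-07,
           Monthly_Contribution_NGN =  5.022e-05,
           Risk_ProfileModerate     =  1.467,
           Risk_ProfileAggressive   =  0.3403)

# Hypothetical helper: returns the predicted churn probability for one client
score_client <- function(age, tenure, aum, contribution, risk_profile) {
  # Conservative is the reference level, so it adds nothing to the linear predictor
  risk_term <- switch(risk_profile,
                      Conservative = 0,
                      Moderate     = coefs[["Risk_ProfileModerate"]],
                      Aggressive   = coefs[["Risk_ProfileAggressive"]],
                      stop("Unknown risk profile: ", risk_profile))
  eta <- coefs[["(Intercept)"]] +
    coefs[["Age"]] * age +
    coefs[["Tenure_Months"]] * tenure +
    coefs[["AUM_NGN"]] * aum +
    coefs[["Monthly_Contribution_NGN"]] * contribution +
    risk_term
  plogis(eta)  # logistic function maps log-odds to a probability in (0, 1)
}

# Two hypothetical Conservative clients aged 45
p_new         <- score_client(45, tenure = 1,  aum = 5e5, contribution = 10000,
                              risk_profile = "Conservative")
p_established <- score_client(45, tenure = 10, aum = 5e6, contribution = 30000,
                              risk_profile = "Conservative")

cat("New client:", round(p_new, 3),
    "| Established client:", round(p_established, 3), "\n")
```

Sorting a client book by this probability gives a simple call-priority list; any cut-off for action would need to be chosen and validated separately.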

Code

df_model <- df %>%
  mutate(Churn_Binary = as.integer(Churn == "Churned"))

log_model <- glm(Churn_Binary ~ Age + Tenure_Months + AUM_NGN +
                   Monthly_Contribution_NGN + Risk_Profile,
                 data   = df_model,
                 family = binomial(link = "logit"))

cat("=== Logistic Regression Summary ===\n")

=== Logistic Regression Summary ===

Code

summary(log_model)


Call:
glm(formula = Churn_Binary ~ Age + Tenure_Months + AUM_NGN + 
    Monthly_Contribution_NGN + Risk_Profile, family = binomial(link = "logit"), 
    data = df_model)

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)   
(Intercept)               5.162e-01  1.271e+00   0.406  0.68463   
Age                       6.312e-03  2.333e-02   0.271  0.78673   
Tenure_Months            -2.554e-01  8.374e-02  -3.049  0.00229 **
AUM_NGN                  -8.031e-07  3.262e-07  -2.462  0.01381 * 
Monthly_Contribution_NGN  5.022e-05  3.581e-05   1.402  0.16085   
Risk_ProfileModerate      1.467e+00  6.300e-01   2.329  0.01986 * 
Risk_ProfileAggressive    3.403e-01  8.107e-01   0.420  0.67464   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 117.538  on 104  degrees of freedom
Residual deviance:  89.757  on  98  degrees of freedom
AIC: 103.76

Number of Fisher Scoring iterations: 5

Code

ci <- exp(confint(log_model))  # profile-likelihood intervals, computed once

or_df <- data.frame(
  Variable = names(coef(log_model))[-1],
  Coef     = coef(log_model)[-1],   # unrounded log-odds, used only for the sign
  OR       = round(exp(coef(log_model))[-1], 4),
  Lower_CI = round(ci[-1, 1], 4),
  Upper_CI = round(ci[-1, 2], 4)
) %>%
  # Classify on the unrounded coefficient: AUM_NGN is measured per ₦1, so its
  # odds ratio rounds to 1.0000 even though the coefficient is negative
  mutate(Direction = ifelse(Coef < 0, "Reduces churn risk", "Increases churn risk")) %>%
  select(-Coef) %>%
  arrange(OR)

knitr::kable(or_df,
             row.names = FALSE,
             col.names = c("Variable", "Odds Ratio",
                           "95% CI Lower", "95% CI Upper", "Direction"),
             caption = "Logistic Regression: Odds Ratios with 95% Confidence Intervals")

Logistic Regression: Odds Ratios with 95% Confidence Intervals
Variable	Odds Ratio	95% CI Lower	95% CI Upper	Direction
Tenure_Months	0.7746	0.6497	0.9055	Reduces churn risk
AUM_NGN	1.0000	1.0000	1.0000	Reduces churn risk
Monthly_Contribution_NGN	1.0001	1.0000	1.0001	Increases churn risk
Age	1.0063	0.9608	1.0539	Increases churn risk
Risk_ProfileAggressive	1.4054	0.2681	6.8172	Increases churn risk
Risk_ProfileModerate	4.3379	1.3263	16.1312	Increases churn risk
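
Odds ratios for naira-denominated variables sit almost exactly at 1 because they describe a change of a single naira. Rescaling to a business-relevant unit makes them readable. The sketch below assumes the coefficient estimates from the model summary above; the chosen units (₦1 million, ₦10,000, five months) are illustrative choices, not part of the original model.

```r
# Log-odds coefficients copied from the model summary above
coef_tenure       <- -0.2554
coef_aum          <- -8.031e-07
coef_contribution <-  5.022e-05

# An odds ratio for a change of k units is exp(k * coefficient)
or_aum_per_naira        <- exp(coef_aum)
or_aum_per_million      <- exp(coef_aum * 1e6)
or_contribution_per_10k <- exp(coef_contribution * 1e4)
or_tenure_5_months      <- exp(coef_tenure * 5)

rescaled <- data.frame(
  Change = c("AUM + 1 naira", "AUM + 1 million naira",
             "Monthly contribution + 10,000 naira", "Tenure + 5 months"),
  Odds_Ratio = round(c(or_aum_per_naira, or_aum_per_million,
                       or_contribution_per_10k, or_tenure_5_months), 4)
)
print(rescaled)
```

Refitting the model with AUM expressed in millions of naira would produce the same rescaled odds ratio together with a usable confidence interval.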

Code

ggplot(or_df, aes(x = reorder(Variable, OR), y = OR)) +
  geom_point(size = 4, colour = "#1F4E79") +
  geom_errorbar(aes(ymin = Lower_CI, ymax = Upper_CI),
                width = 0.25, colour = "#1F4E79", linewidth = 0.8) +
  geom_hline(yintercept = 1, linetype = "dashed",
             colour = "red", linewidth = 0.8) +
  coord_flip() +
  labs(title = "Logistic Regression — Odds Ratios with 95% Confidence Intervals",
       subtitle = "OR > 1 = increases churn risk | OR < 1 = reduces churn risk | Red line = no effect",
       x = NULL, y = "Odds Ratio") +
  theme_minimal(base_size = 13)

Code

predicted_prob  <- predict(log_model, type = "response")
predicted_class <- ifelse(predicted_prob > 0.5, "Churned", "Retained")
actual_class    <- as.character(df_model$Churn)

conf_mat <- table(Actual = actual_class, Predicted = predicted_class)
cat("=== Confusion Matrix (threshold = 0.5) ===\n")

=== Confusion Matrix (threshold = 0.5) ===

Code

print(conf_mat)

          Predicted
Actual     Churned Retained
  Churned       10       16
  Retained       5       74

Code

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
cat("\nModel Accuracy:", round(accuracy * 100, 1), "%\n")


Model Accuracy: 80 %
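
Overall accuracy can hide how well the model finds the minority class. As a rough sketch, the per-class metrics can be derived from the confusion matrix printed above; the matrix is re-entered by hand here so the snippet runs on its own, and the metric names are the standard ones rather than anything defined elsewhere in this report.

```r
# Confusion matrix copied from the output above
# (rows = actual class, columns = predicted class)
conf_mat <- matrix(c(10, 5, 16, 74), nrow = 2,
                   dimnames = list(Actual    = c("Churned", "Retained"),
                                   Predicted = c("Churned", "Retained")))

tp <- conf_mat["Churned",  "Churned"]    # churners correctly flagged
fn <- conf_mat["Churned",  "Retained"]   # churners the model missed
fp <- conf_mat["Retained", "Churned"]    # retained clients wrongly flagged
tn <- conf_mat["Retained", "Retained"]   # retained clients correctly cleared

metrics <- c(
  accuracy    = (tp + tn) / sum(conf_mat),
  sensitivity = tp / (tp + fn),            # share of actual churners caught
  specificity = tn / (tn + fp),            # share of retained clients cleared
  precision   = tp / (tp + fp),            # share of churn flags that were right
  baseline    = (tn + fp) / sum(conf_mat)  # always predicting "Retained"
)
print(round(metrics * 100, 1))
```

Read this way, the 80% accuracy sits about five points above the 75.2% obtained by predicting "Retained" for every client, and the 0.5 threshold catches 10 of the 26 churners (38.5%).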

Code

cat("AIC:           ", round(AIC(log_model), 2), "\n")

AIC:            103.76

Code

par(mfrow = c(2, 2))
plot(log_model)

Code

par(mfrow = c(1, 1))

Interpretation of Key Coefficients for a Non-Technical Manager:

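Tenure_Months (OR = 0.77, 95% CI 0.65 to 0.91): Holding the other variables constant, each additional month a client stays with Coronation multiplies their odds of churning by about 0.77, a reduction of roughly 23%. This is the clearest protective factor in the model. The effect compounds: five extra months multiply the odds by about 0.77^5 ≈ 0.28, so a client who reaches month 6 has roughly a quarter of the churn odds of an otherwise identical client at month 1. That is why the first 90 days of the relationship are the highest-value window for adviser attention.
AUM_NGN (OR < 1): Clients with larger portfolios are less likely to churn. The coefficient is negative (-8.03e-07 per naira, p = 0.014); the odds ratio appears as 1.0000 in the table only because a one-naira change is tiny. Per additional NGN 1 million of AUM, the odds of churning are multiplied by about exp(-0.80) ≈ 0.45. This may reflect the higher switching cost of moving large assets, or it may reflect that high-AUM clients receive more attentive service. Either way, a larger portfolio early in the relationship is associated with lower exit risk.
Risk_Profile (Moderate OR = 4.34, p = 0.020; Aggressive OR = 1.41, p = 0.675): Both estimates are relative to the omitted reference category, Conservative. Moderate-profile clients have roughly four times the odds of churning, although the wide interval (1.33 to 16.13) reflects the small sample. The Aggressive estimate is not statistically distinguishable from 1 (95% CI 0.27 to 6.82), so this model gives no evidence that aggressive investors churn more than conservative ones, even though it is plausible that return-sensitive clients move assets when they perceive a better opportunity elsewhere. On this evidence, the adviser team should prioritise more frequent check-in calls with moderate-profile clients and treat the aggressive-profile effect as unconfirmed.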

9. Integrated Findings

The five analyses converge on a clear and actionable narrative about Coronation Asset Management’s 2025–2026 client cohort:

The first four months are the danger zone. The tenure density chart, the scatter plot, and the logistic regression point the same way: churn events are concentrated in the first four months of a client’s relationship with the firm, and each additional month of tenure lowers the odds of churn (OR 0.77, p = 0.002).
AUM and tenure are two sides of the same coin. The correlation analysis shows they move together strongly, and the logistic regression shows that each is associated with lower churn odds after adjusting for the other. Growing a client’s portfolio and keeping them longer are mutually reinforcing goals, not separate tasks for separate teams.
Risk profile shapes both portfolio size and churn behaviour. The hypothesis tests show that conservative and aggressive investors differ significantly in AUM, and that churn rates differ across risk groups. Segment-specific strategies are statistically justified — a finding that has direct implications for how advisers should structure their call calendars and product conversations.
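Monthly contribution behaviour is a leading indicator of portfolio size. The strong correlation between monthly contribution and AUM suggests that clients who set up regular standing order contributions early are on a trajectory toward larger portfolios. In the logistic regression, contribution has no significant direct effect on churn once tenure and AUM are included (p = 0.16), so any retention benefit is likely to run through AUM growth. The onboarding conversation about standing orders still has outsized long-term value.
Churn is partly predictable from observable data. At a 0.5 threshold, the logistic regression correctly classifies 80% of clients in the sample (84 of 105) using only five variables already available in the CRM, against about 75% for simply predicting that every client is retained. It identifies 10 of the 26 churners, so an operational scoring system would need a lower threshold to catch more at-risk clients. A monthly churn-risk scoring system is nonetheless feasible with existing data infrastructure, and no new data collection is required.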
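
A monthly scoring run of this kind could be sketched as follows. This is an illustration under stated assumptions rather than the production method: the coefficients are copied from the logistic regression summary, Conservative is assumed to be the reference level of Risk_Profile, the three clients are hypothetical, and the 0.30 review threshold is an arbitrary placeholder that would need tuning against the firm's capacity for follow-up calls.

```r
# Coefficients copied from the logistic regression summary
coefs <- c("(Intercept)"            =  5.162e-01,
           Age                      =  6.312e-03,
           Tenure_Months            = -2.554e-01,
           AUM_NGN                  = -8.031e-07,
           Monthly_Contribution_NGN =  5.022e-05,
           Risk_ProfileModerate     =  1.467e+00,
           Risk_ProfileAggressive   =  3.403e-01)

# Hypothetical clients (illustrative values only, not from the dataset)
clients <- data.frame(
  Client_ID                = c("A", "B", "C"),
  Age                      = c(34, 52, 45),
  Tenure_Months            = c(2, 10, 4),
  AUM_NGN                  = c(500000, 4000000, 1500000),
  Monthly_Contribution_NGN = c(10000, 50000, 20000),
  Risk_Profile = factor(c("Moderate", "Conservative", "Aggressive"),
                        levels = c("Conservative", "Moderate", "Aggressive"))
)

# Design matrix with the same column layout as the fitted model
X <- model.matrix(~ Age + Tenure_Months + AUM_NGN +
                    Monthly_Contribution_NGN + Risk_Profile, data = clients)

# Inverse logit of the linear predictor gives the churn probability
clients$Churn_Prob <- as.numeric(plogis(X %*% coefs[colnames(X)]))

# Illustrative review threshold (assumption), set below 0.5 to catch more churners
threshold <- 0.30
clients$Flag <- clients$Churn_Prob >= threshold

# Highest-risk clients first
print(clients[order(-clients$Churn_Prob), c("Client_ID", "Churn_Prob", "Flag")])
```

In practice the same result comes from predict(log_model, newdata = clients, type = "response") when the fitted model object is available; the manual version is shown only so the arithmetic is visible.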

Single Integrated Recommendation: Coronation Asset Management should implement a First 90-Day Intensive Programme for all newly onboarded clients: structured adviser touchpoints at onboarding, day 30, day 60, and day 90, with two explicit goals, namely setting up a monthly standing order contribution by day 30 and reaching a meaningful AUM milestone by day 90. The analysis links longer tenure and higher AUM to markedly lower churn odds, and regular contributions to higher AUM, making this the highest-return retention intervention the firm can implement with its current team and data.

10. Limitations & Further Work

Short observation window: The dataset covers only 12 months of client history (April 2025 – March 2026). A longer panel would allow tracking of individual clients through full market cycles and would strengthen causal claims about tenure and AUM growth.
Omitted variables: Product type (equity fund vs. fixed income vs. money market), adviser assignment, market return environment during the period, and client wealth tier are all plausible confounders not captured in this extract.
Class imbalance: Churners make up 26 of the 105 clients in this sample (about 25%), and at the 0.5 threshold the model misses 16 of them. If churn is rarer in the broader client book, this tendency to favour the majority class would be more pronounced. Future work should apply weighted regression, resampling techniques, or a tuned classification threshold to address this.
Logistic regression linearity assumption: The model assumes a linear relationship between each continuous predictor and the log-odds of churn. Non-linear threshold effects, such as a sharp increase in churn risk below a certain AUM level, are not captured and would need a more flexible specification, for example spline terms or a tree-based model.
Generalisation: These findings are based on one 12-month cohort. Patterns in subsequent cohorts should be monitored to confirm stability of the identified risk factors.
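
One way the linearity assumption could be checked in further work is to compare the linear model against one with a spline term on AUM. The sketch below runs on synthetic data generated only for illustration (it is not the Coronation dataset); the column AUM_M (AUM in NGN millions), the NGN 1 million threshold, and the effect sizes are all assumptions chosen to produce a visible threshold effect.

```r
library(splines)  # ships with base R

set.seed(42)
n <- 500

# Synthetic data for illustration only: churn risk jumps when AUM falls
# below a hypothetical threshold of NGN 1 million
sim <- data.frame(
  Tenure_Months = sample(1:12, n, replace = TRUE),
  AUM_M         = runif(n, 0.1, 5)   # AUM in NGN millions (assumed scale)
)
log_odds  <- 1.5 * (sim$AUM_M < 1) - 0.15 * sim$Tenure_Months - 0.5
sim$Churn <- rbinom(n, 1, plogis(log_odds))

# Linear-in-log-odds specification, as in the main analysis
fit_linear <- glm(Churn ~ Tenure_Months + AUM_M,
                  family = binomial, data = sim)

# Same model with a natural cubic spline on AUM (3 degrees of freedom)
fit_spline <- glm(Churn ~ Tenure_Months + ns(AUM_M, df = 3),
                  family = binomial, data = sim)

# Likelihood-ratio test: a small p-value suggests the linear term is too rigid
print(anova(fit_linear, fit_spline, test = "Chisq"))
print(c(linear = AIC(fit_linear), spline = AIC(fit_spline)))
```

If the spline model fits materially better on the real data, that would point to a threshold or other non-linear AUM effect worth modelling explicitly.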

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., ... Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage.

Kassambara, A. (2023). ggcorrplot: Visualization of a correlation matrix using ggplot2 (R package version 0.1.4). https://CRAN.R-project.org/package=ggcorrplot

Osakwe, K. (2026). Investor client dataset — Coronation Asset Management [Dataset]. Collected from Client Portfolio Management Division, Coronation Asset Management, Lagos, Nigeria. Data available on request from the author.

Appendix: AI Usage Statement

Claude (Anthropic, 2026) was used to assist with initial code scaffolding for the EDA, visualisation, hypothesis testing, correlation, and logistic regression sections in R, and for generating the simulated dataset structure. All analytical decisions — the framing of hypotheses, the selection of visualisation types, the interpretation of statistical outputs, the identification of business implications, and all recommendations — were made independently by the author based on professional experience as a Financial Adviser at Coronation Asset Management and the analytical frameworks covered in the course textbook. The author reviewed, tested, and modified all generated code to ensure correctness and fit to the specific dataset and business context. No AI tool was used to write the business interpretations, the integrated conclusion, or the professional disclosure section.