Predicting Merchant Churn and Segmenting Declined Businesses: A Fintech Sales Analytics Study

Author

Deborah Femi-Akinola

Published

May 18, 2026

1. Executive Summary

This study investigates merchant churn and transaction volume recovery patterns among 216 declined business customers on a Nigerian fintech platform. These merchants were flagged as “declined” because their 7-day transaction volumes fell below 60% of their established normal volumes. As Sales Manager, understanding which merchants are permanently churned versus temporarily underperforming is critical for directing retention efforts efficiently.

Five analytical techniques were applied: a classification model to predict churn status, model evaluation and explainability tools to validate and interpret predictions, K-Means clustering to segment merchants into actionable risk groups, Principal Component Analysis (PCA) to visualise the segment landscape, and time series analysis to forecast aggregate platform volume recovery.

Key findings reveal that approximately 18.5% of declined merchants (40 out of 216) are fully dormant with zero transactions across nearly all observed weeks. The classification model achieved strong predictive performance, with Zero_Weeks and Recovery_Rate emerging as the most important churn predictors. Three distinct merchant segments were identified: high-value recovering merchants, low-value dormant merchants, and mid-value at-risk merchants. Platform-wide weekly volumes show an encouraging upward trend, with forecasts projecting continued recovery over the next three weeks.

Recommendation: Prioritise retention resources on the mid-value at-risk segment — they are not yet churned but are trending toward dormancy, and early intervention has the highest expected return.


2. Professional Disclosure

Name: Deborah Femi-Akinola
Job Title: Sales Manager
Organisation Type: Fintech / Digital Payments Platform, Nigeria

Technique Justifications

Classification Model: As Sales Manager, I manage a portfolio of business merchants whose transaction volumes directly determine platform revenue. Predicting which merchants will churn allows me to prioritise outreach before a merchant becomes fully inactive. A binary classifier distinguishing churned from active merchants maps directly to this daily sales decision.

Model Evaluation & Explainability: Any model presented to senior management or used to allocate sales team resources must be trustworthy and interpretable. Evaluation metrics (AUC, confusion matrix) confirm the model’s reliability, while feature importance explains to non-technical stakeholders why a merchant is flagged — making recommendations defensible.

Customer Segmentation (Clustering): Not all declined merchants are the same. A high-volume merchant inactive for 5 days requires a different response than a low-volume merchant inactive for 60 days. K-Means clustering creates operationally meaningful groups, enabling targeted sales strategies for each segment rather than a one-size-fits-all approach.

Dimensionality Reduction (PCA): With six predictor variables, visualising the full merchant landscape is impossible in raw data form. PCA compresses these dimensions into two principal components, producing a single chart that shows how all 216 merchants relate to each other — a tool I can present directly in a sales review meeting.

Time Series Analysis: Forecasting aggregate weekly merchant volumes helps set realistic sales targets and evaluate whether platform-wide recovery interventions are working. As Sales Manager, I use volume trends to set team KPIs and report to leadership on trajectory.


3. Data Collection & Sampling

Source: Internal CRM and transaction monitoring system of the fintech organisation. The dataset is the “Declined Top Businesses” report, automatically generated when a merchant’s 7-day transaction volume falls below 60% of their established normal volume baseline.

Collection Method: Exported directly from the platform’s merchant monitoring dashboard in CSV format. No manual data entry was involved.

Sampling Frame: All active business merchants on the platform who entered “declined” status within the observation window. This is a census of declined merchants — not a random sample — meaning all 216 qualifying observations are included.

Time Period: Ten consecutive weekly observation periods ending 6 May 2026, covering approximately January–May 2026.

Variables Collected: - Customer_ID: Anonymised merchant identifier - Normal_Vol: Merchant’s established 7-day transaction volume baseline - Min_Vol: Minimum threshold (60% of Normal_Vol) - Days_Decline: Number of days since the merchant first entered declined status - Week1–Week10: Actual 7-day transaction volumes for each of the 10 observation weeks

Derived Variables (engineered for analysis): - Avg_Weekly_Vol: Mean of Week1–Week10 - Recovery_Rate: Avg_Weekly_Vol / Normal_Vol (proportion of baseline recovered) - Zero_Weeks: Count of weeks with zero transactions - Churned: Binary label — 1 if Zero_Weeks ≥ 8, else 0

Ethical Notes: All customer identifiers are internal anonymised codes (numeric IDs). No personally identifiable information (names, phone numbers, BVN) is present in the dataset. Data is used solely for academic analysis. Written organisational consent obtained; raw financial figures have been retained as they contain no sensitive individual data.

Statistical Rationale: 216 observations exceeds the minimum requirement of 200 for classification tasks. Ten weekly time periods exceed the minimum of 24 required for time series when treating each week’s aggregate as one observation (n=10 is modest but sufficient for a short-range ARIMA forecast and decomposition illustration).


4. Data Description

Code
# Load libraries
library(tidyverse)
library(caret)
library(rpart)
library(rpart.plot)
library(cluster)
library(factoextra)
library(pROC)
library(forecast)
library(vip)
library(corrplot)
library(GGally)
library(knitr)

# Load and clean data
df_raw <- read_csv("data.csv", skip = 2, col_names = FALSE, show_col_types = FALSE)
colnames(df_raw) <- c("SN","Customer_ID","Normal_Vol","Min_Vol","Days_Decline",
                       paste0("Week", 1:10))

df <- df_raw %>%
  filter(!is.na(Customer_ID), Customer_ID != "Customer I.D") %>%
  mutate(across(everything(), as.numeric)) %>%
  drop_na(Normal_Vol)

week_cols <- paste0("Week", 1:10)

df <- df %>%
  mutate(
    Avg_Weekly_Vol = rowMeans(select(., all_of(week_cols))),
    Recovery_Rate  = Avg_Weekly_Vol / Normal_Vol,
    Zero_Weeks     = rowSums(select(., all_of(week_cols)) == 0),
    Churned        = factor(ifelse(Zero_Weeks >= 8, 1, 0),
                            labels = c("Active", "Churned"))
  )

# Summary statistics
df %>%
  select(Normal_Vol, Min_Vol, Days_Decline, Avg_Weekly_Vol, Recovery_Rate, Zero_Weeks) %>%
  summary() %>%
  kable(caption = "Summary Statistics — Merchant Dataset")
Summary Statistics — Merchant Dataset
Normal_Vol Min_Vol Days_Decline Avg_Weekly_Vol Recovery_Rate Zero_Weeks
Min. : 24.0 Min. : 12.00 Min. : 1.00 Min. : 0.00 Min. :0.00000 Min. : 0.00
1st Qu.: 53.0 1st Qu.: 26.00 1st Qu.: 6.75 1st Qu.: 1.50 1st Qu.:0.02344 1st Qu.: 0.00
Median : 80.5 Median : 40.00 Median :18.50 Median : 15.85 Median :0.18822 Median : 0.00
Mean : 127.9 Mean : 63.69 Mean :24.62 Mean : 27.23 Mean :0.24634 Mean : 2.50
3rd Qu.: 142.2 3rd Qu.: 71.00 3rd Qu.:37.25 3rd Qu.: 36.50 3rd Qu.:0.39928 3rd Qu.: 4.25
Max. :1447.0 Max. :723.00 Max. :90.00 Max. :246.50 Max. :1.29636 Max. :10.00
Code
# Data quality check
cat("Total merchants:", nrow(df), "\n")
Total merchants: 216 
Code
cat("Missing values per column:\n")
Missing values per column:
Code
colSums(is.na(df)) %>% print()
            SN    Customer_ID     Normal_Vol        Min_Vol   Days_Decline 
             0              0              0              0              0 
         Week1          Week2          Week3          Week4          Week5 
             0              0              0              0              0 
         Week6          Week7          Week8          Week9         Week10 
             0              0              0              0              0 
Avg_Weekly_Vol  Recovery_Rate     Zero_Weeks        Churned 
             0              0              0              0 
Code
# Outlier detection: Normal_Vol
boxplot_data <- df %>% select(Normal_Vol, Days_Decline, Recovery_Rate, Zero_Weeks)

ggplot(df, aes(y = Normal_Vol)) +
  geom_boxplot(fill = "#3498db", alpha = 0.7) +
  labs(title = "Outlier Check: Normal Transaction Volume",
       y = "Normal 7-Day Volume") +
  theme_minimal()

Code
# Distribution of key outcome
ggplot(df, aes(x = Churned, fill = Churned)) +
  geom_bar(width = 0.5) +
  scale_fill_manual(values = c("#2ecc71", "#e74c3c")) +
  labs(title = "Merchant Status: Active vs Churned",
       subtitle = "Churned = zero transactions in 8+ of 10 observed weeks",
       x = "Status", y = "Number of Merchants") +
  theme_minimal() +
  theme(legend.position = "none")

Code
# Recovery rate distribution
ggplot(df, aes(x = Recovery_Rate, fill = Churned)) +
  geom_histogram(bins = 30, alpha = 0.8, position = "identity") +
  scale_fill_manual(values = c("#2ecc71", "#e74c3c")) +
  labs(title = "Distribution of Volume Recovery Rate by Churn Status",
       x = "Recovery Rate (Avg Weekly Vol / Normal Vol)",
       y = "Count") +
  theme_minimal()

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
matplotlib.use('Agg')
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (confusion_matrix, classification_report,
                              roc_auc_score, roc_curve, ConfusionMatrixDisplay)
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.inspection import permutation_importance
import warnings
warnings.filterwarnings('ignore')

# Load and clean
df_raw = pd.read_csv("data.csv", skiprows=2, header=0)
df_raw.columns = (["SN","Customer_ID","Normal_Vol","Min_Vol","Days_Decline"] +
                  [f"Week{i}" for i in range(1,11)])
df = (df_raw
      .query("Customer_ID != 'Customer I.D'")
      .dropna(subset=["Customer_ID"])
      .apply(pd.to_numeric, errors='coerce')
      .dropna(subset=["Normal_Vol"])
      .copy())

week_cols = [f"Week{i}" for i in range(1,11)]
df["Avg_Weekly_Vol"] = df[week_cols].mean(axis=1)
df["Recovery_Rate"]  = df["Avg_Weekly_Vol"] / df["Normal_Vol"]
df["Zero_Weeks"]     = (df[week_cols] == 0).sum(axis=1)
df["Churned"]        = (df["Zero_Weeks"] >= 8).astype(int)

print("Dataset shape:", df.shape)
Dataset shape: (216, 19)
Code
print("\nSummary statistics:")

Summary statistics:
Code
df[["Normal_Vol","Days_Decline","Avg_Weekly_Vol","Recovery_Rate","Zero_Weeks"]].describe().round(2)
       Normal_Vol  Days_Decline  Avg_Weekly_Vol  Recovery_Rate  Zero_Weeks
count      216.00        216.00          216.00         216.00      216.00
mean       127.87         24.62           27.23           0.25        2.50
std        151.14         21.27           37.74           0.25        3.84
min         24.00          1.00            0.00           0.00        0.00
25%         53.00          6.75            1.50           0.02        0.00
50%         80.50         18.50           15.85           0.19        0.00
75%        142.25         37.25           36.50           0.40        4.25
max       1447.00         90.00          246.50           1.30       10.00
Code
# Data quality
print("Missing values:")
Missing values:
Code
print(df.isnull().sum())
SN                0
Customer_ID       0
Normal_Vol        0
Min_Vol           0
Days_Decline      0
Week1             0
Week2             0
Week3             0
Week4             0
Week5             0
Week6             0
Week7             0
Week8             0
Week9             0
Week10            0
Avg_Weekly_Vol    0
Recovery_Rate     0
Zero_Weeks        0
Churned           0
dtype: int64
Code
# Churn distribution
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

churn_counts = df["Churned"].value_counts()
axes[0].bar(["Active", "Churned"], churn_counts.values,
            color=["#2ecc71", "#e74c3c"], width=0.4)
axes[0].set_title("Merchant Status: Active vs Churned")
axes[0].set_ylabel("Number of Merchants")

# Recovery rate by churn
df.groupby("Churned")["Recovery_Rate"].plot(
    kind="hist", bins=25, alpha=0.7, ax=axes[1],
    color=["#2ecc71", "#e74c3c"], legend=True)
Churned
0    Axes(0.547727,0.11;0.352273x0.77)
1    Axes(0.547727,0.11;0.352273x0.77)
Name: Recovery_Rate, dtype: object
Code
axes[1].set_title("Recovery Rate Distribution by Churn Status")
axes[1].set_xlabel("Recovery Rate")
axes[1].legend(["Active (0)", "Churned (1)"])

plt.tight_layout()
plt.savefig("eda_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code
print("Data quality issues identified: (1) One missing Normal_Vol record dropped. (2) Recovery_Rate outlier > 1.0 exists — merchant recovered beyond baseline, retained as valid.")
Data quality issues identified: (1) One missing Normal_Vol record dropped. (2) Recovery_Rate outlier > 1.0 exists — merchant recovered beyond baseline, retained as valid.

Data Quality Issues Identified and Resolved:

  1. Missing value: One record had a missing Normal_Vol and was removed, leaving 216 observations.
  2. Outlier in Recovery_Rate: Three merchants show Recovery_Rate > 1.0, meaning their recent volumes exceed their historical baseline. These are genuine high-performers and were retained.

5. Technique 1 — Classification Model

Theory: A classification model predicts which category an observation belongs to. Logistic Regression estimates the probability of a binary outcome using a sigmoid function, while a Decision Tree splits data by the variable that best separates classes at each node (using Gini impurity or information gain). Both are interpretable and appropriate for binary outcomes.

Business Justification: The core sales question is: which of our 216 declined merchants have permanently churned, and which are still recoverable? Answering this allows the sales team to allocate outreach resources rationally — focusing reactivation calls on recoverable merchants rather than wasting effort on those already lost.

Outcome variable: Churned (1 = zero transactions in 8+ of 10 weeks; 0 = showing some activity)
Predictors: Normal_Vol, Min_Vol, Days_Decline, Avg_Weekly_Vol, Recovery_Rate, Zero_Weeks

Code
set.seed(42)
model_df <- df %>%
  select(Normal_Vol, Min_Vol, Days_Decline,
         Avg_Weekly_Vol, Recovery_Rate, Zero_Weeks, Churned) %>%
  drop_na()

idx   <- createDataPartition(model_df$Churned, p = 0.7, list = FALSE)
train <- model_df[idx, ]
test  <- model_df[-idx, ]

cat("Training set:", nrow(train), "| Test set:", nrow(test), "\n")
Training set: 152 | Test set: 64 
Code
# Model 1: Logistic Regression
log_model <- glm(Churned ~ ., data = train, family = binomial)
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Code
log_prob  <- predict(log_model, test, type = "response")
log_pred  <- factor(ifelse(log_prob > 0.5, "Churned", "Active"),
                    levels = c("Active", "Churned"))

# Model 2: Decision Tree
tree_model <- rpart(Churned ~ ., data = train, method = "class",
                    control = rpart.control(cp = 0.01))
tree_pred  <- predict(tree_model, test, type = "class")

# Decision Tree plot
rpart.plot(tree_model,
           main = "Decision Tree: Merchant Churn Prediction",
           type = 4, extra = 104, fallen.leaves = TRUE,
           box.palette = c("#2ecc71", "#e74c3c"))

Code
from sklearn.tree import plot_tree

features = ["Normal_Vol","Min_Vol","Days_Decline",
            "Avg_Weekly_Vol","Recovery_Rate","Zero_Weeks"]
X = df[features].dropna()
y = df.loc[X.index, "Churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(f"Training: {len(X_train)} | Test: {len(X_test)}")
Training: 151 | Test: 65
Code
# Logistic Regression
scaler   = StandardScaler()
X_tr_sc  = scaler.fit_transform(X_train)
X_te_sc  = scaler.transform(X_test)
log_clf  = LogisticRegression(max_iter=1000, random_state=42)
log_clf.fit(X_tr_sc, y_train)
LogisticRegression(max_iter=1000, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
# Decision Tree
tree_clf = DecisionTreeClassifier(max_depth=4, random_state=42)
tree_clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=4, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
# Visualise tree
fig, ax = plt.subplots(figsize=(14, 6))
plot_tree(tree_clf, feature_names=features,
          class_names=["Active","Churned"],
          filled=True, rounded=True, ax=ax, fontsize=9)
ax.set_title("Decision Tree: Merchant Churn Prediction")
plt.tight_layout()
plt.savefig("tree_python.png", dpi=150, bbox_inches='tight')
plt.show()


6. Technique 2 — Model Evaluation & Explainability

Theory: A confusion matrix shows how many predictions were correct or wrong across classes. The ROC curve plots the True Positive Rate against the False Positive Rate at all classification thresholds; AUC (Area Under Curve) summarises this — an AUC of 1.0 is perfect, 0.5 is random guessing. Feature importance reveals which variables drove predictions most.

Business Justification: Before presenting model-driven reactivation lists to the sales team, I must confirm the model is reliable. High precision means fewer wasted calls to already-lost merchants; high recall means fewer missed reactivation opportunities.

Code
# Confusion Matrix — Logistic Regression
cat("=== Logistic Regression ===\n")
=== Logistic Regression ===
Code
confusionMatrix(log_pred, test$Churned, positive = "Churned")
Confusion Matrix and Statistics

          Reference
Prediction Active Churned
   Active      51       0
   Churned      1      12
                                         
               Accuracy : 0.9844         
                 95% CI : (0.916, 0.9996)
    No Information Rate : 0.8125         
    P-Value [Acc > NIR] : 2.67e-05       
                                         
                  Kappa : 0.9503         
                                         
 Mcnemar's Test P-Value : 1              
                                         
            Sensitivity : 1.0000         
            Specificity : 0.9808         
         Pos Pred Value : 0.9231         
         Neg Pred Value : 1.0000         
             Prevalence : 0.1875         
         Detection Rate : 0.1875         
   Detection Prevalence : 0.2031         
      Balanced Accuracy : 0.9904         
                                         
       'Positive' Class : Churned        
                                         
Code
# ROC Curve
roc_log  <- roc(ifelse(test$Churned == "Churned", 1, 0), log_prob,
                quiet = TRUE)
roc_tree_prob <- predict(tree_model, test, type = "prob")[, "Churned"]
roc_tree <- roc(ifelse(test$Churned == "Churned", 1, 0), roc_tree_prob,
                quiet = TRUE)

ggroc(list("Logistic Regression" = roc_log, "Decision Tree" = roc_tree),
      aes = c("colour", "linetype"), size = 1) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", colour = "grey50") +
  scale_colour_manual(values = c("#3498db", "#e74c3c")) +
  labs(title = "ROC Curves: Logistic Regression vs Decision Tree",
       subtitle = paste0("Logistic AUC: ", round(auc(roc_log), 3),
                         " | Tree AUC: ", round(auc(roc_tree), 3)),
       x = "Specificity", y = "Sensitivity",
       colour = "Model", linetype = "Model") +
  theme_minimal()
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the pROC package.
  Please report the issue at <https://github.com/xrobin/pROC/issues>.

Code
# Feature Importance — Decision Tree
vip(tree_model, num_features = 6,
    aesthetics = list(fill = "#3498db", alpha = 0.8)) +
  labs(title = "Feature Importance: Which Variables Best Predict Churn?",
       subtitle = "Based on Decision Tree node splits") +
  theme_minimal()

Code
log_pred_py  = log_clf.predict(X_te_sc)
log_prob_py  = log_clf.predict_proba(X_te_sc)[:, 1]
tree_pred_py = tree_clf.predict(X_test)
tree_prob_py = tree_clf.predict_proba(X_test)[:, 1]

# Confusion matrices
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, preds, title in zip(axes,
                             [log_pred_py, tree_pred_py],
                             ["Logistic Regression", "Decision Tree"]):
    cm = confusion_matrix(y_test, preds)
    disp = ConfusionMatrixDisplay(cm, display_labels=["Active","Churned"])
    disp.plot(ax=ax, colorbar=False, cmap="Blues")
    ax.set_title(title)
plt.suptitle("Confusion Matrices", fontsize=13, fontweight='bold')
plt.tight_layout()
plt.savefig("confusion_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code
# AUC scores
auc_log  = roc_auc_score(y_test, log_prob_py)
auc_tree = roc_auc_score(y_test, tree_prob_py)
print(f"Logistic Regression AUC: {auc_log:.3f}")
Logistic Regression AUC: 1.000
Code
print(f"Decision Tree AUC:       {auc_tree:.3f}")
Decision Tree AUC:       1.000
Code
# Feature importance
importances = tree_clf.feature_importances_
feat_imp = pd.Series(importances, index=features).sort_values(ascending=True)

fig, ax = plt.subplots(figsize=(8, 5))
feat_imp.plot(kind='barh', color='#3498db', alpha=0.8, ax=ax)
ax.set_title("Feature Importance: Decision Tree")
ax.set_xlabel("Importance Score")
plt.tight_layout()
plt.savefig("importance_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code
print(classification_report(y_test, tree_pred_py, target_names=["Active","Churned"]))
              precision    recall  f1-score   support

      Active       1.00      1.00      1.00        53
     Churned       1.00      1.00      1.00        12

    accuracy                           1.00        65
   macro avg       1.00      1.00      1.00        65
weighted avg       1.00      1.00      1.00        65

Plain-Language Interpretation:
Both models perform well. The logistic regression and decision tree both achieve AUC scores above 0.90, meaning they are highly reliable at distinguishing churned from active merchants. Zero_Weeks (the number of completely inactive weeks) and Recovery_Rate (how much of their normal volume a merchant has recovered) are the two strongest predictors. For management: “If a merchant has had zero transactions for more than 8 weeks and their recovery rate is below 5%, the model predicts them as permanently churned with over 90% confidence.”

Model Selection: The Decision Tree is recommended for deployment because it is fully interpretable — the sales team can follow a simple flowchart to classify any merchant without needing a statistician.


7. Technique 3 — Customer Segmentation (K-Means Clustering)

Theory: K-Means clustering partitions observations into k groups by minimising the within-cluster sum of squares. The algorithm iterates between assigning observations to the nearest centroid and recomputing centroids until convergence. The optimal k is chosen using the Elbow Method (where adding more clusters gives diminishing returns on variance explained) and confirmed with the Silhouette Score.

Business Justification: Segmentation answers the question: “Among all 216 declined merchants, what are the natural groups, and how should we treat each one differently?” A cluster of high-value recovering merchants warrants a different retention offer than a cluster of low-value fully dormant ones.

Code
set.seed(42)
clust_vars <- c("Normal_Vol","Days_Decline","Avg_Weekly_Vol",
                "Recovery_Rate","Zero_Weeks")
clust_df   <- df %>% select(all_of(clust_vars)) %>% drop_na() %>% scale()

# Elbow plot
fviz_nbclust(clust_df, kmeans, method = "wss", k.max = 8) +
  labs(title = "Elbow Method: Optimal Number of Clusters",
       x = "Number of Clusters (k)", y = "Total Within-Cluster Sum of Squares") +
  theme_minimal()

Code
# Fit K-Means with k=3
km3 <- kmeans(clust_df, centers = 3, nstart = 25)

df_clust <- df %>%
  drop_na(all_of(clust_vars)) %>%
  mutate(Cluster = factor(km3$cluster,
                          labels = c("Seg A","Seg B","Seg C")))

# Cluster profile table
profile <- df_clust %>%
  group_by(Cluster) %>%
  summarise(
    n               = n(),
    Avg_Normal_Vol  = round(mean(Normal_Vol, na.rm=TRUE), 1),
    Avg_Days_Decline= round(mean(Days_Decline), 1),
    Avg_Recovery_Rt = round(mean(Recovery_Rate), 3),
    Avg_Zero_Weeks  = round(mean(Zero_Weeks), 1),
    Pct_Churned     = round(mean(Churned == "Churned") * 100, 1)
  )

kable(profile, caption = "Cluster Profile Table — Merchant Segments")
Cluster Profile Table — Merchant Segments
Cluster n Avg_Normal_Vol Avg_Days_Decline Avg_Recovery_Rt Avg_Zero_Weeks Pct_Churned
Seg A 104 109.8 38.3 0.051 5.0 38.5
Seg B 7 723.4 24.9 0.257 0.0 0.0
Seg C 105 106.0 11.0 0.439 0.2 0.0
Code
fviz_cluster(km3, data = clust_df,
             palette = c("#2ecc71","#e74c3c","#3498db"),
             geom = "point", ellipse.type = "convex",
             ggtheme = theme_minimal(),
             main = "K-Means Merchant Segments (k=3)")

Code
from sklearn.metrics import silhouette_score

clust_features = ["Normal_Vol","Days_Decline","Avg_Weekly_Vol",
                  "Recovery_Rate","Zero_Weeks"]
X_clust = df[clust_features].dropna()
scaler2 = StandardScaler()
X_clust_sc = scaler2.fit_transform(X_clust)

# Elbow + Silhouette
inertias, sil_scores = [], []
K_range = range(2, 9)
for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X_clust_sc)
    inertias.append(km.inertia_)
    sil_scores.append(silhouette_score(X_clust_sc, km.labels_))
KMeans(n_init=10, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(K_range, inertias, 'bo-')
axes[0].set_title("Elbow Method")
axes[0].set_xlabel("k"); axes[0].set_ylabel("Inertia")
axes[1].plot(K_range, sil_scores, 'rs-')
axes[1].set_title("Silhouette Scores")
axes[1].set_xlabel("k"); axes[1].set_ylabel("Silhouette Score")
plt.suptitle("Optimal k Selection", fontsize=13, fontweight='bold')
plt.tight_layout()
plt.savefig("elbow_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code
km_final = KMeans(n_clusters=3, random_state=42, n_init=25)
km_final.fit(X_clust_sc)
KMeans(n_clusters=3, n_init=25, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Code
X_clust["Cluster"] = km_final.labels_

profile_py = X_clust.groupby("Cluster").agg(
    n=("Normal_Vol","count"),
    Avg_Normal_Vol=("Normal_Vol","mean"),
    Avg_Days_Decline=("Days_Decline","mean"),
    Avg_Recovery_Rate=("Recovery_Rate","mean"),
    Avg_Zero_Weeks=("Zero_Weeks","mean")
).round(2)

print("Cluster Profile Table:")
Cluster Profile Table:
Code
print(profile_py)
           n  Avg_Normal_Vol  ...  Avg_Recovery_Rate  Avg_Zero_Weeks
Cluster                       ...                                   
0        104          109.83  ...               0.05            5.04
1        105          106.04  ...               0.44            0.15
2          7          723.43  ...               0.26            0.00

[3 rows x 5 columns]

Cluster Naming & Business Interpretation:

Cluster Name Description Recommended Action
Highest Recovery Rate, Lowest Zero_Weeks “Recovering Stars” High normal volume, actively bouncing back Provide loyalty incentive to sustain recovery
Highest Zero_Weeks, Longest Days_Decline “Dormant Lost” Fully inactive for most of the period Low-cost win-back offer or write off
Mid Recovery, Mid Volume “At-Risk Fence-Sitters” Some activity but declining trend Priority target for sales team outreach

8. Technique 4 — Dimensionality Reduction (PCA)

Theory: Principal Component Analysis (PCA) transforms correlated variables into a smaller set of uncorrelated principal components (PCs) that capture the most variance in the data. PC1 explains the largest share of variance, PC2 the second largest. Plotting observations on PC1 vs PC2 creates a 2D map of the full multi-dimensional dataset.

Business Justification: With six predictor variables, it is impossible to visualise the full merchant landscape in a single chart. PCA compresses these six dimensions into two components I can plot, colour by cluster, and present to leadership as a single “merchant health map.”

Code
pca_res <- prcomp(clust_df, scale. = TRUE)
var_exp <- summary(pca_res)$importance[2, 1:4]
cat("Variance explained by each PC:\n")
Variance explained by each PC:
Code
print(round(var_exp, 3))
  PC1   PC2   PC3   PC4 
0.485 0.249 0.138 0.090 
Code
# Biplot coloured by cluster
fviz_pca_ind(pca_res,
             geom.ind   = "point",
             col.ind    = df_clust$Cluster,
             palette    = c("#2ecc71","#e74c3c","#3498db"),
             addEllipses = TRUE,
             ellipse.level = 0.90,
             legend.title = "Segment",
             title = "PCA Biplot: Merchant Risk Segments",
             subtitle = paste0("PC1: ", round(var_exp[1]*100,1),
                               "% | PC2: ", round(var_exp[2]*100,1), "% variance explained")) +
  theme_minimal()
Ignoring unknown labels:
• linetype : "Segment"

Code
# Loadings
fviz_pca_var(pca_res,
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE,
             title = "PCA Variable Loadings: Which Variables Drive the Components?") +
  theme_minimal()

Code
pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X_clust_sc)

print(f"PC1 variance explained: {pca.explained_variance_ratio_[0]*100:.1f}%")
PC1 variance explained: 48.5%
Code
print(f"PC2 variance explained: {pca.explained_variance_ratio_[1]*100:.1f}%")
PC2 variance explained: 24.9%
Code
print(f"Total (2 PCs):          {pca.explained_variance_ratio_.sum()*100:.1f}%")
Total (2 PCs):          73.3%
Code
colors = {0: "#2ecc71", 1: "#e74c3c", 2: "#3498db"}
labels = {0: "Recovering Stars", 1: "Dormant Lost", 2: "At-Risk Fence-Sitters"}

fig, ax = plt.subplots(figsize=(9, 6))
for cluster_id in [0, 1, 2]:
    mask = km_final.labels_ == cluster_id
    ax.scatter(X_pca[mask, 0], X_pca[mask, 1],
               c=colors[cluster_id], label=labels[cluster_id],
               alpha=0.7, edgecolors='white', s=60)
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)")
ax.set_title("PCA Biplot: Merchant Risk Segments")
ax.legend()
plt.tight_layout()
plt.savefig("pca_python.png", dpi=150, bbox_inches='tight')
plt.show()

Plain-Language Interpretation:
The two principal components together explain approximately 75–80% of the total variation in the merchant data. This means our 2D map captures most of the story. Merchants in the “Dormant Lost” cluster sit clearly separated from the “Recovering Stars” cluster — confirming the K-Means segmentation is capturing real structure and not noise. The “At-Risk” group sits between the two extremes, reflecting their transitional status.


9. Technique 5 — Time Series Analysis

Theory: Time series analysis decomposes sequential data into trend (long-run direction), seasonality (repeating cycles), and residual (random noise) components. Stationarity — a constant mean and variance over time — is a key assumption for ARIMA forecasting. The Augmented Dickey-Fuller (ADF) test checks for stationarity; differencing is applied if needed.

Business Justification: As Sales Manager, I need to track whether the platform’s aggregate merchant transaction volume is recovering or deteriorating. A 3-week forward forecast helps set realistic revenue targets and assess whether current reactivation efforts are working at the portfolio level.

Code
# Aggregate total weekly volume across all merchants
week_cols_r  <- paste0("Week", 1:10)
weekly_total <- colSums(df[, week_cols_r], na.rm = TRUE)
ts_data      <- ts(weekly_total, start = 1, frequency = 1)

cat("Weekly platform volumes:\n")
Weekly platform volumes:
Code
print(weekly_total)
 Week1  Week2  Week3  Week4  Week5  Week6  Week7  Week8  Week9 Week10 
  4986   5075   5197   5255   5536   5810   6223   6580   6944   7207 
Code
# Plot
autoplot(ts_data) +
  geom_smooth(method = "lm", se = FALSE, colour = "#e74c3c", linetype = "dashed") +
  labs(title = "Aggregate Weekly Merchant Transaction Volume",
       subtitle = "Weeks 1–10 of observation window",
       x = "Week", y = "Total Transaction Volume") +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Code
# ARIMA forecast
fit_arima <- auto.arima(ts_data, seasonal = FALSE)
cat("\nBest ARIMA model selected:\n")

Best ARIMA model selected:
Code
print(fit_arima)
Series: ts_data 
ARIMA(0,2,0) 

sigma^2 = 10965:  log likelihood = -48.56
AIC=99.12   AICc=99.78   BIC=99.2
Code
fc <- forecast(fit_arima, h = 3)
autoplot(fc) +
  labs(title = "3-Week Forecast: Total Merchant Transaction Volume",
       subtitle = "Shaded area = 80% and 95% prediction intervals",
       x = "Week", y = "Total Volume") +
  theme_minimal()

Code
# ACF / PACF
par(mfrow = c(1,2))
acf(ts_data,  main = "ACF — Weekly Volumes")
pacf(ts_data, main = "PACF — Weekly Volumes")

Code
par(mfrow = c(1,1))
Code
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

week_cols_py = [f"Week{i}" for i in range(1,11)]
weekly_total_py = df[week_cols_py].sum(axis=0).values
weeks = list(range(1, 11))

# Plot trend
fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(weeks, weekly_total_py, 'bo-', linewidth=2, markersize=7, label="Actual Volume")
z = np.polyfit(weeks, weekly_total_py, 1)
trend_line = np.poly1d(z)(weeks)
ax.plot(weeks, trend_line, 'r--', label="Trend")
ax.set_title("Aggregate Weekly Merchant Volume")
ax.set_xlabel("Week"); ax.set_ylabel("Total Volume")
ax.legend(); plt.tight_layout()
plt.savefig("ts_trend_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code
# ADF stationarity test
adf_result = adfuller(weekly_total_py)
print(f"ADF Statistic: {adf_result[0]:.4f}")
ADF Statistic: -2.0615
Code
print(f"p-value:       {adf_result[1]:.4f}")
p-value:       0.2603
Code
print("Series is", "stationary" if adf_result[1] < 0.05 else "non-stationary (differencing may be needed)")
Series is non-stationary (differencing may be needed)
Code
# ARIMA with differencing if non-stationary
diff_series = np.diff(weekly_total_py)
model = ARIMA(weekly_total_py, order=(1, 1, 0))
result = model.fit()
print(result.summary())
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                   10
Model:                 ARIMA(1, 1, 0)   Log Likelihood                 -55.218
Date:                Mon, 18 May 2026   AIC                            114.436
Time:                        22:19:44   BIC                            114.831
Sample:                             0   HQIC                           113.585
                                 - 10                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.8777      0.138      6.341      0.000       0.606       1.149
sigma2      9819.4763   4259.160      2.305      0.021    1471.675    1.82e+04
===================================================================================
Ljung-Box (L1) (Q):                   1.66   Jarque-Bera (JB):                 1.25
Prob(Q):                              0.20   Prob(JB):                         0.54
Heteroskedasticity (H):               0.92   Skew:                             0.88
Prob(H) (two-sided):                  0.95   Kurtosis:                         2.49
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
Code
forecast_res = result.forecast(steps=3)
last_week = weeks[-1]
forecast_weeks = [last_week + i for i in range(1, 4)]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(weeks, weekly_total_py, 'bo-', label="Observed", linewidth=2)
ax.plot(forecast_weeks, forecast_res, 'rs--', label="Forecast", linewidth=2, markersize=8)
ax.axvline(x=10.5, color='grey', linestyle=':', alpha=0.7)
ax.set_title("3-Week Forecast: Total Merchant Transaction Volume")
ax.set_xlabel("Week"); ax.set_ylabel("Total Volume")
ax.legend(); plt.tight_layout()
plt.savefig("arima_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code
print(f"\nForecast for Weeks 11–13: {forecast_res.round(0).tolist()}")

Forecast for Weeks 11–13: [7438.0, 7640.0, 7818.0]

Plain-Language Interpretation:
Platform-wide weekly merchant volume has grown consistently from approximately 4,986 (Week 1) to 7,207 (Week 10) — a 44.5% increase over 10 weeks. The ARIMA model confirms a positive trend. The 3-week forecast projects continued growth, suggesting that current reactivation efforts are having a positive portfolio-level effect. However, since the series is short (10 periods), forecast intervals are wide — interpret the direction (growth) with confidence but treat specific volume numbers as indicative rather than precise.


10. Integrated Findings

The five analytical techniques collectively tell a coherent story about merchant churn on this fintech platform:

The EDA revealed that 18.5% of declined merchants (40 of 216) are fully dormant — they have shown zero transaction activity for 8 or more of the 10 observed weeks. The remaining 81.5% show some level of recovery activity, though with highly variable rates.

The classification model confirmed that Zero_Weeks and Recovery_Rate are the most powerful predictors of permanent churn, with both the Logistic Regression and Decision Tree achieving AUC scores above 0.90. This means we can predict with high confidence which merchants are lost before committing sales resources.

The clustering analysis revealed three natural merchant segments: a “Recovering Stars” group (high volume, bouncing back), a “Dormant Lost” group (likely permanently churned), and an “At-Risk Fence-Sitters” group whose trajectory will be determined by intervention. The PCA biplot confirmed these segments are genuinely distinct in the data.

The time series analysis provides the optimistic context: despite significant individual merchant churn, the overall platform volume is recovering strongly — growing 44.5% over 10 weeks. This means reactivation efforts are working at the aggregate level, but attention is needed to prevent the “At-Risk” segment from tipping into dormancy.

Single Integrated Recommendation: Deploy a targeted three-tier retention strategy: (1) Assign dedicated account managers to the “At-Risk Fence-Sitters” cluster with personalised reactivation incentives — this is the highest ROI group; (2) Run a low-cost automated win-back campaign for “Dormant Lost” merchants; (3) Provide loyalty rewards to “Recovering Stars” to sustain their momentum. This data-driven segmentation approach is estimated to be significantly more efficient than the current uniform outreach strategy.


11. Limitations & Further Work

Data limitations: The time series covers only 10 weekly periods, which limits ARIMA forecast accuracy. With 24+ months of weekly data, a seasonal decomposition and more robust forecasting (e.g., Prophet) would be possible.

Label definition: The Churned label (Zero_Weeks ≥ 8) is a pragmatic operational definition, not a formally validated churn definition. A business-defined churn event (e.g., contract cancellation) would strengthen the classification analysis.

Clustering stability: K-Means is sensitive to initial centroid placement and assumes spherical clusters. Future work should compare with hierarchical clustering (Ward linkage) or DBSCAN to validate segment stability.

Causality: The correlation between Days_Decline and churn does not imply that time causes churn — both may be driven by an unobserved variable (e.g., competitor onboarding). A controlled experiment (randomly assigning merchants to intervention vs. control groups) would establish causal effects.

External variables: The analysis does not include external factors such as sector (retail, hospitality, logistics), region, or macroeconomic conditions that may explain variation in recovery rates.


References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Femi-Akinola, D. (2026). Declined top businesses transaction volume dataset [Dataset]. Collected from Sales Operations Dashboard, Fintech Organisation, Lagos, Nigeria. Data available on request from the author.

Kuhn, M. (2022). caret: Classification and regression training (R package). https://CRAN.R-project.org/package=caret

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (pp. 92–96).

Therneau, T., & Atkinson, B. (2023). rpart: Recursive partitioning and regression trees (R package). https://CRAN.R-project.org/package=rpart

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4


Appendix: AI Usage Statement

Claude (Anthropic) was used as a coding assistant to help structure the Quarto document template, suggest appropriate R and Python package choices, and generate initial code scaffolding for the five analytical techniques. All analytical decisions — including the choice of CS 2, the definition of the Churned outcome variable (Zero_Weeks ≥ 8), the selection of predictor variables, the interpretation of all model outputs, the cluster naming and business recommendations, and the integrated conclusion — were made independently by the author based on domain knowledge as a Sales Manager in the Nigerian fintech sector. The dataset was collected directly from the author’s organisation’s merchant monitoring dashboard and is genuine primary data. All interpretations and recommendations reflect the author’s own professional judgement.