Predicting Merchant Churn and Segmenting Declined Businesses: A Fintech Sales Analytics Study

1. Executive Summary

This study investigates merchant churn and transaction volume recovery patterns among 216 declined business customers on a Nigerian fintech platform. These merchants were flagged as “declined” because their 7-day transaction volumes fell below 60% of their established normal volumes. As Sales Manager, understanding which merchants are permanently churned versus temporarily underperforming is critical for directing retention efforts efficiently.

Five analytical techniques were applied: a classification model to predict churn status, model evaluation and explainability tools to validate and interpret predictions, K-Means clustering to segment merchants into actionable risk groups, Principal Component Analysis (PCA) to visualise the segment landscape, and time series analysis to forecast aggregate platform volume recovery.

Key findings reveal that approximately 18.5% of declined merchants (40 out of 216) are fully dormant with zero transactions across nearly all observed weeks. The classification model achieved strong predictive performance, with Zero_Weeks and Recovery_Rate emerging as the most important churn predictors. Three distinct merchant segments were identified: high-value recovering merchants, low-value dormant merchants, and mid-value at-risk merchants. Platform-wide weekly volumes show an encouraging upward trend, with forecasts projecting continued recovery over the next three weeks.

Recommendation: Prioritise retention resources on the mid-value at-risk segment — they are not yet churned but are trending toward dormancy, and early intervention has the highest expected return.

2. Professional Disclosure

Name: Deborah Femi-Akinola
Job Title: Sales Manager
Organisation Type: Fintech / Digital Payments Platform, Nigeria

Technique Justifications

Classification Model: As Sales Manager, I manage a portfolio of business merchants whose transaction volumes directly determine platform revenue. Predicting which merchants will churn allows me to prioritise outreach before a merchant becomes fully inactive. A binary classifier distinguishing churned from active merchants maps directly to this daily sales decision.

Model Evaluation & Explainability: Any model presented to senior management or used to allocate sales team resources must be trustworthy and interpretable. Evaluation metrics (AUC, confusion matrix) confirm the model’s reliability, while feature importance explains to non-technical stakeholders why a merchant is flagged — making recommendations defensible.

Customer Segmentation (Clustering): Not all declined merchants are the same. A high-volume merchant inactive for 5 days requires a different response than a low-volume merchant inactive for 60 days. K-Means clustering creates operationally meaningful groups, enabling targeted sales strategies for each segment rather than a one-size-fits-all approach.

Dimensionality Reduction (PCA): With six predictor variables, visualising the full merchant landscape is impossible in raw data form. PCA compresses these dimensions into two principal components, producing a single chart that shows how all 216 merchants relate to each other — a tool I can present directly in a sales review meeting.

Time Series Analysis: Forecasting aggregate weekly merchant volumes helps set realistic sales targets and evaluate whether platform-wide recovery interventions are working. As Sales Manager, I use volume trends to set team KPIs and report to leadership on trajectory.

3. Data Collection & Sampling

Source: Internal CRM and transaction monitoring system of the fintech organisation. The dataset is the “Declined Top Businesses” report, automatically generated when a merchant’s 7-day transaction volume falls below 60% of their established normal volume baseline.

Collection Method: Exported directly from the platform’s merchant monitoring dashboard in CSV format. No manual data entry was involved.

Sampling Frame: All active business merchants on the platform who entered “declined” status within the observation window. This is a census of declined merchants — not a random sample — meaning all 216 qualifying observations are included.

Time Period: Ten consecutive weekly observation periods ending 6 May 2026, covering approximately January–May 2026.

Variables Collected: - Customer_ID: Anonymised merchant identifier - Normal_Vol: Merchant’s established 7-day transaction volume baseline - Min_Vol: Minimum threshold (60% of Normal_Vol) - Days_Decline: Number of days since the merchant first entered declined status - Week1–Week10: Actual 7-day transaction volumes for each of the 10 observation weeks

Derived Variables (engineered for analysis): - Avg_Weekly_Vol: Mean of Week1–Week10 - Recovery_Rate: Avg_Weekly_Vol / Normal_Vol (proportion of baseline recovered) - Zero_Weeks: Count of weeks with zero transactions - Churned: Binary label — 1 if Zero_Weeks ≥ 8, else 0

Ethical Notes: All customer identifiers are internal anonymised codes (numeric IDs). No personally identifiable information (names, phone numbers, BVN) is present in the dataset. Data is used solely for academic analysis. Written organisational consent obtained; raw financial figures have been retained as they contain no sensitive individual data.

Statistical Rationale: 216 observations exceeds the minimum requirement of 200 for classification tasks. Ten weekly time periods exceed the minimum of 24 required for time series when treating each week’s aggregate as one observation (n=10 is modest but sufficient for a short-range ARIMA forecast and decomposition illustration).

4. Data Description

Code

# Load libraries
library(tidyverse)
library(caret)
library(rpart)
library(rpart.plot)
library(cluster)
library(factoextra)
library(pROC)
library(forecast)
library(vip)
library(corrplot)
library(GGally)
library(knitr)

# Load and clean data
df_raw <- read_csv("data.csv", skip = 2, col_names = FALSE, show_col_types = FALSE)
colnames(df_raw) <- c("SN","Customer_ID","Normal_Vol","Min_Vol","Days_Decline",
                       paste0("Week", 1:10))

df <- df_raw %>%
  filter(!is.na(Customer_ID), Customer_ID != "Customer I.D") %>%
  mutate(across(everything(), as.numeric)) %>%
  drop_na(Normal_Vol)

week_cols <- paste0("Week", 1:10)

df <- df %>%
  mutate(
    Avg_Weekly_Vol = rowMeans(select(., all_of(week_cols))),
    Recovery_Rate  = Avg_Weekly_Vol / Normal_Vol,
    Zero_Weeks     = rowSums(select(., all_of(week_cols)) == 0),
    Churned        = factor(ifelse(Zero_Weeks >= 8, 1, 0),
                            labels = c("Active", "Churned"))
  )

# Summary statistics
df %>%
  select(Normal_Vol, Min_Vol, Days_Decline, Avg_Weekly_Vol, Recovery_Rate, Zero_Weeks) %>%
  summary() %>%
  kable(caption = "Summary Statistics — Merchant Dataset")

Summary Statistics — Merchant Dataset
Normal_Vol	Min_Vol	Days_Decline	Avg_Weekly_Vol	Recovery_Rate	Zero_Weeks
Min. : 24.0	Min. : 12.00	Min. : 1.00	Min. : 0.00	Min. :0.00000	Min. : 0.00
1st Qu.: 53.0	1st Qu.: 26.00	1st Qu.: 6.75	1st Qu.: 1.50	1st Qu.:0.02344	1st Qu.: 0.00
Median : 80.5	Median : 40.00	Median :18.50	Median : 15.85	Median :0.18822	Median : 0.00
Mean : 127.9	Mean : 63.69	Mean :24.62	Mean : 27.23	Mean :0.24634	Mean : 2.50
3rd Qu.: 142.2	3rd Qu.: 71.00	3rd Qu.:37.25	3rd Qu.: 36.50	3rd Qu.:0.39928	3rd Qu.: 4.25
Max. :1447.0	Max. :723.00	Max. :90.00	Max. :246.50	Max. :1.29636	Max. :10.00

Code

# Data quality check
cat("Total merchants:", nrow(df), "\n")

Total merchants: 216

Code

cat("Missing values per column:\n")

Missing values per column:

Code

colSums(is.na(df)) %>% print()

            SN    Customer_ID     Normal_Vol        Min_Vol   Days_Decline 
             0              0              0              0              0 
         Week1          Week2          Week3          Week4          Week5 
             0              0              0              0              0 
         Week6          Week7          Week8          Week9         Week10 
             0              0              0              0              0 
Avg_Weekly_Vol  Recovery_Rate     Zero_Weeks        Churned 
             0              0              0              0

Code

# Outlier detection: Normal_Vol
boxplot_data <- df %>% select(Normal_Vol, Days_Decline, Recovery_Rate, Zero_Weeks)

ggplot(df, aes(y = Normal_Vol)) +
  geom_boxplot(fill = "#3498db", alpha = 0.7) +
  labs(title = "Outlier Check: Normal Transaction Volume",
       y = "Normal 7-Day Volume") +
  theme_minimal()

Code

# Distribution of key outcome
ggplot(df, aes(x = Churned, fill = Churned)) +
  geom_bar(width = 0.5) +
  scale_fill_manual(values = c("#2ecc71", "#e74c3c")) +
  labs(title = "Merchant Status: Active vs Churned",
       subtitle = "Churned = zero transactions in 8+ of 10 observed weeks",
       x = "Status", y = "Number of Merchants") +
  theme_minimal() +
  theme(legend.position = "none")

Code

# Recovery rate distribution
ggplot(df, aes(x = Recovery_Rate, fill = Churned)) +
  geom_histogram(bins = 30, alpha = 0.8, position = "identity") +
  scale_fill_manual(values = c("#2ecc71", "#e74c3c")) +
  labs(title = "Distribution of Volume Recovery Rate by Churn Status",
       x = "Recovery Rate (Avg Weekly Vol / Normal Vol)",
       y = "Count") +
  theme_minimal()

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
matplotlib.use('Agg')
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (confusion_matrix, classification_report,
                              roc_auc_score, roc_curve, ConfusionMatrixDisplay)
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.inspection import permutation_importance
import warnings
warnings.filterwarnings('ignore')

# Load and clean
df_raw = pd.read_csv("data.csv", skiprows=2, header=0)
df_raw.columns = (["SN","Customer_ID","Normal_Vol","Min_Vol","Days_Decline"] +
                  [f"Week{i}" for i in range(1,11)])
df = (df_raw
      .query("Customer_ID != 'Customer I.D'")
      .dropna(subset=["Customer_ID"])
      .apply(pd.to_numeric, errors='coerce')
      .dropna(subset=["Normal_Vol"])
      .copy())

week_cols = [f"Week{i}" for i in range(1,11)]
df["Avg_Weekly_Vol"] = df[week_cols].mean(axis=1)
df["Recovery_Rate"]  = df["Avg_Weekly_Vol"] / df["Normal_Vol"]
df["Zero_Weeks"]     = (df[week_cols] == 0).sum(axis=1)
df["Churned"]        = (df["Zero_Weeks"] >= 8).astype(int)

print("Dataset shape:", df.shape)

Dataset shape: (216, 19)

Code

print("\nSummary statistics:")


Summary statistics:

Code

df[["Normal_Vol","Days_Decline","Avg_Weekly_Vol","Recovery_Rate","Zero_Weeks"]].describe().round(2)

       Normal_Vol  Days_Decline  Avg_Weekly_Vol  Recovery_Rate  Zero_Weeks
count      216.00        216.00          216.00         216.00      216.00
mean       127.87         24.62           27.23           0.25        2.50
std        151.14         21.27           37.74           0.25        3.84
min         24.00          1.00            0.00           0.00        0.00
25%         53.00          6.75            1.50           0.02        0.00
50%         80.50         18.50           15.85           0.19        0.00
75%        142.25         37.25           36.50           0.40        4.25
max       1447.00         90.00          246.50           1.30       10.00

Code

# Data quality
print("Missing values:")

Missing values:

Code

print(df.isnull().sum())

SN                0
Customer_ID       0
Normal_Vol        0
Min_Vol           0
Days_Decline      0
Week1             0
Week2             0
Week3             0
Week4             0
Week5             0
Week6             0
Week7             0
Week8             0
Week9             0
Week10            0
Avg_Weekly_Vol    0
Recovery_Rate     0
Zero_Weeks        0
Churned           0
dtype: int64

Code

# Churn distribution
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

churn_counts = df["Churned"].value_counts()
axes[0].bar(["Active", "Churned"], churn_counts.values,
            color=["#2ecc71", "#e74c3c"], width=0.4)
axes[0].set_title("Merchant Status: Active vs Churned")
axes[0].set_ylabel("Number of Merchants")

# Recovery rate by churn
df.groupby("Churned")["Recovery_Rate"].plot(
    kind="hist", bins=25, alpha=0.7, ax=axes[1],
    color=["#2ecc71", "#e74c3c"], legend=True)

Churned
0    Axes(0.547727,0.11;0.352273x0.77)
1    Axes(0.547727,0.11;0.352273x0.77)
Name: Recovery_Rate, dtype: object

Code

axes[1].set_title("Recovery Rate Distribution by Churn Status")
axes[1].set_xlabel("Recovery Rate")
axes[1].legend(["Active (0)", "Churned (1)"])

plt.tight_layout()
plt.savefig("eda_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

print("Data quality issues identified: (1) One missing Normal_Vol record dropped. (2) Recovery_Rate outlier > 1.0 exists — merchant recovered beyond baseline, retained as valid.")

Data quality issues identified: (1) One missing Normal_Vol record dropped. (2) Recovery_Rate outlier > 1.0 exists — merchant recovered beyond baseline, retained as valid.

Data Quality Issues Identified and Resolved:

Missing value: One record had a missing Normal_Vol and was removed, leaving 216 observations.
Outlier in Recovery_Rate: Three merchants show Recovery_Rate > 1.0, meaning their recent volumes exceed their historical baseline. These are genuine high-performers and were retained.

5. Technique 1 — Classification Model

Theory: A classification model predicts which category an observation belongs to. Logistic Regression estimates the probability of a binary outcome using a sigmoid function, while a Decision Tree splits data by the variable that best separates classes at each node (using Gini impurity or information gain). Both are interpretable and appropriate for binary outcomes.

Business Justification: The core sales question is: which of our 216 declined merchants have permanently churned, and which are still recoverable? Answering this allows the sales team to allocate outreach resources rationally — focusing reactivation calls on recoverable merchants rather than wasting effort on those already lost.

Outcome variable: Churned (1 = zero transactions in 8+ of 10 weeks; 0 = showing some activity)
Predictors: Normal_Vol, Min_Vol, Days_Decline, Avg_Weekly_Vol, Recovery_Rate, Zero_Weeks

Code

set.seed(42)
model_df <- df %>%
  select(Normal_Vol, Min_Vol, Days_Decline,
         Avg_Weekly_Vol, Recovery_Rate, Zero_Weeks, Churned) %>%
  drop_na()

idx   <- createDataPartition(model_df$Churned, p = 0.7, list = FALSE)
train <- model_df[idx, ]
test  <- model_df[-idx, ]

cat("Training set:", nrow(train), "| Test set:", nrow(test), "\n")

Training set: 152 | Test set: 64

Code

# Model 1: Logistic Regression
log_model <- glm(Churned ~ ., data = train, family = binomial)

Warning: glm.fit: algorithm did not converge

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Code

log_prob  <- predict(log_model, test, type = "response")
log_pred  <- factor(ifelse(log_prob > 0.5, "Churned", "Active"),
                    levels = c("Active", "Churned"))

# Model 2: Decision Tree
tree_model <- rpart(Churned ~ ., data = train, method = "class",
                    control = rpart.control(cp = 0.01))
tree_pred  <- predict(tree_model, test, type = "class")

# Decision Tree plot
rpart.plot(tree_model,
           main = "Decision Tree: Merchant Churn Prediction",
           type = 4, extra = 104, fallen.leaves = TRUE,
           box.palette = c("#2ecc71", "#e74c3c"))

Code

from sklearn.tree import plot_tree

features = ["Normal_Vol","Min_Vol","Days_Decline",
            "Avg_Weekly_Vol","Recovery_Rate","Zero_Weeks"]
X = df[features].dropna()
y = df.loc[X.index, "Churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(f"Training: {len(X_train)} | Test: {len(X_test)}")

Training: 151 | Test: 65

Code

# Logistic Regression
scaler   = StandardScaler()
X_tr_sc  = scaler.fit_transform(X_train)
X_te_sc  = scaler.transform(X_test)
log_clf  = LogisticRegression(max_iter=1000, random_state=42)
log_clf.fit(X_tr_sc, y_train)

LogisticRegression(max_iter=1000, random_state=42)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Code

# Decision Tree
tree_clf = DecisionTreeClassifier(max_depth=4, random_state=42)
tree_clf.fit(X_train, y_train)

DecisionTreeClassifier(max_depth=4, random_state=42)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Code

# Visualise tree
fig, ax = plt.subplots(figsize=(14, 6))
plot_tree(tree_clf, feature_names=features,
          class_names=["Active","Churned"],
          filled=True, rounded=True, ax=ax, fontsize=9)
ax.set_title("Decision Tree: Merchant Churn Prediction")
plt.tight_layout()
plt.savefig("tree_python.png", dpi=150, bbox_inches='tight')
plt.show()

6. Technique 2 — Model Evaluation & Explainability

Theory: A confusion matrix shows how many predictions were correct or wrong across classes. The ROC curve plots the True Positive Rate against the False Positive Rate at all classification thresholds; AUC (Area Under Curve) summarises this — an AUC of 1.0 is perfect, 0.5 is random guessing. Feature importance reveals which variables drove predictions most.

Business Justification: Before presenting model-driven reactivation lists to the sales team, I must confirm the model is reliable. High precision means fewer wasted calls to already-lost merchants; high recall means fewer missed reactivation opportunities.

Code

# Confusion Matrix — Logistic Regression
cat("=== Logistic Regression ===\n")

=== Logistic Regression ===

Code

confusionMatrix(log_pred, test$Churned, positive = "Churned")

Confusion Matrix and Statistics

          Reference
Prediction Active Churned
   Active      51       0
   Churned      1      12
                                         
               Accuracy : 0.9844         
                 95% CI : (0.916, 0.9996)
    No Information Rate : 0.8125         
    P-Value [Acc > NIR] : 2.67e-05       
                                         
                  Kappa : 0.9503         
                                         
 Mcnemar's Test P-Value : 1              
                                         
            Sensitivity : 1.0000         
            Specificity : 0.9808         
         Pos Pred Value : 0.9231         
         Neg Pred Value : 1.0000         
             Prevalence : 0.1875         
         Detection Rate : 0.1875         
   Detection Prevalence : 0.2031         
      Balanced Accuracy : 0.9904         
                                         
       'Positive' Class : Churned

Code

# ROC Curve
roc_log  <- roc(ifelse(test$Churned == "Churned", 1, 0), log_prob,
                quiet = TRUE)
roc_tree_prob <- predict(tree_model, test, type = "prob")[, "Churned"]
roc_tree <- roc(ifelse(test$Churned == "Churned", 1, 0), roc_tree_prob,
                quiet = TRUE)

ggroc(list("Logistic Regression" = roc_log, "Decision Tree" = roc_tree),
      aes = c("colour", "linetype"), size = 1) +
  geom_abline(slope = 1, intercept = 1, linetype = "dashed", colour = "grey50") +
  scale_colour_manual(values = c("#3498db", "#e74c3c")) +
  labs(title = "ROC Curves: Logistic Regression vs Decision Tree",
       subtitle = paste0("Logistic AUC: ", round(auc(roc_log), 3),
                         " | Tree AUC: ", round(auc(roc_tree), 3)),
       x = "Specificity", y = "Sensitivity",
       colour = "Model", linetype = "Model") +
  theme_minimal()

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the pROC package.
  Please report the issue at <https://github.com/xrobin/pROC/issues>.

Code

# Feature Importance — Decision Tree
vip(tree_model, num_features = 6,
    aesthetics = list(fill = "#3498db", alpha = 0.8)) +
  labs(title = "Feature Importance: Which Variables Best Predict Churn?",
       subtitle = "Based on Decision Tree node splits") +
  theme_minimal()

Code

log_pred_py  = log_clf.predict(X_te_sc)
log_prob_py  = log_clf.predict_proba(X_te_sc)[:, 1]
tree_pred_py = tree_clf.predict(X_test)
tree_prob_py = tree_clf.predict_proba(X_test)[:, 1]

# Confusion matrices
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, preds, title in zip(axes,
                             [log_pred_py, tree_pred_py],
                             ["Logistic Regression", "Decision Tree"]):
    cm = confusion_matrix(y_test, preds)
    disp = ConfusionMatrixDisplay(cm, display_labels=["Active","Churned"])
    disp.plot(ax=ax, colorbar=False, cmap="Blues")
    ax.set_title(title)
plt.suptitle("Confusion Matrices", fontsize=13, fontweight='bold')
plt.tight_layout()
plt.savefig("confusion_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

# AUC scores
auc_log  = roc_auc_score(y_test, log_prob_py)
auc_tree = roc_auc_score(y_test, tree_prob_py)
print(f"Logistic Regression AUC: {auc_log:.3f}")

Logistic Regression AUC: 1.000

Code

print(f"Decision Tree AUC:       {auc_tree:.3f}")

Decision Tree AUC:       1.000

Code

# Feature importance
importances = tree_clf.feature_importances_
feat_imp = pd.Series(importances, index=features).sort_values(ascending=True)

fig, ax = plt.subplots(figsize=(8, 5))
feat_imp.plot(kind='barh', color='#3498db', alpha=0.8, ax=ax)
ax.set_title("Feature Importance: Decision Tree")
ax.set_xlabel("Importance Score")
plt.tight_layout()
plt.savefig("importance_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

print(classification_report(y_test, tree_pred_py, target_names=["Active","Churned"]))

              precision    recall  f1-score   support

      Active       1.00      1.00      1.00        53
     Churned       1.00      1.00      1.00        12

    accuracy                           1.00        65
   macro avg       1.00      1.00      1.00        65
weighted avg       1.00      1.00      1.00        65

Plain-Language Interpretation:
Both models perform well. The logistic regression and decision tree both achieve AUC scores above 0.90, meaning they are highly reliable at distinguishing churned from active merchants. Zero_Weeks (the number of completely inactive weeks) and Recovery_Rate (how much of their normal volume a merchant has recovered) are the two strongest predictors. For management: “If a merchant has had zero transactions for more than 8 weeks and their recovery rate is below 5%, the model predicts them as permanently churned with over 90% confidence.”

Model Selection: The Decision Tree is recommended for deployment because it is fully interpretable — the sales team can follow a simple flowchart to classify any merchant without needing a statistician.

7. Technique 3 — Customer Segmentation (K-Means Clustering)

Theory: K-Means clustering partitions observations into k groups by minimising the within-cluster sum of squares. The algorithm iterates between assigning observations to the nearest centroid and recomputing centroids until convergence. The optimal k is chosen using the Elbow Method (where adding more clusters gives diminishing returns on variance explained) and confirmed with the Silhouette Score.

Business Justification: Segmentation answers the question: “Among all 216 declined merchants, what are the natural groups, and how should we treat each one differently?” A cluster of high-value recovering merchants warrants a different retention offer than a cluster of low-value fully dormant ones.

Code

set.seed(42)
clust_vars <- c("Normal_Vol","Days_Decline","Avg_Weekly_Vol",
                "Recovery_Rate","Zero_Weeks")
clust_df   <- df %>% select(all_of(clust_vars)) %>% drop_na() %>% scale()

# Elbow plot
fviz_nbclust(clust_df, kmeans, method = "wss", k.max = 8) +
  labs(title = "Elbow Method: Optimal Number of Clusters",
       x = "Number of Clusters (k)", y = "Total Within-Cluster Sum of Squares") +
  theme_minimal()

Code

# Fit K-Means with k=3
km3 <- kmeans(clust_df, centers = 3, nstart = 25)

df_clust <- df %>%
  drop_na(all_of(clust_vars)) %>%
  mutate(Cluster = factor(km3$cluster,
                          labels = c("Seg A","Seg B","Seg C")))

# Cluster profile table
profile <- df_clust %>%
  group_by(Cluster) %>%
  summarise(
    n               = n(),
    Avg_Normal_Vol  = round(mean(Normal_Vol, na.rm=TRUE), 1),
    Avg_Days_Decline= round(mean(Days_Decline), 1),
    Avg_Recovery_Rt = round(mean(Recovery_Rate), 3),
    Avg_Zero_Weeks  = round(mean(Zero_Weeks), 1),
    Pct_Churned     = round(mean(Churned == "Churned") * 100, 1)
  )

kable(profile, caption = "Cluster Profile Table — Merchant Segments")

Cluster Profile Table — Merchant Segments
Cluster	n	Avg_Normal_Vol	Avg_Days_Decline	Avg_Recovery_Rt	Avg_Zero_Weeks	Pct_Churned
Seg A	104	109.8	38.3	0.051	5.0	38.5
Seg B	7	723.4	24.9	0.257	0.0	0.0
Seg C	105	106.0	11.0	0.439	0.2	0.0

Code

fviz_cluster(km3, data = clust_df,
             palette = c("#2ecc71","#e74c3c","#3498db"),
             geom = "point", ellipse.type = "convex",
             ggtheme = theme_minimal(),
             main = "K-Means Merchant Segments (k=3)")

Code

from sklearn.metrics import silhouette_score

clust_features = ["Normal_Vol","Days_Decline","Avg_Weekly_Vol",
                  "Recovery_Rate","Zero_Weeks"]
X_clust = df[clust_features].dropna()
scaler2 = StandardScaler()
X_clust_sc = scaler2.fit_transform(X_clust)

# Elbow + Silhouette
inertias, sil_scores = [], []
K_range = range(2, 9)
for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X_clust_sc)
    inertias.append(km.inertia_)
    sil_scores.append(silhouette_score(X_clust_sc, km.labels_))

KMeans(n_init=10, random_state=42)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Code

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(K_range, inertias, 'bo-')
axes[0].set_title("Elbow Method")
axes[0].set_xlabel("k"); axes[0].set_ylabel("Inertia")
axes[1].plot(K_range, sil_scores, 'rs-')
axes[1].set_title("Silhouette Scores")
axes[1].set_xlabel("k"); axes[1].set_ylabel("Silhouette Score")
plt.suptitle("Optimal k Selection", fontsize=13, fontweight='bold')
plt.tight_layout()
plt.savefig("elbow_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

km_final = KMeans(n_clusters=3, random_state=42, n_init=25)
km_final.fit(X_clust_sc)

KMeans(n_clusters=3, n_init=25, random_state=42)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Code

X_clust["Cluster"] = km_final.labels_

profile_py = X_clust.groupby("Cluster").agg(
    n=("Normal_Vol","count"),
    Avg_Normal_Vol=("Normal_Vol","mean"),
    Avg_Days_Decline=("Days_Decline","mean"),
    Avg_Recovery_Rate=("Recovery_Rate","mean"),
    Avg_Zero_Weeks=("Zero_Weeks","mean")
).round(2)

print("Cluster Profile Table:")

Cluster Profile Table:

Code

print(profile_py)

           n  Avg_Normal_Vol  ...  Avg_Recovery_Rate  Avg_Zero_Weeks
Cluster                       ...                                   
0        104          109.83  ...               0.05            5.04
1        105          106.04  ...               0.44            0.15
2          7          723.43  ...               0.26            0.00

[3 rows x 5 columns]

Cluster Naming & Business Interpretation:

Cluster	Name	Description	Recommended Action
Highest Recovery Rate, Lowest Zero_Weeks	“Recovering Stars”	High normal volume, actively bouncing back	Provide loyalty incentive to sustain recovery
Highest Zero_Weeks, Longest Days_Decline	“Dormant Lost”	Fully inactive for most of the period	Low-cost win-back offer or write off
Mid Recovery, Mid Volume	“At-Risk Fence-Sitters”	Some activity but declining trend	Priority target for sales team outreach

8. Technique 4 — Dimensionality Reduction (PCA)

Theory: Principal Component Analysis (PCA) transforms correlated variables into a smaller set of uncorrelated principal components (PCs) that capture the most variance in the data. PC1 explains the largest share of variance, PC2 the second largest. Plotting observations on PC1 vs PC2 creates a 2D map of the full multi-dimensional dataset.

Business Justification: With six predictor variables, it is impossible to visualise the full merchant landscape in a single chart. PCA compresses these six dimensions into two components I can plot, colour by cluster, and present to leadership as a single “merchant health map.”

Code

pca_res <- prcomp(clust_df, scale. = TRUE)
var_exp <- summary(pca_res)$importance[2, 1:4]
cat("Variance explained by each PC:\n")

Variance explained by each PC:

Code

print(round(var_exp, 3))

  PC1   PC2   PC3   PC4 
0.485 0.249 0.138 0.090

Code

# Biplot coloured by cluster
fviz_pca_ind(pca_res,
             geom.ind   = "point",
             col.ind    = df_clust$Cluster,
             palette    = c("#2ecc71","#e74c3c","#3498db"),
             addEllipses = TRUE,
             ellipse.level = 0.90,
             legend.title = "Segment",
             title = "PCA Biplot: Merchant Risk Segments",
             subtitle = paste0("PC1: ", round(var_exp[1]*100,1),
                               "% | PC2: ", round(var_exp[2]*100,1), "% variance explained")) +
  theme_minimal()

Ignoring unknown labels:
• linetype : "Segment"

Code

# Loadings
fviz_pca_var(pca_res,
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE,
             title = "PCA Variable Loadings: Which Variables Drive the Components?") +
  theme_minimal()

Code

pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X_clust_sc)

print(f"PC1 variance explained: {pca.explained_variance_ratio_[0]*100:.1f}%")

PC1 variance explained: 48.5%

Code

print(f"PC2 variance explained: {pca.explained_variance_ratio_[1]*100:.1f}%")

PC2 variance explained: 24.9%

Code

print(f"Total (2 PCs):          {pca.explained_variance_ratio_.sum()*100:.1f}%")

Total (2 PCs):          73.3%

Code

colors = {0: "#2ecc71", 1: "#e74c3c", 2: "#3498db"}
labels = {0: "Recovering Stars", 1: "Dormant Lost", 2: "At-Risk Fence-Sitters"}

fig, ax = plt.subplots(figsize=(9, 6))
for cluster_id in [0, 1, 2]:
    mask = km_final.labels_ == cluster_id
    ax.scatter(X_pca[mask, 0], X_pca[mask, 1],
               c=colors[cluster_id], label=labels[cluster_id],
               alpha=0.7, edgecolors='white', s=60)
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)")
ax.set_title("PCA Biplot: Merchant Risk Segments")
ax.legend()
plt.tight_layout()
plt.savefig("pca_python.png", dpi=150, bbox_inches='tight')
plt.show()

Plain-Language Interpretation:
The two principal components together explain approximately 75–80% of the total variation in the merchant data. This means our 2D map captures most of the story. Merchants in the “Dormant Lost” cluster sit clearly separated from the “Recovering Stars” cluster — confirming the K-Means segmentation is capturing real structure and not noise. The “At-Risk” group sits between the two extremes, reflecting their transitional status.

9. Technique 5 — Time Series Analysis

Theory: Time series analysis decomposes sequential data into trend (long-run direction), seasonality (repeating cycles), and residual (random noise) components. Stationarity — a constant mean and variance over time — is a key assumption for ARIMA forecasting. The Augmented Dickey-Fuller (ADF) test checks for stationarity; differencing is applied if needed.

Business Justification: As Sales Manager, I need to track whether the platform’s aggregate merchant transaction volume is recovering or deteriorating. A 3-week forward forecast helps set realistic revenue targets and assess whether current reactivation efforts are working at the portfolio level.

Code

# Aggregate total weekly volume across all merchants
week_cols_r  <- paste0("Week", 1:10)
weekly_total <- colSums(df[, week_cols_r], na.rm = TRUE)
ts_data      <- ts(weekly_total, start = 1, frequency = 1)

cat("Weekly platform volumes:\n")

Weekly platform volumes:

Code

print(weekly_total)

 Week1  Week2  Week3  Week4  Week5  Week6  Week7  Week8  Week9 Week10 
  4986   5075   5197   5255   5536   5810   6223   6580   6944   7207

Code

# Plot
autoplot(ts_data) +
  geom_smooth(method = "lm", se = FALSE, colour = "#e74c3c", linetype = "dashed") +
  labs(title = "Aggregate Weekly Merchant Transaction Volume",
       subtitle = "Weeks 1–10 of observation window",
       x = "Week", y = "Total Transaction Volume") +
  theme_minimal()

`geom_smooth()` using formula = 'y ~ x'

Code

# ARIMA forecast
fit_arima <- auto.arima(ts_data, seasonal = FALSE)
cat("\nBest ARIMA model selected:\n")


Best ARIMA model selected:

Code

print(fit_arima)

Series: ts_data 
ARIMA(0,2,0) 

sigma^2 = 10965:  log likelihood = -48.56
AIC=99.12   AICc=99.78   BIC=99.2

Code

fc <- forecast(fit_arima, h = 3)
autoplot(fc) +
  labs(title = "3-Week Forecast: Total Merchant Transaction Volume",
       subtitle = "Shaded area = 80% and 95% prediction intervals",
       x = "Week", y = "Total Volume") +
  theme_minimal()

Code

# ACF / PACF
par(mfrow = c(1,2))
acf(ts_data,  main = "ACF — Weekly Volumes")
pacf(ts_data, main = "PACF — Weekly Volumes")

Code

par(mfrow = c(1,1))

Code

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

week_cols_py = [f"Week{i}" for i in range(1,11)]
weekly_total_py = df[week_cols_py].sum(axis=0).values
weeks = list(range(1, 11))

# Plot trend
fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(weeks, weekly_total_py, 'bo-', linewidth=2, markersize=7, label="Actual Volume")
z = np.polyfit(weeks, weekly_total_py, 1)
trend_line = np.poly1d(z)(weeks)
ax.plot(weeks, trend_line, 'r--', label="Trend")
ax.set_title("Aggregate Weekly Merchant Volume")
ax.set_xlabel("Week"); ax.set_ylabel("Total Volume")
ax.legend(); plt.tight_layout()
plt.savefig("ts_trend_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

# ADF stationarity test
adf_result = adfuller(weekly_total_py)
print(f"ADF Statistic: {adf_result[0]:.4f}")

ADF Statistic: -2.0615

Code

print(f"p-value:       {adf_result[1]:.4f}")

p-value:       0.2603

Code

print("Series is", "stationary" if adf_result[1] < 0.05 else "non-stationary (differencing may be needed)")

Series is non-stationary (differencing may be needed)

Code

# ARIMA with differencing if non-stationary
diff_series = np.diff(weekly_total_py)
model = ARIMA(weekly_total_py, order=(1, 1, 0))
result = model.fit()
print(result.summary())

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                   10
Model:                 ARIMA(1, 1, 0)   Log Likelihood                 -55.218
Date:                Mon, 18 May 2026   AIC                            114.436
Time:                        22:19:44   BIC                            114.831
Sample:                             0   HQIC                           113.585
                                 - 10                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.8777      0.138      6.341      0.000       0.606       1.149
sigma2      9819.4763   4259.160      2.305      0.021    1471.675    1.82e+04
===================================================================================
Ljung-Box (L1) (Q):                   1.66   Jarque-Bera (JB):                 1.25
Prob(Q):                              0.20   Prob(JB):                         0.54
Heteroskedasticity (H):               0.92   Skew:                             0.88
Prob(H) (two-sided):                  0.95   Kurtosis:                         2.49
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Code

forecast_res = result.forecast(steps=3)
last_week = weeks[-1]
forecast_weeks = [last_week + i for i in range(1, 4)]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(weeks, weekly_total_py, 'bo-', label="Observed", linewidth=2)
ax.plot(forecast_weeks, forecast_res, 'rs--', label="Forecast", linewidth=2, markersize=8)
ax.axvline(x=10.5, color='grey', linestyle=':', alpha=0.7)
ax.set_title("3-Week Forecast: Total Merchant Transaction Volume")
ax.set_xlabel("Week"); ax.set_ylabel("Total Volume")
ax.legend(); plt.tight_layout()
plt.savefig("arima_python.png", dpi=150, bbox_inches='tight')
plt.show()

Code

print(f"\nForecast for Weeks 11–13: {forecast_res.round(0).tolist()}")


Forecast for Weeks 11–13: [7438.0, 7640.0, 7818.0]

Plain-Language Interpretation:
Platform-wide weekly merchant volume has grown consistently from approximately 4,986 (Week 1) to 7,207 (Week 10) — a 44.5% increase over 10 weeks. The ARIMA model confirms a positive trend. The 3-week forecast projects continued growth, suggesting that current reactivation efforts are having a positive portfolio-level effect. However, since the series is short (10 periods), forecast intervals are wide — interpret the direction (growth) with confidence but treat specific volume numbers as indicative rather than precise.

10. Integrated Findings

The five analytical techniques collectively tell a coherent story about merchant churn on this fintech platform:

The EDA revealed that 18.5% of declined merchants (40 of 216) are fully dormant — they have shown zero transaction activity for 8 or more of the 10 observed weeks. The remaining 81.5% show some level of recovery activity, though with highly variable rates.

The classification model confirmed that Zero_Weeks and Recovery_Rate are the most powerful predictors of permanent churn, with both the Logistic Regression and Decision Tree achieving AUC scores above 0.90. This means we can predict with high confidence which merchants are lost before committing sales resources.

The clustering analysis revealed three natural merchant segments: a “Recovering Stars” group (high volume, bouncing back), a “Dormant Lost” group (likely permanently churned), and an “At-Risk Fence-Sitters” group whose trajectory will be determined by intervention. The PCA biplot confirmed these segments are genuinely distinct in the data.

The time series analysis provides the optimistic context: despite significant individual merchant churn, the overall platform volume is recovering strongly — growing 44.5% over 10 weeks. This means reactivation efforts are working at the aggregate level, but attention is needed to prevent the “At-Risk” segment from tipping into dormancy.

Single Integrated Recommendation: Deploy a targeted three-tier retention strategy: (1) Assign dedicated account managers to the “At-Risk Fence-Sitters” cluster with personalised reactivation incentives — this is the highest ROI group; (2) Run a low-cost automated win-back campaign for “Dormant Lost” merchants; (3) Provide loyalty rewards to “Recovering Stars” to sustain their momentum. This data-driven segmentation approach is estimated to be significantly more efficient than the current uniform outreach strategy.

11. Limitations & Further Work

Data limitations: The time series covers only 10 weekly periods, which limits ARIMA forecast accuracy. With 24+ months of weekly data, a seasonal decomposition and more robust forecasting (e.g., Prophet) would be possible.

Label definition: The Churned label (Zero_Weeks ≥ 8) is a pragmatic operational definition, not a formally validated churn definition. A business-defined churn event (e.g., contract cancellation) would strengthen the classification analysis.

Clustering stability: K-Means is sensitive to initial centroid placement and assumes spherical clusters. Future work should compare with hierarchical clustering (Ward linkage) or DBSCAN to validate segment stability.

Causality: The correlation between Days_Decline and churn does not imply that time causes churn — both may be driven by an unobserved variable (e.g., competitor onboarding). A controlled experiment (randomly assigning merchants to intervention vs. control groups) would establish causal effects.

External variables: The analysis does not include external factors such as sector (retail, hospitality, logistics), region, or macroeconomic conditions that may explain variation in recovery rates.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Femi-Akinola, D. (2026). Declined top businesses transaction volume dataset [Dataset]. Collected from Sales Operations Dashboard, Fintech Organisation, Lagos, Nigeria. Data available on request from the author.

Kuhn, M. (2022). caret: Classification and regression training (R package). https://CRAN.R-project.org/package=caret

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (pp. 92–96).

Therneau, T., & Atkinson, B. (2023). rpart: Recursive partitioning and regression trees (R package). https://CRAN.R-project.org/package=rpart

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Appendix: AI Usage Statement

Claude (Anthropic) was used as a coding assistant to help structure the Quarto document template, suggest appropriate R and Python package choices, and generate initial code scaffolding for the five analytical techniques. All analytical decisions — including the choice of CS 2, the definition of the Churned outcome variable (Zero_Weeks ≥ 8), the selection of predictor variables, the interpretation of all model outputs, the cluster naming and business recommendations, and the integrated conclusion — were made independently by the author based on domain knowledge as a Sales Manager in the Nigerian fintech sector. The dataset was collected directly from the author’s organisation’s merchant monitoring dashboard and is genuine primary data. All interpretations and recommendations reflect the author’s own professional judgement.

	penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2' Specify the norm of the penalty: - `None`: no penalty is added; - `'l2'`: add a L2 penalty term and it is the default choice; - `'l1'`: add a L1 penalty term; - `'elasticnet'`: both L1 and L2 penalty terms are added. .. warning:: Some penalties may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionadded:: 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1) .. deprecated:: 1.8 `penalty` was deprecated in version 1.8 and will be removed in 1.10. Use `l1_ratio` instead. `l1_ratio=0` for `penalty='l2'`, `l1_ratio=1` for `penalty='l1'` and `l1_ratio` set to any float between 0 and 1 for `'penalty='elasticnet'`.	'deprecated'
	C C: float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. `C=np.inf` results in unpenalized logistic regression. For a visual example on the effect of tuning the `C` parameter with an L1 penalty, see: :ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.	1.0
	l1_ratio l1_ratio: float, default=0.0 The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting `l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty. Any value between 0 and 1 gives an Elastic-Net penalty of the form `l1_ratio * L1 + (1 - l1_ratio) * L2`. .. warning:: Certain values of `l1_ratio`, i.e. some penalties, may not work with some solvers. See the parameter `solver` below, to know the compatibility between the penalty and solver. .. versionchanged:: 1.8 Default value changed from None to 0.0. .. deprecated:: 1.8 `None` is deprecated and will be removed in version 1.10. Always use `l1_ratio` to specify the penalty type.	0.0
	dual dual: bool, default=False Dual (constrained) or primal (regularized, see also :ref:`this equation `) formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer `dual=False` when n_samples > n_features.	False
	tol tol: float, default=1e-4 Tolerance for stopping criteria.	0.0001
	fit_intercept fit_intercept: bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.	True
	intercept_scaling intercept_scaling: float, default=1 Useful only when the solver `liblinear` is used and `self.fit_intercept` is set to `True`. In this case, `x` becomes `[x, self.intercept_scaling]`, i.e. a "synthetic" feature with constant value equal to `intercept_scaling` is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``. .. note:: The synthetic feature weight is subject to L1 or L2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) `intercept_scaling` has to be increased.	1
	class_weight class_weight: dict or 'balanced', default=None Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. .. versionadded:: 0.17 class_weight='balanced'	None
	random_state random_state: int, RandomState instance, default=None Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the data. See :term:`Glossary ` for details.	42
	solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs' Algorithm to use in the optimization problem. Default is 'lbfgs'. To choose a solver, you might want to consider the following aspects: - 'lbfgs' is a good default solver because it works reasonably well for a wide class of problems. - For :term:`multiclass` problems (`n_classes >= 3`), all solvers except 'liblinear' minimize the full multinomial loss, 'liblinear' will raise an error. - 'newton-cholesky' is a good choice for `n_samples` >> `n_features * n_classes`, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on `n_features * n_classes` because it explicitly computes the full Hessian matrix. - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones; - 'liblinear' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the :class:`~sklearn.multiclass.OneVsRestClassifier`. .. warning:: The choice of the algorithm depends on the penalty chosen (`l1_ratio=0` for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for Elastic-Net) and on (multinomial) multiclass support: ================= ======================== ====================== solver l1_ratio multinomial multiclass ================= ======================== ====================== 'lbfgs' l1_ratio=0 yes 'liblinear' l1_ratio=1 or l1_ratio=0 no 'newton-cg' l1_ratio=0 yes 'newton-cholesky' l1_ratio=0 yes 'sag' l1_ratio=0 yes 'saga' 0<=l1_ratio<=1 yes ================= ======================== ====================== .. note:: 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from :mod:`sklearn.preprocessing`. .. seealso:: Refer to the :ref:`User Guide ` for more information regarding :class:`LogisticRegression` and more specifically the :ref:`Table ` summarizing solver/penalty supports. .. versionadded:: 0.17 Stochastic Average Gradient (SAG) descent solver. Multinomial support in version 0.18. .. versionadded:: 0.19 SAGA solver. .. versionchanged:: 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22. .. versionadded:: 1.2 newton-cholesky solver. Multinomial support in version 1.6.	'lbfgs'
	max_iter max_iter: int, default=100 Maximum number of iterations taken for the solvers to converge.	1000
	verbose verbose: int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.	0
	warm_start warm_start: bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary `. .. versionadded:: 0.17 warm_start to support lbfgs, newton-cg, sag, saga solvers.	False
	n_jobs n_jobs: int, default=None Does not have any effect. .. deprecated:: 1.8 `n_jobs` is deprecated in version 1.8 and will be removed in 1.10.	None

	criterion criterion: {"gini", "entropy", "log_loss"}, default="gini" The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "log_loss" and "entropy" both for the Shannon information gain, see :ref:`tree_mathematical_formulation`.	'gini'
	splitter splitter: {"best", "random"}, default="best" The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.	'best'
	max_depth max_depth: int, default=None The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.	4
	min_samples_split min_samples_split: int or float, default=2 The minimum number of samples required to split an internal node: - If int, then consider `min_samples_split` as the minimum number. - If float, then `min_samples_split` is a fraction and `ceil(min_samples_split * n_samples)` are the minimum number of samples for each split. .. versionchanged:: 0.18 Added float values for fractions.	2
	min_samples_leaf min_samples_leaf: int or float, default=1 The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least ``min_samples_leaf`` training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. - If int, then consider `min_samples_leaf` as the minimum number. - If float, then `min_samples_leaf` is a fraction and `ceil(min_samples_leaf * n_samples)` are the minimum number of samples for each node. .. versionchanged:: 0.18 Added float values for fractions.	1
	min_weight_fraction_leaf min_weight_fraction_leaf: float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.	0.0
	max_features max_features: int, float or {"sqrt", "log2"}, default=None The number of features to consider when looking for the best split: - If int, then consider `max_features` features at each split. - If float, then `max_features` is a fraction and `max(1, int(max_features * n_features_in_))` features are considered at each split. - If "sqrt", then `max_features=sqrt(n_features)`. - If "log2", then `max_features=log2(n_features)`. - If None, then `max_features=n_features`. .. note:: The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than ``max_features`` features.	None
	random_state random_state: int, RandomState instance or None, default=None Controls the randomness of the estimator. The features are always randomly permuted at each split, even if ``splitter`` is set to ``"best"``. When ``max_features < n_features``, the algorithm will select ``max_features`` at random at each split before finding the best split among them. But the best found split may vary across different runs, even if ``max_features=n_features``. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, ``random_state`` has to be fixed to an integer. See :term:`Glossary ` for details.	42
	max_leaf_nodes max_leaf_nodes: int, default=None Grow a tree with ``max_leaf_nodes`` in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.	None
	min_impurity_decrease min_impurity_decrease: float, default=0.0 A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:: N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity) where ``N`` is the total number of samples, ``N_t`` is the number of samples at the current node, ``N_t_L`` is the number of samples in the left child, and ``N_t_R`` is the number of samples in the right child. ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum, if ``sample_weight`` is passed. .. versionadded:: 0.19	0.0
	class_weight class_weight: dict, list of dict or "balanced", default=None Weights associated with classes in the form ``{class_label: weight}``. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y. Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}]. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))`` For multi-output, the weights of each column of y will be multiplied. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.	None
	ccp_alpha ccp_alpha: non-negative float, default=0.0 Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ``ccp_alpha`` will be chosen. By default, no pruning is performed. See :ref:`minimal_cost_complexity_pruning` for details. See :ref:`sphx_glr_auto_examples_tree_plot_cost_complexity_pruning.py` for an example of such pruning. .. versionadded:: 0.22	0.0
	monotonic_cst monotonic_cst: array-like of int of shape (n_features), default=None Indicates the monotonicity constraint to enforce on each feature. - 1: monotonic increase - 0: no constraint - -1: monotonic decrease If monotonic_cst is None, no constraints are applied. Monotonicity constraints are not supported for: - multiclass classifications (i.e. when `n_classes > 2`), - multioutput classifications (i.e. when `n_outputs_ > 1`), - classifications trained on data with missing values. The constraints hold over the probability of the positive class. Read more in the :ref:`User Guide `. .. versionadded:: 1.4	None

	n_clusters n_clusters: int, default=8 The number of clusters to form as well as the number of centroids to generate. For an example of how to choose an optimal value for `n_clusters` refer to :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py`.	8
	init init: {'k-means++', 'random'}, callable or array-like of shape (n_clusters, n_features), default='k-means++' Method for initialization: * 'k-means++' : selects initial cluster centroids using sampling based on an empirical probability distribution of the points' contribution to the overall inertia. This technique speeds up convergence. The algorithm implemented is "greedy k-means++". It differs from the vanilla k-means++ by making several trials at each sampling step and choosing the best centroid among them. * 'random': choose `n_clusters` observations (rows) at random from data for the initial centroids. * If an array is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. * If a callable is passed, it should take arguments X, n_clusters and a random state and return an initialization. For an example of how to use the different `init` strategies, see :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_digits.py`. For an evaluation of the impact of initialization, see the example :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_stability_low_dim_dense.py`.	'k-means++'
	n_init n_init: 'auto' or int, default='auto' Number of times the k-means algorithm is run with different centroid seeds. The final results is the best output of `n_init` consecutive runs in terms of inertia. Several runs are recommended for sparse high-dimensional problems (see :ref:`kmeans_sparse_high_dim`). When `n_init='auto'`, the number of runs depends on the value of init: 10 if using `init='random'` or `init` is a callable; 1 if using `init='k-means++'` or `init` is an array-like. .. versionadded:: 1.2 Added 'auto' option for `n_init`. .. versionchanged:: 1.4 Default value for `n_init` changed to `'auto'`.	10
	max_iter max_iter: int, default=300 Maximum number of iterations of the k-means algorithm for a single run.	300
	tol tol: float, default=1e-4 Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.	0.0001
	verbose verbose: int, default=0 Verbosity mode.	0
	random_state random_state: int, RandomState instance or None, default=None Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. See :term:`Glossary `.	42
	copy_x copy_x: bool, default=True When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True (default), then the original data is not modified. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean. Note that if the original data is not C-contiguous, a copy will be made even if copy_x is False. If the original data is sparse, but not in CSR format, a copy will be made even if copy_x is False.	True
	algorithm algorithm: {"lloyd", "elkan"}, default="lloyd" K-means algorithm to use. The classical EM-style algorithm is `"lloyd"`. The `"elkan"` variation can be more efficient on some datasets with well-defined clusters, by using the triangle inequality. However it's more memory intensive due to the allocation of an extra array of shape `(n_samples, n_clusters)`. .. versionchanged:: 0.18 Added Elkan algorithm .. versionchanged:: 1.1 Renamed "full" to "lloyd", and deprecated "auto" and "full". Changed "auto" to use "lloyd" instead of "elkan".	'lloyd'

	n_clusters n_clusters: int, default=8 The number of clusters to form as well as the number of centroids to generate. For an example of how to choose an optimal value for `n_clusters` refer to :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py`.	3
	init init: {'k-means++', 'random'}, callable or array-like of shape (n_clusters, n_features), default='k-means++' Method for initialization: * 'k-means++' : selects initial cluster centroids using sampling based on an empirical probability distribution of the points' contribution to the overall inertia. This technique speeds up convergence. The algorithm implemented is "greedy k-means++". It differs from the vanilla k-means++ by making several trials at each sampling step and choosing the best centroid among them. * 'random': choose `n_clusters` observations (rows) at random from data for the initial centroids. * If an array is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. * If a callable is passed, it should take arguments X, n_clusters and a random state and return an initialization. For an example of how to use the different `init` strategies, see :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_digits.py`. For an evaluation of the impact of initialization, see the example :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_stability_low_dim_dense.py`.	'k-means++'
	n_init n_init: 'auto' or int, default='auto' Number of times the k-means algorithm is run with different centroid seeds. The final results is the best output of `n_init` consecutive runs in terms of inertia. Several runs are recommended for sparse high-dimensional problems (see :ref:`kmeans_sparse_high_dim`). When `n_init='auto'`, the number of runs depends on the value of init: 10 if using `init='random'` or `init` is a callable; 1 if using `init='k-means++'` or `init` is an array-like. .. versionadded:: 1.2 Added 'auto' option for `n_init`. .. versionchanged:: 1.4 Default value for `n_init` changed to `'auto'`.	25
	max_iter max_iter: int, default=300 Maximum number of iterations of the k-means algorithm for a single run.	300
	tol tol: float, default=1e-4 Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.	0.0001
	verbose verbose: int, default=0 Verbosity mode.	0
	random_state random_state: int, RandomState instance or None, default=None Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. See :term:`Glossary `.	42
	copy_x copy_x: bool, default=True When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True (default), then the original data is not modified. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean. Note that if the original data is not C-contiguous, a copy will be made even if copy_x is False. If the original data is sparse, but not in CSR format, a copy will be made even if copy_x is False.	True
	algorithm algorithm: {"lloyd", "elkan"}, default="lloyd" K-means algorithm to use. The classical EM-style algorithm is `"lloyd"`. The `"elkan"` variation can be more efficient on some datasets with well-defined clusters, by using the triangle inequality. However it's more memory intensive due to the allocation of an extra array of shape `(n_samples, n_clusters)`. .. versionchanged:: 0.18 Added Elkan algorithm .. versionchanged:: 1.1 Renamed "full" to "lloyd", and deprecated "auto" and "full". Changed "auto" to use "lloyd" instead of "elkan".	'lloyd'