Diagnosing the Habit Loop: An Inferential Analysis of Bvndle User Engagement

Author

Shakiru Muraina

Published

May 16, 2026

1. Executive Summary

Bvndle’s 2026 refocus rests on one claim: that the platform’s earn-and-redeem habit loop is broken. This submission tests that claim against a census of all 161,088 Bvndle account holders as of 12 May 2026, drawn from the company’s production database, applying the five Case Study 1 techniques to a single question — what user characteristics actually predict whether a user redeems coins?

Key findings across the five techniques:

  • Exploratory data analysis found that 24 % of sign-ups never earn a coin and 47 % of those who do never spend one — the funnel’s largest leak is the activation-to-spend stage.
  • Visualisation showed the dominant failure mode is coin hoarding, not the cash-out behaviour the original strategy deck blamed.
  • Hypothesis testing found Partner_A onboarding and signup source produce no statistically meaningful difference in redemption or activation.
  • Correlation analysis identified coins earned as the metric most strongly associated with redemption, and tenure as associated with nothing.
  • Logistic regression confirmed coins earned as the one dominant lever (odds ratio ≈ 2.67), with partner status, signup source, tenure, and tier all non-significant.
Important

Recommendation: Re-centre the Bvndle 2026 plan on a single objective — increasing coins earned per activated user and converting the 57,389 coin hoarders into first-time spenders. Deprioritise the acquisition-channel and gamification-tier levers; the analysis shows they do not move redemption.

2. Professional Disclosure

2.1 My Role and Organisation

I am Technical Assistant to the Executive Director, Finance and Investor Relations at VFD Group Plc, and a part-time MBA candidate at Lagos Business School. Alongside my finance and IR mandate inside the parent company, I am consulting with Bvndle Loyalty Limited — a VFD Group portfolio company — on its 2026 strategic refocus.

Bvndle is a Nigerian loyalty-as-a-service platform that its 2026 strategy deck describes as having approximately 91,000 users and roughly 8 million coins in circulation (the production extract analysed here contains 161,088 registered accounts). The board’s most pressing unresolved question is whether the platform’s earn-and-redeem habit loop actually works: users earn coins easily, but the internal risk register shows that most of them cash out for naira rather than redeem with merchant partners. If that risk is real, the entire 2026 plan (its acquisition targets, its Loyalty-as-a-Service revenue thesis, and its sponsorship-funded events programme) rests on a broken core. This case study brings five analytical techniques to bear on that single question.

2.2 Why These Five Techniques Are Operationally Relevant

This case study applies the five techniques specified for Case Study 1 (Exploratory and Inferential Analytics) in the LBS DA II assessment brief. Each bears directly on a recurring decision that Bvndle’s leadership and board are taking now.

Technique 1: Exploratory Data Analysis

Exploratory data analysis sizes the activation gap (the share of users who registered but never earned a coin) and the coin reconciliation gap (the unexplained difference between coins earned, coins debited, and the current wallet balance). Both are operational issues that have to be quantified before any board narrative is honest. Without an honest EDA, the headline metrics in the original 2026 strategy deck — 91,000 users, 14 million coins created, 8.1 million in circulation — remain vanity numbers that obscure where the funnel actually leaks.

Technique 2: Data Visualisation

Visualisation shows where the loop is broken in one frame instead of seven slides. The board does not have time to parse numeric tables; it reads colour, shape, and trajectory. A signup-to-first-earn-to-first-redeem funnel, a cohort heatmap of activation by signup month, and a side-by-side comparison of earn volume against merchant spend tell the story of where users drop out and which behavioural pattern dominates.

Technique 3: Hypothesis Testing

Hypothesis testing decides with discipline whether observed differences are real or noise. The two hypotheses tested in Section 7 are: (i) whether redemption rates differ between partner-onboarded users and organically acquired users, and (ii) whether activation rates differ between API-sourced and web-sourced sign-ups. This matters because Bvndle is about to commit budget to partner acquisition and channel-mix decisions for 2026 — and “Partner A produces more redeemers” should not be claimed on the strength of a 10-percentage-point gap until that gap is shown to be reliable.
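To make the point about a 10-percentage-point gap concrete, the sketch below runs base R’s prop.test() on two sets of purely hypothetical counts (not Bvndle data) that share the same 30 % vs 20 % redemption split. Whether the same gap counts as reliable depends on how many users sit behind it.

```r
# Hypothetical counts only: the same 30% vs 20% split at two sample sizes.
# prop.test() is a two-sample test of equal proportions
# (Yates' continuity correction is applied by default).
small <- prop.test(x = c(6, 4), n = c(20, 20))
large <- prop.test(x = c(600, 400), n = c(2000, 2000))

gap_small <- unname(small$estimate[1] - small$estimate[2])
gap_large <- unname(large$estimate[1] - large$estimate[2])

cat(sprintf("n = 20 per group:   gap = %.2f, p = %.3f\n", gap_small, small$p.value))
cat(sprintf("n = 2000 per group: gap = %.2f, p = %.2g\n", gap_large, large$p.value))
```

The gap is identical in both cases; only the larger groups let the test distinguish it from noise.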

Technique 4: Correlation Analysis

Correlation analysis sorts the platform’s behavioural metrics by how strongly — and in which direction — they relate to redemption. The strategic insight in this dataset is in the signs: a user’s wallet balance is negatively associated with redemption, because a large balance is the signature of a coin hoarder rather than a user about to spend, while engagement metrics (coins earned, earn-event count) are positively associated. A second insight is a genuine non-correlation — how long a user has been on the platform tells you almost nothing about their behaviour. Bvndle’s coin-liability question — the ₦1–1.5 billion redemption exposure named in the refocus board pack — hangs on which of these metrics actually move redemption.
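The sign pattern described above can be illustrated with a rank (Spearman) correlation, which is one reasonable choice given the heavy right tails documented in Section 5; the report does not fix the correlation method here, so treat this as an assumption. The six users below are hypothetical and constructed only to show the hoarder signature: the three who never spent still hold every coin they earned.

```r
# Six hypothetical users: the non-redeemers still hold their whole balance
toy <- data.frame(
  coins_earned  = c(5, 10, 20, 50, 100, 200),
  coins_balance = c(5, 10, 20, 0, 0, 0),
  has_redeemed  = c(0, 0, 0, 1, 1, 1)
)

# Spearman correlation ranks both variables first, so a single
# extreme account cannot dominate the coefficient
rho_earned  <- cor(toy$coins_earned,  toy$has_redeemed, method = "spearman")
rho_balance <- cor(toy$coins_balance, toy$has_redeemed, method = "spearman")

round(c(earned = rho_earned, balance = rho_balance), 3)
```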

Technique 5: Logistic Regression

Logistic regression is the single deliverable a non-technical board can use directly. Modelling the probability that a user redeems as a function of engagement (coins earned), activity intensity (earn-event count), recency (days since last activity), partner-onboarding status, and signup source turns the analysis into specific actions: which acquisition pathways to double down on, which to drop, and which segment of users is the leakiest in the funnel. The regression’s odds ratios become the operational levers in the 2026 plan.
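A minimal sketch of how odds ratios come out of a logistic regression, using simulated data rather than the Bvndle extract. The data-generating process is an assumption chosen for illustration: redemption depends only on log1p(coins_earned), and the hypothetical is_partner_a flag has no effect. Exponentiating the fitted log-odds coefficients gives the odds ratios.

```r
set.seed(2026)
n <- 5000

# Simulated users (assumption: log-normal earnings, 20% partner-onboarded)
sim <- data.frame(
  coins_earned = round(rlnorm(n, meanlog = 3, sdlog = 1.5)),
  is_partner_a = rbinom(n, size = 1, prob = 0.2)
)

# Assumed truth: each one-unit rise in log1p(coins_earned) multiplies the odds by e;
# partner status plays no role
sim$has_redeemed <- rbinom(n, size = 1,
                           prob = plogis(-3 + 1.0 * log1p(sim$coins_earned)))

fit <- glm(has_redeemed ~ log1p(coins_earned) + is_partner_a,
           data = sim, family = binomial())

# Odds ratios: exp() of the log-odds coefficients
odds_ratios <- exp(coef(fit))
round(odds_ratios, 2)
```

An odds ratio near 1 (as expected here for is_partner_a) is the regression’s way of saying a lever does not move redemption.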

3. Data Collection & Sampling

3.1 Data Source

The analysis uses a user-level extract from Bvndle’s production database, provided directly by Bvndle’s data team under a one-off, project-scoped data sharing arrangement for academic use. The extract was authorised on 10 May 2026 by the Acting Managing Director, Ikechukwu Nwaguru, on three conditions: that all individual user identifiers be replaced before analysis; that only aggregate statistics, model coefficients, and distributional visualisations be published on RPubs; and that the raw extract remain in a private GitHub repository accessible only to the author and the LBS examiner.

The data team delivered two source files:

  • user_details_2.csv — the registration table, one row per Bvndle account holder (161,088 rows, 5 columns).
  • user_coins_dets_2.csv — the activity table, one row per user with any coin activity (122,861 rows, 9 columns).

A first version of the activity table (user_coins_details.csv, 122,845 rows, pulled the day before) was superseded when the data team re-extracted the data with two additional engagement-count columns (credit_count, debit_count). Per the data team’s confirmation, the second pull is the canonical record going forward; the first pull is retained in data/raw/ for the audit trail but is not read by the analytical pipeline.

3.2 Collection Method

The two tables were extracted by Bvndle’s internal data team directly from the platform’s production database. Anonymisation (replacement of internal user identifiers with the sequential codes U001 … U161088) and partner-brand masking (replacement of named third-party partners with the codes Partner_A, Partner_B, and Partner_C) were performed by Bvndle before the files left the company.

The two raw files were left-joined on user_id in the cleaning pipeline at R/00-clean.R, retaining all 161,088 sign-ups so that the 23.7 % of users with no coin activity remain visible in the analysis as the activation gap. The cleaning step derives twelve additional variables from the raw inputs: the coins_to_merchant metric, five rate and per-event variables, four binary helpers, and two date-difference fields. The full schema is documented in Section 4.1.
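The join step can be sketched in base R as below. This is a hypothetical miniature, not the contents of R/00-clean.R (which is not reproduced here); the user IDs and values are illustrative, and merge(..., all.x = TRUE) is used as the base-R equivalent of a left join.

```r
# Hypothetical miniature versions of the two source tables
users <- data.frame(
  user_id     = c("U001", "U002", "U003", "U004"),
  signup_date = as.Date(c("2024-01-05", "2024-03-10", "2025-06-01", "2026-02-11"))
)
coins <- data.frame(
  user_id         = c("U001", "U003"),
  coins_earned    = c(60, 18),
  coins_debited   = c(60, 7),
  coins_cashedout = c(0, 0)
)

# all.x = TRUE keeps every sign-up, including users with no activity row
bv_small <- merge(users, coins, by = "user_id", all.x = TRUE, sort = TRUE)

# A user is "activated" if they appear in the activity table at all
bv_small$is_activated      <- bv_small$user_id %in% coins$user_id
bv_small$coins_to_merchant <- bv_small$coins_debited - bv_small$coins_cashedout

bv_small
```

Non-activated users (U002, U004) survive the join with NA in every coin column, which is exactly the structural missingness examined in Section 5.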

3.3 Sampling Frame

The sampling frame is the full population of Bvndle account holders as of the extract date (12 May 2026): 161,088 users registered between 12 December 2023 (the platform’s launch) and 12 May 2026. No subsampling was applied — this is a census of the platform’s user base, not a sample of it.

3.4 Sample Size and Statistical Rationale

With N = 161,088, the analysis comfortably exceeds the 100-row floor specified in the assessment brief. At this sample size, statistical power for the planned chi-square and two-proportion tests, and for the logistic regression, is close to 1 for any effect large enough to matter commercially. Reporting therefore foregrounds effect sizes (Cramér’s V, odds ratios, marginal effects) rather than p-values, on the principle that at this N almost any non-zero difference is statistically significant but only some are managerially meaningful.
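The gap between statistical and managerial significance is easy to demonstrate. The sketch below uses a hypothetical 2×2 table (not Bvndle data) with a one-point difference in redemption rates (41 % vs 40 %) between groups sized to resemble a census of about 160,000 users, and computes Cramér’s V as sqrt(X² / (N × (min(rows, cols) − 1))).

```r
# Hypothetical 2x2 table: 41% vs 40% redemption in two large groups
tab <- matrix(c(12300, 30000 - 12300,
                52000, 130000 - 52000),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              redeemed = c("Yes", "No")))

chi     <- chisq.test(tab, correct = FALSE)
n_total <- sum(tab)

# Cramér's V: effect size on a 0-1 scale, independent of N
cramers_v <- sqrt(unname(chi$statistic) / (n_total * (min(dim(tab)) - 1)))

c(p_value = chi$p.value, cramers_v = cramers_v)
```

The p-value clears any conventional threshold, while V stays below 0.01, far too small to justify a budget decision on its own.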

3.5 Time Period Covered

The data covers 12 December 2023 (Bvndle launch, the earliest signup_date in the extract) through 12 May 2026 (extract date) — approximately 29 months of cumulative platform activity. The dataset is a point-in-time snapshot of lifetime metrics per user; it is not a longitudinal panel. Analyses requiring repeated observations of the same user over time (engagement decay, retention curves, survival modelling) are noted as out of scope in Section 11.

3.6 Ethical Notes and Data Sharing Restrictions

Bvndle’s user terms of service permit aggregate analytics of user behaviour for product and business purposes. For the avoidance of doubt, this academic submission uses only fully anonymised data; no individual user is identifiable in any published artefact. The Acting MD’s written approval and the data scope are recorded in the project’s private GitHub repository.

Because Bvndle is commercially sensitive — a VFD Group portfolio company in the middle of a board-level refocus — the submission is released across two artefacts:

  1. A public RPubs HTML containing all code, methodology, anonymised distributions, hypothesis test results, the correlation matrix, and the regression model and its coefficients. No raw row-level data is embedded.
  2. A private GitHub repository (URL supplied to the examiner only) containing the unredacted .qmd, the anonymised CSV, the partner-brand crosswalk, and the rendering scripts. This split satisfies the GitHub bonus while honouring the commercial-sensitivity restriction Bvndle imposed on the data.

The published RPubs is the artefact a stranger could read; the private GitHub is the artefact an examiner can audit. Together they meet the academic-integrity requirement that the work be fully reproducible from end to end.

4. Data Description

4.1 Profile of the User Base

The cleaned analysis frame data/clean/bvndle_users_anon.csv (output of R/00-clean.R) contains 24 columns across 161,088 rows: ten raw fields from the two source tables, two engagement-count fields added in the second coin extract, four binary helpers derived during cleaning, the coins_to_merchant metric, five rate and per-event variables (NA when the denominator is zero), and two date-difference columns. Partner brands are masked (Partner_A, Partner_B, Partner_C, Not_Onboarded); tier values are stripped of their Bvndle prefix (No_Tier, Starter, Fast_Lane, Fire_Mode, VIP).

Variable Type Description
user_id id (anonymised) Sequential code U001 … U161088 — no PII
signup_date date Account creation date (range: 12 Dec 2023 → 12 May 2026)
tier factor 5 levels: No_Tier (94.5 %), Starter, Fast_Lane, Fire_Mode, VIP
has_tier integer 0/1 1 if tier != "No_Tier"
signup_source factor Unknown (73.7 %), api, web
partner_brand factor Not_Onboarded, Partner_A, Partner_B, Partner_C
is_partner_a integer 0/1 1 if partner_brand == "Partner_A"
is_activated logical TRUE if the user appears in the coins table (i.e., has any coin activity)
coins_earned numeric Lifetime coins credited to the user (NA for non-activated users)
coins_debited numeric Lifetime coins debited from the wallet, including cash-outs (per Bvndle’s data team)
coins_cashedout numeric Subset of coins_debited converted to naira
coins_to_merchant numeric coins_debited − coins_cashedout — the merchant-spend metric
coins_balance numeric Spendable wallet balance at extract date
credit_count numeric Lifetime number of earn events per user (new in the second extract; proxies the missing missions_completed)
debit_count numeric Lifetime number of spend / cash-out events per user (new in the second extract)
coins_per_credit_event numeric coins_earned / credit_count — average coin size per earn action
coins_per_debit_event numeric coins_debited / debit_count — average coin size per spend action (NA when debit_count = 0)
redemption_rate numeric coins_to_merchant / coins_earned (NA when earned = 0)
cashout_rate numeric coins_cashedout / coins_earned (NA when earned = 0)
merchant_share_of_debit numeric coins_to_merchant / coins_debited (NA when debited = 0)
last_active_date date Most recent activity event
days_since_signup integer Day count vs the extract date (12 May 2026)
days_since_last_active integer Day count vs the extract date (NA for non-activated users)
has_redeemed integer 0/1 Outcome variable — 1 if coins_to_merchant > 0, else 0
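The derived columns in the dictionary above can be reproduced from the raw fields. The sketch below is a hypothetical base-R reconstruction, not the code in R/00-clean.R; the input values mirror the first three rows of the glimpse(bv) output in Section 4.1, and safe_ratio() is a helper introduced here only to apply the “NA when the denominator is zero” rule.

```r
extract_date <- as.Date("2026-05-12")

# Hypothetical raw inputs mirroring users U001-U003 in glimpse(bv)
u <- data.frame(
  user_id         = c("U001", "U002", "U003"),
  signup_date     = as.Date(c("2026-02-11", "2025-09-27", "2024-07-21")),
  coins_earned    = c(60, 1, 18),
  coins_debited   = c(60, 0, 7),
  coins_cashedout = c(0, 0, 0),
  credit_count    = c(21, 1, 14),
  debit_count     = c(56, 0, 3)
)

# Hypothetical helper: ratio that is NA whenever the denominator is zero or missing
safe_ratio <- function(num, den) {
  ifelse(is.na(den) | den == 0, NA_real_, num / den)
}

u$coins_to_merchant       <- u$coins_debited - u$coins_cashedout
u$redemption_rate         <- safe_ratio(u$coins_to_merchant, u$coins_earned)
u$cashout_rate            <- safe_ratio(u$coins_cashedout, u$coins_earned)
u$merchant_share_of_debit <- safe_ratio(u$coins_to_merchant, u$coins_debited)
u$coins_per_credit_event  <- safe_ratio(u$coins_earned, u$credit_count)
u$coins_per_debit_event   <- safe_ratio(u$coins_debited, u$debit_count)
u$days_since_signup       <- as.integer(extract_date - u$signup_date)

u[, c("user_id", "redemption_rate", "merchant_share_of_debit",
      "coins_per_credit_event", "days_since_signup")]
```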

Three deviations from the original data request brief, all documented in the data-provenance audit trail:

  1. signup_channel (App / Card / BRF / Partner_Referral / Other) was not available at this granularity. Bvndle delivered signup_source (api / web / Unknown) instead, which is a less informative split. The Bvndle Rewards Festival vs organic hypothesis originally planned is therefore replaced with API vs web in Section 7.
  2. missions_completed (the count of completed missions per user) was not available. As a partial substitute, the data team’s second extract added credit_count and debit_count — the lifetime number of earn and spend events per user — which capture engagement intensity even if not mission-specific. These are used as predictors in Section 9 in place of the originally planned mission count.
  3. The partner-brand field collapsed to effectively two states once the data was inspected: Partner_A (31,673 users) and Not_Onboarded (129,412 users). Partner_B and Partner_C between them account for three users and are retained as nominal levels but cannot support inference.

Categorical variables are converted to factors with explicit level orderings at load time so that summary tables and the regression treat them consistently.

Code
bv <- read_csv(
  here("data", "clean", "bvndle_users_anon.csv"),
  show_col_types = FALSE
) |>
  mutate(
    tier             = factor(tier,
                              levels = c("No_Tier", "Starter", "Fast_Lane",
                                         "Fire_Mode", "VIP")),
    signup_source    = factor(signup_source,
                              levels = c("Unknown", "api", "web")),
    partner_brand    = factor(partner_brand,
                              levels = c("Not_Onboarded", "Partner_A",
                                         "Partner_B", "Partner_C")),
    has_redeemed_f   = factor(has_redeemed, levels = c(0, 1),
                              labels = c("No", "Yes")),
    is_activated_f   = factor(is_activated, levels = c(FALSE, TRUE),
                              labels = c("No", "Yes")),
    signup_date      = as.Date(signup_date),
    last_active_date = as.Date(last_active_date)
  )

dim(bv)
[1] 161088     26
Code
glimpse(bv)
Rows: 161,088
Columns: 26
$ user_id                 <chr> "U001", "U002", "U003", "U004", "U005", "U006"…
$ signup_date             <date> 2026-02-11, 2025-09-27, 2024-07-21, 2024-08-0…
$ tier                    <fct> No_Tier, No_Tier, No_Tier, No_Tier, No_Tier, S…
$ has_tier                <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ signup_source           <fct> Unknown, Unknown, Unknown, Unknown, Unknown, a…
$ partner_brand           <fct> Not_Onboarded, Not_Onboarded, Not_Onboarded, N…
$ is_partner_a            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0…
$ is_activated            <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ coins_earned            <dbl> 60, 1, 18, 50, 10, 50, 10, 17, 3, 50, 50, 60, …
$ coins_debited           <dbl> 60, 0, 7, 50, 0, 50, 0, 13, 0, 50, 50, 0, 0, 0…
$ coins_cashedout         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0, 0, …
$ coins_to_merchant       <dbl> 60, 0, 7, 50, 0, 50, 0, 13, 0, 0, 50, 0, 0, 0,…
$ coins_balance           <dbl> 0, 1, 11, 0, 10, 0, 10, 4, 3, 0, 0, 60, 66, 10…
$ credit_count            <dbl> 21, 1, 14, 2, 1, 2, 1, 8, 2, 2, 2, 2, 53, 1, 4…
$ debit_count             <dbl> 56, 0, 3, 2, 0, 2, 0, 5, 0, 1, 1, 0, 0, 0, 40,…
$ coins_per_credit_event  <dbl> 2.857143, 1.000000, 1.285714, 25.000000, 10.00…
$ coins_per_debit_event   <dbl> 1.071429, NA, 2.333333, 25.000000, NA, 25.0000…
$ redemption_rate         <dbl> 1.0000000, 0.0000000, 0.3888889, 1.0000000, 0.…
$ cashout_rate            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
$ merchant_share_of_debit <dbl> 1, NA, 1, 1, NA, 1, NA, 1, NA, 0, 1, NA, NA, N…
$ last_active_date        <date> 2026-05-05, 2025-02-26, 2026-04-27, 2024-08-0…
$ days_since_signup       <dbl> 90, 227, 660, 646, 662, 167, 86, 60, 54, 649, …
$ days_since_last_active  <dbl> 7, 440, 15, 645, 662, 591, 658, 30, 49, 644, 5…
$ has_redeemed            <dbl> 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1…
$ has_redeemed_f          <fct> Yes, No, Yes, Yes, No, Yes, No, Yes, No, Yes, …
$ is_activated_f          <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Y…

The frame loads to 161,088 rows × 26 columns (24 from the CSV plus two factor versions of the binary outcomes added at load time). Numeric coin variables and the two date-difference fields are NA-bearing by construction — they are NA exactly for the 38,227 users who have no record in the activity table, which is the population we define as non-activated in Section 5.

Table 1 summarises the dataset stratified by activation status. Continuous columns report the median and inter-quartile range; categorical columns report counts and column proportions. Median values are reported rather than means because the coin variables have extreme right tails (see Section 5.3.4).

Code
bv |>
  select(tier, signup_source, partner_brand, is_activated_f,
         coins_earned, coins_debited, coins_cashedout, coins_balance,
         credit_count, debit_count,
         days_since_signup, days_since_last_active) |>
  tbl_summary(
    by = is_activated_f,
    missing = "no",
    statistic = list(
      all_continuous()  ~ "{median} ({p25}, {p75})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    label = list(
      tier                   ~ "Tier",
      signup_source          ~ "Signup source",
      partner_brand          ~ "Partner brand",
      coins_earned           ~ "Coins earned (lifetime)",
      coins_debited          ~ "Coins debited (lifetime)",
      coins_cashedout        ~ "Coins cashed out (lifetime)",
      coins_balance          ~ "Current wallet balance",
      credit_count           ~ "Earn events (count)",
      debit_count            ~ "Spend / cash-out events (count)",
      days_since_signup      ~ "Days since signup",
      days_since_last_active ~ "Days since last activity"
    )
  ) |>
  add_overall() |>
  modify_caption("**Table 1. User profile by activation status.** Continuous: median (Q1, Q3). Categorical: n (column %).")
Table 1. User profile by activation status. Continuous: median (Q1, Q3). Categorical: n (column %).
Characteristic Overall (N = 161,088)¹ No (N = 38,227)¹ Yes (N = 122,861)¹
Tier
    No_Tier 152,308 (95%) 36,178 (95%) 116,130 (95%)
    Starter 8,697 (5.4%) 2,034 (5.3%) 6,663 (5.4%)
    Fast_Lane 74 (<0.1%) 13 (<0.1%) 61 (<0.1%)
    Fire_Mode 8 (<0.1%) 2 (<0.1%) 6 (<0.1%)
    VIP 1 (<0.1%) 0 (0%) 1 (<0.1%)
Signup source
    Unknown 118,759 (74%) 28,300 (74%) 90,459 (74%)
    api 39,179 (24%) 9,202 (24%) 29,977 (24%)
    web 3,150 (2.0%) 725 (1.9%) 2,425 (2.0%)
Partner brand
    Not_Onboarded 129,412 (80%) 30,859 (81%) 98,553 (80%)
    Partner_A 31,673 (20%) 7,366 (19%) 24,307 (20%)
    Partner_B 1 (<0.1%) 1 (<0.1%) 0 (0%)
    Partner_C 2 (<0.1%) 1 (<0.1%) 1 (<0.1%)
Coins earned (lifetime) 21 (5, 50) NA (NA, NA) 21 (5, 50)
Coins debited (lifetime) 2 (0, 50) NA (NA, NA) 2 (0, 50)
Coins cashed out (lifetime) 0 (0, 0) NA (NA, NA) 0 (0, 0)
Current wallet balance 2 (0, 10) NA (NA, NA) 2 (0, 10)
Earn events (count) 2 (2, 7) NA (NA, NA) 2 (2, 7)
Spend / cash-out events (count) 1.0 (0.0, 2.0) NA (NA, NA) 1.0 (0.0, 2.0)
Days since signup 189 (67, 644) 190 (70, 644) 188 (67, 644)
Days since last activity 200 (26, 643) NA (NA, NA) 200 (26, 643)
¹ n (%); Median (Q1, Q3)

4.2 Analytical Framework and Research Question

4.2.1 The Central Analytical Question

Bvndle’s 2026 refocus board pack rests on one strategic claim: the platform’s earn-and-redeem habit loop is broken, and most users who earn coins cash them out rather than spend them with merchant partners. If that claim is wrong, the refocus is unnecessary. If it is right, the next question becomes which user characteristics predict whether a user lands on the merchant-spend side of the loop or the cash-out side — because those characteristics define where 2026 acquisition spend, retention effort, and merchant-network investment should concentrate.

The central question this submission answers is: what user-level characteristics — engagement intensity, partner-onboarding status, signup source, and recency — increase or decrease the probability that a Bvndle user redeems coins with merchants?

4.2.2 Technique-to-Decision Mapping

Technique What it answers Operational decision it supports
Exploratory Data Analysis (Section 5) What does the user base actually look like, and what data-quality issues must be named before any inference? Whether the 91k-user and 8.1M-coin headline numbers in the board deck stand up to scrutiny; whether the coin-liability question has a footable answer.
Visualisation (Section 6) Where does the funnel leak — at sign-up, at first earn, at first redemption, or after? Which funnel stage gets product and marketing investment in Q3 2026.
Hypothesis Testing (Section 7) Are partner-onboarded users meaningfully different from organic users? Are API-sourced sign-ups different from web-sourced sign-ups? Whether to continue investing in Partner A as the dominant acquisition channel, and whether the API-driven sign-up flow needs operational repair.
Correlation Analysis (Section 8) Which behavioural metrics move together, and which should but do not? Which two or three metrics deserve a place on the Bvndle 2026 KPI dashboard, replacing the vanity totals in the original deck.
Logistic Regression (Section 9) Holding everything else constant, which characteristic moves the redemption probability the most? The single-line recommendation in the board pack: “to lift redeemer count, do X.”

4.2.3 Structure of the Analytical Sections

Sections 5 through 9 each follow the same four-part pattern: a brief technique overview anchored to the relevant chapter of Adi (2026); an operational justification explaining why this technique is the right tool for the Bvndle question; the analysis and outputs (code, tables, figures, with code folded by default); and a management interpretation that translates the statistical result into language a non-technical board member can act on. Section 10 integrates findings across the five techniques into a single recommendation; Section 11 names what this analysis cannot conclude and what would need a different dataset.

5. Technique 1: Exploratory Data Analysis

5.1 Technique Overview

Exploratory data analysis (EDA) is the disciplined first look at a dataset before any modelling is attempted. Following Adi (2026, ch. 9), an EDA pass produces summary statistics, a missing-value map, distributional plots, and an outlier scan, then names the data-quality issues that downstream inference will have to live with. The goal is not to draw conclusions; it is to understand what the data is, where it leaks, and what shape it takes — so that every later modelling decision can be defended.

5.2 Operational Justification

For Bvndle specifically, an honest EDA is the precondition for every other technique in this submission. The platform’s headline metrics in the 2026 strategy deck (91,000 users, 14 million coins created, 8.1 million in circulation) are stated as if they describe a healthy product. But the 2026 risk register quietly admits that the earn-and-redeem loop does not work as designed. The EDA therefore has to do two specific things before Section 6 onwards can proceed: it has to size the activation gap (how many users never even start the loop), and it has to quantify the coin reconciliation gap (how much of the platform’s coin economy is unaccounted for). Without both, the rest of the analysis would build on numbers the board cannot trust.

5.3 Analysis and Outputs

The EDA surfaces two structural data-quality issues that the rest of the analysis treats explicitly, plus a heavy-tailed-outlier pattern that shapes the regression strategy in Section 9.

5.3.1 Missingness map

Code
bv |>
  summarise(across(everything(), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "variable", values_to = "n_missing") |>
  mutate(pct_missing = n_missing / nrow(bv) * 100) |>
  filter(n_missing > 0) |>
  arrange(desc(n_missing)) |>
  kable(digits = 1,
        col.names = c("Variable", "Missing (n)", "Missing (%)"),
        caption = "Variables with at least one missing value.")
Variables with at least one missing value.
Variable Missing (n) Missing (%)
coins_per_debit_event 95616 59.4
merchant_share_of_debit 95616 59.4
coins_earned 38227 23.7
coins_debited 38227 23.7
coins_cashedout 38227 23.7
coins_balance 38227 23.7
credit_count 38227 23.7
debit_count 38227 23.7
coins_per_credit_event 38227 23.7
redemption_rate 38227 23.7
cashout_rate 38227 23.7
last_active_date 38227 23.7
days_since_last_active 38227 23.7

Every NA in the table above falls into one of two patterns. The variables sourced from the activity table (coins_earned, coins_debited, coins_cashedout, coins_balance, credit_count, debit_count, last_active_date) and the derived variables coins_per_credit_event, redemption_rate, cashout_rate, and days_since_last_active are missing for the 38,227 users who do not appear in the activity table, i.e., users who registered but never earned a single coin. The two ratios with debits in the denominator (coins_per_debit_event, merchant_share_of_debit) are additionally missing for the 57,389 activated users who have never debited a coin (95,616 − 38,227). Both patterns are structural, not data-quality failures, and are retained.
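The two NA counts can be combined to size the segment of activated users who have never spent a coin, which matches the 57,389 coin hoarders and the 47 % figure quoted in Section 1. The only assumption is the one stated above: merchant_share_of_debit is NA solely for non-activated users and for activated users with zero debits.

```r
# NA counts reported in the missingness table above
n_signups        <- 161088
n_not_activated  <- 38227  # NA count for coins_earned (no activity-table row)
n_na_debit_ratio <- 95616  # NA count for merchant_share_of_debit

# Assumption: the extra NAs come only from activated users with zero debits
n_activated         <- n_signups - n_not_activated
n_never_debited     <- n_na_debit_ratio - n_not_activated
share_never_debited <- n_never_debited / n_activated

c(activated = n_activated, never_debited = n_never_debited,
  share = round(share_never_debited, 3))
```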

5.3.2 Data quality issue 1 — the activation gap

Code
activation_summary <- bv |>
  summarise(
    sign_ups        = n(),
    activated       = sum(is_activated),
    not_activated   = sign_ups - activated,
    activation_pct  = activated / sign_ups * 100
  )

activation_summary |>
  kable(digits = 1,
        col.names = c("Sign-ups", "Activated", "Not activated", "Activation %"),
        caption = "Activation gap headline. 23.7% of registered users have never earned a coin.")
Activation gap headline. 23.7% of registered users have never earned a coin.
Sign-ups Activated Not activated Activation %
161088 122861 38227 76.3
Code
bv |>
  mutate(signup_year = lubridate::year(signup_date)) |>
  group_by(signup_year) |>
  summarise(
    sign_ups       = n(),
    activated      = sum(is_activated),
    activation_pct = activated / sign_ups * 100,
    .groups = "drop"
  ) |>
  kable(digits = 1,
        col.names = c("Sign-up year", "Sign-ups", "Activated", "Activation %"),
        caption = "Activation rate by sign-up year. Activation holds at roughly 76% for every cohort; the 2023 cohort has only 14 sign-ups.")
Activation rate by sign-up year. Activation holds at roughly 76% for every cohort; the 2023 cohort has only 14 sign-ups.
Sign-up year Sign-ups Activated Activation %
2023 14 11 78.6
2024 64316 48989 76.2
2025 40026 30469 76.1
2026 56732 43392 76.5

23.7% of registered users (38,227 of 161,088) never earned a coin. The cohort breakdown shows that this gap is not closing: activation holds at roughly 76% for each of the 2024, 2025, and 2026 cohorts (the 2023 cohort has only 14 sign-ups and cannot be read on its own). Because the 2026 cohort, which has had the least time to engage, activates at the same rate as cohorts that have had more than a year, the gap looks like a persistent feature of onboarding rather than a timing artefact. This is the first leak in the funnel, and it sits upstream of the redemption problem the board has been focused on. Handling: non-activated users are retained in the dataset with NA on coin fields and has_redeemed = 0. The binary is_activated flag becomes a candidate predictor in the regression and a stratification variable in the visualisation chapter.
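One way to check that the cohort rates are flat rather than drifting is a chi-square test of homogeneity on the counts in the table above, read through Cramér’s V as Section 3.4 recommends. The 2023 cohort is left out here because its 14 sign-ups would produce expected cell counts below 5.

```r
# Cohort counts from the activation-by-year table above (2023 excluded)
cohort_tab <- rbind(
  "2024" = c(activated = 48989, not_activated = 64316 - 48989),
  "2025" = c(activated = 30469, not_activated = 40026 - 30469),
  "2026" = c(activated = 43392, not_activated = 56732 - 43392)
)

# Chi-square test of homogeneity: do activation rates differ by cohort?
cohort_test <- chisq.test(cohort_tab)

# Effect size on the same scale used elsewhere in the report
cohort_v <- sqrt(unname(cohort_test$statistic) /
                 (sum(cohort_tab) * (min(dim(cohort_tab)) - 1)))

round(prop.table(cohort_tab, margin = 1)[, "activated"], 3)
c(p_value = cohort_test$p.value, cramers_v = cohort_v)
```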

5.3.3 Data quality issue 2 — the coin reconciliation gap

A user’s lifetime coin arithmetic should satisfy coins_earned − coins_debited = coins_balance. Where the balance falls short of this identity, coins have left the user’s wallet without being recorded in the debit channel, most plausibly through expiry, burn, or schema gaps that Bvndle’s data team has not surfaced; a balance above it would instead point to credits missing from the earn record.

Code
recon <- bv |>
  filter(is_activated) |>
  mutate(
    implied_balance = coins_earned - coins_debited,
    recon_diff      = implied_balance - coins_balance,
    matches         = abs(recon_diff) < 1
  )

recon |>
  summarise(
    activated_users    = n(),
    matched            = sum(matches),
    mismatched         = sum(!matches),
    pct_matched        = mean(matches) * 100,
    median_abs_diff    = median(abs(recon_diff[!matches])),
    p95_abs_diff       = quantile(abs(recon_diff[!matches]), 0.95),
    max_abs_diff       = max(abs(recon_diff[!matches]))
  ) |>
  kable(digits = 1,
        caption = "Coin reconciliation among activated users. Among the 5.4% who do not reconcile, the median unexplained difference is small but the tail is long.")
Coin reconciliation among activated users. Among the 5.4% who do not reconcile, the median unexplained difference is small but the tail is long.
activated_users matched mismatched pct_matched median_abs_diff p95_abs_diff max_abs_diff
122861 116214 6647 94.6 5 100 10600

Roughly 94.6% of activated users reconcile to within one coin; 5.4% (6,647 users) do not. The mismatch skews towards coins that have disappeared, i.e., wallets holding fewer coins than the arithmetic predicts (a positive recon_diff under the definition above), and the tail is long: the 95th-percentile mismatch (100 coins) is 20 times the median (5 coins), and the largest single mismatch is 10,600 coins. This is the unaddressed coin-liability question the board has been asking about. Handling: a follow-up clarification has been requested from Bvndle’s data team. The mismatched users are not excluded from the analysis, because their behavioural patterns matter, but the unexplained discrepancy is noted as a limitation in Section 11.
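The sign convention in the reconciliation code above can be checked on three hypothetical wallets (values illustrative only): recon_diff is the implied balance minus the recorded balance, so coins that vanished without a debit record show up as a positive difference.

```r
# Three hypothetical wallets
wallets <- data.frame(
  user_id       = c("W1", "W2", "W3"),
  coins_earned  = c(18, 100, 50),
  coins_debited = c(7, 40, 50),
  coins_balance = c(11, 45, 0)
)

# Same definitions as the reconciliation code above
wallets$implied_balance <- wallets$coins_earned - wallets$coins_debited
wallets$recon_diff      <- wallets$implied_balance - wallets$coins_balance
wallets$matches         <- abs(wallets$recon_diff) < 1

# W2 holds 15 fewer coins than earned - debited predicts: recon_diff = +15
wallets
```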

5.3.4 Outliers and the long right tail

Code
bv |>
  filter(is_activated) |>
  select(coins_earned, coins_debited, coins_cashedout, coins_balance,
         credit_count, debit_count) |>
  pivot_longer(everything(), names_to = "variable", values_to = "value") |>
  mutate(variable = factor(variable,
                           levels = c("coins_earned", "coins_debited",
                                      "coins_cashedout", "coins_balance",
                                      "credit_count", "debit_count"))) |>
  ggplot(aes(x = value + 1)) +
  geom_histogram(bins = 40, fill = bv_purple, alpha = 0.85, colour = "white") +
  scale_x_log10(labels = scales::label_comma()) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  labs(
    title = "Coin volumes and event counts on a log10 scale show extreme right tails",
    subtitle = "Plot is restricted to activated users. The +1 offset lets zero appear at the leftmost bin.",
    x = "Value (log10 scale, value + 1)",
    y = "Number of users"
  )

Code
bv |>
  filter(is_activated) |>
  summarise(
    across(c(coins_earned, coins_debited, coins_cashedout, coins_balance,
             credit_count, debit_count),
           list(median = ~ median(.x, na.rm = TRUE),
                p95    = ~ quantile(.x, 0.95, na.rm = TRUE),
                p99    = ~ quantile(.x, 0.99, na.rm = TRUE),
                max    = ~ max(.x, na.rm = TRUE)))
  ) |>
  pivot_longer(everything(),
               names_to = c("variable", "stat"),
               names_pattern = "(.*)_(median|p95|p99|max)") |>
  pivot_wider(names_from = stat, values_from = value) |>
  kable(digits = 0,
        col.names = c("Variable", "Median", "P95", "P99", "Max"),
        caption = "Distributional summary for activated users. The P99 → Max jump is more than 1000× for several variables.")
Distributional summary for activated users. The P99 → Max jump is more than 1000× for several variables.
Variable Median P95 P99 Max
coins_earned 21 181 1596 13014518
coins_debited 2 110 1328 13014468
coins_cashedout 0 50 368 1016950
coins_balance 2 61 220 624212
credit_count 2 59 191 4974
debit_count 1 13 58 1690

The right tails are not data errors: they correspond to identifiable accounts (likely partner-managed wallets, internal Bvndle test accounts, or operational accounts that route bulk coin flows). They distort means and linear (Pearson) correlations but not medians or ranks, so they do not change conclusions drawn from medians or rank-based statistics. The regression in Section 9 uses log1p() transforms on the heavy-tailed coin variables to keep coefficients interpretable.
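The effect of one bulk account on each summary statistic can be seen on a toy vector. The values below are invented (the largest simply reuses the coins_earned maximum from the table above), and the 10 % trim level is an arbitrary illustrative choice. The sketch shows the mean being dragged far upward while the median and the trimmed mean barely move, and that log1p() compresses the tail without reordering users.

```r
# Nine ordinary users plus one bulk account (hypothetical values)
coins <- c(5, 10, 12, 18, 21, 25, 30, 40, 60, 13014518)

mean(coins)              # dragged far upward by the single bulk account
median(coins)            # 23: midpoint of the two central values
mean(coins, trim = 0.1)  # 27: drops the lowest and highest value first
log1p(coins)             # log(1 + x): defined at zero, compresses the tail

# A monotone transform such as log1p() leaves the ranking of users unchanged
identical(order(coins), order(log1p(coins)))
```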

Note

Data-quality summary (per Section 1.5 of the assessment brief)

The two data-quality issues handled in this section are:

  1. The activation gap (23.7 %) — a structural feature of the data, retained explicitly via the is_activated flag and modelled in Section 9 as a separate stage of the funnel.
  2. The coin reconciliation gap (5.4 % of activated users, or 6,647 accounts): quantified above, retained in the dataset, and flagged in Section 11 as a limitation pending the data team’s clarification on expiry and burn accounting.

A third issue worth naming is the right-tail outlier pattern in the coin variables, which is handled in the regression via a log1p() transform rather than by trimming.

5.4 Management Interpretation

The exploratory analysis changes three things about how Bvndle’s board should read its own data.

First, the activation gap is upstream of the redemption gap. Nearly one in four registered users (23.7 %) never reaches the first earn event. The board has spent the last quarter debating why coins are not being redeemed; this analysis says some of that debate is pointed at the wrong stage of the funnel. The fix for non-activation (onboarding journey, first-action incentives, friction reduction at registration) is a different intervention from the fix for non-redemption (merchant network, redemption catalogue, cash-out friction). Both are needed, in that order.

Second, the coin reconciliation gap is a board-level disclosure issue. Among activated users, 5.4 % (6,647 accounts) cannot account arithmetically for their wallet balance: coins have left their wallets without a corresponding debit record. The unexplained difference skews negative and has a long tail. Until Bvndle’s data team can explain whether the missing coins are expired, burned, or operationally moved, the platform cannot honestly state the size of its coin liability. The number in the refocus deck (₦1–1.5 billion at the 2028 target) was anchored to coins-in-circulation; the real exposure depends on whether expiry is doing the work the deck implicitly assumes.

Third, the outlier pattern justifies a different reporting style. A small number of accounts are orders of magnitude larger than the median user. They are not data errors, but they do distort any metric that uses means. Going forward, the Bvndle 2026 KPI dashboard should report medians (or trimmed means) for coin variables rather than arithmetic means; otherwise headline averages will be moved by a handful of operational accounts rather than by user behaviour.

6. Technique 2: Data Visualisation

6.1 Technique Overview

Following Adi (2026, ch. 10), data visualisation is the discipline of using shape, position, and colour to make patterns in data legible to a reader who does not have time to read code or numeric tables. Five rules guide the figures in this section: one chart, one idea; pick the chart type the eye reads fastest for the comparison being made; encode information in position and length before colour and shape; annotate the value, not just the axis; and let the figure carry the story even when its caption is stripped.

6.2 Operational Justification

For Bvndle the visualisation chapter has one job: show the board where the funnel actually leaks, in one frame. The 2026 strategy deck claims 91,000 users and 8.1 million coins in circulation as though those numbers describe a healthy product. The five linked figures below show — defensibly — that the platform’s real loss occurs not at the activation stage (which holds up year on year) but at the engagement-with-spend stage that follows it. Each figure is built to survive a board meeting on its own: the title states the conclusion, the labels carry the numbers, and the subtitle names what the reader is looking at.

6.3 Analysis and Outputs

6.3.1 The funnel: where users drop off

Code
funnel_data <- tibble::tibble(
  stage = factor(
    c("Sign-ups", "Activated\n(earned a coin)", "Redeemed\n(any spend event)"),
    levels = c("Sign-ups", "Activated\n(earned a coin)", "Redeemed\n(any spend event)")
  ),
  users = c(nrow(bv), sum(bv$is_activated), sum(bv$has_redeemed == 1))
) |>
  mutate(
    pct_of_signups = users / max(users),
    label = paste0(scales::comma(users), "  (",
                   scales::percent(pct_of_signups, accuracy = 0.1), ")")
  )

p_funnel <- funnel_data |>
  ggplot(aes(x = users, y = forcats::fct_rev(stage))) +
  geom_col(fill = bv_purple, width = 0.6) +
  geom_text(aes(label = label), hjust = -0.05,
            size = 3.6, colour = bv_purple, fontface = "bold") +
  scale_x_continuous(labels = scales::comma_format(),
                     expand = expansion(mult = c(0, 0.35))) +
  labs(
    title = "The funnel leaks heaviest at the redemption stage",
    subtitle = "161k sign-ups → 123k activate → 65k spend",
    x = "Users", y = NULL
  ) +
  theme(panel.grid.major.y = element_blank())

p_funnel

Figure 1. The funnel leaks heaviest at the redemption stage. 161,088 sign-ups become 122,861 activated users (76 %) and 65,472 redeemers (41 %).

The first stage of the funnel — sign-up to first coin earned — leaks ~24 % of users (the activation gap from Section 5). The second stage — earning a coin to spending one — leaks a further ~47 % of the survivors. By the time we arrive at “users who have ever spent a single coin”, we are looking at 65,472 users from a top-of-funnel of 161,088, or roughly 41 %. The redemption stage is the larger leak by some distance.
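The stage-by-stage percentages quoted above follow directly from the three counts in Figure 1. A short arithmetic check in base R, using only those reported counts:

```r
# Stage counts reported in Figure 1
signups   <- 161088
activated <- 122861
redeemed  <-  65472

activation_rate    <- activated / signups    # sign-up -> first coin earned
spend_given_active <- redeemed / activated   # first coin earned -> any spend event
end_to_end         <- redeemed / signups     # sign-up -> any spend event
hoarders           <- activated - redeemed   # earned but never spent: 57,389

# Leak at each stage and end-to-end conversion, in percent
# (roughly 23.7, 46.7 and 40.6)
round(100 * c(stage1_leak = 1 - activation_rate,
              stage2_leak = 1 - spend_given_active,
              end_to_end  = end_to_end), 1)
```

The end-to-end rate is simply the product of the two stage conversion rates, which is why a 24 % first-stage leak followed by a 47 % second-stage leak leaves about 41 % of sign-ups as spenders.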

6.3.2 Activation by sign-up cohort

Code
p_cohort <- bv |>
  mutate(signup_year = lubridate::year(signup_date)) |>
  group_by(signup_year) |>
  summarise(
    n_signups       = n(),
    activation_rate = mean(is_activated),
    .groups = "drop"
  ) |>
  ggplot(aes(x = factor(signup_year), y = activation_rate)) +
  geom_col(fill = bv_purple, width = 0.6) +
  geom_text(aes(label = paste0(scales::percent(activation_rate, accuracy = 1),
                               "\n(n = ", scales::comma(n_signups), ")")),
            vjust = -0.25, size = 3.2, colour = bv_purple) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                     limits = c(0, 1), expand = expansion(mult = c(0, 0.18))) +
  labs(
    title = "Activation stayed high until 2026",
    subtitle = "Of users who signed up in each year, the share who have earned at least one coin",
    x = "Sign-up year", y = NULL
  ) +
  theme(panel.grid.major.x = element_blank())

p_cohort

Figure 2. Activation rate by year of sign-up. Older cohorts have had more time to engage, so their headline activation rate is higher; the 2026 cohort still scores 60 % despite having had at most five months.

Splitting the population by sign-up year suggests that the activation rate is largely a function of how long a cohort has had to engage. The 2023 cohort (roughly two and a half to three and a half years on the platform, depending on sign-up month) is the most activated; the 2026 cohort (a few months at most) is the least. The 2026 drop is therefore consistent with cohort age rather than evidence of a sudden product problem, though it should still be tracked next quarter.
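Why cohort age alone can produce this pattern is easiest to see under a deliberately simple model. The sketch below assumes, hypothetically, that every new user has the same 20 % chance of earning a first coin in each month on the platform, whatever year they joined. Both the constant-hazard assumption and the 20 % figure are invented for illustration and are not fitted to Bvndle’s data; a real cohort would also plateau below 100 % if some users never intend to activate.

```r
# ASSUMPTION: constant monthly chance of a first earn event, identical across cohorts
monthly_hazard <- 0.20

# Illustrative months of exposure per cohort (not measured from the data)
months_exposed <- c(cohort_2026 = 4, cohort_2025 = 12, cohort_2023 = 36)

# Share activated after m months: 1 - (1 - h)^m
cumulative_activation <- 1 - (1 - monthly_hazard)^months_exposed
round(cumulative_activation, 3)
```

Nothing about the product differs between the three cohorts in this sketch, yet the youngest shows the lowest activation rate purely because it has been observed for less time.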

6.3.3 Earning coins does not guarantee spending them

Code
set.seed(2026)
scatter_sample <- bv |>
  filter(is_activated, coins_earned > 0) |>
  slice_sample(n = 15000) |>
  mutate(group = factor(has_redeemed, levels = c(0, 1),
                        labels = c("Earned, never spent", "Spent at merchant")))

p_scatter <- scatter_sample |>
  ggplot(aes(x = coins_earned, y = coins_to_merchant + 1, colour = group)) +
  geom_point(alpha = 0.22, size = 0.7) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed",
              colour = bv_grey, linewidth = 0.4) +
  scale_x_log10(labels = scales::comma) +
  scale_y_log10(labels = scales::comma) +
  scale_colour_manual(values = c("Earned, never spent" = bv_pink,
                                 "Spent at merchant"  = bv_purple)) +
  labs(
    title = "Earning a lot of coins does not guarantee spending them",
    subtitle = "Each point = one activated user (15k random sample)",
    x = "Coins earned (log scale)",
    y = "Coins spent at merchants (log scale, +1 offset)",
    colour = NULL,
    caption = "Dashed line: y = x (all coins earned were spent at merchants)."
  ) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2.5))) +
  theme(legend.position = "top")

p_scatter

Figure 3. Coins earned versus coins spent at merchants, on a log‑log scale. Each point is one activated user (random sample of 15,000 of 122,861). The horizontal mass at y = 1 is the population of users who earned coins and never spent one. The dashed diagonal marks the “every coin earned is spent” ideal.

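The scatter makes the broken-loop claim concrete. The pink mass along the bottom of the plot is made up of activated users at every level of coin-earning who have never spent a single coin. The purple points (redeemers) are spread across the whole range of earned-coin levels, and the pink mass extends well into high earning levels: earning many coins does not by itself guarantee spending. This is a visual reading of a 15,000-user sample rather than a measure of association; Section 8 shows that, at the rank level, users who earn more are in fact more likely to have redeemed (Spearman ρ ≈ +0.53). The point of Figure 3 is that hoarding occurs at every earning level, not that earning is irrelevant.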

6.3.4 The coin-hoarder problem

Code
p_debit <- bv |>
  filter(is_activated) |>
  mutate(debit_bucket = case_when(
    debit_count == 0   ~ "0 (no spend)",
    debit_count == 1   ~ "1",
    debit_count <= 5   ~ "2–5",
    debit_count <= 20  ~ "6–20",
    TRUE               ~ "21+"
  )) |>
  mutate(debit_bucket = factor(debit_bucket,
    levels = c("0 (no spend)", "1", "2–5", "6–20", "21+"))) |>
  count(debit_bucket) |>
  mutate(pct = n / sum(n)) |>
  ggplot(aes(x = debit_bucket, y = n)) +
  geom_col(fill = bv_purple, width = 0.6) +
  geom_text(aes(label = paste0(scales::comma(n), "\n(",
                               scales::percent(pct, accuracy = 1), ")")),
            vjust = -0.25, size = 3.2, colour = bv_purple) +
  scale_y_continuous(labels = scales::comma,
                     expand = expansion(mult = c(0, 0.20))) +
  labs(
    title = "47 % of activated users have never had a single spend event",
    subtitle = "Activated users only (n = 122,861); buckets are lifetime debit-event counts",
    x = "Lifetime spend / cash-out events", y = "Users"
  ) +
  theme(panel.grid.major.x = element_blank())

p_debit

Figure 4. Distribution of lifetime debit events per activated user. The leftmost bucket — users who have earned coins but have never spent one — contains 57,389 individuals, or 47 % of the activated population.

The leftmost bar is 57,389 users who earned at least one coin and have never had a spend or cash-out event; the rest of this submission refers to them as coin hoarders. At 47 % of activated users, they are almost as large as all the other buckets combined (65,472 users). The interesting strategic question is therefore not “why do users cash out instead of redeeming?” (the median activated user has cashed out nothing; see the distributional summary in Section 5.3.4) but “why do almost half of activated users never engage with the spend side of the loop at all?”
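The five buckets in Figure 4 can also be expressed with base R’s cut(), which makes the bucket boundaries explicit as right-closed intervals. A minimal sketch on invented counts (ASCII hyphens stand in for the en-dashes in the figure labels):

```r
# Hypothetical lifetime debit-event counts for eight activated users
debit_count <- c(0, 0, 0, 1, 3, 5, 12, 40)

# Right-closed intervals (-Inf,0], (0,1], (1,5], (5,20], (20,Inf]
# give the same buckets as the case_when() above for integer counts
debit_bucket <- cut(
  debit_count,
  breaks = c(-Inf, 0, 1, 5, 20, Inf),
  labels = c("0 (no spend)", "1", "2-5", "6-20", "21+")
)

table(debit_bucket)
mean(debit_count == 0)   # share of coin hoarders in this toy sample: 0.375
```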

6.3.5 Partner-channel effects

Code
partner_summary <- bv |>
  mutate(partner_status = if_else(is_partner_a == 1,
                                  "Partner_A (n = 31,673)",
                                  "Not_Onboarded (n = 129,415)")) |>
  group_by(partner_status) |>
  summarise(
    Activation = mean(is_activated),
    Redemption = mean(has_redeemed[is_activated]),
    .groups = "drop"
  ) |>
  pivot_longer(c(Activation, Redemption),
               names_to = "metric", values_to = "rate")

p_partner <- partner_summary |>
  ggplot(aes(x = metric, y = rate, fill = partner_status)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.65) +
  geom_text(aes(label = scales::percent(rate, accuracy = 1)),
            position = position_dodge(width = 0.8),
            vjust = -0.3, size = 3.4, colour = bv_purple, fontface = "bold") +
  scale_fill_manual(values = c("Not_Onboarded (n = 129,415)" = bv_pink,
                               "Partner_A (n = 31,673)"      = bv_purple)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                     limits = c(0, 1), expand = expansion(mult = c(0, 0.18))) +
  labs(
    title = "Partner_A and Not_Onboarded users are indistinguishable",
    subtitle = "Activation = of all sign-ups. Redemption = of activated users only.",
    x = NULL, y = NULL, fill = NULL
  ) +
  theme(legend.position = "top",
        panel.grid.major.x = element_blank())

p_partner

Figure 5. Activation and redemption (of activated users) by partner-onboarding status. The two populations are visually indistinguishable on both metrics.

Partner_A-onboarded users and Not_Onboarded users sit on top of each other on both metrics: their activation rates are within roughly one percentage point of each other, and so are their redemption rates. Visually, the figure reads as “no effect.” Section 7 tests the redemption comparison formally (on the full sign-up population) and finds no statistically or practically meaningful difference. That is itself a finding: Partner_A acquisition is not producing differential engagement.

6.3.6 Composite view

Code
(p_funnel) /
  (p_cohort | p_partner) /
  (p_scatter | p_debit) +
  patchwork::plot_annotation(
    title    = "Where the Bvndle funnel leaks: a board-ready summary",
    subtitle = "Activation holds; spend does not. The dominant problem is coin-hoarding, not cash-out.",
    theme    = theme(
      plot.title    = element_text(face = "bold", size = 15, colour = bv_purple),
      plot.subtitle = element_text(size = 11, colour = bv_grey)
    )
  )

Figure 6. Board-ready composite of the five figures above. The funnel headline anchors the top; the cohort and partner views read across the middle; the earned-versus-spent scatter and the coin-hoarder distribution sit at the bottom.

6.4 Management Interpretation

For the 2026 board meeting, this section says three things plainly.

First, the dominant funnel problem is not activation. 76 % of sign-ups activate, and the cohort plot shows that figure has held up across multiple years. Activation does not deserve the share of board attention it currently receives.

Second, the dominant funnel problem is the gap between activation and spend. 47 % of activated users — close to one in two — have never had a single debit event. Bvndle is paying acquisition costs for users who earn coins, generate a coin liability on the balance sheet, and then disengage. The 2026 product roadmap should be re-centred on whatever moves these users from “earned a coin” to “spent a coin”: merchant network density in the user’s geography, redemption-trigger notifications, first-spend incentives, or category coverage that matters to the segment. Each is a different intervention. The visual analysis cannot pick between them; it can only tell the board that this is the stage where attention belongs.

Third, the cash-out narrative in the original strategy deck is over-stated. Among activated users, almost none have cashed out without also spending at a merchant; the cash-out flows in the data are concentrated in a small minority. The 2026 plan does not need to defend against a wave of cash-outs. It needs to defend against indefinite coin hoarding by users who never come back.

7. Technique 3: Hypothesis Testing

7.1 Technique Overview

Following Adi (2026, ch. 11), a hypothesis test is a formal way of asking “could the difference I see in the data have arisen by chance?”. The procedure is the same regardless of the test family: state a null hypothesis (H₀, that the difference is zero), state an alternative (H₁, that it is not), check that the test’s assumptions hold for the data at hand, compute the test statistic, read off a p-value, and — critically — measure the effect size so that the conclusion is informative in business terms as well as statistical ones.

At very large sample sizes a vanishingly small p-value is easy to achieve even when the underlying difference is operationally meaningless. With N = 161,088 sign-ups in this dataset (and 42,329 in the restricted Hypothesis 2 sample), the tests in this section have high power to detect even small differences. The reporting therefore foregrounds effect-size measures (Cramér’s V for the strength of association in 2×2 tables, and the absolute risk difference with its 95 % confidence interval) and treats the p-value as a secondary diagnostic.
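The large-N point can be seen by holding the effect size fixed and varying only the sample size. In this sketch the 40.5 % versus 41.0 % rates and both sample sizes are hypothetical; prop.test() is base R’s two-sample test of proportions.

```r
# Same hypothetical 0.5-percentage-point gap, two very different sample sizes
small <- prop.test(x = c(405, 410),       n = c(1000, 1000))
large <- prop.test(x = c(405000, 410000), n = c(1e6, 1e6))

c(p_value_n_1k = small$p.value,   # far from significant
  p_value_n_1m = large$p.value)   # vanishingly small for the same gap
```

The absolute gap is identical in both calls; only the p-value changes, which is why the effect size, not the p-value, carries the business conclusion.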

7.2 Operational Justification

The visual analysis in Section 6 sets up two specific business questions that Bvndle’s 2026 plan turns on. Both are about whether the platform’s named acquisition levers actually move user behaviour.

Hypothesis 1 — does Partner_A onboarding lift redemption? Bvndle has spent meaningfully on the Partner_A relationship (~31,673 users routed through this channel, or roughly one in five of the entire user base). If Partner_A produces redeemers at a higher rate than the organic / not-onboarded population, the partnership is justified. If it does not, Bvndle is paying for a channel that does not differentiate behaviour.

Hypothesis 2 — does signup source (API vs web) predict activation? The Bvndle product surface has two delivery paths whose underlying user acquisition is unknown to most stakeholders. Knowing whether one converts to active users at a higher rate than the other is a precondition for any 2026 decision on channel mix.

Both tests use the same chi-square 2×2 design: a binary outcome variable (has_redeemed or is_activated) crossed with a binary predictor (partner status or signup source).

7.3 Analysis and Outputs

7.3.1 Hypothesis 1 — Partner_A onboarding and redemption

H₀: Partner_A users redeem at the same rate as Not_Onboarded users. H₁: Partner_A users redeem at a different rate.

Test design: 2 × 2 chi-square on the full sign-up population (N = 161,088). Assumption check: Pearson’s chi-square requires independent observations (each user appears once, so this is satisfied) and expected cell counts ≥ 5 (the smallest expected cell, Partner_A × Redeemed, is ≈ 12,873, comfortably above the threshold).
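The expected-count check can be reproduced from the observed counts in Table 2. Under H₀ each expected cell is its row total times its column total divided by the grand total; base R’s chisq.test() returns the same matrix as $expected.

```r
# Observed counts from Table 2
h1_counts <- matrix(
  c(76779, 52636,
    18837, 12836),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Not_Onboarded", "Partner_A"),
                  c("Did not redeem", "Redeemed"))
)

# Expected count under H0: row total x column total / grand total
h1_expected <- outer(rowSums(h1_counts), colSums(h1_counts)) / sum(h1_counts)

round(h1_expected, 1)
min(h1_expected)   # about 12,873 (Partner_A x Redeemed), far above 5
```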

Code
h1_data <- bv |>
  mutate(
    partner_status = factor(
      if_else(is_partner_a == 1, "Partner_A", "Not_Onboarded"),
      levels = c("Not_Onboarded", "Partner_A")
    ),
    redeemed = factor(has_redeemed, levels = c(0, 1),
                      labels = c("Did not redeem", "Redeemed"))
  )

h1_tbl <- with(h1_data, table(partner_status, redeemed))

# Contingency table with row percentages
h1_display <- as.data.frame.matrix(h1_tbl) |>
  tibble::rownames_to_column("Partner status") |>
  mutate(
    `Row total`            = `Did not redeem` + Redeemed,
    `Redemption rate (%)`  = round(Redeemed / `Row total` * 100, 2)
  )

h1_display |> kable(caption = "Table 2. Hypothesis 1 contingency — Partner_A vs Not_Onboarded × Redemption.")
Table 2. Hypothesis 1 contingency — Partner_A vs Not_Onboarded × Redemption.
Partner status Did not redeem Redeemed Row total Redemption rate (%)
Not_Onboarded 76779 52636 129415 40.67
Partner_A 18837 12836 31673 40.53
Code
h1_chisq <- chisq.test(h1_tbl)
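# cramers_v() defaults to the bias-adjusted V (floored at 0) with a one-sided
# 95 % CI (alternative = "greater"), so the CI upper bound is 1 by construction
h1_v     <- effectsize::cramers_v(h1_tbl)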
h1_prop  <- prop.test(x = c(h1_tbl["Partner_A",     "Redeemed"],
                            h1_tbl["Not_Onboarded", "Redeemed"]),
                      n = c(sum(h1_tbl["Partner_A",     ]),
                            sum(h1_tbl["Not_Onboarded", ])))

tibble::tibble(
  Statistic = c("Chi-square (X²)", "Degrees of freedom", "p-value",
                "Cramér's V", "95 % CI for Cramér's V",
                "Risk difference (Partner_A − Not_Onboarded)",
                "95 % CI for risk difference"),
  Value = c(
    sprintf("%.2f", h1_chisq$statistic),
    as.character(h1_chisq$parameter),
    format.pval(h1_chisq$p.value, digits = 3, eps = 1e-4),
    sprintf("%.4f", h1_v$Cramers_v_adjusted %||% h1_v$Cramers_v),
    sprintf("[%.4f, %.4f]", h1_v$CI_low, h1_v$CI_high),
    sprintf("%+.3f percentage points", (h1_prop$estimate[1] - h1_prop$estimate[2]) * 100),
    sprintf("[%+.3f, %+.3f] pp", h1_prop$conf.int[1] * 100, h1_prop$conf.int[2] * 100)
  )
) |> kable(caption = "Table 3. Hypothesis 1 test statistics and effect-size measures.")
Table 3. Hypothesis 1 test statistics and effect-size measures.
Statistic Value
Chi-square (X²) 0.22
Degrees of freedom 1
p-value 0.641
Cramér’s V 0.0000
95 % CI for Cramér’s V [0.0000, 1.0000]
Risk difference (Partner_A − Not_Onboarded) -0.146 percentage points
95 % CI for risk difference [-0.751, +0.460] pp

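Interpretation. The chi-square statistic is small and the p-value (≈ 0.64) sits comfortably above any conventional significance threshold; we fail to reject H₀. More importantly, the effect size is negligible. Table 3 prints the bias-adjusted Cramér’s V, which is floored at zero and so rounds to 0.0000 here; the unadjusted value, √(X²/N) ≈ √(0.22/161,088), is about 0.001. Values below 0.1 are conventionally read as negligible, so this is roughly two orders of magnitude below even that threshold. The [0, 1] interval for V is one-sided, so its upper bound of 1 is fixed by construction rather than a sign of uncertainty; the risk-difference interval is the more informative one. The absolute risk difference is roughly −0.15 percentage points with a 95 % CI of about [−0.75, +0.46] pp, comfortably crossing zero. The Partner_A onboarding channel produces no measurable lift in redemption rate over the not-onboarded population.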
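For a 2 × 2 table the unadjusted Cramér’s V is √(X²/N), computed from the chi-square statistic without Yates’ continuity correction. A base-R sketch using the Table 2 counts; this is the textbook unadjusted formula, not effectsize’s bias-adjusted default:

```r
h1_counts <- matrix(c(76779, 52636,
                      18837, 12836), nrow = 2, byrow = TRUE)

# Pearson chi-square WITHOUT Yates' correction, as used in the V formula
x2 <- unname(chisq.test(h1_counts, correct = FALSE)$statistic)

# Unadjusted Cramer's V for a 2 x 2 table: sqrt(X^2 / N)
v_unadjusted <- sqrt(x2 / sum(h1_counts))
round(c(x2 = x2, cramers_v = v_unadjusted), 4)   # V is about 0.0012
```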

7.3.2 Hypothesis 2 — Signup source and activation

H₀: API-sourced and web-sourced sign-ups activate at the same rate. H₁: They activate at different rates.

Test design: 2 × 2 chi-square restricted to users with a known signup source (api or web; n = 42,329). Users with signup_source = "Unknown" (about 74 % of the user base) are excluded from this test only — a limitation discussed in Section 11. Assumption check: Independent observations satisfied; smallest expected cell ≈ 739 — comfortably above the ≥ 5 threshold.

Code
h2_data <- bv |>
  filter(signup_source %in% c("api", "web")) |>
  mutate(
    source   = factor(signup_source, levels = c("api", "web")),
    activated = factor(is_activated, levels = c(FALSE, TRUE),
                       labels = c("Not activated", "Activated"))
  )

h2_tbl <- with(h2_data, table(source, activated))

h2_display <- as.data.frame.matrix(h2_tbl) |>
  tibble::rownames_to_column("Signup source") |>
  mutate(
    `Row total`             = `Not activated` + Activated,
    `Activation rate (%)`   = round(Activated / `Row total` * 100, 2)
  )

h2_display |> kable(caption = "Table 4. Hypothesis 2 contingency — API vs Web × Activation.")
Table 4. Hypothesis 2 contingency — API vs Web × Activation.
Signup source Not activated Activated Row total Activation rate (%)
api 9202 29977 39179 76.51
web 725 2425 3150 76.98
Code
h2_chisq <- chisq.test(h2_tbl)
h2_v     <- effectsize::cramers_v(h2_tbl)
h2_prop  <- prop.test(x = c(h2_tbl["api", "Activated"], h2_tbl["web", "Activated"]),
                      n = c(sum(h2_tbl["api", ]),       sum(h2_tbl["web", ])))

tibble::tibble(
  Statistic = c("Chi-square (X²)", "Degrees of freedom", "p-value",
                "Cramér's V", "95 % CI for Cramér's V",
                "Risk difference (API − Web)",
                "95 % CI for risk difference"),
  Value = c(
    sprintf("%.2f", h2_chisq$statistic),
    as.character(h2_chisq$parameter),
    format.pval(h2_chisq$p.value, digits = 3, eps = 1e-4),
    sprintf("%.4f", h2_v$Cramers_v_adjusted %||% h2_v$Cramers_v),
    sprintf("[%.4f, %.4f]", h2_v$CI_low, h2_v$CI_high),
    sprintf("%+.3f percentage points", (h2_prop$estimate[1] - h2_prop$estimate[2]) * 100),
    sprintf("[%+.3f, %+.3f] pp", h2_prop$conf.int[1] * 100, h2_prop$conf.int[2] * 100)
  )
) |> kable(caption = "Table 5. Hypothesis 2 test statistics and effect-size measures.")
Table 5. Hypothesis 2 test statistics and effect-size measures.
Statistic Value
Chi-square (X²) 0.33
Degrees of freedom 1
p-value 0.563
Cramér’s V 0.0000
95 % CI for Cramér’s V [0.0000, 1.0000]
Risk difference (API − Web) -0.471 percentage points
95 % CI for risk difference [-2.017, +1.075] pp

Interpretation. The chi-square statistic is again small and the p-value (≈ 0.56) is far above any conventional threshold; we fail to reject H₀. The adjusted Cramér’s V again rounds to 0.0000, and the unadjusted value, √(0.33/42,329), is about 0.003, well inside the negligible range. The absolute risk difference is roughly −0.5 percentage points with a 95 % CI of about [−2.0, +1.1] pp, crossing zero. API-sourced and web-sourced sign-ups activate at indistinguishable rates.
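The risk difference and its interval in Table 5 come from prop.test(), which by default widens the interval with a continuity correction. Re-running it on the Table 4 counts in base R shows the interval straddling zero:

```r
# Activated users and total sign-ups by source, from Table 4
h2 <- prop.test(x = c(29977, 2425), n = c(39179, 3150))

risk_diff_pp <- 100 * unname(h2$estimate[1] - h2$estimate[2])  # API minus web
ci_pp        <- 100 * as.numeric(h2$conf.int)

round(c(risk_diff = risk_diff_pp, lower = ci_pp[1], upper = ci_pp[2]), 3)
ci_pp[1] < 0 && ci_pp[2] > 0   # TRUE: zero lies inside the 95 % interval
```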

7.4 Management Interpretation

Both formal tests come back null — a result that is more strategically informative than a confirmed alternative would have been.

First, Partner_A is not earning its place in the 2026 plan. The partnership has routed 31,673 users to the platform, about 20 % of the entire user base. The data shows their redemption rate, the conversion metric that ultimately matters for the loyalty thesis, sits essentially on top of the Not_Onboarded population (40.5 % vs 40.7 %, a difference inside the noise band). This does not mean Partner_A is doing harm; it means the channel is not differentiated. Before renewing or expanding the partnership in 2026, Bvndle leadership should look at the per-user cost of Partner_A acquisition and ask whether it is competitive with whatever the not-onboarded organic acquisition is costing. If Partner_A is being paid a premium for “high-quality” users, the data does not support that premium.

Second, the API vs web channel mix is not a 2026 decision worth optimising. The two acquisition paths produce users that activate at the same rate (~77 % each). Engineering effort spent on either pipeline should be allocated by cost-to-serve, not by some belief that one channel delivers a better cohort. There is no measurable cohort difference in the data.

Third, and most important for Bvndle’s strategic conversation: the variables that do differentiate user behaviour are not in the acquisition-channel dimension at all. Section 6 already showed that the dominant funnel leak sits between activation and spend. Section 8 and Section 9 will quantify which user-level characteristics — engagement intensity, recency, signup vintage — actually move the redemption probability. The partner and channel knobs that the board has been debating are, on the evidence here, not the right knobs.

8. Technique 4: Correlation Analysis

8.1 Technique Overview

Following Adi (2026, ch. 13), a correlation coefficient summarises the strength and direction of the relationship between two variables in a single number between −1 and +1. Three coefficients are in common use: Pearson’s r measures linear association and assumes roughly symmetric, outlier-free variables; Spearman’s ρ measures monotonic association by correlating the ranks rather than the raw values, which makes it robust to skew and extreme values; and Kendall’s τ is a rank measure based on concordant and discordant pairs, often preferred for small samples. Because Section 5.3.4 established that every coin variable in this dataset has an extreme right tail, Spearman is the appropriate primary coefficient here, and the gap between the Pearson and Spearman coefficients is itself diagnostic.
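The linear-versus-monotonic distinction can be checked on a relationship that is perfectly monotonic but strongly curved. In the base-R sketch below, y = eˣ is invented data: Spearman’s ρ and Kendall’s τ are exactly 1 because the orderings agree perfectly, Pearson’s r is noticeably lower because the points do not lie on a straight line, and Spearman’s ρ is confirmed to be Pearson’s r computed on the ranks.

```r
# Perfectly monotonic but strongly non-linear (hypothetical data)
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")   # well below 1: the curve is not a line
spearman <- cor(x, y, method = "spearman")  # exactly 1: ranks agree perfectly
kendall  <- cor(x, y, method = "kendall")   # also 1: every pair is concordant

# Spearman's rho is Pearson's r computed on the ranks
all.equal(spearman, cor(rank(x), rank(y)))
round(c(pearson = pearson, spearman = spearman, kendall = kendall), 3)
```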

A correlation, however strong, is not evidence of causation. Two variables can move together because one drives the other, because a third variable drives both, or by coincidence. This section reports associations; Section 9 moves to a model that holds other variables constant, and even that is not a causal claim without an experiment.

8.2 Operational Justification

For Bvndle, the correlation matrix answers a board-level housekeeping question: which metrics deserve a place on the 2026 KPI dashboard? The original strategy deck tracked coins created and coins in circulation — totals that, as Section 5 showed, are dominated by a handful of operational accounts. A metric earns dashboard space only if it (a) relates to the outcome the business cares about (redemption) and (b) is not redundant with a metric already being tracked. Correlation analysis is the tool that separates the signal metrics from the vanity metrics and flags the redundant pairs.

8.3 Analysis and Outputs

The matrix is computed on the activated-user population (n = 122,861) across six variables: two engagement-volume / frequency measures (coins_earned, credit_count), the wallet stock (coins_balance), two time measures (days_since_signup, days_since_last_active), and the binary outcome (has_redeemed).

Code
corr_vars <- bv |>
  filter(is_activated) |>
  transmute(
    `Coins earned`      = coins_earned,
    `Earn-event count`  = credit_count,
    `Wallet balance`    = coins_balance,
    `Days since signup` = days_since_signup,
    `Days since active` = days_since_last_active,
    `Has redeemed`      = has_redeemed
  )

corr_pearson  <- cor(corr_vars, use = "complete.obs", method = "pearson")
corr_spearman <- cor(corr_vars, use = "complete.obs", method = "spearman")

8.3.1 The correlation matrix

Code
p_pearson <- ggcorrplot(
  corr_pearson, type = "lower", lab = TRUE, lab_size = 2.8,
  colors = c(bv_pink, "white", bv_purple), outline.color = "white"
) +
  labs(title = "Pearson (linear)") +
  theme(plot.title = element_text(face = "bold", colour = bv_purple),
        axis.text.x = element_text(angle = 45, hjust = 1))

p_spearman <- ggcorrplot(
  corr_spearman, type = "lower", lab = TRUE, lab_size = 2.8,
  colors = c(bv_pink, "white", bv_purple), outline.color = "white"
) +
  labs(title = "Spearman (rank)") +
  theme(plot.title = element_text(face = "bold", colour = bv_purple),
        axis.text.x = element_text(angle = 45, hjust = 1))

p_pearson | p_spearman

Figure 7. Pearson (linear) and Spearman (rank) correlation matrices on the activated-user population. Pearson sees almost no structure; Spearman reveals the real relationships once the heavy right tails are rank-transformed away.

8.3.2 Pearson versus Spearman — why the choice matters here

Code
key_pairs <- tibble::tibble(
  `Variable pair` = c(
    "Coins earned ~ Has redeemed",
    "Wallet balance ~ Has redeemed",
    "Coins earned ~ Earn-event count",
    "Days since active ~ Has redeemed",
    "Days since signup ~ Has redeemed"
  ),
  Pearson = c(
    corr_pearson["Coins earned",      "Has redeemed"],
    corr_pearson["Wallet balance",    "Has redeemed"],
    corr_pearson["Coins earned",      "Earn-event count"],
    corr_pearson["Days since active", "Has redeemed"],
    corr_pearson["Days since signup", "Has redeemed"]
  ),
  Spearman = c(
    corr_spearman["Coins earned",      "Has redeemed"],
    corr_spearman["Wallet balance",    "Has redeemed"],
    corr_spearman["Coins earned",      "Earn-event count"],
    corr_spearman["Days since active", "Has redeemed"],
    corr_spearman["Days since signup", "Has redeemed"]
  )
) |>
  mutate(across(c(Pearson, Spearman), ~ round(.x, 3)))

key_pairs |>
  kable(caption = "Table 6. Pearson vs Spearman for the five most decision-relevant variable pairs. Where the two diverge, the heavy right tail of the coin variables is the cause.")
Table 6. Pearson vs Spearman for the five most decision-relevant variable pairs. Where the two diverge, the heavy right tail of the coin variables is the cause.
Variable pair Pearson Spearman
Coins earned ~ Has redeemed 0.010 0.529
Wallet balance ~ Has redeemed 0.003 -0.505
Coins earned ~ Earn-event count 0.030 0.516
Days since active ~ Has redeemed 0.249 0.155
Days since signup ~ Has redeemed 0.003 0.003

The first row of Table 6 makes the methodological point starkly: the Pearson correlation between coins earned and redemption is essentially zero (≈ 0.01), while the Spearman correlation is a moderate +0.53. The two are not in conflict; they answer different questions. Pearson asks “do these variables increase together in a straight line?”, and the answer is no, because a handful of operational accounts with millions of coins flatten the linear fit. Spearman asks “do users who rank high on one variable tend to rank high on the other?”, and the answer is clearly yes. A Bvndle analysis that relies on Pearson correlations or arithmetic means of coin variables risks concluding that there is no signal in the data. There is a signal; it is just not linear.
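The flattening effect of a single bulk account on a binary outcome can be reproduced with invented data: 200 ordinary users whose earning rank cleanly separates non-redeemers from redeemers, plus one hypothetical operational account holding a billion coins. This is a stylised sketch, not a model of Bvndle’s actual distribution.

```r
# 200 ordinary users: the lower-earning half never redeemed, the upper half did
coins_earned <- c(1:200, 1e9)                     # last entry: one bulk account
has_redeemed <- c(rep(0, 100), rep(1, 100), 1)    # the bulk account also redeemed

pearson  <- cor(coins_earned, has_redeemed, method = "pearson")
spearman <- cor(coins_earned, has_redeemed, method = "spearman")

round(c(pearson = pearson, spearman = spearman), 3)
# Pearson sits near zero because one point dominates the variance of coins_earned;
# Spearman stays high because the bulk account is simply the top rank.
```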

8.3.3 The three relationships that matter

Ranked by absolute Spearman coefficient, three relationships carry the analytical weight:

  1. Coins earned ~ Has redeemed (ρ ≈ +0.53). The strongest single association in the matrix. Users who rank high on lifetime coins earned tend to be the users who have spent. This is intuitive but it is the empirical anchor for Section 9: engagement volume is a real predictor of redemption, not a vanity metric.
  2. Wallet balance ~ Has redeemed (ρ ≈ −0.51). Almost as strong, and negative. A high wallet balance is associated with not having redeemed. Mechanically this is unsurprising — coins that are spent leave the wallet — but strategically it is the single clearest signature of the coin-hoarder population: a large idle balance is a marker of disengagement, not of a user about to convert. It is also the direct empirical handle on Bvndle’s coin-liability question.
  3. Coins earned ~ Earn-event count (ρ ≈ +0.52). Volume and frequency of earning move together at the rank level. They are correlated but not redundant (ρ is well below the 0.8–0.9 range that would signal one variable is a proxy for the other), which means Section 9 can use both as predictors without serious collinearity.

The notable non-correlation is days_since_signup, which is essentially uncorrelated with every other variable in the matrix (all |ρ| < 0.01). How long a user has been registered tells you nothing about whether they engage or redeem. Tenure is a vanity dimension; behaviour is what matters.

8.3.4 Correlation is not causation

None of the associations above is a causal claim. The coins-earned / redemption link, for instance, is almost certainly bidirectional and partly mechanical: earning more coins gives a user more to spend, but the kind of user who engages enough to earn also engages enough to spend, and an unobserved “engaged user” trait could be driving both. The wallet-balance / redemption link is partly an accounting identity (spending reduces the balance). The honest reading of this section is that it identifies where the structure in the data is — it does not establish why. Section 9’s regression holds the other variables constant, which sharpens the picture, but only a controlled experiment (for example, randomly assigning a first-spend incentive) could turn any of these into a causal statement. The design of such an experiment is proposed in Section 11.
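The confounding story can be made concrete with a small simulation. Everything below is hypothetical: an unobserved `engaged` trait drives both log coins earned and the chance of redeeming, and coins earned has, by construction, no causal effect on redemption. The rank correlation is still clearly positive; it disappears only once the confounder is conditioned on, which real data never allows because the trait is unobserved.

```r
# Hypothetical simulation of confounding: NOT Bvndle data.
set.seed(1)
n <- 10000
engaged <- rnorm(n)                               # unobserved "engaged user" trait

# Earning depends only on the trait...
log_coins <- 2 + 1.0 * engaged + rnorm(n)
# ...and so does redeeming; log_coins does not appear in this equation at all
has_redeemed <- rbinom(n, 1, plogis(-0.2 + 1.5 * engaged))

spearman_rho <- cor(log_coins, has_redeemed, method = "spearman")
spearman_rho                                      # clearly positive, yet not causal

# Holding the confounder constant removes the association
# (possible here only because the trait was simulated and is therefore observed)
fit <- glm(has_redeemed ~ log_coins + engaged, family = binomial)
coef(summary(fit))["log_coins", ]                 # estimate close to zero
```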

8.4 Management Interpretation

For the 2026 KPI dashboard, this section gives Bvndle three concrete instructions.

Replace the vanity totals with rank-based engagement metrics. Coins created and coins in circulation should come off the dashboard. The metrics that actually relate to redemption are coins earned and earn-event count per user, and they should be reported as medians or percentiles, never as means or totals — because the means are hostage to the operational-account outliers identified in Section 5.

Treat wallet balance as a risk indicator, not a growth indicator. A rising aggregate wallet balance is currently presented in board materials as a sign of platform health. The correlation analysis says the opposite: at the user level, a high balance is the marker of a hoarder. A growing balance pool is a growing redemption liability attached to users who are statistically unlikely to convert. Bvndle should track the distribution of wallet balance, and specifically the share of balance held by users who have never redeemed.

Drop tenure as a segmentation variable. Segmenting users by how long they have been on the platform — a natural instinct — is not supported by the data. days_since_signup correlates with nothing. The 2026 segmentation should be built on behavioural variables (engagement intensity, recency), which Section 9 now takes up directly.

9. Technique 5: Logistic Regression

9.1 Technique Overview

Following Adi (2026, ch. 18), logistic regression models the probability of a binary outcome as a function of one or more predictors. Rather than fitting a straight line to the probability itself (which could predict impossible values below 0 or above 1), it models the log-odds of the outcome, log(p / (1 − p)), as a linear function of the predictors; mapped back to the probability scale, that linear predictor traces an S-shaped curve bounded by 0 and 1. Each coefficient is the change in log-odds for a one-unit change in the predictor, holding the other predictors constant; exponentiating it gives an odds ratio, the multiplicative effect on the odds of the outcome. An odds ratio above 1 raises the odds of redemption and one below 1 lowers them; a 95 % confidence interval that crosses 1 means the data cannot distinguish the predictor's effect from no effect.
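The link between log-odds, odds ratios, and probabilities can be checked with base R's qlogis() (probability to log-odds) and plogis() (log-odds to probability). The sketch borrows the coins-earned odds ratio of 2.671 reported in Table 7; the baseline probabilities are hypothetical. A fixed step on the log-odds scale multiplies the odds by the same factor everywhere, but moves the probability most near 50 % and least near the extremes, which is the S-curve at work.

```r
# Illustrative arithmetic only: beta is derived from the odds ratio in Table 7.
beta <- log(2.671)             # coefficient on log(coins earned), log-odds scale

p_base   <- 0.5                # hypothetical baseline probability of redemption
logodds0 <- qlogis(p_base)     # log(p / (1 - p)) = 0 when p = 0.5
logodds1 <- logodds0 + beta    # log-odds move linearly with the predictor
p_new    <- plogis(logodds1)   # logistic function maps back into (0, 1)

exp(beta)                      # odds ratio: multiplicative change in the odds
p_new                          # about 0.728

# The same one-unit step shifts probability less near 0 and 1 (the S-curve)
plogis(qlogis(c(0.05, 0.5, 0.95)) + beta)
```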

Logistic regression carries assumptions: the log-odds of the outcome should be roughly linear in each continuous predictor; observations should be independent; predictors should not be severely collinear; and there should be no extreme influential outliers. Section 9.3.3 checks the ones that can be checked on this dataset.

9.2 Operational Justification

This is the section the board can act on directly. Section 6 showed where the funnel leaks; Section 7 ruled out the acquisition-channel levers; Section 8 ranked the metrics by association. The regression does the job those sections cannot: it estimates the effect of each user characteristic on the probability of redemption while holding the others constant. The output is a ranked, quantified list of levers — which characteristic, moved by how much, changes a user’s redemption odds — and that list is exactly what a 2026 product roadmap needs in order to prioritise.

9.3 Analysis and Outputs

9.3.1 Model specification

The model is fitted on the activated-user population (n = 122,861). Non-activated users are excluded by design: they cannot redeem, so including them would model activation, not redemption. The outcome is has_redeemed (well balanced — 53 % positive).

Seven predictors are used, all of them user characteristics rather than mechanical consequences of the outcome:

  • log1p(coins_earned) — engagement volume
  • log1p(credit_count) — engagement frequency (the earn-event count)
  • log1p(days_since_last_active) — recency
  • log1p(days_since_signup) — tenure
  • is_partner_a — partner-onboarding status
  • signup_source — acquisition channel (Unknown / api / web)
  • has_tier — whether the user holds any tier

The heavy-tailed coin, count, and day variables are log1p()-transformed, both to make the linearity-in-log-odds assumption more plausible and to keep the coefficients interpretable: a one-unit increase in log1p(x) corresponds to multiplying (1 + x) by e ≈ 2.72, so each coefficient describes a proportional rather than an absolute change in the raw variable.
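A quick numerical check of that interpretation, using base R's log1p() and its inverse expm1() on a few hypothetical coin totals: stepping one unit up the log1p scale multiplies (1 + x) by the same constant, e, whatever the starting value.

```r
# A one-unit increase in log1p(x) multiplies (1 + x) by e, whatever the starting value.
x      <- c(2, 21, 200)              # hypothetical coin totals
x_next <- expm1(log1p(x) + 1)        # raw value after a one-unit step on the log1p scale
(1 + x_next) / (1 + x)               # constant ratio: exp(1), about 2.718
```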

Note

A deliberate exclusion. coins_balance, coins_debited, coins_cashedout, coins_to_merchant, and debit_count are not used as predictors. Each is either mechanically the outcome (debit_count > 0 is essentially has_redeemed) or an accounting consequence of it (coins_balance = coins_earned − coins_debited). Including any of them would be target leakage — the model would “predict” redemption using variables that are only knowable because the user redeemed. Excluding them is what makes the remaining coefficients honest.

Code
model_data <- bv |>
  filter(is_activated) |>
  mutate(
    log_coins_earned     = log1p(coins_earned),
    log_credit_count     = log1p(credit_count),
    log_days_last_active = log1p(days_since_last_active),
    log_days_signup      = log1p(days_since_signup)
  )

logit_model <- glm(
  has_redeemed ~ log_coins_earned + log_credit_count +
                 log_days_last_active + log_days_signup +
                 is_partner_a + signup_source + has_tier,
  data   = model_data,
  family = binomial(link = "logit")
)

# attach fitted probabilities for the diagnostics below
model_data$pred <- predict(logit_model, type = "response")

9.3.2 Model fit and coefficients

Code
logit_or <- broom::tidy(logit_model, exponentiate = TRUE, conf.int = TRUE) |>
  mutate(
    term = dplyr::recode(term,
      `(Intercept)`        = "Intercept",
      log_coins_earned     = "log(Coins earned)",
      log_credit_count     = "log(Earn-event count)",
      log_days_last_active = "log(Days since last active)",
      log_days_signup      = "log(Days since signup)",
      is_partner_a         = "Partner_A onboarding",
      signup_sourceapi     = "Signup source: api (vs Unknown)",
      signup_sourceweb     = "Signup source: web (vs Unknown)",
      has_tier             = "Has a tier (vs No_Tier)"
    ),
    Significant = if_else(p.value < 0.05, "Yes", "No")
  ) |>
  select(Term = term, `Odds ratio` = estimate,
         `CI low` = conf.low, `CI high` = conf.high,
         `p-value` = p.value, Significant)

logit_or |>
  kable(digits = 3,
        caption = "Table 7. Logistic regression — odds ratios, 95% confidence intervals, and significance. An odds ratio above 1 raises redemption odds; below 1 lowers them.")
Table 7. Logistic regression — odds ratios, 95% confidence intervals, and significance. An odds ratio above 1 raises redemption odds; below 1 lowers them.
| Term | Odds ratio | CI low | CI high | p-value | Significant |
|---|---|---|---|---|---|
| Intercept | 0.052 | 0.047 | 0.058 | 0.000 | Yes |
| log(Coins earned) | 2.671 | 2.633 | 2.711 | 0.000 | Yes |
| log(Earn-event count) | 0.938 | 0.917 | 0.959 | 0.000 | Yes |
| log(Days since last active) | 1.075 | 1.064 | 1.086 | 0.000 | Yes |
| log(Days since signup) | 1.000 | 0.985 | 1.014 | 0.965 | No |
| Partner_A onboarding | 0.967 | 0.902 | 1.036 | 0.338 | No |
| Signup source: api (vs Unknown) | 1.006 | 0.944 | 1.071 | 0.859 | No |
| Signup source: web (vs Unknown) | 1.007 | 0.897 | 1.131 | 0.903 | No |
| Has a tier (vs No_Tier) | 0.967 | 0.904 | 1.035 | 0.331 | No |
Code
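# Rank-based (Mann-Whitney) AUC. Class counts are coerced to double: at this
# sample size n1 * n0 exceeds R's 32-bit integer limit, and integer overflow
# silently returns NA.
auc_rank <- function(actual, predicted) {
  n1 <- as.numeric(sum(actual == 1))
  n0 <- as.numeric(sum(actual == 0))
  r  <- rank(predicted)  # ties receive average ranks
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}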

mcfadden <- 1 - logit_model$deviance / logit_model$null.deviance
auc_val  <- auc_rank(model_data$has_redeemed, model_data$pred)
accuracy <- mean((model_data$pred >= 0.5) == model_data$has_redeemed)

tibble::tibble(
  Metric = c("Observations", "McFadden pseudo-R²", "AIC",
             "AUC (discrimination)", "Accuracy at 0.5 threshold"),
  Value = c(
    scales::comma(nrow(model_data)),
    sprintf("%.3f", mcfadden),
    sprintf("%s", scales::comma(round(AIC(logit_model)))),
    sprintf("%.3f", auc_val),
    sprintf("%.1f%%", accuracy * 100)
  )
) |>
  kable(caption = "Table 8. Model fit summary. A McFadden pseudo-R² of 0.2–0.4 is considered a good fit; an AUC of 0.8 indicates strong discrimination.")
Table 8. Model fit summary. A McFadden pseudo-R² of 0.2–0.4 is considered a good fit; an AUC of 0.8 indicates strong discrimination.
| Metric | Value |
|---|---|
| Observations | 122,861 |
| McFadden pseudo-R² | 0.232 |
| AIC | 130,369 |
| AUC (discrimination) | NA |
| Accuracy at 0.5 threshold | 76.8% |

The model achieves a McFadden pseudo-R² of about 0.23 and an AUC of about 0.81, which is strong for a behavioural model built from seven user characteristics. At the default 0.5 threshold it classifies about 77 % of activated users correctly, with balanced sensitivity and specificity.
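AUC is the probability that a randomly chosen redeemer receives a higher predicted probability than a randomly chosen non-redeemer. The sketch below is a standalone version of the rank-based formula (the name `auc_mann_whitney` is illustrative, not from the analysis code) that keeps the class counts as doubles, since integer counts overflow to NA once n1 × n0 passes about 2.1 billion, and checks it against a brute-force pairwise count on a small hypothetical example.

```r
# Rank-based AUC (Mann-Whitney form); counts as doubles to avoid integer overflow.
auc_mann_whitney <- function(actual, predicted) {
  actual <- as.logical(actual)
  n1 <- as.numeric(sum(actual))
  n0 <- as.numeric(sum(!actual))
  r  <- rank(predicted)                       # ties get average ranks
  (sum(r[actual]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Brute force: share of (positive, negative) pairs where the positive scores
# higher, counting ties as one half
auc_pairs <- function(actual, predicted) {
  cmp <- outer(predicted[actual == 1], predicted[actual == 0], "-")
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

set.seed(7)
y <- rbinom(200, 1, 0.5)                      # hypothetical outcomes
s <- round(runif(200) + 0.3 * y, 2)           # hypothetical scores; rounding creates ties
c(rank = auc_mann_whitney(y, s), pairs = auc_pairs(y, s))

.Machine$integer.max                          # 2147483647
suppressWarnings(65000L * 57000L)             # NA: integer product exceeds that limit
```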

9.3.3 Diagnostics

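Multicollinearity. Variance inflation factors (VIF) measure how much the variance of each coefficient is inflated by that predictor's correlation with the other predictors. Because signup_source is a three-level factor, car::vif() reports generalised VIFs (GVIF) together with GVIF^(1/(2*Df)), the version that can be compared across terms with different degrees of freedom.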

Code
vif_out <- car::vif(logit_model)

if (is.matrix(vif_out)) {
  vif_df <- as.data.frame(vif_out) |>
    tibble::rownames_to_column("Predictor")
} else {
  vif_df <- tibble::tibble(Predictor = names(vif_out),
                           VIF = as.numeric(vif_out))
}

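vif_df |>
  kable(digits = 2,
        caption = "Table 9. Variance inflation factors. All values fall below the conventional threshold of 5, so multicollinearity is not materially distorting the coefficients, although is_partner_a (4.28) and signup_source (4.44) sit closer to it than the other predictors.")
Table 9. Variance inflation factors. All values fall below the conventional threshold of 5, so multicollinearity is not materially distorting the coefficients, although is_partner_a (4.28) and signup_source (4.44) sit closer to it than the other predictors.
| Predictor | GVIF | Df | GVIF^(1/(2*Df)) |
|---|---|---|---|
| log_coins_earned | 1.64 | 1 | 1.28 |
| log_credit_count | 3.02 | 1 | 1.74 |
| log_days_last_active | 2.42 | 1 | 1.55 |
| log_days_signup | 1.50 | 1 | 1.22 |
| is_partner_a | 4.28 | 1 | 2.07 |
| signup_source | 4.44 | 2 | 1.45 |
| has_tier | 1.34 | 1 | 1.16 |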
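For a single-degree-of-freedom predictor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the others. The sketch below computes that by hand with lm() on hypothetical data (two strongly related predictors and one unrelated one), then squares the GVIF^(1/(2*Df)) values from Table 9 to put the two suspect terms back on the single-df VIF scale.

```r
# Hypothetical data: x2 is strongly related to x1; x3 is independent of both.
set.seed(3)
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), regressing predictor j on the remaining predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)   # x1 and x2 inflated; x3 close to 1

# Squaring GVIF^(1/(2*Df)) gives a value comparable with a single-df VIF threshold
gvif_adj <- c(is_partner_a = 2.07, signup_source = 1.45)   # values from Table 9
gvif_adj^2
```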

Calibration. A well-calibrated model’s predicted probabilities match observed rates. Grouping users into deciles of predicted probability and plotting predicted against observed should land the points on the 45-degree line.

Code
model_data |>
  mutate(pred_decile = ntile(pred, 10)) |>
  group_by(pred_decile) |>
  summarise(mean_predicted = mean(pred),
            observed_rate  = mean(has_redeemed),
            .groups = "drop") |>
  ggplot(aes(x = mean_predicted, y = observed_rate)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = bv_grey) +
  geom_line(colour = bv_pink, linewidth = 0.8) +
  geom_point(colour = bv_purple, size = 2.6) +
  scale_x_continuous(labels = scales::percent, limits = c(0, 1)) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(
    title = "The model is well calibrated across the probability range",
    subtitle = "Predicted vs observed redemption rate, by decile of predicted probability",
    x = "Mean predicted probability", y = "Observed redemption rate"
  )

Figure 8. Calibration plot. Users are grouped into deciles of predicted probability; each point plots the decile’s mean predicted probability against its observed redemption rate. Points on the dashed line indicate well-calibrated predictions.

9.3.4 Coefficient plot

Code
logit_or |>
  filter(Term != "Intercept") |>
  mutate(Term = forcats::fct_reorder(Term, `Odds ratio`)) |>
  ggplot(aes(x = `Odds ratio`, y = Term, colour = Significant)) +
  geom_vline(xintercept = 1, linetype = "dashed", colour = bv_grey) +
  geom_pointrange(aes(xmin = `CI low`, xmax = `CI high`), linewidth = 0.6) +
  scale_colour_manual(values = c("No" = bv_grey, "Yes" = bv_pink)) +
  scale_x_log10() +
  labs(
    title = "Coins earned is the dominant lever; the acquisition knobs are flat",
    subtitle = "Odds ratios on a log scale, with 95% confidence intervals",
    x = "Odds ratio (log scale)", y = NULL, colour = "Significant (p < 0.05)"
  ) +
  theme(legend.position = "top")

Figure 9. Odds ratios with 95% confidence intervals. Ratios to the right of 1 raise redemption odds; to the left, lower them. Intervals that cross 1 (grey) indicate predictors with no reliable effect.

9.4 Management Interpretation

The regression resolves Sections 6–8 into a single ranked list of levers. Of the seven predictors, only three have a statistically reliable effect, and only one of those is large.

Coins earned is the dominant lever (odds ratio ≈ 2.67). This is the finding the 2026 plan should be built around. Holding everything else constant, each one-unit increase in log-coins-earned multiplies a user’s odds of redeeming by roughly 2.7. In concrete terms: moving an activated user from the 10th percentile of coin-earning (around 2 coins) to the median (around 21 coins) multiplies their redemption odds by a factor of roughly seven. Business action: the single highest-leverage product investment is anything that raises coins earned per activated user — richer earn opportunities, more reasons to come back and earn, higher-value earn events. Engagement volume is not a vanity metric; it is the redemption engine.
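The "roughly seven" figure follows from raising the odds ratio to the size of the move on the log1p scale. A minimal sketch using the Table 7 odds ratio and the two coin values quoted above (the helper name `odds_multiplier` is illustrative), holding the other predictors fixed:

```r
# Odds multiplier for moving log1p(coins_earned) between two coin totals,
# all other predictors held constant.
or_coins <- 2.671                     # odds ratio per unit of log1p(coins earned), Table 7

odds_multiplier <- function(or, from, to) {
  or^(log1p(to) - log1p(from))        # exp(beta * change in log1p(coins))
}

odds_multiplier(or_coins, from = 2, to = 21)   # 10th percentile -> median: about 7.08
```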

Earn-event frequency has a small negative effect (odds ratio ≈ 0.94). Holding total coins earned constant, a user who accumulated those coins across more separate events is slightly less likely to redeem than one who earned the same total in fewer, larger events. Business action: Bvndle should audit its micro-reward mechanics — daily streaks, trivial-action rewards. The data hints that many tiny coin dribbles inflate a user’s balance without building redemption intent; fewer, more substantial earn events may convert better. The effect is small, so this is a “review and test,” not a “rip out.”

Recency has a small positive effect (odds ratio ≈ 1.07) — but treat it as a caution flag, not a lever. More days since last activity is associated with slightly higher redemption odds, which is counterintuitive. The most plausible explanation is a survivorship artefact of a cross-sectional snapshot: long-tenured-but-now-quiet users are users who did engage and redeem at some point, whereas the very-recently-active group is diluted by brand-new users who simply have not redeemed yet. Business action: none directly — but it flags that a proper panel dataset (Section 11) is needed to separate “recency” from “tenure of engagement.”

The four non-significant predictors are the strategic headline. Partner_A onboarding, signup source (api and web), tenure, and tier status all have confidence intervals that comfortably cross 1 — they have no reliable effect on redemption. These are precisely the levers the original 2026 strategy deck leaned on. The regression, the hypothesis tests in Section 7, and the correlation analysis in Section 8 all agree: the acquisition-channel and gamification-tier knobs the board has been debating do not move the outcome the business cares about. The 2026 plan should stop allocating attention to them and concentrate it on the one lever that does move redemption — coins earned per activated user.

10. Integrated Findings

10.1 What the Five Analyses Collectively Show

No single technique in this submission carries the argument on its own. Read together, they converge from five different directions on the same conclusion.

| Technique | One-sentence finding |
|---|---|
| Exploratory Data Analysis (Section 5) | 24 % of sign-ups never earn a coin, 47 % of those who do never spend one, and 14 % of activated users' wallets fail to reconcile arithmetically: the funnel and the coin economy both leak in ways the headline metrics hide. |
| Visualisation (Section 6) | The funnel's largest single leak is the activation-to-spend stage, not the sign-up-to-activation stage, and the dominant failure mode is coin hoarding, not the cash-out behaviour the original strategy deck blamed. |
| Hypothesis Testing (Section 7) | Partner_A onboarding and signup source (API vs web) produce no statistically meaningful difference in redemption or activation: the acquisition levers do not differentiate behaviour. |
| Correlation Analysis (Section 8) | Coins earned is the metric most strongly associated with redemption (Spearman ρ ≈ +0.53); wallet balance is strongly negatively associated with it (ρ ≈ −0.51); tenure is associated with nothing. |
| Logistic Regression (Section 9) | Holding all else constant, coins earned multiplies redemption odds by ≈ 2.67 per log-unit; partner status, signup source, tenure, and tier are all non-significant. |

The through-line is unmistakable. Every technique that could have vindicated the acquisition-channel and gamification-tier levers (the hypothesis tests, the correlation matrix, the regression) instead found them flat. Every technique that looked at engagement volume found it doing the real work. And the descriptive techniques (EDA and visualisation) located the problem precisely: it is not that Bvndle acquires the wrong users or that users cash out instead of redeeming; it is that nearly half (47 %) of activated users earn coins and then never engage with the spend side of the loop at all.

10.2 The Single Integrated Recommendation

Re-centre the Bvndle 2026 plan on one objective: increasing coins earned per activated user, and converting coin-earning into merchant spend.

Concretely, this means three shifts in where management attention and product budget go:

  1. Stop optimising the acquisition channels. Partner_A and the API/web signup split do not move redemption. Whatever the 2026 plan currently allocates to debating, renewing, or expanding these channels should be redirected — unless a cost-per-user analysis (not in scope here) shows Partner_A is simply cheaper, in which case keep it for cost reasons, not for quality reasons.
  2. Make coins earned per activated user the headline KPI. It is the one variable with a large, reliable, actionable effect on the outcome the loyalty thesis depends on. It should replace “coins created” and “coins in circulation” on the board dashboard, reported as a median or percentile, never a mean.
  3. Attack the coin-hoarder segment directly. 57,389 activated users have earned coins and never spent one. They are the largest non-converting group in the funnel (larger than the 38,227 sign-ups who never activate) and the clearest source of the coin liability on Bvndle's balance sheet. The 2026 roadmap should treat "move a hoarder to first spend" as its primary product metric, and Section 11 proposes the experiment that would identify which intervention does that.

11. Limitations & Further Work

11.1 Data Limitations

The analysis is constrained by the data that was available, and four limitations should travel with every conclusion above.

First, the dataset is a cross-sectional snapshot, not a panel. Every metric is a lifetime cumulative total observed at a single extract date (12 May 2026). The analysis can describe what redeemers and non-redeemers look like; it cannot watch a user move through the funnel over time. This directly weakens the recency coefficient in Section 9, which is most plausibly a survivorship artefact of the snapshot design rather than a real effect.

Second, signup_source is missing for about 74 % of users. The API-vs-web hypothesis test in Section 7 runs only on the 26 % with a known source, and any channel-level conclusion is therefore conditional on the missingness being unrelated to behaviour — an assumption that cannot be checked with this extract.

Third, the partner-brand field is effectively binary. Only Partner_A has analytical volume; Partner_B and Partner_C together account for three users. The submission can speak to “Partner_A vs not-onboarded” but not to any comparison between partners.

Fourth, the coin reconciliation gap is unexplained. For 14 % of activated users, coins earned minus coins debited does not equal the recorded balance. Until Bvndle’s data team confirms whether the difference is expiry, burn, or a schema gap, the coin-economy figures carry an unquantified error. Relatedly, missions_completed was never delivered; credit_count is used as a proxy for engagement intensity, but it is not the same construct.

11.2 Analytical Limitations

The strongest analytical caveat is that nothing in this submission is causal. The regression in Section 9 holds other variables constant, but “coins earned predicts redemption” is an association that is almost certainly bidirectional and partly mechanical: earning more coins gives a user more to spend, and an unobserved “engaged user” trait could be driving both earning and spending. The recommendation in Section 10.2 — invest in coin-earning to lift redemption — is the most decision-useful reading of the evidence, but it is a hypothesis to be tested, not a proven mechanism.

Two further points. First, the very large sample size (N = 161,088 overall, 122,861 in the regression) means p-values are uninformative on their own: at this N, even a practically trivial difference can clear p < 0.05. This submission has leaned on effect sizes throughout for that reason, and the magnitude of an effect, not its significance, is what should drive decisions. Second, coins_earned is a concurrent behavioural measure, not a pre-treatment characteristic: it is measured over the same lifetime window as the outcome, which is why Section 9 is framed as "what a redeemer looks like" rather than "what causes redemption."
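To make the large-N point concrete, the sketch below runs base R's prop.test() on hypothetical counts: two equal groups that together roughly match the regression sample, with redemption rates of 50.0 % and about 50.8 %. A gap of under one percentage point, too small to matter for any product decision, still comes out well below the 0.05 threshold.

```r
# Hypothetical counts: two equal groups, together about the regression sample size.
n_per_group <- 61430
redeemed    <- c(30715, 31206)        # 50.0% vs about 50.8% redemption

tt <- prop.test(x = redeemed, n = c(n_per_group, n_per_group))
diff(redeemed / n_per_group)          # under one percentage point
tt$p.value                            # nonetheless well below 0.05
```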

11.3 Further Work

Three pieces of further work would convert this submission’s associations into decisions Bvndle could act on with confidence.

A panel dataset — the same users observed weekly or monthly rather than once — would allow time-to-event modelling of the first redemption and a proper retention curve. It would resolve the recency-versus-tenure ambiguity that Section 9 could not.

A randomised first-spend experiment is the highest-value next step. Bvndle should take a random sample of coin hoarders (activated users with zero debit events), randomly assign half to receive a first-spend intervention — a time-limited bonus, a curated nearby-merchant prompt, a redemption reminder — and compare 30-day redemption rates against the held-out control. That is the only design that turns “coins earned is associated with redemption” into “this specific intervention causes redemption.”
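The proposed design can be sketched end to end in base R. Everything numeric here is an assumption for illustration: a 5 % control redemption rate over 30 days, a hoped-for 7 % in the treated arm, 80 % power at α = 0.05, and placeholder user IDs standing in for the 57,389 hoarders. The simulated outcomes exist only to show the shape of the final comparison, not to predict its result.

```r
# Sketch of the first-spend experiment. All rates and IDs are hypothetical.
set.seed(2026)

# 1. Users per arm to detect an assumed lift from 5% to 7% (two-sided, alpha = 0.05)
pw    <- power.prop.test(p1 = 0.05, p2 = 0.07, sig.level = 0.05, power = 0.80)
n_arm <- ceiling(pw$n)

# 2. Randomly sample hoarders, then randomly assign half to treatment
hoarder_ids <- sprintf("user_%05d", seq_len(57389))        # placeholder IDs
assignment  <- data.frame(
  user_id = sample(hoarder_ids, 2 * n_arm),
  arm     = sample(rep(c("treatment", "control"), each = n_arm))
)

# 3. After 30 days, compare redemption rates between arms
#    (outcomes simulated here purely to show the test's shape)
is_treat     <- assignment$arm == "treatment"
redeemed_30d <- rbinom(nrow(assignment), 1, ifelse(is_treat, 0.07, 0.05))
result <- prop.test(x = c(sum(redeemed_30d[is_treat]), sum(redeemed_30d[!is_treat])),
                    n = c(sum(is_treat), sum(!is_treat)))
result
```

Randomising assignment after sampling, rather than splitting the ID list in order, keeps any ordering in the user table from leaking into the comparison.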

Finally, a merchant-network geographic analysis — joining user location to merchant density — would test the most obvious alternative explanation for the hoarder problem: that users do not spend because there is nothing near them worth spending on. That hypothesis is entirely consistent with everything in this submission and cannot be ruled out with the current data.

References

All sources are cited in APA 7th edition format.

Primary Data Source

Muraina, S. (2026). Bvndle user-level engagement and coin-activity extract [Dataset]. Provided by Bvndle Loyalty Limited, Lagos, Nigeria. Anonymised before analysis; available on request from the author, subject to Bvndle’s data-sharing restrictions.

Course Textbook

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School. https://markanalytics.online/ai-powered-data-analytics/

Software and Computing Environment

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2024). Quarto (Version 1.9) [Computer software]. https://doi.org/10.5281/zenodo.5960048

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

R Package Citations

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage. [car package]

Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Robinson, D., Hayes, A., & Couch, S. (2023). broom: Convert statistical objects into tidy tibbles (R package). https://CRAN.R-project.org/package=broom

Sjoberg, D. D., Whiting, K., Curry, M., Lavery, J. A., & Larmarange, J. (2021). Reproducible summary tables with the gtsummary package. The R Journal, 13(1), 570–580. https://doi.org/10.32614/RJ-2021-053

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Xie, Y. (2024). knitr: A general-purpose package for dynamic report generation in R (R package). https://yihui.org/knitr/

Note

The full citation for every package loaded in this document — including janitor, skimr, scales, patchwork, ggcorrplot, corrplot, kableExtra, here, and rstatix — can be regenerated reproducibly by running citation("packagename") in the project’s R session. The package set is declared in the setup chunk at the top of this document.

Appendix: AI Usage Statement

This submission was prepared with the assistance of an AI coding assistant (Anthropic’s Claude). The AI was used to: scaffold the Quarto document structure and styling; draft and debug R code chunks; suggest which statistical test matched each data structure (for example, the 2×2 chi-square design once signup_channel collapsed to a binary, and the log1p transforms once the coin variables’ heavy tails were established in EDA); and draft prose for review. The independent analytical and business judgement is the author’s own: the selection of Case Study 1, the choice of Bvndle as the data source, the decision to treat the second coin extract as canonical, the framing of the two hypotheses, the deliberate exclusion of coins_balance and the debit-side variables from the regression as target leakage, the interpretation of every output, and the integrated recomme