Cohort-Based Customer Lifetime Value (LTV) Simulation & Analytics

Author

Tony Wang

Executive Summary

In modern data-driven commerce, understanding customer value progression over time is critical for optimizing marketing spend, refining acquisition strategies, and managing retention. This project delivers a robust, scalable framework for analyzing Customer Lifetime Value (LTV) using cohort-based metrics.

By building an end-to-end simulation pipeline in R, this project models baseline customer transaction behavior and restructures complex relational time-series data into an institutional-grade Wide Cohort Matrix. The resulting analytics engine normalizes disparate calendar event dates into standardized operational timelines, unlocking clear visibility into revenue retention, cohort decay, and cumulative value growth.

Technical Stack

  • Language: R

  • Data Manipulation: dplyr, tidyr, purrr

  • Date & Time Engineering: lubridate

  • Data Formatting: scales

  • Publication Platform: RPubs

Simulate Transactions

Code
# Load required libraries
library(tidyverse)
library(scales)

# ==========================================
# 1. Simulate Raw Transactional Ledger Data
# ==========================================
set.seed(42)

# Generate 500 unique customers acquired between Jan 2025 and June 2025
customer_base <- tibble(
  customer_id = 1001:1500,
  first_touch_channel = sample(c("Google Search", "Meta Ads"), 500, replace = TRUE),
  # Assign a random acquisition month
  cohort_month = sample(seq(as.Date("2025-01-01"), as.Date("2025-06-01"), by="month"), 500, replace = TRUE)
)

# Generate 3,000 random transactions spanning 12 months
raw_transactions <- tibble(
  customer_id = sample(1001:1500, 3000, replace = TRUE),
  order_date  = sample(seq(as.Date("2025-01-01"), as.Date("2025-12-31"), by="day"), 3000, replace = TRUE),
  revenue     = runif(3000, 30, 250)
) 

# Filter out transactions that hypothetically occurred BEFORE the customer was acquired
raw_transactions <- raw_transactions %>%
  inner_join(customer_base, by = "customer_id") %>%
  filter(order_date >= cohort_month) %>%
  # Calculate Gross Margin (Assume a base product cost of 50%)
  mutate(gross_margin = revenue * 0.50)

head(customer_base)
# A tibble: 6 × 3
  customer_id first_touch_channel cohort_month
        <int> <chr>               <date>      
1        1001 Google Search       2025-03-01  
2        1002 Google Search       2025-01-01  
3        1003 Google Search       2025-02-01  
4        1004 Google Search       2025-06-01  
5        1005 Meta Ads            2025-01-01  
6        1006 Meta Ads            2025-03-01  
Code
head(raw_transactions)
# A tibble: 6 × 6
  customer_id order_date revenue first_touch_channel cohort_month gross_margin
        <int> <date>       <dbl> <chr>               <date>              <dbl>
1        1178 2025-07-03   241.  Meta Ads            2025-01-01          121. 
2        1138 2025-10-13    30.7 Google Search       2025-06-01           15.4
3        1078 2025-03-06   161.  Google Search       2025-02-01           80.4
4        1106 2025-03-28   165.  Meta Ads            2025-03-01           82.6
5        1247 2025-11-09    44.3 Meta Ads            2025-02-01           22.2
6        1098 2025-11-06   187.  Meta Ads            2025-05-01           93.5

500 unique customers with cohort month and first touch channel have been created, along 3,000 transaction over 12 months with gross_margin for each transaction

Conduct Analysis

Code
# Build the Cohort & LTV Analytics Engine
cohort_ltv_analysis <- raw_transactions %>%
  mutate(
    transaction_month = floor_date(order_date, "month"),
    month_index = as.integer(round(lubridate::interval(cohort_month, transaction_month) / months(1)))) %>%
  group_by(cohort_month, month_index) %>%
  summarise(
    monthly_margin = sum(gross_margin),
    .groups = 'drop'
  ) %>%
  left_join(
    customer_base %>% group_by(cohort_month) %>% summarise(cohort_size = n()),
    by = "cohort_month"
  ) %>%
  arrange(cohort_month, month_index) %>%
  group_by(cohort_month) %>%
  mutate(
    cumulative_margin = cumsum(monthly_margin),
    cumulative_LTV = cumulative_margin / cohort_size
  ) %>%
  ungroup()

cohort_ltv_analysis
# A tibble: 57 × 6
   cohort_month month_index monthly_margin cohort_size cumulative_margin
   <date>             <int>          <dbl>       <int>             <dbl>
 1 2025-01-01             0          3268.          85             3268.
 2 2025-01-01             1          2862.          85             6130.
 3 2025-01-01             2          3143.          85             9273.
 4 2025-01-01             3          2542.          85            11815.
 5 2025-01-01             4          4166.          85            15981.
 6 2025-01-01             5          2608.          85            18589.
 7 2025-01-01             6          3052.          85            21641.
 8 2025-01-01             7          2176.          85            23817.
 9 2025-01-01             8          2822.          85            26639.
10 2025-01-01             9          3670.          85            30310.
# ℹ 47 more rows
# ℹ 1 more variable: cumulative_LTV <dbl>

The analysis generated Cumulative Life Time Value (LTV) of each cohort based on cohort month and monthly gross margin contribution

Key Analytics Deliverables

  • The Cohort LTV Matrix: A highly scannable, executive-ready grid that tracks cumulative LTV expansion from left to right, and isolates cohort-over-cohort performance when scanning vertically.

  • Granular Repeat-Purchase Tracking: Multi-transaction tracking logic that seamlessly accounts for irregular customer purchasing gaps (e.g., handling customers who purchase in Month 0, skip Month 1, and return in Month 2) without distorting cumulative cohort revenue.

  • Reproducible Framework: A clean, modular R script designed to serve as a baseline model. It can be easily upgraded to ingest real-world e-commerce data, introduce seasonal spikes, or layer on complex predictive churn curves.

  • While explicit marketing expenditure data (CAC) was outside the scope of the transaction ecosystem, the integration of first_touch_channel allows the business to establish strict allowable CAC thresholds based on target LTV multiples, and isolates channel-specific behavioral differences in retention and margin compounding