Cohort-Based Customer Lifetime Value (LTV) Simulation & Analytics
Author
Tony Wang
Executive Summary
In modern data-driven commerce, understanding customer value progression over time is critical for optimizing marketing spend, refining acquisition strategies, and managing retention. This project delivers a robust, scalable framework for analyzing Customer Lifetime Value (LTV) using cohort-based metrics.
By building an end-to-end simulation pipeline in R, this project models baseline customer transaction behavior and restructures complex relational time-series data into an institutional-grade Wide Cohort Matrix. The resulting analytics engine normalizes disparate calendar event dates into standardized operational timelines, unlocking clear visibility into revenue retention, cohort decay, and cumulative value growth.
Technical Stack
Language: R
Data Manipulation:dplyr, tidyr, purrr
Date & Time Engineering:lubridate
Data Formatting:scales
Publication Platform: RPubs
Simulate Transactions
Code
# Load required librarieslibrary(tidyverse)library(scales)# ==========================================# 1. Simulate Raw Transactional Ledger Data# ==========================================set.seed(42)# Generate 500 unique customers acquired between Jan 2025 and June 2025customer_base <-tibble(customer_id =1001:1500,first_touch_channel =sample(c("Google Search", "Meta Ads"), 500, replace =TRUE),# Assign a random acquisition monthcohort_month =sample(seq(as.Date("2025-01-01"), as.Date("2025-06-01"), by="month"), 500, replace =TRUE))# Generate 3,000 random transactions spanning 12 monthsraw_transactions <-tibble(customer_id =sample(1001:1500, 3000, replace =TRUE),order_date =sample(seq(as.Date("2025-01-01"), as.Date("2025-12-31"), by="day"), 3000, replace =TRUE),revenue =runif(3000, 30, 250)) # Filter out transactions that hypothetically occurred BEFORE the customer was acquiredraw_transactions <- raw_transactions %>%inner_join(customer_base, by ="customer_id") %>%filter(order_date >= cohort_month) %>%# Calculate Gross Margin (Assume a base product cost of 50%)mutate(gross_margin = revenue *0.50)head(customer_base)
# A tibble: 6 × 3
customer_id first_touch_channel cohort_month
<int> <chr> <date>
1 1001 Google Search 2025-03-01
2 1002 Google Search 2025-01-01
3 1003 Google Search 2025-02-01
4 1004 Google Search 2025-06-01
5 1005 Meta Ads 2025-01-01
6 1006 Meta Ads 2025-03-01
Code
head(raw_transactions)
# A tibble: 6 × 6
customer_id order_date revenue first_touch_channel cohort_month gross_margin
<int> <date> <dbl> <chr> <date> <dbl>
1 1178 2025-07-03 241. Meta Ads 2025-01-01 121.
2 1138 2025-10-13 30.7 Google Search 2025-06-01 15.4
3 1078 2025-03-06 161. Google Search 2025-02-01 80.4
4 1106 2025-03-28 165. Meta Ads 2025-03-01 82.6
5 1247 2025-11-09 44.3 Meta Ads 2025-02-01 22.2
6 1098 2025-11-06 187. Meta Ads 2025-05-01 93.5
500 unique customers with cohort month and first touch channel have been created, along 3,000 transaction over 12 months with gross_margin for each transaction
The analysis generated Cumulative Life Time Value (LTV) of each cohort based on cohort month and monthly gross margin contribution
Print & Plot Outcome
Code
#Format into a Corporate Executive Matrixltv_wide_matrix <- cohort_ltv_analysis %>%mutate(cohort_label =format(cohort_month, "%Y-%m"),cumulative_LTV =dollar(cumulative_LTV, accuracy =0.01) ) %>%select(cohort_label, cohort_size, month_index, cumulative_LTV) %>%pivot_wider(names_from = month_index, names_prefix ="Month_", values_from = cumulative_LTV, values_fill ="-")# Filter out any unformatted character strings and keep the raw numeric data for plotting.plot_ready_data <- cohort_ltv_analysis %>%mutate(cohort_label =format(cohort_month, "%Y-%m")) %>%filter(month_index <=6) # Let's clip it at Month 6 for a clean view# Generate the Executive LTV Trajectory Chartltv_chart <-ggplot(plot_ready_data, aes(x = month_index, y = cumulative_LTV, color = cohort_label, group = cohort_label)) +# Add clean, distinct lines and points for each cohortgeom_line(linewidth =1.2) +geom_point(size =2.5) +# Format axes for executive presentationscale_x_continuous(breaks =0:6, labels =paste0("Month ", 0:6)) +scale_y_continuous(labels =dollar_format(prefix ="$")) +# Apply a crisp, high-contrast themetheme_minimal(base_size =14) +theme(legend.position ="right",panel.grid.minor =element_blank(),plot.title =element_text(face ="bold", size =16),plot.subtitle =element_text(color ="gray40", size =12) ) +# Clear, professional labelinglabs(title ="Customer LTV Monetization Speed by Cohort",subtitle ="Cumulative gross margin generated per customer over time (Normalized View)",x ="Months Elapsed Since Onboarding",y ="Cumulative Lifetime Value (LTV)",color ="Acquisition Cohort" )# Display the chartprint(ltv_chart)
The Cohort LTV Matrix: A highly scannable, executive-ready grid that tracks cumulative LTV expansion from left to right, and isolates cohort-over-cohort performance when scanning vertically.
Granular Repeat-Purchase Tracking: Multi-transaction tracking logic that seamlessly accounts for irregular customer purchasing gaps (e.g., handling customers who purchase in Month 0, skip Month 1, and return in Month 2) without distorting cumulative cohort revenue.
Reproducible Framework: A clean, modular R script designed to serve as a baseline model. It can be easily upgraded to ingest real-world e-commerce data, introduce seasonal spikes, or layer on complex predictive churn curves.
While explicit marketing expenditure data (CAC) was outside the scope of the transaction ecosystem, the integration of first_touch_channel allows the business to establish strict allowable CAC thresholds based on target LTV multiples, and isolates channel-specific behavioral differences in retention and margin compounding