This is the last of a series of 4 dynamic documents written to demonstrate the value of customer analytics to the senior management of a diversified holding company with subsidiaries in key emerging markets. The documents aim to support the argument for significant investments in data and analytics capabilities as a means of driving revenue and profit growth through improved customer relationship management.
The key insights and arguments are presented in a separate powerpoint deck prepared for non-technical executives.
The papers implement advanced analytics methods to:
Identify managerially relevant subgroups in a customer database (segmentation).
Build a model that predicts customer churn.
Build a model that predicts the spending level of customers predicted to remain loyal.
4. Estimate the lifetime value of customer and hence the present value of the customer database.
This document focuses on the fourth task - estimating customer lifetime value.
There are numerous methods available extimating lifetime value. Many are simple to the point of having limited business value whilst some are highly complex and possibly do not justify the effort. The method used here does not fall into the fiendishly advanced category.
Neverthless, the approach is far from trivial and in my opinion produces offers significant business value.
The first document in this series - Customer Analytics I - Statistical Segmentation - discusses the choice of the ‘correct’ number of subgroups in a segmentation scheme.
The solution implemented in this document is a managerial segmentation solution commonly applied in practice. The document referenced above proposes a more statistically rigorous solution based on algorithmically identified natural clusters.
The segmentation scheme is applied to the latest version of the database (2015) and separately, retrospectively to the 2014 version of the database.
This split approach allows us to analyse the movement of customers between segments from 1 period to the next. Assuming transition behaviour is maintained into the future, we contruct a transition matrix that is used to model the evolution of the customer database going forward. This model can of course be updated to reflect changes in transition behaviour as a result of marketing actions for example.
We then compute historical revenue generated by individual customers and by segment. Next, we project forward 10 periods (years) by combining revenue with what we know about customers’ transitions between segments going forward. 10 years is the forecast horizon. The choice is based on judgement.
The estimated lifetime value is discounted to obtain the present value of the database.
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)library(ggplot2)
library(dplyr)
library(sqldf)customer_data <- read.delim(file = 'transactions.txt', header = FALSE, sep = '\t',
dec = '.')The dataset is a proprietary customer database consisting of 51,243 observations across 3 features:
The data covers the period January 2nd 2005 to December 31st 2015.
The dataset is tidy and ready for analysis.
## Name columns
customer_data <- rename(customer_data, customer_id = V1, purchase_amount = V2,
date_of_purchase = V3)
## Convert purchase date to R date object
customer_data$date_of_purchase <- as.Date(customer_data$date_of_purchase, "%Y-%m-%d")Compute additional variables (key marketing indicators):
recency - time since last purchase (days)
frequency - number of purchase transactions
year_of_purchase
avg_amount - average amount spent
## Create `recency` variable
customer_data$days_since <- as.numeric(difftime(time1 = "2016-01-01",
time2 = customer_data$date_of_purchase,
units = "days"))
customer_data$year_of_purchase <- as.numeric(format(customer_data$date_of_purchase,
"%Y"))
## Customer database as at end 2015 grouped by customer ID with remaining
## marketing indicators computed
customers_2015 <- customer_data %>%
group_by(customer_id) %>%
summarise(recency = min(days_since),
frequency = n(),
first_purchase = max(days_since),
avg_amount = mean(purchase_amount))Implement 4 segment solution of the current database
customers_2015$segment <- "NA"Customers who have not made a purchase in more than 3 years are labelled “inactive”
customers_2015$segment[which(customers_2015$recency > 365*3)] <- "inactive"Customers who have not made a purchase in more 2 years (but less than 3) are labelled “cold”
customers_2015$segment[which(customers_2015$recency <= 365*3 &
customers_2015$recency > 365*2)] <- "cold"Customers who have not made a purchase during the last year but did so the year before are labelled “warm”
customers_2015$segment[which(customers_2015$recency <= 365*2 &
customers_2015$recency > 365*1)] <- "warm"Customers who made a purchase within the last year are labelled “active”
customers_2015$segment[which(customers_2015$recency <= 365)] <- "active"Segment counts
ggplot(data = customers_2015, aes(x = segment)) +
geom_bar(fill = "steelblue") +
theme_classic() +
labs(x = "Segment", y = "",
title = "Segment Counts - 2015 Database")Barely 30% of the database is “active”, meaning they made a purchase during the last year.
Half of all customers in the database are “inactive” - they have made no purchases in more than 3 years.
prop.table(table(customers_2015$segment))##
## active cold inactive warm
## 0.2930988 0.1033284 0.4972580 0.1063148
Further subdivide the “active” and “warm” categories
“active”
customers_2015$segment[which(customers_2015$segment == "active" &
customers_2015$first_purchase <= 365)] <- "new active"
customers_2015$segment[which(customers_2015$segment == "active" &
customers_2015$avg_amount < 100)] = "active low value"
customers_2015$segment[which(customers_2015$segment == "active" &
customers_2015$avg_amount >= 100)] = "active high value"“warm”
customers_2015$segment[which(customers_2015$segment == "warm" &
customers_2015$first_purchase <= 365*2)] <- "new warm"
customers_2015$segment[which(customers_2015$segment == "warm" &
customers_2015$avg_amount < 100)] <- "warm low value"
customers_2015$segment[which(customers_2015$segment == "warm" &
customers_2015$avg_amount >= 100)] <- "warm high value"Re-order segment
customers_2015$segment <- factor(customers_2015$segment,
levels = c("inactive",
"cold",
"warm high value",
"warm low value",
"new warm",
"active high value",
"active low value",
"new active"))Expanded segments
ggplot(data = customers_2015, aes(x = segment)) +
geom_bar(fill = "steelblue") +
theme_classic() +
labs(x = "Segment", y = "",
title =
"Segment Counts (2015)\n'active' & 'warm' segments subdivided") +
coord_flip()Create customer database as it was 1 year before (at end 2014)
The object of this exercise is to compare the 2014 database to the 2015 version in order to develop a sense of how customers move between segments over time.
We shall construct a transition matrix that models this behaviour.
2014 database
customers_2014 <- customer_data %>%
filter(days_since > 365) %>%
group_by(customer_id) %>%
summarise(recency = min(days_since) -365,
frequency = n(),
first_purchase = max(days_since) - 365,
avg_amount = mean(purchase_amount),
revenue = sum(purchase_amount)) Implement 4 segment solution of the database
customers_2014$segment = "NA"Customers who have not made a purchase in more than 3 years are labelled “inactive”
customers_2014$segment[which(customers_2014$recency > 365*3)] <- "inactive"Customers who have not made a purchase in more 2 years (but less than 3) are labelled “cold”
customers_2014$segment[which(customers_2014$recency <= 365*3 &
customers_2014$recency > 365*2)] <- "cold"Customers who have not made a purchase during the last year but did so the year before are labelled “warm”
customers_2014$segment[which(customers_2014$recency <= 365*2 &
customers_2014$recency > 365*1)] <- "warm"Customers who made a purchase within the last year are labelled “active”
customers_2014$segment[which(customers_2014$recency <= 365)] <- "active"Further refine the “warm” and “active” categories
“warm”
customers_2014$segment[which(customers_2014$segment == "warm" &
customers_2014$first_purchase <= 365*2)] <- "new warm"
customers_2014$segment[which(customers_2014$segment == "warm" &
customers_2014$avg_amount < 100)] <- "warm low value"
customers_2014$segment[which(customers_2014$segment == "warm" &
customers_2014$avg_amount >= 100)] <- "warm high value"“active”
customers_2014$segment[which(customers_2014$segment == "active" &
customers_2014$first_purchase <= 365)] <- "new active"
customers_2014$segment[which(customers_2014$segment == "active" &
customers_2014$avg_amount < 100)] <- "active low value"
customers_2014$segment[which(customers_2014$segment == "active" &
customers_2014$avg_amount >= 100)] <- "active high value"Convert segment to factor and re-order
customers_2014$segment = factor(customers_2014$segment,
levels = c("inactive",
"cold",
"warm high value",
"warm low value",
"new warm",
"active high value",
"active low value",
"new active"))Plot 2014 segments
ggplot(data = customers_2014, aes(x = segment)) +
geom_bar(fill = "steelblue") +
theme_classic() +
labs(x = "Segment", y = "",
title =
"Segment Counts (2014)\n'active' & 'warm' segments subdivided") +
coord_flip()2014 segment counts look very similar to 2015
The likelihood of customers moving from a given segment to another.
transition_data <- merge(x = customers_2014, y = customers_2015,
by = "customer_id", all.x = TRUE)
transition <- table(transition_data$segment.x, transition_data$segment.y)
transition <- prop.table(transition,1)
print(round(transition,3))##
## inactive cold warm high value warm low value new warm
## inactive 0.962 0.000 0.000 0.000 0.000
## cold 0.897 0.000 0.000 0.000 0.000
## warm high value 0.000 0.676 0.000 0.000 0.000
## warm low value 0.000 0.721 0.000 0.000 0.000
## new warm 0.000 0.911 0.000 0.000 0.000
## active high value 0.000 0.000 0.251 0.000 0.000
## active low value 0.000 0.000 0.000 0.299 0.000
## new active 0.000 0.000 0.000 0.000 0.653
##
## active high value active low value new active
## inactive 0.005 0.033 0.000
## cold 0.010 0.093 0.000
## warm high value 0.315 0.009 0.000
## warm low value 0.001 0.278 0.000
## new warm 0.012 0.077 0.000
## active high value 0.745 0.004 0.000
## active low value 0.007 0.693 0.000
## new active 0.062 0.285 0.000
96% of customers inactive in 2014 remained so in 2015.
91% of customers classes as “new warm” in 2014 were allowed to go “cold” in 2015.
Of greatest concern, 65 of customers acquired in 2014 purchased nothing in 2015.
The previous 3 documents in this series propose machine learning solutions that could inform marketing efforts to staunch the high attrition rates observed in the transition matrix.
We shall use the transition matrix to predict customer transitions over the next 10 years.
Initialize a matrix with the number of customers in each segment today; project forward 10 additional periods
segment_counts <- matrix(nrow = 8, ncol = 11)
colnames(segment_counts) <- 2015:2025Populate the matrix
segment_counts[, 1] <- table(customers_2015$segment)
row.names(segment_counts) <- levels(customers_2015$segment)## Compute for 10 future periods
for (i in 2:11) {
segment_counts[, i] = segment_counts[, i-1] %*% transition
}The table shows customers’ evolution through the segments across 10 periods
print(round(segment_counts))## 2015 2016 2017 2018 2019 2020 2021 2022 2023
## inactive 9158 10517 11539 12636 12940 13186 13386 13542 13664
## cold 1903 1584 1711 874 821 782 740 709 684
## warm high value 119 144 165 160 156 152 149 146 143
## warm low value 901 991 1058 989 938 884 844 813 789
## new warm 938 987 0 0 0 0 0 0 0
## active high value 573 657 639 624 607 593 581 571 562
## active low value 3313 3537 3305 3134 2954 2820 2717 2637 2575
## new active 1512 0 0 0 0 0 0 0 0
## 2024 2025
## inactive 13759 13834
## cold 665 650
## warm high value 141 139
## warm low value 771 756
## new warm 0 0
## active high value 554 547
## active low value 2527 2490
## new active 0 0
Numbers of inactive customers are predicted to grow throughout the period although the trend slows down significantly over time.
Numbers of active high value customers fall throughout the period. A sharp decline is noticeable after 2017. The trend levels off beyond that point.
barplot(segment_counts[1, ], xlab = "Year", main = "Inactive Customers",
col = "steelblue")
barplot(segment_counts[2, ], xlab = "Year", main = "Active high value customers",
col = "steelblue")Compute revenue generated by each customer in 2015
revenue_2015 <- customer_data %>%
filter(year_of_purchase == 2015) %>%
group_by(customer_id) %>%
summarise(revenue_2015 = sum(purchase_amount))Add 2015 revenue to 2015 customer data
merged_2015 <- left_join(customers_2015, revenue_2015, by = "customer_id")
## Convert missing values to 0
merged_2015$revenue_2015[is.na(merged_2015$revenue_2015)] <- 0Average revenue by segment in 2015
revenue2015_segments <- merged_2015 %>%
group_by(segment) %>%
summarise("average_revenue" = mean(revenue_2015))
print(revenue2015_segments)## Source: local data frame [8 x 2]
##
## segment average_revenue
## (fctr) (dbl)
## 1 inactive 0.00000
## 2 cold 0.00000
## 3 warm high value 0.00000
## 4 warm low value 0.00000
## 5 new warm 0.00000
## 6 active high value 323.56894
## 7 active low value 52.30604
## 8 new active 79.16614
10 year projected annual revenue per segment.
revenue_segments <- revenue2015_segments$average_revenue * segment_counts
print(revenue_segments)## 2015 2016 2017 2018 2019 2020
## inactive 0.0 0.0 0.0 0.0 0.0 0.0
## cold 0.0 0.0 0.0 0.0 0.0 0.0
## warm high value 0.0 0.0 0.0 0.0 0.0 0.0
## warm low value 0.0 0.0 0.0 0.0 0.0 0.0
## new warm 0.0 0.0 0.0 0.0 0.0 0.0
## active high value 185405.0 212495.1 206634.1 202008.5 196555.0 191967.1
## active low value 173289.9 184985.5 172891.5 163912.8 154516.6 147520.4
## new active 119699.2 0.0 0.0 0.0 0.0 0.0
## 2021 2022 2023 2024 2025
## inactive 0.0 0.0 0.0 0.0 0.0
## cold 0.0 0.0 0.0 0.0 0.0
## warm high value 0.0 0.0 0.0 0.0 0.0
## warm low value 0.0 0.0 0.0 0.0 0.0
## new warm 0.0 0.0 0.0 0.0 0.0
## active high value 188022.1 184626.9 181716.8 179228.5 177105.5
## active low value 142116.9 137930.2 134692.5 132189.2 130254.5
## new active 0.0 0.0 0.0 0.0 0.0
10 year projected annual revenue.
revenue_projected <- colSums(revenue_segments)
print(round(revenue_projected))## 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
## 478394 397481 379526 365921 351072 339488 330139 322557 316409 311418
## 2025
## 307360
Revenues are projected to fall over time. Unsurprising, as active customer numbers are predicted to decline over the same period.
barplot(revenue_projected, xlab = "Year", ylab = "Revenue",
main = "Revenue Predictions", col = "steelblue")Cummulated revenue
revenue_cumulated <- cumsum(revenue_projected)
print(round(revenue_cumulated))## 2015 2016 2017 2018 2019 2020 2021 2022 2023
## 478394 875875 1255400 1621322 1972393 2311881 2642020 2964577 3280986
## 2024 2025
## 3592404 3899764
Create a discount factor
In order obtain the present value of a stream of future cashflows, we must discount by an appropriate discount factor to account for the time value of money and other factors. This is a finance concept best discussed elsewhere.
The 10% discount rate is subjective.
discount_rate <- 0.10
discount = 1 / ((1 + discount_rate) ^ ((1:11) - 1))
print(round(discount,2))## [1] 1.00 0.91 0.83 0.75 0.68 0.62 0.56 0.51 0.47 0.42 0.39
Compute discounted annual revenue
disc_annual_revenue <- revenue_projected * discount
print(round(disc_annual_revenue))## 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
## 478394 361346 313658 274922 239787 210795 186355 165523 147607 132071
## 2025
## 118501
Plot discounted revenue
barplot(revenue_projected, xlab = "Year", ylab = "Revenue",
main = "Predicted Revenue", col = "steelblue")
barplot(disc_annual_revenue, xlab = "Year", ylab = "Revenue",
main = "Predicted Revenue\n (Discounted)", col = "steelblue")Compute discounted cumulative revenue
disc_cumulative_revenue <- cumsum(disc_annual_revenue)
print(round(disc_cumulative_revenue))## 2015 2016 2017 2018 2019 2020 2021 2022 2023
## 478394 839740 1153398 1428320 1668106 1878901 2065256 2230779 2378386
## 2024 2025
## 2510458 2628958
barplot(disc_cumulative_revenue, xlab = "Year", ylab = "$",
main = "Discounted Cumulative Revenue", col = "steelblue")Present value of the customer database
print(disc_cumulative_revenue[11] - revenue_projected[1])## 2025
## 2150564
2,150,564
The code can easily be modified to compute the lifetime value of individual customers.
Note: We computed the lifetime value of all customers currently in the database. No account is taken of future customer acquisitions.