Customer Lifetime Value (CLV)

Customer Lifetime Value is “the present value of the future cash flows attributed to the customer during his/her entire relationship with the company.” Formulas for calculating CLV range from simplified to advanced; the most commonly used one depends on a customer retention rate \({r}\).

Here we assume that \({r}\) is constant in the formula; however, this is not always the case. Factors that influence \({r}\) include demographics (age, geography, profession, etc.), behavior (Recency, Frequency, Monetary, etc.), tenure, and competition. Some improved formulas forecast \({r}\) with approaches such as Logistic Regression.
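
For reference, one widely used simplified form with a constant retention rate can be written as below. This is a sketch and not necessarily the exact formula this notebook relies on: the symbols \({m}\) (margin per customer per period), \({d}\) (discount rate per period), and \({n}\) (number of periods) are assumptions introduced here, while \({r}\) is the retention rate discussed above.

```latex
\[
\mathrm{CLV} = \sum_{t=0}^{n} m \, \frac{r^{t}}{(1+d)^{t}}
\]
```

Each term is the margin expected in period \({t}\): the customer is still active with probability \({r^{t}}\), and the cash flow is discounted by \({(1+d)^{t}}\).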

We will demonstrate how to use R to calculate a customer’s CLV by predicting the retention/repurchase rate \({r}\) of customers in each future purchasing cycle, using a Logistic Regression model whose predictors are Recency, Frequency, and Monetary.

Data Set

We will use the full CDNOW example data set as a concrete case study for building the above model.

The sample data covers 23,570 distinct customers who made their first purchase at CDNOW in the first quarter of 1997. There are 69,659 transaction records in total, spanning the start of January 1997 to the end of June 1998.

Exploring the relationships between Repurchase Rate and Recency, Frequency, and Monetary

First, we count the customers at each Recency value, split each group into “Buy” and “No Buy” according to whether they purchased in the next purchasing cycle, and compute the percentage of customers at each Recency value who repurchase in the next period. We use the R function “ddply” (from the plyr package) to do the grouping and calculation. The result is a list of (Recency, percentage) pairs. Note that the smaller the Recency value, the more recent the last purchase.
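
The grouping step described above can be sketched as follows. This is a minimal illustration, not the notebook's own helper: `get_buy_share` is a hypothetical name, base R `tapply` stands in for `ddply` so the example has no package dependency, and the toy data frame is made up.

```r
# Hypothetical helper: for each level of one column, count the customers
# and compute the share with Buy == 1. Assumes `df` has a 0/1 column Buy.
get_buy_share <- function(df, col) {
  number     <- tapply(df$Buy, df[[col]], length)  # customers per level
  percentage <- tapply(df$Buy, df[[col]], mean)    # share who bought
  out <- data.frame(level      = as.numeric(names(number)),
                    Number     = as.vector(number),
                    Percentage = round(as.vector(percentage), 2))
  names(out)[1] <- col
  out
}

# Toy data: three customers at Recency 0, two at 1, one at 2
toy <- data.frame(Recency = c(0, 0, 0, 1, 1, 2),
                  Buy     = c(1, 1, 0, 1, 0, 0))
get_buy_share(toy, "Recency")
```

Unlike a version that keeps only the “Buy” rows, this sketch also returns levels where nobody repurchased, with a percentage of 0.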

Next, we will remove the records dated before the start date or after the end date. We begin by inspecting the raw data:

str(df)
'data.frame':   69659 obs. of  4 variables:
 $ V1: int  1 2 2 3 3 3 3 3 3 4 ...
 $ V2: int  19970101 19970112 19970112 19970102 19970330 19970402 19971115 19971125 19980528 19970101 ...
 $ V3: int  1 1 5 2 2 2 5 4 1 2 ...
 $ V4: num  11.8 12 77 20.8 20.8 ...

Let’s construct the data frame, rename the columns, and verify the changes.

str(df)
'data.frame':   69659 obs. of  3 variables:
 $ ID    : num  1 2 2 3 3 3 3 3 3 4 ...
 $ Date  : num  2e+07 2e+07 2e+07 2e+07 2e+07 ...
 $ Amount: num  11.8 12 77 20.8 20.8 ...
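
The construction step is not shown above, so here is one way it might look. This is a sketch under assumptions: the toy `raw` data frame stands in for the loaded file, and the reading of V1 as customer ID, V2 as date, and V4 as amount (with V3 dropped) is inferred from the two str() outputs rather than stated by the source.

```r
# Toy stand-in for the raw file (values are illustrative only)
raw <- data.frame(V1 = c(1L, 2L, 2L),
                  V2 = c(19970101L, 19970112L, 19970112L),
                  V3 = c(1L, 1L, 5L),
                  V4 = c(11.8, 12.0, 77.0))

# Keep columns 1, 2 and 4, store them as numeric, and name them
df <- data.frame(ID     = as.numeric(raw$V1),
                 Date   = as.numeric(raw$V2),
                 Amount = raw$V4)
str(df)
```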

Next, we’ll go ahead and convert the Date column to the Date type:

str(df)
'data.frame':   69659 obs. of  3 variables:
 $ ID    : num  1 2 2 3 3 3 3 3 3 4 ...
 $ Date  : Date, format: "1997-01-01" "1997-01-12" "1997-01-12" "1997-01-02" ...
 $ Amount: num  11.8 12 77 20.8 20.8 ...
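
The conversion above can be sketched as follows, assuming the dates are stored as yyyymmdd numbers as the earlier str() output suggests. The vector `dates_num` is a toy example, not the notebook's data.

```r
# Toy yyyymmdd values, stored as numbers
dates_num <- c(19970101, 19970112, 19970330)

# Go through character first: as.Date() parses strings with a format,
# whereas a bare number would be read as a day count from an origin
dates <- as.Date(as.character(dates_num), format = "%Y%m%d")
class(dates)
format(dates)
```

Applied to the data frame, the same idea would read `df$Date <- as.Date(as.character(df$Date), "%Y%m%d")`.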
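# set the "history" transaction time scope
# assumed values (not shown in the original): start of the data through the day before the forecast window
startDate_history <- as.Date("19970101","%Y%m%d")
endDate_history <- as.Date("19980228","%Y%m%d")
# set the "forecast" transaction time scope, which is one bi-monthly purchasing cycle
startDate_forcast <- as.Date("19980301","%Y%m%d")
endDate_forcast <- as.Date("19980430","%Y%m%d")
# get the rolled-up R, F, M data frames
history <- getDataFrame(df,startDate_history,endDate_history)
forcast <- getDataFrame(df,startDate_forcast,endDate_forcast)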
# set the purchasing cycle to 60 days and discretize Recency into whole cycles
history$Recency <- history$Recency %/% 60
# discretize Monetary into $10 intervals
breaks <- seq(0,round(max(history$Monetary)+9),by=10)
history$Monetary <- as.numeric(cut(history$Monetary,breaks,labels=FALSE))
# add a Buy/No Buy column to the RFM data frame (0 = No Buy)
history$Buy <- 0
# flag the customers who repurchased in the forecast period 19980301 - 19980430
history$Buy[history$ID %in% forcast$ID] <- 1
train <- history
head(train)
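
`getDataFrame` is called above but not defined in this section. The sketch below shows one way such an RFM roll-up might work. Everything in it is an assumption for illustration: the name `rfm_rollup`, the definition of Recency as days between the last purchase and the end of the window, and Monetary as the average amount per purchase.

```r
# Hypothetical RFM roll-up; the notebook's getDataFrame may differ.
# Expects columns ID, Date (class Date) and Amount.
rfm_rollup <- function(df, start_date, end_date) {
  # keep only transactions inside the window
  df <- df[df$Date >= start_date & df$Date <= end_date, ]
  days_ago  <- as.numeric(end_date - df$Date)    # days before window end
  recency   <- tapply(days_ago, df$ID, min)      # most recent purchase
  frequency <- tapply(df$Amount, df$ID, length)  # number of purchases
  monetary  <- tapply(df$Amount, df$ID, mean)    # average spend per purchase
  data.frame(ID        = as.numeric(names(recency)),
             Recency   = as.vector(recency),
             Frequency = as.vector(frequency),
             Monetary  = as.vector(monetary))
}

# Toy transactions (illustrative values only)
toy <- data.frame(
  ID     = c(1, 2, 2, 3),
  Date   = as.Date(c("1997-01-01", "1997-01-12", "1997-03-01", "1998-03-15")),
  Amount = c(10, 20, 40, 5)
)
rfm <- rfm_rollup(toy, as.Date("1997-01-01"), as.Date("1997-03-31"))
rfm$Recency <- rfm$Recency %/% 60  # whole 60-day cycles, as in the code above
rfm
```

Customer 3 falls outside the window and is dropped; customer 1 last bought 89 days before the window end, which integer division by 60 maps to cycle 1.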
# get "Buy" percentages based on the variable Recency
colNames <- c("Recency")
p <- getPercentages(train,colNames)
# fit the Percentage ~ Recency model
r.glm <- glm(Percentage~Recency,family=quasibinomial(link='logit'),data=p)
p_r <- p
# get "Buy" percentages based on the variable Frequency
colNames <- c("Frequency")
p <- getPercentages(train,colNames)
# fit the Percentage ~ Frequency model
f.glm <- glm(Percentage~Frequency,family=quasibinomial(link='logit'),data=p)
p_f <- p
# get "Buy" percentages based on the variable Monetary
colNames <- c("Monetary")
p <- getPercentages(train,colNames)
# fit the Percentage ~ Monetary model
m.glm <- glm(Percentage~Monetary,family=quasibinomial(link='logit'),data=p)
p_m <- p
# plot Percentage against R, F, M and overlay lowess smoothing curves
par(mfrow=c(1,3),oma=c(0,0,2,0))
plot(p_r$Recency,p_r$Percentage*100,xlab="Recency",ylab="Probability of Purchasing (%)")
lines(lowess(p_r$Recency,p_r$Percentage*100),col="blue",lty=2)

plot(p_f$Frequency,p_f$Percentage*100,xlab="Frequency",ylab="Probability of Purchasing (%)")
lines(lowess(p_f$Frequency,p_f$Percentage*100),col="blue",lty=2)

plot(p_m$Monetary,p_m$Percentage*100,xlab="Monetary",ylab="Probability of Purchasing (%)")
lines(lowess(p_m$Monetary,p_m$Percentage*100),col="blue",lty=2)
# shared title in the outer margin reserved by oma above
title("Percentages ~ Recency, Frequency, Monetary",outer=TRUE)

par(mfrow=c(1,1))
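# fit the Buy ~ Recency + Frequency model on the customer-level training data
model <- glm(Buy~Recency+Frequency,family=quasibinomial(link='logit'),data=train)
# predicted repurchase probability for a customer with Recency=0, Frequency=1
pred <- predict(model,data.frame(Recency=c(0),Frequency=c(1)),type='response')
## calculating the CLV for a customer with R=0,F=1,average profit=100,discount rate=0.02 for 3 periods
v <- getCLV(0,1,100,1,0,3,0.02,model)
v
[1] 63.91906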
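
`getCLV` is not defined in this section. The sketch below illustrates one plausible way to turn per-period repurchase probabilities into an expected discounted value. It is not the notebook's function and is not expected to reproduce the number above. The name `clv_sketch`, its argument list, the rule that a purchase resets Recency to 0 and increments Frequency, and the toy probability function `p_toy` are all assumptions made for illustration.

```r
# Hypothetical expected-value recursion over purchasing cycles.
# p_buy(recency, frequency) returns the repurchase probability for one cycle.
# weight is the probability of reaching the current state.
clv_sketch <- function(recency, frequency, profit, periods, discount, p_buy,
                       period = 1, weight = 1) {
  if (period > periods) return(0)
  p <- p_buy(recency, frequency)
  # expected discounted profit earned in this period
  value_now <- weight * p * profit / (1 + discount)^period
  value_now +
    # branch 1: customer buys, so Recency resets and Frequency grows
    clv_sketch(0, frequency + 1, profit, periods, discount, p_buy,
               period + 1, weight * p) +
    # branch 2: customer does not buy, so Recency grows by one cycle
    clv_sketch(recency + 1, frequency, profit, periods, discount, p_buy,
               period + 1, weight * (1 - p))
}

# Toy logistic probability with made-up coefficients (not fitted to CDNOW)
p_toy <- function(recency, frequency) plogis(-1 - 0.5 * recency + 0.3 * frequency)
clv_sketch(0, 1, 100, 3, 0.02, p_toy)
```

To use a fitted model instead, `p_buy` could wrap `predict(model, data.frame(Recency = recency, Frequency = frequency), type = 'response')`. With a constant probability the recursion reduces to a plain discounted sum, which is a convenient sanity check.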
