Exploring the relationships between Repurchase Rate and Recency, Frequency, and Monetary
First, we count the customers at each Recency value, split each group into “Buy” and “No Buy” according to whether they purchased during the next purchasing cycle, and compute the percentage of customers at each Recency value who repurchase in the next period. We use the R function “ddply” (from the plyr package) to do the grouping and calculation. Below are the pairs of percentage and Recency value we calculated. Note that the smaller the Recency value, the more recent the last purchase.
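The grouping described above can be sketched on toy data. This is a minimal illustration, not the helper used in this article: the function name buy_share_by and the toy values are made up, and base R's aggregate stands in for ddply so the snippet runs without extra packages.

```r
# Toy data (made up): one row per customer, Recency already binned,
# Buy = 1 if the customer purchased again in the next period.
toy <- data.frame(
  Recency = c(0, 0, 0, 1, 1, 2, 2, 2, 2),
  Buy     = c(1, 1, 0, 1, 0, 0, 0, 1, 0)
)

# Hypothetical helper: share of buyers within each value of `col`.
# The mean of a 0/1 column is the fraction of rows equal to 1.
buy_share_by <- function(df, col) {
  out <- aggregate(df$Buy, by = df[col], FUN = mean)
  names(out)[names(out) == "x"] <- "Percentage"
  out
}

p_toy <- buy_share_by(toy, "Recency")
print(p_toy)
```

Each row of the result pairs one Recency value with the fraction of its customers who bought again.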
Next, we will remove the records that fall outside the range between the start date and the end date.
str(df)
'data.frame': 69659 obs. of 4 variables:
$ V1: int 1 2 2 3 3 3 3 3 3 4 ...
$ V2: int 19970101 19970112 19970112 19970102 19970330 19970402 19971115 19971125 19980528 19970101 ...
$ V3: int 1 1 5 2 2 2 5 4 1 2 ...
$ V4: num 11.8 12 77 20.8 20.8 ...
Let’s construct the data frame, rename its columns, and verify the changes.
str(df)
'data.frame': 69659 obs. of 3 variables:
$ ID : num 1 2 2 3 3 3 3 3 3 4 ...
$ Date : num 2e+07 2e+07 2e+07 2e+07 2e+07 ...
$ Amount: num 11.8 12 77 20.8 20.8 ...
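The code for the construction and renaming step is not shown in this section. A minimal sketch, assuming columns V1, V2, and V4 correspond to ID, Date, and Amount (as the values in the two str() outputs suggest) and that V3 is dropped, might look like this. The names raw and df_demo are illustrative, and unlike the output above this version keeps ID and Date as integers.

```r
# Stand-in for the imported data: first three rows as displayed by str()
raw <- data.frame(
  V1 = c(1L, 2L, 2L),
  V2 = c(19970101L, 19970112L, 19970112L),
  V3 = c(1L, 1L, 5L),
  V4 = c(11.8, 12, 77)
)

# Keep the customer, date, and amount columns and give them readable names
df_demo <- raw[, c("V1", "V2", "V4")]
names(df_demo) <- c("ID", "Date", "Amount")

str(df_demo)
```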
Next, we’ll go ahead and convert the Date column to the Date variable type:
str(df)
'data.frame': 69659 obs. of 3 variables:
$ ID : num 1 2 2 3 3 3 3 3 3 4 ...
$ Date : Date, format: "1997-01-01" "1997-01-12" "1997-01-12" "1997-01-02" ...
$ Amount: num 11.8 12 77 20.8 20.8 ...
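The conversion itself is not shown here. One way to do it, assuming the dates are stored as numbers in yyyymmdd form as in the earlier output, is to go through a character string and parse it with as.Date. The vector names below are illustrative.

```r
# Dates stored as plain numbers in yyyymmdd form
dates_num <- c(19970101, 19970112, 19970330)

# Convert to character first, then parse with an explicit format
dates <- as.Date(as.character(dates_num), format = "%Y%m%d")

print(dates)
class(dates)
```

Applied to the data frame, the same idea would read df$Date <- as.Date(as.character(df$Date), "%Y%m%d").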
# set the "forecast" transaction time scope, which is one two-month purchasing cycle
startDate_forcast <- as.Date("19980301","%Y%m%d")
endDate_forcast <- as.Date("19980430","%Y%m%d")
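# get the rolled-up R,F,M data frames
# (startDate_history and endDate_history must already be defined; they are not shown in this section)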
history <- getDataFrame(df,startDate_history,endDate_history)
forcast <- getDataFrame(df,startDate_forcast,endDate_forcast)
# set the purchasing cycle time as 60 days, and discretize the Recency into cycles
history$Recency<- history$Recency %/% 60
# discretize the Monetary into $10 intervals
breaks<-seq(0,round(max(history$Monetary)+9),by=10)
history$Monetary<-as.numeric(cut(history$Monetary,breaks,labels=FALSE))
# add a Buy/No Buy column to the RFM data frame (0 = did not repurchase)
history$Buy<-0
# flag the customers who repurchased in the forecast period 19980301 - 19980430
history$Buy[history$ID %in% forcast$ID]<-1
train<-history
head(train)
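getDataFrame is not defined in this section. The sketch below shows what such an RFM roll-up could look like, under assumptions that are not confirmed by the source: Recency is the number of days from a customer's last purchase to the end date, Frequency is the number of transactions in the window, and Monetary is the mean amount per transaction. The function name rfm_rollup_sketch and the toy transactions are made up.

```r
# Hypothetical roll-up of a transaction log into one R/F/M row per customer
rfm_rollup_sketch <- function(df, start_date, end_date) {
  # keep only transactions inside the window
  in_range <- df[df$Date >= start_date & df$Date <= end_date, ]
  # days between each transaction and the end of the window
  days_ago <- as.numeric(end_date - in_range$Date)
  recency   <- tapply(days_ago, in_range$ID, min)            # most recent purchase
  frequency <- tapply(in_range$Amount, in_range$ID, length)  # number of purchases
  monetary  <- tapply(in_range$Amount, in_range$ID, mean)    # assumed: mean amount
  data.frame(
    ID        = as.numeric(names(recency)),
    Recency   = as.numeric(recency),
    Frequency = as.numeric(frequency),
    Monetary  = as.numeric(monetary)
  )
}

# Toy transactions (made up)
tx <- data.frame(
  ID     = c(1, 1, 2, 3, 3, 3),
  Date   = as.Date(c("1997-01-01", "1997-02-10", "1997-01-15",
                     "1997-01-05", "1997-02-20", "1997-03-15")),
  Amount = c(10, 30, 25, 5, 15, 100)
)
rfm <- rfm_rollup_sketch(tx, as.Date("1997-01-01"), as.Date("1997-02-28"))
print(rfm)
```

The last transaction of customer 3 falls after the end date, so it is ignored in all three measures.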
# get "Buy" percentages based on the variable Recency
colNames<-c("Recency")
p<-getPercentages(train,colNames)
# get the Buy ~ Recency model
r.glm=glm(Percentage~Recency,family=quasibinomial(link='logit'),data=p)
p_r<-p
# get "Buy" percentages based on the variable Frequency
colNames<-c("Frequency")
p<-getPercentages(train,colNames)
# get the Buy ~ Frequency model
f.glm=glm(Percentage~Frequency,family=quasibinomial(link='logit'),data=p)
p_f<-p
# get "Buy" percentages based on the variable Monetary
colNames<-c("Monetary")
p<-getPercentages(train,colNames)
# get the Buy ~ Monetary model
m.glm=glm(Percentage~Monetary,family=quasibinomial(link='logit'),data=p)
p_m<-p
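To see what one of these fits does in isolation, here is a small sketch on made-up percentages. The data frame p_demo and its values are illustrative only. The quasibinomial family accepts proportions between 0 and 1 as the response, and predict with type='response' returns fitted probabilities.

```r
# Made-up repurchase shares by Recency bin (illustration only)
p_demo <- data.frame(
  Recency    = 0:5,
  Percentage = c(0.45, 0.30, 0.20, 0.12, 0.08, 0.05)
)

# Logistic-shaped fit of the repurchase share against Recency
fit_demo <- glm(Percentage ~ Recency,
                family = quasibinomial(link = "logit"),
                data = p_demo)

# Predicted repurchase probability for a few Recency bins
new_bins  <- data.frame(Recency = c(0, 3, 6))
pred_demo <- predict(fit_demo, new_bins, type = "response")
print(round(pred_demo, 3))
```

Because the made-up shares fall as Recency grows, the fitted Recency coefficient is negative and the predicted probabilities decrease across the bins.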
#plot and draw fit curves of Percentage ~ r,f,m
par(mfrow=c(1,3),oma=c(0,0,2,0))
plot(p_r$Recency,p_r$Percentage*100,xlab="Recency",ylab="Probability of Purchasing (%)")
lines(lowess(p_r$Recency,p_r$Percentage*100),col="blue",lty=2)

plot(p_f$Frequency,p_f$Percentage*100,xlab="Frequency",ylab="Probability of Purchasing (%)")
lines(lowess(p_f$Frequency,p_f$Percentage*100),col="blue",lty=2)

plot(p_m$Monetary,p_m$Percentage*100,xlab="Monetary",ylab="Probability of Purchasing (%)")
lines(lowess(p_m$Monetary,p_m$Percentage*100),col="blue",lty=2)
# outer=TRUE draws the title in the outer margin reserved by oma above
title("Percentages ~ Recency, Frequency, Monetary",outer=TRUE)

par(mfrow=c(1,1))
model<-glm(Buy~Recency+Frequency,family=quasibinomial(link='logit'),data=train)
pred<-predict(model,data.frame(Recency=c(0),Frequency=c(1)),type='response')
## calculating the CLV for a customer with R=0, F=1, average profit=100, and discount rate=0.02 over 3 periods
v<-getCLV(0,1,100,1,0,3,0.02,model)
v
[1] 63.91906
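getCLV is not defined in this section, so the following is only a simplified sketch of the idea, not the function used above, and it is not expected to reproduce the figure shown. It assumes that in each period the customer buys with probability p_buy(r, f); a purchase earns the average profit and resets Recency to 0 while increasing Frequency by 1; no purchase increases Recency by 1; and future periods are discounted by 1/(1 + dr). The name clv_sketch, its argument list, and the constant-probability function are all hypothetical.

```r
# Hypothetical, simplified expected-CLV recursion over a fixed number of periods
clv_sketch <- function(r, f, profit, periods, dr, p_buy) {
  if (periods == 0) return(0)
  p <- p_buy(r, f)  # probability of buying in this period
  # value of the remaining periods in each of the two possible next states
  if_buy    <- clv_sketch(0, f + 1, profit, periods - 1, dr, p_buy)
  if_no_buy <- clv_sketch(r + 1, f, profit, periods - 1, dr, p_buy)
  # expected profit now, plus the discounted expected future value
  p * profit + (p * if_buy + (1 - p) * if_no_buy) / (1 + dr)
}

# Illustration with a constant purchase probability of 0.5 (made up)
flat_half <- function(r, f) 0.5
v_demo <- clv_sketch(0, 1, 100, 3, 0.02, flat_half)
print(v_demo)
```

With a fitted model, p_buy could wrap a call such as predict(model, data.frame(Recency=r, Frequency=f), type='response').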
