Purpose: RFM and CLV analysis of customer:

Goal 1: Segment the customer based on their purchase activity;

Goal 2: Identify which type of customer should be promoted to make profit;

Goal 3: Calculate the customer life time value

Step 1: Load data and check the data structure

setwd("C:\\Users\\Yang\\Desktop\\Marketing Engineering\\CLV\\CDNOW_master")
CD_df <- read.table ("CDNOW_master.txt", head=F)
names(CD_df) <- c("ID", "Date", "Num", "Amount")
CD_df[,2] <- as.Date (as.character(CD_df[,2]), "%Y%m%d")
str(CD_df)
## 'data.frame':    69659 obs. of  4 variables:
##  $ ID    : int  1 2 2 3 3 3 3 3 3 4 ...
##  $ Date  : Date, format: "1997-01-01" "1997-01-12" ...
##  $ Num   : int  1 1 5 2 2 2 5 4 1 2 ...
##  $ Amount: num  11.8 12 77 20.8 20.8 ...
UniID<- CD_df[!duplicated(CD_df$ID),]
nrow(UniID)
## [1] 23570

Conclution: The data has 4 variables, ID(customer ID), Date(the date of the transaction), Num (the number of CDs purchased), and Amount (the dolloar value of the transaction). There are 69659 records total and 23570 unique customers.

Step 2: R,F,and M analysis.

R(Recency): How recently did the customer purchase?

F(Frequency): How often do they purchase?

M(Monetary value): How much do they spend each time on average?

startDate <- as.Date("19970101","%Y%m%d")
endDate <- as.Date("19980101","%Y%m%d")

df1 <- CD_df[order(CD_df$Date,decreasing = TRUE),]
df2 <- df1[df1$Date >= startDate & df1$Date <= endDate,]
df3 <- df2[!duplicated(df2$ID),]
Recency <- as.numeric(difftime(endDate,df3$Date,units="days"))
df3$Recency <- NULL
df3$Recency <- Recency
df3 <- df3[order(df3$ID), ]

df2$ID <- as.factor(df2$ID)
fre <- as.data.frame(table(df2$ID))
df3$Frequency <- NULL
df3$Frequency <- fre[,2]

M_df <- aggregate(df2$Amount, list(df2$ID), sum)
df3$Monetary <- NULL
df3$Monetary <- M_df$x
df3$Monetary <- df3$Monetary/df3$Frequency
df3[1:10,]
##    ID       Date Num Amount Recency Frequency Monetary
## 1   1 1997-01-01   1  11.77     365         1 11.77000
## 2   2 1997-01-12   1  12.00     354         2 44.50000
## 8   3 1997-11-25   4  20.96      37         5 27.89400
## 13  4 1997-12-12   2  26.48      20         4 25.12500
## 23  5 1997-12-12   3  40.47      20        10 34.81400
## 25  6 1997-01-01   1  20.99     365         1 20.99000
## 27  7 1997-10-11   7  97.43      82         2 63.08500
## 35  8 1997-12-25   1  13.99       7         7 24.74286
## 38  9 1997-05-13   2  30.33     233         2 26.93500
## 40 10 1997-01-21   3  39.31     345         1 39.31000

Conclusion: R is calculated by the different days of purchase date and enddate(1998-01-01). For other data which purchase date are recent, we may use the current date instead of enddate. F is caulated by the frequency each customer individual purchase during that period. M is calculated by total money spent during that period divided by frequency.

Reference: Data Apple (http://www.dataapple.net/?p=84)