Goal 1: Segment the customer based on their purchase activity;
Goal 2: Identify which type of customer should be promoted to make profit;
Goal 3: Calculate the customer life time value
setwd("C:\\Users\\Yang\\Desktop\\Marketing Engineering\\CLV\\CDNOW_master")
CD_df <- read.table ("CDNOW_master.txt", head=F)
names(CD_df) <- c("ID", "Date", "Num", "Amount")
CD_df[,2] <- as.Date (as.character(CD_df[,2]), "%Y%m%d")
str(CD_df)
## 'data.frame': 69659 obs. of 4 variables:
## $ ID : int 1 2 2 3 3 3 3 3 3 4 ...
## $ Date : Date, format: "1997-01-01" "1997-01-12" ...
## $ Num : int 1 1 5 2 2 2 5 4 1 2 ...
## $ Amount: num 11.8 12 77 20.8 20.8 ...
UniID<- CD_df[!duplicated(CD_df$ID),]
nrow(UniID)
## [1] 23570
Conclution: The data has 4 variables, ID(customer ID), Date(the date of the transaction), Num (the number of CDs purchased), and Amount (the dolloar value of the transaction). There are 69659 records total and 23570 unique customers.
R(Recency): How recently did the customer purchase?
F(Frequency): How often do they purchase?
M(Monetary value): How much do they spend each time on average?
startDate <- as.Date("19970101","%Y%m%d")
endDate <- as.Date("19980101","%Y%m%d")
df1 <- CD_df[order(CD_df$Date,decreasing = TRUE),]
df2 <- df1[df1$Date >= startDate & df1$Date <= endDate,]
df3 <- df2[!duplicated(df2$ID),]
Recency <- as.numeric(difftime(endDate,df3$Date,units="days"))
df3$Recency <- NULL
df3$Recency <- Recency
df3 <- df3[order(df3$ID), ]
df2$ID <- as.factor(df2$ID)
fre <- as.data.frame(table(df2$ID))
df3$Frequency <- NULL
df3$Frequency <- fre[,2]
M_df <- aggregate(df2$Amount, list(df2$ID), sum)
df3$Monetary <- NULL
df3$Monetary <- M_df$x
df3$Monetary <- df3$Monetary/df3$Frequency
df3[1:10,]
## ID Date Num Amount Recency Frequency Monetary
## 1 1 1997-01-01 1 11.77 365 1 11.77000
## 2 2 1997-01-12 1 12.00 354 2 44.50000
## 8 3 1997-11-25 4 20.96 37 5 27.89400
## 13 4 1997-12-12 2 26.48 20 4 25.12500
## 23 5 1997-12-12 3 40.47 20 10 34.81400
## 25 6 1997-01-01 1 20.99 365 1 20.99000
## 27 7 1997-10-11 7 97.43 82 2 63.08500
## 35 8 1997-12-25 1 13.99 7 7 24.74286
## 38 9 1997-05-13 2 30.33 233 2 26.93500
## 40 10 1997-01-21 3 39.31 345 1 39.31000
Conclusion: R is calculated by the different days of purchase date and enddate(1998-01-01). For other data which purchase date are recent, we may use the current date instead of enddate. F is caulated by the frequency each customer individual purchase during that period. M is calculated by total money spent during that period divided by frequency.
Reference: Data Apple (http://www.dataapple.net/?p=84)