INTRODUCTION
80% of your business comes from 20% of your customers.
As Digital Marketers, when we think of segmentation we often think of demographics, geolocation, or how far along a user went down the sales funnel; but what happens once we acquire the customer?
If a customer is loyal, does it make a difference to continue focussing our marketing dollars on them or should we focus on customers that haven’t shopped on your website in a while. Do you know how much these customers usually spend? Would an email with a special promotion help bring those customers back into the mix?
WHAT IS RFM?
To help answer these questions, many digital marketers often use a RFM Analysis to categorize customers into different segments based on their purchase history. RFM Segmentation or Analysis is a powerful tool used to target groups of customers with relevant communication based on their purchase behavior. RFM stands for Recency, Frequency, and Monetary and is used to categorize customers based on these three quantitative factors:
- Recency: How recently a customer made a purchase
- Frequency: How often a customer makes a purchase
- Monetary Value: How much money a customer has spent on purchases
RFM Analysis assigns a rank to every customer in each of these categories. Usually the higher the ranking, the better the customer. The best customer would have a top ranking in each category
Understanding Recency, Frequency, and Monetary Value
RFM is thought to have originated in direct mail in an article by Jan Roelf and Tom Wansbeek, “Optimal Selection for Direct Mail”, in a 1995 issue of Marketing Science.
Lets go over the concepts of Recency, Frequency, and Monetary value and how they can affect your marketing strategy.
Recency
A customer that has recently shopped at a website is more likely to shop again than a customer who hasn’t made a purchase in quite some time.
This data can be used to target users that have not shopped in a while by offering them a special promotion with an email encouraging these users to give the company another shot. If a user has recently shopped, we can offer them a retargeting ad with an item that might complement whatever they recently purchased.
Frequency
How often a customer buys something on the website can be affected by many factors like price point, the need for replacement, and obviously, the type of product. Customers are more likely to have to buy eggs more often that they would buy a laptop. We can use the various digital strategies to direct users to replace their product based on a predicted purchase cycle.
Monetary
Monetary value has to do with how much a customer has spent during a particular length of time. Just like in Vegas, “high rollers” usually get special treatment as the ROI for customers with a high monetary value tend to be greater.
We can’t disregard customers with low monetary value as they may be the best cheerleaders as loyal customers.
Why is RFM Important?
RFM Segmentation allows us to categorize a customer base and allows us to try different levers to help push our customers into a higher category. RFM allows us to better understand our customers. It helps us show that special promotion to that customer that may not visited in a while or a remarketing campaign for a special luxury item for a user that shows high monetary value.
Other Way of Using RFM
RFM Segmentation isn’t only used for e-commerce. By simply replacing the Monetary category with Time-on-Page, we can use RFM to help users to stay on your page longer.
Don’t throw the baby out with the bathwater!
Even though RFM can be a power tool. It is not meant to replace other types of segmentation like psychographic or lifestyle but it should be used in combination with these other forms of segmenting. A 19 year-old male frequent shopper from NYC usually has different interest than a 65 year-old female frequent shopper from Kansas.
RFM Step By Step
In this analysis, we will be working with the e-commerce dataset from the UCI MachineLearning Repository that contains over 150.000 transactions for an UK-based online retailer. You can find this data set here.
If you want to follow along with your own data, you can generate this date from your CRM. The following fields are are required:
- Fields Required
- CustomerID or Customer Name
- Sales or Invoice Date
- Invoice Number
- Revenue or Sales Amount
Set Up and Getting the Data
First, we will install the neccessary libraries. Fortunately, there is a R package, the rfm package that does most of the work in creating a RFM Analysis for you. The data found on Kaggle comes in a csv file. The code below gets the data into R and gives us a glimpse into out table.
Looking at the raw data, we see the that it consist of eight columns with the following observations:
- Observations
- StockCode: An identification number for the product purchased. Not Needed.
- Description: Description of product purchased. Not needed.
- InvoiceDate: Date purchase was invoiced. Not in date format.
- Country: Country customer purchased from. Not Needed
- No Revenue: No revenue column, but we can get this by multiplying UnitPrice and Quantity.
- StockCode: An identification number for the product purchased. Not Needed.
There are several columns that we do not need for this excercise. We do not have a Revenue column but we can get it by multiplying UnitPrice and Quantity.
Feature Engineering
We will have to feature engineer a little to get the data into a format that we need it in for our RFM Analysis. In the following, we will do the following steps:
- Following Steps
- Remove unnecessary columns
- Create a revenue column
- Reformat InvoiceDate
ecom <- mutate(ecommerce[,c(-2:-3, -8)], # gets only columns we need
Revenue = UnitPrice*Quantity, # creates revenue
InvoiceDate =as.Date(InvoiceDate, # reformats data format
format='%m/%d/%Y'))
ecom %>%
head() %>%
kable() %>%
kable_minimal()
InvoiceNo | Quantity | InvoiceDate | UnitPrice | CustomerID | Revenue |
---|---|---|---|---|---|
536365 | 6 | 2010-12-01 | 2.55 | 17850 | 15.30 |
536365 | 6 | 2010-12-01 | 3.39 | 17850 | 20.34 |
536365 | 8 | 2010-12-01 | 2.75 | 17850 | 22.00 |
536365 | 6 | 2010-12-01 | 3.39 | 17850 | 20.34 |
536365 | 6 | 2010-12-01 | 3.39 | 17850 | 20.34 |
536365 | 2 | 2010-12-01 | 7.65 | 17850 | 15.30 |
Data Exploration
Now that we have revenue, lets look at a summary to get a better idea of what we are working with.
InvoiceNo | Quantity | InvoiceDate | UnitPrice | CustomerID | Revenue | |
---|---|---|---|---|---|---|
573585 : 1114 | Min. :-80995.00 | Min. :2010-12-01 | Min. :-11062.06 | Min. :12346 | Min. :-168469.60 | |
581219 : 749 | 1st Qu.: 1.00 | 1st Qu.:2011-03-28 | 1st Qu.: 1.25 | 1st Qu.:13953 | 1st Qu.: 3.40 | |
581492 : 731 | Median : 3.00 | Median :2011-07-19 | Median : 2.08 | Median :15152 | Median : 9.75 | |
580729 : 721 | Mean : 9.55 | Mean :2011-07-04 | Mean : 4.61 | Mean :15288 | Mean : 17.99 | |
558475 : 705 | 3rd Qu.: 10.00 | 3rd Qu.:2011-10-19 | 3rd Qu.: 4.13 | 3rd Qu.:16791 | 3rd Qu.: 17.40 | |
579777 : 687 | Max. : 80995.00 | Max. :2011-12-09 | Max. : 38970.00 | Max. :18287 | Max. : 168469.60 | |
(Other):537202 | NA | NA | NA | NA’s :135080 | NA |
Something that quickly jumps out is that several columnns contain NA’s. There are also negatives amounts in quantity, UnitPrice, and Revenue. Let’s plot this on a chart and see what this looks like.
ecom %>%
ggplot(aes(Quantity, UnitPrice)) +
geom_point(alpha=.3) +
geom_jitter() +
labs(title="Comparing UnitPrice against Quantity",
tag = "Figure 1") +
theme_minimal()
Negative Quantities and UnitPrice
Looking the above graph and we can clearly see that there are transactions that have negative quantities and negative unit prices. Lets filter for all entries that have negative entries to get a better understanding of our data set.
ecom %>%
filter(UnitPrice <= 0 | Quantity <= 0 ) %>%
arrange(UnitPrice) %>%
head(10) %>%
kable() %>%
kable_classic_2()
InvoiceNo | Quantity | InvoiceDate | UnitPrice | CustomerID | Revenue |
---|---|---|---|---|---|
A563186 | 1 | 2011-08-12 | -11062.06 | NA | -11062.06 |
A563187 | 1 | 2011-08-12 | -11062.06 | NA | -11062.06 |
536414 | 56 | 2010-12-01 | 0.00 | NA | 0.00 |
536545 | 1 | 2010-12-01 | 0.00 | NA | 0.00 |
536546 | 1 | 2010-12-01 | 0.00 | NA | 0.00 |
536547 | 1 | 2010-12-01 | 0.00 | NA | 0.00 |
536549 | 1 | 2010-12-01 | 0.00 | NA | 0.00 |
536550 | 1 | 2010-12-01 | 0.00 | NA | 0.00 |
536552 | 1 | 2010-12-01 | 0.00 | NA | 0.00 |
536553 | 3 | 2010-12-01 | 0.00 | NA | 0.00 |
We now see that we have over 11,000 entries that have negative quantities or revenue. Since we are trying to categorize our customers, lets take out all the NA’s for CustomerId and remove anything that might be credit adjustments, or refunds and see what that looks like.
ecom_clean <- ecom %>%
drop_na(CustomerID) %>% # Removes entries with out customer information
filter(Quantity > 0,
UnitPrice > 0 ) #Removes entries with negative quantity or Unit Price
ecom_clean %>%
ggplot(aes(Quantity, UnitPrice)) +
geom_point(alpha=.3, aes()) +
geom_jitter() +
labs(title="Comparing UnitPrice against Quantity",
subtitle = "There are negatives for both Quantity and UnitPrice",
tag = "Figure 2") +
theme_light() +
scale_x_log10() + # to visualize better
scale_y_log10()
Comparing Unit Price and Quantity after cleaning the data
I took the liberty of doing the log of both Quantity and UnitPrice to help paint a better picture of our data. We can see that all negative values are gone. Thee data looks like it makes sense.We now have a table that’s ready for our rfm functions.
InvoiceNo | Quantity | InvoiceDate | UnitPrice | CustomerID | Revenue |
---|---|---|---|---|---|
581483 | 80995 | 2011-12-09 | 2.08 | 16446 | 168469.60 |
541431 | 74215 | 2011-01-18 | 1.04 | 12346 | 77183.60 |
556444 | 60 | 2011-06-10 | 649.50 | 15098 | 38970.00 |
551697 | 1 | 2011-05-03 | 8142.75 | 16029 | 8142.75 |
567423 | 1412 | 2011-09-20 | 5.06 | 17450 | 7144.72 |
540815 | 3114 | 2011-01-11 | 2.10 | 15749 | 6539.40 |
550461 | 3114 | 2011-04-18 | 2.10 | 15749 | 6539.40 |
573003 | 2400 | 2011-10-27 | 2.08 | 14646 | 4992.00 |
540815 | 1930 | 2011-01-11 | 2.55 | 15749 | 4921.50 |
550461 | 1930 | 2011-04-18 | 2.40 | 15749 | 4632.00 |
RFM Package
We will be using the rfm package to help with our RFM analysis. We have created a revenue column and removed NA’s, negative quantities and UnitPrices. We will let the rfm package do most of the leg work and score our customers based on recency, frequency, and monetary.
rfm_result <- rfm_table_order(
data = ecom_clean,
customer_id = CustomerID,
revenue = Revenue,
order_date = InvoiceDate,
analysis_date = as.Date("2012/01/01")
)
As you can see above, the rfm_table_order function not only automatically gives our customerID a recency, frequency, and monetary score but it also shows the last time a customer made a purchase, their total amount of transactions, and total revenue.
RFM Visualizations
The rfm Package has some handy visualizations that come right out-of-the-box. All we have to do is to put the “rfm_result” object we just created into the visualization functions in the rfM Package and the package does the rest.
Heat Map
Monetary Value is concentrated in the high frequency
We see that a big portion of the monetary value is concentrated around frequent shoppers. We also see some monetary value concentrated around new shoppers who have a high recency but low frequency. It might behoove us to give special attention to these users and make sure they become frequent shoppers.
Bar Chart
RFM distribution
Here we have another view of how our shoppers are distributed among the different categories. As mentioned before, our shoppers tend to shop frequently but we can give special care to users who haven’t shopped recently. What would you do to reach these customers and motivate them to shop again?
Scatterplot
A large proportion of the customers have shopped within the last 100 days
It appears that a large portion of customers are concentrated below 100 days although there are a few high monetary value customers who haven’t had a purchase in over 200 days. What kind of specialized message can we use to bring them back into the fold?
There’s a strong correlation between how frequently a shopper purchases at our website and their monetary value. We do have a few low frequency/high-value outliers. It might make sense to isolate these low frequency/high-value customers to reach them with a targetted message.
This is another strong indication of the importance of bringing customers back to out website. Are the ads that we’re showing our customers have items they might be interested in? Is there a purchase cycle we can detect?
RFM Categories
Now that we have scores for our customers, we’re going to create different categories using the rfm_segment function. The categories are not set in stone, so feel free to come up with segments that make sense for your business. Feel free to use the segments I’m using as a start.
To create our segments, we have to set our upper and lower bounds for each category created as you can see in the code chunk below.
# We label the various segments
segment_names <- c("Champions", "Loyal Customers", "Potential Loyalist",
"New Customers", "Promising", "Need Attention", "About To Sleep",
"At Risk", "Can't Lose Them", "Lost")
# We set the upper and lower bounds for recency, frequency, and monetary for the above segments
recency_lower <- c(4, 2, 3, 4, 3, 2, 2, 1, 1, 1)
recency_upper <- c(5, 5, 5, 5, 4, 3, 3, 2, 1, 2)
frequency_lower <- c(4, 3, 1, 1, 1, 2, 1, 2, 4, 1)
frequency_upper <- c(5, 5, 3, 1, 1, 3, 2, 5, 5, 2)
monetary_lower <- c(4, 3, 1, 1, 1, 2, 1, 2, 4, 1)
monetary_upper <- c(5, 5, 3, 1, 1, 3, 2, 5, 5, 2)
# We use the segments and the bounds we previously established to group our users into different segments
segment <- rfm_segment(rfm_result,
segment_names,
recency_lower,
recency_upper,
frequency_lower,
frequency_upper,
monetary_lower,
monetary_upper)
head(segment) %>%
kable() %>%
kable_classic_2()
customer_id | segment | rfm_score | transaction_count | recency_days | amount | recency_score | frequency_score | monetary_score |
---|---|---|---|---|---|---|---|---|
12346 | Others | 115 | 1 | 348 | 77183.60 | 1 | 1 | 5 |
12347 | Champions | 555 | 182 | 25 | 4310.00 | 5 | 5 | 5 |
12348 | Loyal Customers | 234 | 31 | 98 | 1797.24 | 2 | 3 | 4 |
12349 | Champions | 444 | 73 | 41 | 1757.55 | 4 | 4 | 4 |
12350 | At Risk | 122 | 17 | 333 | 334.40 | 1 | 2 | 2 |
12352 | Loyal Customers | 345 | 85 | 59 | 2506.04 | 3 | 4 | 5 |
RFM Visuals
Now that we have our customers separated into the different RFM segments, let’s use the rfm package’s built-in function to visualize our data based on these assigned segments.
RMF Recency by Segment
Above we see the median recency for the different customer segments. It makes sense that Lost and At Risk customers have an average median recency of over 250 days with the Champions and Potential Loyalist having a fraction of that.
RMF Monetary Value Char Chart by Segment
The above plot shows the importanancet of the Champions segment as they have by far the greatest median monetary value. We also see the large disparity between champions and the second closest group of Loyal Customers with Champions having more than 2.5 times the value. We also see that At Risk shoppers have a good median monetary value and as we saw in the previous chart, the At Risk group is one of the largest groups.
Conclusion
As you can see above, the RFM Analysis is a powerful tool to use with other segmenting strategies. The rfm package makes it easy to categorize and visualize your customer base. This helps us get a clearer picture of your customer base’s shopping habits.
If you have any questions or any strategies you want so share using RFM analysis, feel free to share.