INTRODUCTION

80% of your business comes from 20% of your customers.

As Digital Marketers, when we think of segmentation we often think of demographics, geolocation, or how far along a user went down the sales funnel; but what happens once we acquire the customer?

If a customer is loyal, does it make a difference to continue focussing our marketing dollars on them or should we focus on customers that haven’t shopped on your website in a while. Do you know how much these customers usually spend? Would an email with a special promotion help bring those customers back into the mix?

WHAT IS RFM?

To help answer these questions, many digital marketers often use a RFM Analysis to categorize customers into different segments based on their purchase history. RFM Segmentation or Analysis is a powerful tool used to target groups of customers with relevant communication based on their purchase behavior. RFM stands for Recency, Frequency, and Monetary and is used to categorize customers based on these three quantitative factors:

  • Recency: How recently a customer made a purchase
  • Frequency: How often a customer makes a purchase
  • Monetary Value: How much money a customer has spent on purchases

RFM Analysis assigns a rank to every customer in each of these categories. Usually the higher the ranking, the better the customer. The best customer would have a top ranking in each category

Understanding Recency, Frequency, and Monetary Value

RFM is thought to have originated in direct mail in an article by Jan Roelf and Tom Wansbeek, “Optimal Selection for Direct Mail”, in a 1995 issue of Marketing Science.

Lets go over the concepts of Recency, Frequency, and Monetary value and how they can affect your marketing strategy.

Recency

A customer that has recently shopped at a website is more likely to shop again than a customer who hasn’t made a purchase in quite some time.

This data can be used to target users that have not shopped in a while by offering them a special promotion with an email encouraging these users to give the company another shot. If a user has recently shopped, we can offer them a retargeting ad with an item that might complement whatever they recently purchased.

Frequency

How often a customer buys something on the website can be affected by many factors like price point, the need for replacement, and obviously, the type of product. Customers are more likely to have to buy eggs more often that they would buy a laptop. We can use the various digital strategies to direct users to replace their product based on a predicted purchase cycle.

Monetary

Monetary value has to do with how much a customer has spent during a particular length of time. Just like in Vegas, “high rollers” usually get special treatment as the ROI for customers with a high monetary value tend to be greater.

We can’t disregard customers with low monetary value as they may be the best cheerleaders as loyal customers.

Why is RFM Important?

RFM Segmentation allows us to categorize a customer base and allows us to try different levers to help push our customers into a higher category. RFM allows us to better understand our customers. It helps us show that special promotion to that customer that may not visited in a while or a remarketing campaign for a special luxury item for a user that shows high monetary value.

Other Way of Using RFM

RFM Segmentation isn’t only used for e-commerce. By simply replacing the Monetary category with Time-on-Page, we can use RFM to help users to stay on your page longer.

Don’t throw the baby out with the bathwater!

Even though RFM can be a power tool. It is not meant to replace other types of segmentation like psychographic or lifestyle but it should be used in combination with these other forms of segmenting. A 19 year-old male frequent shopper from NYC usually has different interest than a 65 year-old female frequent shopper from Kansas.

RFM Step By Step

In this analysis, we will be working with the e-commerce dataset from the UCI MachineLearning Repository that contains over 150.000 transactions for an UK-based online retailer. You can find this data set here.

If you want to follow along with your own data, you can generate this date from your CRM. The following fields are are required:

Set Up and Getting the Data

First, we will install the neccessary libraries. Fortunately, there is a R package, the rfm package that does most of the work in creating a RFM Analysis for you. The data found on Kaggle comes in a csv file. The code below gets the data into R and gives us a glimpse into out table.

Looking at the raw data, we see the that it consist of eight columns with the following observations:

  • Observations
    • StockCode: An identification number for the product purchased. Not Needed.
    • Description: Description of product purchased. Not needed.
    • InvoiceDate: Date purchase was invoiced. Not in date format.
    • Country: Country customer purchased from. Not Needed
    • No Revenue: No revenue column, but we can get this by multiplying UnitPrice and Quantity.

There are several columns that we do not need for this excercise. We do not have a Revenue column but we can get it by multiplying UnitPrice and Quantity.

Feature Engineering

We will have to feature engineer a little to get the data into a format that we need it in for our RFM Analysis. In the following, we will do the following steps:

  • Following Steps
    • Remove unnecessary columns
    • Create a revenue column
    • Reformat InvoiceDate
InvoiceNo Quantity InvoiceDate UnitPrice CustomerID Revenue
536365 6 2010-12-01 2.55 17850 15.30
536365 6 2010-12-01 3.39 17850 20.34
536365 8 2010-12-01 2.75 17850 22.00
536365 6 2010-12-01 3.39 17850 20.34
536365 6 2010-12-01 3.39 17850 20.34
536365 2 2010-12-01 7.65 17850 15.30

Data Exploration

Now that we have revenue, lets look at a summary to get a better idea of what we are working with.

InvoiceNo Quantity InvoiceDate UnitPrice CustomerID Revenue
573585 : 1114 Min. :-80995.00 Min. :2010-12-01 Min. :-11062.06 Min. :12346 Min. :-168469.60
581219 : 749 1st Qu.: 1.00 1st Qu.:2011-03-28 1st Qu.: 1.25 1st Qu.:13953 1st Qu.: 3.40
581492 : 731 Median : 3.00 Median :2011-07-19 Median : 2.08 Median :15152 Median : 9.75
580729 : 721 Mean : 9.55 Mean :2011-07-04 Mean : 4.61 Mean :15288 Mean : 17.99
558475 : 705 3rd Qu.: 10.00 3rd Qu.:2011-10-19 3rd Qu.: 4.13 3rd Qu.:16791 3rd Qu.: 17.40
579777 : 687 Max. : 80995.00 Max. :2011-12-09 Max. : 38970.00 Max. :18287 Max. : 168469.60
(Other):537202 NA NA NA NA’s :135080 NA

Something that quickly jumps out is that several columnns contain NA’s. There are also negatives amounts in quantity, UnitPrice, and Revenue. Let’s plot this on a chart and see what this looks like.

Negative Quantities and UnitPrice

Negative Quantities and UnitPrice

Looking the above graph and we can clearly see that there are transactions that have negative quantities and negative unit prices. Lets filter for all entries that have negative entries to get a better understanding of our data set.

InvoiceNo Quantity InvoiceDate UnitPrice CustomerID Revenue
A563186 1 2011-08-12 -11062.06 NA -11062.06
A563187 1 2011-08-12 -11062.06 NA -11062.06
536414 56 2010-12-01 0.00 NA 0.00
536545 1 2010-12-01 0.00 NA 0.00
536546 1 2010-12-01 0.00 NA 0.00
536547 1 2010-12-01 0.00 NA 0.00
536549 1 2010-12-01 0.00 NA 0.00
536550 1 2010-12-01 0.00 NA 0.00
536552 1 2010-12-01 0.00 NA 0.00
536553 3 2010-12-01 0.00 NA 0.00

We now see that we have over 11,000 entries that have negative quantities or revenue. Since we are trying to categorize our customers, lets take out all the NA’s for CustomerId and remove anything that might be credit adjustments, or refunds and see what that looks like.

Comparing Unit Price and Quantity after cleaning the data

Comparing Unit Price and Quantity after cleaning the data

I took the liberty of doing the log of both Quantity and UnitPrice to help paint a better picture of our data. We can see that all negative values are gone. Thee data looks like it makes sense.We now have a table that’s ready for our rfm functions.

InvoiceNo Quantity InvoiceDate UnitPrice CustomerID Revenue
581483 80995 2011-12-09 2.08 16446 168469.60
541431 74215 2011-01-18 1.04 12346 77183.60
556444 60 2011-06-10 649.50 15098 38970.00
551697 1 2011-05-03 8142.75 16029 8142.75
567423 1412 2011-09-20 5.06 17450 7144.72
540815 3114 2011-01-11 2.10 15749 6539.40
550461 3114 2011-04-18 2.10 15749 6539.40
573003 2400 2011-10-27 2.08 14646 4992.00
540815 1930 2011-01-11 2.55 15749 4921.50
550461 1930 2011-04-18 2.40 15749 4632.00

RFM Package

We will be using the rfm package to help with our RFM analysis. We have created a revenue column and removed NA’s, negative quantities and UnitPrices. We will let the rfm package do most of the leg work and score our customers based on recency, frequency, and monetary.

As you can see above, the rfm_table_order function not only automatically gives our customerID a recency, frequency, and monetary score but it also shows the last time a customer made a purchase, their total amount of transactions, and total revenue.

RFM Visualizations

The rfm Package has some handy visualizations that come right out-of-the-box. All we have to do is to put the “rfm_result” object we just created into the visualization functions in the rfM Package and the package does the rest.

Heat Map

Monetary Value is concentrated in the high frequency

Monetary Value is concentrated in the high frequency

We see that a big portion of the monetary value is concentrated around frequent shoppers. We also see some monetary value concentrated around new shoppers who have a high recency but low frequency. It might behoove us to give special attention to these users and make sure they become frequent shoppers.

Bar Chart

RFM distribution

RFM distribution

Here we have another view of how our shoppers are distributed among the different categories. As mentioned before, our shoppers tend to shop frequently but we can give special care to users who haven’t shopped recently. What would you do to reach these customers and motivate them to shop again?

Scatterplot

A large proportion of the customers have shopped within the last 100 days

A large proportion of the customers have shopped within the last 100 days

It appears that a large portion of customers are concentrated below 100 days although there are a few high monetary value customers who haven’t had a purchase in over 200 days. What kind of specialized message can we use to bring them back into the fold?

There’s a strong correlation between how frequently a shopper purchases at our website and their monetary value. We do have a few low frequency/high-value outliers. It might make sense to isolate these low frequency/high-value customers to reach them with a targetted message.

This is another strong indication of the importance of bringing customers back to out website. Are the ads that we’re showing our customers have items they might be interested in? Is there a purchase cycle we can detect?

RFM Categories

Now that we have scores for our customers, we’re going to create different categories using the rfm_segment function. The categories are not set in stone, so feel free to come up with segments that make sense for your business. Feel free to use the segments I’m using as a start.

To create our segments, we have to set our upper and lower bounds for each category created as you can see in the code chunk below.

customer_id segment rfm_score transaction_count recency_days amount recency_score frequency_score monetary_score
12346 Others 115 1 348 77183.60 1 1 5
12347 Champions 555 182 25 4310.00 5 5 5
12348 Loyal Customers 234 31 98 1797.24 2 3 4
12349 Champions 444 73 41 1757.55 4 4 4
12350 At Risk 122 17 333 334.40 1 2 2
12352 Loyal Customers 345 85 59 2506.04 3 4 5

RFM Visuals

Now that we have our customers separated into the different RFM segments, let’s use the rfm package’s built-in function to visualize our data based on these assigned segments.

RMF Recency by Segment

RMF Recency by Segment

Above we see the median recency for the different customer segments. It makes sense that Lost and At Risk customers have an average median recency of over 250 days with the Champions and Potential Loyalist having a fraction of that.

RMF Monetary Value Char Chart by Segment

RMF Monetary Value Char Chart by Segment

The above plot shows the importanancet of the Champions segment as they have by far the greatest median monetary value. We also see the large disparity between champions and the second closest group of Loyal Customers with Champions having more than 2.5 times the value. We also see that At Risk shoppers have a good median monetary value and as we saw in the previous chart, the At Risk group is one of the largest groups.

Conclusion

As you can see above, the RFM Analysis is a powerful tool to use with other segmenting strategies. The rfm package makes it easy to categorize and visualize your customer base. This helps us get a clearer picture of your customer base’s shopping habits.

If you have any questions or any strategies you want so share using RFM analysis, feel free to share.