Introduction

At the heart of marketing analytics, lie a few basic questions:

One: who are my customers? Two: which customer should I target, and spend most of my marketing budget on? And three, what’s the future value of my customers? That’s exactly what I’m prepared to do in this project:

  • Segmentation is all about understanding your customers.

  • Scoring models are about targeting the right ones. And:

  • customer lifetime value is about anticipating their future value.

These are the foundations of Marketing Analytics.

Note that for this particular project, the customer here is the zip code. Meaning that instead of dealing with each customer individually, we’ll group all customers from a particular zip code under one group, and treat this group as a unique customer, and repeat the process for all zip codes.

Customer Segmentation:

The segmentation will be based on the most 3 useful segmentation variables:

Recency, Frequency, and Monetary Value (RFM)

  • Recency indicates when a customer made his or her last purchase.

  • Frequency refers to the number of purchases made in the past. The more purchases have been made in the past, the more likely additional purchases will occur in the future. Finally,

  • monetary value refers to the amount of money spent on average at each purchase occasion. Obviously, the more a costumer spends on average, the more valuable he is

The following code will do the following:

read in a sample data set, add an ID column to represent unique customers, and then add 3 more columns for the recency, frequency, and monetary value:

# reading the dataset
dat <- read_excel("Sample_Data .xlsx")

# have a look at it
dat
Email ID Date Transac Value
100pkp@gmail.com 2016-11-25 365.2
101pkp@gmail.com 2016-03-16 588.2
102pkp@gmail.com 2016-02-10 646.2
103pkp@gmail.com 2016-03-09 670.2
104pkp@gmail.com 2016-08-30 1694.2
104pkp@gmail.com 2016-05-30 604.2
104pkp@gmail.com 2016-05-08 992.2
104pkp@gmail.com 2016-02-18 2178.2
105pkp@gmail.com 2016-08-20 678.2
106pkp@gmail.com 2016-03-19 622.2
107pkp@gmail.com 2016-07-31 958.2
108pkp@gmail.com 2016-05-17 892.2
109pkp@gmail.com 2016-09-17 927.2
110pkp@gmail.com 2016-07-07 779.2
111pkp@gmail.com 2016-04-22 663.2
112pkp@gmail.com 2016-01-21 897.2
113pkp@gmail.com 2016-09-27 164.2
113pkp@gmail.com 2016-09-18 365.2
113pkp@gmail.com 2016-09-17 584.2
113pkp@gmail.com 2016-08-21 152.2
113pkp@gmail.com 2016-02-24 365.2
# remove the spaces from variables' names for easier coding
colnames(dat) = c("EmailID", "Date", "TransacValue")

new_dat <- dat %>% 
         # add a uniqure customer ID
  mutate(ID = as.numeric(factor(EmailID)),
         
         # Create a new variable representing the number of days since last purchase
         # by computing the time difference between the last day in the database 
         # (assume this to be the first day in the new year, i.e. "2017-01-01")
         # and the date of purchase
         # This is used to calculate recency
         DaysSince = as.numeric(interval(start = Date,
                                         end = "2017-01-01"))/3600/24) %>% 
  
  # calculate the recency, frequncy, and monetary value
  group_by(ID) %>%
  summarise(Recency = min(DaysSince),
            FirstPurchase = max(DaysSince),
            Frequency = n(),
            AvgPurchaseAmount = mean(TransacValue)) %>% 
  select(ID, everything())

# now have a look at the new data
new_dat
ID Recency FirstPurchase Frequency AvgPurchaseAmount
1 37 37 1 365.2
2 291 291 1 588.2
3 326 326 1 646.2
4 298 298 1 670.2
5 124 318 4 1367.2
6 134 134 1 678.2
7 288 288 1 622.2
8 154 154 1 958.2
9 229 229 1 892.2
10 106 106 1 927.2
11 178 178 1 779.2
12 254 254 1 663.2
13 346 346 1 897.2
14 96 312 5 326.2
# Let's see a quick statistical summary of our new data 
summary(new_dat)
##        ID           Recency      FirstPurchase     Frequency  
##  Min.   : 1.00   Min.   : 37.0   Min.   : 37.0   Min.   :1.0  
##  1st Qu.: 4.25   1st Qu.:126.5   1st Qu.:160.0   1st Qu.:1.0  
##  Median : 7.50   Median :203.5   Median :271.0   Median :1.0  
##  Mean   : 7.50   Mean   :204.4   Mean   :233.6   Mean   :1.5  
##  3rd Qu.:10.75   3rd Qu.:290.2   3rd Qu.:308.5   3rd Qu.:1.0  
##  Max.   :14.00   Max.   :346.0   Max.   :346.0   Max.   :5.0  
##  AvgPurchaseAmount
##  Min.   : 326.2   
##  1st Qu.: 628.2   
##  Median : 674.2   
##  Mean   : 741.5   
##  3rd Qu.: 896.0   
##  Max.   :1367.2

Note: again, in this project, we’ll group each zip code customers together and do these calculations on a zip-code by zip-code basis. For example the recency would be computed as the average recency for all customers that belong to a particular zip code. The same is true for the frequency and monetary value.

Next, we need to perform the segmentation based on the calculated RFM. There are two approaches to achieve this:

Approach (1): Statistical Segmentation

In this approach you let your statistical software find the segments automatically. There are many methods to do that, the simplest of which is hierarchical clustering.

Approach (2): Managerial Segmentation

In this approach you divide your customers into multiple segments that are managerially meaningful. An example of such segmentation is illustrated below:

Porposed Segmentation Models:

A proposed simple segmentation model based on RFM variables includes dividing customers into four groups, or segments, based on their recency:

  • I define as active, a customer who purchased something within the last 12 months,
  • as warm, someone whose last purchase happens a year before, i.e. between 13 and 24 months.
  • A cold customer, is one whose last purchase was between two and three years ago. For those who haven’t purchased anything for more than three years, we qualify them a inactive.
The picture below represent this:
Proposed Customers Segmentation


But for a better and more effective segmentation, I would also like to treat recently acquired customers differently. Maybe be sending them special coupons, or a welcome packet. Or maybe we just want to track whether we’ll be able to keep those new customers in the long run. To that avail, I take all the customers in the active segment, and put the new ones, Those with history of just one purchase, so far, into a segment I call: new active customers.

Now, let’s go one step further and divide the remaining active customers into two subgroups, based on how much money they spend on average. Based on my analysis, I’ll decide to put the bar at $100, and qualify as higher value those customers who spend $100 or more, on average, at each purchase occasion. And qualify as lower value, those who spend less.

Also I will apply the same criteria to the warm customers. So we can decide on which ones we want to concentrate our marketing budget.

The picture below represent this model of segmentation, which I will implement:
A More Elaborate Segmentation Model


This segmentation is, obviously, very simple, and you could imagine much more complex segmentations with dozens of segments, and many more than only a few segmentations variables used. But you get the point; It can be already very useful. And it’s quite relevant for managers.

Building scoring models

After I’m done with segmentation, I’ll move on to build scoring models; that is, building models to predict how much money your customers (here: customers are the zip-codes as mentioned before) are going to spend over the next 12 month.

To do so, I am going to create a calibration data. I will be going back in time, 12 month ago, and extract from that data, two separate types of variables:

  • The first type of variables are called predictors. These are the bits of information we had about each customer at the time, as if we went back in time and had no clue about what they did over the next 12 months. For instance, we compute what was their recency, frequency, and monetary value, for each customer, a year ago.

  • The second type of variable is the variable we try to predict, it’s sometimes called the target variable. In our case, I’ll predict whether a customer will remain active and make at least one purchase. And if so, how much money each customer is likely to spend. In other words, I predict both the probability and the likely amount.

Once we have built our calibration data, containing both the predictors and the target variable, the next step is to link the two through a statistical model. The model will predict customers probabilities of purchase as well as the most likely amount.

Customers Lifetime Value

Customer lifetime value is a powerful piece of business intelligence that informs an efficient strategy for business growth. Customer Lifetime Value is defined as the value (in current dollars) of the net profit you can expect from a given customer’s purchases over the entire life of the customer relationship.

With a CLV calculation, you’re learning what your average customer is “worth” to your company. Let’s say a customer CLV is $60K. If it cost you $15K a year to service this customer and it cost $20K to aquire this customer. How long do you want to keep this customer? Obviously no more than 3 years or you’ll only be breaking even and starting to lose money. (Again, when we say customer here, we’re aggregating by zip codes)

Understnding this type of data helps a company be much more profitable and also helps define the ideal customers for the marketing and sales team to focus on.

To calculate Customers Lifetime Value, we need to do 3 steps:

Computing The Transition Matirx

Using the segmentation model I’ve developed before, I can make a snapshot of customers database today; how many customers I have and how many fall into each segment. And of course, I can have the same snapshot taken say, a year ago with the same information.

Also, I can check how customers went from one segment to another. For example, some of the active high value customers maybe kept making high volume purchases and remained in the same segment. Others remained active but spent less and moved to the lower value segments. And so on.

I will then transform these figures into probabilities, and consider that these probabilities will likely remain stable over time. In other words, if 50% of your high value customers a year ago remain in the same segment this year, then it’s likely that about 50% of your high value customers today will remain high value customers next year. And the same logic will apply to the year after, and the year after, and so on.

Basically, by analyzing what happened in the recent past, you will predict what will likely happened in the near future. Of course, I need to compute these transition probabilities from any segment to any segment.

The tool I’ll use is the Transition Matrix. The picture below represent it:
The Transition Matrix


The left most column represent the segments in which customers were, say, a year ago and the top most row represent the segments in which customers are today. Each value in that matrix, each probability, represent the likelihood of going from one segment to another, and, of course, because all customers need to go somewhere, each row sums to 100 per cent.

Note that by the very definition of the segments, some cells may always be equal to zero, meaning there is a zero probability of going from one specific segment to another. For instance, you may never go from active to inactive in just one period, but that’s fine. That’s usually the case.

This transition matrix is essential because this is the key element we use to compute customer lifetime value.

After computing the transition matrix, the next step is:

Estimating how customers will evolve

The next step of the process is to estimate how many customers I have in each segment next year, and the year after, and so on, for as long as I’d like, although many firms will stop after three, five, or ten years at most.

This operation consists of multiplying a matrix by a vector. We multiply the transition matrix by a vector that includes the number of customers in each segment today. The output of that operation will be another vector, that will contain the estimated number of customers in each segment in a year.

If you take that new vector you just obtained, and multiply by the transition matrix one more time, you get an estimate of the number of customers in each segment two years from now, and so on.

Assigning And Discounting Revenue

Now we just need to transform our computations from the last 2 steps into Dollars.

The first step: Assume Continuation of Purchasing Amount:

That is: assume that the revenue generated by a customer can be fully explained and predicted by the segment to which he belongs, whether today or ten years from now; so, if an average customer in a high value segment generates an average of $100, we’ll simply assume that this figure will not change over the years. In reality it might go up or down, but without additional information our best guess is to assume that this figure will remain stable over time.

The second step: Discount:

Of course all $100 are not created equally, for one customer will spend $100 today another will spend the same amount five years from now, the first customer is more valuable, because future revenues are uncertain, distant and not immediate. Therefore, they need to be discounted by taking into account risk and uncertainty factors.

There’s no clear guideline about the kind of discount rate you should apply. But simply: the higher the discount rate, the more short term focused a firm will be. On the other hand, with a discount rate closer to zero, future revenues would not be discounted as much and computations would be much more long-term focused.