Overview

MTA stands for multi-touch attribution, which is the practice of assigning credit for a conversion event to multiple marketing touchpoints that contributed to the event. In other words, it is a way to measure the impact of various marketing channels on the customer journey, taking into account that multiple touchpoints can influence a customer’s decision to convert.

An MTA model is a statistical model that is used to assign credit to each touchpoint in the customer journey based on its contribution to the final conversion event. There are several different types of MTA models, each with its own strengths and weaknesses.

Here we compare two popular approaches: Markov models and heuristic (rules-based) models.

Markov Model

Markov MTA (Multi-Touch Attribution) models are a type of statistical model used in marketing to quantify the impact of different channels on customer journeys. They are based on the principles of Markov chains, which model the probability of transitioning from one state to another over time. In the context of marketing, the states represent the different marketing channels that customers may interact with over the course of their journey.

A Markov MTA model estimates the probability of a customer moving to each possible next channel, given the channel they are currently in. These probabilities are collected in a transition matrix, which can then be used to calculate the contribution of each channel to conversions, typically by measuring how much the overall conversion probability drops when a channel is removed (its removal effect). By analyzing the transition matrix, marketers can gain insight into the most effective marketing channels for driving conversions and optimize their marketing strategies accordingly.
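
As a minimal sketch of how such a transition matrix can be estimated, the base R snippet below counts transitions in a handful of made-up journeys and normalizes each row. The journeys and the names example_paths, counts, and transition_probs are hypothetical and are not used elsewhere in this document.

```r
# Hypothetical example journeys; each ends in "Conversion" or "Non-Conversion".
example_paths <- list(
  c("Display", "Search", "Conversion"),
  c("TV", "Display", "Non-Conversion"),
  c("Search", "TV", "Search", "Conversion"),
  c("Display", "Non-Conversion")
)

states <- c("Display", "TV", "Search", "Conversion", "Non-Conversion")

# Count observed transitions from each state (rows) to the next state (columns).
counts <- matrix(0, nrow = length(states), ncol = length(states),
                 dimnames = list(from = states, to = states))
for (p in example_paths) {
  for (i in seq_len(length(p) - 1)) {
    counts[p[i], p[i + 1]] <- counts[p[i], p[i + 1]] + 1
  }
}

# Keep only the channel rows (the two end states have no outgoing transitions)
# and divide each row by its total to turn counts into probabilities.
channels <- c("Display", "TV", "Search")
transition_probs <- counts[channels, ] / rowSums(counts[channels, ])
round(transition_probs, 2)
```

Each row of transition_probs sums to 1 and describes where a customer goes next from that channel.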

Heuristic Models

There are several main heuristic or rules-based models used for MTA (multi-touch attribution) modeling:

First-touch attribution: This model gives all the credit to the first touchpoint that a customer interacts with in their journey.
Last-touch attribution: This model gives all the credit to the last touchpoint that a customer interacts with in their journey.
Linear attribution: This model evenly distributes credit across all touchpoints in a customer journey.
Time-decay attribution: This model gives more credit to touchpoints that are closer in time to the conversion event.
U-shaped attribution: This model gives the most credit to the first and last touchpoints in a customer journey, with the remaining credit distributed evenly among the intermediate touchpoints.
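
The rules above can be sketched as weight vectors over the touchpoints of a single converting journey. This is an illustration only: the journey is made up, the halving rate used for time decay and the 40/20/40 split used for the U-shaped model are common conventions assumed here, and none of the names below come from the ChannelAttribution package.

```r
# Hypothetical converting journey with four touchpoints, oldest first.
path <- c("Display", "TV", "Search", "Display")
n <- length(path)

first_touch <- c(1, rep(0, n - 1))   # all credit to the first touchpoint
last_touch  <- c(rep(0, n - 1), 1)   # all credit to the last touchpoint
linear      <- rep(1 / n, n)         # equal credit to every touchpoint

# Time decay: the weight halves for each step further from the conversion
# (assumed rate; real implementations often use a half-life in days).
decay <- 0.5 ^ ((n - 1):0)
time_decay <- decay / sum(decay)

# U-shaped: 40% each to first and last, remaining 20% split across the middle
# (assumed split; requires at least three touchpoints).
u_shaped <- c(0.4, rep(0.2 / (n - 2), n - 2), 0.4)

weights <- data.frame(touchpoint = seq_len(n), channel = path,
                      first_touch, last_touch, linear, time_decay, u_shaped)

# Credit per channel: sum the touchpoint weights by channel.
credit <- aggregate(
  cbind(first_touch, last_touch, linear, time_decay, u_shaped) ~ channel,
  data = weights, FUN = sum)
credit
```

Every model distributes exactly one conversion, so each weight column sums to 1; the models differ only in how that one conversion is split.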

Environmental Setup

This code loads several R packages into the current R session, making their functions and features available for use in subsequent code.

library(gt): loads the gt package, which provides a way to create high-quality, publication-ready tables in R.
library(purrr): loads the purrr package, which provides a set of functions for working with lists and other types of data structures in a functional programming style.
library(ggplot2): loads the ggplot2 package, which provides a flexible and powerful system for creating data visualizations in R.
library(stringr): loads the stringr package, which provides a set of functions for working with strings and text data in R.
library(ChannelAttribution): loads the ChannelAttribution package, which provides a set of functions for performing multi-touch attribution modeling in R.
library(tidyr): loads the tidyr package, which provides a set of functions for reshaping and transforming data in R.
library(dplyr): loads the dplyr package, which provides a set of functions for manipulating and summarizing data in R.
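library(magrittr): loads the magrittr package, which provides the %>% pipe operator for writing chained data manipulation steps in a more readable way.
library(tinytex): loads the tinytex package, which manages a lightweight LaTeX installation for rendering documents to PDF; it is not needed for the analysis itself.

The plotly package is not attached here. It is called later with the plotly:: prefix, so it only needs to be installed.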

library(gt)
library(purrr)
library(ggplot2)
library(stringr)
library(ChannelAttribution)
library(tidyr)
library(dplyr)
library(magrittr)
library(tinytex)

Create Transition Matrix

This code creates a table that describes how customers move between states. The table is a 5x5 matrix in which each row is the state a customer is currently in and each column is the state they move to next. The states are “Non-Conversion”, “Display”, “TV”, “Search”, and “Conversion”.

The first statement fills the matrix with relative weights (not yet probabilities) based on some assumptions about how customers move between marketing channels. For example, in the “Display” row, the value in the third column is the weight for moving next to “TV”. Similarly, in the “TV” row, the value in the fourth column is the weight for moving next to “Search”. The “Non-Conversion” and “Conversion” rows put all of their weight on themselves, so a path that reaches either state stays there.

The second statement divides each row by its row sum so that the probabilities in each row add up to 1. This makes each row a valid probability distribution over the next state.

The third statement assigns names to the columns of the matrix, which makes it easier to refer to each column by name.

The fourth statement assigns the same names, in the same order, to the rows.

M = t(matrix(c(1,0,0,0,0,
               200,0,300,350,10,
               160,300,0,350,8,
               150,200,200,0,20,
               0,0,0,0,1), nrow = 5))
M = M/rowSums(M)  
colnames(M) = c("Non-Conversion","Display",
                "TV","Search", "Conversion")
row.names(M) = c("Non-Conversion","Display",
                 "TV","Search","Conversion")

Define Function for Path Simulation

This code defines a function that simulates a single customer path (a sequence of states) from a given transition matrix and a maximum number of steps, num_sim.

The function starts by creating a vector to store the path and randomly chooses one of the marketing channels (states 2 through 4) as the starting point.

It then loops up to num_sim - 1 times, each time drawing the next state from the probabilities in the current state’s row of the transition matrix and adding it to the path.

The loop ends when a stopping condition is met: either “Non-Conversion” (the first state) or “Conversion” (the last state) is reached, or the maximum number of steps is reached.

Finally, the function returns the sequence of visited states, dropping the unused zero-filled slots at the end of the vector.

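simulate_path = function(M, num_sim){
  # Pre-allocate the path; unused slots stay 0 and are dropped at the end
  path = vector(mode = "integer", length = num_sim)
  # Start in a random channel (any state except the first and last)
  path[1] = sample(2:(nrow(M)-1), 1)

  # seq_len() yields an empty loop when num_sim is 1
  for(i in seq_len(num_sim - 1)){
    # Draw the next state from the current state's row of M
    p = M[path[i],]
    sn = which(rmultinom(1, 1, p) == 1)
    path[i+1] = sn
    # Stop at either end state: Non-Conversion (first) or Conversion (last)
    if(sn == 1 || sn == nrow(M)){
      break
    }
  }
  return(path[path > 0])
}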
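
To illustrate the sampling step on its own, the sketch below draws the next state from the “Display” row many times with rmultinom() and compares the observed frequencies with the row’s probabilities. The seed value and the names p_display, draws, and observed are arbitrary choices made for this example.

```r
set.seed(42)  # arbitrary seed so the draws are repeatable

# Transition probabilities out of the "Display" state, from the weights in M.
p_display <- c(200, 0, 300, 350, 10) / 860
names(p_display) <- c("Non-Conversion", "Display", "TV", "Search", "Conversion")

# Each column of `draws` is one draw, with a single 1 marking the chosen state.
draws <- rmultinom(n = 10000, size = 1, prob = p_display)

# Summing across draws counts how often each state was chosen.
observed <- rowSums(draws) / ncol(draws)

# Observed frequencies should be close to the expected probabilities.
round(rbind(expected = p_display, observed = observed), 3)
```

Inside simulate_path() only one such draw is made per step, and which(... == 1) turns the chosen column position into the index of the next state.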

Simulate Customer Path Through Channels

This code simulates customer paths through the marketing channels defined in the transition matrix M.

The first two lines of code define the maximum number of steps per path (num_sim) and the number of paths to simulate (num_paths), respectively.

The third line uses the map() function from the purrr package to generate num_paths paths, each with at most num_sim steps, by calling the simulate_path() function. The resulting paths are stored in a list called paths.

The next statement creates a data frame called conversion by mapping over paths, building a one-row data frame for each path, and binding the rows together. It records whether each path ends in conversion or not and is used later to calculate the attribution of conversions to each channel.

The following statement creates a data frame called pathdf in the same way. Each row of pathdf represents a step in a simulated customer path and contains the touchpoint number and the channel index at that step.

The two left_join() calls in that statement then add the conversion status (matched on path_num) and the channel name (matched on the channel index, using the column names of M) to pathdf.

Finally, the code displays the first 10 rows of the pathdf data frame using the gt() function from the gt package. The resulting table provides an overview of the simulated customer paths through the marketing channels.

num_sim = 100 
num_paths = 10000  
paths = purrr::map(1:num_paths, ~simulate_path(M, num_sim))  

conversion = purrr::map(paths, ~ data.frame(
  conversion = ifelse(.x[length(.x)] == 5, 
                      "converter", "non-converter"))) %>% 
  bind_rows(.id = "path_num")  

pathdf = map(paths, ~data.frame(touchpoint = 1:length(.x), channel = .x)) %>% 
  bind_rows(.id = "path_num") %>% 
  left_join(conversion) %>% 
  left_join(data.frame(channel_name = colnames(M), channel = 1:5))

## Joining, by = "path_num"
## Joining, by = "channel"

head(pathdf,10) %>% 
  gt() %>%
  gt::tab_header(title = "Simulated Paths")

Simulated Paths
path_num	touchpoint	channel	conversion	channel_name
1	1	3	non-converter	TV
1	2	4	non-converter	Search
1	3	3	non-converter	TV
1	4	2	non-converter	Display
1	5	1	non-converter	Non-Conversion
2	1	3	non-converter	TV
2	2	2	non-converter	Display
2	3	3	non-converter	TV
2	4	2	non-converter	Display
2	5	3	non-converter	TV

Plotting Simulated Paths

The first part of the code creates an interactive line plot of simulated customer paths through the marketing channels. The plot is created using the ggplotly() function from the plotly package. The data for the plot is taken from the pathdf data frame. The touchpoint, channel, and conversion variables are mapped to the x-axis, y-axis, and color of the plot, respectively. Each line in the plot represents a single customer path. The color of each line represents whether the path ended in a conversion or not. The plot includes axis labels and uses a minimalistic theme.

The second part of the code calculates the proportion of paths that resulted in a conversion. The table() function takes the conversion variable from the conversion data frame and calculates the frequency of each unique value (i.e., “converter” or “non-converter”). This frequency table is then divided by the total number of rows in the conversion data frame (nrow(conversion)) to obtain the proportion of paths that resulted in a conversion. This provides a summary of the overall effectiveness of the marketing channels.

plotly::ggplotly(
  pathdf %>% 
    ggplot(aes(touchpoint, channel, color = conversion, 
               group = path_num)) + geom_line() +   
    labs(x = "Touchpoint Number", 
         y = "Channel Touchpoint") +  
    theme_minimal() 
)

table(conversion$conversion)/nrow(conversion)

## 
##     converter non-converter 
##        0.0782        0.9218
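
Because “Non-Conversion” and “Conversion” are absorbing states, the expected conversion rate can also be computed directly from the transition matrix instead of by simulation. The sketch below uses the standard absorbing-chain result B = (I - Q)^(-1) R, assuming a uniform start over the three channels as in simulate_path() and ignoring the num_sim step cap. The names M_abs, Q, R, B, and expected_rate are introduced here for illustration.

```r
states <- c("Non-Conversion", "Display", "TV", "Search", "Conversion")
M_abs <- matrix(c(1,   0,   0,   0,   0,
                  200, 0,   300, 350, 10,
                  160, 300, 0,   350, 8,
                  150, 200, 200, 0,   20,
                  0,   0,   0,   0,   1),
                nrow = 5, byrow = TRUE, dimnames = list(states, states))
M_abs <- M_abs / rowSums(M_abs)

channels <- c("Display", "TV", "Search")
Q <- M_abs[channels, channels]                           # channel -> channel
R <- M_abs[channels, c("Non-Conversion", "Conversion")]  # channel -> end state

# Probability of eventually ending in each end state, by starting channel.
B <- solve(diag(length(channels)) - Q, R)
dimnames(B) <- dimnames(R)

# Paths start in each channel with equal probability.
expected_rate <- mean(B[, "Conversion"])
expected_rate
```

The value of expected_rate should land close to the simulated converter proportion, which gives a simple check that the simulation behaves as intended.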

Summarizing Simulated Paths

This code analyzes the simulated customer paths through the marketing channels and summarizes them in a table.

The first statement groups the simulated paths by path number, splits the groups into separate data frames, and extracts the channel names for each path.

The second statement removes the “Non-Conversion” and “Conversion” states from each path, leaving a list that contains only the marketing channels the customer interacted with.

The third statement collapses each path into a single string, with the channels separated by “ > ”, and pairs it with the conversion status of that path. The resulting data frame is named journeydf; at this point each row is one simulated path, with one column for the channel sequence and one for whether the path resulted in a conversion.

The fourth statement groups journeydf by the path column, counts the converters and non-converters for each unique journey, and sorts the journeys by those counts in descending order.

Finally, the last statement displays the first 15 rows of journeydf as a gt table, showing the journeys with the most converters along with their converter and non-converter counts. The table is given the title “Simulated Journey Data (Top 15 Converters)”.

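named_paths =
  pathdf %>% 
  # path_num is a character column (from bind_rows(.id = )), so grouping on it
  # would order paths as "1", "10", "100", ...; convert it to integer so the
  # split paths stay in the same order as the rows of `conversion`
  mutate(path_num = as.integer(path_num)) %>% 
  group_by(path_num) %>% 
  group_split() %>% 
  map(~pull(.x,channel_name)) 
path_trim = 
  map(named_paths, ~.x[.x != "Non-Conversion" & .x != "Conversion"])  
# One row per simulated path: channel sequence plus conversion status
journeydf = tibble(
  path = map_chr(path_trim, ~str_c(.x, collapse = " > ")),
  conversion = conversion$conversion
)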
journeydf = 
  journeydf %>% 
  group_by(path) %>% 
  summarise(converters = 
              sum(if_else(conversion == "converter",1, 0)),             
            non_converters = 
              sum(if_else(conversion == "non-converter", 1, 0)
              )
  ) %>%  
  arrange(-converters, -non_converters)  
head(journeydf, 15) %>% gt::gt() %>% 
  gt::tab_header(
    title = "Simulated Journey Data (Top 15 Converters)"
  )

Simulated Journey Data (Top 15 Converters)
path	converters	non_converters
Search	76	919
Display	67	774
TV	58	651
Display > Search	29	393
TV > Search	24	409
TV > Display	23	226
Search > Display	22	249
Search > TV	18	242
Display > TV	16	236
Search > TV > Display	13	95
Search > TV > Search	12	129
Display > Search > TV	11	107
TV > Display > Search	10	122
TV > Search > TV	10	104
Display > TV > Search	9	142

Calculate Metrics and Generate Histogram

This code computes summary metrics and generates a histogram to help better understand the simulated customer journeys.

The first part of the code creates a table with 5 rows and 2 columns. The “metric” column shows the names of the metrics calculated, which are the total number of converters, the total number of non-converters, the total number of paths, the conversion rate as a percentage, and the number of unique journeys. The “value” column contains the corresponding values of these metrics, which are calculated using the journeydf data frame. The table is rendered with the gt() function and is titled “Summary”.

The second part of the code calculates the length of each path in the path_trim list using the map_int() function from the purrr package. The result is a vector of integers representing the length of each path.

The third part of the code generates a histogram of the path_lengths using the hist() function in R. The histogram shows the distribution of path lengths across all simulated customer journeys. The x-axis represents the length of the path and the y-axis represents the frequency of paths with that length. The title of the histogram is “Path Length Distribution”.

data.frame(
  metric = c("Converters",
             "Non Converters",
             "Number of Paths",
             "Conversion Rate %",
             "Unique Journeys"),
  value = c(sum(journeydf$converters),
            sum(journeydf$non_converters),
            sum(journeydf$non_converters) + sum(journeydf$converters),
            100*sum(journeydf$converters) / num_paths,
            nrow(journeydf))) %>% 
  gt::gt() %>%
  gt::tab_header(title = "Summary")

Summary
metric	value
Converters	782.00
Non Converters	9218.00
Number of Paths	10000.00
Conversion Rate %	7.82
Unique Journeys	1484.00

path_lengths = map_int(path_trim, ~length(.x)) 
summary(path_lengths)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   3.000   3.954   5.000  33.000

hist(path_lengths, main = "Path Length Distribution")

Calculate Transition Matrix

This code estimates the transition matrix from the customer journeys in the journeydf data frame. Each entry is the probability of moving next to a given channel (or to conversion or non-conversion), given the channel the customer is currently in.

The transition_matrix() function from the ChannelAttribution package is used to calculate the transition matrix. The var_path argument is set to “path” to specify the column in journeydf that contains the path of channels for each customer journey. The var_conv and var_null arguments specify the columns in journeydf that contain the number of converters and non-converters for each journey, respectively.

The result is saved in an object called tM, a list that contains the transition matrix along with a lookup table that maps channel ids to channel names. The second statement extracts the transition matrix from tM using the $ operator and pipes it to the gt::gt() function to render it as a table. Channels appear in the table as numeric ids; (start) marks the beginning of a journey, (conversion) a converting end, and (null) a non-converting end. Each row gives the probability of moving from channel_from to channel_to. The title of the table is “Transition Probabilities”.

tM = transition_matrix(journeydf,
                       var_path = "path",
                       var_conv = "converters",
                       var_null = "non_converters")  
tM$transition_matrix %>% gt::gt() %>% 
  gt::tab_header(title = "Transition Probabilities")

Transition Probabilities
channel_from	channel_to	transition_probability
(start)	1	0.32910000
(start)	2	0.33890000
(start)	3	0.33200000
1	(conversion)	0.02187791
1	(null)	0.27350979
1	2	0.34846855
1	3	0.35614375
2	(conversion)	0.01993925
2	(null)	0.22797726
2	1	0.40384765
2	3	0.34823584
3	(conversion)	0.01732518
3	(null)	0.19426152
3	1	0.42842584
3	2	0.35998746

Create Markov Model

This code uses the markov_model() function from the ChannelAttribution package to fit a Markov model to the simulated customer journey data in the journeydf dataset.

The Markov model is a mathematical model that helps determine the probability of transitioning between states. In this case, the states are the marketing channels that customers can visit on their journey to conversion.

The markov_model() function takes the dataset and the names of the columns that contain the path of channels, the number of converters, and the number of non-converters. The out_more argument is set to TRUE to return the transition matrix and removal effects in addition to the attribution results.

The resulting object, mm_res, is a list. The next statement extracts its result component, which holds the number of conversions attributed to each channel by the Markov model, and pipes it to the gt::gt() function to render it as a table. The table has one row per channel, with the attributed conversions in the total_conversions column.

mm_res = markov_model(journeydf,
                      var_path = "path",
                      var_conv = "converters",
                      var_null = "non_converters",
                      out_more = TRUE)

## 
## Number of simulations: 100000 - Convergence reached: 3.25% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (34) is reached: 99.99%

mm_res$result %>% gt::gt()

channel_name	total_conversions
Search	271.7461
Display	256.4744
TV	253.7794

Calculate Conversions & Removal Effects

This code reproduces the attributed conversions for each marketing channel from the removal effects of the Markov model fitted in the previous code block.

The first statement calculates the total number of conversions by summing the converters column of the journeydf data frame.

The second statement extracts the removal_effects column from the removal_effects component of mm_res. A channel’s removal effect is the share of conversions that would be lost if that channel were removed from every journey. The extracted removal_effects object is a numeric vector with one value per channel.

The third statement calculates the relative removal effect of each channel by dividing the removal_effects vector by the sum of its values.

The fourth statement calculates the attributed conversions for each channel by multiplying the total number of conversions by the relative removal effects.

The fifth statement prints the attributed conversions, which match the total_conversions values returned by markov_model().

The sixth statement prints the removal_effects vector.

markov_conversions = sum(journeydf$converters)
removal_effects =  mm_res$removal_effects$removal_effects
relative_removal_effects = removal_effects/sum(removal_effects)
attributed_conversions = markov_conversions * relative_removal_effects
attributed_conversions

## [1] 271.7461 256.4744 253.7794

removal_effects

## [1] 0.7469136 0.7049383 0.6975309
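
To make the idea of a removal effect concrete, the sketch below computes it exactly for the transition matrix M that drives the simulation, rather than for the matrix estimated from the simulated journeys. Removing a channel is modeled by redirecting every transition into it to “Non-Conversion”, and a journey that would have started in the removed channel is counted as a non-conversion. This is an illustration under those assumptions, not the simulation-based procedure that markov_model() runs internally, so its numbers are not expected to match the printed output above. The names P, conversion_prob, removal_effects_exact, and attribution_share are hypothetical.

```r
states <- c("Non-Conversion", "Display", "TV", "Search", "Conversion")
P <- matrix(c(1,   0,   0,   0,   0,
              200, 0,   300, 350, 10,
              160, 300, 0,   350, 8,
              150, 200, 200, 0,   20,
              0,   0,   0,   0,   1),
            nrow = 5, byrow = TRUE, dimnames = list(states, states))
P <- P / rowSums(P)
channels <- c("Display", "TV", "Search")

# Overall conversion probability, optionally with one channel removed.
conversion_prob <- function(P, removed = NULL) {
  if (!is.null(removed)) {
    # Send every transition into the removed channel to "Non-Conversion".
    P[, "Non-Conversion"] <- P[, "Non-Conversion"] + P[, removed]
    P[, removed] <- 0
  }
  kept <- setdiff(channels, removed)
  Q <- P[kept, kept, drop = FALSE]   # channel -> channel among kept channels
  r <- P[kept, "Conversion"]         # channel -> Conversion
  conv_from <- solve(diag(length(kept)) - Q, r)
  # Uniform start over all channels; a start in the removed channel never converts.
  sum(conv_from) / length(channels)
}

base_prob <- conversion_prob(P)

# Removal effect: relative drop in conversion probability without the channel.
removal_effects_exact <- sapply(
  channels, function(ch) 1 - conversion_prob(P, ch) / base_prob)

# Normalizing the removal effects gives each channel's share of conversions.
attribution_share <- removal_effects_exact / sum(removal_effects_exact)
round(rbind(removal_effects_exact, attribution_share), 3)
```

Multiplying attribution_share by the total number of conversions gives attributed conversions in the same way as the code above.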

Create Heuristic Models for Attribution

This code uses the heuristic_models() function from the ChannelAttribution package to calculate attribution of conversions to marketing channels. The input data is the journeydf dataset, which contains information about simulated customer journeys. The var_path argument specifies the column name in journeydf that contains the path of channels and the var_conv argument specifies the column name in journeydf that contains the number of converters.

The heuristic_models() function computes three common attribution models from the input data: first-touch, last-touch, and linear-touch. The resulting attribution data is saved to the mta_res object.

The code then uses the left_join() function from the dplyr package to join the heuristic results with mm_res$result, the conversions attributed by the Markov model, matching on channel_name. The resulting data frame has one row per channel, with the heuristic attributions and the Markov attribution (total_conversions) side by side.

Finally, the resulting data frame is rendered as a table using the gt::gt() function from the gt package.

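# Keep mta_res as a plain data frame so it can be reshaped for plotting later
mta_res <- heuristic_models(journeydf, 
                 var_path = "path", 
                 var_conv = "converters") %>% 
  left_join(mm_res$result)

# Render the combined results as a table
mta_res %>% gt::gt()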

## Joining, by = "channel_name"

Visualization for Comparison

This code generates a grouped bar chart that compares how the attribution models (the heuristic models and the Markov model) assign credit to marketing channels. The models are plotted on the x-axis, and within each model there is one bar per channel, color-coded by channel. The height of each bar is the number of conversions attributed to that channel by that model. The chart is given the title “MTA Model Result Comparison” and a minimalistic theme.

mta_res %>% as.data.frame() %>%
  pivot_longer(-channel_name, 
               names_to = "MTA_Model", values_to = "Conversions") %>% 
  ggplot(aes(MTA_Model, Conversions, fill = channel_name)) + 
  geom_col(position = "dodge") +
  labs(title = "MTA Model Result Comparison")+
theme_minimal()

Conclusion

The attributed conversions from the Markov model are in the “total_conversions” column, which matches the numbers for each channel under Create Markov Model. The first three sets of columns are the heuristic models. The takeaway is that the method used to calculate MTA can have a substantial effect on the findings. Perhaps the best example is the difference in conversions attributed to Search, where the first-touch heuristic model lowers the conversions by almost 10%.

Multi-Touch Attribution Modeling

DW Richardson

2023-02-15