Halo Effect vs. Cannibalization Effect
author: Soheil (Sol) Sadeghi
date: 6-17-2015
width: 1400
height: 700
autosize: true
Problem Overview
Promotion on specific item(s) for a specific period
- Example: 10% off on peanut butter in the first week of March
This promotion might also affect the sales of other items:
- Halo effect: increase in sales
- Cannibalization effect: decrease in sales
Goal: Detect the items with possible halo/cannibalization effect!
Probabilistic View of Classification
- H: Halo Set
- C: Cannibalization Set
- N: Neutral Set
Different Perceptions:
- Classification: each item belongs to one set
- Probabilistic: each item has a probability of belonging to each set
Probabilities to Ranking to Classification
Estimate the probability of each item belonging to each set
Rank all the items based on some criterion
The top items belong to H, the bottom items belong to C, the rest belongs to N
Questions
How to estimate the probabilities?
What criterion should be used to rank the items?
What thresholding should be used to separate the items?
Model Overview
Item level model (data points are items)
For each item, define: Response variable and features
Each promotion is treated as a query
Item ranking is developed within queries
- Each query has its own ordered list of items
Response Variable
Within each query, for each item
S: Total sale for the promotion period
ES: Expected sale based on historical data
S/ES: Normalized sale
- (S-ES)/ES is essentially the same concept
- ES/S is an alternative
Response Variable: 3 labels based on the values of the normalized sale (a labeling sketch follows this list)
- (5, 1, 0) for normalized sale: halo is of main interest
- (5, 1, 0) for reverse normalized sale: cannibalization is of main interest
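As a minimal sketch, assuming hypothetical cutoffs on the normalized sale (the actual thresholds are a modeling choice the slides leave open), the labeling might look like:

```python
# Minimal sketch of the (5, 1, 0) labeling from the normalized sale S/ES.
# HALO_CUT and CANNIB_CUT are hypothetical; the real thresholds are a
# tuning choice not specified in the slides.

HALO_CUT = 1.10    # assumed: normalized sale above this suggests halo
CANNIB_CUT = 0.90  # assumed: normalized sale below this suggests cannibalization

def label_item(sale: float, expected_sale: float) -> int:
    """Map an item's normalized sale S/ES to a relevance label.

    5 = halo (of main interest here), 1 = neutral, 0 = cannibalization.
    For the reverse normalized sale ES/S, swap the roles of 5 and 0.
    """
    normalized = sale / expected_sale
    if normalized > HALO_CUT:
        return 5
    if normalized < CANNIB_CUT:
        return 0
    return 1

print(label_item(sale=130.0, expected_sale=100.0))  # 5: likely halo
print(label_item(sale=80.0, expected_sale=100.0))   # 0: likely cannibalization
```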
Motivation for Features
- Consider 4 different baskets
- Basket 1: peanut butter, nutella, bread, milk
- Basket 2: bread, jelly, butter, milk
- Basket 3: cheese, nuts, tea, bread
- Basket 4: cheese, peanut butter, bread, milk
- Peanut butter matched with bread/milk twice, nutella/cheese once
- It has four common items with bread/milk, two common items with the rest
- (match, common) takes values in {(2,4), (1,2), (0,2)}
- The match variable has a direct effect on the halo effect, but only an indirect effect on the cannibalization effect (through the common variable)
Motivation for Features - continued
The (normalized.match, normalized.common) values are as follows (a short sketch computing normalized.match follows the list):
- Nutella: (1 , 2/3)
- Milk: (2/3 , 2/3)
- Bread: (1/2 , 1/2)
- Cheese: (1/2 , 2/5)
- Jelly: (0 , 2/3)
- Butter: (0 , 2/3)
- Nuts: (0 , 2/3)
- Tea: (0 , 2/3)
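A minimal sketch computing match and normalized.match from the four baskets above, where normalized.match = (baskets containing both the item and the promoted item) / (baskets containing the item); this reproduces the values listed. The common feature admits several similar definitions and is omitted here for brevity.

```python
# Baskets from the example above; the promoted item is peanut butter.
baskets = [
    {"peanut butter", "nutella", "bread", "milk"},
    {"bread", "jelly", "butter", "milk"},
    {"cheese", "nuts", "tea", "bread"},
    {"cheese", "peanut butter", "bread", "milk"},
]

def normalized_match(item: str, promoted: str, baskets) -> float:
    """Fraction of the item's baskets that also contain the promoted item."""
    with_item = [b for b in baskets if item in b]
    matches = sum(1 for b in with_item if promoted in b)
    return matches / len(with_item)

for item in ["nutella", "milk", "bread", "cheese", "jelly"]:
    print(item, normalized_match(item, "peanut butter", baskets))
# nutella 1.0, milk 0.667, bread 0.5, cheese 0.5, jelly 0.0 -- as on the slide
```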
Features - continued
For each query:
- time
- time-related features
- promotion-related features
- promoted item-related features
- ?
Ranking Approach
Using the reverse normalized sale and defining the response as follows:
- 5: The item with cannibalization effect
- 1: The item with neutral effect
- 0: The item with halo effect
The labels are motivated by the NDCG criterion, which is used by the Bing search engine
What is Normalized Discounted Cumulative Gain (NDCG)?
For an ordered query \(j\):
\[
\small
DCG_j = \sum_{i=1}^{n_j} \frac{2^{Y_i} - 1}{\log(i+1)}
\]
IDCG: the ideal DCG for a given query
\[
\small
NDCG_j = \frac{DCG_j}{IDCG_j}
\]
and the final NDCG for the whole dataset (a short sketch computing these quantities follows the formula):
\[
\small
NDCG_F = \frac{1}{N_Q} \sum_{j=1}^{N_Q} NDCG_j
\]
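A small sketch of these formulas, assuming the common base-2 logarithm (the slide leaves the base unspecified):

```python
import math

# `labels` is the list of relevance labels (5, 1, 0) in the order the
# model ranked the items within one query.

def dcg(labels):
    # Position i (0-based) is discounted by log2(i + 2), so the top item
    # divides by log2(2) = 1, matching the formula above.
    return sum((2**y - 1) / math.log2(i + 2) for i, y in enumerate(labels))

def ndcg(labels):
    ideal = dcg(sorted(labels, reverse=True))  # IDCG: best possible ordering
    return dcg(labels) / ideal if ideal > 0 else 0.0

# One query: a perfect ranking scores 1.0, a shuffled one scores less.
print(ndcg([5, 5, 1, 1, 0]))  # 1.0
print(ndcg([0, 1, 5, 1, 5]))  # < 1.0

# Final NDCG: average of per-query NDCG over all queries.
queries = [[5, 1, 0], [1, 5, 0, 1]]
print(sum(ndcg(q) for q in queries) / len(queries))
```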
Motivation for NDCG using Classification Learning
- Perfect classifications naturally result in perfect DCG scores
- DCG errors are bounded by classification errors
- Even an improvement of 1% NDCG is often considered significant
BUT
- DCG criterion is non-convex and non-smooth
Classification to Ranking to Classification
- First learn the class probabilities by some soft classification algorithm
- I am using the Boosting Tree Algorithm for learning class probabilities
- Then score each data point according to the Expected Relevance:
\[
S_i = \sum_{k=0,1,5} T(k) \hat{p}_{i,k}
\]
where \(T(k)\) is a monotone (increasing) function, e.g., \(T(k)=k\) or \(T(k)=2^k\)
Within each query, rank the items based on \(S_i\); a short sketch of this scoring step follows
Top items belong to C, bottom items belong to H, and the rest belongs to N
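A minimal sketch of this scoring step, using scikit-learn's GradientBoostingClassifier as a stand-in for the boosting tree classifier; `X` and `y` are placeholders for the item-level features and the (0, 1, 5) labels:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder data: item-level features and (0, 1, 5) relevance labels.
X = np.random.rand(200, 4)
y = np.random.choice([0, 1, 5], size=200)

# Step 1: learn class probabilities with a soft classifier.
clf = GradientBoostingClassifier().fit(X, y)
proba = clf.predict_proba(X)  # columns ordered as clf.classes_ = [0, 1, 5]

# Step 2: expected relevance S_i = sum_k T(k) * p_hat[i, k], with T(k) = 2^k.
T = {k: 2**k for k in (0, 1, 5)}  # T(k) = k would also be a valid choice
weights = np.array([T[k] for k in clf.classes_])
scores = proba @ weights

# Step 3: within a query, rank items by score, descending.
ranking = np.argsort(-scores)
```

Sorting in descending order puts the items with the highest expected relevance (label 5, cannibalization under the reverse normalized sale) at the top, matching the "top items belong to C" rule.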
Stochastic Gradient Tree Boost Algorithm
- \(F_0(x) = \arg \min_{\rho} \sum_{i=1}^{N} L(y_i, \rho)\)
- \(\text{For } m = 1 \text{ to } M \text{ do:}\)
- \(\{\pi(i)\}_1^N = \text{rand-perm}\{i\}_1^N\)
- \(\tilde{y}_{\pi(i)m} = - \biggl[ \frac{\partial L(y_{\pi(i)}, F(x_{\pi(i)}))}{\partial F(x_{\pi(i)})} \biggr]_{F(x)=F_{m-1}(x)}; \quad i=1,\ldots,\tilde{N}\)
- \(\{R_{lm}\}_{1}^{L} = L\text{-terminal node tree}(\{\tilde{y}_{\pi(i)m}, x_{\pi(i)}\}_1^{\tilde{N}})\)
- \(\rho_{lm} = \arg \min_{\rho} \sum_{x_{\pi(i)} \in R_{lm}} L(y_{\pi(i)}, F_{m-1}(x_{\pi(i)}) + \rho); \quad l=1,\ldots,L\)
- \(F_m(x) = F_{m-1}(x) + \nu \cdot \rho_{lm} 1(x \in R_{lm})\)
- \(\text{End For}\)
- \(\text{End Algorithm}\)
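As a rough illustration (not the exact implementation used here), the loop specializes neatly to squared-error loss, where the negative gradient is the residual and the optimal leaf value \(\rho_{lm}\) is the leaf mean; the sketch below uses scikit-learn's DecisionTreeRegressor as the L-terminal-node tree, with illustrative hyperparameters:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stochastic_gtb(X, y, M=100, subsample=0.5, nu=0.1, max_leaf_nodes=8):
    """Stochastic gradient tree boosting for L(y, F) = (y - F)^2 / 2."""
    base = y.mean()                  # F_0: constant minimizer of the loss
    F = np.full(len(y), base)
    trees = []
    rng = np.random.default_rng(0)
    for _ in range(M):
        # Random permutation; keep the first N~ = subsample * N points.
        idx = rng.permutation(len(y))[: int(subsample * len(y))]
        residuals = y[idx] - F[idx]  # pseudo-responses (negative gradient)
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(X[idx], residuals)  # the L-terminal-node regression tree
        # For squared error, the fitted leaf means already equal rho_lm,
        # so the line-search step is implicit in the tree's predictions.
        F += nu * tree.predict(X)    # F_m = F_{m-1} + nu * rho_lm 1(x in R_lm)
        trees.append(tree)
    return base, trees

def predict(base, trees, X, nu=0.1):
    return base + nu * sum(t.predict(X) for t in trees)

# Toy usage on synthetic data.
X = np.random.rand(300, 3)
y = np.sin(3 * X[:, 0]) + 0.1 * np.random.randn(300)
base, trees = stochastic_gtb(X, y)
```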
Computational Cost Improvement
Questions?
Thank You