The business problem

Technical problem 1: aggregate data only

  • Ideally we'd want data on individual sales/decision-makers
  • Data on competitors is normally only available in aggregate: total sales/market shares

If each individual \(i\) has information \(\mathcal{I}\) about a product, which maps to their choice through their own decision function \(\text{choice}_{i}()\), then

\[ E\left[\sum_{i}\Delta \text{choice}_{i}(\mathcal{I}) |\, \Delta \mathcal{I}\right] \] does not necessarily equal

\[ E\left[\Delta \text{Aggregate choice}(\mathcal{I}) | \Delta \mathcal{I} \right] \]
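To see why, consider a toy example (numbers purely illustrative): two decision-makers with logistic decision functions, slopes \(\beta_{1} = 0.2\) and \(\beta_{2} = 5\), evaluated at \(\mathcal{I} = 1\). The average of the individual choice probabilities is not the choice probability of an "aggregate" decision-maker with the average slope \(\bar{\beta} = 2.6\):

\[ \tfrac{1}{2}\left[\text{logit}^{-1}(0.2) + \text{logit}^{-1}(5)\right] \approx \tfrac{1}{2}(0.550 + 0.993) = 0.771 \neq 0.931 \approx \text{logit}^{-1}(2.6) \]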

Technical problem 2: endogeneity in price-setting

Price-setters take unobserved product information into account. But this same information also drives sales, so price is correlated with the demand shock.

Technical problem 3: strategy in price-setting

Actions (product offering, pricing choice, strategic partnerships) invite responses from competitors.

All observations come from an equilibrium in which this is the case.

If you don't take this into account, your predictions will be biased.

The aggregate random coefficients logit model

The generative model at the level of the individual

The random coefficients logit model (also known as "mixed logit") is

\[ U_{ijt} = X_{jt}\beta_{i} + \xi_{jt} + \epsilon_{ijt} \text{ with } \epsilon_{ijt} \sim \text{Gumbel}(0, 1) \]

\(\beta_{i}\) are individual slope coefficients on observed product attributes \(X_{jt}\); \(\xi_{jt}\) is each product's demand shock, assumed to be known to decision-makers but unobserved by us.

The decision-maker \(i\) in market \(t\) simply chooses the product \(j\), out of \(J + 1\) available products, that gives them the greatest utility \(U_{ijt}\).
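One convenient rewriting, which is what the delta argument in the Stan code below corresponds to, splits utility into a product-market mean \(\delta_{jt}\) and an individual deviation, where \(\bar{\beta}\) is the average of the \(\beta_{i}\):

\[ U_{ijt} = \underbrace{X_{jt}\bar{\beta} + \xi_{jt}}_{\delta_{jt}} + X_{jt}(\beta_{i} - \bar{\beta}) + \epsilon_{ijt} \]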

From utilities to choice probabilities

Because we've made the assumption that \(\epsilon_{ijt}\) are IID Gumbel, it follows that for \(U_{it} = (U_{i1t}, \dots, U_{iJt})'\)

\[ p_{ijt}(\beta_{i}, \xi_{jt}) = \text{Prob}(U_{ijt} = \max(U_{it})) = \frac{\exp(X_{jt}\beta_{i} + \xi_{jt})}{1 + \sum_{k=1}^{J}\exp(X_{kt}\beta_{i} + \xi_{kt})} \] where the 1 in the denominator comes from the \((J+1)\)'th "outside good", whose mean utility is 0.

See Luce and Suppes (1965) and McFadden (1974).
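As a minimal Stan sketch of this formula (the function name here is ours, not part of the model code below): the outside good is handled by appending its zero utility before taking the softmax.

vector choice_probs(vector utility) {
  // utility[j] = X_jt * beta_i + xi_jt for the J inside goods;
  // appending the outside good's 0 reproduces exp(u_j) / (1 + sum_k exp(u_k))
  return softmax(append_row(utility, 0));
}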

What have we done?

  • We've gone from a set of unknowns \(\beta_{i},\, \xi_{jt}\) and observed data \(X_{jt}\) to choice probabilities at the individual level.

  • We now need to aggregate across individuals.

From choice probabilities to aggregate sales

\[ \text{Market sales}_{t} \sim \text{Multinomial}(\text{shares}_{t}, \text{Market size}_{t}) \] for \(p_{it} = (p_{i1t}, \dots, p_{iJt})'\) and

\[ \text{shares}_{t}(\xi_{t}) = \int p_{it}(\beta_{i}, \xi_{t})\,p(\beta_{i})\,d\beta_{i} \]

So in order to aggregate up to market sales, all we need to do is average the individual-level choice probabilities across individuals.
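In practice this integral has no closed form, so we approximate it by simulation, averaging over \(I\) draws from the distribution of \(\beta\) (this is exactly what the Stan code below does):

\[ \text{shares}_{t}(\xi_{t}) \approx \frac{1}{I}\sum_{i=1}^{I} p_{it}(\beta_{i}, \xi_{t}), \quad \beta_{i} \sim p(\beta) \]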

Price-setting

Remember that \(\xi \rightarrow U\) and \(\xi \rightarrow \text{price}\).

  • We can't implement \(\xi_{jt}\) as a straight random effect
    • There are as many \(\xi_{jt}\) as there are observations
    • \(\xi_{jt}\) is correlated with price, which is in \(X_{jt}\).

First idea (Betancourt and Gelman)

  • Just treat it as a latent factor, à la a structural equation model (SEM).

\[ \text{price}_{jt} \sim \text{Normal}_{+}(X_{jt}\gamma + Z_{jt}\delta + \lambda \xi_{jt}, \sigma) \]
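A minimal Stan sketch of this pricing equation (variable names are assumed, not from the original program; the key point is that the same \(\xi\) appears in both the utility model and here):

model {
  // xi is a parameter vector shared with the demand side;
  // T[0,] truncates the normal below at zero, i.e. Normal+
  for (n in 1:N) {
    price[n] ~ normal(X[n] * gamma + Z[n] * delta + lambda * xi[n], sigma) T[0,];
  }
}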

This reduced-form supply side is clearly ridiculous, though.

A simple structural model

A very simple "structural" supply model might be to assume that each producer in each market maximizes within-period profits.

Idea:

  • Assume firms are profit-maximizing: each chooses price so that \((\text{price} - \text{cost}) \times \text{predicted sales}\) is maximized
  • Take the derivative with respect to price, set it to 0, and rearrange in terms of price (derived below)
  • Model the cost function with product characteristics \(X\) and cost-shifting instruments \(Z\)
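Concretely (assuming, for simplicity, single-product firms): write within-period profit for product \(j\) as \(\pi_{jt} = (\text{price}_{jt} - c_{jt})\,s_{j}(\theta, X_{t})\); market size multiplies both terms and cancels. Setting the derivative with respect to price to zero,

\[ s_{j} + (\text{price}_{jt} - c_{jt})\frac{\partial s_{j}}{\partial \text{price}_{jt}} = 0 \quad\Rightarrow\quad \text{price}_{jt} = c_{jt} - \frac{s_{j}}{\partial s_{j}/\partial \text{price}_{jt}} \]

Modeling marginal cost as \(c_{jt} = \alpha + X_{jt,-\text{price}}\,\gamma + Z_{jt}\,\delta\) plus noise gives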

\[ \text{price}_{jt} \sim \text{Normal}_{+}\left(\alpha + X_{jt,-\text{price}}\,\gamma + Z_{jt}\,\delta - \frac{s_{j}(\theta, X_{t})}{\partial s_{j}(\theta, X_{t})/\partial \text{price}_{jt}}, \sigma_{p}\right) \]

Implementing in Stan

Idea (from Berry, Levinsohn and Pakes):

  • Draw a matrix of \(\text{Normal}(0,1)\) "data" \(\eta_{I\times P}\). Each row represents a simulated decision-maker; each column corresponds to a column of \(X\).
  • Simulate multivariate normal coefficients \(\beta_{i} = \bar{\beta} + L\eta_{i}\), where \(L\) is the Cholesky factor of their covariance matrix.
  • This gives \(I\) "fake" decision-makers. Average their choice probabilities.

Implementing in Stan

vector get_shares(vector delta, matrix x, matrix eta, matrix L) {
  int I = rows(eta);    // number of simulated decision-makers
  int J = rows(delta);  // number of (inside) products
  matrix[I, J] utils;
  matrix[I, J] probs;
  vector[J] shares;

  for(i in 1:I) {
    // mean utilities delta plus the individual deviation X * (beta_i - beta_bar);
    // beta_i = beta_bar + L * eta_i, so in row form the deviation is eta[i] * L'
    utils[i] = delta' + eta[i] * L' * x';
    // append the outside good's zero utility, then keep the J inside goods,
    // matching exp(u_j) / (1 + sum_k exp(u_k)) above
    probs[i] = softmax(append_row(utils[i]', 0))[1:J]';
  }

  for(j in 1:J) {
    shares[j] = mean(probs[:, j]);
  }

  return shares;
}
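To close the loop with the sales model above, a hedged sketch of how get_shares might be called in the model block (the data structures and names here are assumed, not from the original program):

model {
  // ... priors on beta_bar, L, xi, etc. ...
  for (t in 1:T) {
    vector[J] shares = get_shares(delta[t], X[t], eta, L);
    // sales[t] has J+1 entries: the J products, then the outside good
    // (market size minus total product sales)
    sales[t] ~ multinomial(append_row(shares, 1 - sum(shares)));
  }
}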

Parameter recapture exercise

Want to get in touch?