MSBA Data Analytics

Introduction (1)

  • Previously, we learned how to use binary variables as regressors (independent variables)
  • But in some cases we might be interested in learning how entity characteristics influence a binary dependent variable
  • For example, we might be interested in studying whether there is racial discrimination in the provision of loans
    • We are interested in comparing individuals who identify with different races but are otherwise identical
    • It is not sufficient to compare average loan denial rates, because applicants may differ in other characteristics that also affect denial

Introduction (2)

We will consider two forms of regression to analyze such situations

  1. Linear Probability Models, using OLS to do multiple regression analysis with a binary dependent variable
  2. Nonlinear Regression Models, which may provide a better fit for a binary dependent variable

The Math of Latent Dependent Variables

In economics, we believe people choose to do things that make them better off.

That is, they maximize utility.

Let \(y^*\) represent a person’s utility. We do not get to observe \(y^*\).

Instead, we observe \(y=1\) if \(y^*>0\) and \(y=0\) otherwise.

The Math of Latent Dependent Variables

\[y^* = \beta X + \epsilon \] Now assume \(F\) is the cumulative distribution function of \(\epsilon\) \[\begin{align*} Prob(y=1) &= Prob(y^*>0) \\ &= Prob(\beta X + \epsilon>0) \\ &= Prob(\epsilon>-\beta X) \\ &= 1-Prob(\epsilon<-\beta X) \\ &= 1-F(-\beta X) \end{align*}\]

If \(F\) is symmetric about 0, \[\begin{align*} Prob(y=1) &= 1-F(-\beta X) \\ &= F(\beta X) \end{align*}\]
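
To make the latent-variable setup concrete, here is a minimal simulation sketch (not from the original slides; the coefficient values are illustrative assumptions): we generate \(y^* = \beta_0 + \beta_1 x + \epsilon\) with standard normal errors, observe only whether \(y^* > 0\), and check that a probit regression recovers the coefficients.

set.seed(42)
n <- 5000
x <- rnorm(n)
ystar <- -1 + 2 * x + rnorm(n)  # latent utility; true beta0 = -1, beta1 = 2
y <- as.numeric(ystar > 0)      # we only observe the sign of y*
coef(glm(y ~ x, family = binomial(link = "probit")))  # estimates close to (-1, 2)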

Examples of Binary Dependent Variables

  • The provision of a mortgage loan
  • The decision to smoke/not smoke
  • The decision to go to college or not
  • If a country receives foreign aid or not

Racial Discrimination in Mortgage Loans

  • In this chapter we are interested in studying whether there is racial discrimination in the provision of mortgage loans.
  • Data compiled by researchers at the Boston Fed under the Home Mortgage Disclosure Act (HMDA)
  • The dependent variable in this example is a binary variable equal to
    • 1 if an individual’s mortgage application is denied
    • 0 otherwise

Effect of Payment-to-Income Ratio

Using a subset of the data on mortgages (\(n=127\))

Interpreting the OLS Regression (1)

  • Looking at the plot we see that when \(P/I~ratio = 0.3\), the predicted value \(\widehat{deny} = 0.2\).
  • What does it mean to predict a binary variable with a continuous value?
  • Using a linear probability model, we interpret this as predicting that someone with such a P/I ratio would be denied a loan with probability 20%.

Interpreting the OLS Regression (2)

  • Recall that the predicted value of an OLS regression is \[E[Y|X_1, \dots, X_k] = \beta_0 + \beta_1 X_{1} + \dots + \beta_k X_{k}\]

  • Recall that for a binary random (Bernoulli) variable \(Y\) \[\begin{align*} E[Y] &= Pr(Y=0)\times 0 + Pr(Y=1)\times 1 \\ &= Pr(Y=1) \end{align*}\]

  • In a regression context \[E[Y|X_1,\dots,X_k] = Pr(Y=1|X_1,\dots, X_k)\]

The Linear Probability Model

The linear probability model is

\[Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i\]

and therefore

\[Pr(Y = 1|X_1,\dots, X_k) = \beta_0 + \beta_1 X_{1} + \dots + \beta_k X_{k}\]

  • \(\beta_1\) is the change in the probability that \(Y=1\) associated with a unit change in \(X_1\).

R Squared in a LPM

  • In a model with a continuous dependent variable, one can imagine getting \(R^2 = 1\) when all the data lie exactly on the regression line.
  • This is impossible with a binary dependent variable, unless the regressors are also all binary.
  • Therefore, \(R^2\) from a LPM regression does not have a useful interpretation.

Application to the Boston HMDA Data

suppressMessages(library("AER"))
data(HMDA)
fm1 <- lm(I(as.numeric(deny) - 1) ~ pirat, data = HMDA)  # deny is a factor ("no"/"yes"); convert to 0/1

coeftest(fm1, vcov.=vcovHAC(fm1))
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.079910   0.032287 -2.4750   0.01339 *  
## pirat        0.603535   0.102917  5.8643 5.138e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
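
As a quick sanity check (a sketch, not in the original slides), we can compute the fitted denial probability at \(P/I~ratio = 0.3\) implied by fm1; note that the full-sample estimate differs from the 0.2 read off the earlier \(n=127\) subset plot.

# Fitted Pr(deny = 1) at a P/I ratio of 0.3 from the full-sample LPM:
predict(fm1, newdata = data.frame(pirat = 0.3))
## -0.0799 + 0.6035 * 0.3, roughly 0.10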

Application to the Boston HMDA Data

fm2 <- lm(I(as.numeric(deny) - 1) ~ pirat + afam, data = HMDA)

coeftest(fm2, vcov.=vcovHAC(fm2))
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.090514   0.029433 -3.0753  0.002127 ** 
## pirat        0.559195   0.091569  6.1068 1.184e-09 ***
## afamyes      0.177428   0.037471  4.7351 2.318e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Introduction to Non-Linear Probability Model

  • Since the fit of a linear probability model can produce nonsensical predictions (probabilities below 0 or above 1), we consider two alternative nonlinear regression models

  • Since cumulative distribution functions (CDFs) produce values between 0 and 1, we use them to model \(Pr(Y=1|X_1,\dots,X_k)\)

  • We use two types of nonlinear models

    1. Probit regression, which uses the CDF of the standard normal distribution
    2. Logit regression, which uses the logistic CDF

Probit Regression

The Probit regression model with a single regressor is

\[Pr(Y=1|X) = \Phi(\beta_0 + \beta_1 X)\]

where \(\Phi\) is the CDF of the standard normal distribution

Example

  • Consider the mortgage example, regressing loan denial on the P/I ratio
  • Suppose that \(\beta_0 = -2\) and \(\beta_1 = 3\)
  • What is the probability of being denied a loan if \(P/I~ratio = 0.4\)?

\[\begin{align*} \Phi(\beta_0 + \beta_1 P/I~ratio) &= \Phi(-2 + 3\times P/I~ratio)\\ &= \Phi(-0.8) \\ &= Pr(Z \leq -0.8) = 21.2\% \end{align*}\]where \(Z \sim N(0,1)\)
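
We can verify this calculation in R (a one-line check, not in the original slides):

pnorm(-2 + 3 * 0.4)  # Phi(-0.8) = 0.2119, i.e. about 21.2%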

Interpreting the Coefficient (1)

\[Pr(Y=1|X) = \Phi(\underbrace{\beta_0 + \beta_1 X}_{z})\]

  • \(\beta_1\) is the change in the \(z\)-value associated with a unit change in \(X\).
  • If \(\beta_1 > 0\), an increase in \(X\) would lead to an increase in the \(z\)-value and in turn the probability of \(Y=1\)
  • If \(\beta_1 < 0\), an increase in \(X\) would lead to a decrease in the \(z\)-value and in turn the probability of \(Y=1\)

Interpreting the Coefficient (2)

  • While the effect of \(X\) on the \(z\)-value is linear, its effect on \(Pr(Y=1)\) is nonlinear

Probit Model Graph

Multiple Regressor Probit

\[Pr(Y=1|X_1, X_2) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2)\]

  • Once again the parameters \(\beta_1\) and \(\beta_2\) represent the linear effect of a unit change in \(X_1\) and \(X_2\), respectively, on the \(z\)-value.
  • For example, suppose \(\beta_0 = -1.6\), \(\beta_1 = 2\), and \(\beta_2 = 0.5\). If \(X_1 = 0.4\) and \(X_2 = 1\), the probability that \(Y=1\) would be \(\Phi(-0.3) = 38\%\).

General Probit Model

\[Pr(Y=1|X_1, X_2,\dots, X_k) = \Phi(\underbrace{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}_z)\]

To calculate the effect of a change in a regressor (e.g. from \(X_1\) to \(X_1 + \Delta X_1\)) on \(Pr(Y=1|X_1,\dots,X_k)\), subtract \[\Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\] from \[\Phi(\beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + \dots + \beta_k X_k)\]
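
A short sketch of this subtraction in R, reusing the illustrative coefficients from the previous slide (\(\beta_0 = -1.6\), \(\beta_1 = 2\), \(\beta_2 = 0.5\), with \(X_2 = 1\)) and an assumed increase of \(X_1\) from 0.4 to 0.5:

base <- pnorm(-1.6 + 2 * 0.4 + 0.5 * 1)  # Phi(-0.3), about 0.382
newp <- pnorm(-1.6 + 2 * 0.5 + 0.5 * 1)  # after raising X1 by 0.1
newp - base                              # change in Pr(Y = 1), about 0.078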

Application to Mortgage Data (1)

fm3 <- glm(deny ~ pirat, family = binomial(link = "probit"), data = HMDA)
summary(fm3)
## 
## Call:
## glm(formula = deny ~ pirat, family = binomial(link = "probit"), 
##     data = HMDA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4140  -0.5281  -0.4750  -0.3900   2.8159  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.1941     0.1378 -15.927  < 2e-16 ***
## pirat         2.9679     0.3858   7.694 1.43e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1744.2  on 2379  degrees of freedom
## Residual deviance: 1663.6  on 2378  degrees of freedom
## AIC: 1667.6
## 
## Number of Fisher Scoring iterations: 6

Application to Mortgage Data (2)

fm4 <- glm(deny ~ pirat + afam, family = binomial(link = "probit"), data = HMDA)
summary(fm4)
## 
## Call:
## glm(formula = deny ~ pirat + afam, family = binomial(link = "probit"), 
##     data = HMDA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1208  -0.4762  -0.4251  -0.3550   2.8799  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.25879    0.13669 -16.525  < 2e-16 ***
## pirat        2.74178    0.38047   7.206 5.75e-13 ***
## afamyes      0.70816    0.08335   8.496  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1744.2  on 2379  degrees of freedom
## Residual deviance: 1594.3  on 2377  degrees of freedom
## AIC: 1600.3
## 
## Number of Fisher Scoring iterations: 5

Logit Regression

\[Pr(Y = 1|X_1, \dots, X_k) = F(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k) = \frac{1}{1+\exp\left(-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)\right)}\]

It is similar to the probit model, except that we use the CDF for a standard logistic distribution, instead of the CDF for a standard normal.
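
In R, the two CDFs are pnorm() and plogis(); a quick side-by-side comparison of the link functions (a sketch, not in the original slides):

z <- seq(-4, 4, by = 1)
round(cbind(z, probit = pnorm(z), logit = plogis(z)), 3)
# both rise from 0 to 1, but the logistic CDF has heavier tails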

Probit vs Logit Regression Models

Application to Mortgage Data (3)

fm5 <- glm(deny ~ pirat + afam, family = binomial(link = "logit"), data = HMDA)
summary(fm5)
## 
## Call:
## glm(formula = deny ~ pirat + afam, family = binomial(link = "logit"), 
##     data = HMDA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3709  -0.4732  -0.4219  -0.3556   2.8038  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -4.1256     0.2684 -15.370  < 2e-16 ***
## pirat         5.3704     0.7283   7.374 1.66e-13 ***
## afamyes       1.2728     0.1462   8.706  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1744.2  on 2379  degrees of freedom
## Residual deviance: 1591.4  on 2377  degrees of freedom
## AIC: 1597.4
## 
## Number of Fisher Scoring iterations: 5

Marginal Effects (1)

While it is straightforward to perform hypothesis tests on the \(\beta\)’s of a nonlinear model, the interpretation of these coefficients is difficult.

Instead, we have a preference for marginal effects.

Marginal Effects (2)

To find the marginal effects, we would need to take the derivative of the probability function and then find the expected value of the derivative.

Performing this task by hand is tedious. Instead, we will use a package called mfx.
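
For intuition, here is a minimal "by hand" sketch of the average marginal effect of pirat in the probit fm3: the derivative of \(\Phi(X\beta)\) with respect to pirat is \(\phi(X\beta)\,\beta_{pirat}\), averaged over the sample. (By default, probitmfx instead evaluates the effect at the means of the regressors, so the numbers need not match exactly.)

Xb <- predict(fm3, type = "link")     # fitted index X'beta for each observation
mean(dnorm(Xb)) * coef(fm3)["pirat"]  # average marginal effect of pirat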

Application to Mortgage Data (4)

suppressMessages(library("mfx"))
fm6 <- probitmfx(deny ~ pirat, data = HMDA)
fm7 <- probitmfx(deny ~ pirat + afam, data = HMDA)
fm8 <- logitmfx(deny ~ pirat + afam,  data = HMDA)
texreg::htmlreg(list(fm3,fm6,fm2,fm7), 
                custom.model.names = c("Probit","Marginal Effects Probit",
                "Linear Probability","Marginal Effects Probit"),
                center = TRUE, caption = "")

Application to Mortgage Data (4)

                 Probit     Marg. Eff. Probit   Linear Probability   Marg. Eff. Probit
(Intercept)      -2.19***                       -0.09***
                 (0.14)                         (0.02)
pirat             2.97***    0.57***             0.56***              0.50***
                 (0.39)     (0.07)              (0.06)               (0.07)
afamyes                                          0.18***              0.17***
                                                (0.02)               (0.02)
AIC              1667.58    1667.58                                  1600.27
BIC              1679.13    1679.13                                  1617.60
Log Likelihood   -831.79    -831.79                                  -797.14
Deviance         1663.58    1663.58                                  1594.27
Num. obs.        2380       2380                 2380                 2380
R2                                               0.08
Adj. R2                                          0.08
RMSE                                             0.31
***p < 0.001; **p < 0.01; *p < 0.05

Advantages and Disadvantages

  • LPM. Advantages: can use fixed effects; easy to interpret. Disadvantages: can predict probabilities outside 0 and 1.
  • Probit. Advantages: probabilities bounded between 0 and 1. Disadvantages: cannot use fixed effects (suffers from the incidental parameter problem); coefficients are hard to interpret.
  • Logit. Advantages: probabilities bounded between 0 and 1; can use one-way fixed effects (conditional logit). Disadvantages: suffers from the incidental parameter problem; coefficients are hard to interpret.

Multinomial Logit

Models with Multiple Choices

Examples of multinomial choice (polytomous) situations:

  1. Choice of a laundry detergent: Tide, Cheer, Arm & Hammer, Wisk, etc.
  2. Choice of a major: economics, marketing, management, finance or accounting.
  3. Choices after graduating from high school: not going to college, going to a private 4-year college, a public 4 year-college, or a 2-year college.

Firms also have such multinomial choices

  1. In which country to operate
  2. Where to locate a store
  3. Which CEO to hire

Multinomial Logit

The explanatory variable \(x_i\) is individual specific but does not change across alternatives, for example the age of the individual.

The dependent variable is nominal

Properties of multinomial choice situations:

  1. It is key that there are more than 2 choices

  2. It is key that there is no meaningful ordering to them. Otherwise we would want to use that information (with an ordered probit or ordered logit)

Multinomial Choice Probabilities

The probability that \(y\) is equal to choice \(i\) (here \(x\) is the individual’s characteristic, which does not vary across choices):

\[P_i=\frac{\exp(\beta_{0i}+\beta_{1i}x)}{\sum_{k=1}^{K} \exp(\beta_{0k}+\beta_{1k}x)}\] The probability that \(y\) is equal to choice \(j\):

\[P_j=\frac{\exp(\beta_{0j}+\beta_{1j}x)}{\sum_{k=1}^{K} \exp(\beta_{0k}+\beta_{1k}x)}\] The relative probability of the two choices: \[P_i/P_j = \frac{\exp(\beta_{0i}+\beta_{1i}x)}{\exp(\beta_{0j}+\beta_{1j}x)}\]

Relative Probabilities

We can only identify relative probabilities for each choice.

Similar to our discussion of dummy variables, we need to model our choices as relative to a base.

We set the base by forcing one of the choices to have \(\beta\)’s equal to zero.

If we do this for choice \(j\), then the relative probabilities can be expressed as \[P_i/P_j = \exp(\beta_{0i}+\beta_{1i}x)\]

IIA Property

  • There is an implicit assumption in logit models that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA)

One way to state the assumption

  • If choice A is preferred to choice B out of the choice set {A,B}, then introducing a third alternative X, thus expanding that choice set to {A,B,X}, must not make B preferable to A.

  • which kind of makes sense.

IIA Property


In the case of the multinomial logit model, the IIA implies that adding another alternative or changing the characteristics of a third alternative must not affect the relative odds between the two alternatives considered.

This is not realistic for many real life applications involving similar (substitute) alternatives.

Red Bus/Blue Bus (McFadden 1974).

Imagine commuters first face a decision between two modes of transportation: car and red bus.

Suppose that a consumer chooses between these two options with equal probability, 0.5, so that the odds ratio equals 1.

Now add a third mode, blue bus. Assuming bus commuters do not care about the color of the bus (the two buses are perfect substitutes), we would expect consumers to still split between bus and car with equal probability, so the probability of car remains 0.5, while each of the two bus types gets probability 0.25.

However, this is inconsistent with IIA: for the odds ratio between car and red bus to be preserved at 1, the new probabilities must be car 0.33, red bus 0.33, blue bus 0.33.

The IIA axiom does not mix well with perfect substitutes.

Alternatives to Multinomial Logit

The advantage of Multinomial Logit (and Logit for that matter) is that the probabilities have a closed form solution (i.e. a simple equation)

An alternative is to use multinomial probit.

Advantage: NO IIA property!

Disadvantage: Computationally intensive once the number of choices is greater than 3.

Multinomial Logit Example

A relatively common R function that fits multinomial logit models is multinom from package nnet.

Let us use the dataset nels_small for an example of how multinom works.

The variable grades in this dataset is an index, with the best grades represented by lower values.

We try to explain the choice of a secondary institution (psechoice) using only the high school grade.

Categorical Dependent Variable

The variable psechoice can take one of three values:

  • psechoice = 1: no college
  • psechoice = 2: two-year college
  • psechoice = 3: four-year college
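
The slides do not show the call that produces the nels.multinom object used on the next slide; here is a minimal sketch, assuming nels_small comes from the PoEdata package (an assumption based on the dataset name):

library(nnet)
# assumed data source; nels_small ships with the PoEdata companion package
data("nels_small", package = "PoEdata")
nels.multinom <- multinom(psechoice ~ grades, data = nels_small)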

R Code for Multinomial Logit

stargazer::stargazer(nels.multinom, type="html")
Dependent variable: psechoice (base category: 1 = no college)
                     2 (two-year college)   3 (four-year college)
grades               -0.309***              -0.706***
                     (0.052)                (0.053)
Constant              2.505***               5.770***
                     (0.418)                (0.404)
Akaike Inf. Crit.    1,758.626              1,758.626
Note: *p<0.1; **p<0.05; ***p<0.01

Interpreting Output

The output from the multinom function gives coefficient estimates for each level of the response variable psechoice, except for the first level, which serves as the benchmark.

We treat the dependent choice variable similarly to dummy variables: if there are \(G\) options, then we can only identify parameters associated with \(G-1\) options.

As in the probit and logit models, we can only identify relative differences.

That is, how much the probability of choosing option A over option B increases or decreases.

Making Predictions

Suppose we wanted to know the probabilities for the median student and for the student in the top 5%.

medGrades <- median(nels_small$grades)
fifthPercentileGrades <- quantile(nels_small$grades, .05)
newdat <- data.frame(grades=c(medGrades, fifthPercentileGrades))
pred <- predict(nels.multinom, newdat, "probs")
knitr::kable(pred, digits = 2)
            1      2      3
median    0.18   0.29   0.53
top 5%    0.02   0.10   0.89

We can clearly see that the high-performing student is much more likely to attend a four-year college.

Conditional Logit

Difference between Conditional and Multinomial Logit

Multinomial logit models a choice as a function of the chooser’s characteristics, whereas conditional logit models the choice as a function of the choice’s characteristics.

It’s really that simple! Note that the two can be combined.

Conditional Logit is a special case of Multinomial Logit

Difference between Conditional and Multinomial Logit

Multinomial logit \[U_{ij}=X_{i}\beta_j+e_{ij}\] Conditional logit \[U_{ij}=X_{ij}\beta+e_{ij}\]

Notice the difference in the subscripts. Conditional logit does not estimate different \(\beta\)’s for each choice. Instead, there is one set of parameters, but the characteristics \(X\) change with each product.

Important differences

Characteristics CANNOT vary only by individual. If they do, they fall out of the choice probabilities (see the proof below).

\[Pr(Y_i=j)=\frac{exp(\beta_1 X_{ij}+\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\beta_2 Z_i) }\]

You can never estimate \(\beta_2\) in this case.

Proof

\[\begin{align*} Pr(Y_i=j) &=\frac{exp(\beta_1 X_{ij}+\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\beta_2 Z_i) } \\ \\ &=\frac{exp(\beta_1 X_{ij})exp(\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik})exp(\beta_2 Z_i) } \\ \\ &=\frac{exp(\beta_1 X_{ij})exp(\beta_2 Z_i)}{exp(\beta_2 Z_i) \sum_{k=1}^{K}exp(\beta_1 X_{ik}) } \\ \\ &=\frac{exp(\beta_1 X_{ij})}{ \sum_{k=1}^{K}exp(\beta_1 X_{ik}) }\end{align*}\]

Fixed Effect Logit

The conditional logit model can effectively remove certain fixed effects.

Again, consider our previous example. \[\begin{align*}Pr(Y_i=j) &=\frac{exp(\beta_1 X_{ij}+\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\beta_2 Z_i) } \\ \\ &=\frac{exp(\beta_1 X_{ij}+\alpha_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\alpha_i) }\end{align*}\]

Warning: as the number of individuals \(N\) grows large (with few observations per individual), the maximum likelihood estimates are inconsistent. This is called the incidental parameter problem and is present in all non-linear models. In R, you can use the package bife to reduce this problem.
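
A hedged sketch of a fixed-effects logit on simulated panel data; the y ~ x | id formula interface and the bias_corr() post-estimation correction follow the bife documentation, and all data values below are made up:

library(bife)
set.seed(1)
N <- 200; T <- 10
id <- rep(1:N, each = T)
alpha <- rep(rnorm(N), each = T)                # individual fixed effects
x <- rnorm(N * T)
y <- as.integer(alpha + x + rlogis(N * T) > 0)  # logit errors
fe <- bife(y ~ x | id, data = data.frame(y, x, id))
summary(bias_corr(fe))  # analytical correction for the incidental parameter problem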

Mixed Logit

A mixed logit combines Conditional logit and multinomial logit

If you allow for random coefficients (unobserved heterogeneity also known as random effects), then the mixed logit model can overcome the IIA property.

Conditional/Multinomial/Mixed Logit in R

We will use the mlogit package in R

library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
## a pure "conditional" model
m1<-mlogit(mode ~ price + catch, data = Fish)
## a pure "multinomial model"
m2<-mlogit(mode ~ 0 | income, data = Fish)
## which can also be estimated using multinom (package nnet)
#library("nnet")
#summary(multinom(mode ~ income, data = Fishing))
## a "mixed" model
m <- mlogit(mode ~ price+ catch | income, data = Fish)
stargazer::stargazer(m1,m2,m,type="html")

Examples of Mixed Logit in R

Dependent variable: mode
                      (1) Conditional       (2) Multinomial      (3) Mixed
boat:(intercept)       0.871***              0.739***             0.527**
                      (0.114)               (0.197)              (0.223)
charter:(intercept)    1.499***              1.341***             1.694***
                      (0.133)               (0.195)              (0.224)
pier:(intercept)       0.307***              0.814***             0.778***
                      (0.115)               (0.229)              (0.220)
price                 -0.025***                                  -0.025***
                      (0.002)                                    (0.002)
catch                  0.377***                                   0.358***
                      (0.110)                                    (0.110)
boat:income                                  0.0001**              0.0001*
                                            (0.00004)             (0.0001)
charter:income                              -0.00003              -0.00003
                                            (0.00004)             (0.0001)
pier:income                                 -0.0001***            -0.0001**
                                            (0.0001)              (0.0001)
Observations           1,182                 1,182                 1,182
R2                     0.178                 0.014                 0.189
Log Likelihood        -1,230.784            -1,477.151            -1,215.138
LR Test                533.878*** (df = 5)   41.145*** (df = 6)    565.171*** (df = 8)
Note: *p<0.1; **p<0.05; ***p<0.01

Summary

  • Multinomial choice models are used when the dependent variable represents a choice between several options
  • Multinomial probit and multinomial logit are the most popular multinomial choice models
  • If choice is between J + 1 options, both will have J equations
  • Conditional logit is a special case of multinomial logit with only 1 equation
  • Maximum likelihood estimation is the most common way of estimating all these models
  • Explanatory variables can be characteristics of the options or of the individuals (or both) and lead to different models, so pay careful attention
  • Care must be taken with interpretation of coefficients/marginal effects

Demand Estimation with aggregate market shares

Micro versus Market level data

One of the advantages of having microdata is that you can estimate demand directly from individual consumer choices.

But often we do not get to observe individual transaction data.

Instead, we observe market level data where we know the market share for a certain product and product characteristics.

Another problem: price endogeneity?

Note: in deriving all these examples, an implicit assumption is that the distribution of the \(\epsilon_{ij}\)’s is independent of prices. This is analogous to assuming that prices are exogenous.

Case study: Trajtenberg (1989) study of demand for CAT scanners. Disturbing finding: coefficient on price is positive, implying that people prefer more expensive machines!

Possible explanation: quality differentials across products are not adequately controlled for. In differentiated product markets, where each product is valued on the basis of its characteristics, brands with highly desired characteristics (higher quality) may command higher prices. If any of these characteristics are unobserved, and hence not controlled for, we can have endogeneity problems: \(E[p\epsilon] \ne 0\).

Estimation with aggregate market shares

Next we consider how to estimate demand functions in the presence of price endogeneity, and when the researcher only has access to aggregate market shares.

This summarizes findings from Berry (1994).

Data

Our data for a particular market looks like this

j    \(\widehat{s}_{mj}\)    \(p_j\)     \(X_{1j}\)    \(X_{2j}\)
A    25%                     $1.50       red           large
B    35%                     $2.00       blue          small
C    45%                     $2.50       green         large

Total market size = \(M\)
Total number of brands = \(J\)

We want to use these data to estimate demand for different products using differences in market share and characteristics across different brands and different markets.

Model

Let a consumer’s utility function be \[U_{ijm}=X_{jm}\beta-\alpha p_{jm}+\xi_{jm}+\epsilon_{ijm}\] where i indexes consumer, j indexes product, and m indexes market.

Let \(\delta_{jm}=X_{jm}\beta-\alpha p_{jm}+\xi_{jm}\)

The econometrician observes neither \(\xi_{jm}\) nor \(\epsilon_{ijm}\), but household \(i\) observes both.

You can think of \(\xi_{jm}\) as a product-specific unobserved quality shock. It is easy to see that high-quality products (high \(\xi\)) would imply consumers are willing to pay more for the good, so price would be higher as well: \(E[\xi p] \neq 0\).

The other error is an idiosyncratic shock, which we will assume is distributed Type I Extreme Value across all consumers, brands, and markets.

Probabilities

Given our assumption on the idiosyncratic error, we can write the choice probabilities as conditional logit probabilities.

\[Pr(y_{ijm}=j)=\frac{exp(\delta_{jm})}{\sum_{k=1}^J exp(\delta_{km})}\]

Predicted market shares

We can’t estimate this probability directly, but we do observe the actual market shares. So we can transform this probability into a predicted market share

That is, if we knew the values of the \(\delta\)’s, then we could construct the predicted market shares.

Predicted market share

\[\widetilde{s}_{jm}=Pr(y_{ijm}=j)=\frac{exp(\delta_{jm})}{\sum_{k=1}^J exp(\delta_{km})}\]

We need to normalize one product such that \(\delta_{0m}=0\)

\[\widetilde{s}_{0m}=Pr(y_{ijm}=0)=\frac{1}{1+\sum_k exp(\delta_{km})}\]
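
A small numerical sketch (with hypothetical \(\delta\) values) of how mean utilities map into predicted shares once the outside option is normalized to \(\delta_{0m}=0\):

delta <- c(A = 1.0, B = 0.5, C = -0.2)    # hypothetical mean utilities
denom <- 1 + sum(exp(delta))              # the 1 is exp(delta_0) = exp(0)
c(outside = 1/denom, exp(delta)/denom)    # predicted shares; they sum to 1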

Transform shares

We can then use these predicted shares to make a linear equation by taking logs

\[\log(\widetilde{s}_{jm})-\log(\widetilde{s}_{0m})=\delta_{jm}=X_{jm}\beta-\alpha p_{jm}+\xi_{jm}\]

Construct our new dependent variable

Since we actually observe the market shares of each product in each market, we can construct our actual values of \(\delta_{jm}\).

Let \(s_{jm}\) be the actual market share of product j in market m.

Then the actual mean utility is \(\widehat{\delta}_{jm}=log({s}_{jm})-log({s}_{0m})\)
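
Going the other way (the inversion just described), with hypothetical observed shares:

s <- c(A = 0.20, B = 0.30, C = 0.40)  # hypothetical inside-good shares
s0 <- 1 - sum(s)                      # outside option share = 0.10
log(s) - log(s0)                      # delta-hat for each product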

Objective Function

We calculate this for every market. Our objective function becomes \[\begin{align*}E[\xi Z]&=\frac{1}{MM}\frac{1}{J}\sum_{m=1}^M \sum_{j=1}^J \xi_{jm}Z_{jm} \\ \\ &=\frac{1}{MM}\frac{1}{J}\sum_{m=1}^M \sum_{j=1}^J [\widehat{\delta}_{jm}-X_{jm}\beta-\alpha p_{jm}]Z_{jm}\end{align*}\]

Where Z is a set of instrumental variables and MM is the total number of markets.

Basically, we reduce the whole problem down to an instrumental variables problem.

What are appropriate instruments?

  • Cost shifters (prices of raw materials or workers; industry dependent)
  • Characteristics of competitors’ products
  • If panel, the price of the same good, but in other markets

Berry 1994 in R: Data

blpdata<-hdm::BLP$BLP #BLP Data
blpZ<-hdm::BLP$Z #Instrumental Variables
blpZ<-as.data.frame(blpZ)
blp.var.names=c("model name","model id","firm id","cdid",
                "log price","miles per gallon","miles per dollar",
                "horse power per weight","air conditioning",
                "size of the car","market share","outside option share",
                "log share j - log share 0","time trend")
ols.1<-lm(y~price+hpwt+space+mpg+air+mpd+factor(model.id), data=blpdata)
# ivreg() comes from the AER package (loaded earlier); the instruments are sums
# of characteristics of the firm's other products and of rival products
iv.1<-ivreg(y~price+hpwt+space+mpg+air+mpd+factor(model.id)|hpwt+space+mpg
            +air+mpd+factor(model.id)+blpZ$sum.other.1+blpZ$sum.other.hpwt
            +blpZ$sum.other.air+blpZ$sum.other.mpd
            +blpZ$sum.other.space+blpZ$sum.rival.1+blpZ$sum.rival.hpwt
            +blpZ$sum.rival.air+blpZ$sum.rival.mpd+blpZ$sum.rival.space, data=blpdata)

Berry 1994 in R: Results

Dependent variable: y (log share j - log share 0)
                                  (1) OLS      (2) IV
price                             -0.085***    -0.197***
                                  (0.009)      (0.021)
Horse power per wgt                0.065        0.754***
                                  (0.246)      (0.285)
size of car                        0.945***     0.952***
                                  (0.197)      (0.207)
miles per gallon                   0.016        0.004
                                  (0.063)      (0.067)
air conditioning                   0.165**      0.457***
                                  (0.067)      (0.087)
miles per dollar                  -0.018        0.018
                                  (0.040)      (0.043)
Constant                          -1.030**     -1.734***
                                  (0.419)      (0.457)
Observations                       2,217        2,217
R2                                 0.874        0.861
Adjusted R2                        0.831        0.814
Residual Std. Error (df = 1654)    0.567        0.596
F Statistic (df = 562; 1654)       20.446***
Note: *p<0.1; **p<0.05; ***p<0.01