The Multinomial Model

Christopher Weber

The Nominal Model

These notes follow Long (1997), Chapter 6
Often, dependent variables don’t have a natural ordering
If we have multi-category nominal data, we will again violate the assumptions of the classical linear regression model
In the case of nominal data, we can again build on the intuition of logit and probit for binary variables
The Multinomial Logit and Multinomial Probit

An Example

Voting (1=Democrat; 2=Republican; 3=Libertarian)
Run one logit model predicting the probability of Democrat relative to Republican voting
Run a second model predicting Democrat versus Libertarian
Run a third model predicting Republican versus Libertarian

Intuition

Long (1997), Chapter 6
Assume \(y_{obs} \in \{R, D, L\}\).

\[ln\left(\frac{pr(D|x)}{pr(R|x)}\right)=\beta_{0,D|R}+\beta_{1,D|R}x\] \[ln\left(\frac{pr(D|x)}{pr(L|x)}\right)=\beta_{0,D|L}+\beta_{1,D|L}x\] \[ln\left(\frac{pr(R|x)}{pr(L|x)}\right)=\beta_{0,R|L}+\beta_{1,R|L}x\]

Intuition

\[\frac{pr(D|x)}{pr(R|x)}=exp(\beta_{0,D|R}+\beta_{1,D|R}x)\] \[\frac{pr(D|x)}{pr(L|x)}=exp(\beta_{0,D|L}+\beta_{1,D|L}x)\] \[\frac{pr(R|x)}{pr(L|x)}=exp(\beta_{0,R|L}+\beta_{1,R|L}x)\]
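
The three comparisons above are linked: since \(ln(pr(R|x)/pr(L|x)) = ln(pr(D|x)/pr(L|x)) - ln(pr(D|x)/pr(R|x))\), two of them pin down the third. A minimal Python sketch checks this numerically. The coefficient values are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical (intercept, slope) pairs for two of the three comparisons.
b_DR = (0.4, -0.8)   # Democrat vs. Republican
b_DL = (1.5, 0.3)    # Democrat vs. Libertarian

def probabilities(x):
    """Recover pr(D), pr(R), pr(L) from the two log-odds equations."""
    a = b_DR[0] + b_DR[1] * x   # ln(pr(D)/pr(R))
    b = b_DL[0] + b_DL[1] * x   # ln(pr(D)/pr(L))
    p_D = 1.0 / (1.0 + math.exp(-a) + math.exp(-b))
    p_R = p_D * math.exp(-a)
    p_L = p_D * math.exp(-b)
    return p_D, p_R, p_L

x = 2.0
p_D, p_R, p_L = probabilities(x)

# Log-odds of R vs. L implied by the probabilities ...
implied_RL = math.log(p_R / p_L)
# ... and the same quantity built from coefficient differences.
from_coefs = (b_DL[0] - b_DR[0]) + (b_DL[1] - b_DR[1]) * x

print(implied_RL, from_coefs)
```

The two printed values agree, so a separately estimated R versus L equation adds no new information.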

Intuition

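However, the third equation is just the second minus the first: \(ln(pr(R|x)/pr(L|x)) = ln(pr(D|x)/pr(L|x)) - ln(pr(D|x)/pr(R|x))\). We need not estimate all three models; one is redundant (and the full set of parameters is not identified)
Calculate the probability of being in the \(k\)th category

\[pr(y=k|x)=\frac{exp(x\beta_{k})}{\sum_{j=1}^K exp(x\beta_{j})}\]

Multiply the numerator and denominator of the above expression by \(exp(x\tau)\), that is, by \(exp(x\tau)/exp(x\tau)=1\)
The probabilities stay the same, but every \(\beta_k\) becomes \(\beta_k+\tau\), so the \(\beta_k\) are not identified without a constraint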
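
The identification problem can be seen numerically. The sketch below adds an arbitrary constant \(\tau\) to every slope and recomputes the probabilities. The slopes, the value of `tau`, and the helper name `softmax_probs` are all hypothetical.

```python
import math

def softmax_probs(x, betas):
    """pr(y=k|x) = exp(x*beta_k) / sum_j exp(x*beta_j), one slope per category."""
    scores = [math.exp(x * b) for b in betas]
    total = sum(scores)
    return [s / total for s in scores]

betas = [0.5, -1.0, 2.0]               # hypothetical slopes for D, R, L
tau = 3.7                              # arbitrary shift
shifted = [b + tau for b in betas]     # beta_k + tau for every category

x = 0.8
p_original = softmax_probs(x, betas)
p_shifted = softmax_probs(x, shifted)

# Fixing the first category's slope at zero picks one member of this family.
normalised = [b - betas[0] for b in betas]
p_normalised = softmax_probs(x, normalised)

print(p_original)
print(p_shifted)
print(p_normalised)
```

All three lines print the same probabilities, which is why one category's coefficients are fixed at zero.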

Instead

\[ pr(y_{obs}=m|X) = \begin{cases} \dfrac{1}{1+\sum_{k=2}^K exp(XB_k)}, & m=D \\ \dfrac{exp(XB_{R})}{1+\sum_{k=2}^K exp(XB_k)}, & m=R \\ \dfrac{exp(XB_{L})}{1+\sum_{k=2}^K exp(XB_k)}, & m=L \end{cases} \]

We estimate \(K-1\) unique equations, where one category (here, D, with \(B_D=0\)) serves as the baseline, or reference, category

The Likelihood

The probability of being in the \(k\)th category for the \(i\)th subject is,

\[ pr(y_{i}=k|x_i) = \frac{exp(x_iB_k)}{\sum_{j=1}^K exp(x_iB_j)} \]

Let \(d_{ik}=1\) if subject \(i\) is observed in category \(k\), and \(d_{ik}=0\) otherwise
Calculate the joint probability, \(pr(y_{i}=1|X_i)^{d_{i1}}\times pr(y_{i}=2|X_i)^{d_{i2}} \times pr(y_{i}=3|X_i)^{d_{i3}} \times \dots \times pr(y_{i}=K|X_i)^{d_{iK}}\)

Only the observed category contributes to each subject's term, so

The Likelihood

\[ pr(y_{i}|X_i) = \prod_{k=1}^K \left[\frac{exp(x_iB_k)}{\sum_{j=1}^K exp(x_iB_j)}\right]^{d_{ik}} \]

\[ pr(y|X) = \prod_{i=1}^N \prod_{k=1}^K \left[\frac{exp(x_iB_k)}{\sum_{j=1}^K exp(x_iB_j)}\right]^{d_{ik}} \]

The Log Likelihood

\[ Loglik(B | y, X) = \sum_{i=1}^N \sum_{k=1}^K d_{ik}\, log\left[\frac{exp(x_iB_k)}{\sum_{j=1}^K exp(x_iB_j)}\right] \]
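
As a rough illustration, the log likelihood can be computed directly: each subject contributes the log probability of the category actually observed. The data, the coefficient values, and the function names below are made up for the sketch. A real analysis would maximize this function with an optimizer or a statistics package.

```python
import math

def category_probs(x, coefs):
    """Probabilities for each category, given (intercept, slope) pairs."""
    scores = [math.exp(b0 + b1 * x) for b0, b1 in coefs]
    total = sum(scores)
    return [s / total for s in scores]

def log_likelihood(xs, ys, coefs):
    """Sum over subjects of log pr(observed category | x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += math.log(category_probs(x, coefs)[y])
    return total

# Toy data: one covariate; outcomes coded 0=D, 1=R, 2=L.
xs = [-1.0, 0.0, 0.5, 1.5, 2.0]
ys = [0, 0, 1, 2, 1]

# All coefficients zero: every category has probability 1/3.
null_coefs = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
# Hypothetical alternative, with category 0 as the baseline.
alt_coefs = [(0.0, 0.0), (-0.5, 0.9), (-1.5, 0.2)]

ll_null = log_likelihood(xs, ys, null_coefs)
ll_alt = log_likelihood(xs, ys, alt_coefs)
print(ll_null, ll_alt)
```

With all coefficients at zero the result is \(N \times log(1/3)\), a useful sanity check on the function.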

Interpretation

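With \(K\) categories, there are \(K-1\) unique equations in the multinomial logit model. In other words, if we include 2 covariates and there are 3 categories, we would estimate six parameters (an intercept and two slopes in each of the two equations)
The partial derivative (the marginal effect) depends on the level of \(x\)

\[\frac{\partial pr(y=k|x)}{\partial x}=pr(y=k|x)\left[\beta_{k}-\sum_{j=1}^J \beta_{j}\,pr(y=j|x)\right]\]

Here \(\beta_j\) is the coefficient on \(x\) in the equation for category \(j\) (zero for the baseline category)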
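
A short sketch of the marginal effect, using the standard multinomial logit result \(\partial pr(y=k|x)/\partial x = pr(y=k|x)[\beta_k - \sum_j \beta_j pr(y=j|x)]\) and checking it against a finite-difference approximation. The coefficients and function names are hypothetical.

```python
import math

# Hypothetical (intercept, slope) pairs; the first category is the baseline.
coefs = [(0.0, 0.0), (-0.5, 0.9), (-1.5, 0.2)]

def probs(x):
    scores = [math.exp(b0 + b1 * x) for b0, b1 in coefs]
    total = sum(scores)
    return [s / total for s in scores]

def marginal_effects(x):
    """Analytic d pr(y=k|x) / dx for every category k."""
    p = probs(x)
    # Probability-weighted average of the slopes at this x.
    avg_slope = sum(b1 * pj for (_, b1), pj in zip(coefs, p))
    return [pk * (b1 - avg_slope) for (_, b1), pk in zip(coefs, p)]

def numeric_effects(x, h=1e-6):
    """Central finite-difference approximation to the same derivatives."""
    up, down = probs(x + h), probs(x - h)
    return [(u - d) / (2 * h) for u, d in zip(up, down)]

for x in (-1.0, 0.0, 2.0):
    print(x, marginal_effects(x), numeric_effects(x))
```

The effects differ across values of \(x\) and sum to zero across categories, since the probabilities must always sum to one.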

Interpretation

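The key point is that one category serves as the baseline, and we interpret the coefficients for the other \(k-1\) categories relative to it
To test whether a variable \(x_k\) has any effect, test that its coefficient is zero in every equation (where \(r\) is the reference category)

\[H_0: \beta_{k,1|r}=\beta_{k,2|r}=\dots=\beta_{k,J|r}=0\]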

Interpretation

Likewise, we may also calculate the predicted probability of being in the \(k\)th category, given a particular value of \(x\).

\[pr(y=k|x)=\frac{exp(xB_k)}{\sum_{j=1}^J exp(xB_j)}\]
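
For example, predicted probabilities can be tabulated at a few values of \(x\). This is only a sketch: the coefficients are hypothetical, and Democrat is assumed to be the baseline category with its coefficients fixed at zero.

```python
import math

# Hypothetical (intercept, slope) pairs; D is the baseline category.
coefs = {"D": (0.0, 0.0), "R": (-0.5, 0.9), "L": (-1.5, 0.2)}

def predicted_probs(x):
    """pr(y=k|x) for each category k at a given value of x."""
    scores = {k: math.exp(b0 + b1 * x) for k, (b0, b1) in coefs.items()}
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}

for x in (-2, 0, 2):
    probs = predicted_probs(x)
    print(x, {k: round(p, 3) for k, p in probs.items()})
```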

Independence of Irrelevant Alternatives

The multinomial models make a relatively strong assumption about the choice process
It is called the Independence of Irrelevant Alternatives (IIA) assumption
The odds contrasting two choices are unaffected by additional alternatives
McFadden (cited in Long 1997, p. 182) introduces the now classic Red Bus/Blue Bus example

Transportation

The logic:
Say there are two forms of transportation available in a city: the city bus and driving one’s car.
If an individual is indifferent between the two, using each about equally, then \(p(car)=0.5\) and \(p(bus)=0.5\)
The odds of taking the bus relative to the car are 1:1. The buses in the city are all red

Irrelevance?

The city introduces a second bus on this individual’s route
The only difference is that the new bus is blue
Because the blue bus is identical (with the exception of the color), the individual probably doesn’t prefer it over the red bus
IIA holds only if \(p(car)=0.33, p(Red)=0.33, p(Blue)=0.33\), so that the odds of the red bus relative to the car remain 1:1

Irrelevance?

This doesn’t make much sense; it implies that the individual now favors the bus over driving, since the probability of taking a bus is 2/3
Logically, what we should observe is that \(p(drive)=0.5, p(red)=0.25, p(blue)=0.25\). This violates IIA: the odds of the red bus relative to the car fall from 1:1 to 1:2
The only way for IIA to hold is if the associated probabilities change so that \(p(car)=p(red)\) still holds
But we are unlikely to observe this if we logically think about the problem
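
The arithmetic of the example can be laid out in a few lines. The sketch below compares the odds of the red bus relative to the car in the three scenarios. The helper names are hypothetical, and equal utilities are assumed for every alternative.

```python
import math

def odds(p, a, b):
    """Odds of alternative a relative to alternative b."""
    return p[a] / p[b]

def logit_probs(utilities):
    """Multinomial logit choice probabilities from a dict of utilities."""
    scores = {k: math.exp(u) for k, u in utilities.items()}
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}

before = {"car": 0.5, "red": 0.5}                      # only red buses exist
iia = logit_probs({"car": 0.0, "red": 0.0, "blue": 0.0})  # what the logit form implies
intuitive = {"car": 0.5, "red": 0.25, "blue": 0.25}    # what we expect to observe

print(odds(before, "red", "car"))     # 1.0
print(odds(iia, "red", "car"))        # 1.0, IIA preserved
print(odds(intuitive, "red", "car"))  # 0.5, IIA violated
print(iia["red"] + iia["blue"])       # probability of any bus under IIA
```

The logit form keeps the red bus to car odds at 1:1 only by pushing the total bus share to 2/3.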

Tests

The odds of selecting the red bus, relative to the car, should be the same regardless of whether blue buses are available
We need to make the IIA assumption in both the multinomial and conditional logit models
Voting (Bush and Clinton 1992)
IIA implies that the odds (i.e., the coefficients) should be the same whether or not an alternative is included in the model. This can be tested by using a “Hausman test”

The Hausman Test

Conceptually, the test involves comparing the full multinomial model to one where outcome categories are dropped from the analysis
The test statistic is distributed \(\chi^2\) and relies on the difference in coefficients between the restricted and full models, weighted by the inverse of the difference between their variance-covariance matrices
See Long (1997, p. 184) for the exact calculation. This is often called a Hausman test, or a Hausman-McFadden test, of IIA
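
A minimal sketch of the statistic, \(H = (b_R - b_F)'[V_R - V_F]^{-1}(b_R - b_F)\), for a case with two coefficients. All numbers are invented for illustration and do not come from any fitted model. `b_F` and `V_F` stand for the full-model estimates of the coefficients that remain in the restricted model. With two coefficients the degrees of freedom are 2, and the \(\chi^2_2\) upper-tail probability is \(exp(-H/2)\).

```python
import math

# Invented estimates: restricted model (one outcome category dropped) and
# the matching coefficients from the full model.
b_R = [0.95, -0.42]
b_F = [0.80, -0.50]
V_R = [[0.060, 0.008], [0.008, 0.045]]
V_F = [[0.040, 0.005], [0.005, 0.030]]

def hausman_2x2(b_r, b_f, v_r, v_f):
    """Hausman-McFadden statistic for exactly two coefficients."""
    d = [b_r[0] - b_f[0], b_r[1] - b_f[1]]
    # Difference of the two variance-covariance matrices.
    a = v_r[0][0] - v_f[0][0]
    b = v_r[0][1] - v_f[0][1]
    c = v_r[1][1] - v_f[1][1]
    det = a * c - b * b
    # Quadratic form d' [V_R - V_F]^{-1} d with the 2x2 inverse written out.
    return (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det

H = hausman_2x2(b_R, b_F, V_R, V_F)
p_value = math.exp(-H / 2)   # chi-square survival function, valid for df = 2 only
print(H, p_value)
```

A small statistic (large p-value) gives no evidence against IIA. In applied work the difference of the covariance matrices is not always positive definite, in which case the statistic can come out negative.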