where \(U_{ijt}^*\) is the latent indirect utility for individual \(i\) when choosing alternative \(j\) in choice situation \(t\); \(\mathbf x_{ijt}\) is a \(K\times 1\) vector of observed alternative attributes; \(\epsilon_{ijt}\) is the idiosyncratic taste shock, assumed i.i.d. Type 1 Extreme Value; and the parameter vector \(\boldsymbol \beta_i\) is unobserved for each \(i\) and is assumed to vary in the population following some distribution \(g(\cdot)\). Different assumptions about \(g(\cdot)\) give rise to different Logit models.
Let \(y_{ijt}=1\) if individual \(i\) chooses \(j\) on occasion \(t\), and 0 otherwise. Given a specific value of the preference parameters \(\boldsymbol{\beta}_i = \boldsymbol{\beta}_q\), the conditional joint density of choices made by consumer \(i\) is:
\[\begin{equation*} f(\mathbf y_i|\mathbf X_i, \boldsymbol{\beta}_q) = \prod_{t=1}^{T_i}\prod_{j=1}^J\left[\frac{\exp\left( \mathbf x_{ijt}^\top\boldsymbol{\beta}_q\right)}{\sum_{j=1}^J\exp\left(\mathbf x_{ijt}^\top\boldsymbol{\beta}_q\right)}\right]^{y_{ijt}}, \end{equation*}\]where \(\mathbf y_i = (y_{i1}, y_{i2},..., y_{iT_i})\) is a vector that collects all choices made by individual \(i\). The unconditional joint density of individual choices depends on the assumptions of the mixing distribution.
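To make this formula concrete, the conditional likelihood of one individual's choice sequence can be coded directly. This is only an illustrative sketch with generic inputs: lik_indiv, X (a hypothetical list of \(T_i\) attribute matrices with one row per alternative), y (the indices of the chosen alternatives), and beta (a candidate parameter vector) are names introduced here, not part of gmnl.
# Conditional likelihood of one individual's sequence of choices given beta (sketch)
lik_indiv <- function(X, y, beta) {
  probs <- sapply(seq_along(X), function(t) {
    v <- as.vector(X[[t]] %*% beta)   # systematic utility of each alternative
    p <- exp(v) / sum(exp(v))         # MNL choice probabilities
    p[y[t]]                           # probability of the alternative actually chosen
  })
  prod(probs)                         # product over choice occasions
}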
Unlike the Mixed Logit model (MIXL), where \(g(\cdot)\) is parametric and continuous, the LC-MNL assumes that preferences are distributed following a discrete distribution (Kamakura and Russell 1989). For comparisons between the LC-MNL and MIXL models see Greene and Hensher (2003), Shen (2009), and Keane and Wasi (2013). Unobserved preference heterogeneity is then accommodated in LC-MNL by a discrete number \(Q\) of separate (and unobserved) classes or segments of individuals with different values for the preference parameters within each class. Subsets of parameters can be constrained to be the same across classes. Note that individuals in each segment share homogeneous preferences (parameters are fixed within a class), but heterogeneity in preferences exists across classes. Formally, the population distribution of the parameters is specified as:
\[\begin{equation}\label{eq:discrete_prob} g(\boldsymbol{\beta}_i|\boldsymbol{\gamma}) = \begin{cases} \boldsymbol\beta_{1} & \mbox{with probability $w_{i1}(\boldsymbol\gamma)$} \\ \boldsymbol\beta_{2} & \mbox{with probability $w_{i2}(\boldsymbol\gamma)$} \\ \vdots & \vdots \\ \boldsymbol\beta_Q & \mbox{with probability $w_{iQ}(\boldsymbol\gamma)$} \end{cases}, \end{equation}\]where individual \(i\) belongs to class \(q\) with probability \(w_{iq}\) \((q = 1, ..., Q)\), such that \(\sum_{q}w_{iq}=1\) and \(w_{iq} > 0\); \(\boldsymbol\gamma = (\boldsymbol{\gamma}_1,\dots,\boldsymbol{\gamma}_Q)\) is the set of parameters that describe the stochastic assignment to classes. This discrete mixing distribution (or class assignment/membership probability) is unknown to the analyst (as is the number of classes). Given \(Q\), the most widely used formulation for \(w_{iq}\) is the semiparametric multinomial Logit (Shen 2009; Greene and Hensher 2003):
\[\begin{equation*} w_{iq}(\boldsymbol\gamma) = \frac{\exp\left(\mathbf h_i^\top\boldsymbol \gamma_q\right)}{\sum_{q=1}^Q\exp\left(\mathbf h_i^\top\boldsymbol\gamma_q\right)};\;\;q=1,...,Q,\;\boldsymbol\gamma_1=\boldsymbol{0}, \end{equation*}\]where \(\mathbf h_i\) denotes a vector of socio-economic characteristics that determine assignment to classes. The parameters of the first class are normalized to zero for identification of the model; in fact, the gmnl function uses this normalization. Note that one could also omit socio-economic covariates altogether as determinants of the class assignment probability. Under this scenario of constant assignment, the class probabilities simply become constants of the form:
\[\begin{equation}\label{eq:class_prob_str} w_{iq}(\boldsymbol\gamma) = \frac{\exp\left(\gamma_q\right)}{\sum_{q=1}^Q\exp\left(\gamma_q\right)};\;\;q=1,...,Q,\;\gamma_1=0, \end{equation}\]where \(\gamma_q\) \((q = 1,..., Q)\) is a set of constants used to compute class probabilities (Scarpa and Thiene 2005).
The unconditional probability of the sequence of choices made by individual \(i\) is given by: \[\begin{equation*} f(\mathbf y_i| \mathbf X_i, \boldsymbol\theta) = \sum_{q = 1}^Qw_{iq}(\boldsymbol\gamma_q)\left\lbrace\prod_{t=1}^{T_i}\prod_{j=1}^J\left[\frac{\exp\left( \mathbf x_{ijt}^\top\boldsymbol\beta_q\right)}{\sum_{j=1}^J\exp\left(\mathbf x_{ijt}^\top\boldsymbol\beta_q\right)}\right]^{y_{ijt}}\right\rbrace, \end{equation*}\]where \(\boldsymbol\theta = (\boldsymbol\gamma , \boldsymbol\beta)\) is the vector of parameters of interest at the population level such that \(\boldsymbol\beta=(\boldsymbol\beta_1,\dots,\boldsymbol\beta_Q)\) and \(\boldsymbol\gamma=(\boldsymbol\gamma_1,\dots,\boldsymbol\gamma_Q)\).
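The unconditional likelihood simply mixes the previous expression over the classes with the weights \(w_{iq}\). Below is a minimal sketch that reuses the hypothetical lik_indiv function from the earlier sketch; beta_list (a list with the \(Q\) class-specific parameter vectors) and w (the vector of class probabilities) are also names introduced here.
# Unconditional likelihood of individual i in the LC-MNL (sketch)
lik_lc_indiv <- function(X, y, beta_list, w) {
  sum(w * sapply(beta_list, function(b) lik_indiv(X, y, b)))
}
# The sample log-likelihood is the sum of log(lik_lc_indiv(...)) over individuals.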
Since this probability does not require integration, estimation for a sample of consumers can be undertaken with the standard maximum likelihood estimator (MLE). However, for a large number of classes quasi-Newton methods may exhibit convergence problems; in those situations, the iterative Expectation-Maximization (EM) algorithm (Bhat 1997; Train 2008) can be implemented to retrieve maximum likelihood estimates. The current version of the gmnl package only implements the MLE; we hope to add the EM algorithm soon.
We first load the required packages:
rm(list = ls(all = TRUE)) # Clean objects
library("gmnl") # Load gmnl package
library("mlogit") # Load mlogit packageFor this example we will use the ‘Electricity’ dataset from mlogit package. This is a stated preference data for the choice of electricity suppliers used by Revelt and Train (2000) and Train (2009). For more information about this dataset type help(Electricity).
In this stated preference data, individuals were presented with 8 to 12 hypothetical choice situations. In each choice occasion, the customer was presented with four alternative suppliers with different prices and other characteristics. The attributes are the following: pf, the fixed price of the supplier in cents per kWh; cl, the contract length in years; loc, whether the supplier is a local company; wk, whether the supplier is a well-known company; tod, whether the supplier offers time-of-day rates; and seas, whether the supplier offers seasonal rates.
In the following code, we load the dataset and put it into the required format using the mlogit.data function:
# Load data and put it into the required format
data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity,
id = "id",
choice = "choice",
varying = 3:26,
shape = "wide",
sep = "")A LC-MNL with three classes \(Q = 3\) is estimated in gmnl (Sarrias and Daziano 2017a) as follows:
# Estimate a LC-MNL model with 3 classes
lc <- gmnl(choice ~ pf + cl + loc + wk + tod + seas | 0 | 0 | 0 | 1 ,
data = Electr,
model = 'lc',
Q = 3,
panel = TRUE,
method = "bhhh")## Estimating LC model
It is important to note that the user needs to specify at least a constant in the fifth part of the formula when estimating an LC-MNL model. Thus, by including 1 we are estimating a model where the logit class-assignment specification is:
\[\begin{equation*} w_{iq}(\boldsymbol\gamma) = \frac{\exp\left(\gamma_q\right)}{\sum_{q=1}^Q\exp\left(\gamma_q\right)};\;\;q=1,2,3,\;\gamma_1=0, \end{equation*}\]and the model will estimate \(\gamma_2\) and \(\gamma_3\). If the class assignment is also determined by socio-economic characteristics, those covariates should also be included in the fifth part. It is important to highlight that only time-invariant socio-economic variables can be included in the class-assignment specification. If the dataset contains individual-specific characteristics that vary across choice situations (or time), then only the first observation for each individual will be used.
# Result
summary(lc)
##
## Model estimated on: Wed Nov 29 23:04:02 2017
##
## Call:
## gmnl(formula = choice ~ pf + cl + loc + wk + tod + seas | 0 |
## 0 | 0 | 1, data = Electr, model = "lc", Q = 3, panel = TRUE,
## method = "bhhh")
##
## Frequencies of categories:
##
## 1 2 3 4
## 0.22702 0.26393 0.23816 0.27089
##
## The estimation took: 0h:0m:4s
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## class.1.pf -0.436786 0.049953 -8.7439 < 2.2e-16 ***
## class.1.cl -0.024613 0.014068 -1.7496 0.0801890 .
## class.1.loc 2.510545 0.100253 25.0421 < 2.2e-16 ***
## class.1.wk 1.649299 0.092876 17.7580 < 2.2e-16 ***
## class.1.tod -2.726110 0.396319 -6.8786 6.046e-12 ***
## class.1.seas -3.720022 0.399087 -9.3213 < 2.2e-16 ***
## class.2.pf -0.711346 0.068973 -10.3134 < 2.2e-16 ***
## class.2.cl -0.534724 0.033816 -15.8129 < 2.2e-16 ***
## class.2.loc 0.632174 0.121524 5.2021 1.971e-07 ***
## class.2.wk 0.558454 0.105730 5.2819 1.279e-07 ***
## class.2.tod -5.994899 0.539367 -11.1147 < 2.2e-16 ***
## class.2.seas -5.855283 0.553464 -10.5793 < 2.2e-16 ***
## class.3.pf -0.785066 0.039993 -19.6299 < 2.2e-16 ***
## class.3.cl -0.055608 0.015272 -3.6412 0.0002714 ***
## class.3.loc 1.559850 0.110696 14.0913 < 2.2e-16 ***
## class.3.wk 1.219109 0.089710 13.5894 < 2.2e-16 ***
## class.3.tod -8.866931 0.350940 -25.2662 < 2.2e-16 ***
## class.3.seas -8.332032 0.340697 -24.4558 < 2.2e-16 ***
## (class)2 -0.515807 0.045713 -11.2835 < 2.2e-16 ***
## (class)3 0.152700 0.036940 4.1337 3.570e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Optimization of log-likelihood by BHHH maximisation
## Log Likelihood: -4338.4
## Number of observations: 4308
## Number of iterations: 44
## Exit of MLE: successive function values within tolerance limit
The output shows the estimated parameters for each attribute in each class. For example, class.1.pf and class.2.pf represent the price sensitivity of class 1 and class 2, respectively. The results are consistent across classes in terms of sign: customers in every class dislike higher prices and longer contracts, and they avoid TOD and seasonal rates.
The R file lc_helpers.R contains several functions for analyzing the results of an LC-MNL model. Note that these functions have not been tested yet; we hope to add them to the gmnl package as soon as possible. They can be downloaded here and loaded into R as follows:
# Load script with additional functions
source('lc_helpers.R')
To compare the sensitivities to each attribute across classes, we will use the function plot_ci_lc. This function plots the point estimates and their 95% confidence intervals for each variable. The command syntax is the following:
# Plot coefficients and 95% CI
plot_ci_lc(lc)
From the results we can observe that class 3 shows a high sensitivity to price, TOD, and seasonal rates compared with the other classes. Thus, the pattern of tastes of class 3 is associated with individuals who are more sensitive to prices in general. Class 2 is representative of individuals who are more sensitive to contract length, less sensitive to whether the company is local, and moderately sensitive to prices. Finally, people in class 1 do not exhibit any preference for contract length, are more sensitive to whether the firm is local or well-known, and are less sensitive to prices relative to the other classes. So the pattern of tastes of class 1 is associated with individuals who are more sensitive to suppliers’ characteristics than to prices.
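If the helper script is not at hand, a rough table of the same quantities can be built directly from the fitted model. This is only a sketch: it assumes that coef() and vcov() return the ML estimates and their covariance matrix for gmnl objects, and est, se, and ci are names introduced here.
# Point estimates and approximate 95% confidence intervals (sketch)
est <- coef(lc)
se <- sqrt(diag(vcov(lc)))                 # assumes vcov() is available for gmnl fits
ci <- cbind(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
round(ci[grepl("\\.pf$", rownames(ci)), ], 3)  # e.g., the price coefficients of the three classes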
The user can also specify the variables to be plotted by typing for example:
# Plot some coefficients
plot_ci_lc(lc, var = c("pf", "cl"))
The share of individuals in each class can be computed using the Logit formula. Recall that the estimates (class)2 and (class)3 represent \(\gamma_2\) and \(\gamma_3\), respectively. So, for example, we can compute the share of individuals in the second class as:
# Share for class 2
exp(coef(lc)["(class)2"]) / (exp(0) + exp(coef(lc)["(class)2"]) + exp(coef(lc)["(class)3"]))
## (class)2
## 0.2161549
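The same logit formula can be applied to all three classes at once. A small sketch follows; gam_hat is just an intermediate name introduced here, with the normalized \(\gamma_1 = 0\) in the first position.
# Shares for all classes from the class constants (sketch)
gam_hat <- c(0, coef(lc)["(class)2"], coef(lc)["(class)3"])
exp(gam_hat) / sum(exp(gam_hat))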
The function ‘shares’ allows you to obtain the shares of individuals in each class in a more user-friendly way:
# Using shares function
shares(lc)
## share q=1 share q=2 share q=3
## 0.3620573 0.2161549 0.4217877
Thus, for example, class 3, which is representative of the individuals most sensitive to prices, accounts for approximately 42% of the sample.
If we are interested in the willingness-to-pay (WTP) for each attribute, we can compute the ratio between the coefficient for each attribute and the price coefficient (for an application see Scarpa and Thiene (2005)). In particular, the WTP for attribute \(j\) in class \(q\) is:
\[\begin{equation} \widehat{wtp}_{qj} = - \frac{\widehat{\beta}_{qj}}{\widehat{\beta}_{q,pf}} \end{equation}\]and it represents the willingness to pay for a unit change in attribute \(j\) for individuals in class \(q\). For example, the WTP for contract length in each class is given by:
# WTP for each class
-coef(lc)["class.1.cl"] / coef(lc)["class.1.pf"]## class.1.cl
## -0.05635118
-coef(lc)["class.2.cl"] / coef(lc)["class.2.pf"]## class.2.cl
## -0.7517079
-coef(lc)["class.3.cl"] / coef(lc)["class.3.pf"]## class.3.cl
## -0.07083191
Thus, the WTP ranges from -0.056 in the first class to -0.752 in the second class. Note that in every class contract length is viewed as a negative attribute, so individuals are willing to pay to reduce the length of the contract. For example, individuals belonging to the second class are willing to pay 0.752 c/kWh extra to have a contract that is one year shorter.
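The same ratio can be computed for every non-price attribute and class in one step. The following is a sketch; attrs and wtp_mat are just names introduced here.
# WTP for all non-price attributes in each class (sketch)
attrs <- c("cl", "loc", "wk", "tod", "seas")
wtp_mat <- sapply(1:3, function(q) {
  -coef(lc)[paste0("class.", q, ".", attrs)] / coef(lc)[paste0("class.", q, ".pf")]
})
rownames(wtp_mat) <- attrs
colnames(wtp_mat) <- paste("class", 1:3)
round(wtp_mat, 3)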
We could also compute the weighted average of the WTP defined as: \[\begin{equation} \bar{\widehat{wtp}}_{j} = \sum_{q = 1}^3 \widehat{wtp}_{qj}\widehat{w}_q \end{equation}\]This represents the average WTP for the whole sample. Using the estimates from the LC-MNL model, we can compute this as follows:
# Average WTP for cl
wtp_bar <- (-coef(lc)["class.1.cl"] / coef(lc)["class.1.pf"]) * shares(lc)[1] +
(-coef(lc)["class.2.cl"] / coef(lc)["class.2.pf"]) * shares(lc)[2] +
(-coef(lc)["class.3.cl"] / coef(lc)["class.3.pf"]) * shares(lc)[3]
wtp_bar
## class.1.cl
## -0.2127637
The result implies that, on average, individuals in the sample are willing to pay 0.213 c/kWh extra to have a contract that is one year shorter.
Scarpa and Thiene (2005) also propose a measure of relative diversity across the WTP values of the classes:
\[\begin{equation} \widehat{\xi}_j = \frac{\sum_{q = 1}^3 \left| \widehat{wtp}_{qj} - \bar{\widehat{wtp}}_{j}\right|}{\left|\bar{\widehat{wtp}}_{j}\right|} \end{equation}\]According to Scarpa and Thiene (2005) this measure can be interpreted as a measure of preference intensity dispersion across classes.
# Intensity dispersion for lc
xi <- (abs((-coef(lc)["class.1.cl"] / coef(lc)["class.1.pf"]) - wtp_bar) +
abs((-coef(lc)["class.2.cl"] / coef(lc)["class.2.pf"]) - wtp_bar) +
abs((-coef(lc)["class.3.cl"] / coef(lc)["class.3.pf"]) - wtp_bar)) / abs(wtp_bar)
xi
## class.1.cl
## 3.935297
The discrete distribution of the parameters gives us a general profile of how individuals’ tastes are distributed in the population. However, we would sometimes also like to know where each individual lies in this distribution. To fix ideas, note that the unconditional probability of the sequence of choices of consumer \(i\) is:
\[\begin{equation*} f(\boldsymbol y_i| \boldsymbol X_i, \boldsymbol \theta) = \sum_{q = 1}^Qf(\boldsymbol y_i|\boldsymbol X_i, \boldsymbol \beta_q) g(\boldsymbol \beta_i|\boldsymbol \gamma). \end{equation*}\]Using Bayes’ theorem it is possible to obtain the following expression:
\[\begin{equation*} f(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol\theta) = \frac{f(\boldsymbol y_i|\boldsymbol X_i, \boldsymbol\beta_q) g(\boldsymbol\beta_i| \boldsymbol\gamma)}{f(\boldsymbol y_i| \boldsymbol X_i, \boldsymbol \theta)}= \frac{f(\boldsymbol y_i| \boldsymbol X_i, \boldsymbol\beta_q) g(\boldsymbol\beta_i| \boldsymbol\gamma)}{\sum_{q = 1}^Qf(\boldsymbol y_i|\boldsymbol X_i, \boldsymbol\beta_q) g(\boldsymbol\beta_i| \boldsymbol\gamma)}. \end{equation*}\]This equation represents the posterior distribution of the individual part-worths given \(\boldsymbol\theta\). Note that whereas \(g(\boldsymbol\beta_i| \boldsymbol\gamma)\) is the unconditional distribution of preferences in the population, the posterior \(f(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol\theta)\) is the conditional distribution of the individual parameter \(\boldsymbol\beta_i\) –conditional on the sequence of choices \(\boldsymbol y_i\) when facing a design matrix of attributes \(\boldsymbol X_i\) (i.e. conditional on the observed data) and on the parameters of the distribution of preferences in the population \(\boldsymbol \theta\).
If we take the expectation of \(f(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol\theta)\), we obtain the expected value of the parameters for individual \(i\). Thus, the population conditional expectation of \(\boldsymbol \beta_i\) (also known as the posterior mean) is:
\[\begin{equation}\label{eq:population_expected_beta} \bar{\boldsymbol\beta}_i = E\left[\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol \theta\right] = \frac{\sum_{q = 1}^Q\boldsymbol \beta_q f(\boldsymbol y_i|\boldsymbol X_i, \boldsymbol \beta_q) g(\boldsymbol\beta_i| \boldsymbol\gamma)}{\sum_{q = 1}^Qf(\boldsymbol y_i|\boldsymbol X_i, \boldsymbol\beta_q) g(\boldsymbol \beta_i| \boldsymbol \gamma)}. \end{equation}\]These conditional expectations of the preference parameters \(\boldsymbol\beta_i\) specific to consumer \(i\) are generally different from the mean \(\boldsymbol \beta\) of the unconditional distribution \(g(\boldsymbol \beta_i| \boldsymbol\gamma)\). An estimator for the conditional expectation in the LC-MNL framework is given by:
\[\begin{equation}\label{eq:estimate_expected_beta_1} \widehat{\bar{\boldsymbol \beta_i}} = \widehat{E}\left[\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \widehat{\boldsymbol\theta}\right] = \frac{\sum_{q = 1}^Q \widehat{\boldsymbol\beta}_{q}\widehat{w}_{iq}(\boldsymbol\gamma)\prod_{t = 1}^{T_i}\prod_{j=1}^J\left[\frac{\exp\left( \boldsymbol x_{ijt}^\top\widehat{\boldsymbol\beta}_q\right)}{\sum_{j=1}^J\exp\left(\boldsymbol x_{ijt}^\top\widehat{\boldsymbol \beta}_q\right)}\right]^{y_{ijt}}}{\sum_{q = 1}^Q \widehat{w}_{iq}(\boldsymbol\gamma) \prod_{t = 1}^{T_i}\prod_{j=1}^J\left[\frac{\exp\left( \boldsymbol x_{ijt}^\top\widehat{\boldsymbol \beta}_q\right)}{\sum_{j=1}^J\exp\left(\boldsymbol x_{ijt}^\top\widehat{\boldsymbol \beta}_q\right)}\right]^{y_{ijt}}}. \end{equation}\]Similarly, an estimator of the posterior membership probability is (Kamakura and Russell 1989; Greene and Hensher 2003):
\[\begin{equation*}\label{eq:conditional_prob} \widehat{\pi}_{iq}(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol \theta)= \frac{\widehat{w}_{iq}(\boldsymbol\gamma)\prod_{t = 1}^{T_i}\prod_{j=1}^J\left[\frac{\exp\left( \boldsymbol x_{ijt}^\top\widehat{\boldsymbol\beta}_q\right)}{\sum_{j=1}^J\exp\left(\boldsymbol x_{ijt}^\top\widehat{\boldsymbol \beta}_q\right)}\right]^{y_{ijt}}}{\sum_{q = 1}^Q \widehat{w}_{iq}(\boldsymbol\gamma) \prod_{t = 1}^{T_i}\prod_{j=1}^J\left[\frac{\exp\left(\boldsymbol x_{ijt}^\top\widehat{\boldsymbol \beta}_q\right)}{\sum_{j=1}^J\exp\left(\boldsymbol x_{ijt}^\top\widehat{\boldsymbol \beta}_q\right)}\right]^{y_{ijt}}}, \end{equation*}\]which gives the probability for individual \(i\) belonging to class \(q\) given observed choices. An empirical strategy for assigning individuals to specific segments is to use the class with the highest posterior \(\widehat{\pi}_{iq}(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol \theta)\) (DeSarbo, Ramaswamy, and Cohen 1995).
In sum, individual parameters \(\boldsymbol \beta_i\) are realizations of a random process with conditional distribution \(f(\boldsymbol \beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol \theta)\) –which is specific to the consumer– and unconditional distribution \(g(\boldsymbol\beta_i| \boldsymbol\gamma)\). The conditional point estimator \(\widehat{\bar{\boldsymbol\beta}}_i\) (posterior mean) is just an estimator of the expected value of the conditional distribution \(f(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol \theta)\) (the posterior distribution of the individual part-worths). Note that the LC-MNL conditional point estimates can be written as an explicit function of the posterior membership probabilities:
\[\begin{equation}\label{eq:posterior_mean} \widehat{\bar{\boldsymbol\beta}}_i = \sum_{q=1}^Q \widehat{\boldsymbol\beta}_q\widehat{\pi}_{iq}(\boldsymbol\beta_i|\boldsymbol y_i, \boldsymbol X_i, \boldsymbol\theta), \end{equation}\]which can be easily implemented to derive estimates of the conditional expectation of individual parameters.
gmnl allows the user to get the conditional estimates for each individual in the sample. As an illustration we will get the individuals’ conditional means for the TOD parameter using the function effect.gmnl:
# Get individuals' estimates for TOD
bi_tod <- effect.gmnl(lc, par = "tod", effect = "ce")$mean
summary(bi_tod)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8.867 -8.836 -5.996 -6.023 -2.823 -2.726
One can also plot the kernel density of the individuals’ conditional mean by typing the following:
# Plotting the distribution of the individuals' estimates
plot(lc, par = "tod", effect = "ce", type = "density", col = "blue")The conditional probabilities can be obtained by accessing the element Qir from the object lc:
# Conditional probabilities
pi_hat <- lc$Qir
colnames(pi_hat) <- c("q = 1", "q = 2", "q = 3")
dim(pi_hat)
## [1] 361 3
pi_hat is an \(n \times Q\) matrix that gives, for each individual, the (conditional) probability of belonging to class \(q\). In this case, we have 361 individuals and 3 classes.
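Since each row is a probability distribution over the classes, the rows should sum to one; a quick check (sketch):
# Each row of pi_hat should sum to (numerically) one
summary(rowSums(pi_hat))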
# Conditional probabilities for the first 6 individuals
round(head(pi_hat), 4)
## q = 1 q = 2 q = 3
## [1,] 1.0000 0.0000 0.0000
## [2,] 0.0004 0.0000 0.9996
## [3,] 0.0000 0.0028 0.9972
## [4,] 0.0019 0.9981 0.0001
## [5,] 0.0000 0.0000 1.0000
## [6,] 0.0082 0.0000 0.9918
Note that, based on his sequence of choices, the first individual belongs to the first class with near certainty, whereas individual 2 most likely belongs to the third class.
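Following DeSarbo, Ramaswamy, and Cohen (1995), each individual can be assigned to the class with the highest posterior probability, and the posterior means discussed above can be reproduced by hand from pi_hat. This is only a sketch: modal_class, beta_tod, and bi_tod_manual are names introduced here, and the comparison assumes that the rows of lc$Qir are ordered in the same way as the output of effect.gmnl.
# Modal class assignment and manual posterior means for tod (sketch)
modal_class <- apply(pi_hat, 1, which.max)         # class with the highest posterior probability
table(modal_class)                                 # number of individuals assigned to each class
beta_tod <- coef(lc)[paste0("class.", 1:3, ".tod")]
bi_tod_manual <- as.numeric(pi_hat %*% beta_tod)   # posterior mean of the tod coefficient
summary(bi_tod_manual)                             # should be close to summary(bi_tod) above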
Some caution should be taken when using individual-specific estimates. Sarrias and Daziano (2017b) show for the LC-MNL model that when the number of choice situations (\(T\)) is small, the individual-specific estimates (along with their standard errors) are inconsistent (for the MIXL model see Revelt and Train 2000). The intuition is that traditional asymptotics lets \(N \to \infty\) while keeping the number of choice situations \(T\) fixed, which is not sufficient for statistical consistency of the conditional expectation: without new information about the choices made by consumer \(i\) one cannot retrieve the true parameter \(\boldsymbol\beta_i\). However, if \(T\) grows without bound, then \(\widehat{\bar{\boldsymbol\beta}}_i\) is a consistent estimator of the true \(\boldsymbol\beta_i\). The intuition behind this convergence is that \(f(\boldsymbol\beta_i| \boldsymbol y_i, \boldsymbol X_i, \boldsymbol\theta)\) becomes more concentrated around \(\boldsymbol\beta_i\) as \(T\) rises. In fact, as \(T\to \infty\), the conditional distribution, and hence its expected value, converges to the true value of \(\boldsymbol\beta_i\).
Another important issue is the estimation of the standard errors of the individual-specific estimates. The gmnl package uses the conditional variance of the individual-specific estimates to compute the standard errors. However, Sarrias and Daziano (2017b) show that this estimator is not consistent and hence not reliable: the standard errors based on the conditional variance converge to 0 as the number of choice situations increases, so they tend to be too small for large \(T\). This can give the false impression that almost all individual parameters are significant, when in fact it is a misspecification problem. Using a Monte Carlo study, they show that the Krinsky & Robb procedure (Krinsky and Robb 1986, 1990) gives accurate estimates of the standard errors as the number of choice situations increases. Thus, we do not recommend using the standard errors that gmnl delivers; we hope to add the Krinsky & Robb procedure to the gmnl functionalities as soon as possible.
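To illustrate the general Krinsky & Robb idea (simulating parameter draws from the estimated asymptotic distribution and recomputing the quantity of interest), the sketch below simulates the sampling distribution of the class-2 WTP for contract length. This is not the individual-level procedure of Sarrias and Daziano (2017b); it assumes that vcov() returns the estimated covariance matrix of the ML estimates, and draws and wtp_draws are names introduced here.
# Krinsky & Robb-style simulation for the class-2 WTP for contract length (illustrative sketch)
library("MASS")                                            # for mvrnorm()
set.seed(123)
draws <- mvrnorm(5000, mu = coef(lc), Sigma = vcov(lc))    # draws from the asymptotic distribution
wtp_draws <- -draws[, "class.2.cl"] / draws[, "class.2.pf"]
c(mean = mean(wtp_draws), se = sd(wtp_draws))              # simulated point estimate and standard error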
In this section we will work with the TravelMode dataset from the AER package:
data("TravelMode", package = "AER")
TM <- mlogit.data(TravelMode,
choice = "choice",
shape = "long",
alt.levels = c("air", "train", "bus", "car"),
chid.var = "individual")If we would like to include the variables income, size and a constant into the class probability assignment we should type:
lc2 <- gmnl(choice ~ wait + vcost | 0 | 0 | 0 | income + size,
data = TM,
model = "lc",
Q = 2,
method = 'bfgs')
## Estimating LC model
summary(lc2)
##
## Model estimated on: Wed Nov 29 23:04:03 2017
##
## Call:
## gmnl(formula = choice ~ wait + vcost | 0 | 0 | 0 | income + size,
## data = TM, model = "lc", Q = 2, method = "bfgs")
##
## Frequencies of categories:
##
## air train bus car
## 0.27619 0.30000 0.14286 0.28095
##
## The estimation took: 0h:0m:0s
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## class.1.wait -0.085539 0.016936 -5.0507 4.403e-07 ***
## class.1.vcost 0.064883 0.013916 4.6624 3.126e-06 ***
## class.2.wait 0.032227 0.014533 2.2174 0.026593 *
## class.2.vcost -0.050561 0.016343 -3.0936 0.001977 **
## (class)2 1.276352 0.692142 1.8441 0.065174 .
## income:class2 -0.030496 0.013062 -2.3348 0.019555 *
## class2:size -0.741815 0.370211 -2.0038 0.045096 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Optimization of log-likelihood by BFGS maximization
## Log Likelihood: -237.5
## Number of observations: 210
## Number of iterations: 93
## Exit of MLE: successful convergence
The coefficients of the variables affecting the class-assignment probability should be interpreted with respect to the normalized class, which in this case is the first class. Thus, individuals with higher income and larger families are more likely to belong to the first class. Conversely, poorer individuals with smaller families are more likely to belong to the second class.
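To make the interpretation concrete, the estimated probability of belonging to class 2 can be computed for a hypothetical individual directly from the coefficients. This is a sketch; inc and hsize are illustrative values rather than values taken from the data, and g and v2 are names introduced here.
# Predicted class-2 assignment probability for a hypothetical individual (sketch)
g <- coef(lc2)
inc <- 30; hsize <- 2                  # illustrative income and household size
v2 <- g["(class)2"] + g["income:class2"] * inc + g["class2:size"] * hsize
as.numeric(exp(v2) / (1 + exp(v2)))    # class 1 is normalized, so its utility is zero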
Bhat, C.R. 1997. “An Endogenous Segmentation Mode Choice Model with an Application to Intercity Travel.” Transportation Science 31 (1): 34–48. doi:10.1287/trsc.31.1.34.
Bujosa, Angel, Antoni Riera, and Robert L Hicks. 2010. “Combining Discrete and Continuous Representations of Preference Heterogeneity: A Latent Class Approach.” Environmental and Resource Economics 47 (4). Springer: 477–93. doi:10.1007/s10640-010-9389-y.
DeSarbo, W., V. Ramaswamy, and S. Cohen. 1995. “Market Segmentation with Choice-based Conjoint Analysis.” Marketing Letters 6: 137–47. doi:10.1007/BF00994929.
Greene, William H, and David A Hensher. 2003. “A Latent Class Model for Discrete Choice Analysis: Contrasts with Mixed Logit.” Transportation Research Part B: Methodological 37 (8). Elsevier: 681–98. doi:10.1016/S0191-2615(02)00046-2.
Hess, Stephane. 2014. “Latent Class Structures: Taste Heterogeneity and Beyond.” In Handbook of Choice Modelling, edited by S. Hess and A. Daly, 311. Elgar Original Reference Series. Edward Elgar Publishing. doi:10.4337/9781781003152.00021.
Kamakura, W.A, and G. Russell. 1989. “A Probabilistic Choice Model for Market Segmentation and Elasticity Structure.” Journal of Marketing Research 26: 379–90. doi:10.2307/3172759.
Keane, Michael, and Nada Wasi. 2013. “Comparing Alternative Models of Heterogeneity in Consumer Choice Behavior.” Journal of Applied Econometrics 28 (6). Wiley Online Library: 1018–45. doi:10.1002/jae.2304.
Krinsky, Itzhak, and A. Leslie Robb. 1986. “On Approximating the Statistical Properties of Elasticities.” The Review of Economics and Statistics 68 (4): 715–19. doi:10.2307/1924536.
———. 1990. “On Approximating the Statistical Properties of Elasticities: A Correction.” The Review of Economics and Statistics 72 (1): 189–90. doi:10.2307/2109761.
Revelt, David, and Kenneth Train. 2000. “Customer-Specific Taste Parameters and Mixed Logit: Households’ Choice of Electricity Supplier.” Working Paper. Department of Economics, UCB.
Sarrias, Mauricio, and Ricardo Daziano. 2017a. “Multinomial Logit Models with Continuous and Discrete Individual Heterogeneity in R: The gmnl Package.” Journal of Statistical Software, Articles 79 (2): 1–46. doi:10.18637/jss.v079.i02.
Sarrias, Mauricio, and Ricardo A. Daziano. 2017b. “Individual-Specific Point and Interval Conditional Estimates of Latent Class Logit Parameters.” Journal of Choice Modelling. doi:10.1016/j.jocm.2017.10.004.
Scarpa, Riccardo, and Mara Thiene. 2005. “Destination Choice Models for Rock Climbing in the Northeastern Alps: A Latent-Class Approach Based on Intensity of Preferences.” Land Economics 81 (3). University of Wisconsin Press: 426–44. doi:10.3368/le.81.3.426.
Shen, Junyi. 2009. “Latent Class Model or Mixed Logit Model? A Comparison by Transport Mode Choice Data.” Applied Economics 41 (22). Taylor & Francis: 2915–24. doi:10.1080/00036840801964633.
Train, K. 2008. “EM Algorithms for Nonparametric Estimation of Mixing Distributions.” Journal of Choice Modelling 1 (1): 40–69. doi:10.1016/S1755-5345(13)70022-8.
Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge university press. doi:10.1017/CBO9780511805271.
Wedel, Michel, and Wagner A Kamakura. 2012. Market Segmentation: Conceptual and Methodological Foundations. Vol. 8. Springer Science & Business Media. doi:10.1007/978-1-4615-4651-1.