The problem

A simulated dataset

To show how bad-scaling can produce problems with convergence or numerical issues involved in inverting the Hessian, I create an artificial mixed logit dataset:

rm(list = ls(all = TRUE))          # Clean objects
library("gmnl")
library("mlogit")

I first set the globals (seed, number of individuals, number of alternatives, etc) for the simulation:

# Globals
set.seed(250586)             # Set seed
N <- 300                     # Number of individuals
J <- 3                       # Number of alternatives
t <- 10                      # Number of choice situations
R <- 50                      # Numer of draws
id <- rep(1:N, each = J * t) # Id for each individual

Now we create two normally distributed parameters and two variables.

# Create random parameters and variables
b1 <- 1 + 1 * rnorm(N) # Normal
b2 <- 2 + 1 * rnorm(N) # Normal
x1 <- as.numeric(runif(N * J * t) < 0.5) # dummy variable
x2 <- rnorm(N * J * t)                   # continuous variable

Note that x1 is a dummy variable, whereas x2 is continuous. The utility for each individual is created as follows:

# The true data generating process
U <-  b1[id] * x1 + b2[id] * x2 - log(-log(runif(N * J * t))) 

Note that the error term is distributed as Extreme value type I. The chosen alternative is based in the following code:

# Chosen alternative
chosen <- rep(0, N * J * t)
for (i in 1:(N * t)) {
  U_j <- U[((i - 1) * J + 1):(i * J)]
  U_max <- max(U_j)
  pos <- which(U_j == U_max)
  chosen[((i - 1) * J + 1):(i * J)][pos] <- TRUE
}

Putting the data in the correct format using mlogit.data function from mlogit package

data <- as.data.frame(cbind(id, chosen, x1, x2))
data <- mlogit.data(data, 
                    choice = "chosen",
                    shape = "long", 
                    alt.levels = c("1", "2", "3"), 
                    id.var = "id")

Now we estimate a MIXL model:

mix1 <- gmnl(chosen ~ x1 + x2| 0, 
            data = data,
            model = "mixl",
            ranp = c(x1 = "n", x2 = "n"),
            print.init = TRUE,
            R = R,
            panel = TRUE,
            iterlim = 300,
            method = "bhhh",
            print.level = 2)
## 
## Starting Values:
##        x1        x2     sd.x1     sd.x2 
## 0.7574765 1.5132982 0.1000000 0.1000000 
## Estimating MIXL model 
## ----- Initial parameters: -----
## fcn value: -2109.154 
##       parameter initial gradient free
## x1    0.7574765      -0.03071733    1
## x2    1.5132982       3.56397182    1
## sd.x1 0.1000000      18.41496933    1
## sd.x2 0.1000000      93.80607253    1
## Condition number of the (active) hessian: 160.454 
## -----Iteration 1 -----
## -----Iteration 2 -----
## -----Iteration 3 -----
## -----Iteration 4 -----
## -----Iteration 5 -----
## -----Iteration 6 -----
## -----Iteration 7 -----
## -----Iteration 8 -----
## -----Iteration 9 -----
## -----Iteration 10 -----
## -----Iteration 11 -----
## -----Iteration 12 -----
## --------------
## successive function values within tolerance limit 
## 12  iterations
## estimate: 0.8749408 2.062459 0.9824276 0.9992621 
## Function value: -2003.711
summary(mix1)
## 
## Model estimated on: Sat Oct 07 15:34:09 2017 
## 
## Call:
## gmnl(formula = chosen ~ x1 + x2 | 0, data = data, model = "mixl", 
##     ranp = c(x1 = "n", x2 = "n"), R = R, panel = TRUE, print.init = TRUE, 
##     iterlim = 300, method = "bhhh", print.level = 2)
## 
## Frequencies of categories:
## 
##       1       2       3 
## 0.31933 0.34100 0.33967 
## 
## The estimation took: 0h:0m:13s 
## 
## Coefficients:
##       Estimate Std. Error z-value  Pr(>|z|)    
## x1    0.874941   0.071802 12.1855 < 2.2e-16 ***
## x2    2.062459   0.078690 26.2099 < 2.2e-16 ***
## sd.x1 0.982428   0.118759  8.2725  2.22e-16 ***
## sd.x2 0.999262   0.072016 13.8755 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Optimization of log-likelihood by BHHH maximisation
## Log Likelihood: -2003.7
## Number of observations: 3000
## Number of iterations: 12
## Exit of MLE: successive function values within tolerance limit
## Simulation based on 50 draws

The results show that the point estimates are very close to the true population parameters. It is also important to notice that I have included the argument print.level = 2 in the gmnl function. This allows tracing the optimization procedure in real time.