Binary Logit Model

Introduction

Chapter 4 of the Apollo manual explains how the Apollo package works in detail. However, I feel that the details in chapter 4 can be overwhelming for those who are new to discrete choice modelling. Therefore, I have created a simple template for a binary logit model in this document to walk you through all the steps. The structure of the code will generally remain similar for more complex choice models (multinomial logit, ordered logit, latent class, mixed logit, etc.). We will later add more elements to the code for these models.

There are also many examples, along with a few datasets, that can be downloaded from the Apollo choice modelling website for a more detailed understanding.

The code is broken down into a series of steps. It deviates slightly from the manual in step 2, because we use the tidyverse package to read, analyse and preprocess the data; it is more flexible and is also a very useful tool for data analysis in general.

Step 1 - Initialisation
- clear memory (optional but recommended)
- set working directory
- load packages
- set Apollo core controls
Step 2 - Data
- read file (CSV or SPSS)
- clean data (optional)
- if we read an SPSS file, remove value labels, variable labels and format attributes
- convert the tibble to a data frame
Step 3 - Define model parameters
- initialise parameters (betas)
- indicate parameters that are to be kept fixed
Step 4 - Validate Inputs
Step 5 - Define the apollo_probabilities() function
- define utility equation
- set mnl_settings
Step 6 - Estimate model
Step 7 - Print and save output

Create a folder named Apollo Binary Logit. Start a new script by clicking File %>% New File %>% R Script (read %>% as "then"). The script pane will open in the top left. Save the script as BL_Template.R by clicking File %>% Save; we don't need to type the extension .R, just as we don't type the extension .png when saving an image. Save the script in the same folder (Apollo Binary Logit).

Step 1 - Initialise the code

The first step is to initialise the code. We can break it into two sub-steps. You can read more details on page 20 of the manual.

Step 1-1 - Load libraries

We first clear the workspace, set the working directory, load the relevant libraries and then call the function apollo_initialise(). Please read section 4.1 of the manual for details about this function.

We will load three packages in the R environment:

tidyverse - to read, clean, transform, visualise and post-process the data
apollo - for discrete choice modelling
haven - to read SPSS files

# --------------------------------------------------------------------------------
### Step 1 - Initialise the code 

#### Step 1-1 - Load libraries

# clear workspace
rm(list = ls())

# set working directory (use forward slashes /, not backslashes \)
setwd("ENTER PATH")
# e.g. setwd("C:/Users/gulhare.s/Desktop/Discrete choice analysis/Apollo Binary Logit")

# load relevant libraries
library(tidyverse) # for data analysis
library(apollo) # for discrete choice modeling
library(haven) # for spss file

# mandatory step
apollo_initialise()

Step 1-2 - Set Apollo core controls

We have to set the core controls. Here, we give the name and description of the model. We also name the column that identifies the individual decision makers. The full list of core controls can be seen on page 21 of the manual.

# --------------------------------------------------------------------------------
#### Step 1-2 - Set Apollo controls

apollo_control = list(
  modelName  ="ENTER MODEL_NAME",
  modelDescr =" ",
  indivID    ="ENTER INDIVIDUAL IDENTIFIER"
)
# e.g.
# apollo_control = list(
#   modelName  ="Model_1",
#   modelDescr ="Only travel attributes",
#   indivID    ="PersonID"
# )

Step 2 - Data

We can break this step into two sub-steps. In step 2-1, we read the data from a file (.csv or SPSS .sav). If required, in step 2-2, we clean and transform the data.

Step 2-1 - Read data

You can read more about reading/importing CSV files in chapter 11 of the book R for Data Science (first edition).

# --------------------------------------------------------------------------------
### Step 2 - Data

#### Step 2-1 - Read data

# TO READ CSV FILE (OPTIONAL)
tbl <- read_csv("ENTER FILE NAME")
# e.g. tbl <- read_csv("Data.csv")

# READ SPSS FILE (OPTIONAL)
tbl_spss <- read_sav("ENTER FILE NAME")
# e.g. tbl_spss <- read_sav("Data.sav")
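If you do not have a data file yet, the minimal sketch below lets you try read_csv() on a tiny, made-up dataset. The file is written to a temporary location, and the column names (PersonID, choice, TTcar, TTbus) and values are hypothetical, borrowed from the commented examples in this template. It also shows that read_csv() returns a tibble, which is why step 2-2 converts it to a data frame.

```r
library(readr)  # read_csv() is part of the tidyverse

# Write a small, made-up dataset to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("PersonID,choice,TTcar,TTbus",
             "1,1,20,30",
             "1,0,35,40",
             "2,1,25,45"),
           csv_path)

# Read it back; show_col_types = FALSE silences the column-type message
tbl <- read_csv(csv_path, show_col_types = FALSE)

class(tbl)   # includes "tbl_df": a tibble, not yet a plain data.frame
nrow(tbl)    # 3 rows, one per choice observation
```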

Step 2-2 - Clean and transform data

This step involves cleaning and transforming the data. If we read the data from an SPSS file, we need to remove the variable labels, format attributes and value labels from the tibble using the command below. Note that so far we are working with a tibble, but Apollo requires the data as a data frame, not a tibble. So we convert the tibble into a data frame using the function as.data.frame() and assign it to a variable called database. The final data for discrete choice modelling must be stored as database.

# ----------------------------------------------------------------------------------
#### Step 2-2 - Clean and transform data

# If we read data from an SPSS file, remove variable labels, format attributes and value labels (OPTIONAL; skip if you read a CSV file)
tbl <- zap_labels(zap_formats(zap_label(tbl_spss)))

# convert tibble into dataframe
database <- as.data.frame(tbl)
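To see what the two commands above actually change, the sketch below builds a small tibble that imitates what read_sav() returns for a labelled SPSS column. The data, the variable label and the value labels (car = 1, bus = 0) are made up for illustration. After zapping, choice is a plain numeric column, and after as.data.frame() the object is a plain data frame.

```r
library(haven)   # labelled(), zap_label(), zap_formats(), zap_labels()
library(tibble)  # tibble()

# Made-up data imitating read_sav() output: 'choice' carries a variable
# label and value labels, as SPSS columns often do
tbl_spss <- tibble(
  PersonID = c(1, 1, 2, 2),
  choice   = labelled(c(1, 0, 0, 1),
                      labels = c(car = 1, bus = 0),
                      label  = "Chosen mode"),
  TTcar    = c(20, 25, 30, 15)
)

# Remove variable labels, format attributes and value labels
tbl <- zap_labels(zap_formats(zap_label(tbl_spss)))

# Apollo expects a plain data frame called 'database'
database <- as.data.frame(tbl)

class(tbl_spss$choice)  # a haven labelled class before zapping
class(database$choice)  # a plain numeric vector afterwards
class(database)         # "data.frame"
```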

Step 3 - Define model parameters

We need to define a vector apollo_beta, which contains the parameters and their starting values. We initialise them to zero. In the commented example, we estimate alternative specific constants and beta parameters for travel time and travel cost. We also need to define another vector, apollo_fixed, which contains the names of the parameters whose values are kept fixed at their starting values. Because only differences in utility affect the choice probabilities, only one of the two alternative specific constants can be estimated, so in the example we fix the constant for bus, i.e. asc_bus is fixed at 0.

# --------------------------------------------------------------------------------
### Step 3 - Define model parameters

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c("INITIALIZE PARAMETERS")
# e.g.
# apollo_beta = c(asc_car = 0,
#                 asc_bus = 0,
#                 b_tt    = 0,
#                 b_tc    = 0)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("ENTER PARAMETER(S) TO BE KEPT FIXED")
# e.g.
# apollo_fixed = c("asc_bus")
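The short base R sketch below fills in the commented example, checks that every name in apollo_fixed also appears in apollo_beta, and lists the parameters that will actually be estimated. The check and the name free_params are our own illustration, not Apollo functions.

```r
# Starting values for the commented example: all parameters start at zero
apollo_beta <- c(asc_car = 0,
                 asc_bus = 0,
                 b_tt    = 0,
                 b_tc    = 0)

# asc_bus is normalised to zero, so it is held fixed during estimation
apollo_fixed <- c("asc_bus")

# Every fixed name must also appear in apollo_beta
stopifnot(all(apollo_fixed %in% names(apollo_beta)))

# Parameters that will actually be estimated (hypothetical helper variable)
free_params <- apollo_beta[!names(apollo_beta) %in% apollo_fixed]
names(free_params)  # "asc_car" "b_tt" "b_tc"
```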

Step 4 - Validate Inputs

This function runs a number of checks and produces a consolidated list of model inputs. It looks for the following inputs in the global environment:

apollo_control
apollo_beta
apollo_fixed
database

It also checks that the identifier column indivID (declared in step 1-2) exists in database.

If any of these is missing from the global environment, apollo_validateInputs() stops with an error.

# --------------------------------------------------------------------------------
### Step 4 - Validate inputs

apollo_inputs = apollo_validateInputs()
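As a rough illustration of the kind of checks described above, here is a hypothetical helper, check_apollo_inputs(), written in base R. It is not part of Apollo and covers only a few of the checks apollo_validateInputs() performs. It looks for the objects in a chosen environment (a separate demo environment here, so the global environment is left untouched).

```r
# Hypothetical helper, NOT part of Apollo: imitates a few of the checks
# that apollo_validateInputs() performs before estimation
check_apollo_inputs <- function(env = globalenv()) {
  required <- c("apollo_control", "apollo_beta", "apollo_fixed", "database")
  found    <- vapply(required, exists, logical(1), envir = env, inherits = FALSE)
  if (!all(found)) {
    stop("Missing from environment: ", paste(required[!found], collapse = ", "))
  }
  ctrl  <- get("apollo_control", envir = env)
  db    <- get("database", envir = env)
  beta  <- get("apollo_beta", envir = env)
  fixed <- get("apollo_fixed", envir = env)

  if (!is.data.frame(db)) stop("'database' must be a data.frame")
  if (!ctrl$indivID %in% names(db)) {
    stop("indivID column '", ctrl$indivID, "' not found in database")
  }
  if (!all(fixed %in% names(beta))) {
    stop("every name in apollo_fixed must appear in apollo_beta")
  }
  invisible(TRUE)
}

# Demo in a separate environment so the global environment is not touched
demo_env <- new.env()
demo_env$apollo_control <- list(modelName = "Model_1", modelDescr = "demo",
                                indivID = "PersonID")
demo_env$apollo_beta  <- c(asc_car = 0, asc_bus = 0, b_tt = 0, b_tc = 0)
demo_env$apollo_fixed <- c("asc_bus")
demo_env$database     <- data.frame(PersonID = c(1, 1, 2), choice = c(1, 0, 1))

check_apollo_inputs(demo_env)   # passes silently (returns TRUE invisibly)

# Removing 'database' makes the check fail
rm("database", envir = demo_env)
tryCatch(check_apollo_inputs(demo_env),
         error = function(e) conditionMessage(e))
# "Missing from environment: database"
```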

Step 5 - Define apollo probabilities

Unlike the other Apollo functions, which are predefined, apollo_probabilities() must be defined by the user. It is used for model estimation by another function, apollo_estimate(), in step 6. apollo_probabilities() takes three inputs:

apollo_beta
apollo_inputs
functionality, which takes the default value “estimate”

This step can be broken into three sub-steps.

Step 5-1 Attach elements

The first line of the function, apollo_attach(apollo_beta, apollo_inputs), lets us refer to the columns of database and to the parameters in apollo_beta directly by name, e.g. TTcar instead of database$TTcar. The command on.exit(apollo_detach(apollo_beta, apollo_inputs)) reverses the first command as soon as the code exits apollo_probabilities(). The details can be read in section 4.5 of the manual.
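The attach/detach pattern is easier to see in plain R. The sketch below uses base R's attach() and detach() inside a hypothetical function, mean_time_gap(), with made-up travel times. apollo_attach() has its own implementation, so this is only an analogy: columns become visible by name inside the function, and on.exit() guarantees they are detached again when the function exits, even after an error.

```r
# Made-up toy database
database <- data.frame(TTcar = c(20, 35), TTbus = c(30, 40))

# Hypothetical function illustrating the attach / on.exit(detach) pattern
mean_time_gap <- function(db) {
  # Put the columns of db on the search path, so TTcar can be used
  # without writing db$TTcar
  attach(db, name = "toy_db", warn.conflicts = FALSE)
  # Registered cleanup: runs whenever the function exits, normally or by error
  on.exit(detach("toy_db", character.only = TRUE))
  mean(TTbus - TTcar)
}

mean_time_gap(database)   # 7.5
"toy_db" %in% search()    # FALSE: detached again on exit
```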

Step 5-2 Calculate probabilities

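We define the actual model, i.e. a list of utility equations V, and then calculate the list of probabilities P using the function apollo_mnl(). With only two alternatives, the multinomial logit (MNL) model reduces to the binary logit model. The function apollo_mnl() takes two inputs:

mnl_settings: a list containing the settings of the model component:

alternatives: a named vector containing the names of the alternatives as defined by the user, together with the value that identifies each of them in the choice column
avail: the availability of each alternative; avail = 1 means that all alternatives are always available
choiceVar: a vector containing the chosen alternative for each observation
utilities: a list object containing one utility for each alternative, i.e. the list V defined earlier

functionality: passed on from apollo_probabilities(); the default value “estimate” is used for model estimation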
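For the commented car/bus example, the calculation apollo_mnl() performs can be written out by hand. The sketch below uses made-up parameter values and two made-up observations. It computes the logit probabilities from the utilities and shows that, with two alternatives, they reduce to the logistic function of the utility difference. For estimation, what matters is the probability of the alternative actually chosen, which is picked out using the choice column coded as in alternatives (car = 1, bus = 0).

```r
# Made-up parameter values, for illustration only
asc_car <- 0.5;   asc_bus <- 0     # asc_bus fixed at 0, as in apollo_fixed
b_tt    <- -0.05; b_tc    <- -0.2

# Two made-up observations
TTcar <- c(20, 35); TCcar <- c(3, 4)
TTbus <- c(30, 40); TCbus <- c(1, 2)

# Utilities, mirroring the commented V[["car"]] and V[["bus"]] lines
V_car <- asc_car + b_tt * TTcar + b_tc * TCcar
V_bus <- asc_bus + b_tt * TTbus + b_tc * TCbus

# Logit probabilities for the two alternatives
P_car <- exp(V_car) / (exp(V_car) + exp(V_bus))
P_bus <- 1 - P_car

# With two alternatives this equals the logistic function of V_car - V_bus
all.equal(P_car, plogis(V_car - V_bus))   # TRUE

# Probability of the chosen alternative (choice: 1 = car, 0 = bus)
choice   <- c(1, 0)
P_chosen <- ifelse(choice == 1, P_car, P_bus)
```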

Step 5-3 Calculate likelihood

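If there are multiple choice observations for each individual, we take the product of the choice probabilities across each individual's observations using the function apollo_panelProd() (commented out in the template below). Finally, apollo_prepareProb() prepares the output of the function and we return the object P.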
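The panel product can also be sketched in a few lines of base R. The probabilities below are made up, and tapply() multiplies each person's probabilities, which is conceptually what apollo_panelProd() does. For a model without random parameters, like this binary logit, the log of the product equals the sum of the logs, so the log-likelihood itself does not change; the product becomes essential once random parameters (e.g. in mixed logit) link the choices of the same person.

```r
# Made-up probabilities of the chosen alternative, one row per observation
obs <- data.frame(PersonID = c(1, 1, 1, 2, 2),
                  P_chosen = c(0.6, 0.7, 0.8, 0.5, 0.9))

# Likelihood of each individual: product over that person's observations
L_indiv <- tapply(obs$P_chosen, obs$PersonID, prod)
L_indiv   # person 1: 0.336, person 2: 0.45

# Model log-likelihood: sum of the log individual likelihoods
LL <- sum(log(L_indiv))

# Without random parameters this equals the sum over observations
all.equal(LL, sum(log(obs$P_chosen)))   # TRUE
```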

# --------------------------------------------------------------------------------
### Step 5 - Define apollo probabilities

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

  
  ### Step 5-1 ATTACH INPUTS AND DETACH AFTER FUNCTION EXIT ********************
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### DEFINE UTILITY EQUATIONS HERE *************************************
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  V[["ENTER_MODE_1"]]  = ENTER UTILITY EQUATION FOR MODE 1
  V[["ENTER_MODE_2"]]  = ENTER UTILITY EQUATION FOR MODE 2
  # e.g.
  # V[["car"]]  = asc_car + b_tt * TTcar + b_tc * TCcar
  # V[["bus"]]  = asc_bus + b_tt * TTbus + b_tc * TCbus
  
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(ENTER_MODE_1 = IDENTIFIER_MODE_1, 
                      ENTER_MODE_2 = IDENTIFIER_MODE_2), 
    avail         = 1, 
    choiceVar     = ENTER_CHOICE_COLUMN_NAME,
    V             = V)
  # e.g.
  # mnl_settings = list(
  #   alternatives  = c(car = 1, bus = 0), 
  #   avail         = 1, 
  #   choiceVar     = choice,
  #   V             = V)
  
  # *************************************************************************************
  
  
  ### Create list of probabilities P
  P = list()
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)

  ### Take product across observations for the same individual
  ### (uncomment if each individual makes more than one choice)
  # P = apollo_panelProd(P, apollo_inputs, functionality)

  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  
  return(P)
}

Step 6 - Estimate model

We perform model estimation by calling the function apollo_estimate() and saving its output in an object called model. For classical estimation, this function uses the maxLik package. You can read the details in section 4.6 of the manual.

# --------------------------------------------------------------------------------
### Step 6 - Estimate model

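# writeIter = TRUE saves the parameter values at each iteration to a CSV file
model = apollo_estimate(apollo_beta, 
                        apollo_fixed, 
                        apollo_probabilities, 
                        apollo_inputs, 
                        estimate_settings = list(writeIter = TRUE))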
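To demystify what apollo_estimate() does, the sketch below estimates the same binary logit by maximum likelihood on simulated data, using base R's optim() (BFGS) rather than maxLik, which Apollo uses. The data, the "true" parameter values used for simulation and the sample size are all assumptions for illustration. As in the template, asc_bus is fixed at 0 and the other parameters start from zero. Because a binary logit with asc_bus = 0 is a logistic regression on utility differences, glm() gives a convenient cross-check.

```r
set.seed(123)
n <- 2000

# Simulated toy data (hypothetical): times in minutes, costs in currency units
TTcar <- runif(n, 10, 60); TTbus <- runif(n, 15, 80)
TCcar <- runif(n, 2, 8);   TCbus <- runif(n, 1, 4)

# Assumed "true" parameters, used only to simulate choices (1 = car, 0 = bus)
V_diff <- 0.3 - 0.05 * (TTcar - TTbus) - 0.4 * (TCcar - TCbus)
choice <- rbinom(n, 1, plogis(V_diff))

# Negative log-likelihood of the binary logit; asc_bus is fixed at 0
neg_LL <- function(beta) {
  asc_car <- beta[1]; b_tt <- beta[2]; b_tc <- beta[3]
  V_car <- asc_car + b_tt * TTcar + b_tc * TCcar
  V_bus <- 0       + b_tt * TTbus + b_tc * TCbus
  d <- V_car - V_bus
  # log P(car) = log plogis(d) and log P(bus) = log plogis(-d), computed stably
  -sum(ifelse(choice == 1, plogis(d, log.p = TRUE), plogis(-d, log.p = TRUE)))
}

# Start from zeros, as in apollo_beta; maximising LL = minimising -LL
start <- c(asc_car = 0, b_tt = 0, b_tc = 0)
fit <- optim(start, neg_LL, method = "BFGS", control = list(maxit = 500))

fit$par     # estimates (compare with the simulation values 0.3, -0.05, -0.4)
-fit$value  # final log-likelihood

# Cross-check: the same model as a logistic regression on utility differences
glm_fit <- glm(choice ~ I(TTcar - TTbus) + I(TCcar - TCbus), family = binomial)
coef(glm_fit)
```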

Step 7 - Model output

After completing model estimation, we can print the results to the console with apollo_modelOutput() and/or save them to a set of output files, named after modelName, with apollo_saveOutput().

# --------------------------------------------------------------------------------
### Step 7 - Model output

apollo_modelOutput(model)

# save output 
apollo_saveOutput(model)
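The output from apollo_modelOutput() includes, among other things, the estimates with their standard errors and t-ratios, and fit statistics such as LL(0), LL(final) and rho-squared. The sketch below shows where such numbers come from, using glm() on simulated data (all values are assumptions for illustration) for the same binary logit with asc_bus fixed at 0. It does not reproduce Apollo's output format or its robust standard errors, and the file name toy_estimates.csv is hypothetical.

```r
set.seed(123)
n <- 2000

# Simulated toy data (hypothetical), as in a binary car/bus choice
TTcar <- runif(n, 10, 60); TTbus <- runif(n, 15, 80)
TCcar <- runif(n, 2, 8);   TCbus <- runif(n, 1, 4)
choice <- rbinom(n, 1, plogis(0.3 - 0.05 * (TTcar - TTbus) - 0.4 * (TCcar - TCbus)))

# Binary logit with asc_bus fixed at 0, fitted as a logistic regression
fit <- glm(choice ~ I(TTcar - TTbus) + I(TCcar - TCbus), family = binomial)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))   # classical standard errors
results <- data.frame(estimate = est, s.e. = se, t.ratio = est / se,
                      row.names = c("asc_car", "b_tt", "b_tc"))

LL_final <- as.numeric(logLik(fit))
LL_0     <- n * log(0.5)       # all parameters at zero: equal shares
K        <- length(est)        # number of estimated parameters

rho2     <- 1 - LL_final / LL_0
adj_rho2 <- 1 - (LL_final - K) / LL_0

print(round(results, 4))
cat("LL(0):", round(LL_0, 2), " LL(final):", round(LL_final, 2),
    " rho2:", round(rho2, 4), " adj. rho2:", round(adj_rho2, 4), "\n")

# Save the table to a file, similar in spirit to apollo_saveOutput()
write.csv(results, file.path(tempdir(), "toy_estimates.csv"))
```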