Chapter 4 of the Apollo manual also explains in detail how the Apollo package works. However, I feel that the details in chapter 4 can be overwhelming for those who are new to discrete choice modelling. Therefore, I have created a simple template for a binary logit model in this document to walk you through all the steps. The structure of the code will generally remain similar for more complex choice models (multinomial logit, ordered logit, latent class, mixed logit, etc.). We will later add more elements to the code for other models.
There are also many examples, along with a few datasets, which can be downloaded from the Apollo choice modelling website for a more detailed understanding.
The code is broken down into a series of steps. It deviates very slightly from the manual in step 2, because we will use the tidyverse package to read, analyse and preprocess the data. It is more flexible and a very useful tool for data analysis.
Step 1 - Initialisation
Step 2 - Data
Step 3 - Define model parameters
Step 4 - Validate Inputs
Step 5 - Define apollo probabilities function
Step 6 - Estimate model
Step 7 - Print and save output
Create a folder named Apollo Binary Logit. Start a new script by clicking on File %>% New File %>% R Script (read %>% as “then”). The script pane will appear at the top left. Save the script as BL_Template.R in the same folder (Apollo Binary Logit) by clicking File %>% Save. We don’t need to type the extension .R, just as we don’t type the extension .png when saving an image.
The first step is to initialise the code. We can break it into two sub-steps. You can read more on page 20 of the manual.
We first clear the workspace, set the working directory, load the relevant libraries and then call the function apollo_initialise(). Please read section 4.1 in the manual for details about the function.
We will load three packages into the R environment: tidyverse, apollo and haven.
# --------------------------------------------------------------------------------
### Step 1 - Initialise the code
#### Step 1-1 - Load libraries
# clear workspace
rm(list = ls())
# set working directory
setwd("ENTER PATH") # use / instead of \ in the path
# e.g. setwd("C:/Users/gulhare.s/Desktop/Discrete choice analysis/Apollo Binary Logit")
# load relevant libraries
library(tidyverse) # for data analysis
library(apollo) # for discrete choice modeling
library(haven) # for spss file
# mandatory step
apollo_initialise()
Next, we set the core controls. In this case, we give the name and description of the model. We also identify the column which contains the identifier of the individual decision makers. The detailed list of core controls can be found on page 21 of the manual.
# --------------------------------------------------------------------------------
#### Step 1-2 - Set Apollo controls
apollo_control = list(
modelName ="ENTER MODEL_NAME",
modelDescr =" ",
indivID ="ENTER INDIVIDUAL IDENTIFIER"
)
# e.g.
# apollo_control = list(
# modelName ="Model_1",
# modelDescr ="Only travel attributes",
# indivID ="PersonID"
# )
We can break this step into two sub-steps. In step 2-1, we read the data from files (.csv, .sav). If required, in step 2-2, we clean and transform the data.
You can read in detail about reading/importing csv files in chapter 11 of the book R for Data Science.
# --------------------------------------------------------------------------------
### Step 2 - Data
#### Step 2-1 - Read data
# TO READ CSV FILE (OPTIONAL)
tbl <- read_csv("ENTER FILE NAME")
# e.g. tbl <- read_csv("Data.csv")
# READ SPSS FILE (OPTIONAL)
tbl_spss <- read_sav("ENTER FILE NAME")
# e.g. tbl_spss <- read_sav("Data.sav")
This step involves cleaning and transforming the data. If we read the data from an SPSS file, we need to remove labels and attributes from the tibble using the command below. Note that we are working with the data type tibble, but Apollo requires the data in the form of a data frame, not a tibble. So we convert the tibble into a data frame using the function as.data.frame() and assign it to a variable called database. The final data for discrete choice modelling needs to be stored as database.
# ----------------------------------------------------------------------------------
#### Step 2-2 - Clean and transform data
# If we read data from SPSS file, we need to remove labels/label/attributes from tibble (OPTIONAL)
tbl <- zap_labels(zap_formats(zap_label(tbl_spss)))
# convert tibble into dataframe
database <- as.data.frame(tbl)
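Before handing database to Apollo, a quick sanity check can catch coding problems early. The sketch below uses a small made-up data frame in place of real data. The column names (PersonID, choice, TTcar, TTbus, TCcar, TCbus) follow the commented examples in this template and are assumptions about your own dataset.

```r
# Toy data standing in for the real `database` (all values are invented)
database <- data.frame(
  PersonID = 1:4,
  choice   = c(1, 0, 1, 1),      # 1 = car, 0 = bus, as in the commented example
  TTcar    = c(20, 35, 15, 25),
  TTbus    = c(30, 30, 40, 45),
  TCcar    = c(5, 6, 4, 5),
  TCbus    = c(2, 2, 3, 3)
)

# The template stores the data as a plain data.frame, not a tibble
is_plain_df <- identical(class(database), "data.frame")

# The choice column should only contain the codes used for the alternatives
choice_ok <- all(database$choice %in% c(0, 1))

# Number of missing values in each column
n_missing <- colSums(is.na(database))

is_plain_df
choice_ok
n_missing
```

If any of these checks fails on your data, it is usually easier to fix the problem in step 2-2 than to debug it later during estimation.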
We need to define a named vector apollo_beta, which contains the parameters and their starting values. We initialise them to zero. In the commented example, we estimate alternative-specific constants and beta parameters for travel time and travel cost. We also need to define another vector, apollo_fixed, which contains the names of the parameters whose values are kept fixed at their starting values. In the example, we fix the alternative-specific constant of bus, i.e. asc_bus is fixed to 0.
# --------------------------------------------------------------------------------
### Step 3 - Define model parameters
### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c("INITIALIZE PARAMETERS")
# e.g.
# apollo_beta = c(asc_car = 0,
# asc_bus = 0,
# b_tt = 0,
# b_tc = 0)
### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("ENTER PARAMETER THAT HAS TO BE FIXED")
# e.g.
# apollo_fixed = c("asc_bus")
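One constant is fixed because only differences in utility affect logit probabilities, so the two constants cannot both be estimated. The following base R sketch illustrates this with invented attribute values and coefficients. The helper p_car() is hypothetical and is not part of the apollo package.

```r
# Binary logit probability of choosing car, given the two utilities
# (hypothetical helper, written only for this illustration)
p_car <- function(V_car, V_bus) exp(V_car) / (exp(V_car) + exp(V_bus))

# Invented attribute values and coefficients
TTcar <- 20; TTbus <- 30; TCcar <- 5; TCbus <- 2
b_tt <- -0.05; b_tc <- -0.20

# Case 1: asc_car = 0.5, asc_bus fixed at 0
p1 <- p_car(0.5 + b_tt * TTcar + b_tc * TCcar,
            0.0 + b_tt * TTbus + b_tc * TCbus)

# Case 2: the same constant (10) added to both ASCs
p2 <- p_car(10.5 + b_tt * TTcar + b_tc * TCcar,
            10.0 + b_tt * TTbus + b_tc * TCbus)

# The two probabilities agree up to floating point error
all.equal(p1, p2)
```

Because shifting both constants by the same amount leaves the probability unchanged, the data cannot distinguish between the two cases, and fixing asc_bus at 0 pins down asc_car as the constant of car relative to bus.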
This function runs a number of checks and produces a consolidated list of model inputs. It looks for various inputs in the global environment -
apollo_control
apollo_beta
apollo_fixed
database
indivID (which is declared in step 1-2) in the database
If any of these is missing from the global environment, then apollo_validateInputs() fails.
# --------------------------------------------------------------------------------
### Step 4 - Validate inputs
apollo_inputs = apollo_validateInputs()
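If the validation fails, it can help to check by hand which of the required objects is missing. The sketch below is a hypothetical helper in base R, not a function from the apollo package, and it only checks that the objects exist, not that their contents are valid.

```r
# Hypothetical helper: return the names of required objects that are
# missing from the given environment (defaults to the global environment)
check_apollo_objects <- function(env = globalenv()) {
  required <- c("apollo_control", "apollo_beta", "apollo_fixed", "database")
  present  <- vapply(required, exists, logical(1), envir = env, inherits = FALSE)
  required[!present]
}

# Example with a scratch environment that holds only two of the four objects
scratch <- new.env()
assign("apollo_beta", c(asc_car = 0, asc_bus = 0), envir = scratch)
assign("apollo_fixed", "asc_bus", envir = scratch)

missing_objects <- check_apollo_objects(scratch)
missing_objects
```

In a real script you would call check_apollo_objects() without arguments just before apollo_validateInputs(). An empty result means all four objects were found.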
Unlike other functions which are predefined, apollo_probabilities() needs to be defined by the user. The function is used for model estimation by another function called apollo_estimate() in step 6. The function apollo_probabilities() takes three inputs -
apollo_beta
apollo_inputs
functionality, which takes a default value “estimate”
The step can be seen as three sub-steps.
The initial lines of the code apollo_attach(apollo_beta, apollo_inputs) enables us to call individual elements of database e.g. using TTcar instead of database$TTcar. The command on.exit(apollo_detach(apollo_beta, apollo_inputs)) reverses the first command as soon as the code exits the apollo_probabilities(). The details can be read in section 4.5 of the manual.
We define the actual model, i.e. a list of utility equations V, and then calculate the list of probabilities P using the function apollo_mnl(). The function apollo_mnl() takes two inputs:
mnl_settings: a list of settings for the MNL component; the template below supplies alternatives, avail, choiceVar and V
functionality: takes a default value for model estimation
If there are multiple choice observations for each individual, we take the product of the choice probabilities using the function apollo_panelProd(). Finally, we pass P through apollo_prepareProb() and return the object P.
# --------------------------------------------------------------------------------
### Step 5 - Define apollo probabilities
apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
### Step 5-1 ATTACH INPUTS AND DETACH AFTER FUNCTION EXIT ********************
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
### DEFINE UTILITY EQUATIONS HERE *************************************
### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
V = list()
V[["ENTER_MODE_1"]] = ENTER UTILITY EQUATION FOR MODE 1
V[["ENTER_MODE_2"]] = ENTER UTILITY EQUATION FOR MODE 2
# e.g.
# V[["car"]] = asc_car + b_tt * TTcar + b_tc * TCcar
# V[["bus"]] = asc_bus + b_tt * TTbus + b_tc * TCbus
### Define settings for MNL model component
mnl_settings = list(
alternatives = c(ENTER_MODE_1 = IDENTIFIER_MODE_1,
ENTER_MODE_2 = IDENTIFIER_MODE_2),
avail = 1,
choiceVar = ENTER_CHOICE_COLUMN_NAME,
V = V)
# e.g.
# mnl_settings = list(
# alternatives = c(car = 1, bus = 0),
# avail = 1,
# choiceVar = choice,
# V = V)
# *************************************************************************************
### Create list of probabilities P
P = list()
### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)
# ### Take product across observation for same individual
# P = apollo_panelProd(P, apollo_inputs, functionality)
### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}
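For the two-alternative example in the comments (car and bus), the probabilities computed in this step follow the standard logit formula. The notation is introduced here only for illustration: n indexes observations, and y_n equals 1 when car is chosen and 0 when bus is chosen.

```latex
P_n(\text{car})
  = \frac{e^{V_{n,\text{car}}}}{e^{V_{n,\text{car}}} + e^{V_{n,\text{bus}}}}
  = \frac{1}{1 + e^{-(V_{n,\text{car}} - V_{n,\text{bus}})}},
\qquad
P_n(\text{bus}) = 1 - P_n(\text{car})

LL(\beta) = \sum_{n} \Big[ y_n \ln P_n(\text{car}) + (1 - y_n) \ln P_n(\text{bus}) \Big]
```

The log-likelihood LL is the quantity that is maximised over the free parameters during estimation.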
We can perform model estimation by calling the function apollo_estimate() and saving the output from it in an object called model. This function uses the maxLik package for classical estimation. You can read details in section 4.6 of the manual.
# --------------------------------------------------------------------------------
### Step 6 - Estimate model
model = apollo_estimate(apollo_beta,
apollo_fixed,
apollo_probabilities,
apollo_inputs,
estimate_settings = list(writeIter = TRUE))
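To build intuition for what estimation does, the sketch below fits a binary logit model by maximum likelihood using only base R. It is not Apollo's implementation. The data are simulated, and all variable names and "true" parameter values are invented for illustration.

```r
set.seed(1)
n   <- 2000
dtt <- rnorm(n, 0, 10)   # travel time difference, car minus bus (simulated)
dtc <- rnorm(n, 0, 3)    # travel cost difference, car minus bus (simulated)

# "True" parameters, used only to simulate the choices
true_beta <- c(asc_car = 0.5, b_tt = -0.05, b_tc = -0.20)
v_true <- true_beta[["asc_car"]] + true_beta[["b_tt"]] * dtt + true_beta[["b_tc"]] * dtc
choice <- rbinom(n, 1, plogis(v_true))   # 1 = car, 0 = bus

# Negative log-likelihood of the binary logit model (asc_bus fixed at 0)
neg_loglik <- function(beta) {
  v_diff <- beta[1] + beta[2] * dtt + beta[3] * dtc   # V_car - V_bus
  -sum(choice * plogis(v_diff, log.p = TRUE) +
       (1 - choice) * plogis(-v_diff, log.p = TRUE))
}

# Start from zeros, as in apollo_beta, and minimise the negative log-likelihood
fit <- optim(c(asc_car = 0, b_tt = 0, b_tc = 0), neg_loglik,
             method = "BFGS", hessian = TRUE)

estimates  <- fit$par
std_errors <- sqrt(diag(solve(fit$hessian)))   # from the inverse Hessian

round(cbind(estimates, std_errors), 4)
```

The estimates should land close to the values used in the simulation, with standard errors derived from the curvature of the log-likelihood at the optimum. apollo_estimate() reports the same kind of quantities for the model defined in apollo_probabilities().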
After completing model estimation, the user can output the results to the console and/or to a set of different output files.
# --------------------------------------------------------------------------------
### Step 7 - Model output
apollo_modelOutput(model)
# save output
apollo_saveOutput(model)