We will start estimating models. Since you have already gained some experience with R, we jump straight into the first part of our choice experiment analysis.
We continue with the festival example. We have added one more alternative: "no festival".
First, we read in the packages, set the seed, and clear the workspace. The seed determines the draws from the random number generator. We need random numbers when we simulate the unobserved part of utility. With a fixed seed, you will always get exactly the same random numbers.
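As a tiny illustration of what the seed does (a sketch, not part of the analysis script): setting the same seed reproduces exactly the same draws.
set.seed(42); rnorm(3) # three random numbers
set.seed(42); rnorm(3) # the exact same three numbers again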
rm(list=ls())
library("readr")
library("psych")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("evd")
library("apollo")
##
## [Apollo ASCII start-up logo]
##
## Apollo 0.3.4
## http://www.ApolloChoiceModelling.com
## See url for a detailed manual, examples and a user forum.
## Sign up to the user forum to receive updates on new releases.
##
## Please cite Apollo in all written material you produce:
## Hess S, Palma D (2019). "Apollo: a flexible, powerful and customisable
## freeware package for choice model estimation and application." Journal
## of Choice Modelling, 32. doi:10.1016/j.jocm.2019.100170
library("tidyr")
library("kableExtra")
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library("tidylog")
##
## Attaching package: 'tidylog'
## The following objects are masked from 'package:tidyr':
##
## drop_na, fill, gather, pivot_longer, pivot_wider, replace_na,
## spread, uncount
## The following objects are masked from 'package:dplyr':
##
## add_count, add_tally, anti_join, count, distinct, distinct_all,
## distinct_at, distinct_if, filter, filter_all, filter_at, filter_if,
## full_join, group_by, group_by_all, group_by_at, group_by_if,
## inner_join, left_join, mutate, mutate_all, mutate_at, mutate_if,
## relocate, rename, rename_all, rename_at, rename_if, rename_with,
## right_join, sample_frac, sample_n, select, select_all, select_at,
## select_if, semi_join, slice, slice_head, slice_max, slice_min,
## slice_sample, slice_tail, summarise, summarise_all, summarise_at,
## summarise_if, summarize, summarize_all, summarize_at, summarize_if,
## tally, top_frac, top_n, transmute, transmute_all, transmute_at,
## transmute_if, ungroup
## The following object is masked from 'package:stats':
##
## filter
set.seed(48498877)
We use a design generated in NGENE. The design is in wide format, which is the format we need to estimate models in the R package apollo. However, designs may come in other formats, and if you use other packages (e.g. gmnl), you would need to reshape the data to long format.
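As a sketch of such a reshape (to be run once the design has been read in below, and assuming the alt<j>_<attribute> column naming used here), tidyr can split the alternative-specific columns into one row per alternative:
# Sketch: wide design to long format, one row per alternative and choice set
design_long <- design %>%
  pivot_longer(cols = starts_with("alt"),
               names_to = c("alt", ".value"),
               names_sep = "_")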
designfile = "design_festivals.RDS"
design <- readRDS(designfile) %>%
mutate(Choice.situation = row_number())
## mutate: new variable 'Choice.situation' (integer) with 20 unique values and 0% NA
kable(design) %>% kable_styling() ## Print the design. We use the kable function to make it look nice.
alt1_lineup | alt1_byob | alt1_size | alt1_holidays | alt1_price | alt2_lineup | alt2_byob | alt2_size | alt2_holidays | alt2_price | alt3_lineup | alt3_byob | alt3_size | alt3_holidays | alt3_price | block | Choice.situation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | 0 | 11.156250 | 1 | 100 | 5 | 0 | 8.006368 | 5 | 150 | 7 | 1 | 11.156250 | 5 | 50 | 2 | 1 |
7 | 1 | 8.987197 | 5 | 50 | 7 | 0 | 8.987197 | 5 | 150 | 5 | 1 | 8.006368 | 1 | 250 | 1 | 2 |
7 | 0 | 8.006368 | 1 | 100 | 5 | 0 | 11.156250 | 5 | 50 | 5 | 0 | 8.006368 | 5 | 250 | 1 | 3 |
5 | 1 | 8.006368 | 2 | 100 | 9 | 1 | 8.987197 | 5 | 200 | 7 | 1 | 11.156250 | 5 | 150 | 1 | 4 |
9 | 1 | 8.987197 | 1 | 250 | 9 | 0 | 8.987197 | 1 | 200 | 9 | 1 | 8.006368 | 5 | 200 | 1 | 5 |
9 | 0 | 8.987197 | 2 | 250 | 5 | 0 | 8.987197 | 5 | 150 | 9 | 0 | 11.156250 | 5 | 100 | 1 | 6 |
5 | 1 | 11.156250 | 2 | 200 | 5 | 1 | 8.987197 | 5 | 50 | 7 | 0 | 11.156250 | 2 | 150 | 1 | 7 |
5 | 1 | 11.156250 | 1 | 150 | 9 | 0 | 8.006368 | 2 | 150 | 9 | 0 | 8.987197 | 2 | 50 | 1 | 8 |
9 | 1 | 11.156250 | 5 | 50 | 5 | 0 | 11.156250 | 1 | 150 | 5 | 0 | 8.987197 | 1 | 150 | 2 | 9 |
5 | 0 | 11.156250 | 2 | 150 | 7 | 1 | 11.156250 | 5 | 200 | 9 | 0 | 8.987197 | 5 | 250 | 2 | 10 |
5 | 1 | 8.006368 | 5 | 50 | 9 | 0 | 8.987197 | 2 | 150 | 7 | 0 | 8.006368 | 1 | 200 | 2 | 11 |
5 | 1 | 8.987197 | 2 | 150 | 9 | 0 | 8.006368 | 1 | 50 | 9 | 0 | 11.156250 | 5 | 100 | 2 | 12 |
7 | 0 | 8.006368 | 2 | 150 | 7 | 1 | 8.987197 | 5 | 50 | 7 | 0 | 8.006368 | 2 | 250 | 2 | 13 |
5 | 0 | 8.987197 | 2 | 250 | 5 | 0 | 8.987197 | 2 | 50 | 7 | 1 | 8.006368 | 1 | 250 | 1 | 14 |
7 | 1 | 11.156250 | 5 | 200 | 7 | 0 | 8.987197 | 5 | 150 | 9 | 1 | 8.006368 | 5 | 150 | 2 | 15 |
5 | 1 | 8.006368 | 2 | 200 | 7 | 0 | 8.006368 | 2 | 200 | 7 | 1 | 8.006368 | 1 | 150 | 2 | 16 |
5 | 1 | 11.156250 | 1 | 250 | 7 | 0 | 8.006368 | 5 | 200 | 5 | 1 | 8.987197 | 2 | 150 | 2 | 17 |
7 | 0 | 8.006368 | 1 | 250 | 5 | 0 | 11.156250 | 2 | 150 | 5 | 0 | 8.987197 | 5 | 100 | 2 | 18 |
5 | 0 | 11.156250 | 5 | 200 | 9 | 0 | 8.987197 | 2 | 100 | 5 | 1 | 8.006368 | 2 | 250 | 1 | 19 |
9 | 1 | 11.156250 | 2 | 100 | 5 | 1 | 8.987197 | 2 | 200 | 9 | 0 | 8.987197 | 5 | 50 | 1 | 20 |
Have a look at the output. Each row is one choice set. The design consists of 20 choice sets, separated into 2 blocks. There are three alternatives and an opt-out option. The opt-out option is not part of the design, so we do not see it here. Each alternative contains 5 attributes, of which price is the festival price.
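A quick way to verify these dimensions in the data (a small check, not required for the analysis):
nrow(design) # 20 choice sets
table(design$block) # 10 sets in each of the 2 blocks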
In this step, we define the coefficients (betas) for the utility functions and set other parameters such as the number of simulated respondents.
We can choose the betas freely and vary them as desired. However, note that higher beta values mean that the fixed variance of the error term is small relative to the deterministic part of utility, which makes the estimates more significant. For example, if you use beta values in the range of 100 to 200, you will get significant results with as few as 10 respondents. Can you explain why this is the case?
bwyld = 1
bcringe = 1
bsheesh = 1
blineup = 0.6
bbyob = 1
bsize = -0.1
bholidays = -0.4
bprice = -0.01
Next, we define some meta parameters for the simulation. The number of replications can be set by us.
replications <- 500 # number of replications of the design; you can increase this later and watch the standard errors go down
nsets <- nrow(design) ## Extract the number of choice sets from the design
nblocks <- max(design$block) ## Extract the number of blocks from the design
setpp <- nsets/nblocks # Choice sets per respondent; with 20 sets in 2 blocks, each respondent sees 10 sets
respondents <- replications*nblocks # Total number of simulated respondents
We will now generate a dataset based on the parameters above. We duplicate the design according to the number of replications, create a variable identifying the respondents (for panel data analysis), and compute deterministic utility.
Utility for alternative 1 is defined as
\[ V_1 = b_{wyld} + b_{lineup}*alt1.lineup + b_{byob} * alt1.byob + b_{size} * alt1.size + b_{holidays}*alt1.holidays + b_{price} * alt1.price \] and for alternative 2
\[ V_2 = b_{cringe} + b_{lineup}*alt2.lineup + b_{byob} * alt2.byob + b_{size} * alt2.size + b_{holidays}*alt2.holidays + b_{price} * alt2.price \] and for alternative 3
\[ V_3 = b_{sheesh} + b_{lineup}*alt3.lineup + b_{byob} * alt3.byob + b_{size} * alt3.size + b_{holidays}*alt3.holidays + b_{price} * alt3.price \]
The utility of the opt-out is set to zero
\[ V_{optout} = 0 \]
database<- design %>%
arrange(block,Choice.situation) %>%
dplyr::slice(rep(row_number(), replications)) %>% ## replicate design according to number of replications
dplyr::mutate(RID = rep(1:respondents, each=setpp)) %>% # create Respondent ID.
dplyr::relocate(RID,`Choice.situation`) %>%
dplyr::mutate(
V.1 = bwyld + blineup*alt1_lineup + bbyob * alt1_byob + bsize * alt1_size + bholidays * alt1_holidays + bprice * alt1_price , #Utility of Wyld
V.2 = bcringe + blineup*alt2_lineup + bbyob * alt2_byob + bsize * alt2_size + bholidays * alt2_holidays + bprice * alt2_price , #Utility of Cringe
V.3 = bsheesh + blineup*alt3_lineup + bbyob * alt3_byob + bsize * alt3_size + bholidays * alt3_holidays + bprice * alt3_price , #Utility of Sheesh
V.4 = 0 ) %>% # utility of opt out, normalized to zero
as.data.frame()
kable(head(database,20),digits = 2) %>% kable_styling()
RID | Choice.situation | alt1_lineup | alt1_byob | alt1_size | alt1_holidays | alt1_price | alt2_lineup | alt2_byob | alt2_size | alt2_holidays | alt2_price | alt3_lineup | alt3_byob | alt3_size | alt3_holidays | alt3_price | block | V.1 | V.2 | V.3 | V.4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 7 | 1 | 8.99 | 5 | 50 | 7 | 0 | 8.99 | 5 | 150 | 5 | 1 | 8.01 | 1 | 250 | 1 | 2.80 | 0.80 | 1.30 | 0 |
1 | 3 | 7 | 0 | 8.01 | 1 | 100 | 5 | 0 | 11.16 | 5 | 50 | 5 | 0 | 8.01 | 5 | 250 | 1 | 3.00 | 0.38 | -1.30 | 0 |
1 | 4 | 5 | 1 | 8.01 | 2 | 100 | 9 | 1 | 8.99 | 5 | 200 | 7 | 1 | 11.16 | 5 | 150 | 1 | 2.40 | 2.50 | 1.58 | 0 |
1 | 5 | 9 | 1 | 8.99 | 1 | 250 | 9 | 0 | 8.99 | 1 | 200 | 9 | 1 | 8.01 | 5 | 200 | 1 | 3.60 | 3.10 | 2.60 | 0 |
1 | 6 | 9 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 5 | 150 | 9 | 0 | 11.16 | 5 | 100 | 1 | 2.20 | -0.40 | 2.28 | 0 |
1 | 7 | 5 | 1 | 11.16 | 2 | 200 | 5 | 1 | 8.99 | 5 | 50 | 7 | 0 | 11.16 | 2 | 150 | 1 | 1.08 | 1.60 | 1.78 | 0 |
1 | 8 | 5 | 1 | 11.16 | 1 | 150 | 9 | 0 | 8.01 | 2 | 150 | 9 | 0 | 8.99 | 2 | 50 | 1 | 1.98 | 3.30 | 4.20 | 0 |
1 | 14 | 5 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 2 | 50 | 7 | 1 | 8.01 | 1 | 250 | 1 | -0.20 | 1.80 | 2.50 | 0 |
1 | 19 | 5 | 0 | 11.16 | 5 | 200 | 9 | 0 | 8.99 | 2 | 100 | 5 | 1 | 8.01 | 2 | 250 | 1 | -1.12 | 3.70 | 0.90 | 0 |
1 | 20 | 9 | 1 | 11.16 | 2 | 100 | 5 | 1 | 8.99 | 2 | 200 | 9 | 0 | 8.99 | 5 | 50 | 1 | 4.48 | 1.30 | 3.00 | 0 |
2 | 1 | 9 | 0 | 11.16 | 1 | 100 | 5 | 0 | 8.01 | 5 | 150 | 7 | 1 | 11.16 | 5 | 50 | 2 | 3.88 | -0.30 | 2.58 | 0 |
2 | 9 | 9 | 1 | 11.16 | 5 | 50 | 5 | 0 | 11.16 | 1 | 150 | 5 | 0 | 8.99 | 1 | 150 | 2 | 3.78 | 0.98 | 1.20 | 0 |
2 | 10 | 5 | 0 | 11.16 | 2 | 150 | 7 | 1 | 11.16 | 5 | 200 | 9 | 0 | 8.99 | 5 | 250 | 2 | 0.58 | 1.08 | 1.00 | 0 |
2 | 11 | 5 | 1 | 8.01 | 5 | 50 | 9 | 0 | 8.99 | 2 | 150 | 7 | 0 | 8.01 | 1 | 200 | 2 | 1.70 | 3.20 | 2.00 | 0 |
2 | 12 | 5 | 1 | 8.99 | 2 | 150 | 9 | 0 | 8.01 | 1 | 50 | 9 | 0 | 11.16 | 5 | 100 | 2 | 1.80 | 4.70 | 2.28 | 0 |
2 | 13 | 7 | 0 | 8.01 | 2 | 150 | 7 | 1 | 8.99 | 5 | 50 | 7 | 0 | 8.01 | 2 | 250 | 2 | 2.10 | 2.80 | 1.10 | 0 |
2 | 15 | 7 | 1 | 11.16 | 5 | 200 | 7 | 0 | 8.99 | 5 | 150 | 9 | 1 | 8.01 | 5 | 150 | 2 | 1.08 | 0.80 | 3.10 | 0 |
2 | 16 | 5 | 1 | 8.01 | 2 | 200 | 7 | 0 | 8.01 | 2 | 200 | 7 | 1 | 8.01 | 1 | 150 | 2 | 1.40 | 1.60 | 3.50 | 0 |
2 | 17 | 5 | 1 | 11.16 | 1 | 250 | 7 | 0 | 8.01 | 5 | 200 | 5 | 1 | 8.99 | 2 | 150 | 2 | 0.98 | 0.40 | 1.80 | 0 |
2 | 18 | 7 | 0 | 8.01 | 1 | 250 | 5 | 0 | 11.16 | 2 | 150 | 5 | 0 | 8.99 | 5 | 100 | 2 | 1.50 | 0.58 | 0.10 | 0 |
Now look at the database. It contains the deterministic part of utility. This is exactly what we looked at in the theory slides.
So far, we have created a dataset with deterministic utility. In the Random Utility model, we assume that, besides the deterministic part \(V\), utility \(U\) has an unobserved part as well. For the logit model, we assume an additive error term \(\epsilon\) which is iid Gumbel distributed.
\[U = V+ \epsilon\]
We can take draws from the Gumbel distribution using the function rgumbel. Before we do this, let's try to understand this distribution and how the difference of two Gumbel draws results in a logistic distribution.
Use the function rgumbel to take 1000 random draws and store the results in an object e1. Take another 1000 draws and store them in e2. Plot the result.
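A possible solution sketch (rgumbel comes from the evd package loaded above): the histogram of e1 - e2 should closely follow a standard logistic density, which is exactly why differences in utility in a logit model are logistically distributed.
e1 <- rgumbel(1000, loc = 0, scale = 1) # 1000 standard Gumbel draws
e2 <- rgumbel(1000, loc = 0, scale = 1) # another 1000 draws
hist(e1 - e2, breaks = 50, freq = FALSE, main = "Difference of two Gumbel samples", xlab = "e1 - e2")
curve(dlogis(x), add = TRUE, col = "red", lwd = 2) # standard logistic density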
database <- database %>%
group_by(RID) %>%
mutate(
e.1 = rgumbel(setpp,loc=0, scale=1) , # random draws from gumbel
e.2 = rgumbel(setpp,loc=0, scale=1) ,
e.3 = rgumbel(setpp,loc=0, scale=1) ,
e.4 = rgumbel(setpp,loc=0, scale=1) ,
U.1 = V.1 + e.1 , # calculate total utility
U.2 = V.2 + e.2 ,
U.3 = V.3 + e.3 ,
U.4 = V.4 + e.4
) %>%
as.data.frame()
## group_by: one grouping variable (RID)
## mutate (grouped): new variable 'e.1' (double) with 10,000 unique values and 0% NA
## new variable 'e.2' (double) with 10,000 unique values and 0% NA
## new variable 'e.3' (double) with 10,000 unique values and 0% NA
## new variable 'e.4' (double) with 10,000 unique values and 0% NA
## new variable 'U.1' (double) with 10,000 unique values and 0% NA
## new variable 'U.2' (double) with 10,000 unique values and 0% NA
## new variable 'U.3' (double) with 10,000 unique values and 0% NA
## new variable 'U.4' (double) with 10,000 unique values and 0% NA
database$choice <- max.col(database[,c("U.1" , "U.2" , "U.3" , "U.4" )]) # the chosen alternative is the one with the highest total utility
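As a quick sanity check (a sketch, not required for the remaining steps): with iid Gumbel errors, the simulated choice shares should be close to the average logit probabilities implied by the deterministic utilities.
expV <- exp(as.matrix(database[, c("V.1", "V.2", "V.3", "V.4")]))
round(colMeans(expV / rowSums(expV)), 3) # average logit probabilities
round(prop.table(table(database$choice)), 3) # simulated choice shares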
We now have a dataset that looks like a real-world dataset. Yet some variables would be unknown in real data. Which ones?
kable(head(database, 9), digits = 2) %>% kable_styling()
RID | Choice.situation | alt1_lineup | alt1_byob | alt1_size | alt1_holidays | alt1_price | alt2_lineup | alt2_byob | alt2_size | alt2_holidays | alt2_price | alt3_lineup | alt3_byob | alt3_size | alt3_holidays | alt3_price | block | V.1 | V.2 | V.3 | V.4 | e.1 | e.2 | e.3 | e.4 | U.1 | U.2 | U.3 | U.4 | choice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 7 | 1 | 8.99 | 5 | 50 | 7 | 0 | 8.99 | 5 | 150 | 5 | 1 | 8.01 | 1 | 250 | 1 | 2.80 | 0.80 | 1.30 | 0 | 1.08 | 0.74 | -0.04 | 0.10 | 3.88 | 1.54 | 1.26 | 0.10 | 1 |
1 | 3 | 7 | 0 | 8.01 | 1 | 100 | 5 | 0 | 11.16 | 5 | 50 | 5 | 0 | 8.01 | 5 | 250 | 1 | 3.00 | 0.38 | -1.30 | 0 | 0.13 | 0.32 | -0.16 | -0.80 | 3.13 | 0.70 | -1.46 | -0.80 | 1 |
1 | 4 | 5 | 1 | 8.01 | 2 | 100 | 9 | 1 | 8.99 | 5 | 200 | 7 | 1 | 11.16 | 5 | 150 | 1 | 2.40 | 2.50 | 1.58 | 0 | 0.36 | -0.55 | 0.88 | -0.93 | 2.76 | 1.95 | 2.46 | -0.93 | 1 |
1 | 5 | 9 | 1 | 8.99 | 1 | 250 | 9 | 0 | 8.99 | 1 | 200 | 9 | 1 | 8.01 | 5 | 200 | 1 | 3.60 | 3.10 | 2.60 | 0 | 0.87 | -0.24 | -1.31 | -0.75 | 4.47 | 2.86 | 1.29 | -0.75 | 1 |
1 | 6 | 9 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 5 | 150 | 9 | 0 | 11.16 | 5 | 100 | 1 | 2.20 | -0.40 | 2.28 | 0 | 0.90 | -1.22 | -0.66 | 0.02 | 3.11 | -1.62 | 1.62 | 0.02 | 1 |
1 | 7 | 5 | 1 | 11.16 | 2 | 200 | 5 | 1 | 8.99 | 5 | 50 | 7 | 0 | 11.16 | 2 | 150 | 1 | 1.08 | 1.60 | 1.78 | 0 | 2.27 | -0.13 | -0.52 | 0.95 | 3.36 | 1.48 | 1.26 | 0.95 | 1 |
1 | 8 | 5 | 1 | 11.16 | 1 | 150 | 9 | 0 | 8.01 | 2 | 150 | 9 | 0 | 8.99 | 2 | 50 | 1 | 1.98 | 3.30 | 4.20 | 0 | 2.61 | 0.13 | -0.07 | -0.41 | 4.59 | 3.43 | 4.13 | -0.41 | 1 |
1 | 14 | 5 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 2 | 50 | 7 | 1 | 8.01 | 1 | 250 | 1 | -0.20 | 1.80 | 2.50 | 0 | -1.12 | 1.36 | -0.69 | 0.59 | -1.32 | 3.16 | 1.81 | 0.59 | 2 |
1 | 19 | 5 | 0 | 11.16 | 5 | 200 | 9 | 0 | 8.99 | 2 | 100 | 5 | 1 | 8.01 | 2 | 250 | 1 | -1.12 | 3.70 | 0.90 | 0 | -0.66 | -0.22 | -0.31 | 0.33 | -1.78 | 3.48 | 0.59 | 0.33 | 2 |
We can inspect whether the simulation is correct. Go to Wikipedia and check whether the mean and standard deviation of the theoretical Gumbel distribution are approximately equal to the means and standard deviations of the empirical (simulated) distributions.
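For reference, the theoretical mean of the standard Gumbel distribution is the Euler-Mascheroni constant (about 0.577) and its theoretical standard deviation is pi/sqrt(6) (about 1.283), which we can compute directly:
c(mean = -digamma(1), sd = pi / sqrt(6)) # theoretical Gumbel(0,1) moments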
summary(database[,c("e.1","e.2","e.3","e.4")])
## e.1 e.2 e.3 e.4
## Min. :-2.2601 Min. :-2.1620 Min. :-2.1961 Min. :-2.1932
## 1st Qu.:-0.3345 1st Qu.:-0.3209 1st Qu.:-0.3110 1st Qu.:-0.3339
## Median : 0.3731 Median : 0.3590 Median : 0.3689 Median : 0.3780
## Mean : 0.5728 Mean : 0.5851 Mean : 0.5791 Mean : 0.5830
## 3rd Qu.: 1.2449 3rd Qu.: 1.2568 3rd Qu.: 1.2487 3rd Qu.: 1.2748
## Max. : 7.7541 Max. : 9.1418 Max. :10.9442 Max. :10.1492
sd(database$e.2)
## [1] 1.282901
summary(database[,c("V.1","V.2","V.3","V.4")])
## V.1 V.2 V.3 V.4
## Min. :-1.116 Min. :-0.3987 Min. :-1.301 Min. :0
## 1st Qu.: 1.084 1st Qu.: 0.7471 1st Qu.: 1.176 1st Qu.:0
## Median : 1.893 Median : 1.4503 Median : 1.900 Median :0
## Mean : 1.953 Mean : 1.6974 Mean : 1.876 Mean :0
## 3rd Qu.: 2.851 3rd Qu.: 2.8763 3rd Qu.: 2.588 3rd Qu.:0
## Max. : 4.484 Max. : 4.6994 Max. : 4.201 Max. :0
summary(database[,c("U.1","U.2","U.3","U.4")])
## U.1 U.2 U.3 U.4
## Min. :-3.121 Min. :-2.3109 Min. :-3.377 Min. :-2.1932
## 1st Qu.: 1.249 1st Qu.: 0.8803 1st Qu.: 1.347 1st Qu.:-0.3339
## Median : 2.428 Median : 2.1178 Median : 2.385 Median : 0.3780
## Mean : 2.526 Mean : 2.2825 Mean : 2.455 Mean : 0.5830
## 3rd Qu.: 3.705 3rd Qu.: 3.5132 3rd Qu.: 3.500 3rd Qu.: 1.2748
## Max. :10.925 Max. :10.5040 Max. :14.444 Max. :10.1492
Finally, we can delete those variables that would not be observable in a real-world dataset.
database <- database %>%
select(-c(matches("\\.[1-4]$")))
## select: dropped 12 variables (V.1, V.2, V.3, V.4, e.1, …)
Having generated the dataset, we can now estimate a model. Here we use apollo to estimate a conditional logit model.
apollo_initialise()
modelOutput_settings = list(printPVal=T)
### Set core controls
apollo_control = list(
modelName ="Simulated Data",
modelDescr ="Simple MNL model",
indivID ="RID"
)
apollo_beta=c(b_wyld =0,
b_cringe =0,
b_sheesh =0,
b_lineup =0.5,
b_byob =0,
b_size =0,
b_holidays =0,
b_price =0 )
### No parameters are held fixed
apollo_fixed = c()
### Validate the inputs
apollo_inputs = apollo_validateInputs()
## Several observations per individual detected based on the value of RID.
## Setting panelData in apollo_control set to TRUE.
## All checks on apollo_control completed.
## All checks on database completed.
apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
### Function initialisation: do not change the following three commands
### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
### Create list of probabilities P
P = list()
### List of utilities (later integrated in mnl_settings below)
V = list()
V[['alt1']] = b_wyld + b_lineup*alt1_lineup + b_byob * alt1_byob + b_size * alt1_size + b_holidays * alt1_holidays + b_price * alt1_price
V[['alt2']] = b_cringe + b_lineup*alt2_lineup + b_byob * alt2_byob + b_size * alt2_size + b_holidays * alt2_holidays + b_price * alt2_price
V[['alt3']] = b_sheesh + b_lineup*alt3_lineup + b_byob * alt3_byob + b_size * alt3_size + b_holidays * alt3_holidays + b_price * alt3_price
V[['alt4']] = 0
### Define settings for MNL model component
mnl_settings = list(
alternatives = c(alt1=1, alt2=2, alt3=3, alt4=4) ,
avail = 1, # all alternatives are available in every choice
choiceVar = choice,
V = V # tell function to use list vector defined above
)
### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)
### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)
### Average across inter-individual draws - only needed for mixed logit!
### P = apollo_avgInterDraws(P, apollo_inputs, functionality)
### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}
model1 = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities, apollo_inputs,
estimate_settings=list(hessianRoutine="maxLik"))
## Preparing user-defined functions.
##
## Testing likelihood function...
##
## Overview of choices for MNL model component :
## alt1 alt2 alt3 alt4
## Times available 10000.00 10000.00 10000.00 10000.00
## Times chosen 3587.00 2929.00 3065.00 419.00
## Percentage chosen overall 35.87 29.29 30.65 4.19
## Percentage chosen when available 35.87 29.29 30.65 4.19
##
##
## Pre-processing likelihood function...
##
## Testing influence of parameters
## Starting main estimation
##
## BGW using analytic model derivatives supplied by caller...
##
##
## Iterates will be written to:
## /home/dj44vuri/Nextcloud/phd course uppsala 2024/Modelestimation/Simulated Data_iterations.csv
## it nf F RELDF PRELDF RELDX MODEL stppar
## 0 1 1.073609084e+04
## 1 5 1.011300467e+04 5.804e-02 5.477e-02 4.20e-02 G 2.03e+00
## 2 6 8.946057602e+03 1.154e-01 1.131e-01 1.66e-01 G 1.45e-01
## 3 7 8.765877393e+03 2.014e-02 2.052e-02 1.31e-01 G 4.71e-03
## 4 8 8.760935004e+03 5.638e-04 5.500e-04 6.19e-02 G 0.00e+00
## 5 9 8.760930722e+03 4.888e-07 4.836e-07 3.59e-04 G 0.00e+00
## 6 10 8.760930720e+03 2.129e-10 2.139e-10 3.14e-05 S 0.00e+00
## 7 11 8.760930720e+03 8.658e-14 1.081e-13 5.96e-07 S 0.00e+00
##
## ***** Relative function convergence *****
##
## Estimated parameters with approximate standard errors from BHHH matrix:
## Estimate BHHH se BHH t-ratio (0)
## b_wyld 0.935061 0.16329 5.726
## b_cringe 0.981856 0.16550 5.933
## b_sheesh 0.974214 0.16359 5.955
## b_lineup 0.601917 0.01092 55.097
## b_byob 1.001380 0.03346 29.927
## b_size -0.098638 0.01313 -7.510
## b_holidays -0.401710 0.01142 -35.177
## b_price -0.009955 2.4689e-04 -40.320
##
## Final LL: -8760.9307
##
## Calculating log-likelihood at equal shares (LL(0)) for applicable
## models...
## Calculating log-likelihood at observed shares from estimation data
## (LL(c)) for applicable models...
## Calculating LL of each model component...
## Calculating other model fit measures
## Computing covariance matrix using numerical methods (maxLik). This may
## take a while, no progress bar displayed.
## Negative definite Hessian with maximum eigenvalue: -12.491413
## Computing score matrix...
##
## Your model was estimated using the BGW algorithm. Please acknowledge
## this by citing Bunch et al. (1993) - DOI 10.1145/151271.151279
Now look at the model output. Remember, the true parameters we used in the simulation are:
bwyld = 1, bcringe = 1, bsheesh = 1, blineup = 0.6, bbyob = 1, bsize = -0.1, bholidays = -0.4, bprice = -0.01
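A small sketch to compare the true and estimated values side by side (the elements of model1$estimate are in the same order in which we defined the betas):
true_betas <- c(bwyld, bcringe, bsheesh, blineup, bbyob, bsize, bholidays, bprice)
round(cbind(true = true_betas, estimated = model1$estimate), 3)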
kable(apollo_modelOutput(model1, modelOutput_settings = list(printPVal=T)), digits = 3) %>% kable_styling()
## Model run by dj44vuri using Apollo 0.3.4 on R 4.4.1 for Linux.
## Please acknowledge the use of Apollo by citing Hess & Palma (2019)
## DOI 10.1016/j.jocm.2019.100170
## www.ApolloChoiceModelling.com
##
## Model name : Simulated Data
## Model description : Simple MNL model
## Model run at : 2024-11-20 13:32:24.86852
## Estimation method : bgw
## Model diagnosis : Relative function convergence
## Optimisation diagnosis : Maximum found
## hessian properties : Negative definite
## maximum eigenvalue : -12.491413
## reciprocal of condition number : 3.8169e-07
## Number of individuals : 1000
## Number of rows in database : 10000
## Number of modelled outcomes : 10000
##
## Number of cores used : 1
## Model without mixing
##
## LL(start) : -10736.09
## LL at equal shares, LL(0) : -13862.94
## LL at observed shares, LL(C) : -12227.97
## LL(final) : -8760.93
## Rho-squared vs equal shares : 0.368
## Adj.Rho-squared vs equal shares : 0.3675
## Rho-squared vs observed shares : 0.2835
## Adj.Rho-squared vs observed shares : 0.2831
## AIC : 17537.86
## BIC : 17595.54
##
## Estimated parameters : 8
## Time taken (hh:mm:ss) : 00:00:1.81
## pre-estimation : 00:00:0.35
## estimation : 00:00:0.41
## post-estimation : 00:00:1.04
## Iterations : 7
##
## Unconstrained optimisation.
##
## Estimates:
## Estimate s.e. t.rat.(0) p(1-sided) Rob.s.e.
## b_wyld 0.935061 0.16351 5.719 5.372e-09 0.16329
## b_cringe 0.981856 0.16558 5.930 1.516e-09 0.16374
## b_sheesh 0.974214 0.16404 5.939 1.435e-09 0.16348
## b_lineup 0.601917 0.01074 56.070 0.000 0.01009
## b_byob 1.001380 0.03336 30.020 0.000 0.03287
## b_size -0.098638 0.01330 -7.415 6.062e-14 0.01346
## b_holidays -0.401710 0.01129 -35.580 0.000 0.01164
## b_price -0.009955 2.4668e-04 -40.354 0.000 2.4505e-04
## Rob.t.rat.(0) p(1-sided)
## b_wyld 5.726 5.128e-09
## b_cringe 5.997 1.008e-09
## b_sheesh 5.959 1.268e-09
## b_lineup 59.632 0.000
## b_byob 30.461 0.000
## b_size -7.330 1.149e-13
## b_holidays -34.514 0.000
## b_price -40.623 0.000
Estimate | s.e. | t.rat.(0) | p(1-sided) | Rob.s.e. | Rob.t.rat.(0) | p(1-sided) | |
---|---|---|---|---|---|---|---|
b_wyld | 0.935 | 0.164 | 5.719 | 0 | 0.163 | 5.726 | 0 |
b_cringe | 0.982 | 0.166 | 5.930 | 0 | 0.164 | 5.997 | 0 |
b_sheesh | 0.974 | 0.164 | 5.939 | 0 | 0.163 | 5.959 | 0 |
b_lineup | 0.602 | 0.011 | 56.070 | 0 | 0.010 | 59.632 | 0 |
b_byob | 1.001 | 0.033 | 30.020 | 0 | 0.033 | 30.461 | 0 |
b_size | -0.099 | 0.013 | -7.415 | 0 | 0.013 | -7.330 | 0 |
b_holidays | -0.402 | 0.011 | -35.580 | 0 | 0.012 | -34.514 | 0 |
b_price | -0.010 | 0.000 | -40.354 | 0 | 0.000 | -40.623 | 0 |
We can now play around with the parameters from above and see how that affects the results: for example, change the betas or the sample size, or delete specific choice sets from the design.
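One classic quantity to compute from such a model is the marginal willingness to pay (mWTP) for an attribute: the negative ratio of its coefficient to the price coefficient. A quick sketch:
# mWTP for BYOB; with the true betas (1 and -0.01) this should be close to 100
-model1$estimate["b_byob"] / model1$estimate["b_price"]
The plot below extends this idea, showing a log-utility specification of festival size together with the mWTP implied at each point.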
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
# Extract the estimated coefficients used in the plot
b <- model1$estimate["b_lineup"]
p <- model1$estimate["b_price"]
# Create a sequence of x values from 1 to 200
x_values <- seq(1, 200, length.out = 1000)
# Calculate utility f(x) = b * ln(x) and marginal WTP(x) = -f'(x) / p = -b / (x * p)
data <- data.frame(
x = x_values,
f_x = b * log(x_values),
mWTP = - b / (x_values*p)
)
# Create the ggplot
ggplot(data, aes(x = x)) +
geom_line(aes(y = f_x, color = "Utility")) +
geom_line(aes(y = mWTP, color = "mWTP")) +
labs(title = "Utility from festival size",
x = "Number of people",
y = "Utility/mWTP",
color = "Function") +
scale_color_manual(values = c("Utility" = "blue", "mWTP" = "red")) +
theme_minimal()