We will start estimating models. Since you have already gained some experience with R, we jump straight into the first part of our choice experiment analysis.
We continue with the festival example. We have added one more alternative: "no festival".
First, we read in the packages, set the seed, and clear the workspace. The seed determines the draws from the random number generator. We need random numbers when we simulate the unobserved part of utility. With a fixed seed, you will always get exactly the same random numbers.
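As a tiny illustration of what the seed does (a sketch, not part of the analysis script): setting the same seed reproduces exactly the same draws.
set.seed(42); rnorm(3) # three random numbers
set.seed(42); rnorm(3) # the exact same three numbers again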
rm(list=ls())
library("readr")
library("psych")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("evd")
library("apollo")
##
## [Apollo ASCII start-up logo]
##
## Apollo 0.3.4
## http://www.ApolloChoiceModelling.com
## See url for a detailed manual, examples and a user forum.
## Sign up to the user forum to receive updates on new releases.
##
## Please cite Apollo in all written material you produce:
## Hess S, Palma D (2019). "Apollo: a flexible, powerful and customisable
## freeware package for choice model estimation and application." Journal
## of Choice Modelling, 32. doi:10.1016/j.jocm.2019.100170
library("tidyr")
library("kableExtra")
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library("tidylog")
##
## Attaching package: 'tidylog'
## The following objects are masked from 'package:tidyr':
##
## drop_na, fill, gather, pivot_longer, pivot_wider, replace_na,
## spread, uncount
## The following objects are masked from 'package:dplyr':
##
## add_count, add_tally, anti_join, count, distinct, distinct_all,
## distinct_at, distinct_if, filter, filter_all, filter_at, filter_if,
## full_join, group_by, group_by_all, group_by_at, group_by_if,
## inner_join, left_join, mutate, mutate_all, mutate_at, mutate_if,
## relocate, rename, rename_all, rename_at, rename_if, rename_with,
## right_join, sample_frac, sample_n, select, select_all, select_at,
## select_if, semi_join, slice, slice_head, slice_max, slice_min,
## slice_sample, slice_tail, summarise, summarise_all, summarise_at,
## summarise_if, summarize, summarize_all, summarize_at, summarize_if,
## tally, top_frac, top_n, transmute, transmute_all, transmute_at,
## transmute_if, ungroup
## The following object is masked from 'package:stats':
##
## filter
set.seed(48498877)
We use a design generated in NGENE. The design is in wide format, which is the format we need to estimate models in the R package apollo. However, designs may come in other formats, and if you use other packages (e.g. gmnl), you would need to reshape the data to long format.
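As a sketch of such a reshape (to be run once the design has been read in below, and assuming the alt<j>_<attribute> column naming used here), tidyr can split the alternative-specific columns into one row per alternative:
# Sketch: wide design to long format, one row per alternative and choice set
design_long <- design %>%
  pivot_longer(cols = starts_with("alt"),
               names_to = c("alt", ".value"),
               names_sep = "_")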
designfile = "design_festivals.RDS"
design <- readRDS(designfile) %>%
mutate(Choice.situation = row_number())
## mutate: new variable 'Choice.situation' (integer) with 20 unique values and 0% NA
kable(design) %>% kable_styling() ## Print the design. We use the kable function to make it look nice.
alt1_lineup | alt1_byob | alt1_size | alt1_holidays | alt1_price | alt2_lineup | alt2_byob | alt2_size | alt2_holidays | alt2_price | alt3_lineup | alt3_byob | alt3_size | alt3_holidays | alt3_price | block | Choice.situation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | 0 | 11.156250 | 1 | 100 | 5 | 0 | 8.006368 | 5 | 150 | 7 | 1 | 11.156250 | 5 | 50 | 2 | 1 |
7 | 1 | 8.987197 | 5 | 50 | 7 | 0 | 8.987197 | 5 | 150 | 5 | 1 | 8.006368 | 1 | 250 | 1 | 2 |
7 | 0 | 8.006368 | 1 | 100 | 5 | 0 | 11.156250 | 5 | 50 | 5 | 0 | 8.006368 | 5 | 250 | 1 | 3 |
5 | 1 | 8.006368 | 2 | 100 | 9 | 1 | 8.987197 | 5 | 200 | 7 | 1 | 11.156250 | 5 | 150 | 1 | 4 |
9 | 1 | 8.987197 | 1 | 250 | 9 | 0 | 8.987197 | 1 | 200 | 9 | 1 | 8.006368 | 5 | 200 | 1 | 5 |
9 | 0 | 8.987197 | 2 | 250 | 5 | 0 | 8.987197 | 5 | 150 | 9 | 0 | 11.156250 | 5 | 100 | 1 | 6 |
5 | 1 | 11.156250 | 2 | 200 | 5 | 1 | 8.987197 | 5 | 50 | 7 | 0 | 11.156250 | 2 | 150 | 1 | 7 |
5 | 1 | 11.156250 | 1 | 150 | 9 | 0 | 8.006368 | 2 | 150 | 9 | 0 | 8.987197 | 2 | 50 | 1 | 8 |
9 | 1 | 11.156250 | 5 | 50 | 5 | 0 | 11.156250 | 1 | 150 | 5 | 0 | 8.987197 | 1 | 150 | 2 | 9 |
5 | 0 | 11.156250 | 2 | 150 | 7 | 1 | 11.156250 | 5 | 200 | 9 | 0 | 8.987197 | 5 | 250 | 2 | 10 |
5 | 1 | 8.006368 | 5 | 50 | 9 | 0 | 8.987197 | 2 | 150 | 7 | 0 | 8.006368 | 1 | 200 | 2 | 11 |
5 | 1 | 8.987197 | 2 | 150 | 9 | 0 | 8.006368 | 1 | 50 | 9 | 0 | 11.156250 | 5 | 100 | 2 | 12 |
7 | 0 | 8.006368 | 2 | 150 | 7 | 1 | 8.987197 | 5 | 50 | 7 | 0 | 8.006368 | 2 | 250 | 2 | 13 |
5 | 0 | 8.987197 | 2 | 250 | 5 | 0 | 8.987197 | 2 | 50 | 7 | 1 | 8.006368 | 1 | 250 | 1 | 14 |
7 | 1 | 11.156250 | 5 | 200 | 7 | 0 | 8.987197 | 5 | 150 | 9 | 1 | 8.006368 | 5 | 150 | 2 | 15 |
5 | 1 | 8.006368 | 2 | 200 | 7 | 0 | 8.006368 | 2 | 200 | 7 | 1 | 8.006368 | 1 | 150 | 2 | 16 |
5 | 1 | 11.156250 | 1 | 250 | 7 | 0 | 8.006368 | 5 | 200 | 5 | 1 | 8.987197 | 2 | 150 | 2 | 17 |
7 | 0 | 8.006368 | 1 | 250 | 5 | 0 | 11.156250 | 2 | 150 | 5 | 0 | 8.987197 | 5 | 100 | 2 | 18 |
5 | 0 | 11.156250 | 5 | 200 | 9 | 0 | 8.987197 | 2 | 100 | 5 | 1 | 8.006368 | 2 | 250 | 1 | 19 |
9 | 1 | 11.156250 | 2 | 100 | 5 | 1 | 8.987197 | 2 | 200 | 9 | 0 | 8.987197 | 5 | 50 | 1 | 20 |
Have a look at the output. Each row is one choice set. The design consists of 20 choice sets, separated into 2 blocks. There are three alternatives and an opt-out option. The opt-out option is not part of the design, so we do not see it here. Each alternative contains 5 attributes, of which price is the festival price.
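A quick way to verify these dimensions in the data (a small check, not required for the analysis):
nrow(design) # 20 choice sets
table(design$block) # 10 sets in each of the 2 blocks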
In this step, we define the coefficients (betas) for the utility functions and set other parameters such as the number of simulated respondents.
We can choose the betas freely and vary them as desired. However, note that higher beta values mean that the fixed variance of the error term is small relative to the deterministic part of utility, which makes the estimates more significant. For example, if you use beta values in the range of 100 to 200, you will get significant results with as few as 10 respondents. Can you explain why this is the case?
bwyld = 1
bcringe = 1
bsheesh = 1
blineup = 0.6
bbyob = 1
bsize = -0.1
bholidays = -0.4
bprice = -0.01
Next, we define some meta parameters for the simulation. The number of replications can be set by us.
replications <- 500 # number of replications of the design; you can increase this later and watch the standard errors go down
nsets <- nrow(design) ## Extract the number of choice sets from the design
nblocks <- max(design$block) ## Extract the number of blocks from the design
setpp <- nsets/nblocks # Choice sets per respondent; with 20 sets in 2 blocks, each respondent sees 10 sets
respondents <- replications*nblocks # Total number of simulated respondents
We will now generate a dataset based on the parameters above. We duplicate the design according to the number of replications, create a variable identifying the respondents (for panel data analysis), and compute deterministic utility.
Utility for alternative 1 is defined as
\[ V_1 = b_{wyld} + b_{lineup}*alt1.lineup + b_{byob} * alt1.byob + b_{size} * alt1.size + b_{holidays}*alt1.holidays + b_{price} * alt1.price \] and for alternative 2
\[ V_2 = b_{cringe} + b_{lineup}*alt2.lineup + b_{byob} * alt2.byob + b_{size} * alt2.size + b_{holidays}*alt2.holidays + b_{price} * alt2.price \] and for alternative 3
\[ V_3 = b_{sheesh} + b_{lineup}*alt3.lineup + b_{byob} * alt3.byob + b_{size} * alt3.size + b_{holidays}*alt3.holidays + b_{price} * alt3.price \]
The utility of the opt-out is set to zero
\[ V_{optout} = 0 \]
database<- design %>%
arrange(block,Choice.situation) %>%
dplyr::slice(rep(row_number(), replications)) %>% ## replicate design according to number of replications
dplyr::mutate(RID = rep(1:respondents, each=setpp)) %>% # create Respondent ID.
dplyr::relocate(RID,`Choice.situation`) %>%
dplyr::mutate(
V.1 = bwyld + blineup*alt1_lineup + bbyob * alt1_byob + bsize * alt1_size + bholidays * alt1_holidays + bprice * alt1_price , #Utility of Wyld
V.2 = bcringe + blineup*alt2_lineup + bbyob * alt2_byob + bsize * alt2_size + bholidays * alt2_holidays + bprice * alt2_price , #Utility of Cringe
V.3 = bsheesh + blineup*alt3_lineup + bbyob * alt3_byob + bsize * alt3_size + bholidays * alt3_holidays + bprice * alt3_price , #Utility of Sheesh
V.4 = 0 ) %>% # utility of opt out, normalized to zero
as.data.frame()
kable(head(database,20),digits = 2) %>% kable_styling()
RID | Choice.situation | alt1_lineup | alt1_byob | alt1_size | alt1_holidays | alt1_price | alt2_lineup | alt2_byob | alt2_size | alt2_holidays | alt2_price | alt3_lineup | alt3_byob | alt3_size | alt3_holidays | alt3_price | block | V.1 | V.2 | V.3 | V.4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 7 | 1 | 8.99 | 5 | 50 | 7 | 0 | 8.99 | 5 | 150 | 5 | 1 | 8.01 | 1 | 250 | 1 | 2.80 | 0.80 | 1.30 | 0 |
1 | 3 | 7 | 0 | 8.01 | 1 | 100 | 5 | 0 | 11.16 | 5 | 50 | 5 | 0 | 8.01 | 5 | 250 | 1 | 3.00 | 0.38 | -1.30 | 0 |
1 | 4 | 5 | 1 | 8.01 | 2 | 100 | 9 | 1 | 8.99 | 5 | 200 | 7 | 1 | 11.16 | 5 | 150 | 1 | 2.40 | 2.50 | 1.58 | 0 |
1 | 5 | 9 | 1 | 8.99 | 1 | 250 | 9 | 0 | 8.99 | 1 | 200 | 9 | 1 | 8.01 | 5 | 200 | 1 | 3.60 | 3.10 | 2.60 | 0 |
1 | 6 | 9 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 5 | 150 | 9 | 0 | 11.16 | 5 | 100 | 1 | 2.20 | -0.40 | 2.28 | 0 |
1 | 7 | 5 | 1 | 11.16 | 2 | 200 | 5 | 1 | 8.99 | 5 | 50 | 7 | 0 | 11.16 | 2 | 150 | 1 | 1.08 | 1.60 | 1.78 | 0 |
1 | 8 | 5 | 1 | 11.16 | 1 | 150 | 9 | 0 | 8.01 | 2 | 150 | 9 | 0 | 8.99 | 2 | 50 | 1 | 1.98 | 3.30 | 4.20 | 0 |
1 | 14 | 5 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 2 | 50 | 7 | 1 | 8.01 | 1 | 250 | 1 | -0.20 | 1.80 | 2.50 | 0 |
1 | 19 | 5 | 0 | 11.16 | 5 | 200 | 9 | 0 | 8.99 | 2 | 100 | 5 | 1 | 8.01 | 2 | 250 | 1 | -1.12 | 3.70 | 0.90 | 0 |
1 | 20 | 9 | 1 | 11.16 | 2 | 100 | 5 | 1 | 8.99 | 2 | 200 | 9 | 0 | 8.99 | 5 | 50 | 1 | 4.48 | 1.30 | 3.00 | 0 |
2 | 1 | 9 | 0 | 11.16 | 1 | 100 | 5 | 0 | 8.01 | 5 | 150 | 7 | 1 | 11.16 | 5 | 50 | 2 | 3.88 | -0.30 | 2.58 | 0 |
2 | 9 | 9 | 1 | 11.16 | 5 | 50 | 5 | 0 | 11.16 | 1 | 150 | 5 | 0 | 8.99 | 1 | 150 | 2 | 3.78 | 0.98 | 1.20 | 0 |
2 | 10 | 5 | 0 | 11.16 | 2 | 150 | 7 | 1 | 11.16 | 5 | 200 | 9 | 0 | 8.99 | 5 | 250 | 2 | 0.58 | 1.08 | 1.00 | 0 |
2 | 11 | 5 | 1 | 8.01 | 5 | 50 | 9 | 0 | 8.99 | 2 | 150 | 7 | 0 | 8.01 | 1 | 200 | 2 | 1.70 | 3.20 | 2.00 | 0 |
2 | 12 | 5 | 1 | 8.99 | 2 | 150 | 9 | 0 | 8.01 | 1 | 50 | 9 | 0 | 11.16 | 5 | 100 | 2 | 1.80 | 4.70 | 2.28 | 0 |
2 | 13 | 7 | 0 | 8.01 | 2 | 150 | 7 | 1 | 8.99 | 5 | 50 | 7 | 0 | 8.01 | 2 | 250 | 2 | 2.10 | 2.80 | 1.10 | 0 |
2 | 15 | 7 | 1 | 11.16 | 5 | 200 | 7 | 0 | 8.99 | 5 | 150 | 9 | 1 | 8.01 | 5 | 150 | 2 | 1.08 | 0.80 | 3.10 | 0 |
2 | 16 | 5 | 1 | 8.01 | 2 | 200 | 7 | 0 | 8.01 | 2 | 200 | 7 | 1 | 8.01 | 1 | 150 | 2 | 1.40 | 1.60 | 3.50 | 0 |
2 | 17 | 5 | 1 | 11.16 | 1 | 250 | 7 | 0 | 8.01 | 5 | 200 | 5 | 1 | 8.99 | 2 | 150 | 2 | 0.98 | 0.40 | 1.80 | 0 |
2 | 18 | 7 | 0 | 8.01 | 1 | 250 | 5 | 0 | 11.16 | 2 | 150 | 5 | 0 | 8.99 | 5 | 100 | 2 | 1.50 | 0.58 | 0.10 | 0 |
Now look at the database. It contains the deterministic part of utility. This is exactly what we looked at in the theory slides.
So far, we have created a dataset with deterministic utility. In the Random Utility model, we assume that, besides the deterministic part \(V\), utility \(U\) has an unobserved part as well. For the logit model, we assume an additive error term \(\epsilon\) which is iid Gumbel distributed.
\[U = V+ \epsilon\]
We can take draws from the Gumbel distribution using the function rgumbel. Before we do this, let's try to understand this distribution and how the difference of two Gumbel draws results in a logistic distribution.
Use the function rgumbel to take 1000 random draws and store the results in an object e1. Take another 1000 draws and store them in e2. Plot the result.
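A possible solution sketch (rgumbel comes from the evd package loaded above): the histogram of e1 - e2 should closely follow a standard logistic density, which is exactly why differences in utility in a logit model are logistically distributed.
e1 <- rgumbel(1000, loc = 0, scale = 1) # 1000 standard Gumbel draws
e2 <- rgumbel(1000, loc = 0, scale = 1) # another 1000 draws
hist(e1 - e2, breaks = 50, freq = FALSE, main = "Difference of two Gumbel samples", xlab = "e1 - e2")
curve(dlogis(x), add = TRUE, col = "red", lwd = 2) # standard logistic density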
database <- database %>%
group_by(RID) %>%
mutate(
e.1 = rgumbel(setpp,loc=0, scale=1) , # random draws from gumbel
e.2 = rgumbel(setpp,loc=0, scale=1) ,
e.3 = rgumbel(setpp,loc=0, scale=1) ,
e.4 = rgumbel(setpp,loc=0, scale=1) ,
U.1 = V.1 + e.1 , # calculate total utility
U.2 = V.2 + e.2 ,
U.3 = V.3 + e.3 ,
U.4 = V.4 + e.4
) %>%
as.data.frame()
## group_by: one grouping variable (RID)
## mutate (grouped): new variable 'e.1' (double) with 10,000 unique values and 0% NA
## new variable 'e.2' (double) with 10,000 unique values and 0% NA
## new variable 'e.3' (double) with 10,000 unique values and 0% NA
## new variable 'e.4' (double) with 10,000 unique values and 0% NA
## new variable 'U.1' (double) with 10,000 unique values and 0% NA
## new variable 'U.2' (double) with 10,000 unique values and 0% NA
## new variable 'U.3' (double) with 10,000 unique values and 0% NA
## new variable 'U.4' (double) with 10,000 unique values and 0% NA
database$choice <- max.col(database[,c("U.1" , "U.2" , "U.3" , "U.4" )]) # the chosen alternative is the one with the highest total utility
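As a quick sanity check (a sketch, not required for the remaining steps): with iid Gumbel errors, the simulated choice shares should be close to the average logit probabilities implied by the deterministic utilities.
expV <- exp(as.matrix(database[, c("V.1", "V.2", "V.3", "V.4")]))
round(colMeans(expV / rowSums(expV)), 3) # average logit probabilities
round(prop.table(table(database$choice)), 3) # simulated choice shares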
We now have a dataset that looks like a real-world dataset. Yet some variables would be unknown in real data. Which ones?
kable(head(database, 9), digits = 2) %>% kable_styling()
RID | Choice.situation | alt1_lineup | alt1_byob | alt1_size | alt1_holidays | alt1_price | alt2_lineup | alt2_byob | alt2_size | alt2_holidays | alt2_price | alt3_lineup | alt3_byob | alt3_size | alt3_holidays | alt3_price | block | V.1 | V.2 | V.3 | V.4 | e.1 | e.2 | e.3 | e.4 | U.1 | U.2 | U.3 | U.4 | choice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 7 | 1 | 8.99 | 5 | 50 | 7 | 0 | 8.99 | 5 | 150 | 5 | 1 | 8.01 | 1 | 250 | 1 | 2.80 | 0.80 | 1.30 | 0 | 1.08 | 0.74 | -0.04 | 0.10 | 3.88 | 1.54 | 1.26 | 0.10 | 1 |
1 | 3 | 7 | 0 | 8.01 | 1 | 100 | 5 | 0 | 11.16 | 5 | 50 | 5 | 0 | 8.01 | 5 | 250 | 1 | 3.00 | 0.38 | -1.30 | 0 | 0.13 | 0.32 | -0.16 | -0.80 | 3.13 | 0.70 | -1.46 | -0.80 | 1 |
1 | 4 | 5 | 1 | 8.01 | 2 | 100 | 9 | 1 | 8.99 | 5 | 200 | 7 | 1 | 11.16 | 5 | 150 | 1 | 2.40 | 2.50 | 1.58 | 0 | 0.36 | -0.55 | 0.88 | -0.93 | 2.76 | 1.95 | 2.46 | -0.93 | 1 |
1 | 5 | 9 | 1 | 8.99 | 1 | 250 | 9 | 0 | 8.99 | 1 | 200 | 9 | 1 | 8.01 | 5 | 200 | 1 | 3.60 | 3.10 | 2.60 | 0 | 0.87 | -0.24 | -1.31 | -0.75 | 4.47 | 2.86 | 1.29 | -0.75 | 1 |
1 | 6 | 9 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 5 | 150 | 9 | 0 | 11.16 | 5 | 100 | 1 | 2.20 | -0.40 | 2.28 | 0 | 0.90 | -1.22 | -0.66 | 0.02 | 3.11 | -1.62 | 1.62 | 0.02 | 1 |
1 | 7 | 5 | 1 | 11.16 | 2 | 200 | 5 | 1 | 8.99 | 5 | 50 | 7 | 0 | 11.16 | 2 | 150 | 1 | 1.08 | 1.60 | 1.78 | 0 | 2.27 | -0.13 | -0.52 | 0.95 | 3.36 | 1.48 | 1.26 | 0.95 | 1 |
1 | 8 | 5 | 1 | 11.16 | 1 | 150 | 9 | 0 | 8.01 | 2 | 150 | 9 | 0 | 8.99 | 2 | 50 | 1 | 1.98 | 3.30 | 4.20 | 0 | 2.61 | 0.13 | -0.07 | -0.41 | 4.59 | 3.43 | 4.13 | -0.41 | 1 |
1 | 14 | 5 | 0 | 8.99 | 2 | 250 | 5 | 0 | 8.99 | 2 | 50 | 7 | 1 | 8.01 | 1 | 250 | 1 | -0.20 | 1.80 | 2.50 | 0 | -1.12 | 1.36 | -0.69 | 0.59 | -1.32 | 3.16 | 1.81 | 0.59 | 2 |
1 | 19 | 5 | 0 | 11.16 | 5 | 200 | 9 | 0 | 8.99 | 2 | 100 | 5 | 1 | 8.01 | 2 | 250 | 1 | -1.12 | 3.70 | 0.90 | 0 | -0.66 | -0.22 | -0.31 | 0.33 | -1.78 | 3.48 | 0.59 | 0.33 | 2 |
We can inspect whether the simulation is correct. Go to Wikipedia and check whether the mean and standard deviation of the theoretical Gumbel distribution are approximately equal to the means and standard deviations of the empirical (simulated) distributions.
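For reference, the theoretical mean of the standard Gumbel distribution is the Euler-Mascheroni constant (about 0.577) and its theoretical standard deviation is pi/sqrt(6) (about 1.283), which we can compute directly:
c(mean = -digamma(1), sd = pi / sqrt(6)) # theoretical Gumbel(0,1) moments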
summary(database[,c("e.1","e.2","e.3","e.4")])
## e.1 e.2 e.3 e.4
## Min. :-2.2601 Min. :-2.1620 Min. :-2.1961 Min. :-2.1932
## 1st Qu.:-0.3345 1st Qu.:-0.3209 1st Qu.:-0.3110 1st Qu.:-0.3339
## Median : 0.3731 Median : 0.3590 Median : 0.3689 Median : 0.3780
## Mean : 0.5728 Mean : 0.5851 Mean : 0.5791 Mean : 0.5830
## 3rd Qu.: 1.2449 3rd Qu.: 1.2568 3rd Qu.: 1.2487 3rd Qu.: 1.2748
## Max. : 7.7541 Max. : 9.1418 Max. :10.9442 Max. :10.1492
sd(database$e.2)
## [1] 1.282901
summary(database[,c("V.1","V.2","V.3","V.4")])
## V.1 V.2 V.3 V.4
## Min. :-1.116 Min. :-0.3987 Min. :-1.301 Min. :0
## 1st Qu.: 1.084 1st Qu.: 0.7471 1st Qu.: 1.176 1st Qu.:0
## Median : 1.893 Median : 1.4503 Median : 1.900 Median :0
## Mean : 1.953 Mean : 1.6974 Mean : 1.876 Mean :0
## 3rd Qu.: 2.851 3rd Qu.: 2.8763 3rd Qu.: 2.588 3rd Qu.:0
## Max. : 4.484 Max. : 4.6994 Max. : 4.201 Max. :0
summary(database[,c("U.1","U.2","U.3","U.4")])
## U.1 U.2 U.3 U.4
## Min. :-3.121 Min. :-2.3109 Min. :-3.377 Min. :-2.1932
## 1st Qu.: 1.249 1st Qu.: 0.8803 1st Qu.: 1.347 1st Qu.:-0.3339
## Median : 2.428 Median : 2.1178 Median : 2.385 Median : 0.3780
## Mean : 2.526 Mean : 2.2825 Mean : 2.455 Mean : 0.5830
## 3rd Qu.: 3.705 3rd Qu.: 3.5132 3rd Qu.: 3.500 3rd Qu.: 1.2748
## Max. :10.925 Max. :10.5040 Max. :14.444 Max. :10.1492
Finally, we can delete those variables that would not be observable in a real-world dataset.
database <- database %>%
select(-c(matches("\\.[1-4]$")))
## select: dropped 12 variables (V.1, V.2, V.3, V.4, e.1, …)
Having generated the dataset, we can now estimate a model. Here we use apollo to estimate a conditional logit model.
apollo_initialise()
modelOutput_settings = list(printPVal=T)
### Set core controls
apollo_control = list(
modelName ="Simulated Data",
modelDescr ="Simple MNL model",
indivID ="RID"
)
apollo_beta=c(b_wyld =0,
b_cringe =0,
b_sheesh =0,
b_lineup =0.5,
b_byob =0,
b_size =0,
b_holidays =0,
b_price =0 )
### No parameters are held fixed
apollo_fixed = c()
### Validate the inputs
apollo_inputs = apollo_validateInputs()
## Several observations per individual detected based on the value of RID.
## Setting panelData in apollo_control set to TRUE.
## All checks on apollo_control completed.
## All checks on database completed.
apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
### Function initialisation: do not change the following three commands
### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
### Create list of probabilities P
P = list()
### List of utilities (later integrated in mnl_settings below)
V = list()
V[['alt1']] = b_wyld + b_lineup*alt1_lineup + b_byob * alt1_byob + b_size * alt1_size + b_holidays * alt1_holidays + b_price * alt1_price
V[['alt2']] = b_cringe + b_lineup*alt2_lineup + b_byob * alt2_byob + b_size * alt2_size + b_holidays * alt2_holidays + b_price * alt2_price
V[['alt3']] = b_sheesh + b_lineup*alt3_lineup + b_byob * alt3_byob + b_size * alt3_size + b_holidays * alt3_holidays + b_price * alt3_price
V[['alt4']] = 0
### Define settings for MNL model component
mnl_settings = list(
alternatives = c(alt1=1, alt2=2, alt3=3, alt4=4) ,
avail = 1, # all alternatives are available in every choice
choiceVar = choice,
V = V # tell function to use list vector defined above
)
### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)
### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)
### Average across inter-individual draws - only needed for mixed logit!
### P = apollo_avgInterDraws(P, apollo_inputs, functionality)
### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}
model1 = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities, apollo_inputs,
estimate_settings=list(hessianRoutine="maxLik"))
## Preparing user-defined functions.
##
## Testing likelihood function...
##
## Overview of choices for MNL model component :
## alt1 alt2 alt3 alt4
## Times available 10000.00 10000.00 10000.00 10000.00
## Times chosen 3587.00 2929.00 3065.00 419.00
## Percentage chosen overall 35.87 29.29 30.65 4.19
## Percentage chosen when available 35.87 29.29 30.65 4.19
##
##
## Pre-processing likelihood function...
##
## Testing influence of parameters
## Starting main estimation
##
## BGW using analytic model derivatives supplied by caller...
##
##
## Iterates will be written to:
## /home/dj44vuri/Nextcloud/phd course uppsala 2024/Modelestimation/Simulated Data_iterations.csv
## it nf F RELDF PRELDF RELDX MODEL stppar
## 0 1 1.073609084e+04
## 1 5 1.011300467e+04 5.804e-02 5.477e-02 4.20e-02 G 2.03e+00
## 2 6 8.946057602e+03 1.154e-01 1.131e-01 1.66e-01 G 1.45e-01
## 3 7 8.765877393e+03 2.014e-02 2.052e-02 1.31e-01 G 4.71e-03
## 4 8 8.760935004e+03 5.638e-04 5.500e-04 6.19e-02 G 0.00e+00
## 5 9 8.760930722e+03 4.888e-07 4.836e-07 3.59e-04 G 0.00e+00
## 6 10 8.760930720e+03 2.129e-10 2.139e-10 3.14e-05 S 0.00e+00
## 7 11 8.760930720e+03 8.658e-14 1.081e-13 5.96e-07 S 0.00e+00
##
## ***** Relative function convergence *****
##
## Estimated parameters with approximate standard errors from BHHH matrix:
## Estimate BHHH se BHH t-ratio (0)
## b_wyld 0.935061 0.16329 5.726
## b_cringe 0.981856 0.16550 5.933
## b_sheesh 0.974214 0.16359 5.955
## b_lineup 0.601917 0.01092 55.097
## b_byob 1.001380 0.03346 29.927
## b_size -0.098638 0.01313 -7.510
## b_holidays -0.401710 0.01142 -35.177
## b_price -0.009955 2.4689e-04 -40.320
##
## Final LL: -8760.9307
##
## Calculating log-likelihood at equal shares (LL(0)) for applicable
## models...
## Calculating log-likelihood at observed shares from estimation data
## (LL(c)) for applicable models...
## Calculating LL of each model component...
## Calculating other model fit measures
## Computing covariance matrix using numerical methods (maxLik). This may
## take a while, no progress bar displayed.
## Negative definite Hessian with maximum eigenvalue: -12.491413
## Computing score matrix...
##
## Your model was estimated using the BGW algorithm. Please acknowledge
## this by citing Bunch et al. (1993) - DOI 10.1145/151271.151279
Now look at the model output. Remember, the true parameters we used in the simulation are:
bwyld = 1, bcringe = 1, bsheesh = 1, blineup = 0.6, bbyob = 1, bsize = -0.1, bholidays = -0.4, bprice = -0.01
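A small sketch to compare the true and estimated values side by side (the elements of model1$estimate are in the same order in which we defined the betas):
true_betas <- c(bwyld, bcringe, bsheesh, blineup, bbyob, bsize, bholidays, bprice)
round(cbind(true = true_betas, estimated = model1$estimate), 3)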
kable(apollo_modelOutput(model1, modelOutput_settings = list(printPVal=T)), digits = 3) %>% kable_styling()
## Model run by dj44vuri using Apollo 0.3.4 on R 4.4.1 for Linux.
## Please acknowledge the use of Apollo by citing Hess & Palma (2019)
## DOI 10.1016/j.jocm.2019.100170
## www.ApolloChoiceModelling.com
##
## Model name : Simulated Data
## Model description : Simple MNL model
## Model run at : 2024-11-20 13:32:24.86852
## Estimation method : bgw
## Model diagnosis : Relative function convergence
## Optimisation diagnosis : Maximum found
## hessian properties : Negative definite
## maximum eigenvalue : -12.491413
## reciprocal of condition number : 3.8169e-07
## Number of individuals : 1000
## Number of rows in database : 10000
## Number of modelled outcomes : 10000
##
## Number of cores used : 1
## Model without mixing
##
## LL(start) : -10736.09
## LL at equal shares, LL(0) : -13862.94
## LL at observed shares, LL(C) : -12227.97
## LL(final) : -8760.93
## Rho-squared vs equal shares : 0.368
## Adj.Rho-squared vs equal shares : 0.3675
## Rho-squared vs observed shares : 0.2835
## Adj.Rho-squared vs observed shares : 0.2831
## AIC : 17537.86
## BIC : 17595.54
##
## Estimated parameters : 8
## Time taken (hh:mm:ss) : 00:00:1.81
## pre-estimation : 00:00:0.35
## estimation : 00:00:0.41
## post-estimation : 00:00:1.04
## Iterations : 7
##
## Unconstrained optimisation.
##
## Estimates:
## Estimate s.e. t.rat.(0) p(1-sided) Rob.s.e.
## b_wyld 0.935061 0.16351 5.719 5.372e-09 0.16329
## b_cringe 0.981856 0.16558 5.930 1.516e-09 0.16374
## b_sheesh 0.974214 0.16404 5.939 1.435e-09 0.16348
## b_lineup 0.601917 0.01074 56.070 0.000 0.01009
## b_byob 1.001380 0.03336 30.020 0.000 0.03287
## b_size -0.098638 0.01330 -7.415 6.062e-14 0.01346
## b_holidays -0.401710 0.01129 -35.580 0.000 0.01164
## b_price -0.009955 2.4668e-04 -40.354 0.000 2.4505e-04
## Rob.t.rat.(0) p(1-sided)
## b_wyld 5.726 5.128e-09
## b_cringe 5.997 1.008e-09
## b_sheesh 5.959 1.268e-09
## b_lineup 59.632 0.000
## b_byob 30.461 0.000
## b_size -7.330 1.149e-13
## b_holidays -34.514 0.000
## b_price -40.623 0.000
Estimate | s.e. | t.rat.(0) | p(1-sided) | Rob.s.e. | Rob.t.rat.(0) | p(1-sided) | |
---|---|---|---|---|---|---|---|
b_wyld | 0.935 | 0.164 | 5.719 | 0 | 0.163 | 5.726 | 0 |
b_cringe | 0.982 | 0.166 | 5.930 | 0 | 0.164 | 5.997 | 0 |
b_sheesh | 0.974 | 0.164 | 5.939 | 0 | 0.163 | 5.959 | 0 |
b_lineup | 0.602 | 0.011 | 56.070 | 0 | 0.010 | 59.632 | 0 |
b_byob | 1.001 | 0.033 | 30.020 | 0 | 0.033 | 30.461 | 0 |
b_size | -0.099 | 0.013 | -7.415 | 0 | 0.013 | -7.330 | 0 |
b_holidays | -0.402 | 0.011 | -35.580 | 0 | 0.012 | -34.514 | 0 |
b_price | -0.010 | 0.000 | -40.354 | 0 | 0.000 | -40.623 | 0 |
We can now play around with the parameters from above and see how that affects the results: for example, change the betas or the sample size, or delete specific choice sets from the design.
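One classic quantity to compute from such a model is the marginal willingness to pay (mWTP) for an attribute: the negative ratio of its coefficient to the price coefficient. A quick sketch:
# mWTP for BYOB; with the true betas (1 and -0.01) this should be close to 100
-model1$estimate["b_byob"] / model1$estimate["b_price"]
The plot below extends this idea, showing a log-utility specification of festival size together with the mWTP implied at each point.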
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
# Extract the estimated coefficients used in the plot
b <- model1$estimate["b_lineup"]
p <- model1$estimate["b_price"]
# Create a sequence of x values from 1 to 200
x_values <- seq(1, 200, length.out = 1000)
# Calculate utility f(x) = b * ln(x) and marginal WTP(x) = -f'(x) / p = -b / (x * p)
data <- data.frame(
x = x_values,
f_x = b * log(x_values),
mWTP = - b / (x_values*p)
)
# Create the ggplot
ggplot(data, aes(x = x)) +
geom_line(aes(y = f_x, color = "Utility")) +
geom_line(aes(y = mWTP, color = "mWTP")) +
labs(title = "Utility from festival size",
x = "Number of people",
y = "Utility/mWTP",
color = "Function") +
scale_color_manual(values = c("Utility" = "blue", "mWTP" = "red")) +
theme_minimal()