Att_Name | truepar | altscfeff.mean | befficientdesign.mean | efficientdesign.mean | orthodesign.mean | altscfeff.median | befficientdesign.median | efficientdesign.median | orthodesign.median |
---|---|---|---|---|---|---|---|---|---|
basc | -1.20 | -1.20 | -1.20 | -1.21 | -1.20 | -1.20 | -1.20 | -1.20 | -1.20 |
baction1 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 |
badvisory1 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 |
bpartner1 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 |
bcomp1 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
basc2 | -1.60 | -1.60 | -1.60 | -1.61 | -1.61 | -1.60 | -1.60 | -1.62 | -1.60 |
baction2 | 0.20 | 0.20 | 0.19 | 0.21 | 0.20 | 0.20 | 0.20 | 0.21 | 0.20 |
badvisory2 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 |
bpartner2 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 |
bcomp2 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 | 0.01 | 0.02 | 0.02 |
Simulation experimental design
Preliminary information
This document describes the process of generating and testing different experimental designs for a study with 5 attributes and two alternatives. Finding the best design involves the following steps:

1. Generating different designs. The parameters used to generate the designs are denoted as priors.
2. Simulating a dataset. The simulation assumes a clearly defined data generating process (DGP), specified in terms of the error variance, the utility parameters and the functional form of the utility function. The parameters used to generate the data are denoted as true parameters.
3. Estimating a model on the simulated dataset. The estimated model can be exactly the same as the DGP or deviate from it. We denote the parameters estimated in each model as estimated parameters.
4. Repeating steps 2 and 3 \(N\) times to infer which design is best. We compare the designs on statistical power (at the 5% significance level), unbiasedness and efficiency:
   - Power: the share of the \(N\) runs in which we find a significant effect, knowing that the effect exists. For example, with 100 simulations, power is how often out of 100 we detect the effect.
   - Unbiasedness: no single estimation recovers the true parameter exactly, because the DGP involves randomness. Each estimate lies somewhere around the true parameter. The estimates should therefore fluctuate around the true parameter, so the mean of the \(N\) estimated parameters should be (nearly) equal to the true parameter.
   - Efficiency: a relative measure. We consider a design efficient if it has lower mean estimated standard errors (equivalently, lower p-values) than all other designs.
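Steps 2 and 3 rely on a DGP from which choices are drawn. The following is a minimal sketch of step 2 for a single choice task. It assumes a conditional logit DGP with linear utility and i.i.d. Gumbel (type I extreme value) errors. The functions `logit_probs`, `gumbel` and `simulate_choice`, the attribute values and the parameters are all hypothetical. They only illustrate the mechanics and are not the code used for this study.

```python
import math
import random

def logit_probs(X, beta):
    """Conditional logit choice probabilities for one choice task.

    X: one attribute vector per alternative; beta: utility parameters.
    """
    v = [sum(b * x for b, x in zip(beta, row)) for row in X]  # systematic utilities
    m = max(v)  # subtract the maximum for numerical stability
    ev = [math.exp(vi - m) for vi in v]
    total = sum(ev)
    return [e / total for e in ev]

def gumbel(rng, scale=1.0):
    """Draw from a Gumbel(0, scale) distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -scale * math.log(-math.log(u))

def simulate_choice(X, beta, rng, scale=1.0):
    """Simulate one choice: systematic utility plus Gumbel error, pick the maximum."""
    utilities = [sum(b * x for b, x in zip(beta, row)) + gumbel(rng, scale) for row in X]
    return max(range(len(X)), key=lambda j: utilities[j])

# Hypothetical choice task with three alternatives; the third has all
# attributes at zero, so its utility is normalised to zero.
# Columns: ASC, attribute A, attribute B (illustrative values only).
X = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
beta = [-1.2, 0.4, 0.3]

rng = random.Random(42)
draws = [simulate_choice(X, beta, rng) for _ in range(20000)]
shares = [draws.count(j) / len(draws) for j in range(len(X))]
print("logit probabilities:", [round(p, 3) for p in logit_probs(X, beta)])
print("simulated shares:   ", [round(s, 3) for s in shares])
```

With enough draws, the simulated choice shares approach the logit probabilities. The `scale` argument plays the role of the error variance in the DGP. A Gumbel error with scale \(s\) has variance \(\pi^2 s^2 / 6\), and raising \(s\) is equivalent to dividing all utility parameters by \(s\), which makes choices noisier.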
Once we have found a design, we check each choice set manually to make sure that none of them looks bogus to respondents. If we find a bogus choice set, we either take another design or change this choice set.
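Part of this manual check can be automated. The sketch below assumes that "bogus" includes choice sets in which one alternative dominates another, meaning it is at least as attractive on every attribute and strictly better on at least one. Attractiveness is judged by the expected sign of each parameter. All true attribute parameters in this study are positive, so every sign is +1 here. The functions `dominates` and `dominated_sets`, the attribute coding and the example choice sets are hypothetical. With alternative-specific parameters, such a sign-based check is only a first screen.

```python
def dominates(a, b, signs):
    """Return True if alternative a dominates alternative b.

    a, b: attribute levels of two alternatives (same attribute order).
    signs: +1 if a higher level is expected to be preferred, -1 if lower is preferred.
    """
    at_least_as_good = all(s * (x - y) >= 0 for x, y, s in zip(a, b, signs))
    strictly_better = any(s * (x - y) > 0 for x, y, s in zip(a, b, signs))
    return at_least_as_good and strictly_better

def dominated_sets(choice_sets, signs):
    """Return the indices of choice sets in which one alternative dominates another."""
    flagged = []
    for idx, alts in enumerate(choice_sets):
        if any(dominates(a, b, signs)
               for i, a in enumerate(alts)
               for j, b in enumerate(alts) if i != j):
            flagged.append(idx)
    return flagged

# Hypothetical coding: action, advisory, partner, compensation (all expected positive).
signs = [1, 1, 1, 1]
choice_sets = [
    [[1, 0, 1, 50], [0, 1, 0, 100]],   # genuine trade-off
    [[1, 1, 1, 100], [0, 1, 0, 50]],   # first alternative dominates the second
]
print(dominated_sets(choice_sets, signs))
```

Flagged choice sets are candidates for replacement. Whether a flagged set is really implausible still needs a human judgement.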
Note: Unbiasedness and efficiency are independent of the error variance and effect size. Power, in contrast, depends on the magnitude of the true parameters and on the error variance. As we will not know these values before we have collected the data, power is a somewhat arbitrary measure. The best strategy is to use the parameters from a similar study as the true parameters and multiply them by a constant \(c < 1\). This gives a rather conservative power estimate. Still, we can always fool ourselves (and reviewers) by choosing parameters that give us the power we want.
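The dependence of power on effect size can be made concrete with a normal approximation. The sketch below assumes that the estimate is normally distributed and that its standard error does not change when the parameter is scaled by \(c\). This is a simplification, because in a logit model the standard errors also depend on the parameters. The function names are our own. The inputs are illustrative: 0.10 is the true baction1 value, and 0.06 is roughly the standard deviation of its estimate in the altscfeff design. The result is a per-parameter power, which need not match how the power table in this document aggregates across parameters.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(beta, se, z_crit=1.959964):
    """Approximate power of a two-sided Wald test of H0: beta = 0.

    Assumes the estimate is normal with mean beta and standard deviation se,
    and that se does not change when beta changes.
    """
    delta = abs(beta) / se
    return normal_cdf(delta - z_crit) + normal_cdf(-delta - z_crit)

# Shrinking the assumed effect by c < 1 lowers power (illustrative inputs).
for c in (1.0, 0.8, 0.5):
    print(f"c = {c}: approximate power {approx_power(c * 0.10, 0.06):.2f}")
```

Scaling by \(c < 1\) reduces the ratio of effect to standard error and therefore the power. This is why the resulting power estimate is conservative.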
For the simulations, we use 360 respondents and a DGP with homogeneous preferences (conditional logit model).
Design Generation
We created four different experimental designs:
- orthogonal design (orthodesign)
- efficient design (efficientdesign)
- Bayesian efficient design (befficientdesign)
- efficient design capturing alternative-specific parameters (altscfeff)
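Efficient designs are typically found by minimising the D-error of the design under assumed priors. Bayesian efficient designs minimise its average over draws from a prior distribution. The sketch below computes the multinomial logit D-error, \(\det(I(\beta))^{-1/K}\), where \(I(\beta)\) is the Fisher information for one respondent and \(K\) the number of parameters. It assumes a design given as a list of choice sets, each holding one attribute vector per alternative. The function names, the example designs and the prior draws are hypothetical. The software used to generate the four designs here may define or normalise the D-error differently.

```python
import math

def mnl_information(design, beta):
    """Fisher information matrix of the MNL model for one respondent.

    design: list of choice sets; each choice set is a list of attribute vectors.
    beta: assumed (prior) parameter vector of length K.
    """
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for choice_set in design:
        v = [sum(b * x for b, x in zip(beta, alt)) for alt in choice_set]
        m = max(v)
        ev = [math.exp(vi - m) for vi in v]
        total = sum(ev)
        p = [e / total for e in ev]
        # Probability-weighted mean attribute vector of this choice set.
        z = [sum(p[j] * alt[a] for j, alt in enumerate(choice_set)) for a in range(k)]
        for j, alt in enumerate(choice_set):
            d = [alt[a] - z[a] for a in range(k)]
            for r in range(k):
                for c in range(k):
                    info[r][c] += p[j] * d[r] * d[c]
    return info

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # treat near-zero pivots as singular
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def d_error(design, beta):
    """D-error = det(I)^(-1/K); smaller is better, infinite if I is singular."""
    det = determinant(mnl_information(design, beta))
    return math.inf if det <= 0 else det ** (-1.0 / len(beta))

def bayesian_d_error(design, prior_draws):
    """Average D-error over draws from a prior distribution of beta."""
    return sum(d_error(design, b) for b in prior_draws) / len(prior_draws)

# Hypothetical two-attribute designs with two alternatives per choice set.
beta = [0.4, 0.3]
good = [[[1, 0], [0, 1]], [[1, 1], [0, 0]], [[0, 1], [1, 0]], [[0, 0], [1, 1]]]
bad = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]  # attributes always move together
print(d_error(good, beta), d_error(bad, beta))
print(bayesian_d_error(good, [[0.4, 0.3], [0.2, 0.5], [0.6, 0.1]]))
```

In the `bad` design, the two attributes always differ in the same direction, so their effects cannot be separated. The information matrix is then singular and the D-error is infinite.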
Simulation
The following presents results for data generated from a conditional logit DGP. This is the standard approach and assumes homogeneous preferences.
The simulation has 360 respondents and 1000 runs and took 35 minutes and 43 seconds to complete.
The results are based on the output file output/agora_simulation.rds.
Unbiasedness
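Table 1 shows summary statistics of the estimated parameters over the 1000 runs. For an unbiased design, these should be nearly equal to the true parameters. Across all four designs, the mean and median estimates lie within 0.02 of the true parameters.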
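Whether a small deviation signals bias or just Monte Carlo noise can be judged against the Monte Carlo standard error of the mean, \(\mathrm{sd}/\sqrt{R}\) with \(R\) runs. An example of such a deviation is the mean of -1.21 versus the true value of -1.20 for basc in the efficient design. The sketch below uses rounded values from the tables in this document (\(R = 1000\)). Because the inputs are rounded to two decimals, the result is only indicative. The function name `bias_check` is our own.

```python
import math

def bias_check(mean_est, true_value, sd_est, runs):
    """Bias of the mean estimate, its Monte Carlo standard error, and their ratio."""
    bias = mean_est - true_value
    mcse = sd_est / math.sqrt(runs)  # standard error of a mean over `runs` draws
    return bias, mcse, bias / mcse

# basc in the efficient design: mean -1.21, true -1.20, sd 0.21, 1000 runs (rounded).
bias, mcse, ratio = bias_check(-1.21, -1.20, 0.21, 1000)
print(f"bias = {bias:.3f}, MCSE = {mcse:.4f}, ratio = {ratio:.2f}")
```

Here the ratio is about -1.5. The deviation is therefore within the range that Monte Carlo noise alone would produce (roughly \(|\text{ratio}| < 2\)).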
Efficiency
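Table 2 shows summary statistics of the robust p-values against zero (rob_pval0) over all runs. For most parameters, the mean robust p-value rounds to 0.00 in all four designs. The exceptions are baction1 (means between 0.22 and 0.33), baction2 (means between 0.05 and 0.14) and, for the efficient and Bayesian efficient designs only, bpartner1 (0.02).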
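For reference, a p-value against zero is usually computed in Wald form, from the ratio of the estimate to its (here robust) standard error. The sketch below assumes the two-sided version. The estimation software behind rob_pval0 may compute it differently, for example one-sided. The function names and the example numbers are hypothetical.

```python
import math

def wald_p_value(estimate, robust_se):
    """Two-sided p-value for H0: parameter = 0, using the normal approximation."""
    z = abs(estimate) / robust_se
    # 2 * (1 - Phi(z)), written with the complementary error function for accuracy.
    return math.erfc(z / math.sqrt(2.0))

def share_significant(p_values, alpha=0.05):
    """Share of runs in which the p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical single run: estimate 0.10 with robust standard error 0.06.
print(round(wald_p_value(0.10, 0.06), 3))
print(share_significant([0.01, 0.20, 0.04, 0.50]))
```

The mean p-value reported in the table is not the same quantity as power. Power counts the share of runs below the significance level, as `share_significant` does. With skewed p-value distributions, a mean above 0.05 can coexist with a substantial share of significant runs.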
parname | altscfeff.mean | befficientdesign.mean | efficientdesign.mean | orthodesign.mean | altscfeff.sd | befficientdesign.sd | efficientdesign.sd | orthodesign.sd | altscfeff.min | befficientdesign.min | efficientdesign.min | orthodesign.min | altscfeff.max | befficientdesign.max | efficientdesign.max | orthodesign.max | altscfeff.range | befficientdesign.range | efficientdesign.range | orthodesign.range | altscfeff.se | befficientdesign.se | efficientdesign.se | orthodesign.se | altscfeff.median | altscfeff.skew | altscfeff.kurtosis | befficientdesign.median | befficientdesign.skew | befficientdesign.kurtosis | efficientdesign.median | efficientdesign.skew | efficientdesign.kurtosis | orthodesign.median | orthodesign.skew | orthodesign.kurtosis |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rob_pval0_basc | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 24.83 | 676.90 | 0.00 | 13.46 | 194.05 | 0.00 | 26.99 | 786.56 | 0.00 | 28.24 | 844.65 |
rob_pval0_baction1 | 0.22 | 0.33 | 0.33 | 0.24 | 0.27 | 0.30 | 0.29 | 0.28 | 0 | 0 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.1 | 1.31 | 0.65 | 0.24 | 0.68 | -0.85 | 0.25 | 0.69 | -0.73 | 0.10 | 1.24 | 0.36 |
rob_pval0_badvisory1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0.13 | 0.15 | 0.00 | 0.00 | 0.13 | 0.15 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 22.21 | 539.74 | 0.00 | 14.13 | 260.83 | 0.00 | 17.05 | 335.80 | 0.00 | 27.09 | 779.78 |
rob_pval0_bpartner1 | 0.00 | 0.02 | 0.02 | 0.00 | 0.00 | 0.06 | 0.06 | 0.01 | 0 | 0 | 0 | 0 | 0.07 | 0.67 | 0.97 | 0.13 | 0.07 | 0.67 | 0.97 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 11.62 | 151.62 | 0.00 | 6.42 | 52.14 | 0.00 | 8.54 | 94.00 | 0.00 | 13.19 | 206.37 |
rob_pval0_bcomp1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 17.86 | 348.54 | 0.00 | 22.43 | 557.33 | 0.00 | 26.69 | 754.34 | 0.00 | 31.50 | 991.89 |
rob_pval0_basc2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 24.60 | 663.58 | 0.00 | 26.63 | 768.97 | 0.00 | 20.55 | 470.56 | 0.00 | 21.78 | 520.52 |
rob_pval0_baction2 | 0.05 | 0.14 | 0.13 | 0.05 | 0.12 | 0.21 | 0.20 | 0.11 | 0 | 0 | 0 | 0 | 0.97 | 0.99 | 0.98 | 0.92 | 0.97 | 0.99 | 0.98 | 0.92 | 0.00 | 0.01 | 0.01 | 0.00 | 0.0 | 4.37 | 21.55 | 0.04 | 2.07 | 3.96 | 0.03 | 2.28 | 4.87 | 0.01 | 3.99 | 18.76 |
rob_pval0_badvisory2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 29.13 | 880.56 | 0.00 | 12.86 | 203.27 | 0.00 | 11.91 | 166.91 | 0.00 | 31.53 | 992.99 |
rob_pval0_bpartner2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.02 | 0.00 | 0 | 0 | 0 | 0 | 0.01 | 0.25 | 0.27 | 0.02 | 0.01 | 0.25 | 0.27 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 11.10 | 144.98 | 0.00 | 11.98 | 191.25 | 0.00 | 9.26 | 102.35 | 0.00 | 17.79 | 335.09 |
rob_pval0_bcomp2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0.12 | 0.09 | 0.00 | 0.00 | 0.12 | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 10.87 | 142.62 | 0.00 | 17.34 | 360.86 | 0.00 | 13.78 | 214.20 | 0.00 | 27.13 | 781.95 |
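To get better insight into efficiency, we can look at the standard deviations of the estimated parameters across runs. Smaller standard deviations mean smaller fluctuations around the mean, so from an efficiency perspective we should select the design with the lowest standard deviations. The alternative-specific efficient design (altscfeff) and the orthogonal design (orthodesign) have the lowest standard deviations for every parameter except bcomp1 and bcomp2, whose standard deviations round to 0.00 in all designs. Of these two designs, altscfeff is slightly ahead for basc2 (0.15 vs. 0.17). The efficient and Bayesian efficient designs have larger standard deviations, for example 0.21 vs. 0.14 for basc. This is a bit surprising, as the DGP was based on the priors of the avclasspap design.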
parname | truepar | altscfeff.sd | befficientdesign.sd | efficientdesign.sd | orthodesign.sd | altscfeff.min | befficientdesign.min | efficientdesign.min | orthodesign.min | altscfeff.max | befficientdesign.max | efficientdesign.max | orthodesign.max | altscfeff.range | befficientdesign.range | efficientdesign.range | orthodesign.range | altscfeff.se | befficientdesign.se | efficientdesign.se | orthodesign.se | altscfeff.median | altscfeff.skew | altscfeff.kurtosis | befficientdesign.median | befficientdesign.skew | befficientdesign.kurtosis | efficientdesign.median | efficientdesign.skew | efficientdesign.kurtosis | orthodesign.median | orthodesign.skew | orthodesign.kurtosis |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
basc | -1.20 | 0.14 | 0.21 | 0.21 | 0.14 | -1.70 | -1.90 | -1.87 | -1.68 | -0.78 | -0.57 | -0.49 | -0.78 | 0.92 | 1.33 | 1.38 | 0.90 | 0 | 0.01 | 0.01 | 0.00 | -1.20 | -0.16 | 0.15 | -1.20 | -0.02 | 0.02 | -1.20 | -0.01 | -0.04 | -1.20 | 0.02 | 0.03 |
baction1 | 0.10 | 0.06 | 0.08 | 0.09 | 0.06 | -0.10 | -0.15 | -0.25 | -0.13 | 0.31 | 0.37 | 0.38 | 0.28 | 0.41 | 0.51 | 0.63 | 0.41 | 0 | 0.00 | 0.00 | 0.00 | 0.10 | -0.07 | -0.14 | 0.10 | 0.01 | -0.36 | 0.10 | -0.04 | 0.30 | 0.10 | -0.07 | -0.14 |
badvisory1 | 0.40 | 0.06 | 0.09 | 0.08 | 0.06 | 0.21 | 0.14 | 0.12 | 0.19 | 0.60 | 0.69 | 0.72 | 0.58 | 0.40 | 0.56 | 0.59 | 0.39 | 0 | 0.00 | 0.00 | 0.00 | 0.40 | 0.03 | -0.10 | 0.40 | 0.13 | 0.06 | 0.40 | 0.09 | 0.16 | 0.40 | -0.06 | -0.02 |
bpartner1 | 0.30 | 0.06 | 0.09 | 0.09 | 0.06 | 0.11 | -0.05 | 0.00 | 0.10 | 0.50 | 0.55 | 0.56 | 0.49 | 0.38 | 0.60 | 0.56 | 0.39 | 0 | 0.00 | 0.00 | 0.00 | 0.30 | 0.07 | -0.02 | 0.30 | -0.10 | -0.21 | 0.30 | -0.07 | 0.04 | 0.30 | -0.01 | -0.03 |
bcomp1 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.03 | 0.03 | 0.03 | 0.03 | 0.01 | 0.02 | 0.02 | 0.01 | 0 | 0.00 | 0.00 | 0.00 | 0.02 | 0.08 | -0.10 | 0.02 | 0.04 | -0.03 | 0.02 | 0.05 | -0.10 | 0.02 | -0.09 | 0.21 |
basc2 | -1.60 | 0.15 | 0.24 | 0.25 | 0.17 | -2.11 | -2.36 | -2.47 | -2.11 | -1.18 | -0.89 | -0.76 | -1.10 | 0.93 | 1.48 | 1.71 | 1.00 | 0 | 0.01 | 0.01 | 0.01 | -1.60 | -0.10 | -0.11 | -1.60 | -0.02 | -0.21 | -1.62 | 0.05 | -0.03 | -1.60 | -0.11 | -0.21 |
baction2 | 0.20 | 0.07 | 0.09 | 0.10 | 0.07 | -0.02 | -0.15 | -0.12 | -0.05 | 0.43 | 0.52 | 0.51 | 0.43 | 0.46 | 0.67 | 0.63 | 0.48 | 0 | 0.00 | 0.00 | 0.00 | 0.20 | -0.05 | -0.12 | 0.20 | 0.05 | 0.03 | 0.21 | -0.03 | -0.02 | 0.20 | -0.01 | -0.13 |
badvisory2 | 0.60 | 0.07 | 0.10 | 0.10 | 0.07 | 0.40 | 0.34 | 0.33 | 0.33 | 0.83 | 0.94 | 0.91 | 0.84 | 0.44 | 0.60 | 0.57 | 0.52 | 0 | 0.00 | 0.00 | 0.00 | 0.60 | 0.10 | 0.06 | 0.60 | 0.04 | -0.15 | 0.60 | 0.04 | -0.28 | 0.60 | -0.03 | 0.03 |
bpartner2 | 0.40 | 0.07 | 0.10 | 0.10 | 0.07 | 0.20 | 0.11 | 0.11 | 0.16 | 0.61 | 0.69 | 0.72 | 0.62 | 0.41 | 0.58 | 0.61 | 0.46 | 0 | 0.00 | 0.00 | 0.00 | 0.40 | -0.06 | -0.16 | 0.40 | 0.18 | -0.32 | 0.40 | 0.01 | -0.05 | 0.40 | 0.13 | 0.12 |
bcomp2 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.03 | 0.02 | 0.01 | 0.02 | 0.02 | 0.01 | 0 | 0.00 | 0.00 | 0.00 | 0.01 | 0.08 | -0.17 | 0.01 | -0.13 | -0.19 | 0.02 | -0.05 | -0.11 | 0.02 | 0.08 | -0.21 |
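Differences in standard deviations translate directly into sample size. The sampling variance of a maximum likelihood estimate shrinks roughly in proportion to \(1/N\). The squared ratio of two standard deviations therefore approximates how many times more respondents the less efficient design would need to match the precision of the more efficient one. The sketch below uses the rounded standard deviations for basc and basc2 from the table above (altscfeff vs. efficientdesign). The function name `respondent_ratio` is our own, and the results are indicative only.

```python
def respondent_ratio(sd_reference, sd_other):
    """Approximate factor by which the 'other' design needs more respondents.

    Uses the asymptotic result that the variance of an ML estimate scales with 1/N.
    """
    return (sd_other / sd_reference) ** 2

# Rounded standard deviations from the table above: (altscfeff, efficientdesign).
sds = {"basc": (0.14, 0.21), "basc2": (0.15, 0.25)}
for name, (sd_alt, sd_eff) in sds.items():
    ratio = respondent_ratio(sd_alt, sd_eff)
    print(f"{name}: ratio {ratio:.2f}, about {360 * ratio:.0f} respondents instead of 360")
```

For basc, the ratio of 2.25 means that the efficient design would need roughly 810 respondents to match the precision that altscfeff reaches with 360.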
Statistical power
Table 4 shows the statistical power at the 5% significance level for each design.
Design | Power (%, 5% significance level) |
---|---|
altscfeff | 31.1 |
befficientdesign | 5.3 |
efficientdesign | 4.3 |
orthodesign | 29.0 |
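Each power value is itself estimated from 1000 runs and therefore carries Monte Carlo error. The sketch below gives a normal-approximation 95% interval for a simulated power value, based on the binomial standard error \(\sqrt{p(1-p)/R}\). The function name `power_interval` is our own.

```python
import math

def power_interval(power_percent, runs, z=1.96):
    """Normal-approximation confidence interval (in %) for a simulated power value."""
    p = power_percent / 100.0
    se = math.sqrt(p * (1.0 - p) / runs)  # binomial standard error of a proportion
    return 100.0 * (p - z * se), 100.0 * (p + z * se)

for design, power in [("altscfeff", 31.1), ("orthodesign", 29.0)]:
    low, high = power_interval(power, 1000)
    print(f"{design}: {power:.1f}% (95% CI {low:.1f} to {high:.1f})")
```

The intervals for altscfeff (about 28.2 to 34.0) and orthodesign (about 26.2 to 31.8) overlap. The 2.1-point gap between these two designs should therefore not be over-interpreted. Both are clearly above the efficient and Bayesian efficient designs.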
Illustration of simulated parameter values
To facilitate interpretation and judgement of the different designs, we plot the densities of the estimated parameters from the four experimental designs.
[Figure: densities of the estimated parameters by design, one panel each for basc, baction1, badvisory1, bpartner1, bcomp1, basc2, baction2, badvisory2, bpartner2, bcomp2]