The question that will be included in the survey will be as follows
This is the design for the conjoint experiment for Covid reserch. If you want to do a full factorial design:
library(AlgDesign) #the librray needed for the design
#full factorial design
ffd <- gen.factorial(c(3,2,2), varNames = c("policy_provider", "media", "election"),factors = "all")
If you want to do a fractional design. The function optFederov( ) has three arguments: ~., ffd,and number. The argument ~. implies that all data variables are used linearly and their names are used in a model. The argument ffd indicates the name of data containing the candidate list, which is the same as the name of the object containing the full factorial design created above. The argument 5 indicates the number of rows (alternatives) in the fractional factorial design. PS: The minimum rows that the design lets is 5
set.seed(1234)
des <- optFederov(~., ffd, 6)
# the new list data created with fractional factorial design
des
## $D
## [1] 0.2901835
##
## $A
## [1] 5.4
##
## $Ge
## [1] 0.833
##
## $Dea
## [1] 0.819
##
## $design
## policy_provider media election
## 3 3 1 1
## 4 1 2 1
## 5 2 2 1
## 7 1 1 2
## 8 2 1 2
## 12 3 2 2
##
## $rows
## [1] 3 4 5 7 8 12
The object des contains some objects (elements) as a result of the function optFederov( ). The fractional factorial design that is required below and is called design in the object des is assigned to an object alt1 from the object des by a notation $, which is used to access an element of the object, as follows.
alt1 <- des$design
alt1
## policy_provider media election
## 3 3 1 1
## 4 1 2 1
## 5 2 2 1
## 7 1 1 2
## 8 2 1 2
## 12 3 2 2
M-1 copies of the fractional factorial design created in step 2 are made. Since two packs of policies are shown for each question in the choice experiments (M=2), a copy of the fractional factorial design must be made. The copy is created in an object alt2 using the operator <- in the following manner.
alt2 <- alt1
alt2
## policy_provider media election
## 3 3 1 1
## 4 1 2 1
## 5 2 2 1
## 7 1 1 2
## 8 2 1 2
## 12 3 2 2
replacement.
We randomly select rows (alternatives) from each of the M fractional factorial designs without replacement, and this selection is repeated until all rows in each of the M sets of the fractional factorial design are assigned to P total choice sets. P is equal to the number of rows in the designs. In R, a different and new uniform random variable is added to every fractional factorial design; then, each design is sorted on the basis of its corresponding random variable.
A new uniform random number is added to the object alt1 and alt2 as follows. The command transform() implies that a uniform random variable r1 containing five values is added to the object alt1 and alt2
alt1 <- transform(alt1, r1=runif(6))
alt1
## policy_provider media election r1
## 3 3 1 1 0.5039335
## 4 1 2 1 0.4939609
## 5 2 2 1 0.7512002
## 7 1 1 2 0.1746498
## 8 2 1 2 0.8483924
## 12 3 2 2 0.8648338
alt2 <- transform(alt2, r2=runif(6))
alt2
## policy_provider media election r2
## 3 3 1 1 0.04185728
## 4 1 2 1 0.31718216
## 5 2 2 1 0.01374994
## 7 1 1 2 0.23902573
## 8 2 1 2 0.70649462
## 12 3 2 2 0.30809476
Next, each fractional factorial design is sorted on the basis of its corresponding uniform random variable. The following command is executed on the object alt1 by using the function order()
alt1_sort <- alt1[order(alt1$r1),]
alt2_sort <- alt2[order(alt2$r2),]
alt1_sort
## policy_provider media election r1
## 7 1 1 2 0.1746498
## 4 1 2 1 0.4939609
## 3 3 1 1 0.5039335
## 5 2 2 1 0.7512002
## 8 2 1 2 0.8483924
## 12 3 2 2 0.8648338
alt2_sort
## policy_provider media election r2
## 5 2 2 1 0.01374994
## 3 3 1 1 0.04185728
## 7 1 1 2 0.23902573
## 12 3 2 2 0.30809476
## 4 1 2 1 0.31718216
## 8 2 1 2 0.70649462
Each line of the sorted fractional factorial designs (objects alt1_sort and alt2_sort) corresponds to each question of the choice experiments.. For example, the same line of the objects alt1_sort and alt2_sort shows the combination of the attributes levels of policy A and policy B for the same question, respectively.
q <- cbind(alt1_sort, alt2_sort)
q
## policy_provider media election r1 policy_provider media election
## 7 1 1 2 0.1746498 2 2 1
## 4 1 2 1 0.4939609 3 1 1
## 3 3 1 1 0.5039335 1 1 2
## 5 2 2 1 0.7512002 3 2 2
## 8 2 1 2 0.8483924 1 2 1
## 12 3 2 2 0.8648338 2 1 2
## r2
## 7 0.01374994
## 4 0.04185728
## 3 0.23902573
## 5 0.30809476
## 8 0.31718216
## 12 0.70649462
A conditional logit model, which has been applied in numerous previous researches related to choice experiments, is assumed to be used for the statistical analysis of the responses to the choice experiments questions in our example. This model is based on the random utility theory where a respondent’s utility is divided into two components: a representative (systematic) component and a random component and that for the two types of policy is assumed to be as follows.
\[ V_{j}=ASC+β_{p}provider_{j}+β_{m}media_{j}+β_{e}election_{j} \] where \(V_{j}\) denotes the representative component of utility for the policy, ASC is the alternative specific constant; \(provider_{j}\) takes value 1 if policy j was produced by government, 2 if the policy was formulated by opposition and 3 if the policy is formulated by health experts; \(media_{j}\) takes value 1 if policy j foresees media restrictions and 2 if policy foresees media freedom; \(election_{j}\) takes value 1 if policy j foresees election postponements and 2 otherwise.
A value of 1 implies selecting policy A, 2 implies selecting policy B, and 3 implies selecting the none-of-these option.
The conditional logit model can be executed by using the function clogit() that is included in the survival package (Lumley 2006). To consider the effects of individual characteristics on the valuation of attributes, the interaction between individual characteristics and attribute variables will be included in the model.
Hypothetical 10 individuals’ responses will look like below. Each individual will evaluate 3 policy combinations. The values of 1 and 2 imply selecting policy A and policy B respectively
The clogit() function of the survival package necessitates the following data structure. The ASC corresponding to alternative 3 (i.e., β03) is set to zero, and the ASCs corresponding to other alternatives are evaluated with respect to the reference alternative.
Variable “Question” is a stratification variable that is used to identify each combination of respondents and questions. Since there are three alternatives to each question (choice situation), the stratification variable takes a different value every three lines. For example, the value of Question in the case of question 1 for respondent 1 from row 2 to row 4 is 101. Variable “response” is a dummy variable that, at each row, takes the value 1 if the alternative is selected and 0 otherwise. For example, we observe that respondent 1 selected the Policy A option (first alternative) in the case of question 1 because the variable response in second row (including the header row) takes a value of 1 and that in both rows 2 and 3 takes a value of 0.
Remaining columns contain independent variables in the conditional logit model. Variable ASC is an alternative specific constant for each type of the policies. Variable “policy_provider” takes three distinct values as described in the coding scheme. Variable “media” and variable “election” represent dummy variables taking a value of 1 according to the coding scheme.
Each attribute variable of the policy_provider, media and election corresponding to the none-of-these option takes a value of 0. Because the function clogit( ) does not correspond to a question (choice scenario) format including the none-of-these option, it is necessary to normalize the representative component of utility for the option to be 0.
The conditional logit model can be applied by using the function clogit() included in the survival package in R. The output shows that the values of the log-likelihood at zero and that at convergence are -49.43755 and -47.84860, respectively.
library(survival)
library(gsheet) #required to import csv from google sheets
sample_data <- gsheet2tbl('docs.google.com/spreadsheets/d/1p51e0vPZ0dl6xY4FIZDF7KluH3WUvDRSr_Hzu-V5Evo/edit?usp=sharing') #sample data
clogout1 <- clogit(Response~Asc+Policy_provider+Media+Election+strata(Question),data=sample_data)
clogout1
## Call:
## clogit(Response ~ Asc + Policy_provider + Media + Election +
## strata(Question), data = sample_data)
##
## coef exp(coef) se(coef) z p
## Asc 0.99422 2.70262 1.33097 0.747 0.455
## Policy_provider -0.11908 0.88774 0.28472 -0.418 0.676
## Media 0.08872 1.09278 0.47145 0.188 0.851
## Election -0.21486 0.80665 0.42271 -0.508 0.611
##
## Likelihood ratio test=3.18 on 4 df, p=0.5285
## n= 135, number of events= 45
clogout1$loglik
## [1] -49.43755 -47.84860
A method for reflecting the respondents’ individual characteristics in the conditional logit model is to use a new variable that multiplies the individual characteristic variable and attribute variables (including ASC). Suppose that the effect of the respondent’s gender on their valuation of the media variable is statistically tested, that is, a new variable media_{j}Female is introduced as one of independent variables, where variable Female takes a value of 1 if the respondent is a female and 0 otherwise. An interaction between two variables in formula is expressed by using the operation :, that is, Media:Female. The conditional logit model can be estimated as follows.
clogout2 <- clogit(Response~Asc+Policy_provider+Media+Election+Media:Female+strata(Question),data=sample_data)
clogout2
## Call:
## clogit(Response ~ Asc + Policy_provider + Media + Election +
## Media:Female + strata(Question), data = sample_data)
##
## coef exp(coef) se(coef) z p
## Asc 1.09717 2.99568 1.32549 0.828 0.408
## Policy_provider -0.08264 0.92069 0.28060 -0.294 0.768
## Media -0.35760 0.69935 0.55022 -0.650 0.516
## Election -0.23708 0.78893 0.42784 -0.554 0.579
## Media:Female 0.68727 1.98828 0.43343 1.586 0.113
##
## Likelihood ratio test=5.77 on 5 df, p=0.3293
## n= 135, number of events= 45
clogout2$loglik
## [1] -49.43755 -46.55275
In this design the question was asked without the “neither” policy option. another fake data created for fulfiling that condition
Marginal means
library(cregg)
sample_data_l <- gsheet2tbl('docs.google.com/spreadsheets/d/1VopGTGRzB4Iwtf0s822Xf78Gmz9ShBy-ry6thzL9OMg/edit?usp=sharing') #leeper sample data
#converting to factors
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
sample_data_l <- sample_data_l %>%
mutate(Gender_f = case_when(
Female == 0 ~ "Male",
TRUE ~ "Female"
))
sample_data_l_f <- mutate_if(sample_data_l, is.character, as.factor)
sample_data_l_f$Policy_provider_f <- as.factor(sample_data_l_f$Policy_provider)
sample_data_l_f$Media_f <- as.factor(sample_data_l_f$Media)
sample_data_l_f$Election_f <- as.factor(sample_data_l_f$Election)
# descriptive plotting
f1 <- Response ~ authoritarian + party_id + Age + Gender_f
plot(mm(sample_data_l_f, f1, id = ~ Question), vline = 0.5)
## Warning in logLik.svyglm(x): svyglm not fitted by maximum likelihood.
## Warning in logLik.svyglm(x): svyglm not fitted by maximum likelihood.
## Warning in logLik.svyglm(x): svyglm not fitted by maximum likelihood.
## Warning in logLik.svyglm(x): svyglm not fitted by maximum likelihood.
A more common analytic approach for conjoints is to estimate average marginal component effects (AMCEs) using some form of regression analysis. cregg uses glm() and svyglm() to perform estimation and margins to generate average marginal effect estimates. Designs can be specified with any interactions between conjoint features but only AMCEs are returned. Just like for mm(), the output of cj() (or its alias, amce()) is a tidy data frame
# estimation
amces <- cj(sample_data_l_f, Response ~ Policy_provider_f + Media_f + Election_f, id = ~Question)
## Warning in logLik.svyglm(x): svyglm not fitted by maximum likelihood.
head(amces[c("feature", "level", "estimate", "std.error")], 7L)
## feature level estimate std.error
## 1 Policy_provider_f Election 0.00000000 NA
## 2 Policy_provider_f Media 0.15307696 0.1129783
## 3 Policy_provider_f Policy_provider 0.04494523 0.1235695
## 4 Media_f Freedom 0.00000000 NA
## 5 Media_f Restriction -0.02871373 0.1062066
## 6 Election_f On-time 0.00000000 NA
## 7 Election_f Postponement 0.05920716 0.1148850
plot(amces)