In this exercise, we will replicate the study by Knox et al. (2019). The article introduces the Patient Preference Trial design, an approach for estimating heterogeneous treatment effects in contexts with self-selection, and applies it in political science. This exercise is based on the code provided in the supplementary materials.
The variables in the data set are defined as follows:
S: stated preference (factor / integer in \(\{0,…,J-1\}\))
D: design arm (0 = free-choice, 1 = forced-exposure)
C: revealed preference / actual choice (only observed when \(D==0\))
A: actual treatment received (randomized when \(D==1\), else \(A=C\))
Y: outcome (continuous in \([0,1]\) or binary). In this example, a sentiment index toward the media (continuous) and whether the respondent is at least somewhat likely to discuss the media clip with a friend (binary)
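To make these definitions concrete, here is a minimal illustrative sketch (invented values, not the actual data) of how a few respondents would be coded, using the one-indexed treatment values from the replication code:

# toy data frame, for illustration only
d.example <- data.frame(
  S = c(1, 3, 2),          # stated preference
  D = c(0, 1, 1),          # design arm: 0 = free choice, 1 = forced exposure
  C = c(1, NA, NA),        # revealed choice, only observed when D == 0
  A = c(1, 3, 2),          # treatment received: A = C when D == 0, randomized when D == 1
  Y = c(0.71, 0.50, 0.85)  # outcome, e.g. the sentiment index
)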
1.1 Loading dependencies
## note: S_i, C_i, A_i are zero-indexed in paper but one-indexed in code
'%.%' <- function(x, y) paste(x, y, sep = '')

# Loading packages
library(reshape2)
library(plyr)
library(tidyverse)
library(janitor)
library(xtable)
library(MASS)    # for mvrnorm()
library(gtools)  # for rdirichlet()
library(lpSolve)
library(sjPlot)

# Loading functions
devtools::source_url("https://raw.githubusercontent.com/dcknox/ppt/master/functions/bounds_frechet_function.R")
devtools::source_url("https://raw.githubusercontent.com/dcknox/ppt/master/functions/bounds_binary_function.R")

options(download.file.method = "libcurl", timeout = 600)

load_rds_remote <- function(url) {
  tf <- tempfile(fileext = ".rds")
  tryCatch({
    utils::download.file(url, tf, mode = "wb", quiet = TRUE)
    readRDS(tf)
  }, error = function(e) {
    # Fallbacks if download.file fails in your knit environment
    if (!requireNamespace("curl", quietly = TRUE)) install.packages("curl")
    con <- curl::curl(url)
    on.exit(try(close(con), silent = TRUE), add = TRUE)
    raw <- readBin(con, what = "raw", n = 1e8)
    writeBin(raw, tf)
    readRDS(tf)
  })
}

# Creating directory to save the results
if (!dir.exists('results')) {
  dir.create('results')
}

############# style #############
red_dark      <- '#A53F4F'
red_medium    <- '#A31F34'
red_light     <- '#A9606C'
blue_dark     <- '#0B356F'
blue_medium   <- '#315485'
blue_lightest <- '#8F9AAA'
grey_light    <- '#C2C0BF'
grey_dark     <- '#8A8B8C'
1.2 Data Cleaning
Let’s start by downloading the data set and preprocessing it for the analysis.
# Reading data
url <- "https://raw.githubusercontent.com/dcknox/ppt/master/data/Spring2014omnibus_raw.csv"
d.raw <- read.csv(url, stringsAsFactors = FALSE, check.names = FALSE)

## media assignment variables: convert NA/1 to 0/1
d.raw$fox[is.na(d.raw$fox)] <- 0
d.raw$msnbc[is.na(d.raw$msnbc)] <- 0
d.raw$entertainment[is.na(d.raw$entertainment)] <- 0
d.raw$med_choice[is.na(d.raw$med_choice)] <- 0  # 0 if no choice given, else 1-4

## construct 'stated preference' variable
d.raw$med_pref[which(d.raw$med_pref == 4)] <- 3
d.raw$med_pref[which(d.raw$med_pref == 0)] <- NA

## construct 'actual choice' variable
## (pool both entertainment options in free-choice arm,
##  already pooled in forced-choice arm)
d.raw$med_choice[which(d.raw$med_choice == 4)] <- 3
d.raw$med_choice[which(d.raw$med_choice == 0)] <- NA

## construct 'received treatment' assignment variable
## (A = C if D = 0, A ~ cat(1/3, 1/3, 1/3) if D = 1)
## (fox = 1, msnbc = 2, entertainment = 3)
d.raw$med_assign <- ifelse(d.raw$forcedchoice == 1,
                           d.raw$fox + 2 * d.raw$msnbc + 3 * d.raw$entertainment,
                           d.raw$med_choice)

## construct party id variable
d.raw$pid <- NA
d.raw$pid[d.raw$party1 == 1] <- -1  # considers self a democrat
d.raw$pid[d.raw$party1 == 2] <-  1  # considers self a republican
d.raw$pid[d.raw$party4 == 1] <-  1  # neither, but closer to republicans
d.raw$pid[d.raw$party4 == 2] <- -1  # neither, but closer to democrats
d.raw$pid[d.raw$party4 == 3] <-  0  # closer to neither party

## construct pro/counter-attitudinal media variable
## pro-attitudinal = 1 (fox for reps, or msnbc for dems)
## counter-attitudinal = 2 (msnbc for reps, or fox for dems)
## entertainment = 3 (for both parties)
med_labels <- c('pro-attitudinal', 'counter-attitudinal', 'entertainment')
d.raw <- d.raw[-which(d.raw$pid == 0 | is.na(d.raw$pid)), ]  # drop independents
d.raw$med_pref.att <- ifelse((d.raw$med_pref == 1 & d.raw$pid == 1) |
                               (d.raw$med_pref == 2 & d.raw$pid == -1), 1,
                             ifelse((d.raw$med_pref == 2 & d.raw$pid == 1) |
                                      (d.raw$med_pref == 1 & d.raw$pid == -1), 2, 3))
d.raw$med_choice.att <- ifelse((d.raw$med_choice == 1 & d.raw$pid == 1) |
                                 (d.raw$med_choice == 2 & d.raw$pid == -1), 1,
                               ifelse((d.raw$med_choice == 2 & d.raw$pid == 1) |
                                        (d.raw$med_choice == 1 & d.raw$pid == -1), 2, 3))
d.raw$med_assign.att <- ifelse((d.raw$med_assign == 1 & d.raw$pid == 1) |
                                 (d.raw$med_assign == 2 & d.raw$pid == -1), 1,
                               ifelse((d.raw$med_assign == 2 & d.raw$pid == 1) |
                                        (d.raw$med_assign == 1 & d.raw$pid == -1), 2, 3))

## construct media sentiment index and rescale to [0,1]
## these are eight questions on perception of media (7-point scales)
## flip all questions so larger numbers mean 'media is good'
d.raw$perceive.index <- rowSums(cbind(d.raw$fair_4,
                                      d.raw$friendly_4,
                                      d.raw$good_4,
                                      d.raw$quarrel_4 * -1 + 8,
                                      d.raw$balanced_4,
                                      d.raw$oneside_4 * -1 + 8,
                                      d.raw$american_4,
                                      d.raw$accurate_4))
d.raw$perceive.index <- (d.raw$perceive.index - 8) / -48 + 1

tab_df(d.raw |> clean_names() |> dplyr::slice_head(n = 10))
Output: the first 10 rows of the cleaned data, with columns x, forcedchoice, fox, msnbc, entertainment, med_pref, med_choice, actions_discuss, the eight media-perception items (fair_4, friendly_4, good_4, quarrel_4, balanced_4, oneside_4, american_4, accurate_4), party1, party4, med_assign, pid, med_pref_att, med_choice_att, med_assign_att, and perceive_index.
Now let’s split the data set for the specific analysis of each outcome variable. We will start by building the data set for the continuous outcome in \([0,1]\), which will be stored in the d.frechet object.
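The construction of d.frechet is not included in the extracted code; below is a minimal sketch, mirroring the d.binary construction shown afterwards and assuming the same NA filtering (the original replication code may differ in detail):

# continuous-outcome data set: media sentiment index as Y (sketch)
d.frechet <- data.frame(
  S = d.raw$med_pref.att,    # stated preference
  C = d.raw$med_choice.att,  # revealed choice (free-choice arm only)
  A = d.raw$med_assign.att,  # treatment received
  D = d.raw$forcedchoice,    # design arm
  Y = d.raw$perceive.index   # sentiment index in [0,1]
)
# drop rows with missing values (C may be NA in the forced-exposure arm)
drop <- which(is.na(d.frechet$S) | is.na(d.frechet$D) | is.na(d.frechet$A) |
                is.na(d.frechet$Y) | (is.na(d.frechet$C) & d.frechet$D == 0))
if (length(drop) > 0) {
  d.frechet <- d.frechet[-drop, ]
}
rm(drop)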
Next, let’s build the data set for the binary outcome. This will be stored in the d.binary object.
## at least somewhat likely to discuss clip with friend
d.binary <- data.frame(
  S = d.raw$med_pref.att,
  C = d.raw$med_choice.att,
  A = d.raw$med_assign.att,
  D = d.raw$forcedchoice
)
d.binary$Y <- d.raw$actions_discuss <= 3  # not likely/not sure = 0, else = 1

drop <- which(is.na(d.binary$S) |
                is.na(d.binary$D) |
                is.na(d.binary$A) |
                is.na(d.binary$Y) |
                (is.na(d.binary$C) & d.binary$D == 0))
if (length(drop) > 0) {
  d.binary <- d.binary[-drop, ]
}
rm(drop)

tab_df(d.binary |> slice_head(n = 10))
First 10 rows of d.binary:

S   C    A   D   Y
1   NA   3   1   FALSE
3   3    3   0   TRUE
3   NA   1   1   TRUE
3   3    3   0   FALSE
3   3    3   0   FALSE
3   NA   2   1   FALSE
3   1    1   0   TRUE
1   NA   3   1   TRUE
2   NA   2   1   TRUE
1   1    1   0   TRUE
Lastly, we will prepare an effect matrix (effects.mat) that specifies the contrasts between treatment values whose effects we want to estimate. It will later be passed as an argument to the functions that estimate the treatment effects; a sketch is given below.
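The code that builds effects.mat is not included in the extracted material, and the exact format expected by the bounds functions is defined in the sourced replication code. As a purely hypothetical sketch, one row per pairwise contrast between the one-indexed treatment conditions might look like this (consult bounds_frechet_function.R, loaded above, for the structure it actually expects):

# hypothetical effect matrix: each row is a pair of treatments to contrast
effects.mat <- rbind(
  c(1, 2),  # pro-attitudinal vs. counter-attitudinal
  c(1, 3),  # pro-attitudinal vs. entertainment
  c(2, 3)   # counter-attitudinal vs. entertainment
)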
Now we will run the analysis for the continuous outcome using the function bounds.frechet, sourced from Dean Knox's GitHub repository. This is a computationally heavy operation and may take a few hours depending on the machine.
if (file.exists('./results/acte_frechet_posterior5k.rds')) {
  acte.frechet <- readRDS('./results/acte_frechet_posterior5k.rds')
} else {
  set.seed(02139)
  acte.frechet <- bounds.frechet(
    data = d.frechet,
    effects.mat = effects.mat,  # effect matrix
    posterior = TRUE,
    n_dir_MC = 500,             # number of simulations
    n_norm_MC = 10,             # number of random normal draws in each simulation
    sensitivity = TRUE,         # compute the sensitivity analysis during the simulations
    rhos = seq(0, .25, .01)
  )
  saveRDS(acte.frechet, './results/acte_frechet_posterior5k.rds')
}
To skip the simulation, we can instead download the pre-computed results from the GitHub page. Let's then plot the effects of partisan media exposure on the sentiment index toward the media.
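The download step itself is not shown in the extracted code; a minimal sketch using the load_rds_remote() helper defined above, with a hypothetical file path in the dcknox/ppt repository (replace with the actual location of the results file):

# download pre-computed continuous-outcome results (hypothetical path)
acte.frechet <- load_rds_remote(
  "https://raw.githubusercontent.com/dcknox/ppt/master/results/acte_frechet_posterior5k.rds"
)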
Now, let’s replicate the sensitivity analysis proposed in the paper for the continuous outcome. As noted above, the function already performed the sensitivity analysis while computing the results, so we only need to extract the relevant parameters again to create the visualization.
Finally, we will run the analysis for the binary outcome using the function bounds.binary, also sourced from Dean Knox's GitHub repository. This operation is even more computationally demanding and may take more than a few hours depending on the machine.
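The call is not included in the extracted code; below is a sketch mirroring the bounds.frechet call above, assuming bounds.binary accepts analogous arguments and using a hypothetical results file name (check the sourced bounds_binary_function.R for its exact signature):

if (file.exists('./results/acte_binary_posterior5k.rds')) {  # hypothetical file name
  acte.binary <- readRDS('./results/acte_binary_posterior5k.rds')
} else {
  set.seed(02139)
  acte.binary <- bounds.binary(
    data = d.binary,            # binary-outcome data set built above
    effects.mat = effects.mat,  # same effect matrix as before
    posterior = TRUE,
    n_dir_MC = 500,             # number of simulations (assumed, as in the continuous case)
    sensitivity = TRUE,
    rhos = seq(0, .25, .01)
  )
  saveRDS(acte.binary, './results/acte_binary_posterior5k.rds')
}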
Again, we will skip the estimation and download the results from Knox's GitHub. We then compute the parameters and create the visualization of the effects.
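As before, a minimal sketch with a hypothetical path (replace with the actual location of the binary results file):

# download pre-computed binary-outcome results (hypothetical path)
acte.binary <- load_rds_remote(
  "https://raw.githubusercontent.com/dcknox/ppt/master/results/acte_binary_posterior5k.rds"
)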
Let’s again replicate the sensitivity analysis proposed in the paper, this time for the binary outcome. As before, the function already performed the sensitivity analysis when computing the results, so we simply repeat the extraction and visualization process.