Power with Propensity Weights
# Log-odds (logit) transform of a probability p
log_odds <- function(p) {
  return(
    log(p/(1-p))
  )
}
# Draw n standard-normal factor values x and a latent log-odds t that is
# correlated with x at xrho and centered at log_odds(p)
create_T <- function(xrho, n, p = .5) {
  x <- rnorm(n)
  t <- xrho*x + rnorm(n, mean = log_odds(p), sd = sqrt(1-xrho^2))
  return(data.frame(x = x, t = t))
}
Introduction
The purpose of this short document is to provide a simulation-based method for determining plausible design effects from the use of inverse propensity weights. For example, suppose a power analysis is required for an evaluation of a program that seeks to improve several binary outcomes with varying baseline rates:
- outcome 1 = .1
- outcome 2 = .25
- outcome 3 = .5
Propensity weights design effect
These design effects are for a propensity-weighted approach. Propensity weighting effectively matches treatment and comparison units based on a predicted propensity (see, e.g., Austin 2011) for the estimation of the average treatment effect (ATE). However, propensity weights also affect significance tests through a weighting design effect, which is related to the variation in the weight variable. A rough estimate of this design effect is given by Kish (1992):
\[ DEFF = 1 + CV_w^2 \]
where \(CV_w\) is the ratio of the standard deviation of the weights to the mean weight. The factor \(DEFF\) can be used to estimate the effective sample size for power analysis. For example, if the actual sample size is 100, but the expected \(DEFF\) is 2, the sample size for power analysis should be \(\frac{100}{2} = 50\). Alternatively, if a power analysis reports a required sample of 100 and the \(DEFF\) is 2, the actual number of required cases is 200.
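As a minimal sketch of this arithmetic (the kish_deff helper is a hypothetical name used only for illustration):
# Kish approximation: DEFF = 1 + CV_w^2, with CV_w = sd(w)/mean(w)
kish_deff <- function(w) {
  return(1 + (sd(w)/mean(w))^2)
}
kish_deff(rep(1, 100))  # equal weights give a DEFF of exactly 1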
In order to anticipate the \(DEFF\) from propensity weights, we must consider the correlation between the observable factors used in the treatment propensity model and the propensity itself, and how this impacts the weights. Formally, let’s assume that the propensity for treatment is some probability which ranges from 0 to 1, but can be transformed into log-odds, \(T_i = \mbox{ln}\left(\frac{p_i}{1-p_i}\right)\).
These log-odds values are associated with a set of factors that can be summarized by a set of standard normal values, \(X\).
The create_T function defined at the top of this document creates the variables for \(X\) and \(T\), where both are approximately standard normal and correlated at a value of \(\rho\); the log_odds helper defined alongside it performs the logit transformation.
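As a quick illustration, log_odds maps a probability of .5 to 0, and smaller probabilities to increasingly negative values:
log_odds(c(.5, .25, .1))  # 0, about -1.10, about -2.20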
For example, we create an object with 1,000 observations for \(T\) and \(X\), called xt:
xt <- create_T(xrho = .5, n = 1000)
summary(xt)
       x                    t
 Min.   :-3.159044   Min.   :-3.058016
 1st Qu.:-0.688354   1st Qu.:-0.634629
 Median :-0.009003   Median : 0.055023
 Mean   : 0.009850   Mean   : 0.007813
 3rd Qu.: 0.685672   3rd Qu.: 0.633364
 Max.   : 2.753219   Max.   : 2.408692
cat(paste("sd of t is", sd(xt$t),"\n"))sd of t is 0.955875153440084
cat(paste("cor between x and t is", cor(xt$x, xt$t),"\n"))cor between x and t is 0.497211775994681
Next, we take \(T\) and create a treatment variable, based on a binomial coin flip with probability equal to the probability associated with the log-odds \(T\). Recall that create_T also takes a parameter for the overall probability of treatment, \(p\), which is converted into log-odds to re-center the values of \(T\).
Say we want a treatment variable with a mean of around .5 (after defining an inverse logit function),
# Inverse logit: map log-odds t back onto a probability
inv_logit <- function(t) {
  return(
    1/(1+exp(-t))
  )
}
mean(
  rbinom(
    nrow(xt),
    1,
    inv_logit(xt$t)
  )
)
[1] 0.498
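As a quick hedged check of the re-centering (xt25 is a hypothetical object name used only here), setting \(p = .25\) should produce a treatment rate near .25, though the noise in \(t\) pulls the average slightly back toward .5:
xt25 <- create_T(xrho = .5, n = 1000, p = .25)
mean(
  rbinom(
    nrow(xt25),
    1,
    inv_logit(xt25$t)
  )
)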
The compute_deff function below runs a quick logit to predict treatment, computes the propensity weights, and then returns the expected design effect. It uses the create_T function defined above.
compute_deff <- function(xrho, p = .5, n) {
  xt <- create_T(xrho = xrho, n = n, p = p)
  # Binomial coin flip with probability equal to the inverse logit of t
  treat <- rbinom(
    n,
    1,
    inv_logit(xt$t)
  )
  # Propensity scores from a logit of treatment on the observed factor x
  ehat <- glm(treat ~ x, data = xt, family = binomial("logit"))$fitted.values
  # Inverse propensity weights for the ATE
  w <- treat/ehat + (1-treat)/(1-ehat)
  # Kish approximation: DEFF = 1 + (sd(w)/mean(w))^2
  return(1 + sd(w)^2/mean(w)^2)
}
So, if we assume that our observed factors are correlated with the propensity for treatment at about .5, and half of the cases are assigned treatment, the design effect may be something like
compute_deff(xrho = .5, p = .5, n = 1000)
[1] 1.048454
Below, we run a simulation across various values of this correlation to get a sense of the expected design effect, taking the mean of repeated simulations for correlations from 0 (the weights are all very similar, so there is essentially no design effect) to .9 (the weights differ more, so the design effect is larger).
# Average the simulated design effect over reps replications
mean_sim <- function(xrho, n, p, reps = 10000) {
  return(
    mean(
      replicate(
        reps,
        compute_deff(xrho = xrho, p = p, n = n)
      )
    )
  )
}
deff <- unlist(
  lapply(
    seq(from = 0, to = .9, by = .1),
    mean_sim,
    p = .5,
    n = 1000
  )
)
cbind(rho = seq(from = 0, to = .9, by = .1), deff = deff)
      rho     deff
[1,] 0.0 1.002010
[2,] 0.1 1.003769
[3,] 0.2 1.009111
[4,] 0.3 1.018611
[5,] 0.4 1.032439
[6,] 0.5 1.051774
[7,] 0.6 1.079025
[8,] 0.7 1.114816
[9,] 0.8 1.163595
[10,] 0.9 1.232852
We see that the design effect increases with the quality of the propensity model. A worst case would be a \(DEFF\) of about 1.3, with a marginal model producing a \(DEFF\) of about 1.10.
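As a hedged illustration of what these planning values imply, dividing an actual sample of 1,000 by the \(DEFF\) gives the effective sample size:
# Effective sample sizes for N = 1000 under the two planning values
round(1000/c(1.1, 1.3))  # about 909 and 769 effective cases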
Below are values of \(DEFF\) for designs in which 10, 25, 50, 75, or 90 percent of \(N = 1000\) cases are typically assigned treatment. As the portion of cases assigned to treatment becomes more extreme, the design effect increases.
for (p in c(.1, .25, .5, .75, .9)) {
  deff <- unlist(
    lapply(
      seq(from = 0, to = .9, by = .1),
      mean_sim,
      p = p,
      n = 1000
    )
  )
  cat(paste0("p = ", p, "\n"))
  print(cbind(rho = seq(from = 0, to = .9, by = .1), deff = deff))
  cat("\n")
}
p = 0.1
rho deff
[1,] 0.0 2.184610
[2,] 0.1 2.194091
[3,] 0.2 2.229179
[4,] 0.3 2.284834
[5,] 0.4 2.375992
[6,] 0.5 2.501019
[7,] 0.6 2.674646
[8,] 0.7 2.902447
[9,] 0.8 3.214270
[10,] 0.9 3.642611
p = 0.25
rho deff
[1,] 0.0 1.229047
[2,] 0.1 1.232630
[3,] 0.2 1.244051
[4,] 0.3 1.262332
[5,] 0.4 1.289472
[6,] 0.5 1.327022
[7,] 0.6 1.379900
[8,] 0.7 1.449139
[9,] 0.8 1.548725
[10,] 0.9 1.687918
p = 0.5
rho deff
[1,] 0.0 1.002007
[2,] 0.1 1.003772
[3,] 0.2 1.009180
[4,] 0.3 1.018469
[5,] 0.4 1.032355
[6,] 0.5 1.051821
[7,] 0.6 1.078307
[8,] 0.7 1.114653
[9,] 0.8 1.164398
[10,] 0.9 1.233071
p = 0.75
rho deff
[1,] 0.0 1.229666
[2,] 0.1 1.232785
[3,] 0.2 1.242895
[4,] 0.3 1.261740
[5,] 0.4 1.289401
[6,] 0.5 1.326830
[7,] 0.6 1.379779
[8,] 0.7 1.450299
[9,] 0.8 1.548309
[10,] 0.9 1.687641
p = 0.9
rho deff
[1,] 0.0 2.183357
[2,] 0.1 2.193195
[3,] 0.2 2.229672
[4,] 0.3 2.291048
[5,] 0.4 2.381147
[6,] 0.5 2.503858
[7,] 0.6 2.668637
[8,] 0.7 2.901404
[9,] 0.8 3.210860
[10,] 0.9 3.663451
These results are generally invariant to sample size. For example, here are the results for \(p = .25\) with \(N = 500\):
deff <- unlist(
  lapply(
    seq(from = 0, to = .9, by = .1),
    mean_sim,
    p = .25,
    n = 500
  )
)
cbind(rho = seq(from = 0, to = .9, by = .1), deff = deff)
      rho     deff
[1,] 0.0 1.233912
[2,] 0.1 1.237913
[3,] 0.2 1.249339
[4,] 0.3 1.266866
[5,] 0.4 1.295482
[6,] 0.5 1.334918
[7,] 0.6 1.388497
[8,] 0.7 1.458235
[9,] 0.8 1.558223
[10,] 0.9 1.700188
The Power Analysis
Now we are ready for the power analysis. Below, I estimate the required sample to detect proportion differences of .05, .1, and .2 from baselines of .1, .25, and .5 (the outcomes above). To apply the impact of weighting, we then multiply the total sample (twice the per-group n returned by power.prop.test) by 1.1 and 1.3 to find the required samples (assuming a 50/50 split of cases).
for (baseline in c(.1, .25, .5)) {
  for (d in c(.05, .1, .2)) {
    # Per-group n required to detect a difference of d from the baseline
    n <- power.prop.test(
      p1 = baseline,
      p2 = baseline + d,
      power = .8,
      alternative = "two.sided")$n
    cat(
      paste0(
        "baseline rate = ", baseline,
        "; delta = ", d, "\n",
        # Total sample (both groups) inflated by each design effect
        "DEFF = 1.1 = ", ceiling(n*2*1.1), "\n",
        "DEFF = 1.3 = ", ceiling(n*2*1.3), "\n"
      )
    )
  }
}
baseline rate = 0.1; delta = 0.05
DEFF = 1.1 = 1509
DEFF = 1.3 = 1783
baseline rate = 0.1; delta = 0.1
DEFF = 1.1 = 438
DEFF = 1.3 = 518
baseline rate = 0.1; delta = 0.2
DEFF = 1.1 = 136
DEFF = 1.3 = 161
baseline rate = 0.25; delta = 0.05
DEFF = 1.1 = 2752
DEFF = 1.3 = 3252
baseline rate = 0.25; delta = 0.1
DEFF = 1.1 = 723
DEFF = 1.3 = 855
baseline rate = 0.25; delta = 0.2
DEFF = 1.1 = 194
DEFF = 1.3 = 230
baseline rate = 0.5; delta = 0.05
DEFF = 1.1 = 3443
DEFF = 1.3 = 4069
baseline rate = 0.5; delta = 0.1
DEFF = 1.1 = 853
DEFF = 1.3 = 1008
baseline rate = 0.5; delta = 0.2
DEFF = 1.1 = 205
DEFF = 1.3 = 242
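To make the arithmetic concrete, here is a hedged single-case version of the loop above, for a baseline of .25 and a delta of .1 (assuming power.prop.test defaults of sig.level = .05 and a two-sided test):
n <- power.prop.test(p1 = .25, p2 = .35, power = .8)$n  # per-group n, about 329
ceiling(n*2)        # about 658 total cases before weighting
ceiling(n*2*1.1)    # 723, matching the DEFF = 1.1 row above
ceiling(n*2*1.3)    # 855, matching the DEFF = 1.3 row above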
Acknowledgement
This work was supported by the Department of Education (grant number R305D200045).