Use of propensity scores illustrated

References

AHRQ Summary Variables in Observational Research: Propensity Scores and Disease Risk Scores: http://effectivehealthcare.ahrq.gov/index.cfm/search-for-guides-reviews-and-reports/?productid=1084&pageaction=displayproduct (quotes are from this article).

Code to create illustrations

The codes shown here are to produce illustrations, not to actually conduct PS analyses. The gray broken line indicates the total.

## Create "PS"
ps.seq <- seq(from = 0, to = 1, by = 0.01)

## Create a data frame for plotting
ps.dat <- data.frame(ps = ps.seq)

## Create hypothetical distributions of PS for the exposed and unexposed
ps.dat$exposed   <- dnorm(x = ps.seq, mean = 0.6, sd = 0.25) * 100
ps.dat$unexposed <- dnorm(x = ps.seq, mean = 0.3, sd = 0.20) * 200
ps.dat$total     <- with(ps.dat, exposed + unexposed)

## Transform for ggplot2
library(reshape2)
ps.dat.melt <- melt(data          = ps.dat,
                    id.vars       = c("ps"),
                    measure.vars  = c("exposed","unexposed","total"),
                    variable.name = "group",
                    value.name    = "number"
                    )

## Plot distributions of PS for the for exposed and unexposed
library(ggplot2)
ps1 <- ggplot(data = subset(ps.dat.melt, group != "total"),
              mapping = aes(x = ps, y = number, group = group, color = group, lty = group)) +
    layer(geom = "line", stat = "identity", size = 1) +
    theme_bw() +
    theme(legend.key = element_blank()) +
    geom_line(data = subset(ps.dat.melt, group == "total"),
              aes(x = ps, y = number), inherit.aes = F, lty = 2, lwd = 0.1) # Add total
ps1

plot of chunk unnamed-chunk-2

Restriction

Those who have extreme PS will be excluded (grayed out) because they do not have comparative counterparts.

“Advantages of restriction include greater transparency and increased validity of comparisons. Disadvantages of restriction involve the exclusion of some exposed people. If a beneficial or adverse effect occurs differently among the set of excluded people (those to whom the restriction was not made) then this form of effect measure modification will not be observed.” (Ref. AHRQ Summary Variables in Observational Research)

dat1 <- data.frame(xmin = 0, xmax = 0.125, ymin = 0, ymax = 500)
dat2 <- data.frame(xmin = 0.75, xmax = 1, ymin = 0, ymax = 500)
ps1 +
    geom_rect(data = dat1, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), alpha = 1/3, inherit.aes = F) +
    geom_rect(data = dat2, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), alpha = 1/3, inherit.aes = F)

plot of chunk unnamed-chunk-3

Stratification (by quintiles)

Comparison is conducted within each quintile.

“An advantage of stratification is transparency in that balance on covariates achieved through use of the propensity score can be shown explicitly when using stratification. A disadvantage of stratification is that in order to be transparent many tables may be required, making for a potentially unwieldy presentation. Additionally, residual confounding within strata may cause bias.16,17” (Ref. AHRQ Summary Variables in Observational Research)

## Get quintile threshold positions
quintile.pos <- colSums(outer(cumsum(ps.dat$total), sum(ps.dat$total)/5 * c(1:5), "<"))
ps.seq[quintile.pos]

[1] 0.20 0.32 0.44 0.60 0.99

## Stratification (Quintiles)
ps1 + labs(y = "") + geom_vline(x = c(0, 0.2, 0.32, 0.44, 0.6, 1))

plot of chunk unnamed-chunk-4

Matching

Analysis is conducted using those who have counterparts in the other exposure group.

“The advantages of matching include transparency in that once matched, the groups to be compared can be explicitly described so that any reviewer can see whether the characteristics of the compared groups might be subject to confounding, by presenting a comparison of characteristics similar to a "Table 1” from a clinical trial. The primary disadvantage of matching is the exclusion of unmatched subjects, which will be mainly exposed subjects that appear in the tails of the PS distribution.“ (Ref. AHRQ Summary Variables in Observational Research)

## Create a dataset for matching
ps.dat.melt.match <- ps.dat.melt

## Create numbers that remain after matching
ps.dat.melt.match[ps.dat.melt.match$group ==   "exposed", "number"] <- with(ps.dat, pmin(exposed, unexposed))
ps.dat.melt.match[ps.dat.melt.match$group == "unexposed", "number"] <- with(ps.dat, pmin(exposed, unexposed))

## Plot distributions of PS for the for exposed and unexposed
ps.match <- ggplot(data = subset(ps.dat.melt.match, group != "total"),
                   mapping = aes(x = ps, y = number, group = group, color = group, lty = group)) +
    layer(geom = "line", stat = "identity", size = 1) +
    theme_bw() +
    theme(legend.key = element_blank()) +
    geom_line(data = subset(ps.dat.melt.match, group == "total"),
              aes(x = ps, y = number), inherit.aes = F, lty = 2, lwd = 0.1) # Add total
ps.match

plot of chunk unnamed-chunk-5

Modeling

Including the propensity score as a variable in a regression model is equivalent of making comparison within fine strata. But the comparison may be made between real subjects and non-existent "extrapolated subjects” (positivity violation).

“it has the advantage of using all of the subjects with the treatment of interest and their comparators. A disadvantage of the approach is that transparency may be reduced by inclusion of the propensity score to a single term. Also, without examining and removing the areas of non-overlap in the propensity score distributions through restriction, some multivariable extrapolation may occur as mentioned above. (Multivariable regression encounters the more hidden problem of extrapolation where the comparisons might be made between observed data and extrapolated data for covariate patterns where there are not both exposed and unexposed people.)” (Ref. AHRQ Summary Variables in Observational Research)

ps1 + labs(y = "") + geom_vline(x = seq(0, 1, 0.05))

plot of chunk unnamed-chunk-6

Weighting

The unexposed or both of the groups are reweighted to create comparable groups.

“A primary advantage of weighting is that all subjects in an analysis can be used so that there may be less concern about excluded subjects. A disadvantage is the loss of transparency that accompanies the use of weighting. A tabular presentation of the characteristics of the compared groups is not possible in the same way as with matching or stratification. (Ref. AHRQ Summary Variables in Observational Research)

Inverse Probability of Treatment Weighting (IPTW)

It is the comparison between the scenario where everybody in the whole cohort was treated and the scenario where everybody in the whole cohort was NOT treated.

"the IPTW weighting approach can lead to extremely large weights being applied to a small number of individuals. ” (Ref. AHRQ Summary Variables in Observational Research)

## Create a dataset for IPTW
ps.dat.melt.iptw <- ps.dat.melt

## Create numbers that are created by IPTW
ps.dat.melt.iptw[ps.dat.melt.iptw$group ==   "exposed", "number"] <- ps.dat$total
ps.dat.melt.iptw[ps.dat.melt.iptw$group == "unexposed", "number"] <- ps.dat$total

## Plot distributions of PS for the for exposed and unexposed
ps.iptw <- ggplot(data = subset(ps.dat.melt.iptw, group != "total"),
                  mapping = aes(x = ps, y = number, group = group, color = group, lty = group)) +
    layer(geom = "line", stat = "identity", size = 1) +
    theme_bw() +
    theme(legend.key = element_blank()) +
    geom_line(data = subset(ps.dat.melt.iptw, group == "total"),
              aes(x = ps, y = number), inherit.aes = F, lty = 2, lwd = 0.1) # Add total
ps.iptw

plot of chunk unnamed-chunk-7

Standardized mortality ratio weighting (SMR weighting)

It is the comparison between the exposed group under treatment and the exposed group “untreated”.

## Create a dataset for SMR
ps.dat.melt.smr <- ps.dat.melt

## Create numbers that are created by SMR
ps.dat.melt.smr[ps.dat.melt.smr$group == "unexposed", "number"] <- ps.dat$exposed

## Plot distributions of PS for the for exposed and unexposed
ps.smr <- ggplot(data = subset(ps.dat.melt.smr, group != "total"),
                  mapping = aes(x = ps, y = number, group = group, color = group, lty = group)) +
    layer(geom = "line", stat = "identity", size = 1) +
    theme_bw() +
    theme(legend.key = element_blank()) +
    geom_line(data = subset(ps.dat.melt.smr, group == "total"),
              aes(x = ps, y = number), inherit.aes = F, lty = 2, lwd = 0.1) # Add total
ps.smr

plot of chunk unnamed-chunk-8

Table of Contents: http://rpubs.com/kaz_yos/toc
Table of Contents (Japanese): http://rpubs.com/kaz_yos/toc-j
Top Page: http://rpubs.com/kaz_yos/
SlideShare: http://www.slideshare.net/kaz_yos