This report was generated on 2017-10-12 13:31:22. The code is available at https://github.com/graveja0/sccs.
source("scripts/setup-packages-and-functions.R")
convert.from.sas <- FALSE
impute.data = FALSE
source("scripts/convert-raw-data-from-sas.R")
source("scripts/read-in-and-tidy-data.R")
Our first objective is to identify the sample size and characteristics of responders to the follow-up surveys. In total there are currently 31,712
responders to the follow-up 3 (FU3) survey.
In this section we will explore the predictors of follow-up nonresponse, as well as come up with a set of nonresponse weights we will later use to test the sensitivity of our results for nonresponse.
Our first step is to pre-specify a set of characteristics and administrative variables deemed a priori as potentially important for response. We can begin with a baseline response propensity specification with linear terms for each of these covariates, though we could also consider choosing a more flexible specification that considers second-order and interaction terms (possibly selected by an iterative algorithm and checked via cross-validation of the prediction errors to mitigate against the risk of overfitting).
# Baseline variables for predicting response
z <- c("education","sex","maritalstatus","hhsize","employed",
"enrollment_source","raceanalysis","insurancecoverage")
# Variables we want to enter as a spline
z.spline <- c("enrollment_age")
# Follow-Up variables for predicting response (e.g., lagged value of the outcomes).
x_2 <- c("insurancecoveragef1","sf12v2healthf1")
x_3 <- c("insurancecoveragef1")
We next fit logistic regression models predicting survey response.
After estimating the response propensity model we next assess the adequacy of the predicted response score. To do this we exploit a feature of the true response propensity score: the independence of the response indicator and the baseline characteristics given the score. Ideally, strata would be constructed pairing individuals with the same value of the score; however, this isn’t feasible given the wide range of values the score could take on.
Therefore, we proceed by stratifying the data using the following iterative procedure.
In the notation below, let \(N_0(j)\) and \(N_1(j)\) be the subsample sizes of non-responders and responders in substratum \(j\). Similarly, let \(\bar l_0(j)\) and \(\bar l_1(j)\) be the average value of the linearized response propensity within the stratum. Finally, let \(S^2\) be the sample variance of the linearized response propensity within block j.
The plot below shows the density distribution of response propensity scores among responders and non-responders within each of the J strata.
We next check overall balance of the covariates across responders and non-responders. We have \(K\) covariates in the response propensity model, and to assess balance across strata we will analyze each of these covariates \(X_{ik}, k = 1, ..., K\) as an “outcome.” Specifically, in stratum \(j\) the difference between responding and non-responding units is estimated by
\[ \hat \tau_k^X(j) = \bar X_{1,k}(j) - \bar X_{0,k}(j) \] with sampling variance estimated as \[ \hat V_k^X(j) = s_k^2(j) \cdot \bigg ( \frac{1}{N_1(j)}+\frac{1}{N_0(j)} \bigg) \] where \[ s_k^2(j) = \frac{1}{N_0(j)+N_1(j)-2}\bigg ( \sum_{i:B_i(j)=1}^N (1-R_i) \cdot (X_{ik}-\bar X_{0k}(j))^2 + \sum_{i:B_i(j)=1}^N R_i \cdot (X_{ik}-\bar X_{1k}(j))^2\bigg) \]
We then take a weighted average of these estimates across blocks
\[ \hat \tau_k^X = \sum_{j=1}^J \bigg ( \frac{N_0(j)+N_1(j)}{N} \bigg ) \hat \tau_k^X(j) \] with variance \[ \hat V_k^X = \sum_{j= 1}^J \bigg (\frac{N_0(j)+N_1(j)}{N}) \bigg )^2 \cdot \hat V_k \] and convert into a z-value for a two-sided test of whether \(\hat \tau_k^X\) is equal to zero: \[ z_k = \frac{\hat \tau_k^X}{\sqrt{\hat V_k^X}} \]
To assess overall balance we can obtain the associated p-values for these z-scores and compare them to the uniform distribution. We then ask, what fraction of the empirical p-values are less than a given percentile of the uniform(0,1) distribution? In the plot below, the plotted fraction of the empirical p-values below a given percentile value on the uniform(0,1) is above the 45-degree line and the mean value is >0.50, indicating good overall balance across responders and non-responders on all the included predictors.
Balance Check for FU3 Weight
Having settled on a response propensity model with a reasonable degree of balance across responders and non-responders, we construct a non-response weight for each responding observation within the strata. These observations get a weight equal to the inverse of the average observed response rate within each stratum.
# Validate the weights sum to the target population size (i.e., FU1 sample size)
sum(xx_3.f$w.FU3)
[1] 50751
round(sum(xx_3.f$w.FU3),0)== dim(xx_3)[1]
[1] TRUE
nr.weight <- xx_3.f %>% ungroup %>% select(idnumber,w.FU3)
getwd()
[1] "/Users/gravesj/Dropbox/Projects/sccs/analysis/scripts"
save(nr.weight,file=file.path("../input",paste0("sccs-fu3-nonresponse-weight-",raw_file_date,".RData")))
In this section we will define some key samples to use later in the analysis:
The nonelderly sample are individuals aged 64 or younger as of the last vital status update.
The SCCS sample are individuals in the 12 SCCS states (e.g., there are 409 individuals who had a primary residence outside the 12 SCCS states when they were recruited into the sample; for now we’ll just exclude those from our anlayses.)
The Expansion State sample includes individuals from Kentucky, West Virginia, Arkansas and Louisiana. Note that a Restricted Expansion State sample is also defined excluding Louisiana respondants, as Louisiana expanded Medicaid while we were in the field with FU3.
The Medicare sample are individuals aged 65 or older as the last vital status update, or who self-reported Medicare coverage in any follow-up survey.
source("./scripts/define-samples.R")
# Merge in the Nonresponse weight
load(file.path("./input",paste0("sccs-fu3-nonresponse-weight-",raw_file_date,".RData")))
xx <-
xx %>% left_join(nr.weight, "idnumber") %>% dplyr::mutate(w.FU3 = ifelse(is.na(w.FU3), 0, w.FU3))
Our first analytic objective is to assess the impact of expansion on health status and mortality outcomes. We begin by defining our outcome variables based on administrative data on vital status through December 31, 2015, as well as health status variables based on the SF-12 questions asked in the FU3 survey.
One challenge is missingness in the SF-12 questions, and more generally in certain FU3 survey items. As can be seen in the plot below (which shows missingness patterns in a random sample of 200 FU3 respondents), “b” items in the SF-12 couplets are overwhelimingly more likely to be left as missing; this is evident by the vertical “bands” of grey for the I2B, I3B and I4B SF-12 questions (i.e., the second item in the couplets).
source("./scripts/composite-sf12-score.R")
ggplot_missing <- function(x){
x %>%
is.na %>%
reshape2::melt() %>%
ggplot(data = .,
aes(x = Var2,
y = Var1)) +
geom_raster(aes(fill = value)) +
scale_fill_grey(name = "",
labels = c("Present","Missing")) +
theme_minimal() +
theme(axis.text.x = element_text(angle=45, vjust=0.5)) +
labs(x = "Variables in Dataset",
y = "Rows / observations")
}
set.seed(1234)
sf12f1 %>% select(starts_with("i")) %>% select(-idnumber) %>% filter(row_number() %in% sample(seq(nrow(sf12f1)),200)) %>% ggplot_missing()
Missingness Patterns in a Random Sample of 200 SCCS FU3 Respondants
This missingness results in the following initial CONSORT diagrams for the across-state (i.e., nonelderly) and within-state samples.
For now we will utilize the MICE package to impute for item nonresponse a single time. At some point we need to revisit whether we stick with a single imputation or pursue a multiple imputation strategy..
## Imputation of Missing Values
xx.preimp <- xx
source("./scripts/impute-missing-values.R")
xx <- ss.composite.health.imputed
After imputation we obtain the following sample CONSORTs.
# Adding in non-imputed variables
not.in.xx <- c("idnumber",names(xx.preimp)[-(which(names(xx.preimp) %in%
intersect(names(xx.preimp),names(xx))))])
xx <- xx %>% left_join(xx.preimp[,not.in.xx],"idnumber")
source("./scripts/composite-sf12-score.R")
xx <-
xx %>%
left_join(sf12f1[,c("idnumber","sf12_physicalf1","sf12_mentalf1")], "idnumber") %>%
left_join(sf12f3[,c("idnumber","sf12_physicalf3","sf12_mentalf3")], "idnumber")
xx <-
xx %>% mutate(
insurance01f1 = as.integer(insurancecoveragef1 == 1),
insurance01f2 = as.integer(insurancecoveragef2 == 1),
insurance01f3 = as.integer(insurancecoveragef3 == 1),
ex_vgf1 = as.integer(sf12v2healthf1 %in% c(1, 2)),
ex_vgf3 = as.integer(sf12v2healthf3 %in% c(1, 2)),
goodf1 = as.integer(sf12v2healthf1 == 3),
goodf3 = as.integer(sf12v2healthf3 == 3),
fair_poorf1 = as.integer(sf12v2healthf1 %in% c(4, 5)),
fair_poorf3 = as.integer(sf12v2healthf3 %in% c(4, 5)),
mortalityf1 = 0,
mortalityf3 = as.integer(vitalstatus=="D")
)
For this analysis we will rely on established composite measures of physical and mental health from the SF-12. The scoring instructions have been converted (and validated using the test data) from the SAS file for the SF-12 v2.0 composite score found here.
The spike histograms below provide detail on the distribution of the composite physical and mental health measures for FU1 by state Medicaid expansion status.
xx %>% filter(ss.composite.health.acrosstate==1) %>%
mutate(healthf1 =sf12_physicalf1,
healthf3 = sf12_physicalf3,
mentalf1 = sf12_mentalf1,
mentalf3 = sf12_mentalf3) %>%
filter(!is.na(healthf3) & !(is.na(healthf1))) %>%
select(ss.expansion,healthf3,healthf1) %>%
mutate(health.cs = healthf3-healthf1) %>%
with(histboxp(x=healthf1,group=ss.expansion,sd=TRUE,bins=200,xlab="Composite Physical Health Status (FU1)")) -> p
putHfig(p, "Spike histograms, means, quantiles, Gini's mean difference, and SD for SF-12 Composite Physical Health Status stratified by expansion status",
scap="Stratified spike histograms and quantiles for SF-12 Physical Health Status")
Spike histograms, means, quantiles, Gini’s mean difference, and SD for SF-12 Composite Physical Health Status stratified by expansion status
xx %>% filter(ss.composite.health.acrosstate==1) %>%
mutate(healthf1 =sf12_physicalf1,
healthf3 = sf12_physicalf3,
mentalf1 = sf12_mentalf1,
mentalf3 = sf12_mentalf3) %>%
filter(!is.na(mentalf1) & !(is.na(mentalf3))) %>%
select(ss.expansion,mentalf3,mentalf1) %>%
mutate(mental.cs = mentalf3-mentalf1) %>%
with(histboxp(x=mentalf1,group=ss.expansion,sd=TRUE,bins=200,xlab="Composite Mental Health Status (FU1)")) -> p
putHfig(p, "Spike histograms, means, quantiles, Gini's mean difference, and SD for SF-12 Composite Mental Health Status stratified by expansion status",
scap="Stratified spike histograms and quantiles for SF-12 Mental Health Status")
Spike histograms, means, quantiles, Gini’s mean difference, and SD for SF-12 Composite Mental Health Status stratified by expansion status
xx %>% filter(ss.composite.health.acrosstate==1) %>%
mutate(healthf1 =sf12_physicalf1,
healthf3 = sf12_physicalf3,
mentalf1 = sf12_mentalf1,
mentalf3 = sf12_mentalf3) %>%
filter(!is.na(healthf3) & !(is.na(healthf1))) %>%
select(ss.expansion,healthf3,healthf1) %>%
mutate(health.cs = healthf3-healthf1) %>%
with(histboxp(x=health.cs,group=ss.expansion,sd=TRUE,bins=200,xlab="Change Score: Composite Physical Health Status (FU1 to FU3)")) -> p
putHfig(p, "Spike histograms, means, quantiles, Gini's mean difference, and SD for SF-12 Composite Physical Health Status stratified by expansion status",
scap="Stratified spike histograms and quantiles for SF-12 Physical Health Status")
<hr>
### <span style="font-family:Verdana;font-size:10px;">∟ </span><span style="font-family:Verdana;font-size:12px;color:MidnightBlue;">Stratified spike histograms and quantiles for SF-12 Physical Health Status</span>
<span style="font-family:Verdana;font-size:12px;color:MidnightBlue;">Spike histograms, means, quantiles, Gini's mean difference, and SD for SF-12 Composite Physical Health Status stratified by expansion status</span>
The following plots compare the distribution of the physical and mental health composite scores to analogous scores from a representative sample of Southern Region MEPS repsondants who are exact matched based on age and gender to the SCCS sample.
sccs.health.status <-xx %>% filter(ss.composite.health.acrosstate==1) %>%
select(sf12_mentalf1,sf12_physicalf1,age=vitalstatus_age,sex) %>%
mutate(sample="SCCS",tx=1,
male = as.integer(sex=="M")) %>% select(-sex)
load("../analysis/input/ignore/meps/sf12f1_meps15.RData")
meps.health.status <- sf12f1 %>% filter(region15==3) %>%
select(sf12_mentalf1,sf12_physicalf1,sex,age=age15x) %>%
mutate(sample = "MEPS",tx=0,
male=as.integer(sex==1)) %>% select(-sex) # Has the same name in the MEPS dataset so need to
require(MatchIt)
rbind(sccs.health.status,meps.health.status) %>% na.omit() -> meps.sccs.compare
m.out <- matchit(tx ~ male + age , data = meps.sccs.compare,
method = "exact")
meps.sccs.compare <- cbind.data.frame(weight = m.out$weights,meps.sccs.compare)
meps.sccs.compare %>% group_by(tx) %>% summarise(sum_weight=sum(weight)) %>% pull(sum_weight) -> totals
ratio <- totals[2] / totals[1]
meps.sccs.compare %>% mutate(weight = ((1-tx)*ratio+1*tx)*weight) -> meps.sccs.compare
meps.sccs.compare %>% ggplot(aes(x=sf12_physicalf1))+geom_density(aes(fill=sample,x=sf12_physicalf1,weights=weight),alpha=0.4)+theme_few()+
labs(fill="Sample",x="Physical Health Composite",y="Density")
Physical Health Composite: Density Comparison of SCCS Sample with Age- and Gender-Matched Southern Region MEPS Sample
meps.sccs.compare %>% ggplot(aes(x=sf12_mentalf1))+geom_density(aes(fill=sample,x=sf12_mentalf1,weights=weight),alpha=0.4)+theme_few()+
labs(fill="Sample",x="Mental Health Composite",y="Density")
Physical Health Composite: Density Comparison of SCCS Sample with Age- and Gender-Matched Southern Region MEPS Sample
xx.hlth <-
xx %>% filter(ss.composite.health.acrosstate==1 & jan1_2010_to_vs_months>0 ) %>%
group_by(ss.expansion) %>%
select(idnumber,
ss.expansion,
yrf1=questionnaireyrf1,
yrf2=questionnaireyrf2,
yrf3=questionnaireyrf3,
healthf1 = sf12v2healthf1,
healthf3= sf12v2healthf3,
mortality = death2015,
w.FU3,
state,
vitalstatus,r_1,r_3,
vitalstatus_months,
jan1_2010_to_vs_months) %>%
group_by(ss.expansion) %>%
mutate(expansion = ifelse(ss.expansion==1,"Expansion","Nonexpansion")) %>%
mutate(healthf3 = ifelse(r_3==0 & vitalstatus!="D",7,ifelse(r_3==0 & vitalstatus=="D",6,healthf3))) %>%
mutate(healthf1 = as.numeric(factor(healthf1)),
healthf3 = as.numeric(factor(healthf3))) %>%
filter(healthf3<=6)
f1.dist <- xx.hlth %>% filter(healthf1 %in% paste0(1:5)) %>%
mutate(healthf3=ifelse(is.na(healthf3) | healthf3==8888 | healthf3==9999,6,healthf3)) %>%
group_by(ss.expansion,healthf1) %>% mutate(weight=1) %>%
summarise(value=sum(weight)) %>% group_by(ss.expansion) %>%
mutate(n=sum(value)) %>% mutate(pct = sprintf("%1.1f%%",100*value/n))
f1.dist.nonexpand <- f1.dist %>% filter(ss.expansion==0) %>% pull(pct)
f1.dist.expand <- f1.dist %>% filter(ss.expansion==1) %>% pull(pct)
f3.dist <- xx.hlth %>% filter(healthf1 %in% paste0(1:5)) %>%
mutate(healthf3=ifelse(is.na(healthf3) | healthf3==8888 | healthf3==9999,6,healthf3)) %>%
group_by(ss.expansion,healthf3) %>% mutate(weight=1) %>%
summarise(value=sum(weight)) %>% group_by(ss.expansion) %>%
mutate(n=sum(value)) %>% mutate(pct = sprintf("%1.1f%%",100*value/n))
f3.dist.nonexpand <- f3.dist %>% filter(ss.expansion==0) %>% pull(pct)
f3.dist.expand <- f3.dist %>% filter(ss.expansion==1) %>% pull(pct)
xx.hlth %>% filter(healthf1<=5) %>%
group_by(ss.expansion,healthf1,healthf3) %>% mutate(weight=1) %>%
summarise(value=sum(weight)) %>%
group_by(ss.expansion,healthf1) %>%
gather(variable,value,-ss.expansion,-healthf1,-healthf3) %>%
group_by(ss.expansion,healthf1) %>%
mutate(n = sum(value)) %>% group_by(ss.expansion) %>%
mutate(percentage = value/n,percentage_label = sprintf("%1.1f%%", 100*percentage)) %>%
ungroup() -> temp
Health Status Transitions: Expansion States
Health Status Transitions: Nonexpansion States
Expansion Minus Nonexpansion Difference in Transition Matrix
The plot below provides the distribution of DD estimates for the 221 possible combinations of 3 Medicaid expansion states out of 12 total states. The observed DD estimate is denoted by the solid vertical line. The text annotations provide the DD estimate (same as in figure above) and the associated p-value.
library(survival)
xx.survival <-
xx.hlth %>% mutate(status = as.integer(vitalstatus=="D"))
xx.km <- survfit(coxph(Surv(jan1_2010_to_vs_months,status)~strata(ss.expansion),data=xx.survival))
survminer::ggsurvplot(xx.km, data = xx.survival, risk.table = FALSE,conf.int = TRUE,ggtheme = theme_minimal()) + ylim(0.75,1) + geom_vline(aes(xintercept=48))+xlab("Time Since January 1, 2010")+ylab("Survival Probability")
Kaplan-Meier Plot (January 1, 2010 to Death Month) Stratified by Expansion Status
xx.hlth <-
xx %>% filter(ss.composite.health.withinstate.nonelderly==1 & jan1_2010_to_vs_months>0 ) %>%
select(idnumber,
ss.expansion,
yrf1=questionnaireyrf1,
yrf2=questionnaireyrf2,
yrf3=questionnaireyrf3,
healthf1 = sf12v2healthf1,
healthf3= sf12v2healthf3,
mortality = death2015,
w.FU3,
state,
vitalstatus,r_1,r_3,
vitalstatus_months,
jan1_2010_to_vs_months,
medicaref1,
medicaref2,
medicaref3) %>%
group_by(ss.expansion) %>%
mutate(expansion = ifelse(ss.expansion==1,"Expansion","Nonexpansion")) %>%
mutate(healthf3 = ifelse(r_3==0 & vitalstatus!="D",7,ifelse(r_3==0 & vitalstatus=="D",6,healthf3))) %>%
mutate(healthf1 = as.numeric(factor(healthf1)),
healthf3 = as.numeric(factor(healthf3))) %>%
filter(healthf3<=6) %>%
mutate(medicaref1 = as.integer(medicaref1==1),
medicaref2 = as.integer(medicaref2==1),
medicaref3 = as.integer(medicaref3==1))
f1.dist <- xx.hlth %>% filter(healthf1 %in% paste0(1:5)) %>%
mutate(healthf3=ifelse(is.na(healthf3) | healthf3==8888 | healthf3==9999,6,healthf3)) %>%
group_by(medicaref1,healthf1) %>% mutate(weight=1) %>%
summarise(value=sum(weight)) %>% group_by(medicaref1) %>%
mutate(n=sum(value)) %>% mutate(pct = sprintf("%1.1f%%",100*value/n))
f1.dist.cx <- f1.dist %>% filter(medicaref1==1) %>% pull(pct)
f1.dist.tx <- f1.dist %>% filter(medicaref1==0) %>% pull(pct)
f3.dist <- xx.hlth %>% filter(healthf1 %in% paste0(1:5)) %>%
mutate(healthf3=ifelse(is.na(healthf3) | healthf3==8888 | healthf3==9999,6,healthf3)) %>%
group_by(medicaref1,healthf3) %>% mutate(weight=1) %>%
summarise(value=sum(weight)) %>% group_by(medicaref1) %>%
mutate(n=sum(value)) %>% mutate(pct = sprintf("%1.1f%%",100*value/n))
f3.dist.cx <- f3.dist %>% filter(medicaref1==1) %>% pull(pct)
f3.dist.tx <- f3.dist %>% filter(medicaref1==0) %>% pull(pct)
xx.hlth %>% filter(healthf1<=5) %>%
group_by(medicaref1,healthf1,healthf3) %>% mutate(weight=1) %>%
summarise(value=sum(weight)) %>%
group_by(medicaref1,healthf1) %>%
gather(variable,value,-medicaref1,-healthf1,-healthf3) %>%
group_by(medicaref1,healthf1) %>%
mutate(n = sum(value)) %>% group_by(medicaref1) %>%
mutate(percentage = value/n,percentage_label = sprintf("%1.1f%%", 100*percentage)) %>%
ungroup() -> temp
Health Status Transitions: Intervention Group
Health Status Transitions: Medicare Control Group
Intervention Minus Medicare Difference in Transition Matrix
library(survival)
xx.survival <-
xx.hlth %>% mutate(status = as.integer(vitalstatus=="D"))
xx.km <- survfit(coxph(Surv(jan1_2010_to_vs_months,status)~strata(medicaref1),data=xx.survival))
survminer::ggsurvplot(xx.km, data = xx.survival, risk.table = FALSE,conf.int = TRUE,ggtheme = theme_minimal()) + ylim(0.75,1) + geom_vline(aes(xintercept=48))+xlab("Time Since January 1, 2010")+ylab("Survival Probability")
Kaplan-Meier Plot (January 1, 2010 to Death Month) Stratified by Expansion Status