Description

This file contains the initial analytic work for the SCCS project.

Nonresponse and Missingness

Our first objective is to identify the sample size and characteristics of responders to the follow-up surveys. In total there are 31,713 responders to the follow-up 3 (FU3) survey.

Flow Diagram of Follow-Up Participation

Flow Diagram Demonstrating Follow-Up Participation and Loss-To-Follow-Up

Nonresponse Modeling and Adjustment

In this section we explore the predictors of follow-up nonresponse and construct a set of nonresponse weights that we will later use to test the sensitivity of our results to nonresponse.

Pre-Specify Characteristics for Modeling Nonresponse

Our first step is to pre-specify a set of characteristics and administrative variables deemed a priori to be potentially important for response. We begin with a baseline response propensity specification with linear terms for each of these covariates. A more flexible specification with second-order and interaction terms is also possible, with terms selected by an iterative algorithm and checked via cross-validation of the prediction errors to mitigate the risk of overfitting.

# Baseline variables for predicting response
  z <- c("education","sex","maritalstatus","hhsize","employed",
         "enrollment_source","raceanalysis","insurancecoverage")

# Variables we want to enter as a spline
  z.spline <- c("enrollment_age")

# Follow-Up variables for predicting response (e.g., lagged value of the outcomes).
  x_2 <- c("insurancecoveragef1")
  x_3 <- c("insurancecoveragef1")

Fit the response propensity model

We next fit logistic regression models predicting survey response.
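As an illustration of this step, the sketch below fits a response propensity model with `glm()` on simulated data. It is not the project's actual model code: the data frame `dat`, the indicator `responded`, the reduced covariate list, and the choice of a natural cubic spline with 3 degrees of freedom are all assumptions made for the example.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

set.seed(1)
n <- 500

# Simulated stand-in data (hypothetical); only a few of the covariates
# named in the text are mimicked here.
dat <- data.frame(
  education      = factor(sample(c("lt_hs", "hs", "college"), n, replace = TRUE)),
  sex            = factor(sample(c("F", "M"), n, replace = TRUE)),
  hhsize         = sample(1:6, n, replace = TRUE),
  enrollment_age = runif(n, 40, 79)
)

# Hypothetical response indicator: 1 = responded to the follow-up survey
dat$responded <- rbinom(n, 1, plogis(-0.5 + 0.02 * dat$enrollment_age))

z        <- c("education", "sex", "hhsize")
z.spline <- c("enrollment_age")

# Linear terms for z; spline terms for z.spline (df = 3 is an arbitrary choice)
rhs <- c(z, sprintf("ns(%s, df = 3)", z.spline))
f   <- reformulate(rhs, response = "responded")

fit <- glm(f, data = dat, family = binomial())

# Estimated response propensity and its linearized (log odds) version
dat$e.hat <- predict(fit, type = "response")
dat$l.hat <- predict(fit, type = "link")
```

Building the formula from the character vectors keeps the covariate list in one place, so a more flexible specification only requires adding terms to `rhs`.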

Construct response propensity strata

After estimating the response propensity model we next assess the adequacy of the predicted response score. To do this we exploit a feature of the true response propensity score: the independence of the response indicator and the baseline characteristics given the score. Ideally, strata would be constructed pairing individuals with the same value of the score; however, this isn’t feasible given the wide range of values the score could take on.

Therefore, we proceed by stratifying the data using the following iterative procedure.

  • We begin with one stratum (J=1) and test whether the current number of strata is sufficient, using a t-statistic that compares the mean of the log odds ratio of the estimated response propensity score between responders and non-responders. That is, we assess the adequacy of each block using the estimated linearized response propensity score (log odds ratio), defined as \[ \hat l(x) = \mathrm{ln}\bigg ( \frac{\hat e(x)}{1-\hat e(x)} \bigg ) \] We focus on the log odds ratio because it is more likely to have an (approximately) normal distribution (see figure below).

  • If there is evidence that the mean values of the log odds ratio of the estimated score for responders and non-responders are unequal in the given stratum (for example, if the t-statistic is above a pre-determined cutoff value of 1) then the stratum is split in half according to the median value of the estimated score within the stratum. This split is subject not only to the pre-determined t-value cutoff, but also to pre-specified minimum sample characteristics.
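The two steps above can be sketched in R as follows. This is a minimal illustration, not the code used to produce the results. The function names `t_stat` and `make_strata` are hypothetical, the t-statistic is assumed to be a pooled-variance two-sample statistic, and the minimum-size arguments `n.min.group` and `n.min.block` are placeholder values, since the text does not state the actual minimum sample requirements. Only the cutoff of 1 is taken from the text.

```r
# Pooled-variance two-sample t-statistic for the linearized score in one block
t_stat <- function(l, r) {
  n1 <- sum(r == 1)
  n0 <- sum(r == 0)
  if (n1 < 2 || n0 < 2) return(0)  # too few units to test: treat as balanced
  s2 <- (sum((l[r == 1] - mean(l[r == 1]))^2) +
         sum((l[r == 0] - mean(l[r == 0]))^2)) / (n0 + n1 - 2)
  if (s2 == 0) return(0)
  (mean(l[r == 1]) - mean(l[r == 0])) / sqrt(s2 * (1 / n0 + 1 / n1))
}

# Split blocks at their median score until every block passes the t-test
# or can no longer be split without violating the (assumed) size limits.
make_strata <- function(l, r, t.max = 1, n.min.group = 3, n.min.block = 10) {
  block <- rep(1L, length(l))
  big_enough <- function(i) {
    length(i) >= n.min.block &&
      sum(r[i] == 1) >= n.min.group &&
      sum(r[i] == 0) >= n.min.group
  }
  repeat {
    changed <- FALSE
    for (j in unique(block)) {
      idx <- which(block == j)
      if (abs(t_stat(l[idx], r[idx])) <= t.max) next
      cut   <- median(l[idx])
      lower <- idx[l[idx] <  cut]
      upper <- idx[l[idx] >= cut]
      if (big_enough(lower) && big_enough(upper)) {
        block[upper] <- max(block) + 1L
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  # Relabel so that stratum 1 holds the lowest scores
  as.integer(factor(block, levels = unique(block[order(l)])))
}

# Usage on simulated data (hypothetical)
set.seed(2)
n     <- 2000
x     <- rnorm(n)
r     <- rbinom(n, 1, plogis(0.3 + 0.8 * x))
l.hat <- predict(glm(r ~ x, family = binomial()), type = "link")

strata <- make_strata(l.hat, r)
table(strata, r)
```

Because each split is made at a score value, the resulting strata are contiguous intervals of the estimated score, which is what makes the final relabeling by score order well defined.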

In the notation for this test, let \(N_0(j)\) and \(N_1(j)\) be the subsample sizes of non-responders and responders in stratum \(j\). Similarly, let \(\bar l_0(j)\) and \(\bar l_1(j)\) be the average values of the linearized response propensity among non-responders and responders, respectively, within the stratum. Finally, let \(S^2(j)\) be the sample variance of the linearized response propensity within block \(j\).
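For reference, a t-statistic consistent with this notation can be written in the standard two-sample form. This is a sketch that assumes the variance for block \(j\), written here as \(S^2(j)\), is pooled across responders and non-responders; the text does not state this explicitly.

\[ t(j) = \frac{\bar l_1(j) - \bar l_0(j)}{\sqrt{S^2(j) \cdot \bigg ( \frac{1}{N_0(j)} + \frac{1}{N_1(j)} \bigg )}} \]

Under the splitting rule described above, stratum \(j\) is a candidate for splitting when \(|t(j)|\) exceeds the pre-determined cutoff.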

The plot below shows the density distribution of response propensity scores among responders and non-responders within each of the J strata.

Assessing global balance across strata

We next check overall balance of the covariates across responders and non-responders. We have \(K\) covariates in the response propensity model, and to assess balance across strata we will analyze each of these covariates \(X_{ik}, k = 1, ..., K\) as an “outcome.” Specifically, in stratum \(j\) the difference between responding and non-responding units is estimated by

\[ \hat \tau_k^X(j) = \bar X_{1k}(j) - \bar X_{0k}(j) \] with sampling variance estimated as \[ \hat V_k^X(j) = s_k^2(j) \cdot \bigg ( \frac{1}{N_1(j)}+\frac{1}{N_0(j)} \bigg) \] where \[ s_k^2(j) = \frac{1}{N_0(j)+N_1(j)-2}\bigg ( \sum_{i:B_i(j)=1} (1-R_i) \cdot (X_{ik}-\bar X_{0k}(j))^2 + \sum_{i:B_i(j)=1} R_i \cdot (X_{ik}-\bar X_{1k}(j))^2\bigg) \] Here \(R_i\) is the response indicator (1 for responders, 0 for non-responders) and \(B_i(j)=1\) if unit \(i\) belongs to stratum \(j\).

We then take a weighted average of these estimates across blocks

\[ \hat \tau_k^X = \sum_{j=1}^J \bigg ( \frac{N_0(j)+N_1(j)}{N} \bigg ) \hat \tau_k^X(j) \] with variance \[ \hat V_k^X = \sum_{j=1}^J \bigg (\frac{N_0(j)+N_1(j)}{N} \bigg )^2 \cdot \hat V_k^X(j) \] and convert into a z-value for a two-sided test of whether \(\hat \tau_k^X\) is equal to zero: \[ z_k = \frac{\hat \tau_k^X}{\sqrt{\hat V_k^X}} \]
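The within-stratum estimates and their weighted combination can be computed directly from these formulas. The sketch below does so for a single covariate. The function name `balance_z` and the toy data are hypothetical and serve only to show the arithmetic.

```r
# z-value for one covariate x, given response indicator r and stratum labels
balance_z <- function(x, r, block) {
  N <- length(x)
  per_block <- sapply(split(seq_len(N), block), function(i) {
    x1 <- x[i][r[i] == 1]          # responders in this stratum
    x0 <- x[i][r[i] == 0]          # non-responders in this stratum
    n1 <- length(x1)
    n0 <- length(x0)
    # Pooled within-stratum variance s_k^2(j)
    s2 <- (sum((x0 - mean(x0))^2) + sum((x1 - mean(x1))^2)) / (n0 + n1 - 2)
    c(tau = mean(x1) - mean(x0),   # difference in means
      V   = s2 * (1 / n1 + 1 / n0),  # sampling variance
      w   = (n0 + n1) / N)         # stratum share of the sample
  })
  tau <- sum(per_block["w", ] * per_block["tau", ])
  V   <- sum(per_block["w", ]^2 * per_block["V", ])
  c(tau = tau, V = V, z = tau / sqrt(V))
}

# Toy data: two strata of four units each
x     <- c(1, 2, 3, 4, 2, 4, 6, 8)
r     <- c(1, 1, 0, 0, 1, 0, 1, 0)
block <- c(1, 1, 1, 1, 2, 2, 2, 2)

out <- balance_z(x, r, block)
out
```

Applying `balance_z` to each of the \(K\) covariates in turn gives the vector of z-values used in the overall balance assessment.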

To assess overall balance we can obtain the associated p-values for these z-scores and compare them to the uniform distribution. We then ask, what fraction of the empirical p-values are less than a given percentile of the uniform(0,1) distribution? In the plot below, the plotted fraction of the empirical p-values below a given percentile value on the uniform(0,1) is above the 45-degree line and the mean value is >0.50, indicating good overall balance across responders and non-responders on all the included predictors.
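The comparison with the uniform distribution can be tabulated as in the sketch below. The z-values are made up for illustration and are not results from this analysis.

```r
# Hypothetical z-values, one per covariate
z <- c(-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 1.5, 0.2)

# Two-sided p-values
p <- 2 * pnorm(-abs(z))

# Fraction of empirical p-values at or below each percentile of uniform(0,1)
grid       <- seq(0.05, 1, by = 0.05)
frac.below <- ecdf(p)(grid)

data.frame(percentile = grid, fraction = frac.below)
mean(p)
```

If the p-values were exactly uniform, the `fraction` column would track the `percentile` column, which is the 45-degree line in the plot.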

Balance Check for FU3 Weight

Construct the non-response weights

Having settled on a response propensity model with a reasonable degree of balance across responders and non-responders, we construct a non-response weight for each responding observation within the strata. These observations get a weight equal to the inverse of the average observed response rate within each stratum.
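The weight construction can be illustrated on simulated data. This is a sketch that uses base R rather than the dplyr pipeline used elsewhere in this file. The vectors `strata` and `responded` and the response rates are hypothetical.

```r
set.seed(3)
n <- 1000

# Hypothetical stratum labels and response indicator; response is made
# more likely in higher strata.
strata    <- sample(1:4, n, replace = TRUE)
responded <- as.numeric(rbinom(n, 1, c(0.4, 0.55, 0.7, 0.85)[strata]))

# Observed response rate within each stratum, repeated for every unit
rate <- ave(responded, strata, FUN = mean)

# Responders get the inverse of their stratum's response rate;
# non-responders get zero weight.
w <- ifelse(responded == 1, 1 / rate, 0)

# Within each stratum the responders' weights sum to the stratum size,
# so the weights sum to the full sample size.
sum(w)
```

The same identity is what the validation check on the actual weights relies on: summing the weights over responders recovers the size of the target sample.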

# Validate the weights sum to the target population size (i.e., FU1 sample size)
sum(xx_3.f$w.FU3)
## [1] 50752
round(sum(xx_3.f$w.FU3), 0) == nrow(xx_3)
## [1] TRUE
# Keep only the ID and the FU3 nonresponse weight
nr.weight <- xx_3.f %>% ungroup() %>% select(idnumber, w.FU3)

Define Samples

We will define several primary samples for the analysis:

We also need to define some administrative tracking variables:

Age Distribution of Sample at Follow-Up 3

Density Distribution of Age at Follow-Up 3 (Each Shaded Region is 25% of the Data)

Nonelderly Sample: Density Distribution of Age at Follow-Up 3 (Each Shaded Region is 25% of the Data)

Health Insurance Outcomes

Insurance Coverage by Follow-Up Round, Nonelderly Adults (as of FU3)

expansion     Insured.Baseline  Insured.FU1  Insured.FU2  Insured.FU3  Diff.FU1.FU3
Expansion     0.5550            0.7043       0.7810       0.9557       0.2474
Nonexpansion  0.5742            0.6705       0.7389       0.8412       0.1684

Insurance Coverage by Follow-Up Round, Nonelderly Adults (as of FU3) - WEIGHTED FOR NONRESPONSE

expansion     Insured.Baseline  Insured.FU1  Insured.FU2  Insured.FU3  Diff.FU1.FU3
Expansion     0.5826            0.7284       0.8025       0.9597       0.2312
Nonexpansion  0.5906            0.6888       0.7459       0.8465       0.1577

Self-Reported Health Status Outcomes

Self-Reported Health by Follow-Up Round, Nonelderly Adults (as of FU3) - WEIGHTED FOR NONRESPONSE

group       FU1_Expansion  FU3_Expansion  diff_Expansion  FU1_Nonexpansion  FU3_Nonexpansion  diff_Nonexpansion  diff_in_diff
Excellent   0.0690         0.0634         -0.0055         0.0656            0.0605            -0.0052            -0.0003
Very Good   0.2158         0.2186          0.0028         0.2002            0.1980            -0.0022             0.0049
Good        0.3543         0.3539         -0.0004         0.3611            0.3656             0.0045            -0.0050
Fair        0.2675         0.2754          0.0079         0.2956            0.2969             0.0013             0.0067
Poor        0.0934         0.0884         -0.0050         0.0773            0.0787             0.0014            -0.0064
Refuse      0.0000         0.0000          0.0000         0.0002            0.0001            -0.0001             0.0001
Don’t Know  0.0690         0.0003         -0.0687         0.0656            0.0002            -0.0654            -0.0033

Self-Reported Health by Follow-Up Round, Nonelderly Adults (as of FU3)

group       FU1_Expansion  FU3_Expansion  diff_Expansion  FU1_Nonexpansion  FU3_Nonexpansion  diff_Nonexpansion  diff_in_diff
Excellent   0.0587         0.0530         -0.0057         0.0583            0.0519            -0.0064             0.0007
Very Good   0.1928         0.1983          0.0055         0.1828            0.1784            -0.0044             0.0099
Good        0.3521         0.3511         -0.0010         0.3568            0.3648             0.0080            -0.0090
Fair        0.2896         0.2971          0.0075         0.3168            0.3177             0.0010             0.0065
Poor        0.1067         0.1001         -0.0066         0.0852            0.0868             0.0016            -0.0082
Refuse      0.0000         0.0000          0.0000         0.0002            0.0001            -0.0001             0.0001
Don’t Know  0.0587         0.0003         -0.0584         0.0583            0.0003            -0.0580            -0.0004
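The `diff_in_diff` column appears to be the change from FU1 to FU3 in the expansion group minus the same change in the nonexpansion group. That reading is inferred from the column names and values rather than stated in the text. The sketch below recomputes it from the rounded unweighted proportions for the five substantive response categories. Because the inputs are rounded to four decimals, a recomputed value can differ from the reported one in the last digit.

```r
# Rounded unweighted proportions copied from the table above
tab <- data.frame(
  group            = c("Excellent", "Very Good", "Good", "Fair", "Poor"),
  FU1_Expansion    = c(0.0587, 0.1928, 0.3521, 0.2896, 0.1067),
  FU3_Expansion    = c(0.0530, 0.1983, 0.3511, 0.2971, 0.1001),
  FU1_Nonexpansion = c(0.0583, 0.1828, 0.3568, 0.3168, 0.0852),
  FU3_Nonexpansion = c(0.0519, 0.1784, 0.3648, 0.3177, 0.0868)
)

# Change from FU1 to FU3 within each group
tab$diff_Expansion    <- tab$FU3_Expansion    - tab$FU1_Expansion
tab$diff_Nonexpansion <- tab$FU3_Nonexpansion - tab$FU1_Nonexpansion

# Difference-in-differences: expansion change minus nonexpansion change
tab$diff_in_diff <- tab$diff_Expansion - tab$diff_Nonexpansion

tab[, c("group", "diff_Expansion", "diff_Nonexpansion", "diff_in_diff")]
```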