Summary: This analysis evaluates the impact of the 2014 Medicaid expansion on health insurance coverage and cost-related access barriers using difference-in-differences methodology applied to BRFSS survey data (2011-2017, N=294,018). Results indicate that Medicaid expansion significantly reduced the odds of being uninsured by approximately 13% (OR=0.87, 95% CI: 0.80-0.94, p=0.001) in expansion states compared to non-expansion states. However, no significant effect was detected on cost-related barriers to seeing a doctor (OR=1.07, 95% CI: 0.99-1.15, p=0.091). Key limitations include substantial missing data in 2012 (69,384 observations) constraining assessment of the parallel trends assumption, potential selection bias from non-random state adoption and temporal confounding from concurrent Affordable Care Act provisions. Despite these constraints, findings support Medicaid expansion as an effective policy tool for increasing insurance coverage, though translating coverage into improved healthcare affordability may require longer time horizons or complementary policy interventions.
Introduction
The Affordable Care Act (ACA) of 2010 represented one of the most significant expansions of health insurance coverage in the United States since the creation of Medicare and Medicaid in 1965. A central component of the ACA was the Medicaid expansion provision enabling states to extend Medicaid eligibility to most adults with incomes up to 138% of the Federal Poverty Level. Prior to this expansion, Medicaid eligibility was largely restricted to pregnant women, children, people with disabilities and low-income parents, leaving millions of low-income childless adults without access to affordable health coverage.
The expansion was designed to take effect nationwide in January 2014. However, the 2012 Supreme Court ruling in NFIB v. Sebelius gave states the option to adopt or decline the expansion. By 2014, approximately 25 states plus the District of Columbia had implemented the expansion while others opted out. This state-level variation in implementation provides a valuable natural experiment to assess the causal impact of Medicaid expansion on health access and financial outcomes.
Using national survey data, this analysis employs a difference-in-differences (DiD) methodology to evaluate the impact of Medicaid expansion on two critical outcomes: health insurance coverage rates and cost-related barriers to medical care. By comparing changes in expansion states versus non-expansion states before (2011-2013) and after (2014-2017) implementation, we aim to separate the effect of the policy from broader temporal trends affecting all states.
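In its simplest two-group, two-period form, the comparison can be written as follows. In our notation, $\bar{Y}$ is a group's mean outcome in a period, $E$ denotes expansion states and $N$ denotes non-expansion states.

$$
\hat{\delta}_{\mathrm{DiD}} = \left(\bar{Y}_{E,\mathrm{post}} - \bar{Y}_{E,\mathrm{pre}}\right) - \left(\bar{Y}_{N,\mathrm{post}} - \bar{Y}_{N,\mathrm{pre}}\right)
$$

Subtracting the second difference removes changes that are common to all states. The result isolates the policy effect only if both groups would have followed parallel trends in the absence of expansion.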
Understanding the impact of Medicaid expansion remains highly relevant for ongoing policy debates, as several states have continued to expand Medicaid in the years following 2014. This analysis contributes to the evidence base by providing survey-weighted estimates of expansion effects that adjust for individual demographic characteristics.
Data
The data comes from the Behavioral Risk Factor Surveillance System (BRFSS), a system of health-related telephone surveys that collect data from residents in all 50 U.S. states, the District of Columbia and three U.S. territories about their health-related risk behaviors, chronic health conditions and use of preventive services. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. The raw data for this study consists of 406,238 observations and 38 variables for years 2011-2017.
### Load Required Libraries ###
library(haven) # v.2.5.5
library(dplyr) # v.1.1.4
library(sjmisc) # v.2.8.11
library(survey) # v4.4-8
library(tidyr) # v.1.3.1
library(srvyr) # v.1.3.0
library(broom) # v.1.0.9
library(purrr) # v.1.1.0
library(knitr) # v.1.50
library(summarytools) # v.1.1.4
library(ggplot2) # v.4.0.0
library(plotly) # v.4.11.0
library(sjPlot) # v.2.9.0
### Import BRFSS data and display attributes ###
# Read raw data from the Stata .dta file
raw.data = read_dta("BRFSS_11-17.dta")
# Examine raw data
glimpse(raw.data)
## Rows: 406,238
## Columns: 38
## $ `_state` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ imonth <chr> "01", "02", "01", "02", "10", "12", "12", "02", "01", "…
## $ iyear <chr> "2011", "2011", "2011", "2011", "2011", "2011", "2011",…
## $ `_psu` <dbl> 2011000036, 2011000793, 2011001338, 2011001658, 2011005…
## $ numadult <dbl> 4, 2, 4, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1…
## $ hlthpln1 <dbl> 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2…
## $ medcost <dbl> 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 1…
## $ age <dbl> 32, 54, 26, 42, 19, 52, 49, 69, 40, 43, 64, 18, 36, 55,…
## $ marital <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, …
## $ children <dbl> 88, 1, 2, 88, 88, 88, 88, 88, 3, 1, 88, 1, 3, 88, 88, 2…
## $ educa <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ employ1 <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
## $ income2 <dbl> 5, 1, 3, 1, 3, 1, 1, 2, 1, 2, 3, 4, 2, 3, 2, 4, 2, 3, 3…
## $ sex <dbl+lbl> 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, …
## $ pregnant <dbl+lbl> NA, NA, NA, 2, NA, NA, NA, NA, 2, 2, NA, NA, 2,…
## $ cellfon2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_ststr` <dbl> 1011, 1071, 1111, 1152, 1081, 1111, 1151, 1071, 1081, 1…
## $ `_llcpwt` <dbl> 2484.45883, 159.41218, 2954.50902, 793.76042, 1506.0212…
## $ `_hcvu651` <dbl> 1, 1, 2, 2, 2, 2, 2, 9, 1, 2, 1, 1, 1, 1, 2, 2, 2, 9, 2…
## $ `_race` <dbl> 1, 2, 2, 2, 6, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1…
## $ hhadult <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ whrtst10 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ medicare <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ nocov121 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ lstcovrg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ medscost <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ carercvd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ drvisits <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ numberchildren <dbl> 0, 1, 2, 0, 0, 0, 0, 0, 3, 1, 0, 1, 3, 0, 0, 2, 0, 0, 2…
## $ numberadults <dbl> 4, 2, 4, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1…
## $ household <dbl> 4, 3, 6, 2, 2, 1, 2, 1, 4, 3, 2, 3, 4, 2, 1, 4, 1, 2, 3…
## $ percentfpl <dbl> 134.22820, 26.98327, 58.35278, 33.99048, 118.96669, 45.…
## $ fplbracket <dbl> 3, 1, 2, 1, 3, 2, 1, 3, 1, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2…
## $ race <dbl> 1, 2, 2, 2, 7, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1…
## $ racebroad <dbl+lbl> 1, 2, 2, 2, 4, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, …
## $ post <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ expansion <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ string <dbl> NA, 7777, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2009,…
To prepare the data for analysis, we perform the following transformations:
- Subset the data to keep the following fourteen variables:
  - age: Age in years
  - sex: Sex
  - racebroad: Race
  - educa: Education level
  - income2: Income level
  - employ1: Employment status
  - marital: Marital status
  - iyear: Year of response
  - expansion: Whether respondent resides in a Medicaid expansion/non-expansion state
  - hlthpln1: Whether respondent has any health coverage
  - medcost: Whether respondent could not see a doctor due to cost
  - _psu: Primary sampling unit
  - _ststr: Sample design stratification
  - _llcpwt: Survey weight for land-line and cell-phone data
- Convert all variables into factors, except age and the three sampling design variables,
- Recode hlthpln1 and medcost as dichotomous variables, discarding responses other than Yes/No, and
- Rename the stratification and weight variables (_ststr, _llcpwt) for ease of use.
The glimpse() output below shows the attributes of the transformed data.
### Subset, convert and recode data ###
# Subset variables
dat = raw.data |>
# 1. subset the data to keep only the following fourteen variables
dplyr::select(age, # NOT FOUND IN LLCP 2017 CODEBOOK
sex, # Respondents Sex
racebroad, # NOT FOUND IN LLCP 2017 CODEBOOK
educa, # Education Level, _EDUCAG in codebook
income2, # Income Level
employ1, # Employment Status
marital, # Marital Status
iyear, # Interview Year
expansion, # (This indicates residence in Medicaid
#expansion vs. non-expansion state)
# NOT FOUND IN LLCP 2017 CODEBOOK
hlthpln1, # Have any health care coverage
medcost, # Could Not See Doctor Because of Cost
`_psu`, # Primary Sampling Unit
`_ststr`, # Sample Design Stratification Variable
`_llcpwt` # Respondent weight for Land-line and cell-phone data
) |>
# 2. Convert variables into factors
mutate(sex = as_factor(sex, levels = 'both'),
racebroad = as_factor(racebroad, levels = 'both'),
educa = as_factor(educa, levels = "both"),
# haven::as_factor() throws up an error with the `income2` variable
# so it's recoded manually below.
income2 = factor(income2, levels=c(1,2,3,4,5,6,7),
labels = c('[1,2] Less than $15,000',
'[1,2] Less than $15,000',
'[3,4] $15,000 to less than $25,000',
'[3,4] $15,000 to less than $25,000',
'[5] $25,000 to less than $35,000',
'[6] $35,000 to less than $50,000',
'[7] $50,000 or more')),
employ1 = as_factor(employ1, levels = "both"),
marital = as_factor(marital, levels = "both"),
iyear = factor(iyear),
expansion = as_factor(expansion, levels = 'both')) |>
# 3. Recode variables ``hlthpln1`` and ``medcost`` as dichotomous variables
filter(hlthpln1 == 1 | hlthpln1 == 2, # Discard responses other than YES/NO
medcost == 1 | medcost == 2) |> # for `hlthpln1` and `medcost` variables
mutate(hlthpln1 = factor(hlthpln1, labels = c('[1] Yes',
'[2] No')),
medcost = factor(medcost, labels = c('[1] Yes',
'[2] No'))) |>
# 4. Rename sample design variables for ease of use
rename(Design.Stratification = `_ststr`,
Respondent.Weight = `_llcpwt` ) |>
glimpse()
## Rows: 363,985
## Columns: 14
## $ age <dbl> 32, 54, 26, 42, 19, 52, 49, 69, 40, 43, 64, 18, …
## $ sex <fct> [1] Male, [2] Female, [1] Male, [2] Female, [1] …
## $ racebroad <fct> [1] Non-Hispanic White, [2] Non-Hispanic Black, …
## $ educa <fct> [0] Less than HS, [0] Less than HS, [0] Less tha…
## $ income2 <fct> "[5] $25,000 to less than $35,000", "[1,2] Less …
## $ employ1 <fct> [0] Unemployed, [0] Unemployed, [0] Unemployed, …
## $ marital <fct> [0] Unmarried or Unpartnered, [0] Unmarried or U…
## $ iyear <fct> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, …
## $ expansion <fct> [0] Non-Expansion, [0] Non-Expansion, [0] Non-Ex…
## $ hlthpln1 <fct> [1] Yes, [1] Yes, [2] No, [2] No, [2] No, [2] No…
## $ medcost <fct> [1] Yes, [2] No, [2] No, [1] Yes, [1] Yes, [1] Y…
## $ `_psu` <dbl> 2011000036, 2011000793, 2011001338, 2011001658, …
## $ Design.Stratification <dbl> 1011, 1071, 1111, 1152, 1081, 1111, 1151, 1071, …
## $ Respondent.Weight <dbl> 2484.45883, 159.41218, 2954.50902, 793.76042, 15…
Diagnostics of the stratification and weight variables,
Design.Stratification and Respondent.Weight,
reveal no missing values in the former. The latter, however, has a substantial
number of missing values for 2012 (69,384) and 2013 (583).
### Diagnose missing values in sampling design variables ###
dat |> group_by(iyear) |>
summarize(Design.Stratification.NA = sum(is.na(Design.Stratification)),
Respondent.Weight.NA = sum(is.na(Respondent.Weight))) |>
kable()
| iyear | Design.Stratification.NA | Respondent.Weight.NA |
|---|---|---|
| 2011 | 0 | 0 |
| 2012 | 0 | 69384 |
| 2013 | 0 | 583 |
| 2014 | 0 | 0 |
| 2015 | 0 | 0 |
| 2016 | 0 | 0 |
| 2017 | 0 | 0 |
This is a preliminary, exploratory study, and imputing the missing
values of Respondent.Weight would be risky and time-consuming. We therefore
delete the records with missing weights instead, leaving a dataset of 294,018
observations. This removes nearly all 2012 records: only 444 of them remain.
### Delete missing values in Respondent.Weight variable ###
design.dat = dat |>
drop_na(Respondent.Weight) |>
glimpse()
## Rows: 294,018
## Columns: 14
## $ age <dbl> 32, 54, 26, 42, 19, 52, 49, 69, 40, 43, 64, 18, …
## $ sex <fct> [1] Male, [2] Female, [1] Male, [2] Female, [1] …
## $ racebroad <fct> [1] Non-Hispanic White, [2] Non-Hispanic Black, …
## $ educa <fct> [0] Less than HS, [0] Less than HS, [0] Less tha…
## $ income2 <fct> "[5] $25,000 to less than $35,000", "[1,2] Less …
## $ employ1 <fct> [0] Unemployed, [0] Unemployed, [0] Unemployed, …
## $ marital <fct> [0] Unmarried or Unpartnered, [0] Unmarried or U…
## $ iyear <fct> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, …
## $ expansion <fct> [0] Non-Expansion, [0] Non-Expansion, [0] Non-Ex…
## $ hlthpln1 <fct> [1] Yes, [1] Yes, [2] No, [2] No, [2] No, [2] No…
## $ medcost <fct> [1] Yes, [2] No, [2] No, [1] Yes, [1] Yes, [1] Y…
## $ `_psu` <dbl> 2011000036, 2011000793, 2011001338, 2011001658, …
## $ Design.Stratification <dbl> 1011, 1071, 1111, 1152, 1081, 1111, 1151, 1071, …
## $ Respondent.Weight <dbl> 2484.45883, 159.41218, 2954.50902, 793.76042, 15…
The summary below describes the final dataset after all transformations.
A handful of variables still have small percentages of missing values.
Rather than deleting those records, we exclude them only where needed:
through na.rm = TRUE in the descriptive statistics and na.action = na.omit
in the regression models. The models are therefore fit on 286,696 complete cases.
### Print data summary table ###
print(dfSummary(design.dat,
graph.col=TRUE,
graph.magnif = 0.75,
plain.ascii = FALSE,
headings = FALSE,
labels.col = FALSE,
display.labels = FALSE,
silent = TRUE,
valid.col = TRUE,
na.col = TRUE),
method = "render")
| No | Variable | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|
| 1 | age [numeric] | 84 distinct values | 294017 (100.0%) | 1 (0.0%) |
| 2 | sex [factor] | | 294017 (100.0%) | 1 (0.0%) |
| 3 | racebroad [factor] | | 289526 (98.5%) | 4492 (1.5%) |
| 4 | educa [factor] | | 293329 (99.8%) | 689 (0.2%) |
| 5 | income2 [factor] | | 294018 (100.0%) | 0 (0.0%) |
| 6 | employ1 [factor] | | 292578 (99.5%) | 1440 (0.5%) |
| 7 | marital [factor] | | 292981 (99.6%) | 1037 (0.4%) |
| 8 | iyear [factor] | | 294018 (100.0%) | 0 (0.0%) |
| 9 | expansion [factor] | | 294018 (100.0%) | 0 (0.0%) |
| 10 | hlthpln1 [factor] | | 294018 (100.0%) | 0 (0.0%) |
| 11 | medcost [factor] | | 294018 (100.0%) | 0 (0.0%) |
| 12 | _psu [numeric] | 74485 distinct values | 294018 (100.0%) | 0 (0.0%) |
| 13 | Design.Stratification [numeric] | 2902 distinct values | 294018 (100.0%) | 0 (0.0%) |
| 14 | Respondent.Weight [numeric] | 267027 distinct values | 294018 (100.0%) | 0 (0.0%) |
Generated by summarytools 1.1.4 (R version 4.5.1)
2025-12-18
Methodology
To calculate estimates and make inferences about the population, we specify a survey design that accounts for the stratification and weighting under which the BRFSS sample was collected. The following table shows the yearly number and percentage of observations across 2011-2017.
### Calculate sample total and percent total for each year ###
year.samples = design.dat |>
group_by(iyear) |>
count(name='Samples') |>
ungroup() |>
mutate(Percentage = round(Samples / sum(Samples) * 100, 1))
kable(year.samples)
| iyear | Samples | Percentage |
|---|---|---|
| 2011 | 78125 | 26.6 |
| 2012 | 444 | 0.2 |
| 2013 | 66935 | 22.8 |
| 2014 | 55180 | 18.8 |
| 2015 | 44933 | 15.3 |
| 2016 | 47145 | 16.0 |
| 2017 | 1256 | 0.4 |
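Because the dataset pools several years, the respondent weights need to
be rescaled. Each year's weights sum to a population total for that year,
so we scale them by that year's share of the total number of observations
across years. The pooled weights then describe an average year rather
than the sum of seven. To do so, we create a new variable,
Respondent.Weight_ADJ: the original respondent weight multiplied by the
percentage of observations collected in that year (divided by 100). Only
444 observations remain for 2012 and 1,256 for 2017, so this scaling gives
those two years almost no weight in pooled estimates.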
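The following is a minimal sketch of this pooled-year adjustment on hypothetical toy data, not BRFSS. For illustration it assumes that each year's raw weights sum to the same population total. It also uses the exact share n_year / N rather than the rounded Percentage column used in this analysis.

```r
library(dplyr)

# Hypothetical toy data (not BRFSS): three years with unequal sample sizes,
# where each year's weights sum to the same population total of 1,000,000
toy <- data.frame(
  iyear = factor(rep(c("2011", "2012", "2013"), times = c(500, 100, 400))),
  w     = rep(c(1e6 / 500, 1e6 / 100, 1e6 / 400), times = c(500, 100, 400))
)

pooled <- toy |>
  add_count(iyear, name = "n_year") |>  # respondents interviewed in each year
  mutate(share = n_year / n(),          # exact (unrounded) share of the pooled sample
         w_adj = w * share)             # year-adjusted pooled weight

# Raw weights from three pooled years sum to three population totals (3e6) ...
sum(pooled$w)
# ... while the adjusted weights sum back to a single year's total (about 1e6)
sum(pooled$w_adj)
```

When every year's weights sum to the same total, the adjusted weights sum to that single-year total. When yearly totals differ, they sum to a sample-size-weighted average of those totals.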
### Create a new variable `Respondent.Weight_ADJ` from the ###
### `Respondent.Weight` variable, scaled by year            ###
# Attach yearly sample number/percentage to working data
design.dat = left_join(design.dat, year.samples,
join_by(iyear)) # Join key, the year variable: iyear
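# Create Respondent.Weight_ADJ
# (Percentage was rounded to one decimal place above, so the scaling is
#  approximate; Samples / sum(Samples) would give each year's exact share)
design.dat = design.dat |>
mutate(Respondent.Weight_ADJ = Respondent.Weight * Percentage / 100)
Finally, we create the survey design object that will allow us to calculate statistics correctly weighted according to the sampling design.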
### Create survey design object ###
# Define survey object
BRFSS.srv_design = svydesign(id = ~1,
strata = ~Design.Stratification,
weights = ~Respondent.Weight_ADJ,
data = design.dat)
# Convert object to a srvyr survey design (tbl_svy)
(BRFSS.srv_design = as_survey_design(BRFSS.srv_design))
## Stratified Independent Sampling design (with replacement)
## Called via srvyr
## Data variables:
## - age (dbl), sex (fct), racebroad (fct), educa (fct), income2 (fct), employ1
## (fct), marital (fct), iyear (fct), expansion (fct), hlthpln1 (fct), medcost
## (fct), _psu (dbl), Design.Stratification (dbl), Respondent.Weight (dbl),
## Samples (int), Percentage (dbl), Respondent.Weight_ADJ (dbl)
# Handle strata containing a single PSU ("lonely" PSUs) by centering them at the grand mean
options(survey.lonely.psu = "adjust")
class(BRFSS.srv_design)
## [1] "tbl_svy" "survey.design2" "survey.design"
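The data preparation steps and the creation of the design object above follow the document The Behavioral Risk Factor Surveillance System; Complex Sampling Weights and Preparing 2017 BRFSS Module Data for Analysis, July 2018. Note that the design specifies strata and weights but no primary sampling units (id = ~1). The _psu variable is therefore carried in the data but not used in variance estimation.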
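The design above passes id = ~1, so respondents are treated as independent draws within strata and _psu is ignored. If _psu is intended to identify primary sampling units, a variant would pass it as the cluster id. That variant also needs nest = TRUE in case PSU codes repeat across strata. Whether this matters for these files depends on how _psu is coded, which is an assumption to check against the BRFSS documentation. The sketch below compares the two specifications on simulated toy data: the point estimates coincide and only the standard errors differ.

```r
library(survey)

set.seed(7)
# Hypothetical toy data: 10 strata x 8 PSUs x 5 respondents.
# PSU codes restart at 1 in every stratum, so they are unique only within strata.
toy <- expand.grid(resp = 1:5, psu = 1:8, stratum = 1:10)
toy$wt <- runif(nrow(toy), 50, 150)
psu_effect <- rnorm(80)  # shared shock for all respondents in the same PSU
toy$y <- rbinom(nrow(toy), 1, plogis(psu_effect[(toy$stratum - 1) * 8 + toy$psu]))

# Strata and weights only: respondents treated as independent within strata
des_noclus <- svydesign(ids = ~1, strata = ~stratum, weights = ~wt, data = toy)

# Strata, PSU clusters and weights; nest = TRUE because PSU codes repeat across strata
des_clus <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                      nest = TRUE, data = toy)

est_noclus <- svymean(~y, des_noclus)
est_clus   <- svymean(~y, des_clus)

data.frame(design = c("strata + weights", "strata + PSUs + weights"),
           mean   = c(coef(est_noclus), coef(est_clus)),
           se     = c(SE(est_noclus), SE(est_clus)))
```

With an outcome that is correlated within PSUs, ignoring the clusters typically understates the standard errors. That is the main practical reason to check how _psu should enter the design.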
Results
We calculate weighted proportions (for factors) or means (for age) for each variable, excluding the sampling and survey weight variables. The tables display these values for expansion and non-expansion states.
Weighted mean for numerical variable age
### Calculate Weighted Mean for numerical variables: AGE ###
BRFSS.srv_design |>
group_by(expansion) |>
summarize(Age_mean = survey_mean(age, na.rm=TRUE)) |>
kable(digits=1,
col.names = c("Group", "Mean Age", "Mean Age SE"))
| Group | Mean Age | Mean Age SE |
|---|---|---|
| [0] Non-Expansion | 46.7 | 0.1 |
| [1] Expansion | 45.8 | 0.1 |
Weighted frequency for categorical/factor variables
### Create a table that displays these proportions/frequencies ###
### for expansion and non-expansion states. ###
# Subset categorical/factor variables
factor_vars = names(Filter(is.factor, BRFSS.srv_design$variables))
# Set grouping variable
group_var = "expansion"
# Ensure clean group levels (drop NA groups if needed)
group_levels = BRFSS.srv_design$variables[[group_var]] |>
droplevels() |>
levels()
# Function: compute proportions for each factor level
get_grouped_prop = function(var) {
map_dfr(group_levels, function(g) {
# Subset design to the current group
des_g = subset(BRFSS.srv_design, get(group_var) == g)
# Proportions for the factor levels within this group
est = svymean(~ get(var), design = des_g, na.rm = TRUE)
tibble(variable = var,
group = g,
level = names(est),
proportion = as.numeric(coef(est)),
se = as.numeric(SE(est)))})
}
# Apply to all factor variables and bind results
prop_grouped_df = map_dfr(factor_vars, get_grouped_prop)
prop_grouped_df |>
dplyr::select(variable, group, level, proportion, se) |>
mutate(level = gsub("get\\(var\\)", "", level)) |>
arrange(desc(variable)) |>
kable(col.names = c("Variable", "Group", "Level", "Frequency", "Freq. SE"),
digits=3)
| Variable | Group | Level | Frequency | Freq. SE |
|---|---|---|---|---|
| sex | [0] Non-Expansion | [1] Male | 0.436 | 0.003 |
| sex | [0] Non-Expansion | [2] Female | 0.564 | 0.003 |
| sex | [1] Expansion | [1] Male | 0.440 | 0.003 |
| sex | [1] Expansion | [2] Female | 0.560 | 0.003 |
| racebroad | [0] Non-Expansion | [1] Non-Hispanic White | 0.464 | 0.003 |
| racebroad | [0] Non-Expansion | [2] Non-Hispanic Black | 0.196 | 0.002 |
| racebroad | [0] Non-Expansion | [3] Hispanic | 0.293 | 0.003 |
| racebroad | [0] Non-Expansion | [4] Other | 0.047 | 0.001 |
| racebroad | [1] Expansion | [1] Non-Hispanic White | 0.446 | 0.002 |
| racebroad | [1] Expansion | [2] Non-Hispanic Black | 0.124 | 0.002 |
| racebroad | [1] Expansion | [3] Hispanic | 0.344 | 0.003 |
| racebroad | [1] Expansion | [4] Other | 0.086 | 0.002 |
| medcost | [0] Non-Expansion | [1] Yes | 0.332 | 0.003 |
| medcost | [0] Non-Expansion | [2] No | 0.668 | 0.003 |
| medcost | [1] Expansion | [1] Yes | 0.261 | 0.002 |
| medcost | [1] Expansion | [2] No | 0.739 | 0.002 |
| marital | [0] Non-Expansion | [0] Unmarried or Unpartnered | 0.363 | 0.003 |
| marital | [0] Non-Expansion | [1] Married or partnered | 0.637 | 0.003 |
| marital | [1] Expansion | [0] Unmarried or Unpartnered | 0.399 | 0.002 |
| marital | [1] Expansion | [1] Married or partnered | 0.601 | 0.002 |
| iyear | [0] Non-Expansion | 2011 | 0.328 | 0.002 |
| iyear | [0] Non-Expansion | 2012 | 0.000 | 0.000 |
| iyear | [0] Non-Expansion | 2013 | 0.236 | 0.003 |
| iyear | [0] Non-Expansion | 2014 | 0.183 | 0.002 |
| iyear | [0] Non-Expansion | 2015 | 0.131 | 0.002 |
| iyear | [0] Non-Expansion | 2016 | 0.123 | 0.002 |
| iyear | [0] Non-Expansion | 2017 | 0.000 | 0.000 |
| iyear | [1] Expansion | 2011 | 0.339 | 0.002 |
| iyear | [1] Expansion | 2012 | 0.000 | 0.000 |
| iyear | [1] Expansion | 2013 | 0.239 | 0.002 |
| iyear | [1] Expansion | 2014 | 0.178 | 0.002 |
| iyear | [1] Expansion | 2015 | 0.122 | 0.001 |
| iyear | [1] Expansion | 2016 | 0.121 | 0.001 |
| iyear | [1] Expansion | 2017 | 0.000 | 0.000 |
| income2 | [0] Non-Expansion | [1,2] Less than $15,000 | 0.389 | 0.003 |
| income2 | [0] Non-Expansion | [3,4] $15,000 to less than $25,000 | 0.454 | 0.003 |
| income2 | [0] Non-Expansion | [5] $25,000 to less than $35,000 | 0.132 | 0.002 |
| income2 | [0] Non-Expansion | [6] $35,000 to less than $50,000 | 0.024 | 0.001 |
| income2 | [0] Non-Expansion | [7] $50,000 or more | 0.001 | 0.000 |
| income2 | [1] Expansion | [1,2] Less than $15,000 | 0.420 | 0.003 |
| income2 | [1] Expansion | [3,4] $15,000 to less than $25,000 | 0.414 | 0.003 |
| income2 | [1] Expansion | [5] $25,000 to less than $35,000 | 0.134 | 0.002 |
| income2 | [1] Expansion | [6] $35,000 to less than $50,000 | 0.031 | 0.001 |
| income2 | [1] Expansion | [7] $50,000 or more | 0.001 | 0.000 |
| hlthpln1 | [0] Non-Expansion | [1] Yes | 0.624 | 0.003 |
| hlthpln1 | [0] Non-Expansion | [2] No | 0.376 | 0.003 |
| hlthpln1 | [1] Expansion | [1] Yes | 0.735 | 0.002 |
| hlthpln1 | [1] Expansion | [2] No | 0.265 | 0.002 |
| expansion | [0] Non-Expansion | [0] Non-Expansion | 1.000 | 0.000 |
| expansion | [0] Non-Expansion | [1] Expansion | 0.000 | 0.000 |
| expansion | [1] Expansion | [0] Non-Expansion | 0.000 | 0.000 |
| expansion | [1] Expansion | [1] Expansion | 1.000 | 0.000 |
| employ1 | [0] Non-Expansion | [0] Unemployed | 0.677 | 0.003 |
| employ1 | [0] Non-Expansion | [1] Employed | 0.323 | 0.003 |
| employ1 | [1] Expansion | [0] Unemployed | 0.672 | 0.003 |
| employ1 | [1] Expansion | [1] Employed | 0.328 | 0.003 |
| educa | [0] Non-Expansion | [0] Less than HS | 0.694 | 0.003 |
| educa | [0] Non-Expansion | [1] HS or greater | 0.306 | 0.003 |
| educa | [1] Expansion | [0] Less than HS | 0.687 | 0.002 |
| educa | [1] Expansion | [1] HS or greater | 0.313 | 0.002 |
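The helper get_grouped_prop() loops over groups and subsets the design by hand. srvyr can produce within-group weighted proportions more compactly: group by both the comparison variable and the factor, then call survey_prop(). It should give the same point estimates. Below is a minimal sketch on the survey package's apistrat example data, not BRFSS. With this document's design, one would group by expansion and a factor such as sex.

```r
library(dplyr)
library(srvyr)

# Example data shipped with the survey package (stands in for BRFSS here)
data(api, package = "survey")

# Stratified design: strata = school type, weights = pw
dstrata <- apistrat |>
  as_survey_design(strata = stype, weights = pw)

# Weighted share of each `awards` level within each school type, with SEs;
# survey_prop() computes proportions of the last grouping variable
props <- dstrata |>
  group_by(stype, awards) |>
  summarize(prop = survey_prop())

props
```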
Next, we create two figures:
- the raw percentage of people with any type of health insurance over time for the expansion and non-expansion states, and
- the weighted percentage of people with any type of health insurance over time for the expansion and non-expansion states.
The raw figure (Figure 1) shows the following for non-expansion states: the percentage of people with health insurance rose and fell sharply in the Pre period, then continued to decrease more gradually during the Post period. The weighted figure (Figure 2) shows the following for expansion states: the percentage of people with health insurance rose and fell sharply in the Pre period, leveled off in the Post period until 2016, and increased in 2017.
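One caveat applies when reading these two figures. As written, the Figure 1 pipeline removes expansion from the grouping (ungroup(expansion)) before dividing n by sum(n). The Figure 2 pipeline calls survey_prop() with expansion as the last grouping variable. Both therefore appear to plot, for each year, how insured respondents split between expansion and non-expansion states, so the two series sum to 100%. Neither shows the coverage rate within each group. The minimal sketch below computes the within-group rates instead. It runs on hypothetical simulated data standing in for design.dat: the column names mirror the document's, but the values do not.

```r
library(dplyr)
library(srvyr)

set.seed(1)
n <- 2000
# Hypothetical toy data standing in for design.dat (simulated, not BRFSS)
toy <- data.frame(
  iyear     = factor(sample(2011:2017, n, replace = TRUE)),
  expansion = factor(sample(c("[0] Non-Expansion", "[1] Expansion"), n, replace = TRUE)),
  hlthpln1  = factor(sample(c("[1] Yes", "[2] No"), n, replace = TRUE, prob = c(0.7, 0.3))),
  wt        = runif(n, 50, 150)
)

# Raw coverage rate: share insured within each year-by-group cell
raw_rates <- toy |>
  group_by(iyear, expansion) |>
  summarize(percentage = 100 * mean(hlthpln1 == "[1] Yes"), .groups = "drop")

# Weighted coverage rate with 95% CIs: survey mean of a 0/1 insured indicator
weighted_rates <- toy |>
  as_survey_design(weights = wt) |>
  group_by(iyear, expansion) |>
  summarize(insured = survey_mean(as.numeric(hlthpln1 == "[1] Yes"), vartype = "ci")) |>
  mutate(percentage = 100 * insured,
         CI95_low   = 100 * insured_low,
         CI95_upp   = 100 * insured_upp)
```

With the real data, these two summaries would replace the count() and survey_prop() steps. The existing ggplot() calls could then plot percentage against year as before.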
Figure 1: Raw Percentage of People With Health Insurance by Medicaid Expansion
### Figure 1: the raw percentage of people with any type of health insurance over time for the expansion and non-expansion states...###
ggplotly(
design.dat |>
filter(hlthpln1 == '[1] Yes') |> # Subset respondents with health coverage
mutate(year = as.integer(as.character(iyear))) |> # Change datatype for plotting
group_by(year, expansion) |>
count(hlthpln1) |>
ungroup(expansion) |>
mutate(percentage = round(n / sum(n) * 100, 1)) |> # Calculate percentage by year
ggplot(aes(x=year, y=percentage, color=expansion)) +
geom_point(size=3) + geom_line(linetype = 'dashed') +
theme_minimal() + scale_color_manual(values=c('red3','dodgerblue3')) +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
labs(title='Fig. 1: Raw Percentage of People With Health Insurance',
x='Year', y='People with health insurance (%)',
color='States')
)
Figure 2: Weighted Percentage of People With Health Insurance by Medicaid Expansion
ggplotly(
BRFSS.srv_design |>
filter(hlthpln1 == '[1] Yes') |> # Subset respondents with health coverage
group_by(iyear, expansion) |>
summarize(survey_prop(vartype = c("ci"))) |>
mutate(year = as.integer(as.character(iyear)),
percentage = round(coef * 100, 1),
CI95_low = round(`_low` * 100, 1),
CI95_upp = round(`_upp` * 100, 1)) |>
ggplot(aes(x=year, y=percentage, color=expansion)) +
geom_point(size=3) + geom_line(linetype = 'dashed') +
geom_pointrange(aes(ymin = CI95_low, ymax = CI95_upp)) +
theme_minimal() + scale_color_manual(values=c('red3','dodgerblue3')) +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
labs(title='Fig. 2: Weighted Percentage of People With Health Insurance (CI95%)', x='Year', y='People with health insurance (%)', color='States')
)
Finally, we run a survey-weighted difference-in-differences model
comparing health insurance coverage and affordability in expansion
vs. non-expansion states. We define 2011-2013 as
Pre-treatment years and 2014-2017 as
Post-treatment years, and control for demographic factors by
including them as covariates in a binary logistic regression model. The
model summary table reports the full results for outcomes hlthpln1 and
medcost. Figure 3 shows, for both outcomes, the Difference-in-Differences
(DiD) estimator: the exponentiated coefficient of the pre.post:expansion
interaction term.
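Written out in our own notation, the model described above takes the following form. Here $i$ indexes respondents, $s$ states and $t$ years, $\mathbf{X}_{ist}$ holds the demographic covariates, and $Y_{ist}=1$ marks the outcome's second factor level in R's coding.

$$
\operatorname{logit}\Pr(Y_{ist}=1) = \beta_0 + \beta_1\,\mathrm{Post}_t + \beta_2\,\mathrm{Expansion}_s + \beta_3\,(\mathrm{Post}_t \times \mathrm{Expansion}_s) + \boldsymbol{\gamma}^{\top}\mathbf{X}_{ist},
\qquad
\mathrm{OR}_{\mathrm{DiD}} = e^{\beta_3}
$$

Because the comparison is made on the log-odds scale, $e^{\beta_3}$ is a ratio of odds ratios. It divides the Post-vs-Pre change in odds in expansion states by the same change in non-expansion states, holding the covariates fixed. For example, an estimate of 0.87 corresponds to $\beta_3 = \ln(0.87) \approx -0.14$.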
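R's binomial-family models treat the first factor level ("[1] Yes") as failure. Each model therefore estimates the odds of the second level: being uninsured for hlthpln1, and not reporting a cost barrier for medcost. Model results show that Medicaid expansion significantly reduced the odds of being uninsured by approximately 13% (OR=0.87, 95% CI: 0.80-0.94, p=0.001) in expansion states compared to non-expansion states. However, no significant effect was detected on cost-related barriers to seeing a doctor (OR=1.07, 95% CI: 0.99-1.15, p=0.091).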
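The direction of each odds ratio depends on which outcome level the model treats as the event. The minimal check below uses simulated toy data and plain glm(). To our understanding, svyglm() also fits its coefficients through glm(), so the same coding rule should apply. A factor response with levels "[1] Yes" and "[2] No" gives the same coefficients as an explicit 0/1 indicator for "[2] No".

```r
# Hypothetical toy data: a factor outcome whose first level is "[1] Yes"
set.seed(3)
toy <- data.frame(x = rnorm(500))
toy$y <- factor(ifelse(runif(500) < plogis(-1 + 0.8 * toy$x), "[2] No", "[1] Yes"),
                levels = c("[1] Yes", "[2] No"))

# glm() treats the first factor level as failure and every other level as success
fit_factor <- glm(y ~ x, family = quasibinomial(), data = toy)

# The same model with an explicit 0/1 indicator for the second level
fit_indicator <- glm(as.numeric(y == "[2] No") ~ x, family = quasibinomial(), data = toy)

cbind(factor = coef(fit_factor), indicator = coef(fit_indicator))
```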
Model Summary Table
### Create pre/post indicator ###
BRFSS.srv_design$variables = BRFSS.srv_design$variables |>
mutate(pre.post = factor(ifelse(iyear %in% c("2014","2015","2016","2017"),
"Post", "Pre"),
levels = c("Pre", "Post")))
### Run Model 1: hlthpln1 ###
mod_hlthpln1 = svyglm(hlthpln1 ~
pre.post * expansion + # DiD Interaction Term
age + sex + racebroad + educa + # Demographic covariates
income2 + employ1 + marital, # Demographic covariates
design = BRFSS.srv_design,
family = quasibinomial(),
na.action=na.omit)
# Model 2: medcost
mod_medcost = svyglm(medcost ~
pre.post * expansion + # DiD Interaction Term
age + sex + racebroad + educa + # Demographic covariates
income2 + employ1 + marital, # Demographic covariates
design = BRFSS.srv_design,
family = quasibinomial(),
na.action=na.omit)
# Display model summary tables
tab_model(mod_hlthpln1, mod_medcost, auto.label = FALSE)
| Predictors | hlthpln1: Odds Ratios | hlthpln1: CI | hlthpln1: p | medcost: Odds Ratios | medcost: CI | medcost: p |
|---|---|---|---|---|---|---|
| (Intercept) | 2.63 | 2.38 – 2.91 | <0.001 | 1.04 | 0.95 – 1.14 | 0.363 |
| pre.postPost | 0.68 | 0.64 – 0.72 | <0.001 | 1.28 | 1.21 – 1.35 | <0.001 |
| expansion[1] Expansion | 0.55 | 0.52 – 0.58 | <0.001 | 1.39 | 1.32 – 1.47 | <0.001 |
| age | 0.97 | 0.97 – 0.97 | <0.001 | 1.02 | 1.02 – 1.02 | <0.001 |
| sex[2] Female | 0.77 | 0.74 – 0.80 | <0.001 | 0.86 | 0.83 – 0.90 | <0.001 |
| racebroad[2] Non-Hispanic Black | 1.07 | 1.01 – 1.13 | 0.032 | 0.95 | 0.90 – 1.00 | 0.056 |
| racebroad[3] Hispanic | 2.28 | 2.17 – 2.39 | <0.001 | 0.93 | 0.88 – 0.97 | 0.001 |
| racebroad[4] Other | 0.98 | 0.89 – 1.08 | 0.662 | 1.05 | 0.97 – 1.15 | 0.244 |
| educa[1] HS or greater | 0.80 | 0.76 – 0.83 | <0.001 | 0.93 | 0.90 – 0.97 | 0.001 |
| income2[3,4] $15,000 to less than $25,000 | 0.97 | 0.92 – 1.01 | 0.154 | 1.14 | 1.10 – 1.19 | <0.001 |
| income2[5] $25,000 to less than $35,000 | 0.66 | 0.61 – 0.71 | <0.001 | 1.59 | 1.48 – 1.70 | <0.001 |
| income2[6] $35,000 to less than $50,000 | 0.51 | 0.44 – 0.60 | <0.001 | 1.99 | 1.72 – 2.31 | <0.001 |
| income2[7] $50,000 or more | 0.84 | 0.33 – 2.13 | 0.711 | 2.80 | 1.13 – 6.89 | 0.025 |
| employ1[1] Employed | 1.01 | 0.96 – 1.05 | 0.800 | 1.01 | 0.97 – 1.06 | 0.562 |
| marital[1] Married or partnered | 1.43 | 1.36 – 1.50 | <0.001 | 0.72 | 0.69 – 0.75 | <0.001 |
| pre.postPost:expansion[1] Expansion | 0.87 | 0.80 – 0.94 | 0.001 | 1.07 | 0.99 – 1.15 | 0.091 |
| Observations | 286696 | | | 286696 | | |
| R2 / R2 adjusted | 0.114 / 0.104 | | | 0.028 / 0.018 | | |
Figure 3: Difference-in-Differences Estimator
### Plot Figure 3: The Difference-in-Differences Estimator ###
# Tidy model results
tidy_hlthpln1 = tidy(mod_hlthpln1)
tidy_medcost = tidy(mod_medcost)
# Combine results into a single table
results = bind_rows(tidy_hlthpln1 |> mutate(outcome = "hlthpln1"),
tidy_medcost |> mutate(outcome = "medcost")) |>
mutate(Odds.Ratio = exp(estimate),
CI_low.95pct = exp(estimate - 1.96*std.error),
CI_high.95pct = exp(estimate + 1.96*std.error)) |>
dplyr::select(outcome, term, Odds.Ratio, CI_low.95pct, CI_high.95pct, p.value)
ggplotly(
ggplot(results |> filter(term=="pre.postPost:expansion[1] Expansion"),
aes(y=outcome, x=Odds.Ratio, color=outcome)) +
geom_point(size=5) +
geom_pointrange(aes(xmin=CI_low.95pct, xmax=CI_high.95pct)) +
geom_vline(xintercept = 1, linetype = "dashed") +
labs(title = "Figure 3: Difference-in-Differences Estimates",
x = "Odds Ratio", y = "", color='Outcome') +
theme_minimal() + scale_color_manual(values=c('red3', 'dodgerblue3'))
)
Analysis
Difference-in-Differences Assumptions
The validity of our difference-in-differences (DiD) estimates relies on several key assumptions that warrant careful examination.
Parallel Trends Assumption
The most critical assumption is that, absent treatment, outcome trends
would have been parallel between expansion and non-expansion states.
This assumption can be assessed by examining pre-treatment trends.
Assessment Methods: Visual inspection of Figures 1 and 2 provides initial evidence. The pre-period (2011-2013) shows relatively similar patterns, though 2012 is notably volatile: 69,384 of its observations were dropped for missing weights, leaving only 444. Setting aside the anomalous 2012 data, the 2011 and 2013 observations suggest reasonably comparable pre-trends.
More rigorous approaches would include: (1) formal statistical tests using year-specific treatment indicators to test for differential pre-trends; (2) event study designs estimating year-specific treatment effects where pre-treatment coefficients should be statistically indistinguishable from zero; and (3) multiple pre-treatment period comparisons. However, with only two reliable pre-treatment observation years (2011, 2013), our ability to rigorously test this assumption is severely constrained.
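Below is a minimal sketch of option (2), an event-study specification. It runs on simulated toy data, not BRFSS, so all variable names and the data-generating process are hypothetical. Year enters as a factor with 2013, the last pre-treatment year, as the reference level. Each year-by-expansion interaction then gives a year-specific DiD odds ratio, and the pre-treatment ones (here 2011 and 2012) should be close to 1 if pre-trends are parallel. With the real data, the 2012 and 2017 terms would likely be very imprecise given how few observations those years retain.

```r
library(survey)

set.seed(42)
n <- 5000
# Hypothetical toy data (not BRFSS): survey year, expansion indicator, strata, weights
toy <- data.frame(
  year      = sample(2011:2017, n, replace = TRUE),
  expansion = rbinom(n, 1, 0.5),
  stratum   = sample(1:20, n, replace = TRUE),
  wt        = runif(n, 50, 150)
)
# Simulated outcome: no pre-trend difference, a post-2014 effect in expansion states
toy$uninsured <- rbinom(n, 1, plogis(-0.5 - 0.3 * toy$expansion * (toy$year >= 2014)))

# Year as a factor, with 2013 (the last pre-treatment year) as the reference level
toy$year_f <- relevel(factor(toy$year), ref = "2013")

des <- svydesign(ids = ~1, strata = ~stratum, weights = ~wt, data = toy)

# Event-study model: year effects, group effect, one year-by-expansion term per year
es_mod <- svyglm(uninsured ~ year_f * expansion, design = des, family = quasibinomial())

# Year-specific DiD odds ratios with Wald 95% CIs (normal approximation, as in Figure 3)
es_terms <- grep(":expansion$", names(coef(es_mod)), value = TRUE)
b  <- coef(es_mod)[es_terms]
se <- sqrt(diag(vcov(es_mod)))[es_terms]
event_study <- data.frame(term       = es_terms,
                          odds_ratio = exp(b),
                          ci_low     = exp(b - 1.96 * se),
                          ci_high    = exp(b + 1.96 * se),
                          row.names  = NULL)
event_study
```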
Stable Unit Treatment Value Assumption (SUTVA)
SUTVA requires no spillovers (interference) between units and a single,
well-defined version of the treatment. Potential violations include
geographic spillovers (border-crossing for care), economic spillovers
(labor market and provider supply effects), and policy spillovers
(non-expansion states responding to neighbors’ expansion). However,
because Medicaid eligibility is state-specific and most healthcare is
consumed locally, major SUTVA violations are unlikely. Spillovers that
benefit residents of non-expansion states would attenuate our estimates
toward zero.
No Anticipation Effects
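The Supreme Court’s 2012 decision made expansion optional, potentially
allowing anticipation effects. However, the 2013 estimates show no
obvious pre-treatment shift, suggesting limited anticipatory behavioral
change. The January 2014 implementation date gives a clear treatment
start for most expansion states. Some states expanded later, however
(for example, Michigan in April 2014, Pennsylvania in 2015 and Louisiana
in 2016), so how these states are coded affects treatment timing.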
Common Shocks Assumption
DiD assumes that time-varying shocks affect both groups similarly. Shocks common to all states are differenced out, but shocks that differ by expansion status are not. Major concerns include other ACA provisions (individual mandate, exchange subsidies) taking effect in 2014, differential economic recovery patterns and state-specific policy changes. To the extent these factors differed between expansion and non-expansion states, they may confound our estimates, causing us to under- or overestimate true expansion effects.
Treatment Homogeneity
We assume that the effect of Medicaid expansion is relatively homogeneous across expansion states, despite potential variation in implementation details, outreach efforts and pre-existing state policies. This assumption may not fully hold. If effects vary, the estimate is best read as an average effect across expansion states. The large sample and broad geographic coverage limit the influence of any single state's idiosyncrasies, but they do not remove genuine heterogeneity.
Sources of Bias
Selection Bias: States self-selected into expansion based on political, economic and demographic factors. Expansion states were more Democratic-leaning, wealthier, and had different healthcare infrastructures. While we control for individual demographics, we do not account for state-level political ideology, pre-existing policies or healthcare market characteristics. This represents a major threat to causal inference.
Missing Data Bias: The loss of nearly 70,000 observations from 2012 raises serious concerns. If missingness is related to both expansion status and outcomes, estimates will be biased. The near-complete loss of 2012 eliminates a critical pre-treatment year, severely weakening our ability to assess parallel trends.
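One way to probe this concern is to tabulate the share of records with a missing weight by year and expansion status. Large differences between groups in 2012 would indicate that missingness is related to expansion status. The sketch below is a minimal check on a toy data frame. Its column names (year, expansion, wt) and the helper missing_weight_share are hypothetical. It tests only one half of the condition (relation to expansion status), not the relation to outcomes.

```r
# Hypothetical diagnostic: share of records with a missing survey weight,
# tabulated by survey year and expansion status. Column names (year,
# expansion, wt) are placeholders, not the analysis' actual variable names.
missing_weight_share <- function(df) {
  df$wt_missing <- is.na(df$wt)
  aggregate(wt_missing ~ year + expansion, data = df, FUN = mean)
}

# Toy data (not BRFSS): weights missing for 2012 records in expansion states
toy <- data.frame(
  year      = rep(c(2011, 2012, 2013), each = 4),
  expansion = rep(c(0, 0, 1, 1), times = 3),
  wt        = c(1.2, 0.8, 1.1, 0.9,
                1.0, 1.3, NA,  NA,
                0.7, 1.4, 1.0, 1.1)
)
missing_weight_share(toy)
```

A natural follow-up would be to compare the available outcome variables between records with and without weights within each year.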
Measurement Error: Self-reported insurance coverage
may be subject to recall bias or misunderstanding. The binary
medcost variable captures only complete cost barriers,
missing partial barriers like delayed care or medication non-coverage.
These measurement limitations may explain why we observe significant
coverage effects but null cost-barrier effects.
Temporal Confounding: The individual mandate, exchange subsidies and economic recovery all coincided with Medicaid expansion. This makes it difficult to isolate expansion effects if these concurrent changes affected expansion and non-expansion states differently.
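Compositional Changes: BRFSS is a repeated cross-section. Changes in who responds over time (including differential nonresponse) or migration between states can therefore shift the composition of the expansion and non-expansion samples. If these shifts are related to insurance status or cost barriers (for example, if newly insured individuals differ systematically from long-term insured populations), they may bias the estimates.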
Interpretation of Results
The DiD analysis reveals a statistically significant policy effect on insurance coverage but no statistically significant effect on cost-related access barriers. The two point estimates also differ in magnitude and direction.
Health Insurance Coverage (hlthpln1):
The interaction term (pre.post × expansion) yields an
odds ratio of 0.87 (95% CI: 0.80-0.94, p<0.001), indicating that
Medicaid expansion was associated with a statistically significant
reduction in the odds of lacking health insurance coverage. In practical
terms, after 2014 the change in the odds of being uninsured in expansion
states was 0.87 times the corresponding change in non-expansion states,
after controlling for demographic factors. This amounts to a roughly 13%
larger proportional reduction in those odds. Because this is a ratio of
odds ratios, it is a relative change in odds, not a 13-percentage-point
change in the uninsured rate. The weighted time-series visualization
(Figure 2) corroborates this finding, showing that expansion states
experienced a steeper increase in health insurance coverage rates after
2014 than non-expansion states. The confidence intervals narrow in the
post-period. One likely contributor is that the sampling variance of a
proportion, p(1 − p), shrinks as coverage rates rise further above 50%.
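The interpretation of the interaction odds ratio can be made concrete. In a logistic DiD model, exp() of the interaction coefficient equals the change in odds within expansion states divided by the change in odds within non-expansion states (conditional on covariates when they are included). The sketch below uses made-up proportions, not results from this analysis, to show that equivalence. It also contrasts the result with a percentage-point difference-in-differences.

```r
# Illustrative (made-up) uninsured proportions for a 2 x 2 DiD layout;
# these are NOT estimates from the BRFSS analysis.
p_uninsured <- c(nonexp_pre = 0.20, nonexp_post = 0.16,
                 exp_pre = 0.18, exp_post = 0.12)
odds <- p_uninsured / (1 - p_uninsured)

# Ratio of odds ratios: change in odds within expansion states divided by
# the change in odds within non-expansion states
ror <- (odds[["exp_post"]] / odds[["exp_pre"]]) /
  (odds[["nonexp_post"]] / odds[["nonexp_pre"]])

# The same quantity is exp() of the interaction coefficient in a saturated
# logistic model fitted to the four cells (quasibinomial accepts proportions)
cells <- data.frame(post      = c(0, 1, 0, 1),
                    expansion = c(0, 0, 1, 1),
                    prop      = unname(p_uninsured))
fit <- glm(prop ~ post * expansion, family = quasibinomial, data = cells)
or_interaction <- exp(coef(fit)[["post:expansion"]])

round(c(ratio_of_odds_ratios = ror, interaction_or = or_interaction), 3)

# Percentage-point DiD in the uninsured share, for contrast
pp_did <- (p_uninsured[["exp_post"]] - p_uninsured[["exp_pre"]]) -
  (p_uninsured[["nonexp_post"]] - p_uninsured[["nonexp_pre"]])
pp_did
```

With these illustrative proportions, the ratio of odds ratios is about 0.82, while the percentage-point DiD is -0.02. This gap is why an odds-ratio effect should not be read as a percentage-point change in the uninsured rate.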
Cost-Related Access Barriers (medcost):
For the outcome measuring whether respondents could not see a doctor
because of cost, the DiD estimate yields an odds ratio of 1.07 (95% CI:
0.99-1.15, p=0.091). This result is not statistically significant at the
conventional 0.05 level, suggesting that Medicaid expansion did not
produce a detectable differential effect on cost-related barriers to
care between expansion and non-expansion states.
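The reported intervals and p-values can be cross-checked by hand. The results-table code forms each interval as exp(estimate ± 1.96 × std.error), a Wald interval on the log-odds scale. The log-odds estimate is therefore the midpoint of the log-transformed bounds, and the standard error is their half-width divided by 1.96. The sketch below applies this to the rounded intervals reported above; the function name wald_from_or_ci is hypothetical. Because the bounds are rounded to two decimals, the recovered values are approximate.

```r
# Recover the Wald log-odds estimate, standard error and two-sided p-value
# implied by a reported odds-ratio CI. Assumes a symmetric Wald interval on
# the log-odds scale, i.e. exp(estimate +/- z * std.error).
wald_from_or_ci <- function(ci_low, ci_high, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  log_or <- (log(ci_low) + log(ci_high)) / 2         # interval midpoint
  se <- (log(ci_high) - log(ci_low)) / (2 * z_crit)  # half-width / z
  z <- log_or / se
  c(odds_ratio = exp(log_or), se = se, z = z,
    p_value = 2 * pnorm(-abs(z)))
}

# Rounded 95% intervals reported in the text
hlthpln1 <- wald_from_or_ci(0.80, 0.94)
medcost  <- wald_from_or_ci(0.99, 1.15)
round(rbind(hlthpln1, medcost), 4)
```

The recovered p-values are roughly 0.0005 for hlthpln1 and 0.09 for medcost. Given the rounding, these are consistent with the reported p<0.001 and p=0.091.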
This null finding is somewhat surprising given that expansion increased insurance coverage. Several explanations are plausible: 1) the time period examined may be too short to detect changes in healthcare-seeking behavior; 2) other factors affecting affordability such as plan deductibles, copayments and provider networks may have offset the coverage gains; 3) the measure captures only complete barriers to care (not seeing a doctor at all) rather than partial barriers; or 4) baseline differences in healthcare costs and availability between expansion and non-expansion states may obscure treatment effects.
Demographic Covariates: The models reveal important demographic patterns in coverage and access. Each additional year of age is associated with lower odds of being uninsured (OR=0.97) and higher odds of reporting a cost barrier (OR=1.02). These plausibly reflect Medicare eligibility at age 65 and accumulating health needs, respectively. Female respondents have lower odds of both lacking insurance and experiencing cost barriers. Hispanic respondents show notably higher odds of lacking insurance (OR=2.28) but lower odds of cost-related access problems (OR=0.93), suggesting complex patterns of coverage and healthcare utilization. Higher income and education levels are associated with better insurance coverage. Counterintuitively, however, higher income is associated with greater reported cost barriers. This may reflect different thresholds for what constitutes prohibitive costs or different patterns of healthcare utilization.
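Because the income association with cost barriers runs against the usual gradient, the outcome coding is worth confirming; this section does not show how the outcomes were coded. BRFSS codes items such as hlthpln1 and medcost as 1 = Yes and 2 = No, with 7 and 9 for don't know and refused. When glm() receives a factor response with a binomial family, it treats the first factor level as failure and models the probability of the remaining levels. A factor with "Yes" as its first level would therefore model the odds of answering "No". The sketch below demonstrates this on toy responses. The helper recode_yes_no is hypothetical, and this is a suggested check, not a description of how the analysis was run.

```r
# Hypothetical recoding helper for BRFSS-style yes/no items
# (1 = Yes, 2 = No, 7/9 = don't know/refused). Returns 1 for Yes,
# 0 for No and NA otherwise, so glm() models the probability of "Yes".
recode_yes_no <- function(x) {
  out <- rep(NA_integer_, length(x))
  out[which(x == 1)] <- 1L
  out[which(x == 2)] <- 0L
  out
}

# Toy responses (not BRFSS data): 2 "Yes", 4 "No"
raw <- c(1, 2, 2, 1, 2, 2)

# Factor response with levels c("Yes", "No"): glm() treats "Yes" (the first
# level) as failure and therefore models P(No)
as_factor <- factor(ifelse(raw == 1, "Yes", "No"), levels = c("Yes", "No"))
p_factor <- plogis(coef(glm(as_factor ~ 1, family = binomial))[[1]])

# Explicit 0/1 outcome: glm() models P(Yes)
p_binary <- plogis(coef(glm(recode_yes_no(raw) ~ 1, family = binomial))[[1]])

round(c(factor_response = p_factor, binary_response = p_binary), 3)
```

Here the factor version estimates the share of "No" answers (about 0.667), while the explicit 0/1 version estimates the share of "Yes" answers (about 0.333). Coding each outcome explicitly removes any ambiguity about which direction an odds ratio refers to.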
Conclusion
This analysis provides evidence that the 2014 Medicaid expansion significantly increased health insurance coverage rates in states that adopted the policy. We used a difference-in-differences methodology with survey-weighted regression models, controlling for demographic factors. After implementation, expansion states experienced an approximately 13% larger proportional reduction in the odds of being uninsured than non-expansion states (OR=0.87).
However, the expansion’s impact on cost-related barriers to medical care was not statistically significant during the 2014-2017 period. This suggests that while Medicaid expansion successfully extended insurance coverage to low-income adults, the translation of coverage into improved healthcare access and affordability may be more complex and potentially require longer time horizons to manifest fully.
Several limitations merit consideration. First, the parallel trends assumption, while reasonably supported by the available data, could not be rigorously tested due to limited pre-period observations and data quality issues in 2012. Second, the binary nature of our outcome measures may not fully capture the nuanced gradations of insurance adequacy and financial barriers to care. Third, states that expanded Medicaid may differ from non-expansion states in ways not fully captured by our demographic controls, potentially confounding the treatment effect estimates.
Despite these limitations, this study contributes to the growing evidence base demonstrating that Medicaid expansion achieved its primary objective of extending health insurance coverage to vulnerable populations. The findings remain highly relevant for states considering expansion and for ongoing policy debates regarding the future of Medicaid. Future research should examine longer-term effects on healthcare utilization, health outcomes and financial well-being, as well as potential heterogeneous treatment effects across different population subgroups and state contexts.
The evidence presented here supports the conclusion that Medicaid expansion was an effective policy tool for reducing uninsurance rates, representing a significant step toward universal health coverage in the United States. However, ensuring that coverage translates into meaningful access to affordable, high-quality healthcare remains an ongoing challenge requiring continued policy attention and innovation.
Analysis performed in R (v.4.5.1) and RStudio 2025.05.1+513 “Mariposa Orchid” Release (ab7c1bc795c7dcff8f26215b832a3649a19fc16c, 2025-06-01) for Windows.