Assessing the Impact of Medicaid Expansion on Health Service Access and Cost

CLAUDIO, Mauricio

2025-12-08

Summary: This analysis evaluates the impact of the 2014 Medicaid expansion on health insurance coverage and cost-related access barriers using difference-in-differences methodology applied to BRFSS survey data (2011-2017, N=294,018). Results indicate that Medicaid expansion significantly reduced the odds of being uninsured by approximately 13% (OR=0.87, 95% CI: 0.80-0.94, p=0.001) in expansion states compared to non-expansion states. However, no significant effect was detected on cost-related barriers to seeing a doctor (OR=1.07, 95% CI: 0.99-1.15, p=0.091). Key limitations include substantial missing data in 2012 (69,384 observations) constraining assessment of the parallel trends assumption, potential selection bias from non-random state adoption and temporal confounding from concurrent Affordable Care Act provisions. Despite these constraints, findings support Medicaid expansion as an effective policy tool for increasing insurance coverage, though translating coverage into improved healthcare affordability may require longer time horizons or complementary policy interventions.



Introduction

The Affordable Care Act (ACA) of 2010 represented one of the most significant expansions of health insurance coverage in the United States since the creation of Medicare and Medicaid in 1965. A central component of the ACA was the Medicaid expansion provision enabling states to extend Medicaid eligibility to most adults with incomes up to 138% of the Federal Poverty Level. Prior to this expansion, Medicaid eligibility was largely restricted to pregnant women, children, people with disabilities and low-income parents, leaving millions of low-income childless adults without access to affordable health coverage.

The expansion was designed to take effect nationwide in January 2014. However, the 2012 Supreme Court ruling in NFIB v. Sebelius gave states the option to adopt or decline the expansion. By 2014, approximately 25 states plus the District of Columbia had implemented the expansion while others opted out. This state-level variation in implementation provides a valuable natural experiment to assess the causal impact of Medicaid expansion on health access and financial outcomes.

Based on national survey data, this analysis employs a difference-in-differences (DiD) methodology to evaluate the impact of Medicaid expansion on two critical outcomes: health insurance coverage rates and cost-related barriers to medical care. By comparing changes in expansion states versus non-expansion states before (2011-2013) and after (2014-2017) implementation, we can isolate the effect of the policy from broader temporal trends affecting all states.
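To make the DiD logic concrete, the sketch below works through the arithmetic on made-up uninsured rates. The four rates are illustrative assumptions, not estimates from the BRFSS data, and the calculation is on the percentage-point scale for simplicity, whereas the models reported later work on the odds scale.

```r
# Hypothetical uninsured rates (%) for illustration only; not BRFSS estimates
rates <- matrix(c(30, 24,   # expansion states: pre, post
                  35, 33),  # non-expansion states: pre, post
                nrow = 2, byrow = TRUE,
                dimnames = list(c("expansion", "non_expansion"),
                                c("pre", "post")))

# Change over time within each group of states
change <- rates[, "post"] - rates[, "pre"]

# DiD: change in expansion states minus change in non-expansion states
did <- unname(change["expansion"] - change["non_expansion"])
did  # -4 percentage points
```

In this toy case both groups improve, but expansion states improve by 4 percentage points more, and that excess is what the DiD design attributes to the policy.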

Understanding the impact of Medicaid expansion remains highly relevant for ongoing policy debates, as several states have continued to expand Medicaid in years following 2014. This analysis contributes to the evidence base by providing rigorous estimates of expansion effects while accounting for demographic differences and population characteristics through survey-weighted regression models.


Data

The data comes from the Behavioral Risk Factor Surveillance System (BRFSS), a system of health-related telephone surveys that collect data from residents in all 50 U.S. states, the District of Columbia and three U.S. territories about their health-related risk behaviors, chronic health conditions and use of preventive services. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. The raw data for this study consists of 406,238 observations and 38 variables for years 2011-2017.

### Load Required Libraries ###
library(haven) # v.2.5.5
library(dplyr) # v.1.1.4
library(sjmisc) # v.2.8.11
library(survey) # v4.4-8
library(tidyr) # v.1.3.1
library(srvyr) # v.1.3.0
library(broom) # v.1.0.9
library(purrr) # v.1.1.0
library(knitr) # v.1.50
library(summarytools) # v.1.1.4
library(ggplot2) # v.4.0.0
library(plotly) # v.4.11.0
library(sjPlot) # v.2.9.0
### Import BRFSS data and display attributes ###
# Read STATA .dta file raw data
raw.data = read_dta("BRFSS_11-17.dta")

# Examine raw data 
glimpse(raw.data)
## Rows: 406,238
## Columns: 38
## $ `_state`       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ imonth         <chr> "01", "02", "01", "02", "10", "12", "12", "02", "01", "…
## $ iyear          <chr> "2011", "2011", "2011", "2011", "2011", "2011", "2011",…
## $ `_psu`         <dbl> 2011000036, 2011000793, 2011001338, 2011001658, 2011005…
## $ numadult       <dbl> 4, 2, 4, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1…
## $ hlthpln1       <dbl> 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2…
## $ medcost        <dbl> 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 1…
## $ age            <dbl> 32, 54, 26, 42, 19, 52, 49, 69, 40, 43, 64, 18, 36, 55,…
## $ marital        <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, …
## $ children       <dbl> 88, 1, 2, 88, 88, 88, 88, 88, 3, 1, 88, 1, 3, 88, 88, 2…
## $ educa          <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ employ1        <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
## $ income2        <dbl> 5, 1, 3, 1, 3, 1, 1, 2, 1, 2, 3, 4, 2, 3, 2, 4, 2, 3, 3…
## $ sex            <dbl+lbl> 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, …
## $ pregnant       <dbl+lbl> NA, NA, NA,  2, NA, NA, NA, NA,  2,  2, NA, NA,  2,…
## $ cellfon2       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_ststr`       <dbl> 1011, 1071, 1111, 1152, 1081, 1111, 1151, 1071, 1081, 1…
## $ `_llcpwt`      <dbl> 2484.45883, 159.41218, 2954.50902, 793.76042, 1506.0212…
## $ `_hcvu651`     <dbl> 1, 1, 2, 2, 2, 2, 2, 9, 1, 2, 1, 1, 1, 1, 2, 2, 2, 9, 2…
## $ `_race`        <dbl> 1, 2, 2, 2, 6, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1…
## $ hhadult        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ whrtst10       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ medicare       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ nocov121       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ lstcovrg       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ medscost       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ carercvd       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ drvisits       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ numberchildren <dbl> 0, 1, 2, 0, 0, 0, 0, 0, 3, 1, 0, 1, 3, 0, 0, 2, 0, 0, 2…
## $ numberadults   <dbl> 4, 2, 4, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1…
## $ household      <dbl> 4, 3, 6, 2, 2, 1, 2, 1, 4, 3, 2, 3, 4, 2, 1, 4, 1, 2, 3…
## $ percentfpl     <dbl> 134.22820, 26.98327, 58.35278, 33.99048, 118.96669, 45.…
## $ fplbracket     <dbl> 3, 1, 2, 1, 3, 2, 1, 3, 1, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2…
## $ race           <dbl> 1, 2, 2, 2, 7, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1…
## $ racebroad      <dbl+lbl> 1, 2, 2, 2, 4, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, …
## $ post           <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ expansion      <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ string         <dbl> NA, 7777, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2009,…


To prepare the data for analysis, we perform the following transformations:

  1. Subset the data to keep the following fourteen variables:
  • age: Age in years
  • sex: Sex
  • racebroad: Race
  • educa: Education level
  • income2: Income level
  • employ1: Employment status
  • marital: Marital status
  • iyear: Year of response
  • expansion: Whether respondent resides in a Medicaid expansion/non-expansion state
  • hlthpln1: Whether respondent has any health coverage
  • medcost: Whether respondent could not see a doctor due to cost
  • _psu: Primary sampling unit
  • _ststr: Sample Design Stratification
  • _llcpwt: Survey weight for Land-line and cell-phone data
  2. Convert all variables, save for age and the three sampling design variables, into factors,

  3. Recode hlthpln1 and medcost as dichotomous variables, and

  4. Rename the sample design variables for ease of use.

Transformed data attributes are summarized in the table below.

### Subset, convert and recode data ###
# Subset variables
dat = raw.data |>
   # 1. subset the data to keep only the following fourteen variables
   dplyr::select(age, # NOT FOUND IN LLCP 2017 CODEBOOK
                 sex, # Respondents Sex 
                 racebroad, # NOT FOUND IN LLCP 2017 CODEBOOK
                 educa, #  Education Level, _EDUCAG in codebook
                 income2, #  Income Level
                 employ1, # Employment Status 
                 marital, # Marital Status
                 iyear, # Interview Year 
                 expansion, # Residence in Medicaid expansion vs. non-expansion
                            # state; NOT FOUND IN LLCP 2017 CODEBOOK
                 hlthpln1, # Have any health care coverage
                 medcost, # Could Not See Doctor Because of Cost 
                 `_psu`, # Primary Sampling Unit 
                 `_ststr`, # Sample Design Stratification Variable 
                 `_llcpwt` # Respondent weight for Land-line and cell-phone data
                 ) |>
   # 2. Convert variables into factors
   mutate(sex = as_factor(sex, levels = 'both'),
          racebroad = as_factor(racebroad, levels = 'both'),
          educa = as_factor(educa, levels = "both"),
          # haven::as_factor() throws up an error with the `income2` variable
          # so it's recoded manually below.
          income2 = factor(income2, levels=c(1,2,3,4,5,6,7),
                           labels = c('[1,2] Less than $15,000',
                                      '[1,2] Less than $15,000',
                                      '[3,4] $15,000 to less than $25,000',
                                      '[3,4] $15,000 to less than $25,000',
                                      '[5] $25,000 to less than $35,000',
                                      '[6] $35,000 to less than $50,000',
                                      '[7] $50,000 or more')),
          employ1 = as_factor(employ1, levels = "both"),
          marital = as_factor(marital, levels = "both"),
          iyear = factor(iyear),
          expansion = as_factor(expansion, levels = 'both')) |>
   # 3. Recode variables ``hlthpln1`` and ``medcost`` as dichotomous variables
   filter(hlthpln1 == 1 | hlthpln1 == 2, # Discard responses other than YES/NO
          medcost == 1 | medcost == 2) |> # for `hlthpln1` and `medcost` variables
   mutate(hlthpln1 = factor(hlthpln1, labels = c('[1] Yes',
                                                 '[2] No')),
          medcost = factor(medcost, labels = c('[1] Yes',
                                               '[2] No'))) |>
   # 4. Rename sample design variables for ease of use
   rename(Design.Stratification = `_ststr`,
          Respondent.Weight = `_llcpwt` ) |>
   glimpse()
## Rows: 363,985
## Columns: 14
## $ age                   <dbl> 32, 54, 26, 42, 19, 52, 49, 69, 40, 43, 64, 18, …
## $ sex                   <fct> [1] Male, [2] Female, [1] Male, [2] Female, [1] …
## $ racebroad             <fct> [1] Non-Hispanic White, [2] Non-Hispanic Black, …
## $ educa                 <fct> [0] Less than HS, [0] Less than HS, [0] Less tha…
## $ income2               <fct> "[5] $25,000 to less than $35,000", "[1,2] Less …
## $ employ1               <fct> [0] Unemployed, [0] Unemployed, [0] Unemployed, …
## $ marital               <fct> [0] Unmarried or Unpartnered, [0] Unmarried or U…
## $ iyear                 <fct> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, …
## $ expansion             <fct> [0] Non-Expansion, [0] Non-Expansion, [0] Non-Ex…
## $ hlthpln1              <fct> [1] Yes, [1] Yes, [2] No, [2] No, [2] No, [2] No…
## $ medcost               <fct> [1] Yes, [2] No, [2] No, [1] Yes, [1] Yes, [1] Y…
## $ `_psu`                <dbl> 2011000036, 2011000793, 2011001338, 2011001658, …
## $ Design.Stratification <dbl> 1011, 1071, 1111, 1152, 1081, 1111, 1151, 1071, …
## $ Respondent.Weight     <dbl> 2484.45883, 159.41218, 2954.50902, 793.76042, 15…


Diagnostics of the stratification and weight variables, Design.Stratification and Respondent.Weight, reveal no anomalies in the former but a substantial number of missing values in the latter for 2012 (69,384) and 2013 (583).

### Diagnose missing values in sampling design variables ###
dat |> group_by(iyear) |>
   summarize(Design.Stratification.NA = sum(is.na(Design.Stratification)),
             Respondent.Weight.NA = sum(is.na(Respondent.Weight))) |>
   kable()
iyear Design.Stratification.NA Respondent.Weight.NA
2011 0 0
2012 0 69384
2013 0 583
2014 0 0
2015 0 0
2016 0 0
2017 0 0


Given the preliminary, exploratory nature of this study, and wishing to avoid the risky, time-consuming alternative of imputing the missing values for Respondent.Weight, we instead delete the records with missing values, resulting in a dataset of 294,018 observations.

### Delete missing values in Respondent.Weight variable ###
design.dat = dat |>
   drop_na(Respondent.Weight) |>
   glimpse()
## Rows: 294,018
## Columns: 14
## $ age                   <dbl> 32, 54, 26, 42, 19, 52, 49, 69, 40, 43, 64, 18, …
## $ sex                   <fct> [1] Male, [2] Female, [1] Male, [2] Female, [1] …
## $ racebroad             <fct> [1] Non-Hispanic White, [2] Non-Hispanic Black, …
## $ educa                 <fct> [0] Less than HS, [0] Less than HS, [0] Less tha…
## $ income2               <fct> "[5] $25,000 to less than $35,000", "[1,2] Less …
## $ employ1               <fct> [0] Unemployed, [0] Unemployed, [0] Unemployed, …
## $ marital               <fct> [0] Unmarried or Unpartnered, [0] Unmarried or U…
## $ iyear                 <fct> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, …
## $ expansion             <fct> [0] Non-Expansion, [0] Non-Expansion, [0] Non-Ex…
## $ hlthpln1              <fct> [1] Yes, [1] Yes, [2] No, [2] No, [2] No, [2] No…
## $ medcost               <fct> [1] Yes, [2] No, [2] No, [1] Yes, [1] Yes, [1] Y…
## $ `_psu`                <dbl> 2011000036, 2011000793, 2011001338, 2011001658, …
## $ Design.Stratification <dbl> 1011, 1071, 1111, 1152, 1081, 1111, 1151, 1071, …
## $ Respondent.Weight     <dbl> 2484.45883, 159.41218, 2954.50902, 793.76042, 15…
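As a quick consistency check, the row counts reported in the outputs above can be reconciled by hand. The sketch below only re-uses numbers already printed in this report; the variable names are ours.

```r
# Counts copied from the outputs above
rows_before <- 363985                             # rows after filtering hlthpln1/medcost
missing_wt  <- c(`2012` = 69384, `2013` = 583)    # missing Respondent.Weight by year

# Rows expected to remain after dropping records with a missing weight
rows_after <- rows_before - sum(missing_wt)
rows_after  # 294018

# Share of records lost to missing weights, in percent
pct_dropped <- round(100 * sum(missing_wt) / rows_before, 1)
pct_dropped
```

Nearly all of the dropped records come from 2012, which is why that year contributes only a few hundred observations to the final dataset.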


The summary of the final dataset after all transformations is shown below. Small percentages of missing values remain in a handful of variables. We avoid deleting those records, preferring instead to exclude the missing values via the na.rm = TRUE and na.action = na.omit options in subsequent steps.

### Print data summary table ###
print(dfSummary(design.dat,
                graph.col=TRUE,
                graph.magnif = 0.75,
                plain.ascii = FALSE,
                headings = FALSE,
                labels.col = FALSE,
                display.labels = FALSE,
                silent = TRUE,
                valid.col = TRUE,
                na.col = TRUE),
      method = "render")
No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing
1 | age [numeric] | Mean (sd): 54 (17.4); min ≤ med ≤ max: 7 ≤ 55 ≤ 99; IQR (CV): 28 (0.3) | 84 distinct values | 294017 (100.0%) | 1 (0.0%)
2 | sex [factor] | 1. [1] Male; 2. [2] Female | 102147 (34.7%); 191870 (65.3%) | 294017 (100.0%) | 1 (0.0%)
3 | racebroad [factor] | 1. [1] Non-Hispanic White; 2. [2] Non-Hispanic Black; 3. [3] Hispanic; 4. [4] Other | 178951 (61.8%); 40786 (14.1%); 44122 (15.2%); 25667 (8.9%) | 289526 (98.5%) | 4492 (1.5%)
4 | educa [factor] | 1. [0] Less than HS; 2. [1] HS or greater | 182218 (62.1%); 111111 (37.9%) | 293329 (99.8%) | 689 (0.2%)
5 | income2 [factor] | 1. [1,2] Less than $15,000; 2. [3,4] $15,000 to less tha; 3. [5] $25,000 to less than; 4. [6] $35,000 to less than; 5. [7] $50,000 or more | 156742 (53.3%); 103951 (35.4%); 26703 (9.1%); 6393 (2.2%); 229 (0.1%) | 294018 (100.0%) | 0 (0.0%)
6 | employ1 [factor] | 1. [0] Unemployed; 2. [1] Employed | 222897 (76.2%); 69681 (23.8%) | 292578 (99.5%) | 1440 (0.5%)
7 | marital [factor] | 1. [0] Unmarried or Unpartne; 2. [1] Married or partnered | 157178 (53.6%); 135803 (46.4%) | 292981 (99.6%) | 1037 (0.4%)
8 | iyear [factor] | 1. 2011; 2. 2012; 3. 2013; 4. 2014; 5. 2015; 6. 2016; 7. 2017 | 78125 (26.6%); 444 (0.2%); 66935 (22.8%); 55180 (18.8%); 44933 (15.3%); 47145 (16.0%); 1256 (0.4%) | 294018 (100.0%) | 0 (0.0%)
9 | expansion [factor] | 1. [0] Non-Expansion; 2. [1] Expansion | 126614 (43.1%); 167404 (56.9%) | 294018 (100.0%) | 0 (0.0%)
10 | hlthpln1 [factor] | 1. [1] Yes; 2. [2] No | 231801 (78.8%); 62217 (21.2%) | 294018 (100.0%) | 0 (0.0%)
11 | medcost [factor] | 1. [1] Yes; 2. [2] No | 72156 (24.5%); 221862 (75.5%) | 294018 (100.0%) | 0 (0.0%)
12 | _psu [numeric] | Mean (sd): 2013450376 (1768210); min ≤ med ≤ max: 2.011e+09 ≤ 2.014e+09 ≤ 2016036930; IQR (CV): 3989355 (0) | 74485 distinct values | 294018 (100.0%) | 0 (0.0%)
13 | Design.Stratification [numeric] | Mean (sd): 215226.7 (172359.2); min ≤ med ≤ max: 1011 ≤ 201041 ≤ 562019; IQR (CV): 317021 (0.8) | 2902 distinct values | 294018 (100.0%) | 0 (0.0%)
14 | Respondent.Weight [numeric] | Mean (sd): 575.9 (1126.7); min ≤ med ≤ max: 0.4 ≤ 206.5 ≤ 33134.9; IQR (CV): 497.8 (2) | 267027 distinct values | 294018 (100.0%) | 0 (0.0%)

Generated by summarytools 1.1.4 (R version 4.5.1)
2025-12-18



Methodology

To calculate estimates and make inferences about the population, we employ a weighted survey design that takes into account the design under which the BRFSS sample data was collected. The following table shows the yearly number and proportion of samples across 2011-2017.

### Calculate sample total and percent total for each year ###
year.samples = design.dat |>
   group_by(iyear) |>
   count(name='Samples') |>
   ungroup() |>
   mutate(Percentage = round(Samples / sum(Samples) * 100, 1))

kable(year.samples)
iyear Samples Percentage
2011 78125 26.6
2012 444 0.2
2013 66935 22.8
2014 55180 18.8
2015 44933 15.3
2016 47145 16.0
2017 1256 0.4


The dataset spans several years, so observations need to be weighted in proportion to the number of samples in each year relative to the total number of samples across years. To do so, we create a new variable, Respondent.Weight_ADJ, which is the original respondent weight scaled by the percentage of sample responses in that year.

### Create a new variable `Respondent.Weight_ADJ` from the ###
### `Respondent.Weight` variable weighted by year ###
# Attach yearly sample count/percentage to working data
design.dat = left_join(design.dat, year.samples,
                       join_by(iyear)) # Join key, the year variable: iyear

# Create Respondent.Weight_ADJ
design.dat = design.dat |>
   mutate(Respondent.Weight_ADJ = Respondent.Weight * Percentage / 100)
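The effect of this scaling is easiest to see on a tiny example. The sketch below uses a made-up four-row data frame (toy, weight and weight_adj are hypothetical names) and unrounded year shares, whereas the chunk above uses percentages rounded to one decimal.

```r
# Made-up data: three respondents in 2011, one in 2013
toy <- data.frame(
  iyear  = c("2011", "2011", "2011", "2013"),
  weight = c(100, 200, 300, 400)
)

# Share of all records contributed by each year (2011: 0.75, 2013: 0.25)
share <- table(toy$iyear) / nrow(toy)

# Scale each respondent's weight by the share of their interview year
toy$weight_adj <- toy$weight * as.numeric(share[toy$iyear])
toy
```

Years with few records, such as 2012 and 2017 in the working data, are therefore scaled down heavily relative to well-sampled years.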


Finally, we create the survey design object that will allow us to calculate statistics correctly weighted according to the sampling design.

### Create survey design object ###
# Define survey object
BRFSS.srv_design = svydesign(id = ~1,
                             strata = ~Design.Stratification,
                             weights = ~Respondent.Weight_ADJ,
                             data = design.dat)

# Convert object to survey design
(BRFSS.srv_design = as_survey_design(BRFSS.srv_design))
## Stratified Independent Sampling design (with replacement)
## Called via srvyr
## Data variables: 
##   - age (dbl), sex (fct), racebroad (fct), educa (fct), income2 (fct), employ1
##     (fct), marital (fct), iyear (fct), expansion (fct), hlthpln1 (fct), medcost
##     (fct), _psu (dbl), Design.Stratification (dbl), Respondent.Weight (dbl),
##     Samples (int), Percentage (dbl), Respondent.Weight_ADJ (dbl)
# Set options for allowing a single observation per stratum
options(survey.lonely.psu = "adjust") 

class(BRFSS.srv_design)
## [1] "tbl_svy"        "survey.design2" "survey.design"

The foregoing steps in preparing the data and creating the design object were conducted in accordance with the document "The Behavioral Risk Factor Surveillance System; Complex Sampling Weights and Preparing 2017 BRFSS Module Data for Analysis", July 2018.


Results

We calculate weighted frequencies or means for each of the variables (except for the sampling/survey weight variables). The tables display these values for expansion and non-expansion states.

Weighted mean for numerical variable age

### Calculate Weighted Mean for numerical variables: AGE ###
BRFSS.srv_design |>
   group_by(expansion) |>
   summarize(Age_mean = survey_mean(age, na.rm=TRUE)) |>
   kable(digits=1,
         col.names = c("Group","Mean Age ","Mean Age SE"))
Group Mean Age Mean Age SE
[0] Non-Expansion 46.7 0.1
[1] Expansion 45.8 0.1


Weighted frequency for categorical/factor variables

### Create a table that displays these proportions/frequencies ###
### for expansion and non-expansion states. ###
# Subset categorical/factor variables
factor_vars = names(Filter(is.factor, BRFSS.srv_design$variables))

# Set grouping variable
group_var = "expansion"

# Ensure clean group levels (drop NA groups if needed)
group_levels = BRFSS.srv_design$variables[[group_var]] |>
   droplevels() |>
   levels()

# Function: compute proportions for each factor level
get_grouped_prop = function(var) {
   map_dfr(group_levels, function(g) {
   # Subset design to the current group
   des_g = subset(BRFSS.srv_design, get(group_var) == g)
   # Proportions for the factor levels within this group
   est = svymean(~ get(var), design = des_g, na.rm = TRUE)
   tibble(variable = var,
          group = g,
          level = names(est),
          proportion = as.numeric(coef(est)),
          se = as.numeric(SE(est)))})
}

# Apply to all factor variables and bind results
prop_grouped_df = map_dfr(factor_vars, get_grouped_prop)

prop_grouped_df |>
   dplyr::select(variable, group, level, proportion, se) |>
   mutate(level = gsub("get\\(var\\)", "", level)) |>
   arrange(desc(variable)) |>
   kable(col.names = c("Variable", "Group", "Level", "Frequency", "Freq. SE"),
         digits=3)
Variable Group Level Frequency Freq. SE
sex [0] Non-Expansion [1] Male 0.436 0.003
sex [0] Non-Expansion [2] Female 0.564 0.003
sex [1] Expansion [1] Male 0.440 0.003
sex [1] Expansion [2] Female 0.560 0.003
racebroad [0] Non-Expansion [1] Non-Hispanic White 0.464 0.003
racebroad [0] Non-Expansion [2] Non-Hispanic Black 0.196 0.002
racebroad [0] Non-Expansion [3] Hispanic 0.293 0.003
racebroad [0] Non-Expansion [4] Other 0.047 0.001
racebroad [1] Expansion [1] Non-Hispanic White 0.446 0.002
racebroad [1] Expansion [2] Non-Hispanic Black 0.124 0.002
racebroad [1] Expansion [3] Hispanic 0.344 0.003
racebroad [1] Expansion [4] Other 0.086 0.002
medcost [0] Non-Expansion [1] Yes 0.332 0.003
medcost [0] Non-Expansion [2] No 0.668 0.003
medcost [1] Expansion [1] Yes 0.261 0.002
medcost [1] Expansion [2] No 0.739 0.002
marital [0] Non-Expansion [0] Unmarried or Unpartnered 0.363 0.003
marital [0] Non-Expansion [1] Married or partnered 0.637 0.003
marital [1] Expansion [0] Unmarried or Unpartnered 0.399 0.002
marital [1] Expansion [1] Married or partnered 0.601 0.002
iyear [0] Non-Expansion 2011 0.328 0.002
iyear [0] Non-Expansion 2012 0.000 0.000
iyear [0] Non-Expansion 2013 0.236 0.003
iyear [0] Non-Expansion 2014 0.183 0.002
iyear [0] Non-Expansion 2015 0.131 0.002
iyear [0] Non-Expansion 2016 0.123 0.002
iyear [0] Non-Expansion 2017 0.000 0.000
iyear [1] Expansion 2011 0.339 0.002
iyear [1] Expansion 2012 0.000 0.000
iyear [1] Expansion 2013 0.239 0.002
iyear [1] Expansion 2014 0.178 0.002
iyear [1] Expansion 2015 0.122 0.001
iyear [1] Expansion 2016 0.121 0.001
iyear [1] Expansion 2017 0.000 0.000
income2 [0] Non-Expansion [1,2] Less than $15,000 0.389 0.003
income2 [0] Non-Expansion [3,4] $15,000 to less than $25,000 0.454 0.003
income2 [0] Non-Expansion [5] $25,000 to less than $35,000 0.132 0.002
income2 [0] Non-Expansion [6] $35,000 to less than $50,000 0.024 0.001
income2 [0] Non-Expansion [7] $50,000 or more 0.001 0.000
income2 [1] Expansion [1,2] Less than $15,000 0.420 0.003
income2 [1] Expansion [3,4] $15,000 to less than $25,000 0.414 0.003
income2 [1] Expansion [5] $25,000 to less than $35,000 0.134 0.002
income2 [1] Expansion [6] $35,000 to less than $50,000 0.031 0.001
income2 [1] Expansion [7] $50,000 or more 0.001 0.000
hlthpln1 [0] Non-Expansion [1] Yes 0.624 0.003
hlthpln1 [0] Non-Expansion [2] No 0.376 0.003
hlthpln1 [1] Expansion [1] Yes 0.735 0.002
hlthpln1 [1] Expansion [2] No 0.265 0.002
expansion [0] Non-Expansion [0] Non-Expansion 1.000 0.000
expansion [0] Non-Expansion [1] Expansion 0.000 0.000
expansion [1] Expansion [0] Non-Expansion 0.000 0.000
expansion [1] Expansion [1] Expansion 1.000 0.000
employ1 [0] Non-Expansion [0] Unemployed 0.677 0.003
employ1 [0] Non-Expansion [1] Employed 0.323 0.003
employ1 [1] Expansion [0] Unemployed 0.672 0.003
employ1 [1] Expansion [1] Employed 0.328 0.003
educa [0] Non-Expansion [0] Less than HS 0.694 0.003
educa [0] Non-Expansion [1] HS or greater 0.306 0.003
educa [1] Expansion [0] Less than HS 0.687 0.002
educa [1] Expansion [1] HS or greater 0.313 0.002


Next, we create two figures:

  1. the raw percentage of people with any type of health insurance over time for the expansion and non-expansion states, and

  2. the weighted percentage of people with any type of health insurance over time for the expansion and non-expansion states.

The raw figures show that in non-expansion states the percentage of persons with health insurance increased and then decreased sharply in the Pre period, and then continued decreasing more gradually during the Post period. The weighted figures show that in expansion states the percentage of persons with health insurance increased and then decreased sharply in the Pre period, levelled off in the Post period until 2016, and increased in 2017.

Figure 1: Raw Percentage of People With Health Insurance by Medicaid Expansion

### Figure 1: the raw percentage of people with any type of health insurance over time for the expansion and non-expansion states...###
ggplotly(
design.dat |>
   filter(hlthpln1 == '[1] Yes') |> # Subset respondents with health coverage
   mutate(year = as.integer(as.character(iyear))) |> # Change datatype for plotting
   group_by(year, expansion) |>
   count(hlthpln1) |>
   ungroup(expansion) |>
   mutate(percentage = round(n / sum(n) * 100, 1)) |> # Calculate percentage by year
   ggplot(aes(x=year, y=percentage, color=expansion)) +
   geom_point(size=3) + geom_line(linetype = 'dashed') +
   theme_minimal() + scale_color_manual(values=c('red3','dodgerblue3')) +
   scale_y_continuous(labels = scales::percent_format(scale = 1)) +
   labs(title='Fig. 1: Raw Percentage of People With Health Insurance',
        x='Year', y='People with health insurance (%)',
        color='States')
)
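Because the chunk above keeps only insured respondents before dividing by the yearly total, each plotted value is a state group's share of all insured respondents in that year, so the two series sum to 100% within a year. If the quantity of interest is instead the coverage rate within each group of states, one possible computation is sketched below on a small made-up data frame (toy and its values are assumptions for illustration, and base R aggregate() stands in for the dplyr pipeline).

```r
# Made-up respondents: two years, two state groups, insured yes/no
toy <- data.frame(
  year      = c(2011, 2011, 2011, 2011, 2014, 2014, 2014, 2014),
  expansion = c("Expansion", "Expansion", "Non-Expansion", "Non-Expansion",
                "Expansion", "Expansion", "Non-Expansion", "Non-Expansion"),
  insured   = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Percentage insured within each year-by-group cell:
# the denominator is everyone in the cell, insured or not
coverage <- aggregate(insured ~ year + expansion, data = toy,
                      FUN = function(x) 100 * mean(x))
coverage
```

The same distinction applies to the weighted figure, where the denominator would be the weighted total of each year-by-group cell.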

Figure 2: Weighted Percentage of People With Health Insurance by Medicaid Expansion

ggplotly(
BRFSS.srv_design |>
      filter(hlthpln1 == '[1] Yes') |> # Subset respondents with health coverage
      group_by(iyear, expansion) |>
      summarize(survey_prop(vartype = c("ci"))) |>
      mutate(year = as.integer(as.character(iyear)),
             percentage = round(coef * 100, 1),
             CI95_low = round(`_low` * 100, 1),
             CI95_upp = round(`_upp` * 100, 1)) |>
   ggplot(aes(x=year, y=percentage, color=expansion)) +
   geom_point(size=3) + geom_line(linetype = 'dashed') +
   geom_pointrange(aes(ymin = CI95_low, ymax = CI95_upp)) +
   theme_minimal() + scale_color_manual(values=c('red3','dodgerblue3')) +
   scale_y_continuous(labels = scales::percent_format(scale = 1)) +
   labs(title='Fig. 2: Weighted Percentage of People With Health Insurance (CI95%)', x='Year', y='People with health insurance (%)', color='States')
)


Finally, we run a survey-weighted difference-in-differences model comparing health insurance coverage and affordability in expansion vs. non-expansion states, defining 2011-2013 as Pre-treatment years and 2014-2017 as Post-treatment years. We control for demographic factors by including them as covariates in the binary logistic regression model. The model summary table and the Difference-in-Differences (DiD) estimators (the exponentiated coefficients of the pre.post * expansion interaction term) for outcomes hlthpln1 and medcost are shown in the table below and in Figure 3, respectively.

Model results show that Medicaid expansion significantly reduced the odds of being uninsured by approximately 13% (OR=0.87, 95% CI: 0.80-0.94, p=0.001) in expansion states compared to non-expansion states. However, no significant effect was detected on cost-related barriers to seeing a doctor (OR=1.07, 95% CI: 0.99-1.15, p=0.091).
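The specification described above can be written compactly. The notation is ours and is introduced only for illustration: $Y_i$ is the modelled outcome for respondent $i$, $\text{Post}_i$ and $\text{Exp}_i$ are the pre.post and expansion indicators, and $X_i$ collects the demographic covariates.

```latex
\operatorname{logit}\Pr(Y_i = 1)
  = \beta_0
  + \beta_1\,\text{Post}_i
  + \beta_2\,\text{Exp}_i
  + \beta_3\,(\text{Post}_i \times \text{Exp}_i)
  + X_i^{\top}\gamma,
\qquad
\text{OR}_{\text{DiD}} = e^{\beta_3}
```

An odds ratio below 1 translates into a percentage reduction in odds of $(1 - \text{OR}) \times 100$, so OR = 0.87 corresponds to the reported reduction of approximately 13%.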

Model Summary Table

### Create pre/post indicator ###
BRFSS.srv_design$variables = BRFSS.srv_design$variables |>
   mutate(pre.post = factor(ifelse(iyear %in% c("2014","2015","2016","2017"),
                                   "Post", "Pre"),
                            levels = c("Pre", "Post")))

### Run Model 1: hlthpln1 ###
# With a factor response, quasibinomial() models the second level ('[2] No'),
# so these are the odds of being uninsured.
mod_hlthpln1 = svyglm(hlthpln1 ~
                         pre.post * expansion + # DiD Interaction Term
                         age + sex + racebroad + educa +  # Demographic covariates
                         income2 + employ1 + marital, # Demographic covariates
                      design = BRFSS.srv_design,
                      family = quasibinomial(),
                      na.action=na.omit)

# Model 2: medcost
# Here the modelled level '[2] No' means no cost barrier was reported.
mod_medcost = svyglm(medcost ~
                        pre.post * expansion + # DiD Interaction Term
                        age + sex + racebroad + educa + # Demographic covariates
                        income2 + employ1 + marital, # Demographic covariates
                     design = BRFSS.srv_design,
                     family = quasibinomial(),
                     na.action=na.omit)

# Display model summary tables
tab_model(mod_hlthpln1, mod_medcost, auto.label = FALSE)
  hlthpln1 medcost
Predictors Odds Ratios CI p Odds Ratios CI p
(Intercept) 2.63 2.38 – 2.91 <0.001 1.04 0.95 – 1.14 0.363
pre.postPost 0.68 0.64 – 0.72 <0.001 1.28 1.21 – 1.35 <0.001
expansion[1] Expansion 0.55 0.52 – 0.58 <0.001 1.39 1.32 – 1.47 <0.001
age 0.97 0.97 – 0.97 <0.001 1.02 1.02 – 1.02 <0.001
sex[2] Female 0.77 0.74 – 0.80 <0.001 0.86 0.83 – 0.90 <0.001
racebroad[2] Non-Hispanic Black 1.07 1.01 – 1.13 0.032 0.95 0.90 – 1.00 0.056
racebroad[3] Hispanic 2.28 2.17 – 2.39 <0.001 0.93 0.88 – 0.97 0.001
racebroad[4] Other 0.98 0.89 – 1.08 0.662 1.05 0.97 – 1.15 0.244
educa[1] HS or greater 0.80 0.76 – 0.83 <0.001 0.93 0.90 – 0.97 0.001
income2[3,4] $15,000 to less than $25,000 0.97 0.92 – 1.01 0.154 1.14 1.10 – 1.19 <0.001
income2[5] $25,000 to less than $35,000 0.66 0.61 – 0.71 <0.001 1.59 1.48 – 1.70 <0.001
income2[6] $35,000 to less than $50,000 0.51 0.44 – 0.60 <0.001 1.99 1.72 – 2.31 <0.001
income2[7] $50,000 or more 0.84 0.33 – 2.13 0.711 2.80 1.13 – 6.89 0.025
employ1[1] Employed 1.01 0.96 – 1.05 0.800 1.01 0.97 – 1.06 0.562
marital[1] Married or partnered 1.43 1.36 – 1.50 <0.001 0.72 0.69 – 0.75 <0.001
pre.postPost:expansion[1] Expansion 0.87 0.80 – 0.94 0.001 1.07 0.99 – 1.15 0.091
Observations 286696 286696
R2 / R2 adjusted 0.114 / 0.104 0.028 / 0.018
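Reading the odds ratios in the table requires knowing which outcome level is being modelled. In R, binomial-family models fitted to a factor response treat the first level as failure and the remaining levels as success, so with levels '[1] Yes' and '[2] No' the modelled event is '[2] No'. The sketch below checks this on a toy vector with base glm() rather than svyglm(); it is an illustration, not part of the original analysis.

```r
# Toy outcome with the same level order as hlthpln1: '[1] Yes' first, '[2] No' second
y <- factor(c("[1] Yes", "[1] Yes", "[1] Yes", "[2] No"),
            levels = c("[1] Yes", "[2] No"))

# Intercept-only logistic regression on the factor response
fit <- glm(y ~ 1, family = binomial())

# The fitted probability matches the share of the second level ('[2] No')
p_hat <- unname(plogis(coef(fit)))
c(p_hat = p_hat, share_no = mean(y == "[2] No"))
```

Under this convention the hlthpln1 model describes the odds of being uninsured, and the medcost model describes the odds of not reporting a cost barrier.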


Figure 3: Difference-in-Differences Estimator

### Plot Figure 3: The Difference-in-Differences Estimator ###
# Tidy model results
tidy_hlthpln1 = tidy(mod_hlthpln1)
tidy_medcost  = tidy(mod_medcost)

# Combine results into a single table
results = bind_rows(tidy_hlthpln1 |> mutate(outcome = "hlthpln1"),
                    tidy_medcost  |> mutate(outcome = "medcost")) |>
   mutate(Odds.Ratio = exp(estimate),
          CI_low.95pct = exp(estimate - 1.96*std.error),
          CI_high.95pct = exp(estimate + 1.96*std.error)) |>
   dplyr::select(outcome, term, Odds.Ratio, CI_low.95pct, CI_high.95pct, p.value)


ggplotly(
ggplot(results |> filter(term=="pre.postPost:expansion[1] Expansion"),
       aes(y=outcome, x=Odds.Ratio, color=outcome)) +
   geom_point(size=5) +
   geom_pointrange(aes(xmin=CI_low.95pct, xmax=CI_high.95pct)) +
   geom_vline(xintercept = 1, linetype = "dashed") +
   labs(title = "Figure 3: Difference-in-Differences Estimates",
       x = "Odds Ratio", y = "", color='Outcome') +
   theme_minimal() + scale_color_manual(values=c('red3', 'dodgerblue3'))
)

Analysis

Difference-in-Differences Assumptions

The validity of our difference-in-differences (DiD) estimates relies on several key assumptions that warrant careful examination.
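For reference, the estimator these assumptions apply to can be written out explicitly. The notation below is an assumption introduced for illustration (individual i, state s, year t, covariate vector x); it mirrors the terms in the regression table (pre.post, expansion, their interaction and the demographic controls) rather than reproducing the exact model call.

```latex
\[
\operatorname{logit}\,\Pr(Y_{ist}=1)
  = \beta_0
  + \beta_1\,\mathrm{Post}_t
  + \beta_2\,\mathrm{Expansion}_s
  + \beta_3\,(\mathrm{Post}_t \times \mathrm{Expansion}_s)
  + \mathbf{x}_{ist}^{\top}\boldsymbol{\gamma}
\]

\[
e^{\beta_3}
  = \frac{\mathrm{odds}(Y=1 \mid \text{Expansion},\ \text{Post}) \,/\, \mathrm{odds}(Y=1 \mid \text{Expansion},\ \text{Pre})}
         {\mathrm{odds}(Y=1 \mid \text{Non-expansion},\ \text{Post}) \,/\, \mathrm{odds}(Y=1 \mid \text{Non-expansion},\ \text{Pre})}
\]
```

Holding the covariates fixed, the exponentiated interaction coefficient is a ratio of odds ratios: the pre-to-post change in odds in expansion states relative to the same change in non-expansion states. Each assumption below concerns whether this ratio can be read causally.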

Parallel Trends Assumption
The most critical assumption is that, absent treatment, outcome trends would have been parallel between expansion and non-expansion states. This assumption can be assessed by examining pre-treatment trends.

Assessment Methods: Visual inspection of Figures 1 and 2 provides initial evidence. The pre-period (2011-2013) shows relatively similar patterns, though with notable volatility in 2012 due to substantial missing weight data (69,384 observations). After excluding anomalous 2012 data, 2011 and 2013 observations suggest reasonably comparable pre-trends.

More rigorous approaches would include: (1) formal statistical tests using year-specific treatment indicators to test for differential pre-trends; (2) event study designs estimating year-specific treatment effects where pre-treatment coefficients should be statistically indistinguishable from zero; and (3) multiple pre-treatment period comparisons. However, with only two reliable pre-treatment observation years (2011, 2013), our ability to rigorously test this assumption is severely constrained.
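The first two approaches can be sketched with an event-study specification. The example below is a minimal illustration on simulated data, not the BRFSS analysis file: the data frame `sim`, its columns and the simulated effect sizes are all hypothetical, and an unweighted `glm` stands in for whatever survey-weighted routine the main models use.

```r
# Hypothetical simulated data (NOT the BRFSS file); names are illustrative.
set.seed(42)
n <- 20000
sim <- data.frame(
  year      = sample(2011:2017, n, replace = TRUE),
  expansion = rbinom(n, 1, 0.5)
)

# Simulated truth: no differential pre-trend, and an extra drop in
# uninsurance for expansion states from 2014 onward.
post     <- as.numeric(sim$year >= 2014)
lin_pred <- -0.5 - 0.3 * sim$expansion - 0.4 * post - 0.3 * sim$expansion * post
sim$uninsured <- rbinom(n, 1, plogis(lin_pred))

# Event study: year indicators (reference year 2013) interacted with expansion.
sim$year_f <- relevel(factor(sim$year), ref = "2013")
es_mod <- glm(uninsured ~ year_f * expansion, family = binomial(), data = sim)

# Pre-treatment interaction terms: under parallel trends these log-odds
# coefficients should be statistically indistinguishable from zero.
coefs     <- summary(es_mod)$coefficients
pre_terms <- c("year_f2011:expansion", "year_f2012:expansion")
print(round(coefs[pre_terms, ], 3))
```

With the actual data, 2012 would presumably be dropped or flagged because of the missing weights, which leaves a single pre-treatment contrast (2011 versus 2013). That is why the test has so little power here.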

Stable Unit Treatment Value Assumption (SUTVA)
SUTVA requires that one unit's treatment status does not affect another unit's outcomes (no spillovers) and that the treatment is the same wherever it is applied (treatment homogeneity). Potential violations include geographic spillovers (border-crossing for care), economic spillovers (labor market and provider supply effects), and policy spillovers (non-expansion states responding to neighbors’ expansion). However, since Medicaid eligibility is state-specific and most healthcare is consumed locally, major SUTVA violations are unlikely. Any spillovers would likely attenuate our estimates toward zero.

No Anticipation Effects
The Supreme Court’s 2012 decision made expansion optional, potentially allowing anticipation effects. However, the relatively stable 2013 trends suggest limited pre-treatment behavioral changes. The clear implementation date of January 2014 provides strong treatment timing identification.

Common Shocks Assumption
DiD assumes time-varying shocks affect groups similarly. Major concerns include other ACA provisions (individual mandate, exchange subsidies) taking effect in 2014, differential economic recovery patterns and state-specific policy changes. These concurrent factors may confound our estimates, potentially causing us to under- or overestimate true expansion effects.

Treatment Homogeneity
We assume that the effect of Medicaid expansion is relatively homogeneous across all expansion states, despite potential variation in implementation details, outreach efforts and pre-existing state policies. This assumption of homogeneity may not fully hold, but the large sample size and broad geographic coverage help mitigate concerns about idiosyncratic state-level factors.

Sources of Bias

Selection Bias
States self-selected into expansion based on political, economic and demographic factors. Expansion states were more Democratic-leaning, wealthier, and had different healthcare infrastructures. While we control for individual demographics, we don’t account for state-level political ideology, pre-existing policies or healthcare market characteristics. This represents a major threat to causal inference.

Missing Data Bias
The loss of nearly 70,000 observations from 2012 raises serious concerns. If missingness relates to both expansion status and outcomes, estimates will be biased. The near-complete loss of 2012 eliminates a critical pre-treatment year, severely weakening our ability to assess parallel trends.
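One way to probe this concern is to tabulate missingness by year and expansion status and test for an association within 2012. The sketch below uses simulated data. The data frame `dat`, the indicator `weight_missing` and the missingness rates are hypothetical stand-ins, chosen only to show the shape of the check.

```r
# Hypothetical respondent-level data (NOT the BRFSS file); names are illustrative.
set.seed(1)
n <- 5000
dat <- data.frame(
  year      = sample(c(2011, 2012, 2013), n, replace = TRUE),
  expansion = rbinom(n, 1, 0.5)
)

# Simulate survey weights that are mostly missing in 2012,
# independently of expansion status (an assumption of this toy example).
p_miss <- ifelse(dat$year == 2012, 0.9, 0.02)
dat$weight_missing <- rbinom(n, 1, p_miss)

# Share of observations with a missing weight, by year and expansion status.
miss_rate <- aggregate(weight_missing ~ year + expansion, data = dat, FUN = mean)
print(miss_rate)

# Within 2012, test whether missingness differs by expansion status.
d2012 <- dat[dat$year == 2012, ]
tab   <- table(expansion = d2012$expansion, missing = d2012$weight_missing)
print(chisq.test(tab))
```

A non-significant test would speak only to the link between missingness and expansion status. It would not rule out missingness that depends on the outcomes themselves.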

Measurement Error
Self-reported insurance coverage may be subject to recall bias or misunderstanding. The binary medcost variable captures only complete cost barriers, missing partial barriers like delayed care or medication non-coverage. These measurement limitations may explain why we observe significant coverage effects but null cost-barrier effects.

Temporal Confounding
The individual mandate, exchange subsidies and economic recovery all coincided with Medicaid expansion, making it difficult to isolate expansion effects from these concurrent changes affecting all states.

Compositional Changes
Changes in survey response patterns over time, migration between states or differential attrition may bias estimates if newly insured individuals differ systematically from long-term insured populations.

Interpretation of Results

The DiD analysis reveals a statistically significant policy effect on insurance coverage but no statistically significant effect on cost-related access barriers; the two estimates also point in different directions.

Health Insurance Coverage (hlthpln1): The interaction term (pre.post × expansion) yields an odds ratio of 0.87 (95% CI: 0.80-0.94, p=0.001), indicating that Medicaid expansion was associated with a statistically significant reduction in the odds of lacking health insurance coverage. In practical terms, this translates to expansion states experiencing approximately a 13% greater reduction in the odds of being uninsured compared to non-expansion states after 2014, after controlling for demographic factors. The weighted time-series visualization (Figure 2) corroborates this finding, showing that expansion states experienced a steeper increase in health insurance coverage rates following 2014 compared to non-expansion states. The confidence intervals become tighter in the post-period, reflecting increased precision in our estimates as coverage rates stabilize.
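The arithmetic behind the 13% figure can be checked directly from the reported estimate. The sketch below takes the odds ratio and confidence interval from the regression table. The baseline uninsured probability `p0` is a hypothetical value, included only to show that a 13% reduction in odds is not a 13 percentage point reduction in the uninsured rate.

```r
# Reported DiD estimate for hlthpln1 (values taken from the regression table).
or_did  <- 0.87
ci_low  <- 0.80
ci_high <- 0.94

# Percent change in the odds implied by the odds ratio.
pct_change_odds <- (or_did - 1) * 100

# Back out an approximate standard error on the log-odds scale from the
# Wald interval, then an approximate z statistic and two-sided p-value.
# Inputs are rounded to two decimals, so these are rough checks only.
se_log_or <- (log(ci_high) - log(ci_low)) / (2 * 1.96)
z_stat    <- log(or_did) / se_log_or
p_approx  <- 2 * pnorm(-abs(z_stat))

# HYPOTHETICAL baseline uninsured probability (an assumption for illustration).
p0    <- 0.30
odds1 <- (p0 / (1 - p0)) * or_did
p1    <- odds1 / (1 + odds1)

cat(sprintf("Change in odds: %.0f%%\n", pct_change_odds))
cat(sprintf("Approx. z = %.2f, approx. p = %.4f\n", z_stat, p_approx))
cat(sprintf("Baseline probability %.2f -> %.3f under OR = %.2f\n", p0, p1, or_did))
```

Under the assumed 30% baseline, the odds ratio corresponds to an uninsured probability of roughly 27%, a change of about three percentage points. The absolute effect at the study's true baseline would differ.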

Cost-Related Access Barriers (medcost): For the outcome measuring whether respondents could not see a doctor because of cost, the DiD estimate yields an odds ratio of 1.07 (95% CI: 0.99-1.15, p=0.091). This result is not statistically significant at the conventional 0.05 level, suggesting that Medicaid expansion did not produce a detectable differential effect on cost-related barriers to care between expansion and non-expansion states.

This null finding is somewhat surprising given that expansion increased insurance coverage. Several explanations are plausible: 1) the time period examined may be too short to detect changes in healthcare-seeking behavior; 2) other factors affecting affordability such as plan deductibles, copayments and provider networks may have offset the coverage gains; 3) the measure captures only complete barriers to care (not seeing a doctor at all) rather than partial barriers; or 4) baseline differences in healthcare costs and availability between expansion and non-expansion states may obscure treatment effects.

Demographic Covariates: The models reveal important demographic patterns in coverage and access. Age is associated with both higher insurance coverage and higher cost-related barriers (OR=0.97 and OR=1.02 respectively per year), reflecting Medicare eligibility at age 65 and accumulated health needs. Female respondents have lower odds of both lacking insurance and experiencing cost barriers. Hispanic respondents show notably higher odds of lacking insurance (OR=2.28) but lower odds of cost-related access problems (OR=0.93), suggesting complex patterns of coverage and healthcare utilization. Higher income and education levels are associated with better insurance coverage, although the estimates for the $50,000 or more category are imprecise (wide confidence intervals in both models). Counterintuitively, higher income is associated with greater reported cost barriers, possibly reflecting different thresholds for what constitutes prohibitive costs or different patterns of healthcare utilization.


Conclusion

This analysis provides evidence that the 2014 Medicaid expansion significantly increased health insurance coverage rates in states that adopted the policy. Using a difference-in-differences methodology with survey-weighted regression models and controlling for demographic factors, we found that expansion states experienced approximately a 13% greater reduction in the odds of being uninsured compared to non-expansion states after implementation.

However, the expansion’s impact on cost-related barriers to medical care was not statistically significant during the 2014-2017 period. This suggests that while Medicaid expansion successfully extended insurance coverage to low-income adults, the translation of coverage into improved healthcare access and affordability may be more complex and potentially require longer time horizons to manifest fully.

Several limitations merit consideration. First, the parallel trends assumption, while reasonably supported by the available data, could not be rigorously tested due to limited pre-period observations and data quality issues in 2012. Second, the binary nature of our outcome measures may not fully capture the nuanced gradations of insurance adequacy and financial barriers to care. Third, states that expanded Medicaid may differ from non-expansion states in ways not fully captured by our demographic controls, potentially confounding the treatment effect estimates.

Despite these limitations, this study contributes to the growing evidence base demonstrating that Medicaid expansion achieved its primary objective of extending health insurance coverage to vulnerable populations. The findings remain highly relevant for states considering expansion and for ongoing policy debates regarding the future of Medicaid. Future research should examine longer-term effects on healthcare utilization, health outcomes and financial well-being, as well as potential heterogeneous treatment effects across different population subgroups and state contexts.

The evidence presented here supports the conclusion that Medicaid expansion was an effective policy tool for reducing uninsurance rates, representing a significant step toward universal health coverage in the United States. However, ensuring that coverage translates into meaningful access to affordable, high-quality healthcare remains an ongoing challenge requiring continued policy attention and innovation.



Analysis performed in R (v.4.5.1) and RStudio 2025.05.1+513 “Mariposa Orchid” Release (ab7c1bc795c7dcff8f26215b832a3649a19fc16c, 2025-06-01) for Windows.