Literary Cafe Series: Patterns and costs associated with GLP1-RA among US adults with type 2 diabetes (Gettings data from MEPS)

Mark Bounthavong

2026-01-30

Introduction and Motivating Example

I occasionally facilitate an informal meeting with students to discuss a biomedical paper and its merits. We pick at the details and discuss its impact on policy and practice. Oftentimes, we go on a tangent and explore other complementary ideas and explore higher-level themes associated with healthcare. But, ultimately, our conversation always returns to the paper, and we find ourselves wishing that we could get the data and explore its contents. I dub these meetings our “Literary Cafe” because it occurs in the morning at a local coffee house. (Note: I enjoy my coffee when I read literature.)

In our recent Literary Cafe, we reviewed a paper by Wu and colleagued titled [1], “Patterns and costs associated with glucagon-like peptide 1 receptor agonist use in US adults with type 2 diabetes.” The article was published in the Journal of Managed Care and Specialty Pharmacy (JMCP) and investigated the healthcare costs and resource utilization among US adults with type 2 diabetes who received (or did not receive) a glucagon-like peptide 1 receptor agonist (GLP1-RA).

There has been a lot of papers on GLP1-RAs due to its effectiveness at managing metabolic outcomes, reducing body weight, and cardiovascular benefits. Additionally, there have been a dearth fo studies examining its costs and cost-effectiveness due to the high-cost of these agents.

The paper by Wu and colleagues uses publicly available data from the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS) to compare the healthcare costs and resource utilization between adults with type 2 diabetes with and without a GLP1-RA. I was drawn to this study because we can validate the numbers in the paper by downloading and applying the inclusion/exclusion criteria of the study, which are provided in the Methods section. Assuming that the paper contains enough details, we should be able to replicate the numbers in the paper (or compe reasonably close).

Hence, this article will be about my attempt to replicat the study’s findings by useing the same data the authors use. I’ll divide this into two (maybe three) parts.

Objectives

In part 1, I’ll demonstrate how to get the same number of responders used in the study using MEPS data.

In part 2, I’ll use the statsistical methods in the paper to replicate the findings.

Part 1: Reproduce the sample cohort

In part 1, I’ll demonstrate how we can use the same publicly available data and reproduce the sample used in the paper by Wu and colleagues.

Wu and colleagues provided a flow diagram (attrition chart) that illustrates their step-by-step process for generating their analytic sample. Our objective is to reproduce these number. We’re going to focus on the unweighted numbers instead of the weighted numbers. You’ll notice in the paper that both the unweighted and weighted numbers are presented. Since MEPS is a complex survey design, it’s necessary to apply the appropriate weights to generate the representative US population. However for this example we’re only interested in the unweighted numbers in the flow diagram, so, I only listed the unweighted numbers in the flow diagram below.

Flow diagran recreated from Wu and colleagues [1]

Flow diagran recreated from Wu and colleagues [1]

Data Source

We will need to get the MEPS Full Year Consolidated File for 2021 and 2022.

To do that, we will need to install and load the MEPS package from the MEPS GitHub site.

While we load the MEPS package, we will also load the other packages for this exercise.

#### To install the `MEPS` package, you'll need to have the `devtools` package:
library("devtools")
install_github("e-mitchell/meps_r_pkg/MEPS")
## vctrs   (0.7.0   -> 0.7.1) [CRAN]
## bit     (4.5.0.1 -> 4.6.0) [CRAN]
## hms     (1.1.3   -> 1.1.4) [CRAN]
## vroom   (1.6.5   -> 1.7.0) [CRAN]
## readr   (2.1.5   -> 2.1.6) [CRAN]
## forcats (1.0.0   -> 1.0.1) [CRAN]
## readxl  (1.4.4   -> 1.4.5) [CRAN]
## haven   (2.5.4   -> 2.5.5) [CRAN]
## 
## The downloaded binary packages are in
##  /var/folders/c1/pcc3tx0x1cxdbdhs48h7whyr0000gn/T//Rtmp0UZweQ/downloaded_packages
## ── R CMD build ─────────────────────────────────────────────────────────────────
##      checking for file ‘/private/var/folders/c1/pcc3tx0x1cxdbdhs48h7whyr0000gn/T/Rtmp0UZweQ/remotes128fb38c02574/e-mitchell-meps_r_pkg-9133161/MEPS/DESCRIPTION’ ...  ✔  checking for file ‘/private/var/folders/c1/pcc3tx0x1cxdbdhs48h7whyr0000gn/T/Rtmp0UZweQ/remotes128fb38c02574/e-mitchell-meps_r_pkg-9133161/MEPS/DESCRIPTION’
##   ─  preparing ‘MEPS’:
##    checking DESCRIPTION meta-information ...  ✔  checking DESCRIPTION meta-information
##   ─  checking for LF line-endings in source and make files and shell scripts
##   ─  checking for empty or unneeded directories
##   ─  building ‘MEPS_1.4.0.tar.gz’
##      
## 
library("MEPS") 


#### Load other libraries
library("survey")
library("foreign")
library("tidyverse")
library("psych")
library("haven")
library("gmodels")

Clean and Prepare Data

Once we’ve loaded the MEPS package, we can start cleaning the data from MEPS.

The next code chunk downloads the 2021 Full Year Consolidated Data File from MEPS and performs some data cleaning functions. In this process, we will also download the Medical Conditions File and the Prescribed Medicines File.

At the end of this code chunk, we will have the following dataframe:

  • hc2021x
  • cond2021
  • rx2021
###################### 2021 data #####################
#### Load data from AHRQ MEPS website
hc2021 <- read_MEPS(file = "h233") # Full-year consolidated file
cond2021 <- read_MEPS(file = "h231") # Medical conditions file
rx2021 <- read_MEPS(file = "h229a") # Prescribed medicines file

## Change column names to lowercase
names(hc2021) <- tolower(names(hc2021))
names(cond2021) <- tolower(names(cond2021))
names(rx2021) <- tolower(names(rx2021))

#### Keep only those variables of interest
hc2021x <- hc2021 %>%
  select(dupersid, totexp21, totslf21, totprv21, totmcr21, totmcd21, totva21, rxexp21, obvexp21,optexp21, ertexp21, iptexp21, hhaexp21, hhnexp21, opsexp21, perwt21f, varpsu, varstr, age21x, sex, racethx, panel)

# Rename variables
hc2021x <- hc2021x %>%
  rename(totexp = totexp21,
         slf = totslf21, 
         prv = totprv21, 
         mcr = totmcr21, 
         mcd = totmcd21, 
         va = totva21,
         rxexp = rxexp21,
         obvexp = obvexp21,
         optexp = optexp21,
         ertexp = ertexp21,
         iptexp = iptexp21, 
         hhaexp = hhaexp21, 
         hhnexp = hhnexp21, 
         opsexp = opsexp21,
         perwtf = perwt21f,
         age = age21x)

# Add year
hc2021x$year <- c(2021)

Identify the responders with diabetes

Once the data cleaning is completed, we will need to use the Medical Conditions File to identify responders who have a diagnosis of type 2 diabetes.

We’ll take advantage of the Clinical Classification Software Refined (CCSR), which contains groupings of ICD-10 codes for diabetes-related categories. These include:

  • END002 denotes diabetes mellitus without complication
  • END003 denotes diabetes mellitus with complication
  • END005 denotes diabetes mellitus, Type 2
## Diabetes filter
pattern <- c("END002", "END003", "END005") # Combine the CCSR codes
combined_pattern <- paste(pattern, collapse = "|") # adding an OR operator

In the Medical Conditions File, there are up to three places where CCSR codes are listed: ccsr1x, ccsr2x, and ccsr3x. We want to query each of these to see if any of the diabetes-related codes are present.

diabetes2021 <- cond2021 %>% 
  unite("all_CCSR", ccsr1x:ccsr3x, remove = FALSE) %>% 
  filter(grepl(combined_pattern, all_CCSR))

Once we locate the diabetes-related CCSR codes, we need to create a new indicate variable for diabetes called diabetes in the dataframe. We will code this as 1 = diabetes and 0 = no diabetes.

diabetes2021$diabetes <- 1 # Add an indicator for diabetes

Then, we will need to ensure that each row is a unique responder (we want to eliminate duplicates).

diabetes2021_unique <- diabetes2021 %>%
  distinct(dupersid, perwt21f, varpsu, varstr, .keep_all = TRUE) # Keep unique subjects

diabetes2021_sub <- diabetes2021_unique %>%
    select(dupersid, diabetes, perwt21f, varpsu, varstr) # Select only necessary variables

diabetes2021_sub <- diabetes2021_sub %>%
  rename(perwtf = perwt21f) # Rename variables

diabetes2021_sub <- zap_labels(diabetes2021_sub) # Remove any labels

Next, we want to merge the hc2021x dataframe with the diabetes2021_sub dataframe. The hc2021x dataframe contains the demographic and outcome data, and the diabetes2021_sub dataframe contains information on the diabetes == 1 indicator.

We will use the left_join() function to do the merge, and we will merge using the dupersid, perwtf, varpsu, and varstr.

# Merge data - LEFT JOIN
hc2021x_t2dm <- left_join(hc2021x, diabetes2021_sub, by = c("dupersid", "perwtf", "varpsu", "varstr"))

This complete the data cleaning for 2021. Now, we repeat the process for 2022.

Repeat Data Cleaning and Preparation for 2022 data

##################### 2022 data #####################
#### Load data from AHRQ MEPS website
## 2022
hc2022 <- read_MEPS(file = "h243") # Full-year consolidated file
cond2022 <- read_MEPS(file = "h241") # Medical Conditions File
rx2022 <- read_MEPS(file = "h239a") # Prescribed Medicines File

## Change column names to lowercase
names(hc2022) <- tolower(names(hc2022))
names(cond2022) <- tolower(names(cond2022))
names(rx2022) <- tolower(names(rx2022))

# Keep only the variables of interest
hc2022x <- hc2022 %>%
  select(dupersid, totexp22, totslf22, totprv22, totmcr22, totmcd22, totva22, rxexp22, obvexp22, optexp22, ertexp22, iptexp22, hhaexp22, hhnexp22, opsexp22, perwt22f, varpsu, varstr, age22x, sex, racethx, panel)

# Rename variables
hc2022x <- hc2022x %>%
  rename(totexp = totexp22,
         slf = totslf22, 
         prv = totprv22, 
         mcr = totmcr22, 
         mcd = totmcd22, 
         va = totva22,
         rxexp = rxexp22,
         obvexp = obvexp22,
         optexp = optexp22,
         ertexp = ertexp22,
         iptexp = iptexp22, 
         hhaexp = hhaexp22, 
         hhnexp = hhnexp22, 
         opsexp = opsexp22,
         perwtf = perwt22f,
         age = age22x)

# Add year
hc2022x$year <- c(2022)

## Diabetes filter
pattern <- c("END002", "END003", "END005") # Combine the CCSR codes
combined_pattern <- paste(pattern, collapse = "|") # adding an OR operator

diabetes2022 <- cond2022 %>% 
  unite("all_CCSR", ccsr1x:ccsr3x, remove = FALSE) %>% 
  filter(grepl(combined_pattern, all_CCSR))

diabetes2022$diabetes <- 1 # Add an indicator for diabetes

diabetes2022_unique <- diabetes2022 %>%
  distinct(dupersid, perwt22f, varpsu, varstr, .keep_all = TRUE) # Keep unique subjects

diabetes2022_sub <- diabetes2022_unique %>%
  select(dupersid, diabetes, perwt22f, varpsu, varstr) # Select only necessary variables

diabetes2022_sub <- diabetes2022_sub %>%
  rename(perwtf = perwt22f) # Rename variables

diabetes2022_sub <- zap_labels(diabetes2022_sub) # Remove any labels

# Merge data - LEFT JOIN
hc2022x_t2dm <- left_join(hc2022x, diabetes2022_sub, by = c("dupersid", "perwtf", "varpsu", "varstr"))

Append data

Once we have cleaned and prepared the data for 2021 and 2022, we need to append the data together.

First, we’ll need to remove the labels in the dataframe. Sometimes, the labels may create errors when you append two dataframes together. We will use the zap_labels() function from the haven package.

Second, we will use the rbind() package to append the 2021 and 2022 dataframes.

############################ APPEND ############################ 
## Remove label due to errors
hc2021x_t2dm <- zap_labels(hc2021x_t2dm)
hc2022x_t2dm <- zap_labels(hc2022x_t2dm)


## Append 2021 and 2022 data
hc <- rbind(hc2021x_t2dm, hc2022x_t2dm)
count(hc)
## # A tibble: 1 × 1
##       n
##   <int>
## 1 50767

Filter sample

Wu and colleagues perform some filtering operation to focus on respsonders with a positive person weight (perwtf), are in panels 23 or 25 if they are from the 2021 data, are in panels 24, 26, or 27 if they are from the 2022 data, adults (>=18 years), and has diabetes.

After we restrict the sample to those with positive perwtf, then the sample size is 49,079, which is the same number in the first box of the flow diagram (or attrition chart).

############################ FILTER ############################ 

#### Keep if weight is positive
hc_positive <- subset(hc, perwtf > 0)
count(hc_positive)
## # A tibble: 1 × 1
##       n
##   <int>
## 1 49079

Then, we limit the sample to those who are in the panel 23 or 25 for 2021 and panels 24, 26, or 27 for 2022. This will yield a sample size of 34,411 responders, which is the value in the second box of the flow diagra (or attrition chart).

######################## SUBSET - FILTER ######################## 
#### Subset panels for 2021 and 2022 (Independence)
hc_panel <- hc_positive %>%
  filter(((panel == 23 | panel == 25) & year == 2021) | 
         ((panel == 24 | panel == 26 | panel == 27) & year == 2022))
count(hc_panel)
## # A tibble: 1 × 1
##       n
##   <int>
## 1 34411

Next, we restrict the sample to responders who are 18 years and older. This will yield a sample size of 27,460 responders, which is the third box in the flow diagram (or attrition chart).

## Keep if age >= 18 years
hc_age <- subset(hc_panel, age >= 18)
count(hc_age)
## # A tibble: 1 × 1
##       n
##   <int>
## 1 27460

Then, we will look at the subset of the sammple with diabetes. This should yield a sample size of 3587 responders, which is the fourth box of the flow diagram (or attrition chart). We will end up with the hc_diabetes dataframe.

## Keep if diabetes == 1
hc_diabetes <- subset(hc_age, diabetes == 1)
count(hc_diabetes)
## # A tibble: 1 × 1
##       n
##   <int>
## 1  3587

Identify Responders with GLP1-RA prescription

In this part of the article, we will merge the Prescribed Medicines File with the hc_diabetes dataframe.

First, we identify the responder on a GLP1-RA. We will take advantage of the Multum Lexicon Variables format to identify the incretim mimetic, which includes GLP1-RAs. To do this, we use the tc1s1_1 == 373 variable. To see the list of other types of categories the tc1s1_1 variable contains, we will need to view the data dictionary (link). We do this for 2021 and 2022 and names these dataframes as rx2021_glp and rx2022_glp.

######################## DRUGS FILTER ######################## 
## Identify subjects on GLP1-RA in 2021
rx2021_glp <- rx2021 %>% 
  filter(tc1s1_1 == 373) %>%
  select(dupersid, perwt21f, varpsu, varstr, tc1s1_1) %>%
  distinct(dupersid, perwt21f, varpsu, varstr, tc1s1_1) %>%
  rename(perwtf = perwt21f)

## Identify subjects on GLP1-RA in 2022
rx2022_glp <- rx2022 %>% 
  filter(tc1s1_1 == 373) %>%
  select(dupersid, perwt22f, varpsu, varstr, tc1s1_1) %>%
  distinct(dupersid, perwt22f, varpsu, varstr, tc1s1_1) %>%
  rename(perwtf = perwt22f)

Second, we need to combine the rx2021_glp and rx2022_glp dataframes together.

## Append GLP1-RA data
rx_comb <- rbind(rx2021_glp, rx2022_glp) 

Third, we create an indicator for GLP1-RA called glp == 1.

## Add indicator for GLP1-RA
rx_comb$glp <- 1

Fourth, we filter to only the unique responders by removing the duplicates. We call this new dataframe rx_unique. .

## Filter uniques
rx_unique <- rx_comb %>%
  distinct(dupersid, perwtf, varpsu, varstr, glp)

Fifth, we merge rx_unique with the hc_diabetes dataframe using the dupersid, perwtf, varpsu, and varstr variables.

## Merge to the main dataframe
data_merge <- left_join(hc_diabetes, rx_unique, by = c("dupersid", "perwtf", "varpsu", "varstr"))

Last, we will finalize the indicator variable for GLP1-RA. This will yield a sample size of 637 for GLP1-RA and 2950 for no-GLP1-RA.

## GLP1-RA indicator variable
table(data_merge$glp)
## 
##   1 
## 637
data_merge$glp[is.na(data_merge$glp)] <- 0
table(data_merge$glp)
## 
##    0    1 
## 2950  637

Conclusions

By following the inclusion and exclusion criteria from the paper by Wu and colleagues, we can create the sample that they used for their study. In the next article, we will attempt to replicate the findings from the study.

References

  1. Wu J, Perez A, Sullivan PW. Patterns and costs associated with glucagon-like peptide-1 receptor agonist use in US adults with type 2 diabetes. J Manag Care Spec Pharm. 2025 Oct;31(10):1029-1038. doi: 10.18553/jmcp.2025.31.10.1029. PMID: 41004212; PMCID: PMC12467763.

Disclosures and Disclaimers

This is a work in progress; thus, I may update this in the future.

Any errors or mistakes are my fault, and I’ll try my best to fix them.

Lastly, this is for educational purposes only.