#getwd()

Centers for Medicare & Medicaid Services (CMS)

Center for Medicare and Medicaid Innovation (The Innovation Center) dataset
https://data.cms.gov/Special-Programs-Initiatives-Speed-Adoption-of-Bes/CMS-Innovation-Center-Model-Participants/x8pc-u7ta

An analysis of CMS Innovation Center’s Model Participants.

To set the context, Medicare is a federal health insurance program primarily for people ages 65 and older, for younger people with disabilities, or for people with end stage renal disease (ESRD). Over 60 million people use Medicare each year, and Medicare spending accounted for 15% of total federal spending in 2018, amounting to $731 billion [^1]. The image below indicates Medicare’s portion of the federal budget for 2018.

Medicare2018.

Medicare2018.

CMS is the single largest payer for health care in the United States [^2] and CMS leadership takes this role in overall health care services seriously. CMS influences the use of best practices, prevention initiatives, treatment choices, overall harm-free care, and patient follow-up across providers and insurance payers because of its major role in the industry. CMS’ decisions impact each person’s health care experience, whether they are aware of this or not.

For part of my job, I support one of the CMS Innovation Center’s voluntary payment model tests called Bundled Payments for Care Improvement (BPCI) Advanced. The Innovation Center tests payment and care delivery models like this to evaluate incentives for improving quality care and lowering healthcare costs delivered to Medicare beneficiaries. In this project, I explore the participants (i.e., providers, hospitals) in Innovation Center models, especially participants in BPCI Advanced. Let’s begin.
[^1]: https://www.kff.org/medicare/issue-brief/the-facts-on-medicare-spending-and-financing/
[^2]: https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/QualityInitiativesGenInfo/Downloads/RoadmapOverview_OEA_1-16.pdf

Install packages and dataset.

Before importing the dataset into R, I did some cleaning in Excel.

  • I deleted columns with all blanks.
    • Phase 1
    • MSA Name
    • Map Display
  • I edited the record for Unique ID #7437 (Baptist Cancer Center - Tipton) to remove artifacts in the organization name.
  • I renamed the dataset “CMSdata”.

I will use functions from the dplyr package like filter(), group_by(), and summarize() to manipulate the data.

#install.packages("tidyverse")
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ----------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)
library(ggplot2)
allparticipants<-read_csv("CMSdata.csv") #this pulls my csv file from my working directory
## Parsed with column specification:
## cols(
##   `Name of Initiative` = col_character(),
##   `Organization Name` = col_character(),
##   Notes = col_character(),
##   `Location 1` = col_character(),
##   `Street Address` = col_character(),
##   City = col_character(),
##   State = col_character(),
##   `State Based` = col_logical(),
##   `Phase 2` = col_character(),
##   Facebook = col_character(),
##   Twitter = col_character(),
##   Youtube = col_character(),
##   Website = col_character(),
##   Category = col_character(),
##   `Unique ID` = col_double()
## )
dim(allparticipants) #this gives the dimensions of the dataset which is the number of rows and columns
## [1] 7886   15

This dataset has 7,886 records with 15 variables.

summary(allparticipants) #get a summary of the dataset
##  Name of Initiative Organization Name     Notes            Location 1       
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Street Address         City              State           State Based    
##  Length:7886        Length:7886        Length:7886        Mode :logical  
##  Class :character   Class :character   Class :character   FALSE:315      
##  Mode  :character   Mode  :character   Mode  :character   TRUE :1        
##                                                           NA's :7570     
##                                                                          
##                                                                          
##    Phase 2            Facebook           Twitter            Youtube         
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Website            Category           Unique ID   
##  Length:7886        Length:7886        Min.   :   1  
##  Class :character   Class :character   1st Qu.:1973  
##  Mode  :character   Mode  :character   Median :3970  
##                                        Mean   :3976  
##                                        3rd Qu.:5981  
##                                        Max.   :7976

The Notes column was a field of free text which had content for only a few of the initiatives. I cleaned up the Notes column using OpenRefine which is an open source application for data wrangling. I extracted the number of clinical episodes each participant selected in the BPCI initiatives into a separate column.

I used OpenRefine to:

  • Filter to select BPCI Initiatives
    • On Notes column - Edit column - Split column Notes into several columns
    • By separator
    • Separator :
    • Guess cell Type - checked
  • On Notes 2 column - Edit column - Split column Notes 2 into several columns
    • By separator
    • Separator //
  • Delete Notes 3 column
  • On Notes 2 column - Edit column - Split column Notes 2 several columns
    • By separator
    • Separator //
  • Delete Notes 2 column and Notes 2 2 column
  • Rename the column “Number of Clinical Episodes”
  • Rename the first column “Initiative”
  • Remove the BPCI Initiatives filter
  • Export the dataset and name it as “CMSdata2”

Pull the refined dataset into R.

allparticipants <- read_csv("CMSdata2.csv")
## Parsed with column specification:
## cols(
##   Initiative = col_character(),
##   `Organization Name` = col_character(),
##   `Number of Clinical Episodes` = col_double(),
##   `Location 1` = col_character(),
##   `Street Address` = col_character(),
##   City = col_character(),
##   State = col_character(),
##   `State Based` = col_logical(),
##   `Phase 2` = col_logical(),
##   Facebook = col_character(),
##   Twitter = col_character(),
##   Youtube = col_character(),
##   Website = col_character(),
##   Category = col_character(),
##   `Unique ID` = col_double()
## )
## Warning: 979 parsing failures.
##  row     col           expected                                                    actual           file
## 1889 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1890 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1891 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1892 Phase 2 1/0/T/F/TRUE/FALSE Major bowel procedure                                     'CMSdata2.csv'
## 1893 Phase 2 1/0/T/F/TRUE/FALSE Chronic obstructive pulmonary disease, bronchitis, asthma 'CMSdata2.csv'
## .... ....... .................. ......................................................... ..............
## See problems(...) for more details.

R is returning a warning note for 979 parsing failures in the column called “Phase 2”. These rows are the only records with values (strings) in this field. R is trying to read this column as a logical value. I will use the mutate command with as.character to convert these values to characters to dismiss this problem.

allparticipants <- allparticipants %>%
  mutate_if(is.logical, as.character) #this changes all the columns/fields in the dataframe from logical (if applicable) to character

Make all headings (column names) lowercase. Remove the spaces between words in the headings. Then look at the data with head command.

names(allparticipants) <- tolower(names(allparticipants)) #makes the column headings lowercase
names(allparticipants) <- gsub(" ","",names(allparticipants)) #removes the space in the headings, if applicable.
head(allparticipants)
## # A tibble: 6 x 15
##   initiative organizationname numberofclinica~ location1 streetaddress city 
##   <chr>      <chr>                       <dbl> <chr>     <chr>         <chr>
## 1 Comprehen~ Brown Medicine ~               NA 375 Wamp~ 375 Wampanoa~ East~
## 2 Comprehen~ Brown Medicine ~               NA 375 Wamp~ 375 Wampanoa~ East~
## 3 Comprehen~ Brown Medicine ~               NA 375 Wamp~ 375 Wampanoa~ East~
## 4 Comprehen~ Brown Medicine ~               NA 909 Nort~ 909 North Ma~ Prov~
## 5 Comprehen~ Coastal Medical~               NA 11445 Wa~ 11445 Wampan~ East~
## 6 Comprehen~ LPG Primary Car~               NA 50 Memor~ 50 Memorial ~ Newp~
## # ... with 9 more variables: state <chr>, statebased <chr>, phase2 <chr>,
## #   facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## #   category <chr>, uniqueid <dbl>

Look at the summary of the dataset again.

summary(allparticipants)
##   initiative        organizationname   numberofclinicalepisodes
##  Length:7886        Length:7886        Min.   : 1.000          
##  Class :character   Class :character   1st Qu.: 2.000          
##  Mode  :character   Mode  :character   Median : 4.000          
##                                        Mean   : 7.023          
##                                        3rd Qu.: 9.000          
##                                        Max.   :47.000          
##                                        NA's   :5612            
##   location1         streetaddress          city              state          
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   statebased           phase2            facebook           twitter         
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    youtube            website            category            uniqueid   
##  Length:7886        Length:7886        Length:7886        Min.   :   1  
##  Class :character   Class :character   Class :character   1st Qu.:1973  
##  Mode  :character   Mode  :character   Mode  :character   Median :3970  
##                                                           Mean   :3976  
##                                                           3rd Qu.:5981  
##                                                           Max.   :7976  
## 

Now I have a quantitative variable for the number of clinical episodes column.

Group the dataframe by multiple columns with mean number of clinical episodes.

I want to see the mean number of clinical episodes for each of the BPCI models.

#this says to group the df by the initiative column then
#take the variable called numberofclinicalepisodes and apply the function mean and remove the nas when calculating
#then I filtered out the nas so that I only had the initiatives with clinical episodes selected.
#I assigned that summarization to something called bpcimodels.
bpcimodels <- summarise_at(group_by(allparticipants, initiative),vars(numberofclinicalepisodes),funs(mean(.,na.rm=TRUE))) %>%
  filter(!is.na(numberofclinicalepisodes)) 
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
#filter here says to take all rows that do not have an 'na' value in the numberofclinicalepisodes variable.
#that leaves me with the 4 BPCI models with their mean number of clinical episodes selected.
bpcimodels #this displays the df in a tibble
## # A tibble: 4 x 2
##   initiative               numberofclinicalepisodes
##   <chr>                                       <dbl>
## 1 BPCI Advanced                                5.95
## 2 BPCI Initiative: Model 2                     5.43
## 3 BPCI Initiative: Model 3                    10.5 
## 4 BPCI Initiative: Model 4                     6.5

Let’s try a plot.

I’ll look at the average number of clinical episodes that participants chose within each iteration of the BPCI models.
#install.packages("plotly")
#install.packages("ggthemes")
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggthemes)
bpciepisodes <- bpcimodels %>% #this uses the small dataframe of just the bpci models
  ggplot(aes(x=initiative, y=numberofclinicalepisodes, fill=initiative)) +
  geom_bar(stat="identity") +
  theme_stata() +
  labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative") 
ggplotly(bpciepisodes) #using ggplotly makes the labels appear when you hover

Let’s try another plot.

First I will get some statistics on the variation.
#install.packages("rcompanion") # this has the groupwiseMean command which I use to get the statistics across all the BPCI participants
library(rcompanion)
bpciparticipants <- allparticipants %>%
  filter(!is.na(numberofclinicalepisodes))
Sum <- groupwiseMean(data=bpciparticipants, var="numberofclinicalepisodes", group="initiative",conf=0.95,digits=3)
Sum
##                 initiative    n  Mean Conf.level Trad.lower Trad.upper
## 1            BPCI Advanced 1295  5.95       0.95       5.63       6.27
## 2 BPCI Initiative: Model 2  402  5.43       0.95       4.83       6.03
## 3 BPCI Initiative: Model 3  575 10.50       0.95       9.65      11.40
## 4 BPCI Initiative: Model 4    2  6.50       0.95     -63.40      76.40
summary(Sum)
##   initiative              n               Mean          Conf.level  
##  Length:4           Min.   :   2.0   Min.   : 5.430   Min.   :0.95  
##  Class :character   1st Qu.: 302.0   1st Qu.: 5.820   1st Qu.:0.95  
##  Mode  :character   Median : 488.5   Median : 6.225   Median :0.95  
##                     Mean   : 568.5   Mean   : 7.095   Mean   :0.95  
##                     3rd Qu.: 755.0   3rd Qu.: 7.500   3rd Qu.:0.95  
##                     Max.   :1295.0   Max.   :10.500   Max.   :0.95  
##    Trad.lower        Trad.upper    
##  Min.   :-63.400   Min.   : 6.030  
##  1st Qu.:-12.227   1st Qu.: 6.210  
##  Median :  5.230   Median : 8.835  
##  Mean   :-10.822   Mean   :25.025  
##  3rd Qu.:  6.635   3rd Qu.:27.650  
##  Max.   :  9.650   Max.   :76.400

trying to plot

std error is for your sample std deviation is for your population

Sum %>% ggplot(aes(x=initiative, y=Mean,fill=initiative)) +
  geom_bar(stat="identity") +
  geom_errorbar(aes(ymin=Trad.lower, ymax=Trad.upper,width=0.15))

I need to remove BPCI Initiative: Model 4 since it is skewing everything.

#install.packages(data.table) # I tried an outlier.replace version first which did not work well for me.
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
#Sum %>% outlier.replace(Sum,"numberofclinicalepisodes", which(numberofclinicalepisodes < 3), newValue=NA) 
#Sum %>% filter(!is.na(numberofclinicalepisodes))
statson3models <- Sum %>% filter(n >= 3) %>% ggplot(aes(x=initiative, y=Mean)) +
  geom_bar(stat="identity", aes(fill=initiative)) +
  geom_errorbar(aes(ymin=Trad.lower, ymax=Trad.upper,width=0.15))
ggplotly(statson3models)

This shows the mean and

Let’s make a plot

#plotse <- allparticipants %>% 
 # filter(!is.na(numberofclinicalepisodes)) +
#  ggplot(aes(x=initiative, y=numberofclinicalepisodes)) +
#  geom_bar(stat="identity",aes(fill=initiative)) +
#  geom_errorbar(aes(ymin=Trad.lower,ymax=Trad.upper,width=0.15))
#plotse
#bpcisum <- bpcimodels %>%
#  ggplot(Sum,aes(x=initiative, y=numberofclinicalepisodes,fill=initiative)) +
#  geom_bar(stat="identity",color="black") #+
  #geom_errorbar(aes(ymin=mean-se,ymax=mean+se),width=0.2,size=0.7,color="black") +
  #theme_stata() +
  #labs(x="Initiative", y="Number of Clinical Episodes Chosen", title="Number of Clinical Episodes within each Initiative")
#ggplotly(bpcisum)