#getwd()

Centers for Medicare & Medicaid Services (CMS)

Center for Medicare and Medicaid Innovation (The Innovation Center) dataset
https://data.cms.gov/Special-Programs-Initiatives-Speed-Adoption-of-Bes/CMS-Innovation-Center-Model-Participants/x8pc-u7ta

An analysis of CMS Innovation Center’s Model Participants.

Medicare is a federal health insurance program primarily for people ages 65 and older, for younger people with disabilities, or for people with end stage renal disease (ESRD). Over 60 million people use Medicare each year, and Medicare spending accounted for 15% of total federal spending in 2018, amounting to $731 billion ¹. The image below from the Kaiser Family Foundation (KFF) indicates Medicare’s portion of the federal budget for 2018.

Medicare2018.

CMS is the single largest payer for health care in the United States ² and the leadership of CMS takes this role in overall health care services seriously. CMS influences the use of best practices, prevention initiatives, treatment choices, overall harm-free care, and patient follow-up across providers and insurance payers because of its major role in the industry.f The decisions at CMS impact each person’s health care experience, whether they are aware of this influence or even whether they utilize Medicare for their health insurance or not.

For part of my job, I support one of the CMS Innovation Center’s voluntary payment model tests called Bundled Payments for Care Improvement (BPCI) Advanced. The Innovation Center tests payment and care delivery models like this to evaluate incentives for improving quality care and lowering healthcare costs delivered to Medicare beneficiaries. In this project, I explore the participants (i.e., providers, hospitals) in Innovation Center models, especially participants in BPCI models. For the BPCI models, the participants are acute care hospitals (ACHs) or physician group practices (PGPs). These participants choose which clinical episodes they want CMS to evaluate and pay them a bundled payment amount for. For example, a hospital may agree to participate in the congestive heart failure (CHF) clinical episode. This means that for a Medicare beneficiary who triggers a CHF episode at this ACH, CMS will pay the hospital a set dollar amount to care for that beneficiary within a 90-day window. If the hospital provides CHF care efficiently, reports on completing activities, provides services, with positive outcomes, they can do this for less cost than what CMS pays, thereby making a profit. If the hospital provides CHF care poorly, reports that they did not complete certain activities, did not provide services, and the beneficiary had negative outcomes, CMS reduces the payment so the hospital is penalized or loses money. This is a simplified explanation of a very complex payment system.
For my analysis, I begin by looking at the number of clinical episodes that participants select within each of the BPCI models. There are additional resources and information available at CMS’ bundled payments website shown here: https://innovation.cms.gov/initiatives/Bundled-Payments/

Install packages and dataset.

Before importing the dataset into R, I did some cleaning in Excel.

I deleted columns with all blanks.
Phase 1
MSA Name
Map Display
I edited one record for Unique ID #7437 (Baptist Cancer Center - Tipton) to remove artifacts in the organization name.
I renamed the dataset “CMSdata”.

I will use functions from the dplyr package like filter(), group_by(), and summarize() to manipulate the data.

#install.packages("vctrs")
#install.packages("tidyverse")
library(tidyverse)

## -- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.3.0 --

## <U+2713> ggplot2 3.2.1     <U+2713> purrr   0.3.3
## <U+2713> tibble  2.1.3     <U+2713> dplyr   0.8.3
## <U+2713> tidyr   1.0.0     <U+2713> stringr 1.4.0
## <U+2713> readr   1.3.1     <U+2713> forcats 0.4.0

## -- Conflicts ------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readr)
library(ggplot2)
library(vctrs)
allparticipants<-read_csv("CMSdata.csv") #this pulls my csv file from my working directory

## Parsed with column specification:
## cols(
##   `Name of Initiative` = col_character(),
##   `Organization Name` = col_character(),
##   Notes = col_character(),
##   `Location 1` = col_character(),
##   `Street Address` = col_character(),
##   City = col_character(),
##   State = col_character(),
##   `State Based` = col_logical(),
##   `Phase 2` = col_character(),
##   Facebook = col_character(),
##   Twitter = col_character(),
##   Youtube = col_character(),
##   Website = col_character(),
##   Category = col_character(),
##   `Unique ID` = col_double()
## )

dim(allparticipants) #this gives the dimensions of the dataset, which is the number of rows and columns

## [1] 7886   15

This dataset has 7,886 records with 15 variables.

summary(allparticipants) #get a summary of the dataset

##  Name of Initiative Organization Name     Notes            Location 1       
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Street Address         City              State           State Based    
##  Length:7886        Length:7886        Length:7886        Mode :logical  
##  Class :character   Class :character   Class :character   FALSE:315      
##  Mode  :character   Mode  :character   Mode  :character   TRUE :1        
##                                                           NA's :7570     
##                                                                          
##                                                                          
##    Phase 2            Facebook           Twitter            Youtube         
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Website            Category           Unique ID   
##  Length:7886        Length:7886        Min.   :   1  
##  Class :character   Class :character   1st Qu.:1973  
##  Mode  :character   Mode  :character   Median :3970  
##                                        Mean   :3976  
##                                        3rd Qu.:5981  
##                                        Max.   :7976

The variables are primarily categorical. R is seeing the Unique ID variable as a quantitative variable so it has calcuated the statistics for this field but this field is really just an identifier or categorical variable. I could adjust this variable but I do not plan to use it for this analysis. This identifier is of limited value for my analysis as each participant/hospital receives a new unique identifier for each type of model which they join. Since this is the case, I am not able to use this column to match up participants who are in multiple models as their ID number is different for each CMS model. I believe this makes the data identification easier on CMS’ end of things when the contractor pulls fields to calculate payment for each model.

The Notes column was a field of free text (a string) which had content for only a few of the initiatives. I cleaned up the Notes column using OpenRefine https://openrefine.org/ which is an open source application for data wrangling. I extracted the number of clinical episodes each participant selected in the BPCI initiatives into a separate column that I named “Number of Clinical Episodes”.

I used OpenRefine to:

Filter to select BPCI Initiatives
On Notes column - Edit column - Split column Notes into several columns
By separator
Separator :
Guess cell Type - checked
On Notes 2 column - Edit column - Split column Notes 2 into several columns
By separator
Separator //
Delete Notes 3 column
On Notes 2 column - Edit column - Split column Notes 2 several columns
By separator
Separator //
Delete Notes 2 column and Notes 2 2 column
Rename the column “Number of Clinical Episodes”
Rename the first column “Initiative”
Remove the BPCI Initiatives filter
Export the full dataset and name it as “CMSdata2”

Pull the refined dataset into R, overwriting the previous dataset called allparticipants.

allparticipants <- read_csv("CMSdata2.csv")

## Parsed with column specification:
## cols(
##   Initiative = col_character(),
##   `Organization Name` = col_character(),
##   `Number of Clinical Episodes` = col_double(),
##   `Location 1` = col_character(),
##   `Street Address` = col_character(),
##   City = col_character(),
##   State = col_character(),
##   `State Based` = col_logical(),
##   `Phase 2` = col_logical(),
##   Facebook = col_character(),
##   Twitter = col_character(),
##   Youtube = col_character(),
##   Website = col_character(),
##   Category = col_character(),
##   `Unique ID` = col_double()
## )

## Warning: 979 parsing failures.
##  row     col           expected                                                    actual           file
## 1889 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1890 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1891 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1892 Phase 2 1/0/T/F/TRUE/FALSE Major bowel procedure                                     'CMSdata2.csv'
## 1893 Phase 2 1/0/T/F/TRUE/FALSE Chronic obstructive pulmonary disease, bronchitis, asthma 'CMSdata2.csv'
## .... ....... .................. ......................................................... ..............
## See problems(...) for more details.

R is returning a warning note for 979 parsing failures in the column called “Phase 2”. These rows are the only records with values (strings) in this field. R is trying to read this column as a logical value. I will use the dplyr “mutate” command with as.character to convert these values to characters to dismiss this problem.

allparticipants <- allparticipants %>%
  mutate_if(is.logical, as.character) #this changes all the columns/fields in the dataframe from logical (if applicable) to character, which meets my needs for this dataset.

Make all headings (column names) lowercase. Remove the spaces between words in the headings. Then look at the data with head command.

names(allparticipants) <- tolower(names(allparticipants)) #makes the column headings lowercase
names(allparticipants) <- gsub(" ","",names(allparticipants)) #removes the space in the headings, if applicable.
head(allparticipants)

## # A tibble: 6 x 15
##   initiative organizationname numberofclinica… location1 streetaddress city 
##   <chr>      <chr>                       <dbl> <chr>     <chr>         <chr>
## 1 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 2 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 3 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 4 Comprehen… Brown Medicine …               NA 909 Nort… 909 North Ma… Prov…
## 5 Comprehen… Coastal Medical…               NA 11445 Wa… 11445 Wampan… East…
## 6 Comprehen… LPG Primary Car…               NA 50 Memor… 50 Memorial … Newp…
## # … with 9 more variables: state <chr>, statebased <chr>, phase2 <chr>,
## #   facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## #   category <chr>, uniqueid <dbl>

Look at the summary of the dataset again.

summary(allparticipants)

##   initiative        organizationname   numberofclinicalepisodes
##  Length:7886        Length:7886        Min.   : 1.000          
##  Class :character   Class :character   1st Qu.: 2.000          
##  Mode  :character   Mode  :character   Median : 4.000          
##                                        Mean   : 7.023          
##                                        3rd Qu.: 9.000          
##                                        Max.   :47.000          
##                                        NA's   :5612            
##   location1         streetaddress          city              state          
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   statebased           phase2            facebook           twitter         
##  Length:7886        Length:7886        Length:7886        Length:7886       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    youtube            website            category            uniqueid   
##  Length:7886        Length:7886        Length:7886        Min.   :   1  
##  Class :character   Class :character   Class :character   1st Qu.:1973  
##  Mode  :character   Mode  :character   Mode  :character   Median :3970  
##                                                           Mean   :3976  
##                                                           3rd Qu.:5981  
##                                                           Max.   :7976  
##

Now I have a quantitative variable for the number of clinical episodes column. One item I noticed was the number of clinical episodes selected by participants. But first, a little more background on the models and the episodes. The first BPCI model (Model 1) included all Medicare Severity Diagnosis Related Groups (MS-DRGs). I did not focus my analysis on this model. The BPCI Models 2-4 which began in 2013 included up to 48 clinical episodes while the most recent iteration, BPCI Advanced, has 32 clinical episodes. These include inpatient episodes for things like congestive heart failure, major joint replacement of the lower extremity, pacemaker, and stroke, and outpatient episodes for percutaneous coronary intervention, cardiac defibrillator, and back & neck spinal fusion. Additional information about the BPCI models is available on CMS websites such as https://innovation.cms.gov/initiatives/Bundled-Payments/ and https://innovation.cms.gov/initiatives/bpci-advanced .

The reason I find the number of episodes selected interesting is that the max value in this dataset (47) means that at least 1 hospital told CMS they would be held financially accountable for 47 of the available 48 clinical episodes. This is somewhat surprising since hospitals often specialize or focus in some areas more than others. A hospital selecting these 47 episodes would indicate that they were confident in their overall efficiency of care, the quality of care, positive outcomes for patients, and cost savings across their facility and network of providers. This is one potential area I could investigate more.

Group the dataframe by multiple columns with mean number of clinical episodes.

I want to see the mean number of clinical episodes for each of the BPCI models.

#this chunk says to group the df by the initiative column then
#take the variable called numberofclinicalepisodes and apply the function mean and remove the nas when calculating
#then I filtered out the nas so that I only had the initiatives with clinical episodes selected.
#I assigned that summarization to something called bpcimodels.
bpcimodels <- summarise_at(group_by(allparticipants, initiative),vars(numberofclinicalepisodes),funs(mean(.,na.rm=TRUE))) %>%
  filter(!is.na(numberofclinicalepisodes)) %>% #filter here says to take all rows that do not have an 'na' value in the numberofclinicalepisodes variable.
  mutate_if(is.numeric,round, digits=1) #this rounds the mean of episodes to 1 decimal place.

## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.

#that leaves me with the 4 BPCI models with their mean number of clinical episodes selected.

#This attempt made a separate variable called roundedCE. Later I had trouble integrating this into another plot so I used the mutate function above instead of this part.
#bpcimodels <- bpcimodels %>%
 # (round(bpcimodels[,2,drop=TRUE],digits=1))
#roundedCE <- round(bpcimodels[,2,drop=TRUE],digits=1) #this rounds off the digits to 1 decimal for my next plot

bpcimodels  #this displays the df in a tibble

## # A tibble: 4 x 2
##   initiative               numberofclinicalepisodes
##   <chr>                                       <dbl>
## 1 BPCI Advanced                                 6  
## 2 BPCI Initiative: Model 2                      5.4
## 3 BPCI Initiative: Model 3                     10.5
## 4 BPCI Initiative: Model 4                      6.5

This tibble shows the average number of clinical episodes chosen for each of the four BPCI initiatives, rounded to 1 decimal place.

Plot the average number of clinical episodes.

I’ll look at the average number of clinical episodes that participants chose within each iteration of the BPCI models. I used theme_stata for my plot.

I tried pulling in a color scheme to use to try to keep my variable colors consistent throughout but could not get this part to work for me here. Later, I identified the color codes and manually assigned the colors to the appropriate initiative. I also reordered the initiatives so they would display in chronological order: BPCI Model 2, Model 3, Model 4, then Advanced.

#install.packages("plotly")
#install.packages("ggthemes")
#install.packages("RColorBrewer")
library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(ggthemes)
library(RColorBrewer)
#mycolors <- brewer.pal(5, "Set2")
#names(mycolors) <- levels(bpcimodels$initiative)
#colScale <- scale_color_manual(name="initiative", values=mycolors)

bpcimodels$initiative <- factor(bpcimodels$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order

bpciepisodes <- ggplot(bpcimodels,aes(x=initiative,y=numberofclinicalepisodes)) + #, fill=initiative)) +
  geom_bar(stat="identity") +
  aes(fill=initiative) +
  theme_classic() +
  labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative") + 
  theme(legend.title=element_blank()) + #removes the title of the legend
  scale_fill_manual(values=c("#F8766D","#7CAE00","#00BFC4","#C77CFF")) #this makes the colors in the order I wanted them.
ggplotly(bpciepisodes) #using ggplotly makes the labels appear when you hover

#In this section I was trying to adjust the colors but ended up not doing it this way.
#bpciepisodes <- bpcimodels %>% #this uses the small dataframe of just the bpci models
  #ggplot(aes(x=initiative, y=roundedCE,fill=initiative)) + #this uses the rounded off number of clinical episodes
  #geom_bar(stat="identity") +
  #theme_stata() + #I used a different theme
  #labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative") #+
 # colScale #this is where the colors did not work
#ggplotly(bpciepisodes) #using ggplotly makes the labels appear when you hover

I tried a few ways to do the aesthetics throughout, trying to not have the label appear with duplicates. The tutorial for week 2, and in others I saw online, mostly focus on the placement of aes when using geom_point and the command geom_bar seems to work enough differently that I couldn’t get it right.

This plot is a good start. The axis labels are overcrowded and the hovering labels are repetitive. (I already fixed the legend label to be in the correct order.)

A plot that nudges the axis labels.

#install.packages("ggrepel")
library(ggrepel)
bpciepisodes <- ggplot(bpcimodels, aes(initiative, numberofclinicalepisodes, label=initiative, fill=initiative)) +
  geom_text_repel(nudge_x = 0.005) + #this nudges the x axis labels a bit
  geom_bar(stat="identity") +
  theme_stata() + #I tried a different theme here
  labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative") 
ggplotly(bpciepisodes)

## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

#the geom_text_repel command looks for a label so I added that into the aesthetics. However, this also added another duplication of the hovering label. Also, the warnings after this plot indicate that plotly does not support the command yet.

So this version shifted the axis labels but the legend is not viewable (runs out of view). The labels are repetitive. I probably do not need a legend at all for this one since the axis labels are clear.

bpciepisodes <- ggplot(bpcimodels, aes(initiative, numberofclinicalepisodes, fill=initiative)) +
  #aes(fill=initiative) +
  geom_bar(stat="identity") +
  theme(legend.position="none") + #removes the legend completely
  #theme_stata() + #I had to remove the theme because I could not rotate the y labels when this was on
  labs(x="", y="",title="Average Number of Clinical Episodes within each Initiative") 
#I tried a few ways to do the aesthetics, trying to not have the label appear with duplicates. Our tutorial in week 2 and others I saw online mostly focused on the placement of aes when using geom_point and the command geom_bar seems to work enough differently that I couldn't get it right.
ggplotly(bpciepisodes)

This view is better, although the hovering labels are still duplicated. However, the y axis labels are now rotated for easier viewing, which required that I remove the theme_stata() to get it to work.

I want to plot the mean of the number of clinical episodes chosen.

First I will get some statistics on the variation.

#install.packages("rcompanion") # this has the groupwiseMean command which I use to get the statistics across each of the BPCI participants' selection of clinical episodes.

library(rcompanion)
bpciparticipants <- allparticipants %>%
  filter(!is.na(numberofclinicalepisodes))
CEmean <- groupwiseMean(data=bpciparticipants, var="numberofclinicalepisodes", group="initiative",conf=0.95,digits=3)
CEmean #this is now the mean of the number of clinical episodes for each of the 4 initiatives.

##                 initiative    n  Mean Conf.level Trad.lower Trad.upper
## 1            BPCI Advanced 1295  5.95       0.95       5.63       6.27
## 2 BPCI Initiative: Model 2  402  5.43       0.95       4.83       6.03
## 3 BPCI Initiative: Model 3  575 10.50       0.95       9.65      11.40
## 4 BPCI Initiative: Model 4    2  6.50       0.95     -63.40      76.40

summary(CEmean)

##   initiative              n               Mean          Conf.level  
##  Length:4           Min.   :   2.0   Min.   : 5.430   Min.   :0.95  
##  Class :character   1st Qu.: 302.0   1st Qu.: 5.820   1st Qu.:0.95  
##  Mode  :character   Median : 488.5   Median : 6.225   Median :0.95  
##                     Mean   : 568.5   Mean   : 7.095   Mean   :0.95  
##                     3rd Qu.: 755.0   3rd Qu.: 7.500   3rd Qu.:0.95  
##                     Max.   :1295.0   Max.   :10.500   Max.   :0.95  
##    Trad.lower        Trad.upper    
##  Min.   :-63.400   Min.   : 6.030  
##  1st Qu.:-12.227   1st Qu.: 6.210  
##  Median :  5.230   Median : 8.835  
##  Mean   :-10.822   Mean   :25.025  
##  3rd Qu.:  6.635   3rd Qu.:27.650  
##  Max.   :  9.650   Max.   :76.400

So this gives the mean of the number of clinical episodes with the quartiles.

Plotting the means with a deviation.

A standard error is for a sample and standard deviation is for an entire population. I have the whole population of participants in the models. I can plot the standard error calculated above.

CEmean$initiative <- factor(CEmean$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order

CEmeanplot <- ggplot(CEmean,aes(x=initiative, y=Mean, fill=initiative)) +
  geom_bar(stat="identity") +
  theme_classic() +
  theme(legend.position="none")
  #geom_errorbar(aes(ymin=Trad.lower, ymax=Trad.upper,width=0.15))
CEmeanplot #this plots the means of the numberofclinicalepisodes variable grouped by initiative

#I removed the error bar since it is not meaningful in this case. After doing so, the following text and plots became unnecessary 
#### I need to remove BPCI Initiative: Model 4 since it is making my plots stretch oddly.
#The BPCI Model 4 was only selected by 2 participant hospitals: 1 hospital reported 1 clinical episode and the other #hospital reported on 12 clinical episodes making for a mean of 6.50 for the numberofclinicalepisodes in this #initiative. I am not going to investigate this very small initiative so I am removing it from the visualization.

# I tried an outlier.replace version first which did not work well for me.
#install.packages(data.table) 
#library(data.table)
#Sum %>% outlier.replace(Sum,"numberofclinicalepisodes", which(numberofclinicalepisodes < 3), newValue=NA) 
#Sum %>% filter(!is.na(numberofclinicalepisodes))

statson3models <- CEmean %>% filter(n >= 3) #this filters out model #4 so the plot is easier to read. The filter is n greater than or equal to 3 which works in this case since there were only 2 participating hospitals.
  
statson3models <- ggplot(statson3models,aes(x=initiative, y=Mean)) + 
  geom_bar(stat="identity", aes(fill=initiative)) +
  #geom_errorbar(aes(ymin=Trad.lower, ymax=Trad.upper,width=0.15)) + 
  theme(legend.title=element_blank()) + #removes the title of the legend
  scale_fill_manual(values=c("#F8766D","#7CAE00","#C77CFF")) #this makes the colors I want in the order I want them.
ggplotly(statson3models)

I realized after doing this analysis that this is not really meaningful, statistically. I am plotting the entire “population”, not a “sample” so I am really not sure what the ‘error bar’ in this case would even mean.

Instead, I will move on to a box and whisker plots and look at some outliers.

This will make a boxplot of the number of clinical episodes selected.

bpciparticipants$initiative <- factor(bpciparticipants$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order

boxyplot1 <- bpciparticipants %>%
  ggplot() +
  geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), notch=TRUE) #the notch puts indents at the mean. 
boxyplot1

## notch went outside hinges. Try setting notch=FALSE.

This plot is a good start to look at outliers. The “notch” feature does not add helpful information for this plot so I will remove it.

I will also try changing the color and shape of the outliers.

boxyplot2 <- bpciparticipants %>%
  ggplot() +
  geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), outlier.colour="red", outlier.shape=6, outlier.size=2) +
  theme(axis.text.x=element_blank()) 
#these set the color, shape, and size of the indicator for each outlier and I removed the values on the x axis
boxyplot2

ggplot_build(boxyplot2)$data # this is telling me the color codes for the 4 initiative models

## [[1]]
##      fill ymin lower middle upper ymax
## 1 #F8766D    1  1.00    3.0  8.00   18
## 2 #7CAE00    1  2.00    7.0 15.00   33
## 3 #00BFC4    1  3.75    6.5  9.25   12
## 4 #C77CFF    1  2.00    4.0  8.00   17
##                                                                                                                                                                                                                                                                                                                                                                                     outliers
## 1                                                                                                                                                                                                                                                                                                                     19, 20, 20, 21, 21, 22, 22, 23, 24, 25, 26, 26, 28, 29, 30, 31, 31, 32
## 2                                                                                                                                                                                                                     36, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 40, 41, 41, 42, 43, 43, 43, 45, 47, 47
## 3                                                                                                                                                                                                                                                                                                                                                                                           
## 4 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 22, 22, 22, 23, 23, 24, 24, 25, 26, 26, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 32, 32, 32, 32
##   notchupper notchlower        x PANEL group ymin_final ymax_final      xmin
## 1   3.551623  2.4483773 -0.28125     1     1          1         32 -0.365625
## 2   7.856577  6.1434228 -0.09375     1     2          1         47 -0.178125
## 3  12.644758  0.3552421  0.09375     1     3          1         12  0.009375
## 4   4.263435  3.7365650  0.28125     1     4          1         32  0.196875
##        xmax weight colour size alpha shape linetype
## 1 -0.196875      1 grey20  0.5    NA    19    solid
## 2 -0.009375      1 grey20  0.5    NA    19    solid
## 3  0.178125      1 grey20  0.5    NA    19    solid
## 4  0.365625      1 grey20  0.5    NA    19    solid

This looks like a lot of outliers.

The initiatives are listed in the wrong order (which makes them mixed up in the color scheme). And I need to remove the legend label.

boxyplot3 <- bpciparticipants %>%
  ggplot() +
  geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), outlier.colour="red", outlier.shape=6, outlier.size=2) +
  labs(title="BPCI Models with Number of Clinical Episodes Chosen",y="Number of Clinical Episodes",fill="") + #this removes the label for the legend
  theme(axis.text.x=element_blank()) #this removes the axis labels
boxyplot3

ggplot_build(boxyplot3)$data # this is telling me the color codes for the 4 initiative models

## [[1]]
##      fill ymin lower middle upper ymax
## 1 #F8766D    1  1.00    3.0  8.00   18
## 2 #7CAE00    1  2.00    7.0 15.00   33
## 3 #00BFC4    1  3.75    6.5  9.25   12
## 4 #C77CFF    1  2.00    4.0  8.00   17
##                                                                                                                                                                                                                                                                                                                                                                                     outliers
## 1                                                                                                                                                                                                                                                                                                                     19, 20, 20, 21, 21, 22, 22, 23, 24, 25, 26, 26, 28, 29, 30, 31, 31, 32
## 2                                                                                                                                                                                                                     36, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 40, 41, 41, 42, 43, 43, 43, 45, 47, 47
## 3                                                                                                                                                                                                                                                                                                                                                                                           
## 4 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 22, 22, 22, 23, 23, 24, 24, 25, 26, 26, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 32, 32, 32, 32
##   notchupper notchlower        x PANEL group ymin_final ymax_final      xmin
## 1   3.551623  2.4483773 -0.28125     1     1          1         32 -0.365625
## 2   7.856577  6.1434228 -0.09375     1     2          1         47 -0.178125
## 3  12.644758  0.3552421  0.09375     1     3          1         12  0.009375
## 4   4.263435  3.7365650  0.28125     1     4          1         32  0.196875
##        xmax weight colour size alpha shape linetype
## 1 -0.196875      1 grey20  0.5    NA    19    solid
## 2 -0.009375      1 grey20  0.5    NA    19    solid
## 3  0.178125      1 grey20  0.5    NA    19    solid
## 4  0.365625      1 grey20  0.5    NA    19    solid

I will focus in on Model 3 which has the highest number of clinical episodes chosen.

model3only <- bpciparticipants %>%
   filter(initiative == "BPCI Initiative: Model 3") #this pulls only model 3 into the dataframe called model3only

boxyplot4 <- ggplot(model3only,aes(x=initiative, y=numberofclinicalepisodes, fill=initiative)) +
  geom_boxplot(outlier.colour="red", outlier.shape=6, outlier.size=2) +
  labs(y="Number of Clinical Episodes",fill="") + #this removes the label for the legend 
  scale_fill_manual(values=c("#7CAE00")) + #this is assigning the specific color for Model 3 so it matches model 3 plots above
  theme(axis.text.x=element_blank()) + #this removes the axis markers
  stat_summary(fun.y=mean, geom="point", shape=8, size=2, color="white")
boxyplot4

#The white star represents the mean of clinical episodes selected. 
#If I add ggplotly(boxyplot4) then the formatting for the outliers and the star for the mean go away. However, plotly adds the whiskers (bars) on the plot.

ggplot_build(boxyplot4)$data #this tells me the color code

## [[1]]
##      fill ymin lower middle upper ymax
## 1 #7CAE00    1     2      7    15   33
##                                                                                                                                                                 outliers
## 1 36, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 40, 41, 41, 42, 43, 43, 43, 45, 47, 47
##   notchupper notchlower x PANEL group ymin_final ymax_final  xmin  xmax xid
## 1   7.856577   6.143423 1     1     1          1         47 0.625 1.375   1
##   newx new_width weight colour size alpha shape linetype
## 1    1      0.75      1 grey20  0.5    NA    19    solid
## 
## [[2]]
##      fill x group        y ymin ymax PANEL shape colour size alpha stroke
## 1 #7CAE00 1     1 10.54957   NA   NA     1     8  white    2    NA    0.5

I learned if I use plotly on these plots then the color coding for the outliers does not display - they stay as black points. The plot above shows that BPCI Model 3 had an average of 10.5 clinical episodes.

boxyplot5 <- ggplot(model3only,aes(x=initiative, y=numberofclinicalepisodes, fill=initiative)) +
  geom_boxplot(outlier.colour="red", outlier.shape=6, outlier.size=2) +
  labs(y="Number of Clinical Episodes",fill="") + #this removes the label for the legend 
  scale_fill_manual(values=c("#7CAE00")) + #this is assigning the specific color for Model 3 so it matches model 3 plots above
  theme(axis.text.x=element_blank()) + #this removes the axis markers
  stat_summary(fun.y=mean, geom="point", shape=8, size=2, color="white") #+
  #geom_text(aes(label=outlier),na.rm=TRUE,hjust=-0.3)

# get list of outliers 
out <- ggplot_build(boxyplot5)[["model3only"]][[1]][["outliers"]]

# label list elements with factor levels
#names(out) <- levels(factor(mtcars$cyl))

# convert to tidy data
#tidyout <- purrr::map_df(out, tibble::as_tibble, .id = "cyl")

# plot boxplots with labels
boxyplot5 + geom_text(data = out, aes(initiative, numberofclinicalepisodes, label = numberofclinicalepisodes), 
              hjust = -.3)

boxyplot5

#So this was an attempt to lable the outlier values, which ended up labeling all the values.

The highest number of clinical episodes selected was 47 episodes within BPCI Model 3. The outliers are marked as values of 36 and above.

highepisodes <- bpciparticipants %>%
  filter(initiative == "BPCI Initiative: Model 3") %>%
  filter(numberofclinicalepisodes>35) %>%
  arrange(desc(numberofclinicalepisodes)) #this sorts the number of clinical episodes selected from highest to lowest
highepisodes

## # A tibble: 42 x 15
##    initiative organizationname numberofclinica… location1 streetaddress city 
##    <fct>      <chr>                       <dbl> <chr>     <chr>         <chr>
##  1 BPCI Init… Bayley                         47 990 Bayl… 990 Bayley P… Cinc…
##  2 BPCI Init… Marjorie P. Lee…               47 3550 Sha… 3550 Shaw Ave Cinc…
##  3 BPCI Init… United Church H…               45 215 Seth… 215 Seth Ave  Jack…
##  4 BPCI Init… United Church H…               43 3800 Boa… 3800 Boardwa… Sand…
##  5 BPCI Init… United Church H…               43 3218 Ind… 3218 Indian … Beav…
##  6 BPCI Init… Gardner Heights…               43 172 Rock… 172 Rocky Re… Shel…
##  7 BPCI Init… United Church H…               42 12200 St… 12200 Straus… Cana…
##  8 BPCI Init… Friendship Vill…               41 6000 Riv… 6000 Riversi… Dubl…
##  9 BPCI Init… Twin Lakes - Li…               41 9840 Mon… 9840 Montgom… Cinc…
## 10 BPCI Init… United Church H…               40 850 Mars… 850 Marseill… Uppe…
## # … with 32 more rows, and 9 more variables: state <chr>, statebased <chr>,
## #   phase2 <chr>, facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## #   category <chr>, uniqueid <dbl>

This reflects that 42 participant hospitals selected at least 36 clinical episodes for CMS to hold them financially responsible in this model. Since the mean number of clinical episodes for this model of 575 participants is 10.5 episodes, these 42 participants are the outliers.

plots2models <- bpciparticipants %>%
  filter(initiative != "BPCI Initiative: Model 2") %>%
  filter(initiative != "BPCI Initiative: Model 4")

in2models <- ggplot(plots2models) +
  geom_bar(aes(x=initiative)) +
  labs(title= "Number of Participants", x = "Initiative", y="") +
  scale_fill_manual(values=c("#7CAE00","C77CFF"))
ggplotly(in2models)

I like this as a clean chart but I could add colors and better labels to make it more appealing. This plot shows the change in number of participants from Model 3 to Advanced. There are many items to investigate here: What changes were made to make such an increase in the number of participants for the Advanced model over Models 3?

bpciparticipants$initiative <- factor(bpciparticipants$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order

modelsepisodescounts <- ggplot(bpciparticipants) +
  geom_bar(aes(x=numberofclinicalepisodes, fill=initiative)) +
  labs(x= "Number of Clinical Episodes", y= "Number of Participants") +
  labs(title= "BPCI Models: Participation and Number of Episodes", fill= "Initiative") +
  theme_classic()

ggplotly(modelsepisodescounts)

I love that you can double click on these initiatives and isolate those results!

This plot shows many things such as how the vast majority of participants in the Advanced model chose only 1 clinical episode. It is also interesting that there were 4 participants who chose 32 (the max) clinical episodes. These are things that could be researched further.

I wanted to improve on this visualization to make it more meaningful. The bars are a bit narrow and difficult to view but I do really like the ability to select and deselect from the legend.

Separate the 4 BPCI Models into charts.

This uses the facet_wrap command to make separate charts for each model.

#install.packages("devtools")
library(devtools)

## Loading required package: usethis

modelsepisodescounts2 <- ggplot(bpciparticipants) +
  #axis.title.y=element_text(angle=90,margin=margin(r=0.8*half_line,l=0.9*half_line/2)) +
  theme(axis.text.y.left = element_text(margin = unit(c(1, 1, 1, 5), "mm"))) +
  geom_bar(aes(x=numberofclinicalepisodes, fill=initiative), position="dodge") +
  labs(x= "Number of Clinical Episodes", y= "Number of Participants") +
  labs(title= "BPCI Models: Participation and Number of Episodes", fill= "Initiative") +
  theme_classic() +
  facet_wrap(~initiative)
modelsepisodescounts2 <- modelsepisodescounts2 +
  theme(legend.position = "none")
ggplotly(modelsepisodescounts2)

As previously discussed, Model 4 was a small demo with 2 participating hospitals. Model was the basis for Model 3. I would like to exclude models 2 and 4 in order to focus more on Model 3 and Advanced.

bpciparticipants2 <- bpciparticipants[bpciparticipants$initiative != "BPCI Initiative: Model 2",] #this removes model 2
bpciparticipants2 <- bpciparticipants2[bpciparticipants2$initiative != "BPCI Initiative: Model 4",] #this removes model 4
modelsepisodescounts3 <- ggplot(bpciparticipants2, aes(x=numberofclinicalepisodes)) +
  #geom_bar(aes(x=numberofclinicalepisodes, fill=initiative), position="dodge") +
  geom_bar(aes(fill=initiative)) +
  labs(x= "Number of Clinical Episodes", y= "Number of Participants") +
  labs(title= "BPCI Models: Participation and Number of Episodes", fill= "Initiative") +
  theme_classic() +
  facet_wrap(~initiative)

modelsepisodescounts3 <- modelsepisodescounts3 +
  theme(legend.position = "none")
ggplotly(modelsepisodescounts3)

Next, I will look at the number of participants in a few of the CMS Innovation Center models.

Pull together counts of participants in each initiative (model).

#since I have altered my dataset so much, I am going back to my cleaned dataset.
byinitiative <- read_csv("CMSdata2.csv")

## Parsed with column specification:
## cols(
##   Initiative = col_character(),
##   `Organization Name` = col_character(),
##   `Number of Clinical Episodes` = col_double(),
##   `Location 1` = col_character(),
##   `Street Address` = col_character(),
##   City = col_character(),
##   State = col_character(),
##   `State Based` = col_logical(),
##   `Phase 2` = col_logical(),
##   Facebook = col_character(),
##   Twitter = col_character(),
##   Youtube = col_character(),
##   Website = col_character(),
##   Category = col_character(),
##   `Unique ID` = col_double()
## )

## Warning: 979 parsing failures.
##  row     col           expected                                                    actual           file
## 1889 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1890 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1891 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity            'CMSdata2.csv'
## 1892 Phase 2 1/0/T/F/TRUE/FALSE Major bowel procedure                                     'CMSdata2.csv'
## 1893 Phase 2 1/0/T/F/TRUE/FALSE Chronic obstructive pulmonary disease, bronchitis, asthma 'CMSdata2.csv'
## .... ....... .................. ......................................................... ..............
## See problems(...) for more details.

names(byinitiative) <- tolower(names(byinitiative)) #makes the column heading lowercase
names(byinitiative) <- gsub(" ","", names(byinitiative)) #removes spaces from headings, if applicable
head(subset(byinitiative, select = "initiative"))

## # A tibble: 6 x 1
##   initiative                     
##   <chr>                          
## 1 Comprehensive Primary Care Plus
## 2 Comprehensive Primary Care Plus
## 3 Comprehensive Primary Care Plus
## 4 Comprehensive Primary Care Plus
## 5 Comprehensive Primary Care Plus
## 6 Comprehensive Primary Care Plus

plotinitiative <- byinitiative %>%
  group_by(initiative) %>%
  count(organizationname) %>%
  ggplot(aes(fill=initiative)) +
  geom_bar(aes(x=initiative)) +
  labs(title="Number of Participants in each Model",x="Initiative",y="")+
  theme(axis.text.x=element_blank())+
  theme(legend.position="none")
ggplotly(plotinitiative)

This plot shows that most participants only join a few models. The models with the highest number of participants are BPCI Advanced which we have already explored and Comprehensive Primary Care Plus (CPC+).

table(byinitiative$initiative)

## 
##                                                        Accountable Health Communities Model 
##                                                                                          30 
##                                                                        ACO Investment Model 
##                                                                                          45 
##                                                                   Advance Payment ACO Model 
##                                                                                          35 
##                                                                               BPCI Advanced 
##                                                                                        1295 
##                                                                    BPCI Initiative: Model 2 
##                                                                                         402 
##                                                                    BPCI Initiative: Model 3 
##                                                                                         575 
##                                                                    BPCI Initiative: Model 4 
##                                                                                           2 
##                                                    Community-based Care Transitions Program 
##                                                                                          18 
##                                                               Comprehensive ESRD Care Model 
##                                                                                          33 
##                                                       Comprehensive Primary Care Initiative 
##                                                                                         439 
##                                                             Comprehensive Primary Care Plus 
##                                                                                        2851 
##       Federally Qualified Health Center (FQHC) Advanced Primary Care Practice Demonstration 
##                                                                                         434 
##                                 Frontier Community Health Integration Project Demonstration 
##                                                                                          10 
##                                                      Graduate Nurse Education Demonstration 
##                                                                                           5 
##                                                               Health Care Innovation Awards 
##                                                                                         236 
##                                                     Health Care Innovation Awards Round Two 
##                                                                                          59 
##                  Incentives for the Prevention of Chronic Disease in Medicaid Demonstration 
##                                                                                          10 
##                                                          Independence at Home Demonstration 
##                                                                                          16 
## Initiative to Reduce Avoidable Hospitalizations Among Nursing Facility Residents: Phase Two 
##                                                                                         247 
##                                                                 Innovation Advisors Program 
##                                                                                          71 
##                                                Medicaid Emergency Psychiatric Demonstration 
##                                                                                          12 
##                                       Medicare Advantage Value-Based Insurance Design Model 
##                                                                                          14 
##                                                                 Medicare Care Choices Model 
##                                                                                          85 
##                                                       Mercy Health - Tiffin Family Medicine 
##                                                                                           1 
##                                 Million Hearts: Cardiovascular Disease Risk Reduction Model 
##                                                                                         319 
##                                                   Multi-payer Advanced Primary Care Program 
##                                                                                           8 
##                                                                   Next Generation ACO Model 
##                                                                                          41 
##                                                                         Oncology Care Model 
##                                                                                         175 
##                                         Part D Enhanced Medication Therapy Management Model 
##                                                                                           6 
##                                                                           Pioneer ACO Model 
##                                                                                           9 
##                                                      Rural Community Hospital Demonstration 
##                                                                                          28 
##                                     State Innovation Models Initiative: Model Design Awards 
##                                                                                          16 
##                           State Innovation Models Initiative: Model Design Awards Round Two 
##                                                                                          21 
##                                State Innovation Models Initiative: Model Pre-Testing Awards 
##                                                                                           3 
##                             State Innovation Models Initiative: Model Test Awards Round Two 
##                                                                                          10 
##                                    State Innovation Models Initiative: Model Testing Awards 
##                                                                                           6 
##                                            Strong Start for Mothers and Newborns Initiative 
##                                                                                         182 
##                                                  Transforming Clinical Practices Initiative 
##                                                                                         136 
##                                                                     Vermont All-Payer Model 
##                                                                                           1

This displays the number of participants within each initiative.

I want to look at the participants who joined both BPCI Advanced and CPC+ models.

bpciaandcpcplus <- byinitiative %>%
  filter(initiative=="BPCI Advanced" | initiative=="Comprehensive Primary Care Plus")
head(bpciaandcpcplus)

## # A tibble: 6 x 15
##   initiative organizationname numberofclinica… location1 streetaddress city 
##   <chr>      <chr>                       <dbl> <chr>     <chr>         <chr>
## 1 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 2 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 3 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 4 Comprehen… Brown Medicine …               NA 909 Nort… 909 North Ma… Prov…
## 5 Comprehen… Coastal Medical…               NA 11445 Wa… 11445 Wampan… East…
## 6 Comprehen… LPG Primary Car…               NA 50 Memor… 50 Memorial … Newp…
## # … with 9 more variables: state <chr>, statebased <lgl>, phase2 <lgl>,
## #   facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## #   category <chr>, uniqueid <dbl>

bpciaandcpcplus2 <- count(bpciaandcpcplus,vars="organizationname")
bpciaandcpcplus2

## # A tibble: 1 x 2
##   vars                 n
##   <chr>            <int>
## 1 organizationname  4146

#So I've proven that R can count the number of rows in my dataframe (!) but I wanted to try to get at the overlap of participants in the two models. I tried various things to get at the overlap but was unsuccessful. I tried to group at the organizationname but couldn't see how to pull out, list, or plot the organizationname that occurred more than once. Trying to do a >= 2 on organizationname doesn't work since I don't want R to try to match the value in the field but the number of occurences of a field.

bpciacpcplusplot <- ggplot(bpciaandcpcplus,aes(x=initiative)) +
  geom_bar()
ggplotly(bpciacpcplusplot)

This gives me a plot of the number of participants in each initiative.

cpc2models <- filter(byinitiative,initiative %in% c("Comprehensive Primary Care Plus","Comprehensive Primary Care Initiative")) #%>%
  #ggplot(cpc2models,aes(initiative)) +
  #geom_bar(stat="identity") +
  #theme(legend.position="none")  #removes the legend completely
  #theme_stata() + #I had to remove the theme because I could not rotate the y labels when this was on
head(cpc2models)

## # A tibble: 6 x 15
##   initiative organizationname numberofclinica… location1 streetaddress city 
##   <chr>      <chr>                       <dbl> <chr>     <chr>         <chr>
## 1 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 2 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 3 Comprehen… Brown Medicine …               NA 375 Wamp… 375 Wampanoa… East…
## 4 Comprehen… Brown Medicine …               NA 909 Nort… 909 North Ma… Prov…
## 5 Comprehen… Coastal Medical…               NA 11445 Wa… 11445 Wampan… East…
## 6 Comprehen… LPG Primary Car…               NA 50 Memor… 50 Memorial … Newp…
## # … with 9 more variables: state <chr>, statebased <lgl>, phase2 <lgl>,
## #   facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## #   category <chr>, uniqueid <dbl>

I pulled the CPC and CPC+ initiatives together into a dataframe with the thought that I could overlap this dataframe with the bpcimodels dataframe and look for repeats but this was also unsuccessful to get me a list of organizations which participated in both models.

I wanted to try plotting quantitative variables so I pulled in a dataset of Medicare spending in the next section.

I will look at Medicare Program spending based on data from the Medicare Inpatient Hospitals dataset https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/CMSProgramStatistics/2017/2017_Utilization#MU1

This dataset represents Medicare spending for inpatient hospital stays for all beneficiaries for the years 2012-2017.

I selected only the rows of data for All Beneficiaries and put this information into a separate file called CPSMDCRINPTHOSP2.csv (The dataset was broken down by older beneficiaries vs. disabled beneficiaries but I grabbed the totals.)

Pull in the dataset.

cmspayments <-read_csv("CPSMDCRINPTHOSP2.csv")

## Parsed with column specification:
## cols(
##   .default = col_number(),
##   `Type of Entitlement and Calendar Year` = col_double(),
##   `Discharges Per 1,000 Original Medicare Part A Enrollees` = col_double(),
##   `Total Days of Care Per Person With Utilization` = col_double(),
##   `Total Days of Care Per Discharge` = col_double(),
##   `Covered Days of Care Per Person With Utilization` = col_double(),
##   `Covered Days of Care Per Discharge` = col_double(),
##   `Total Program Payments` = col_character(),
##   `Program Payments Per Original Medicare Part A Enrollee` = col_character(),
##   `Program Payments Per Person With Utilization` = col_character(),
##   `Program Payments Per Discharge` = col_character(),
##   `Program Payments Per Covered Day` = col_character(),
##   `Coinsurance Days Per Person With Coinsurance 1` = col_double(),
##   `Total Coinsurance Payments` = col_character(),
##   `Coinsurance Payments Per Person With Coinsurance 1` = col_character(),
##   `Total Deductible Payments` = col_character()
## )

## See spec(...) for full column specifications.

dim(cmspayments)

## [1]  6 28

summary(cmspayments)

##  Type of Entitlement and Calendar Year Total Original Medicare Part A Enrollees
##  Min.   :2012                          Min.   :36894884                        
##  1st Qu.:2013                          1st Qu.:37342622                        
##  Median :2014                          Median :37592364                        
##  Mean   :2014                          Mean   :37668874                        
##  3rd Qu.:2016                          3rd Qu.:38143119                        
##  Max.   :2017                          Max.   :38347556                        
##  Total Persons With Utilization Total Discharges  
##  Min.   :6608549                Min.   :11043946  
##  1st Qu.:6621522                1st Qu.:11078080  
##  Median :6629258                Median :11128774  
##  Mean   :6710188                Mean   :11282202  
##  3rd Qu.:6755216                3rd Qu.:11363434  
##  Max.   :6977914                Max.   :11888884  
##  Discharges Per 1,000 Original Medicare Part A Enrollees
##  Min.   :288.0                                          
##  1st Qu.:290.5                                          
##  Median :296.0                                          
##  Mean   :299.7                                          
##  3rd Qu.:304.5                                          
##  Max.   :322.0                                          
##  Discharges Per 1,000 Persons With Utilization Total Days of Care
##  Min.   :1671                                  Min.   :64561447  
##  1st Qu.:1672                                  1st Qu.:66051021  
##  Median :1678                                  Median :67203256  
##  Mean   :1681                                  Mean   :67499946  
##  3rd Qu.:1682                                  3rd Qu.:68677660  
##  Max.   :1704                                  Max.   :71195649  
##  Total Days of Care Per 1,000 Original Medicare Part A Enrollees
##  Min.   :1684                                                   
##  1st Qu.:1732                                                   
##  Median :1788                                                   
##  Mean   :1793                                                   
##  3rd Qu.:1839                                                   
##  Max.   :1930                                                   
##  Total Days of Care Per Person With Utilization
##  Min.   : 9.770                                
##  1st Qu.: 9.975                                
##  Median :10.135                                
##  Mean   :10.057                                
##  3rd Qu.:10.168                                
##  Max.   :10.200                                
##  Total Days of Care Per Discharge Covered Days of Care
##  Min.   :5.850                    Min.   :60472778    
##  1st Qu.:5.952                    1st Qu.:61876016    
##  Median :6.010                    Median :63072730    
##  Mean   :5.983                    Mean   :63268769    
##  3rd Qu.:6.037                    3rd Qu.:64440226    
##  Max.   :6.050                    Max.   :66621205    
##  Covered Days of Care Per 1,000 Original Medicare Part A Enrollees
##  Min.   :1577                                                     
##  1st Qu.:1622                                                     
##  Median :1678                                                     
##  Mean   :1681                                                     
##  3rd Qu.:1726                                                     
##  Max.   :1806                                                     
##  Covered Days of Care Per Person With Utilization
##  Min.   :9.150                                   
##  1st Qu.:9.345                                   
##  Median :9.510                                   
##  Mean   :9.428                                   
##  3rd Qu.:9.547                                   
##  Max.   :9.550                                   
##  Covered Days of Care Per Discharge Total Program Payments
##  Min.   :5.480                      Length:6              
##  1st Qu.:5.570                      Class :character      
##  Median :5.625                      Mode  :character      
##  Mean   :5.607                                            
##  3rd Qu.:5.665                                            
##  Max.   :5.680                                            
##  Program Payments Per Original Medicare Part A Enrollee
##  Length:6                                              
##  Class :character                                      
##  Mode  :character                                      
##                                                        
##                                                        
##                                                        
##  Program Payments Per Person With Utilization Program Payments Per Discharge
##  Length:6                                     Length:6                      
##  Class :character                             Class :character              
##  Mode  :character                             Mode  :character              
##                                                                             
##                                                                             
##                                                                             
##  Program Payments Per Covered Day Persons With Coinsurance 1
##  Length:6                         Min.   : 97580            
##  Class :character                 1st Qu.:102140            
##  Mode  :character                 Median :104588            
##                                   Mean   :105017            
##                                   3rd Qu.:107821            
##                                   Max.   :113125            
##  Total Coinsurance Days Coinsurance Days Per Person With Coinsurance 1
##  Min.   :1670386        Min.   :17.07                                 
##  1st Qu.:1747762        1st Qu.:17.09                                 
##  Median :1791928        Median :17.11                                 
##  Mean   :1796624        Mean   :17.11                                 
##  3rd Qu.:1844343        3rd Qu.:17.12                                 
##  Max.   :1930647        Max.   :17.14                                 
##  Total Coinsurance Payments Coinsurance Payments Per Person With Coinsurance 1
##  Length:6                   Length:6                                          
##  Class :character           Class :character                                  
##  Mode  :character           Mode  :character                                  
##                                                                               
##                                                                               
##                                                                               
##  Total Deductible Payments Total Lifetime Reserve Days
##  Length:6                  Min.   :791037             
##  Class :character          1st Qu.:826727             
##  Mode  :character          Median :835442             
##                            Mean   :839690             
##                            3rd Qu.:856080             
##                            Max.   :889436             
##  Persons with Lifetime Reserve Days 2 Discharged Dead 
##  Min.   :29419                        Min.   :340392  
##  1st Qu.:30799                        1st Qu.:348389  
##  Median :31212                        Median :357703  
##  Mean   :31436                        Mean   :358402  
##  3rd Qu.:32168                        3rd Qu.:367561  
##  Max.   :33626                        Max.   :378483

Make the variable names lower case and remove spaces and commas.

names(cmspayments) <- tolower(names(cmspayments)) #makes the column headings lowercase
names(cmspayments) <- gsub(" ","",names(cmspayments)) #removes the space in the headings, if applicable.
names(cmspayments) <- gsub(",","",names(cmspayments)) #removes the commas in the headings, if applicable.
head(cmspayments)

## # A tibble: 6 x 28
##   typeofentitleme… totaloriginalme… totalpersonswit… totaldischarges
##              <dbl>            <dbl>            <dbl>           <dbl>
## 1             2012         36894884          6977914        11888884
## 2             2013         37297881          6796868        11438772
## 3             2014         37476843          6628258        11137418
## 4             2015         37707886          6630259        11120130
## 5             2016         38288197          6619277        11064063
## 6             2017         38347556          6608549        11043946
## # … with 24 more variables:
## #   dischargesper1000originalmedicarepartaenrollees <dbl>,
## #   dischargesper1000personswithutilization <dbl>, totaldaysofcare <dbl>,
## #   totaldaysofcareper1000originalmedicarepartaenrollees <dbl>,
## #   totaldaysofcareperpersonwithutilization <dbl>,
## #   totaldaysofcareperdischarge <dbl>, covereddaysofcare <dbl>,
## #   covereddaysofcareper1000originalmedicarepartaenrollees <dbl>,
## #   covereddaysofcareperpersonwithutilization <dbl>,
## #   covereddaysofcareperdischarge <dbl>, totalprogrampayments <chr>,
## #   programpaymentsperoriginalmedicarepartaenrollee <chr>,
## #   programpaymentsperpersonwithutilization <chr>,
## #   programpaymentsperdischarge <chr>, programpaymentspercoveredday <chr>,
## #   personswithcoinsurance1 <dbl>, totalcoinsurancedays <dbl>,
## #   coinsurancedaysperpersonwithcoinsurance1 <dbl>,
## #   totalcoinsurancepayments <chr>,
## #   coinsurancepaymentsperpersonwithcoinsurance1 <chr>,
## #   totaldeductiblepayments <chr>, totallifetimereservedays <dbl>,
## #   personswithlifetimereservedays2 <dbl>, dischargeddead <dbl>

Stats on CMS hospital stays program payments.

cmspayments$totalprogrampayments<-str_remove_all(cmspayments$totalprogrampayments,"[$,]")  #cmspayments[,15,drop=TRUE],"[$,]")
cmspayments$totalprogrampayments <- as.numeric(cmspayments$totalprogrampayments)
cmspayments$totalprogrampayments <- round(cmspayments$totalprogrampayments/1000000000,digits=1)
t <- c("top")
#(round(bpcimodels[,2,drop=TRUE],digits=1)

#hovering <- plot_ly(data=cmspayments,tooltip="Total Program Payments: $")

plotcmspayments <- ggplot(cmspayments,aes(x=typeofentitlementandcalendaryear, y=totalprogrampayments)) +
  geom_line(color="steelblue") +
  geom_point() +
  labs(title="Total Medicare Hospital Stays Payments by Calendar Year", y="in Billions", x="") +
  theme(legend.title=element_blank()) + #removes the title of the legend
  theme_gray(base_size=16)#+
  #text=paste("Year",cmspayments$typeofentitlementandcalendaryear,"<br>Medicare Program Payments (in Billions) $", cmspayments$totalprogrampayments)
ggplotly(plotcmspayments)

I tried to change and format the labels for hovering but I was unsuccessful. I tried assigning the text to “hovering” and I also tried to use a text and paste but neither would work properly.

cmspayments$totaloriginalmedicarepartaenrollees <- round(cmspayments$totaloriginalmedicarepartaenrollees/1000000,digits=3)

plotcmspayments2 <- ggplot(cmspayments,aes(x=typeofentitlementandcalendaryear, y=totaloriginalmedicarepartaenrollees)) +
  geom_line(color="orange") +
  geom_point() +
  labs(title="Total Medicare Part A Enrollees by Calendar Year", y="in Millions", x="") +
  theme(legend.title=element_blank()) + #removes the title of the legend
  theme_gray(base_size=16)#+
  #text=paste("Year",cmspayments$typeofentitlementandcalendaryear,"<br>Medicare Program Payments (in Billions) $", cmspayments$totalprogrampayments)
ggplotly(plotcmspayments2)

payments <- data.frame(x=cmspayments$typeofentitlementandcalendaryear, y=cmspayments$totalprogrampayments)
enrollees <- data.frame(x=cmspayments$typeofentitlementandcalendaryear,y=cmspayments$totaloriginalmedicarepartaenrollees)
payments$z <- "Medicare Hospital Stays Payments"
enrollees$z <- "Medicare Part A Enrollees"
paymentenrollees <- rbind(payments,enrollees)
paymentenrollees

##       x       y                                z
## 1  2012 128.800 Medicare Hospital Stays Payments
## 2  2013 129.000 Medicare Hospital Stays Payments
## 3  2014 128.700 Medicare Hospital Stays Payments
## 4  2015 130.200 Medicare Hospital Stays Payments
## 5  2016 133.400 Medicare Hospital Stays Payments
## 6  2017 135.300 Medicare Hospital Stays Payments
## 7  2012  36.895        Medicare Part A Enrollees
## 8  2013  37.298        Medicare Part A Enrollees
## 9  2014  37.477        Medicare Part A Enrollees
## 10 2015  37.708        Medicare Part A Enrollees
## 11 2016  38.288        Medicare Part A Enrollees
## 12 2017  38.348        Medicare Part A Enrollees

I attempted to plot the 2 of these together - the total program spending for hospital stays with the number of enrollees but I was unsuccessful.

#plotcmspaymentenrollees <- ggplot(paymentenrollees,aes(x=x,y=y,group=z,color=z)) +
#  geom_path() +
#  geom_point() +
#  scale_y_continuous(name="Medicare Hospital Stays Payments",sec.axis=sec_axis(,name="Medicare Part A Enrollees"))
#plotcmspaymentenrollees

I could not have sorted through this project without the help of my sons who are both good at math, neither had knowledge of R but one has programmed in Python and Matlab. My next order of business is to invest time in programming courses, especially for Python, so that I can be prepared for next semester’s Data 201 class. :)

Data 110 Final Project CMS Innovation Center Participation

Jennifer Paraboschi

December 16, 2019