#getwd()
Center for Medicare and Medicaid Innovation (The Innovation Center) dataset
https://data.cms.gov/Special-Programs-Initiatives-Speed-Adoption-of-Bes/CMS-Innovation-Center-Model-Participants/x8pc-u7ta
Medicare is a federal health insurance program primarily for people ages 65 and older, for younger people with disabilities, or for people with end stage renal disease (ESRD). Over 60 million people use Medicare each year, and Medicare spending accounted for 15% of total federal spending in 2018, amounting to $731 billion 1. The image below from the Kaiser Family Foundation (KFF) indicates Medicare’s portion of the federal budget for 2018.
Medicare2018.
CMS is the single largest payer for health care in the United States 2 and the leadership of CMS takes this role in overall health care services seriously. CMS influences the use of best practices, prevention initiatives, treatment choices, overall harm-free care, and patient follow-up across providers and insurance payers because of its major role in the industry. The decisions at CMS impact each person’s health care experience, whether they are aware of this influence or even whether they utilize Medicare for their health insurance or not.
For part of my job, I support one of the CMS Innovation Center’s voluntary payment model tests called Bundled Payments for Care Improvement (BPCI) Advanced. The Innovation Center tests payment and care delivery models like this to evaluate incentives for improving quality care and lowering healthcare costs delivered to Medicare beneficiaries. In this project, I explore the participants (i.e., providers, hospitals) in Innovation Center models, especially participants in BPCI models. For the BPCI models, the participants are acute care hospitals (ACHs) or physician group practices (PGPs). These participants choose which clinical episodes they want CMS to evaluate and pay them a bundled payment amount for. For example, a hospital may agree to participate in the congestive heart failure (CHF) clinical episode. This means that for a Medicare beneficiary who triggers a CHF episode at this ACH, CMS will pay the hospital a set dollar amount to care for that beneficiary within a 90-day window. If the hospital provides CHF care efficiently, reports on completing activities, provides services, with positive outcomes, they can do this for less cost than what CMS pays, thereby making a profit. If the hospital provides CHF care poorly, reports that they did not complete certain activities, did not provide services, and the beneficiary had negative outcomes, CMS reduces the payment so the hospital is penalized or loses money. This is a simplified explanation of a very complex payment system.
For my analysis, I begin by looking at the number of clinical episodes that participants select within each of the BPCI models. There are additional resources and information available at CMS’ bundled payments website shown here: https://innovation.cms.gov/initiatives/Bundled-Payments/
I will use functions from the dplyr package like filter(), group_by(), and summarize() to manipulate the data.
#install.packages("tidyverse")
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## <U+2713> ggplot2 3.2.1 <U+2713> purrr 0.3.3
## <U+2713> tibble 2.1.3 <U+2713> dplyr 0.8.3
## <U+2713> tidyr 1.0.0 <U+2713> stringr 1.4.0
## <U+2713> readr 1.3.1 <U+2713> forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(ggplot2)
allparticipants<-read_csv("CMSdata.csv") #this pulls my csv file from my working directory
## Parsed with column specification:
## cols(
## `Name of Initiative` = col_character(),
## `Organization Name` = col_character(),
## Notes = col_character(),
## `Location 1` = col_character(),
## `Street Address` = col_character(),
## City = col_character(),
## State = col_character(),
## `State Based` = col_logical(),
## `Phase 2` = col_character(),
## Facebook = col_character(),
## Twitter = col_character(),
## Youtube = col_character(),
## Website = col_character(),
## Category = col_character(),
## `Unique ID` = col_double()
## )
dim(allparticipants) #this gives the dimensions of the dataset, which is the number of rows and columns
## [1] 7886 15
This dataset has 7,886 records with 15 variables.
summary(allparticipants) #get a summary of the dataset
## Name of Initiative Organization Name Notes Location 1
## Length:7886 Length:7886 Length:7886 Length:7886
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Street Address City State State Based
## Length:7886 Length:7886 Length:7886 Mode :logical
## Class :character Class :character Class :character FALSE:315
## Mode :character Mode :character Mode :character TRUE :1
## NA's :7570
##
##
## Phase 2 Facebook Twitter Youtube
## Length:7886 Length:7886 Length:7886 Length:7886
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Website Category Unique ID
## Length:7886 Length:7886 Min. : 1
## Class :character Class :character 1st Qu.:1973
## Mode :character Mode :character Median :3970
## Mean :3976
## 3rd Qu.:5981
## Max. :7976
The variables are primarily categorical. R is seeing the Unique ID variable as a quantitative variable so it has calcuated the statistics for this field but this field is really just an identifier or categorical variable. I could adjust this variable but I do not plan to use it for this analysis.
The Notes column was a field of free text (a string) which had content for only a few of the initiatives. I cleaned up the Notes column using OpenRefine https://openrefine.org/ which is an open source application for data wrangling. I extracted the number of clinical episodes each participant selected in the BPCI initiatives into a separate column that I named “Number of Clinical Episodes”.
I used OpenRefine to:
Pull the refined dataset into R, overwriting the previous dataset called allparticipants.
allparticipants <- read_csv("CMSdata2.csv")
## Parsed with column specification:
## cols(
## Initiative = col_character(),
## `Organization Name` = col_character(),
## `Number of Clinical Episodes` = col_double(),
## `Location 1` = col_character(),
## `Street Address` = col_character(),
## City = col_character(),
## State = col_character(),
## `State Based` = col_logical(),
## `Phase 2` = col_logical(),
## Facebook = col_character(),
## Twitter = col_character(),
## Youtube = col_character(),
## Website = col_character(),
## Category = col_character(),
## `Unique ID` = col_double()
## )
## Warning: 979 parsing failures.
## row col expected actual file
## 1889 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity 'CMSdata2.csv'
## 1890 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity 'CMSdata2.csv'
## 1891 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity 'CMSdata2.csv'
## 1892 Phase 2 1/0/T/F/TRUE/FALSE Major bowel procedure 'CMSdata2.csv'
## 1893 Phase 2 1/0/T/F/TRUE/FALSE Chronic obstructive pulmonary disease, bronchitis, asthma 'CMSdata2.csv'
## .... ....... .................. ......................................................... ..............
## See problems(...) for more details.
R is returning a warning note for 979 parsing failures in the column called “Phase 2”. These rows are the only records with values (strings) in this field. R is trying to read this column as a logical value. I will use the dplyr “mutate” command with as.character to convert these values to characters to dismiss this problem.
allparticipants <- allparticipants %>%
mutate_if(is.logical, as.character) #this changes all the columns/fields in the dataframe from logical (if applicable) to character, which meets my needs for this dataset.
names(allparticipants) <- tolower(names(allparticipants)) #makes the column headings lowercase
names(allparticipants) <- gsub(" ","",names(allparticipants)) #removes the space in the headings, if applicable.
head(allparticipants)
## # A tibble: 6 x 15
## initiative organizationname numberofclinica… location1 streetaddress city
## <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 Comprehen… Brown Medicine … NA 375 Wamp… 375 Wampanoa… East…
## 2 Comprehen… Brown Medicine … NA 375 Wamp… 375 Wampanoa… East…
## 3 Comprehen… Brown Medicine … NA 375 Wamp… 375 Wampanoa… East…
## 4 Comprehen… Brown Medicine … NA 909 Nort… 909 North Ma… Prov…
## 5 Comprehen… Coastal Medical… NA 11445 Wa… 11445 Wampan… East…
## 6 Comprehen… LPG Primary Car… NA 50 Memor… 50 Memorial … Newp…
## # … with 9 more variables: state <chr>, statebased <chr>, phase2 <chr>,
## # facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## # category <chr>, uniqueid <dbl>
Look at the summary of the dataset again.
summary(allparticipants)
## initiative organizationname numberofclinicalepisodes
## Length:7886 Length:7886 Min. : 1.000
## Class :character Class :character 1st Qu.: 2.000
## Mode :character Mode :character Median : 4.000
## Mean : 7.023
## 3rd Qu.: 9.000
## Max. :47.000
## NA's :5612
## location1 streetaddress city state
## Length:7886 Length:7886 Length:7886 Length:7886
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## statebased phase2 facebook twitter
## Length:7886 Length:7886 Length:7886 Length:7886
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## youtube website category uniqueid
## Length:7886 Length:7886 Length:7886 Min. : 1
## Class :character Class :character Class :character 1st Qu.:1973
## Mode :character Mode :character Mode :character Median :3970
## Mean :3976
## 3rd Qu.:5981
## Max. :7976
##
Now I have a quantitative variable for the number of clinical episodes column. One item I noticed was the number of clinical episodes selected by participants. But first, a little more background on the models and the episodes. The first BPCI model (Model 1) included all Medicare Severity Diagnosis Related Groups (MS-DRGs). I did not focus my analysis on this model. The BPCI Models 2-4 which began in 2013 included up to 48 clinical episodes while the most recent iteration, BPCI Advanced, has 31 clinical episodes. These include inpatient episodes for things like congestive heart failure, major joint replacement of the lower extremity, pacemaker, and stroke, and outpatient episodes for percutaneous coronary intervention, cardiac defibrillator, and back & neck spinal fusion. Additional information about the BPCI models is available on CMS websites such as https://innovation.cms.gov/initiatives/Bundled-Payments/ and https://innovation.cms.gov/initiatives/bpci-advanced .
The reason I find the number of episodes selected interesting is that the max value in this dataset (47) means that at least 1 hospital told CMS they would be held financiall y accountable for 47 of the available 48 clinical episodes. This is somewhat surprising since hospitals often specialize or focus in some areas more than others. A hospital selecting these 47 episodes would indicate that they were confident in their overall efficiency of care, the quality of care, positive outcomes for patients, and cost savings across their facility and network of providers. This is one potential area I could investigate more.
#this chunk says to group the df by the initiative column then
#take the variable called numberofclinicalepisodes and apply the function mean and remove the nas when calculating
#then I filtered out the nas so that I only had the initiatives with clinical episodes selected.
#I assigned that summarization to something called bpcimodels.
bpcimodels <- summarise_at(group_by(allparticipants, initiative),vars(numberofclinicalepisodes),funs(mean(.,na.rm=TRUE))) %>%
filter(!is.na(numberofclinicalepisodes)) %>% #filter here says to take all rows that do not have an 'na' value in the numberofclinicalepisodes variable.
mutate_if(is.numeric,round, digits=1) #this rounds the mean of episodes to 1 decimal place.
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
#that leaves me with the 4 BPCI models with their mean number of clinical episodes selected.
#This attempt made a separate variable called roundedCE. Later I had trouble integrating this into another plot so I used the mutate function above instead of this part.
#bpcimodels <- bpcimodels %>%
# (round(bpcimodels[,2,drop=TRUE],digits=1))
#roundedCE <- round(bpcimodels[,2,drop=TRUE],digits=1) #this rounds off the digits to 1 decimal for my next plot
bpcimodels #this displays the df in a tibble
## # A tibble: 4 x 2
## initiative numberofclinicalepisodes
## <chr> <dbl>
## 1 BPCI Advanced 6
## 2 BPCI Initiative: Model 2 5.4
## 3 BPCI Initiative: Model 3 10.5
## 4 BPCI Initiative: Model 4 6.5
This tibble shows the average number of clinical episodes chosen for each of the four BPCI initiatives, rounded to 1 decimal place.
I tried pulling in a color scheme to use to try to keep my variable colors consistent throughout but could not get this part to work for me here. Later, I identified the color codes and manually assigned the colors to the appropriate initiative. I also reordered the initiatives so they would display in chronological order.
#install.packages("plotly")
#install.packages("ggthemes")
#install.packages("RColorBrewer")
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(ggthemes)
#library(RColorBrewer)
#mycolors <- brewer.pal(5, "Set2")
#names(mycolors) <- levels(bpcimodels$initiative)
#colScale <- scale_color_manual(name="initiative", values=mycolors)
bpcimodels$initiative <- factor(bpcimodels$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order
bpciepisodes <- ggplot(bpcimodels,aes(x=initiative,y=numberofclinicalepisodes)) + #, fill=initiative)) +
geom_bar(stat="identity") +
aes(fill=initiative) +
theme_classic() +
labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative") +
theme(legend.title=element_blank()) + #removes the title of the legend
scale_fill_manual(values=c("#F8766D","#7CAE00","#00BFC4","#C77CFF")) #this makes the colors in the order I wanted them.
ggplotly(bpciepisodes) #using ggplotly makes the labels appear when you hover
#In this section I was trying to adjust the colors but ended up not doing it this way.
#bpciepisodes <- bpcimodels %>% #this uses the small dataframe of just the bpci models
#ggplot(aes(x=initiative, y=roundedCE,fill=initiative)) + #this uses the rounded off number of clinical episodes
#geom_bar(stat="identity") +
#theme_stata() + #I used a different theme
#labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative") #+
# colScale #this is where the colors did not work
#ggplotly(bpciepisodes) #using ggplotly makes the labels appear when you hover
#install.packages("ggthemes")
#install.packages("ggrepel")
library(ggthemes)
library(ggrepel)
bpciepisodes <- ggplot(bpcimodels, aes(initiative, numberofclinicalepisodes, label=initiative, fill=initiative)) +
geom_text_repel(nudge_x = 0.005) + #this nudges the x axis labels a bit
geom_bar(stat="identity") +
theme_stata() + #I tried a different theme here
labs(x="Initiative", y="Average Number of Clinical Episodes Chosen", title="Average Number of Clinical Episodes within each Initiative")
ggplotly(bpciepisodes)
## Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
## If you'd like to see this geom implemented,
## Please open an issue with your example code at
## https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
## If you'd like to see this geom implemented,
## Please open an issue with your example code at
## https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
## If you'd like to see this geom implemented,
## Please open an issue with your example code at
## https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
## If you'd like to see this geom implemented,
## Please open an issue with your example code at
## https://github.com/ropensci/plotly/issues
#the geom_text_repel command looks for a label so I added that into the aesthetics. However, this also added another duplication of the hovering label.
#install.packages("ggthemes")
#install.packages("ggrepel")
library(ggthemes)
library(ggrepel)
bpciepisodes <- ggplot(bpcimodels, aes(initiative, numberofclinicalepisodes, fill=initiative)) +
#aes(fill=initiative) +
geom_bar(stat="identity") +
theme(legend.position="none") + #removes the legend completely
#theme_stata() + #I had to remove the theme because I could not rotate the y labels when this was on
labs(x="", y="",title="Average Number of Clinical Episodes within each Initiative")
ggplotly(bpciepisodes)
This view is better, although the hovering labels are still duplicated. However, the y axis labels are now rotated for easier viewing.
#install.packages("rcompanion") # this has the groupwiseMean command which I use to get the statistics across each of the BPCI participants' selection of clinical episodes.
library(rcompanion)
bpciparticipants <- allparticipants %>%
filter(!is.na(numberofclinicalepisodes))
CEmean <- groupwiseMean(data=bpciparticipants, var="numberofclinicalepisodes", group="initiative",conf=0.95,digits=3)
CEmean #this is now the mean of the number of clinical episodes for each of the 4 initiatives.
## initiative n Mean Conf.level Trad.lower Trad.upper
## 1 BPCI Advanced 1295 5.95 0.95 5.63 6.27
## 2 BPCI Initiative: Model 2 402 5.43 0.95 4.83 6.03
## 3 BPCI Initiative: Model 3 575 10.50 0.95 9.65 11.40
## 4 BPCI Initiative: Model 4 2 6.50 0.95 -63.40 76.40
summary(CEmean)
## initiative n Mean Conf.level
## Length:4 Min. : 2.0 Min. : 5.430 Min. :0.95
## Class :character 1st Qu.: 302.0 1st Qu.: 5.820 1st Qu.:0.95
## Mode :character Median : 488.5 Median : 6.225 Median :0.95
## Mean : 568.5 Mean : 7.095 Mean :0.95
## 3rd Qu.: 755.0 3rd Qu.: 7.500 3rd Qu.:0.95
## Max. :1295.0 Max. :10.500 Max. :0.95
## Trad.lower Trad.upper
## Min. :-63.400 Min. : 6.030
## 1st Qu.:-12.227 1st Qu.: 6.210
## Median : 5.230 Median : 8.835
## Mean :-10.822 Mean :25.025
## 3rd Qu.: 6.635 3rd Qu.:27.650
## Max. : 9.650 Max. :76.400
So this gives the Mean of the number of clinical episodes with the quartiles.
A standard error is for a sample and standard deviation is for an entire population. I have the whole population of participants in the models. I can plot the standard error calculated above.
CEmean$initiative <- factor(CEmean$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order
CEmeanplot <- ggplot(CEmean,aes(x=initiative, y=Mean, fill=initiative)) +
geom_bar(stat="identity") +
geom_errorbar(aes(ymin=Trad.lower, ymax=Trad.upper,width=0.15))
CEmeanplot #this plots the means of the numberofclinicalepisodes variable grouped by initiative
The BPCI Model 4 was only selected by 2 participant hospitals: 1 hospital reported 1 clinical episode and the other hospital reported on 12 clinical episodes making for a mean of 6.50 for the numberofclinicalepisodes in this initiative.
#install.packages(data.table) # I tried an outlier.replace version first which did not work well for me.
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
#Sum %>% outlier.replace(Sum,"numberofclinicalepisodes", which(numberofclinicalepisodes < 3), newValue=NA)
#Sum %>% filter(!is.na(numberofclinicalepisodes))
statson3models <- CEmean %>% filter(n >= 3) #this filters out model #4 so the plot is easier to read. the filter is n greater than or equal to 3 which works since there were only 2 participating hospitals.
statson3models <- ggplot(statson3models,aes(x=initiative, y=Mean)) +
geom_bar(stat="identity", aes(fill=initiative)) +
geom_errorbar(aes(ymin=Trad.lower, ymax=Trad.upper,width=0.15)) +
theme(legend.title=element_blank()) + #removes the title of the legend
scale_fill_manual(values=c("#F8766D","#7CAE00","#C77CFF")) #this makes the colors I want in the order I want them.
ggplotly(statson3models)
Instead, I will move on to a box and whisker plot and look at some outliers.
boxyplot1 <- allparticipants %>%
ggplot() +
geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), notch=TRUE) #the notch puts indents at the mean.
boxyplot1
## Warning: Removed 5612 rows containing non-finite values (stat_boxplot).
## notch went outside hinges. Try setting notch=FALSE.
boxyplot2 <- allparticipants %>%
ggplot() +
geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), outlier.colour="red", outlier.shape=6, outlier.size=2) +
theme(axis.text.x=element_blank())
#these set the color, shape, and size of the indicator for each outlier and I removed the values on the x axis
boxyplot2
## Warning: Removed 5612 rows containing non-finite values (stat_boxplot).
ggplot_build(boxyplot2)$data # this is telling me the color codes for the 4 initiative models
## Warning: Removed 5612 rows containing non-finite values (stat_boxplot).
## [[1]]
## fill ymin lower middle upper ymax
## 1 #F8766D 1 2.00 4.0 8.00 17
## 2 #7CAE00 1 1.00 3.0 8.00 18
## 3 #00BFC4 1 2.00 7.0 15.00 33
## 4 #C77CFF 1 3.75 6.5 9.25 12
## outliers
## 1 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 22, 22, 22, 23, 23, 24, 24, 25, 26, 26, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 32, 32, 32, 32
## 2 19, 20, 20, 21, 21, 22, 22, 23, 24, 25, 26, 26, 28, 29, 30, 31, 31, 32
## 3 36, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 40, 41, 41, 42, 43, 43, 43, 45, 47, 47
## 4
## notchupper notchlower x PANEL group ymin_final ymax_final xmin
## 1 4.263435 3.7365650 -0.28125 1 4 1 32 -0.365625
## 2 3.551623 2.4483773 -0.09375 1 5 1 32 -0.178125
## 3 7.856577 6.1434228 0.09375 1 6 1 47 0.009375
## 4 12.644758 0.3552421 0.28125 1 7 1 12 0.196875
## xmax weight colour size alpha shape linetype
## 1 -0.196875 1 grey20 0.5 NA 19 solid
## 2 -0.009375 1 grey20 0.5 NA 19 solid
## 3 0.178125 1 grey20 0.5 NA 19 solid
## 4 0.365625 1 grey20 0.5 NA 19 solid
The initiatives are listed in the wrong order (so mixed up in the color scheme). And I need to remove the legend label.
allparticipants$initiative <- factor(allparticipants$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order
boxyplot3 <- allparticipants %>%
ggplot() +
geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), outlier.colour="red", outlier.shape=6, outlier.size=2) +
labs(fill="") + #this removes the label for the legend
theme(axis.text.x=element_blank()) #this removes the axis labels
boxyplot3
## Warning: Removed 5612 rows containing non-finite values (stat_boxplot).
ggplot_build(boxyplot3)$data # this is telling me the color codes for the 4 initiative models
## Warning: Removed 5612 rows containing non-finite values (stat_boxplot).
## [[1]]
## fill ymin lower middle upper ymax
## 1 #F8766D 1 1.00 3.0 8.00 18
## 2 #7CAE00 1 2.00 7.0 15.00 33
## 3 #00BFC4 1 3.75 6.5 9.25 12
## 4 #C77CFF 1 2.00 4.0 8.00 17
## outliers
## 1 19, 20, 20, 21, 21, 22, 22, 23, 24, 25, 26, 26, 28, 29, 30, 31, 31, 32
## 2 36, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 40, 41, 41, 42, 43, 43, 43, 45, 47, 47
## 3
## 4 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 22, 22, 22, 23, 23, 24, 24, 25, 26, 26, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 32, 32, 32, 32
## notchupper notchlower x PANEL group ymin_final ymax_final xmin
## 1 3.551623 2.4483773 -0.28125 1 1 1 32 -0.365625
## 2 7.856577 6.1434228 -0.09375 1 2 1 47 -0.178125
## 3 12.644758 0.3552421 0.09375 1 3 1 12 0.009375
## 4 4.263435 3.7365650 0.28125 1 4 1 32 0.196875
## xmax weight colour size alpha shape linetype
## 1 -0.196875 1 grey20 0.5 NA 19 solid
## 2 -0.009375 1 grey20 0.5 NA 19 solid
## 3 0.178125 1 grey20 0.5 NA 19 solid
## 4 0.365625 1 grey20 0.5 NA 19 solid
boxyplot4 <- allparticipants %>%
filter(initiative == "BPCI Initiative: Model 3") %>% #this filter chooses only Model 3
ggplot() +
geom_boxplot(aes(y=numberofclinicalepisodes, fill=initiative), outlier.colour="red", outlier.shape=5, outlier.size=1) +
labs(fill="") + #this removes the label for the legend - another way to do this
scale_fill_manual(values=c("#7CAE00")) + #this is assigning the specific color for Model 3 so it matches model 3 plots above
theme(axis.text.x=element_blank())
boxyplot4
ggplot_build(boxyplot4)$data #this tells me the color code
## [[1]]
## fill ymin lower middle upper ymax
## 1 #7CAE00 1 2 7 15 33
## outliers
## 1 36, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 40, 41, 41, 42, 43, 43, 43, 45, 47, 47
## notchupper notchlower x PANEL group ymin_final ymax_final xmin xmax xid
## 1 7.856577 6.143423 0 1 1 1 47 -0.375 0.375 1
## newx new_width weight colour size alpha shape linetype
## 1 0 0.75 1 grey20 0.5 NA 19 solid
I learned if I use plotly on these plots then the coding for the outliers does not display - they stay as black points. The plot above shows that BPCI Model 3 had an average of about 10.5 clinical episodes
highepisodes <- allparticipants %>%
filter(initiative == "BPCI Initiative: Model 3") %>%
filter(numberofclinicalepisodes>40) %>%
arrange(desc(numberofclinicalepisodes)) #this sorts the number of clinical episodes selected from highest to lowest
highepisodes
## # A tibble: 9 x 15
## initiative organizationname numberofclinica… location1 streetaddress city
## <fct> <chr> <dbl> <chr> <chr> <chr>
## 1 BPCI Init… Bayley 47 990 Bayl… 990 Bayley P… Cinc…
## 2 BPCI Init… Marjorie P. Lee… 47 3550 Sha… 3550 Shaw Ave Cinc…
## 3 BPCI Init… United Church H… 45 215 Seth… 215 Seth Ave Jack…
## 4 BPCI Init… United Church H… 43 3800 Boa… 3800 Boardwa… Sand…
## 5 BPCI Init… United Church H… 43 3218 Ind… 3218 Indian … Beav…
## 6 BPCI Init… Gardner Heights… 43 172 Rock… 172 Rocky Re… Shel…
## 7 BPCI Init… United Church H… 42 12200 St… 12200 Straus… Cana…
## 8 BPCI Init… Friendship Vill… 41 6000 Riv… 6000 Riversi… Dubl…
## 9 BPCI Init… Twin Lakes - Li… 41 9840 Mon… 9840 Montgom… Cinc…
## # … with 9 more variables: state <chr>, statebased <chr>, phase2 <chr>,
## # facebook <chr>, twitter <chr>, youtube <chr>, website <chr>,
## # category <chr>, uniqueid <dbl>
This filtering shows that 9 of the BPCI Model 3 organizations chose to report on more than 40 clinical episodes. Further research reveals that 8 of them are located in Ohio and 1 is located in Connecticut.
plots2models <- bpciparticipants %>%
filter(initiative != "BPCI Initiative: Model 2") %>%
filter(initiative != "BPCI Initiative: Model 4")
plots2models$initiative <- factor(plots2models$initiative, levels=c("BPCI Initiative: Model 3", "BPCI Advanced")) #this lists them in the correct order
inmanymodels <- ggplot(plots2models) +
geom_bar(aes(x=initiative)) +
labs(title= "Number of Participants", x = "Initiative", y="") +
scale_fill_manual(values=c("#7CAE00","C77CFF"))
ggplotly(inmanymodels)
I like this as a clean chart but I could add colors and better labels to make it more appealing. This plot shows the change in number of participants from Models 2, 3, 4 and then Advanced. There are many items to investigate here: what changed in Model 4 that there were only 2 participants? Was it meant to be such a small trial? What changes were made to make such an increase in the number of participants for the Advanced model over Models 2 and 3?
bpciparticipants$initiative <- factor(bpciparticipants$initiative, levels=c("BPCI Initiative: Model 2", "BPCI Initiative: Model 3", "BPCI Initiative: Model 4", "BPCI Advanced")) #this lists them in the correct order
modelsepisodescounts <- ggplot(bpciparticipants) +
geom_bar(aes(x=numberofclinicalepisodes, fill=initiative), position="dodge") +
labs(x= "Number of Clinical Episodes", y= "Number of Participants") +
labs(title= "BPCI Models: Participation and Number of Episodes", fill= "Initiative") +
theme_classic()
ggplotly(modelsepisodescounts)
This plot shows many things such as how the vast majority of participants in the Advanced model chose only about 1 clinical episode. It is also interesting that there were 4 participants who chose 31 (the max) clinical episodes. These are things that could be researched further.
With more time and effort I would like to improve on this visualization. I would like to fix the hovering labels, round off the number of episodes in the hovering labels and see if I could improve on the visualization to make it more meaningful. The bars are a bit narrow and difficult to view but I do really like the ability to select and deselect from the legend.
#install.packages("devtools")
library(devtools)
## Loading required package: usethis
modelsepisodescounts2 <- ggplot(bpciparticipants) +
#axis.title.y=element_text(angle=90,margin=margin(r=0.8*half_line,l=0.9*half_line/2)) +
theme(axis.text.y.left = element_text(margin = unit(c(1, 1, 1, 5), "mm"))) +
geom_bar(aes(x=numberofclinicalepisodes, fill=initiative), position="dodge") +
labs(x= "Number of Clinical Episodes", y= "Number of Participants") +
labs(title= "BPCI Models: Participation and Number of Episodes", fill= "Initiative") +
theme_classic() +
facet_wrap(~initiative)
modelsepisodescounts2 <- modelsepisodescounts2 +
theme(legend.position = "none")
ggplotly(modelsepisodescounts2)
bpciparticipants2 <- bpciparticipants[bpciparticipants$initiative != "BPCI Initiative: Model 2",]
bpciparticipants2 <- bpciparticipants2[bpciparticipants2$initiative != "BPCI Initiative: Model 4",]
modelsepisodescounts3 <- ggplot(bpciparticipants2, aes(x=numberofclinicalepisodes)) +
#geom_bar(aes(x=numberofclinicalepisodes, fill=initiative), position="dodge") +
geom_bar(aes(fill=initiative)) +
labs(x= "Number of Clinical Episodes", y= "Number of Participants") +
labs(title= "BPCI Models: Participation and Number of Episodes", fill= "Initiative") +
theme_classic() +
facet_wrap(~initiative)
modelsepisodescounts3 <- modelsepisodescounts3 +
theme(legend.position = "none")
ggplotly(modelsepisodescounts3)
byinitiative <- read_csv("CMSdata2.csv") #since I have altered my dataset so much, I am going back to my cleaned dataset.
## Parsed with column specification:
## cols(
## Initiative = col_character(),
## `Organization Name` = col_character(),
## `Number of Clinical Episodes` = col_double(),
## `Location 1` = col_character(),
## `Street Address` = col_character(),
## City = col_character(),
## State = col_character(),
## `State Based` = col_logical(),
## `Phase 2` = col_logical(),
## Facebook = col_character(),
## Twitter = col_character(),
## Youtube = col_character(),
## Website = col_character(),
## Category = col_character(),
## `Unique ID` = col_double()
## )
## Warning: 979 parsing failures.
## row col expected actual file
## 1889 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity 'CMSdata2.csv'
## 1890 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity 'CMSdata2.csv'
## 1891 Phase 2 1/0/T/F/TRUE/FALSE Major joint replacement of the lower extremity 'CMSdata2.csv'
## 1892 Phase 2 1/0/T/F/TRUE/FALSE Major bowel procedure 'CMSdata2.csv'
## 1893 Phase 2 1/0/T/F/TRUE/FALSE Chronic obstructive pulmonary disease, bronchitis, asthma 'CMSdata2.csv'
## .... ....... .................. ......................................................... ..............
## See problems(...) for more details.
names(byinitiative) <- tolower(names(byinitiative)) #makes the column heading lowercase
names(byinitiative) <- gsub(" ","", names(byinitiative)) #removes spaces from headings, if applicable
head(subset(byinitiative, select = "initiative"))
## # A tibble: 6 x 1
## initiative
## <chr>
## 1 Comprehensive Primary Care Plus
## 2 Comprehensive Primary Care Plus
## 3 Comprehensive Primary Care Plus
## 4 Comprehensive Primary Care Plus
## 5 Comprehensive Primary Care Plus
## 6 Comprehensive Primary Care Plus
#install.packages("plyr")
#library(plyr)
#count(byinitiative, "initiative")
#plotinitiative <- byinitiative %>%
#group_by(initiative) +
#count(byinitiative, "initiative") %>%
# ggplot() +
# geom_bar(aes(x=initiative, y=count(initiative)))
#ggplotly(plotinitiative)
#initiativeplot <- byinitiative %>%
# ggplot()
#ggplotly(byinitiative)
compareintitiative
#install.packages("ggthemes")
#install.packages("ggrepel")
#library(ggthemes)
#library(ggrepel)
#bpciepisodes <- ggplot(bpcimodels,aes(initiative)) +
#geom_bar(stat="identity") +
#theme(legend.position="none") + #removes the legend completely
#theme_stata() + #I had to remove the theme because I could not rotate the y labels when this was on
#labs(x="", y="",title="Average Number of Clinical Episodes within each Initiative")
#bpciepisodes + geom_bar(stat="identity")