Medicare costs are increasing at 9% every year, outpacing private health insurance spending by 4.6%. Medicare, additionally, accounts for 15% of the United States federal budget and for 20% of health care expenditure. An understanding of where the money is being spent within Medicare is crucial for understanding how to reduce the overall healthcare bill in the United States.
The Centers for Medicare Services (CMS) prepares a dataset every year called the Provider Utilization and Payment Data Inpatient Public Use File (Inpatient PUF) to help improve transparency of the healthcare system.
The dataset discussed in this report includes information on services provided to those eligible for Medicare for the Fiscal Year 2014.
This document will define the items tracked within this PUF in Data Description section. Next, we will import the data and look at the integrity of the data and the types of observations for each data column. The Data Cleaning section will respond to the integrity and manipulate the data in a format that will make analysis simple.
This document will also investigate where Medicare charges are originating, analyzing providers, specific charges and the parts of the body that put the most burden on Medicare to shed some light on where preventative measures could be taken.
The column categories in this dataset are described below:
DRG.Definition: The DRG code describes the MS-DRG code for each line item. The MS-DRG (Medicare Severity Diagnosis Related Groups) coding system is used by Medicare to group, price and reimburse Medicare claims. Since this dataset concerns inpatients only, a MS-DRG is assigned to each patient. The MS-DRG ranges from 001-999 with many numbers unused to accommodate future codes. Some diagnoses have multiple MS-DRGs that include MCC, CC or no CC/MCC. MCC is a major complication, while CC is a less significant complication that occurs during the inpatient stay.
Example of MS-DRG with complication coding:
A DRG.Definition code is also grouped via a separate table as one of 25 Major Diagnostic Categories (MDC). Later, we will import a table that categorizes and join these tables together.
Examples of MDC categories:
Provider.ID: The Provider.ID is a CMS Certification Number (CCN) assigned to each facility. A Mayo Clinic in Rochester, MN will have a different Provider.ID than a Mayo Clinic in Jacksonville, FL.
Provider.Name: The Provider.Name is the name of the facility providing Medicare treatment.
Provider.Street.Address: The Provider.Address is the street address of the facility providing Medicare treatment.
Provider.City: The Provider.City is the city of the provider facility.
Provider.State: The Provider.State is the state of the provider facility.
Provider.Zip.Code: The Provider.Zip.Code is the Zip Code of the provider facility.
Hospital.Referral.Region: The Hospital.Referral.Region (HRRs) are geographic units of analysis based on facility location zip codes that were developed by the Dartmouth Atlas of Health Care to delineate regional health care markets in the United States.
Total.Discharges: The number of discharges billed by the provider for inpatient services.
Average.Covered.Charges: The average charge/discharge for service from the provider.
Average.Total.Payments: The average payment made to the provider including co-payment and deductible amounts that the patient pays as well as additional third-party payments for coordination of benefits.
Average.Medicare.Payments: The average payment that Medicare pays to the provider not including beneficiary co-payments/deductible nor any additional payments to third-parties for coordination of benefits.
The packages required for the code in this report are detailed in the code chunk below:
library(tidyverse) #tidyverse imports ggplot2 and dplyr (charting/manipulation)
library(stringr) #will be used to extract medical codes from categories
library(purrr) #the map function is used to customize functions and apply on df
library(rvest) #importing MDC categorization table
library(knitr) #table display
The data is available at the below url: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Inpatient.html
We will import the FY 2014 data as mentioned in the Synopsis. Since this is a zip file, we will assign a temp file and dump the zip file into this temp file. Using unz() we will open the 2014_CSV file and then unlink the temp file.
#let's import the zipfile from the cms.gov website by downloading to a temp file and unzipping the file.
url <- "https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Inpatient_Data_2014_CSV.zip"
temp <- tempfile()
download.file(url, temp)
df <- read.csv(unz(temp, "Medicare_Provider_Charge_Inpatient_DRGALL_FY2014.csv"))
unlink(temp)
#convert to a tibble for data verification purposes
df<- as_tibble(df)
Now let’s import the MDC codes using hte rvest package for scraping a table from Wikipedia. Some codes were missing from the table on Wikipedia and so we manually edit these after the import.
**link: https://en.wikipedia.org/wiki/Major_Diagnostic_Category**
url <- "https://en.wikipedia.org/wiki/Major_Diagnostic_Category"
mdc <- url %>%
html() %>%
html_nodes(xpath='//*[@id="mw-content-text"]/table') %>%
html_table()
mdc <- mdc[[1]]
Missing values: As this is a transparency-driven government generated summary table for payments, the expectation for data integrity is high. The user guide does not call out missing values and there are no null values in the dataset. Additionally, we explore each variable to ensure that the values make sense below.
sum(is.na(df))
## [1] 0
We use the summary() function to look at the different variable columns and check the ints for odd values. Of note is that we have a zip code with 4 digits and a provider ID of 5 digits. Let’s check these out.
summary(df)
## DRG.Definition
## 194 - SIMPLE PNEUMONIA & PLEURISY W CC : 2837
## 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC : 2837
## 292 - HEART FAILURE & SHOCK W CC : 2774
## 392 - ESOPHAGITIS, GASTROENT & MISC DIGEST DISORDERS W/O MCC : 2743
## 690 - KIDNEY & URINARY TRACT INFECTIONS W/O MCC : 2731
## 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC: 2702
## (Other) :186032
## Provider.Id Provider.Name
## Min. : 10001 GOOD SAMARITAN HOSPITAL : 554
## 1st Qu.:110082 BAPTIST MEDICAL CENTER : 460
## Median :240093 FLORIDA HOSPITAL : 413
## Mean :255323 NEW YORK-PRESBYTERIAN HOSPITAL: 395
## 3rd Qu.:380014 MERCY HOSPITAL : 393
## Max. :670088 SAINT FRANCIS MEDICAL CENTER : 374
## (Other) :200067
## Provider.Street.Address Provider.City
## 601 E ROLLINS ST : 413 CHICAGO : 1878
## 525 EAST 68TH STREET : 395 BALTIMORE : 1793
## 20 YORK ST : 330 NEW YORK : 1521
## ONE BARNES-JEWISH HOSPITAL PLAZA: 321 HOUSTON : 1477
## 4755 OGLETOWN-STANTON ROAD : 317 PHILADELPHIA: 1416
## 1216 SECOND STREET SOUTHWEST : 315 SPRINGFIELD : 1323
## (Other) :200565 (Other) :193248
## Provider.State Provider.Zip.Code
## CA : 15387 Min. : 1040
## FL : 15278 1st Qu.:25301
## TX : 14192 Median :43701
## NY : 11423 Mean :47029
## IL : 9620 3rd Qu.:71603
## PA : 9521 Max. :99801
## (Other):127235
## Hospital.Referral.Region..HRR..Description Total.Discharges
## CA - Los Angeles : 4206 Min. : 11.00
## MA - Boston : 4141 1st Qu.: 14.00
## PA - Philadelphia : 3397 Median : 22.00
## GA - Atlanta : 3218 Mean : 36.01
## TX - Houston : 3092 3rd Qu.: 39.00
## NY - East Long Island: 2950 Max. :3855.00
## (Other) :181652
## Average.Covered.Charges Average.Total.Payments Average.Medicare.Payments
## Min. : 1368 Min. : 2320 Min. : 1330
## 1st Qu.: 20949 1st Qu.: 6259 1st Qu.: 5012
## Median : 34634 Median : 9180 Median : 7715
## Mean : 52589 Mean : 13169 Mean : 11218
## 3rd Qu.: 60762 3rd Qu.: 14632 3rd Qu.: 12473
## Max. :2238699 Max. :434396 Max. :403454
##
df %>%
filter(Provider.Id <=100000) %>%
group_by(Provider.Name) %>%
summarise(n=n())
## # A tibble: 571 x 2
## Provider.Name n
## <fctr> <int>
## 1 ABRAZO ARROWHEAD CAMPUS 50
## 2 ABRAZO CENTRAL CAMPUS 57
## 3 ABRAZO MARYVALE CAMPUS 8
## 4 ABRAZO SCOTTSDALE CAMPUS 17
## 5 ABRAZO WEST CAMPUS 43
## 6 ADVENTIST MEDICAL CENTER 82
## 7 ADVENTIST MEDICAL CENTER - REEDLEY 2
## 8 AHMC ANAHEIM REGIONAL MEDICAL CENTER 56
## 9 ALAMEDA HOSPITAL 28
## 10 ALASKA NATIVE MEDICAL CENTER 29
## # ... with 561 more rows
It looks like some places do have only 4 digits in their zip codes! New England states. Let’s now look at the provider ID with 5 digits:
df %>%
filter(Provider.Zip.Code <=10000) %>%
group_by(Provider.Name) %>%
summarise(n=n())
## # A tibble: 202 x 2
## Provider.Name n
## <fctr> <int>
## 1 ADCARE HOSPITAL OF WORCESTER INC 4
## 2 ANNA JAQUES HOSPITAL 66
## 3 AROOSTOOK MEDICAL CENTER 28
## 4 ATLANTICARE REGIONAL MEDICAL CENTER - CITY DIV 202
## 5 BAYSHORE COMMUNITY HOSPITAL 79
## 6 BAYSTATE FRANKLIN MEDICAL CENTER 39
## 7 BAYSTATE MARY LANE HOSPITAL 15
## 8 BAYSTATE MEDICAL CENTER 220
## 9 BAYSTATE WING HOSPITAL AND MEDICAL CENTERS 29
## 10 BERGEN REGIONAL MEDICAL CENTER 11
## # ... with 192 more rows
There are 561 providers with 5 digit provider IDs so this seems normal.
For the MDC data table from Wikipedia, we find that some DRG codes are missing from the categorization menu. Additionally, a “pre-MDC” row is included, which is not needed for our data.
str(mdc)
## 'data.frame': 28 obs. of 3 variables:
## $ MDC : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Description: chr "Pre-MDC" "Diseases and Disorders of the Nervous System" "Diseases and Disorders of the Eye" "Diseases and Disorders of the Ear, Nose, Mouth And Throat" ...
## $ MS-DRG[1] : chr "" "020 - 103" "113 - 125" "129 - 159" ...
summarise(mdc)
## data frame with 0 columns and 0 rows
mdc <- mdc[-1,] #Removes pre-MDC row
mdc <- rename(mdc, DRG_codes = `MS-DRG[1]`) #Rename column to DRG_codes
mdc$DRG_codes[27] <- "981 - 997" #contained 999, which does not exist in dataset
mdc$DRG_codes[14] <- "765 - 782" #contained 998, which does not exist in dataset
mdc$DRG_codes[26] <- "001 - 017" #missing codes on Wikipedia
mdc$DRG_codes[9] <- "570 - 607" #missing codes on Wikipedia
Let’s look at unique values for each variable and explore the summary statistics for number data types.
#data type
str(df)
## Classes 'tbl_df', 'tbl' and 'data.frame': 202656 obs. of 12 variables:
## $ DRG.Definition : Factor w/ 564 levels "001 - HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Provider.Id : int 10033 30103 50108 50262 50441 50454 50625 70022 70025 90011 ...
## $ Provider.Name : Factor w/ 3151 levels "ABBEVILLE GENERAL HOSPITAL",..: 2913 1483 2715 2226 2670 2864 399 3144 1000 1543 ...
## $ Provider.Street.Address : Factor w/ 3266 levels "#1 MEDICAL PARK DRIVE",..: 2587 2471 1599 2833 1684 2326 3020 1093 2881 270 ...
## $ Provider.City : Factor w/ 1949 levels "ABBEVILLE","ABERDEEN",..: 157 1348 1499 978 1655 1527 978 1190 721 1832 ...
## $ Provider.State : Factor w/ 51 levels "AK","AL","AR",..: 2 4 5 5 5 5 5 7 7 8 ...
## $ Provider.Zip.Code : int 35233 85054 95816 90095 94305 94143 90048 6504 6102 20010 ...
## $ Hospital.Referral.Region..HRR..Description: Factor w/ 306 levels "AK - Anchorage",..: 2 14 28 22 35 32 22 50 49 51 ...
## $ Total.Discharges : int 13 20 25 14 23 20 43 12 17 31 ...
## $ Average.Covered.Charges : num 1172866 437531 815674 1499044 2238699 ...
## $ Average.Total.Payments : num 251876 240423 233197 415969 420865 ...
## $ Average.Medicare.Payments : num 244458 133510 221682 366609 403454 ...
length(unique(df$DRG.Definition))
## [1] 564
DRG.Definition is a character string with 564 unique MS-DRG codes in this dataset. This means that there were (999-564) codes unassigned in the FY 2014 because they weren’t charged or they don’t exist.
length(unique(df$Provider.Id))
## [1] 3276
length(unique(df$Provider.Name))
## [1] 3151
Provider.Id is an integer, but since this is a unique ID, we could convert this later. There are 3276 unique IDs.
Provider.Name is a character string. There are 3151 unique names, which is less than IDs. Some providers have multiple locations. For example, Wesley Medical Center has a facility in KS and MS.
Provider.City is a character string. There are 1949 cities in this dataset.
Provider.State is a character string with the state names. There are 51 states. We look at the list to see why and see that Washington, D.C. is considered a State here. All of the Cities for State DC are “Washington”, which makes sense.
Provider.Zip.Code is a integer, which could be convereted to a character string later based on context. There are 2987 total zip codes in the dataset.
Hospital.Referral.Region..HRR is a character string. There are 306 HRRs.
length(unique(df$Provider.City))
## [1] 1949
length(unique(df$Provider.State))
## [1] 51
unique(df$Provider.State)
## [1] AL AZ CA CT DC FL GA IL IN KY LA MA MI MN MO NE NJ NY NC OH OK PA SC
## [24] TN TX VA WA WI AK AR CO DE ID IA KS ME MD MS NV NH NM ND OR RI UT VT
## [47] WV HI SD MT WY
## 51 Levels: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA ... WY
length(unique(df$Provider.Zip.Code))
## [1] 2987
length(unique(df$Hospital.Referral.Region..HRR..Description))
## [1] 306
Total.Discharges is an integer. There were between 11 and 3855 discharges for each DRG with a median of 22 and a mean of 36. Let’s look at the 3855 to see if this makes sense. The Hospital for Special Surgery in Manhattan discharged 3855 inpatients for major joint replacement or reattachment of lower extremity without complications.
Average.Covered.Charges is an integer. On average, the provider billed $52,590/discharge with a median of $34,630/discharge. The lowest amount billed per discharge was 1368 and highest was 2239000. Respectively, these are from alcohol or drug abuse at Healthsource Saginaw and heart transplant at Stanford Hospital. Both of these seem logical.
Average.Total.Payments is an integer that spans between $2320 and $434400. The median total payment per discharge was $9180, while the mean total payment per discharge is $13170. The minimum payment was again for alcohol or drug abuse inpatient treatment but at Proctor Hospital in Peoria, IL and the maximum payment was for Cranial & Nerve Disorders with Major Complications at Memorial Hospital in Jacksonville, FL.
Average.Medicare.Payments is an integer that spans between $1330 and $403500. The median total payment per discharge was $7715 and the mean was $11220. This makes sense as it’s less than the Average.Total.Payments, which includes more additional costs for the same discharges. The minimum payment was for Cranial & Nerve Disorders without Major Complications at Thibodaux Regional Medical Center in LA. The maximum payment per discharge was for Heart Transplant with Major Complications at Stanford Hospital in CA, again.
summary(df$Total.Discharges)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 14.00 22.00 36.01 39.00 3855.00
df %>% filter(Total.Discharges == 3855)
## # A tibble: 1 x 12
## DRG.Definition
## <fctr>
## 1 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC
## # ... with 11 more variables: Provider.Id <int>, Provider.Name <fctr>,
## # Provider.Street.Address <fctr>, Provider.City <fctr>,
## # Provider.State <fctr>, Provider.Zip.Code <int>,
## # Hospital.Referral.Region..HRR..Description <fctr>,
## # Total.Discharges <int>, Average.Covered.Charges <dbl>,
## # Average.Total.Payments <dbl>, Average.Medicare.Payments <dbl>
boxplot(df$Total.Discharges) #a little strange - worth exploring the highpoint
summary(df$Average.Covered.Charges)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1368 20950 34630 52590 60760 2239000
df %>% filter(min(Average.Covered.Charges) == Average.Covered.Charges)
## # A tibble: 1 x 12
## DRG.Definition Provider.Id
## <fctr> <int>
## 1 894 - ALCOHOL/DRUG ABUSE OR DEPENDENCE, LEFT AMA 230275
## # ... with 10 more variables: Provider.Name <fctr>,
## # Provider.Street.Address <fctr>, Provider.City <fctr>,
## # Provider.State <fctr>, Provider.Zip.Code <int>,
## # Hospital.Referral.Region..HRR..Description <fctr>,
## # Total.Discharges <int>, Average.Covered.Charges <dbl>,
## # Average.Total.Payments <dbl>, Average.Medicare.Payments <dbl>
df %>% filter(max(Average.Covered.Charges) == Average.Covered.Charges)
## # A tibble: 1 x 12
## DRG.Definition
## <fctr>
## 1 001 - HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC
## # ... with 11 more variables: Provider.Id <int>, Provider.Name <fctr>,
## # Provider.Street.Address <fctr>, Provider.City <fctr>,
## # Provider.State <fctr>, Provider.Zip.Code <int>,
## # Hospital.Referral.Region..HRR..Description <fctr>,
## # Total.Discharges <int>, Average.Covered.Charges <dbl>,
## # Average.Total.Payments <dbl>, Average.Medicare.Payments <dbl>
summary(df$Average.Total.Payments)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2320 6259 9180 13170 14630 434400
df %>% filter(min(Average.Total.Payments) == Average.Total.Payments)
## # A tibble: 1 x 12
## DRG.Definition Provider.Id
## <fctr> <int>
## 1 894 - ALCOHOL/DRUG ABUSE OR DEPENDENCE, LEFT AMA 140013
## # ... with 10 more variables: Provider.Name <fctr>,
## # Provider.Street.Address <fctr>, Provider.City <fctr>,
## # Provider.State <fctr>, Provider.Zip.Code <int>,
## # Hospital.Referral.Region..HRR..Description <fctr>,
## # Total.Discharges <int>, Average.Covered.Charges <dbl>,
## # Average.Total.Payments <dbl>, Average.Medicare.Payments <dbl>
df %>% filter(max(Average.Total.Payments) == Average.Total.Payments)
## # A tibble: 1 x 12
## DRG.Definition Provider.Id
## <fctr> <int>
## 1 073 - CRANIAL & PERIPHERAL NERVE DISORDERS W MCC 100179
## # ... with 10 more variables: Provider.Name <fctr>,
## # Provider.Street.Address <fctr>, Provider.City <fctr>,
## # Provider.State <fctr>, Provider.Zip.Code <int>,
## # Hospital.Referral.Region..HRR..Description <fctr>,
## # Total.Discharges <int>, Average.Covered.Charges <dbl>,
## # Average.Total.Payments <dbl>, Average.Medicare.Payments <dbl>
summary(df$Average.Medicare.Payments)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1330 5012 7715 11220 12470 403500
test <- df %>%
filter(min(Average.Medicare.Payments) == Average.Medicare.Payments)
df %>%
filter(max(Average.Medicare.Payments) == Average.Medicare.Payments)
## # A tibble: 1 x 12
## DRG.Definition
## <fctr>
## 1 001 - HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM W MCC
## # ... with 11 more variables: Provider.Id <int>, Provider.Name <fctr>,
## # Provider.Street.Address <fctr>, Provider.City <fctr>,
## # Provider.State <fctr>, Provider.Zip.Code <int>,
## # Hospital.Referral.Region..HRR..Description <fctr>,
## # Total.Discharges <int>, Average.Covered.Charges <dbl>,
## # Average.Total.Payments <dbl>, Average.Medicare.Payments <dbl>
For the separate MDC table, we can see that each of the three columns is unique with 27 MDCs, descriptions and DRG code ranges.
str(mdc)
## 'data.frame': 27 obs. of 3 variables:
## $ MDC : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Description: chr "Diseases and Disorders of the Nervous System" "Diseases and Disorders of the Eye" "Diseases and Disorders of the Ear, Nose, Mouth And Throat" "Diseases and Disorders of the Respiratory System" ...
## $ DRG_codes : chr "020 - 103" "113 - 125" "129 - 159" "163 - 208" ...
summary(mdc)
## MDC Description DRG_codes
## Min. : 1.0 Length:27 Length:27
## 1st Qu.: 7.5 Class :character Class :character
## Median :14.0 Mode :character Mode :character
## Mean :14.0
## 3rd Qu.:20.5
## Max. :27.0
length(unique(mdc$MDC))
## [1] 27
length(unique(mdc$Description))
## [1] 27
length(unique(mdc$DRG_codes))
## [1] 27
To join the MDC categorizations to the DRG codes, we must first separate the DRG code from the DRG.Definition column into just the DRG code. We do this by creating a function, called str_extract that returns just the code.
We then use the purrr package map_chr function to apply the function through the data frame.
#create new column of DRG codes
sep_col <- function(s)
{
return(str_extract(s, "\\d+"))
}
df$DRG_code <- map_chr(df$DRG.Definition, sep_col)
The DRG codes that are included in each MDC are listed as a range: Format: “###-###”
There are a few ways to join the MDC code and description from the MDC table into the DRG table. Two functions will be used here to take a string and search within the range of DRG codes on the MDC table and return either the MDC code or the description, depending on the function.
These two functions are applied on the DRG table to obtain two new columns, the MDC number and the MDC description, for each row.
getMDC <- function(x)
{
for (j in seq_along(mdc$DRG_codes))
{
if((as.numeric(x) >= as.numeric(str_sub(mdc$DRG_codes[j],1,3))) &
(as.numeric(x) <= as.numeric(str_sub(mdc$DRG_codes[j],-3,-1))))
{
return(mdc$MDC[j])
break
}
}
}
getMDC_desc <- function(x)
{
for (j in seq_along(mdc$DRG_codes))
{
if((as.numeric(x) >= as.numeric(str_sub(mdc$DRG_codes[j],1,3))) &
(as.numeric(x) <= as.numeric(str_sub(mdc$DRG_codes[j],-3,-1))))
{
return(mdc$Description[j])
break
}
}
}
df$MDC <- map_chr(df$DRG_code, getMDC)
df$MDC_desc <- map_chr(df$DRG_code, getMDC_desc)
In this section, the dataset will be explored in an effort to understand which cost associated with different variables in the dataset.
First, we will group the data by Provider and add total discharges so we can see the most active providers.
Florida Hospital had by far the majority of discharges at 37,886 discharges in the year.
df_top <- df %>% group_by(Provider.Name) %>%
summarise(total_discharge=sum(Total.Discharges))
boxplot(df_top$total_discharge)
summary(df_top$total_discharge)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.0 409.5 1326.0 2316.0 3170.0 37890.0
knitr::kable(top_n(df_top, 10, total_discharge))
| Provider.Name | total_discharge |
|---|---|
| BAPTIST MEDICAL CENTER | 23500 |
| BEAUMONT HOSPITAL, ROYAL OAK | 19460 |
| CHRISTIANA CARE HEALTH SERVICES, INC. | 21049 |
| FLORIDA HOSPITAL | 37886 |
| GOOD SAMARITAN HOSPITAL | 18342 |
| MAYO CLINIC HOSPITAL ROCHESTER | 16543 |
| METHODIST HEALTHCARE MEMPHIS HOSPITALS | 17949 |
| METHODIST HOSPITAL | 21067 |
| NEW YORK-PRESBYTERIAN HOSPITAL | 27026 |
| YALE-NEW HAVEN HOSPITAL | 19062 |
df %>% group_by(Provider.Name) %>%
summarise(total_discharge=sum(Total.Discharges)) %>%
top_n(10) %>%
ggplot(aes(x=reorder(Provider.Name, -total_discharge), y=total_discharge))+
geom_bar(stat='identity') +
xlab("Provider") +
ylab("Total discharges") +
ggtitle("Most Discharges per Provider") +
coord_flip()
California had by far and away the most charges. This information by itself isn’t super helpful as we should normalize for population. Ohio makes the top 10 at number 8, which is only interesting for me because I live in Ohio.
df_top <- df %>%
mutate(total_charges = Average.Covered.Charges*Total.Discharges) %>%
group_by(Provider.State) %>%
summarise(total_charges=sum(total_charges)) %>%
top_n(10, total_charges) %>% arrange(desc(total_charges))
knitr::kable(df_top)
| Provider.State | total_charges |
|---|---|
| CA | 44365030424 |
| FL | 33961149467 |
| TX | 28221200172 |
| NY | 22119819185 |
| NJ | 20250229517 |
| PA | 18407847026 |
| IL | 16133340349 |
| OH | 11519645159 |
| NC | 9909072357 |
| MI | 9790055394 |
df_top %>% ggplot(aes(x=reorder(Provider.State, total_charges), y=total_charges)) +
geom_bar(stat='identity') +
ylab("State") +
xlab("Total charges by State for code ($)") +
ggtitle("States that have charged the most to Medicare 2014") +
coord_flip()
Neither of these subcategories tells us a whole lot without additional information like state population.
By examining total billing and payments by different variables we can get a grasp of where most of the charges sit.
First, total about billed to Medicare and paid by Medicare is examined by provider.
Cedar-Sinai Medical Center had far and away the most billing to Medicare at ~$2.8 Billion. The provider was not in the top 10 for # of discharges so it’s possible that the medical center performed expensive procedures.
The ratio of paid to charged is also quantified here. This ratio for the top 10 providers that billed to Medicare is between 10.1 and 26.4%. Later we will look at how this percentage varies by MDC.
df_top <- df %>% mutate(total_c = Average.Covered.Charges*Total.Discharges,
total_p = Average.Medicare.Payments*Total.Discharges) %>%
group_by(Provider.Name) %>%
summarise(total_charges=sum(total_c), total_paid = sum(total_p),
ratio_paid_charged = sum(total_p)/sum(total_c)) %>%
top_n(10, wt = total_charges) %>%
arrange(desc(total_charges))
knitr::kable(df_top)
| Provider.Name | total_charges | total_paid | ratio_paid_charged |
|---|---|---|---|
| CEDARS-SINAI MEDICAL CENTER | 2788685964 | 313833786 | 0.1125382 |
| FLORIDA HOSPITAL | 2573698475 | 353695767 | 0.1374270 |
| NEW YORK-PRESBYTERIAN HOSPITAL | 2302847905 | 608910781 | 0.2644164 |
| UPMC PRESBYTERIAN SHADYSIDE | 1795562371 | 182333355 | 0.1015467 |
| STANFORD HOSPITAL | 1670843873 | 237589872 | 0.1421975 |
| BAPTIST MEDICAL CENTER | 1383022957 | 218881595 | 0.1582632 |
| METHODIST HOSPITAL | 1360782358 | 216809840 | 0.1593273 |
| YALE-NEW HAVEN HOSPITAL | 1337561828 | 335195291 | 0.2506017 |
| GOOD SAMARITAN HOSPITAL | 1297986815 | 204188419 | 0.1573116 |
| MASSACHUSETTS GENERAL HOSPITAL | 1255012049 | 298675391 | 0.2379861 |
df_top %>% ggplot(aes(x=reorder(Provider.Name, total_charges), y=total_charges)) +
geom_bar(stat='identity') +
xlab("Provider") +
ylab("Total charges by Provider ($)") +
ggtitle("Providers that billed the most in Medicare") +
coord_flip()
df_top %>% arrange(desc(total_paid)) %>%
ggplot(aes(x=reorder(Provider.Name, total_paid), y=total_paid)) +
geom_bar(stat='identity') +
xlab("Provider") +
ylab("Total paid to Provider ($)") +
ggtitle("Providers that were paid the most by Medicare") +
coord_flip()
Next, we look at the most expensive DRG across Medicare. The most expensive codes are 470 and 871, Major Joint Replacement/Reattachment of Lower Extremity and Septicemia/Severe Sepsis without MV. We can see that later in the list Septicemia or severe sepsis with MV shows again, so maybe MDC grouping will tell us something different.
The ratio paid for the top 10 charged was between 15.8 and 23.6%. The median was 22.7% and the minimum was DRG 247, a cardiovascular procedure.
df_top <- df %>% mutate(total_c = Average.Covered.Charges*Total.Discharges,
total_p = Average.Medicare.Payments*Total.Discharges) %>%
group_by(DRG.Definition) %>%
summarise(total_charges=sum(total_c), total_paid = sum(total_p),
ratio_paid_charged = sum(total_p)/sum(total_c)) %>%
top_n(10, wt = total_charges) %>%
arrange(desc(total_charges))
knitr::kable(df_top)
| DRG.Definition | total_charges | total_paid | ratio_paid_charged |
|---|---|---|---|
| 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC | 25653617785 | 5596091568 | 0.2181404 |
| 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC | 23292420516 | 5306424301 | 0.2278176 |
| 853 - INFECTIOUS & PARASITIC DISEASES W O.R. PROCEDURE W MCC | 8367934257 | 1972994594 | 0.2357804 |
| 291 - HEART FAILURE & SHOCK W MCC | 8331664766 | 1901379449 | 0.2282112 |
| 460 - SPINAL FUSION EXCEPT CERVICAL W/O MCC | 7745595793 | 1802530535 | 0.2327168 |
| 247 - PERC CARDIOVASC PROC W DRUG-ELUTING STENT W/O MCC | 6544077580 | 1036121273 | 0.1583296 |
| 003 - ECMO OR TRACH W MV 96+ HRS OR PDX EXC FACE, MOUTH & NECK W MAJ O.R. | 6497437346 | 1529531693 | 0.2354054 |
| 193 - SIMPLE PNEUMONIA & PLEURISY W MCC | 5399897958 | 1165581160 | 0.2158524 |
| 292 - HEART FAILURE & SHOCK W CC | 5333146088 | 1207892117 | 0.2264877 |
| 870 - SEPTICEMIA OR SEVERE SEPSIS W MV 96+ HOURS | 5119709555 | 1155757648 | 0.2257467 |
summary(df_top$ratio_paid_charged)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1583 0.2200 0.2272 0.2204 0.2316 0.2358
df_top %>% ggplot(aes(x=reorder(DRG.Definition, total_charges), y=total_charges)) +
geom_bar(stat='identity') +
xlab("DRG") +
ylab("Total charges by DRG ($)") +
theme(axis.text = element_text(size = 8)) +
ggtitle("Most billed DRG") +
coord_flip()
df_top %>% arrange(desc(total_paid)) %>%
ggplot(aes(x=reorder(DRG.Definition, total_paid), y=total_paid)) +
geom_bar(stat='identity') +
xlab("DRG") +
ylab("Total paid by DRG ($)") +
theme(axis.text = element_text(size = 8)) +
ggtitle("Most paid DRG") +
coord_flip()
Let’s look at a specific MDC to understand the most charged ‘organ’ system to Medicare.
df_top <- df %>% mutate(total_c = Average.Covered.Charges*Total.Discharges,
total_p = Average.Medicare.Payments*Total.Discharges) %>%
group_by(MDC_desc) %>%
summarise(total_charges=sum(total_c), total_paid = sum(total_p),
ratio_paid_charged = sum(total_p)/sum(total_c)) %>%
arrange(desc(total_charges))
knitr::kable(df_top)
| MDC_desc | total_charges | total_paid | ratio_paid_charged |
|---|---|---|---|
| Diseases and Disorders of the Circulatory System | 89163784935 | 18437486561 | 0.2067822 |
| Diseases and Disorders of the Musculoskeletal System And Connective Tissue | 58475069377 | 12786752478 | 0.2186702 |
| Diseases and Disorders of the Respiratory System | 44051223925 | 9380699345 | 0.2129498 |
| Infectious and Parasitic DDs (Systemic or unspecified sites) | 42566102774 | 9696949315 | 0.2278092 |
| Diseases and Disorders of the Digestive System | 31186796587 | 6568855659 | 0.2106294 |
| Diseases and Disorders of the Nervous System | 22458996599 | 4504694005 | 0.2005741 |
| Diseases and Disorders of the Kidney And Urinary Tract | 21165815046 | 4442528600 | 0.2098917 |
| Transplants | 12803788304 | 2979028618 | 0.2326678 |
| Diseases and Disorders of the Endocrine, Nutritional And Metabolic System | 7229438722 | 1554436781 | 0.2150149 |
| Diseases and Disorders of the Hepatobiliary System And Pancreas | 6292197617 | 1299026590 | 0.2064504 |
| Diseases and Disorders of the Skin, Subcutaneous Tissue And Breast | 4163313878 | 912154985 | 0.2190935 |
| Diseases and Disorders of the Blood and Blood Forming Organs and Immunological Disorders | 3419849391 | 723656974 | 0.2116049 |
| Extensive Procedures Unrelated to Principal Diagnosis | 3143558507 | 703853347 | 0.2239034 |
| Injuries, Poison And Toxic Effect of Drugs | 2507636029 | 564774943 | 0.2252221 |
| Mental Diseases and Disorders | 2315320272 | 745177152 | 0.3218463 |
| Myeloproliferative DDs (Poorly Differentiated Neoplasms) | 1681070320 | 388517968 | 0.2311135 |
| Factors Influencing Health Status and Other Contacts with Health Services | 1122970543 | 313803568 | 0.2794406 |
| Alcohol/Drug Use or Induced Mental Disorders | 792063049 | 266490314 | 0.3364509 |
| Diseases and Disorders of the Male Reproductive System | 588586033 | 100039171 | 0.1699652 |
| Diseases and Disorders of the Ear, Nose, Mouth And Throat | 525155845 | 94743650 | 0.1804105 |
| Diseases and Disorders of the Female Reproductive System | 471446747 | 96396244 | 0.2044690 |
| Multiple Significant Trauma | 312485680 | 72249816 | 0.2312100 |
| Burns | 199470077 | 50869737 | 0.2550244 |
| Human Immunodeficiency Virus Infection | 194030821 | 54192299 | 0.2792974 |
| Pregnancy, Childbirth And Puerperium | 61051060 | 19912454 | 0.3261607 |
| Diseases and Disorders of the Eye | 21265894 | 4322467 | 0.2032582 |
summary(df_top$ratio_paid_charged)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1700 0.2076 0.2189 0.2323 0.2323 0.3365
df_top %>% ggplot(aes(x=reorder(MDC_desc, total_charges), y=total_charges)) +
geom_bar(stat='identity') +
xlab("MDC") +
ylab("Total charges by MDC ($)") +
theme(axis.text = element_text(size = 8), axis.text.x = element_text(angle = 90)) +
ggtitle("Most billed by MDC in Medicare") +
coord_flip()
df_top %>% arrange(desc(total_paid)) %>%
ggplot(aes(x=reorder(MDC_desc, total_paid), y=total_paid)) +
geom_bar(stat='identity') +
xlab("MDC") +
ylab("Total paid by MDC ($)") +
theme(axis.text = element_text(size = 8), axis.text.x = element_text(angle = 90)) +
ggtitle("Most paid MDC in Medicare") +
coord_flip()
The Circulatory System puts the most burden on our Medicare system, while the Musculoskeletal System and Connective Tissue is the next highest.
To see if the proportion of amount paid to amount charged is different by MDC grouping we perform a scatter plot and find that the relationship is quite linear.
df_top %>%
ggplot(aes(x= total_charges, y=total_paid)) +
geom_point() +
xlab("Total charged to Medicare ($)") +
ylab("Total paid by Medicare ($)") +
ggtitle("Charges versus payments for Medicare by MDC") +
coord_flip()
This above chart is fascinating. This means that the proportion paid to the providers is quite similar across all MDCs.
When we normalize against the total amount billed to Medicare, the results show a more holistic picture. The Circulatory System accounts for a quarter of all billing to Medicare!
df_new <- df %>% mutate(total_charges = Average.Covered.Charges*Total.Discharges)
total_medicare <- sum(df_new$total_charges)
df_top <- df %>%
mutate(total_charges = Average.Covered.Charges*Total.Discharges) %>%
group_by(MDC_desc) %>%
summarise(total_charges=sum(total_charges)/total_medicare) %>%
top_n(10, total_charges) %>% arrange(desc(total_charges))
knitr::kable(df_top)
| MDC_desc | total_charges |
|---|---|
| Diseases and Disorders of the Circulatory System | 0.2498197 |
| Diseases and Disorders of the Musculoskeletal System And Connective Tissue | 0.1638359 |
| Diseases and Disorders of the Respiratory System | 0.1234230 |
| Infectious and Parasitic DDs (Systemic or unspecified sites) | 0.1192620 |
| Diseases and Disorders of the Digestive System | 0.0873794 |
| Diseases and Disorders of the Nervous System | 0.0629258 |
| Diseases and Disorders of the Kidney And Urinary Tract | 0.0593025 |
| Transplants | 0.0358737 |
| Diseases and Disorders of the Endocrine, Nutritional And Metabolic System | 0.0202555 |
| Diseases and Disorders of the Hepatobiliary System And Pancreas | 0.0176295 |
df_top %>% ggplot(aes(x=reorder(MDC_desc, total_charges), y=total_charges)) +
geom_bar(stat='identity') +
xlab("MDC description") +
ylab("% of total Medicare amount billed") +
theme(axis.text = element_text(size = 8), axis.text.x = element_text(angle = 90)) +
ggtitle("Most expensive MDC groupings") +
coord_flip()
To further analyze the proportion of payments to total charges, we will use a scatter plot again.
The results are surprising. Medicare pays out more than it is billed for certain DRG and MDCs. MDC 20, alcohol and drug use or induced mental disorders is a reoccuring MDC for the top payments/amount charged.
When the scatter plot is analyzed, across all rows, it’s interesting to see that the proportions for certain providers and certain procedures within the provider have a wide variation.
knitr::kable(df %>%
mutate(prop_paid = Average.Medicare.Payments/Average.Covered.Charges) %>%
top_n(15, wt = prop_paid) %>%
arrange(desc(prop_paid)) %>%
select(DRG.Definition,Provider.Name,MDC_desc,prop_paid, Total.Discharges))
| DRG.Definition | Provider.Name | MDC_desc | prop_paid | Total.Discharges |
|---|---|---|---|---|
| 894 - ALCOHOL/DRUG ABUSE OR DEPENDENCE, LEFT AMA | KENSINGTON HOSPITAL | Alcohol/Drug Use or Induced Mental Disorders | 6.473323 | 13 |
| 897 - ALCOHOL/DRUG ABUSE OR DEPENDENCE W/O REHABILITATION THERAPY W/O MCC | KENSINGTON HOSPITAL | Alcohol/Drug Use or Induced Mental Disorders | 4.184601 | 32 |
| 313 - CHEST PAIN | RED LAKE HOSPITAL A | Diseases and Disorders of the Circulatory System | 3.349014 | 13 |
| 775 - VAGINAL DELIVERY W/O COMPLICATING DIAGNOSES | WOMAN’S HOSPITAL | Pregnancy, Childbirth And Puerperium | 3.320698 | 15 |
| 894 - ALCOHOL/DRUG ABUSE OR DEPENDENCE, LEFT AMA | COLLEGE HOSPITAL COSTA MESA | Alcohol/Drug Use or Induced Mental Disorders | 2.813144 | 21 |
| 689 - KIDNEY & URINARY TRACT INFECTIONS W MCC | FORT DEFIANCE INDIAN HOSPITAL | Diseases and Disorders of the Kidney And Urinary Tract | 2.727023 | 17 |
| 897 - ALCOHOL/DRUG ABUSE OR DEPENDENCE W/O REHABILITATION THERAPY W/O MCC | COLLEGE HOSPITAL COSTA MESA | Alcohol/Drug Use or Induced Mental Disorders | 2.246975 | 106 |
| 895 - ALCOHOL/DRUG ABUSE OR DEPENDENCE W REHABILITATION THERAPY | ST ELIZABETH’S MEDICAL CENTER | Alcohol/Drug Use or Induced Mental Disorders | 2.222524 | 114 |
| 690 - KIDNEY & URINARY TRACT INFECTIONS W/O MCC | YUKON KUSKOKWIM DELTA REG HOSPITAL | Diseases and Disorders of the Kidney And Urinary Tract | 2.121023 | 11 |
| 894 - ALCOHOL/DRUG ABUSE OR DEPENDENCE, LEFT AMA | BELLEVUE HOSPITAL CENTER | Alcohol/Drug Use or Induced Mental Disorders | 2.106272 | 16 |
| 847 - CHEMOTHERAPY W/O ACUTE LEUKEMIA AS SECONDARY DIAGNOSIS W CC | BRONX-LEBANON HOSPITAL CENTER | Myeloproliferative DDs (Poorly Differentiated Neoplasms) | 2.071471 | 21 |
| 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC | CHICKASAW NATION MEDICAL CENTER | Diseases and Disorders of the Musculoskeletal System And Connective Tissue | 2.035351 | 16 |
| 897 - ALCOHOL/DRUG ABUSE OR DEPENDENCE W/O REHABILITATION THERAPY W/O MCC | BRONX-LEBANON HOSPITAL CENTER | Alcohol/Drug Use or Induced Mental Disorders | 2.026275 | 56 |
| 603 - CELLULITIS W/O MCC | YUKON KUSKOKWIM DELTA REG HOSPITAL | Diseases and Disorders of the Skin, Subcutaneous Tissue And Breast | 2.021620 | 22 |
| 743 - UTERINE & ADNEXA PROC FOR NON-MALIGNANCY W/O CC/MCC | WOMAN’S HOSPITAL | Diseases and Disorders of the Female Reproductive System | 1.991255 | 24 |
df %>%
ggplot(aes(x= Average.Covered.Charges, y=Average.Medicare.Payments)) +
geom_point() +
xlab("Total charged to Medicare ($)") +
ylab("Total paid by Medicare ($)") +
ggtitle("Charges versus payments for Medicare by MDC") +
coord_flip()
As seen earlier, the Circulatory System puts the most burden on our Medicare system, while the Musculoskeletal System and Connective Tissue comes in second.
In this section, we will explore MDC 5, Diseases and Disorders, to understand what contributes to the high expenditure of this category.
df5 <- df %>%
filter(MDC == "5")
nrow(df5)
## [1] 45605
We see above that the total number of rows is about a quarter of the rows in the entire dataset. We saw already that total amount charged to medicare from this MDC was about 25%, so we will assign the total billed from this MDC to a variable so we can see which DRGs contributed most to this number.
df5_tot<- df5 %>% mutate(total_charges = Average.Covered.Charges*Total.Discharges)
total_mdc5 <- sum(df5_tot$total_charges)
Let’s see which DRGs and Providers had the highest average charges.
df5_top <- df5 %>%
top_n(15, wt = Average.Covered.Charges) %>%
arrange(desc(Average.Covered.Charges)) %>%
select(DRG_code, Provider.Name,Average.Covered.Charges)
knitr::kable(df5_top)
| DRG_code | Provider.Name | Average.Covered.Charges |
|---|---|---|
| 216 | GOOD SAMARITAN HOSPITAL | 960960.6 |
| 216 | STANFORD HOSPITAL | 912233.5 |
| 215 | ROBERT WOOD JOHNSON UNIVERSITY HOSPITAL | 834653.1 |
| 216 | CEDARS-SINAI MEDICAL CENTER | 819072.0 |
| 219 | GOOD SAMARITAN HOSPITAL | 818491.4 |
| 219 | REGIONAL MEDICAL CENTER OF SAN JOSE | 754724.9 |
| 216 | DOCTORS MEDICAL CENTER | 751233.3 |
| 216 | UNIVERSITY OF CALIFORNIA DAVIS MEDICAL CENTER | 700702.9 |
| 233 | GARFIELD MEDICAL CENTER | 695798.1 |
| 233 | REGIONAL MEDICAL CENTER OF SAN JOSE | 689924.3 |
| 216 | MERCY GENERAL HOSPITAL | 688602.1 |
| 233 | DOCTORS MEDICAL CENTER | 676566.5 |
| 233 | EASTON HOSPITAL | 675319.0 |
| 235 | STANFORD HOSPITAL | 671651.5 |
| 219 | ST HELENA HOSPITAL | 671387.0 |
summary(df5_top$Average.Covered.Charges)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 671400 682600 700700 754800 818800 961000
The distribution from the summary function above shows that some providers and DRGs were abnormally high relative to the median and mean. The most expensive DRG is 216 with 219 also occurring frequently in the top 10. Good Samaritan Hospital and Stanford Hospital appear twice in this top 15 list.
When we look across DRGs below, we find that 291, Heart Failure, accounts for the highest amount of charges at 10% of the total MDC. Within 291, we find large discrepencies amongst providers and what they are charging. We show the top 15 culprits below.
df_new1 <- df5 %>%
mutate(tot_charges = Total.Discharges*Average.Covered.Charges) %>%
group_by(DRG.Definition) %>%
summarise(pct = sum(tot_charges)/total_mdc5) %>%
arrange(desc(pct)) %>%
top_n(10)
knitr::kable(df_new1) # top 10 DRGs within MDC
| DRG.Definition | pct |
|---|---|
| 291 - HEART FAILURE & SHOCK W MCC | 0.0934423 |
| 247 - PERC CARDIOVASC PROC W DRUG-ELUTING STENT W/O MCC | 0.0733939 |
| 292 - HEART FAILURE & SHOCK W CC | 0.0598129 |
| 219 - CARDIAC VALVE & OTH MAJ CARDIOTHORACIC PROC W/O CARD CATH W MCC | 0.0497051 |
| 220 - CARDIAC VALVE & OTH MAJ CARDIOTHORACIC PROC W/O CARD CATH W CC | 0.0418262 |
| 287 - CIRCULATORY DISORDERS EXCEPT AMI, W CARD CATH W/O MCC | 0.0381481 |
| 246 - PERC CARDIOVASC PROC W DRUG-ELUTING STENT W MCC OR 4+ VESSELS/STENTS | 0.0367650 |
| 238 - MAJOR CARDIOVASC PROCEDURES W/O MCC | 0.0346714 |
| 280 - ACUTE MYOCARDIAL INFARCTION, DISCHARGED ALIVE W MCC | 0.0343773 |
| 252 - OTHER VASCULAR PROCEDURES W MCC | 0.0320281 |
summary(df_new1$pct) #range shows that some DRGs are much more expensive
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.03203 0.03519 0.03999 0.04942 0.05729 0.09344
sum(df_new1$pct)
## [1] 0.4941701
The above tibble shows us that 10 DRG diagnoses account for 50% of the MDC. The majority of these DRGs deal with Heart Failure, Stents or Major Cardiovasc Procedures. So 10 diagnoses account for 12.5% of all Medicare expenses.
For further visualization, we look at a scatter plot of all charges versus the payment for just heart failure, code 291. There is a wide array within a single DRG. Remember, these are average charges, not total charged.
df291 <- df5 %>% filter(DRG_code == "291")
summary(df291$Average.Covered.Charges)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8195 23930 33590 40170 49730 208600
boxplot(df291$Average.Covered.Charges)
knitr::kable(df291 %>%
select(Provider.Name, Average.Covered.Charges) %>%
arrange(desc(Average.Covered.Charges)) %>%
top_n(15, Average.Covered.Charges))
| Provider.Name | Average.Covered.Charges |
|---|---|
| STANFORD HOSPITAL | 208578.0 |
| CAREPOINT HEALTH-CHRIST HOSPITAL | 203639.7 |
| CAREPOINT HEALTH - BAYONNE MEDICAL CENTER | 193575.6 |
| MONTEREY PARK HOSPITAL | 183909.4 |
| SAINT PETER’S UNIVERSITY HOSPITAL | 162203.3 |
| HAHNEMANN UNIVERSITY HOSPITAL | 156441.7 |
| WESTCHESTER MEDICAL CENTER | 154738.5 |
| CAPITAL HEALTH MEDICAL CENTER - HOPEWELL | 153973.9 |
| GATEWAY REGIONAL MEDICAL CENTER | 153616.5 |
| CAPITAL HEALTH SYSTEM-FULD CAMPUS | 149121.5 |
| REGIONAL MEDICAL CENTER OF SAN JOSE | 148973.0 |
| DOCTORS MEDICAL CENTER | 146757.2 |
| CEDARS-SINAI MEDICAL CENTER | 146381.1 |
| SEQUOIA HOSPITAL | 141240.4 |
| VALLEY HOSPITAL MEDICAL CENTER | 139482.7 |
df291 %>%
ggplot(aes(x= Average.Covered.Charges, y=Average.Medicare.Payments)) +
geom_point() +
xlab("Average charged to Medicare ($)") +
ylab("Average paid by Medicare ($)") +
ggtitle("Charges versus payments for Medicare by MDC") +
coord_flip()
The above analysis of the individual MDC shows a wide array paid to hospitals performing precedures on the same diagnoses. This tells us a simple things. That prevention should focus on care of the heart. Heart precedures account for 25% of all billed to Medicare and 10 diagnoses accounting for half of those charges.
Additionally, many providers seem to charge more for the same procedure. It would be worthwhile for Medicare to investigate (assuming they have not already) why certain providers bill so far away from the median.
This publically available dataset provides transparency to how Medicare is paying across procedures. After attaching the MDC organ system to the DRG codes we can see a story of what charges make up the largest share of Medicare charges.
When breaking the data down on an MDC level, it is easy to see that a few organ systems make up a majority of the Medicare charges. Ailments of the Circulatory System, the Musculoskeletal System, the Respiratory System and of Infectious diseases make up a disproportionatly high share of Medicare costs.
Across MDCs, however, the portion paid compared to the portion charged by the provider generated a linear relationship. This means that Medicare pays similar amounts to the providers regardless of the organ system.
When analyzing DRG codes, joint replacement of the lower part of the body made up the largest total cost to Medicare. of the top 10, most of the DRGs had a similar paid/charge ratio. This illustated that Medicare payments seem to be consistent across the top expenditures.
We see a disproportionate share of total amount billed to Medicare attributed to the Circulatory system. 10 diagnoses within this MDC account for half of the amount billed to Medicare under this MDC. Additionally, there is a wide array of average charges within the most expensive DRG, 291 - Heart Failure.
As shared in the previous section, the MDC level analysis gives us the best glimpse at Providers that charge more for a given DRG as well as allows the researcher to group the charges in a logical way. The Circulatory MDC accounts for a quarter of all bills to Medicare with a few procedures making up the majority. It would be worth investigating why some providers charge more (it’s possible that the better institutions get the trickier cases) as well as how the government can target preventative measures to reduce the burden on the government long term.