For this Assigment, we will be using the 2011 Inpatient Prospective Payment System (IPPS) that summarizes DRG payments made of the top 100 procedures. These are procedures most common amoung all hospital visits. For this data we will be looking specifically a the NYC region.
library(RCurl)
## Loading required package: bitops
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
library(ggplot2)
x <- getURL("https://raw.githubusercontent.com/mfarris9505/DRG-Dataset-/master/DRG%20Payments%202011.csv")
DRG_Payments<- read.csv(text =x)
#Removing Excess Columns Simplify
DRG_Payments$Provider.Street.Address <-NULL
DRG_Payments$Provider.Zip.Code <-NULL
DRG_Payments$Provider.State <-NULL
#Renaming Columns
columns <- c("DRG", "ID", "Provider", "City", "Region", "DC","Charges","Payments", "Medicare")
names(DRG_Payments) <- columns
#Reclassifying Dollar Values as Numeric
DRG_Payments$Payments <- as.numeric(as.character(sub("\\$","",DRG_Payments$Payments)))
DRG_Payments$Charges <- as.numeric(as.character(sub("\\$","",DRG_Payments$Charges)))
DRG_Payments$Medicare <- as.numeric(as.character(sub("\\$","",DRG_Payments$Medicare)))
#Difference Column Rounded
DRG_Payments$ChargeDiff <- round(DRG_Payments$Charges - DRG_Payments$Payments)
DRG_Payments$MedicareDiff <- round(DRG_Payments$Payments - DRG_Payments$Medicare)
head(DRG_Payments$ChargeDiff)
## [1] 2691 8090 8220 16764 24065 24351
summary(DRG_Payments$ChargeDiff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -35710 9924 18800 24150 31080 377100
summary(DRG_Payments$MedicareDiff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 159 775 949 1229 1276 35440
Now that we have several interesting values its now time to plot some of the data to see if there is any significance. First, a histogram of payment difference. As a brief explanation, the Charges column represents the amount billed by the hospital. The payment, is the amount the hospital actually recieves. If we plot this difference we can see something quite revealing.
qplot(DRG_Payments$ChargeDiff, geom="histogram", binwidth=5000, xlim = c(-50000,100000))
For the most part the distribution is a normal distribution, however, we can see that for the most part, hospitals typically do not recieve the full billed amount. In fact, the majority of bills are under paid by upwards of 10 to 15 thousand dollars. There has been a lot of complaints recently about rising health costs. What most people don’t seem to understand is that despite these increasing cost (ie. what’s being billed), hospitals are not recieving the full payment amount, rather they are contracted with different insurance companies and get a set rate irregardless of the cost.
ggplot(DRG_Payments, aes(x=Charges, y=Payments)) + geom_point(aes(colour=DRG_Payments$City))