For this Assigment, we will be using the 2011 Inpatient Prospective Payment System (IPPS) that summarizes DRG payments made of the top 100 procedures. These are procedures most common amoung all hospital visits. For this data we will be looking specifically a the NYC region.

library(RCurl)
## Loading required package: bitops
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
library(ggplot2)


x <- getURL("https://raw.githubusercontent.com/mfarris9505/DRG-Dataset-/master/DRG%20Payments%202011.csv")

DRG_Payments<- read.csv(text =x) 

#Removing Excess Columns Simplify 
DRG_Payments$Provider.Street.Address <-NULL
DRG_Payments$Provider.Zip.Code <-NULL
DRG_Payments$Provider.State <-NULL

#Renaming Columns
columns <- c("DRG", "ID", "Provider", "City", "Region", "DC","Charges","Payments", "Medicare")
names(DRG_Payments) <- columns

#Reclassifying Dollar Values as Numeric
DRG_Payments$Payments <- as.numeric(as.character(sub("\\$","",DRG_Payments$Payments)))
DRG_Payments$Charges <- as.numeric(as.character(sub("\\$","",DRG_Payments$Charges)))
DRG_Payments$Medicare <- as.numeric(as.character(sub("\\$","",DRG_Payments$Medicare)))


#Difference Column Rounded 
DRG_Payments$ChargeDiff <- round(DRG_Payments$Charges - DRG_Payments$Payments)
DRG_Payments$MedicareDiff <- round(DRG_Payments$Payments - DRG_Payments$Medicare)
head(DRG_Payments$ChargeDiff)
## [1]  2691  8090  8220 16764 24065 24351
summary(DRG_Payments$ChargeDiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -35710    9924   18800   24150   31080  377100
summary(DRG_Payments$MedicareDiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     159     775     949    1229    1276   35440

Now that we have several interesting values its now time to plot some of the data to see if there is any significance. First, a histogram of payment difference. As a brief explanation, the Charges column represents the amount billed by the hospital. The payment, is the amount the hospital actually recieves. If we plot this difference we can see something quite revealing.

qplot(DRG_Payments$ChargeDiff, geom="histogram", binwidth=5000, xlim = c(-50000,100000))

For the most part the distribution is a normal distribution, however, we can see that for the most part, hospitals typically do not recieve the full billed amount. In fact, the majority of bills are under paid by upwards of 10 to 15 thousand dollars. There has been a lot of complaints recently about rising health costs. What most people don’t seem to understand is that despite these increasing cost (ie. what’s being billed), hospitals are not recieving the full payment amount, rather they are contracted with different insurance companies and get a set rate irregardless of the cost.

ggplot(DRG_Payments, aes(x=Charges, y=Payments)) + geom_point(aes(colour=DRG_Payments$City))