by: D’Cypher
Finding the right hospital can be overwhelming. Hospitals, especially for major care, often have special areas of focus such as heart attacks or strokes. The purpose of this project is to lessen some of that work and to help focus the search for the right hospital for a specific need.
This project is an R program that processes hospital data and recommends hospitals that meet given criteria.
The program uses data from the Hospital Compare web site (http://hospitalcompare.hhs.gov) run by the U.S. Department of Health and Human Services. The purpose of the web site is to provide data and information about the quality of care at over 4,000 Medicare-certified hospitals in the U.S.
This dataset covers essentially all major U.S. hospitals. It is used for a variety of purposes, including determining whether hospitals should be fined for not providing high-quality care to patients (see http://goo.gl/jAXFX for some background on this particular topic).
The Hospital Compare web site contains a lot of data and this program will only look at a small subset. The analysis was performed on:
outcome-of-care-measures.csv: Contains information about 30-day mortality and readmission rates for heart attacks, heart failure, and pneumonia for over 4,000 hospitals.
hospital-data.csv: Contains information about each hospital.
Hospital_Revised_Flatfiles.pdf: Descriptions of the variables in each file (i.e., the code book).
This code reads in the data.
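#Set the working directory to the folder that holds the CSV files (adjust this path for your machine)
setwd("C:/Users/wcai/Desktop/")
getwd()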
outcome_data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
#head(outcome_data)
To get a sense of what the data looks like, here is a histogram of the 30-day mortality rates for heart attacks for hospitals across the US:
#Histogram of 30 day death rates from heart attacks
outcome_data[, 11] <- as.numeric(outcome_data[, 11])
hist(outcome_data[, 11])
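Because every column is read as character, as.numeric has to deal with the "Not Available" placeholder that the later functions recode by hand. As a minimal sketch, assuming a made-up three-row stand-in for the CSV (the column name Rate and all values are hypothetical), these are two ways to end up with a numeric column with real NA values and no coercion warning:

```r
# Toy stand-in for the CSV; column names and values are made up for illustration.
csv_text <- "Hospital.Name,State,Rate
A HOSPITAL,TX,14.2
B HOSPITAL,TX,Not Available
C HOSPITAL,MD,16.8"

# Same reading style as the report: everything comes in as character.
toy <- read.csv(text = csv_text, colClasses = "character")

# Option 1: recode the placeholder first, then convert.
rate <- toy$Rate
rate[rate == "Not Available"] <- NA
rate <- as.numeric(rate)

# Option 2: tell read.csv to treat the placeholder as missing while reading,
# so the column is parsed as numeric directly.
toy2 <- read.csv(text = csv_text, na.strings = "Not Available",
                 stringsAsFactors = FALSE)

print(rate)
print(toy2$Rate)
```

Option 2 changes how the whole file is read, so it is only a drop-in replacement if nothing else relies on the columns being character.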
This is the first of three programs that process the data. This program is called ‘Best’. It is a function that takes two inputs: a US state abbreviation and a category (“heart attack”, “heart failure”, or “pneumonia”). ‘Best’ then returns the hospital with the lowest mortality rate for that category in the given state.
# "best" takes a state and an outcome ("heart attack", "heart failure",
# or "pneumonia") and returns the hospital with
# the lowest mortality rate for that outcome in that state.
best <- function(state, outcome){
#--Input testing:
possible_state <- (unique(outcome_data$State) == state)
possible_outcome <- (c("heart attack", "heart failure", "pneumonia") == outcome)
if(sum(possible_state) != 1){
stop("invalid state")
} else if(sum(possible_outcome) != 1){
stop("invalid outcome")
} else {
#-- Creates State subset
state_filter <- outcome_data[outcome_data$State == state,]
#Turns "Not Available" strings into NA
state_filter[state_filter == "Not Available" ] <- NA
state_subset <- data.frame(as.character(state_filter$Hospital.Name),
as.character(state_filter$State),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia))
colnames(state_subset) <- c("Hospital.Name", "State", "heart_attack", "heart_failure", "pneumonia")
rm(state_filter)
#-- Conditional Min and lookup
if(outcome == "heart attack") {
min_outcome <- min(state_subset$heart_attack, na.rm = TRUE)
lookup_row <- which(state_subset$State == state
& state_subset$heart_attack == min_outcome
& complete.cases(state_subset$heart_attack) == T)
lookup_col <- which(colnames(state_subset)=="Hospital.Name")
best_hospitals <- sort(as.vector(state_subset[lookup_row,lookup_col]))
} else if (outcome == "heart failure") {
min_outcome <- min(state_subset$heart_failure, na.rm = TRUE)
lookup_row <- which(state_subset$State == state
& state_subset$heart_failure == min_outcome
& complete.cases(state_subset$heart_failure) == T)
lookup_col <- which(colnames(state_subset)=="Hospital.Name")
best_hospitals <- sort(as.vector(state_subset[lookup_row,lookup_col]))
} else if (outcome == "pneumonia"){
min_outcome <- min(state_subset$pneumonia, na.rm = TRUE)
lookup_row <- which(state_subset$State == state
& state_subset$pneumonia == min_outcome
& complete.cases(state_subset$pneumonia) == T)
lookup_col <- which(colnames(state_subset)=="Hospital.Name")
best_hospitals <- sort(as.vector(state_subset[lookup_row,lookup_col]))
} else {
stop("Not valid input for outcome.")
}
#If several hospitals tie for the lowest rate, returns the first one in alphabetical order
print(best_hospitals[1])
}
}
Here are 4 examples of the input and output:
#"CYPRESS FAIRBANKS MEDICAL CENTER"
best("TX", "heart attack")
## [1] "CYPRESS FAIRBANKS MEDICAL CENTER"
#"FORT DUNCAN MEDICAL CENTER"
best("TX", "heart failure")
## [1] "FORT DUNCAN MEDICAL CENTER"
#"JOHNS HOPKINS HOSPITAL, THE"
best("MD", "heart attack")
## [1] "JOHNS HOPKINS HOSPITAL, THE"
#"GREATER BALTIMORE MEDICAL CENTER"
best("MD", "pneumonia")
## [1] "GREATER BALTIMORE MEDICAL CENTER"
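The same "lowest rate, ties broken alphabetically" rule can also be written as a single sort. The following is a minimal sketch on made-up data, not the function used above: best_sketch, its arguments, and the toy hospitals are all hypothetical, and it takes the data frame and column name as arguments instead of reading the global outcome_data.

```r
# Made-up data for illustration only; not taken from Hospital Compare.
toy <- data.frame(
  Hospital.Name = c("DELTA CLINIC", "ALPHA GENERAL", "BRAVO MEDICAL", "CHARLIE HEALTH"),
  State = c("TX", "TX", "TX", "MD"),
  heart_attack = c(12.1, 12.1, 15.0, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lowest rate in a state, ties resolved by hospital name.
best_sketch <- function(data, state, outcome_col) {
  if (!state %in% data$State) stop("invalid state")
  if (!outcome_col %in% names(data)) stop("invalid outcome")
  # Keep only this state's hospitals that actually report the outcome.
  rows <- data[data$State == state & !is.na(data[[outcome_col]]), ]
  if (nrow(rows) == 0) return(NA_character_)
  # Sort by rate first, then by name, so ties resolve alphabetically.
  rows <- rows[order(rows[[outcome_col]], rows$Hospital.Name), ]
  rows$Hospital.Name[1]
}

print(best_sketch(toy, "TX", "heart_attack"))
```

In the toy data two Texas hospitals share the lowest rate, so the alphabetical tie-break decides which one is returned. Validating with %in% also avoids comparing against every state just to count matches.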
The next program is called ‘Rank’. It’s a function called rankhospital that takes three arguments: the 2-character abbreviated name of a state (state), an outcome (outcome), and the ranking of a hospital in that state for that outcome (num).
The function reads the outcome-of-care-measures.csv and returns a character vector with the name of the hospital that has the ranking specified by the num argument.
For example, the call rankhospital("MD", "heart failure", 5) would return a character vector containing the name of the hospital with the 5th lowest 30-day death rate for heart failure. The num argument can take the values "best", "worst", or an integer indicating the ranking (smaller numbers are better).
If the number given by num is larger than the number of hospitals in that state, then the function should return NA. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
# "rankhospital" takes a state, an outcome ("heart attack", "heart failure",
# or "pneumonia"), and a rank, and returns the hospital that holds
# that rank for 30-day mortality on that outcome in that state.
rankhospital <- function(state, outcome, num = "best") {
#--Input testing on state and outcome:
possible_state <- (unique(outcome_data$State) == state)
possible_outcome <- (c("heart attack", "heart failure", "pneumonia") == outcome)
if(sum(possible_state) != 1){
stop("invalid state")
} else if(sum(possible_outcome) != 1){
stop("invalid outcome")
} else {
#-- Creates State subset
state_filter <- outcome_data[outcome_data$State == state,]
#Turns "Not Available" strings into NA
state_filter[state_filter == "Not Available" ] <- NA
state_subset <- data.frame(as.character(state_filter$Hospital.Name),
as.character(state_filter$State),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia))
colnames(state_subset) <- c("Hospital.Name", "State", "heart_attack", "heart_failure", "pneumonia")
rm(state_filter)
#-- Conditional outcome columns
if(outcome == "heart attack"){
outcome_col = 3
} else if (outcome == "heart failure"){
outcome_col = 4
} else if (outcome == "pneumonia"){
outcome_col = 5
} else {
stop("Not valid input for outcome.")
}
#--Ranking identifier:
n <- as.numeric(sum(complete.cases(state_subset[,outcome_col])))
best_outcome <- min(state_subset[,outcome_col], na.rm = TRUE)
worst_outcome <- max(state_subset[,outcome_col], na.rm = TRUE)
if(num == "best") {
Nth_score <- best_outcome
} else if(num == "worst") {
Nth_score <- worst_outcome
} else if(num >= 1 & num <= n) {
Nth_score <- (sort(state_subset[,outcome_col], partial=(n-(n-num)))[n-(n-num)])
} else if(num < 0 | num > n) {
return("NA")
} else {
stop("Invalid input for num")
}
sort(state_subset$Hospital.Name, decreasing = TRUE)
lookup_row <- which(state_subset$State == state
& state_subset[,outcome_col] == Nth_score
& complete.cases(state_subset[,outcome_col]) == T)
lookup_col <- as.numeric(which(colnames(state_subset)=="Hospital.Name"))
Nth_best <- as.vector(state_subset[lookup_row,lookup_col])
countof_Nth_best <- as.numeric(length(Nth_best))
if(countof_Nth_best == 1){
print(Nth_best)
} else if (countof_Nth_best > 1){
#-- Breaks ties
top_Nth_filter <- state_subset[state_subset[,outcome_col] <= Nth_score & complete.cases(state_subset[,outcome_col]) == T,]
top_Nth <- data.frame(as.character(top_Nth_filter$Hospital.Name),
as.character(top_Nth_filter$State),
as.numeric(top_Nth_filter$heart_attack),
as.numeric(top_Nth_filter$heart_failure),
as.numeric(top_Nth_filter$pneumonia))
colnames(top_Nth) <- c("Hospital.Name", "State", "heart_attack", "heart_failure", "pneumonia")
rm(top_Nth_filter)
outcome_col_name <- names(top_Nth)[outcome_col]
hospital_col_name <- names(top_Nth)[1]
with_order <- with(top_Nth, order(top_Nth[outcome_col_name], top_Nth[hospital_col_name]))
top_Nth_ordered <- top_Nth[with_order, ]
print(as.vector(top_Nth_ordered[num,1]))
} else {
stop("No Hospitals Qualify")
}
}
}
Here are 2 examples of the input and output:
#"HARFORD MEMORIAL HOSPITAL"
rankhospital("MD", "heart attack", "worst")
## [1] "HARFORD MEMORIAL HOSPITAL"
#"NA"
rankhospital("ID", "heart failure", 20)
## [1] "NA"
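The rules for num described earlier ("best", "worst", or an integer, with NA when the rank exceeds the number of hospitals that have data) can be isolated in one small helper. This is a sketch under assumptions, not part of the program above: resolve_rank and the toy rates are hypothetical, and unlike rankhospital it returns a real NA instead of the string "NA".

```r
# Hypothetical helper: turn num into a position in the sorted list of hospitals.
# n is the number of hospitals that have data for the outcome.
resolve_rank <- function(num, n) {
  if (identical(num, "best")) {
    idx <- 1
  } else if (identical(num, "worst")) {
    idx <- n
  } else if (is.numeric(num) && length(num) == 1 && !is.na(num) &&
             num >= 1 && num == round(num)) {
    idx <- num
  } else {
    stop("invalid num")
  }
  # Out of range (including n == 0) means no hospital holds that rank.
  if (idx < 1 || idx > n) return(NA_integer_)
  as.integer(idx)
}

# Made-up rates; sort() drops NA by default, which excludes hospitals without data.
rates <- c(15.0, 12.1, NA, 13.4)
ranked <- sort(rates)

print(ranked[resolve_rank("worst", length(ranked))])
print(ranked[resolve_rank(20, length(ranked))])
```

Resolving the rank to a position once means "best", "worst", and numeric ranks can all index the same sorted table, which avoids looking a score back up in the unsorted data.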
This last program is called ‘Rank All’. It’s a function called rankall that takes two arguments: an outcome name (outcome) and a hospital ranking (num).
The function reads the outcome-of-care-measures.csv and returns a 2-column data frame containing the hospital in each state that has the ranking specified in num.
For example, the function call rankall("heart attack", "best") would return a data frame containing the names of the hospitals that are the best in their respective states for 30-day heart attack death rates.
The function should return a value for every state (some may be NA). The first column in the data frame is named hospital, which contains the hospital name, and the second column is named state, which contains the 2-character abbreviation for the state name. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
# "rankall" takes an outcome ("heart attack", "heart failure",
# or "pneumonia") and a rank, and returns, for each state, the hospital
# that holds that rank for 30-day mortality on that outcome.
rankall <- function(outcome, num = "best") {
state <- as.vector(unique(outcome_data$State))
hospital <- vector(mode ="character")
for (i in seq_along(state)){
rankhospital <- function(state, outcome, num = "best") {
#--Input testing on state and outcome:
possible_state <- (unique(outcome_data$State) == state)
possible_outcome <- (c("heart attack", "heart failure", "pneumonia") == outcome)
if(sum(possible_state) != 1){
stop("invalid state")
} else if(sum(possible_outcome) != 1){
stop("invalid outcome")
} else {
#-- Creates State subset
state_filter <- outcome_data[outcome_data$State == state,]
#Turns "Not Available" strings into NA
state_filter[state_filter == "Not Available" ] <- NA
state_subset <- data.frame(as.character(state_filter$Hospital.Name),
as.character(state_filter$State),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure),
as.numeric(state_filter$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia))
colnames(state_subset) <- c("Hospital.Name", "State", "heart_attack", "heart_failure", "pneumonia")
rm(state_filter)
#-- Conditional outcome columns
if(outcome == "heart attack"){
outcome_col = 3
} else if (outcome == "heart failure"){
outcome_col = 4
} else if (outcome == "pneumonia"){
outcome_col = 5
} else {
stop("Not valid input for outcome.")
}
#--Ranking identifier:
n <- as.numeric(sum(complete.cases(state_subset[,outcome_col])))
best_outcome <- min(state_subset[,outcome_col], na.rm = TRUE)
worst_outcome <- max(state_subset[,outcome_col], na.rm = TRUE)
if(num == "best") {
Nth_score <- best_outcome
} else if(num == "worst") {
Nth_score <- worst_outcome
} else if(num >= 1 & num <= n) {
Nth_score <- (sort(state_subset[,outcome_col], partial=(n-(n-num)))[n-(n-num)])
} else if(num < 0 | num > n) {
return("NA")
} else {
return("NA")
}
sort(state_subset$Hospital.Name, decreasing = TRUE)
lookup_row <- which(state_subset$State == state
& state_subset[,outcome_col] == Nth_score
& complete.cases(state_subset[,outcome_col]) == T)
lookup_col <- as.numeric(which(colnames(state_subset)=="Hospital.Name"))
Nth_best <- as.vector(state_subset[lookup_row,lookup_col])
countof_Nth_best <- as.numeric(length(Nth_best))
if(countof_Nth_best == 1){
print(Nth_best)
} else if (countof_Nth_best > 1){
#-- Breaks ties
top_Nth_filter <- state_subset[state_subset[,outcome_col] <= Nth_score & complete.cases(state_subset[,outcome_col]) == T,]
top_Nth <- data.frame(as.character(top_Nth_filter$Hospital.Name),
as.character(top_Nth_filter$State),
as.numeric(top_Nth_filter$heart_attack),
as.numeric(top_Nth_filter$heart_failure),
as.numeric(top_Nth_filter$pneumonia))
colnames(top_Nth) <- c("Hospital.Name", "State", "heart_attack", "heart_failure", "pneumonia")
rm(top_Nth_filter)
outcome_col_name <- names(top_Nth)[outcome_col]
hospital_col_name <- names(top_Nth)[1]
with_order <- with(top_Nth, order(top_Nth[outcome_col_name], top_Nth[hospital_col_name]))
top_Nth_ordered <- top_Nth[with_order, ]
as.vector(top_Nth_ordered[num,1])
} else {
stop("No Hospitals Qualify")
}
}
}
hospital[i] <- rankhospital(state[i], outcome, num)
}
Nth_best <- data.frame(hospital, state)
print(Nth_best)
}
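Here are 2 examples of the input and output. The ## [1] lines that appear before each data frame are printed by the nested rankhospital whenever a state has exactly one hospital at the requested rank; the data frame that follows is the actual result:
#20th-ranked hospital for heart attack in each state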
rankall("heart attack", 20)
## [1] "D W MCMILLAN MEMORIAL HOSPITAL"
## [1] "JOHN C LINCOLN DEER VALLEY HOSPITAL"
## [1] "SKY RIDGE MEDICAL CENTER"
## [1] "COVENANT MEDICAL CENTER"
## [1] "COFFEYVILLE REGIONAL MEDICAL CENTER"
## [1] "HEYWOOD HOSPITAL"
## [1] "MARION GENERAL HOSPITAL"
## [1] "FRANKLIN REGIONAL HOSPITAL"
## [1] "MEDWEST HAYWOOD"
## [1] "HOSPITAL METROPOLITANO DR TITO MATTEI"
## [1] "ST CROIX REG MED CTR"
## hospital state
## 1 D W MCMILLAN MEMORIAL HOSPITAL AL
## 2 NA AK
## 3 JOHN C LINCOLN DEER VALLEY HOSPITAL AZ
## 4 ARKANSAS METHODIST MEDICAL CENTER AR
## 5 SHERMAN OAKS HOSPITAL CA
## 6 SKY RIDGE MEDICAL CENTER CO
## 7 MIDSTATE MEDICAL CENTER CT
## 8 NA DE
## 9 NA DC
## 10 SOUTH FLORIDA BAPTIST HOSPITAL FL
## 11 UPSON REGIONAL MEDICAL CENTER GA
## 12 NA HI
## 13 NA ID
## 14 JESSE BROWN VA MEDICAL CENTER - VA CHICAGO HEALTHCARE SYSTEM IL
## 15 COMMUNITY HOSPITAL IN
## 16 COVENANT MEDICAL CENTER IA
## 17 COFFEYVILLE REGIONAL MEDICAL CENTER KS
## 18 KING'S DAUGHTERS' MEDICAL CENTER KY
## 19 NORTH OAKS MEDICAL CENTER, LLC LA
## 20 RUMFORD HOSPITAL ME
## 21 CIVISTA MEDICAL CENTER MD
## 22 HEYWOOD HOSPITAL MA
## 23 GENESYS REGIONAL MEDICAL CENTER - HEALTH PARK MI
## 24 HEALTHEAST WOODWINDS HOSPITAL MN
## 25 MARION GENERAL HOSPITAL MS
## 26 LIBERTY HOSPITAL MO
## 27 NA MT
## 28 NA NE
## 29 NA NV
## 30 FRANKLIN REGIONAL HOSPITAL NH
## 31 CAPITAL HEALTH MEDICAL CENTER - HOPEWELL NJ
## 32 NA NM
## 33 METROPOLITAN HOSPITAL CENTER NY
## 34 MEDWEST HAYWOOD NC
## 35 NA ND
## 36 CINCINNATI VA MEDICAL CENTER OH
## 37 JACKSON COUNTY MEMORIAL HOSPITAL OK
## 38 ST ALPHONSUS MEDICAL CENTER - BAKER CITY, INC OR
## 39 UPMC PASSAVANT PA
## 40 HOSPITAL METROPOLITANO DR TITO MATTEI PR
## 41 NA RI
## 42 PALMETTO HEALTH BAPTIST SC
## 43 NA SD
## 44 INDIAN PATH MEDICAL CENTER TN
## 45 NIX HEALTH CARE SYSTEM TX
## 46 NA UT
## 47 NA VT
## 48 NA VI
## 49 CARILION GILES COMMUNITY HOSPITAL VA
## 50 SWEDISH MEDICAL CENTER WA
## 51 PLATEAU MEDICAL CENTER WV
## 52 ST CROIX REG MED CTR WI
## 53 NA WY
## 54 NA GU
#20th-ranked hospital for pneumonia in each state
rankall("pneumonia", 20)
## [1] "SCOTTSDALE HEALTHCARE-SHEA MEDICAL CENTER"
## [1] "JOHNS HOPKINS BAYVIEW MEDICAL CENTER"
## [1] "CONCORD HOSPITAL"
## [1] "LOS ALAMOS MEDICAL CENTER"
## [1] "LINTON HOSPITAL - CAH"
## [1] "ST VINCENT CHARITY MEDICAL CENTER"
## [1] "SEQUOYAH MEMORIAL HOSPITAL"
## [1] "SISTEMA INTEGRADOS DE SALUD DEL SUR OESTE INC"
## [1] "MARLBORO PARK HOSPITAL"
## [1] "DAVIS HOSPITAL AND MEDICAL CENTER"
## hospital state
## 1 CHILTON MEDICAL CENTER AL
## 2 NA AK
## 3 SCOTTSDALE HEALTHCARE-SHEA MEDICAL CENTER AZ
## 4 BAPTIST HEALTH MEDICAL CENTER HEBER SPINGS AR
## 5 FOUNTAIN VALLEY REGIONAL HOSPITAL & MEDICAL CENTER CA
## 6 VALLEY VIEW HOSPITAL ASSOCIATION CO
## 7 MIDSTATE MEDICAL CENTER CT
## 8 NA DE
## 9 NA DC
## 10 KENDALL REGIONAL MEDICAL CENTER FL
## 11 JASPER MEMORIAL HOSPITAL GA
## 12 NA HI
## 13 BOUNDARY COMMUNITY HOSPITAL ID
## 14 METHODIST HOSPITAL OF CHICAGO IL
## 15 ST MARY MEDICAL CENTER INC IN
## 16 OTTUMWA REGIONAL HEALTH CENTER IA
## 17 ANDERSON COUNTY HOSPITAL KS
## 18 NORTON HOSPITALS, INC KY
## 19 THE REGIONAL MEDICAL CENTER OF ACADIANA LA
## 20 DOWN EAST COMMUNITY HOSPITAL ME
## 21 JOHNS HOPKINS BAYVIEW MEDICAL CENTER MD
## 22 HOLY FAMILY HOSPITAL MA
## 23 PENNOCK HOSPITAL MI
## 24 BUFFALO HOSPITAL MN
## 25 GRENADA LAKE MEDICAL CENTER MS
## 26 BARNES-JEWISH WEST COUNTY HOSPITAL MO
## 27 MARCUS DALY MEMORIAL HOSPITAL - CAH MT
## 28 FAITH REGIONAL HEALTH SERVICES NE
## 29 BOULDER CITY HOSPITAL NV
## 30 CONCORD HOSPITAL NH
## 31 HOLY NAME MEDICAL CENTER NJ
## 32 LOS ALAMOS MEDICAL CENTER NM
## 33 NIAGARA FALLS MEMORIAL MEDICAL CENTER NY
## 34 SOUTHEASTERN REGIONAL MEDICAL CENTER NC
## 35 LINTON HOSPITAL - CAH ND
## 36 ST VINCENT CHARITY MEDICAL CENTER OH
## 37 SEQUOYAH MEMORIAL HOSPITAL OK
## 38 WALLOWA MEMORIAL HOSPITAL OR
## 39 ELK REGIONAL HEALTH CENTER PA
## 40 SISTEMA INTEGRADOS DE SALUD DEL SUR OESTE INC PR
## 41 NA RI
## 42 MARLBORO PARK HOSPITAL SC
## 43 ST MICHAEL'S HOSPITAL - CRITICAL ACCESS HOSPITAL SD
## 44 METHODIST MEDICAL CENTER OF OAK RIDGE TN
## 45 DOCTORS HOSPITAL TIDWELL TX
## 46 DAVIS HOSPITAL AND MEDICAL CENTER UT
## 47 NA VT
## 48 NA VI
## 49 SENTARA NORFOLK GENERAL HOSPITAL VA
## 50 KADLEC REGIONAL MEDICAL CENTER WA
## 51 FAIRMONT GENERAL HOSPITAL WV
## 52 COMMUNITY MEMORIAL HSPTL WI
## 53 SHERIDAN VA MEDICAL CENTER WY
## 54 NA GU
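The per-state loop can also be expressed with split and vapply, which removes the need to redefine rankhospital inside the loop. The following is a minimal sketch on made-up data rather than a replacement for rankall: rankall_sketch, the column names, and the toy hospitals are hypothetical, and the result is ordered alphabetically by state abbreviation instead of by first appearance in the file.

```r
# Made-up data for illustration only.
toy <- data.frame(
  hospital = c("ALPHA", "BRAVO", "CHARLIE", "DELTA", "ECHO"),
  state = c("MD", "MD", "TX", "TX", "TX"),
  rate = c(14.0, 13.0, 12.5, NA, 12.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hospital holding rank num in every state.
rankall_sketch <- function(data, num = "best") {
  stopifnot(identical(num, "best") || identical(num, "worst") ||
              (is.numeric(num) && length(num) == 1 && !is.na(num)))
  by_state <- split(data, data$state)
  hospital <- vapply(by_state, function(rows) {
    # Drop hospitals without data, then rank by rate with names breaking ties.
    rows <- rows[!is.na(rows$rate), ]
    rows <- rows[order(rows$rate, rows$hospital), ]
    n <- nrow(rows)
    idx <- if (identical(num, "best")) 1 else if (identical(num, "worst")) n else num
    if (n == 0 || idx < 1 || idx > n) NA_character_ else rows$hospital[idx]
  }, character(1))
  data.frame(hospital = unname(hospital), state = names(by_state),
             stringsAsFactors = FALSE)
}

print(rankall_sketch(toy, 2))
```

Because the function returns the data frame instead of printing inside the loop, the only output is the final table.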
Researching the right hospital is a long process. These programs allow users to narrow their search very quickly by drawing on data about the quality of care at over 4,000 hospitals in the U.S. Narrowing that search is half the battle!
A full description of the variables in each of the files is in the included PDF named Hospital_Revised_Flatfiles.pdf.
Enjoy the program!