The purpose of this project is to rank over 4000 US hospitals according to the quality of care. The data for this assignment represent a small subset of the data available at the Hospital Compare web site (http://hospitalcompare.hhs.gov) run by the U.S. Department of Health and Human Services.
The zip file for this assignment contains three files:
I write a function called best that takes two arguments: the 2-character abbreviated name of a state and an outcome name. The function returns a character vector with the name of the hospital that has the lowest 30-day mortality for the specified outcome in that state. The outcome can be one of “heart attack”, “heart failure”, or “pneumonia”. Hospitals that do not have data on a particular outcome are excluded from the set of hospitals when deciding the rankings.
If there is a tie for the best hospital for a given outcome, then the hospital names should be sorted in alphabetical order and the first hospital in that set should be chosen.
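The tie-breaking rule can be illustrated on a small made-up data frame. This is only a sketch: the hospital names and rates below are invented for illustration, and the column name rate is an assumption (the real file stores the rates in numbered columns).

```r
# Toy data (made up for illustration): two hospitals tie at the lowest rate.
toy <- data.frame(
  Hospital.Name = c("ZETA CLINIC", "ALPHA GENERAL", "MID VALLEY"),
  rate = c(10.1, 10.1, 12.5),
  stringsAsFactors = FALSE
)

# order() sorts by rate first and breaks ties by hospital name.
ranked <- toy[order(toy$rate, toy$Hospital.Name), ]

# The first row is the best hospital; among tied hospitals it is the
# alphabetically first one.
bestName <- ranked$Hospital.Name[1]
print(bestName)
```

Passing the name as a second key to order is one way to get the alphabetical tie-break; sorting by name first and then taking the first minimum, as the function below does, gives the same result.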
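The function checks the validity of its arguments. When an invalid state or outcome value is passed, this implementation returns the message “invalid state” or “invalid outcome” as its result rather than raising an error via the stop function, so the batch check further down can run through all test cases without aborting.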
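A minimal sketch of validation that raises errors via stop, and of catching those errors with tryCatch so that a batch of checks can continue. The helper names checkArgs and safeCheck and the three-state vector are assumptions made for illustration; they are not part of the project's code.

```r
# Hypothetical helper: validates the arguments and raises an error via stop().
checkArgs <- function(state, outcome, validStates) {
  validOutcomes <- c("heart attack", "heart failure", "pneumonia")
  if (!(state %in% validStates)) stop("invalid state")
  if (!(outcome %in% validOutcomes)) stop("invalid outcome")
  invisible(TRUE)
}

# Hypothetical wrapper: tryCatch() turns an error back into its message,
# so one failing case does not abort the remaining checks.
safeCheck <- function(state, outcome, validStates) {
  tryCatch({
    checkArgs(state, outcome, validStates)
    "ok"
  }, error = function(e) conditionMessage(e))
}

states <- c("TX", "MD", "NY")  # assumed subset of states, for illustration only
print(safeCheck("TX", "heart attack", states))
print(safeCheck("BB", "heart attack", states))
print(safeCheck("NY", "hert attack", states))
```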
library(data.table)  # only needed to assemble the check cases further down
best <- function(state, outcome) {
    ## Read outcome data
    data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
    ## Check that state and outcome are valid
    if (!(state %in% data$State)) {
        result <- "invalid state"
    } else if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
        result <- "invalid outcome"
    } else {
        ## Column numbers of the 30-day mortality rates for each outcome
        keys <- c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
        outcomeKey <- keys[outcome]
        ## Return hospital name in that state with lowest 30-day death rate
        dataPerState <- split(data, data$State)
        dataOurState <- dataPerState[[state]]
        ## Sort by name first so that ties resolve alphabetically
        dataOurState <- dataOurState[order(dataOurState$Hospital.Name), ]
        ## Non-numeric entries become NA (hence suppressWarnings) and are dropped
        dataOutcome <- suppressWarnings(as.numeric(dataOurState[, outcomeKey]))
        good <- !is.na(dataOutcome)
        dataOutcome <- dataOutcome[good]
        dataOurState <- dataOurState[good, ]
        ## match returns the first minimum, i.e. the alphabetically first tie
        minimum <- min(dataOutcome)
        index <- match(minimum, dataOutcome)
        result <- dataOurState[index, "Hospital.Name"]
    }
    result
}
A set of state names and outcomes is used to check the function:
chk1 <- c("TX", "heart attack")
chk2 <- c("TX", "heart failure")
chk3 <- c("MD", "heart attack")
chk4 <- c("MD", "pneumonia")
chk5 <- c("BB", "heart attack")
chk6 <- c("NY", "hert attack")
dat <- data.table(chk1, chk2, chk3, chk4, chk5, chk6)
dat <- t(dat)
as.list(apply(dat, 1, function(x){do.call(best, as.list(x))}))
## $chk1
## [1] "CYPRESS FAIRBANKS MEDICAL CENTER"
##
## $chk2
## [1] "FORT DUNCAN MEDICAL CENTER"
##
## $chk3
## [1] "JOHNS HOPKINS HOSPITAL, THE"
##
## $chk4
## [1] "GREATER BALTIMORE MEDICAL CENTER"
##
## $chk5
## [1] "invalid state"
##
## $chk6
## [1] "invalid outcome"
Next, I write a function rankHospital that takes three arguments: the 2-character abbreviated name of a state (state), an outcome (outcome), and the ranking of a hospital in that state for that outcome (num).
The function returns a character vector with the name of the hospital that has the ranking specified by the num argument. The num argument can take the values “best”, “worst”, or an integer indicating the ranking.
Hospitals that do not have data on a particular outcome are excluded from the set of hospitals when deciding the rankings. If the number given by num is larger than the number of hospitals in that state, the function returns NA.
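The handling of the num argument can be sketched on made-up data. The helper pickRank, the column name rate, and the hospital names are assumptions for illustration only; the sketch works on a data frame instead of reading the CSV file.

```r
# Hypothetical helper: pick the hospital at a given rank from a data frame
# with columns Hospital.Name and rate.
pickRank <- function(df, num = "best") {
  df <- df[!is.na(df$rate), ]                   # drop hospitals without data
  df <- df[order(df$rate, df$Hospital.Name), ]  # rank by rate, then by name
  if (identical(num, "best")) num <- 1
  if (identical(num, "worst")) num <- nrow(df)
  num <- suppressWarnings(as.integer(num))      # anything non-numeric becomes NA
  if (is.na(num) || num < 1 || num > nrow(df)) return(NA_character_)
  df$Hospital.Name[num]
}

# Toy data (made up): one hospital has no data for the outcome.
toy <- data.frame(
  Hospital.Name = c("C HOSPITAL", "A HOSPITAL", "B HOSPITAL", "D HOSPITAL"),
  rate = c(11.0, 9.5, NA, 14.2),
  stringsAsFactors = FALSE
)

print(pickRank(toy, "best"))
print(pickRank(toy, "worst"))
print(pickRank(toy, 2))
print(pickRank(toy, 10))  # rank beyond the number of ranked hospitals
```

Only three hospitals are ranked here because the one without data is removed before ordering, so rank 10 yields NA.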
rankHospital <- function(state, outcome, num = "best") {
    ## Read outcome data
    data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
    ## Check that state and outcome are valid
    if (!(state %in% data$State)) {
        result <- "invalid state"
    } else if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
        result <- "invalid outcome"
    } else {
        ## Column numbers of the 30-day mortality rates for each outcome
        keys <- c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
        outcomeKey <- keys[outcome]
        ## Return hospital name in that state with the given rank
        ## 30-day death rate
        dataPerState <- split(data, data$State)
        dataOurState <- dataPerState[[state]]
        ## Non-numeric entries become NA (hence suppressWarnings) and are dropped
        dataOutcome <- suppressWarnings(as.numeric(dataOurState[, outcomeKey]))
        good <- !is.na(dataOutcome)
        dataOutcome <- dataOutcome[good]
        dataOurState <- dataOurState[good, ]
        ## Rank by death rate, breaking ties alphabetically by hospital name
        dataOurState <- dataOurState[order(dataOutcome, dataOurState$Hospital.Name), ]
        if (grepl("^[0-9]+$", num)) {
            ## num is a whole number, possibly passed as a string
            if (as.numeric(num) > length(dataOutcome)) {
                result <- NA
            } else {
                result <- dataOurState[as.numeric(num), "Hospital.Name"]
            }
        } else if (num == "best") {
            result <- dataOurState[1, "Hospital.Name"]
        } else if (num == "worst") {
            result <- dataOurState[length(dataOutcome), "Hospital.Name"]
        } else {
            result <- NA
        }
    }
    result
}
chk1 <- c("TX", "heart failure", 4)
chk2 <- c("MD", "heart attack", "worst")
chk3 <- c("MN", "heart attack", 5000)
dat <- data.table(chk1, chk2, chk3)
dat <- t(dat)
as.list(apply(dat, 1, function(x){do.call(rankHospital, as.list(x))}))
## $chk1
## [1] "DETAR HOSPITAL NAVARRO"
##
## $chk2
## [1] "HARFORD MEMORIAL HOSPITAL"
##
## $chk3
## [1] NA
I implement a function rankAll which takes as arguments the outcome name (outcome) and hospital ranking (num) and returns a 2-column data frame containing the hospital in each state that has the ranking specified in num.
The function returns a value for every state (some may be NA). The first column in the data frame contains the hospital name and the second one contains the 2-character abbreviation for the state name. Hospitals that do not have data on a particular outcome are excluded from the set of hospitals when deciding the rankings.
Although it is possible to call the rankHospital function from the previous section, I decided, for didactic purposes, not to use it.
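For comparison, the alternative of delegating to a per-state ranking function could look roughly like the sketch below. It uses a hypothetical stand-in rankInState that works on a data frame (the real rankHospital reads the CSV file on every call), and the toy hospitals, states, and rate column are made up for illustration.

```r
# Hypothetical per-state helper standing in for rankHospital.
rankInState <- function(df, num) {
  df <- df[!is.na(df$rate), ]                   # drop hospitals without data
  df <- df[order(df$rate, df$Hospital.Name), ]  # rank by rate, then by name
  if (identical(num, "best")) num <- 1
  if (identical(num, "worst")) num <- nrow(df)
  if (num < 1 || num > nrow(df)) NA_character_ else df$Hospital.Name[num]
}

# Toy data (made up): AK has no usable data for the outcome.
toy <- data.frame(
  Hospital.Name = c("A", "B", "C", "D", "E"),
  State = c("TX", "TX", "MD", "MD", "AK"),
  rate = c(12.0, 10.0, 9.0, 11.0, NA),
  stringsAsFactors = FALSE
)

# split() yields one data frame per state, in sorted order of the state codes.
byState <- split(toy, toy$State)

# Apply the helper to every state; vapply() guarantees one string per state.
hospitals <- vapply(byState, rankInState, character(1), num = 2)

result <- data.frame(hospital = unname(hospitals), state = names(hospitals),
                     row.names = names(hospitals), stringsAsFactors = FALSE)
print(result)
```

States with fewer ranked hospitals than the requested rank still appear in the result, with NA as the hospital name.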
rankAll <- function(outcome, num = "best") {
    dataAll <- data.frame(hospital = character(), state = character())
    ## Read outcome data
    data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
    ## Check that the outcome is valid
    if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
        dataAll <- "invalid outcome"
    } else {
        ## Column numbers of the 30-day mortality rates for each outcome
        keys <- c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
        outcomeKey <- keys[outcome]
        ## For each state, find the hospital of the given rank
        dataPerState <- split(data, data$State)
        for (stat in names(dataPerState)) {
            dataOurState <- dataPerState[[stat]]
            ## Non-numeric entries become NA (hence suppressWarnings) and are dropped
            dataOutcome <- suppressWarnings(as.numeric(dataOurState[, outcomeKey]))
            good <- !is.na(dataOutcome)
            dataOutcome <- dataOutcome[good]
            dataOurState <- dataOurState[good, ]
            ## Rank by death rate, breaking ties alphabetically by hospital name
            dataOurState <- dataOurState[order(dataOutcome, dataOurState$Hospital.Name), ]
            ## Translate "best" / "worst" into a row number for this state
            if (num == "best") {
                numState <- 1
            } else if (num == "worst") {
                numState <- length(dataOutcome)
            } else {
                numState <- num
            }
            ## A rank beyond the last row yields NA for the hospital name
            dataPart <- data.frame(hospital = dataOurState[numState, "Hospital.Name"], state = stat, row.names = stat)
            dataAll <- rbind(dataAll, dataPart)
        }
    }
    ## Return a data frame with the hospital names and the (abbreviated) state name
    dataAll
}
head(rankAll("heart attack", 20), 10)
## hospital state
## AK <NA> AK
## AL D W MCMILLAN MEMORIAL HOSPITAL AL
## AR ARKANSAS METHODIST MEDICAL CENTER AR
## AZ JOHN C LINCOLN DEER VALLEY HOSPITAL AZ
## CA SHERMAN OAKS HOSPITAL CA
## CO SKY RIDGE MEDICAL CENTER CO
## CT MIDSTATE MEDICAL CENTER CT
## DC <NA> DC
## DE <NA> DE
## FL SOUTH FLORIDA BAPTIST HOSPITAL FL
tail(rankAll("pneumonia", "worst"), 3)
## hospital state
## WI MAYO CLINIC HEALTH SYSTEM - NORTHLAND, INC WI
## WV PLATEAU MEDICAL CENTER WV
## WY NORTH BIG HORN HOSPITAL DISTRICT WY
tail(rankAll("heart failure"), 10)
## hospital state
## TN WELLMONT HAWKINS COUNTY MEMORIAL HOSPITAL TN
## TX FORT DUNCAN MEDICAL CENTER TX
## UT VA SALT LAKE CITY HEALTHCARE - GEORGE E. WAHLEN VA MEDICAL CENTER UT
## VA SENTARA POTOMAC HOSPITAL VA
## VI GOV JUAN F LUIS HOSPITAL & MEDICAL CTR VI
## VT SPRINGFIELD HOSPITAL VT
## WA HARBORVIEW MEDICAL CENTER WA
## WI AURORA ST LUKES MEDICAL CENTER WI
## WV FAIRMONT GENERAL HOSPITAL WV
## WY CHEYENNE VA MEDICAL CENTER WY