Introduction

The purpose of this project is to rank over 4000 US hospitals according to the quality of care. The data for this assignment represent a small subset of the data available at the Hospital Compare web site (http://hospitalcompare.hhs.gov) run by the U.S. Department of Health and Human Services.

The zip file for this assignment contains three files:

Finding the best hospital in a state

We write a function called best which takes two arguments: the 2-character abbreviated name of a state and an outcome name. The function returns a character vector with the name of the hospital that has the lowest 30-day mortality for the specified outcome in that state. The outcome can be one of “heart attack”, “heart failure”, or “pneumonia”. Hospitals that do not have data on a particular outcome are excluded from the set of hospitals when deciding the rankings.
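Because the CSV is read with colClasses = "character", the mortality columns arrive as text, and hospitals without data carry a non-numeric placeholder instead of a number. A minimal sketch of the exclusion step, on a small hypothetical data frame (the columns name and rate and the placeholder "Not Available" are assumptions made for illustration), shows how as.numeric() turns such entries into NA so they can be filtered out before looking for the minimum:

```r
# Hypothetical toy data: rates stored as text, as they are after
# read.csv(..., colClasses = "character"); "Not Available" is an assumed placeholder
hospitals <- data.frame(
  name = c("B HOSPITAL", "A HOSPITAL", "C HOSPITAL"),
  rate = c("14.1", "Not Available", "12.3"),
  stringsAsFactors = FALSE
)

# as.numeric() turns non-numeric strings into NA; the coercion warning is suppressed
rate <- suppressWarnings(as.numeric(hospitals$rate))

# Keep only hospitals that report this outcome
keep <- !is.na(rate)
hospitals <- hospitals[keep, ]
rate <- rate[keep]

# Lowest rate among the remaining hospitals
hospitals$name[which.min(rate)]
```

Here "A HOSPITAL" is dropped and the last line returns "C HOSPITAL".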

If there is a tie for the best hospital for a given outcome, then the hospital names should be sorted in alphabetical order and the first hospital in that set should be chosen.
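One way to apply this tie-breaking rule is to sort by the outcome first and by hospital name second, then take the first row. The sketch below assumes a hypothetical toy data frame in which two hospitals share the lowest rate:

```r
# Hypothetical toy data with a tie on the lowest rate
hospitals <- data.frame(
  name = c("ZETA MEDICAL CENTER", "ALPHA HOSPITAL", "MIDDLE HOSPITAL"),
  rate = c(10.5, 10.5, 12.0),
  stringsAsFactors = FALSE
)

# Sort by rate first; ties on rate are broken alphabetically by name
ranked <- hospitals[order(hospitals$rate, hospitals$name), ]

# The best hospital is the first row after sorting
ranked$name[1]
```

The result is "ALPHA HOSPITAL". rankHospital and rankAll below rely on the same order(rate, name) idea; best reaches the same result by sorting names alphabetically before picking the first minimum.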

The function checks the validity of its arguments. When an invalid state or outcome is passed, it returns the message “invalid state” or “invalid outcome”, respectively, instead of throwing an error with stop; this lets the batch test below run through all cases without aborting.

library(data.table)   # used only by the test harness below

best <- function(state, outcome) {
    ## Read outcome data
    data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

    ## Check that state and outcome are valid
    if (!(state %in% data$State)) {
        result <- "invalid state"
    } else if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
        result <- "invalid outcome"
    } else {
        ## Column indices of the 30-day mortality rates in the CSV
        keys <- c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
        outcomeKey <- keys[[outcome]]

        ## Return hospital name in that state with lowest 30-day death rate
        dataOurState <- data[data$State == state, ]
        ## Sort alphabetically first so that ties resolve to the first name
        dataOurState <- dataOurState[order(dataOurState$Hospital.Name), ]
        ## Non-numeric entries become NA and are dropped
        dataOutcome <- suppressWarnings(as.numeric(dataOurState[, outcomeKey]))
        good <- !is.na(dataOutcome)
        dataOutcome <- dataOutcome[good]
        dataOurState <- dataOurState[good, ]
        ## which.min() returns the first minimum, i.e. the alphabetically first
        result <- if (length(dataOutcome) > 0) {
            dataOurState[which.min(dataOutcome), "Hospital.Name"]
        } else NA
    }
    result
}

Testing best

A set of state abbreviations and outcomes, including an invalid state (“BB”) and a misspelled outcome (“hert attack”), is used to check the function:

chk1 <- c("TX", "heart attack")
chk2 <- c("TX", "heart failure")
chk3 <- c("MD", "heart attack")
chk4 <- c("MD", "pneumonia")
chk5 <- c("BB", "heart attack")
chk6 <- c("NY", "hert attack")
dat <- data.table(chk1, chk2, chk3, chk4, chk5, chk6)
dat <- t(dat)
as.list(apply(dat, 1, function(x){do.call(best, as.list(x))}))
## $chk1
## [1] "CYPRESS FAIRBANKS MEDICAL CENTER"
## 
## $chk2
## [1] "FORT DUNCAN MEDICAL CENTER"
## 
## $chk3
## [1] "JOHNS HOPKINS HOSPITAL, THE"
## 
## $chk4
## [1] "GREATER BALTIMORE MEDICAL CENTER"
## 
## $chk5
## [1] "invalid state"
## 
## $chk6
## [1] "invalid outcome"
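The specification asks for errors raised with stop, which would abort the apply-based batch test above at the first invalid input. If one wanted that behaviour, wrapping each call in tryCatch() would keep the batch running. The sketch below illustrates the pattern with a hypothetical validator, checkArgs, which stands in for the argument checks of best and takes the list of valid states as a parameter so that it runs without the CSV file:

```r
# Hypothetical validator: signals errors with stop(), as the specification asks
checkArgs <- function(state, outcome, validStates) {
  if (!(state %in% validStates)) stop("invalid state")
  if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
    stop("invalid outcome")
  }
  invisible(TRUE)
}

# tryCatch() turns a signalled error into its message, so one bad
# argument pair does not abort the whole batch of checks
safeCheck <- function(state, outcome) {
  tryCatch(
    {
      checkArgs(state, outcome, validStates = c("TX", "MD", "NY"))
      "ok"
    },
    error = function(e) conditionMessage(e)
  )
}

results <- mapply(safeCheck,
                  c("TX", "BB", "NY"),
                  c("heart attack", "heart attack", "hert attack"))
results
```

mapply() uses the state vector as names, so results is c(TX = "ok", BB = "invalid state", NY = "invalid outcome").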

Ranking hospitals by outcome in a state

To this end, I write a function rankHospital which takes three arguments: the 2-character abbreviated name of a state (state), an outcome (outcome), and the ranking of a hospital in that state for that outcome (num).

The function returns a character vector with the name of the hospital that has the ranking specified by the num argument. The num argument can take the values “best”, “worst”, or an integer indicating the ranking.

Hospitals that do not have data on a particular outcome are excluded from the set of hospitals when deciding the rankings. Also, if the number given by num is larger than the number of hospitals in that state, the function returns NA.
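The num argument mixes two keywords with integer ranks. A minimal sketch of how it can be resolved to a row position is shown below, using a hypothetical helper resolveRank that is not part of the original code; it returns NA for ranks beyond the number of hospitals and, as an added assumption, also for 0 or any unrecognised value:

```r
# Hypothetical helper: map num ("best", "worst" or a positive integer) to a
# row position among n ranked hospitals; NA when out of range or unrecognised
resolveRank <- function(num, n) {
  if (identical(num, "best")) return(1L)
  if (identical(num, "worst")) return(as.integer(n))
  pos <- suppressWarnings(as.integer(num))
  if (is.na(pos) || pos < 1L || pos > n) return(NA_integer_)
  pos
}

# Names already sorted by rate (hypothetical)
rankedNames <- c("ALPHA HOSPITAL", "BETA HOSPITAL", "GAMMA HOSPITAL")
n <- length(rankedNames)

rankedNames[resolveRank("best", n)]   # first name
rankedNames[resolveRank("worst", n)]  # last name
rankedNames[resolveRank(2, n)]        # second name
rankedNames[resolveRank(5000, n)]     # NA: rank beyond the number of hospitals
```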

rankHospital <- function(state, outcome, num = "best") {
    ## Read outcome data
    data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

    ## Check that state and outcome are valid
    if (!(state %in% data$State)) {
        result <- "invalid state"
    } else if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
        result <- "invalid outcome"
    } else {
        ## Column indices of the 30-day mortality rates in the CSV
        keys <- c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
        outcomeKey <- keys[[outcome]]

        ## Return hospital name in that state with the given rank
        ## of 30-day death rate
        dataOurState <- data[data$State == state, ]
        dataOutcome <- suppressWarnings(as.numeric(dataOurState[, outcomeKey]))
        good <- !is.na(dataOutcome)
        dataOutcome <- dataOutcome[good]
        dataOurState <- dataOurState[good, ]
        ## Rank by rate; ties are broken alphabetically by hospital name
        dataOurState <- dataOurState[order(dataOutcome, dataOurState$Hospital.Name), ]
        n <- nrow(dataOurState)

        if (grepl("^[0-9]+$", num)) {
            pos <- as.numeric(num)
            ## Ranks outside 1..n (including 0) give NA
            result <- if (pos >= 1 && pos <= n) dataOurState[pos, "Hospital.Name"] else NA
        } else if (num == "best") {
            result <- dataOurState[1, "Hospital.Name"]
        } else if (num == "worst") {
            result <- dataOurState[n, "Hospital.Name"]
        } else {
            result <- NA
        }
    }
    result
}

Testing rankHospital

chk1 <- c("TX", "heart failure", 4)
chk2 <- c("MD", "heart attack", "worst")
chk3 <- c("MN", "heart attack", 5000)
dat <- data.table(chk1, chk2, chk3)
dat <- t(dat)
as.list(apply(dat, 1, function(x){do.call(rankHospital, as.list(x))}))
## $chk1
## [1] "DETAR HOSPITAL NAVARRO"
## 
## $chk2
## [1] "HARFORD MEMORIAL HOSPITAL"
## 
## $chk3
## [1] NA

Ranking hospitals in all states

I implement a function rankAll which takes as arguments the outcome name (outcome) and hospital ranking (num) and returns a 2-column data frame containing the hospital in each state that has the ranking specified in num.

The function returns a value for every state (some may be NA). The first column in the data frame contains the hospital name and the second one contains the 2-character abbreviation for the state name. Hospitals that do not have data on a particular outcome are excluded from the set of hospitals when deciding the rankings.
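The NA entries arise naturally from R's indexing: asking a data frame for a row past its last one returns NA rather than an error. A small sketch with hypothetical data (two made-up states, AA with two ranked hospitals and BB with one) illustrates this for rank 2:

```r
# Hypothetical hospitals, already sorted by rate within each state
hospitals <- data.frame(
  name  = c("A1", "A2", "B1"),
  state = c("AA", "AA", "BB"),
  stringsAsFactors = FALSE
)

# One data frame per state, as split() does in rankAll
byState <- split(hospitals, hospitals$state)

# Rank 2 in every state: BB has a single hospital, so row 2 is NA
second <- sapply(byState, function(d) d[2, "name"])
second
```

Here second is c(AA = "A2", BB = NA).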

Although it is possible to call the rankHospital function from the previous section, I decided, for didactic purposes, not to use it.

rankAll <- function(outcome, num = "best") {
    dataAll <- data.frame(hospital = character(), state = character(),
                          stringsAsFactors = FALSE)

    ## Read outcome data
    data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

    ## Check that outcome is valid (num is not validated)
    if (!(outcome %in% c("heart attack", "heart failure", "pneumonia"))) {
        dataAll <- "invalid outcome"
    } else {
        keys <- c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
        outcomeKey <- keys[[outcome]]

        ## For each state, find the hospital of the given rank
        dataPerState <- split(data, data$State)
        for (stat in names(dataPerState)) {
            dataOurState <- dataPerState[[stat]]
            dataOutcome <- suppressWarnings(as.numeric(dataOurState[, outcomeKey]))
            good <- !is.na(dataOutcome)
            dataOutcome <- dataOutcome[good]
            dataOurState <- dataOurState[good, ]
            ## Rank by rate; ties are broken alphabetically by hospital name
            dataOurState <- dataOurState[order(dataOutcome, dataOurState$Hospital.Name), ]

            if (num == "best") {
                numState <- 1
            } else if (num == "worst") {
                ## max(..., 1) turns a state with no data into NA, not an error
                numState <- max(nrow(dataOurState), 1)
            } else {
                ## as.numeric() makes a rank given as a string select rows by
                ## position rather than by row name
                numState <- as.numeric(num)
            }

            ## Indexing past the last row yields NA for this state
            dataPart <- data.frame(hospital = dataOurState[numState, "Hospital.Name"],
                                   state = stat, row.names = stat,
                                   stringsAsFactors = FALSE)
            dataAll <- rbind(dataAll, dataPart)
        }
    }

    ## Return a data frame with the hospital names and the (abbreviated) state name
    dataAll
}
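Growing dataAll with rbind() inside the loop works fine for a few dozen states, but a common R idiom is to build the per-state rows with lapply() and bind them once with do.call(rbind, ...). The sketch below is a hypothetical variant, rankAllSketch, that takes an already loaded data frame and a rate column (name or index) instead of reading the CSV, so it can be tried on toy data; it illustrates the idiom and is not a drop-in replacement:

```r
# Hypothetical variant of rankAll: takes the data as an argument and binds
# the per-state rows once with do.call(rbind, ...) instead of growing a
# data frame inside a for loop
rankAllSketch <- function(data, rateCol, num = "best") {
  byState <- split(data, data$State)
  rows <- lapply(names(byState), function(st) {
    d <- byState[[st]]
    # Non-numeric rates become NA; those hospitals are excluded
    rate <- suppressWarnings(as.numeric(d[[rateCol]]))
    keep <- !is.na(rate)
    d <- d[keep, , drop = FALSE]
    rate <- rate[keep]
    # Rank by rate, breaking ties alphabetically by hospital name
    d <- d[order(rate, d$Hospital.Name), , drop = FALSE]
    k <- if (identical(num, "best")) {
      1L
    } else if (identical(num, "worst")) {
      nrow(d)
    } else {
      suppressWarnings(as.integer(num))
    }
    # Out-of-range or unrecognised ranks give NA for this state
    hospital <- if (is.na(k) || k < 1L || k > nrow(d)) {
      NA_character_
    } else {
      d$Hospital.Name[k]
    }
    data.frame(hospital = hospital, state = st, row.names = st,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy usage with made-up hospitals; "Rate" is a hypothetical column name
toy <- data.frame(
  Hospital.Name = c("B HOSP", "A HOSP", "C HOSP", "D HOSP"),
  State         = c("AA", "AA", "AA", "BB"),
  Rate          = c("10.0", "10.0", "Not Available", "9.5"),
  stringsAsFactors = FALSE
)
rankAllSketch(toy, "Rate", "best")
rankAllSketch(toy, "Rate", 2)
```

The first call gives A HOSP for AA (its tie with B HOSP is broken alphabetically) and D HOSP for BB; the second gives B HOSP for AA and NA for BB, which has only one hospital with data.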

Testing rankAll

head(rankAll("heart attack", 20), 10)
##                               hospital state
## AK                                <NA>    AK
## AL      D W MCMILLAN MEMORIAL HOSPITAL    AL
## AR   ARKANSAS METHODIST MEDICAL CENTER    AR
## AZ JOHN C LINCOLN DEER VALLEY HOSPITAL    AZ
## CA               SHERMAN OAKS HOSPITAL    CA
## CO            SKY RIDGE MEDICAL CENTER    CO
## CT             MIDSTATE MEDICAL CENTER    CT
## DC                                <NA>    DC
## DE                                <NA>    DE
## FL      SOUTH FLORIDA BAPTIST HOSPITAL    FL
tail(rankAll("pneumonia", "worst"), 3)
##                                      hospital state
## WI MAYO CLINIC HEALTH SYSTEM - NORTHLAND, INC    WI
## WV                     PLATEAU MEDICAL CENTER    WV
## WY           NORTH BIG HORN HOSPITAL DISTRICT    WY
tail(rankAll("heart failure"), 10)
##                                                             hospital state
## TN                         WELLMONT HAWKINS COUNTY MEMORIAL HOSPITAL    TN
## TX                                        FORT DUNCAN MEDICAL CENTER    TX
## UT VA SALT LAKE CITY HEALTHCARE - GEORGE E. WAHLEN VA MEDICAL CENTER    UT
## VA                                          SENTARA POTOMAC HOSPITAL    VA
## VI                            GOV JUAN F LUIS HOSPITAL & MEDICAL CTR    VI
## VT                                              SPRINGFIELD HOSPITAL    VT
## WA                                         HARBORVIEW MEDICAL CENTER    WA
## WI                                    AURORA ST LUKES MEDICAL CENTER    WI
## WV                                         FAIRMONT GENERAL HOSPITAL    WV
## WY                                        CHEYENNE VA MEDICAL CENTER    WY