The data for this assignment come from the Hospital Compare web site (http://hospitalcompare.hhs.gov) run by the U.S. Department of Health and Human Services. The purpose of the web site is to provide data and information about the quality of care at over 4,000 Medicare-certified hospitals in the U.S. This dataset essentially covers all major U.S. hospitals. It is used for a variety of purposes, including determining whether hospitals should be fined for not providing high quality care to patients (see http://goo.gl/jAXFX for some background on this particular topic).
Download the file ProgAssignment3-data.zip, which contains the data for Programming Assignment 3, from the Coursera web site. Unzip the file in a directory that will serve as your working directory. When you start up R, make sure to change your working directory to the directory where you unzipped the data.
Write a function called best() that takes TWO (2) arguments: (a) the TWO(2)-character abbreviated name of a state; and (b) an outcome name. The function reads the outcome-of-care-measures.csv file and returns a character vector with the name of the hospital that has the best (i.e. LOWEST) 30-day mortality for the specified outcome in that state. The hospital name is the name provided in the Hospital.Name variable. The outcomes can be one of “heart attack”, “heart failure”, or “pneumonia”. The function should use the following template.
best <- function(state, outcome) {
## Read outcome data
## Check that state and outcome are valid
## Return hospital name in that state with lowest 30-day death rate
}
The function should check the validity of its arguments. If an invalid state value is passed to best(), the function should throw an error via the stop() function with the exact message “invalid state”. If an invalid outcome value is passed to best(), the function should throw an error via the stop() function with the exact message “invalid outcome”.
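The argument checks described above can be sketched as follows. This is only an illustration on a toy data frame, not the assignment's reference solution: the helper name check_args() and the toy data are assumptions made for the example, while the two error messages are the ones the assignment specifies.

```r
# Toy stand-in for the real CSV (assumption: only the State column matters here).
toy <- data.frame(
  Hospital.Name = c("A", "B"),
  State = c("MD", "TX"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stops with the exact messages the assignment requires.
check_args <- function(data, state, outcome) {
  valid_outcomes <- c("heart attack", "heart failure", "pneumonia")
  if (!(state %in% data$State)) stop("invalid state")
  if (!(outcome %in% valid_outcomes)) stop("invalid outcome")
  invisible(TRUE)
}

# A valid call passes silently.
check_args(toy, "MD", "pneumonia")

# Capture the error messages instead of aborting, so they can be inspected.
msg_state <- tryCatch(check_args(toy, "ZZ", "pneumonia"),
                      error = function(e) conditionMessage(e))
msg_outcome <- tryCatch(check_args(toy, "MD", "flu"),
                        error = function(e) conditionMessage(e))
```

Passing the message string directly to stop() matters: wrapping it in print() would also echo the text to the console before the error is raised.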
Save your code for this function to a file named best.R. To run the test script for this part, make sure your working directory has the file best.R in it.
Write a function called rankhospital() that takes THREE (3) arguments: (a) the TWO(2)-character abbreviated name of a state (state); (b) an outcome (outcome); and (c) the ranking of a hospital in that state for that outcome (num). The function reads the outcome-of-care-measures.csv file and returns a character vector with the name of the hospital that has the ranking specified by the num argument. For example, the call:
rankhospital("MD", "heart failure", 5)
would return a character vector containing the name of the hospital with the FIFTH (5th) LOWEST THIRTY(30)-day death rate for heart failure. The num argument can take values “best”, “worst”, or an integer indicating the ranking (SMALLER numbers are better). If the number given by num is LARGER THAN the number of hospitals in that state, then the function should return NA. The function should use the following template.
rankhospital <- function(state, outcome, num = "best") {
## Read outcome data
## Check that state and outcome are valid
## Return hospital name in that state with the given rank
## THIRTY(30)-day death rate
}
Hospitals that do NOT have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
If there is MORE THAN ONE (1) hospital for a given ranking, then the hospital names should be sorted in alphabetical order and the FIRST (1st) hospital in that set should be returned (i.e. if hospitals “b”, “c”, and “f” are tied for a given rank, then hospital “b” should be returned).
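A minimal sketch of the exclusion and tie-breaking rules, using a toy data frame rather than the real file. The column name Rate and the "Not Available" marker for missing values are assumptions made for this example; the idea is simply to drop missing rates and then order by rate first and hospital name second.

```r
# Toy data: three hospitals tied at 10.1, one worse, one with no data.
toy <- data.frame(
  Hospital.Name = c("f", "b", "c", "a", "z"),
  Rate = c("10.1", "10.1", "10.1", "12.0", "Not Available"),
  stringsAsFactors = FALSE
)

# Non-numeric entries become NA; the coercion warning is expected here.
toy$Rate <- suppressWarnings(as.numeric(toy$Rate))

# Exclude hospitals without data before ranking.
toy <- toy[!is.na(toy$Rate), ]

# Sort by rate, breaking ties alphabetically by hospital name.
ranked <- toy[order(toy$Rate, toy$Hospital.Name), ]

# The first name in the ranked set is the "best" hospital.
best_name <- ranked$Hospital.Name[1]
```

With the tie among "b", "c", and "f", the secondary sort key puts "b" first, which matches the rule stated above.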
The function should check the validity of its arguments. If an invalid state value is passed to rankhospital(), the function should throw an error via the stop() function with the exact message “invalid state”. If an invalid outcome value is passed to rankhospital(), the function should throw an error via the stop() function with the exact message “invalid outcome”. The num variable can take values “best”, “worst”, or an integer indicating the ranking (SMALLER numbers are better). If the number given by num is larger than the number of hospitals in that state, then the function should return NA.
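One way to handle the three kinds of num value is to translate them into a row index first. The helper below is a hypothetical sketch (pick_rank() is not part of the assignment) and assumes the hospital names have already been ranked from best to worst.

```r
# Hypothetical helper: map num ("best", "worst", or an integer) to a name.
pick_rank <- function(ranked_names, num = "best") {
  if (identical(num, "best")) {
    idx <- 1L
  } else if (identical(num, "worst")) {
    idx <- length(ranked_names)
  } else {
    idx <- as.integer(num)
  }
  # Out-of-range ranks (including an empty ranking) yield NA.
  if (is.na(idx) || idx < 1 || idx > length(ranked_names)) {
    return(NA_character_)
  }
  ranked_names[idx]
}

# Names already sorted from best to worst (toy data).
ranked_names <- c("b", "c", "f", "a")

first <- pick_rank(ranked_names, "best")
last <- pick_rank(ranked_names, "worst")
third <- pick_rank(ranked_names, 3)
too_far <- pick_rank(ranked_names, 10)
```

Returning NA_character_ keeps the result a character vector even when the requested rank does not exist.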
Save your code for this function to a file named rankhospital.R. To run the test script for this part, make sure your working directory has the file rankhospital.R in it.
Write a function called rankall() that takes TWO (2) arguments: (a) an outcome name (outcome); and (b) a hospital ranking (num). The function reads the outcome-of-care-measures.csv file and returns a TWO(2)-column data frame containing the hospital in EACH state that has the ranking specified in num. For example the function call
rankall("heart attack", "best")
would return a data frame containing the names of the hospitals that are the best in their respective states for THIRTY(30)-day heart attack death rates. The function should return a value for EVERY state (some may be NA). The FIRST (1st) column in the data frame is named hospital, which contains the hospital name, and the SECOND (2nd) column is named state, which contains the TWO(2)-character abbreviation for the state name. The function should use the following template.
rankall <- function(outcome, num = "best") {
## Read outcome data
## For each state, find the hospital of the given rank
## Return a data frame with the hospital names and the (abbreviated)
## state name
}
Hospitals that do NOT have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.
If there is MORE THAN ONE (1) hospital for a given ranking, then the hospital names should be sorted in alphabetical order and the FIRST (1st) hospital in that set should be returned (i.e. if hospitals “b”, “c”, and “f” are tied for a given rank, then hospital “b” should be returned).
NOTE: For the purpose of this part of the assignment (and for efficiency), your function should NOT call the rankhospital() function from the previous section.
The function should check the validity of its arguments. If an invalid outcome value is passed to rankall(), the function should throw an error via the stop() function with the exact message “invalid outcome”. The num variable can take values “best”, “worst”, or an integer indicating the ranking (SMALLER numbers are better). If the number given by num is larger than the number of hospitals in that state, then the function should return NA.
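The per-state ranking can be sketched with split() and vapply() on a toy data frame. The column name Rate and the data values are assumptions for illustration only; the point is that indexing past the end of a state's hospital list yields NA, so every state still gets a row.

```r
# Toy data: AK has two hospitals, AL has one.
toy <- data.frame(
  Hospital.Name = c("h1", "h2", "h3"),
  State = c("AK", "AK", "AL"),
  Rate = c(15.2, 14.0, 13.3),
  stringsAsFactors = FALSE
)

num <- 2

# One data frame per state, in alphabetical order of state.
by_state <- split(toy, toy$State)

# For each state, rank by rate then name and pick the num-th hospital.
hospital <- vapply(by_state, function(x) {
  x <- x[order(x$Rate, x$Hospital.Name), ]
  x$Hospital.Name[num]  # NA when the state has fewer than num hospitals
}, character(1))

# Two-column result: hospital name and state abbreviation.
result <- data.frame(hospital = hospital, state = names(by_state),
                     stringsAsFactors = FALSE)
```

Here AK returns its second-ranked hospital, while AL, which has only one hospital, returns NA.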
Save your code for this function to a file named rankall.R. To run the test script for this part, make sure your working directory has the file rankall.R in it.
The first function finds the best hospital in a state.
best<- function(state, outcome)
{
outcome1 <- read.csv("outcome-of-care-measures.csv",
colClasses = "character")
if(!any(state == outcome1$State)){
stop("invalid state")}
else if((outcome %in% c("heart attack", "heart failure",
"pneumonia")) == FALSE) {
stop("invalid outcome")
}
outcome2 <- subset(outcome1, State == state)
if (outcome == "heart attack") {
colnum <- 11
}
else if (outcome == "heart failure") {
colnum <- 17
}
else {
colnum <- 23
}
min_row <- which(as.numeric(outcome2[ ,colnum]) ==
min(as.numeric(outcome2[ ,colnum]), na.rm = TRUE))
hospitals <- outcome2[min_row,2]
hospitals <- sort(hospitals)
return(hospitals[1])
}
# example output:
best("SC", "heart attack")
## [1] "MUSC MEDICAL CENTER"
The second function ranks hospitals by outcome in a state.
rankhospital<- function(state, outcome, num = "best")
{
outcome1 <- read.csv("outcome-of-care-measures.csv",
colClasses = "character")
if(!any(state == outcome1$State)){
stop("invalid state")}
else if((outcome %in% c("heart attack", "heart failure",
"pneumonia")) == FALSE) {
stop("invalid outcome")
}
outcome2 <- subset(outcome1, State == state)
if (outcome == "heart attack") {
colnum <- 11
}
else if (outcome == "heart failure") {
colnum <- 17
}
else {
colnum <- 23
}
outcome2[ ,colnum] <- as.numeric(outcome2[ ,colnum])
outcome3 <- outcome2[order(outcome2[ ,colnum],outcome2[,2]), ]
outcome3 <- outcome3[(!is.na(outcome3[ ,colnum])),]
if(num == "best"){
num <- 1
}
else if (num == "worst"){
num <- nrow(outcome3)
}
return(outcome3[num,2])
}
# example output:
rankhospital("NC", "heart attack", "worst")
## [1] "WAYNE MEMORIAL HOSPITAL"
The third function ranks hospitals in all states.
rankall<- function(outcome, num = "best")
{
outcome2 <- read.csv("outcome-of-care-measures.csv",
colClasses = "character")
if((outcome %in% c("heart attack", "heart failure",
"pneumonia")) == FALSE) {
stop("invalid outcome")
}
if (outcome == "heart attack") {
colnum <- 11
}
else if (outcome == "heart failure") {
colnum <- 17
}
else {
colnum <- 23
}
outcome2[ ,colnum] <- as.numeric(outcome2[ ,colnum])
outcome2 = outcome2[!is.na(outcome2[,colnum]),]
splited = split(outcome2, outcome2$State)
ans = lapply(splited, function(x, num) {
x = x[order(x[,colnum], x$Hospital.Name),]
if(is.character(num)) {
if(num == "best") {
return (x$Hospital.Name[1])
}
else if(num == "worst") {
return (x$Hospital.Name[nrow(x)])
}
}
else {
return (x$Hospital.Name[num])
}
}, num)
#Return data.frame with format
return ( data.frame(hospital=unlist(ans), state=names(ans)) )
}
# example output:
r <- rankall("heart attack", 4)
as.character(subset(r, state == "HI")$hospital)
## [1] "CASTLE MEDICAL CENTER"
head(rankall("heart attack", "worst"))
## hospital state
## AK MAT-SU REGIONAL MEDICAL CENTER AK
## AL HELEN KELLER MEMORIAL HOSPITAL AL
## AR MEDICAL CENTER SOUTH ARKANSAS AR
## AZ VERDE VALLEY MEDICAL CENTER AZ
## CA METHODIST HOSPITAL OF SACRAMENTO CA
## CO NORTH SUBURBAN MEDICAL CENTER CO