The goal of this study is to analyze the Hospital Outcome of care data and draw inferences. Similar data can be downloaded from the medicare website (www.medicare.gov).
The data helps compare hospitals based on the mortality and readmission rate from Heart Attack, Heart Failure and Pneumonia.
In this case study we will compare data based on Heart Attack. However similar analysis can be done on Heart Faiure and Pneumonia.
Load the required libraries
library(jsonlite)
library(dplyr)
library(ggplot2)
library(maps)
library(scales)
Load the data into a dataframe. (This is sample data and doesnot include the entire dataset)
dataHospOutcome <- fromJSON('hospital-outcome-of-care.json')
Clean Up the data for analysis
# Convert the Mortality Rate to numeric
dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Attack` <-
as.numeric(dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Attack`)
dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Failure` <-
as.numeric(dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Failure`)
dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Pneumonia` <-
as.numeric(dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Pneumonia`)
# Convert State to factor variable
dataHospOutcome$State <- as.factor(dataHospOutcome$State)
## Convert US comparision Column to factor variable
dataHospOutcome$`Comparison to U.S. Rate - Hospital 30-Day Death (Mortality) Rates from Heart Attack` <-
as.factor(dataHospOutcome$`Comparison to U.S. Rate - Hospital 30-Day Death (Mortality) Rates from Heart Attack`)
## Remove rows with NA
## Select columns that are required for further analysis renaming the columns
dataHospOutcomeHA <- dataHospOutcome %>%
filter(is.na(`Hospital 30-Day Death (Mortality) Rates from Heart Attack`) == FALSE) %>%
rename(ProviderNumber = `Provider Number`,
ZipCode = `ZIP Code`,
ComparisonToUSRateFromHeartAttack = `Comparison to U.S. Rate - Hospital 30-Day Death (Mortality) Rates from Heart Attack`,
MortFromHeartAttack = `Hospital 30-Day Death (Mortality) Rates from Heart Attack`) %>%
select(ProviderNumber, State, ZipCode, MortFromHeartAttack, ComparisonToUSRateFromHeartAttack) %>%
arrange(State)
Here is a quick look at the structure of the cleaned up data set
str(dataHospOutcomeHA)
## 'data.frame': 2720 obs. of 5 variables:
## $ ProviderNumber : chr "20001" "20006" "20012" "20017" ...
## $ State : Factor w/ 54 levels "AK","AL","AR",..: 1 1 1 1 1 2 2 2 2 2 ...
## $ ZipCode : int 99519 99645 99701 99508 99508 36301 35957 35631 35235 35968 ...
## $ MortFromHeartAttack : num 13.4 17.7 15.5 14.5 15.7 14.3 18.5 18.1 17.7 18 ...
## $ ComparisonToUSRateFromHeartAttack: Factor w/ 5 levels "Better than U.S. National Rate",..: 2 2 2 2 2 2 2 2 2 2 ...
A look at the first few row from the cleaned data set.
The data contains
head(dataHospOutcomeHA)
## ProviderNumber State ZipCode MortFromHeartAttack
## 1 20001 AK 99519 13.4
## 2 20006 AK 99645 17.7
## 3 20012 AK 99701 15.5
## 4 20017 AK 99508 14.5
## 5 20026 AK 99508 15.7
## 6 10001 AL 36301 14.3
## ComparisonToUSRateFromHeartAttack
## 1 No Different than U.S. National Rate
## 2 No Different than U.S. National Rate
## 3 No Different than U.S. National Rate
## 4 No Different than U.S. National Rate
## 5 No Different than U.S. National Rate
## 6 No Different than U.S. National Rate
We an see that the factors that compare the hospital to the US average has values like:
levels(dataHospOutcomeHA$ComparisonToUSRateFromHeartAttack)
## [1] "Better than U.S. National Rate"
## [2] "No Different than U.S. National Rate"
## [3] "Not Available"
## [4] "Number of Cases Too Small"
## [5] "Worse than U.S. National Rate"
Plot shows the min, max, draws a box across the first to third quantile and marks the median
plot (dataHospOutcomeHA$State, dataHospOutcomeHA$MortFromHeartAttack, main="Hospital Outcome Analysis - Rate from Heart Attack across States", horizontal = TRUE, xlab="Rate", ylab="State", col="light blue", las=1, cex.lab=1.15, cex.main=2)
Plot the number of providers that fall in the factors of the US Average
We can observer that there are a few states with providers that are exceptional whose averages are better than US Average and a few that are worser than US average.
Lets look at the states in US that has the highest Mortality Rate.
Deep colors depicts the states with providers having highest rate on average.
dataGroupByState <-
dataHospOutcomeHA %>%
group_by(State) %>%
summarise(AvgMortRateHA = mean(`MortFromHeartAttack`)) %>%
select(State, AvgMortRateHA)
dataStates <- map_data("state")
dataStates <- dataStates %>%
mutate(State = state.abb[match(toupper(dataStates$region), toupper(state.name))])
dataStates$State <- as.factor(dataStates$State)
dataStatesAndHA <- merge(dataStates, dataGroupByState, by.x = "State", by.y ="State", all.x = TRUE)
ggplot(dataStatesAndHA, aes(x=long, y=lat, group = group, fill = AvgMortRateHA)) +
geom_polygon(color="white") +
scale_fill_gradient(name="Rate", low="#ffe8ee", high="#c81f49", guide="colorbar", na.value="#eeeeee", breaks = pretty_breaks(n=5)) +
labs(title="Mortality Rate for Heart Attacks") +
theme(plot.title = element_text(size=25, face="bold")) +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.text.x = element_blank(),
axis.line = element_blank(), axis.ticks = element_blank(), panel.background = element_blank()) +
coord_map()