Synopsys

The goal of this study is to analyze the Hospital Outcome of care data and draw inferences. Similar data can be downloaded from the medicare website (www.medicare.gov).

The data helps compare hospitals based on the mortality and readmission rate from Heart Attack, Heart Failure and Pneumonia.

In this case study we will compare data based on Heart Attack. However similar analysis can be done on Heart Faiure and Pneumonia.

Pre-Processing

Load the required libraries

library(jsonlite)
library(dplyr)
library(ggplot2)
library(maps)
library(scales)

Load the data into a dataframe. (This is sample data and doesnot include the entire dataset)

dataHospOutcome <- fromJSON('hospital-outcome-of-care.json')

Data Cleanup

Clean Up the data for analysis

# Convert the Mortality Rate to numeric
dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Attack` <- 
  as.numeric(dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Attack`)
dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Failure` <- 
  as.numeric(dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Heart Failure`)
dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Pneumonia` <- 
  as.numeric(dataHospOutcome$`Hospital 30-Day Death (Mortality) Rates from Pneumonia`)

# Convert State to factor variable
dataHospOutcome$State <- as.factor(dataHospOutcome$State)

## Convert US comparision Column to factor variable
dataHospOutcome$`Comparison to U.S. Rate - Hospital 30-Day Death (Mortality) Rates from Heart Attack` <- 
  as.factor(dataHospOutcome$`Comparison to U.S. Rate - Hospital 30-Day Death (Mortality) Rates from Heart Attack`)

## Remove rows with NA 
##    Select columns that are required for further analysis renaming the columns
dataHospOutcomeHA <- dataHospOutcome %>%
  filter(is.na(`Hospital 30-Day Death (Mortality) Rates from Heart Attack`) == FALSE) %>%
  rename(ProviderNumber = `Provider Number`,
         ZipCode = `ZIP Code`,
         ComparisonToUSRateFromHeartAttack = `Comparison to U.S. Rate - Hospital 30-Day Death (Mortality) Rates from Heart Attack`,
         MortFromHeartAttack = `Hospital 30-Day Death (Mortality) Rates from Heart Attack`) %>%
  select(ProviderNumber, State, ZipCode, MortFromHeartAttack, ComparisonToUSRateFromHeartAttack) %>%
  arrange(State)

Here is a quick look at the structure of the cleaned up data set

str(dataHospOutcomeHA)
## 'data.frame':    2720 obs. of  5 variables:
##  $ ProviderNumber                   : chr  "20001" "20006" "20012" "20017" ...
##  $ State                            : Factor w/ 54 levels "AK","AL","AR",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ ZipCode                          : int  99519 99645 99701 99508 99508 36301 35957 35631 35235 35968 ...
##  $ MortFromHeartAttack              : num  13.4 17.7 15.5 14.5 15.7 14.3 18.5 18.1 17.7 18 ...
##  $ ComparisonToUSRateFromHeartAttack: Factor w/ 5 levels "Better than U.S. National Rate",..: 2 2 2 2 2 2 2 2 2 2 ...

Explore Data Analysis

A look at the first few row from the cleaned data set.

The data contains

head(dataHospOutcomeHA)
##   ProviderNumber State ZipCode MortFromHeartAttack
## 1          20001    AK   99519                13.4
## 2          20006    AK   99645                17.7
## 3          20012    AK   99701                15.5
## 4          20017    AK   99508                14.5
## 5          20026    AK   99508                15.7
## 6          10001    AL   36301                14.3
##      ComparisonToUSRateFromHeartAttack
## 1 No Different than U.S. National Rate
## 2 No Different than U.S. National Rate
## 3 No Different than U.S. National Rate
## 4 No Different than U.S. National Rate
## 5 No Different than U.S. National Rate
## 6 No Different than U.S. National Rate

We an see that the factors that compare the hospital to the US average has values like:

levels(dataHospOutcomeHA$ComparisonToUSRateFromHeartAttack)
## [1] "Better than U.S. National Rate"      
## [2] "No Different than U.S. National Rate"
## [3] "Not Available"                       
## [4] "Number of Cases Too Small"           
## [5] "Worse than U.S. National Rate"

Compare Rate of Heart Attack across states

Plot shows the min, max, draws a box across the first to third quantile and marks the median

plot (dataHospOutcomeHA$State, dataHospOutcomeHA$MortFromHeartAttack, main="Hospital Outcome Analysis - Rate from Heart Attack across States", horizontal = TRUE, xlab="Rate", ylab="State", col="light blue", las=1, cex.lab=1.15, cex.main=2)

We can observe the median is high for Arkansas.

State Outcomes to US Average

Plot the number of providers that fall in the factors of the US Average

  • Better than U.S. National Rate
  • No Different than U.S. National Rate
  • Worse than U.S. National Rate

We can observer that there are a few states with providers that are exceptional whose averages are better than US Average and a few that are worser than US average.

State with highest Mortality Rates

Lets look at the states in US that has the highest Mortality Rate.

Deep colors depicts the states with providers having highest rate on average.

dataGroupByState <-
  dataHospOutcomeHA %>% 
  group_by(State) %>%
  summarise(AvgMortRateHA = mean(`MortFromHeartAttack`)) %>%
  select(State, AvgMortRateHA)

dataStates <- map_data("state")

dataStates <- dataStates %>% 
  mutate(State = state.abb[match(toupper(dataStates$region), toupper(state.name))])

dataStates$State <- as.factor(dataStates$State)

dataStatesAndHA <- merge(dataStates, dataGroupByState, by.x = "State", by.y ="State", all.x = TRUE)

ggplot(dataStatesAndHA, aes(x=long, y=lat, group = group, fill = AvgMortRateHA)) +
  geom_polygon(color="white") +
  scale_fill_gradient(name="Rate", low="#ffe8ee", high="#c81f49", guide="colorbar", na.value="#eeeeee", breaks = pretty_breaks(n=5)) +
  labs(title="Mortality Rate for Heart Attacks") +
  theme(plot.title = element_text(size=25, face="bold")) +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.text.x = element_blank(),
        axis.line = element_blank(), axis.ticks = element_blank(), panel.background = element_blank()) +
  coord_map()

Inference

From the map

  • Arkansa, Vermonts are some of the states with Higher Rate of Mortality from Heart Attack on average.
  • Arkansas, Oregon, Washington are some of the states with Higher Rate of Mortality from Heart Failure on average.
  • Arkansas, Nevada, Mississippi are some of the states with Higher Rate of Mortality from Pneumonia on average.