Healthcare rankings:

Rankings are a common way of comparing different objects based on different metrics, and we have all interacted with this kind of data. For example, we might choose which school to attend based on studies which use metrics such as number of research publications, dollar amount of scholarships awarded, graduation outcomes and so on to rank them.

Similarly, states can be ranked based on the quality of care that their hospitals provide. The metrics used can vary widely, from number of hospital beds to percent of favorable ratings on patient experiences surveys. I stumbled upon this article that provides state healthcare rankings based on four metrics:

Number of readmissions
Number of preventable admissions
Medicare quality
Nursing home quality

The number of readmissions is a crucial metric for many hospitals, especially because Medicare reduces payments to hospitals with excess readmissions.

According to the artice, Utah is a state that is performing very well as far as the number of readmissions are concerned. It is ranked 3rd in the country, meaning the number of readmissions in Utah hospitals are relatively low. On the same scale, Mississippi is ranked 49th, indicating that Mississippi hospitals have a very high number of readmissions.

Let’s verify this for ourselves

In this walkthrough, we will use R to:

Access Medicare data to obtain the numbers of readmissions for each hospital in the country
Wrangle data
- Clean, filter and select columns of interest for the states of Utah and Mississippi (and rename columns to make them more understandable)
- Convert data from long to wide and back to get familiar with what works for human eyes and what R prefers.
- Use dplyr pipes to precisely extract information of interest from a large dataset.
Use a t-test to compare the mean number of readmissions for hospitals in Utah and Mississippi and see if the difference in ranking is justified statistically.

Load libraries

library(tidyverse)
library(knitr)
library(tinytex)

Accessing the data

The dataset can be found here.

dat <- read.csv("Readmissions.csv")

Data wrangling

The data has 19.7k rows and 12 columns.

colnames(dat)

The columns of interest for us are:

Column 1: ‘ï..Hospital.Name’: This is the name of the reporting hospital.
Column 3: ‘State’
Column 4: ‘Measure.Name’ : This is the code for the disease that is being evaluated. A more detailed explanation is given later in this document.
column 10: ‘Number.of.Readmissions’ : The number of readmissions.

Let us rename columns 1, 4 and 10 for readability.

names(dat)[c(1, 4, 10)] <- c("Hospital", "Measure", "Readmissions")

Now, let us use a dplyr pipe to first pull the data from Utah and Mississippi, and then visualize only the 4 columns of interest:

clean <- dat %>% filter(State %in% c("UT", "MS")) %>% 
  select(Hospital, State, Readmissions, Measure)

Let’s take a glimpse at a few random rows of this data.

clean[25:36, ]

##                             Hospital State  Readmissions
## 25           MERIT HEALTH RIVER OAKS    MS Not Available
## 26           MERIT HEALTH RIVER OAKS    MS Not Available
## 27           MERIT HEALTH RIVER OAKS    MS            22
## 28           MERIT HEALTH RIVER OAKS    MS            11
## 29           MERIT HEALTH RIVER OAKS    MS            20
## 30           MERIT HEALTH RIVER OAKS    MS            15
## 31 DAVIS HOSPITAL AND MEDICAL CENTER    UT Not Available
## 32 DAVIS HOSPITAL AND MEDICAL CENTER    UT Not Available
## 33 DAVIS HOSPITAL AND MEDICAL CENTER    UT            11
## 34 DAVIS HOSPITAL AND MEDICAL CENTER    UT            34
## 35 DAVIS HOSPITAL AND MEDICAL CENTER    UT            12
## 36 DAVIS HOSPITAL AND MEDICAL CENTER    UT            32
##                   Measure
## 25      READM_30_AMI_HRRP
## 26     READM_30_CABG_HRRP
## 27     READM_30_COPD_HRRP
## 28       READM_30_HF_HRRP
## 29 READM_30_HIP_KNEE_HRRP
## 30       READM_30_PN_HRRP
## 31      READM_30_AMI_HRRP
## 32     READM_30_CABG_HRRP
## 33     READM_30_COPD_HRRP
## 34       READM_30_HF_HRRP
## 35 READM_30_HIP_KNEE_HRRP
## 36       READM_30_PN_HRRP

We have a nice chunk of data representing readmissions from hospitals from both Utah and Mississipi, along with the particular disease for which the readmission happened.

Why are the names of the hospital repeating?

This is because because each hospital records multiple readmissions measures. This representation of data is considered to be a ‘long’ format. This faciliates easy analysis and visualization in R.

Let’s try to make this easier to look at by:

Converting measure name codes to names of diseases/procedures.
Changing data from long to wide.

wide <- clean %>% 
  mutate(Disease = ifelse(Measure =="READM_30_AMI_HRRP", "Myocardial Infarction", 
                   ifelse(Measure == "READM_30_CABG_HRRP", "Bypass Graft",
                   ifelse(Measure== "READM_30_COPD_HRRP", "COPD",
                   ifelse(Measure == "READM_30_HF_HRRP", "Heart Failure",
  ifelse(Measure == "READM_30_PN_HRRP", "Pneumonia", "Hip/Knee Replacement")))))) %>%
  group_by(State, Disease) %>% 
  select(State, Disease, Readmissions) %>%
summarize(Mean = mean(as.numeric(Readmissions, na.rm = TRUE))) %>% 
  spread(Disease, Mean)

kable(wide)

State	Bypass Graft	COPD	Heart Failure	Hip/Knee Replacement	Myocardial Infarction	Pneumonia
MS	315.3492	208.7778	197.6032	308.6508	302.9365	204.7460
UT	343.6452	284.2581	265.0323	272.6129	329.6452	251.8387

This is an interesting table. This shows that Mississippi has, on average, fewer readmissions than Utah in every type of measure except hip/knee replacement.

Let’s use the long form data to compare readmission numbers for Utah and Mississippi. We will not be looking at the measure for readmission.

long <- clean %>% select(-Measure) %>% 
mutate(Readmissions = as.numeric(Readmissions, na.rm = TRUE))

kable(head(long, 10))

Hospital	State	Readmissions
ANDERSON REGIONAL MEDICAL CTR	MS	278
ANDERSON REGIONAL MEDICAL CTR	MS	55
ANDERSON REGIONAL MEDICAL CTR	MS	35
ANDERSON REGIONAL MEDICAL CTR	MS	60
ANDERSON REGIONAL MEDICAL CTR	MS	44
ANDERSON REGIONAL MEDICAL CTR	MS	23
BRIGHAM CITY COMMUNITY HOSPITAL	UT	354
BRIGHAM CITY COMMUNITY HOSPITAL	UT	354
BRIGHAM CITY COMMUNITY HOSPITAL	UT	354
BRIGHAM CITY COMMUNITY HOSPITAL	UT	354

Preparing for the t-test

Let’s partition this data into 2 vectors containing numbers of readmissions for hospitals, one for Utah and One for Mississippi.

x <- long %>% filter(State == "UT") %>% .$Readmissions #Utah readmissions
y <- long %>% filter(State == "MS") %>% .$Readmissions #Mississippi readmissions

We are going to use this data to perform a t-test to compare the mean number of readmissions for the two states.

There are three major assumptions for a t-test:

Independence of observations: This assumption is met because the individual readmission data for any one hospital is independent of the number of readmissions for other hospitals.
Homegeneity of variances: The variances for the two groups should be equal. We can confirm that using the var.test() function in R.

var.test(x,y)

## 
##  F test to compare two variances
## 
## data:  x and y
## F = 0.83534, num df = 185, denom df = 377, p-value = 0.1657
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.6544439 1.0777766
## sample estimates:
## ratio of variances 
##          0.8353404

The p-value of 0.16 suggests that we cannot reject the null hypothesis that the variances are equal. The assumption of equal variances is met.

Normality of data: This is an important assumption that needs to be checked.

hist(x, xlab = "Number of Readmissions", 
     ylab = "Number of Hospitals", col = "lightblue", 
     main = "Distribution of Utah Readmissions", ylim =c(0, 150))

hist(y, xlab = "Number of Readmissions", 
     ylab = "Number of Hospitals", col = "lightgreen", 
     main = "Distribution of Mississippi Readmissions", ylim = c(0, 200))

Both of these vectors are not normally distributed. There are two ways to tackle this—either by using a log transformation, using a non-parametric test like the Wilcoxon Mann Whitney. In this example, we do not have to do this, because the t-test is usually robust when samples sizes are large. Let us proceed with the test.

T-test

t.test(x, y, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 3.4013, df = 399.18, p-value = 0.9996
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 51.71007
## sample estimates:
## mean of x mean of y 
##  291.1720  256.3439

# p value is 0.99, much more than alpha of 0.05
#there is not enough evidence to suggest that 
#the mean number of readmissions is lesser for Utah than Mississippi.

Interpretation

There is no evidence to show that mean readmission numbers in Utah are lesser than Mississippi. Why then did Utah receive a rank of 3 and Mississippi a rank of 49? I noticed when I scrolled down in the article that while their infographic specifies that they are ranking based on readmission numbers, the commentary at the bottom shows that they are using readmission rates. This could be a reason for the discrepancy in our findings.

Let’s try a t-test again, but this time, let’s use the excess readmission ratio column to base the t-test on:

Here’s a quick repeat of the wrangling we did, but this time, we choose to keep the readmission ratio column:

colnames(dat)[7] <- "Readmission_Ratio"

stock <- dat %>% filter(State %in% c("UT", "MS")) %>% 
  select(Hospital, State, Readmission_Ratio)

long <- stock %>% select(-Hospital) %>% 
mutate(Readmission_Ratio = as.numeric(Readmission_Ratio, na.rm = TRUE))

x <- long %>% filter(State == "UT") %>% .$Readmission_Ratio 
y <- long %>% filter(State == "MS") %>% .$Readmission_Ratio

Here’s the t-test:

t.test(x, y, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = -5.7292, df = 291.25, p-value = 1.258e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -436.2049
## sample estimates:
## mean of x mean of y 
##  2075.586  2688.249

The p-value is highly significant! This means we reject the hypothesis that the means of excess readmission ratios are equal, and we pick the alternate hypothesis that states that the mean of the excess readmission ratio is lesser for Utah than Mississippi.

The article ranking is vindicated. Utah is doing much better than Mississippi in keeping readmissions low.

Next steps:

This was an interesting demonstration of using statistics to verify claims made in media publications. I would like to take this further by downloading other variables for Utah hospitals from Hospital Compare and try to run a regression analysis to find out what is contributing to the readmission ratios.

Using t-tests to compare hospital rankings

Lakshman Balaji

April 15, 2019