Working with the rHealthDataGov package

The rHealthDataGov package was released recently. Per the CRAN page:

An R interface for the HealthData.gov data API. For each data resource, you can filter results (server-side) to select subsets of data.

Here's a quick overview of using it to display the HAI-1 (Central Line-Associated Bloodstream Infections, CLABSI) measures for hospitals in RI.

# load rHealthDataGov package
library(rHealthDataGov)
state <- "RI"

# get list of providers
hospST <- fetch_healthdata(resource = "hosp", filter = list(addr_state = state))

# convert provider_id from integer64 to character (the generic dispatches to bit64's method)
hospST$provider_id <- as.character(hospST$provider_id)
prov <- hospST$provider_id

# get state benchmark
hais <- fetch_healthdata(resource = "hais", filter = list(state_code = state))

# grab timeframes
q <- fetch_healthdata(resource = "q", filter = list(measureid = "HAI-1"))

Note that one of the dates in the Quarters table comes in as class Date, while the other is read as character, so we convert it:

# convert hospital_discharge_end_2 to date
q$hospital_discharge_end_2 <- as.Date(q$hospital_discharge_end_2, "%m/%d/%Y")
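The same conversion can be tried without calling the API. This is a minimal sketch in base R; the date strings are made up for illustration and only assume the month/day/year layout used above.

```r
# Made-up date strings in month/day/year layout, for illustration only
end_raw <- c("03/31/2013", "12/31/2012")

# Parse with an explicit format so the result is class Date
end_date <- as.Date(end_raw, format = "%m/%d/%Y")
class(end_date)

# A string that does not match the format parses as NA rather than raising an error
bad <- as.Date("2013-03-31", format = "%m/%d/%Y")
is.na(bad)
```

Checking for NA after the conversion is a cheap way to catch a column whose layout differs from what was expected.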

The Provider tables in the Hospital Compare data sets are larger than the State and National tables, which contain little more than a benchmark/comparison number. The following lets us look at the variables in a given table, so we know which to filter on to get only the Providers in our chosen state.

data(filters)
summary(filters$haip)
##                  Length Class     Mode     
## hai_1_sir         950   -none-    numeric  
## provider_id      3619   integer64 numeric  
## seqn             3619   integer64 numeric  
## hai_1_footnote      6   -none-    character
## hai_1_eligcases  2427   -none-    numeric  
## hai_1_devicedays 2201   integer64 numeric  
## hai_1_ci_lower    595   -none-    numeric  
## _id              3619   -none-    numeric  
## hai_1_numerator    60   integer64 numeric  
## hai_1_ci_upper   1519   -none-    numeric

The provider_id field is in the haip table. Also notice the integer64 class of many of the variables. The rHealthDataGov package requires the bit64 package, which we can use to convert these to atomic data types (character, numeric, etc.).

# get hospital data for selected state, first select all
haip <- fetch_healthdata(resource = "haip", filter = NULL)

# convert provider_id to character
haip$provider_id <- as.character(haip$provider_id)

# filter to just Providers we identified earlier
library(dplyr)
haip <- filter(haip, provider_id %in% prov)

# convert the remaining integer64 variables to double
haip$seqn <- as.double(haip$seqn)
haip$hai_1_numerator <- as.double(haip$hai_1_numerator)
haip$hai_1_devicedays <- as.double(haip$hai_1_devicedays)
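To see what these conversions do in isolation, here is a small sketch using bit64 directly. The IDs and counts are hypothetical, not values returned by the API, and the example assumes the bit64 package is installed.

```r
library(bit64)

# Hypothetical IDs and counts, for illustration only
ids <- as.integer64(c("990001", "990002"))
device_days <- as.integer64(c(1250, 980))

class(ids)

# With bit64 loaded, the generic functions dispatch to the integer64 methods
ids_chr <- as.character(ids)
days_num <- as.double(device_days)

# Character IDs can now be matched against other character vectors
keep <- ids_chr %in% c("990002")
```

One caveat worth keeping in mind: a double represents integers exactly only up to 2^53, so the conversion to double is safe for counts like these but not for arbitrarily large 64-bit values.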

Now set up the plot data and create the plot.

# set up plot data
pd <- haip
pd <- inner_join(pd, hospST[, c("provider_id", "hsp_name")], by = "provider_id")
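The join relies on provider_id having the same type in both tables, which is why both were converted to character earlier. The sketch below illustrates the join on toy data; the IDs, names, and values are hypothetical, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical provider IDs, names, and values, for illustration only
measures <- data.frame(
  provider_id = c("990001", "990002", "990003"),
  hai_1_sir = c(0.8, 1.2, NA),
  stringsAsFactors = FALSE
)
hospitals <- data.frame(
  provider_id = c("990001", "990002", "990004"),
  hsp_name = c("Hospital A", "Hospital B", "Hospital D"),
  stringsAsFactors = FALSE
)

# inner_join keeps only the IDs present in both tables
joined <- inner_join(measures, hospitals, by = "provider_id")

# anti_join lists the measure rows that found no matching hospital
unmatched <- anti_join(measures, hospitals, by = "provider_id")
```

Inspecting the unmatched rows is a quick check that the join did not silently drop providers.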

# set state comparison
pd$state_comp <- hais$hai_1_sir
pd[is.na(pd$hai_1_ci_lower), "hai_1_ci_lower"] <- 0
pd[is.na(pd$hai_1_ci_upper), "hai_1_ci_upper"] <- 0
# create plot
library(ggplot2)
# map the confidence interval bounds by column name
limits <- aes(ymax = hai_1_ci_upper, ymin = hai_1_ci_lower)

ggplot(pd, aes(hsp_name, hai_1_sir)) +
    geom_point(size = 5) +
    theme(axis.text.x = element_text(angle = 90)) +
    geom_linerange(limits) +
    # set RI comparison (one line for the single state value)
    geom_hline(yintercept = unique(pd$state_comp), colour = "blue", linetype = "dashed") +
    # set SIR comp
    geom_hline(yintercept = 1, colour = "red") +
    xlab("Hospital Name") +
    ggtitle(paste0("HAI-1, ", state, ", ", q$hospital_discharge_sta_1,
        " - ", q$hospital_discharge_end_1))

[Figure: HAI-1 SIR by hospital, with confidence intervals, the state comparison line, and the SIR = 1 reference line]

The blue dashed line is the RI comparison, and the red line is set at the value of 1. For a Standardized Infection Ratio (SIR), which is the number of observed infections divided by the number predicted from a baseline, 1 means a hospital had exactly as many infections as predicted; values below 1 mean fewer and values above 1 mean more. The lines extending from the points show the confidence intervals, which are included in the data set. In a future post we can look at the number of device days and how that factors into the SIR and the confidence intervals.
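As a worked example of how an SIR relates to the reference value of 1, the sketch below computes the ratio of observed to predicted infections and compares each confidence interval with 1. All numbers are hypothetical, the column names are made up for illustration, and the three labels are a common reading of such intervals rather than anything returned by the API.

```r
# Hypothetical counts and intervals, for illustration only (not from the API)
toy <- data.frame(
  hospital  = c("A", "B", "C"),
  observed  = c(1, 5, 9),        # infections reported
  predicted = c(6.0, 3.1, 3.0),  # infections predicted from the baseline
  ci_lower  = c(0.01, 0.59, 1.46),
  ci_upper  = c(0.82, 3.58, 5.51)
)

# SIR is observed divided by predicted
toy$sir <- toy$observed / toy$predicted

# Interval entirely below 1, entirely above 1, or overlapping 1
toy$vs_baseline <- with(toy, ifelse(
  !is.na(ci_upper) & ci_upper < 1, "better",
  ifelse(!is.na(ci_lower) & ci_lower > 1, "worse", "no different")
))

toy[, c("hospital", "sir", "vs_baseline")]
```

An interval that overlaps 1 means the data cannot distinguish the hospital from the baseline, which is why the interval is usually more informative than the point alone.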

One last note: the data available through the API seems to be older than what's available on the Hospital Compare site. Hopefully the API will serve the most recent data in the future.