Introduction

The purpose of this post is to explain how to graph state-level geographic data with the statebins package. To do this we will play with General Payment Data for non-research/ownership payments to physicians and teaching hospitals. This data was very recently released and, in short, contains the data for “gifts” pharma companies and others give to doctors and teaching hospitals because they are just great people. The data used throughout this tutorial can be found in the Open Payments data section of the Centers for Medicare & Medicaid Services website.

Statebins is an R package that produces choropleth-style maps for US states that preserve the geographic placement of states but have the look and feel of a traditional heatmap. This package is based on work by the Washington Post graphics department in their report on The States Most Threatened by Trade. Functions are provided for binned, discrete, and continuous scales. We will explore two of these by binning the number of doctors receiving payments in each state and by plotting the mean and total payments in each state on a continuous scale.

Loading Data

Let’s get started. First, let’s load in the data. We could use read.csv(), but the data is about one GB in size, so we will use the much faster fread() function from the data.table package instead. stringsAsFactors is FALSE by default in fread(); however, call me old fashioned, we are going to put it in anyway.

library(statebins)
library(data.table)
pharm.data <- fread("./General_Payment_Data_2013.csv",header=TRUE,stringsAsFactors=FALSE,showProgress=FALSE)
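Since the file is large, it may help to read only the columns this tutorial actually uses. The following is a minimal sketch, not part of the original workflow: it writes a tiny made-up CSV to a temporary file (the `Other_Column` field and all values are hypothetical) and uses the `select` argument of fread() to keep just three columns.

```r
library(data.table)

# Write a tiny, made-up CSV so the example runs without the real 1 GB file.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Recipient_State,Physician_Last_Name,Total_Amount_of_Payment_USDollars,Other_Column",
  "NY,Smith,$10.50,x",
  "CA,Lee,$5.00,y"
), tmp)

# Read only the three columns used later in the tutorial.
small <- fread(tmp, header = TRUE, stringsAsFactors = FALSE, showProgress = FALSE,
               select = c("Recipient_State", "Physician_Last_Name",
                          "Total_Amount_of_Payment_USDollars"))

print(names(small))
unlink(tmp)
```

With the real file, the same `select` vector could be passed to the fread() call above, assuming the column names match.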

Once the data is loaded, note that str(pharm.data) reveals a leading $ in one of our variables of interest, Total_Amount_of_Payment_USDollars, which means the column was read as text. To make it usable we will use the substring() function, starting at the second character to drop the $, and then convert the column to numeric.

pharm.data$Total_Amount_of_Payment_USDollars <-  as.numeric(substring(pharm.data$Total_Amount_of_Payment_USDollars,2))
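The substring() approach assumes every value starts with exactly one `$` and contains nothing else that is non-numeric. A slightly more defensive sketch, shown here on made-up values rather than the real column, strips dollar signs and thousands separators with gsub() before converting. Whether the real data contains commas or values without a `$` is an assumption I have not checked.

```r
# Made-up example values; the real column may or may not look like this.
raw.amounts <- c("$10.50", "$1,250.00", "75.25", "")

# Remove every "$" and "," wherever it appears, then convert to numeric.
# An empty string becomes NA.
clean.amounts <- as.numeric(gsub("[$,]", "", raw.amounts))

print(clean.amounts)
```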

Aggregating Data

Now we can start messing with the data. Let’s create three different summaries of the data. All of these use data.table’s aggregation syntax together with the normal R mean(), sum(), and length() functions. Notice we first call the data.table function setkey(), which sorts the table by Recipient_State; the grouping itself is done by the by argument in each call, and the results come back in state order.

setkey(pharm.data,Recipient_State )

# Aggregate means of each state
pharm.data.mean <- as.data.frame(pharm.data[, mean(Total_Amount_of_Payment_USDollars, na.rm = TRUE),by = Recipient_State])

# Aggregate totals of each state
pharm.data.total <- as.data.frame(pharm.data[, sum(Total_Amount_of_Payment_USDollars, na.rm = TRUE),by = Recipient_State])

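# Aggregate number of doctors in each state
# (this counts payment records with a physician name, so a doctor with several payments is counted once per payment)
# Notice na.omit() in length() instead of na.rm=TRUE, since length() has no na.rm argument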
pharm.data.docs <- as.data.frame(pharm.data[, length(na.omit(Physician_Last_Name)),by = Recipient_State])

To get these aggregates we ask data.table to compute the mean, sum, or count of the chosen column, grouped by recipient state, and then convert each result to a plain data frame with as.data.frame(). Now that we have all of our data we have to ask another question. Did we time travel to some wacky future where America has gained eight new states? I don’t think so, so why do we have 59 observations? It turns out this dataset also includes some US army bases and territories. The only way I know how to remove these is to drop them individually by row position, but if someone knows an easier way please leave a comment! Note that each removal shifts the remaining row numbers, which is why the indices look irregular.
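To make the grouping step concrete, here is a minimal sketch on a made-up five-row table standing in for pharm.data. It computes all three summaries in a single call using data.table's `.()` list syntax and `keyby`, which groups and sorts by state in one step. The output column names (`mean_payment`, `total_payment`, `n_records`) are my own choices for illustration.

```r
library(data.table)

# Toy stand-in for pharm.data; all values are made up.
toy <- data.table(
  Recipient_State = c("NY", "NY", "CA", "CA", "CA"),
  Total_Amount_of_Payment_USDollars = c(10, 30, 5, NA, 15),
  Physician_Last_Name = c("Smith", NA, "Lee", "Lee", "Diaz")
)

# One grouped call returning mean, total, and record count per state.
toy.summary <- toy[, .(
  mean_payment  = mean(Total_Amount_of_Payment_USDollars, na.rm = TRUE),
  total_payment = sum(Total_Amount_of_Payment_USDollars, na.rm = TRUE),
  n_records     = sum(!is.na(Physician_Last_Name))
), keyby = Recipient_State]

print(toy.summary)
```

Keeping the three summaries in one table is only a design alternative; separate data frames, as used in this post, work just as well for plotting.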

pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-3,]
pharm.data.mean<-pharm.data.mean[-12,]
pharm.data.mean<-pharm.data.mean[-41,]
pharm.data.mean<-pharm.data.mean[-38,]
pharm.data.mean<-pharm.data.mean[-47,]

pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-3,]
pharm.data.total<-pharm.data.total[-12,]
pharm.data.total<-pharm.data.total[-41,]
pharm.data.total<-pharm.data.total[-38,]
pharm.data.total<-pharm.data.total[-47,]

pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-3,]
pharm.data.docs<-pharm.data.docs[-12,]
pharm.data.docs<-pharm.data.docs[-41,]
pharm.data.docs<-pharm.data.docs[-38,]
pharm.data.docs<-pharm.data.docs[-47,]


colnames(pharm.data.mean)<- c("state","value")
colnames(pharm.data.total)<- c("state","value")
colnames(pharm.data.docs)<- c("state","length")

Notice at the end I slipped in a column name change. This is just to make the step of plotting a little easier for me and is not necessary.
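Removing rows by position works, but it depends on the exact row order and breaks silently if the data changes. One possible alternative, sketched here on a made-up data frame, is to keep only the rows whose state code appears in R's built-in `state.abb` vector (the 50 state postal abbreviations). This assumes the state column holds two-letter postal codes; adding "DC" is a choice on my part, since `state.abb` does not include it.

```r
# Toy stand-in for one of the aggregated data frames; values are made up.
toy.mean <- data.frame(
  state = c("", "AE", "AL", "DC", "GU", "NY", "PR"),
  value = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# state.abb ships with base R (datasets package); DC is added by hand.
keep <- c(state.abb, "DC")

# Keep only rows whose state code is a real state (or DC).
toy.states <- toy.mean[toy.mean$state %in% keep, ]

print(toy.states$state)
```

The same one-line filter could be applied to each of the three aggregated data frames in place of the positional removals.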

Creating Visualization

We have munged our data and paid our data dues, so let’s make some pretty graphs. We’ll use the statebins_continuous() function for our mean and total graphs. This function can go from simple to complex because we can add further ggplot2 layers, such as guides(), to its result. This means that you can use the base function as is, but if you want to customize something in particular it’s very doable.

State.Payment.mean <- statebins_continuous(pharm.data.mean, "state", "value",
                           legend_title="Mean of Money Transferred From Pharma companies to Doctors By State", font_size=3,
                           brewer_pal="PuRd", text_color="black",
                           plot_title="Mean Transfers of money from Pharmaceutical Companies to Doctors in each state",
                           legend_position="bottom",
                           title_position="top") + guides(fill = guide_colorbar(barwidth = 10, barheight = 1))

State.Payment.mean

[Figure: statebins map of mean payments by state]

First we specify our dataset, pharm.data.mean, then give the function the name of the column holding the states (which can be abbreviations, as in our example, or full names) and the name of the column holding the values to place in the heat map. Note that both names are passed as strings. The map of total payments is built the same way.

State.Payment.total <- statebins_continuous(pharm.data.total, "state", "value",
                                      legend_title="Total of Money Transferred From Pharma companies to Doctors By State", font_size=3,
                                      brewer_pal="PuRd", text_color="black",
                                      plot_title="Total Transfers of money from Pharmaceutical Companies to Doctors in each state",
                                      legend_position="bottom",
                                      title_position="top") + guides(fill = guide_colorbar(barwidth = 10, barheight = 1))

State.Payment.total

[Figure: statebins map of total payments by state]

State.docs <- statebins(pharm.data.docs, "state", "length", breaks=6,
                labels=c("1", "2", "3", "4", "5", "6"),
                legend_title="Rank of states by number of doctors who receive payments", font_size=3,
                brewer_pal="PuBu", text_color="black",
                plot_title="Transfers of money from Pharmaceutical Companies to Doctors in each state",
                title_position="bottom")

State.docs

[Figure: statebins map of states binned by number of doctors receiving payments]
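To get a feel for what breaks=6 with six labels does, here is a small sketch using base R's cut() on made-up counts. I am assuming that statebins() bins values in a similar way when breaks is a single number, which I have not confirmed against the package source. If that holds, the six bins are equal-width intervals over the range of the data rather than six equally sized groups of states.

```r
# Made-up per-state counts, for illustration only.
doctor.counts <- c(120, 950, 4300, 15000, 27000, 61000)

# A single number for breaks asks cut() for that many equal-width intervals.
bins <- cut(doctor.counts, breaks = 6,
            labels = c("1", "2", "3", "4", "5", "6"))

# Show which bin each count falls into.
print(data.frame(count = doctor.counts, bin = bins))
```

Because the intervals are equal in width, one very large state can push most other states into the lowest bin. Passing an explicit vector of break points to cut() is one way to control that.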