Pretty Visualizations of Doctor Gifts with the statebins package

Loading Data

Let's get started by loading the data. We would normally use read.csv(), but the file is about one GB in size, so we will use the fread() function from the data.table package instead. In fread(), stringsAsFactors is FALSE by default; call me old-fashioned, but we are going to put it in anyway.

library(statebins)
library(data.table)
pharm.data <- fread("./General_Payment_Data_2013.csv",header=TRUE,stringsAsFactors=FALSE,showProgress=FALSE)
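If memory is tight, fread() can also read just the columns this post uses through its select argument. The sketch below is not part of the original workflow: it writes a tiny made-up CSV to a temporary file so it can run on its own, and the extra column name Other is hypothetical.

```r
library(data.table)

# Write a tiny stand-in CSV to a temporary file (all values are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Recipient_State,Physician_Last_Name,Total_Amount_of_Payment_USDollars,Other",
  "PA,Smith,$10.50,x",
  "OH,Jones,$3.25,y"
), tmp)

# select= keeps only the named columns, which cuts memory use on wide files
small <- fread(tmp, header = TRUE, stringsAsFactors = FALSE,
               select = c("Recipient_State", "Physician_Last_Name",
                          "Total_Amount_of_Payment_USDollars"))
names(small)
```

The payment column still comes in as character here because of the leading $.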

Once the data is loaded, str(pharm.data) reveals a leading $ in one of our variables of interest, Total_Amount_of_Payment_USDollars, which means the column was read in as character. To make it usable, we use substring() starting at the second character to drop the $, then convert the column to numeric with as.numeric().

pharm.data$Total_Amount_of_Payment_USDollars <-  as.numeric(substring(pharm.data$Total_Amount_of_Payment_USDollars,2))
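To see what the conversion does, here is a small sketch on made-up strings. Whether the real file contains thousands separators is an assumption; if it did, substring() alone would produce NA for those values, and stripping both characters with gsub() would be one way around it.

```r
# Made-up currency strings; the third has a thousands separator
amounts <- c("$10.50", "$3.25", "$1,200.00")

# substring() from position 2 drops only the leading "$"
first.two <- as.numeric(substring(amounts[1:2], 2))

# gsub() strips both "$" and "," before the conversion
clean <- as.numeric(gsub("[$,]", "", amounts))
clean
```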

Aggregating Data

Now we can start working with the data. Let's create three different aggregates of the data. All of these use data.table’s grouped aggregation syntax with the normal R functions mean(), sum(), and length(). Notice we first call the data.table function setkey() to sort the table by Recipient_State; the by argument in each call is what actually does the grouping.

setkey(pharm.data,Recipient_State )

# Aggregate means of each state
pharm.data.mean <- as.data.frame(pharm.data[, mean(Total_Amount_of_Payment_USDollars, na.rm = TRUE),by = Recipient_State])

# Aggregate totals of each state
pharm.data.total <- as.data.frame(pharm.data[, sum(Total_Amount_of_Payment_USDollars, na.rm = TRUE),by = Recipient_State])

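# Aggregate number of doctors in each state
# (counts payment rows with a non-missing last name, so a doctor with
# several payments is counted once per payment)
# Notice na.omit() in length() instead of na.rm=TRUE; length() has no na.rm argument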
pharm.data.docs <- as.data.frame(pharm.data[, length(na.omit(Physician_Last_Name)),by = Recipient_State])
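As an aside, the same grouping pattern can return all three aggregates from one call. This is only a sketch on a made-up toy table, not what the post does; the names toy and toy.agg are hypothetical, and keyby is used so the result comes back sorted by state.

```r
library(data.table)

# Toy payments table; column names mirror the post, values are made up
toy <- data.table(
  Recipient_State = c("PA", "PA", "OH", "OH", "OH"),
  Total_Amount_of_Payment_USDollars = c(10, 30, 5, NA, 15),
  Physician_Last_Name = c("Smith", NA, "Jones", "Lee", "Kim")
)

# One grouped call returning all three aggregates as named columns
toy.agg <- toy[, .(
  mean  = mean(Total_Amount_of_Payment_USDollars, na.rm = TRUE),
  total = sum(Total_Amount_of_Payment_USDollars, na.rm = TRUE),
  docs  = sum(!is.na(Physician_Last_Name))
), keyby = Recipient_State]

toy.agg
```

Keeping the three aggregates in separate data frames, as the post does, makes each one easy to hand to a plotting function on its own.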

To get these aggregates, we tell R to build a data frame from pharm.data containing the mean, sum, or length of the chosen column, grouped by recipient state. Now that we have all of our data, we have to ask another question. Did we time travel to some wacky future where America has gained eight new states? I don’t think so, so why do we have 59 observations? This dataset also includes some US army bases and territories. The only way I know to remove these is one row at a time by position, but if someone knows an easier way please leave a comment!

pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-1,]
pharm.data.mean<-pharm.data.mean[-3,]
pharm.data.mean<-pharm.data.mean[-12,]
pharm.data.mean<-pharm.data.mean[-41,]
pharm.data.mean<-pharm.data.mean[-38,]
pharm.data.mean<-pharm.data.mean[-47,]

pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-1,]
pharm.data.total<-pharm.data.total[-3,]
pharm.data.total<-pharm.data.total[-12,]
pharm.data.total<-pharm.data.total[-41,]
pharm.data.total<-pharm.data.total[-38,]
pharm.data.total<-pharm.data.total[-47,]

pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-1,]
pharm.data.docs<-pharm.data.docs[-3,]
pharm.data.docs<-pharm.data.docs[-12,]
pharm.data.docs<-pharm.data.docs[-41,]
pharm.data.docs<-pharm.data.docs[-38,]
pharm.data.docs<-pharm.data.docs[-47,]


colnames(pharm.data.mean)<- c("state","value")
colnames(pharm.data.total)<- c("state","value")
colnames(pharm.data.docs)<- c("state","length")

Notice at the end I slipped in a column name change. This is just to make the step of plotting a little easier for me and is not necessary.
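One possible easier way to drop the extra rows, offered as a sketch rather than what this post actually ran, is to filter on a list of known abbreviations instead of deleting rows by position. Base R ships a state.abb vector with the 50 state abbreviations; adding "DC" by hand is an assumption about which rows should stay. The toy data frame and its values are made up.

```r
# Toy aggregate standing in for pharm.data.mean (values are made up)
toy.mean <- data.frame(
  state = c("", "AE", "AK", "AL", "DC", "GU", "PR", "WY"),
  value = c(10, 20, 30, 40, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# state.abb comes with base R; DC is added by hand (an assumption)
keep <- c(state.abb, "DC")

# Keep only rows whose state code is in the list
toy.mean <- toy.mean[toy.mean$state %in% keep, ]
toy.mean$state

# Sanity check: which expected states are missing from the data?
missing.states <- setdiff(keep, toy.mean$state)
```

Filtering by name does not depend on row order, so it gives the same result whether or not the table was sorted first.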

Creating Visualization

We have munged the data and paid our data dues, so let's make some pretty graphs. We’ll use the statebins_continuous() function for our mean and total graphs. The function scales from simple to complex: its result can be combined with additional ggplot2 calls using +, as the guides() call at the end of each example does. This means that you can use the base function, but if you want to customize something in particular it's very doable.

State.Payment.mean <- statebins_continuous(pharm.data.mean, "state", "value",
                           legend_title="Mean of Money Transferred From Pharma companies to Doctors By State",
                           font_size=3, brewer_pal="PuRd", text_color="black",
                           plot_title="Mean Transfers of money from Pharmaceutical Companies to Doctors in each state",
                           legend_position="bottom", title_position="top") +
  guides(fill = guide_colorbar(barwidth = 10, barheight = 1))

State.Payment.mean

First we specify our dataset, pharm.data.mean, then give the function the name of the column of states (which can hold abbreviations, as in our example, or full names) and the name of the column of values to place in the heat map. Note that both names are passed as strings.

State.Payment.total <- statebins_continuous(pharm.data.total, "state", "value",
                                      legend_title="Total of Money Transferred From Pharma companies to Doctors By State",
                                      font_size=3, brewer_pal="PuRd", text_color="black",
                                      plot_title="Total Transfers of money from Pharmaceutical Companies to Doctors in each state",
                                      legend_position="bottom", title_position="top") +
  guides(fill = guide_colorbar(barwidth = 10, barheight = 1))

State.Payment.total

State.docs <- statebins(pharm.data.docs, "state", "length", breaks=6,
                labels=c("1", "2", "3", "4", "5", "6"),
                legend_title="Rank of states by number of doctors who receive",
                font_size=3, brewer_pal="PuBu", text_color="black",
                plot_title="Transfers of money from Pharmaceutical Companies to Doctors in each state",
                title_position="bottom")

State.docs

Steve Bronder

Friday, October 03, 2014
