An alternative to pie charts comparing different populations

There are several options for graphically comparing proportions. Many of these, such as pie charts and stacked bar charts, make it difficult to compare proportions between two different populations. In this document, I present a proportional circle graph, built with ggplot2, that makes such comparisons easier.

First, load the required packages.

library(ggplot2) # Primary plotting package
library(cowplot) # Used for combining ggplot objects
library(RColorBrewer) # For additional colors
library(grid) # arrange grobs
library(gridExtra) # arrange grobs
library(plotrix) # cluster plotting

As an example, I have made a dataset of 6 sources contributing to ambient particulate matter (PM) at 10 different sites. The data frame is in “long” format, with one row per site and source:

head(df)

##     Site   Source Proportion
## 1 Site 1 Source 1 0.02429485
## 2 Site 1 Source 2 0.05554860
## 3 Site 1 Source 3 0.14355765
## 4 Site 1 Source 4 0.16731885
## 5 Site 1 Source 5 0.24071562
## 6 Site 1 Source 6 0.36856443

tail(df)

##       Site   Source Proportion
## 55 Site 10 Source 1 0.21148834
## 56 Site 10 Source 2 0.14156087
## 57 Site 10 Source 3 0.04936692
## 58 Site 10 Source 4 0.31050098
## 59 Site 10 Source 5 0.16606038
## 60 Site 10 Source 6 0.12102251
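
The code that built df is not shown in this post. A minimal sketch of one way to simulate a data frame with the same shape follows; the seed, the uniform draws, and the name sim_df are assumptions made for illustration, and the values will not match the output above.

```r
# Hypothetical reconstruction: simulate 6 source proportions at 10 sites.
set.seed(1)  # arbitrary seed, only for reproducibility

n_sites   <- 10
n_sources <- 6

# Draw positive weights, then normalize so each site's proportions sum to 1
raw  <- matrix(runif(n_sites * n_sources), nrow = n_sites)
prop <- raw / rowSums(raw)

# Long format: one row per site/source combination
sim_df <- data.frame(
  Site       = rep(paste("Site", 1:n_sites), each = n_sources),
  Source     = rep(paste("Source", 1:n_sources), times = n_sites),
  Proportion = as.vector(t(prop))
)

# Sanity check: the proportions within each site should sum to 1
site_totals <- tapply(sim_df$Proportion, sim_df$Site, sum)
```

Assigning sim_df to df would let the remaining code in this document run as written.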

Here is an example of a proportional circle plot showing the source distribution for a single site (Site 1):

# Proportional circle plot for a single site (Site 1)
ggplot(aes(fill=Source,label=Site,size=Proportion,
           y=c(.8,1.2,.9,1.1,.8,1.2),x=1:6), # staggered circle centres
       data=df[df$Site=="Site 1",])+
  geom_jitter(shape=21,width=0,height=.1,alpha=1)+ # vertical jitter only
  scale_size_area(limits=c(0,1),max_size=32)+ # circle area tracks Proportion
  scale_fill_manual(values=brewer.pal(6,"Set1"))+
  ylim(.6,1.4)+xlim(0,7)+
  xlab("Site 1")+ # a single label rather than the whole Site column
  theme(
      axis.ticks=element_blank(),
      axis.title.y=element_blank(),
      axis.text=element_blank()
  )+
  guides(fill="none",size="none") # legends are drawn separately later
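
The plot relies on scale_size_area() so that circle area, rather than diameter, reflects each proportion. A small arithmetic sketch of what that means, assuming the scale maps a proportion p to a point size of max_size * sqrt(p) when limits = c(0, 1) (my reading of the scale, not something stated in this post):

```r
# Assumed mapping for scale_size_area(limits = c(0, 1), max_size = 32):
# size = max_size * sqrt(p), so the drawn area grows linearly with p.
max_size <- 32
p <- c(0.1, 0.2, 0.4)

size <- max_size * sqrt(p)   # point size, treated here as a diameter
area <- pi * (size / 2)^2    # area of the drawn circle, arbitrary units

# Quadrupling the proportion (0.1 to 0.4) quadruples the area
# but only doubles the diameter.
area_ratio <- area[3] / area[1]
size_ratio <- size[3] / size[1]
```

Under that assumption, a source contributing four times as much gets a circle with four times the area, which is the comparison the eye is meant to make.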

If you replace geom_jitter with geom_point, you can specify the locations of your circles precisely, for example to place them closer together. I personally prefer the slight randomness of geom_jitter. The plot above can be generated automatically for each site:

for(i in 1:10){
  site_name <- paste("Site",i)
  temp <- ggplot(aes(fill=Source,label=Site,size=Proportion,
                     y=c(.8,1.2,.9,1.1,.8,1.2),x=1:6),
                 data=df[df$Site==site_name,])+
    geom_jitter(shape=21,width=0,height=.1,alpha=1)+
    scale_size_area(limits=c(0,1),max_size=32)+
    scale_fill_manual(values=brewer.pal(6,"Set1"))+
    ylim(.6,1.4)+xlim(0,7)+
    xlab(site_name)+ # a single label for this site
    theme(
        axis.ticks=element_blank(),
        axis.title.y=element_blank(),
        axis.text=element_blank()
    )+
    guides(fill="none",size="none")
  # Store the plot as Figure_1, Figure_2, ..., Figure_10
  assign(x = paste("Figure",i,sep="_"),temp)
}
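
The loop above stores each plot under a generated name with assign(). As an alternative sketch, the same plot can be wrapped in a function and the results kept in a named list. The helper make_site_plot, its jitter argument, and the toy data frame demo_df are hypothetical names introduced here, not part of the original code; demo_df only stands in for df so the block runs on its own.

```r
library(ggplot2)
library(RColorBrewer)

# Toy stand-in for df: equal proportions at every site (illustration only)
demo_df <- data.frame(
  Site       = rep(paste("Site", 1:10), each = 6),
  Source     = rep(paste("Source", 1:6), times = 10),
  Proportion = 1 / 6
)

# Hypothetical helper: build the circle plot for one site.
# jitter = FALSE swaps geom_jitter for geom_point (fixed positions).
make_site_plot <- function(data, site_name, jitter = TRUE) {
  site_df <- data[data$Site == site_name, ]
  site_df$x <- seq_len(nrow(site_df))
  site_df$y <- rep(c(.8, 1.2, .9, 1.1), length.out = nrow(site_df))

  layer <- if (jitter) {
    geom_jitter(shape = 21, width = 0, height = .1)
  } else {
    geom_point(shape = 21)
  }

  ggplot(site_df, aes(x = x, y = y, fill = Source, size = Proportion)) +
    layer +
    scale_size_area(limits = c(0, 1), max_size = 32) +
    scale_fill_manual(values = brewer.pal(6, "Set1")) +
    ylim(.6, 1.4) + xlim(0, 7) +
    xlab(site_name) +
    theme(axis.ticks = element_blank(),
          axis.title.y = element_blank(),
          axis.text = element_blank(),
          legend.position = "none")
}

# One plot per site, collected in a named list
sites <- as.character(unique(demo_df$Site))
plots <- lapply(sites, function(s) make_site_plot(demo_df, s, jitter = FALSE))
names(plots) <- sites
```

A list such as plots can be indexed by site name, for example plots[["Site 3"]], and passed as a whole to functions that accept a list of plots.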

Since the legend is the same for every plot, we can extract it once and avoid repeating it when combining the plots, per this article:

leg<-ggplot(data = df, aes(x = Source,y=Proportion,fill=Source)) + geom_point(shape=21,size=8) +
  scale_fill_manual("",labels=paste("Source",1:6),values=brewer.pal(6,"Set1"))+
  theme(
    legend.title=element_blank(),
    legend.text=element_text(size=15)
  )

# Extract the legend grob ("guide-box") from a ggplot object
g_legend <- function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}

legend <- g_legend(leg) 
grid.draw(legend)
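
Since cowplot is already loaded, its get_legend() function is another way to pull the legend out of a plot. The sketch below uses a toy plot (toy and p are hypothetical names standing in for df and leg); how get_legend() locates the legend may depend on the installed ggplot2 and cowplot versions, so treat this as something to check rather than a drop-in replacement.

```r
library(ggplot2)
library(cowplot)

# Toy data and plot standing in for the legend-only plot `leg`
toy <- data.frame(Source = paste("Source", 1:6), Proportion = (1:6) / 21)
p <- ggplot(toy, aes(x = Source, y = Proportion, fill = Source)) +
  geom_point(shape = 21, size = 8)

# Extract the legend as a grob
legend_grob <- get_legend(p)

# ggdraw() can draw a grob on an otherwise empty canvas
legend_canvas <- ggdraw(legend_grob)
```

The resulting grob can be placed with draw_plot() in the same way as the legend object produced by g_legend().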

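Now we just need to arrange these ggplot objects using cowplot. Each draw_plot() call takes the plot followed by its x position, y position, width, and height, all as fractions of the canvas.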

ggdraw()+
  geom_rect(aes(xmin = 0, xmax =1, ymin = 0, ymax = 1),
        colour = "black", fill = "white",size=.8)+
  draw_plot(Figure_1,.33,.75,.33,.25)+
  draw_plot(Figure_2,.66,.75,.33,.25)+
  draw_plot(Figure_3,0,.5,.33,.25)+
  draw_plot(Figure_4,.33,.5,.33,.25)+
  draw_plot(Figure_5,.66,.5,.33,.25)+
  draw_plot(Figure_6,0,.25,.33,.25)+
  draw_plot(Figure_7,.33,.25,.33,.25)+
  draw_plot(Figure_8,.66,.25,.33,.25)+
  draw_plot(Figure_9,0.12,0,.33,.25)+
  draw_plot(Figure_10,0.33+.21,0,.33,.25)+
  draw_plot(legend,0,0.805,.33,.15)
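
The layout above positions every panel by hand. If a regular grid is acceptable, cowplot's plot_grid() can lay the panels out automatically. A minimal sketch, using toy plots (toy_plots is a hypothetical name) in place of Figure_1 through Figure_10:

```r
library(ggplot2)
library(cowplot)

# Toy plots standing in for Figure_1 ... Figure_10
toy_plots <- lapply(1:10, function(i) {
  ggplot(data.frame(x = 1, y = 1), aes(x, y)) +
    geom_point() +
    xlab(paste("Site", i))
})

# Arrange the ten panels in three columns; rows are filled left to right
panel <- plot_grid(plotlist = toy_plots, ncol = 3)
```

With the real figures, the list could be collected with mget(paste0("Figure_", 1:10)). The trade-off is less control: the hand-placed version leaves the top-left cell free for the legend and centers the last two panels, which plot_grid() will not do on its own.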

Matthew Secrest

June 29, 2016