email: jc3181 AT columbia DOT edu
One of my students recently asked me whether it was possible to recreate this sort of image in R. This image comes from this paper.
I didn’t think I could (& I’m still not sure I can!), but it did give me an excuse for playing around with the animal images available in the rphylopic R package.
First off, let’s create some random data. We need to generate enough data points that it is worth plotting 95% confidence intervals and we need to also make sure that these CIs don’t overlap too much. These sorts of plots are pretty common when comparing two phenotypic measures across animal species such as in the image above.
set.seed(101)
n <- 1000
x1 <- rnorm(n, mean=2)
y1 <- 1.75 + 0.4*x1 + rnorm(n)
df <- data.frame(x=x1, y=y1, group="A")
x2 <- rnorm(n, mean=8)
y2 <- 0.7*x2 + 2 + rnorm(n)
df <- rbind(df, data.frame(x=x2, y=y2, group="B"))
x3 <- rnorm(n, mean=6)
y3 <- x3 - 5 - rnorm(n)
df <- rbind(df, data.frame(x=x3, y=y3, group="C"))
head(df)
##          x         y group
## 1 1.673964 2.9227812     A
## 2 2.552462 0.6735141     A
## 3 1.325056 2.1855379     A
## 4 2.214359 3.6502862     A
## 5 2.310769 2.5949072     A
## 6 3.173966 2.7644576     A
Next we need the 95% confidence ellipse for each group. This can be done using the ellipse package. Its ellipse() function (which defaults to the 95% level) essentially calculates the ‘x’ and ‘y’ coordinates for the path of each ellipse.
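library(ellipse)
df_ell <- data.frame()
# Loop over the group labels. unique() works for both character and factor
# columns; levels() returns NULL for a character column, which is what
# data.frame() creates by default since R 4.0, so the loop would never run.
for (g in unique(as.character(df$group))) {
  df_ell <- rbind(
    df_ell,
    cbind(
      as.data.frame(
        with(
          df[df$group == g, ],
          ellipse(cor(x, y),
                  scale = c(sd(x), sd(y)),
                  centre = c(mean(x), mean(y)))
        )
      ),
      group = g
    )
  )
}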
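To make it clearer what those ‘x’ and ‘y’ path coordinates are, here is a minimal base R sketch that builds a 95% ellipse by hand from the group's covariance matrix. The helper name ellipse_path() is my own invention and is not part of the ellipse package; I'm assuming bivariate normal data, which is the same assumption the ellipses above rest on.

```r
# Hand-rolled confidence ellipse for one group (base R only).
# ellipse_path() is a hypothetical helper, not a function from the ellipse package.
ellipse_path <- function(x, y, level = 0.95, npoints = 100) {
  centre <- c(mean(x), mean(y))
  sigma  <- cov(cbind(x, y))               # 2 x 2 covariance matrix
  radius <- sqrt(qchisq(level, df = 2))    # Mahalanobis radius of the contour
  theta  <- seq(0, 2 * pi, length.out = npoints)
  circle <- cbind(cos(theta), sin(theta))  # points on the unit circle
  # chol(sigma) is an upper triangular R with t(R) %*% R == sigma, so
  # circle %*% R stretches and rotates the unit circle into the ellipse shape
  path <- sweep(radius * circle %*% chol(sigma), 2, centre, "+")
  data.frame(x = path[, 1], y = path[, 2])
}

# Same recipe as group "A" in the simulated data
set.seed(101)
x <- rnorm(1000, mean = 2)
y <- 1.75 + 0.4 * x + rnorm(1000)
ell <- ellipse_path(x, y)
head(ell)
```

Every point on this path sits at the same Mahalanobis distance from the group centre, which is what makes it a confidence contour rather than an arbitrary oval.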
This is the basic plot of the data including all data points and ellipses:
library(ggplot2)
ggplot(data=df, aes(x=x, y=y,colour=group)) +
geom_point(size=1.5, alpha=.6) +
geom_polygon(data=df_ell, aes(x=x, y=y,colour=group, fill=group), alpha=0.2, size=1, linetype=1)
In this one we remove the data points, so that they don’t clutter the final chart.
library(ggplot2)
p <- ggplot(data=df, aes(x=x, y=y,colour=group)) +
geom_polygon(data=df_ell, aes(x=x, y=y,colour=group, fill=group), alpha=0.2, size=1, linetype=1)
p
I would love to be able to insert high-quality images as the ‘fill’ part of geom_polygon(fill=""), but I don’t think that’s possible. Instead, my strategy is to retrieve animal images from the phylopic gallery. Go to that website and find an image you like, save the URL for that image, and then pass the number/character string part of it as the first argument to the get_image function of the rphylopic package.
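If you'd rather not copy the ID out of the URL by hand, a small base R helper can pull it out for you. This is only a sketch: extract_uuid() is a hypothetical helper of mine, and the URL layout shown is an assumption about how the gallery links look. The ID itself is one of those passed to get_image in this post.

```r
# Hypothetical helper: extract the UUID-style ID from an image URL.
extract_uuid <- function(url) {
  # 8-4-4-4-12 lowercase hexadecimal characters separated by hyphens
  pattern <- "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
  regmatches(url, regexpr(pattern, url))
}

# The URL layout below is an assumption, used for illustration only
lion_url <- "http://phylopic.org/image/e2015ba3-4f7e-4950-9bde-005e8678d77b/"
lion_id <- extract_uuid(lion_url)
lion_id
```

The resulting string is what goes in as the first argument of get_image.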
rphylopic can be installed directly from GitHub like this:
library(devtools)
install_github("sckott/rphylopic")
I decided to pretend that my data came from studies of lions, mice and bugs.
library(rphylopic)
lion <- get_image("e2015ba3-4f7e-4950-9bde-005e8678d77b", size = "512")[[1]]
mouse <- get_image("6b2b98f6-f879-445f-9ac2-2c2563157025", size="512")[[1]]
bug <- get_image("136edfe2-2731-4acd-9a05-907262dd1311", size="512")[[1]]
I’m going to add these images as de facto data points right in the center point of the ellipses. To get that x,y coordinate for each group, we simply need to know the mean x and mean y coordinate:
library(dplyr)
ell_center <- df_ell %>% group_by(group) %>% summarise(x=mean(x), y=mean(y))
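Since each ellipse was built around its group's mean x and mean y, the same centres should also be recoverable straight from the raw data, without going through the ellipse paths. Here is a minimal base R sketch using aggregate(). The toy data frame is made up for illustration and simply stands in for df.

```r
# Toy data standing in for df (values are made up for illustration)
set.seed(1)
toy <- data.frame(
  x = c(rnorm(50, mean = 2), rnorm(50, mean = 8)),
  y = c(rnorm(50, mean = 3), rnorm(50, mean = 7)),
  group = rep(c("A", "B"), each = 50)
)

# One row per group, holding the mean x and mean y of the raw points
centres <- aggregate(cbind(x, y) ~ group, data = toy, FUN = mean)
centres
```

Averaging the path coordinates, as done above, gives nearly the same answer. The raw means just avoid any small bias from how the points are spaced along the path.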
This now means we can plot each image as a new layer on top of our saved ggplot ‘p’. Playing around with the colors and alpha levels gets us the desired effect:
p + add_phylopic(lion, alpha=0.9, x=ell_center[[1,2]], y=ell_center[[1,3]], ysize=2, color="firebrick1") +
add_phylopic(mouse, alpha=1, x=ell_center[[2,2]], y=ell_center[[2,3]], ysize=2, color="darkgreen") +
add_phylopic(bug, alpha=0.9, x=ell_center[[3,2]], y=ell_center[[3,3]], ysize=2, color="mediumblue") +
theme_bw() +
theme(legend.position = "none")
There you go. It’s certainly not as good as the image of algae, but it’s not bad and could well be useful for certain projects!