A while back, while I was completing the Coursera Data Science Specialization, I came across a problem: I wanted to plot a histogram overlaid with both a density estimate of the data and a normal density curve.

As an example, this is what I was aiming for:

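library(ggplot2)
set.seed(1234)
# Two groups of 200 ratings each: group A ~ N(0, 1), group B ~ N(0.8, 1)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), rating = c(rnorm(200),rnorm(200, mean=.8)))
plot <- ggplot(dat, aes(x = rating))
# Histogram on the density scale so it shares the y-axis with the curves
plot <- plot + geom_histogram(aes(y = after_stat(density)), color = "black", fill = "steelblue", binwidth = 0.5, alpha = 0.2)
# Density estimate of the observed ratings (drawn in black by default)
plot <- plot + geom_density()
# Normal density with mean 0.3 and sd 1, drawn in red
plot <- plot + stat_function(fun = dnorm, colour = "red", args = list(mean = 0.3, sd = 1))
plot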

The problem was that I also wanted a legend explaining what the red and black density curves represented. That sounds like a simple task, but I couldn’t figure out how to do it in ggplot.

In the end I asked the question on Stack Overflow, and a user called mpalanco provided a neat solution.

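The key is to put a legend label for each curve inside aes(), e.g. aes(color = "Simulated") for the density layer and aes(color = "Normal") for the normal curve, and then set the legend title and the colours with scale_colour_manual("Density", values = c("red", "black")). A colour set outside aes(), as in the first version, is just a fixed parameter and never appears in a legend; a string mapped inside aes() is treated like a data value, so ggplot runs it through a colour scale and draws a legend for it.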
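The trick relies on the difference between setting an aesthetic (outside aes()) and mapping it (inside aes()). Here is a minimal sketch of that difference in isolation, using a hypothetical five-point toy data set that is not part of the original example. layer_data() returns the computed data for a layer, so it shows which colour each layer actually ends up with.

```r
library(ggplot2)

# Hypothetical toy data, used only to contrast setting and mapping a colour
toy <- data.frame(x = c(-1, 0, 0.5, 1, 2))

# Setting: colour is a fixed parameter outside aes(); no scale, no legend
p_set <- ggplot(toy, aes(x = x)) + geom_density(colour = "black")

# Mapping: the string "Simulated" is treated like a data value, so it is
# passed through a colour scale, which is what produces a legend entry
p_map <- ggplot(toy, aes(x = x)) + geom_density(aes(colour = "Simulated"))

unique(layer_data(p_set)$colour)  # the fixed value "black"
unique(layer_data(p_map)$colour)  # a colour chosen by the default discrete scale
```

Because the mapped label goes through a scale, replacing the default scale with scale_colour_manual() is what lets you choose both the legend title and the colours.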

library(ggplot2)
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), rating = c(rnorm(200),rnorm(200, mean=.8)))
plot <- ggplot(dat, aes(x = rating))
plot <- plot + geom_histogram(aes(y = after_stat(density)), color = "black", fill = "steelblue", binwidth = 0.5, alpha = 0.2)
# Map colour to a label inside aes() so each curve gets a legend entry
plot <- plot + geom_density(aes(color = "Simulated"))
plot <- plot + stat_function(aes(color = "Normal"), fun = dnorm, args = list(mean = 0.3, sd = 1))
# Legend title "Density"; unnamed values follow the sorted labels (Normal = red, Simulated = black)
plot <- plot + scale_colour_manual("Density", values = c("red", "black"))
plot
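One caveat with values = c("red", "black"): unnamed values are matched to the legend labels in their sorted order, so "Normal" gets red and "Simulated" gets black only because "Normal" sorts first alphabetically. A slightly more robust variant, sketched below, names the values vector so each label is tied to its colour explicitly (scale_colour_manual() matches named values by name). It also uses after_stat(density), which replaces the older ..density.. notation in recent ggplot2 versions (3.3.0 and later).

```r
library(ggplot2)
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                  rating = c(rnorm(200), rnorm(200, mean = 0.8)))

plot <- ggplot(dat, aes(x = rating)) +
  # Histogram on the density scale so it shares the y-axis with both curves
  geom_histogram(aes(y = after_stat(density)), color = "black",
                 fill = "steelblue", binwidth = 0.5, alpha = 0.2) +
  # Each curve maps colour to a label; the label text appears in the legend
  geom_density(aes(color = "Simulated")) +
  stat_function(aes(color = "Normal"), fun = dnorm,
                args = list(mean = 0.3, sd = 1)) +
  # Named values tie each label to its colour regardless of label order
  scale_colour_manual("Density", values = c(Simulated = "black", Normal = "red"))
plot
```

With named values, renaming the labels or reordering the vector no longer silently swaps which curve is red.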

ggplot is not always intuitive, but once you get the hang of it, it is very powerful and flexible.