library(ggplot2)

This is a simple bar plot showing how many diamonds we have for every diamond cut:

ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))

With geom_bar you are saying “I want a bar plot” and implying “I want to count how many samples I have of each type”. Internally, ggplot separates these two concepts into stat_count and geom_bar: the first counts how many samples your data have of each type, and the second computes the coordinates of the rectangles that form each bar.

This separation between geom_ and stat_ runs through all of ggplot. For instance, geom_histogram is very similar to geom_bar, but it uses stat_bin instead, which puts samples into bins and then counts the number of samples in each bin.
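
As a rough check of this pairing, the sketch below builds the same bar plot once through geom_bar and once through stat_count, then compares the counts computed for the first layer. It assumes a ggplot2 version that exports layer_data(); the object names p_geom and p_stat are only illustrative.

```r
library(ggplot2)

# geom_bar() uses stat_count() as its default stat ...
p_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# ... and stat_count() uses the "bar" geom as its default geom.
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# Extract the computed data of the first layer of each plot
# and compare the counts.
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count
all.equal(counts_geom, counts_stat)
```

If the two layers are built the same way, the comparison should return TRUE.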

stat_count provides two internal variables, ..count.. and ..prop.., referring to count and proportion respectively. Don’t be surprised by the ..name.. notation: it is used to prevent confusion with your own columns (so don’t give your own columns weird names like ..count..!).

You can create the same plot by spelling out the defaults that ggplot otherwise assumes:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..count..), stat = "count")
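
In ggplot2 3.3.0 and later, the same mapping can also be written with the after_stat() helper, and the ..name.. notation is deprecated as of ggplot2 3.4.0. The sketch below assumes one of those newer versions; p_new is just an illustrative name.

```r
library(ggplot2)

# Same plot as above, using after_stat(count) instead of ..count..
# (requires ggplot2 >= 3.3.0).
p_new <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(count)), stat = "count")

# The y aesthetic of the built layer is the computed count.
built <- layer_data(p_new)
all(built$y == built$count)
```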

When working with a custom stat_, it is often useful to extract the internally computed values in order to understand what the stat is doing. ggplot lets us do that with ggplot_build:

plt <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
plt_b <- ggplot_build(plt)
plt_b$data[[1]]

    y count prop x PANEL group ymin  ymax xmin xmax colour   fill size linetype alpha
 1610  1610    1 1     1     1    0  1610 0.55 1.45     NA grey35  0.5        1    NA
 4906  4906    1 2     1     2    0  4906 1.55 2.45     NA grey35  0.5        1    NA
12082 12082    1 3     1     3    0 12082 2.55 3.45     NA grey35  0.5        1    NA
13791 13791    1 4     1     4    0 13791 3.55 4.45     NA grey35  0.5        1    NA
21551 21551    1 5     1     5    0 21551 4.55 5.45     NA grey35  0.5        1    NA

Here one can see the count and prop columns. The prop column is computed as count divided by the sum of all the counts that belong to the same group. By default, ggplot creates one group for each bar, so all the proportions are equal to 1.
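
To make that computation concrete, here is a minimal base R sketch that reproduces it by hand. It is an illustration of the arithmetic, not ggplot's actual implementation; the counts data frame and its group column are hypothetical names introduced for this example.

```r
library(ggplot2)  # only needed here for the diamonds dataset

# Count the diamonds of each cut, as stat_count would.
counts <- as.data.frame(table(cut = diamonds$cut), responseName = "count")

# Mimic the default grouping: each bar is its own group.
counts$group <- seq_len(nrow(counts))

# prop = count divided by the total count within the same group.
counts$prop <- counts$count / ave(counts$count, counts$group, FUN = sum)
counts
```

Because every group contains a single bar, each count is divided by itself and every prop comes out as 1.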

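To transform the counts into percentages we should use ..prop.. as the y variable. If we don’t provide a group, though, the plot will not be what we want: the call runs without error, but every bar has height 1, because each bar is its own group: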

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop..), stat = "count")

However, if we put all the bars in a single group (group = 1), stat_count will compute the proportions as we want:

plt <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
plt_b <- ggplot_build(plt)
plt_b$data[[1]]

        y count      prop x group PANEL ymin      ymax xmin xmax colour   fill size linetype alpha
0.0298480  1610 0.0298480 1     1     1    0 0.0298480 0.55 1.45     NA grey35  0.5        1    NA
0.0909529  4906 0.0909529 2     1     1    0 0.0909529 1.55 2.45     NA grey35  0.5        1    NA
0.2239896 12082 0.2239896 3     1     1    0 0.2239896 2.55 3.45     NA grey35  0.5        1    NA
0.2556730 13791 0.2556730 4     1     1    0 0.2556730 3.55 4.45     NA grey35  0.5        1    NA
0.3995365 21551 0.3995365 5     1     1    0 0.3995365 4.55 5.45     NA grey35  0.5        1    NA
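
As a sanity check, the proportions computed with a single group should add up to 1 and match the counts divided by the number of rows in diamonds. The sketch below assumes ggplot2 3.3.0 or later (for after_stat() and layer_data()); the names built and expected are only illustrative.

```r
library(ggplot2)

plt <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Computed data for the first (and only) layer.
built <- layer_data(plt)

# With a single group, the proportions should add up to 1.
sum(built$prop)

# Each proportion should equal the count for that cut
# divided by the total number of diamonds.
expected <- as.vector(table(diamonds$cut)) / nrow(diamonds)
all.equal(built$prop, expected)
```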

And the plot will be what we expect:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1), stat = "count")

Finally, we can tweak the y-axis labels so they are expressed as percentages:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1), stat = "count") + 
  scale_y_continuous(labels = scales::percent_format())
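
scales::percent_format() returns a labelling function, so it can be tried on its own before handing it to the scale. The sketch below assumes the scales package is installed and uses its accuracy argument to round the labels to whole percentages; fmt and the sample values are only illustrative.

```r
library(ggplot2)
library(scales)

# percent_format() builds a function that turns proportions into labels.
# accuracy = 1 rounds to the nearest whole percent.
fmt <- percent_format(accuracy = 1)
fmt(c(0.0298, 0.2557, 0.3995))

# The same labeller can be passed to the y scale of the plot.
p_pct <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1)) +
  scale_y_continuous(labels = fmt)
```

Passing the labeller only changes how the axis ticks are printed; the underlying prop values stay between 0 and 1.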