2024-09-20

How to determine the number of bins to use in a histogram:

  • In general, there is no single correct number of bins; the right choice depends on the data.

  • Too few bins can mask patterns within the distribution, and too many bins can make the distribution noisy and create gaps in the spread. Either way, it becomes harder to glean insights.

  • Thankfully, there are several rules of thumb to help us figure out that Goldilocks number of bins. I’ll only discuss the following: the Square Root Rule, Sturges’ Rule, and the Freedman-Diaconis Rule.

Example when too few bins are used:

Example when too many bins are used:
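The same effect shows up in the raw bin counts. Here is a rough sketch in base R, where the bin counts 2 and 30 are values picked purely for illustration and `count_bins` is a hypothetical helper, not something from a package:

```r
# Count mtcars$mpg observations per bin for a given number of equal-width bins.
# count_bins is a hypothetical helper; 2 and 30 are illustrative choices.
count_bins <- function(x, k) {
  as.vector(table(cut(x, breaks = k)))
}

too_few  <- count_bins(mtcars$mpg, 2)
too_many <- count_bins(mtcars$mpg, 30)

too_few              # two large counts: the shape of the distribution is hidden
too_many             # many small counts, including empty bins that create gaps
sum(too_many == 0)   # number of empty bins
```

With only two bins, everything collapses into two bars. With 30 bins for 32 observations, several bins are empty, which is where the gaps come from.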

Square Root Rule: \[K = \sqrt{n}\]

  • K: # of bins
  • n: # of observations
  • In the case of the mtcars dataset, which has 32 observations, the equation would be as follows: \[K = \sqrt{32}\]
  • After calculating K (5.656854), you simply round up.
  • For this dataset then, the appropriate number of bins to use is 6.
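The calculation above can be reproduced in R. This is a minimal sketch; the variable names `n` and `k_sqrt` are my own, and `mtcars` ships with base R.

```r
# Square Root Rule: K = sqrt(n), rounded up
n <- nrow(mtcars)           # number of observations (32)
k_sqrt <- ceiling(sqrt(n))  # sqrt(32) is about 5.66, which rounds up to 6
k_sqrt
```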

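Sturges’ Rule: \[K = \lceil 1 + \log_2(n) \rceil\]

  • K: # of bins
  • n: # of observations
  • The ceiling brackets denote rounding up to the nearest integer.
  • For the mtcars dataset, the equation would be as follows: \[K = \lceil 1 + \log_2(32) \rceil = 1 + 5\]
  • Again, the recommended number of bins to use is 6.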
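As a quick check, Sturges’ Rule can be computed directly in R. The variable names are my own; `nclass.Sturges()` comes from the grDevices package that ships with base R and applies the same formula to a data vector.

```r
# Sturges' Rule: K = 1 + log2(n), rounded up
n <- nrow(mtcars)
k_sturges <- ceiling(1 + log2(n))  # log2(32) = 5, so K = 6
k_sturges

# Base R's built-in version of the same rule
grDevices::nclass.Sturges(mtcars$mpg)
```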

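Freedman-Diaconis Rule: \[h = \frac{2 \cdot IQR}{n^\frac{1}{3}}, \qquad K = \frac{\max(x) - \min(x)}{h}\]

  • h: bin width
  • K: # of bins
  • n: # of observations
  • IQR: interquartile range
  • The IQR for mpg is 7.375, so the bin width is h = 14.75 / 3.1748 ≈ 4.65.
  • mpg ranges from 10.4 to 33.9, so K = 23.5 / 4.65 ≈ 5.06.
  • With this rule, the recommended bin number is 5 when K is rounded to the nearest integer. Rounding up, as in the two rules above, gives 6.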
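Here is a rough check in R. The Freedman-Diaconis quantity 2·IQR / n^(1/3) is usually read as a bin width, and the bin count comes from dividing the range of the data by that width. The variable names are my own; `nclass.FD()` is the grDevices implementation that ships with base R, and it rounds up.

```r
# Freedman-Diaconis Rule for mtcars$mpg
x <- mtcars$mpg
n <- length(x)

iqr_x <- IQR(x)               # 7.375 for mpg
h <- 2 * iqr_x / n^(1/3)      # bin width, about 4.65
ratio <- diff(range(x)) / h   # range divided by bin width, about 5.06

k_nearest <- round(ratio)     # rounded to the nearest integer
k_up <- ceiling(ratio)        # rounded up, as in the other two rules

c(width = h, nearest = k_nearest, up = k_up)

# Base R's built-in version of the same rule (rounds up)
grDevices::nclass.FD(x)
```

Because the ratio lands just above 5, the rounding convention decides whether this rule agrees with the other two.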

When the bin number is just right!

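library(ggplot2)

# Histogram of mpg with 6 bins, the number given by the Square Root and Sturges' rules
mpg_hist = ggplot(mtcars) +
  geom_histogram(mapping = aes(x=mpg), bins = 6) +
  ggtitle("Distribution of MPG") +
  xlab("Miles Per Gallon") +
  ylab("Frequency")
mpg_hist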

Side-by-side comparison of 6 and 5 bins, respectively.

Plotlyfication of Histogram

with Freedman-Diaconis Rule

Code Chunk for plotly Histogram

library(plotly)

x = mtcars$mpg
# Axis settings: titles, title font, and a fixed y-axis range
xax = list(title = "Miles Per Gallon",
           titlefont = list(family="Modern Computer Roman"))
yax = list(title = "Frequency",
           titlefont = list(family="Modern Computer Roman"),
           range = c(0,15))
# No bin count is passed here, so plotly chooses the bins automatically
fig = plot_ly(x=x,type="histogram") %>%
  layout(xaxis=xax, yaxis=yax, autosize=TRUE)
fig