Introduction

(This tutorial extends a previous tutorial on histograms by adding more information about controlling color.)

Histograms are used to visualize the spread in the data point values of a continuous numerical variable. The Plotly histogram divides the numerical variable up into equally sized bins, ranging from the lowest to the highest value for the variable. It then counts the number of data point values in each bin.
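
The binning step described above can be sketched in base R. This is only an illustration of equal-width binning, not Plotly's internal code: the seed, the bin count `n_bins`, and the choice to start the first bin at the sample minimum are assumptions made for the example, and Plotly chooses its own bin edges.

```r
# Illustrative sketch of equal-width binning (assumed settings, not Plotly internals)
set.seed(1)                          # assumed seed, only for reproducibility
x <- rnorm(200, mean = 15, sd = 4)

n_bins <- 10                         # hypothetical bin count chosen for the example
lo     <- min(x)
width  <- (max(x) - lo) / n_bins     # every bin has the same width

# Map each value to a bin index 1..n_bins; pmin() keeps the maximum in the last bin
bin_index <- pmin(floor((x - lo) / width) + 1, n_bins)

# Count how many values fall into each bin
counts <- tabulate(bin_index, nbins = n_bins)
counts
```

Each entry of `counts` corresponds to the height of one bar in a count histogram built on these bins.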

The histogram can also be normalized so as to show a frequency distribution.

A simple histogram

The code below creates a variable called wcc. It takes \(200\) data point values from a normal distribution with \(\mu = 15\) and \(\sigma = 4\).

wcc <- rnorm(200,
             mean = 15,
             sd = 4)
summary(wcc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.617  11.419  13.461  14.373  16.885  25.196

A simple histogram is generated below.

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

p1 <- plot_ly(x = ~wcc,
              type = "histogram")
p1

Adding a title and axis labels

Piping the plot into the layout() command adds a plot title and axis titles.

p2 <- plot_ly(x = ~wcc,
              type = "histogram") %>% 
  layout(title = "Histogram of white cell count",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
p2

Normalized histogram

A normalized histogram shows the frequency distribution. It divides the count for each bin by the total number of data point values in the sample. It is achieved by using the histnorm = argument.

p3 <- plot_ly(x = ~wcc,
              type = "histogram",
              histnorm = "probability") %>% 
  layout(title = "Frequency distribution of white cell count",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Frequency",
                      zeroline = FALSE))
p3
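
The arithmetic behind the normalization can be checked by hand. The sketch below uses base R's hist() with plot = FALSE purely to obtain bin counts. The seed is an assumption, and the bin edges chosen by hist() need not match the ones Plotly picks. Only the division step is the point here.

```r
# Illustrative sketch: relative frequency = bin count / total count
set.seed(1)                              # assumed seed, only for reproducibility
x <- rnorm(200, mean = 15, sd = 4)

h <- hist(x, plot = FALSE)               # base R binning; edges may differ from Plotly's
probability <- h$counts / sum(h$counts)  # share of all values falling in each bin

# One row per bin: its edges, its count, and its relative frequency
data.frame(bin_start   = head(h$breaks, -1),
           bin_end     = tail(h$breaks, -1),
           count       = h$counts,
           probability = round(probability, 3))
```

The relative frequencies sum to 1, which is what distinguishes this view from the raw counts.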

Horizontal histogram

Flipping the plot to the horizontal is as easy as passing the variable wcc to the y = argument instead of the x = argument. Remember to swap the axis titles as well.

p4 <- plot_ly(y = ~wcc,
              type = "histogram",
              histnorm = "probability") %>% 
  layout(title = "Frequency distribution of white cell count",
         yaxis = list(title = "White cell count",
                      zeroline = FALSE),
         xaxis = list(title = "Frequency",
                      zeroline = FALSE))
p4

Displaying a histogram for more than one categorical group

A histogram for the same variable can be shown for more than one group. The code below creates a data.frame and then splits this into two data.frame objects using the dplyr library.

library(dplyr)  # provides filter()

df <- data.frame(Group = sample(c("A", "B"),
                                200,
                                replace = TRUE),
                 WCC = wcc)
groupA <- df %>% filter(Group == "A")
groupB <- df %>% filter(Group == "B")
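
As an aside, the same split can be sketched without dplyr by using base R's split(). The seed and the object name `groups` are assumptions introduced for this example. The data are regenerated here so that the chunk runs on its own.

```r
# Illustrative alternative: split a data.frame by group with base R
set.seed(1)                                   # assumed seed, only for reproducibility
wcc <- rnorm(200, mean = 15, sd = 4)
df  <- data.frame(Group = sample(c("A", "B"), 200, replace = TRUE),
                  WCC   = wcc)

groups <- split(df, df$Group)                 # named list: groups$A and groups$B

# Number of rows in each group; the two sizes add up to 200
sapply(groups, nrow)
```

Here `groups$A` and `groups$B` play the same role as groupA and groupB.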

The code below creates a histogram of the WCC variable for each group. Since an overlay is going to occur, the opacity is set using the alpha = argument. Each individual histogram is added using the add_histogram() command. The barmode = argument is added to the layout() command.

p5 <- plot_ly(alpha = 0.7) %>% 
  add_histogram(x = ~groupA$WCC,
                name = "Group A") %>% 
  add_histogram(x = ~groupB$WCC,
                name = "Group B") %>% 
  layout(barmode = "overlay",
         title = "Histogram of white cell count for groups A and B",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
p5

Changing the colors of the bars

The Plotly default is to print the markers in blue, be they bars or points. If a second color is required, orange is chosen. These colors can be changed.

Since histograms represent continuous numerical variables, there are no gaps between the bars as there are in a bar chart (which represents discrete or categorical variable counts). This can make it difficult at times to distinguish the individual bars that make up a histogram.

In the code chunk below, we choose a light gray color for the bars and a dark gray for their borders. This monochrome chart is more suitable for submission for publication than the colorful charts that we create for the web. Since most journals are now online, hopefully this dreary look will disappear.

p6 <- plot_ly(x = ~wcc,
              type = "histogram",
              histnorm = "probability",
              marker = list(color = "lightgray",
                            line = list(color = "darkgray",
                                        width = 2))) %>% 
  layout(title = "Frequency distribution of white cell count",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Frequency",
                      zeroline = FALSE))
p6

We can plot more than one histogram trace on the same plot and color each one separately. In the code chunk below, we lower the opacity of the second layer so that the bottom layer's bars remain visible.

p7 <- plot_ly() %>% 
  add_histogram(x = ~groupA$WCC,
                name = "Group A",
                marker = list(color = "teal",
                            line = list(color = "darkgray",
                                        width = 2))) %>% 
  add_histogram(x = ~groupB$WCC,
                opacity = 0.7,
                name = "Group B",
                marker = list(color = "orange",
                            line = list(color = "darkgray",
                                        width = 2))) %>% 
  layout(barmode = "overlay",
         title = "Histogram of white cell count for groups A and B",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
p7

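Using "rgba()" (red, green, blue, and opacity) and "rgb()" (red, green, and blue) values allows for more control over the colors and their opacity. The red, green, and blue values range from 0 to 255, and the opacity ranges from 0 (fully transparent) to 1 (fully opaque).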

p8 <- plot_ly() %>% 
  add_histogram(x = ~groupA$WCC,
                name = "Group A",
                marker = list(color = "rgba(255, 165, 0, 1.0)",
                            line = list(color = "rgb(169, 169, 169)",
                                        width = 2))) %>% 
  add_histogram(x = ~groupB$WCC,
                name = "Group B",
                marker = list(color = "rgba(150, 150, 150, 0.7)",
                            line = list(color = "rgb(169, 169, 169)",
                                        width = 2))) %>% 
  layout(barmode = "overlay",
         title = "Histogram of white cell count for groups A and B",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
p8
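
The rgba() strings above can also be built from R color names rather than typed by hand. The helper below is a minimal sketch. `to_rgba` is a hypothetical name introduced for this example, not a Plotly function. It relies only on base R's grDevices::col2rgb() to look up the red, green, and blue values.

```r
# Hypothetical helper (not part of Plotly): build an "rgba(r, g, b, a)" string
# from an R color name and an opacity between 0 and 1
to_rgba <- function(color_name, alpha = 1) {
  channels <- grDevices::col2rgb(color_name)[, 1]   # integer red, green, blue (0-255)
  sprintf("rgba(%d, %d, %d, %s)",
          channels[1], channels[2], channels[3], format(alpha))
}

to_rgba("orange", 1)       # "rgba(255, 165, 0, 1)"
to_rgba("darkgray", 0.7)   # "rgba(169, 169, 169, 0.7)"
```

The resulting strings can be passed to the color = entries of the marker lists in the same way as the hand-written values.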