A histogram visualizes the spread of the values of a continuous numerical variable. Plotly divides the range of the variable, from its lowest to its highest value, into equally sized bins and then counts the number of values that fall in each bin.
The histogram can also be normalized to show a relative frequency distribution, in which each bar shows a proportion of the sample rather than a count.
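The binning step described above can be sketched in base R, independently of Plotly. The vector `values` and the bin width of 5 are hypothetical choices made for illustration; Plotly selects its own bin edges, so this only shows the idea of counting values in equally sized bins.

```r
# Hypothetical data: eight values of a continuous variable
values <- c(6.2, 8.9, 11.4, 12.7, 13.5, 14.1, 17.8, 24.3)

# Equally sized bins of width 5 covering the range of the data (an assumption)
breaks <- seq(5, 25, by = 5)

# Assign each value to a bin; right = FALSE gives bins [5,10), [10,15), ...
bins <- cut(values, breaks = breaks, right = FALSE)

# Count the number of values in each bin
counts <- table(bins)
counts
```

Every value lands in exactly one bin, so the counts add up to the number of values in the sample.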
The code below creates a variable called wcc. It takes \(200\) data point values from a normal distribution with \(\mu = 15\) and \(\sigma = 4\).
# Draw 200 random values from a normal distribution with mean 15 and sd 4
wcc <- rnorm(200,
             mean = 15,
             sd = 4)
summary(wcc)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.617 11.419 13.461 14.373 16.885 25.196
A simple histogram is generated below.
library(plotly)
p1 <- plot_ly(x = ~wcc,
              type = "histogram")
p1
Piping the plot into the layout() command adds a plot title and axis titles. Setting zeroline = FALSE hides the line drawn at the zero value of each axis.
p2 <- plot_ly(x = ~wcc,
              type = "histogram") %>%
  layout(title = "Histogram of white cell count",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
p2
A normalized histogram shows the relative frequency distribution. The count for each bin is divided by the total number of values in the sample, so the bar heights sum to 1. This is achieved by setting the histnorm = argument to "probability".
p3 <- plot_ly(x = ~wcc,
              type = "histogram",
              histnorm = "probability") %>%
  layout(title = "Frequency distribution of white cell count",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Frequency",
                      zeroline = FALSE))
p3
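The normalization can be checked by hand with a minimal base R sketch. It uses hist() with plot = FALSE as a stand-in for Plotly's binning (the bin edges will generally differ from those Plotly picks), and the seed and the name `x` are assumptions made only for this example.

```r
# Hypothetical sample, seeded so the example is repeatable
set.seed(1)
x <- rnorm(200, mean = 15, sd = 4)

# Bin the data without drawing a plot
h <- hist(x, plot = FALSE)

# Divide each bin count by the sample size to get relative frequencies
rel_freq <- h$counts / sum(h$counts)
rel_freq

# The relative frequencies of all bins sum to 1
sum(rel_freq)
```

Only the scale of the vertical axis changes; the shape of the histogram is the same as for the raw counts.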
Flipping the plot to horizontal is as easy as passing the variable wcc to the y = argument instead of the x = argument. Remember to swap the axis titles as well.
p4 <- plot_ly(y = ~wcc,
              type = "histogram",
              histnorm = "probability") %>%
  layout(title = "Frequency distribution of white cell count",
         yaxis = list(title = "White cell count",
                      zeroline = FALSE),
         xaxis = list(title = "Frequency",
                      zeroline = FALSE))
p4
A histogram for the same variable can be shown for more than one group. The code below creates a data.frame in which each value is randomly assigned to group A or group B, and then splits it into two data.frame objects using the filter() function from the dplyr library.
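library(dplyr)
# Randomly assign each value to group A or group B
df <- data.frame(Group = sample(c("A", "B"),
                                200,
                                replace = TRUE),
                 WCC = wcc)
# Keep only the rows belonging to each group
groupA <- df %>% filter(Group == "A")
groupB <- df %>% filter(Group == "B")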
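For readers without dplyr, the same rows can be selected with base R logical indexing. This is a minimal sketch rather than the approach used in this text; the seed and the freshly simulated `WCC` column are assumptions so that the example runs on its own.

```r
# Hypothetical, seeded data frame with a random group label per value
set.seed(42)
df <- data.frame(Group = sample(c("A", "B"), 200, replace = TRUE),
                 WCC = rnorm(200, mean = 15, sd = 4))

# Select the rows of each group with a logical condition
groupA <- df[df$Group == "A", ]
groupB <- df[df$Group == "B", ]

# Every row ends up in exactly one of the two groups
nrow(groupA) + nrow(groupB) == nrow(df)
```

Because the labels are assigned at random, the two groups will usually not be of exactly equal size.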
The code below creates a histogram of the WCC variable for each group. Because the two histograms are overlaid, the bars are made partly transparent using the alpha = argument. Each histogram is added using the add_histogram() command, and barmode = "overlay" is set in the layout() command so that the bars are drawn on top of each other rather than side by side.
p5 <- plot_ly(alpha = 0.7) %>%
  add_histogram(x = ~groupA$WCC,
                name = "Group A") %>%
  add_histogram(x = ~groupB$WCC,
                name = "Group B") %>%
  layout(barmode = "overlay",
         title = "Histogram of white cell count for groups A and B",
         xaxis = list(title = "White cell count",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
p5
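As an alternative to splitting the data by hand, the grouping column can be mapped to the color = argument so that Plotly draws one histogram per group. The sketch below assumes the plotly package is installed and uses a hypothetical, seeded data frame so that it runs on its own; the legend entries then take the group labels "A" and "B" rather than custom names.

```r
library(plotly)

# Hypothetical, seeded data frame with a random group label per value
set.seed(42)
df <- data.frame(Group = sample(c("A", "B"), 200, replace = TRUE),
                 WCC = rnorm(200, mean = 15, sd = 4))

# Map the grouping column to color so each group gets its own histogram
p <- plot_ly(df, x = ~WCC, color = ~Group, alpha = 0.7) %>%
  add_histogram() %>%
  layout(barmode = "overlay",
         xaxis = list(title = "White cell count"),
         yaxis = list(title = "Count"))
p
```

This keeps the data in a single data.frame, which may be more convenient when there are more than two groups.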