library(tidyverse)
Bar Charts with Diamonds Dataset
Access Library package - Tidyverse
Load the pre-built dataset, Diamonds, and view it in the global environment
head(diamonds) # shows the first few lines of the dataset
# A tibble: 6 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
data(diamonds) # places the dataset in the global environment
Statistical transformations (from R for Data Science)
Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn with geom_bar(). The following chart displays the total number of diamonds in the diamonds dataset, grouped by cut. The diamonds dataset comes in ggplot2 and contains information about ~54,000 diamonds, including the price, carat, color, clarity, and cut of each diamond. The bar graph shows that more diamonds are available with high quality cuts than with low quality cuts.
First Bar Plot
ggplot(data = diamonds) +
geom_bar(aes(x = cut))
How do bar charts work with 2 variables?
Bar graphs are EASY when you have a single categorical variable that defines several levels for each observation. Ex: “cut” has levels: fair, good, very good, premium, and ideal. Each observation is categorized this way. But what if you have a table of aggregated data: x = cut vs y = frequency?
Here is a tibble to show this table and how you can create a bar graph from this data
A Tibble (think of this like a dataframe)
We will create a frequency table of the types of cuts that mimick the calculations done to create geom_bar
<- tribble(
demo ~cut, ~freq,
"Fair", 1610,
"Good", 4906,
"Very Good", 12082,
"Premium", 13791,
"Ideal", 21551
)
Demo Tibble Bar Plot Looks just like our other bar graphs
ggplot(data = demo) +
geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
(Don’t worry that you haven’t seen <- or tribble() before. You might be able to guess at their meaning from the context, and you’ll learn exactly what they do soon!)
Creating Proportional Bars
You might want to override the default mapping from transformed variables to aesthetics. For example, you might want to display a bar chart of proportion, rather than count:
Proportional Bar Graph (Relative Frequencies)
You need “group=1” when plotting proportions (try to omit it and see)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = stat(prop), group = 1)) +
labs(x = "Diamond Cut", y = "Proportion",
title = "Proportional Bar Graph of Diamond Cuts")
Warning: `stat(prop)` was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(prop)` instead.
To find the variables computed by the stat, look for the help section titled “computed variables”.
This is a different type of plot that shows a line with min, max, and median values
You might want to draw greater attention to the statistical transformation in your code. For example, you might use stat_summary(), which summarises the y values for each unique x value, to draw attention to the summary that you’re computing:
Line Plot
This is a different way of visualizing center and spread of cuts and depth
ggplot(data = diamonds) +
stat_summary(
mapping = aes(x = cut, y = depth),
fun.min = min,
fun.max = max,
fun = median
)
Fill vs Color
Position adjustments
There’s one more piece of magic associated with bar charts. You can color a bar chart using either the color aesthetic, or, more usefully, fill:
Notice that “fill=” fills the inside of the bar, whereas “color=” draws a color outline of the bar. Alpha gives a level of transparency, with alpha = 0 is invisible and alpha = 1 is fully saturated
Bar Plot with Alpha Transparency
ggplot(data = diamonds, aes(x=cut, fill = cut)) +
geom_bar(alpha = 0.5)+ # try replacing alpha = 0.5 with 0.8 to see how it changes
labs(x = "Diamond Cut", y = "Frequency",
title = "Frequency Bar Graph of Diamond Cuts")
Try stacking bar graphs with position = “stack”
Note what happens if you map the fill aesthetic to another variable, like clarity: the bars are automatically stacked. Each colored rectangle represents a combination of cut and clarity.
ggplot(data = diamonds) +
geom_bar(aes(x = cut, fill = clarity), position = "stack") +
labs(x = "Diamond Cut", y = "Frequency",
title = "Stacked Bar Graph of Diamond Cuts by Clarity")
The identity position adjustment is more useful for 2d geoms, like points, where it is the default. position = “fill” works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.
Using position = “fill” the bars fill the vertical space proportionally
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill") +
labs(x = "Diamond Cut", y = "Proportion",
title = "Proportional Bar Graph of Diamond Cuts by Clarity")
Position = “dodge” will get side-by-side bars
ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
geom_bar(alpha = .7, position = "dodge") +
labs(x = "Diamond Cut", y = "Frequency",
title = "Side-by-Side Bar Graph of Diamond Cuts by Clarity")
Change the angle of the x-axis labels
When x-axis labels are too long, they may overlap. You can change the text angle with axis.text.x = element_text(angle = 45))
ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
geom_bar(alpha = .7, position = "dodge") +
labs(x = "Diamond Cut", y = "Frequency",
title = "Side-by-Side Bar Graph of Diamond Cuts by Clarity") +
theme(axis.text.x = element_text(angle = 45))
Finally, make the x-axis labels fit in a narrow width
Here is another option for dealing with x-axis labels when they are long. You can use this function to break words into 2 lines.
ggplot(data = diamonds, aes(x = cut, fill = clarity)) +
geom_bar(alpha = .7, position = "dodge") +
labs(x = "Diamond Cut", y = "Frequency",
title = "Side-by-Side Bar Graph of Diamond Cuts by Clarity") +
scale_x_discrete(labels = function(x) str_wrap(x, width = 5))
# notice "very good" will fit on two lines instead of one line
Bonus “bar-like” graph - Polar Plot
<- ggplot(data = diamonds) +
bar geom_bar(aes(x = clarity, fill = clarity),
show.legend = FALSE, width = 1) +
theme(aspect.ratio = 1)
+ coord_flip() bar
+ coord_polar() bar