# A tibble: 6 × 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn with geom_bar(). The following chart displays the total number of diamonds in the diamonds dataset, grouped by cut. The diamonds dataset ships with ggplot2 and contains information about roughly 54,000 diamonds, including the price, carat, color, clarity, and cut of each diamond. The bar graph shows that more diamonds are available with high-quality cuts than with low-quality cuts.
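A minimal sketch of the chart described above, assuming ggplot2 is installed. The object name p_count is only for illustration.

```r
library(ggplot2)

# geom_bar() counts the rows in each level of `cut`
# and draws one bar per level.
p_count <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

p_count
```

Note that the data has no count column: the bar heights are computed by the stat behind geom_bar().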
Bar graphs are easy when you have a single categorical variable whose levels classify every observation. For example, cut has the levels Fair, Good, Very Good, Premium, and Ideal, and each diamond falls into exactly one of them. But what if you instead have a table of already aggregated data, with x = cut and y = frequency?
Here is a tibble that holds such a table, along with how to create a bar graph from it.
We will build a frequency table of the cut types that mimics the counting geom_bar() does internally.
(Don’t worry that you haven’t seen tibble() or tribble() before. You might be able to guess at their meaning from the context, and you’ll learn exactly what they do soon!)
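One way to sketch this, assuming the tibble and ggplot2 packages are installed. The names demo and freq are illustrative, and the counts are the number of diamonds per cut in the diamonds dataset.

```r
library(ggplot2)
library(tibble)

# Pre-aggregated counts, one row per cut
# (the same totals geom_bar() would compute from diamonds).
demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)

# Keep the bars in quality order rather than alphabetical order.
demo$cut <- factor(demo$cut, levels = demo$cut)

# stat = "identity" tells geom_bar() to use `freq` as the bar height as-is
# instead of counting rows.
p_demo <- ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

p_demo
```

geom_col() is a shorthand for geom_bar(stat = "identity") and should give the same chart.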
You might want to override the default mapping from transformed variables to aesthetics. For example, you might want to display a bar chart of proportion, rather than count:
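You need group = 1 when plotting proportions. Without it, ggplot2 computes the proportion within each cut separately, so every bar has a height of 1 (try omitting it to see).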
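A sketch of the proportion chart, assuming ggplot2 3.3.0 or later, where after_stat() is available (older code writes ..prop.. instead). The object name p_prop is illustrative.

```r
library(ggplot2)

# after_stat(prop) maps y to the proportion computed by the stat
# rather than to the default count.
# group = 1 treats all rows as one group, so the proportions
# are relative to the whole dataset and sum to 1.
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

p_prop
```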
To find the variables computed by the stat, look for the help section titled “computed variables”.
You might want to draw greater attention to the statistical transformation in your code. For example, you might use stat_summary(), which summarises the y values for each unique x value, to draw attention to the summary that you’re computing:
This is another way to visualize the center and spread of depth within each cut.
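A minimal sketch of such a summary, assuming ggplot2 3.3.0 or later (which uses the argument names fun, fun.min, and fun.max). The choice of median, minimum, and maximum is one reasonable option, not the only one.

```r
library(ggplot2)

# For each cut, draw a point at the median depth and a line
# running from the minimum to the maximum depth.
p_summary <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun = median,
    fun.min = min,
    fun.max = max
  )

p_summary
```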
There’s one more piece of magic associated with bar charts. You can color a bar chart using either the color aesthetic, or, more usefully, fill:
Notice that fill colors the inside of each bar, whereas color draws a colored outline around it. alpha sets the level of transparency: alpha = 0 is fully transparent (invisible) and alpha = 1 is fully opaque.
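The three variants can be compared side by side with a sketch like the following. The object names and the alpha value of 0.5 are illustrative choices.

```r
library(ggplot2)

# colour = cut draws only the outline of each bar in a colour chosen by cut.
p_outline <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, colour = cut))

# fill = cut colours the inside of each bar.
p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))

# alpha set outside aes() applies one transparency level to every bar.
p_alpha <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), alpha = 0.5)

p_outline
p_fill
p_alpha
```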
Note what happens if you map the fill aesthetic to another variable, like clarity: the bars are automatically stacked. Each colored rectangle represents a combination of cut and clarity.
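A short sketch of the stacked chart, assuming ggplot2 is installed. The object name p_stack is illustrative.

```r
library(ggplot2)

# Mapping fill to a second variable splits each bar by clarity.
# The default position adjustment stacks the pieces on top of each other.
p_stack <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))

p_stack
```

The total height of each stack is still the number of diamonds with that cut.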
The identity position adjustment is more useful for 2d geoms, like points, where it is the default. position = "fill" works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.
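Both position adjustments can be tried with a sketch like this. The object names and the alpha value are illustrative. The transparency is only there so that overlapping bars stay visible.

```r
library(ggplot2)

# position = "identity" draws every bar from zero, so bars for the same cut
# overlap. A low alpha keeps the hidden bars visible.
p_identity <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1 / 5, position = "identity")

# position = "fill" stacks the bars and rescales every stack to height 1,
# so the y axis shows proportions within each cut.
p_fillpos <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

p_identity
p_fillpos
```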
When x-axis labels are too long, they may overlap. You can change the text angle with theme(axis.text.x = element_text(angle = 45)).
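A minimal sketch of rotating the labels. The hjust = 1 setting is an optional addition assumed here so the rotated labels end at their tick marks. It can be dropped.

```r
library(ggplot2)

# Rotate the x-axis labels by 45 degrees.
# hjust = 1 right-aligns each label against its tick mark (optional).
p_angled <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_angled
```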
Here is another option for dealing with long x-axis labels. You can use this function to break a label onto two lines.
Notice that "Very Good" now takes up two lines instead of one.
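The notes do not name the function, so the sketch below assumes one possible choice: label_wrap() from the scales package, which is installed alongside ggplot2. The width of 8 and the name wrap_labels are illustrative.

```r
library(ggplot2)

# Assumption: scales::label_wrap() is used as the wrapping function.
# It returns a function that inserts line breaks between words
# so that each line stays under the given width.
wrap_labels <- scales::label_wrap(8)

# Apply the wrapping function to the labels of the discrete x axis.
p_wrapped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut)) +
  scale_x_discrete(labels = wrap_labels)

p_wrapped
```

Single-word labels such as Premium are left on one line, since the wrapping only breaks between words.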