Use this guide to inform the graphics that you make using ggplot. You may copy and paste directly from this guide for course assignments - just make sure you change things like variable and data frame names where needed.
Notes next to each line indicate what that line will do, as we cover them in class.
The big benefit of ggplot is that these lines can be combined in many different ways to produce exactly the graphic that you have in mind.
Note, you are meant to just take specific pieces of code from this reference guide as needed.
Note, this list is not exhaustive, and we will add to it throughout the semester.
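As a minimal sketch of how these lines combine, the example below chains a geom, labels, and a preset theme with +. It assumes the built-in mtcars data frame, which is not part of this guide; swap in your own data frame and variable names.

```r
library(ggplot2)

# mtcars ships with R; wt and mpg are numeric, and cyl is turned into a
# factor so that color is treated as a discrete variable
p <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = mpg, color = factor(cyl)), alpha = 0.8) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders") +
  theme_bw(base_size = 16)

p
```

Each line after ggplot() is one piece from this guide; adding or removing a line changes only that part of the graphic.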
Here, we have 50 bars that have a dark grey outline and a grey fill color.
geom_histogram(aes(x = XVariableName), bins = 50, color = "darkgrey", fill = "grey")
Here, alpha = 0.15 sets the points to 15% opacity (85% transparent), so areas where points overlap show up darker.
geom_point(aes(x = XVariableName, y = YVariableName, size = VariableName, color = VariableName), alpha = 0.15)
Jitter is a variation on scatterplots that allows points to separate
from their true value slightly. This is useful for points that are
stacked up or overlapping. We specify how much we will allow those
points to separate from their true value with width = and
height =. These arguments are on the scale of the x and y
axes, respectively.
geom_jitter(aes(x = XVariableName, y = YVariableName), alpha = 0.02, width = 0.35, height = 0.05)
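A small sketch of jitter in use, assuming the mpg data frame that ships with ggplot2 (not part of this guide). Both variables are whole numbers, so many points sit exactly on top of each other until they are jittered.

```r
library(ggplot2)

# cty and hwy are integer-valued, so un-jittered points overlap exactly.
# width and height are in axis units: points move at most 0.35 left/right
# and 0.05 up/down from their true position.
p_jitter <- ggplot(data = mpg) +
  geom_jitter(aes(x = cty, y = hwy), alpha = 0.3, width = 0.35, height = 0.05)

p_jitter
```

Keeping width and height below half the gap between neighboring values means jittered points never cross over into a neighboring value.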
geom_hex(aes(x = XVariableName, y = YVariableName))
Color is the color of the line and linetype is the type of line (solid, dashed, etc.). Here, since they are inside the aes() function, color and linetype are each mapped to a variable.
geom_line(aes(x = XVariableName, y = YVariableName, color = VariableOrColor, linetype = VariableOrLinetype))
Here we need a ymin and ymax for the ribbon; the fill color we specify
fills in the region between them. alpha = 0.15 makes this shaded
region 15% opaque (mostly transparent).
geom_ribbon(aes(x = XVariableName, ymin = YVariableLower, ymax = YVariableHigher), alpha = 0.15, fill = "palegreen")
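A ribbon usually sits behind a line, for example to show a range around an average. The sketch below assumes a small made-up data frame (the names band, Day, Mean, Lower, and Upper are hypothetical).

```r
library(ggplot2)

# Hypothetical data: a daily mean with a band of plus or minus 2 around it
band <- data.frame(Day = 1:10, Mean = c(3, 4, 6, 5, 7, 8, 8, 9, 11, 12))
band$Lower <- band$Mean - 2
band$Upper <- band$Mean + 2

# The ribbon is added first so that the line is drawn on top of it
p_ribbon <- ggplot(data = band) +
  geom_ribbon(aes(x = Day, ymin = Lower, ymax = Upper), alpha = 0.15, fill = "palegreen") +
  geom_line(aes(x = Day, y = Mean))

p_ribbon
```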
Remember, x is our categorical variable in a boxplot.
geom_boxplot(aes(x = XVariableName, y = YVariableName))
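This creates a bar graph where the height of each bar comes directly from the y variable. geom_col() always uses the values as they are, so it needs no stat argument; stat = "identity" is only needed with geom_bar(), which counts rows by default. Whether the bars are stacked or grouped is controlled by the position argument (for example, position = "stack" or position = "dodge"). Google the other options it can be! Mapping a variable to fill or color will sometimes put a legend on the graph. You can remove that legend by using show.legend = FALSE.
geom_col(aes(x = XVariableName, y = YVariableName), show.legend = FALSE)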
Here are some details on making stacked and grouped bar charts: https://r-graph-gallery.com/48-grouped-barplot-with-ggplot2
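A minimal sketch of the stacked and grouped variants, assuming a made-up data frame (counts, Site, Species, and Count are hypothetical names). The position argument is what switches between the two.

```r
library(ggplot2)

# Hypothetical counts of two species at three sites
counts <- data.frame(
  Site = rep(c("A", "B", "C"), each = 2),
  Species = rep(c("Robin", "Wren"), times = 3),
  Count = c(4, 6, 3, 9, 5, 2)
)

# Default position is "stack": species are stacked within each site
p_stacked <- ggplot(data = counts) +
  geom_col(aes(x = Site, y = Count, fill = Species))

# position = "dodge" puts the species side by side within each site
p_grouped <- ggplot(data = counts) +
  geom_col(aes(x = Site, y = Count, fill = Species), position = "dodge")

p_stacked
p_grouped
```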
Create a ridgeline graph, showing densities. Note that here, the
categorical variable goes on the y axis, not x. You can add a fill color
based on another categorical variable. If we include the argument
stat = "binline", these will change from density functions
to histograms.
geom_density_ridges(aes(x = XVariableName, y = YVariableName), panel_scaling = FALSE)
geom_density_ridges_gradient(aes(x = XVariableName, y = YVariableName, fill = FillVariable), panel_scaling = FALSE)
Here’s an example of histograms that are colored on a gradient.
The fill = ..x.. just means fill it based on the x
variable (newer versions of ggplot2 write this as fill = after_stat(x)).
You can see here that the bin breaks along the x axis are also
specified explicitly. Changing scale = 1.2 allows you to
change the amount that the plots overlap.
geom_density_ridges_gradient(aes(x = Duration, y = Species, fill = ..x..), stat = "binline", bins = 15, draw_baseline = FALSE, show.legend = FALSE, position = "identity", breaks = c(0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90), scale = 1.2)
Here, contour_var = "ndensity" means that the contour variable is essentially a density of points (scaled to a maximum of 1), similar to the hex bins. Adding more bins makes the graphic a bit more fluid. This is essentially a 2D histogram.
geom_density_2d_filled(aes(x = XVariableName, y = YVariableName), contour_var = "ndensity", bins = 5, show.legend = FALSE)
This line does the same as above, but gets rid of the contour lines and the bins and makes the colors a true gradient.
stat_density_2d(aes(fill = after_stat(ndensity)), geom = "raster", contour = FALSE, contour_var = "ndensity", position = "identity", show.legend = FALSE)
Another option is:
stat_density2d(aes(x = XVariableName, y = YVariableName, fill = ..density..), geom = "raster", contour = FALSE, show.legend = FALSE)
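Here is a sketch of the raster version as a complete plot, assuming the faithful data frame that ships with R (not part of this guide). The x and y mappings are set once inside ggplot() and inherited by the layer, which is why a stat_density_2d() line can list only fill.

```r
library(ggplot2)

# faithful: eruption length and waiting time for the Old Faithful geyser
p_density <- ggplot(data = faithful, aes(x = eruptions, y = waiting)) +
  # density is computed by the stat, so it is wrapped in after_stat()
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster",
                  contour = FALSE, show.legend = FALSE) +
  # remove the padding so the raster fills the whole panel
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))

p_density
```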
This line will make several different plots based on a variable. For
instance, let’s say you wanted three scatterplots - one for each
species. VariableName would be Species, and you’d have
geom_point above. In this form, the different plots will be
stacked on top of each other. You can use
facet_grid( ~ VariableName, scales = "free") to put them
side by side.
facet_grid(VariableName ~ ., scales = "free")
You can also make a grid based on two variables, with rows on the left of the ~ and columns on the right:
facet_grid(RowVariableName ~ ColumnVariableName, scales = "free")
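A short sketch of both forms, assuming the mpg data frame that ships with ggplot2 (not part of this guide): drv has three levels and year has two.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy))

# One row per drive type, stacked on top of each other
p_rows <- base_plot + facet_grid(drv ~ ., scales = "free")

# Rows by drive type and columns by model year
p_grid <- base_plot + facet_grid(drv ~ year, scales = "free")

p_rows
p_grid
```

With scales = "free", each row or column of panels gets its own axis range; leave it out to keep the same axes on every panel, which makes panels easier to compare.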
Here are some preset theme examples that work well:
theme_bw() theme_minimal()
theme_classic() theme_linedraw()
theme_light() theme_void()
We can also add some white space around the graph using theme, which is typically useful.
theme(plot.margin = margin(1, 1, 1, 1, "cm"))
Grid lines are also a part of themes. Here’s how to change them: https://felixfan.github.io/ggplot2-remove-grid-background-margin/
Legends are a part of the theme. We can mess with them in lots of ways. Here are some details: http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software
We can change the text size of everything on the graph at once with
this argument inside our theme: base_size = 16
theme_bw(base_size = 16)
We can change the font and size of individual elements as well. This line changes the text to Times New Roman size 12, and also rotates the x axis text to a 45 degree angle, for example.
theme(axis.text = element_text(family = "Times New Roman", size = 12), text = element_text(family = "Times New Roman", size = 12), axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
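The sketch below puts a preset theme and individual tweaks together. It assumes the mpg data frame that ships with ggplot2 and uses the generic "serif" family as a stand-in, since Times New Roman may not be installed on every machine.

```r
library(ggplot2)

p_theme <- ggplot(data = mpg) +
  geom_boxplot(aes(x = class, y = hwy)) +
  # Preset theme goes first: it replaces the entire theme
  theme_bw(base_size = 16) +
  # Individual tweaks go after the preset so they are not overwritten
  theme(
    text = element_text(family = "serif"),
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
    plot.margin = margin(1, 1, 1, 1, "cm")
  )

p_theme
```

The order matters: a preset such as theme_bw() added after theme() would undo the tweaks.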
We can change the labels of axes, etc. with the labs() function.
labs(x = "X Variable Label", y = "Y Variable Label", fill = "Fill Variable Label", color = "Color Variable Label", linetype = "Linetype Label")
We can also add titles, subtitles, and captions the same way.
labs(title = "Title", subtitle = "Subtitle", caption = "Caption")
We won’t really use R to add labels to points, but here’s how you do it in case it’s of interest in the future: https://www.geeksforgeeks.org/how-to-add-labels-directly-in-ggplot2-in-r/
We can change limits quickly with xlim() and
ylim().
xlim(c(0,100)) ylim(c(0,100))
We can change the axis tick numbers manually on each axis as well. This function depends on whether the axis is discrete or continuous. Typically, it’s continuous. Here, we set the breaks in the axis to 1 through 12, like if we were graphing month on the x axis.
scale_x_continuous(limits = c(1,12),breaks = c(1,2,3,4,5,6,7,8,9,10,11,12))
To edit y, we would use scale_y_continuous().
We can add an argument called expand inside scale_x_continuous() or scale_y_continuous() that removes the padding around the data, so the plot fills the entire frame in the x or y direction.
expand = c(0, 0)
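To put an axis on a log scale, we can add this argument inside the same scale function (note that "log" is the natural log; use "log10" for base 10):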
trans = "log"
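As a sketch of where these arguments go, the example below uses breaks, limits, expand, and a log transformation inside the scale functions. The monthly data frame and its Month and Count columns are made up for illustration.

```r
library(ggplot2)

# Hypothetical monthly counts that double each month, so a log axis helps
monthly <- data.frame(Month = 1:12, Count = 2^(0:11))

p_scale <- ggplot(data = monthly) +
  geom_line(aes(x = Month, y = Count)) +
  # 1:12 is shorthand for c(1, 2, ..., 12); expand = c(0, 0) removes padding
  scale_x_continuous(limits = c(1, 12), breaks = 1:12, expand = c(0, 0)) +
  # base-10 log scale on the y axis
  scale_y_continuous(trans = "log10")

p_scale
```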
There are a ton of different ways to change colors, which means it can be frustrating. You need to know a few key pieces of information before you start. First, what colors are you aiming for? Second, is the thing you’re trying to color a single entity (e.g. turn a line red, where the line is the only thing on the graph), or is it mapped to a variable (e.g. color code a line graph by Species)?
Next, you need to know if what you’re coloring is discrete or continuous. An example of a discrete variable is Species - each species is a single color. Continuous is the opposite, something like temperature, which would require a color ramp.
The functions we use to set colors depend on what we’re coloring, and whether it’s discrete or continuous.
Note that R calls outlines “color” and filled colors “fill”.
For example, change an entire bar graph - the outline and fill -
green. Note that since color doesn’t depend on a variable, it goes
outside the aes() function.
geom_col(aes(x = XVariableName, y = YVariableName), show.legend = FALSE, color = "green", fill = "green")
Note that to color by a variable, whatever we are coloring has to be mapped to that variable in an aes() function somewhere. That can be in a line plot to have lines of different colors, each representing a different group, for example. Or it could be a grouped bar graph where different colors represent different groups.
Color can also be mapped to a continuous variable, like on a gradient. That might be in a ridgeline plot where color changes as the x axis variable changes.
Color and fill are changed the same way, with slightly different
functions: starting with scale_fill_ or
scale_color_.
We can change colors on a discrete scale manually, or with a preset selection.
For example, let’s say that we had three bars on a graph that we wanted to fill with a different shade of green - one each. Or fill was mapped to a categorical variable that had three levels. We could use:
scale_fill_manual(values = c("palegreen3","palegreen4","darkgreen"))
The strings that you put in the values vector are color names built into R: http://sape.inf.usi.ch/quick-reference/ggplot2/colour
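One option worth knowing is that the values can be named, so that each level gets a specific color regardless of order. The sketch below assumes a made-up data frame (sites, Site, and Count are hypothetical names).

```r
library(ggplot2)

# Hypothetical counts at three sites
sites <- data.frame(Site = c("A", "B", "C"), Count = c(4, 9, 5))

p_manual <- ggplot(data = sites) +
  # fill is mapped to Site inside aes(), so the scale below applies to it
  geom_col(aes(x = Site, y = Count, fill = Site)) +
  # Names match the levels of Site; each level gets the color named for it
  scale_fill_manual(values = c(A = "palegreen3", B = "palegreen4", C = "darkgreen"))

p_manual
```

Without names, the colors are assigned to the levels in order.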
We can use the viridis color ramps (from the viridis package) as well, and map them to discrete
variables using the discrete = TRUE argument.
scale_fill_viridis(discrete = TRUE, option = "magma")
We can change the direction of the ramp by specifying
direction = 1 or direction = -1.
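ggplot2 itself also ships scale_fill_viridis_d() (discrete) and scale_fill_viridis_c() (continuous), which are used in the sketch below so that it runs without loading the viridis package. It assumes the mpg data frame that ships with ggplot2.

```r
library(ggplot2)

p_viridis <- ggplot(data = mpg) +
  # geom_bar() counts the rows in each drive type
  geom_bar(aes(x = drv, fill = drv)) +
  # direction = -1 reverses the order of the colors in the ramp
  scale_fill_viridis_d(option = "magma", direction = -1)

p_viridis
```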
Another option you can use to generate bins of colors is here: https://www.learnui.design/tools/data-color-picker.html#palette
You can specify these hex codes instead of the color name strings.
We can also use the brewer color palettes: https://ggplot2.tidyverse.org/reference/scale_brewer.html
scale_fill_brewer(palette = "Greens")
Continuous colors are mapped to a color ramp. You can create your own color ramp, use a viridis ramp, or a number of other options. There are many, many functions out there to specify a color ramp. We’ll do manual and viridis because those are widely applicable.
To manually specify a ramp we can use scale_color_gradient() (or scale_fill_gradient() for fills). We specify the low and high end of the color ramp.
scale_color_gradient(low = "blue", high = "red")
We can also add mid = "white" or another color to
specify a middle point, with a slightly tweaked function.
scale_color_gradient2(low = "blue", mid = "white", high = "red")
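A three-color ramp is most useful when the data has a meaningful middle, such as zero. The sketch below assumes a made-up data frame (anomaly, Year, and Change are hypothetical names) and sets midpoint = 0 explicitly, which is also the default.

```r
library(ggplot2)

# Hypothetical yearly changes that fall on both sides of zero
anomaly <- data.frame(Year = 2001:2007, Change = c(-3, -2, -1, 0, 1, 2, 3))

p_diverge <- ggplot(data = anomaly) +
  geom_point(aes(x = Year, y = Change, color = Change), size = 4) +
  # midpoint is the data value that gets the mid color
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)

p_diverge
```

If the middle of your data is not zero (for example, temperatures around 20), change midpoint to that value so the mid color lands where you expect.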
Typically to produce a great graphic that is accessible but doesn’t have to be branded to a specific color, we will use preset color ramps in R. The viridis color ramps are the ones I like best, but there are others. Here are some examples of how to do this.
Viridis ramps: https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html
scale_fill_viridis(option = "magma")
Brewer color palettes: https://ggplot2.tidyverse.org/reference/scale_brewer.html
scale_fill_distiller(palette = "Spectral")
Finally, we can use the scico palettes: https://www.data-imaginist.com/2018/scico-and-the-colour-conundrum/
scale_fill_scico(palette = "turku")