Use this guide to inform the graphics that you make using ggplot. You may copy and paste directly from this guide for course assignments - just make sure you change things like variable and data frame names where needed.

Notes next to each line indicate what that line does, as we cover them in class.

The big benefit of ggplot is that these lines can be combined in many different ways to produce exactly the graphic that you have in mind.

Note, you are meant to just take specific pieces of code from this reference guide as needed.

Note, this list is not exhaustive, and we will add to it throughout the semester.

Plotting Functions

Histogram

Here, we have 50 bars that have a dark grey outline and a grey fill color.

geom_histogram(aes(x = XVariableName), bins = 50, color = "darkgrey", fill = "grey")
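As a minimal sketch of how one of these lines fits into a complete plot, the example below combines ggplot() with the histogram line. It assumes R's built-in faithful data frame and its waiting column purely for illustration; swap in your own data frame and variable name.

```r
library(ggplot2)

# faithful is a built-in data frame; waiting is one of its numeric columns.
hist_plot <- ggplot(data = faithful) +
  geom_histogram(aes(x = waiting), bins = 50, color = "darkgrey", fill = "grey")

# Typing the object name draws the plot.
hist_plot
```

Every other geom in this guide can be dropped in after the + in the same way.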

Scatterplot

Here, alpha = 0.15 makes the points 15% opaque (that is, 85% transparent).

geom_point(aes(x = XVariableName, y = YVariableName, size = VariableName, color = VariableName), alpha = 0.15)

Jitter is a variation on scatterplots that allows points to separate from their true value slightly. This is useful for points that are stacked up or overlapping. We specify how much we will allow those points to separate from their true value with width = and height =. These arguments are on the scale of the x and y axes, respectively.

geom_jitter(aes(x = XVariableName, y = YVariableName), alpha = 0.02, width = 0.35, height = 0.05)
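Here is a small sketch of jitter on data where points stack up. It assumes the mpg data frame that ships with ggplot2, whose cty and hwy columns are whole numbers, so many points share the same position. The alpha value is higher than in the line above only because this example data set is small.

```r
library(ggplot2)

# cty and hwy are integers, so many cars land on exactly the same point.
jitter_plot <- ggplot(mpg) +
  geom_jitter(aes(x = cty, y = hwy), alpha = 0.3, width = 0.35, height = 0.35)

jitter_plot
```

Each point moves by at most 0.35 units in either direction on each axis, so it stays closest to its true whole-number value.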

Hexbin

geom_hex(aes(x = XVariableName, y = YVariableName))

Line

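Color is the color of the line and linetype is the type of line (solid, dashed, etc.). Here, since they are inside the aes() function, color and linetype are each mapped to a variable. To set a single color or linetype for every line instead, put the argument outside aes().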

geom_line(aes(x = XVariableName, y = YVariableName, color = VariableOrColor, linetype = VariableOrLinetype))

Ribbon (like a CI)

Here we need a ymax and ymin for the ribbon. The fill color we specify will fill in the region between them. alpha = 0.15 makes this shaded region 15% opaque (85% transparent).

geom_ribbon(aes(x = XVariableName, ymin = YVariableLower, ymax = YVariableHigher), alpha = 0.15, fill = "palegreen")
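The sketch below shows one way the lower and upper columns for a ribbon might be set up. The data frame and its column names (x, estimate, lower, upper) are made up for illustration, and the fixed plus or minus 1.5 band is a stand-in for a real confidence interval.

```r
library(ggplot2)

# Hypothetical data: an estimate with a lower and upper bound at each x value.
ribbon_data <- data.frame(x = 1:10, estimate = (1:10) * 2)
ribbon_data$lower <- ribbon_data$estimate - 1.5
ribbon_data$upper <- ribbon_data$estimate + 1.5

# The ribbon goes first so that the line is drawn on top of it.
ribbon_plot <- ggplot(ribbon_data) +
  geom_ribbon(aes(x = x, ymin = lower, ymax = upper), alpha = 0.15, fill = "palegreen") +
  geom_line(aes(x = x, y = estimate), color = "darkgreen")

ribbon_plot
```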

Boxplot

Remember, x is our categorical variable in a boxplot.

geom_boxplot(aes(x = XVariableName, y = YVariableName))

Barplot

This creates a bar graph from values that are already in the data: geom_col() draws each bar at the height given by y. It is equivalent to geom_bar() with stat = "identity", so geom_col() itself does not need a stat argument. Whether the bars are stacked or placed side by side is controlled by the position argument, for example position = "stack", position = "dodge", or position = "fill". Mapping fill or color to a variable will put a legend on the graph. You can remove that legend by using show.legend = FALSE.

geom_col(aes(x = XVariableName, y = YVariableName), show.legend = FALSE)

Here are some details on making stacked and grouped bar charts: https://r-graph-gallery.com/48-grouped-barplot-with-ggplot2
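As a rough sketch of the difference between stacked and grouped bars, the example below draws the same made-up counts both ways. The data frame and its columns (Site, Species, Count) are hypothetical.

```r
library(ggplot2)

# Hypothetical counts for two sites and three species.
bar_data <- data.frame(
  Site = rep(c("A", "B"), each = 3),
  Species = rep(c("Oak", "Pine", "Maple"), times = 2),
  Count = c(5, 3, 2, 4, 6, 1)
)

# Stacked bars: one bar per site, split by species.
stacked_plot <- ggplot(bar_data) +
  geom_col(aes(x = Site, y = Count, fill = Species), position = "stack")

# Grouped bars: one bar per species, side by side within each site.
grouped_plot <- ggplot(bar_data) +
  geom_col(aes(x = Site, y = Count, fill = Species), position = "dodge")
```

Stacked bars make the site totals easy to compare, while grouped bars make it easier to compare individual species.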

Ridgeline

Create a ridgeline graph, showing densities. These functions come from the ggridges package, so load it with library(ggridges) first. Note that here, the categorical variable goes on the y axis, not x. You can add a fill color based on another categorical variable. If we include the argument stat = "binline", these will change from density functions to histograms.

geom_density_ridges(aes(x = XVariableName, y = YVariableName), panel_scaling = FALSE)

geom_density_ridges_gradient(aes(x = XVariableName, y = YVariableName, fill = FillVariable), panel_scaling = FALSE)

Here’s an example of histograms that are colored on a gradient. The fill = after_stat(x) just means fill it based on the x variable. Older code writes this as fill = ..x.., a notation that has been deprecated since ggplot2 3.4.0. You can see here that the breaks in the x axis are also specified explicitly. Changing scale = 1.2 allows you to change the amount that the plots overlap.

geom_density_ridges_gradient(aes(x = Duration, y = Species, fill = after_stat(x)), stat = "binline", bins = 15, draw_baseline = FALSE, show.legend = FALSE, position = "identity", breaks = seq(0, 90, by = 5), scale = 1.2)

2D Density

Here, contour_var = "ndensity" means that the contours are based on the density of points, scaled so that the maximum is 1, similar to the hex bins. Adding more bins makes the graphic a bit more fluid. This is essentially a smoothed 2D histogram.

geom_density_2d_filled(aes(x = XVariableName, y = YVariableName), contour_var = "ndensity", bins = 5, show.legend = FALSE)

This line does the same as above, but gets rid of the contour lines and the bins and makes the colors a true gradient.

stat_density_2d(aes(x = XVariableName, y = YVariableName, fill = after_stat(ndensity)), geom = "raster", contour = FALSE, contour_var = "ndensity", position = "identity", show.legend = FALSE)

Another option is:

stat_density2d(aes(x = XVariableName, y = YVariableName, fill = after_stat(density)), geom = "raster", contour = FALSE, show.legend = FALSE)
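To see the binned and gradient versions side by side, here is a sketch that assumes the built-in faithful data frame (columns eruptions and waiting). It is only an illustration of the two lines above on example data.

```r
library(ggplot2)

# Filled contour bands: the density is cut into 5 bins.
density_plot <- ggplot(faithful) +
  geom_density_2d_filled(aes(x = eruptions, y = waiting),
                         contour_var = "ndensity", bins = 5, show.legend = FALSE)

# Smooth gradient: no contour lines, fill follows the scaled density directly.
raster_plot <- ggplot(faithful) +
  stat_density_2d(aes(x = eruptions, y = waiting, fill = after_stat(ndensity)),
                  geom = "raster", contour = FALSE, show.legend = FALSE)
```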

Facets

This line will make several different plots based on a variable. For instance, let’s say you wanted three scatterplots - one for each species. VariableName would be Species, and you’d have geom_point above. In this form, the different plots will be stacked on top of each other. You can use facet_grid( ~ VariableName, scales = "free") to put them side by side. The scales = "free" argument lets each panel have its own axis ranges.

facet_grid(VariableName ~ ., scales = "free")

You can also make a grid based on two variables:

facet_grid(RowVariableName ~ ColumnVariableName, scales = "free")
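A short sketch of the two layouts, assuming the built-in iris data frame so that Species plays the role of VariableName:

```r
library(ggplot2)

# One scatterplot panel per species, stacked on top of each other.
facet_rows <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width)) +
  facet_grid(Species ~ ., scales = "free")

# The same panels placed side by side.
facet_cols <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width)) +
  facet_grid(. ~ Species, scales = "free")
```

The variable to the left of the ~ defines the rows and the variable to the right defines the columns.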

Themes

Here are some preset theme examples that work well:

theme_bw()
theme_minimal()
theme_classic()
theme_linedraw()
theme_light()
theme_void()

We can also add some white space around the graph using theme, which is typically useful.

theme(plot.margin = margin(1, 1, 1, 1, "cm"))

Grid Lines

Grid lines are also a part of themes. Here’s how to change them: https://felixfan.github.io/ggplot2-remove-grid-background-margin/

Legend Themes

Legends are a part of the theme. We can mess with them in lots of ways. Here are some details: http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software

Text

We can change the text size of everything on the graph at once by passing this argument to a preset theme function: base_size = 16

theme_bw(base_size = 16)

We can change the font and size of individual elements as well. This line changes the text to Times New Roman size 12, and also rotates the x axis text to a 45 degree angle, for example.

theme(axis.text = element_text(family = "Times New Roman", size = 12), text = element_text(family = "Times New Roman", size = 12), axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

Labels

We can change the labels of axes, etc. with the labs() function.

labs(x = "X Variable Label", y = "Y Variable Label", fill = "Fill Variable Label", color = "Color Variable Label", linetype = "Linetype Label")

We can also add titles, subtitles, and captions the same way.

labs(title = "Title", subtitle = "Subtitle", caption = "Caption")
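The sketch below puts a preset theme, a theme() tweak, and labs() together on one plot. It assumes the built-in iris data frame, and the label text is made up for illustration. The font family is left at the default here, since Times New Roman may not be installed on every computer.

```r
library(ggplot2)

labeled_plot <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  labs(x = "Sepal Length (cm)", y = "Sepal Width (cm)", color = "Iris Species",
       title = "Sepal Dimensions", subtitle = "By species",
       caption = "Data: built-in iris data set") +
  # Preset theme first, then theme() so the tweaks are not overwritten.
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        plot.margin = margin(1, 1, 1, 1, "cm"))

labeled_plot
```

The order matters: a preset theme added after theme() would replace the custom settings.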

We won’t really use R to add labels to points, but here’s how you do it in case it’s of interest in the future: https://www.geeksforgeeks.org/how-to-add-labels-directly-in-ggplot2-in-r/

Axes

We can change limits quickly with xlim() and ylim().

xlim(c(0, 100))
ylim(c(0, 100))

We can change the axis tick numbers manually on each axis as well. This function depends on whether the axis is discrete or continuous. Typically, it’s continuous. Here, we set the breaks in the axis to 1 through 12, like if we were graphing month on the x axis.

scale_x_continuous(limits = c(1, 12), breaks = 1:12)

To edit y, we would use scale_y_continuous().

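We can add an argument called expand inside scale_x_continuous() or scale_y_continuous(). Setting it to c(0, 0) removes the default padding around the data, so the plot fills the entire frame in the x or y direction.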

expand = c(0, 0)

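To put an axis on a log scale, we can add this argument inside scale_x_continuous() or scale_y_continuous(). Note that "log" is the natural log; use "log10" for base 10.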

trans = "log"
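Here is a sketch of where these axis arguments go. The monthly data frame and its columns (Month, Total) are made up for illustration, and the y axis uses "log10" rather than "log" as an example choice.

```r
library(ggplot2)

# Hypothetical monthly totals.
monthly <- data.frame(Month = 1:12,
                      Total = c(3, 5, 8, 13, 21, 34, 55, 34, 21, 13, 8, 5))

axis_plot <- ggplot(monthly) +
  geom_line(aes(x = Month, y = Total)) +
  # Ticks at every month, and no padding to the left or right of the line.
  scale_x_continuous(limits = c(1, 12), breaks = 1:12, expand = c(0, 0)) +
  # Base-10 log scale on the y axis.
  scale_y_continuous(trans = "log10")

axis_plot
```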

Colors

There are a ton of different ways to change colors, which means it can be frustrating. You need to know a few key pieces of information before you start. First, what colors are you aiming for? Second, is the thing you’re trying to color a single entity (e.g. turn a line red, where the line is the only thing on the graph), or is it mapped to a variable (e.g. color code a line graph by Species)?

Next, you need to know if what you’re coloring is discrete or continuous. An example of a discrete variable is Species - each species is a single color. Continuous is the opposite, something like temperature, which would require a color ramp.

The functions we use to set colors depend upon what we’re coloring, and whether it’s discrete or continuous.

Note that R calls outlines “color” and filled colors “fill”.

A Single Entity (color doesn’t depend upon a variable)

For example, turn an entire bar graph - the outline and fill - green. Note that since color doesn’t depend on a variable, it goes outside the aes() function.

geom_col(aes(x = XVariableName, y = YVariableName), show.legend = FALSE, color = "green", fill = "green")

Variable-based Colors

Note that whatever we are coloring has to be mapped to a variable in an aes() function somewhere. That can be in a line plot to have lines of different colors, each representing a different level of a variable (one line per species, for example). Or it could be a grouped bar graph where different colors represent different levels of a variable.

Color can also be mapped to a continuous variable, like on a gradient. That might be in a ridgeline plot where color changes as the x axis variable changes.

Color and fill are changed the same way, with slightly different functions: starting with scale_fill_ or scale_color_.
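The difference between a single color and a variable-based color comes down to whether the argument sits outside or inside aes(). A minimal sketch, assuming the built-in iris data frame:

```r
library(ggplot2)

# Single entity: color is outside aes(), so every point is green.
constant_color <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width), color = "green")

# Variable-based: color is inside aes(), so each species gets its own color.
mapped_color <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species))
```

Only the second plot gets a legend, and only the second plot responds to scale_color_ functions.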

Discrete Colors

We can change colors on a discrete scale manually, or with a preset selection.

For example, let’s say that we had three bars on a graph and we wanted to fill each one with a different shade of green. In other words, fill was mapped to a categorical variable that had three levels. We could use:

scale_fill_manual(values = c("palegreen3","palegreen4","darkgreen"))
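A sketch of the manual fill on a three-bar graph follows. The data frame and its columns (Group, Value) are hypothetical. The named vector is an addition to the line above: without names, colors are assigned in the order of the variable's levels (alphabetical by default), and naming each color pins it to a specific level.

```r
library(ggplot2)

# Hypothetical data: one bar per level of Group.
fill_data <- data.frame(Group = c("Low", "Medium", "High"), Value = c(2, 5, 9))

manual_fill_plot <- ggplot(fill_data) +
  geom_col(aes(x = Group, y = Value, fill = Group), show.legend = FALSE) +
  # Names match the levels of Group, so the darkest green goes to "High".
  scale_fill_manual(values = c(Low = "palegreen3", Medium = "palegreen4", High = "darkgreen"))

manual_fill_plot
```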

The strings that you put in the values vector are R’s built-in color names, which ggplot understands: http://sape.inf.usi.ch/quick-reference/ggplot2/colour

We can use the viridis color ramps as well, and map them to discrete variables using the discrete = TRUE argument. scale_fill_viridis() comes from the viridis package, so load it with library(viridis) first.

scale_fill_viridis(discrete = TRUE, option = "magma")

We can change the direction of the ramp by specifying direction = 1 or direction = -1.

Another option you can use to generate bins of colors is here: https://www.learnui.design/tools/data-color-picker.html#palette

You can specify these hex codes in place of the color name strings.

We can also use the brewer color palettes: https://ggplot2.tidyverse.org/reference/scale_brewer.html

scale_fill_brewer(palette = "Greens")

Continuous Colors

Continuous colors are mapped to a color ramp. You can create your own color ramp, use a viridis ramp, or a number of other options. There are many, many functions out there to specify a color ramp. We’ll do manual and viridis because those are widely applicable.

To manually specify a ramp we can use scale_color_gradient(), or scale_fill_gradient() for fills. We specify the low and high end of the color ramp.

scale_color_gradient(low = "blue", high = "red")

We can also add mid = "white" or another color to specify a middle point, with a slightly tweaked function.

scale_color_gradient2(low = "blue", mid = "white", high = "red")
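A sketch of the three-color ramp on made-up data follows. The anomaly data frame and its columns are hypothetical. The midpoint argument is not shown in the line above; it sets which data value gets the mid color, and it defaults to 0.

```r
library(ggplot2)

# Hypothetical anomalies centered on zero.
anomaly <- data.frame(Year = 2001:2010,
                      Anomaly = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5))

gradient_plot <- ggplot(anomaly) +
  geom_point(aes(x = Year, y = Anomaly, color = Anomaly), size = 4) +
  # Negative values shade toward blue, positive toward red, zero is white.
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)

gradient_plot
```

This kind of ramp suits data with a meaningful center, such as differences from an average.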

Typically to produce a great graphic that is accessible but doesn’t have to be branded to a specific color, we will use preset color ramps in R. The viridis color ramps are the ones I like best, but there are others. Here are some examples of how to do this.

Viridis ramps: https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html

scale_fill_viridis(option = "magma")
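scale_fill_viridis() comes from the viridis package. ggplot2 itself also includes scale_fill_viridis_c() for continuous variables and scale_fill_viridis_d() for discrete ones. The sketch below uses the built-in version so that only ggplot2 needs to be loaded; it assumes the faithfuld data frame that ships with ggplot2.

```r
library(ggplot2)

# faithfuld holds a precomputed 2D density on a grid of waiting and eruptions values.
viridis_plot <- ggplot(faithfuld) +
  geom_raster(aes(x = waiting, y = eruptions, fill = density)) +
  scale_fill_viridis_c(option = "magma")

viridis_plot
```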

Brewer color palettes: https://ggplot2.tidyverse.org/reference/scale_brewer.html

scale_fill_distiller(palette = "Spectral")

Finally, we can use the scico palettes: https://www.data-imaginist.com/2018/scico-and-the-colour-conundrum/

scale_fill_scico(palette = 'turku')