Ryan Clement, Data Services Librarian: go/ryan/
Wendy Shook, Science Data Librarian: go/wshook/
Jonathan Kemp, Telescope & Scientific Computing Specialist: go/jkemp/
June 27, 2022 1:00 - 3:30 PM EDT
ggplot2 basics
NOTE: We’ll have a break about halfway through!
library(tidyverse)  # provides read_csv(), ggplot2, and the %>% pipe

interviews_plotting <-
  read_csv("https://raw.githubusercontent.com/rkclement/2021-summer-data-workshops/main/data_output/interviews_plotting.csv")
“The layered grammar of graphics approach is implemented in ggplot2, a widely used graphics library for R. All graphics in this library are built using a layered approach, building layers up to create the final graphic.”
— Benjamin Soltoff, “Computing for the Social Sciences,” University of Chicago, emphasis added.
“A grammar may … help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence.”
— Hadley Wickham. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19(1) (2010): 3–28. emphasis added.
ggplot2 graphic
aesthetics (aes): color, fill, shape, and alpha
geometries (geom): bar (geom_bar), scatterplot (geom_point), line (geom_line), etc.
<DATA> %>%
ggplot(aes(<MAPPINGS>)) +
<GEOM_FUNCTION>()
+ for adding layers
## This is the correct syntax for adding layers
interviews_plot +
geom_point()
## This will not add the new layer and will return an error message
interviews_plot
+ geom_point()
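To make the template and the + rule above concrete, here is a minimal sketch. It assumes a small invented data frame, toy_interviews, standing in for interviews_plotting: the column names come from the workshop data, but the values and the village labels are made up for illustration.

```r
library(ggplot2)
library(dplyr)  # re-exports the %>% pipe

# Invented stand-in for interviews_plotting (illustrative values only)
toy_interviews <- data.frame(
  village   = c("A", "A", "B", "B", "C", "C"),
  rooms     = c(1, 3, 2, 1, 4, 2),
  liv_count = c(1, 3, 2, 4, 2, 1)
)

# <DATA> %>% ggplot(aes(<MAPPINGS>)): data plus aesthetic mappings, no geom yet
toy_base <- toy_interviews %>%
  ggplot(aes(x = liv_count, y = rooms, color = village))

# The + sits at the END of the line, so R knows the expression continues
toy_scatter <- toy_base +
  geom_point()

toy_scatter
```

Printing toy_base on its own would draw empty axes, because no geom layer has been added yet.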
Use what you just learned to create a scatter plot of rooms by village with the respondent_wall_type showing in different colors. Does this seem like a good way to display the relationship between these variables? What other kinds of plots might you use to show this type of data?
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
Boxplots are useful summaries, but hide the shape of the distribution. For example, if the distribution is bimodal, we would not see it in a boxplot. An alternative to the boxplot is the violin plot, where the shape (of the density of points) is drawn.
Replace the boxplot with a violin plot; see geom_violin().
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
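One possible way to draw a violin plot is sketched below. The data frame toy_interviews and its values are invented for illustration (only the column names respondent_wall_type and rooms come from the workshop data), and the jitter settings are arbitrary choices rather than recommended values.

```r
library(ggplot2)

# Invented stand-in for interviews_plotting (illustrative values only)
toy_interviews <- data.frame(
  respondent_wall_type = rep(c("type_1", "type_2", "type_3"), each = 5),
  rooms = c(1, 2, 2, 3, 5,
            1, 1, 2, 2, 3,
            1, 2, 3, 3, 4)
)

violin_plot <- ggplot(toy_interviews,
                      aes(x = respondent_wall_type, y = rooms)) +
  # The violin outline shows the estimated density of rooms per wall type
  geom_violin(alpha = 0) +
  # Jittered points show the individual observations on top
  geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato")

violin_plot
```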
So far, we’ve looked at the distribution of room number within wall type. Try making a new plot to explore the distribution of another variable within wall type.
Create a boxplot for liv_count for each wall type. Overlay the boxplot layer on a jitter layer to show actual measurements.
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
Add colour to the data points on your boxplot according to whether the respondent is a member of an irrigation association (memb_assoc).
When you’re done, please give a :thumbs-up: or a :green-check: in the Zoom reactions.
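If you get stuck on the two boxplot exercises above, the sketch below shows one possible approach, not the only one. It assumes an invented data frame, toy_interviews, with made-up values; only the column names (respondent_wall_type, liv_count, memb_assoc) come from the workshop data.

```r
library(ggplot2)

# Invented stand-in for interviews_plotting (illustrative values only)
toy_interviews <- data.frame(
  respondent_wall_type = rep(c("type_1", "type_2"), each = 6),
  liv_count  = c(1, 2, 2, 3, 4, 5, 1, 1, 2, 3, 3, 4),
  memb_assoc = rep(c("yes", "no"), times = 6)
)

box_plot <- ggplot(toy_interviews,
                   aes(x = respondent_wall_type, y = liv_count)) +
  # Jitter layer first: points are colored by irrigation association membership
  geom_jitter(aes(color = memb_assoc), alpha = 0.5, width = 0.2, height = 0.2) +
  # Boxplot layer drawn on top; alpha = 0 keeps the boxes transparent
  geom_boxplot(alpha = 0)

box_plot
```

Mapping color inside geom_jitter() rather than in ggplot() colors only the points, so the boxes are still drawn once per wall type.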
Create a bar plot showing the proportion of respondents in each village who are or are not part of an irrigation association (memb_assoc). Include only respondents who answered that question in the calculations and plot. Which village had the lowest proportion of respondents in an irrigation association?
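The bar plot exercise above needs a proportion computed per village before plotting. The sketch below shows one possible way to do that with dplyr. It assumes unanswered questions are stored as NA, and it uses an invented data frame, toy_interviews, whose values and village labels are made up; percent_memb_assoc is just an illustrative name.

```r
library(ggplot2)
library(dplyr)

# Invented stand-in for interviews_plotting (illustrative values only)
toy_interviews <- data.frame(
  village    = rep(c("A", "B", "C"), each = 4),
  memb_assoc = c("yes", "no",  "no",  NA,
                 "yes", "yes", "no",  "no",
                 "yes", "yes", "yes", "no")
)

percent_memb_assoc <- toy_interviews %>%
  filter(!is.na(memb_assoc)) %>%          # keep only respondents who answered
  count(village, memb_assoc) %>%          # n per village and answer
  group_by(village) %>%
  mutate(percent = (n / sum(n)) * 100) %>% # share within each village
  ungroup()

prop_plot <- percent_memb_assoc %>%
  ggplot(aes(x = village, y = percent, fill = memb_assoc)) +
  geom_bar(stat = "identity", position = "dodge")

prop_plot
```

Because the percentages are computed within each village, the bars for a village always add up to 100, which makes villages with different numbers of respondents comparable.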
The labs function takes the following arguments:
title – to produce a plot title
subtitle – to produce a plot subtitle (smaller text placed beneath the title)
caption – a caption for the plot
... – any pair of name and value for aesthetics used in the plot (e.g., x, y, fill, color, size)
ggplot2 themes
In addition to theme_bw(), which changes the plot background to white, ggplot2 comes with several other themes which can be useful to quickly change the look of your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html. theme_minimal() and theme_light() are popular, and theme_void() can be useful as a starting point to create a new hand-crafted theme.
The ggthemes package provides a wide variety of options (including an Excel 2003 theme). The ggplot2 extensions website provides a list of packages that extend the capabilities of ggplot2, including additional themes.
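As a rough illustration of how labs() and a built-in theme combine, here is a minimal sketch. The data frame toy_interviews, its values, and all label text are invented for this example; only the column names come from the workshop data.

```r
library(ggplot2)

# Invented stand-in for interviews_plotting (illustrative values only)
toy_interviews <- data.frame(
  village = c("A", "A", "B", "B", "C", "C"),
  respondent_wall_type = c("type_1", "type_2", "type_1",
                           "type_1", "type_2", "type_2")
)

labelled_plot <- ggplot(toy_interviews,
                        aes(x = village, fill = respondent_wall_type)) +
  geom_bar(position = "dodge") +
  labs(
    title    = "Wall type by village",           # plot title
    subtitle = "Toy data for illustration",      # smaller text under the title
    caption  = "Values are invented",            # caption for the plot
    x        = "Village",                        # name/value pairs for aesthetics
    y        = "Number of respondents",
    fill     = "Wall type"
  ) +
  theme_minimal()  # swap in theme_bw(), theme_light(), or theme_void() to compare

labelled_plot
```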
With all of this information in hand, please take another five minutes to either improve one of the plots generated in this exercise or create a beautiful graph of your own. Use the RStudio ggplot2 cheat sheet for inspiration. Here are some ideas:
After creating your plot, you can save it to a file in your favorite format. The Export tab in the Plot pane in RStudio will save your plots at low resolution, which will not be accepted by many journals and will not scale well for posters.
Instead, use the ggsave() function, which allows you to easily change the dimensions and resolution of your plot by adjusting the appropriate arguments (width, height, and dpi).
Make sure you have the fig_output/ folder in your working directory.
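A minimal sketch of saving a plot with ggsave() follows. The plot, the file name toy_plot.png, and the size and resolution values are all assumptions chosen for illustration; check your journal's or poster's requirements before settling on numbers. The dir.create() call is a convenience added here so the example runs even if fig_output/ does not exist yet.

```r
library(ggplot2)

# Invented data and plot, used only to have something to save
toy_interviews <- data.frame(
  rooms     = c(1, 3, 2, 1, 4, 2),
  liv_count = c(1, 3, 2, 4, 2, 1)
)

my_plot <- ggplot(toy_interviews, aes(x = liv_count, y = rooms)) +
  geom_point()

# Create fig_output/ in the working directory if it is missing
dir.create("fig_output", showWarnings = FALSE)

# width and height are in inches by default; dpi sets the resolution
ggsave("fig_output/toy_plot.png", plot = my_plot,
       width = 6, height = 4, dpi = 300)
```

Passing the plot explicitly with plot = my_plot avoids relying on whichever plot happened to be displayed last.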
To sign up for more sessions: go/summer-data-workshops/
Assessment survey: go/summer-data-assessment/
Ryan’s contact info: go/ryan/