class: center, middle, inverse, title-slide # Creating Beautiful Data Visualizations in R: ## a `ggplot2`
Crash Course ### Samantha Tyner, Ph.D. ### July 21, 2020, 2:00-4:00pm EDT --- <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.6.0/css/all.css" integrity="sha384-aOkxzJ5uQz7WBObEZcHvV5JvRW3TUc2rNPA7pe3AwnsUohiw1Vj2Rgx2KSOkF5+h" crossorigin="anonymous"> # How to watch this webinar For optimum learning: - Have both the webinar software and rstudio.cloud visible at all times * Material available at https://rstudio.cloud/project/1442337 - Follow along with the code (`project/code/03-follow-along.R`) or the slides (`project/slides/slides.Rmd`) and run the code as we go - Use the ask question and chat features to communicate with TAs and ask questions - Have fun! --- # Learning Goals Upon completion of this tutorial, you will be able to: 1. **identify** the appropriate plot types and corresponding `ggplot2` `geom`s to consider when visualizing your data; 2. **implement** the `ggplot2` grammar of graphics by using `ggplot()` and building up plots with the `+` operator; 3. **iterate** through multiple visualizations of your data by changing the aesthetic mappings, geometries, and other graph properties; 4. **incorporate** custom elements (colors, fonts, etc.) into your visualizations by adjusting `ggplot2` theme elements; and 5. **investigate** the world of `ggplot2` independently to expand upon the skills learned in the course. 
--- # Motivating the motivating example Clip from *Brooklyn Nine-Nine*, Season 4, Episode 8: <iframe width="850" height="478" src="https://www.youtube.com/embed/QGxyIQzLeUc?start=0&end=63" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- # Motivating example -- <img src="gifs/monty_hall.gif" width="49%" style="display: block; margin: auto;" /> --- class: inverse, center # `ggplot2` and its Grammar of Graphics <div class="figure" style="text-align: center"> <img src="img/ggplot2_masterpiece.png" alt="Artwork by <a href="https://twitter.com/allison_horst">@allison_horst</a>" width="50%" /> <p class="caption">Artwork by <a href="https://twitter.com/allison_horst">@allison_horst</a></p> </div> --- # What is the grammar of graphics? From a book, [*The Grammar of Graphics*](https://www.springer.com/us/book/9781475731002) by Leland Wilkinson (1999) > **grammar** (noun): (1) the study of the classes of words, their inflections, and their functions and relations in the sentence; (2) the principles or rules of an art, science, or technique > **grammar of graphics** (noun): a set of principles for constructing data visualizations ### grammar of language : sentence :: grammar of graphics : data visualization <i class="fas fa-bullhorn" style="color: #FF0035;"></i> I will try to use "data visualization" instead of "plot", "graph", "chart", or "graphic" because it is a more precise term. 
--- # A simple data viz <img src="img/monty_hall.png" width="40%" style="display: block; margin: auto;" /> --- # Data ↔️ Noun .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ ### Data (Noun) <table> <thead> <tr> <th style="text-align:left;"> Switch </th> <th style="text-align:left;"> Win </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> perc </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> No </td> <td style="text-align:left;"> Lose </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 0.6744186 </td> </tr> <tr> <td style="text-align:left;"> No </td> <td style="text-align:left;"> Win </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 0.3255814 </td> </tr> <tr> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> Lose </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 0.2982456 </td> </tr> <tr> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> Win </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 0.7017544 </td> </tr> </tbody> </table> ] --- # `ggplot2` code ```r mh_sims <- readr::read_csv("dat/monty_hall.csv") library(ggplot2) ggplot(data = mh_sims) ``` <img src="slides_files/figure-html/unnamed-chunk-6-1.png" width="30%" style="display: block; margin: auto;" /> --- # Geom ↔️ Verb .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ ### Geom (Verb) - bar chart a.k.a. column chart - [`geom_col()`](https://ggplot2.tidyverse.org/reference/geom_bar.html) ] --- # In `ggplot2` code We build up a data visualization in `ggplot2` with the `+` operator. 
```r ggplot(data = mh_sims) + geom_col() ``` .preman[ `## Error: geom_col requires the following missing aesthetics: x, y` ] <i class="fas fa-bullhorn" style="color: #FF0035;"></i> The `geom_*()` suite of functions can take many arguments, which vary by the geom type <div class="figure"> <img src="img/geom-basic-2.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic-3.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic-4.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic-5.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic-6.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic-7.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic-8.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /><img src="img/geom-basic.png" alt="Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a>" width="12.5%" /> <p class="caption">Source: <a href="https://ggplot2-book.org/individual-geoms.html">ggplot2 book</a></p> </div> --- # aes mapping ↔️ Pronouns .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ ### aes mapping (pronouns) - x-axis: Outcome (win or lose) - y-axis: % of outcomes in switch group - Fill color: Outcome ] --- # In `ggplot2` code Use the `aes()` function inside `ggplot()`. (Can also use in `geom_col()`.) 
```r ggplot(data = mh_sims, aes(x = Win, y = perc, fill = Win)) + geom_col() ``` <img src="slides_files/figure-html/unnamed-chunk-11-1.png" width="70%" style="display: block; margin: auto;" /> --- # In `ggplot2` code - **Note**: aesthetics / aes values do not have to be connected to data. - To change an aes value for the entire plot, use the aes value *outside* of the `aes()` function. ```r ggplot(data = mh_sims, aes(x = Win, y = perc, fill = Win)) + geom_col(fill = "#4AA0BB") ``` <img src="slides_files/figure-html/unnamed-chunk-12-1.png" width="65%" style="display: block; margin: auto;" /> --- # Stat ↔️ Adverb .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ ### stat (adverb) - **Identity**: The data are not altered in any way <table> <thead> <tr> <th style="text-align:left;"> Switch </th> <th style="text-align:left;"> Win </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> perc </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> No </td> <td style="text-align:left;"> Lose </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 0.6744186 </td> </tr> <tr> <td style="text-align:left;"> No </td> <td style="text-align:left;"> Win </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 0.3255814 </td> </tr> <tr> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> Lose </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 0.2982456 </td> </tr> <tr> <td style="text-align:left;"> Yes </td> <td style="text-align:left;"> Win </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 0.7017544 </td> </tr> </tbody> </table> ] --- # In `ggplot2` code ```r ggplot(data = mh_sims, aes(x = Win, y = perc, fill = Win)) + * stat_identity(geom="col") ``` <img src="slides_files/figure-html/unnamed-chunk-15-1.png" width="70%" style="display: block; margin: 
auto;" /> --- # Theme ↔️ Adjective .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ ### Theme (adjectives) - white background - no gridlines - text font, size & face ] --- # In `ggplot2` code The `theme()` function can modify any non-data element of the plot. ```r p <- ggplot(data = mh_sims, aes(x = Win, y = perc, fill = Win)) + geom_col() ``` <i class="fas fa-bullhorn" style="color: #FF0035;"></i> This is a common trick. Create an object `p` that is the `ggplot` to add to later on. --- # In `ggplot2` code Can use pre-made `theme_*()` functions and the `theme()` function to alter any non-data element of the plot. ```r p2 <- p + theme_bw() + theme(text = element_text(family = "serif", size = 20), panel.grid = element_blank()) ``` <div class="figure"> <img src="img/built-in-1.png" alt="Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a>" width="16.65%" /><img src="img/built-in-2.png" alt="Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a>" width="16.65%" /><img src="img/built-in-3.png" alt="Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a>" width="16.65%" /><img src="img/built-in-4.png" alt="Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a>" width="16.65%" /><img src="img/built-in-5.png" alt="Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a>" width="16.65%" /><img src="img/built-in-6.png" alt="Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a>" width="16.65%" /> <p class="caption">Source: <a href="https://ggplot2-book.org/polishing.html">ggplot2 book</a></p> </div> --- # In `ggplot2` code ```r p2 ``` <img src="slides_files/figure-html/unnamed-chunk-20-1.png" width="80%" style="display: block; margin: auto;" /> --- # Guides ↔️ Prepositions .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" 
style="display: block; margin: auto;" /> ] .pull-right[ ### guide (preposition) - There is no guide/legend. - Not needed here, because color and x-axis are the same. ] --- # In `ggplot2` code The `guides()` function controls all legends by connecting to the aes. ```r p3 <- p2 + guides(fill = "none") p3 ``` <img src="slides_files/figure-html/unnamed-chunk-22-1.png" width="70%" style="display: block; margin: auto;" /> --- # Facets ↔️ Conjunctions .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ ### Facets (conjunction) - Facet by the Switch variable ] --- # `ggplot2` code ```r p4 <- p3 + facet_grid(cols = vars(Switch)) p4 ``` <img src="slides_files/figure-html/unnamed-chunk-24-1.png" width="70%" style="display: block; margin: auto;" /> --- # Other grammar elements ### **Scales**: control how data are translated to visual properties (sentence structure) ### **Coordinate system**: how data are positioned in a 2D data visualization (verb tense) ### **Position**: How to deal with overlap, if any (word order) - "native speakers" don't have to think about these too much - `ggplot2` has smart defaults here, less work for you - Largely up to individual taste/style --- # Example .pull-left[ ### Data Visualization (Sentence) <img src="img/monty_hall.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ #### Scales, coordinate systems, positions (sentence structure, verb tense, word order) - Scale: y-axis labeled every 10% - Scale: color of bars - Scale: axes & plot titles - Coordinate system: cartesian (automatic) - Position: no position shift (identity) ] --- # `ggplot2` code There are no overlapping elements, so no position adjustment is needed.
```r p4 + scale_y_continuous(name = NULL, breaks = seq(0, .7, by = .1), labels = scales::label_percent(accuracy = 1)) + scale_fill_manual(values = c("#F6C40C", "#4AA0BB")) + labs(x = NULL, title = "Let's Make a Deal!") ``` --- # Final Result (for now) <img src="slides_files/figure-html/unnamed-chunk-27-1.png" width="50%" style="display: block; margin: auto;" /> --- # Complete `ggplot2` code ```r ggplot(data = mh_sims, aes(x = Win, y = perc, fill = Win)) + geom_col() + facet_grid(cols = vars(Switch)) + scale_y_continuous(name = NULL, breaks = seq(0, .7, by = .1), labels = scales::label_percent(accuracy = 1)) + scale_fill_manual(values = c("#F6C40C", "#4AA0BB")) + labs(x = NULL, title = "Let's Make a Deal!") + theme_bw() + theme(text = element_text(family = "serif", size = 20), panel.grid = element_blank()) + guides(fill = "none") ``` --- # Complete `ggplot2` code ```r *ggplot(data = mh_sims, * aes(x = Win, y = perc, fill = Win)) + * geom_col() + * facet_grid(cols = vars(Switch)) + scale_y_continuous(name = NULL, breaks = seq(0, .7, by = .1), labels = scales::label_percent(accuracy = 1)) + scale_fill_manual(values = c("#F6C40C", "#4AA0BB")) + labs(x = NULL, title = "Let's Make a Deal!") + theme_bw() + theme(text = element_text(family = "serif", size = 20), panel.grid = element_blank()) + guides(fill = "none") ``` --- # What's next? More and more detail on the `ggplot2` universe: - `geom`s for one- and two-variable visualization - Including three or more variables in the visualization * Colors, sizes, shapes, linetypes * Grouping & faceting * Maps --- class: inverse, center # One-variable visualization <div class="figure" style="text-align: center"> <img src="img/tradition.png" alt="<a href='https://xkcd.com/988/'>xkcd.com/988/</a>" width="60%" /> <p class="caption"><a href='https://xkcd.com/988/'>xkcd.com/988/</a></p> </div> --- # Histogram > A **histogram** approximates the distribution of a single numeric variable. It shows the frequency of values in specified ranges.
### **`geom_histogram`** - requires the `x` aesthetic inside `aes()` - Specify width of bars with the `bins` or `binwidth` argument - Can change appearance of the bars with `color`, `fill`, `alpha` arguments --- # Your Turn
Complete the code below to recreate the histogram of weekly NFL attendance. (Source: [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-04/readme.md)) ```r library(readr) attendance <- read_csv("dat/nfl_attendance.csv") ggplot(data = ???, aes(x = ???)) + geom_???() ``` <img src="slides_files/figure-html/unnamed-chunk-32-1.png" width="75%" style="display: block; margin: auto;" /> --- # Bar chart > A **bar chart** displays counts of a categorical variable, and is the categorical equivalent of the histogram. ### **`geom_bar`** - requires the `x` aesthetic inside `aes()` - Can change appearance of the bars with `color`, `fill`, `alpha` arguments --- # Your Turn
Complete the code below to display a bar chart of total NFL attendance from 2000-2019 by team. ```r ggplot(??? = attendance, ???(??? = ???, weight = weekly_attendance)) + geom_bar() + coord_flip() ``` <img src="slides_files/figure-html/unnamed-chunk-34-1.png" width="70%" style="display: block; margin: auto;" /> --- # Density plot > A **density** estimate is a smoothed version of the histogram which is especially useful if the data come from a continuous distribution. ### **`geom_density`** - requires the `x` aesthetic inside `aes()` - change the underlying kernel smoother with the `kernel` parameter - change bandwidth with the `adjust` parameter - Can change appearance of the density curve with `color`, `fill`, `alpha` arguments --- # Example ```r ggplot(data = attendance, aes(x = weekly_attendance)) + geom_density() ``` <img src="slides_files/figure-html/unnamed-chunk-35-1.png" width="75%" style="display: block; margin: auto;" /> --- # Other one-variable viz - `geom_freqpoly`: behaves the same as `geom_histogram` but with connected lines instead of bars - `geom_dotplot`: show values in bins as individual dots - `geom_rug`: place lines along an axis for each observation .pull-left[ `geom_freqpoly` + `geom_rug`: <img src="slides_files/figure-html/unnamed-chunk-36-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ `geom_dotplot`: <img src="slides_files/figure-html/unnamed-chunk-37-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: inverse, center # Two-variable visualization <div class="figure" style="text-align: center"> <img src="img/xkcd_grapefruit.png" alt="<a href='https://xkcd.com/388/'>xkcd.com/388</a>" width="50%" /> <p class="caption"><a href='https://xkcd.com/388/'>xkcd.com/388</a></p> </div> --- # Which variables? <img src="img/two-vars.png" width="40%" style="display: block; margin: auto;" /> --- # Two numeric variables ### `time`, `weekly_attendance` Using data from The DC Team.
```r library(dplyr) dc <- attendance %>% filter(team_name == "The DC Team") ``` ### **`geom_line`** - requires the `x, y` aesthetics inside `aes()` - Can change appearance of the lines with `color`, `linetype`, `alpha` arguments - The `group` `aes` draws lines according to a grouping variable (later) --- # Your Turn
Complete the code to make the time series chart of weekly attendance from 2000-2019 for the DC Team. ```r ???(data = ???, aes(x = ???, y = ???)) + geom_???() ``` <img src="slides_files/figure-html/unnamed-chunk-42-1.png" width="75%" style="display: block; margin: auto;" /> --- # One numeric, one categorical ### `team`, `weekly_attendance` Return to the full `attendance` dataset. ### **`geom_boxplot`** - requires the `x,y` aesthetics inside `aes()` - Can change appearance of the boxes with `color`, `fill`, `alpha` arguments --- # Your Turn
Modify the code below to create box plots for weekly attendance by team name. ```r ggplot(data = ???, aes( ??? , ???)) + geom_???() + coord_flip() ``` <img src="slides_files/figure-html/unnamed-chunk-44-1.png" width="75%" style="display: block; margin: auto;" /> --- # Two categorical variables ### `year`, `week` ```r ggplot(data = dc, aes(x = year, y = week, fill = weekly_attendance)) + geom_tile(color = "grey40") + scale_x_continuous(breaks = 2000:2019) + scale_y_continuous(breaks = 1:17) + scale_fill_gradient(low = "white", high = "forestgreen") ``` <img src="slides_files/figure-html/unnamed-chunk-45-1.png" width="72%" style="display: block; margin: auto;" /> --- class: inverse, center # Three or more variable visualization <div class="figure" style="text-align: center"> <img src="img/xkcd-3d.png" alt="<a href='https://xkcd.com/880/'>xkcd.com/880</a>" width="40%" /> <p class="caption"><a href='https://xkcd.com/880/'>xkcd.com/880</a></p> </div> --- # The limits of `ggplot2` #### `ggplot2` does not do 3D visualization - (there are ways...but you won't see them here.) #### `ggplot2` is not interactive - See the `plotly` package and its [`ggplotly()`](https://plotly-r.com/overview.html#intro-ggplotly) function. ### But, there are plenty of other ways to make informative graphics with `ggplot2` --- # Additional `aes()` mappings Add a third variable to a plot with: <i class="fas fa-rainbow"></i> **color** : `color, fill` <i class="fas fa-expand-arrows-alt"></i> **size**: `size, stroke` <i class="fas fa-shapes"></i> **shape**: `shape, linetype` <i class="fas fa-water"></i> **contour lines**: `geom_contour(), geom_density_2d(), z` <i class="fas fa-chart-bar"></i> **bins**: `geom_bin2d(), geom_hex(), geom_tile()` --- # Choosing a mapping Image from Garrett Grolemund's [2019 JSM tidyverse tutorial](https://rstudio.cloud/project/385945). 
<img src="img/garrett-aes-vars.PNG" width="50%" style="display: block; margin: auto;" /> --- # Add a discrete/categorical variable New York has two teams: the Giants and the Jets. ```r ny <- attendance %>% filter(team == "New York") glimpse(ny) ``` ``` ## Rows: 680 ## Columns: 11 ## $ team <chr> "New York", "New York", "New York", "New York", "Ne… ## $ team_name <chr> "Giants", "Giants", "Giants", "Giants", "Giants", "… ## $ year <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 200… ## $ total <dbl> 1135455, 1135455, 1135455, 1135455, 1135455, 113545… ## $ home <dbl> 624085, 624085, 624085, 624085, 624085, 624085, 624… ## $ away <dbl> 511370, 511370, 511370, 511370, 511370, 511370, 511… ## $ week <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, … ## $ weekly_attendance <dbl> 77434, 65530, 66944, 78216, 68341, 50947, 78189, NA… ## $ time <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, … ## $ conference <chr> "NFC", "NFC", "NFC", "NFC", "NFC", "NFC", "NFC", "N… ## $ division <chr> "East", "East", "East", "East", "East", "East", "Ea… ``` --- # Two time series, one plot Use color to indicate team ```r *ggplot(data = ny, aes(x = time, y = weekly_attendance, color = team_name)) + geom_line(alpha = .7) ``` <img src="slides_files/figure-html/unnamed-chunk-49-1.png" width="80%" style="display: block; margin: auto;" /> --- # Two time series, one plot Use linetype to indicate team ```r *ggplot(data = ny, aes(x = time, y = weekly_attendance, linetype = team_name)) + geom_line(alpha = .7) ``` <img src="slides_files/figure-html/unnamed-chunk-50-1.png" width="80%" style="display: block; margin: auto;" /> --- # Two time series, one plot Use size to indicate team. <i class="fas fa-bullhorn" style="color: #FF0035;"></i> Note: I do not recommend this! Why?
```r *ggplot(data = ny, aes(x = time, y = weekly_attendance, size =team_name)) + geom_line(alpha = .7) ``` <img src="slides_files/figure-html/unnamed-chunk-51-1.png" width="80%" style="display: block; margin: auto;" /> --- # Adding a continuous/numeric variable Using the `games` data from [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-04/readme.md): ```r games <- read_csv("dat/nfl_games.csv") glimpse(games) ``` ``` ## Rows: 5,324 ## Columns: 20 ## $ year <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, … ## $ week <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",… ## $ home_team <chr> "Minnesota Vikings", "Kansas City Team", "Washington T… ## $ away_team <chr> "Chicago Bears", "Indianapolis Colts", "Carolina Panth… ## $ winner <chr> "Minnesota Vikings", "Indianapolis Colts", "Washington… ## $ tie <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… ## $ day <chr> "Sun", "Sun", "Sun", "Sun", "Sun", "Sun", "Sun", "Sun"… ## $ date <chr> "September 3", "September 3", "September 3", "Septembe… ## $ time <time> 13:00:00, 13:00:00, 13:01:00, 13:02:00, 13:02:00, 13:… ## $ pts_win <dbl> 30, 27, 20, 36, 16, 27, 21, 14, 21, 41, 9, 23, 20, 16,… ## $ pts_loss <dbl> 27, 14, 17, 28, 0, 7, 16, 10, 16, 14, 6, 0, 16, 13, 36… ## $ yds_win <dbl> 374, 386, 396, 359, 336, 398, 296, 187, 395, 425, 233,… ## $ turnovers_win <dbl> 1, 2, 0, 1, 0, 0, 1, 2, 2, 3, 0, 1, 1, 0, 3, 4, 1, 0, … ## $ yds_loss <dbl> 425, 280, 236, 339, 223, 249, 278, 252, 355, 167, 255,… ## $ turnovers_loss <dbl> 1, 1, 1, 1, 1, 1, 1, 3, 4, 2, 4, 6, 2, 1, 0, 1, 3, 3, … ## $ home_team_name <chr> "Vikings", "The KC Team", "The DC Team", "Falcons", "S… ## $ home_team_city <chr> "Minnesota", "Kansas City", "Washington", "Atlanta", "… ## $ away_team_name <chr> "Bears", "Colts", "Panthers", "49ers", "Ravens", "Jagu… ## $ away_team_city <chr> "Chicago", "Indianapolis", "Carolina", "San Francisco"… ## $ pt_spread <dbl> 3, 13, 3, 8, 16, 20, 5, 4, 5, 27, 
3, 23, 4, 3, 5, 6, 1… ``` --- # Size ```r *ggplot(data = games, aes(x = yds_loss, y = yds_win, size = pt_spread)) + geom_point(alpha = .1) ``` <img src="slides_files/figure-html/unnamed-chunk-53-1.png" width="80%" style="display: block; margin: auto;" /> --- # Color ```r *ggplot(data = games, aes(x = yds_loss, y = yds_win, color = pt_spread)) + geom_point(alpha = .3) ``` <img src="slides_files/figure-html/unnamed-chunk-54-1.png" width="80%" style="display: block; margin: auto;" /> --- # Grouping > The `group` aesthetic partitions the data for plotting into groups ```r ny19 <- ny %>% filter(year == 2019) ggplot(data = ny19, aes(x = time, y = weekly_attendance)) + geom_line(alpha = .7) ggplot(data = ny19, aes(x = time, y = weekly_attendance, group =team_name)) + geom_line(alpha = .5) ``` <img src="slides_files/figure-html/unnamed-chunk-55-1.png" width="50%" /><img src="slides_files/figure-html/unnamed-chunk-55-2.png" width="50%" /> --- # Many groups Look at all attendance for all teams ```r ggplot(data = attendance, aes(x = time, y = weekly_attendance, group =team_name))+ geom_line(alpha = .4) ``` <img src="slides_files/figure-html/unnamed-chunk-56-1.png" width="80%" style="display: block; margin: auto;" /> --- # Add another variable ```r ggplot(data = attendance, aes(x = time, y = weekly_attendance, group =team_name, color = division))+ geom_line(alpha = .4) ``` <img src="slides_files/figure-html/unnamed-chunk-57-1.png" width="80%" style="display: block; margin: auto;" /> --- # Facets > Faceting generates small multiples, each displaying a different subset of the data. Facets are an alternative to aesthetics for displaying additional discrete variables. 
- plot subgroups separately - can be arranged by rows, columns, or both - can be "wrapped" for many subgroups - great for exploratory analyses --- # Faceting functions .pull-left[ #### `facet_grid()` - create a grid of graphs, by rows and columns - use `vars()` to call on the variables - adjust scales with `scales = "free"` #### `facet_wrap()` - create small multiples by "wrapping" a series of plots - use `vars()` to call on the variables - `nrow` and `ncol` arguments for dictating shape of grid ] .pull-right[ ```r p <- ggplot(data = attendance, aes(x = time, y = weekly_attendance, group =team_name, color = division))+ geom_line(alpha = .4) p ``` <img src="slides_files/figure-html/unnamed-chunk-58-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Faceting example .pull-left[ ```r p + facet_grid(cols = vars(conference)) ``` <img src="slides_files/figure-html/unnamed-chunk-59-1.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ ```r p + facet_wrap(vars(conference), nrow = 2) ``` <img src="slides_files/figure-html/unnamed-chunk-60-1.png" width="80%" style="display: block; margin: auto;" /> ] --- class: inverse, center, middle # Part 2: Advanced customization <div class="figure" style="text-align: center"> <img src="img/djnavarro.png" alt="Source <a href='https://twitter.com/djnavarro/status/1243769699055239170?s=20'>@djnavarro</a>" width="80%" /> <p class="caption">Source <a href='https://twitter.com/djnavarro/status/1243769699055239170?s=20'>@djnavarro</a></p> </div> --- class: inverse, center # Combining layers <div class="figure" style="text-align: center"> <img src="img/layers.png" alt="Image from <a href='https://skillgaze.com/2017/10/31/understanding-different-visualization-layers-of-ggplot/'>skillgaze.com</a>" width="60%" /> <p class="caption">Image from <a href='https://skillgaze.com/2017/10/31/understanding-different-visualization-layers-of-ggplot/'>skillgaze.com</a></p> </div> --- # Using the same data Add another 
layer by adding a different geom ```r ggplot(data = games, aes(x = yds_loss, y = yds_win)) + geom_point(aes(size = pt_spread), alpha = .1) + * geom_smooth(se = FALSE, method = "lm") ``` <img src="slides_files/figure-html/unnamed-chunk-63-1.png" width="70%" style="display: block; margin: auto;" /> --- # Your Turn
<img src="slides_files/figure-html/unnamed-chunk-64-1.png" width="80%" style="display: block; margin: auto;" /> --- # Using different data ```r ggplot() + geom_line(data = attendance, aes(x = time, y = weekly_attendance, group = team_name), alpha = .1) + *geom_line(data = dc, aes(x = time, y = weekly_attendance), alpha = .7, color = "blue") ``` <img src="slides_files/figure-html/unnamed-chunk-65-1.png" width="80%" style="display: block; margin: auto;" /> --- class: inverse, center # Graph appearance <div class="figure" style="text-align: center"> <img src="img/glamourgraphics.png" alt="Image by <a href='https://twitter.com/w_r_chase/status/1155212225621221376'>@w_r_chase</a>" width="30%" /> <p class="caption">Image by <a href='https://twitter.com/w_r_chase/status/1155212225621221376'>@w_r_chase</a></p> </div> --- # Scales > "Scales control the details of how data values are translated to visual properties." - Every aes value has a corresponding family of scales functions - Of the form `scale_{aes}_*()`, e.g. 
`scale_x_continuous()` - Values of the * depend on the aes - Possible scale function arguments: * `name`: label of the axis/legend * `breaks`: numeric positions of breaks on axes/legends * `labels`: labels of the breaks on axes/legends * `limits`: continuous axis limits * `expand`: padding around data * `na.value`: what to do with missings * `trans`: continuous transformations of data * `guide`: function to create guide/legend * `date_breaks`: breaks for date variables --- # Scales for axes .pull-left[ `scale_x_*()`, `scale_y_*()` - continuous - discrete - binned - log10 - sqrt - date - datetime - reverse ] .pull-right[ ```r ggplot(dc, aes(x = time, y = weekly_attendance)) + geom_line() + scale_x_continuous(breaks = seq(1, 340, by = 17), labels = 2000:2019, name = "Year") ``` <img src="slides_files/figure-html/unnamed-chunk-67-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Scales for color .pull-left[ `scale_color_*()`, `scale_fill_*()` - manual - continuous - brewer/distiller/fermenter - gradient/gradient2/gradientn - steps - viridis ] .pull-right[ ```r ggplot(ny, aes(x = time, y = weekly_attendance, color = team_name)) + geom_line(alpha = .7) + scale_color_manual(name = NULL, values = c("navy", "forestgreen")) ``` <img src="slides_files/figure-html/unnamed-chunk-68-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Scales for size .pull-left[ `scale_*()` - `size` - `radius` - `size_binned` - `size_area` - `size_binned_area` ] .pull-right[ ```r ggplot(data=games, aes(x = yds_loss, y = yds_win, size = pt_spread)) + geom_point(alpha = .1) + scale_size_binned("Point Spread", n.breaks = 5) ``` <img src="slides_files/figure-html/unnamed-chunk-69-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Other scale functions - `scale_alpha_*()`: for mapping a variable to transparency - `scale_linetype_*()`: for mapping a variable to linetype (`geom_line`) - `scale_shape_*()`: for mapping a variable to shape 
(`geom_point`) <i class="fas fa-bullhorn" style="color: #FF0035;"></i> Note that linetype and shape have relatively few possible values by default, and are best for variables with only a few levels. <img src="https://ggplot2.tidyverse.org/reference/scale_shape-6.png" width="50%" /><img src="https://ggplot2.tidyverse.org/reference/scale_linetype-3.png" width="50%" /> ```r ggplot(data = mpg, aes(x = cty, y = hwy, shape = class, alpha = cyl)) + geom_jitter(color = "red") + theme(legend.position = "top") + scale_shape_manual(values = c("◐", '◑', '◒' ,'◓', '◔','◕','◖')) ``` --- # Labels Labels are also scale elements - `ggtitle(main, subtitle)`: plot title & subtitle - `xlab()`, `ylab()`: axes titles - `labs()`: all of the above plus captions, tags, and other aes values * e.g. `color = "My variable"` names the legend "My variable" * `title`: plot title * `subtitle`: plot subtitle * `caption`: text for bottom right corner of plot area e.g. to ID data source * `tag`: text for top left corner of plot area, e.g. A, B, C when combining many plots together --- # Coordinates The `coord_*()` family of functions dictates how position aesthetics (e.g. `x`, `y`) are drawn: - Controls the "canvas" the data are "drawn" on - Especially useful for maps Function examples: - `coord_cartesian()`: the default. x, y axes - `coord_polar()`: by default, x becomes angle, y becomes radius <i class="fas fa-bullhorn" style="color: #FF0035;"></i> You can apply limits and transformations to axes in scales or coordinates (e.g. `xlim()`, `ylim()`) but using coordinates is probably what you want.
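
A minimal sketch of why the note on limits favors coordinates, using the `mpg` data that ships with `ggplot2` (the range of 10 to 30 is an arbitrary choice for illustration): limits set on a scale discard out-of-range observations before any statistic is computed, while limits set in `coord_cartesian()` only zoom the view.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Scale limits: highway mileage above 30 is replaced by NA before the
# boxplot statistics are computed, so the boxes themselves can change
# (ggplot2 warns about the removed rows when this plot is drawn).
p_scale <- p + scale_y_continuous(limits = c(10, 30))

# Coordinate limits: all data are kept and only the visible window
# changes, so the boxes match the original plot.
p_coord <- p + coord_cartesian(ylim = c(10, 30))
```

Printing `p_scale` and `p_coord` one after the other shows the difference for the classes that contain cars with highway mileage above 30.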
---
# Example

All three visualizations below begin with the same plot code:

```r
ggplot(dc, aes(x = time, y = weekly_attendance)) +
  geom_line() +
```

<img src="slides_files/figure-html/unnamed-chunk-72-1.png" width="33%" /><img src="slides_files/figure-html/unnamed-chunk-72-2.png" width="33%" /><img src="slides_files/figure-html/unnamed-chunk-72-3.png" width="33%" />

---
# Themes

Specific themes:

- `theme_grey()`: default
- `theme_bw()`: white background, gray gridlines
- `theme_classic()`: looks more like base R plots
- `theme_void()`: removes all background elements, all axes elements, keeps legends

General `theme` function for advanced customization:

- `theme()`
    * adjust the appearance of every "non-data element" of the viz
    * fonts, background, text positioning, legend appearance, facet appearance, etc.
- <i class="fas fa-bullhorn" style="color: #FF0035;"></i> Rule of thumb: when changing an element that shows data, use `aes()` and scales. Otherwise, use themes.

---
# Theme elements

Every theme element is either a line, a rect, or text. See [documentation](https://ggplot2.tidyverse.org/reference/theme.html) for more.

To modify a theme element, use:

- `element_line()`: change lines' appearance (color, linetype, size, etc.)
- `element_rect()`: change rectangles' appearance (fill, border lines, etc.)
- `element_text()`: change text elements' appearance (family, face, color, etc.)
- `element_blank()`: draw nothing. Use to remove a theme element.

<i class="fas fa-bullhorn" style="color: #FF0035;"></i> Note: there are 92 possible arguments used to modify a ggplot theme. Usually, we will only need to call on a handful.
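---
# Example: order of themes matters

A short sketch of how a complete theme such as `theme_bw()` interacts with `theme()` tweaks. Adding a complete theme replaces all theme settings, so element-level changes should come after it. The names `base`, `p_kept`, and `p_lost` are made up for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = cty, y = hwy, color = class)) +
  geom_jitter()

# Complete theme first, then the tweak: the legend moves to the top.
p_kept <- base + theme_bw() + theme(legend.position = "top")

# Tweak first, then the complete theme: theme_bw() overwrites the tweak,
# so the legend falls back to the theme's own position.
p_lost <- base + theme(legend.position = "top") + theme_bw()
```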
---
# Example

```r
mytheme <- theme(legend.position = "top",
                 axis.text = element_text(face = "italic", color = "navy"),
                 plot.background = element_rect(fill = "#a0d1f2"),
                 panel.background = element_blank(),
                 panel.grid = element_line(linetype = "dotdash"))

ggplot(data = mpg) +
  geom_jitter(aes(x = cty, y = hwy, color = class)) +
  mytheme
```

<img src="slides_files/figure-html/unnamed-chunk-73-1.png" width="80%" style="display: block; margin: auto;" />

---
# Legends

The `guides()` family of functions controls legends' appearance

- `guide_colorbar()`: continuous colors
- `guide_legend()`: discrete values (shapes, colors)
- `guide_axis()`: control axis text/spacing, add a secondary axis
- `guide_bins()`: creates "bins" of values in the legend
- `guide_colorsteps()`: makes colorbar discrete

<img src="img/guides_examples.png" width="80%" style="display: block; margin: auto;" />

---
# Example

```r
ggplot(data = mpg) +
* geom_jitter(aes(x = cty, y = hwy, color = class), key_glyph = draw_key_pointrange) +
  mytheme +
* guides(color = guide_legend(nrow = 1))
```

<img src="slides_files/figure-html/unnamed-chunk-75-1.png" width="80%" style="display: block; margin: auto;" />

---
# Fonts

To change fonts in a `ggplot2` viz:

- Use the `element_text()` function inside of `theme()`
    * `family`: font family
    * `face`: bold, italic, bold.italic, plain
    * `color`, `size`, `angle`, etc.
- Include additional fonts with the [`extrafont`](https://github.com/wch/extrafont) package:

```r
library(extrafont)
font_import() # will take 2-3 minutes. Only need to run once
loadfonts()
fonts()
fonttable()
```

<i class="fas fa-bullhorn" style="color: #FF0035;"></i> Restart R after importing the fonts for the first time, and load `extrafont` and run `loadfonts()` BEFORE loading `ggplot2`.
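---
# Example: fonts without importing

If importing fonts is not an option, a rough fallback is to use R's generic font families `"sans"`, `"serif"`, and `"mono"`, which graphics devices map to a font already on the system. This is only a sketch; the object name `p_font` and the title text are illustrative.

```r
library(ggplot2)

# No extrafont step needed: "serif" and "mono" are generic families
# resolved by the graphics device.
p_font <- ggplot(data = mpg) +
  geom_jitter(aes(x = cty, y = hwy, color = class)) +
  labs(title = "City vs. highway mileage") +
  theme(text = element_text(family = "serif"),
        plot.title = element_text(family = "mono", face = "bold"))
```

The exact typeface drawn for each generic family depends on the operating system and graphics device.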
---
# Example

```r
ggplot(data = mpg) +
  geom_jitter(aes(x = cty, y = hwy, color = class)) +
* theme(text = element_text(family = "Peralta"))
```

<img src="img/font-demo.png" width="80%" style="display: block; margin: auto;" />

---
# Design principles

.pull-left[
Advice on data visualization design from [Will Chase](https://www.williamrchase.com/slides/assets/player/KeynoteDHTMLPlayer.html).

- Left-align text
- Don't make people tilt their heads!
- Remove borders & gridlines
- Directly label in place of legends
- White space is your friend
- Use simple, clear fonts
- Color is hard. (Stick to the available palettes)

Watch Will's talk [here](https://resources.rstudio.com/rstudio-conf-2020/the-glamour-of-graphics-william-chase).
]
.pull-right[
<img src="img/chase-glamour.png" width="80%" style="display: block; margin: auto;" />
]

---
class: inverse, center
# `ggplot2` extensions

<div class="figure" style="text-align: center">
<img src="img/gganimate_fireworks.PNG" alt="Image by <a href="https://twitter.com/allison_horst">@allison_horst</a>" width="60%" />
<p class="caption">Image by <a href="https://twitter.com/allison_horst">@allison_horst</a></p>
</div>

---
# Where to find them

Maintainers of packages can put their `ggplot2` extension on [exts.ggplot2.tidyverse.org/gallery](http://exts.ggplot2.tidyverse.org/gallery/)

<img src="img/ggplot-exts.png" width="80%" style="display: block; margin: auto;" />

---
# Animation

[`gganimate`](https://gganimate.com/): a grammar of animated graphics

<img src="img/gganimate.png" style="position: absolute; top: 5%; right: 5%;">

From the documentation, here are the `gganimate` function families:

- `transition_*()`: defines how the data should be spread out and how it relates to itself across time.
- `view_*()`: defines how the positional scales should change along the animation.
- `shadow_*()`: defines how data from other points in time should be presented in the given point in time.
- `enter_*()/exit_*()`: defines how new data should appear and how old data should disappear during the course of the animation.
- `ease_aes()`: defines how different aesthetics should be eased during transitions.

<i class="fas fa-bullhorn" style="color: #FF0035;"></i> For optimum performance, use the `animate()` and `anim_save()` functions to create and save animations.

---
# Example (data)

Full simulated data:

```
## Rows: 100
## Columns: 9
## $ sim      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ switch   <dbl> 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, …
## $ win      <dbl> 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, …
## $ Switch   <chr> "Yes", "No", "No", "No", "No", "No", "No", "Yes", "Yes", "No…
## $ Win      <chr> "Win", "Lose", "Lose", "Win", "Win", "Lose", "Lose", "Win", …
## $ count    <dbl> 1, 1, 2, 1, 2, 3, 4, 2, 1, 5, 3, 6, 3, 7, 2, 3, 4, 4, 8, 9, …
## $ n_switch <dbl> 1, 1, 2, 3, 4, 5, 6, 2, 3, 7, 8, 9, 4, 10, 5, 6, 7, 11, 12, …
## $ perc     <dbl> 0.02127660, 0.01886792, 0.03773585, 0.01886792, 0.03773585, …
## $ perc2    <dbl> 0.00000000, 0.00000000, 0.01886792, 0.00000000, 0.01886792, …
```

---
# Example (code)

```r
library(gganimate)
ggplot(data = sims) +
* geom_col(aes(x = Win, y = perc, fill = Win, group = seq_along(sim))) +
  facet_grid(cols = vars(Switch), labeller = label_both) +
  scale_fill_manual(values = c("#F6C40C", "#4AA0BB")) +
  theme_bw() +
  theme(legend.position = "none",
        text = element_text(family = "Peralta", size = 20),
        panel.grid = element_blank(),
        strip.background = element_rect(fill = NA),
        plot.title = element_text(hjust = .5),
        plot.background = element_rect(fill = NA),
        panel.background = element_rect(fill = NA)) +
* labs(title = "Let's Make a Deal!", subtitle = "Sim: {frame_along}") +
* transition_reveal(seq_along(sim))
```

---
# Example (animation)

<img src="gifs/monty_hall_simpler.gif" width="50%" style="display: block; margin: auto;" />

---
# Your Turn
Complete the code to recreate the GIF from the motivating example. (GIF on next slide)

```r
ggplot(data = sims) +
  geom_???(aes(xmin = win - .5, xmax = win + .5, ymin = perc2, ymax = perc,
               fill = Win, group = seq_along(sim)), color = "grey40") +
  facet_grid(cols = vars(Switch), labeller = label_both) +
  scale_x_continuous(breaks = c(0,1), labels = ??? ) +
  scale_y_continuous(breaks = ???, labels = scales::label_percent(accuracy = 1)) +
  scale_fill_manual(values = c("#F6C40C", "#4AA0BB")) +
  theme_bw() +
  theme(legend.position = "none",
        text = element_text(family = "Peralta", size = 20),
        panel.grid = element_blank(),
        strip.background = element_rect(fill = NA),
        plot.title = element_text(hjust = .5),
        plot.background = element_rect(fill = NA),
        panel.background = element_rect(fill = NA)) +
  labs(title = "Let's Make a Deal!") +
  transition_???(sim, transition_length = 10, state_length = 5) +
  shadow_???(color = NA)
```

---
# Motivating example (again)

<img src="gifs/monty_hall.gif" width="49%" style="display: block; margin: auto;" />

---
class: inverse, center
# Conclusion

<img src="https://media.giphy.com/media/lD76yTC5zxZPG/giphy.gif" width="50%" style="display: block; margin: auto;" />

---
# Additional resources

- [ggplot2 book](https://ggplot2-book.org/)
- [plotly book](https://plotly-r.com/)
- [R for Data Science book](https://r4ds.had.co.nz/)
- [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday)
- [My advice for getting help in R](https://sctyner.github.io/rhelp.html)
- Thomas Lin Pedersen's ggplot2 webinar: [part 1](https://youtu.be/h29g21z0a68) and [part 2](https://youtu.be/0m4yywqNPVY)
- [RStudio Cheat Sheets](https://github.com/rstudio/cheatsheets)
- [Will Chase's Design Talk](https://rstudio.com/resources/rstudioconf-2020/the-glamour-of-graphics/) at rstudio::conf
- The [4-hour version](https://rstudio.cloud/project/1116791) of this workshop
- More in the `resources/` folder