class: center, middle, inverse, title-slide

.title[
# Data wrangling with tidyverse
]
.subtitle[
## Hannah Owens, adapted from Maria Novosolov
]
.date[
### 07-03-2024
]

---

# Tidyverse is a collection of packages

.center[
<img src="img/tidyverse_core.png" width=50%>
]

---

# The advantages

.large[
- shared syntax & conventions
- tibble/data.frame in, tibble out
- neat code
]

---

# Tidy data

.center[
<img src="img/tidydata_1.jpg" width=90%>
]

.footnote[
img credit: Allison Horst
]

---

|Film                       |Gender |Race   | Words|
|:--------------------------|:------|:------|-----:|
|The Fellowship Of The Ring |Female |Elf    |  1229|
|The Fellowship Of The Ring |Male   |Elf    |   971|
|The Fellowship Of The Ring |Female |Hobbit |    14|
|The Fellowship Of The Ring |Male   |Hobbit |  3644|
|The Fellowship Of The Ring |Female |Man    |     0|
|The Fellowship Of The Ring |Male   |Man    |  1995|
|The Two Towers             |Female |Elf    |   331|
|The Two Towers             |Male   |Elf    |   513|
|The Two Towers             |Female |Hobbit |     0|
|The Two Towers             |Male   |Hobbit |  2463|

---

# Does your code resemble this?

```r
# `starwars` is a dataset that ships with dplyr
starwars_human_subset <- subset(starwars,species == "Human")
starwars_human_subset$bmi <- starwars_human_subset$mass / (0.01 * starwars_human_subset$height)^2
fattest_human_from_each_planet <- aggregate(bmi ~ homeworld,data = starwars_human_subset, FUN = "max")
fattest_human_from_each_planet <- merge( x=fattest_human_from_each_planet, y=starwars_human_subset,by = c("homeworld","bmi"))
fattest_human_from_each_planet <- fattest_human_from_each_planet [,1:5]
```

---

# Code should be pleasant to read

---

# Tibbles

.left-column[
<img src="img/tibble.png">
]

.right-column[
Tibbles are data.frames that are lazy and surly:

- **They do less** (i.e. they don't change variable names or types, and don't do partial matching).
- **They complain more** (e.g. when a variable does not exist).
- This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
] --- # `data.frame` ```r iris ``` ``` ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa ## 11 5.4 3.7 1.5 0.2 setosa ## 12 4.8 3.4 1.6 0.2 setosa ## 13 4.8 3.0 1.4 0.1 setosa ## 14 4.3 3.0 1.1 0.1 setosa ## 15 5.8 4.0 1.2 0.2 setosa ## 16 5.7 4.4 1.5 0.4 setosa ## 17 5.4 3.9 1.3 0.4 setosa ## 18 5.1 3.5 1.4 0.3 setosa ## 19 5.7 3.8 1.7 0.3 setosa ## 20 5.1 3.8 1.5 0.3 setosa ## 21 5.4 3.4 1.7 0.2 setosa ## 22 5.1 3.7 1.5 0.4 setosa ## 23 4.6 3.6 1.0 0.2 setosa ## 24 5.1 3.3 1.7 0.5 setosa ## 25 4.8 3.4 1.9 0.2 setosa ## 26 5.0 3.0 1.6 0.2 setosa ## 27 5.0 3.4 1.6 0.4 setosa ## 28 5.2 3.5 1.5 0.2 setosa ## 29 5.2 3.4 1.4 0.2 setosa ## 30 4.7 3.2 1.6 0.2 setosa ## 31 4.8 3.1 1.6 0.2 setosa ## 32 5.4 3.4 1.5 0.4 setosa ## 33 5.2 4.1 1.5 0.1 setosa ## 34 5.5 4.2 1.4 0.2 setosa ## 35 4.9 3.1 1.5 0.2 setosa ## 36 5.0 3.2 1.2 0.2 setosa ## 37 5.5 3.5 1.3 0.2 setosa ## 38 4.9 3.6 1.4 0.1 setosa ## 39 4.4 3.0 1.3 0.2 setosa ## 40 5.1 3.4 1.5 0.2 setosa ## 41 5.0 3.5 1.3 0.3 setosa ## 42 4.5 2.3 1.3 0.3 setosa ## 43 4.4 3.2 1.3 0.2 setosa ## 44 5.0 3.5 1.6 0.6 setosa ## 45 5.1 3.8 1.9 0.4 setosa ## 46 4.8 3.0 1.4 0.3 setosa ## 47 5.1 3.8 1.6 0.2 setosa ## 48 4.6 3.2 1.4 0.2 setosa ## 49 5.3 3.7 1.5 0.2 setosa ## 50 5.0 3.3 1.4 0.2 setosa ## 51 7.0 3.2 4.7 1.4 versicolor ## 52 6.4 3.2 4.5 1.5 versicolor ## 53 6.9 3.1 4.9 1.5 versicolor ## 54 5.5 2.3 4.0 1.3 versicolor ## 55 6.5 2.8 4.6 1.5 versicolor ## 56 5.7 2.8 4.5 1.3 versicolor ## 57 6.3 3.3 4.7 1.6 versicolor ## 58 4.9 2.4 3.3 1.0 versicolor ## 59 6.6 2.9 4.6 1.3 versicolor ## 60 5.2 2.7 3.9 1.4 versicolor ## 61 5.0 2.0 3.5 1.0 versicolor ## 62 5.9 3.0 4.2 1.5 versicolor ## 63 6.0 2.2 4.0 1.0 versicolor ## 64 6.1 2.9 4.7 1.4 
versicolor ## 65 5.6 2.9 3.6 1.3 versicolor ## 66 6.7 3.1 4.4 1.4 versicolor ## 67 5.6 3.0 4.5 1.5 versicolor ## 68 5.8 2.7 4.1 1.0 versicolor ## 69 6.2 2.2 4.5 1.5 versicolor ## 70 5.6 2.5 3.9 1.1 versicolor ## 71 5.9 3.2 4.8 1.8 versicolor ## 72 6.1 2.8 4.0 1.3 versicolor ## 73 6.3 2.5 4.9 1.5 versicolor ## 74 6.1 2.8 4.7 1.2 versicolor ## 75 6.4 2.9 4.3 1.3 versicolor ## 76 6.6 3.0 4.4 1.4 versicolor ## 77 6.8 2.8 4.8 1.4 versicolor ## 78 6.7 3.0 5.0 1.7 versicolor ## 79 6.0 2.9 4.5 1.5 versicolor ## 80 5.7 2.6 3.5 1.0 versicolor ## 81 5.5 2.4 3.8 1.1 versicolor ## 82 5.5 2.4 3.7 1.0 versicolor ## 83 5.8 2.7 3.9 1.2 versicolor ## 84 6.0 2.7 5.1 1.6 versicolor ## 85 5.4 3.0 4.5 1.5 versicolor ## 86 6.0 3.4 4.5 1.6 versicolor ## 87 6.7 3.1 4.7 1.5 versicolor ## 88 6.3 2.3 4.4 1.3 versicolor ## 89 5.6 3.0 4.1 1.3 versicolor ## 90 5.5 2.5 4.0 1.3 versicolor ## 91 5.5 2.6 4.4 1.2 versicolor ## 92 6.1 3.0 4.6 1.4 versicolor ## 93 5.8 2.6 4.0 1.2 versicolor ## 94 5.0 2.3 3.3 1.0 versicolor ## 95 5.6 2.7 4.2 1.3 versicolor ## 96 5.7 3.0 4.2 1.2 versicolor ## 97 5.7 2.9 4.2 1.3 versicolor ## 98 6.2 2.9 4.3 1.3 versicolor ## 99 5.1 2.5 3.0 1.1 versicolor ## 100 5.7 2.8 4.1 1.3 versicolor ## 101 6.3 3.3 6.0 2.5 virginica ## 102 5.8 2.7 5.1 1.9 virginica ## 103 7.1 3.0 5.9 2.1 virginica ## 104 6.3 2.9 5.6 1.8 virginica ## 105 6.5 3.0 5.8 2.2 virginica ## 106 7.6 3.0 6.6 2.1 virginica ## 107 4.9 2.5 4.5 1.7 virginica ## 108 7.3 2.9 6.3 1.8 virginica ## 109 6.7 2.5 5.8 1.8 virginica ## 110 7.2 3.6 6.1 2.5 virginica ## 111 6.5 3.2 5.1 2.0 virginica ## 112 6.4 2.7 5.3 1.9 virginica ## 113 6.8 3.0 5.5 2.1 virginica ## 114 5.7 2.5 5.0 2.0 virginica ## 115 5.8 2.8 5.1 2.4 virginica ## 116 6.4 3.2 5.3 2.3 virginica ## 117 6.5 3.0 5.5 1.8 virginica ## 118 7.7 3.8 6.7 2.2 virginica ## 119 7.7 2.6 6.9 2.3 virginica ## 120 6.0 2.2 5.0 1.5 virginica ## 121 6.9 3.2 5.7 2.3 virginica ## 122 5.6 2.8 4.9 2.0 virginica ## 123 7.7 2.8 6.7 2.0 virginica ## 124 6.3 2.7 4.9 1.8 virginica ## 125 
6.7 3.3 5.7 2.1 virginica ## 126 7.2 3.2 6.0 1.8 virginica ## 127 6.2 2.8 4.8 1.8 virginica ## 128 6.1 3.0 4.9 1.8 virginica ## 129 6.4 2.8 5.6 2.1 virginica ## 130 7.2 3.0 5.8 1.6 virginica ## 131 7.4 2.8 6.1 1.9 virginica ## 132 7.9 3.8 6.4 2.0 virginica ## 133 6.4 2.8 5.6 2.2 virginica ## 134 6.3 2.8 5.1 1.5 virginica ## 135 6.1 2.6 5.6 1.4 virginica ## 136 7.7 3.0 6.1 2.3 virginica ## 137 6.3 3.4 5.6 2.4 virginica ## 138 6.4 3.1 5.5 1.8 virginica ## 139 6.0 3.0 4.8 1.8 virginica ## 140 6.9 3.1 5.4 2.1 virginica ## 141 6.7 3.1 5.6 2.4 virginica ## 142 6.9 3.1 5.1 2.3 virginica ## 143 5.8 2.7 5.1 1.9 virginica ## 144 6.8 3.2 5.9 2.3 virginica ## 145 6.7 3.3 5.7 2.5 virginica ## 146 6.7 3.0 5.2 2.3 virginica ## 147 6.3 2.5 5.0 1.9 virginica ## 148 6.5 3.0 5.2 2.0 virginica ## 149 6.2 3.4 5.4 2.3 virginica ## 150 5.9 3.0 5.1 1.8 virginica ``` --- # Tibbles print nicely! ```r library(tidyverse) as_tibble(iris) ``` ``` ## # A tibble: 150 × 5 ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## <dbl> <dbl> <dbl> <dbl> <fct> ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa ## # ℹ 140 more rows ``` --- class: exercise, middle # Let's practice! 1. Load tidyverse with `library(tidyverse)` 2. 
Run the function `as_tibble()` on your squid data

---

# Pipe ("then")

.pull-left[
]

.pull-right[
Data in, data out

```r
do_another_thing(do_something(data))
# versus
data %>% 
  do_something() %>% 
  do_another_thing()
```
]

.footnote[
* keyboard shortcut: ctrl/cmd + shift + m
]

---

class: center, middle

# `readr` package

<img src="img/readr.png" width=30%>

---

# `read_xxx` function

* Neater import than `read.table` and `read.csv`
* Checks the data and prints a report of what was imported
* Character columns are not converted to factors
* Most useful are `read_csv`, `read_table`, and `read_delim`
* Compatible with pipe workflow

---

# Example

```r
library(readr)
mydata <- read_csv("data/Data_Squid.csv")
```

```
## Rows: 2644 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Sex
## dbl (5): Sample, Year, Month, Location, GSI
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

```r
mydata
```

```
## # A tibble: 2,644 × 6
##    Sample  Year Month Location Sex      GSI
##     <dbl> <dbl> <dbl>    <dbl> <chr>  <dbl>
##  1      1     1     1        1 Female 10.4
##  2      2     1     1        3 Female  9.83
##  3      3     1     1        1 Female  9.74
##  4      4     1     1        1 Female  9.31
##  5      5     1     1        1 Female  8.99
##  6      6     1     1        1 Female  8.77
##  7      7     1     1        1 Female  8.26
##  8      8     1     1        3 Female  7.40
##  9      9     1     1        3 Female  7.22
## 10     10     1     2        1 Female  6.84
## # ℹ 2,634 more rows
```

---

class: center, middle

# `janitor` package

<img src="img/janitor.png" width=40%>

---

# `clean_names()` function

* Cleans the column names into something more computer-friendly
* For example, converts all the column names to lowercase and puts underscores between words

---

# Regular column names

```r
mydata
```

```
## # A tibble: 2,644 × 6
##    Sample  Year Month Location Sex      GSI
##     <dbl> <dbl> <dbl>    <dbl> <chr>  <dbl>
##  1      1     1     1        1 Female 10.4
##  2      2     1     1        3 Female  9.83
##  3      3     1     1        1 Female  9.74
##  4      4     1     1        1 Female  9.31
##  5      5     1     1        1 Female  8.99
##  6      6     1     1        1 Female  8.77
##  7      7     1     1        1 Female  8.26
##  8      8     1     1        3 Female  7.40
##  9      9     1     1        3 Female  7.22
## 10     10     1     2        1 Female  6.84
## # ℹ 2,634 more rows
```

---

# Clean column names

```r
mydata %>% 
* janitor::clean_names()
```

```
## # A tibble: 2,644 × 6
##    sample  year month location sex      gsi
##     <dbl> <dbl> <dbl>    <dbl> <chr>  <dbl>
##  1      1     1     1        1 Female 10.4
##  2      2     1     1        3 Female  9.83
##  3      3     1     1        1 Female  9.74
##  4      4     1     1        1 Female  9.31
##  5      5     1     1        1 Female  8.99
##  6      6     1     1        1 Female  8.77
##  7      7     1     1        1 Female  8.26
##  8      8     1     1        3 Female  7.40
##  9      9     1     1        3 Female  7.22
## 10     10     1     2        1 Female  6.84
## # ℹ 2,634 more rows
```

---

# Can also be done on import:

```r
mydata <- read_csv("data/Data_Squid.csv") %>% 
  janitor::clean_names()
mydata
```

```
## # A tibble: 2,644 × 6
##    sample  year month location sex      gsi
##     <dbl> <dbl> <dbl>    <dbl> <chr>  <dbl>
##  1      1     1     1        1 Female 10.4
##  2      2     1     1        3 Female  9.83
##  3      3     1     1        1 Female  9.74
##  4      4     1     1        1 Female  9.31
##  5      5     1     1        1 Female  8.99
##  6      6     1     1        1 Female  8.77
##  7      7     1     1        1 Female  8.26
##  8      8     1     1        3 Female  7.40
##  9      9     1     1        3 Female  7.22
## 10     10     1     2        1 Female  6.84
## # ℹ 2,634 more rows
```

---

class: exercise, middle

# Let's practice!

## Load the data and change all the column names to upper case

**Hint:** check out the help for `clean_names()`

<img src="img/janitor_clean_names.png" width=60%>

.footnote[
img credit: Allison Horst
]

---

class: center, middle

# `dplyr` functions

<img src="img/dplyr.png" width=30%>

---

# `select()`

---

# Select "sample", "sex", and "gsi" columns only

```r
mydata %>% 
* select(sample, sex, gsi)
```

```
## # A tibble: 2,644 × 3
##    sample sex      gsi
##     <dbl> <chr>  <dbl>
##  1      1 Female 10.4
##  2      2 Female  9.83
##  3      3 Female  9.74
##  4      4 Female  9.31
##  5      5 Female  8.99
##  6      6 Female  8.77
##  7      7 Female  8.26
##  8      8 Female  7.40
##  9      9 Female  7.22
## 10     10 Female  6.84
## # ℹ 2,634 more rows
```

---

class: exercise, middle

# Let's practice!

## Select "year", "sex", "location", and "gsi".

---

# `mutate()`

.center[
<img src="img/dplyr_mutate.png" width=60%>
]

---

---

# Add a gsi_log column

```r
mydata %>% 
  select(sample, sex, gsi) %>% 
* mutate(gsi_log = log10(gsi))
```

```
## # A tibble: 2,644 × 4
##    sample sex      gsi gsi_log
##     <dbl> <chr>  <dbl>   <dbl>
##  1      1 Female 10.4    1.02
##  2      2 Female  9.83   0.993
##  3      3 Female  9.74   0.988
##  4      4 Female  9.31   0.969
##  5      5 Female  8.99   0.954
##  6      6 Female  8.77   0.943
##  7      7 Female  8.26   0.917
##  8      8 Female  7.40   0.869
##  9      9 Female  7.22   0.858
## 10     10 Female  6.84   0.835
## # ℹ 2,634 more rows
```

---

class: exercise, middle

# Let's practice!
## Add a new column with gsi multiplied by 10

---

# `filter()`

Works similarly to `subset()`

.center[
<img src="img/dplyr_filter.jpg" width=80%>
]

---

---

# Filter the data to have only males

```r
mydata %>% 
  select(sample, sex, gsi) %>% 
  mutate(gsi_log = log10(gsi)) %>% 
* filter(sex == "Male")
```

```
## # A tibble: 1,402 × 4
##    sample sex     gsi gsi_log
##     <dbl> <chr> <dbl>   <dbl>
##  1     24 Male   5.30   0.724
##  2     48 Male   4.30   0.633
##  3     58 Male   3.50   0.544
##  4     60 Male   3.25   0.512
##  5     61 Male   3.23   0.509
##  6     62 Male   3.23   0.509
##  7     63 Male   3.18   0.503
##  8     65 Male   2.97   0.473
##  9     66 Male   2.95   0.470
## 10     67 Male   2.94   0.468
## # ℹ 1,392 more rows
```

---

# Filter the data to have only males with a GSI larger than 4.

```r
mydata %>% 
  select(sample, sex, gsi) %>% 
  mutate(gsi_log = log10(gsi)) %>% 
* filter(sex == "Male", gsi > 4)
```

```
## # A tibble: 7 × 4
##   sample sex     gsi gsi_log
##    <dbl> <chr> <dbl>   <dbl>
## 1     24 Male   5.30   0.724
## 2     48 Male   4.30   0.633
## 3    763 Male   4.21   0.625
## 4    765 Male   4.19   0.622
## 5   1671 Male   4.55   0.658
## 6   1676 Male   4.33   0.636
## 7   1679 Male   4.01   0.603
```

---

class: exercise, middle

# Let's practice!

1. Filter the data to have only males from year 1
2. Talk to your neighbor: do you have the same number of rows?

---

# `arrange()`

---

# Sort the data by `gsi_log`

- `desc()` puts things in descending order

```r
mydata %>% 
  select(sample, sex, gsi) %>% 
  mutate(gsi_log = log10(gsi)) %>% 
* arrange(desc(gsi_log))
```

```
## # A tibble: 2,644 × 4
##    sample sex      gsi gsi_log
##     <dbl> <chr>  <dbl>   <dbl>
##  1    546 Female  14.6    1.16
##  2    547 Female  13.3    1.12
##  3   1520 Female  11.9    1.07
##  4    548 Female  11.2    1.05
##  5    549 Female  11.2    1.05
##  6   1521 Female  10.8    1.03
##  7    550 Female  10.7    1.03
##  8    551 Female  10.7    1.03
##  9    552 Female  10.6    1.03
## 10   2284 Female  10.6    1.03
## # ℹ 2,634 more rows
```

---

class: exercise, middle

# Let's practice!

## Arrange the data based on year.
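---

# `filter()` and `arrange()` on toy data

The `filter()` and `arrange()` steps shown earlier can be tried without the squid file. This is only a minimal sketch: the tibble `toy` and its values are made up for illustration and are not taken from the squid data.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration only
toy <- tibble(
  sample = 1:6,
  sex    = c("Female", "Male", "Male", "Female", "Male", "Male"),
  gsi    = c(9.8, 5.3, 2.1, 7.4, 4.2, 0.9)
)

# Conditions separated by a comma are combined with AND,
# so this is equivalent to filter(sex == "Male" & gsi > 4)
big_males <- toy %>% 
  filter(sex == "Male", gsi > 4) %>% 
  arrange(desc(gsi))   # largest gsi first

big_males
```

Writing the conditions as separate arguments tends to read more easily than one long `&` expression, which is presumably why the slides use the comma form.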
---

# `group_by()`, `summarize()`

---

## Create a summary with the average GSI for each combination of year and location, and sort it by the average GSI.

```r
mydata %>% 
  select(location, year, sex, gsi) %>% 
* group_by(location, year) %>% 
* summarise(avg_gsi = mean(gsi, na.rm = TRUE)) %>% 
  arrange(desc(avg_gsi)) %>% 
* ungroup()
```

```
## `summarise()` has grouped output by 'location'. You can override using the
## `.groups` argument.
```

```
## # A tibble: 11 × 3
##    location  year avg_gsi
##       <dbl> <dbl>   <dbl>
##  1        3     3   3.78
##  2        3     4   3.65
##  3        4     2   3.06
##  4        1     2   2.84
##  5        1     4   2.60
##  6        3     2   2.38
##  7        1     3   2.24
##  8        1     1   1.46
##  9        3     1   1.11
## 10        2     2   0.491
## 11        2     3   0.209
```

---

.center[
### **Remember:** When using `group_by()`, always add `ungroup()` at the end to convert the data to a standard tibble

<img src="img/group_by_ungroup.png" width=80%>
]

---

class: exercise, middle

# Let's practice!

Work with your neighbor.

1. What is the number of samples in each year?
2. How many females and males are there in each location?
3. What is the maximum GSI for males and females in each month?
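---

# Dropping groups inside `summarise()`

The message printed by the grouped summary earlier mentions a `.groups` argument. As a hedged alternative to a trailing `ungroup()`, the sketch below sets `.groups = "drop"` so that `summarise()` returns an ungrouped tibble directly. The tibble `toy` and its values are hypothetical and are not the squid data.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration only
toy <- tibble(
  location = c(1, 1, 1, 3, 3, 3),
  year     = c(1, 1, 2, 1, 2, 2),
  gsi      = c(2.0, 4.0, 3.0, 1.0, 5.0, NA)
)

avg_by_site <- toy %>% 
  group_by(location, year) %>% 
  # .groups = "drop" returns an ungrouped tibble and silences the message
  summarise(avg_gsi = mean(gsi, na.rm = TRUE), .groups = "drop") %>% 
  arrange(desc(avg_gsi))

avg_by_site
```

Both routes end with a plain tibble; an explicit `ungroup()` is easier to spot when reading a long pipeline, while `.groups = "drop"` keeps the regrouping decision next to the summary itself.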
---

class: center, middle

# 5 Minute Break

<img src="https://cornerstone.ms/wp-content/uploads/2020/03/Brain.png" width=110%>

---

# Rename columns with `rename()`

```r
mydata %>% 
  select(location, year, sex, gsi) %>% 
* rename(gonad_size = gsi)
```

```
## # A tibble: 2,644 × 4
##    location  year sex    gonad_size
##       <dbl> <dbl> <chr>       <dbl>
##  1        1     1 Female      10.4
##  2        3     1 Female       9.83
##  3        1     1 Female       9.74
##  4        1     1 Female       9.31
##  5        1     1 Female       8.99
##  6        1     1 Female       8.77
##  7        1     1 Female       8.26
##  8        3     1 Female       7.40
##  9        3     1 Female       7.22
## 10        1     1 Female       6.84
## # ℹ 2,634 more rows
```

---

# `rename_all()`

* Works similarly to `janitor::clean_names()`

Change all the column names to upper case

```r
mydata %>% 
  select(location, year, sex, gsi) %>% 
* rename_all(toupper)
```

```
## # A tibble: 2,644 × 4
##    LOCATION  YEAR SEX      GSI
##       <dbl> <dbl> <chr>  <dbl>
##  1        1     1 Female 10.4
##  2        3     1 Female  9.83
##  3        1     1 Female  9.74
##  4        1     1 Female  9.31
##  5        1     1 Female  8.99
##  6        1     1 Female  8.77
##  7        1     1 Female  8.26
##  8        3     1 Female  7.40
##  9        3     1 Female  7.22
## 10        1     1 Female  6.84
## # ℹ 2,634 more rows
```

---

class: inverse, center, middle

# Some more useful functions in the tidyverse family

---

class: center, middle

<img src="img/relocate.jpg" width=70%>

---

class: center, middle

<img src="img/accross.jpg" width=70%>

.footnote[
* Syntax has changed slightly: across(<font color="red">where(</font>is.numeric<font color="red">)</font>, .f)
]

---

class: center, middle

<img src="img/case_when.jpg" width=70%>

---

class: exercise, middle

# Let's practice!

Work with your neighbor.

Using `across()` and `case_when()`: group the squid data by location, calculate the maximum of each numeric column except sample, and add a new column that notes whether sampling at a given location was done all year or stopped in the spring.
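---

# `across()` and `case_when()` on toy data

The `across(where(is.numeric), .f)` syntax and `case_when()` from the slides above can be combined as in the minimal sketch below. Everything in it is an assumption made for illustration: the tibble `toy`, its column names, the made-up values, and the cut-off of 20 are hypothetical and unrelated to the squid data.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration only
toy <- tibble(
  site   = c("a", "a", "b", "b"),
  length = c(10, 14, 21, 25),
  weight = c(1.5, 2.5, 6.0, 8.0)
)

site_summary <- toy %>% 
  group_by(site) %>% 
  # where(is.numeric) picks every numeric column that is not a grouping column
  summarise(across(where(is.numeric), mean), .groups = "drop") %>% 
  # case_when() checks the conditions in order; TRUE acts as the fallback
  mutate(size_class = case_when(
    length >= 20 ~ "large",
    TRUE         ~ "small"
  ))

site_summary
```

Because `case_when()` stops at the first condition that is true, more specific conditions should come before more general ones.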
---

class: exercise

# If you want more practice

Open `swirl`

### For practicing manipulating data with `tidyverse`:

Download the course "Getting and Cleaning Data"

`swirl::install_course("Getting and Cleaning Data")`

Work on sections 1-3

### If you want a challenge, try `purrr`:

Download the course "Advanced R Programming"

`swirl::install_course("Advanced R Programming")`

Work on sections 2 and 3

---

class: inverse, center

# Congratulations!

## You now know the basics of `tidyverse`

<img src="img/tidyverse_cartoon.jpg" width=70%>

.footnote[
credit: Allison Horst
]