WG Chapter 2

Harold Nelson

2023-01-20

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(palmerpenguins)

Exercise 2.2.5.1

How many rows are in penguins? How many columns?

Solution

str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
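The first line of the str() output, tibble [344 × 8], answers the question: penguins has 344 rows and 8 columns. The short sketch below gets those counts directly with base R's nrow(), ncol(), and dim().

```r
library(palmerpenguins)

# Number of rows (observations) and columns (variables)
n_rows <- nrow(penguins)
n_cols <- ncol(penguins)

# dim() returns both at once, as c(rows, columns)
dim(penguins)
## [1] 344   8
```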

Exercise 2.2.5.2

What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.

Solution

?penguins

The help page describes bill_depth_mm as a number denoting bill depth, in millimeters.

Exercise 2.2.5.3

Make a scatterplot of bill_depth_mm vs. bill_length_mm. Describe the relationship between these two variables.

Solution

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
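
The points do not follow a single clear linear trend. Instead they form separate clusters, which hints at a grouping variable such as species; within each cluster, longer bills tend to be deeper.
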
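To put a number on the pattern in the scatterplot, this sketch (added for illustration, not part of the original solution) computes the Pearson correlation with cor(), first over all penguins and then within each species using dplyr's group_by() and summarise().

```r
library(dplyr)
library(palmerpenguins)

# Overall correlation, ignoring rows where either measurement is missing
overall <- penguins %>%
  summarise(r = cor(bill_length_mm, bill_depth_mm, use = "complete.obs"))

# The same correlation computed separately within each species
by_species <- penguins %>%
  group_by(species) %>%
  summarise(r = cor(bill_length_mm, bill_depth_mm, use = "complete.obs"))

overall
by_species
```

In this dataset the overall correlation is weakly negative, while each within-species correlation is positive, which is why the species clusters matter when describing the relationship.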
Exercise 2.2.5.4

What happens if you make a scatterplot of species vs bill_depth_mm? Why is the plot not useful?

Solution

ggplot(data = penguins, 
       mapping = aes(species, bill_depth_mm)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
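
Because species is categorical, all the points for a species are stacked in one vertical line, so overlapping points hide how bill depth is distributed within each species. A boxplot summarizes each distribution more usefully:
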
ggplot(data = penguins, 
       mapping = aes(species, bill_depth_mm)) +
  geom_boxplot()
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
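Another option, sketched here as an alternative rather than the original solution, is ggplot2's geom_jitter(), which nudges each point sideways so overlapping values become visible. The width of 0.2 is an arbitrary choice; height = 0 keeps the bill depths themselves unchanged.

```r
library(ggplot2)
library(palmerpenguins)

# Spread points horizontally within each species column (by up to +/- 0.2),
# without moving them vertically; na.rm = TRUE drops missing depths silently
p <- ggplot(data = penguins,
            mapping = aes(x = species, y = bill_depth_mm)) +
  geom_jitter(width = 0.2, height = 0, na.rm = TRUE)
p
```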

Exercise 2.2.5.5

Why does the following give an error and how would you fix it?

ggplot(data = penguins) + geom_point()

Solution

The call has no mapping of variables to aesthetics, and geom_point() needs at least the x and y aesthetics, so building the plot fails. Add an aes() mapping either to the ggplot() call or to geom_point().
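Both fixes can be sketched side by side. The choice of bill_length_mm and bill_depth_mm is only illustrative; any two numeric columns would do. A mapping given in ggplot() is inherited by every layer, while a mapping given in geom_point() applies to that layer only.

```r
library(ggplot2)
library(palmerpenguins)

# The original call: it fails when the plot is built, because x and y are missing
broken <- ggplot(data = penguins) + geom_point()

# Fix 1: global mapping in ggplot(), inherited by all layers
fix_global <- ggplot(data = penguins,
                     mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE)

# Fix 2: mapping supplied only to the geom_point() layer
fix_layer <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm), na.rm = TRUE)
```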

Exercise 2.2.5.6

What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE.

Solution

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE)

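Setting na.rm = TRUE removes rows with missing values silently, so the "Removed 2 rows containing missing values" warning no longer appears. The default value is FALSE, in which case the missing rows are still removed, but with a warning.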
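The difference can be checked programmatically by counting the warnings raised while a plot is built and turned into grobs. count_warnings() is a small helper defined here for illustration, not a ggplot2 function; it relies on ggplot2's ggplot_build() and ggplot_gtable(), which do the same work as printing the plot.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: build and render a plot, counting (and muffling) warnings
count_warnings <- function(plot) {
  n <- 0
  withCallingHandlers(
    ggplot_gtable(ggplot_build(plot)),
    warning = function(w) {
      n <<- n + 1
      invokeRestart("muffleWarning")
    }
  )
  n
}

base <- ggplot(data = penguins,
               mapping = aes(x = bill_length_mm, y = bill_depth_mm))

count_warnings(base + geom_point())               # default na.rm = FALSE
count_warnings(base + geom_point(na.rm = TRUE))   # missing rows dropped silently
```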

Exercise 2.2.5.7

Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().

Solution

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE) + 
  labs(caption = "Data come from the palmerpenguins package.")

Exercise 2.2.5.8

Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?

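Solution

Map bill_depth_mm to the color aesthetic, and do it at the geom level inside geom_point(), not globally in ggplot(). That way only the points are colored, and geom_smooth() fits a single smooth curve through all the data.

ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = bill_depth_mm)) +
  geom_smooth()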
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
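One way to confirm that the color mapping stays with the points is to inspect each built layer with ggplot2's layer_data(). This sketch passes method = "loess" and formula = y ~ x explicitly, matching the message above, and sets na.rm = TRUE to quiet the missing-value warnings; those settings are choices made here, not part of the exercise.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins,
            mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = bill_depth_mm), na.rm = TRUE) +
  geom_smooth(method = "loess", formula = y ~ x, na.rm = TRUE)

# Layer 1 (points): colour varies with bill_depth_mm
points_colours <- unique(layer_data(p, 1)$colour)

# Layer 2 (smooth): a single colour, since the colour mapping was not inherited
smooth_colours <- unique(layer_data(p, 2)$colour)
length(smooth_colours)
```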