R for Data Science (Answers for Exercise 3.5.1)

What happens if you facet on a continuous variable?

Faceting treats the continuous variable as if it were categorical: one panel is created for each distinct value of the variable. The values are not binned into ranges, so a variable with many distinct values produces many panels, each showing only the observations that share that exact value.
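As a rough check of this, the sketch below facets mpg on cty, a continuous variable chosen here only for illustration, and compares the number of panels with the number of distinct cty values. The binning step with cut_width() and the bin width of 5 are assumptions added for contrast, not part of the exercise.

```r
library(ggplot2)

# Facet on cty (city miles per gallon), a continuous variable.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cty)

# Compare the number of panels with the number of distinct cty values.
n_distinct_values <- length(unique(mpg$cty))
n_panels <- nrow(ggplot_build(p)$layout$layout)

# Assumption for illustration: bin cty into intervals of width 5 first,
# which gives fewer, better-populated panels.
mpg_binned <- mpg
mpg_binned$cty_bin <- cut_width(mpg$cty, width = 5)
p_binned <- ggplot(data = mpg_binned) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cty_bin)
n_binned_panels <- nrow(ggplot_build(p_binned)$layout$layout)
```

Printing p and p_binned side by side shows the difference: the first has one small panel per value, the second a handful of panels that each cover a range.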

What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = cyl))

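facet_grid(drv ~ cyl) generates a grid of panels, one for each combination of drv and cyl values. The empty cells indicate that no observations in the dataset have that combination. They correspond to the positions in the scatterplot of drv against cyl above where no point is drawn: mpg contains no 4-wheel-drive cars with 5 cylinders and no rear-wheel-drive cars with 4 or 5 cylinders.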
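One way to confirm which cells are empty is to cross-tabulate the two variables. This is a minimal sketch; the object names combos and empty are hypothetical.

```r
library(ggplot2)  # provides the mpg dataset

# Count the observations for every drv/cyl combination.
combos <- table(drv = mpg$drv, cyl = mpg$cyl)
combos

# Keep only the combinations with no observations; these are the
# empty cells of facet_grid(drv ~ cyl).
empty <- as.data.frame(combos)
empty <- empty[empty$Freq == 0, c("drv", "cyl")]
empty
```

Each row of empty matches one blank panel in the facet grid and one missing point in the drv against cyl scatterplot.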

What plots does the following code make? What does . do?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

The formula in facet_grid() reads as rows ~ columns, and the . is a placeholder meaning that no variable is used for that dimension. In facet_grid(drv ~ .), drv defines the rows and nothing defines the columns. Therefore, the plot has a separate panel for each unique value of drv, stacked in rows.

In facet_grid(. ~ cyl), the . before the ~ means that no variable is used for the rows, and the panels are created from the values of the cyl variable. Therefore, the plot has a separate panel for each unique value of cyl, arranged side by side in columns.
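The row and column roles can be made explicit with the rows and cols arguments of facet_grid() and vars(). The sketch below builds both forms and compares the resulting grid dimensions; panel_dims() is a hypothetical helper written for this illustration.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Formula form and the equivalent rows/cols form.
by_row_formula <- base + facet_grid(drv ~ .)
by_row_vars    <- base + facet_grid(rows = vars(drv))
by_col_formula <- base + facet_grid(. ~ cyl)
by_col_vars    <- base + facet_grid(cols = vars(cyl))

# Hypothetical helper: number of panel rows and columns in a built plot.
panel_dims <- function(p) {
  layout <- ggplot_build(p)$layout$layout
  c(rows = max(layout$ROW), cols = max(layout$COL))
}

panel_dims(by_row_formula)  # 3 rows, 1 column (one row per drv value)
panel_dims(by_col_formula)  # 1 row, 4 columns (one column per cyl value)
```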

Take the first faceted plot in this section:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?

Advantages of faceting: Clear visual separation of subsets, which makes it easier to see the relationship within each subset. Flexible layouts for the grid of panels. Faceting can also reveal patterns that are obscured when many colours overlap in a single panel.

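Disadvantages of faceting: It is harder to compare subsets directly, because their points no longer share a panel. With many subsets the panels become small, which reduces visibility and interpretability. With a larger dataset the balance tends to shift toward faceting: overplotting makes overlapping colours hard to tell apart, while each facet stays readable.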
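To compare the two approaches directly, the same data can be drawn once with the colour aesthetic and once with facets. This is only a sketch for side-by-side inspection; the object names are hypothetical.

```r
library(ggplot2)

# All classes in one panel, distinguished by colour.
p_colour <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))

# One panel per class, as in the faceted plot from this section.
p_facet <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

# Number of panels each plot draws.
n_colour_panels <- nrow(ggplot_build(p_colour)$layout$layout)
n_facet_panels  <- nrow(ggplot_build(p_facet)$layout$layout)
```

Printing p_colour makes cross-class comparison easy but relies on telling seven colours apart; printing p_facet separates the classes at the cost of smaller panels.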

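Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?

facet_wrap() creates a grid of panels by wrapping the sequence of panels for a single variable into rows and columns. nrow and ncol set the number of rows and columns in that grid. Other arguments that control the layout include as.table and dir.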

facet_grid() has no nrow and ncol arguments because it builds the grid from two variables: the number of rows is the number of unique levels of the row variable, and the number of columns is the number of unique levels of the column variable. Every combination of levels gets a cell, so specifying nrow and ncol would not be meaningful.
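The effect of nrow and ncol can be checked by inspecting the layout of the built plot. In this sketch, grid_dims() is a hypothetical helper; class has seven values in mpg, so the panels wrap into a 2 by 4 or a 4 by 2 grid.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Hypothetical helper: number of panel rows and columns in a built plot.
grid_dims <- function(p) {
  layout <- ggplot_build(p)$layout$layout
  c(rows = max(layout$ROW), cols = max(layout$COL))
}

# Seven class panels wrapped into two rows.
dims_nrow2 <- grid_dims(base + facet_wrap(~ class, nrow = 2))

# The same seven panels wrapped into two columns.
dims_ncol2 <- grid_dims(base + facet_wrap(~ class, ncol = 2))
```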

When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?

This convention helps to optimize the layout of panels and make the resulting plot more readable. Here’s why:

Facet Panel Size: Most screens and pages are wider than they are tall, so there is more room to lay panels out horizontally. Placing the variable with more unique levels in the columns keeps the panels from being squeezed vertically, which preserves the height of the y-axis and makes the data within each panel easier to read.

Efficient Use of Space: By putting the variable with more unique levels in the columns, you can potentially create a more compact grid of panels. This efficient use of space is particularly helpful when dealing with a large number of facets, as it minimizes the need for scrolling or zooming to view the entire plot.

Logical Reading Order: Placing the variable with more unique levels in the columns aligns with the convention of reading from left to right. It follows the natural reading order in many cultures and makes it easier for viewers to navigate the plot and understand the relationship between the variables.
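The two orientations can be compared on mpg, where cyl has four levels and drv has three. The sketch below only reports the grid dimensions; grid_dims() is a hypothetical helper, and which layout reads better still depends on the shape of the output device.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Hypothetical helper: number of panel rows and columns in a built plot.
grid_dims <- function(p) {
  layout <- ggplot_build(p)$layout$layout
  c(rows = max(layout$ROW), cols = max(layout$COL))
}

# Variable with more levels (cyl) in the columns: 3 rows by 4 columns.
wide <- base + facet_grid(drv ~ cyl)

# Variable with more levels (cyl) in the rows: 4 rows by 3 columns.
tall <- base + facet_grid(cyl ~ drv)

dims_wide <- grid_dims(wide)
dims_tall <- grid_dims(tall)
```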

JM Walker

2023-06-09