Why Separate Data in Visualizations?

When we visualize data, our primary goal is often to uncover patterns, trends, and relationships. However, datasets are rarely homogeneous; they often contain distinct subgroups, such as different categories, time periods, or experimental conditions. Visualizing the entire dataset as a single entity can obscure important details or, worse, lead to misleading conclusions, a phenomenon famously illustrated by Simpson’s Paradox.

Separating out data by these subgroups in our visualizations is crucial for several reasons. It allows us to reveal subgroup-specific patterns, as different groups within your data might behave very differently. These unique trends might be averaged out or hidden in an aggregated view. Furthermore, by plotting subgroups side-by-side or distinguishing them with visual cues like different colors or shapes, we can effectively make comparisons of their characteristics, distributions, or trends. For complex datasets with many observations or multiple variables, breaking down the visualization into smaller, more focused parts can significantly improve clarity, making it much easier to understand and interpret. Finally, separating data helps in avoiding misinterpretation, as aggregated data can hide underlying variations. For example, a positive trend observed in an overall dataset might actually mask a negative trend within one or more significant subgroups.
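To make the last point concrete, here is a minimal sketch with six made-up observations. The data frame `toy_simpson` and its values are hypothetical and were chosen only for illustration. Within each group, y falls as x rises, yet a single regression on the pooled data has a positive slope.

```r
library(ggplot2)

# Hypothetical toy data: two groups, each with a perfectly negative trend
toy_simpson <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x = c(1, 2, 3, 4, 5, 6),
  y = c(3, 2, 1, 7, 6, 5)
)

# Slope of one regression fit to the pooled data
overall_slope <- coef(lm(y ~ x, data = toy_simpson))[["x"]]

# Slope of a separate regression within each group
group_slopes <- sapply(split(toy_simpson, toy_simpson$group),
                       function(d) coef(lm(y ~ x, data = d))[["x"]])

overall_slope  # positive (0.8)
group_slopes   # both negative (-1)

# Pooled trend in grey, per-group trends in color
ggplot(toy_simpson, aes(x = x, y = y)) +
  geom_point(aes(color = group)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey50") +
  geom_smooth(aes(color = group), method = "lm", formula = y ~ x, se = FALSE)
```

Only by separating the groups (here with color) does the plot reveal that the positive pooled trend is driven by the difference between groups, not by any trend within them.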

R’s ggplot2 package offers powerful and flexible ways to achieve this separation, primarily through two main strategies:

1. Mapping variables to aesthetics: using visual properties like color, shape, size, or linetype to distinguish groups within a single plot.
2. Faceting: creating multiple subplots (small multiples), where each subplot displays a different subset of the data.

Let’s explore these methods.

Separating Data Using Aesthetics

One common and intuitive way to separate data is by mapping a categorical variable from your dataset to an aesthetic property of the geoms in your plot. Aesthetics like color, shape, size, or linetype can be used to visually differentiate groups.

In the example below, we want to see if the relationship between GDP per capita and life expectancy differs across continents. We achieve this by mapping the continent variable to the color aesthetic within aes(). When this mapping is done, ggplot2 automatically assigns a unique color to each continent, applies these colors to the points (geom_point) and the smoothed lines (geom_smooth), and generates a legend to identify which color corresponds to which continent.

library(tidyverse)  # loads ggplot2, dplyr, and forcats
library(gapminder)

p <- ggplot(data = gapminder %>% filter(year == 1987),
            mapping = aes(x = gdpPercap,
                          y = lifeExp,
                          color = continent))
p + geom_point(alpha = 0.2) +
    geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
    scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy by Continent",
         subtitle = "Data points are countries in 1987; lines are per-continent linear models",
         caption = "Source: Gapminder.")

This single plot now clearly shows five distinct lines, one for each continent, allowing us to compare their respective trends. For instance, we can observe differences in the slope or intercept of the relationship between GDP and life expectancy across continents.
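Color is not the only option. As a sketch (the object names `gap_1987` and `p_redundant` are our own), the same continent variable can be mapped to color, shape, and linetype at once. Because all three mappings use the same variable, ggplot2 merges them into a single legend, and the groups stay distinguishable even when the plot is printed in grayscale.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_1987 <- gapminder %>% filter(year == 1987)

# Map continent to three aesthetics at once (redundant encoding);
# points use color and shape, smoothed lines use color and linetype
p_redundant <- ggplot(gap_1987, aes(x = gdpPercap, y = lifeExp,
                                    color = continent,
                                    shape = continent,
                                    linetype = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1))

p_redundant
```

Redundant encoding costs nothing in data terms but makes the plot more robust to color-vision differences and black-and-white printing.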

Separating Data Using Faceting

Another powerful method for separating data is faceting. Faceting creates a grid of plots (often called “small multiples”), where each plot displays a subset of the data corresponding to a level of one or more categorical variables. This is particularly useful when you want to see the same type of plot for different groups side-by-side, making comparisons straightforward, especially if using aesthetics for separation would lead to a cluttered plot.

Here’s an example from Jonathan Rodden’s work on left vote share and population density in majoritarian democracies. Notice how much information it presents in a very compact but clear fashion. And notice how it’s obviously a ggplot…

Basic Faceting with `facet_wrap()`

facet_wrap() is typically used to create a grid of plots based on a single categorical variable. It “wraps” a sequence of panels into a 2D grid. Let’s use it to create separate plots for each continent, showing the relationship between GDP per capita and life expectancy.

In this case, the color aesthetic is removed from the main aes() mapping (or not set globally), and instead, facet_wrap(~ continent) is added. This tells ggplot2 to create one panel for each unique value in the continent column.

# Store the 1987 data and define a base plot without a color aesthetic;
# faceting, not color, will separate the continents
p_base_data <- gapminder %>% filter(year == 1987)

p_base <- ggplot(data = p_base_data,
                 mapping = aes(x = gdpPercap, y = lifeExp))

p_base + geom_point(alpha = 0.2) +
    geom_smooth(method = "lm", formula = y ~ x) +
    scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
    facet_wrap(~ continent) +
    labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
         title = "Economic Growth and Life Expectancy (Faceted by Continent)",
         subtitle = "Data points are countries in 1987; one panel per continent",
         caption = "Source: Gapminder.")

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

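In this faceted plot, facet_wrap(~ continent) creates a separate panel for each continent. Each panel shows all data points for that continent and a single linear model fit to that continent’s data. By default, all facets share the same x and y scales, which is crucial for valid comparisons of trends and distributions across panels. The warnings come from the Oceania panel: Gapminder includes only two Oceanian countries (Australia and New Zealand), so the fitted line has no residual degrees of freedom and geom_smooth() cannot compute its confidence band.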
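A quick way to confirm where those warnings come from is to count the countries in each panel. This sketch (the name `continent_counts` is our own) uses dplyr’s count(); the Oceania panel should contain only two countries, too few for a confidence band around a fitted line.

```r
library(dplyr)
library(gapminder)

# Number of countries (one row each) per continent in 1987
continent_counts <- gapminder %>%
  filter(year == 1987) %>%
  count(continent)

continent_counts
```

Setting se = FALSE in geom_smooth(), or filtering out continents with very few countries, are two ways to avoid the warnings; which is appropriate depends on whether the sparse panel matters for the argument.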

levels(gapminder$continent)

## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"

Reordering Facets using `forcats::fct_reorder()`

By default, facet_wrap() orders the facets by the levels of the faceting variable: alphabetically if it is a character vector, or in the existing level order if it is a factor. Sometimes you might want a more meaningful order, for example one based on a summary statistic of the data within each facet.

The forcats package (part of the tidyverse) provides functions that make factor manipulation, including reordering, more concise. fct_reorder() is particularly useful here: it reorders the levels of a factor by a summary of another variable computed within each level. The summary function is set with .fun and defaults to the median.
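To see what fct_reorder() does in isolation, consider a toy factor (the vectors `grp` and `val` are hypothetical). The group means are a = 2, b = 11, and c = 6, so ordering by the mean in descending order yields b, c, a.

```r
library(forcats)

# Hypothetical toy data: three groups with two values each
grp <- factor(c("a", "a", "b", "b", "c", "c"))
val <- c(1, 3, 10, 12, 5, 7)

levels(grp)  # alphabetical: "a" "b" "c"

# Reorder levels by the mean of val within each group, largest first
grp_ordered <- fct_reorder(grp, val, .fun = mean, .desc = TRUE)
levels(grp_ordered)  # "b" "c" "a"
```

Only the level order changes; the values of the factor itself are untouched, which is why ggplot2 picks up the new order for facets and axes.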

Let’s reorder the continent facets by their average life expectancy in 1987, from highest to lowest (descending order), using fct_reorder():

# Reorder continent factor directly within the mutate step using fct_reorder
# We want to order 'continent' by 'lifeExp', using the mean of lifeExp for ordering.
# The .desc = TRUE argument ensures descending order.
p_base_data <- gapminder %>% filter(year==1987)

p_base_data_forcats_ordered <- p_base_data %>%
  filter(!is.na(lifeExp)) %>%
  mutate(continent = fct_reorder(continent, lifeExp, .fun = mean, .na_rm = TRUE, .desc = TRUE))

# Create the plot
ggplot(data = p_base_data_forcats_ordered,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) + 
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) + 
  facet_wrap(~ continent) + 
  labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
       title = "Economic Growth and Life Expectancy (Faceted by Continent, using forcats)",
       subtitle = "Facets ordered by average life expectancy in 1987 (highest to lowest)",
       caption = "Source: Gapminder.") +
  theme_minimal()

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

As you can see, fct_reorder() makes the code for reordering factors based on summary statistics more compact and often more readable once you are familiar with its syntax. It handles the summarization and re-leveling internally. For simple manual ordering where the order isn’t data-driven, directly using factor(variable, levels = c("level3", "level1", "level2")) remains the most straightforward approach.
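Both manual routes can be sketched as follows (the object name `continent_manual` and the chosen order are arbitrary examples). factor() with an explicit levels vector sets the whole order, while forcats::fct_relevel() moves only the named levels to the front and keeps the rest in their existing order.

```r
library(forcats)
library(gapminder)

# Spell out the complete order by hand
continent_manual <- factor(gapminder$continent,
                           levels = c("Europe", "Americas", "Asia", "Africa", "Oceania"))
levels(continent_manual)

# Move a single level to the front; the others keep their alphabetical order
levels(fct_relevel(gapminder$continent, "Oceania"))
# "Oceania" "Africa" "Americas" "Asia" "Europe"
```

One caution with factor(): any value not listed in levels becomes NA, so a typo in a level name silently drops data.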

Customizing Facets

You can customize the appearance and behavior of facets to improve readability and aesthetics.

Adjusting the Number of Rows and Columns

You can control the layout of the facets created by facet_wrap() using the nrow or ncol parameters. This determines how many rows or columns the grid of plots will have.

# Using the data with forcats-ordered continents for consistency
ggplot(data = p_base_data_forcats_ordered,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) + 
  facet_wrap(~ continent, ncol = 2) + 
  labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
       title = "Economic Growth and Life Expectancy (2 Columns)",
       subtitle = "Facets ordered by average life expectancy",
       caption = "Source: Gapminder.") +
  theme_minimal()

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

Here, ncol = 2 arranges the five continent facets into a grid with two columns (and three rows, in this case).
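Beyond nrow and ncol, the strip labels can be customized too. In this sketch (the name `p_one_row` is ours), nrow = 1 puts all five panels in a single row, labeller = label_both prefixes each strip with the variable name (for example, "continent: Africa"), and the strip.text theme element makes the strip text bold.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_1987 <- gapminder %>% filter(year == 1987)

p_one_row <- ggplot(gap_1987, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
  # One row of panels; strip labels read "continent: <value>"
  facet_wrap(~ continent, nrow = 1, labeller = label_both) +
  theme(strip.text = element_text(face = "bold"))

p_one_row
```

A single row works well when the panels will be compared mainly along the shared y-axis; it becomes cramped as the number of panels grows.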

Free Scales

Sometimes, the range of data can vary significantly between facets. While shared scales are good for direct comparison, forcing all facets to use the same scales might make it difficult to see patterns within individual facets if one group has a much wider data range than others. In such cases, it can be useful to allow each facet to have its own scale.

The scales argument in facet_wrap() (and facet_grid()) controls this. The default, "fixed", means all panels share the same scales. Setting scales = "free" allows each panel to use its own scales for both x and y axes. Alternatively, "free_x" lets each panel use its own x-axis scale while sharing the y-axis scale, and "free_y" does the opposite.

# Using the data with forcats-ordered continents
ggplot(data = p_base_data_forcats_ordered,
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) + 
  facet_wrap(~ continent, scales = "free") + 
  labs(x = "GDP Per Capita", y = "Life Expectancy in Years",
       title = "Economic Growth and Life Expectancy (Free Scales)",
       subtitle = "Facets ordered by average life expectancy; scales vary by continent",
       caption = "Source: Gapminder.") +
  theme_minimal()

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

With scales = "free", each facet’s axes are scaled to fit its own data range. This can help reveal details within each specific continent’s plot, but makes direct visual comparison of slopes or absolute values across continents more challenging. The choice depends on the analytical goal.
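Before reaching for free scales, it can help to check how much the ranges actually differ, and to free only the axis that needs it. This sketch (the names `range_by_continent` and `p_free_y` are ours) summarises each continent’s life-expectancy range and then uses scales = "free_y", which lets the y-axis vary by panel while the x-axis stays shared.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_1987 <- gapminder %>% filter(year == 1987)

# Minimum and maximum life expectancy within each continent
range_by_continent <- gap_1987 %>%
  group_by(continent) %>%
  summarise(min_lifeExp = min(lifeExp),
            max_lifeExp = max(lifeExp),
            .groups = "drop")
range_by_continent

# Free y-axis per panel, shared x-axis
p_free_y <- ggplot(gap_1987, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
  facet_wrap(~ continent, scales = "free_y")

p_free_y
```

Keeping the x-axis shared preserves comparisons of income levels across continents while still zooming in on each continent’s life-expectancy spread.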

Case Study: Statehouse Democracy

Let’s apply these concepts to the Statehouse Democracy dataset. First, we set up the data as in previous lessons. This involves selecting relevant variables, filtering out Alaska and Hawaii, and creating some new variables like polconserv and region.name.

library(cspp)  # provides get_cspp_data()

states <- get_cspp_data(vars = c("pid", "ideo", "statemin", "pollib_median", "region"),
                        years = seq(1976, 2020), core = FALSE) %>%
  select(-stateno, -state_fips, -state_icpsr) %>%
  filter(!st %in% c("AK", "HI")) %>%
  mutate(
    pid = round(pid, 2),
    ideo = round(ideo, 2),
    pollib_median = round(pollib_median, 2),
    polconserv = -pollib_median,  # flip sign so higher values = more conservative
    year = as.character(year),
    ymd = lubridate::ymd(paste0(year, "-01-01")),
    region.name = case_when(
      region == 1 ~ "south",
      region == 2 ~ "west",
      region == 3 ~ "midwest",
      region == 4 ~ "northeast",
      TRUE ~ NA_character_
    )
  ) %>%
  relocate(region.name, .after = state)

Combining Multiple Years on One Plot using Aesthetics

Previously, we might have created entirely separate plots for different years. An alternative is to combine data from multiple years on a single plot. By mapping year to the color aesthetic, we can directly compare the relationship between partisanship (pid) and ideology (ideo) in 1976 and 2011 on the same set of axes. Because year is stored as a character variable, ggplot2 already treats it as discrete; wrapping it in as.factor() simply makes that intent explicit. This lets us assess the relationship in each year, see whether its slope or intercept has changed, and compare individual states across the two time points.

states

## # A tibble: 2,205 × 11
##    st    state   region.name year    pid  ideo statemin pollib_median region
##    <chr> <chr>   <chr>       <chr> <dbl> <dbl>    <dbl>         <dbl>  <int>
##  1 AL    Alabama south       1976   0.23 -0.21    NA            -1.43      1
##  2 AL    Alabama south       1977   0.25 -0.08    NA            -1.42      1
##  3 AL    Alabama south       1978   0.44 -0.18    NA            -1.47      1
##  4 AL    Alabama south       1979   0.38 -0.21    NA            -1.51      1
##  5 AL    Alabama south       1980   0.38 -0.23     3.1          -1.74      1
##  6 AL    Alabama south       1981   0.25 -0.31     3.35         -1.7       1
##  7 AL    Alabama south       1982   0.29 -0.26     3.35         -1.76      1
##  8 AL    Alabama south       1983   0.29 -0.27     3.35         -1.82      1
##  9 AL    Alabama south       1984   0.18 -0.15     3.35         -1.78      1
## 10 AL    Alabama south       1985  -0.06 -0.23     3.35         -1.78      1
## # ℹ 2,195 more rows
## # ℹ 2 more variables: polconserv <dbl>, ymd <date>

ggplot(data=states %>% filter(year %in% c("1976","2011")), 
       aes(x=pid,y=ideo,color=as.factor(year))) + 
  geom_smooth(method="lm",formula = y ~ x, se=F) + 
  geom_text(aes(label=st)) + 
  labs(x="Net Partisanship (Democratic PID)",y="Net Ideology (Liberalism)", color = "Year") + 
  scale_x_continuous(labels = scales::percent) + 
  scale_y_continuous(labels = scales::percent) +
  theme_bw()

The plot above shows two distinct lines, one for 1976 and one for 2011. However, the state labels (geom_text) are very crowded, making it difficult to read. This illustrates a common challenge when plotting many data points with labels on a single panel.

Using Faceting for Clarity Across Years

While the combined plot is useful for seeing overarching shifts, the overlapping text labels reduce its clarity. Faceting by year can alleviate this clutter. Adding facet_wrap(~year) creates separate panels for 1976 and 2011, which offers several advantages: each year gets its own space, making its pattern easier to examine; text labels are less likely to overlap; and because facets share scales by default, the panels remain directly comparable.

ggplot(states %>% filter(year %in% c("1976","2011")), 
       aes(x=pid,y=ideo)) + 
  geom_smooth(method="lm",formula = y ~ x, se=F) + 
  geom_text(aes(label=st)) + 
  labs(x="Net Partisanship (Democratic PID)",y="Net Ideology (Liberalism)") +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~year)

Within each facet the labels still overlap, so we need smaller labels that repel one another. geom_text_repel() from the ggrepel package handles this well.

ggplot(states %>% filter(year %in% c("1976","2011")), 
       aes(x=pid,y=ideo)) +
  geom_smooth(method="lm",formula = y ~ x, se=F) + 
  ggrepel::geom_text_repel(aes(label=st), size = 3) + 
  labs(x="Net Partisanship (Democratic PID)",y="Net Ideology (Liberalism)",
       title="State Partisanship vs. Ideology", subtitle="Faceted by Year (1976 & 2011)") +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~year)+
  theme_bw()

## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

This version is much clearer, with each year’s data neatly presented in its own panel and labels positioned to minimize overlap.
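The warning about unlabeled points is ggrepel’s way of saying it skipped labels that would overlap too many others. A sketch using Gapminder’s Asian countries in 2007 (the object names are our own): raising max.overlaps, here to Inf, asks geom_text_repel() to place every label, at the cost of a busier plot. Shrinking the label size also helps.

```r
library(ggplot2)
library(dplyr)
library(gapminder)
library(ggrepel)

asia_2007 <- gapminder %>% filter(year == 2007, continent == "Asia")

p_all_labels <- ggplot(asia_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # max.overlaps = Inf: never drop a label because of too many overlaps
  geom_text_repel(aes(label = country), size = 2.5, max.overlaps = Inf) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1))

p_all_labels
```

Whether every label is worth showing is a judgment call; with dense data, labeling only a subset (for example, the most populous countries) may communicate better.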

Exercises

1. Generate a faceted plot of the relationship between partisanship (pid) and state minimum wage (statemin) for the years 1980 and 2011.
2. Generate a faceted plot of the relationship between partisanship (pid) and state policy liberalism (pollib_median) for three years: 1980, 1990, and 2000.

# Check coverage first: statemin is NA for the 1976 rows shown below
states %>% filter(year %in% c("1976","2011")) %>% select(st, statemin)

## # A tibble: 98 × 2
##    st    statemin
##    <chr>    <dbl>
##  1 AL       NA   
##  2 AL        7.25
##  3 AR       NA   
##  4 AR        6.25
##  5 AZ       NA   
##  6 AZ        7.35
##  7 CA       NA   
##  8 CA        8   
##  9 CO       NA   
## 10 CO        7.36
## # ℹ 88 more rows

ggplot(states %>% filter(year %in% c("1983","2011")), 
       aes(x=pid,y=statemin)) +
  geom_smooth(method="lm",formula = y ~ x, se=F) + 
  ggrepel::geom_text_repel(aes(label=st), size = 3) + 
  labs(x="Net Partisanship (Democratic PID)",y="State Minimum Wage",
       title="State Partisanship vs. Minimum Wage", subtitle="Faceted by Year (1983 & 2011)") +
  scale_x_continuous(labels = scales::percent) +
  facet_wrap(~year)+
  theme_bw()

## Warning: ggrepel: 28 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

## Warning: ggrepel: 34 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

We can further enhance this by combining faceting with aesthetics. For instance, in addition to faceting by year, we can map the region.name variable to the color aesthetic of the text labels. This shows whether states from different regions cluster in different parts of the plot within each year. Because color is mapped only in the text layer, the smoothed line still represents the overall trend for all states in that year’s facet.

# Combining faceting by year with color aesthetic for region (on text only)
ggplot(states %>% filter(year %in% c("1976","2011")), 
       aes(x=pid, y=ideo)) + 
  geom_smooth(method="lm", formula = y ~ x, se=F) + 
  ggrepel::geom_text_repel(aes(label=st, color=region.name), size = 3) + 
  labs(x="Net Partisanship (Democratic PID)",y="Net Ideology (Liberalism)",
       title="State Partisanship vs. Ideology by Region", 
       subtitle="Faceted by Year (1976 & 2011), Text Colored by Region", 
       color = "Region") + 
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~year)+
  theme_bw()

## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

## Warning: ggrepel: 11 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

This combined approach allows for a richer comparison, showing both year-over-year changes (via facets) and regional differences within those years (via text color), with a single trend line per year.

By using faceting, we can effectively compare these relationships across multiple years, with each year presented clearly in its own panel. This makes it easier to identify changes in trends or patterns over time.

Faceting with facet_grid

While facet_wrap() is excellent for faceting by a single variable, ggplot2 also offers facet_grid(). facet_grid() creates a 2D grid of panels defined by one or two categorical variables, with rows corresponding to the levels of one variable and columns to the levels of another (e.g., facet_grid(var1 ~ var2)). This can be very useful for more structured comparisons across two dimensions.

Let’s switch from facet_wrap() to facet_grid().

First, let’s look at a new dataset: the 2016 General Social Survey (GSS), available as gss_sm in the socviz package. Notice how many of the variables are categorical, whereas most Gapminder variables are continuous.

library(socviz)  # provides the gss_sm data
head(gss_sm, n = 20)

## # A tibble: 20 × 32
##     year    id ballot       age childs sibs   degree race  sex   region income16
##    <dbl> <dbl> <labelled> <dbl>  <dbl> <labe> <fct>  <fct> <fct> <fct>  <fct>   
##  1  2016     1 1             47      3 2      Bache… White Male  New E… $170000…
##  2  2016     2 2             61      0 3      High … White Male  New E… $50000 …
##  3  2016     3 3             72      2 3      Bache… White Male  New E… $75000 …
##  4  2016     4 1             43      4 3      High … White Fema… New E… $170000…
##  5  2016     5 3             55      2 2      Gradu… White Fema… New E… $170000…
##  6  2016     6 2             53      2 2      Junio… White Fema… New E… $60000 …
##  7  2016     7 1             50      2 2      High … White Male  New E… $170000…
##  8  2016     8 3             23      3 6      High … Other Fema… Middl… $30000 …
##  9  2016     9 1             45      3 5      High … Black Male  Middl… $60000 …
## 10  2016    10 3             71      4 1      Junio… White Male  Middl… $60000 …
## 11  2016    11 2             33      5 4      High … Black Fema… Middl… under $…
## 12  2016    12 1             86      4 4      High … White Fema… Middl… under $…
## 13  2016    13 2             32      3 3      High … Black Male  Middl… $8 000 …
## 14  2016    14 3             60      5 6      High … Black Fema… Middl… $12500 …
## 15  2016    15 2             76      7 0      Lt Hi… White Male  New E… $40000 …
## 16  2016    16 3             33      2 1      High … White Fema… New E… $50000 …
## 17  2016    17 3             56      6 3      High … White Male  New E… $50000 …
## 18  2016    18 2             62      5 8      Lt Hi… Other Fema… New E… $5 000 …
## 19  2016    19 2             31      0 2      Gradu… Black Male  New E… $35000 …
## 20  2016    20 1             43      2 0      High … Black Male  New E… $25000 …
## # ℹ 21 more variables: relig <fct>, marital <fct>, padeg <fct>, madeg <fct>,
## #   partyid <fct>, polviews <fct>, happy <fct>, partners <fct>, grass <fct>,
## #   zodiac <fct>, pres12 <labelled>, wtssall <dbl>, income_rc <fct>,
## #   agegrp <fct>, ageq <fct>, siblings <fct>, kids <fct>, religion <fct>,
## #   bigregion <fct>, partners_rc <fct>, obama <dbl>

Let’s create a scatterplot of the relationship between respondent age and number of children. The small multiples cross-classify respondents by bigregion and degree. We add a linear smoother and make the points semi-transparent with alpha.

R’s formula notation specifies the cross-classification: in facet_grid(bigregion ~ degree), the variable left of the tilde defines the rows and the variable on the right defines the columns.

By default, ggplot drops observations with missing values in the plotted variables (with a warning), and facet_grid() gives missing facet values their own NA panel. Here we subset the data beforehand so that only complete cases for all four variables are plotted.

p <- ggplot(data = subset(gss_sm,!is.na(age) & !is.na(childs) & !is.na(bigregion) & !is.na(degree)),
            mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) +
    geom_smooth(method="lm", formula = y ~ x) +
    facet_grid(bigregion ~ degree) +
    theme_bw()

We can reverse the cross-classification, swapping rows and columns:

p + geom_point(alpha = 0.2) +
    geom_smooth(method="lm",formula = y ~ x) +
    facet_grid(degree ~ bigregion)
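The formula interface is not the only way to specify a grid. In current ggplot2 versions, facet_grid() also accepts rows and cols arguments wrapped in vars(), and a dot in the formula means "no faceting in this direction." A sketch (the names `gss_complete`, `p_vars`, and `p_cols_only` are ours):

```r
library(ggplot2)
library(dplyr)
library(socviz)  # provides gss_sm

# Keep complete cases for the four variables used below
gss_complete <- gss_sm %>%
  filter(!is.na(age), !is.na(childs), !is.na(bigregion), !is.na(degree))

# Same grid as bigregion ~ degree, written with vars()
p_vars <- ggplot(gss_complete, aes(x = age, y = childs)) +
  geom_point(alpha = 0.2) +
  facet_grid(rows = vars(bigregion), cols = vars(degree))

# Columns only: a single row of panels, one per degree
p_cols_only <- ggplot(gss_complete, aes(x = age, y = childs)) +
  geom_point(alpha = 0.2) +
  facet_grid(. ~ degree)

p_vars
p_cols_only
```

The vars() form is convenient when the faceting variable is chosen programmatically, since it does not require building a formula.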

Choosing Between Aesthetics and Faceting for Separation

Both mapping to aesthetics (like color or shape) and faceting are powerful ways to show data for different subgroups. How do you choose which one to use?

Generally, use aesthetics like color or shape when you have a relatively small number of groups (e.g., 2-5) and want to directly compare them on the same set of axes. This is effective for highlighting group differences within a single, unified plot, such as comparing the slopes of regression lines.

Conversely, use faceting when dealing with a larger number of groups where different colors or shapes would become confusing, or when each group needs its own dedicated space for clarity. This is the idea of “small multiples,” useful for presenting the same type of plot for many segments of your data consistently, especially if overplotting is an issue in a single panel.

Sometimes, you can even combine both strategies: use facets to separate by one variable (e.g., different years) and then use aesthetics like color or shape to distinguish subgroups within each facet (e.g., different regions within each year). We saw an example of this in the Case Study section, where facets separated years and text color distinguished regions.
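A compact sketch of that combination with Gapminder (the years 1957 and 2007 and the object names are our own choices): facets separate the two years, color distinguishes continents within each panel, and a single black regression line summarises each year.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gap_two_years <- gapminder %>% filter(year %in% c(1957, 2007))

p_combined <- ggplot(gap_two_years, aes(x = gdpPercap, y = lifeExp)) +
  # Color only the points, so the smoother below fits all continents together
  geom_point(aes(color = continent), alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) +
  facet_wrap(~ year)

p_combined
```

Mapping color inside geom_point() rather than in the global aes() is the key design choice: a global mapping would split the smoother into one line per continent.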

Combining Plots Manually with `patchwork`

While ggplot2’s faceting system is powerful for creating small multiples of similar plots, sometimes you need more flexibility to combine different types of plots or arrange them in more complex layouts. This is where the patchwork package comes in. patchwork provides an intuitive way to combine separate ggplot objects into a single figure using arithmetic operators.

Using patchwork is beneficial because it allows you to combine disparate plots, such as placing a scatter plot next to a bar chart, which faceting cannot do. It supports complex layouts, including nested arrangements and control over relative widths and heights, and lets you add overall annotations. Importantly, patchwork works with ordinary ggplot objects: you create individual plots as usual, and patchwork handles the assembly.

Let’s create two different plots using the gapminder dataset and then combine them.

Prepare data for 2007

# Filter data for 2007
data_2007 <- gapminder %>%
  filter(year == 2007)

# Create summarized data for continent averages
avg_life_exp_data <- data_2007 %>%
  group_by(continent) %>%
  summarise(avg_lifeExp = mean(lifeExp), .groups = 'drop')

Plot 1: Scatter plot of GDP vs. Life Expectancy for Asia in 2007

plot1_asia_scatter <- ggplot(data = data_2007 %>% filter(continent == "Asia"),
                             aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), color = "steelblue", alpha = 0.7) +
  ggrepel::geom_text_repel(aes(label = country), size = 3) +
  scale_x_log10(labels = scales::dollar_format(accuracy = 1)) + 
  scale_size_continuous(name = "Population", labels = scales::comma) +
  labs(title = "Asia: GDP vs. Life Expectancy (2007)",
       x = "GDP Per Capita (log scale)",
       y = "Life Expectancy") +
  theme_minimal()

plot1_asia_scatter # Optionally print individual plot

## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Plot 2: Bar chart of average Life Expectancy by Continent in 2007

plot2_avg_life_exp_bar <- avg_life_exp_data %>%
  ggplot(aes(x = reorder(continent, avg_lifeExp), y = avg_lifeExp, fill = continent)) +
  geom_col(show.legend = FALSE) +
  coord_flip() + 
  labs(title = "Average Life Expectancy by Continent (2007)",
       x = "Continent",
       y = "Average Life Expectancy (Years)") +
  theme_minimal()

# print(plot2_avg_life_exp_bar) # Optionally print individual plot

Now, let’s combine these two plots using patchwork:

- plot1 + plot2 places them side by side.
- plot1 / plot2 places plot1 above plot2.

library(patchwork)

# Combine plots side-by-side
plot1_asia_scatter + plot2_avg_life_exp_bar

## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

We can also arrange them vertically:

# Combine plots vertically
plot1_asia_scatter / plot2_avg_life_exp_bar

## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

patchwork also supports more complex arrangements. The | operator is another way to place plots side by side, and plot_layout() controls the number of columns or rows and their relative widths or heights. You can add overall titles and annotations to the assembled figure with plot_annotation().

Example with plot_layout and plot_annotation:

# More complex layout: plot1 takes up more space
plot1_asia_scatter + plot2_avg_life_exp_bar + 
  plot_layout(widths = c(2, 1)) + 
  plot_annotation(
    title = "Combined Gapminder Insights (2007)",
    caption = "Data source: Gapminder package"
  )

## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
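Nested layouts follow from the same operators. In this sketch (the three plots are new examples, not taken from the text above), parentheses group two plots side by side with |, / stacks that pair above a third plot, plot_layout(heights = c(1, 2)) makes the bottom row twice as tall, and plot_annotation(tag_levels = "A") labels the panels A, B, and C.

```r
library(ggplot2)
library(dplyr)
library(gapminder)
library(patchwork)

data_2007 <- gapminder %>% filter(year == 2007)

p_hist <- ggplot(data_2007, aes(x = lifeExp)) +
  geom_histogram(bins = 20)
p_box <- ggplot(data_2007, aes(x = continent, y = lifeExp)) +
  geom_boxplot()
p_scatter <- ggplot(data_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# Top row: histogram | boxplot; bottom row: scatter plot, twice as tall
nested_layout <- (p_hist | p_box) / p_scatter +
  plot_layout(heights = c(1, 2)) +
  plot_annotation(tag_levels = "A")

nested_layout
```

Because / binds more tightly than + in R, plot_layout() here applies to the outer two-row stack rather than to the scatter plot alone.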


R Visualization 3: Separating out the Data

Boris Shor

2025-06-03

Why Separate Data in Visualizations?

Separating Data Using Aesthetics

Separating Data Using Faceting

Basic Faceting with `facet_wrap()`

Reordering Facets using `forcats::fct_reorder()`

Customizing Facets

Adjusting the Number of Rows and Columns

Free Scales

Case Study: Statehouse Democracy

Combining Multiple Years on One Plot using Aesthetics

Using Faceting for Clarity Across Years

Exercises

Faceting with facet_grid

Choosing Between Aesthetics and Faceting for Separation

Combining Plots Manually with `patchwork`
