M04 - Advanced Data Visualization Overview

Author

Nicole Bewley-Hudson

Published

March 1, 2026


Introduction

This reflection summarizes what I learned throughout the Advanced Data Visualization module. Each section below includes the original prompt followed directly by my response. My goal is not only to summarize content, but to reflect on how my understanding of data visualization has evolved.

1. Thomas Lin Pedersen – ggplot2 Workshop (Part 2)

Prompt: Summarize what you learned from the presentation by Thomas Lin Pedersen on “ggplot2 workshop: Part 2” in Step 1.

Response:

Thomas Lin Pedersen’s workshop deepened my understanding of ggplot2 beyond simply building plots, and it reinforced why ggplot2 works the way it does. The most important takeaway for me was that the grammar of graphics is not just a plotting framework.

Important

The grammar of graphics is not just a plotting framework — it is a way of thinking.

Once you understand the theoretical backbone, you stop memorizing functions and start reasoning through structure. Visualization is no longer about decoration. It becomes about constructing meaning layer by layer.

Once you internalize this structure, you stop asking “Which function makes this chart?” and start asking “How is this constructed grammatically?”

Core Components of ggplot2

Pedersen emphasized that ggplot2 is built around composable elements that work together systematically:

Data: The information being visualized.

Aesthetics: How variables are mapped to visual properties.

Geoms: The geometric objects that represent the data.

Stats: Statistical transformations applied before rendering.

Scales: How data values are translated into visual space.

Coordinates: The spatial system that determines layout.

Themes: Stylistic elements that control appearance.

What stood out to me is that once you internalize this structure, you can deconstruct almost any visualization and understand how it was built — or how it should have been built.
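To make the components concrete, here is a minimal sketch that labels each one in a single plot. It assumes the built-in mtcars data rather than anything used in the workshop, and the specific geom, palette, and theme are my own illustrative choices.

```r
library(ggplot2)

p <- ggplot(
  data = mtcars,                                              # data
  mapping = aes(x = wt, y = mpg, colour = factor(cyl))        # aesthetic mappings
) +
  geom_point(size = 2) +                                      # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # statistical transformation
  scale_colour_brewer(palette = "Dark2", name = "Cylinders") + # scale
  coord_cartesian(xlim = c(1, 6)) +                           # coordinate system
  theme_minimal()                                             # theme

p
```

Swapping any single line (for example, the coordinate system) changes one grammatical component while leaving the rest of the plot specification untouched.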

“When you understand the grammar, you don’t need to memorize everything — you understand the logic behind it.”

Moving Beyond Core ggplot2

A major focus of Part 2 was the idea of the extended universe of ggplot2. Instead of endlessly expanding the core package, ggplot2 exposes extension mechanisms that allow other packages to build on top of the grammar. This design choice makes ggplot2 scalable without becoming unmanageable.

Three extension ideas that stood out:

Plot Composition (patchwork)
Instead of hacking facets, plot composition can be done programmatically and reproducibly.
Animation (gganimate)
Animation extends the grammar rather than replacing it. Transitions become part of the grammar itself.
Annotation & Network Extensions (ggforce, ggrepel, ggraph)
These packages expand storytelling power while still respecting the grammar structure.

This reinforced that ggplot2 is not static — it is a framework designed for extensibility.

Rethinking Chart Types

One concept that genuinely changed how I think about visualization was the idea that chart types are not fixed categories. For example, I came to understand that a pie chart is essentially a stacked bar chart transformed into polar coordinates.

Figure 1: Pie chart constructed from stacked bar chart using polar coordinates
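The code behind Figure 1 is not shown, so the following is only a sketch of the idea, assuming cylinder counts from the built-in mtcars data as a stand-in. The same stacked bar becomes a pie chart once the coordinate system is switched to polar.

```r
library(ggplot2)

# Count cars per cylinder class; the result has columns `cyl` and `Freq`
cyl_counts <- as.data.frame(table(cyl = factor(mtcars$cyl)))

# A single stacked bar: one x position, segments filled by category
bar <- ggplot(cyl_counts, aes(x = "", y = Freq, fill = cyl)) +
  geom_col(width = 1) +
  labs(x = NULL, y = "Cars", fill = "Cylinders")

# Same data, same geom: only the coordinate system changes
pie <- bar + coord_polar(theta = "y") + theme_void()

bar
pie
```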

2. ggplot2 Scales and Transformations

Prompt:

In Step 2, a short video on “ggplot2 Scales and Transformations - Data Communication / Data Visualization | Video (13:36)” was introduced. (1) Summarize it and (2) discuss how it fits with the grammar of graphics you learned in Step 1.

Response:

Core Idea

Scales translate raw data into visual meaning.

Summary of the Video

The video explored how scales operate within ggplot2 and how they determine the way raw data becomes visible on a graph. Every mapped aesthetic, whether x, y, color, fill, or size, automatically has a scale even when we do not explicitly define one. The presenter emphasized the distinction between continuous and discrete scales and showed how ggplot2 allows us to modify them using scale_*_*() functions.

Through examples, he demonstrated how scales can relabel values, such as converting 0.06 into 6%, adjust limits to control displayed range, manually assign specific colors to categories, apply transformations such as log or reverse scaling, and bin continuous variables into grouped categories. The central insight was that scales are responsible for transforming raw data into visual representations. They determine not just labels, but how magnitude, spacing, and structure appear to the viewer.

For example, formatting percentages directly within the scale function makes the transformation explicit in code:

scale_y_continuous(
  labels = scales::percent,
  limits = c(0, 0.1)
)

Connection to Grammar of Graphics

This video connects directly to the grammar of graphics from Step 1. In the grammar framework, we map variables to aesthetics. However, scales determine how those mapped aesthetics are rendered and interpreted.

Grammar Component	Role	Example from Video
Aesthetics	Map variables to axes or color	x, y, fill
Scales	Translate mapped values	percent labels
Transformations	Adjust structural perception	log scaling

Transformations such as scale_y_log10() or reversing an axis with scale_x_reverse() do not change the underlying dataset. Instead, they alter how the data is visually structured and interpreted. That distinction clarified something important for me: scales are not cosmetic adjustments. They are structural components of the grammar that influence how magnitude, grouping, and relationships are perceived.

Note

Understanding this helped me see that applying a transformation is not simply “formatting” a chart — it reshapes how the audience understands patterns and variation. The data remains constant, but the interpretation can shift significantly depending on how the scale is defined.

Demonstrating the Effect of a Scale Transformation

As noted above, a transformation such as scale_y_log10() changes how values are displayed, not the values themselves.

To illustrate this, the same dataset is plotted twice below. The only difference is the y-axis scale.

Figure 2: Exponential growth displayed on a linear scale (left) and a log-transformed scale (right)
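A comparison like this could be produced along the following lines. The doubling series is synthetic data I made up for illustration, not the data behind Figure 2.

```r
library(ggplot2)

# Synthetic exponential growth: the value doubles every day
growth <- data.frame(day = 0:20)
growth$cases <- 2^growth$day

base <- ggplot(growth, aes(day, cases)) +
  geom_line()

# Identical data and mapping; only the y scale differs
linear <- base +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(title = "Linear scale")

logged <- base +
  scale_y_log10(labels = scales::label_comma()) +
  labs(title = "Log10 scale")

linear
logged
```

On the linear scale the early days look flat; on the log scale the constant doubling rate shows up as a straight line, even though the `growth` data frame is the same in both plots.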

3. David Robinson – Impromptu Visualization

Prompt:

In Step 3, David Robinson’s video demonstrates impromptu data visualization. List three new functions that were not taught previously but were deemed helpful for you in the future. Briefly explain what you can do with each of the functions.

Response:

Below are three functions I found especially useful:

Table 1: Functions I Plan to Use in the Future

Function	What It Does	Why It Matters
`tidyr::pivot_longer()`¹	Reshapes data from wide to long so multiple related columns can be analyzed using one consistent workflow.	Makes it easier to facet, compare metrics, and avoid rewriting the same plot and summary code for each variable.
`tidyr::nest()`²	Splits a dataset into grouped mini datasets stored inside a list column.	Makes repeated analysis scalable, especially when you want to run the same model or summary separately for each group.
`broom::tidy()`³	Converts model output into a tidy data frame with terms, estimates, and p values.	Turns statistical results into something you can filter, compare, and visualize directly in a tidy workflow.

As shown in Table 1, these tools expand my ability to explore data quickly and flexibly.
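The three functions chain together naturally. The sketch below is my own example on the built-in mtcars data, not code from the video: it reshapes two predictors into long form, fits one regression per predictor, and collects the coefficients in a single data frame.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Wide to long: wt and hp become rows of (predictor, value)
long <- mtcars |>
  select(mpg, wt, hp) |>
  pivot_longer(cols = c(wt, hp), names_to = "predictor", values_to = "value")

# One nested data frame per predictor, one model per nested data frame
models <- long |>
  nest(data = -predictor) |>
  mutate(
    fit = map(data, \(d) lm(mpg ~ value, data = d)),
    coefs = map(fit, tidy)   # tidy() turns each model into a data frame
  ) |>
  select(predictor, coefs) |>
  unnest(coefs)

models
```

The result has one row per model term, so it can be filtered or plotted like any other tidy data frame.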

4. gt and gtExtra Packages

Prompt:

What did you like about the gt and gtExtra packages demonstrated in Step 4? How do they complement data visualization using charts that we have spent so much time on so far? Under what circumstances would you prefer to use tables over charts in visualization data?

Response:

Tip

Tables are not “less visual” than charts — they are visual by design when structured intentionally.

What I appreciated most about the gt and gtExtra packages in Step 4 is that they treat tables as a deliberate communication tool rather than a secondary output. The discussion emphasized that tidy data is ideal for analysis, but presentation tables often require reshaping and design decisions so that comparisons become intuitive for the reader.⁴

A well-designed table reduces cognitive load by guiding comparison through alignment, spacing, grouping, and emphasis.

What I Liked Most

The demonstration showed how removing all dividers creates “numbers on a white canvas,” which forces the reader to work harder. Subtle header separation and intentional spacing make it clear what should be compared.

Right-aligning numbers supports place-value comparison because decimals line up vertically. Left-aligning text supports fast scanning. This is not cosmetic — it directly affects readability.

The discussion about decimal places reinforced that too much precision can imply false accuracy. Formatting numbers appropriately ensures the table communicates meaning instead of noise.

Using logic-based styling (e.g., highlighting negative values) allows the data itself to guide attention. This transforms a table into a structured narrative rather than a static output.

Example: A Reader-First Table Using `gt`

Table 2: A structured gt table designed to guide comparison and control precision

Top MPG Vehicles (mtcars)
Structured for fast comparison
Vehicle	MPG	Cyl	HP	Weight (1000 lbs)
Toyota Corolla	33.9	4	65.0	1.8
Fiat 128	32.4	4	66.0	2.2
Honda Civic	30.4	4	52.0	1.6
Lotus Europa	30.4	4	113.0	1.5
Fiat X1-9	27.3	4	66.0	1.9
Porsche 914-2	26.0	4	91.0	2.1
Merc 240D	24.4	4	62.0	3.2
Datsun 710	22.8	4	93.0	2.3

As shown in Table 2, formatting decisions such as alignment, controlled decimals, and subtle highlighting can make comparison intuitive without overwhelming the reader.
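The chunk that produced Table 2 is hidden in the rendered report, so this is only one way such a table might be built. The highlighting rule (bold MPG values of 30 or more) is an assumption chosen for illustration.

```r
library(gt)

# Eight most fuel-efficient cars, keeping only the columns shown in the table
top_mpg <- mtcars[order(-mtcars$mpg), c("mpg", "cyl", "hp", "wt")]
top_mpg <- head(top_mpg, 8)

tbl <- top_mpg |>
  gt(rownames_to_stub = TRUE) |>            # car names become the row labels
  tab_header(
    title = "Top MPG Vehicles (mtcars)",
    subtitle = "Structured for fast comparison"
  ) |>
  cols_label(mpg = "MPG", cyl = "Cyl", hp = "HP", wt = "Weight") |>
  fmt_number(columns = c(mpg, hp, wt), decimals = 1) |>   # controlled precision
  cols_align(align = "right", columns = c(mpg, cyl, hp, wt)) |>
  tab_style(                                # logic-based emphasis (assumed rule)
    style = cell_text(weight = "bold"),
    locations = cells_body(columns = mpg, rows = mpg >= 30)
  )

tbl
```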

How Tables Complement Charts

Charts work best for…

Trends
Distributions
Relationships
Patterns at a glance

Tables work best for…

Exact values
Lookup tasks
Policy/reporting contexts
Dense comparison across multiple variables

Important

I would prefer tables over charts when precision matters more than pattern recognition. This is especially true in auditing, reporting, or decision-making contexts where exact values must be verified.

Step 4 changed how I think about tables. Instead of viewing them as secondary to charts, I now see them as a complementary visualization format. When thoughtfully structured using gt, tables can communicate hierarchy, emphasis, and comparison just as effectively as charts — and in some cases, more effectively when exact values matter.

Rather than asking “chart or table?”, I now think:
Which format best supports the decision the reader needs to make?

5. Cédric Scherer – Tips and Tricks

Prompt:

Pick three tips and tricks you learned from Data Visualization guru Cédric Scherer in Step 5 and explain why you liked them.

Response:

Cédric Scherer’s talk reframed ggplot2 for me. Instead of presenting it as a plotting tool, he positioned it as a design system built on the grammar of graphics, one that allows you to build visualizations intentionally, layer by layer, by controlling even the smallest details.

What stood out most was his reminder that impressive visualizations are not the result of magic; they are the result of deliberate code and thoughtful refinement. Below are three specific tips that I found especially impactful.

1. Elevating Typography with `ggtext`

One of Scherer’s most practical and powerful tricks was using the ggtext package to render Markdown and HTML inside ggplot elements. In his example of penguin bill dimensions, he corrected the non-italicized genus name in the title by using Markdown syntax and converting the theme element to element_markdown().

This detail may seem small, but it completely changes the professionalism of a plot.

Why I liked this tip

It reinforces that text is part of the design — not just decoration.
It allows scientific accuracy (e.g., italicizing genus names).
It creates visual hierarchy through bolding, coloring, and formatting.
It moves plots from “default” to publication-ready.

Example

Figure 3: Using ggtext to render markdown in plot titles
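A minimal sketch of the technique, assuming the palmerpenguins data; the title wording is my own and not copied from Scherer's slide. The asterisks are Markdown, and `element_markdown()` is what tells ggplot2 to render them as italics.

```r
library(ggplot2)
library(ggtext)

# Call the dataset by namespace to avoid clashing with datasets::penguins
p <- ggplot(
  palmerpenguins::penguins,
  aes(bill_length_mm, bill_depth_mm, colour = species)
) +
  geom_point(na.rm = TRUE) +
  labs(
    title = "Bill dimensions of brush-tailed penguins (*Pygoscelis* spp.)",
    x = "Bill length (mm)",
    y = "Bill depth (mm)",
    colour = "Species"
  ) +
  theme(plot.title = element_markdown())   # render Markdown in the title

p
```

Without the `theme()` line, the title would print the literal asterisks instead of an italic genus name.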

This aligns with Scherer’s broader philosophy on his website: thoughtful typography and labeling are essential components of effective visual storytelling.

2. Improving ggplot Defaults (Small Tweaks, Big Impact)

Another insight I appreciated was Scherer’s focus on refining ggplot’s default settings. He demonstrated how small design changes — such as aligning titles properly, adjusting legend placement, removing unnecessary axis expansion, or modifying margins — can dramatically improve visual balance.

For example, he explained that by default, plot titles align with the panel, not the outer margin. Setting plot.title.position = "plot" inside theme() aligns the title with the left edge of the whole plot instead.

Why I liked this tip

It highlights that design lives in the margins.
It shows that polish comes from iteration.
It reinforces intentional alignment and spatial balance.
It encourages thinking beyond defaults.

Example
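The tweaks described above might look like this in code. This is a sketch on the built-in mtcars data, and the specific margin and expansion values are arbitrary choices of mine rather than settings from the talk.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  scale_x_continuous(expand = expansion(mult = 0.02)) +  # trim default axis padding
  labs(
    title = "Heavier cars travel fewer miles per gallon",
    caption = "Data: mtcars",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  ) +
  theme_minimal() +
  theme(
    plot.title.position = "plot",       # align title with the full plot, not the panel
    plot.caption.position = "plot",     # same for the caption
    legend.position = "top",            # move the legend out of the reading path
    plot.margin = margin(15, 15, 10, 15)
  )

p
```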

This tip reflects Scherer’s emphasis on incremental improvement. As he demonstrated, even subtle spacing and alignment adjustments significantly elevate the visual outcome.

3. Using `patchwork` for Seamless Composition

Scherer also highlighted the patchwork package as a powerful way to combine multiple plots. Rather than relying on complex layout tools, patchwork uses intuitive operators like + and / to combine plots.

This approach reflects his broader idea that visualization should be modular and compositional.

Why I liked this tip

It simplifies multi-panel storytelling.
It keeps layout reproducible.
It aligns grids automatically.
It enables dashboard-style visualizations.

Example

Figure 5: Combining plots using patchwork for clean composition
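As an illustration of the operators, here is a short sketch with three placeholder plots built from mtcars; the plots themselves are my own stand-ins, not the ones in Figure 5.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# `+` places plots side by side, `/` stacks them vertically
combined <- (p1 + p2) / p3 +
  plot_annotation(title = "Fuel economy at a glance", tag_levels = "A")

combined
```

Because the layout is an ordinary R expression, rearranging panels is a one-line change and stays reproducible.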

This layered composition approach mirrors the complex multi-panel visualizations Scherer showcased, such as energy dashboards and geofaceted layouts.

Broader Takeaway: Design as Iteration and Community

Beyond individual tricks, what resonated most with me was Scherer’s mindset. He emphasized:

Learning from open-source code (especially through TidyTuesday)
Experimenting beyond traditional chart types
Iterating layer by layer
Treating visualization as both analytical and aesthetic

This philosophy is reinforced on his personal website (https://www.cedricscherer.com/top/about/), where he describes combining scientific rigor with design principles. His work demonstrates that visualization is not merely about presenting results, but about crafting clarity through thoughtful decisions.

On his data visualization portfolio page (https://www.cedricscherer.com/top/dataviz/), this mindset becomes tangible. The projects illustrate how typography, spacing, annotation, layout, and composition all work together to guide interpretation. None of the visualizations rely on default settings — they reflect deliberate refinement and design awareness.

Key Insight

What changed for me was the shift from simply making plots to intentionally designing visual communication. That mindset, thinking about alignment, hierarchy, clarity, and refinement, was the most valuable insight I gained from this session.

6. Advanced Visualization Notebook

Prompt:

In Step 6, you will find my advanced visualization notebook. Run the codes one by one and digest as much as you can. What are you most impressed with about the visualization approach or tools demonstrated in the codebook?

Response:

What impressed me most about the advanced visualization codebook is how it shows the workflow of building meaning—not just “making a plot.” The notebook is structured like a toolbox that moves from exploration (finding patterns) to communication (designing charts, animations, maps, and tables that tell a story clearly).

Advanced Data Visualization Reference Guide

1. Setup and Data Preparation

Loading Packages
Loading and Preparing Data
Preparing Data for ANOVA Framework

2. Correlation Visualizations

GGally::ggcorr()
corrplot::corrplot()
- Basic Correlation Plots
- Mixed Correlation Plots
psych::pairs.panels()

3. Barplots with Error Bars (APA 7 Style)

Simple Implementation
Complete Customization

4. Scatterplots and Multi-Variable Visualizations

Scatter Plot (5 Variables)
- Standard Version
- Log-Transformed Version
geom_point (3 Variables)
geom_count (3 Variables)
Bubble Charts (4–5 Variables)

5. Interactive Visualizations (Plotly)

Adding Country Names to Tooltips
Splitting Continents into Panels
Customizing Hover Text
Interactive Bubble Charts

6. Line Plots (Multiple Variables)

Highlighting Two Countries
Highlighting Multiple Countries by Continent

7. Animation

Required Packages: gganimate + gifski
Creating an Animated Bubble Chart
Rendering and Saving Animations

8. Geospatial Mapping

World Mapping (rnaturalearth)

Saving Map Data
World Map
Africa Map
US Map
- States
Brazil: São Paulo Example

Mapping US Cities

Required Mapping Packages
US State Map
Binned Choropleth Map

9. Axis Customization

Adjusting X and Y Scales
Log Transformations
Formatting Labels

10. Publication-Ready Tables (GT Package)

Continuous Data Tables
Categorical Data Tables
Plain gt()
Custom Table Formatting

11. Enhanced Tables with gtExtra

Basic Plotting with Iris
mtcars Examples
- gt_plt_summary()
- gt_plt_dist()
- gt_plt_sparkline()
- Density Distributions
- Histogram Distributions
Survey Data Examples

12. References

Course materials
Package documentation
Webinar resources

Note

Even when some chunks feel “beyond the scope,” the big takeaway is that R/ggplot2 lets you scale up from simple charts to professional outputs by stacking small decisions: data → mapping → layers → scales → annotation → layout → export.

1. High-Dimensional Thinking: Managing Multiple Variables

One of the most important skills demonstrated in the codebook is how to visualize multiple variables simultaneously without overwhelming the viewer. Sections such as “Scatter Plot (5 Variables),” “geom_point (3 Variables),” and “Interactive Bubble Chart (4–5 Variables)” illustrate how complexity can remain readable when visual mappings are intentional.

Rather than adding variables randomly, the notebook layers meaning strategically:

Position encodes primary relationships.
Color differentiates categories.
Size represents magnitude.
Faceting separates groups for clarity.

The key takeaway is that complexity itself is not the problem—poor structure is. When aesthetic mappings are purposeful and supported by clean labeling and thoughtful scaling, high-dimensional charts remain interpretable rather than chaotic.
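The four encodings can be combined in a single plot. The notebook's own data is not reproduced here, so this sketch uses mtcars as a stand-in and encodes five variables: two by position, one by color, one by size, and one by facet.

```r
library(ggplot2)

cars <- mtcars
cars$transmission <- ifelse(cars$am == 1, "Manual", "Automatic")

p <- ggplot(cars, aes(x = wt, y = mpg)) +            # position: primary relationship
  geom_point(
    aes(colour = factor(cyl), size = hp),            # colour: category, size: magnitude
    alpha = 0.7
  ) +
  facet_wrap(~ transmission) +                       # facets: separate groups
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    size = "Horsepower"
  ) +
  theme_minimal()

p
```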

2. Moving Beyond Static Charts: Interactivity and Animation

The transition from static to dynamic visualization represents a conceptual shift in how data is communicated.

Interactivity (plotly)

Interactive bubble charts demonstrate how tooltips and hover information can replace cluttered text labels. Instead of forcing all information onto the canvas, interaction allows users to explore details selectively. Filtering and panel splitting further reduce visual overload while preserving information richness.

This approach reframes visualization as exploration rather than passive viewing.
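A hover tooltip of this kind can be sketched as follows. The notebook's example shows country names; here car model names from mtcars are used as a stand-in, and the tooltip text format is my own.

```r
library(ggplot2)
library(plotly)

cars <- mtcars
cars$model <- rownames(mtcars)

# The `text` aesthetic is not drawn by ggplot2; ggplotly() uses it for the tooltip
p <- ggplot(
  cars,
  aes(wt, mpg, size = hp, colour = factor(cyl),
      text = paste0(model, "<br>Horsepower: ", hp))
) +
  geom_point(alpha = 0.7) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Show only the custom text on hover instead of every mapped aesthetic
widget <- ggplotly(p, tooltip = "text")

widget
```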

Animation (gganimate + gifski)

Animation introduces time as motion rather than compression. Instead of squeezing temporal data into a crowded static image, gganimate reveals change progressively through transitions.

The workflow—from building the plot to rendering and exporting a GIF—also highlights an important insight: effective visualization includes consideration of output format and audience consumption, not just code execution.
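The build, render, and export steps might be sketched like this. The three-group trend data is synthetic and the output file name is hypothetical; rendering is wrapped in `interactive()` so the plot specification can be checked without producing a GIF.

```r
library(ggplot2)
library(gganimate)

# Synthetic data: three groups moving at different rates from 2000 to 2010
trend <- expand.grid(
  year = 2000:2010,
  group = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
rate <- c(A = 1.0, B = 1.5, C = 2.0)
trend$x <- (trend$year - 2000) * rate[trend$group]
trend$y <- trend$x^1.2 + 5
trend$size <- 10 + trend$x

# An ordinary ggplot plus a transition: time is shown as motion
anim <- ggplot(trend, aes(x, y, size = size, colour = group)) +
  geom_point(alpha = 0.7) +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  ease_aes("linear")

# Render and save only in an interactive session (requires the gifski package)
if (interactive()) {
  gif <- animate(anim, nframes = 50, fps = 10, renderer = gifski_renderer())
  anim_save("trend.gif", animation = gif)   # hypothetical file name
}
```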

The mapping sections expand visualization into the spatial domain using rnaturalearth and sf. These examples reinforce that maps are not simply images; they are the result of:

Geometric data structures
Data joins
Coordinate reference systems
Layered styling decisions
Controlled zoom levels

By progressing from world maps to regional and state-level views, the notebook demonstrates scalability in spatial visualization. Once the workflow—geometry + attribute data + styling—is understood, the same structure can be applied across different geographic scales.

Spatial visualization becomes a form of structured storytelling rather than decorative cartography.
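The geometry + attribute data + styling workflow can be sketched without downloading anything. Instead of rnaturalearth, this example substitutes the North Carolina county shapefile that ships with the sf package. The small attribute table and the choice of counties are hypothetical.

```r
library(ggplot2)
library(sf)

# 1. Geometry: county polygons bundled with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# 2. Attribute data (hypothetical) joined to the geometry by county name
county_info <- data.frame(
  NAME = c("Wake", "Durham", "Orange"),
  highlight = "Focus county"
)
nc_joined <- merge(nc, county_info, by = "NAME", all.x = TRUE)

# 3. Coordinate reference system: NAD83 / North Carolina (EPSG:32119)
nc_proj <- st_transform(nc_joined, crs = 32119)

# 4. Controlled zoom: bounding box of the highlighted counties
bb <- st_bbox(nc_proj[!is.na(nc_proj$highlight), ])

# 5. Layered styling
map <- ggplot(nc_proj) +
  geom_sf(aes(fill = highlight), colour = "white", linewidth = 0.2) +
  scale_fill_manual(
    values = c("Focus county" = "steelblue"),
    na.value = "grey85",
    name = NULL
  ) +
  coord_sf(
    xlim = c(bb[["xmin"]], bb[["xmax"]]),
    ylim = c(bb[["ymin"]], bb[["ymax"]])
  ) +
  theme_void()

map
```

Dropping the `coord_sf()` limits returns the full state view, which is the same change of scale the notebook makes when it moves between world, regional, and state maps.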

Across all sections, the underlying theme is intentional design. Advanced visualization is not about adding features—it is about managing complexity, guiding interpretation, and choosing the appropriate tool (static, interactive, animated, or spatial) for the analytical question being asked.

Tip

A resource that connects directly to this “grammar + layering” mindset is the online ggplot2 book. I like it because it explains not only how to code plots, but why the structure works—which makes it easier to troubleshoot and improve designs over time.

Reference: ggplot2 book (online)

7. Personal Challenge and Growth

Prompt:

What seems to be the challenge, if any, for you when you try to master data visualization skills? How may you be able to overcome the challenge?

Response:

#| echo: false
# gt builds the table; tibble holds the reflection data
library(gt)
library(tibble)

# One row per challenge: what makes it hard and how I plan to address it
reflection <- tibble(
  Challenge = c(
    "Managing Complexity",
    "Choosing the Right Tool",
    "Balancing Design and Analysis"
  ),
  Difficulty = c(
    "Too many variables can create clutter",
    "Uncertainty about which visualization format to use",
    "Focusing on code instead of communication"
  ),
  Solution = c(
    "Use intentional aesthetic mappings and simplify",
    "Start with the research question, then choose format",
    "Prioritize clarity and audience understanding"
  )
)

# Render as a gt table with a title and subtitle
# (the %>% pipe is re-exported by gt, so no extra package is needed)
reflection %>%
  gt() %>%
  tab_header(
    title = "Challenges in Mastering Data Visualization",
    subtitle = "Reflection on Skill Development"
  )

Challenges in Mastering Data Visualization
Reflection on Skill Development
Challenge	Difficulty	Solution
Managing Complexity	Too many variables can create clutter	Use intentional aesthetic mappings and simplify
Choosing the Right Tool	Uncertainty about which visualization format to use	Start with the research question, then choose format
Balancing Design and Analysis	Focusing on code instead of communication	Prioritize clarity and audience understanding

Footnotes

¹ pivot_longer() gathers multiple columns into a pair of name and value columns.
² nest() creates grouped mini data frames stored in a list column.
³ tidy() converts model output into a clean data frame for interpretation and visualization.
⁴ This distinction between analysis-friendly structure and reader-friendly presentation was one of the most important conceptual takeaways from Step 4.
