M04 - Advanced Data Visualization Overview
Introduction
This reflection summarizes what I learned throughout the Advanced Data Visualization module. Each section below includes the original prompt followed directly by my response. My goal is not only to summarize content, but to reflect on how my understanding of data visualization has evolved.
1. Thomas Lin Pedersen – ggplot2 Workshop (Part 2)
Prompt: Summarize what you learned from the presentation by Thomas Lin Pedersen on “ggplot2 workshop: Part 2” in Step 1.
Response:
Thomas Lin Pedersen’s workshop deepened my understanding of ggplot2 beyond simply building plots, and it reinforced why ggplot2 works the way it does. The most important takeaway for me was this:
The grammar of graphics is not just a plotting framework — it is a way of thinking.
Once you understand the theoretical backbone, you stop memorizing functions and start reasoning through structure. Visualization is no longer about decoration. It becomes about constructing meaning layer by layer.
Once you internalize this structure, you stop asking “Which function makes this chart?” and start asking “How is this constructed grammatically?”
Core Components of ggplot2
Pedersen emphasized that ggplot2 is built around composable elements that work together systematically:
- Data: The information being visualized.
- Aesthetics: How variables are mapped to visual properties.
- Geometries: The geometric objects that represent the data.
- Statistics: Statistical transformations applied before rendering.
- Scales: How data values are translated into visual space.
- Coordinates: The spatial system that determines layout.
- Themes: Stylistic elements that control appearance.
What stood out to me is that once you internalize this structure, you can deconstruct almost any visualization and understand how it was built — or how it should have been built.
“When you understand the grammar, you don’t need to memorize everything — you understand the logic behind it.”
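A minimal sketch of how these components show up in code, assuming ggplot2's built-in `mpg` dataset (my choice for illustration, not an example from the workshop). Each line is annotated with the grammar component it supplies:

```r
library(ggplot2)

# One plot, with every grammar component written out explicitly.
p <- ggplot(
  data = mpg,                                      # data
  mapping = aes(x = displ, y = hwy, color = drv)   # aesthetics
) +
  geom_point(alpha = 0.6) +                        # geometry
  stat_smooth(method = "lm", formula = y ~ x,      # statistic (fitted lines)
              se = FALSE) +
  scale_color_brewer(palette = "Dark2") +          # scale
  coord_cartesian(ylim = c(10, 45)) +              # coordinates
  theme_minimal()                                  # theme

p
```

Removing or swapping any single line changes one component while leaving the rest of the plot intact, which is what makes the structure easy to reason about.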
Moving Beyond Core ggplot2
A major focus of Part 2 was the idea of the extended universe of ggplot2. Instead of endlessly expanding the core package, ggplot2 exposes extension mechanisms that allow other packages to build on top of the grammar. This design choice makes ggplot2 scalable without becoming unmanageable.
Three extension ideas that stood out:
- Plot Composition (patchwork): Instead of hacking facets, plot composition can be done programmatically and reproducibly.
- Animation (gganimate): Animation extends the grammar rather than replacing it. Transitions become part of the grammar itself.
- Annotation & Network Extensions (ggforce, ggrepel, ggraph): These packages expand storytelling power while still respecting the grammar structure.
This reinforced that ggplot2 is not static — it is a framework designed for extensibility.
Rethinking Chart Types
One concept that genuinely changed how I think about visualization was the idea that chart types are not fixed categories. For example, I came to understand that a pie chart is essentially a stacked bar chart transformed into polar coordinates.
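The pie chart idea can be sketched in a few lines. This is an illustration of my own using the built-in `mtcars` data, not code from the workshop: the same stacked bar becomes a pie chart once the coordinate system is switched to polar.

```r
library(ggplot2)

# Count cars per cylinder class (columns: cyl, Freq).
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))

# A single stacked bar: one x position, segments stacked along y.
stacked_bar <- ggplot(cyl_counts, aes(x = "", y = Freq, fill = cyl)) +
  geom_col(width = 1)

# The same layers, drawn in polar coordinates with y mapped to the angle.
pie <- stacked_bar + coord_polar(theta = "y")

stacked_bar
pie
```

Only the coordinate system differs between the two plots; the data, aesthetics, and geometry are identical.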
2. ggplot2 Scales and Transformations
Prompt:
In Step 2, a short video on “ggplot2 Scales and Transformations - Data Communication / Data Visualization | Video (13:36)” was introduced. (1) Summarize it and (2) discuss how it fits with the grammar of graphics you learned in Step 1.
Response:
Scales translate raw data into visual meaning.
Summary of the Video
The video explored how scales operate within ggplot2 and how they determine the way raw data becomes visible on a graph. Every mapped aesthetic, whether x, y, color, fill, or size, automatically has a scale even when we do not explicitly define one. The presenter emphasized the distinction between continuous and discrete scales and showed how ggplot2 allows us to modify them using scale_*_*() functions.
Through examples, he demonstrated how scales can relabel values, such as converting 0.06 into 6%, adjust limits to control displayed range, manually assign specific colors to categories, apply transformations such as log or reverse scaling, and bin continuous variables into grouped categories. The central insight was that scales are responsible for transforming raw data into visual representations. They determine not just labels, but how magnitude, spacing, and structure appear to the viewer.
For example, formatting percentages directly within the scale function makes the transformation explicit in code:
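```r
# Added to an existing ggplot object with `+`
scale_y_continuous(
  labels = scales::percent,  # show 0.06 as 6%
  limits = c(0, 0.1)         # restrict the displayed range to 0%-10%
)
```
Connection to Grammar of Graphics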
The video connects directly to the grammar of graphics from Step 1. In the grammar framework, we map variables to aesthetics. However, scales determine how those mapped aesthetics are rendered and interpreted.
| Grammar Component | Role | Example from Video |
|---|---|---|
| Aesthetics | Map variables to axes or color | x, y, fill |
| Scales | Translate mapped values | percent labels |
| Transformations | Adjust structural perception | log scaling |
Transformations such as scale_y_log10() or reversing an axis with scale_x_reverse() do not change the underlying dataset. Instead, they alter how the data is visually structured and interpreted. That distinction clarified something important for me: scales are not cosmetic adjustments. They are structural components of the grammar that influence how magnitude, grouping, and relationships are perceived.
Understanding this helped me see that applying a transformation is not simply “formatting” a chart — it reshapes how the audience understands patterns and variation. The data remains constant, but the interpretation can shift significantly depending on how the scale is defined.
Demonstrating the Effect of a Scale Transformation
To illustrate this, the same dataset is plotted twice below. The only difference is the y-axis scale.
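The rendered figures are not included in this text. A minimal sketch of this kind of comparison, assuming ggplot2's built-in `msleep` data rather than whichever dataset was originally plotted:

```r
library(ggplot2)

# Shared base plot: body weight against total sleep for mammals in msleep.
base <- ggplot(msleep, aes(x = sleep_total, y = bodywt)) +
  geom_point()

# Version 1: default linear y-axis. A few very heavy animals dominate.
linear_plot <- base + labs(title = "Linear y-axis", y = "Body weight (kg)")

# Version 2: log10 y-axis. Same data, but small and large animals
# are now spread across the full height of the panel.
log_plot <- base + scale_y_log10() +
  labs(title = "Log10 y-axis", y = "Body weight (kg, log scale)")

linear_plot
log_plot

# The underlying data frame is untouched by the scale.
identical(linear_plot$data, log_plot$data)
```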
3. David Robinson – Impromptu Visualization
Prompt:
In Step 3, David Robinson’s video demonstrates impromptu data visualization. List three new functions that were not taught previously but were deemed helpful for you in the future. Briefly explain what you can do with each of the functions.
Response:
Below are three functions I found especially useful:
| Function | What It Does | Why It Matters |
|---|---|---|
| `tidyr::pivot_longer()` [1] | Reshapes data from wide to long so multiple related columns can be analyzed using one consistent workflow. | Makes it easier to facet, compare metrics, and avoid rewriting the same plot and summary code for each variable. |
| `tidyr::nest()` [2] | Splits a dataset into grouped mini datasets stored inside a list column. | Makes repeated analysis scalable, especially when you want to run the same model or summary separately for each group. |
| `broom::tidy()` [3] | Converts model output into a tidy data frame with terms, estimates, and p values. | Turns statistical results into something you can filter, compare, and visualize directly in a tidy workflow. |
As shown in Table 1, these tools expand my ability to explore data quickly and flexibly.
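A short sketch of how the three functions fit together, assuming the built-in `iris` data and a simple linear model chosen purely for illustration (this is not the analysis from the video):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# pivot_longer(): four measurement columns become name/value pairs.
long <- iris %>%
  pivot_longer(cols = -Species, names_to = "measure", values_to = "value")

# nest(): one row per species, with that species' rows in a list column.
nested <- iris %>%
  nest(data = -Species)

# broom::tidy(): fit the same model per species and collect the
# coefficients into a single tidy data frame.
models <- nested %>%
  mutate(
    fit   = map(data, ~ lm(Sepal.Length ~ Petal.Length, data = .x)),
    coefs = map(fit, tidy)
  ) %>%
  select(Species, coefs) %>%
  unnest(coefs)

models
```

The resulting `models` table has one row per species and model term, so it can be filtered or passed straight into ggplot2.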
4. gt and gtExtra Packages
Prompt:
What did you like about the gt and gtExtra packages demonstrated in Step 4? How do they complement data visualization using charts that we have spent so much time on so far? Under what circumstances would you prefer to use tables over charts in visualization data?
Response:
Tables are not “less visual” than charts — they are visual by design when structured intentionally.
What I appreciated most about the gt and gtExtra packages in Step 4 is that they treat tables as a deliberate communication tool rather than a secondary output. The discussion emphasized that tidy data is ideal for analysis, but presentation tables often require reshaping and design decisions so that comparisons become intuitive for the reader. [4]
A well-designed table reduces cognitive load by guiding comparison through alignment, spacing, grouping, and emphasis.
What I Liked Most
The demonstration showed how removing all dividers creates “numbers on a white canvas,” which forces the reader to work harder. Subtle header separation and intentional spacing make it clear what should be compared.
Right-aligning numbers supports place-value comparison because decimals line up vertically. Left-aligning text supports fast scanning. This is not cosmetic — it directly affects readability.
The discussion about decimal places reinforced that too much precision can imply false accuracy. Formatting numbers appropriately ensures the table communicates meaning instead of noise.
Using logic-based styling (e.g., highlighting negative values) allows the data itself to guide attention. This transforms a table into a structured narrative rather than a static output.
Example: A Reader-First Table Using gt
Table 2: gt table designed to guide comparison and control precision

Top MPG Vehicles (mtcars)
Structured for fast comparison

| Vehicle | MPG | Cyl | HP | Weight (1000 lbs) |
|---|---|---|---|---|
| Toyota Corolla | 33.9 | 4 | 65.0 | 1.8 |
| Fiat 128 | 32.4 | 4 | 66.0 | 2.2 |
| Honda Civic | 30.4 | 4 | 52.0 | 1.6 |
| Lotus Europa | 30.4 | 4 | 113.0 | 1.5 |
| Fiat X1-9 | 27.3 | 4 | 66.0 | 1.9 |
| Porsche 914-2 | 26.0 | 4 | 91.0 | 2.1 |
| Merc 240D | 24.4 | 4 | 62.0 | 3.2 |
| Datsun 710 | 22.8 | 4 | 93.0 | 2.3 |
As shown in Table 2, formatting decisions such as alignment, controlled decimals, and subtle highlighting can make comparison intuitive without overwhelming the reader.
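The code behind the table is not shown in this text. A minimal sketch of how a similar table could be built with gt, where the highlighting rule (bold MPG above 30) is a hypothetical choice of mine:

```r
library(dplyr)
library(tibble)
library(gt)

# Eight most fuel-efficient cars in mtcars, with names as a column.
top_mpg <- mtcars %>%
  rownames_to_column("vehicle") %>%
  arrange(desc(mpg)) %>%
  head(8) %>%
  select(vehicle, mpg, cyl, hp, wt)

tbl <- top_mpg %>%
  gt(rowname_col = "vehicle") %>%
  tab_header(
    title = "Top MPG Vehicles (mtcars)",
    subtitle = "Structured for fast comparison"
  ) %>%
  # Controlled precision: one decimal place is enough for comparison.
  fmt_number(columns = c(mpg, hp, wt), decimals = 1) %>%
  cols_label(mpg = "MPG", cyl = "Cyl", hp = "HP", wt = "Weight") %>%
  # Logic-based styling (assumed rule): bold MPG values above 30.
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body(columns = mpg, rows = mpg > 30)
  )

tbl
```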
How Tables Complement Charts
I would prefer tables over charts when precision matters more than pattern recognition. This is especially true in auditing, reporting, or decision-making contexts where exact values must be verified.
Step 4 changed how I think about tables. Instead of viewing them as secondary to charts, I now see them as a complementary visualization format. When thoughtfully structured using gt, tables can communicate hierarchy, emphasis, and comparison just as effectively as charts — and in some cases, more effectively when exact values matter.
Rather than asking “chart or table?”, I now think:
Which format best supports the decision the reader needs to make?
5. Cédric Scherer – Tips and Tricks
Prompt:
Pick three tips and tricks you learned from Data Visualization guru Cédric Scherer in Step 5 and explain why you liked them.
Response:
Cédric Scherer’s talk reframed ggplot2 for me. Instead of presenting it as a plotting tool, he positioned it as a design system built on the grammar of graphics, one that allows you to build visualizations intentionally, layer by layer, by controlling even the smallest details.
What stood out most was his reminder that impressive visualizations are not the result of magic; they are the result of deliberate code and thoughtful refinement. Below are three specific tips that I found especially impactful.
1. Elevating Typography with ggtext
One of Scherer’s most practical and powerful tricks was using the ggtext package to render Markdown and HTML inside ggplot elements. In his example of penguin bill dimensions, he corrected the non-italicized genus name in the title by using Markdown syntax and converting the theme element to element_markdown().
This detail may seem small, but it completely changes the professionalism of a plot.
Why I liked this tip
- It reinforces that text is part of the design — not just decoration.
- It allows scientific accuracy (e.g., italicizing genus names).
- It creates visual hierarchy through bolding, coloring, and formatting.
- It moves plots from “default” to publication-ready.
Example
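The original example plot is not included in this text. A minimal sketch of the technique, assuming the `palmerpenguins` data and a title of my own wording:

```r
library(ggplot2)
library(ggtext)

p <- ggplot(
  palmerpenguins::penguins,
  aes(x = bill_length_mm, y = bill_depth_mm, color = species)
) +
  geom_point(na.rm = TRUE) +
  labs(
    # Markdown: *...* italicizes the genus name, **...** bolds the lead-in.
    title = "**Bill dimensions** of brush-tailed penguins (*Pygoscelis* spp.)",
    x = "Bill length (mm)",
    y = "Bill depth (mm)"
  ) +
  # Without element_markdown() the asterisks would be printed literally.
  theme(plot.title = element_markdown())

p
```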
This aligns with Scherer’s broader philosophy on his website: thoughtful typography and labeling are essential components of effective visual storytelling.
2. Improving ggplot Defaults (Small Tweaks, Big Impact)
Another insight I appreciated was Scherer’s focus on refining ggplot’s default settings. He demonstrated how small design changes — such as aligning titles properly, adjusting legend placement, removing unnecessary axis expansion, or modifying margins — can dramatically improve visual balance.
For example, he explained that by default, plot titles align with the panel, not the outer margin. Changing plot.title.position = "plot" ensures true alignment with the margin.
Why I liked this tip
- It highlights that design lives in the margins.
- It shows that polish comes from iteration.
- It reinforces intentional alignment and spatial balance.
- It encourages thinking beyond defaults.
Example
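The original example plot is not included in this text. A minimal sketch of these kinds of tweaks, assuming ggplot2's built-in `mpg` data; the specific margin values are arbitrary choices of mine:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  # Remove the gap below the bars, keep a little headroom above them.
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Number of car models by class",
    x = NULL, y = "Models", fill = "Drive train"
  ) +
  theme_minimal() +
  theme(
    plot.title.position = "plot",         # align title with the plot edge
    legend.position = "top",              # move legend out of the side margin
    plot.margin = margin(15, 15, 10, 15)  # top, right, bottom, left (pt)
  )

p
```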
This tip reflects Scherer’s emphasis on incremental improvement. As he demonstrated, even subtle spacing and alignment adjustments significantly elevate the visual outcome.
3. Using patchwork for Seamless Composition
Scherer also highlighted the patchwork package as a powerful way to combine multiple plots. Rather than relying on complex layout tools, patchwork uses intuitive operators like + and / to combine plots.
This approach reflects his broader idea that visualization should be modular and compositional.
Why I liked this tip
- It simplifies multi-panel storytelling.
- It keeps layout reproducible.
- It aligns grids automatically.
- It enables dashboard-style visualizations.
Example
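The original example plot is not included in this text. A minimal sketch of the operators in use, assuming three simple plots of the built-in `mtcars` data:

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# `+` places plots side by side, `/` stacks them vertically.
combined <- (p1 + p2) / p3 +
  plot_annotation(title = "Three views of mtcars")

combined
```

Because the layout is expressed in code, changing the arrangement only requires changing the operators.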
This layered composition approach mirrors the complex multi-panel visualizations Scherer showcased, such as energy dashboards and geofaceted layouts.
Broader Takeaway: Design as Iteration and Community
Beyond individual tricks, what resonated most with me was Scherer’s mindset. He emphasized:
- Learning from open-source code (especially through TidyTuesday)
- Experimenting beyond traditional chart types
- Iterating layer by layer
- Treating visualization as both analytical and aesthetic
This philosophy is reinforced on his personal website (https://www.cedricscherer.com/top/about/), where he describes combining scientific rigor with design principles. His work demonstrates that visualization is not merely about presenting results, but about crafting clarity through thoughtful decisions.
On his data visualization portfolio page (https://www.cedricscherer.com/top/dataviz/), this mindset becomes tangible. The projects illustrate how typography, spacing, annotation, layout, and composition all work together to guide interpretation. None of the visualizations rely on default settings — they reflect deliberate refinement and design awareness.
What changed for me was the shift from simply making plots to intentionally designing visual communication. That mindset, thinking about alignment, hierarchy, clarity, and refinement, was the most valuable insight I gained from this session.
6. Advanced Visualization Notebook
Prompt:
In Step 6, you will find my advanced visualization notebook. Run the codes one by one and digest as much as you can. What are you most impressed with about the visualization approach or tools demonstrated in the codebook?
Response:
What impressed me most about the advanced visualization codebook is how it shows the workflow of building meaning—not just “making a plot.” The notebook is structured like a toolbox that moves from exploration (finding patterns) to communication (designing charts, animations, maps, and tables that tell a story clearly).
Advanced Data Visualization Reference Guide
1. Setup and Data Preparation
- Loading Packages
- Loading and Preparing Data
- Preparing Data for ANOVA Framework
2. Correlation Visualizations
- GGally::ggcorr()
- corrplot::corrplot()
- Basic Correlation Plots
- Mixed Correlation Plots
- psych::pairs.panels()
3. Barplots with Error Bars (APA 7 Style)
- Simple Implementation
- Complete Customization
4. Scatterplots and Multi-Variable Visualizations
- Scatter Plot (5 Variables)
- Standard Version
- Log-Transformed Version
- geom_point (3 Variables)
- geom_count (3 Variables)
- Bubble Charts (4–5 Variables)
5. Interactive Visualizations (Plotly)
- Adding Country Names to Tooltips
- Splitting Continents into Panels
- Customizing Hover Text
- Interactive Bubble Charts
6. Line Plots (Multiple Variables)
- Highlighting Two Countries
- Highlighting Multiple Countries by Continent
7. Animation
- Required Packages: gganimate + gifski
- Creating an Animated Bubble Chart
- Rendering and Saving Animations
8. Geospatial Mapping
World Mapping (rnaturalearth)
- Saving Map Data
- World Map
- Africa Map
- US Map
- States
- Brazil: São Paulo Example
Mapping US Cities
- Required Mapping Packages
- US State Map
- Binned Choropleth Map
9. Axis Customization
- Adjusting X and Y Scales
- Log Transformations
- Formatting Labels
10. Publication-Ready Tables (GT Package)
- Continuous Data Tables
- Categorical Data Tables
- Plain gt()
- Custom Table Formatting
11. Enhanced Tables with gtExtra
- Basic Plotting with Iris
- mtcars Examples
- gt_plt_summary()
- gt_plt_dist()
- gt_plt_sparkline()
- Density Distributions
- Histogram Distributions
- Survey Data Examples
12. References
- Course materials
- Package documentation
- Webinar resources
Even when some chunks feel “beyond the scope,” the big takeaway is that R/ggplot2 lets you scale up from simple charts to professional outputs by stacking small decisions: data → mapping → layers → scales → annotation → layout → export.
1. High-Dimensional Thinking: Managing Multiple Variables
One of the most important skills demonstrated in the codebook is how to visualize multiple variables simultaneously without overwhelming the viewer. Sections such as “Scatter Plot (5 Variables),” “geom_point (3 Variables),” and “Interactive Bubble Chart (4–5 Variables)” illustrate how complexity can remain readable when visual mappings are intentional.
Rather than adding variables randomly, the notebook layers meaning strategically:
- Position encodes primary relationships.
- Color differentiates categories.
- Size represents magnitude.
- Faceting separates groups for clarity.
The key takeaway is that complexity itself is not the problem—poor structure is. When aesthetic mappings are purposeful and supported by clean labeling and thoughtful scaling, high-dimensional charts remain interpretable rather than chaotic.
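A minimal sketch of this layering, assuming ggplot2's built-in `mpg` data rather than the dataset used in the notebook. Five variables are shown, each through a different channel:

```r
library(ggplot2)

p <- ggplot(
  mpg,
  aes(
    x = displ,      # position: engine displacement
    y = hwy,        # position: highway mileage
    color = class,  # color: vehicle category
    size = cty      # size: city mileage (magnitude)
  )
) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ drv) +   # faceting: one panel per drive train
  labs(
    x = "Displacement (L)", y = "Highway mpg",
    color = "Class", size = "City mpg"
  )

p
```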
2. Moving Beyond Static Charts: Interactivity and Animation
The transition from static to dynamic visualization represents a conceptual shift in how data is communicated.
Interactivity (plotly)
Interactive bubble charts demonstrate how tooltips and hover information can replace cluttered text labels. Instead of forcing all information onto the canvas, interaction allows users to explore details selectively. Filtering and panel splitting further reduce visual overload while preserving information richness.
This approach reframes visualization as exploration rather than passive viewing.
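A minimal sketch of the tooltip idea, assuming the `mpg` data and a hover label of my own design (the notebook's examples use country data instead):

```r
library(ggplot2)
library(plotly)

# The `text` aesthetic is not drawn by ggplot2; it only feeds the tooltip.
p <- ggplot(
  mpg,
  aes(
    x = displ, y = hwy, color = drv,
    text = paste0(manufacturer, " ", model, "<br>Highway mpg: ", hwy)
  )
) +
  geom_point()

# Convert to an interactive widget and show only the custom hover text.
interactive <- ggplotly(p, tooltip = "text")

interactive
```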
Animation (gganimate + gifski)
Animation introduces time as motion rather than compression. Instead of squeezing temporal data into a crowded static image, gganimate reveals change progressively through transitions.
The workflow—from building the plot to rendering and exporting a GIF—also highlights an important insight: effective visualization includes consideration of output format and audience consumption, not just code execution.
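A minimal sketch of that workflow, using a small made-up data frame (`growth`, purely hypothetical values) so that it does not depend on the notebook's dataset. The rendering calls are commented out because they can take a while:

```r
library(ggplot2)
library(gganimate)

# Hypothetical data: three groups measured once per year.
set.seed(1)
growth <- data.frame(
  year  = rep(2000:2009, times = 3),
  group = rep(c("A", "B", "C"), each = 10),
  value = c(cumsum(runif(10)), cumsum(runif(10)), cumsum(runif(10)))
)

# A regular ggplot plus one transition: time becomes motion.
anim <- ggplot(growth, aes(x = group, y = value, fill = group)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Year: {frame_time}") +
  transition_time(year)

# Render to a GIF and save it (requires the gifski package):
# gif <- animate(anim, nframes = 50, renderer = gifski_renderer())
# anim_save("growth.gif", animation = gif)
```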
The mapping sections expand visualization into the spatial domain using rnaturalearth and sf. These examples reinforce that maps are not simply images; they are the result of:
- Geometric data structures
- Data joins
- Coordinate reference systems
- Layered styling decisions
- Controlled zoom levels
By progressing from world maps to regional and state-level views, the notebook demonstrates scalability in spatial visualization. Once the workflow—geometry + attribute data + styling—is understood, the same structure can be applied across different geographic scales.
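A minimal sketch of the geometry + attribute + styling workflow. It assumes the `rnaturalearthdata` package is installed alongside `rnaturalearth`, and the zoom limits are rough values I picked for Africa:

```r
library(ggplot2)
library(sf)
library(rnaturalearth)

# Geometry and attributes arrive together as an sf data frame.
world <- ne_countries(scale = "medium", returnclass = "sf")

# World view: one layer, minimal styling.
world_map <- ggplot(world) +
  geom_sf(fill = "grey90", color = "white") +
  theme_void()

# Regional view: same structure, filtered data, attribute mapped to fill.
africa <- world[world$continent == "Africa", ]
africa_map <- ggplot(africa) +
  geom_sf(aes(fill = pop_est), color = "white") +
  scale_fill_viridis_c(name = "Population (est.)") +
  coord_sf(xlim = c(-20, 55), ylim = c(-36, 38)) +  # controlled zoom
  theme_void()

world_map
africa_map
```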
Spatial visualization becomes a form of structured storytelling rather than decorative cartography.
Across all sections, the underlying theme is intentional design. Advanced visualization is not about adding features—it is about managing complexity, guiding interpretation, and choosing the appropriate tool (static, interactive, animated, or spatial) for the analytical question being asked.
A resource that connects directly to this “grammar + layering” mindset is the online ggplot2 book. I like it because it explains not only how to code plots, but why the structure works—which makes it easier to troubleshoot and improve designs over time.
- Reference: ggplot2 book (online)
7. Personal Challenge and Growth
Prompt:
What seems to be the challenge, if any, for you when you try to master data visualization skills? How may you be able to overcome the challenge?
Response:
```r
#| echo: false
library(gt)      # table rendering; also provides the %>% pipe
library(tibble)

# One row per challenge, with the difficulty and a planned solution.
reflection <- tibble(
  Challenge = c(
    "Managing Complexity",
    "Choosing the Right Tool",
    "Balancing Design and Analysis"
  ),
  Difficulty = c(
    "Too many variables can create clutter",
    "Uncertainty about which visualization format to use",
    "Focusing on code instead of communication"
  ),
  Solution = c(
    "Use intentional aesthetic mappings and simplify",
    "Start with the research question, then choose format",
    "Prioritize clarity and audience understanding"
  )
)

# Render the reflection as a gt table with a title and subtitle.
reflection %>%
  gt() %>%
  tab_header(
    title = "Challenges in Mastering Data Visualization",
    subtitle = "Reflection on Skill Development"
  )
```

Challenges in Mastering Data Visualization
Reflection on Skill Development

| Challenge | Difficulty | Solution |
|---|---|---|
| Managing Complexity | Too many variables can create clutter | Use intentional aesthetic mappings and simplify |
| Choosing the Right Tool | Uncertainty about which visualization format to use | Start with the research question, then choose format |
| Balancing Design and Analysis | Focusing on code instead of communication | Prioritize clarity and audience understanding |
Footnotes
1. `pivot_longer()` gathers multiple columns into two columns.
2. `nest()` creates grouped mini data frames stored in a list column.
3. `tidy()` converts model output into a clean data frame for interpretation and visualization.
4. This distinction between analysis-friendly structure and reader-friendly presentation was one of the most important conceptual takeaways from Step 4.