General outline for weeks 2 – 5

  • Week 2: Principles of data visualisation
  • Week 3: Grammar of graphics; aesthetics and attributes
  • Week 4: Major visualisation tools
  • Week 5: Customising visualisations (scales, themes, and labels)

Checking up on the formative assessment

  • Identify a data set for formative assessment.
  • Work on visualisation for your data set.

Objectives for today

  • Appreciate the wide range of different visualisation tools
  • Understand and explain what “geometries” are.
  • Differentiate three major visualisation tools, when to use them, and how to create them in ggplot2.
  • Explain why barplots are problematic for certain data types.
  • Use data wrangling to adjust data to your visualisation goals.

Download the exercises from the Week 4 folder on NOW and move them into your R project directory.

Warm-up

Off the top of your head, what types of data visualisations have you seen in journal articles, news, social media, etc.?

The list goes on forever

Basic Charts

  • Bar Chart: compares quantities across categories.
  • Column Chart: vertical version of a bar chart.
  • Line Chart: shows trends over time.
  • Pie Chart: shows proportions of a whole.
  • Donut Chart: like a pie chart but with a blank center.
  • Area Chart: like a line chart but filled below the line.

Statistical Visualisations

  • Histogram: shows distribution of numerical data.
  • Box Plot (Box-and-Whisker Plot): displays data spread and outliers.
  • Violin Plot: combines box plot and density plot.
  • Scatter Plot: shows relationships between two variables.
  • Bubble Chart: scatter plot with a third variable represented by bubble size.

Multivariate Visualisations

  • Heatmap: uses color to represent values in a matrix.
  • Pair Plot: shows scatter plots for all variable combinations.
  • Parallel Coordinates Plot: compares many variables across observations.
  • Radar Chart (Spider Chart): compares multiple variables on axes from a central point.

Geospatial Visualisations

  • Choropleth Map: uses color to show data across geographic regions.
  • Dot Map: uses dots to show data density.
  • Heat Map (Geospatial): shows intensity of data over a map.
  • Flow Map: shows movement between locations.

Hierarchical & Network Visualisations

  • Tree Map: nested rectangles showing hierarchical data.
  • Sunburst Chart: radial version of a tree map.
  • Dendrogram: tree diagram showing hierarchical clustering.
  • Network Graph: shows relationships between entities (nodes and edges).
  • Sankey Diagram: shows flow and proportions between stages.

Temporal Visualisations

  • Gantt Chart: shows project timelines.
  • Timeline: displays events in chronological order.
  • Calendar Heatmap: shows data across days/weeks/months.

Specialised Visualisations

  • Word Cloud: visualises text data by word frequency.
  • Contour Plot: shows 3D surface on 2D plane.
  • Streamgraph: variation of stacked area chart for time series.
  • Ternary Plot: shows proportions of three variables that sum to a constant.

The same, a bit more systematically

Major visualisation tools

Geometries

  • choice depends on visualisation goals, number and type of variables (and your subject domain)
  • what kind of visual element do you want to draw?
  • a geometry (or geom) is the type of plot layer that tells R how to represent the data visually.

Think of it like this:

  • you have data (e.g., heights of children, test scores, etc.).
  • you want to show it visually.
  • the geom decides how it appears: as points, lines, bars, boxes, etc.
  • geoms are your drawing toolkit

Geometries

  • Geometries (geom_* functions) control how the aesthetics layer is encoded visually
  • ggplot2 itself provides around 50 geometries (listed below without the geom_ prefix)
  • more geoms in other packages such as ggdist, ggbeeswarm, and ggridges
  • many can be combined
 [1] abline            area              bar               bin_2d           
 [5] bin2d             blank             boxplot           col              
 [9] column            contour           contour_filled    count            
[13] crossbar          curve             density           density_2d       
[17] density_2d_filled density2d         density2d_filled  dotplot          
[21] errorbar          errorbarh         freqpoly          function         
[25] hex               histogram         hline             jitter           
[29] label             line              linerange         map              
[33] path              point             pointrange        polygon          
[37] qq                qq_line           quantile          raster           
[41] rect              ribbon            rug               segment          
[45] sf                sf_label          sf_text           smooth           
[49] spoke             step              text              tile             
[53] violin            vline            
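
A list like the one above can be produced from the package itself. The following is a minimal sketch, assuming ggplot2 is installed; the object names geom_functions and geoms are made up for illustration, and the exact number of entries depends on the installed ggplot2 version.

```r
library(ggplot2)

# All exported ggplot2 functions whose names start with "geom_"
geom_functions <- grep("^geom_", getNamespaceExports("ggplot2"), value = TRUE)

# Drop the prefix and sort alphabetically
geoms <- sort(sub("^geom_", "", geom_functions))

length(geoms) # number of geoms in the installed version
head(geoms)   # first few names
```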

Common geometries and their uses

Function          Plot type    What it shows
geom_point()      Scatter plot Relationship between two variables
geom_line()       Line plot    Trends over time or ordered categories
geom_bar()        Bar chart    Counts per category (geom_col() for pre-computed values)
geom_histogram()  Histogram    Distribution of a single variable
geom_boxplot()    Boxplot      Summary of distribution (median, quartiles, outliers)
geom_text()       Text labels  Add labels to points or bars
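
Bar charts come in two flavours, and it may help to see them side by side. This sketch uses two small made-up data frames (d_obs and d_sum are hypothetical, not part of the course data): geom_bar() counts rows per category by default, whereas geom_col() takes bar heights from a y variable you supply.

```r
library(ggplot2)

# Hypothetical toy data: one row per observation
d_obs <- data.frame(group = c("a", "a", "a", "b", "b", "c"))

# geom_bar() counts the rows in each category itself
p_counts <- ggplot(d_obs, aes(x = group)) +
  geom_bar()

# Hypothetical pre-computed values: one row per category
d_sum <- data.frame(group = c("a", "b", "c"), mean_score = c(4.2, 3.1, 5.0))

# geom_col() draws bar heights directly from the y variable
p_values <- ggplot(d_sum, aes(x = group, y = mean_score)) +
  geom_col()
```

Printing p_counts or p_values draws the respective plot.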

Same data, different geometries
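
As a rough illustration of this idea, the sketch below keeps the data and aesthetics fixed and only swaps the geom. The data are simulated stand-ins (d_sim, with made-up modality labels and reaction times), not the course data set.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(1)
d_sim <- data.frame(
  modality = rep(c("spoken", "written"), each = 50),
  rt = c(rnorm(50, mean = 900, sd = 150), rnorm(50, mean = 1100, sd = 200))
)

# Shared data and aesthetics
p_base <- ggplot(d_sim, aes(x = modality, y = rt))

# Only the geometry changes from plot to plot
p_points  <- p_base + geom_point()
p_jitter  <- p_base + geom_jitter(width = 0.1)
p_boxplot <- p_base + geom_boxplot()
p_violin  <- p_base + geom_violin()
```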

Major visualisation types

  • univariate distributions
  • bivariate distributions
  • group comparisons

Univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots, rug

ggplot(d_spellname, aes(x = rt)) +
  geom_histogram() 
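
The other univariate examples follow the same pattern. Here is a minimal sketch with simulated reaction times (d_sim is a hypothetical stand-in for d_spellname); setting bins explicitly is a choice made here, not something the course code prescribes.

```r
library(ggplot2)

# Simulated, right-skewed stand-in for reaction times (hypothetical values)
set.seed(1)
d_sim <- data.frame(rt = rlnorm(200, meanlog = 6.8, sdlog = 0.3))

# Histogram with an explicit number of bins
p_hist <- ggplot(d_sim, aes(x = rt)) +
  geom_histogram(bins = 30)

# Density curve with a rug marking each individual observation
p_density <- ggplot(d_sim, aes(x = rt)) +
  geom_density() +
  geom_rug(alpha = 0.3)
```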

Univariate distribution

Complete RMarkdown document 1_univariate_viz.Rmd

Bivariate distribution

  • function: relationship between two variables
  • variable type: typically continuous
  • examples: scatter plot, time series
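
A scatter plot of two continuous variables might look like the sketch below. The data are simulated (d_sim with made-up dur and rt values), and the added regression line is just one possible way to summarise the relationship.

```r
library(ggplot2)

# Simulated stand-in data: rt increases with dur plus noise (hypothetical values)
set.seed(1)
dur <- runif(100, min = 200, max = 800)
d_sim <- data.frame(dur = dur, rt = 500 + 0.8 * dur + rnorm(100, sd = 80))

# Points show the raw data; geom_smooth() adds a linear trend with its uncertainty band
p_scatter <- ggplot(d_sim, aes(x = dur, y = rt)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x)
```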

Bivariate distributions: interactivity

Most ggplot2 objects can be turned into an interactive visualisation using the plotly package.

# Load plotly package
library(plotly)

# Create ggplot plot and save in `plot`
plot <- ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point()

# Create an interactive version of `plot` using `ggplotly`
ggplotly(plot)

Bivariate distribution

Complete RMarkdown document 2_bivariate_viz.Rmd

Group comparisons

  • function: distribution of values for two or more groups (often closely tied to statistical descriptions)
  • variable type: continuous
  • examples: (jitter) dots, box plot, violin plot, beeswarm plots, barplot (pie chart), dynamite plots

Dynamite plot and its pitfalls

What problems can you think of?

What’s possibly problematic with this visualisation?

The visualisation suggests …

  • a normal distribution
  • the same number of observations in each group
  • the presence of data where there are none
  • the absence of data above the error bars

Now watch what happens to the y-axis when we use dots.
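
To try this yourself, here is a minimal sketch with simulated data (d_sim is hypothetical, with deliberately unequal group sizes and skewed values). It draws the same means and standard errors twice: once as a dynamite plot and once on top of the raw observations.

```r
library(ggplot2)

# Simulated stand-in data: unequal group sizes, right-skewed values (hypothetical)
set.seed(1)
d_sim <- data.frame(
  modality = rep(c("spoken", "written"), times = c(20, 60)),
  rt = c(rlnorm(20, meanlog = 6.7, sdlog = 0.4),
         rlnorm(60, meanlog = 6.9, sdlog = 0.3))
)

# Dynamite plot: a bar for the group mean, error bars for +/- 1 standard error
p_dynamite <- ggplot(d_sim, aes(x = modality, y = rt)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2)

# Same summary, but drawn over the jittered raw observations
p_points <- ggplot(d_sim, aes(x = modality, y = rt)) +
  geom_jitter(width = 0.1, alpha = 0.4) +
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red")
```

Comparing the two plots shows how the bars hide group size, skew, and the values above the error bars.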

Dynamite plots: bars

Alternative geometries: points

Alternative geometries: jittered points

Alternative geometries: points and errorbars

Alternative geometries: box-and-whiskers plot

Raincloud plots

library(ggdist)

ggplot(d_spellname, aes(x = modality, y = rt)) +
  # half violin (the "cloud")
  stat_halfeye(
    adjust = 0.45,        # smoothness
    justification = -0.2, # shift left/right
    .width = 0,           # no interval bars
    point_colour = NA) +
  # boxplot
  geom_boxplot(
    width = 0.15,
    outlier.shape = NA,
    alpha = 0.5) +
  # jittered points (the "rain")
  geom_jitter(
    width = 0.1,
    alpha = 0.5,
    size = 1) +
  coord_flip() # horizontal orientation 

Group comparisons

Complete RMarkdown document 3_group_comparisons.Rmd

Crash course in data wrangling

  • Data come in various formats.
  • For some visualisations, the format of the data needs to be changed.

You’ve already seen these:

# Count the number of observations in each group
count(data, group)

# Calculate descriptive statistics
summarise(data, mean = mean(rt))

# Remove rows with missing data (i.e. NA)
drop_na(data)

These wrangling tasks can be handled with tidyverse packages, more specifically dplyr and tidyr.
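
Here is how these three functions behave on a tiny made-up data set. The tibble d_toy and its values are hypothetical; count() and summarise() come from dplyr, drop_na() from tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with one missing reaction time
d_toy <- tibble(
  group = c("a", "a", "b", "b", "b"),
  rt = c(510, 640, NA, 720, 580)
)

# Number of rows per group (missing values still count as rows)
n_per_group <- count(d_toy, group)

# Keep only rows without any NA
d_complete <- drop_na(d_toy)

# Mean and standard deviation of rt for each group
rt_summary <- summarise(
  group_by(d_complete, group),
  mean_rt = mean(rt),
  sd_rt = sd(rt)
)
```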

Crash course in data wrangling

  • dplyr and tidyr have many useful functions for data wrangling.
  • Knowing the following functions will give you a lot of flexibility.

# Transforms dataframes into a long format (tidyr)
pivot_longer(data, cols)

# Transforms dataframes into a wide format
pivot_wider(data, names_from, values_from)

# Selects and removes variables
select(data, var1, var2)

# Retains and removes observations
filter(data, condition)

# Creates new variables
mutate(data, new_var = old_var)
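
The sketch below runs each of these functions once on a small made-up data set. The tibble d_wide, its column names, and the threshold of 700 are all hypothetical and chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per participant, one column per modality
d_wide <- tibble(
  id = 1:3,
  rt_spoken = c(510, 640, 720),
  rt_written = c(830, 910, 1005)
)

# Wide to long: one row per participant and modality
d_long <- pivot_longer(
  d_wide,
  cols = c(rt_spoken, rt_written),
  names_to = "modality",
  values_to = "rt"
)

# Create a new variable: reaction time in seconds
d_long <- mutate(d_long, rt_sec = rt / 1000)

# Retain only observations slower than 700 ms
d_slow <- filter(d_long, rt > 700)

# Keep only the variables id and rt
d_small <- select(d_long, id, rt)

# Long back to wide (after dropping the helper column)
d_back <- pivot_wider(
  select(d_long, -rt_sec),
  names_from = modality,
  values_from = rt
)
```

The long format is typically what ggplot2 expects, because a variable such as modality can then be mapped to an aesthetic like x or colour.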

Complete RMarkdown document 4_data_wrangling.Rmd

Homework

  • Complete RMarkdown document 4_data_wrangling.Rmd
  • Bring in data and anything you’ve already done for the formative assessment.
  • Complete recommended reading

Reading

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Tukey, John W. 1977. Exploratory Data Analysis. Vol. 2.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.