Download repository

  • Download: https://github.com/jensroes/ntu-data-viz
  • Click on: Code > Download ZIP, then unzip the directory on your machine.
  • Open project by double-clicking on ntu-data-viz.Rproj
  • exercises/: exercises associated with each topic
  • slides/slides.Rmd: these slides in R markdown format (.html format provided as well)
  • data/: scripts read data from here

Example data set: Blomkvist et al. (2017)

  • Age-related changes in reaction time across adulthood: a cross-sectional study of participants aged 20 to 99 years.

Task: reaction time measured with the Nintendo Wii Balance Board

  • responses with the hands and the feet, dominant and non-dominant side

Example data set: Blomkvist et al. (2017)

library(tidyverse)  # read_csv(), glimpse(), ggplot2 and dplyr
blomkvist <- read_csv("../data/blomkvist.csv")
glimpse(blomkvist)
Rows: 267
Columns: 10
$ id         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …
$ sex        <chr> "male", "female", "female", "female", "…
$ age        <dbl> 84, 37, 62, 85, 73, 65, 30, 49, 83, 58,…
$ medicine   <dbl> 8, 1, 0, 4, 5, 0, 0, 0, 11, 0, 0, 4, 3,…
$ meds_cat   <chr> "a lot", "little", "none", "few", "a lo…
$ smoker     <chr> "former", "no", "yes", "former", "forme…
$ rt_hand_d  <dbl> 701.6667, 470.6667, 638.6667, 708.0000,…
$ rt_hand_nd <dbl> 780.3333, 497.0000, 638.0000, 638.6667,…
$ rt_foot_d  <dbl> 1009.0000, 737.6667, 878.0000, 902.3333…
$ rt_foot_nd <dbl> 962.6667, 692.3333, 786.0000, 1373.6667…
  • Average reaction time (rt) of dominant (_d) or non-dominant (_nd) hand or foot in msecs
  • medicine: number of drugs used daily

Outline

  • principles of data visualisation
  • grammar of graphics
  • aesthetics and attributes
  • geometries
  • major tools of data visualisation
  • cosmetics I
  • resources
  • cosmetics II (homework)

What is data visualisation?

  • graphical representation of data
  • graphical data analysis (stats): what do we want to know?
  • communication and perception (design): what do we want to communicate?
  • exploratory plots: get to know data (small specialist audience)
  • explanatory plots: inform and persuade (wide audience)
  • think about your audience

Exploring data

blomkvist <- read_csv("../data/blomkvist.csv")
glimpse(blomkvist)
Rows: 267
Columns: 10
$ id         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ sex        <chr> "male", "female", "female", "female", "male", "male", "fema…
$ age        <dbl> 84, 37, 62, 85, 73, 65, 30, 49, 83, 58, 25, 88, 62, 88, 27,…
$ medicine   <dbl> 8, 1, 0, 4, 5, 0, 0, 0, 11, 0, 0, 4, 3, 8, 1, 3, 4, 1, 1, 0…
$ meds_cat   <chr> "a lot", "little", "none", "few", "a lot", "none", "none", …
$ smoker     <chr> "former", "no", "yes", "former", "former", "no", "no", "for…
$ rt_hand_d  <dbl> 701.6667, 470.6667, 638.6667, 708.0000, 607.3333, 541.6667,…
$ rt_hand_nd <dbl> 780.3333, 497.0000, 638.0000, 638.6667, 652.0000, 498.6667,…
$ rt_foot_d  <dbl> 1009.0000, 737.6667, 878.0000, 902.3333, 923.0000, 686.6667…
$ rt_foot_nd <dbl> 962.6667, 692.3333, 786.0000, 1373.6667, 805.0000, 599.6667…

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) 

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_point()  

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_point() +
  scale_y_log10()

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_point() +
  scale_y_log10() +
  stat_smooth(method = "lm") 

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_point() +
  scale_y_log10() +
  stat_smooth(method = "lm",
              formula = y ~ x + I(x^2)) 

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d,
                     colour = smoker)) +
  geom_point() +
  scale_y_log10() +
  stat_smooth(method = "lm",
              formula = y ~ x + I(x^2)) 

Building up a plot

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d, 
                     colour = smoker))  +
  geom_point(alpha = .25) +
  scale_y_log10(labels = scales::comma) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE, fullrange = TRUE) +
  ggthemes::theme_clean() +
  ggthemes::scale_color_colorblind() +
  labs(y = "Average reaction time of dominant\nhand (in msecs)", 
       x = "Age (in years)",
       caption = "Data published in\nBlomkvist et al. (2017)",
       colour = "Smoker") +
  theme(legend.position = "top",
        legend.justification = "right",
        axis.title = element_text(hjust = 0))

Explanatory plot

Exercise 1

creating scatterplots in R

Open script exercises/1_scatterplots.R

Why data visualisation?

“[data visualization] forces us to notice what we never expected to see.” (Tukey 1977)

  • exploring structures in the data
  • relationship between variables
  • distribution of data
  • develop an understanding of patterns (beyond descriptives)
  • selecting appropriate stats

Anscombe’s quartet

Anscombe (1973) and Tufte (1989)

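Data set | Mean of x | SD of x | Mean of y | SD of y | Correlation (x, y) | Intercept (y ~ x) | Slope (y ~ x)
1        | 9         | 3.32    | 7.5       | 2.03    | 0.82               | 3                 | 0.5
2        | 9         | 3.32    | 7.5       | 2.03    | 0.82               | 3                 | 0.5
3        | 9         | 3.32    | 7.5       | 2.03    | 0.82               | 3                 | 0.5
4        | 9         | 3.32    | 7.5       | 2.03    | 0.82               | 3                 | 0.5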
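The summary statistics in the table can be recomputed from the copy of Anscombe's data that ships with base R (the `anscombe` data frame, columns `x1` to `x4` and `y1` to `y4`). A minimal sketch in base R; the object name `quartet_stats` is only illustrative:

```r
# Anscombe's quartet ships with base R as the data frame `anscombe`
data(anscombe)

# One row of summary statistics per data set
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  data.frame(
    data_set    = i,
    mean_x      = mean(x),
    sd_x        = sd(x),
    mean_y      = mean(y),
    sd_y        = sd(y),
    correlation = cor(x, y),
    intercept   = unname(coef(fit)[1]),
    slope       = unname(coef(fit)[2])
  )
}))

# Rounded to two decimals, the four rows are identical
round(quartet_stats, 2)
```

All four data sets share these numbers, yet plotting them reveals very different patterns.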

Anscombe’s quartet

Anscombe’s quartet

The datasaurus dozen

Matejka and Fitzmaurice (2017): see link

Principles of data visualisation

  • no one-size-fits-all method
  • some methods are more informative than others
  • maximise what we can learn from data
  • going beyond summary statistics
  • descriptive summary statistics may conceal / obscure important patterns
  • visualisation helps us to understand patterns, structures, relationships
  • prevent wrong conclusions about data / theory

Basic principles

Hartwig and Dearing (1979):

  • skepticism: any visualization might obscure or misrepresent data
  • openness: there might be patterns and structures that we were not expecting

Tufte (1983):

  • above all else show the data
  • avoid distorting what the data have to say
  • present many numbers in a small space
  • encourage the eye to compare different pieces of data
  • reveal data at several levels of detail, from broad overview to fine structures

Grammar of graphics

Grammar of graphics

  • “gg” in ggplot2 refers to grammar of graphics (Wickham 2016, 2010)
  • framework for data visualisation
  • higher-level plotting system compared to base R functions (e.g. plot(), hist())
  • complex visualisations can be created with a minimal amount of code
  • integration of statistical information
  • base R is great for quick and basic plots but is limited

Grammar of graphics

Wilkinson (1999)

  • property 1: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)
  • property 2: graphics are built around mappings that determine how data, aesthetics and geometries are combined.
  • e.g. similar ingredients (1) can be combined following different recipes (2)
  • grammatical elements are organised as layers
  • the underlying grammar controls how the layers are combined
  • system of rules for mapping variables to graphical properties

Obligatory grammatical elements

  • data: the data you want to visualise indicated as ggplot(data = ...)
  • aesthetics: mapping of data to graphic properties (axes, size, colour) indicated as mapping = aes()
  • geometries: visual elements encoding the data indicated as geom_...()

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d))

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_point()

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_quantile()  # quantile regression lines; requires the quantreg package

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_rug()

ggplot(data = blomkvist, 
       mapping = aes(x = age, 
                     y = rt_hand_d)) +
  geom_point() +
  geom_quantile() +
  geom_rug()

Optional grammatical elements

  • facets: dividing data into subplots
  • statistics: statistical summaries of the data (e.g. smoothers, means)
  • coordinates: the coordinate system of the plotting space
  • theme: visual properties not related to the data (font, background)

  • data
  • aesthetics
  • geometries

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point()

  • data
  • aesthetics
  • geometries
  • facets

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  facet_grid(~sex)

  • data
  • aesthetics
  • geometries
  • facets
  • statistics

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) 

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  coord_trans(x = "log", y = "reverse")

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  coord_flip()

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  theme_dark()

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  theme(panel.background = element_blank())

Exercise 2

grammatical elements in action

Open script exercises/2a_grammar_of_graphics.R

Bonus: exercises/2b_grammar_of_graphics.R

Aesthetics and attributes

Aesthetics and attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take fixed values, set outside aes() (e.g. colour = "red")
  • aesthetics take variables, mapped inside aes() (e.g. aes(colour = smoker))

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point(colour = "red")
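A common pitfall is putting a fixed value inside aes(). The sketch below contrasts the three cases using a small hypothetical `toy` data frame (not the Blomkvist data) and ggplot2's `layer_data()`, which returns the values that are actually drawn:

```r
library(ggplot2)

# Hypothetical toy data standing in for the blomkvist data frame
toy <- data.frame(age = c(25, 40, 60, 80),
                  rt_hand_d = c(480, 520, 610, 700),
                  smoker = c("no", "yes", "former", "no"))

# Attribute: a fixed value set outside aes(); every point is red
p_attr <- ggplot(toy, aes(x = age, y = rt_hand_d)) +
  geom_point(colour = "red")

# Pitfall: a constant inside aes() is treated as a one-level variable,
# mapped through the default colour scale and given a legend
p_const <- ggplot(toy, aes(x = age, y = rt_hand_d)) +
  geom_point(aes(colour = "red"))

# Aesthetic: a variable mapped inside aes(); one colour per level of smoker
p_aes <- ggplot(toy, aes(x = age, y = rt_hand_d)) +
  geom_point(aes(colour = smoker))

unique(layer_data(p_attr)$colour)          # "red"
unique(layer_data(p_const)$colour)         # a default scale colour, not "red"
length(unique(layer_data(p_aes)$colour))   # 3
```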

Aesthetics and attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take properties
  • aesthetics take variables

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point(aes(colour = smoker))

Aesthetics and attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take properties
  • aesthetics take variables

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point(aes(colour = smoker)) +
  stat_smooth(method = "lm")

Aesthetics and attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take properties
  • aesthetics take variables

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  stat_smooth(aes(colour = smoker), method = "lm")

Aesthetics and attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take properties
  • aesthetics take variables

ggplot(blomkvist, aes(x = age, y = rt_hand_d)) +
  geom_point(aes(colour = smoker)) +
  stat_smooth(aes(colour = smoker), method = "lm")

Aesthetics and attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take properties
  • aesthetics take variables

ggplot(blomkvist, aes(x = age, y = rt_hand_d, colour = smoker)) +
  geom_point() +
  stat_smooth(method = "lm")

Aesthetics and attributes

ggplot(blomkvist, aes(x = age, y = rt_hand_d, colour = smoker)) +
  geom_point(size = 2.5) 

Aesthetics and attributes

ggplot(blomkvist, aes(x = age, y = rt_hand_d, shape = smoker)) +
  geom_point(size = 2.5)

Aesthetics and attributes

ggplot(blomkvist, aes(x = age, y = rt_hand_d, colour = smoker, shape = smoker))  +
  geom_point(size = 2.5)

Aesthetics and attributes

ggplot(blomkvist, aes(x = age, y = rt_hand_d, colour = smoker, shape = sex))  +
  geom_point(size = 2.5)

Aesthetics

typically x, y, colour, fill, group

  • some are required by geometries; others are optional
  • continuous vs discrete variables:
    • e.g. shape and linetype can only be mapped to categorical (discrete) variables
  • should be chosen to facilitate comprehension
  • scatterplot (geom_point()): x, y, shape, colour, size, fill, alpha, stroke, group
  • barplot (geom_bar()): x, y, colour, fill, linewidth, linetype, alpha, group
  • boxplot (geom_boxplot()): x, y, lower, xlower, upper, xupper, middle, xmiddle, ymin, xmin, ymax, xmax, weight, colour, fill, size, alpha, shape, linetype, linewidth, group
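The aesthetics listed above can be looked up from ggplot2 itself: each geometry is a ggproto object (e.g. `GeomPoint`, `GeomBar`, `GeomBoxplot`) with an `aesthetics()` method. A sketch, assuming a recent ggplot2 version; the exact output varies between versions:

```r
library(ggplot2)

# All aesthetics a geometry understands (required, default and "group")
GeomPoint$aesthetics()
GeomBar$aesthetics()
GeomBoxplot$aesthetics()

# Aesthetics that must be supplied for the geometry to be drawn
GeomPoint$required_aes
```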

Decoding of continuous variables (e.g. rt)

(Wong 2010, 665), ranked from most to least accurately decoded:

  • position on a common scale
  • position on the same but nonaligned scales
  • lengths
  • angles, slopes
  • areas
  • volume, monochromatic colour spectrum (saturation, grey scale)
  • pure spectrum colours

Decoding of continuous variables

position on common scale

ggplot(blomkvist, 
       aes(x = smoker, 
           y = rt_hand_d)) +
  geom_jitter() 

Decoding of continuous variables

position on non-aligned scales

ggplot(blomkvist, 
       aes(x = smoker, 
           y = rt_hand_d)) +
  geom_jitter() +
  facet_wrap(~smoker, scales = "free") 

Decoding of categorical variables (groups)

(Wong 2010, 665)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Decoding of categorical variables (groups)

ggplot(blomkvist, aes(x = age, y = rt_hand_d, label = sex)) +
  geom_text(size = 3)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Decoding of categorical variables (groups)

ggplot(blomkvist, aes(x = age, y = rt_hand_d, shape = sex)) +
  geom_point(size = 3)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Decoding of categorical variables (groups)

ggplot(blomkvist, aes(x = age, y = rt_hand_d, colour = sex)) +
  geom_point(size = 3)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Decoding of categorical variables (groups)

ggplot(blomkvist, aes(x = age, y = rt_hand_d, colour = sex))  +
  stat_smooth(method = "lm", se = FALSE)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Decoding of categorical variables (groups)

ggplot(blomkvist, aes(x = age, y = rt_hand_d, linetype = sex)) +
  stat_smooth(method = "lm", se = FALSE)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Decoding of categorical variables (groups)

ggplot(blomkvist, aes(x = age, y = rt_hand_d, linewidth = sex)) +
  stat_smooth(method = "lm", se = FALSE)  # linewidth replaces size for lines in ggplot2 >= 3.4.0

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Exercise 3

practice aesthetics and attributes

Open script exercises/3a_aesthetics_and_attributes.R

If you have time continue with

  • exercises/3b_aesthetics_and_attributes.R
  • exercises/3c_aesthetics_and_attributes.R

Major visualisation tools

Major visualisation tools

  • geometries (geom_*()) determine how the mapped aesthetics are drawn
  • ggplot2 ships with about 50 geometries (listed without the geom_ prefix):
 [1] abline            area              bar               bin_2d           
 [5] bin2d             blank             boxplot           col              
 [9] column            contour           contour_filled    count            
[13] crossbar          curve             density           density_2d       
[17] density_2d_filled density2d         density2d_filled  dotplot          
[21] errorbar          errorbarh         freqpoly          function         
[25] hex               histogram         hline             jitter           
[29] label             line              linerange         map              
[33] path              point             pointrange        polygon          
[37] qq                qq_line           quantile          raster           
[41] rect              ribbon            rug               segment          
[45] sf                sf_label          sf_text           smooth           
[49] spoke             step              text              tile             
[53] violin            vline            
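A list like the one above can be generated from the attached package; a sketch (the count depends on the installed ggplot2 version):

```r
library(ggplot2)

# All exported functions whose names start with "geom_"
geom_functions <- ls("package:ggplot2", pattern = "^geom_")

# Drop the prefix to get the short names shown above
geom_names <- sub("^geom_", "", geom_functions)

length(geom_names)
geom_names
```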

Major visualisation tools

  • choice depends on visualisation goals (and your subject domain)
  • more geoms in other packages such as tidybayes, ggbeeswarm, and ggridges
  • many can be combined
  • three important groups:
    • bivariate distributions
    • univariate distributions
    • groups comparison

Bivariate distribution

  • function: relationship between two variables
  • variable type: typically continuous
  • examples: scatter plot, time series
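Scatter plots of the Blomkvist data appear throughout these slides; a time series is the other example named above. A minimal sketch using the `economics` data that ships with ggplot2 (a stand-in, since the Blomkvist data contain no time variable):

```r
library(ggplot2)

# economics: monthly US economic data; date and unemploy are both continuous
p_ts <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()  # connects observations in order of x

p_ts
```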

Univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots, rug

ggplot(blomkvist, aes(x = rt_hand_d)) +
  geom_histogram() 

Univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots, rug

ggplot(blomkvist, aes(x = rt_hand_d)) +
  geom_density() 

Univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots, rug

ggplot(blomkvist, aes(x = rt_hand_d)) +
  geom_density() +
  geom_rug()
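Histograms and density plots can be combined when both use the same y scale. A sketch using ggplot2's built-in `mpg` data as a stand-in (`hwy` plays the role of `rt_hand_d`); the binwidth of 2 is an arbitrary choice, and setting it explicitly avoids the default of 30 bins:

```r
library(ggplot2)

p_hist <- ggplot(mpg, aes(x = hwy)) +
  # after_stat(density) rescales the bars so their total area is 1,
  # which puts them on the same scale as the density curve
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, fill = "grey80") +
  geom_density() +
  geom_rug(alpha = .3)

p_hist
```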

Groups comparison

  • function: distribution of values for two or more groups (often closely tied to statistical descriptions)
  • variable type: continuous
  • examples: (jitter) dots, box plot, violin plot, beeswarm plots, barplot (pie chart), dynamite plots

Groups comparison

dynamite plot and its pitfalls

  • does it suggest normally distributed data?
  • does each group have the same number of observations?
  • do the bars suggest data where there are none?
  • are there really no values above the error bar (watch what happens to the y-axis)?
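A dynamite plot can be contrasted with the raw data using stat_summary() and mean_se() from ggplot2. A sketch using the built-in `mpg` data as a stand-in (`drv` as the group, `hwy` as the outcome); error bars show the mean plus or minus one standard error:

```r
library(ggplot2)

# Dynamite plot: bar for the group mean plus an error bar (mean +/- 1 SE)
dynamite <- ggplot(mpg, aes(x = drv, y = hwy)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2)

# Same summary drawn on top of every observation
dots <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_jitter(width = .2, alpha = .3) +
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red")
```

Only the second version shows how many observations each group has and where they lie.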

Groups comparison

dynamite plots

Groups comparison

dots

Groups comparison

jittered dots

Groups comparison

jittered dots and errorbars

Groups comparison

box-and-whiskers plot

Groups comparison

box-and-whiskers plot

Groups comparison

box-and-whiskers plot (Tukey 1977)
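In ggplot2's box plots the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box; points beyond that are drawn as outliers. A sketch combining a box plot with jittered dots, again using the built-in `mpg` data as a stand-in:

```r
library(ggplot2)

p_box <- ggplot(mpg, aes(x = drv, y = hwy)) +
  # outliers hidden here because the jittered dots already show every value
  geom_boxplot(outlier.shape = NA, width = .5) +
  geom_jitter(width = .15, alpha = .3)

p_box
```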

Exercise 4

major visualisation tools

Open script exercises/4a_major_viz_tools.R

Continue with exercises/4b_major_viz_tools.R

Cosmetics I

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs()

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs(title = "My scatter plot")

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs(title = "My scatter plot", 
       subtitle = "I'm a subtitle")

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs(caption = "Caption for data source")

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs(tag = "A")

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs(x = "Age in years", 
       y = "Reaction time in msecs")

Changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape, linetype, fill

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
  labs(colour = "Legend\ntitle:")

Themes

  • specify the appearance of non-data ink (fonts, background, grid lines)
  • set elements manually using theme() or apply a complete theme via a wrapper function
  • complete themes in ggplot2:
[1] "theme_bw"       "theme_classic"  "theme_dark"     "theme_grey"    
[5] "theme_light"    "theme_linedraw" "theme_minimal"  "theme_void"    
  • more themes in other packages, e.g. ggthemes:
 [1] "theme_base"            "theme_calc"            "theme_clean"          
 [4] "theme_economist"       "theme_economist_white" "theme_excel"          
 [7] "theme_excel_new"       "theme_few"             "theme_fivethirtyeight"
[10] "theme_foundation"      "theme_gdocs"           "theme_hc"             
[13] "theme_map"             "theme_pander"          "theme_par"            
[16] "theme_solarized"       "theme_solarized_2"     "theme_solid"          
[19] "theme_stata"           "theme_stata_base"      "theme_stata_colors"   
[22] "theme_tufte"           "theme_wsj"            

Themes (ggplot2 default)

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  facet_grid(~smoker) +
  theme_grey(base_size = 11)

Themes

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  facet_grid(~smoker) +
  theme_minimal(base_size = 14)

Themes

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  facet_grid(~smoker) +
  theme_light(base_size = 14)

Themes

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  facet_grid(~smoker) +
  theme_dark(base_size = 14)

Themes

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  facet_grid(~smoker) +
  ggthemes::theme_clean()
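Order matters when combining a complete theme with theme(): adding a complete theme replaces all earlier theme settings. A sketch using theme objects on their own and then the same rule inside a plot (built-in `mpg` data as a stand-in):

```r
library(ggplot2)

# Complete theme added last: the earlier legend setting is discarded
wrong_order <- theme(legend.position = "top") + theme_minimal()
# Complete theme first, then tweaks: the legend setting is kept
right_order <- theme_minimal() + theme(legend.position = "top")

wrong_order[["legend.position"]]  # "right" (theme_minimal's default)
right_order[["legend.position"]]  # "top"

# Same rule inside a plot
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  theme_minimal(base_size = 14) +
  theme(legend.position = "top")
```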

Saving your plot

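ggsave("name of plot.png", width = 5, height = 5)  # saves the most recent plot by default
  • supported formats include .eps, .pdf, .svg, .wmf (Windows only), .png, .jpg, .bmp and .tiff
  • sizes require some manual adjustment (width and height default to inches)
  • make sure fonts are not too small / large
  • keep the aspect ratio sensible
  • alternatively, use the Export button in the RStudio Plots pane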
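Passing the plot, size and units explicitly makes saved figures reproducible. A sketch with a hypothetical file name written to a temporary folder; the `dpi` argument only affects raster formats such as .png or .jpg:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output path
out_file <- file.path(tempdir(), "scatter.pdf")

# 12 x 9 cm; the file type is inferred from the extension
ggsave(out_file, plot = p, width = 12, height = 9, units = "cm", dpi = 300)
```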

Exercise 5

bringing everything together

Open script exercises/5a_bringing_everything_together.R

Continue with exercises/5b_bringing_everything_together.R

Useful resources

Cosmetics II

Changing text: legend keys

  • scale_colour_discrete
  • scale_colour_continuous
  • scale_colour_manual
  • or any other aesthetic instead of colour

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() +  
  scale_colour_discrete(
    labels = c("ex-smoker", "non-smoker", "smoker")) 

Changing text: legend keys

  • change colour values manually
  • colour names: link
  • ggthemes

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() +  
  scale_colour_manual(
    labels = c("ex-smoker", "non-smoker", "smoker"),
    values = c("firebrick", "turquoise2", "cornflowerblue"))

Changing text: legend keys

  • change colour values manually
  • colour names: link
  • ggthemes
# Hex codes of the ggthemes "colorblind" palette
mycolours <- c("#000000", "#E69F00", "#56B4E9", "#009E73", 
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Show the same palette as colour swatches
scales::show_col(ggthemes::colorblind_pal()(8))

Changing text: legend keys

  • change colour values manually
  • colour names: link
  • ggthemes

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() +  
  scale_colour_manual(
    labels = c("ex-smoker", "non-smoker", "smoker"),
    values = mycolours[1:3])

Changing text: legend keys

  • change colour values manually
  • colour names: link
  • ggthemes

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() +  
  ggthemes::scale_colour_colorblind(
    labels = c("ex-smoker", "non-smoker", "smoker"))

Changing text: strips

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker)

Changing text: strips

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker, labeller = label_both)

Changing text: strips

blomkvist <- mutate(blomkvist, 
                    smoker = recode(smoker, 
                    "former" = "Ex-smoker",
                    "no" = "Non-smoker",
                    "yes" = "Smoker"))

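The recoding above changes the data itself (and recode() comes from dplyr). If only the strip labels should change, a labeller can do the lookup at plot time instead. A sketch using ggplot2's as_labeller() and a small hypothetical `toy` data frame that keeps the original smoker codes:

```r
library(ggplot2)

# Hypothetical toy data with the original smoker codes
toy <- data.frame(age = c(25, 45, 65, 30, 70, 55),
                  rt_hand_d = c(480, 530, 600, 500, 640, 580),
                  smoker = c("no", "yes", "former", "no", "former", "yes"))

# Lookup from data values to strip labels; the data stay unchanged
smoker_labels <- as_labeller(c(former = "Ex-smoker",
                               no = "Non-smoker",
                               yes = "Smoker"))

p <- ggplot(toy, aes(x = age, y = rt_hand_d)) +
  geom_point() +
  facet_grid(~smoker, labeller = smoker_labels)
```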
Themes

  • axis
  • legend
  • panel
  • plot
  • strip

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  theme()

Themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  theme(axis.text = element_text(face = "italic"))

Themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  theme(axis.title = element_text(face = "bold"))

Themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) + 
  geom_point() +
  theme(axis.title.y = element_text(face = "bold"))

Themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.orientation
  • legend.justification
  • legend.box

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) + 
  geom_point() +
  theme()

Themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.orientation
  • legend.justification
  • legend.box

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) + 
  geom_point() +
  theme(legend.position = "top")

Themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.orientation
  • legend.justification
  • legend.box

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) + 
  geom_point() +
  theme(legend.position = "top",
        legend.justification = "right")

Themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.orientation
  • legend.justification
  • legend.box

ggplot(blomkvist, aes(y = rt_hand_d, x = age, colour = smoker)) +
  geom_point() + 
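  theme(legend.position = "inside",            # ggplot2 >= 3.5.0
        legend.position.inside = c(.15, .8))   # older versions: legend.position = c(.15, .8)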

Themes: panel

  • panel.background
  • panel.border
  • panel.spacing
  • panel.grid
    • panel.grid.major
    • panel.grid.minor

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  theme()

Themes: panel

  • panel.background
  • panel.border
  • panel.spacing
  • panel.grid
    • panel.grid.major
    • panel.grid.minor

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  theme(panel.background = element_blank())

Themes: plot

  • plot.background
  • plot.margin
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() +
  theme()

Themes: plot

  • plot.background
  • plot.margin
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  theme(plot.background = element_rect(fill = "pink"))

Themes: plot

  • plot.background
  • plot.margin
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  theme(plot.background = element_rect(fill = "pink"),
        plot.margin = unit(c(2,2,2,2), "cm"))

Themes: plot

  • plot.background
  • plot.margin
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  labs(title = "I'm a title") +
  theme(plot.title = element_text(colour = "pink"))

Themes: plot

  • plot.background
  • plot.margin
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  labs(caption = "I'm a caption") +
  theme(plot.caption = element_text(face = "italic"))

Themes: facet strips

  • strip.background
  • strip.placement
  • strip.text

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker, labeller = label_both) +
  theme()

Themes: strip.background

  • strip.background
  • strip.placement
  • strip.text

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker, labeller = label_both) +
  theme(strip.background = element_blank())

Themes: strip.background

  • strip.background
  • strip.placement
  • strip.text

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"))

Themes: strip.text

  • strip.background
  • strip.placement
  • strip.text

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"),
        strip.text = element_text(colour = "white", hjust = 0))

Themes: strip.text

  • strip.background
  • strip.placement
  • strip.text

ggplot(blomkvist, aes(y = rt_hand_d, x = age)) +
  geom_point() + 
  facet_grid(~smoker, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"),
        strip.text = element_text(colour = "white", hjust = 0, 
                                  face = "bold", size = 16, 
                                  angle = 180))

References

Andrews, Mark. 2021. Doing data science in R: An Introduction for Social Scientists. London, UK: SAGE Publications Ltd.

Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27: 17–21.

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Hartwig, Frederick, and Brian E. Dearing. 1979. Exploratory Data Analysis. 16. Sage.

Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics Through Simulated Annealing.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–94.

Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.

———. 1989. The Visual Display of Quantitative Information. Vol. 13–14. Graphics Press.

Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.

———. 2016. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York: Springer.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Wilkinson, Leland. 1999. The Grammar of Graphics. Springer.

Wong, Bang. 2010. “Points of View: Design of Data Figures.” Nature Methods 7 (9): 665.