March 26, 2015

Course content

  • Aim
    • Create some intuition for neat and clean graphical visualization
    • Introducing the concept of Grammar of Graphics
    • Working with ggplot2 package
    • With a focus on static graphics for publication


  • Requirements
    • Elementary coding skills in R
    • Basics in statistics for data analysis

Outline

  1. Introduction to data visualization
  2. The Grammar of Graphics
  3. Graphics with ggplot2
    • Plotting geometric objects
    • Mapping variables to aesthetics
    • Scales for XY-Position, Color, etc.
    • Faceting
    • Themes, labs and saving
  4. Final remarks on ggplot2

Introduction to data visualization

Graphical excellence by Edward Tufte

  • "…is that which gives the viewer the greatest number of ideas,
  • in the shortest time,
  • with the least ink,
  • the smallest space,
  • and which tells the truth about data."
  • (Tufte 2001)

Graphical excellence - by Andrew Abela

[Figure omitted]

Graphical excellence - Do's

  • Balance between importance and graphical attention
  • Clear and well-structured representation
  • Data-ink ratio should be close to 1
  • Use consistent and full axes
  • Use gridlines and ticks, but gently
  • Check for grey print-out and color blindness
  • Proportionality between numbers and their graphical representation

Graphical excellence - The lie factor

  • Proportionality between numbers and their graphical representation
es_graph <- (5.3 - 0.6) / 0.6  # effect size in the graphic (inches)
es_data <- (27.5 - 18) / 18    # effect size in the data (miles per gallon)
lie_factor <- es_graph / es_data
lie_factor  # should be around 1
## [1] 14.84211
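
The same calculation can be wrapped in a small helper so that other charts can be checked quickly. This is only a sketch: the function name `lie_factor_of` and its argument names are made up for illustration and are not part of any package.

```r
# Lie factor: relative change shown in the graphic divided by the
# relative change in the data (hypothetical helper, illustration only)
lie_factor_of <- function(graph_from, graph_to, data_from, data_to) {
  es_graph <- (graph_to - graph_from) / graph_from  # effect size in the graphic
  es_data  <- (data_to - data_from) / data_from     # effect size in the data
  es_graph / es_data
}

# The example from the slide: bars of 0.6 and 5.3 inches for 18 and 27.5 mpg
lie_factor_of(0.6, 5.3, 18, 27.5)

# A proportional graphic: the bar doubles when the data doubles
lie_factor_of(1, 2, 10, 20)
```

A value close to 1 indicates that the graphic changes in proportion to the data.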

Graphical excellence - Stevens' power law

  • Subjective sensation \(w(I)\) = physical stimulus \(I^\beta\)
    • \(I\): magnitude of the physical stimulus
    • \(w(I)\): subjective magnitude of the sensation evoked by the stimulus
    • \(\beta > 1\): perceived over-estimation (e.g. salty taste, electric shock)
    • \(\beta < 1\): perceived under-estimation (e.g. smell, brightness)

Graphical excellence - Stevens' power law

  • Sensation ratio \(\frac{w(I_1)}{w(I_2)}\) = intensity ratio \(\left(\frac{I_1}{I_2}\right)^\beta\)

Attribute   Estimated \(\beta\)
Length      0.9 to 1.1
Area        0.6 to 0.9
Volume      0.5 to 0.8
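
A short worked example may help to read the table. The sketch below applies the ratio form of the power law to a stimulus that doubles in size. The \(\beta\) values are simply the midpoints of the ranges above, which is an assumption made for illustration, and `perceived_ratio` is a hypothetical helper name.

```r
# Perceived ratio under Stevens' power law: (I1 / I2)^beta
perceived_ratio <- function(intensity_ratio, beta) {
  intensity_ratio^beta
}

# Midpoints of the ranges in the table (assumed for illustration)
beta_mid <- c(length = 1.0, area = 0.75, volume = 0.65)

# How large does a doubling of the physical quantity appear?
round(perceived_ratio(2, beta_mid), 2)
```

With \(\beta < 1\) a doubled area or volume is perceived as less than doubled, which is one reason to encode numbers by length rather than by area or volume.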

Graphical excellence - Don'ts (!)

  • Do not …
    • use 3D charts
    • use more than 6 colors
    • overload the chart with information
    • use rainbow or overly intense colors
    • narrow the context, e.g. by showing a very small time interval
  • But have a look at the 20 imperatives of information design

The Grammar of Graphics

  • Visualisation concept created by Wilkinson (1999)
    • to define the basic elements of a statistical graphic


  • Adapted for R by Wickham (2009)
    • who created the ggplot2 package
    • consistent and compact syntax to describe statistical graphics
    • highly modular as it breaks up graphs into semantic components


  • It is not a guide to which graph to choose or how best to convey information!

The Grammar of Graphics - Terminology

A statistical graphic is a …

  • mapping of data
  • to aesthetic attributes (color, size, xy-position)
  • using geometric objects (points, lines, bars)
  • with data being statistically transformed (summarised, log-transformed)
  • and mapped onto a specific facet and coordinate system
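
To make the terminology concrete, the following sketch labels each component in a single ggplot2 call. It assumes a reasonably recent ggplot2 and uses the `diamonds` data shipped with the package; the particular choice of geoms, facets and transparency is arbitrary.

```r
library(ggplot2)

p_terms <- ggplot(data = diamonds,                    # data
                  mapping = aes(x = carat,            # mapping of variables ...
                                y = price,            # ... to aesthetic attributes
                                color = cut)) +
  geom_point(alpha = 0.3) +                           # geometric object (points)
  geom_smooth(method = "lm") +                        # statistically transformed layer
  facet_wrap(~ clarity) +                             # facets
  coord_cartesian()                                   # coordinate system

# Printing the object (p_terms) draws the plot
```

Each `+` adds one component, so a component can be swapped without touching the others.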

The Grammar of Graphics

  • Which data is used as an input?
  • What geometric objects are chosen for visualization?
  • What variables are mapped onto which attributes?
  • What type of scales are used to map data to aesthetics?
  • Are the variables statistically transformed before plotting?
  • Is any form of faceting applied?

Graphics with ggplot2

Data preparation

library("ggplot2")
packageDescription("ggplot2")
## Package: ggplot2
## Type: Package
## Title: An implementation of the Grammar of Graphics
## Version: 1.0.0
## Author: Hadley Wickham <h.wickham@gmail.com>, Winston Chang
##        <winston@stdout.org>
## Maintainer: Hadley Wickham <h.wickham@gmail.com>
## Description: An implementation of the grammar of graphics in R. It
##        combines the advantages of both base and lattice graphics:
##        conditioning and shared axes are handled automatically, and
##        you can still build up a plot step by step from multiple
##        data sources. It also implements a sophisticated
##        multidimensional conditioning system and a consistent
##        interface to map data to aesthetic attributes. See the
##        ggplot2 website for more information, documentation and
##        examples.
## Depends: R (>= 2.14), stats, methods
## Imports: plyr (>= 1.7.1), digest, grid, gtable (>= 0.1.1),
##        reshape2, scales (>= 0.2.3), proto, MASS
## Suggests: quantreg, Hmisc, mapproj, maps, hexbin, maptools,
##        multcomp, nlme, testthat, knitr, mgcv
## VignetteBuilder: knitr
## Enhances: sp
## License: GPL-2
## URL: http://ggplot2.org, https://github.com/hadley/ggplot2
## BugReports: https://github.com/hadley/ggplot2/issues
## LazyData: true
## Collate: 'aaa-.r' 'aaa-constants.r' 'aes-calculated.r' .....
## Packaged: 2014-05-20 22:12:23 UTC; hadley
## NeedsCompilation: no
## Repository: CRAN
## Date/Publication: 2014-05-21 15:36:28
## Built: R 3.1.0; ; 2014-05-22 09:40:09 UTC; unix
## 
## -- File: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/ggplot2/Meta/package.rds

Data preparation

data("diamonds")
# Prices of 50,000 round cut diamonds

# A dataset containing the prices and other attributes of almost 54,000 diamonds. 
# The variables are price in USD, carat, cut quality,...
# help(diamonds)
head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48

Basics: Initiate ggplot object

qplot(data, ...) # similar to plot(): a compact interface with many defaults
ggplot(data, mapping = aes(), ...) # the main plotting function
  • data: the data frame employed
  • mapping: list of aesthetic assignments
    • aes(x, y, color, size, fill, shape)

Basics: Initiate ggplot object

ggplot(data = diamonds, mapping = aes(x=x, y=price))
# Warning: No layers in plot 
  • ggplot() itself …
    • is not a plotting layer but initializes a ggplot object
    • declares the input data and some common aesthetics
  • Add layers by using the + operator

Basics: Geometric objects

ggplot(data = diamonds, mapping = aes(x=x, y=price)) + geom_point()

Basics: Geometric objects

geom_point(mapping = NULL, data = NULL, stat, ...)
  • mapping: list of aesthetic assignments aes() for the geom object
  • stat: statistical transformation required for the geom object
  • NULL: inherit values from ggplot()
  • ... other arguments,
    • often aesthetics you want to set unconditionally, e.g. color="red"

Geometric objects - Exercise

  • Exercise:
    1. Load library(ggplot2)
    2. Load data(diamonds) from the ggplot2 package
    3. Create a scatterplot of price and carat
      • Use ggplot(data = ..., mapping = aes(x=..., y=...)) to initiate a ggplot object
      • Use + geom_point() to create a scatterplot layer

Geometric objects - Exercise

ggplot(data = diamonds, mapping = aes(x=carat, y=price)) + geom_point()

Geometric objects - Exercise

ggplot(diamonds, aes(x=carat, y=price)) + geom_point()

Basics: Geometric objects

Examples for basic geom_ functions

geom_point(mapping = NULL, data = NULL,
stat = "identity", position = "identity", ...)

geom_line(mapping = NULL, data = NULL,
stat = "identity", position = "identity", ...)

geom_boxplot(mapping = NULL, data = NULL,
stat = "boxplot", position = "dodge", outlier.color = "black", 
                      outlier.shape = 16, outlier.size = 2, ...)

Basics: Geometric objects

  • Add, combine and edit layers like a toolbox
  • Extensive list of all ggplot2 objects can be found at

Basics: Mapping aesthetics

  • Besides mapping onto x- and y-position
    • variables can be assigned to geom aesthetics

Examples:

geom_point(aes(x=carat, y=price, size = carat)) # point size varies with `carat`
aes(..., color = carat)    # color varies with `carat`
aes(..., fill = carat)     # fill color varies with `carat`
aes(..., linetype = carat) # line type varies with `carat`

Basics: Mapping aesthetics

Setting mappings for geom extends or replaces ggplot() mappings

ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color = cut))

Basics: Mapping aesthetics

But you can also state universal mappings within ggplot() objects

ggplot(diamonds, aes(x=carat, y=price, color = cut)) + geom_point()

Basics: Mapping aesthetics

Including some additional manipulations of variables

ggplot(diamonds, aes(x=carat, y=price, size=carat^2, alpha=carat^2)) + geom_point() 

Mapping aesthetics - Exercise

  • Exercise:
    1. Create a boxplot of y=price and x=cut
    2. Look at docs.ggplot2.org to find the right geom-object for a boxplot
    3. Map the variable cut onto the fill aesthetic: aes(fill=cut)

Mapping aesthetics - Exercise

ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + geom_boxplot() 

Basics: Mapping aesthetics

  • data and aes inside geom replace ggplot() attributes
avgprice <- data.frame(avg = mean(diamonds$price))

ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + geom_boxplot()  +
  geom_hline(data=avgprice, aes(yintercept = avg))

Basics: Mapping aesthetics

avgprice <- data.frame(avg = mean(diamonds$price))

ggplot() + 
  geom_boxplot(data=diamonds, aes(x=cut, y=price, fill=cut))  +
  geom_hline(data=avgprice, aes(yintercept = avg))

Basics: Mapping aesthetics

  • you can also save a ggplot object and later add layers
p <- ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + geom_boxplot()  
p
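
Adding a layer to the saved object later could look like the sketch below. It reuses the boxplot and the `avgprice` data frame from the previous slides; the name `p_with_mean` and the dashed line type are arbitrary choices.

```r
library(ggplot2)

# Saved plot object, as on the slide
p <- ggplot(diamonds, aes(x = cut, y = price, fill = cut)) + geom_boxplot()

# Later: add a horizontal line at the overall mean price
avgprice <- data.frame(avg = mean(diamonds$price))
p_with_mean <- p +
  geom_hline(data = avgprice, aes(yintercept = avg), linetype = "dashed")
```

The original `p` is not modified, so several variants can be derived from the same base plot.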

Basics: Statistical transformation

ggplot(diamonds, aes(x=carat)) + geom_bar()

Basics: Statistical transformation

geom_bar(mapping = NULL, data = NULL, stat = "bin", position = "stack",
  ...)
  • stat argument statistically transforms input data (bin means bin and count)
  • position argument: dodge for side-by-side bars or stack for additive bars

Basics: Statistical transformation

ggplot(diamonds, aes(x=carat)) + geom_bar(stat="bin")

Basics: Statistical transformation

ggplot(diamonds, aes(x=carat)) + geom_bar(stat="bin", binwidth=1)

  • binwidth sets the width of the bins

Statistical transformation - Exercise

  • Exercise:
    1. Do the transformation stat="bin" by hand
    2. Cut carat into groups (0,1] (1,2] (2,3] (3,4] (4,5]
      • cut(diamonds$carat, breaks=seq(0,5,1))
    3. Count observation by group
      • as.data.frame(table(diamonds$caratcut))
    4. Use ggplot() + geom_bar()
      • what stat argument do you need?

Statistical transformation - Exercise

diamonds$caratcut <- cut(diamonds$carat, breaks=seq(0,5,1))
diamondbin <- as.data.frame(table(diamonds$caratcut))
diamondbin
##    Var1  Freq
## 1 (0,1] 36438
## 2 (1,2] 15613
## 3 (2,3]  1857
## 4 (3,4]    27
## 5 (4,5]     4

Statistical transformation - Exercise

ggplot(diamondbin, aes(x=Var1, y=Freq)) + 
  geom_bar(stat="bin")  # Not right, why?

Statistical transformation - Exercise

ggplot(diamondbin, aes(x=Var1, y=Freq)) + 
  geom_bar(stat="identity")  # Here we go!
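
These slides were built with ggplot2 1.0.0. In ggplot2 2.0.0 and later, geom_bar() defaults to stat = "count", binning of a continuous variable is usually done with geom_histogram(), and geom_col() (added in 2.2.0) is a shortcut for bars with precomputed heights. The sketch below redoes the exercise under that assumption; the object names are arbitrary.

```r
library(ggplot2)

# Binning done by ggplot2: histogram with bins of width 1
h <- ggplot(diamonds, aes(x = carat)) + geom_histogram(binwidth = 1)
binned <- layer_data(h)              # data after the statistical transformation
binned[, c("xmin", "xmax", "count")]

# Binning done by hand, then plotted as is (same idea as stat = "identity")
counts <- as.data.frame(table(cut(diamonds$carat, breaks = seq(0, 5, 1))))
b <- ggplot(counts, aes(x = Var1, y = Freq)) + geom_col()
```

The bin edges differ between the two versions, because geom_histogram() chooses its own bin alignment while cut() uses the breaks given explicitly.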

Basics - Summary

[Figure omitted]

Customizing scales (1/3)

Introduction to scales

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar() 

Introduction to scales

  • A scale translates data values into aesthetic (visual) values

  • Scales control the mapping of data (domain) to aesthetics (range)
  • Each aesthetic has its own (default) scale
  • The scale depends on the variable type:
    • discrete (factor, logical, character)
    • continuous (numeric)

Introduction to scales

  • Scale specifications have the form scale_AESTHETIC_SCALENAME()
    • AESTHETIC: x, y, color, size or shape
    • SCALENAME: grey, gradient, hue, manual, continuous
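
The naming pattern can be checked against the functions that ggplot2 actually exports. The sketch below builds a few names from their two parts and looks them up; `scale_name` is a hypothetical helper written only for this illustration.

```r
library(ggplot2)

# Build a scale function name from its two parts (illustration only)
scale_name <- function(aesthetic, scalename) {
  paste("scale", aesthetic, scalename, sep = "_")
}

candidates <- c(
  scale_name("x", "continuous"),
  scale_name("fill", "grey"),
  scale_name("color", "gradient"),
  scale_name("shape", "manual")
)

# TRUE for every name that is an exported ggplot2 function
candidates %in% getNamespaceExports("ggplot2")
```

Not every combination exists, so a lookup like this, or the package index, is a quick way to see which scales are available for an aesthetic.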

Introduction to scales

Aesthetic        Scalename (discrete variables)  Scalename (continuous variables)
Position (x, y)  discrete                        continuous
                                                 date

  • To adjust the ticks of a (continuous) y-axis you would add
+ scale_y_continuous(breaks = seq(0,9000,1000))

Introduction to scales

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar() +
  scale_y_continuous(breaks = seq(0,9000,1000))

Introduction to scales

Aesthetic       Scalename (discrete variables)  Scalename (continuous variables)
Color and fill  brewer                          gradient
                grey                            gradient2
                hue                             gradientn
                identity
                manual

  • To adjust the name of the fill mapping you would add
+ scale_fill_hue(name = "Cut quality")

Introduction to scales

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar() +
  scale_fill_hue(name="Cut quality")

Introduction to scales

Aesthetic       Scalename (discrete variables)  Scalename (continuous variables)
Color and fill  brewer                          gradient
                grey                            gradient2
                hue                             gradientn
                identity
                manual

  • To switch from the hue color scale to a grey scale you would add
+ scale_fill_grey()

Introduction to scales

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar() +
  scale_fill_grey()

Introduction to scales - Exercise

  • Exercise:
    1. Use the previous barplot
      • ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar()
    2. Set the name of the fill aesthetic to "Cut quality"
      • scale_fill_discrete(…)
    3. Set the name of the x aesthetic to "Weight in carat"
      • scale_x_continuous(…)
    4. Set the breaks of the y aesthetic to seq(0, 9000, 500)
    5. Set the labels of the fill aesthetic to c(1:5)

Introduction to scales - Exercise

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar() +
  scale_fill_discrete(name = "Cut quality", labels = c(1:5)) + 
  scale_x_continuous(name = "Weight in carat") + 
  scale_y_continuous(breaks = seq(0,9000,500))

Introduction to scales

Aesthetic  Discrete  Continuous
Shape      shape
           identity
           manual
Line type  linetype
           identity
           manual
Size       identity  size
           manual

Introduction to scales

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar() +  scale_fill_hue(
  name = expression(paste("Age \ngroup \nnot", integral(f(x) * dx^{2}, a, b)))
  ) 

#for more mathematical expressions see ?plotmath

Customizing scales (2/3)

Scales - Position Scales

Every plot has two position scales - horizontal and vertical

data(economics) #US economic time series
head(economics)
##         date   pce    pop psavert uempmed unemploy
## 1 1967-06-30 507.8 198712     9.8     4.5     2944
## 2 1967-07-31 510.9 198911     9.8     4.7     2945
## 3 1967-08-31 516.7 199113     9.0     4.6     2958
## 4 1967-09-30 513.3 199311     9.8     4.9     3143
## 5 1967-10-31 518.5 199498     9.7     4.7     3066
## 6 1967-11-30 526.2 199657     9.4     4.8     3018
?economics #monthly data with consumption, population and unemployment 

Scales - Position Scales

p <- ggplot(economics, aes(x=date, y=unemploy)) + geom_line()
p

Scales - Position Scales

To change the limits of a scale (and remove data outside limits)

p + scale_y_continuous(limits=c(6000,9000)) # removes observations outside [6000,9000]

Scales - Position Scales

Alternatively

p + ylim(6000,9000) # removes observations outside [6000,9000]

Scales - Position Scales

p + coord_cartesian(ylim=c(6000,9000)) # to zoom in and not remove data
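
The difference between removing data and zooming matters most when a statistic is computed from the data. The sketch below uses a tiny made-up data frame with one extreme value and compares the mean that ggplot2 computes in both cases. It assumes ggplot2 3.3.0 or later, where stat_summary() takes the argument `fun`; the data and object names are hypothetical.

```r
library(ggplot2)

# Toy data (made up): one group with a single extreme value
toy <- data.frame(g = "a", y = c(1, 2, 3, 100))
p_toy <- ggplot(toy, aes(x = g, y = y)) +
  stat_summary(fun = mean, geom = "point")

# Scale limits: y = 100 is dropped before the mean is computed
# (ggplot2 warns about the removed row, silenced here)
mean_limited <- suppressWarnings(layer_data(p_toy + ylim(0, 10))$y)

# coord_cartesian(): only the visible region changes, the mean uses all values
mean_zoomed <- layer_data(p_toy + coord_cartesian(ylim = c(0, 10)))$y

c(limited = mean_limited, zoomed = mean_zoomed)
```

With scale limits the plotted mean is the mean of 1, 2 and 3 only; with coord_cartesian() it is the mean of all four values, even though that point lies outside the visible range.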

Scales - Date scale

When working with time series data, the date scale is useful:

p + scale_x_date(limits = as.Date(c("2000-10-01", "2005-01-01"))) # for date values

Scales - Date scale

When working with time series data, the date scale is useful:

p + scale_x_date(limits = as.Date(c("2000-10-01", "2005-01-01")),
                 breaks= "6 months")

Scale transformation

  • Position scales (scale_x_continuous and scale_y_continuous)
    • have an argument trans="identity" to transform the scale
p + scale_y_continuous(trans="identity", breaks=seq(0,15000,2500), limits=c(0,15000))

Scale transformation

  • Position scales (scale_x_continuous and scale_y_continuous)
    • have an argument trans="log" to transform the scale
p + scale_y_continuous(trans="log", breaks=c(1,100,1000,15000), limits=c(1,15000))

Scale transformation

  • Position scales (scale_x_continuous and scale_y_continuous)
    • accept further transformations through the trans argument
  • [Table of available transformations omitted; taken from Wickham (2009)]

Scale transformation

Alternatively, one can use the functions

p1 <- p + scale_y_log10(limits=c(1,15000))
p2 <- p + scale_y_reverse()
p3 <- p + scale_y_sqrt()

Scale transformation - Exercise

  • Exercise
    1. Load data set data(economics)
    2. Create a time series plot of consumption expenditures pce
      • aes(x=date, y=pce)
      • geom_line()
    3. Limit the time frame to 1970 to 2005
      • scale_x_date(limits = as.Date(c(..., ...)))
    4. Log-transform the y scale
      • scale_y_continuous()
    5. Provide proper limits and ticks for y axis
      • scale_y_continuous(breaks = c(1,...), limits = c(1,...))

Scale transformation - Exercise

ggplot(economics, aes(x=date, y=pce)) + geom_line() +
  scale_x_date(limits = as.Date(c("1970-01-01", "2005-01-01"))) +
  scale_y_continuous(trans="log", breaks=c(1,100,1000,10000), limits=c(1,10000))

Customizing scales (3/3)

Scales - Color

Aesthetic       Discrete  Continuous
Color and fill  brewer    gradient
                grey      gradient2
                hue       gradientn
                identity
                manual

Scales - Color - Continuous

p <- ggplot(diamonds, aes(x=carat, y=price, color=carat)) + geom_point()
p + scale_color_gradient()

Scales - Color - Continuous

Input via #RRGGBB hexadecimal-code

p + scale_color_gradient(low="#FF0000" , high="black", 
                              na.value = "grey50", limits=c(0,5))

Scales - Color - Continuous

  • via hsv: hue, saturation, and value/brightness, each in the range [0, 1].
  • via hcl: hue [0,360], chroma [0,100], and luminance [0,100]
[Figure omitted]
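
All three ways of specifying a color end up as the same kind of hexadecimal string, which is what the color scales receive. The sketch below uses only base R (grDevices); the variable names are arbitrary and the hcl() values are the ones used in the gradient2 example of this section.

```r
# Pure red, specified in three ways
red_hex <- "#FF0000"
red_hsv <- hsv(h = 0, s = 1, v = 1)   # hue, saturation, value in [0, 1]
red_rgb <- rgb(1, 0, 0)               # red, green, blue in [0, 1]

c(red_hex, red_hsv, red_rgb)

# An hcl color: hue in [0, 360], chroma and luminance roughly in [0, 100]
muted_red <- hcl(h = 20, c = 80, l = 40)
muted_red
```

Any of these strings can be passed to arguments such as low, mid and high of the gradient scales.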

Scales - Color - Continuous

p + scale_color_gradient2(midpoint=3, low=hsv(0.3,0.5,1) , mid="blue", 
                               high=hcl(20,80,40))

Scales - Color - Continuous

via color palettes, e.g., rainbow, terrain.colors, topo.colors

p + scale_color_gradientn(colours=rainbow(3), breaks=c(2,3,4))

Scales - Color - Discrete

Aesthetic       Discrete  Continuous
Color and fill  brewer    gradient
                grey      gradient2
                hue       gradientn
                identity
                manual

Scales - Color - Discrete

  • via _grey: grey color scale
p <- ggplot(diamonds, aes(x=carat, fill=cut)) +
              geom_bar(position = "dodge", stat="bin")
p + scale_fill_grey()  

Scales - Color - Discrete

  • via _hue: qualitative color scale with evenly spaced hues
    • defaults: chroma = 100, luminance = 65, hues spread evenly around the color wheel starting at 15°
p + scale_fill_hue()  

Scales - Color - Discrete

via color brewer and RColorBrewer package

RColorBrewer::display.brewer.all() # for an overview of all palettes
  • Wickham (2009) recommends:
    • "Set1" and "Dark2" for categorical data
    • "Set2", "Pastel1", "Pastel2" and "Accent" for areas

Scales - Color - Discrete

via _manual: either set each color by hand or via secondary sources

library(wesanderson)
p + scale_fill_manual(values = wes_palette("Cavalcanti",5))

Scales - Color - Exercise

  • Exercise:
    1. Use ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar()
    2. Change the position of the bar plot to dodge
    3. Have a look at docs.ggplot2.org/current/scale_brewer.html
    4. Set the fill color according to brewer type
      • seq (sequential), div (diverging) or qual (qualitative)
      • scale_fill_brewer(type="")
    5. Run RColorBrewer::display.brewer.all()
    6. Set the fill colors according to brewer set PuRd
      • scale_fill_brewer(palette="")

Scales - Color - Exercise

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar(position = "dodge") +
 scale_fill_brewer(type="seq") 

Scales - Color - Exercise

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar(position = "dodge") +
 scale_fill_brewer(type="div") 

Scales - Color - Exercise

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar(position = "dodge") +
 scale_fill_brewer(type="qual") 

Scales - Color - Exercise

ggplot(diamonds, aes(x=carat, fill=cut)) +  geom_bar(position = "dodge") +
 scale_fill_brewer(palette="PuRd") 


Faceting

  • Faceting (also called conditioning, latticing or trellising)
    • divides the data into subsets according to categorical variables
    • then plots in multiple panels
    • a bit like creating a contingency table

Faceting - Two implementations

facet_grid(facets = . ~ ., scales = "fixed", ...) # e.g.: margins=TRUE for totals
facet_wrap(facets = . ~ ., scales = "fixed", ...) # e.g.: ncol=3, nrow=3
  • facets: formula describing how to split the panels (row ~ col)
  • scales: axis scales can be fixed, free, free_x or free_y
  • margins (facet_grid): add row/column totals as additional plot panels
  • ncol, nrow (facet_wrap): set the number of panel columns and rows

Faceting

[Figure: facet_grid() (left) and facet_wrap() (right)]
  • facet_grid() (left) is fundamentally 2d
    • being made up of two independent components
  • facet_wrap() (right) is 1d, but wrapped into 2d to save space.

Faceting using facet_grid

  • aligns vertical scales which is useful for wide screens

Faceting using facet_grid

diamondssub <- subset(diamonds, 
                        cut %in% c("Good", "Premium") & 
                        clarity %in% c("VS2", "IF"))

ggplot(diamondssub, aes(x=carat)) + geom_histogram() + 
  facet_grid(cut ~ .)

  • single column useful to compare distributions

Faceting using facet_grid

ggplot(diamondssub, aes(x=carat)) + geom_histogram() + 
  facet_grid(cut ~ ., scales="free_y")

  • same plot but with free y scales

Faceting using facet_grid

ggplot(diamondssub, aes(x=carat, y=price)) + geom_point() + 
  facet_grid(cut ~ clarity)

Faceting using facet_grid - Exercise

  • Exercise
    1. Create a smoothed line of x=carat and y=price
      • geom_smooth()
    2. Use facet_grid to explore the relationship for levels of cut
      • facet_grid(. ~ .)
    3. Add points with same xy-variables to the plot
      • + geom_point()
    4. Set the alpha value of the points to 0.4

Faceting using facet_grid - Exercise

ggplot(diamondssub, aes(x=carat, y= price)) + geom_smooth() + 
  facet_grid(cut ~ .) + 
  geom_point(alpha=0.4)

Faceting using facet_wrap

facet_wrap(facets = . ~ ., scales = "fixed", ...) # e.g.: ncol=3 ,nrow=3
  • most useful when a single variable has many levels
  • construct a long ribbon of panels

Faceting using facet_wrap

ggplot(diamonds, aes(x=carat, y=price)) + geom_point() + 
  facet_wrap(~clarity, nrow=2)

  • faceting by level of clarity

Faceting vs. group aesthetics

  • Faceting (using facet_XXX) when you
    • need to broadly disentangle variables
    • want to avoid overlaps when plotting
    • want to compare two variables at once along two dimensions
    • want scales to vary across panels

  • Grouping (using aesthetics) when you
    • want to explore small differences
    • want to have all groups in one plot
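
The two approaches can be compared on the same data. The sketch below rebuilds the `diamondssub` subset from the faceting slides and draws the price curve per cut once with a grouping aesthetic and once with facets; the object names `grouped` and `faceted` are arbitrary.

```r
library(ggplot2)

diamondssub <- subset(diamonds,
                      cut %in% c("Good", "Premium") &
                      clarity %in% c("VS2", "IF"))

# Grouping: both cuts in one panel, distinguished by color
grouped <- ggplot(diamondssub, aes(x = carat, y = price, color = cut)) +
  geom_smooth()

# Faceting: one panel per cut, here with free y scales
faceted <- ggplot(diamondssub, aes(x = carat, y = price)) +
  geom_smooth() +
  facet_grid(cut ~ ., scales = "free_y")
```

The grouped version makes small differences between the curves easy to see, while the faceted version avoids overlap and lets each panel use its own y range.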

Themes, labs and saving (1/3)

Themes

  • Themes control all aspects of non-data display
    • Don't affect how data is rendered by geoms or scale
    • Control over
      • title, axis labels, axis ticks labels,
      • legend labels, legend key labels
      • as well as the color of ticks, grid lines and backgrounds

Themes

p + theme_grey() # default theme: grey background, white grid lines
p + theme_bw() # white background, grey grid lines
p + theme() # to set individual options

Themes

ggplot(diamondssub, aes(x=carat, y=price, linetype=cut)) +  geom_smooth() + 
  theme(legend.position = "bottom", legend.direction = "horizontal", 
        title= element_text(face = "bold"),
        panel.background = element_rect(fill = "#999922"), 
        panel.grid.minor = element_line(color="red", linetype="dotted"))

Themes, labs and saving (2/3)

Labs

Labs control the labelling of titles and axes

p + labs(title="Diamond bar plot", x="Weight in carat", y="Number of diamonds")

Labs

Alternatively

p + ggtitle("Diamond bar plot") + xlab("Weight in carat") + ylab("Number of diamonds")

Themes, labs and saving (3/3)

Save your output

  • ggplot2 provides ggsave() for exporting graphs
  • It recognises the extensions:
    • eps/ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg and wmf
ggsave(filename="diamondplot.eps", plot=last_plot(), 
       width=10, height=5, dpi=300, units="cm")

ggsave(filename="diamondplot.pdf", 
       width=11, height=8.5, dpi=600, units="in")
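
A self-contained way to try ggsave() without cluttering the working directory is to write into a temporary folder. The sketch below plots only the first 500 rows of `diamonds` to keep the file small; the file location and object names are arbitrary choices for illustration.

```r
library(ggplot2)

# A small plot to export
p_save <- ggplot(diamonds[1:500, ], aes(x = carat, y = price)) + geom_point()

# Write a PDF into the session's temporary directory
out <- file.path(tempdir(), "diamondplot.pdf")
ggsave(filename = out, plot = p_save, width = 11, height = 8.5, units = "in")

file.exists(out)   # check that the file was written
```

Passing the plot explicitly through `plot =` avoids relying on last_plot(), which only refers to the most recently displayed plot.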

Final remarks on ggplot2

  • Weaknesses of ggplot2
    • As of this course (March 2015), in maintenance mode since Feb 2014, i.e. no new features
      • However, ggvis is supposed to become the new ggplot
    • ggplot2 is slower than some other graphics systems (e.g. lattice)
    • Does not really support 3D graphs

Final remarks on ggplot2

  • Why to use ggplot2
    • Large community as one of the most popular R packages
    • Uses sensible and attractive decisions about
      • dimensions, scales and colors by default
    • Additional packages in the same vein as ggplot2 for
      • geographical information (ggmap),
      • genomic data (ggbio),
      • Markov Chain Monte Carlo simulations (ggmcmc)
      • and interactive graphics (ggvis)

References

Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd ed. Graphics Press.

Wickham, H. 2009. ggplot2: Elegant Graphics for Data Analysis. Use R! Springer. http://books.google.de/books?id=bes-AAAAQBAJ.

Wilkinson, L. 1999. The Grammar of Graphics. Statistics and Computing. Springer.