Data visualization using ggplot2

Mike McCann
22-23 January 2015

Review

ggplot2 is not a base package, so we need to install it.

  1. How would you install the package ggplot2?

  2. What is the next step if you want to use a function from ggplot2? What is the code?

Install & load ggplot2

install.packages("ggplot2")
library(ggplot2)
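
If the script is run more than once, one option is to install only when the package is missing. This is a minimal sketch, assuming an internet connection and a configured CRAN mirror, neither of which is stated in these slides:

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

library(ggplot2)  # load the package for this session
```

install.packages() is needed once per machine; library() is needed once per R session.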

Why ggplot2?

  • More elegant & compact code than base graphics
  • More aesthetically pleasing than base graphics
  • Very powerful for exploratory analysis

Why ggplot2?

  • Supports a continuum of expertise
  • Easy to get started, plenty of power for complex figures

Publication-quality figures

[Figure: example of a publication-quality plot]

Why gg?

  • gg is for “grammar of graphics”
  • Uses a set of terms that defines the basic components of a plot
  • Used to produce figures using coherent, consistent syntax

The grammar

A basic ggplot2 plot consists of:

  • data: Must be a data.frame
  • aesthetics: How your data are represented visually
    • x, y, color, size, shape, etc.
  • geometry: Geometries of plotted objects
    • points, lines, polygons, etc.
  • and more…

A basic plot

library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()

Plots can be assembled in pieces

myplot <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width))

myplot + geom_point()
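
Because the base object holds only the data and the aesthetic mapping, the same myplot can be reused with different layers. A small sketch of that idea; the names p_points and p_fit are illustrative and not from the slides:

```r
library(ggplot2)

# Base object: data and aesthetic mapping only, no geometry yet
myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# Reuse the same base with different layers
p_points <- myplot + geom_point()
p_fit    <- myplot + geom_point() + geom_smooth(method = "lm")

print(p_points)
print(p_fit)
```

Nothing is drawn until a plot object is printed, which is why assigning to a name produces no figure.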

Changing aesthetics of a geom

Increase the size of points

ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point(size=3)

Changing aesthetics of a geom

Differentiate Species by color

ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)

Changing aesthetics of a geom

Differentiate Species by color & shape

ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species, shape=Species)) + geom_point(size=3)
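
These examples use two different mechanisms: size=3 is set to a constant outside aes(), while color=Species is mapped to a column inside aes(). A sketch of the difference; the color "steelblue" is an arbitrary choice:

```r
library(ggplot2)

# Setting: every point gets the same fixed color, and no legend is drawn
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue", size = 3)

# Mapping: color varies with the Species column, and a legend is drawn
p_map <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3)

print(p_set)
print(p_map)
```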

Try It!

Take a sample of the diamonds dataset

d2 <- diamonds[sample(1:nrow(diamonds),1000),]
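
sample() draws different rows on every run. To make the subset reproducible, a seed can be set first. In this sketch the seed value 42 is arbitrary:

```r
library(ggplot2)  # provides the diamonds dataset

set.seed(42)  # arbitrary seed so the same 1000 rows are drawn each time
d2 <- diamonds[sample(nrow(diamonds), 1000), ]
dim(d2)
```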

Then generate this plot:
[Target plot not available in this text version]

Other types of geoms

Type geom_ and hit tab to see them all!

Then, use ?geom_nameofgeom to see the help screen.
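
Outside an editor with tab completion, the same list can be produced from the console. A sketch using base R's ls() on the attached package:

```r
library(ggplot2)

# All exported ggplot2 objects whose names start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)

# Then open the help page for one of them, for example:
# ?geom_boxplot
```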

Example of other geoms

Boxplot!

ggplot(iris, aes(x=Species,y=Sepal.Length)) + geom_boxplot()

Try It!

  1. Look up geom_histogram. What does it do?

  2. Make a histogram of Sepal.Length from the iris data set. What did it do with the different species?

Facets

Plots can also have facets to make lattice plots.

ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_grid(Species ~ .)

Facets

Change to facet_grid(. ~ Species) and get one row, three columns.

ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_grid(. ~ Species)
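
With a single faceting variable, facet_wrap() is an alternative that wraps the panels into rows and columns. A sketch; the binwidth of 0.25 is an arbitrary value added here to avoid the default-bins message:

```r
library(ggplot2)

p_wrap <- ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~ Species, ncol = 3)  # one panel per species, three columns

print(p_wrap)
```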

Adding stats

Type stat_ and hit tab to see them all!

Then, use ?stat_nameofstat to see the help screen.

Adding stats

Use stat_smooth to add a linear fit

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point() + stat_smooth(method="lm")
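
By default the smoother also draws a shaded confidence band around each fitted line. A sketch of one common adjustment, assuming the band is not wanted: se = FALSE hides it, and geom_smooth() is the equivalent geom-style call:

```r
library(ggplot2)

p_fit <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # linear fit per species, no band

print(p_fit)
```

Because color=Species is mapped in the main aes(), a separate line is fitted for each species.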

Scales

Scales are used to modify axes and colors

For example:

  • scale_y_continuous() Set the name, breaks, labels, and limits of the y-axis
  • scale_x_log10() Log-transform the x-axis
  • scale_colour_manual() Specify by hand the colors used for the color aesthetic
  • scale_fill_discrete() Set the fill colors used for a discrete variable
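
The axis scales in this list can be sketched on the iris data as follows. The axis title, breaks, and limits are illustrative values chosen to fit the range of Sepal.Width:

```r
library(ggplot2)

p_scaled <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  scale_x_log10() +                           # log-transform the x-axis
  scale_y_continuous(name = "Sepal width (cm)",
                     breaks = seq(2, 4.5, by = 0.5),
                     limits = c(2, 4.5))      # axis title, ticks, range

print(p_scaled)
```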

Scales

ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3) + scale_colour_manual(values=c("red","blue","yellow"))

More examples: Histograms

ggplot(faithful, aes(x=waiting)) + geom_histogram(binwidth=30, colour="black")

More examples: Histograms

Change some of the aesthetics

ggplot(faithful, aes(x=waiting)) + geom_histogram(binwidth=8, colour="black", fill="steelblue")

More examples: Bar plots

ggplot(iris, aes(Species, Sepal.Length)) + geom_bar(stat = "identity")
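
With stat = "identity", geom_bar() stacks every row, so each bar's height is the sum of Sepal.Length for that species rather than its mean. If means are intended, one option is to summarise the data first. A sketch using base R's aggregate(); the name iris_means is illustrative:

```r
library(ggplot2)

# Mean sepal length per species (one row per species)
iris_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

p_means <- ggplot(iris_means, aes(Species, Sepal.Length)) +
  geom_bar(stat = "identity")

print(p_means)
```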

More examples: Line plots

ggplot(mtcars, aes(x=wt, y=mpg, color=as.factor(cyl))) + geom_line()

More examples: Density plots

ggplot(faithful, aes(waiting)) + geom_density()

More examples: Density plots

Add a fill

ggplot(faithful, aes(waiting)) + geom_density(fill="blue")

More examples: Density plots

There are often several ways to make the same (or a similar) graph

ggplot(faithful, aes(waiting)) + geom_line(stat="density")

Themes

Themes give even more precise control over a plot's appearance

See ?theme for all of the options

Themes

I commonly use + theme_classic() or + theme_bw()

ggplot(iris, aes(Species, Sepal.Length)) + geom_bar(stat = "identity") + theme_bw()
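
Individual elements can be adjusted by adding a theme() call after a complete theme. A sketch; the legend position and text size are arbitrary choices:

```r
library(ggplot2)

p_theme <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  theme_bw() +                                 # start from a complete theme
  theme(legend.position = "bottom",            # move the legend below the plot
        axis.title = element_text(size = 14))  # larger axis titles

print(p_theme)
```

The theme() call comes after theme_bw() because a complete theme added later would overwrite the individual settings.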

Saving plots

my_plot <- ggplot(iris, aes(Species, Sepal.Length)) + geom_bar(stat = "identity") + theme_bw()

ggsave("my_plot.jpg",my_plot,height=4,width=4,units="in")

You can specify the file name, dimensions, resolution, etc.

Note: The file is saved in your current working directory unless the file name includes a path.
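
A sketch of saving to an explicit location at a chosen resolution. The temporary directory and the 300 dpi value are assumptions made for illustration:

```r
library(ggplot2)

my_plot <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_bar(stat = "identity") +
  theme_bw()

# Hypothetical output location: a file inside R's temporary directory
out_file <- file.path(tempdir(), "my_plot.png")

# The file type is taken from the extension; dpi sets the resolution
ggsave(out_file, plot = my_plot, width = 4, height = 4, units = "in", dpi = 300)
file.exists(out_file)
```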

Remember!

Data must be a data frame to plot with ggplot2

# This won't work!
xvar <- rnorm(100)
yvar <- rnorm(100)
ggplot(aes(xvar,yvar)) + geom_point()

Remember!

Data must be a data frame to plot with ggplot2

xvar <- rnorm(100)
yvar <- rnorm(100)
df <- data.frame(xvar, yvar) # make a data frame 
ggplot(df, aes(xvar,yvar)) + geom_point()

Proper data formatting

Often our data look like this (“wide”)

       spA      spB      spC      spD
1 51.85901 71.59855 20.19121 24.16370
2 49.75879 80.24066 19.12824 25.09150
3 50.12833 86.98701 20.25850 25.15004
4 49.72746 77.64475 19.35652 25.30864
5 51.10151 75.03489 19.32278 24.71188
6 50.79769 71.12593 20.78745 24.94725
dim(df)
[1] 100   4

Proper data formatting

But our data should look like this (“long”)

  species   weight
1       A 62.17762
2       B 75.65408
3       C 73.25439
4       D 79.00973
5       A 66.80117
6       B 76.37421
dim(df2)
[1] 400   2

Melting: from wide to long

# make some fake "wide" data
df <- data.frame(A=rnorm(100,50,6),
           B=rnorm(100,75,5),
           C=rnorm(100,50,4),
           D=rnorm(100,55,3))

Melting: from wide to long

Use the melt() function in the reshape2 package

library(reshape2)
df2 <- melt(df)
head(df2)
  variable    value
1        A 54.32326
2        A 60.42918
3        A 50.89982
4        A 44.93549
5        A 42.19869
6        A 64.04846
dim(df2)
[1] 400   2
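
melt() also accepts names for the two long-format columns, which gives the species and weight layout shown earlier. A sketch that rebuilds the fake data with an arbitrary seed so it can run on its own:

```r
library(reshape2)

set.seed(1)  # arbitrary seed so the fake data are reproducible
df <- data.frame(A = rnorm(100, 50, 6),
                 B = rnorm(100, 75, 5),
                 C = rnorm(100, 50, 4),
                 D = rnorm(100, 55, 3))

# Name the long-format columns instead of accepting "variable" and "value"
df2 <- melt(df, variable.name = "species", value.name = "weight")
head(df2)
```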

Now we can plot

ggplot(df2, aes(x=value)) + geom_histogram() + facet_grid(.~variable)

Plotting means & error bars

Examples:
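
One possible approach, sketched here because the original examples are not included in this text: compute the mean and standard error per group, then draw them with geom_point() and geom_errorbar(). The choice of the iris data, the standard error as the error measure, and the name summary_df are all assumptions:

```r
library(ggplot2)

# Mean and standard error of Sepal.Length for each species
means <- tapply(iris$Sepal.Length, iris$Species, mean)
ses   <- tapply(iris$Sepal.Length, iris$Species,
                function(x) sd(x) / sqrt(length(x)))

summary_df <- data.frame(Species = names(means),
                         mean    = as.numeric(means),
                         se      = as.numeric(ses))

p_err <- ggplot(summary_df, aes(x = Species, y = mean)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)

print(p_err)
```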

Further help

Questions?

Worksheet

Answers

  1. Type in data(package="datasets") to see all of the datasets pre-installed with R.

  2. Find some data that interests you (or use your own) and examine its structure. Are they vectors, data frames, other? How many observations are there?

  3. Use ggplot2 to make one plot of some attribute of the data.
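
A possible starting point for steps 2 and 3, using airquality as an arbitrary choice from the built-in datasets; any other data frame would work the same way:

```r
library(ggplot2)

class(airquality)  # what kind of object is it?
str(airquality)    # structure: columns and their types
nrow(airquality)   # number of observations

# One plot of a single attribute: distribution of temperature
p_air <- ggplot(airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5)

print(p_air)
```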