Mike McCann
22-23 January 2015
ggplot2 is not part of base R, so we need to install it.
How would you install the ggplot2 package?
What is the next step if you want to use a function from ggplot2? What is the code?
install.packages("ggplot2")
library(ggplot2)
gg stands for “grammar of graphics”.
A basic ggplot2 plot consists of data, aesthetic mappings (aes) and at least one geom:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
myplot <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width))
myplot + geom_point()
Increase the size of points
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point(size=3)
Differentiate Species by color
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
Differentiate Species by color & shape
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species, shape=Species)) + geom_point(size=3)
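The plots above give properties to a layer in two ways: size=3 is set to a fixed value outside aes(), while color=Species is mapped to a column inside aes(). A minimal sketch of the difference follows; the color "darkgreen" and the object names p_mapped and p_set are arbitrary choices for illustration.

```r
library(ggplot2)

# Mapped: color varies with the Species column, so ggplot2 adds a legend
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3)

# Set: every point gets the same fixed color, so no legend is needed
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 3, color = "darkgreen")

p_mapped
p_set
```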
Take a sample of the diamonds dataset
d2 <- diamonds[sample(1:nrow(diamonds),1000),]
Then generate this plot:
Type geom_ and hit tab to see them all!
Then, use ?geom_nameofgeom to see the help screen.
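If tab completion is not available (for example outside RStudio), the same list can be produced in code. This is a small sketch using base R's apropos(), which searches the attached packages for matching names; the variable name geoms is just for illustration. The same pattern should work for stat_ as well.

```r
library(ggplot2)

# apropos() returns the names of all visible objects matching a regular expression
geoms <- apropos("^geom_")
head(geoms)
length(geoms)
```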
Boxplot!
ggplot(iris, aes(x=Species,y=Sepal.Length)) + geom_boxplot()
Look up geom_histogram. What does it do?
Make a histogram of Sepal.Length from the iris data set. What did it do with the different species?
Plots can also have facets to make lattice plots.
ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_grid(Species ~ .)
Change to facet_grid(. ~ Species) and get one row, three columns.
ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_grid(. ~ Species)
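A related option is facet_wrap(), which takes a one-sided formula and lets you choose the number of rows or columns. The sketch below also sets binwidth explicitly; the value 0.25 is an arbitrary choice, not something from the examples above.

```r
library(ggplot2)

# One panel per species, stacked in a single column
p_wrap <- ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(binwidth = 0.25) +   # explicit bin width instead of the default bin count
  facet_wrap(~ Species, ncol = 1)

p_wrap
```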
Type stat_ and hit tab to see them all!
Then, use ?stat_nameofstat to see the help screen.
Use stat_smooth to add a linear fit
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point() + stat_smooth(method="lm")
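Because color=Species is mapped in aes(), the smoother is fitted separately for each species. To see roughly what those lines are, you can fit the same linear models by hand with lm(). This is only a sketch for comparison; the object names p_fit and fits are made up here, and se=FALSE just hides the confidence bands.

```r
library(ggplot2)

# Same plot as above, without the confidence bands
p_fit <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
p_fit

# Fit one linear model per species and pull out intercept and slope
fits <- lapply(split(iris, iris$Species), function(d) {
  coef(lm(Sepal.Width ~ Sepal.Length, data = d))
})
fits
```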
Scales are used to modify axes and colors.
For example:
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3) + scale_colour_manual(values=c("red","blue","yellow"))
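Scales work the same way for axes. The sketch below names the colors by species so the assignment does not depend on factor order, and adds axis scales; the axis titles, breaks and limits are illustrative choices, not part of the original example.

```r
library(ggplot2)

p_scaled <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  # a named vector ties each color to a specific species
  scale_colour_manual(values = c(setosa = "red", versicolor = "blue", virginica = "yellow")) +
  # axis title, tick positions and range for x; only a title for y
  scale_x_continuous(name = "Sepal length", breaks = 4:8, limits = c(4, 8)) +
  scale_y_continuous(name = "Sepal width")

p_scaled
```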
ggplot(faithful, aes(x=waiting)) + geom_histogram(binwidth=30, colour="black")
Change some of the aesthetics
ggplot(faithful, aes(x=waiting)) + geom_histogram(binwidth=8, colour="black", fill="steelblue")
ggplot(iris, aes(Species, Sepal.Length)) + geom_bar(stat = "identity")
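Note that with stat = "identity" every row is drawn as its own bar segment and the segments are stacked, so each bar above shows the sum of Sepal.Length for that species. If you want one bar per species showing the mean, one possible approach is to summarize first. The names means and p_means are just for illustration.

```r
library(ggplot2)

# One row per species, holding the mean sepal length
means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
means

# Now "identity" draws exactly one bar per row
p_means <- ggplot(means, aes(Species, Sepal.Length)) +
  geom_bar(stat = "identity")
p_means
```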
ggplot(mtcars, aes(x=wt, y=mpg, color=as.factor(cyl))) + geom_line()
ggplot(faithful, aes(waiting)) + geom_density()
Add a fill
ggplot(faithful, aes(waiting)) + geom_density(fill="blue")
There are often several ways to make the same (or a similar) graph
ggplot(faithful, aes(waiting)) + geom_line(stat="density")
Themes give even more precise control over a plot's appearance
See ?theme for all of the options
I commonly use + theme_classic() or + theme_bw()
ggplot(iris, aes(Species, Sepal.Length)) + geom_bar(stat = "identity") + theme_bw()
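Individual theme elements can also be changed with theme(), added after a complete theme such as theme_bw() so that the changes are not overwritten. A small sketch; the particular settings (legend at the bottom, larger axis titles, no minor grid lines) are arbitrary examples.

```r
library(ggplot2)

p_themed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  theme_bw() +                                  # start from a complete theme
  theme(legend.position = "bottom",             # move the legend below the plot
        axis.title = element_text(size = 14),   # larger axis titles
        panel.grid.minor = element_blank())     # drop minor grid lines

p_themed
```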
my_plot <- ggplot(iris, aes(Species, Sepal.Length)) + geom_bar(stat = "identity") + theme_bw()
ggsave("my_plot.jpg",my_plot,height=4,width=4,units="in")
You can specify the file name, dimensions, resolution, etc.
Note: Saved in your current working directory (unless specified).
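As a sketch of those options, the call below sets the resolution with dpi and the target folder with path. It writes to a temporary directory (tempdir()) so nothing lands in your working directory; the file name, size and dpi value are illustrative.

```r
library(ggplot2)

my_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  theme_bw()

out_dir <- tempdir()   # illustrative target folder

# The file type is taken from the extension of the file name
ggsave("my_plot.png", plot = my_plot, path = out_dir,
       width = 6, height = 4, units = "in", dpi = 300)

out_file <- file.path(out_dir, "my_plot.png")
file.exists(out_file)
```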
Data must be a data frame to plot with ggplot2
# This won't work: xvar and yvar are free-standing vectors, not columns of a data frame
xvar <- rnorm(100)
yvar <- rnorm(100)
ggplot(aes(xvar,yvar)) + geom_point()
Instead, put the vectors in a data frame first
xvar <- rnorm(100)
yvar <- rnorm(100)
df <- data.frame(xvar, yvar) # make a data frame
ggplot(df, aes(xvar,yvar)) + geom_point()
Often our data looks like this (“wide”)
       spA      spB      spC      spD
1 51.85901 71.59855 20.19121 24.16370
2 49.75879 80.24066 19.12824 25.09150
3 50.12833 86.98701 20.25850 25.15004
4 49.72746 77.64475 19.35652 25.30864
5 51.10151 75.03489 19.32278 24.71188
6 50.79769 71.12593 20.78745 24.94725
dim(df)
[1] 100 4
But our data should look like this (“long”)
  species   weight
1       A 62.17762
2       B 75.65408
3       C 73.25439
4       D 79.00973
5       A 66.80117
6       B 76.37421
dim(df2)
[1] 400 2
# make some fake "wide" data
df <- data.frame(A=rnorm(100,50,6),
                 B=rnorm(100,75,5),
                 C=rnorm(100,50,4),
                 D=rnorm(100,55,3))
Use the melt() function in the reshape2 package
library(reshape2)
df2 <- melt(df)
head(df2)
  variable    value
1        A 54.32326
2        A 60.42918
3        A 50.89982
4        A 44.93549
5        A 42.19869
6        A 64.04846
dim(df2)
[1] 400 2
ggplot(df2, aes(x=value)) + geom_histogram() + facet_grid(.~variable)
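By default melt() names the new columns variable and value. If you would rather have descriptive names such as species and weight, melt() accepts variable.name and value.name arguments. A minimal sketch with the same kind of fake data; set.seed(1) and the name df_long are additions for illustration, and the binwidth of 2 is arbitrary.

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # make the fake data reproducible
df <- data.frame(A = rnorm(100, 50, 6),
                 B = rnorm(100, 75, 5),
                 C = rnorm(100, 50, 4),
                 D = rnorm(100, 55, 3))

# With no id variables given, melt() treats every column as a measure variable
df_long <- melt(df, variable.name = "species", value.name = "weight")
head(df_long)
dim(df_long)

ggplot(df_long, aes(x = weight)) +
  geom_histogram(binwidth = 2) +
  facet_grid(. ~ species)
```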
Type in data(package="datasets") to see all of the datasets pre-installed with R.
Find some data that interests you (or use your own) and examine its structure. Are they vectors, data frames, other? How many observations are there?
Use ggplot2 to make a plot of some attribute of the data.