March 26, 2015
ggplot2 package
es_graph <- (5.3 - 0.6) / 0.6   # effect size in graphic (inches)
es_data <- (27.5 - 18) / 18     # effect size in data (miles per gallon)
lie_factor <- es_graph / es_data
lie_factor                      # should be around 1
## [1] 14.84211
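The calculation above can be wrapped in a small helper so that it can be reused for other graphics. This is only a minimal sketch: the function name `lie_factor_of()` and its argument names are illustrative and do not come from the original material.

```r
# Hypothetical helper: relative change shown in the graphic divided by the
# relative change in the underlying data.
lie_factor_of <- function(graph_from, graph_to, data_from, data_to) {
  es_graph <- (graph_to - graph_from) / graph_from
  es_data  <- (data_to - data_from) / data_from
  es_graph / es_data
}

# Same numbers as in the example above
lf <- lie_factor_of(graph_from = 0.6, graph_to = 5.3,
                    data_from = 18, data_to = 27.5)
round(lf, 5)
```

A value close to 1 would indicate that the graphic represents the change in the data proportionally.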
| Attribute | Estimated \(\beta\) |
|---|---|
| Length | 0.9 to 1.1 |
| Area | 0.6 to 0.9 |
| Volume | 0.5 to 0.8 |
ggplot2 package
A statistical graphic is a …
ggplot2
library("ggplot2")
packageDescription("ggplot2")
## Package: ggplot2
## Type: Package
## Title: An implementation of the Grammar of Graphics
## Version: 1.0.0
## Author: Hadley Wickham <h.wickham@gmail.com>, Winston Chang
##     <winston@stdout.org>
## Maintainer: Hadley Wickham <h.wickham@gmail.com>
## Description: An implementation of the grammar of graphics in R. It
##     combines the advantages of both base and lattice graphics:
##     conditioning and shared axes are handled automatically, and
##     you can still build up a plot step by step from multiple
##     data sources. It also implements a sophisticated
##     multidimensional conditioning system and a consistent
##     interface to map data to aesthetic attributes. See the
##     ggplot2 website for more information, documentation and
##     examples.
## Depends: R (>= 2.14), stats, methods
## Imports: plyr (>= 1.7.1), digest, grid, gtable (>= 0.1.1),
##     reshape2, scales (>= 0.2.3), proto, MASS
## Suggests: quantreg, Hmisc, mapproj, maps, hexbin, maptools,
##     multcomp, nlme, testthat, knitr, mgcv
## VignetteBuilder: knitr
## Enhances: sp
## License: GPL-2
## URL: http://ggplot2.org, https://github.com/hadley/ggplot2
## BugReports: https://github.com/hadley/ggplot2/issues
## LazyData: true
## Collate: 'aaa-.r' 'aaa-constants.r' 'aes-calculated.r' .....
## Packaged: 2014-05-20 22:12:23 UTC; hadley
## NeedsCompilation: no
## Repository: CRAN
## Date/Publication: 2014-05-21 15:36:28
## Built: R 3.1.0; ; 2014-05-22 09:40:09 UTC; unix
##
## -- File: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/ggplot2/Meta/package.rds
data("diamonds")
# Prices of 50,000 round cut diamonds
# A dataset containing the prices and other attributes of almost 54,000 diamonds.
# The variables are price in USD, carat, cut quality,...
# help(diamonds)
head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
qplot(data, ...)                    # close to plot(), with compressed functionality and many defaults
ggplot(data, mapping = aes(), ...)  # the main plotting function
data: the data frame employed
mapping: list of aesthetic assignments
aes(x, y, color, size, fill, shape)
ggplot(data = diamonds, mapping = aes(x=x, y=price)) # Warning: No layers in plot
ggplot() itself …
+ operator
ggplot(data = diamonds, mapping = aes(x=x, y=price)) + geom_point()
geom_point(mapping = NULL, data = NULL, stat, ...)
mapping: list of aesthetic assignments aes() for the geom object
stat: statistical transformation required for the geom object
NULL: inherit values from ggplot()
...: other arguments
library(ggplot2)
data(diamonds) from the ggplot2 package
price and carat
ggplot(data = ..., mapping = aes(x=..., y=...)) to initiate a ggplot object
+ geom_point() to create a scatterplot layer
ggplot(data = diamonds, mapping = aes(x=carat, y=price)) + geom_point()
ggplot(diamonds, aes(x=carat, y=price)) + geom_point()
Examples for basic geom_ functions
geom_point(mapping = NULL, data = NULL,
stat = "identity", position = "identity", ...)
geom_line(mapping = NULL, data = NULL,
stat = "identity", position = "identity", ...)
geom_boxplot(mapping = NULL, data = NULL,
stat = "boxplot", position = "dodge", outlier.color = "black",
outlier.shape = 16, outlier.size = 2, ...)
ggplot2 objects can be found at
geom aesthetics
Examples:
geom_point(aes(x=carat, y=price, size = carat)) # point size varies with `carat`
aes(..., color = carat)    # color varies with `carat`
aes(..., fill = carat)     # fill color varies with `carat`
aes(..., linetype = carat) # linetype varies with `carat`
Setting mappings for geom extends or replaces ggplot() mappings
ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color = cut))
But you can also state universal mappings within ggplot() objects
ggplot(diamonds, aes(x=carat, y=price, color = cut)) + geom_point()
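With a single layer the two variants above draw the same picture; the difference only matters once further layers are added. The following sketch inspects where the mapping is stored. The object names `p_layer` and `p_global` are made up for illustration.

```r
library(ggplot2)

# Mapping set on the layer: only geom_point() sees color = cut.
p_layer <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut))

# Mapping set in ggplot(): every layer added later inherits color = cut.
p_global <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point()

# Plot-level mappings; aes() stores `color` under the name "colour".
names(p_layer$mapping)
names(p_global$mapping)
```

Adding, say, `geom_smooth()` to `p_global` would therefore fit one smoother per cut, while adding it to `p_layer` would fit a single smoother for all points.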
Including some additional manipulations of variables
ggplot(diamonds, aes(x=carat, y=price, size=carat^2, alpha=carat^2)) + geom_point()
y=price and x=cut
cut onto the fill aesthetic: aes(fill=cut)
ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + geom_boxplot()
data and aes inside geom replace ggplot() attributes
avgprice <- data.frame(avg = mean(diamonds$price))
ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + geom_boxplot() +
  geom_hline(data=avgprice, aes(yintercept = avg))
avgprice <- data.frame(avg = mean(diamonds$price))
ggplot() + geom_boxplot(data=diamonds, aes(x=cut, y=price, fill=cut)) +
  geom_hline(data=avgprice, aes(yintercept = avg))
ggplot object and later add layers
p <- ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + geom_boxplot()
p
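Storing a plot and adding layers later can be sketched step by step as follows. The intermediate names `p0`, `p1` and `p2` are illustrative; the point is that `+` returns a new object and leaves the stored one untouched.

```r
library(ggplot2)

avgprice <- data.frame(avg = mean(diamonds$price))

p0 <- ggplot(diamonds, aes(x = cut, y = price, fill = cut))     # no layers yet
p1 <- p0 + geom_boxplot()                                       # one layer
p2 <- p1 + geom_hline(data = avgprice, aes(yintercept = avg))   # two layers

# Number of layers stored in each object
n_layers <- c(length(p0$layers), length(p1$layers), length(p2$layers))
n_layers
```

Printing `p0` on its own gives an empty plot (or the "No layers in plot" message in older versions), because nothing has been told how to draw the data yet.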
ggplot(diamonds, aes(x=carat)) + geom_bar()
geom_bar(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
stat argument statistically transforms input data (bin means bin and count)
position argument: dodge for side-by-side bars or stack for additive bars
ggplot(diamonds, aes(x=carat)) + geom_bar(stat="bin")
ggplot(diamonds, aes(x=carat)) + geom_bar(stat="bin", binwidth=1)
- binwidth sets the width of the bin
stat="bin" by hand
carat into groups (0,1] (1,2] (2,3] (3,4] (4,5]
cut(diamonds$carat, breaks=seq(0,5,1))
as.data.frame(table(diamonds$caratcut))
stat argument do you need?
diamonds$caratcut <- cut(diamonds$carat, breaks=seq(0,5,1))
diamondbin <- as.data.frame(table(diamonds$caratcut))
diamondbin
##    Var1  Freq
## 1 (0,1] 36438
## 2 (1,2] 15613
## 3 (2,3]  1857
## 4 (3,4]    27
## 5 (4,5]     4
ggplot(diamondbin, aes(x=Var1, y=Freq)) + geom_bar(stat="bin") # Not right, why?
ggplot(diamondbin, aes(x=Var1, y=Freq)) + geom_bar(stat="identity") # Here we go!
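The reason `stat="identity"` is needed here is that the data frame already contains the counts, so nothing is left to be binned. A toy version with made-up values makes this easy to check by eye; the names `carat_toy` and `binned` are purely illustrative.

```r
library(ggplot2)

carat_toy <- c(0.2, 0.7, 1.1, 1.5, 2.3)        # made-up values
bins <- cut(carat_toy, breaks = seq(0, 3, 1))  # groups (0,1] (1,2] (2,3]
counts <- table(bins)

binned <- data.frame(
  bin   = factor(names(counts), levels = names(counts)),
  count = as.vector(counts)
)
binned

# y already holds the counts, so the bars are drawn as-is.
p_binned <- ggplot(binned, aes(x = bin, y = count)) +
  geom_bar(stat = "identity")
```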
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar()
scale_AESTHETIC_SCALENAME()
x, y, color, size or shape
grey, gradient, hue, manual, continuous
| Aesthetic | Scalename (discrete variables) | Scalename (continuous variables) |
|---|---|---|
| Position (x, y) | discrete | continuous |
| . | | date |
- to adjust the ticks of a (continuous) y-axis you would set
+ scale_y_continuous(breaks = seq(0,9000,1000))
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar() + scale_y_continuous(breaks = seq(0,9000,1000))
| Aesthetic | Scalename (discrete variables) | Scalename (continuous variables) |
|---|---|---|
| Color and fill | discrete | continuous |
| . | brewer | gradient |
| . | grey | gradient2 |
| . | hue | gradientn |
| . | identity | |
| . | manual | |
- to adjust the name of the fill mapping you would set
+ scale_fill_hue(name = "Cut quality")
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar() + scale_fill_hue(name="Cut quality")
- to switch from hue color scale to grey scale
+ scale_fill_grey()
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar() + scale_fill_grey()
name of the fill aesthetic to "Cut quality"
name of the x aesthetic to "Carat in gramm"
breaks of the y aesthetic to seq(0, 9000, 500)
labels of the fill aesthetic to c(1:5)
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar() +
  scale_fill_discrete(name = "Cut quality", labels = c(1:5)) +
  scale_x_continuous(name = "Carat in gramm") +
  scale_y_continuous(breaks = seq(0,9000,500))
| Aesthetic | Discrete | Continuous |
|---|---|---|
| Shape | shape | . |
| . | identity | . |
| . | manual | . |
| Line type | linetype | . |
| . | identity | . |
| . | manual | . |
| Size | identity | . |
| . | manual | size |
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar() + scale_fill_hue(
name = expression(paste("Age \ngroup \nnot", integral(f(x) * dx^{2}, a, b)))
)
#for more mathematical expressions see ?plotmath
Every plot has two position scales - horizontal and vertical
data(economics) # US economic time series
head(economics)
##         date   pce    pop psavert uempmed unemploy
## 1 1967-06-30 507.8 198712     9.8     4.5     2944
## 2 1967-07-31 510.9 198911     9.8     4.7     2945
## 3 1967-08-31 516.7 199113     9.0     4.6     2958
## 4 1967-09-30 513.3 199311     9.8     4.9     3143
## 5 1967-10-31 518.5 199498     9.7     4.7     3066
## 6 1967-11-30 526.2 199657     9.4     4.8     3018
?economics #monthly data with consumption, population and unemployment
p <- ggplot(economics, aes(x=date, y=unemploy)) + geom_line()
p
To change the limits of a scale (and remove data outside limits)
p + scale_y_continuous(limits=c(6000,9000)) # removes observations outside [6000,9000]
Alternatively
p + ylim(6000,9000) # removes observations outside [6000,9000]
p + coord_cartesian(ylim=c(6000,9000)) # to zoom in and not remove data
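The difference between limiting the scale and zooming with the coordinate system becomes visible as soon as a layer computes statistics. The sketch below uses a made-up data frame `toy` with one outlier; the medians are recomputed in base R to show which values each plot is based on.

```r
library(ggplot2)

toy <- data.frame(g = "a", y = c(1, 2, 3, 100))  # made-up data, one outlier

p_box <- ggplot(toy, aes(x = g, y = y)) + geom_boxplot()

# Scale limits: 100 is removed before the boxplot statistics are computed.
p_limits <- p_box + scale_y_continuous(limits = c(0, 10))

# Coordinate limits: statistics use all four values, only the view is cropped.
p_zoom <- p_box + coord_cartesian(ylim = c(0, 10))

# Medians underlying the two boxplots
median_limits <- median(toy$y[toy$y >= 0 & toy$y <= 10])
median_zoom   <- median(toy$y)
c(median_limits, median_zoom)
```

For a plain line or scatter plot the two approaches look alike, apart from the warning about removed observations; for boxplots, smoothers and histograms they can give different pictures.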
When working with time series data, the date scale is useful:
p + scale_x_date(limits = as.Date(c("2000-10-01", "2005-01-01"))) # for date values
p + scale_x_date(limits = as.Date(c("2000-10-01", "2005-01-01")),
breaks= "6 months")
scale_x_continuous and scale_y_continuous
trans="identity" to transform the scale
p + scale_y_continuous(trans="identity", breaks=seq(0,15000,2500), limits=c(0,15000))
trans="log" to transform the scale
p + scale_y_continuous(trans="log", breaks=c(1,100,1000,15000), limits=c(1,15000))
Alternatively, one can use the functions
p1 <- p + scale_y_log10(limits=c(1,15000))
p2 <- p + scale_y_reverse()
p3 <- p + scale_y_sqrt()
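What a log scale does to the axis can be checked with a few numbers: equal ratios in the data become equal distances on the axis. A minimal sketch, assuming the `economics` data shipped with ggplot2; the chosen breaks and the name `p_log` are arbitrary.

```r
library(ggplot2)

p <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()
p_log <- p + scale_y_log10(breaks = c(1000, 10000))

# On a log10 axis each tenfold increase takes up the same distance.
steps <- diff(log10(c(10, 100, 1000)))
steps
```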
pce
aes(x=date, y=pce)
geom_line()
scale_x_date(limits = as.Date(c(..., ...)))
scale_y_continuous()
scale_y_continuous(breaks = c(1,...), limits = c(1,...))
ggplot(economics, aes(x=date, y=pce)) + geom_line() +
  scale_x_date(limits = as.Date(c("1970-01-01", "2005-01-01"))) +
  scale_y_continuous(trans="log", breaks=c(1,100,1000,10000), limits=c(1,10000))
| Aesthetic | Discrete | Continuous |
|---|---|---|
| Color and fill | brewer | gradient |
| . | grey | gradient2 |
| . | hue | gradientn |
| . | identity | |
| . | manual | |
p <- ggplot(diamonds, aes(x=carat, y=price, color=carat)) + geom_point()
p + scale_color_gradient()
Input via #RRGGBB hexadecimal-code
p + scale_color_gradient(low="#FF0000" , high="black",
na.value = "grey50", limits=c(0,5))
hsv: hue, saturation, and value/brightness, each in the range [0, 1].
hcl: hue [0,360], chroma [0,100], and luminance [0,100]
p + scale_color_gradient2(midpoint=3, low=hsv(0.3,0.5,1) , mid="blue",
high=hcl(20,80,40))
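All of these color specifications end up as plain character strings, which is why they can be mixed freely in the `low`, `mid` and `high` arguments. A short base R sketch; the variable names are illustrative.

```r
red_hex <- "#FF0000"                    # #RRGGBB hexadecimal code
red_hsv <- hsv(h = 0, s = 1, v = 1)     # hue, saturation, value in [0, 1]
red_rgb <- rgb(1, 0, 0)                 # red, green, blue in [0, 1]
col_hcl <- hcl(h = 20, c = 80, l = 40)  # hue [0, 360], chroma, luminance

# hsv() and rgb() return the same hexadecimal string for pure red.
c(red_hex, red_hsv, red_rgb)

# Named colors can be converted to their RGB components.
col2rgb("black")
```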
via color palettes, e.g., rainbow, terrain.colors, topo.colors
p + scale_color_gradientn(colours=rainbow(3), breaks=c(2,3,4))
_grey: grey color scale
p <- ggplot(diamonds, aes(x=carat, fill=cut)) +
  geom_bar(position = "dodge", stat="bin")
p + scale_fill_grey()
_hue: qualitative color scale with evenly spaced hues
by default chroma = 100, luminance = 65, and the hues are spaced evenly over the range [15°, 375°]
p + scale_fill_hue()
via color brewer and RColorBrewer package
p + scale_fill_brewer(type="qual", palette="Set1")
RColorBrewer::display.brewer.all() # for an overview of all palettes
via _manual: either set each color by hand or via secondary sources
library(wesanderson)
p + scale_fill_manual(values = wes_palette("Cavalcanti",5))
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar()
dodge
seq (sequential), div (diverging) or qual (qualitative)
RColorBrewer::display.brewer.all()
PuRd
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar(position = "dodge") + scale_fill_brewer(type="seq")
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar(position = "dodge") + scale_fill_brewer(type="div")
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar(position = "dodge") + scale_fill_brewer(type="qual")
ggplot(diamonds, aes(x=carat, fill=cut)) + geom_bar(position = "dodge") + scale_fill_brewer(palette="PuRd")
facet_grid(facets = . ~ ., scales = "fixed", ...) # e.g.: margins=TRUE for totals
facet_wrap(facets = . ~ ., scales = "fixed", ...) # e.g.: ncol=3, nrow=3
facets: formula describing how to split the facets (row ~ col)
scales: whether the axis scales are fixed, free, free_x or free_y
margins: add row/column totals as additional plot panels
ncol, nrow: set the number of panel columns and rows
facet_grid() (left) is fundamentally 2d
facet_wrap() (right) is 1d, but wrapped into 2d to save space.
facet_grid
diamondssub <- subset(diamonds,
                      cut %in% c("Good", "Premium") &
                      clarity %in% c("VS2", "IF"))
ggplot(diamondssub, aes(x=carat)) + geom_histogram() +
facet_grid(cut ~ .)
facet_grid
ggplot(diamondssub, aes(x=carat)) + geom_histogram() + facet_grid(cut ~ ., scales="free_y")
facet_grid
ggplot(diamondssub, aes(x=carat, y=price)) + geom_point() + facet_grid(cut ~ clarity)
facet_grid - Exercise
geom_smooth()
cut
facet_grid(. ~ .)
+ geom_point()
alpha value of the points to 0.4
ggplot(diamondssub, aes(x=carat, y=price)) + geom_smooth() + facet_grid(cut ~ .) + geom_point(alpha=0.4)
facet_wrap
facet_wrap(facets = . ~ ., scales = "fixed", ...) # e.g.: ncol=3, nrow=3
ggplot(diamonds, aes(x=carat, y=price)) + geom_point() + facet_wrap(~clarity, nrow=2)
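To compare the two faceting functions directly, the same subset can be laid out both ways. This is a sketch that rebuilds `diamondssub` as defined earlier; the names `base_plot`, `p_grid`, `p_wrap` and `n_panels` are illustrative.

```r
library(ggplot2)

diamondssub <- subset(diamonds,
                      cut %in% c("Good", "Premium") &
                        clarity %in% c("VS2", "IF"))

base_plot <- ggplot(diamondssub, aes(x = carat, y = price)) + geom_point()

# 2d grid: one row per cut, one column per clarity.
p_grid <- base_plot + facet_grid(cut ~ clarity)

# 1d sequence of panels, one per cut/clarity combination, wrapped into 2 columns.
p_wrap <- base_plot + facet_wrap(~ cut + clarity, ncol = 2)

# Number of cut/clarity combinations present in the subset
n_panels <- nrow(unique(diamondssub[, c("cut", "clarity")]))
n_panels
```

With only two variables and all combinations present, both layouts show the same panels; `facet_wrap()` tends to pay off when a single variable has many levels.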
facet_XXX) when
aesthetics) when
geoms or scale
title, axis labels, axis tick labels, legend labels, legend key labels
p + theme_grey() # see plot to the left
p + theme_bw()   # see plot to the right
p + theme()      # to set options
ggplot(diamondssub, aes(x=carat, y=price, linetype=cut)) + geom_smooth() +
theme(legend.position = "bottom", legend.direction = "horizontal",
title= element_text(face = "bold"),
panel.background = element_rect(fill = "#999922"),
panel.grid.minor = element_line(color="red", linetype="dotted"))
Labs control the labelling of titles and axes
p + labs(title="Diamond bar plot", x="Carat in gramm", y="Price in USD")
Alternatively
p + ggtitle("Diamond bar plot") + xlab("Carat in gramm") + ylab("Price in USD")
ggsave() for exporting graphs
ggsave(filename="diamondplot.eps", plot=last_plot(),
       width=10, height=5, dpi=300, units="cm")
ggsave(filename="diamondplot.pdf",
width=11, height=8.5, dpi=600, units="in")
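Passing the plot object explicitly avoids relying on `last_plot()`, which can be convenient in scripts. The sketch below writes to a temporary directory so that no existing file is overwritten; the first 500 rows are used only to keep the example fast, and the names `p_small` and `out_file` are illustrative.

```r
library(ggplot2)

p_small <- ggplot(diamonds[1:500, ], aes(x = carat, y = price)) + geom_point()

# Temporary location; replace with a project path when saving for real.
out_file <- file.path(tempdir(), "diamondplot.pdf")
ggsave(filename = out_file, plot = p_small, width = 10, height = 5, units = "cm")

file.exists(out_file)
```

The file type is taken from the extension of `filename`. The `dpi` argument only affects raster formats such as PNG, so it is left out for the PDF here.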
ggvis is supposed to become the new ggplot
ggplot2 is slower than some other graphic systems (e.g. lattice)
ggplot2 for
ggmap, ggbio, ggmcmc, ggvis
Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd ed. Bertrams.
Wickham, H. 2009. Ggplot2: Elegant Graphics for Data Analysis. Use R! Springer. http://books.google.de/books?id=bes-AAAAQBAJ.
Wilkinson, L. 1999. The Grammar of Graphics. Statistics and Computing. Springer.