getwd() # show the working directory
## [1] "/Users/xrb/Desktop/R语言学习/UCLA网站学习/R Graphics_ Introduction to ggplot2 (1)_files"
Note: this document records the overall content of the Introduction to ggplot2 seminar on the UCLA website.
First, load the R packages used here:
library(pacman)
p_load(ggplot2,MASS,tidyverse)
Basic elements of the plotting grammar:
The basic syntax of the ggplot2 package is:
ggplot() # note: the function is ggplot(); the package is named "ggplot2"
ggplot(dataset, aes(x=xvar, y=yvar)) + geom_function()
Make a scatter plot:
# scatter plot of volume vs sales
ggplot(txhousing, aes(x=volume, y=sales)) +
geom_point()
## Warning: Removed 568 rows containing missing values (`geom_point()`).
Without the geom_*() part, only the x and y axes are drawn:
ggplot(txhousing, aes(x=volume, y=sales))
Layers can be added one at a time with "+", including geoms, stats, scales, faceting, and themes.
Enrich the plot above by adding a rug layer:
ggplot(txhousing, aes(x=volume, y=sales)) + # x=volume and y=sales inherited by all layers
geom_point() +
geom_rug(aes(color=median)) # color will only apply to the rug plot because not specified in ggplot()
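The inheritance rule described in the comments above can be checked by inspecting the plot object. This is a minimal sketch that assumes only ggplot2 is installed; the object name p is illustrative.

```r
library(ggplot2)

# Build the same two-layer plot and store it instead of printing it
p <- ggplot(txhousing, aes(x = volume, y = sales)) +
  geom_point() +
  geom_rug(aes(color = median))

# Mappings given in ggplot() are shared by every layer
names(p$mapping)

# Mappings given inside a geom belong to that layer only
names(p$layers[[1]]$mapping)  # geom_point(): no layer-specific mapping
names(p$layers[[2]]$mapping)  # geom_rug(): the color mapping (stored as "colour")
```

Storing the plot in an object does not draw it; printing p draws it.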
aes() arguments. Commonly used aesthetics:
x: positioning along x-axis
y: positioning along y-axis
color: color of objects; for 2-d objects, the color of the object’s outline (compare to fill below)
fill: fill color of objects
linetype: how lines should be drawn (solid, dashed, dotted, etc.)
shape: shape of markers in scatter plots
size: how large objects appear
alpha: transparency of objects (value between 0, transparent, and 1, opaque; the inverse of how many stacked objects it takes to be opaque)
Add some aes() arguments to polish the scatter plot above:
# mapping color to median inside of aes()
ggplot(txhousing, aes(x=volume, y=sales)) +
geom_point(aes(color=median))
## Warning: Removed 568 rows containing missing values (`geom_point()`).
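Mapping a variable to color inside aes() differs from setting a fixed color outside aes(). The sketch below contrasts the two; the object names p_mapped and p_set are illustrative, and layer_data() is used only to look at the colors ggplot2 computes.

```r
library(ggplot2)

# Mapped: color varies with the data, and a legend is drawn
p_mapped <- ggplot(txhousing, aes(x = volume, y = sales)) +
  geom_point(aes(color = median))

# Set: every point gets the same fixed color, and no legend is drawn
p_set <- ggplot(txhousing, aes(x = volume, y = sales)) +
  geom_point(color = "steelblue")

# layer_data() returns the data a layer will draw, including its colors
unique(layer_data(p_set)$colour)
length(unique(layer_data(p_mapped)$colour))
```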
# load an example data set (Sitka, from the MASS package)
data(Sitka)
This data set describes the growth of trees over time; some of the trees were grown in ozone-enriched chambers. Variables:
size: numeric, log of size (height times diameter^2)
Time: numeric, time of measurement (days since January 1, 1988)
tree: integer, tree id
treat: factor, treatment group, 2 levels=“control” and “ozone”
Exercise:
A. Create a scatter plot of Time vs size to view the growth of trees over time.
B. Color the scatter plot points by the variable treat.
C. Add an additional (loess) geom_smooth() layer to the graph.
D. SET the color of the loess smooth to “green” rather than have it colored by treat. Why is there only one smoothed curve now?
ggplot(Sitka,aes(x = Time,y = size,color = treat,fill = treat)) + geom_point() + geom_smooth(method = lm) # smooth grouped by treat; note that method = lm fits a straight line rather than the loess smooth the exercise asks for
## `geom_smooth()` using formula = 'y ~ x'
ggplot(Sitka,aes(x = Time,y = size,color = treat)) + geom_point() + geom_smooth(method = lm,col = "green") # smooth not grouped by treat: setting col outside aes() overrides the mapped color for this layer, so a single curve is fit to all points
## `geom_smooth()` using formula = 'y ~ x'
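The answer to part D can be checked by counting the groups the smooth layer fits. This is a sketch, assuming MASS and ggplot2 are installed; base, p_grouped and p_single are illustrative names, and the formula is spelled out only to silence the message.

```r
library(ggplot2)
data(Sitka, package = "MASS")

base <- ggplot(Sitka, aes(x = Time, y = size, color = treat)) + geom_point()

# color inherited from ggplot(): the smooth is split by treat
p_grouped <- base + geom_smooth(method = "lm", formula = y ~ x)

# color set to a constant: the mapping to treat is dropped for this layer
p_single <- base + geom_smooth(method = "lm", formula = y ~ x, color = "green")

# Count the groups that the smooth layer (layer 2) fits
length(unique(layer_data(p_grouped, 2)$group))
length(unique(layer_data(p_single, 2)$group))
```

The first count should match the number of treat levels and the second should be one, because grouping is derived from the discrete variables mapped in that layer.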
Geom functions draw different geometric shapes from a data set.
Common geoms:
geom_bar(): bars with bases on the x-axis
geom_boxplot(): boxes-and-whiskers
geom_errorbar(): T-shaped error bars
geom_density(): density plots
geom_histogram(): histogram
geom_line(): lines
geom_point(): points (scatterplot)
geom_ribbon(): bands spanning y-values across a range of x-values
geom_smooth(): smoothed conditional means (e.g. loess smooth)
geom_text(): text
Drawing common plots:
1. Histogram
ggplot(txhousing, aes(x=median)) +
geom_histogram(color="steelblue",fill="lightblue") # set colors directly: color sets the outline color; fill sets the fill color
2. Density plot: a histogram smoothed into a curve
ggplot(txhousing, aes(x=median)) +
geom_density()
## Warning: Removed 616 rows containing non-finite values (`stat_density()`).
Density plots can be drawn by group:
ggplot(txhousing, aes(x=median,color = factor(month))) +
# month is numeric, so it is converted to a grouping variable with factor() first; roughly analogous to the encode command in Stata
geom_density()
## Warning: Removed 616 rows containing non-finite values (`stat_density()`).
3. Box plot: shows the distribution of the y variable at each value of the x variable
ggplot(txhousing, aes(x=factor(year), y=median)) +
geom_boxplot() # distribution of median house prices in each year
## Warning: Removed 616 rows containing non-finite values (`stat_boxplot()`).
4. Bar plot: counts the frequency of a categorical variable
ggplot(diamonds, aes(x=cut, fill=clarity)) +
geom_bar()
5. Scatter plot: shows how two variables vary together
ggplot(txhousing, aes(x=volume, y=sales,
color=median, alpha=listings, size=inventory)) + # map extra aesthetics to show more information
geom_point()
## Warning: Removed 1468 rows containing missing values (`geom_point()`).
6. Line plot
ggplot(txhousing, aes(x=date, y=sales, color=city)) +
geom_line()
## Warning: Removed 430 rows containing missing values (`geom_line()`).
Try visualizing the trend of China's provincial digital financial inclusion index:
library(haven)
merge6 <- read_dta("~/Desktop/论文/小论文-数字经济与农户收入/数据/目前可用的/merge6.dta")
ggplot(merge6, aes(x=year,y=index_aggregate,color=prov_name_eng)) +
geom_line()
## Warning: Removed 434 rows containing missing values (`geom_line()`).
We will be using the Sitka data set again for this exercise.
A. Using 2 different geoms, compare the distribution of size between the two levels of treat. Use a different color for each distribution.
B. Use a bar plot to visualize the crosstabulation of Time and treat. Put Time on the x-axis.
C. Create a line graph of size over Time, with separate lines by tree and lines colored by treat.
D. Imagine you plan to submit your line graph of size over time by tree to a journal that does not print color graphs. How else can you distinguish between the 2 treatments on the graph?
Solutions:
ggplot(Sitka,aes(x=size,fill=treat)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(Sitka,aes(x=size,color=treat)) +
geom_density()
tab = table(Sitka$treat,Sitka$Time) # rows = treat, columns = Time, so Time is on the x-axis
barplot(tab,
xlab = "Time",
ylab = "frequency",
col = c("green","blue"), # one color per treat level
legend.text = TRUE)
ggplot(Sitka,aes(x=Time,y=size,group=tree))+
geom_line(aes(color=treat))
D. When a journal requires black-and-white figures, the groups can be distinguished by line type (linetype):
ggplot(Sitka,aes(x=Time,y=size,group=tree))+
geom_line(aes(linetype=treat))
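Part B can also be drawn with ggplot2 itself rather than base graphics. This is a sketch of one possible approach, not the seminar's official solution; p_bar is an illustrative name.

```r
library(ggplot2)
data(Sitka, package = "MASS")

# Time is numeric, so factor() gives one bar position per measurement day
p_bar <- ggplot(Sitka, aes(x = factor(Time), fill = treat)) +
  geom_bar(position = "dodge") +  # side-by-side bars, one per treat level
  labs(x = "Time", y = "count")

p_bar
```

Here geom_bar() counts the rows itself, so no table() call is needed first.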
---
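Stats, Scales, Coordinate Systems, and Faceting
Statistical (stat) functions transform the data, usually into some form of summary such as a mean, a standard deviation, or a confidence interval.
Each stat function is associated with a default geom, so no geom is needed to render the result.
# using the txhousing data set, summarize sales (y) for each year (x)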
ggplot(txhousing,aes(x=year,y=sales))+
stat_summary()
## Warning: Removed 568 rows containing non-finite values (`stat_summary()`).
## No summary function supplied, defaulting to `mean_se()`
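The default summary can be replaced by naming the summary function and the geom explicitly. This is a sketch under the assumption that a yearly mean line with standard-error bars is wanted; p_summary and yearly are illustrative names.

```r
library(ggplot2)

p_summary <- ggplot(txhousing, aes(x = year, y = sales)) +
  stat_summary(fun = mean, geom = "line") +            # yearly mean drawn as a line
  stat_summary(fun.data = mean_se, geom = "errorbar")  # mean plus and minus one standard error

# The summarized values can be compared with a direct calculation
yearly <- suppressWarnings(layer_data(p_summary, 1))
yearly$y[yearly$x == 2000]
mean(txhousing$sales[txhousing$year == 2000], na.rm = TRUE)
```

The two printed values should agree, since stat_summary() drops the non-finite rows before it averages.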
Create a new plot where x is mapped to Time and y is mapped to size. Then, add a stat_summary() layer.
ggplot(Sitka,aes(x=Time,y=size))+
stat_summary()
## No summary function supplied, defaulting to `mean_se()`
Here is a color scale that ggplot2 chooses for us:
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + # color mapped to cut
geom_point()
Use scale_color_manual() to set the colors by hand through its values argument:
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point() +
scale_color_manual(values=c("red", "yellow", "green", "blue", "violet"))
Scale functions for the y-axis
Add scale_y_continuous() to adjust the breaks of the y-axis.
Before:
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point() +
scale_color_manual(values=c("red", "yellow", "green", "blue", "violet"))
Here the y-axis tick marks are at 0, 5000, 10000, and 15000.
After:
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point() +
scale_color_manual(values=c("red", "yellow", "green", "blue", "violet")) +
scale_y_continuous(breaks=c(0,2500,5000,7500,10000,12500,15000,17500)) # set the y-axis breaks
Next, we change the y-axis unit from dollars to thousands of dollars: supply a label for each break, then retitle the y-axis "price(thousands of dollars)".
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point() +
scale_color_manual(values=c("red", "yellow", "green", "blue", "violet")) +
scale_y_continuous(breaks=c(0,2500,5000,7500,10000,12500,15000,17500),
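labels = c(0,1.5,5,7.5,10,12.5,15,17.5), # labels are matched to breaks by position; 1.5 is deliberately paired with 2500 to show that labels are free text (use 2.5 for a correct axis)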
name = "price(thousands of dollars)")
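Because labels are matched to breaks only by position, a typed label can drift from its break. One way to avoid that is to pass a function to labels, so each label is computed from its break. This is a sketch; to_thousands is a hypothetical helper, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: convert dollars to thousands of dollars
to_thousands <- function(x) x / 1000

p_price <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_y_continuous(breaks = seq(0, 17500, by = 2500),
                     labels = to_thousands,  # applied to the breaks themselves
                     name = "price(thousands of dollars)")
```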
Scale functions for the x-axis
Restrict the range of x-axis values and change the axis titles:
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point() +
xlim(c(0,3))+ # keep only carat values between 0 and 3
labs(x="CARAT", y="PRICE", color="Diamond_CUT", title="CARAT vs PRICE by CUT") # change the x-axis, y-axis, and legend titles
## Warning: Removed 32 rows containing missing values (`geom_point()`).
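The warning appears because xlim() discards observations outside the limits before drawing. To zoom in without discarding data, coord_cartesian() is an alternative. The sketch below compares the two; base, p_xlim and p_zoom are illustrative names.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point()

p_xlim <- base + xlim(c(0, 3))                    # out-of-range carat values become NA and are dropped
p_zoom <- base + coord_cartesian(xlim = c(0, 3))  # all rows are kept; only the view is restricted

# Number of points turned into NA by each approach
sum(is.na(layer_data(p_xlim)$x))
sum(is.na(layer_data(p_zoom)$x))
```

The difference matters most once a stat such as geom_smooth() is added, because xlim() changes the data the stat sees while coord_cartesian() does not.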
In every plot above, a legend is drawn by default as the guide for the scale; it can be removed with guides().
Guides for each scale can be set scale-by-scale with the guide argument, or en masse with guides().
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point() +
labs(x="CARAT", y="PRICE",title="CARAT vs PRICE by CUT")+
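guides(color="none") # remove the legend (guide) for the color mapped to cut
The coordinate system defines the plane in which objects are positioned on the plot; most plots use the Cartesian coordinate system.
Faceting functions split a plot into small panels, one for each subset of the faceting variables. The functions covered in this part are facet_wrap() and facet_grid().
The facet_wrap() function:
ggplot(diamonds, aes(x=carat, y=price)) +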
geom_point() +
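facet_wrap(~cut) # one panel per level of cut, wrapped into rows and columns
The facet_grid() function: facet_grid() lets you specify directly which variables split the plot along rows and columns. Put the row variable before ~ and the column variable after ~.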
ggplot(diamonds, aes(x=carat, y=price)) +
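geom_point(color = "steelblue",shape = 17,size = 0.5) + # set the color, shape, and size of the points
facet_grid(clarity~cut) # rows split by clarity, columns split by cut
The plot above uses facet_grid(clarity~cut) to show the scatter of price against carat for diamonds, with one clarity subset per row and one cut subset per column.
These are the main uses of the faceting functions; more detailed facet syntax and options can be found in Facets for ggplot2 in R.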
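A few frequently used faceting options are sketched below. The ncol and scales values are arbitrary choices for illustration, and p_wrap and p_grid are illustrative names.

```r
library(ggplot2)

# facet_wrap(): control the number of columns and free each panel's y-axis
p_wrap <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.5) +
  facet_wrap(~cut, ncol = 2, scales = "free_y")

# facet_grid(): rows and cols can be named with vars() instead of a formula
p_grid <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.5) +
  facet_grid(rows = vars(clarity), cols = vars(cut))  # same layout as facet_grid(clarity ~ cut)
```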
Use the Sitka data set.
A. Recreate the line plot of Time vs size, with the color of the lines mapped to treat. Use scale_color_manual() to change the colors to “orange” and “purple”.
B. Use scale_x_continuous() to convert the x-axis from days to months. First, relabel the tick marks from (150,180,210,240) to (5,6,7,8). Then retitle the x-axis “time(months)”.
C. Split the scatter plot into a panel of scatter plots by tree. (Note: Make the graph area large; graph may take a few seconds to appear)
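# A. line plot of size over Time, one line per tree (group=tree), colored by treat
ggplot(Sitka,aes(x=Time,y=size,group=tree,color=treat)) +
geom_line()
ggplot(Sitka,aes(x=Time,y=size,group=tree,color=treat)) +
geom_line()+
scale_color_manual(values=c("orange","purple"))+ # change the line colors
facet_wrap(~treat)
# B. convert the x-axis from days to months
ggplot(Sitka,aes(x=Time,y=size,group=tree,color=treat)) +
geom_line()+
scale_color_manual(values=c("orange","purple")) + # change the line colors
scale_x_continuous(breaks = c(150,180,210,240),
labels = c(5,6,7,8),
name = "time(months)") # relabel and retitle the x-axis
# C. panels of scatter plots (split by treat here; use facet_wrap(~tree) for one panel per tree, as the exercise asks)
ggplot(Sitka,aes(x=Time,y=size))+
geom_point(color = "forestgreen")+
scale_x_continuous(breaks = c(150,180,210,240),
labels = c(5,6,7,8))+ # relabel the x-axis
facet_wrap(~treat,ncol = 1)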
Main element functions used in theme settings:
element_line() - can specify color, linewidth, linetype, etc.
element_rect() - can specify fill, color, linewidth (size before ggplot2 3.4.0), etc.
element_text() - can specify family, face, size, color, angle, etc.
element_blank() - removes theme elements from graph
Set the color and line width of the axis lines:
ggplot(txhousing, aes(x=volume, y=sales, color=median)) +
geom_point() +
theme(axis.line=element_line(color="black",linewidth = 2)) # linewidth in mm
## Warning: Removed 568 rows containing missing values (`geom_point()`).
We can go further and set details such as the background color, the font, and the titles:
ggplot(txhousing, aes(x=volume, y=sales, color=median)) +
geom_point() +
theme(axis.line=element_line(color="black", linewidth=2),
panel.background=element_rect(fill="white", color="gray"),
title=element_text(family="serif", face="bold"))
## Warning: Removed 568 rows containing missing values (`geom_point()`).
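Besides adjusting the appearance one element at a time, ggplot2 also provides complete themes, which make it easier to change a plot's background and line style.
Some complete themes: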
theme_bw()
theme_light()
theme_dark()
theme_classic()
Below, we apply complete themes directly to change the plot's appearance:
ggplot(txhousing, aes(x=volume, y=sales, color=median)) +
geom_point() +
theme_classic()+
labs(title = 'classic style')
## Warning: Removed 568 rows containing missing values (`geom_point()`).
ggplot(txhousing, aes(x=volume, y=sales, color=median)) +
geom_point() +
theme_dark() +
labs(title = 'dark style')
## Warning: Removed 568 rows containing missing values (`geom_point()`).
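Complete themes and individual theme() settings can be combined. Adding a complete theme replaces earlier theme settings, so overrides go after it. This is a sketch; the base_size value and the chosen overrides are arbitrary, and p_theme is an illustrative name.

```r
library(ggplot2)

p_theme <- ggplot(txhousing, aes(x = volume, y = sales, color = median)) +
  geom_point() +
  theme_bw(base_size = 14) +                         # complete theme first
  theme(legend.position = "bottom",                  # then individual overrides
        plot.title = element_text(face = "bold")) +
  labs(title = "bw style with overrides")
```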
The ggsave() command saves a plot to a file (by default, the last plot displayed). Supported formats include eps/ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg, wmf.
Some arguments:
width
height
units: units of width and height of plot file ("in", "cm" or "mm")
dpi: plot resolution in dots per inch
plot: name of object with stored plot
#save last displayed plot as pdf
ggsave("plot.pdf") # save in PDF format under the file name "plot.pdf"
## Saving 7 x 5 in image
## Warning: Removed 568 rows containing missing values (`geom_point()`).
#if you're working with lots of graphs, you can store them in R objects
p <- ggplot(Sitka, aes(x=Time, y=size)) +
geom_point()
#You can then use the plot argument of ggsave() to specify which plot to save instead of the last
ggsave("myplot.png", plot=p) # plot specifies which plot to save
## Saving 7 x 5 in image
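The "Saving 7 x 5 in image" message means ggsave() fell back to the current device size. Passing width, height, and units makes the output size explicit. This is a sketch; the 12 x 8 cm size and the temporary file location are arbitrary choices for illustration.

```r
library(ggplot2)
data(Sitka, package = "MASS")

p <- ggplot(Sitka, aes(x = Time, y = size)) +
  geom_point()

# Hypothetical output location: a temporary file, so nothing in the
# working directory is overwritten
out_file <- tempfile(fileext = ".pdf")

# The file format is taken from the extension; the size is given explicitly
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm")

file.exists(out_file)
```

For raster formats such as png, the dpi argument additionally controls the resolution.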
This exercise uses the Rabbit data set from the MASS package to create plots. Run data(Rabbit) and then click on Rabbit in the RStudio Environment pane.
The Rabbit data set describes an experiment where:
5 rabbits were each given both a treatment drug (MDL) and a control drug.
After injection of either drug, each rabbit was then given one of 6 doses (6.25, 12.5, 25, 50, 100, 200 micrograms) of another chemical that interacts with the drug to affect blood pressure.
Change in blood pressure was measured as the outcome.
Thus each rabbit’s blood pressure was measured 12 times, 6 each for treatment and control.
The data set contains 60 rows (5 rabbits measured 12 times) of the following 5 variables:
BPchange: change in blood pressure relative to the start of the experiment
Dose: dose of interacting chemical in micrograms
Run: label of trial
Treatment: Control or MDL
Animal: animal ID (“R1” through “R5”)
Goal: create a dose-response curve for each rabbit under each treatment, resulting in 10 curves (2 each for 5 rabbits)
Constraints: no color, but publication quality (imagine submitting to a journal that only accepts non-color figures)
Hints (possible steps):
A. First, try creating a line graph with Dose on the x-axis and BPchange on the y-axis, with separate linetypes by Animal.
Why does this graph look wrong?
B. Draw separate lines by Treatment. How can we accomplish this without color?
Some of the line patterns still look a little too similar to distinguish between rabbits.
C. Add a scatter plot where the shape of the points is mapped to Animal.
Next we will change the shapes used. See ?pch for a list of codes for shapes.
D. Use scale_shape_manual() to change the shapes of the points. Use the shapes corresponding to the codes (0, 3, 8, 16, 17).
Ok, the graph has all the data we want on it. Now, we’ll prepare it for publication.
E. Change the x-axis title to “Dose(mcg)” and the y-axis title to “Change in blood pressure”.
Finally, we will change some of the theme() elements.
F. First, change the background from gray to white (or remove it) using panel.background in theme().
G. Next, change the color of the grid lines to “gray90”, a light gray, using panel.grid.
H. Use title to change the face of the titles (axes and legend) to bold.
I. Use strip.text to change the face of the facet titles to bold.
J. Save your last plot as public.png.
library(pacman)
p_load(MASS,ggplot2)
data(Rabbit)#import data
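Plots:
#A.
ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal))+
geom_line() # line plot of blood pressure change against dose, one linetype per rabbit
#B.
ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal))+
geom_line()+
facet_grid(Animal~Treatment) # split the control and treatment groups by Treatment (this suggests the drug may lower blood pressure)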
#C-D.
ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal,shape=Animal))+ # map a different point shape to each rabbit
geom_line()+
facet_grid(Animal~Treatment)+
geom_point()+ # add the scatter layer
scale_shape_manual(values = c(0,3,8,16,17)) # see ?pch for the available shapes; these are chosen to be easy to tell apart
#E-J. adjust the appearance to meet publication requirements
public = ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal,shape=Animal))+
geom_line()+
facet_grid(Animal~Treatment)+
geom_point()+
scale_shape_manual(values = c(0,3,8,16,17))+
labs(x="Dose(mcg)",y="Change in blood pressure",title = "BP vs Dose")+
theme(plot.title = element_text(hjust = 0.5), # center the plot title horizontally
panel.background = element_rect(fill = "white"), # set the background fill to white
panel.grid = element_line(colour = "grey90"), # set the grid line color
title = element_text(face = "bold"), # bold the plot, axis, and legend titles
strip.text = element_text(face = "bold")) # bold the facet strip titles
ggsave("public.pdf",plot = public)
## Saving 7 x 5 in image
# Alternative layout: one panel per Treatment, with all rabbits in the same panel
public2 = ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal,shape=Animal))+
geom_line()+
facet_wrap(~Treatment)+
geom_point()+
scale_shape_manual(values = c(0,3,8,16,17))+
labs(x="Dose(mcg)",y="Change in blood pressure",title = "BP vs Dose")+
theme(plot.title = element_text(hjust = 0.5), # center the plot title horizontally
panel.background = element_rect(fill = "white"), # set the background fill to white
panel.grid = element_line(colour = "grey90"), # set the grid line color
title = element_text(face = "bold"), # bold the plot, axis, and legend titles
strip.text = element_text(face = "bold")) # bold the facet strip titles
ggsave("public2.pdf",plot = public2)
## Saving 7 x 5 in image
Load the data set used in this part:
library(MASS)
data("birthwt")
birthwt
This data set describes the relationship between risk factors and low infant birth weight. Its variables include:
low: 0/1 indicator of birth weight < 2.5 kg
age: mother’s age
lwt: mother’s weight in pounds
race: mother’s race (1=white, 2=black, 3=other)
smoke: 0/1 indicator of smoking during pregnancy
ptl: number of previous premature labors
ht: 0/1 indicator of history of hypertension
ui: 0/1 indicator of uterine irritability
ftv: number of physician visits during first trimester
bwt: birth weight in grams
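In R, the factor() function can be used to encode categorical variables.
In ggplot2, the aesthetics that can be mapped to either numeric or factor variables are:
x and y: continuous or discrete axes
color and fill: color gradient scales or evenly-spaced hue scales
Aesthetics that can only be mapped to categorical variables:
shape
linetype
Aesthetics that should only be mapped to numeric variables (ggplot2 warns when they are mapped to a discrete variable):
size
alpha
For aesthetics that accept both types, the type of the mapped variable changes the resulting plot: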
ggplot(birthwt,aes(x=age,y=bwt,color=race))+ # race mapped as a numeric variable
geom_point()
ggplot(birthwt,aes(x=age,y=bwt,color=factor(race)))+ # race mapped as a categorical variable
geom_point()
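So before plotting, it is best to convert categorical variables to the correct type with factor(); in the same step you can give each category the label you want.
For example, to draw a box plot of the relationship between infant birth weight (bwt) and whether the mother smoked (smoke), we can do this:
birthwt$smokef = factor(birthwt$smoke,levels = 0:1,labels = c("did not smoke","smoked")) # data preprocessing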
ggplot(birthwt,aes(y=bwt,fill=smokef))+
geom_boxplot()
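After recoding with factor(), it is worth confirming that each code received the intended label. The sketch below cross-tabulates the old codes against the new labels and, as an optional variation, puts the factor on the x-axis so each box is labeled directly; p_box is an illustrative name.

```r
library(ggplot2)
data(birthwt, package = "MASS")

birthwt$smokef <- factor(birthwt$smoke, levels = 0:1,
                         labels = c("did not smoke", "smoked"))

# Cross-tabulate old codes against new labels to confirm the recoding
table(birthwt$smoke, birthwt$smokef)

# Putting the factor on the x-axis labels each box directly
p_box <- ggplot(birthwt, aes(x = smokef, y = bwt, fill = smokef)) +
  geom_boxplot()
```

All counts should fall on the diagonal of the table if the recoding is correct.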
A. For the birthwt data, convert ht to a factor and label the values 0 and 1 “non-hyper” and “hyper”, respectively.
B. Create boxplots of bwt (birth weight), colored by ht, with separate panels by smokef.
birthwt$htr = factor(birthwt$ht,levels = 0:1,labels = c("non-hyper","hyper")) # convert ht (history of hypertension) to a categorical variable
ggplot(birthwt,aes(y=bwt,fill=htr))+
geom_boxplot()+
facet_wrap(~smokef)