getwd()  # check the working directory
## [1] "/Users/xrb/Desktop/R语言学习/UCLA网站学习/R Graphics_ Introduction to ggplot2 (1)_files"

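Introduction to ggplot2: Study Notes

Note: This document records the overall content of the Introduction to ggplot2 seminar on the UCLA website.

Study notes for 2023-8-3:

1. Basics:

First, load the R packages used in these notes: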

library(pacman)
p_load(ggplot2,MASS,tidyverse)

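Understand the basic elements of the plotting grammar:

2. Basic syntax

The basic syntax used by the ggplot2 package is:

ggplot() # note: the function is ggplot(); do not confuse it with the package name "ggplot2"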
ggplot(dataset, aes(x=xvar, y=yvar)) + geom_function() 

Make a scatter plot:

# scatter plot of volume vs sales 
ggplot(txhousing, aes(x=volume, y=sales)) +   
  geom_point()
## Warning: Removed 568 rows containing missing values (`geom_point()`).

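Without the geom_*() part, only the x and y axes are drawn:

ggplot(txhousing, aes(x=volume, y=sales))

Use "+" to add components one at a time, including geoms, stats, scales, faceting, and themes.

Enrich the plot above by adding a rug layer: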

ggplot(txhousing, aes(x=volume, y=sales)) +     # x=volume and y=sales inherited by all layers     
  geom_point() +   
  geom_rug(aes(color=median))   # color will only apply to the rug plot because not specified in ggplot()

3. Aesthetics (aes)

Commonly used aesthetics:

  • x: positioning along x-axis

  • y: positioning along y-axis

  • color: color of objects; for 2-d objects, the color of the object’s outline (compare to fill below)

  • fill: fill color of objects

  • linetype: how lines should be drawn (solid, dashed, dotted, etc.)

  • shape: shape of markers in scatter plots

  • size: how large objects appear

  • alpha: transparency of objects (value between 0, transparent, and 1, opaque – inverse of how many stacked objects it will take to be opaque)

Add an aesthetic mapping to improve the scatter plot produced above:

# mapping color to median inside of aes() 
ggplot(txhousing, aes(x=volume, y=sales)) +   
  geom_point(aes(color=median))
## Warning: Removed 568 rows containing missing values (`geom_point()`).
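
The example above maps color to a variable inside aes(). As a minimal sketch of the difference between mapping and setting an aesthetic (the object names p_map and p_set and the color "steelblue" are illustrative choices, not part of the seminar):

```r
library(ggplot2)

# Mapping: color varies with the data, so it goes inside aes()
p_map <- ggplot(txhousing, aes(x = volume, y = sales)) +
  geom_point(aes(color = median))

# Setting: one fixed color for every point, so it goes outside aes()
p_set <- ggplot(txhousing, aes(x = volume, y = sales)) +
  geom_point(color = "steelblue")
```

Printing p_map should show a color gradient with a legend, while p_set shows uniformly colored points and no legend.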

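Exercise 1:

# load a sample data set

data(Sitka)

This data set describes the growth of trees over time, with some of the trees grown in ozone-enriched chambers. Variables: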

  • size: numeric, log of size (height times diameter squared)

  • Time: numeric, time of measurement (days since January 1, 1988)

  • tree: integer, tree id

  • treat: factor, treatment group, 2 levels=“control” and “ozone”

Questions:

A. Create a scatter plot of Time vs size to view the growth of trees over time.

B. Color the scatter plot points by the variable treat.

C. Add an additional geom_smooth() (loess) layer to the graph.

D. SET the color of the loess smooth to “green” rather than have it colored by treat. Why is there only one smoothed curve now?

ggplot(Sitka,aes(x = Time,y = size,color = treat,fill = treat)) +
  geom_point() + 
  geom_smooth(method = lm)  # grouped by treat (note: this is a linear fit, not the loess smooth the exercise asks for)
## `geom_smooth()` using formula = 'y ~ x'

ggplot(Sitka,aes(x = Time,y = size,color = treat)) +   
  geom_point() +    
  geom_smooth(method = lm,col = "green")  # not grouped by treat: setting col replaces the color mapping in this layer, so only one line is drawn
## `geom_smooth()` using formula = 'y ~ x'
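
The solution above fits straight lines with method = lm, while parts C and D of the exercise ask for a loess smooth. A sketch of the loess version, assuming the default loess settings are acceptable (the object names p_grouped and p_single are illustrative):

```r
library(ggplot2)
library(MASS)   # provides the Sitka data set
data(Sitka)

# C: loess smooth; the color mapping is inherited, so there is one curve per treat level
p_grouped <- ggplot(Sitka, aes(x = Time, y = size, color = treat)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# D: setting color = "green" replaces the color mapping in this layer,
# so the smooth is no longer split by treat
p_single <- ggplot(Sitka, aes(x = Time, y = size, color = treat)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, color = "green")
```

Because Time takes only a few distinct values, loess may print numerical warnings when the plots are drawn.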

4. Geoms (geometric objects)

Geom functions draw different geometric shapes from the data set.

Commonly used geoms:

  • geom_bar(): bars with bases on the x-axis

  • geom_boxplot(): boxes-and-whiskers

  • geom_errorbar(): T-shaped error bars

  • geom_density(): density plots

  • geom_histogram(): histogram

  • geom_line(): lines

  • geom_point(): points (scatterplot)

  • geom_ribbon(): bands spanning y-values across a range of x-values

  • geom_smooth(): smoothed conditional means (e.g. loess smooth)

  • geom_text(): text

Common plot types:

1. Histogram: shows the distribution of a continuous variable

ggplot(txhousing, aes(x=median)) +    
  geom_histogram(color="steelblue",fill="lightblue")  # set fixed colors for the geom: color is the outline, fill is the interior
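
By default geom_histogram() uses 30 bins and suggests choosing a better value. A small sketch that sets the bin width explicitly; the value 10000 and the name p_hist are illustrative assumptions, not a recommendation from the seminar:

```r
library(ggplot2)

# binwidth is in the units of x (here: dollars of median sale price)
p_hist <- ggplot(txhousing, aes(x = median)) +
  geom_histogram(binwidth = 10000, color = "steelblue", fill = "lightblue")
```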

2. Density plot: a smoothed version of the histogram

ggplot(txhousing, aes(x=median)) +    
  geom_density()
## Warning: Removed 616 rows containing non-finite values (`stat_density()`).

Density plots can be drawn by group:

ggplot(txhousing, aes(x=median,color = factor(month))) +   
# month is numeric, so convert it with factor() to use it as a grouping variable (similar to the encode command in Stata)
  geom_density()
## Warning: Removed 616 rows containing non-finite values (`stat_density()`).

3. Boxplot: shows the distribution of the y variable at each value of the x variable

ggplot(txhousing, aes(x=factor(year), y=median)) +    
  geom_boxplot()# distribution of median house prices by year
## Warning: Removed 616 rows containing non-finite values (`stat_boxplot()`).

4. Bar plot: counts the frequencies of a categorical variable

ggplot(diamonds, aes(x=cut, fill=clarity)) +    
  geom_bar()

5. Scatter plot: shows how two variables vary together

ggplot(txhousing, aes(x=volume, y=sales,
                      color=median, alpha=listings, size=inventory)) + # map more variables to enrich the plot
  geom_point() 
## Warning: Removed 1468 rows containing missing values (`geom_point()`).

6. Line graph

ggplot(txhousing, aes(x=date, y=sales, color=city)) +    
  geom_line()
## Warning: Removed 430 rows containing missing values (`geom_line()`).

Try visualizing the trend of China's provincial digital financial inclusion index:

library(haven) 
merge6 <- read_dta("~/Desktop/论文/小论文-数字经济与农户收入/数据/目前可用的/merge6.dta")

ggplot(merge6, aes(x=year,y=index_aggregate,color=prov_name_eng)) +
  geom_line()
## Warning: Removed 434 rows containing missing values (`geom_line()`).

Exercise 2:

We will be using the Sitka data set again for this exercise.

A. Using 2 different geoms, compare the distribution of size between the two levels of treat. Use a different color for each distribution.

B. Use a bar plot to visualize the crosstabulation of Time and treat. Put Time on the x-axis.

C. Create a line graph of size over Time, with separate lines by tree and lines colored by treat.

D. Imagine you plan to submit your line graph of size over time by tree to a journal that does not print color graphs. How else can you distinguish between the 2 treatments on the graph?

Solution:

ggplot(Sitka,aes(x=size,fill=treat)) +
  geom_histogram() 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(Sitka,aes(x=size,color=treat)) +   
  geom_density() 

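tab = table(Sitka$treat,Sitka$Time)  # rows = treat, columns = Time, so Time ends up on the x-axis
barplot(tab,
        xlab = "Time",
        ylab = "frequency",
        col = c("green","blue"),  # one color per treat level
        legend.text = TRUE)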
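
The solution above uses base R barplot(). A sketch of the same crosstabulation with ggplot2, assuming a stacked bar per measurement day is what is wanted (the name p_bar is illustrative):

```r
library(ggplot2)
library(MASS)
data(Sitka)

# factor(Time) gives one bar per measurement day; fill = treat stacks the
# counts of the two treatment groups within each bar
p_bar <- ggplot(Sitka, aes(x = factor(Time), fill = treat)) +
  geom_bar() +
  labs(x = "Time", y = "count")
```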

ggplot(Sitka,aes(x=Time,y=size,group=tree))+
  geom_line(aes(color=treat))

D. When the journal requires black-and-white figures, different line patterns (linetype) can distinguish the groups:

ggplot(Sitka,aes(x=Time,y=size,group=tree))+   
  geom_line(aes(linetype=treat))

---

2023-8-4

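5. Stats, Scales, Coordinate Systems, and Faceting

5.1 stat function

Stat functions statistically transform the data, usually as some form of summary such as a mean, a standard deviation, or a confidence interval.

Each stat function is associated with a default geom, so no geom is needed for shapes to be rendered.

# using the txhousing data set, summarize sales (y) for each year (x) 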
ggplot(txhousing,aes(x=year,y=sales))+
  stat_summary()
## Warning: Removed 568 rows containing non-finite values (`stat_summary()`).
## No summary function supplied, defaulting to `mean_se()`

Create a new plot where x is mapped to Time and y is mapped to size. Then, add a stat_summary() layer.

ggplot(Sitka,aes(x=Time,y=size))+
  stat_summary()
## No summary function supplied, defaulting to `mean_se()`
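
When no summary function is supplied, stat_summary() falls back to mean_se(). A minimal sketch of supplying one through the fun argument and choosing the geom explicitly; the choice of median and the name p_med are illustrative:

```r
library(ggplot2)
library(MASS)
data(Sitka)

# fun summarises y at each x; geom chooses how the summary is drawn
p_med <- ggplot(Sitka, aes(x = Time, y = size)) +
  stat_summary(fun = median, geom = "point") +
  stat_summary(fun = median, geom = "line")
```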

5.2 scale function

  • Scales define which aesthetic values are mapped to the data values.

Here is a color scale that ggplot2 chooses for us:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +  # color the points by cut
  geom_point()

Use the values argument of scale_color_manual() to set the colors manually:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +    
  geom_point() +   
  scale_color_manual(values=c("red", "yellow", "green", "blue", "violet"))
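
An unnamed values vector is matched to the levels of cut by position. A sketch using a named vector instead, so each color is tied to a level explicitly (cut_colors and p_named are names chosen for this example):

```r
library(ggplot2)

# Names must match the levels of cut exactly
cut_colors <- c("Fair" = "red", "Good" = "yellow", "Very Good" = "green",
                "Premium" = "blue", "Ideal" = "violet")

p_named <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_color_manual(values = cut_colors)
```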

  • Scale functions for the y-axis

Add scale_y_continuous() to adjust the breaks (tick positions) of the y-axis.

Before:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + 
  geom_point() +
  scale_color_manual(values=c("red", "yellow", "green", "blue", "violet")) 

The y-axis tick marks are currently at 0, 5000, 10000, and 15000.

After:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + 
  geom_point() +
  scale_color_manual(values=c("red", "yellow", "green", "blue", "violet")) + 
  scale_y_continuous(breaks=c(0,2500,5000,7500,10000,12500,15000,17500))# set the y-axis breaks

Next, change the y-axis unit in the plot above from dollars to thousands of dollars: supply a label for each break, then retitle the y-axis "price(thousands of dollars)".

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + 
  geom_point() +
  scale_color_manual(values=c("red", "yellow", "green", "blue", "violet")) + 
  scale_y_continuous(breaks=c(0,2500,5000,7500,10000,12500,15000,17500),
                     labels = c(0,2.5,5,7.5,10,12.5,15,17.5),# labels are matched to breaks by position (2500 -> 2.5); ggplot2 does not check them against the data
                     name = "price(thousands of dollars)")
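
Typing the labels by hand makes it easy to mislabel a break. As a sketch of an alternative, labels also accepts a function that is applied to the breaks; to_thousands is a hypothetical helper defined here, not a ggplot2 function:

```r
library(ggplot2)

# Hypothetical helper: converts dollars to thousands of dollars
to_thousands <- function(x) x / 1000

p_k <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_y_continuous(breaks = seq(0, 17500, by = 2500),
                     labels = to_thousands,
                     name = "price (thousands of dollars)")
```

With this approach the labels stay consistent with the breaks even if the breaks are changed later.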

  • Scale functions for the x-axis

Restrict the range of the x-axis and change the axis titles:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +    
  geom_point() +   
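  xlim(c(0,3))+# keep only carat values between 0 and 3; points outside the range are dropped
  labs(x="CARAT", y="PRICE", color="Diamond_CUT", title="CARAT vs PRICE by CUT")# retitle the x-axis, y-axis, legend, and plot
## Warning: Removed 32 rows containing missing values (`geom_point()`).

  • Guides (legend settings)

All of the plots above show a legend by default, which acts as the guide for the scale; guides() can remove it.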

Guides for each scale can be set scale-by-scale with the guide argument, or en masse with guides().

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +    
  geom_point() +   
  labs(x="CARAT", y="PRICE",title="CARAT vs PRICE by CUT")+
  guides(color="none")# remove the legend for the color scale mapped to cut
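
The example above removes the guide en masse with guides(). A sketch of the scale-by-scale alternative mentioned in the text, using the guide argument of a scale function (the name p_noguide is illustrative):

```r
library(ggplot2)

p_noguide <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_color_manual(values = c("red", "yellow", "green", "blue", "violet"),
                     guide = "none")   # drop the legend for this scale only
```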

5.3 Coordinate system

Coordinate systems define the plane on which objects are positioned in the plot. Most plots use the Cartesian coordinate system.
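
The notes give no coordinate-system example, so here is a brief sketch with two coord functions from ggplot2; the object names are illustrative. coord_cartesian() is shown because it zooms in without discarding data, in contrast to the xlim() call used earlier:

```r
library(ggplot2)

# coord_cartesian() zooms the view without dropping any rows,
# unlike xlim(), which turns out-of-range values into missing values
p_zoom <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 3))

# coord_flip() swaps the x and y axes, e.g. for horizontal boxplots
p_flip <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  coord_flip()
```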

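5.4 Faceting

Faceting functions split a plot into small panels, one for each subset of the faceting variable. The functions covered here are facet_wrap() and facet_grid().

  • facet_wrap():

ggplot(diamonds, aes(x=carat, y=price)) +    
  geom_point() +    
  facet_wrap(~cut) # one panel per level of cut, wrapped into rows and columns

  • facet_grid():

facet_grid() lets you specify directly which variables split the plot along rows and along columns. Put the row variable before ~ and the column variable after ~.

ggplot(diamonds, aes(x=carat, y=price)) +       
  geom_point(color = "steelblue",shape = 17,size = 0.5) +  # set the color, shape, and size of the points     
  facet_grid(clarity~cut)# rows split by clarity, columns split by cut

The plot above uses facet_grid(clarity~cut) to show the scatter of price against carat for each combination of clarity (rows) and cut (columns).

These are the main uses of the facet functions. For more detailed facet syntax and options, see Facets for ggplot2 in R.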

Exercise 3:

Use the Sitka data set.

A. Recreate the line plot of Time vs size, with the color of the lines mapped to treat. Use scale_color_manual() to change the colors to “orange” and “purple”.

B. Use scale_x_continuous() to convert the x-axis from days to months. First, relabel the tick marks from (150,180,210,240) to (5,6,7,8). Then retitle the x-axis “time(months)”.

C. Split the scatter plot into a panel of scatter plots by tree. (Note: Make the graph area large; graph may take a few seconds to appear)

Solution:

#A.
ggplot(Sitka,aes(x=Time,y=size,group=tree,color=treat)) + # group=tree draws one line per tree
  geom_line()

ggplot(Sitka,aes(x=Time,y=size,group=tree,color=treat)) +   
  geom_line()+
  scale_color_manual(values=c("orange","purple"))+ # change the line colors
  facet_wrap(~treat)

#B.
ggplot(Sitka,aes(x=Time,y=size,group=tree,color=treat)) +      
  geom_line()+   
  scale_color_manual(values=c("orange","purple")) +# change the line colors
  scale_x_continuous(breaks = c(150,180,210,240),
                     labels = c(5,6,7,8),
                     name = "time(months)")# relabel and retitle the x-axis

#C.
ggplot(Sitka,aes(x=Time,y=size))+
  geom_point(color = "forestgreen")+
  scale_x_continuous(breaks = c(150,180,210,240),
                      labels = c(5,6,7,8))+# relabel the x-axis
  facet_wrap(~tree)# one panel per tree, as the exercise asks

6. Themes

Changing the look of a theme through its arguments

Main element functions used in theme():

  • element_line() - can specify color, linewidth, linetype, etc.

  • element_rect() - can specify fill, color, linewidth, etc.

  • element_text() - can specify family, face, size, color, angle, etc.

  • element_blank() - removes theme elements from graph

Set the color and line width of the axis lines:

ggplot(txhousing, aes(x=volume, y=sales, color=median)) +    
  geom_point() +   
  theme(axis.line=element_line(color="black",linewidth = 2)) # linewidth in mm 
## Warning: Removed 568 rows containing missing values (`geom_point()`).

Further details such as the background color, font, and titles can also be set:

ggplot(txhousing, aes(x=volume, y=sales, color=median)) + 
  geom_point() +
  theme(axis.line=element_line(color="black", linewidth=2),
        panel.background=element_rect(fill="white", color="gray"),
        title=element_text(family="serif", face="bold")) 
## Warning: Removed 568 rows containing missing values (`geom_point()`).
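
The examples above use element_line(), element_rect(), and element_text() but not element_blank(). A small sketch of removing a theme element with it; the choice of elements to change and the name p_clean are illustrative:

```r
library(ggplot2)

p_clean <- ggplot(txhousing, aes(x = volume, y = sales, color = median)) +
  geom_point() +
  theme(panel.grid.minor = element_blank(),                 # drop minor grid lines
        axis.text.x = element_text(angle = 45, hjust = 1))  # rotate x tick labels
```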

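Changing the overall look with complete themes

Besides adjusting the look argument by argument, ggplot2 also provides complete themes, which make it simpler to change the background and line style of a plot.

Some complete themes: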

  • theme_bw()

  • theme_light()

  • theme_dark()

  • theme_classic()

Below, we call complete themes directly to change the look of the plot:

ggplot(txhousing, aes(x=volume, y=sales, color=median)) + 
  geom_point() +
  theme_classic()+
  labs(title = 'classic style')
## Warning: Removed 568 rows containing missing values (`geom_point()`).

ggplot(txhousing, aes(x=volume, y=sales, color=median)) + 
  geom_point() +
  theme_dark() +
  labs(title = 'dark style')
## Warning: Removed 568 rows containing missing values (`geom_point()`).

7. Saving plots to files

ggsave() saves a plot to a file (by default, the last plot displayed). Supported formats include: eps/ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg, wmf.

Some of its arguments:

  • width

  • height

  • units: units of width and height of plot file ("in", "cm", or "mm")

  • dpi: plot resolution in dots per inch

  • plot: name of object with stored plot

#save last displayed plot as pdf
ggsave("plot.pdf")# save as a PDF file named "plot.pdf"
## Saving 7 x 5 in image
## Warning: Removed 568 rows containing missing values (`geom_point()`).
#if you're working with lots of graphs, you can store them in R objects
p <- ggplot(Sitka, aes(x=Time, y=size)) + 
  geom_point()
#You can then use the plot argument of ggsave() to specify which plot to save instead of the last
ggsave("myplot.png", plot=p)# use plot to specify which plot to save
## Saving 7 x 5 in image
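
The calls above rely on the default 7 x 5 inch size. A sketch that sets width, height, units, and dpi explicitly; the file path, the dimensions, and the resolution are illustrative assumptions:

```r
library(ggplot2)
library(MASS)
data(Sitka)

p <- ggplot(Sitka, aes(x = Time, y = size)) +
  geom_point()

# Hypothetical output path; tempdir() keeps the example from cluttering
# the working directory
out_file <- file.path(tempdir(), "myplot_small.png")

# 12 x 8 cm at 300 dots per inch
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm", dpi = 300)
```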

Exercise 4:

About the data:

This exercise uses the Rabbit data set from the MASS package. Run data(Rabbit) and then click on Rabbit in the RStudio Environment pane.

The Rabbit data set describes an experiment where:

5 rabbits were each given both a treatment drug (MDL) and a control drug.

After injection of either drug, each rabbit was then given one of 6 doses (6.25, 12.5, 25, 50, 100, 200 micrograms) of another chemical that interacts with the drug to affect blood pressure.

Change in blood pressure was measured as the outcome.

Thus each rabbit’s blood pressure was measured 12 times, 6 each for treatment and control.

The data set contains 60 rows (5 rabbits measured 12 times) of the following 5 variables:

  • BPchange: change in blood pressure relative to the start of the experiment

  • Dose: dose of interacting chemical in micrograms

  • Run: label of trial

  • Treatment: Control or MDL

  • Animal: animal ID (“R1” through “R5”)

Task:

Goal: create a dose-response curve for each rabbit under each treatment, resulting in 10 curves (2 each for 5 rabbits)

Constraints: no color, but publication quality (imagine submitting to a journal that only accepts non-color figures)

Hints (possible steps):

A. First, try creating a line graph with Dose on the x-axis and BPchange on the y-axis, with separate linetypes by Animal.

Why does this graph look wrong?

B. Draw separate lines by Treatment. How can we accomplish this without color?

Some of the line patterns still look a little too similar to distinguish between rabbits.

C. Add a scatter plot where the shape of the points is mapped to Animal.

Next we will change the shapes used. See ?pch for a list of codes for shapes.

D. Use scale_shape_manual() to change the shapes of the points. Use the shapes corresponding to the codes (0, 3, 8, 16, 17).

Ok, the graph has all the data we want on it. Now, we’ll prepare it for publication.

E. Change the x-axis title to “Dose(mcg)” and the y-axis title to “Change in blood pressure”.

Finally, we will change some of the theme() elements.

F. First, change the background from gray to white (or remove it) using panel.background in theme().

G. Next, change the color of the grid lines to “gray90”, a light gray, using panel.grid.

H. Use title to change the face of the titles (axes and legend) to bold.

I. Use strip.text to change the face of the facet titles to bold.

J. Save your last plot as public.png.

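Solution:

library(pacman)
p_load(MASS,ggplot2)
data(Rabbit)#import data

Plots:

#A.
ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal))+
  geom_line()# line graph of blood pressure change vs dose, one linetype per rabbit; it zigzags because each rabbit has two values per dose (Control and MDL)

#B.
ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal))+   
  geom_line()+
  facet_grid(Animal~Treatment)# split into control and treatment panels (the drug may have a blood-pressure-lowering effect)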

#C.
ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal,shape=Animal))+  # map a different point shape to each rabbit 
  geom_line()+
  facet_grid(Animal~Treatment)+
  geom_point()+# add a scatter plot layer
  scale_shape_manual(values = c(0,3,8,16,17))# see ?pch for the shape codes; these are chosen to be easy to tell apart

#D. Adjust the appearance to meet publication requirements
public = ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal,shape=Animal))+  
  geom_line()+
  facet_grid(Animal~Treatment)+
  geom_point()+
  scale_shape_manual(values = c(0,3,8,16,17))+
  labs(x="Dose(mcg)",y="Change in blood pressure",title = "BP vs Dose")+
  theme(plot.title = element_text(hjust = 0.5),# center the plot title horizontally
        panel.background = element_rect(fill = "white"),# white background fill
        panel.grid = element_line(colour = "grey90"),# grid line color
        title = element_text(face = "bold"),# bold plot, axis, and legend titles
        strip.text = element_text(face = "bold"))# bold facet titles
ggsave("public.pdf",plot = public)
## Saving 7 x 5 in image
#E. Same plot with one panel per Treatment
public2 = ggplot(Rabbit,aes(x=Dose,y=BPchange,linetype=Animal,shape=Animal))+  
  geom_line()+
  facet_wrap(~Treatment)+
  geom_point()+
  scale_shape_manual(values = c(0,3,8,16,17))+
  labs(x="Dose(mcg)",y="Change in blood pressure",title = "BP vs Dose")+
  theme(plot.title = element_text(hjust = 0.5),# center the plot title horizontally
        panel.background = element_rect(fill = "white"),# white background fill
        panel.grid = element_line(colour = "grey90"),# grid line color
        title = element_text(face = "bold"),# bold plot, axis, and legend titles
        strip.text = element_text(face = "bold"))# bold facet titles
ggsave("public2.pdf",plot = public2)
## Saving 7 x 5 in image

2023-8-5

8. Factors vs numeric variables in ggplot2

Load the data set used in this part:

library(MASS)
data("birthwt")

The birthwt data set describes the relationship between risk factors and low infant birth weight. Its variables include:

  • low: 0/1 indicator of birth weight < 2.5 kg

  • age: mother’s age

  • lwt: mother’s weight in pounds

  • race: mother’s race, (1=white, 2=black, 3=other)

  • smoke: 0/1 indicator of smoking during pregnancy

  • ptl: number of previous premature labors

  • ht: 0/1 indicator of history of hypertension

  • ui: 0/1 indicator of uterine irritability

  • ftv: number of physician visits during first trimester

  • bwt: birth weight in grams

In R, the factor() function encodes categorical variables.

Aesthetics that can be mapped to either numeric or factor variables:

  • x and y: continuous or discrete axes

  • color and fill: color gradient scales or evenly-spaced hue scales

Aesthetics that can only be mapped to categorical variables:

  • shape

  • linetype

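Aesthetics that should only be mapped to numeric variables (ggplot2 warns if they are mapped to a categorical variable):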

  • size

  • alpha

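For aesthetics that accept both types, the variable type changes how the plot looks:

ggplot(birthwt,aes(x=age,y=bwt,color=race))+# race mapped as a numeric variable
  geom_point()

ggplot(birthwt,aes(x=age,y=bwt,color=factor(race)))+# race mapped as a categorical variable
  geom_point()

So before plotting, it is best to convert categorical variables to the right type with factor(); during the conversion you can also give each category the label you want.

For example, to draw a boxplot of the relationship between infant birth weight (bwt) and whether the mother smoked (smoke):

birthwt$smokef = factor(birthwt$smoke,levels = 0:1,labels = c("did not smoke","smoked"))# data preparation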
ggplot(birthwt,aes(y=bwt,fill=smokef))+
  geom_boxplot()
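
The same labelling approach can be applied to race, using the codes listed in the variable description (1=white, 2=black, 3=other). A short sketch; racef and p_race are names chosen for this example:

```r
library(ggplot2)
library(MASS)
data(birthwt)

# Label the race codes while converting to a factor
birthwt$racef <- factor(birthwt$race, levels = 1:3,
                        labels = c("white", "black", "other"))

p_race <- ggplot(birthwt, aes(x = age, y = bwt, color = racef)) +
  geom_point()
```

The legend then shows the category labels instead of the numeric codes.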

Exercise 5

A. For the birthwt data, convert ht to a factor and label the values 0 and 1 “non-hyper” and “hyper”, respectively.

B. Create boxplots of bwt (birth weight), colored by ht, with separate panels by smokef.

birthwt$htr = factor(birthwt$ht,levels = 0:1,labels = c("non-hyper","hyper"))# convert ht (history of hypertension) to a factor
ggplot(birthwt,aes(y=bwt,fill=htr))+
  geom_boxplot()+
  facet_wrap(~smokef)