The author and principles of ggplot

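Hadley's summary of a ggplot2 graphic: “A plot is a mapping from data to the aesthetic attributes (abbreviated aes, including colour, shape, size, and so on) of geometric objects (abbreviated geom, including points, lines, bars, and so on). A plot may also contain statistical transformations of the data (abbreviated stats), and it is finally drawn in a specific coordinate system (abbreviated coord). Faceting (facet, which splits the plotting window into several sub-windows) can be used to draw plots for different subsets of the data.”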

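Features of ggplot

It uses a layered design: layers are combined with +, and a later layer is drawn on top of the earlier ones.
It separates the data representation from the graphical details, so a plot can be produced quickly.
It has a rich set of extension packages, including helper packages dedicated to adjusting colours, fonts, and themes.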

> library(ggplot2)
> ggplot(mtcars,aes(x=wt, y=mpg))+geom_point()

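Basic ggplot syntax

Data (variables) are mapped to the aesthetic attributes (aes: colour, shape, size, and so on) of geometric objects (geom: points, lines, areas, and so on). The process also includes statistical transformation of the data (stats), drawing in a particular coordinate system (coord), and faceting (facet).

Plot = data + Aesthetics + Geometry
Specify the dataset (data) in the ggplot() function
Specify the variables mapped to aesthetics in the aes() function; aes() must be nested inside ggplot() or inside a geom_xxx() function
Specify the type of plot with a geom_xxx() function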

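– geom_point() draws a scatter plot

– geom_bar() draws a bar chart

– geom_line() draws a line plot

– geom_histogram() draws a histogram

– geom_boxplot() draws a box plot

– geom_density() draws a kernel density estimate

– there are roughly 40 geom functions in total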
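
The rule that aes() may sit either in ggplot() or in a geom_xxx() function can be checked with a small sketch. The object names (p_global, p_local, d_global, d_local) are made up for illustration; layer_data() is the ggplot2 helper that returns the data a layer actually draws.

```r
library(ggplot2)

# Mapping declared once in ggplot(): every layer inherits it.
p_global <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Mapping declared inside the geom: it applies to that layer only.
p_local <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# Compare the point coordinates that each plot will draw.
d_global <- layer_data(p_global)
d_local  <- layer_data(p_local)
all.equal(d_global[c("x", "y")], d_local[c("x", "y")])
```

Putting the mapping in ggplot() is usually the more convenient choice when several layers share the same x and y.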

Other functions can refine the plot by adding further layers

> ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, size=cyl)) +
+       geom_point()

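> # The plot above can be built up step by step
> p <- ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, size=cyl))
> p  # draws only the base plot and its frame

> p+geom_point()  # adds the scatter plot layer

> # geom_point() is really just a wrapper function; what sits behind it is layer()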
> p+layer(
+   mapping = NULL, 
+   data = NULL,
+   geom = "point", 
+   stat = "identity",
+   position = "identity"
+ )

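Components of a layer()

1. Mapping (mapping): the correspondence between aesthetics in the plot and variables in the data, specified in aes(); if not specified, it is inherited from ggplot()
2. Data (data): the dataset to visualise; several datasets can be combined by putting each in its own layer
3. Geometric object (geom): for example bar for a bar chart, point for a scatter plot
4. Statistical transformation (stat): how the data are transformed before drawing. To keep the data as is, use the "identity" stat.
5. Position (position): how overlapping objects are arranged, for example jittering (random noise to avoid overlap), stacking, or dodging (side by side)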
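
As a rough illustration of points 1 and 2, the sketch below gives the second layer its own dataset while it still inherits the x and y mapping from ggplot(). The subset name efficient and the cut-off of 30 mpg are arbitrary choices made for this example.

```r
library(ggplot2)

# Hypothetical subset for illustration: cars with mpg above 30.
efficient <- subset(mtcars, mpg > 30)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Layer 1: no data or mapping given, so both come from ggplot().
  geom_point(colour = "grey40") +
  # Layer 2: its own data, with the mapping still inherited from ggplot().
  geom_point(data = efficient, colour = "red", size = 3)

# Number of rows drawn by each layer.
nrow(layer_data(p, 1))
nrow(layer_data(p, 2))
```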

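Other ggplot components for adjusting a plot

Scale (scale): controls how a variable is mapped to shapes, colours, and other aesthetics
Facet (facet): splits the data into subsets and draws each subset in its own panel
Theme (theme): theme settings; elements of the plot that do not depend on the data are adjusted here
Annotation (annotate): adds text and similar annotations to the plot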
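
A minimal sketch of how a scale, an annotation, and a theme can be added as extra layers. The colours, the label text, and the label position are placeholders chosen for this example, not values from the course.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point(size = 2) +
  # scale: choose which colour each level of am receives
  scale_colour_manual(values = c("0" = "steelblue", "1" = "tomato"),
                      name = "am") +
  # annotate: add a text label at fixed coordinates (position picked by eye)
  annotate("text", x = 4.5, y = 30, label = "label text") +
  # theme: change non-data elements such as the background and legend position
  theme_bw() +
  theme(legend.position = "bottom")

p
```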

ggplot in practice

Scatterplot: using the mtcars dataset, draw a scatter plot of wt (x axis) against mpg (y axis), with the am variable as the colour

```{.r .watch-out}
> ggplot(mtcars, aes(x=wt, y=mpg, color=am))+geom_point() 
```

In the code above, what changes if the colour is given as as.factor(am) instead?
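
One way to explore this question is to build both versions and inspect the colours assigned to the points. This is only a sketch; the object names p_numeric and p_factor are made up for the example.

```r
library(ggplot2)

# am stored as a number: ggplot2 uses a continuous colour gradient.
p_numeric <- ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point()

# am converted to a factor: ggplot2 uses a discrete palette, one colour per level.
p_factor <- ggplot(mtcars, aes(x = wt, y = mpg, color = as.factor(am))) +
  geom_point()

# Colours actually assigned to the points in each plot.
unique(layer_data(p_numeric)$colour)
unique(layer_data(p_factor)$colour)
```

Printing the two plots also shows the change in the legend: a colour bar for the numeric version and one key per level for the factor version.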

Compare the colours produced by the following calls and explain why they differ

> ggplot(mtcars, aes(x=wt, y=mpg, color="blue"))+geom_point()

> ggplot(mtcars, aes(x=wt, y=mpg ),color="blue")+geom_point()

> ggplot(mtcars, aes(x=wt, y=mpg ))+geom_point(color="blue")

> ggplot(mtcars, aes(x=wt, y=mpg ))+geom_point(aes(color="blue"))

> ggplot(mtcars, aes(x=wt, y=mpg ))+geom_point(aes(color="blue"))+scale_colour_identity()
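
The comparison can be checked by looking at the colour each layer ends up with. The sketch below covers three of the calls; the variant that passes color="blue" to ggplot() itself is left out because, as far as I know, ggplot() simply ignores that argument. The object names are invented for the example.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Setting: colour is a fixed argument of the geom, so the points are blue.
p_set <- base + geom_point(color = "blue")

# Mapping: "blue" is treated as a data value with a single level, so the
# default discrete scale picks the colour and a legend is drawn.
p_map <- base + geom_point(aes(color = "blue"))

# Mapping plus an identity scale: the value "blue" is used as the colour itself.
p_identity <- p_map + scale_colour_identity()

unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
unique(layer_data(p_identity)$colour)
```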

Boxplot

> ggplot(data=iris,aes(x=Species, y=Sepal.Length)) + geom_boxplot(aes(fill=Species))

Histogram

> ggplot(data=iris, aes(x=Sepal.Width)) + geom_histogram(binwidth=0.2, color="black", aes(fill=Species))

Barplot

> ggplot(data=iris, aes(x=Species)) + geom_bar()

Line plot

> ggplot(data=iris, aes(x=Sepal.Length,y=Sepal.Width,group=Species,color=Species)) + geom_line(aes(linetype=Species))

Polishing the plot

> ggplot(data=iris, aes(x=Sepal.Length,y=Sepal.Width,group=Species,color=Species)) +  geom_line(aes(linetype=Species), size = 1.2) +
+   geom_point(aes(shape=Species), size = 3) +        
+   scale_shape_manual(values=c(6, 5, 4)) +               
+   scale_linetype_manual(values=c("dotdash", "solid", "dotted")) +
+   xlab("Sepal Length") + ylab("Sepal Width") + ggtitle("Line plot of sepal length and width")

Faceting

> ggplot(data=iris, aes(x=Sepal.Length,y=Sepal.Width,color=Species)) +  
+   geom_point(aes(shape=Species), size = 3) +
+   facet_wrap(~Species)

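ggplot2 has two faceting functions, facet_wrap() and facet_grid().
The difference between the two: facet_wrap() takes a one-dimensional sequence of panels and wraps it into rows and columns, while facet_grid() arranges the panels in a grid whose rows and columns are each defined by a faceting variable.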
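
A small sketch of the two functions side by side, using mtcars with cyl and am as faceting variables (an arbitrary choice for this example). Counting the distinct PANEL values in layer_data() gives the number of panels that contain data.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# facet_wrap(): one panel per level of cyl, wrapped into rows and columns.
p_wrap <- base + facet_wrap(~cyl)

# facet_grid(): rows defined by am, columns defined by cyl.
p_grid <- base + facet_grid(am ~ cyl)

# Number of panels that contain data in each layout.
length(unique(layer_data(p_wrap)$PANEL))
length(unique(layer_data(p_grid)$PANEL))
```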

Statistical features and plotting in ggplot

Descriptive statistics plot: stat_summary()

stat_summary() operates on unique x or y; stat_summary_bin() operates on binned x or y. They are more flexible versions of stat_bin(): instead of just counting, they can compute any aggregate.

> ggplot(data=iris, aes(x=Species,y=Sepal.Width,color=Species)) +  
+   geom_point(size = 3)

> ggplot(data=iris, aes(x=Species,y=Sepal.Width,color=Species)) +  
+   geom_point(position="jitter", size = 3)

> ggplot(data=iris, aes(x=Species,y=Sepal.Width,color=Species)) +  
+   geom_point(position="jitter", size = 3)+
+   stat_summary(
+     fun="mean",
+     geom='errorbar', 
+     aes(ymin=after_stat(y), ymax=after_stat(y)), 
+     width=0.6, 
+     size=1.5,
+     colour="grey25"
+   )
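
Since stat_summary() can compute any aggregate, the mean is only one option. The sketch below, which assumes ggplot2 3.3.0 or later for the fun argument, marks the median of each species and adds mean ± standard error bars with mean_se(), a helper that ships with ggplot2. The colours and sizes are arbitrary.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_point(position = position_jitter(width = 0.2, height = 0),
             colour = "grey60") +
  # fun: any function that reduces the y values at one x to a single number
  stat_summary(fun = median, geom = "point", colour = "red", size = 4) +
  # fun.data: a function returning y, ymin and ymax for each x
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3)

# Values computed by the median layer, next to the same summary in base R.
layer_data(p, 2)$y
tapply(iris$Sepal.Width, iris$Species, median)
```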

Histogram: geom_histogram()

Visualise the distribution of a single continuous variable by dividing the x axis into bins and counting the number of observations in each bin. Histograms (geom_histogram) display the count with bars.

> ggplot(data=iris, aes(x=Sepal.Width))+ geom_histogram(binwidth=0.2, color="black", fill="blue", aes(y=..density..))  # the paired dots are ggplot2's marker for a variable computed by the stat (here density) rather than a column of the data; ggplot2 3.3.0 and later also accept after_stat(density)
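
To see what the computed density variable means, the sketch below pulls the bin data out of the plot and checks that the bar areas add up to 1. It uses after_stat(density), which assumes ggplot2 3.3.0 or later and plays the same role as ..density.. in the call above; the names bins, total_area, and total_count are made up for the example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.2,
                 color = "black", fill = "blue")

bins <- layer_data(p)

# Bar area = height (density) * width; the areas should add up to 1.
total_area <- sum(bins$density * (bins$xmax - bins$xmin))

# The raw counts are still computed and add up to the number of rows.
total_count <- sum(bins$count)

total_area
total_count
```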

Density plot: geom_density()

Computes and draws kernel density estimate, which is a smoothed version of the histogram.

>  ggplot(data=iris, aes(x=Sepal.Width, fill=Species))+ geom_density(stat="density", alpha=I(0.2))

Scatter plot with a smoother: geom_smooth()

Aids the eye in seeing patterns in the presence of overplotting. geom_smooth() and stat_smooth() are effectively aliases: they both use the same arguments. Use stat_smooth() if you want to display the results with a non-standard geom.

>  ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +   geom_point(aes(shape=Species), size=1.5) +geom_smooth(method="lm")

## `geom_smooth()` using formula 'y ~ x'

> ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +   geom_point(aes(shape=Species), size=1.5) +geom_smooth(method="loess")

## `geom_smooth()` using formula 'y ~ x'
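
With method="lm", the line drawn by geom_smooth() should match an ordinary lm() fit. The sketch below checks this on mtcars with a single group (a simplification of the per-species plots above); passing formula = y ~ x explicitly also avoids the message shown in the output. The names line, fit, and expected are invented for the example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Points along the line drawn by geom_smooth().
line <- layer_data(p, 2)

# The same model fitted directly with lm(), evaluated at the same x values.
fit <- lm(mpg ~ wt, data = mtcars)
expected <- unname(predict(fit, newdata = data.frame(wt = line$x)))

all.equal(line$y, expected)
```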

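Final assessment questions

(1) In what year did the R language first appear?
(2) Why is the R language called R?
(3) Which language was R developed from?
(4) What are the two basic functions of the R language?
(5) What is the relationship between RStudio and R?
(6) Using the tidyverse package as an example, list two ways of installing a package in R (a screenshot or a written description are both fine).
(7) What data types does R have?
(8) What data structures does R have?
(9) What is a vector in R?
(10) List three advantages and three disadvantages of the R language.
(11) Hands-on: rivers is a built-in R dataset of the lengths of North American rivers. Using rivers, find which river (by index) is the longest (paste the code and result below, no screenshots!).
(12) Hands-on: using mtcars, compute the mean of every variable except am, cyl, and vs (paste the code and result below, no screenshots!).
(13) Hands-on: using mtcars, draw a scatter plot of the relationship between mpg and disp with the plot function and with the ggplot function (paste the code and result below, no screenshots!).
(14) Hands-on: generate 100 normal random numbers with mean 1 and standard deviation 2 (paste the code and result below, no screenshots!).
(15) Hands-on: using ggplot and the mtcars dataset, show how to draw a box plot (paste the code and result below, no screenshots!).
(16) Hands-on: use a for loop to compute the cumulative sum from 1 to 100 (paste the code and result below, no screenshots!).
(17) Hands-on: use a t-test to check whether mpg differs significantly between manual and automatic cars in mtcars (paste the code and result below, no screenshots!).
(18) Hands-on: using the diamonds data, compute the mean price grouped by color and clarity (paste the code and result below, no screenshots!).
(19) Hands-on: use R to compute the determinant of the matrix below, and compute the inverse of the matrix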

\[ \begin{bmatrix} 6 & 2 \\ 2 & 5 \end{bmatrix} \]

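(Paste the code and result below, no screenshots!)

(20) Hands-on: use R to compute the unit eigenvectors and eigenvalues of the matrix below, and verify that the eigenvectors belonging to the two eigenvalues are orthogonal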

\[ \begin{bmatrix} 6 & 2 \\ 2 & 5 \end{bmatrix} \]