安装 R 和 RStudio

下载R软件国内镜像:https://mirrors.tuna.tsinghua.edu.cn/CRAN/
下载Rstudio:https://posit.co/download/rstudio-desktop/

注意:安装完R和RStudio之后,我们日常使用的软件是RStudio而不是R。RStudio是一个集成开发环境(IDE),旨在提供一个更友好、便捷的环境来使用R语言;它基于R运行,所以R不能卸载。

RStudio 界面介绍

1 左上角为代码编辑区,用于编写和保存代码。
2 左下角为控制台(Console),用于运行代码。
3 右上角的Environment用于记录运行代码产生的数据对象,History用于记录历史代码。
4 右下角的Files处可设置工作路径,Plots显示运行代码产生的图片,Packages用于安装R包,Help可用于查看包和函数的使用方法。

设置 Global options

e.g.:
1 点开Options后,选择Packages,可修改Primary CRAN repository的地址,选择国内镜像,下载packages时通常会比国外源快一些。
2 Appearance中可以更改RStudio的背景和代码颜色,可根据喜好修改。

创建一个new project和一个new script

1 创建new project 并设置路径:
工具栏File下选择New project,弹框选择New Directory –> New project –> 为project命名并选择保存的路径。

2 创建new script:
工具栏File下选择New File –> R Script,编写代码并保存。

设置工作目录

可在右下角的Files/More中设置工作路径,也可以使用setwd()函数,如 setwd("path")。

假如在你目前的工作目录下创建了week1、week2两个文件夹,切换到下级目录时可以直接写目录名:

setwd("week1") 

思考:同级目录下是否可以直接切换?

setwd("week2") 

Error in setwd("week2") : cannot change working directory
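思考提示:同级目录不能直接切换,因为setwd("week2")会在当前目录(week1)下寻找week2。可以用".."表示上一级目录,通过相对路径切换到同级目录,下面是一个简单的示意(假设week1、week2是按上文创建的同级文件夹):

getwd()            # 查看当前工作目录(此时位于week1)
setwd("../week2")  # ".."先回到上一级目录,再进入同级的week2
getwd()            # 确认已切换到week2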

安装和加载R包(packages)

1 安装R包

install.packages("vcd") # 包含各种类型的数据和函数,可用于《R语言之数据统计分析》的自学练习。
install.packages("plyr") # 数据处理和操作的包
install.packages("dplyr") # 数据处理和操作的包
install.packages("tidyverse") #The easiest way to get ggplot2 is to install the whole tidyverse
install.packages("ggplot2") # 高级绘图

2 加载 R package

library(vcd)
## 载入需要的程辑包:grid

如果在加载R包时提示出现冲突,怎么办?

e.g.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

解决方案

install.packages("conflicted")
library(conflicted)

#查看所有冲突
conflicted::conflict_scout()
## 2 conflicts
## • `filter()`: dplyr and stats
## • `lag()`: dplyr and stats
#解除冲突
conflict_prefer("filter", "dplyr") #[conflicted] Will prefer dplyr::filter over any other package.
## [conflicted] Will prefer dplyr::filter over any other package.
conflict_prefer("lag", "dplyr") #[conflicted] Will prefer dplyr::lag over any other package.
## [conflicted] Will prefer dplyr::lag over any other package.
library(tidyverse)
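除了使用conflicted包,另一种常见做法是在调用函数时显式写明命名空间,即"包名::函数名",不需要额外安装任何包。一个简单的示意(假设已加载vcd包并运行过data("Arthritis")):

dplyr::filter(Arthritis, Age > 60)   # 明确调用dplyr包中的filter()
stats::filter(1:10, rep(1/3, 3))     # 明确调用stats包中的filter()(移动平均)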

一个简单的例子感受R包和函数的使用

1 查看R包"vcd"的帮助

library(vcd)
help(package="vcd")  

2 Arthritis为vcd包中的数据之一,显示该数据的前10行

head(Arthritis,10)
##    ID Treatment  Sex Age Improved
## 1  57   Treated Male  27     Some
## 2  46   Treated Male  29     None
## 3  77   Treated Male  30     None
## 4  17   Treated Male  32   Marked
## 5  36   Treated Male  46   Marked
## 6  23   Treated Male  58   Marked
## 7  75   Treated Male  59     None
## 8  39   Treated Male  59   Marked
## 9  33   Treated Male  63     None
## 10 55   Treated Male  63     None

3 用data()函数将Arthritis载入Environment,生成对应的数据框

data("Arthritis") 

4 使用summary()函数对Arthritis数据进行基本统计分析

summary(Arthritis)
##        ID          Treatment      Sex          Age          Improved 
##  Min.   : 1.00   Placebo:43   Female:59   Min.   :23.00   None  :42  
##  1st Qu.:21.75   Treated:41   Male  :25   1st Qu.:46.00   Some  :14  
##  Median :42.50                            Median :57.00   Marked:28  
##  Mean   :42.50                            Mean   :53.36              
##  3rd Qu.:63.25                            3rd Qu.:63.00              
##  Max.   :84.00                            Max.   :74.00

5 思考:
1)Arthritis 的数据排列方式是怎样的?行和列分别是什么?根据该数据格式替换成自己的数据。
2)执行data("Arthritis")后,会在Environment中显示和存储Arthritis,其中的int和factor是什么意思?(可参考下方的str()示例)
3)每次重新启动RStudio时,是否需要重新加载需要的R包?
4)如果你的数据名称为"SHJLO",请根据以上代码进行相应的替换。
5)如何解读summary()计算给出的结果?
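针对思考2,一个简单的查看方法是用str()函数显示每一列的数据类型:

str(Arthritis)
#输出中的int表示整数型(integer),Factor表示因子型,即带有水平(levels)的分类变量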

R 赋值,通过简单的计算认识两种符号 <- , =

把5赋值给a

a=5
a+2
## [1] 7

把6赋值给b

b <- 6
b+3
## [1] 9

思考:==是什么呢?

5==5
## [1] TRUE
5==6
## [1] FALSE

R中的快捷键

常用的查找、保存、撤销等快捷键与Word中的类似。
查找:Ctrl + F
保存:Ctrl + S
撤销:Ctrl + Z

运行当前行或选中的代码:Ctrl + Enter

常见错误

1 中英文符号混用:代码中只能使用英文字符(引号、括号、逗号等)。
2 括号不配对,少写了一半。
3 没有加载所需的R包。

不论哪种错误,都要学会看左下角控制台运行结束时的ERROR提示,理解错误出在哪里。

两类错误warning 和 error的区别:warning 通常不是致命的,依然可以运行;error是致命的,终止运行。
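下面是一个最小的演示,帮助体会两者的差别(输出信息为示意):

sqrt(-1)  # warning:结果为NaN,但代码仍会运行完毕
## Warning in sqrt(-1): NaNs produced
## [1] NaN
sqrt("a") # error:参数不是数值,运行被终止
## Error in sqrt("a") : non-numeric argument to mathematical function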

创建数据集

1 向量

用于存储数值型、字符型或逻辑型数据的一维数据结构,可用执行组合功能的函数c()来创建向量。

age <- c(25, 30, 35, 40, 45) #数值型
age
## [1] 25 30 35 40 45
sex <- c("male","female","male","female") #字符型
sex
## [1] "male"   "female" "male"   "female"
logi <- c(TRUE, FALSE, TRUE, FALSE) #逻辑型(注意TRUE/FALSE不加引号,加引号会变成字符型)
logi
## [1]  TRUE FALSE  TRUE FALSE

2 矩阵

二维数组,可通过函数matrix()创建。

my_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
my_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
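注意matrix()默认按列填充数据;如果希望按行填充,可以加上byrow = TRUE参数,例如:

my_matrix2 <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE) # 按行填充
my_matrix2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9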

3 数组

维度大于2的数据结构,可通过函数array()创建。

my_array <- array(1:27, dim = c(3, 3, 3))
my_array
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   10   13   16
## [2,]   11   14   17
## [3,]   12   15   18
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   19   22   25
## [2,]   20   23   26
## [3,]   21   24   27

4 数据框

处理数据时最常见的结构,不同的列可以包含不同类型的数据,较矩阵来说更为常用,可用data.frame()函数创建

student_data <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 21),
  grade = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
student_data
##      name age grade
## 1   Alice  20     A
## 2     Bob  22     B
## 3 Charlie  21     C

5 列表

一种灵活的数据结构,可以包含不同类型的对象,例如向量、矩阵、数据框等。可用list()函数创建

my_list <- list(
  name = "John",
  age = 25,
  scores = c(85, 90, 92),
  student_data = data.frame(
    name = c("Alice", "Bob", "Charlie"),
    age = c(20, 22, 21),
    grade = c("A", "B", "C"),
    stringsAsFactors = FALSE
    )
  )

my_list
## $name
## [1] "John"
## 
## $age
## [1] 25
## 
## $scores
## [1] 85 90 92
## 
## $student_data
##      name age grade
## 1   Alice  20     A
## 2     Bob  22     B
## 3 Charlie  21     C
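列表中的元素可以用$加名称或[[ ]]提取,例如(基于上面创建的my_list):

my_list$scores    # 用$加元素名称提取
## [1] 85 90 92
my_list[["age"]]  # 用双方括号加名称提取
## [1] 25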

读取数据

第一次读取数据可以用右上角Environment下的Import Dataset –> From text –> 选择要读取的文件,导入到Environment中,这时左下角控制台的命令提示符>后面会生成相应的读取代码,将它复制到脚本中保存,下次读取时直接运行该代码即可,例如:

读取csv:

studC <- read.csv("F:/9.courses/R/studentsInf.csv") #此处请替换自己的数据

读取txt:

studT <- read.delim("F:/9.courses/R/studentsInf.txt")

输出数据

输出csv:

write.csv(studC, file = "stuinf.csv", row.names = FALSE) #csv为逗号分隔文件,用write.csv()输出更合适

输出txt:

write.table(studT, file = "stuinf.txt", sep = "\t", row.names = FALSE, quote = FALSE) #用制表符分隔,便于read.delim()再次读取

我个人比较喜欢使用csv格式

玩转数据

1 数据框行列反转

#显示studC前5行
head(studC,5)
##     sample    sex age height weight     from
## 1 student1   male  23    173   65.0 shandong
## 2 student2 female  22    170   56.0   shanxi
## 3 student3 female  22    160   52.0   yunnan
## 4 student4   male  24    172   57.5   yunnan
## 5 student5 female  22    165   49.0    henan
#行列反转
studC_R <- t(studC)
#显示studC_R前5行前4列
head(studC_R, 5)[, 1:4]
##        [,1]       [,2]       [,3]       [,4]      
## sample "student1" "student2" "student3" "student4"
## sex    "male"     "female"   "female"   "male"    
## age    "23"       "22"       "22"       "24"      
## height "173.0"    "170.0"    "160.0"    "172.0"   
## weight "65.0"     "56.0"     "52.0"     "57.5"

2 提取列

提取第6列

column_6 <- studC[, 6]

提取前3列

column_h3 <- studC[, 1:3]

提取第一列和第三列

column_13 <- studC[,c(1, 3)]
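除了用数字位置,也可以用列名提取,代码的可读性更好,例如(studC的列依次为sample、sex、age、height、weight、from):

column_from   <- studC[, "from"]             # 按列名提取,等价于提取第6列
column_name13 <- studC[, c("sample", "age")] # 等价于提取第1列和第3列
column_sex    <- studC$sex                   # 用$提取单独一列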

3 提取行

提取前5行

Row_h5 <- studC[1:5, ]
print(Row_h5)
##     sample    sex age height weight     from
## 1 student1   male  23    173   65.0 shandong
## 2 student2 female  22    170   56.0   shanxi
## 3 student3 female  22    160   52.0   yunnan
## 4 student4   male  24    172   57.5   yunnan
## 5 student5 female  22    165   49.0    henan

提取末尾5行

Row_t5 <- tail(studC, 5)
print(Row_t5)
##       sample    sex age height weight    from
## 41 student41 female  21    172     55  yunnan
## 42 student42 female  26    158     43  yunnan
## 43 student43 female  22    165     52  yunnan
## 44 student44 female  26    164     50  yunnan
## 45 student45 female  22    156     49 guizhou

提取第1,4,6,8行

Row_r1468 <- studC[c(1, 4, 6, 8), ]
print(Row_r1468)
##     sample    sex age height weight     from
## 1 student1   male  23    173   65.0 shandong
## 4 student4   male  24    172   57.5   yunnan
## 6 student6   male  25    172   62.0    henan
## 8 student8 female  21    163   65.0 liaoning

4 删除列

DelCol1 = Row_h5[, -3]
DelCol2 = Row_h5[, -(2:4)]
print(DelCol1)
##     sample    sex height weight     from
## 1 student1   male    173   65.0 shandong
## 2 student2 female    170   56.0   shanxi
## 3 student3 female    160   52.0   yunnan
## 4 student4   male    172   57.5   yunnan
## 5 student5 female    165   49.0    henan
print(DelCol2)
##     sample weight     from
## 1 student1   65.0 shandong
## 2 student2   56.0   shanxi
## 3 student3   52.0   yunnan
## 4 student4   57.5   yunnan
## 5 student5   49.0    henan

5 删除行

DelRow1 = Row_h5[-3,]
DelRow2 = Row_h5[-(2:4), ]
DelRow3 = Row_h5[-c(1,5), ]
print(DelRow1)
##     sample    sex age height weight     from
## 1 student1   male  23    173   65.0 shandong
## 2 student2 female  22    170   56.0   shanxi
## 4 student4   male  24    172   57.5   yunnan
## 5 student5 female  22    165   49.0    henan
print(DelRow2)
##     sample    sex age height weight     from
## 1 student1   male  23    173     65 shandong
## 5 student5 female  22    165     49    henan
print(DelRow3)
##     sample    sex age height weight   from
## 2 student2 female  22    170   56.0 shanxi
## 3 student3 female  22    160   52.0 yunnan
## 4 student4   male  24    172   57.5 yunnan

总结:对于行列的处理,提取或删除行就在逗号前操作,提取或删除列就在逗号后处理。

6 合并数据框

1)rbind()函数

#合并行
merged1 <- rbind(DelRow2, DelRow3) 
print(merged1)
##     sample    sex age height weight     from
## 1 student1   male  23    173   65.0 shandong
## 5 student5 female  22    165   49.0    henan
## 2 student2 female  22    170   56.0   shanxi
## 3 student3 female  22    160   52.0   yunnan
## 4 student4   male  24    172   57.5   yunnan
merged2 <- rbind(DelRow1, DelRow2) 
print(merged2)
##      sample    sex age height weight     from
## 1  student1   male  23    173   65.0 shandong
## 2  student2 female  22    170   56.0   shanxi
## 4  student4   male  24    172   57.5   yunnan
## 5  student5 female  22    165   49.0    henan
## 11 student1   male  23    173   65.0 shandong
## 51 student5 female  22    165   49.0    henan
#合并列
merged3<- cbind(DelCol1, DelCol2)
print(merged3)
##     sample    sex height weight     from   sample weight     from
## 1 student1   male    173   65.0 shandong student1   65.0 shandong
## 2 student2 female    170   56.0   shanxi student2   56.0   shanxi
## 3 student3 female    160   52.0   yunnan student3   52.0   yunnan
## 4 student4   male    172   57.5   yunnan student4   57.5   yunnan
## 5 student5 female    165   49.0    henan student5   49.0    henan

思考:为什么在merged2的第一列中出现11和51的序列号?

2)inner_join()函数,合并列

library(tidyverse)
inner_join(DelCol1, DelCol2, by = 'sample')
##     sample    sex height weight.x   from.x weight.y   from.y
## 1 student1   male    173     65.0 shandong     65.0 shandong
## 2 student2 female    170     56.0   shanxi     56.0   shanxi
## 3 student3 female    160     52.0   yunnan     52.0   yunnan
## 4 student4   male    172     57.5   yunnan     57.5   yunnan
## 5 student5 female    165     49.0    henan     49.0    henan

思考:对比inner_join()和cbind()函数的输出结果,有什么区别?

7 去重复

#去重复前
column_6
##  [1] "shandong"     "shanxi"       "yunnan"       "yunnan"       "henan"       
##  [6] "henan"        "yunnan"       "liaoning"     "heilongjiang" "hubei"       
## [11] "yunnan"       "sichuan"      "shanxi"       "guizhou"      "yunnan"      
## [16] "chongqing"    "yunnan"       "sichuan"      "hubei"        "yunnan"      
## [21] "guizhou"      "sichuan"      "hainan"       "yunnan"       "yunnan"      
## [26] "neimenggu"    "jiangsu"      "anhui"        "yunnan"       "guangxi"     
## [31] "yunnan"       "yunnan"       "yunnan"       "hubei"        "shandong"    
## [36] "yunnan"       "guizhou"      "feijian"      "guizhou"      "hubei"       
## [41] "yunnan"       "yunnan"       "yunnan"       "yunnan"       "guizhou"
#去重复后
DelDup <- unique(column_6)
print(DelDup)
##  [1] "shandong"     "shanxi"       "yunnan"       "henan"        "liaoning"    
##  [6] "heilongjiang" "hubei"        "sichuan"      "guizhou"      "chongqing"   
## [11] "hainan"       "neimenggu"    "jiangsu"      "anhui"        "guangxi"     
## [16] "feijian"

将字符型chr转换为factor

factor在R中非常重要,它决定了数据的分析方式以及如何进行视觉呈现,例如:

输入数据

studC <- read.csv("F:/9.courses/R/studentsInf.csv")

1 转换前的summary

summary(studC)
##     sample              sex                 age            height     
##  Length:45          Length:45          Min.   :21.00   Min.   :147.0  
##  Class :character   Class :character   1st Qu.:22.00   1st Qu.:158.0  
##  Mode  :character   Mode  :character   Median :23.00   Median :163.0  
##                                        Mean   :23.04   Mean   :163.7  
##                                        3rd Qu.:24.00   3rd Qu.:170.0  
##                                        Max.   :27.00   Max.   :180.0  
##      weight          from          
##  Min.   :42.00   Length:45         
##  1st Qu.:50.00   Class :character  
##  Median :56.00   Mode  :character  
##  Mean   :58.08                     
##  3rd Qu.:66.00                     
##  Max.   :88.00

2 用as.factor()函数转换

studC$sex <- as.factor(studC$sex)
studC$from <- as.factor(studC$from)

3 转换后的summary

summary(studC)
##     sample              sex          age            height          weight     
##  Length:45          female:33   Min.   :21.00   Min.   :147.0   Min.   :42.00  
##  Class :character   male  :12   1st Qu.:22.00   1st Qu.:158.0   1st Qu.:50.00  
##  Mode  :character               Median :23.00   Median :163.0   Median :56.00  
##                                 Mean   :23.04   Mean   :163.7   Mean   :58.08  
##                                 3rd Qu.:24.00   3rd Qu.:170.0   3rd Qu.:66.00  
##                                 Max.   :27.00   Max.   :180.0   Max.   :88.00  
##                                                                                
##        from   
##  yunnan  :18  
##  guizhou : 5  
##  hubei   : 4  
##  sichuan : 3  
##  henan   : 2  
##  shandong: 2  
##  (Other) :11

4 思考:转换前和转换后summary给出结果的差异在哪?

图形初阶

1 饼图,用pie()函数绘制

sexRatio <- c(12, 33)
# 创建标签
labels <-  c("male","female")# 每个部分的标签
# 绘制饼图
pie(sexRatio, labels = labels, main = "2023届专硕班男女比例") 
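如果想在标签中同时显示对应的百分比,可以先计算比例再拼接到标签里,一个简单的写法示意:

pct <- round(sexRatio / sum(sexRatio) * 100, 1)  # 计算百分比
labels2 <- paste0(labels, " ", pct, "%")         # 拼接标签,如"male 26.7%"
pie(sexRatio, labels = labels2, main = "2023届专硕班男女比例")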

2 直方图,用hist()函数绘制

hist(studC$height,nclass=10, ylab="Frequency", xlab = 'height',col ='blue' )

hist(studC$weight,nclass=10, ylab="Frequency", xlab = 'weight',col ='green' )

3 把两个图合并到一个图上,便于比较

par(mfrow = c(1,2))
hist(studC$height,nclass=10, ylab="Frequency", xlab = 'height',col ='blue' )
hist(studC$weight,nclass=10, ylab="Frequency", xlab = 'weight',col ='green' )

#用dev.off()函数关闭当前图形设备,或将par()参数重新设置为默认值
dev.off()
## null device 
##           1

思考:图形与summary数据摘要进行对比理解。

summary(studC)
##     sample              sex          age            height          weight     
##  Length:45          female:33   Min.   :21.00   Min.   :147.0   Min.   :42.00  
##  Class :character   male  :12   1st Qu.:22.00   1st Qu.:158.0   1st Qu.:50.00  
##  Mode  :character               Median :23.00   Median :163.0   Median :56.00  
##                                 Mean   :23.04   Mean   :163.7   Mean   :58.08  
##                                 3rd Qu.:24.00   3rd Qu.:170.0   3rd Qu.:66.00  
##                                 Max.   :27.00   Max.   :180.0   Max.   :88.00  
##                                                                                
##        from   
##  yunnan  :18  
##  guizhou : 5  
##  hubei   : 4  
##  sichuan : 3  
##  henan   : 2  
##  shandong: 2  
##  (Other) :11

图形进阶

4 相关性分析

1)使用全部数据比较体重和身高的相关性

library(ggpubr)
p1 <- ggscatter(studC, x = "height", y = "weight", 
          add = "reg.line", #添加线性回归线
          conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "height", ylab = "Weight", col ='pink')
print(p1)

2)仅使用female的值观察体重和身高的关系

#提取female data
femaleinf <- subset(studC, sex == "female")
head(femaleinf,5)
##     sample    sex age height weight     from
## 2 student2 female  22    170     56   shanxi
## 3 student3 female  22    160     52   yunnan
## 5 student5 female  22    165     49    henan
## 7 student7 female  21    160     50   yunnan
## 8 student8 female  21    163     65 liaoning
#相关性
p2 <- ggscatter(femaleinf, x = "height", y = "weight", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "height", ylab = "Weight", col ='pink')
print(p2)

思考:

1)身高和体重是否具有相关性?
2)两部分结果中身高和体重的相关性哪个更高?

提示:
R值衡量了两个变量之间的线性相关程度,取值范围为-1到1。当R值接近1时,表示两个变量之间存在强正相关关系;
当R值接近-1时,表示两个变量之间存在强负相关关系;
当R值接近0时,表示两个变量之间存在较弱或无相关关系。
可以通过观察散点图上的数据点分布和回归线的趋势来初步判断相关性的强度和方向。
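如果想在绘图之外直接得到相关系数和p值,也可以用cor.test()函数,例如:

cor.test(studC$height, studC$weight, method = "pearson")
#输出中的cor即相关系数R,p-value用于判断相关性是否显著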

5 violin plot 小提琴图

library(ggplot2)

# 创建示例数据
group <- rep(c("Group 1", "Group 2"), each = 100)
value <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 2, sd = 1))
data <- data.frame(group, value)

# 创建小提琴图
p3 <- ggplot(data, aes(x = group, y = value, fill = group)) + 
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.1, fill = "white", color = "black") +
  theme_minimal()
#p
print(p3)

在上述代码中,首先加载了 ggplot2 包。然后,创建了一个示例数据框 data,其中包含两个组别 “Group 1” 和 “Group 2”,以及对应的数值数据。接下来,使用 ggplot() 函数创建一个绘图对象,并通过 aes() 函数指定 x 轴为组别,y 轴为数值,以及填充颜色。然后,使用 geom_violin() 函数创建小提琴图层,并设置 trim = FALSE 参数来展示完整的小提琴形状。同时,还使用 geom_boxplot() 函数添加盒图层,并设置 width 参数调整盒图的宽度,以及 fill 和 color 参数设置盒图的填充和边框颜色。最后,使用 theme_minimal() 函数来设置绘图的主题风格。

6 Sankey diagram 桑基图

from: https://r-graph-gallery.com/321-introduction-to-interactive-sankey-diagram-2.html

install.packages("networkD3")
# Library
library(networkD3)
library(dplyr)
 
# A connection data frame is a list of flows with intensity for each flow
links <- data.frame(
  source=c("group_A","group_A", "group_B", "group_C", "group_C", "group_E"), 
  target=c("group_C","group_D", "group_E", "group_F", "group_G", "group_H"), 
  value=c(2,3, 2, 3, 1, 3)
  )
 
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(
  name=c(as.character(links$source), 
  as.character(links$target)) %>% unique()
)
 
# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat them.
links$IDsource <- match(links$source, nodes$name)-1 
links$IDtarget <- match(links$target, nodes$name)-1
 
# Make the Network
p4 <- sankeyNetwork(Links = links, Nodes = nodes,
              Source = "IDsource", Target = "IDtarget",
              Value = "value", NodeID = "name", 
              sinksRight=FALSE)
p4
# save the widget
# library(htmlwidgets)
# saveWidget(p, file=paste0( getwd(), "/HtmlWidget/sankeyBasic1.html"))

7 Circular dendrogram 圆形树状图

from: https://r-graph-gallery.com/339-circular-dendrogram-with-ggraph.html

install.packages("ggraph")
install.packages("igraph")
install.packages("RColorBrewer")
# Libraries
library(ggraph)
library(igraph)
library(tidyverse)
library(RColorBrewer) 
# create a data frame giving the hierarchical structure of your individuals
d1=data.frame(from="origin", to=paste("group", seq(1,10), sep=""))
d2=data.frame(from=rep(d1$to, each=10), to=paste("subgroup", seq(1,100), sep="_"))
edges=rbind(d1, d2)
 
# create a vertices data.frame. One line per object of our hierarchy
vertices = data.frame(
  name = unique(c(as.character(edges$from), as.character(edges$to))) , 
  value = runif(111)
) 
# Let's add a column with the group of each name. It will be useful later to color points
vertices$group = edges$from[ match( vertices$name, edges$to ) ]
 
 
#Let's add information concerning the labels we are going to add: angle, horizontal adjustment and potential flip
#calculate the ANGLE of the labels
vertices$id=NA
myleaves=which(is.na( match(vertices$name, edges$from) ))
nleaves=length(myleaves)
vertices$id[ myleaves ] = seq(1:nleaves)
vertices$angle= 90 - 360 * vertices$id / nleaves
 
# calculate the alignment of labels: right or left
# If I am on the left part of the plot, my labels have currently an angle < -90
vertices$hjust<-ifelse( vertices$angle < -90, 1, 0)
 
# flip angle BY to make them readable
vertices$angle<-ifelse(vertices$angle < -90, vertices$angle+180, vertices$angle)
 
# Create a graph object
mygraph <- graph_from_data_frame( edges, vertices=vertices )
 
# Make the plot
p5 <- ggraph(mygraph, layout = 'dendrogram', circular = TRUE) + 
  geom_edge_diagonal(colour="grey") +
  scale_edge_colour_distiller(palette = "RdPu") +
  geom_node_text(aes(x = x*1.15, y=y*1.15, filter = leaf, label=name, angle = angle, hjust=hjust, colour=group), size=2, alpha=1) +
  geom_node_point(aes(filter = leaf, x = x*1.07, y=y*1.07, colour=group, size=value, alpha=0.2)) +
  scale_colour_manual(values= rep( brewer.pal(9,"Paired") , 30)) +
  scale_size_continuous( range = c(0.1,10) ) +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(c(0,0,0,0),"cm"),
  ) +
  expand_limits(x = c(-1.3, 1.3), y = c(-1.3, 1.3))
p5
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

思考:如何用自己的数据去套用现成的R模板代码?

举例说明如何套用上面的Circular dendrogram 圆形树状图代码,然后请同学进行举一反三。

Step1 理解该套代码中哪些字符名称表示数据?

最简单的方法:先清空Environment,然后运行整套代码,查看右上角Environment中的Data栏,栏中列出的一级名称就是该套代码中的所有数据对象名称。
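查看和清空environment也可以用代码完成,例如:

ls()             # 列出当前environment中所有对象的名称
rm(list = ls())  # 清空environment(慎用:会删除所有对象)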

Step2 使用print()或View()了解数据结构,思考哪几个数据对应你的试验元数据?

print(d1)
##      from      to
## 1  origin  group1
## 2  origin  group2
## 3  origin  group3
## 4  origin  group4
## 5  origin  group5
## 6  origin  group6
## 7  origin  group7
## 8  origin  group8
## 9  origin  group9
## 10 origin group10
print(d2)
##        from           to
## 1    group1   subgroup_1
## 2    group1   subgroup_2
## 3    group1   subgroup_3
## 4    group1   subgroup_4
## 5    group1   subgroup_5
## 6    group1   subgroup_6
## 7    group1   subgroup_7
## 8    group1   subgroup_8
## 9    group1   subgroup_9
## 10   group1  subgroup_10
## 11   group2  subgroup_11
## 12   group2  subgroup_12
## 13   group2  subgroup_13
## 14   group2  subgroup_14
## 15   group2  subgroup_15
## 16   group2  subgroup_16
## 17   group2  subgroup_17
## 18   group2  subgroup_18
## 19   group2  subgroup_19
## 20   group2  subgroup_20
## 21   group3  subgroup_21
## 22   group3  subgroup_22
## 23   group3  subgroup_23
## 24   group3  subgroup_24
## 25   group3  subgroup_25
## 26   group3  subgroup_26
## 27   group3  subgroup_27
## 28   group3  subgroup_28
## 29   group3  subgroup_29
## 30   group3  subgroup_30
## 31   group4  subgroup_31
## 32   group4  subgroup_32
## 33   group4  subgroup_33
## 34   group4  subgroup_34
## 35   group4  subgroup_35
## 36   group4  subgroup_36
## 37   group4  subgroup_37
## 38   group4  subgroup_38
## 39   group4  subgroup_39
## 40   group4  subgroup_40
## 41   group5  subgroup_41
## 42   group5  subgroup_42
## 43   group5  subgroup_43
## 44   group5  subgroup_44
## 45   group5  subgroup_45
## 46   group5  subgroup_46
## 47   group5  subgroup_47
## 48   group5  subgroup_48
## 49   group5  subgroup_49
## 50   group5  subgroup_50
## 51   group6  subgroup_51
## 52   group6  subgroup_52
## 53   group6  subgroup_53
## 54   group6  subgroup_54
## 55   group6  subgroup_55
## 56   group6  subgroup_56
## 57   group6  subgroup_57
## 58   group6  subgroup_58
## 59   group6  subgroup_59
## 60   group6  subgroup_60
## 61   group7  subgroup_61
## 62   group7  subgroup_62
## 63   group7  subgroup_63
## 64   group7  subgroup_64
## 65   group7  subgroup_65
## 66   group7  subgroup_66
## 67   group7  subgroup_67
## 68   group7  subgroup_68
## 69   group7  subgroup_69
## 70   group7  subgroup_70
## 71   group8  subgroup_71
## 72   group8  subgroup_72
## 73   group8  subgroup_73
## 74   group8  subgroup_74
## 75   group8  subgroup_75
## 76   group8  subgroup_76
## 77   group8  subgroup_77
## 78   group8  subgroup_78
## 79   group8  subgroup_79
## 80   group8  subgroup_80
## 81   group9  subgroup_81
## 82   group9  subgroup_82
## 83   group9  subgroup_83
## 84   group9  subgroup_84
## 85   group9  subgroup_85
## 86   group9  subgroup_86
## 87   group9  subgroup_87
## 88   group9  subgroup_88
## 89   group9  subgroup_89
## 90   group9  subgroup_90
## 91  group10  subgroup_91
## 92  group10  subgroup_92
## 93  group10  subgroup_93
## 94  group10  subgroup_94
## 95  group10  subgroup_95
## 96  group10  subgroup_96
## 97  group10  subgroup_97
## 98  group10  subgroup_98
## 99  group10  subgroup_99
## 100 group10 subgroup_100
print(edges)
##        from           to
## 1    origin       group1
## 2    origin       group2
## 3    origin       group3
## 4    origin       group4
## 5    origin       group5
## 6    origin       group6
## 7    origin       group7
## 8    origin       group8
## 9    origin       group9
## 10   origin      group10
## 11   group1   subgroup_1
## 12   group1   subgroup_2
## 13   group1   subgroup_3
## 14   group1   subgroup_4
## 15   group1   subgroup_5
## 16   group1   subgroup_6
## 17   group1   subgroup_7
## 18   group1   subgroup_8
## 19   group1   subgroup_9
## 20   group1  subgroup_10
## 21   group2  subgroup_11
## 22   group2  subgroup_12
## 23   group2  subgroup_13
## 24   group2  subgroup_14
## 25   group2  subgroup_15
## 26   group2  subgroup_16
## 27   group2  subgroup_17
## 28   group2  subgroup_18
## 29   group2  subgroup_19
## 30   group2  subgroup_20
## 31   group3  subgroup_21
## 32   group3  subgroup_22
## 33   group3  subgroup_23
## 34   group3  subgroup_24
## 35   group3  subgroup_25
## 36   group3  subgroup_26
## 37   group3  subgroup_27
## 38   group3  subgroup_28
## 39   group3  subgroup_29
## 40   group3  subgroup_30
## 41   group4  subgroup_31
## 42   group4  subgroup_32
## 43   group4  subgroup_33
## 44   group4  subgroup_34
## 45   group4  subgroup_35
## 46   group4  subgroup_36
## 47   group4  subgroup_37
## 48   group4  subgroup_38
## 49   group4  subgroup_39
## 50   group4  subgroup_40
## 51   group5  subgroup_41
## 52   group5  subgroup_42
## 53   group5  subgroup_43
## 54   group5  subgroup_44
## 55   group5  subgroup_45
## 56   group5  subgroup_46
## 57   group5  subgroup_47
## 58   group5  subgroup_48
## 59   group5  subgroup_49
## 60   group5  subgroup_50
## 61   group6  subgroup_51
## 62   group6  subgroup_52
## 63   group6  subgroup_53
## 64   group6  subgroup_54
## 65   group6  subgroup_55
## 66   group6  subgroup_56
## 67   group6  subgroup_57
## 68   group6  subgroup_58
## 69   group6  subgroup_59
## 70   group6  subgroup_60
## 71   group7  subgroup_61
## 72   group7  subgroup_62
## 73   group7  subgroup_63
## 74   group7  subgroup_64
## 75   group7  subgroup_65
## 76   group7  subgroup_66
## 77   group7  subgroup_67
## 78   group7  subgroup_68
## 79   group7  subgroup_69
## 80   group7  subgroup_70
## 81   group8  subgroup_71
## 82   group8  subgroup_72
## 83   group8  subgroup_73
## 84   group8  subgroup_74
## 85   group8  subgroup_75
## 86   group8  subgroup_76
## 87   group8  subgroup_77
## 88   group8  subgroup_78
## 89   group8  subgroup_79
## 90   group8  subgroup_80
## 91   group9  subgroup_81
## 92   group9  subgroup_82
## 93   group9  subgroup_83
## 94   group9  subgroup_84
## 95   group9  subgroup_85
## 96   group9  subgroup_86
## 97   group9  subgroup_87
## 98   group9  subgroup_88
## 99   group9  subgroup_89
## 100  group9  subgroup_90
## 101 group10  subgroup_91
## 102 group10  subgroup_92
## 103 group10  subgroup_93
## 104 group10  subgroup_94
## 105 group10  subgroup_95
## 106 group10  subgroup_96
## 107 group10  subgroup_97
## 108 group10  subgroup_98
## 109 group10  subgroup_99
## 110 group10 subgroup_100
print(mygraph)
## IGRAPH 92aeaaa DN-- 111 110 -- 
## + attr: name (v/c), value (v/n), group (v/c), id (v/n), angle (v/n),
## | hjust (v/n)
## + edges from 92aeaaa (vertex names):
##  [1] origin->group1      origin->group2      origin->group3     
##  [4] origin->group4      origin->group5      origin->group6     
##  [7] origin->group7      origin->group8      origin->group9     
## [10] origin->group10     group1->subgroup_1  group1->subgroup_2 
## [13] group1->subgroup_3  group1->subgroup_4  group1->subgroup_5 
## [16] group1->subgroup_6  group1->subgroup_7  group1->subgroup_8 
## [19] group1->subgroup_9  group1->subgroup_10 group2->subgroup_11
## + ... omitted several edges
print(vertices)
##             name      value   group  id angle hjust
## 1         origin 0.44800742    <NA>  NA    NA    NA
## 2         group1 0.26655036  origin  NA    NA    NA
## 3         group2 0.31203776  origin  NA    NA    NA
## 4         group3 0.79181592  origin  NA    NA    NA
## 5         group4 0.05585058  origin  NA    NA    NA
## 6         group5 0.32995128  origin  NA    NA    NA
## 7         group6 0.02850016  origin  NA    NA    NA
## 8         group7 0.85771688  origin  NA    NA    NA
## 9         group8 0.81555825  origin  NA    NA    NA
## 10        group9 0.45430256  origin  NA    NA    NA
## 11       group10 0.20030234  origin  NA    NA    NA
## 12    subgroup_1 0.98959081  group1   1  86.4     0
## 13    subgroup_2 0.74503827  group1   2  82.8     0
## 14    subgroup_3 0.60365412  group1   3  79.2     0
## 15    subgroup_4 0.77637777  group1   4  75.6     0
## 16    subgroup_5 0.71010442  group1   5  72.0     0
## 17    subgroup_6 0.46416162  group1   6  68.4     0
## 18    subgroup_7 0.57125994  group1   7  64.8     0
## 19    subgroup_8 0.94477534  group1   8  61.2     0
## 20    subgroup_9 0.44172403  group1   9  57.6     0
## 21   subgroup_10 0.69499809  group1  10  54.0     0
## 22   subgroup_11 0.12803949  group2  11  50.4     0
## 23   subgroup_12 0.88753814  group2  12  46.8     0
## 24   subgroup_13 0.08320777  group2  13  43.2     0
## 25   subgroup_14 0.30573049  group2  14  39.6     0
## 26   subgroup_15 0.51419146  group2  15  36.0     0
## 27   subgroup_16 0.72021910  group2  16  32.4     0
## 28   subgroup_17 0.42017897  group2  17  28.8     0
## 29   subgroup_18 0.09006888  group2  18  25.2     0
## 30   subgroup_19 0.73441077  group2  19  21.6     0
## 31   subgroup_20 0.77268533  group2  20  18.0     0
## 32   subgroup_21 0.49937881  group3  21  14.4     0
## 33   subgroup_22 0.44270437  group3  22  10.8     0
## 34   subgroup_23 0.79249107  group3  23   7.2     0
## 35   subgroup_24 0.45202810  group3  24   3.6     0
## 36   subgroup_25 0.54255351  group3  25   0.0     0
## 37   subgroup_26 0.08434688  group3  26  -3.6     0
## 38   subgroup_27 0.88726748  group3  27  -7.2     0
## 39   subgroup_28 0.18711969  group3  28 -10.8     0
## 40   subgroup_29 0.89674607  group3  29 -14.4     0
## 41   subgroup_30 0.46212743  group3  30 -18.0     0
## 42   subgroup_31 0.22491721  group4  31 -21.6     0
## 43   subgroup_32 0.47266067  group4  32 -25.2     0
## 44   subgroup_33 0.81502083  group4  33 -28.8     0
## 45   subgroup_34 0.09302276  group4  34 -32.4     0
## 46   subgroup_35 0.25313071  group4  35 -36.0     0
## 47   subgroup_36 0.04565808  group4  36 -39.6     0
## 48   subgroup_37 0.75828602  group4  37 -43.2     0
## 49   subgroup_38 0.25401609  group4  38 -46.8     0
## 50   subgroup_39 0.25659480  group4  39 -50.4     0
## 51   subgroup_40 0.58664747  group4  40 -54.0     0
## 52   subgroup_41 0.59526738  group5  41 -57.6     0
## 53   subgroup_42 0.99029509  group5  42 -61.2     0
## 54   subgroup_43 0.16839686  group5  43 -64.8     0
## 55   subgroup_44 0.75773809  group5  44 -68.4     0
## 56   subgroup_45 0.25796999  group5  45 -72.0     0
## 57   subgroup_46 0.17682004  group5  46 -75.6     0
## 58   subgroup_47 0.24308785  group5  47 -79.2     0
## 59   subgroup_48 0.82644313  group5  48 -82.8     0
## 60   subgroup_49 0.13209491  group5  49 -86.4     0
## 61   subgroup_50 0.57465728  group5  50 -90.0     0
## 62   subgroup_51 0.87544752  group6  51  86.4     1
## 63   subgroup_52 0.91518356  group6  52  82.8     1
## 64   subgroup_53 0.52080183  group6  53  79.2     1
## 65   subgroup_54 0.66542379  group6  54  75.6     1
## 66   subgroup_55 0.58711622  group6  55  72.0     1
## 67   subgroup_56 0.19247736  group6  56  68.4     1
## 68   subgroup_57 0.46903629  group6  57  64.8     1
## 69   subgroup_58 0.90573718  group6  58  61.2     1
## 70   subgroup_59 0.82789350  group6  59  57.6     1
## 71   subgroup_60 0.65591924  group6  60  54.0     1
## 72   subgroup_61 0.91193616  group7  61  50.4     1
## 73   subgroup_62 0.53272488  group7  62  46.8     1
## 74   subgroup_63 0.95420129  group7  63  43.2     1
## 75   subgroup_64 0.44200329  group7  64  39.6     1
## 76   subgroup_65 0.76157813  group7  65  36.0     1
## 77   subgroup_66 0.19868759  group7  66  32.4     1
## 78   subgroup_67 0.28585503  group7  67  28.8     1
## 79   subgroup_68 0.10546475  group7  68  25.2     1
## 80   subgroup_69 0.21927622  group7  69  21.6     1
## 81   subgroup_70 0.38321552  group7  70  18.0     1
## 82   subgroup_71 0.81657712  group8  71  14.4     1
## 83   subgroup_72 0.82805822  group8  72  10.8     1
## 84   subgroup_73 0.80359772  group8  73   7.2     1
## 85   subgroup_74 0.02102864  group8  74   3.6     1
## 86   subgroup_75 0.34098824  group8  75   0.0     1
## 87   subgroup_76 0.11452250  group8  76  -3.6     1
## 88   subgroup_77 0.37177285  group8  77  -7.2     1
## 89   subgroup_78 0.27052051  group8  78 -10.8     1
## 90   subgroup_79 0.80620871  group8  79 -14.4     1
## 91   subgroup_80 0.66400577  group8  80 -18.0     1
## 92   subgroup_81 0.27976840  group9  81 -21.6     1
## 93   subgroup_82 0.57438941  group9  82 -25.2     1
## 94   subgroup_83 0.22449205  group9  83 -28.8     1
## 95   subgroup_84 0.78080396  group9  84 -32.4     1
## 96   subgroup_85 0.08164244  group9  85 -36.0     1
## 97   subgroup_86 0.59261693  group9  86 -39.6     1
## 98   subgroup_87 0.97517322  group9  87 -43.2     1
## 99   subgroup_88 0.20012700  group9  88 -46.8     1
## 100  subgroup_89 0.86214768  group9  89 -50.4     1
## 101  subgroup_90 0.22909202  group9  90 -54.0     1
## 102  subgroup_91 0.01786694 group10  91 -57.6     1
## 103  subgroup_92 0.48977770 group10  92 -61.2     1
## 104  subgroup_93 0.43194225 group10  93 -64.8     1
## 105  subgroup_94 0.47971528 group10  94 -68.4     1
## 106  subgroup_95 0.59724094 group10  95 -72.0     1
## 107  subgroup_96 0.44759819 group10  96 -75.6     1
## 108  subgroup_97 0.02367235 group10  97 -79.2     1
## 109  subgroup_98 0.29594446 group10  98 -82.8     1
## 110  subgroup_99 0.13900171 group10  99 -86.4     1
## 111 subgroup_100 0.16073637 group10 100 -90.0     1

Step3 根据上面的数据格式调整你试验的元数据格式,输入数据。

Step4 在整套代码中,把你的元数据名称进行相应的替换(见下方示意)。分析过程中产生的对象名称可换可不换,随个人喜好。
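一个简单的替换示意(假设你的层级关系表保存为my_edges.csv,包含from和to两列;文件名与列名均为举例,请替换成自己的):

#用自己的数据替换模板中的edges,后续vertices、mygraph、p5等步骤的代码保持不变
edges <- read.csv("my_edges.csv")  # 需包含from、to两列,对应"上级分组"和"下级成员"

注意:模板中vertices里runif(111)的111是节点总数(本例为1+10+100),换成自己的数据后需要改成相应的节点个数。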

总结

学习使用R语言进行数据分析和绘图其实很简单,因为已经有很多供你套用的代码,你不用全部读懂代码也可以无障碍地画出类似的图,只需要明白如何替换数据就行。
如果你还想对图的大小、颜色、标题、排版等进行更改,可以用R中的help工具去了解各参数的含义,或者更简单地去问ChatGPT。

模板代码推荐两个网站:

https://r-graph-gallery.com/
https://www.datanovia.com/en/

这两个网站提供的代码,几乎可以满足你所有的绘图需求。