你肯定在一些语言培训软文中看到一些所谓大厂用python快速处理excel表格的大神操作。 本文介绍如何用R语言快速批量操作.xlsx或.csv文件,几行代码立地成神。
library(tidyverse)
library(openxlsx)
# dir.create("D:/data/iris", recursive = T)
任务01:把一个含有分类变量的数据集以此分类变量n个值为单位拆分成n个数据子集并保存成以拆分变量值为名的多个.csv文件
以iris为例,按Species变量的三个值setosa、versicolor、virginica为单位拆分成3个子集,并分别保存为setosa.csv、versicolor.csv、virginica.csv。
#为每个物种随机抽样5个观测做为最小可重复数据集,方便结果展示
iris_sample <- iris %>%
group_by(Species) %>%
slice_sample(n = 5) %>%
ungroup()
iris_sample
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.0 | 3.4 | 1.6 | 0.4 | setosa |
4.6 | 3.2 | 1.4 | 0.2 | setosa |
5.1 | 3.4 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.1 | 3.5 | 1.4 | 0.3 | setosa |
5.7 | 3.0 | 4.2 | 1.2 | versicolor |
4.9 | 2.4 | 3.3 | 1.0 | versicolor |
5.5 | 2.6 | 4.4 | 1.2 | versicolor |
6.3 | 2.5 | 4.9 | 1.5 | versicolor |
6.9 | 3.1 | 4.9 | 1.5 | versicolor |
6.2 | 3.4 | 5.4 | 2.3 | virginica |
7.2 | 3.2 | 6.0 | 1.8 | virginica |
6.9 | 3.1 | 5.1 | 2.3 | virginica |
5.6 | 2.8 | 4.9 | 2.0 | virginica |
7.4 | 2.8 | 6.1 | 1.9 | virginica |
iris_subset <- iris_sample %>%
nest_by(Species, .keep = T)
iris_subset
Species | data |
---|---|
setosa | 5.0, 4.6, 5.1, 5.0, 5.1, 3.4, 3.2, 3.4, 3.6, 3.5, 1.6, 1.4, 1.5, 1.4, 1.4, 0.4, 0.2, 0.2, 0.2, 0.3, 1.0, 1.0, 1.0, 1.0, 1.0 |
versicolor | 5.7, 4.9, 5.5, 6.3, 6.9, 3.0, 2.4, 2.6, 2.5, 3.1, 4.2, 3.3, 4.4, 4.9, 4.9, 1.2, 1.0, 1.2, 1.5, 1.5, 2.0, 2.0, 2.0, 2.0, 2.0 |
virginica | 6.2, 7.2, 6.9, 5.6, 7.4, 3.4, 3.2, 3.1, 2.8, 2.8, 5.4, 6.0, 5.1, 4.9, 6.1, 2.3, 1.8, 2.3, 2.0, 1.9, 3.0, 3.0, 3.0, 3.0, 3.0 |
iris_subset %>%
pwalk( ~ write.csv(..2, paste0("D:/data/iris/", ..1, ".csv"), row.names = F))
#pwalk只执行操作不返回结果,请到上述路径下查看生成的文件
iris_subset <- iris_sample %>%
nest_by(Species, .keep = T)
iris_subset
Species | data |
---|---|
setosa | 5.0, 4.6, 5.1, 5.0, 5.1, 3.4, 3.2, 3.4, 3.6, 3.5, 1.6, 1.4, 1.5, 1.4, 1.4, 0.4, 0.2, 0.2, 0.2, 0.3, 1.0, 1.0, 1.0, 1.0, 1.0 |
versicolor | 5.7, 4.9, 5.5, 6.3, 6.9, 3.0, 2.4, 2.6, 2.5, 3.1, 4.2, 3.3, 4.4, 4.9, 4.9, 1.2, 1.0, 1.2, 1.5, 1.5, 2.0, 2.0, 2.0, 2.0, 2.0 |
virginica | 6.2, 7.2, 6.9, 5.6, 7.4, 3.4, 3.2, 3.1, 2.8, 2.8, 5.4, 6.0, 5.1, 4.9, 6.1, 2.3, 1.8, 2.3, 2.0, 1.9, 3.0, 3.0, 3.0, 3.0, 3.0 |
任务02:把一个含有分类变量的数据集以此分类变量n个值为单位拆分成n个数据子集并保存成以指定名称+数字序号为名的多个.xlsx文件
以iris为例,按Species变量的三个值setosa、versicolor、virginica为单位拆分成3个子集,并分别保存为iris1.xlsx、iris2.xlsx、iris3.xlsx。
iris |>
split(iris$Species) |>
walk2(1:3,
~ write.xlsx(..1, paste0("D:/data/iris/iris", ..2, ".xlsx")))
任务03:把一个含有分类变量的数据集以此分类变量n个值为单位拆分成n个数据子集并保存成以拆分变量值为名的多个.xlsx文件
以iris为例,按Species变量的三个值setosa、versicolor、virginica为单位拆分成3个子集,并分别保存为setosa.xlsx、versicolor.xlsx、virginica.xlsx。
iris_subset %>%
pwalk(
~ write.xlsx(..2, paste0("D:/data/iris/", ..1, ".xlsx")))
#pwalk只执行操作不返回结果,请到上述路径下查看生成的文件
任务04:把一个含有分类变量的数据集以此分类变量n个值为单位拆分成n个数据子集并保存到默认sheet名(一般为sheet1,sheet2……)的一个.xlsx文件
以iris为例,按Species变量的三个值setosa、versicolor、virginica为单位拆分成3个子集,并分别保存到sheet名分别为为sheet1,sheet2,sheet3的三个sheets的iris.xlsx。
iris_sample %>%
group_split(Species, .keep = T) %>%
write.xlsx("D:/data/iris/iris1.xlsx")
#只执行操作不返回结果,请到上述路径下查看生成的文件
任务05:把一个含有分类变量的数据集以此分类变量n个值为单位拆分成n个数据子集并保存成以拆分变量值为sheet名的一个.xlsx文件
以iris为例,按Species变量的三个值setosa、versicolor、virginica为单位拆分成3个子集,并分别保存到sheet名分别为为setosa、versicolor、virginica的三个sheets的iris_sheet.xlsx。
iris_sample |>
split(iris$Species) |>
write.xlsx("D:/data/iris/iris_sheet.xlsx")
#只执行操作不返回结果,请到上述路径下查看生成的文件
任务06:把指定目录中的所有.csv文件读入内存并合并为一个数据集(数据集结构和变量名必须相同,如不相同需做前期处理) > 把D:/data/iris/目录下的.csv读入,并同步将子集合并为一个数据集。
list.files("D:/data/iris/", pattern = '.csv', full.names = T) %>%
map_dfr(read.csv, header = T)
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | gender |
---|---|---|---|---|---|
6.3 | 3.3 | 6.0 | 2.5 | virginica | 1 |
5.8 | 2.7 | 5.1 | 1.9 | virginica | 1 |
6.3 | 2.9 | 5.6 | 1.8 | virginica | 1 |
6.5 | 3.0 | 5.8 | 2.2 | virginica | 1 |
4.9 | 2.5 | 4.5 | 1.7 | virginica | 1 |
6.8 | 3.0 | 5.5 | 2.1 | virginica | 1 |
7.7 | 2.8 | 6.7 | 2.0 | virginica | 1 |
6.7 | 3.3 | 5.7 | 2.1 | virginica | 1 |
6.1 | 3.0 | 4.9 | 1.8 | virginica | 1 |
6.4 | 2.8 | 5.6 | 2.1 | virginica | 1 |
7.2 | 3.0 | 5.8 | 1.6 | virginica | 1 |
7.4 | 2.8 | 6.1 | 1.9 | virginica | 1 |
7.9 | 3.8 | 6.4 | 2.0 | virginica | 1 |
6.4 | 2.8 | 5.6 | 2.2 | virginica | 1 |
6.3 | 2.8 | 5.1 | 1.5 | virginica | 1 |
7.7 | 3.0 | 6.1 | 2.3 | virginica | 1 |
6.0 | 3.0 | 4.8 | 1.8 | virginica | 1 |
6.7 | 3.1 | 5.6 | 2.4 | virginica | 1 |
6.9 | 3.1 | 5.1 | 2.3 | virginica | 1 |
5.8 | 2.7 | 5.1 | 1.9 | virginica | 1 |
6.7 | 3.3 | 5.7 | 2.5 | virginica | 1 |
6.3 | 2.5 | 5.0 | 1.9 | virginica | 1 |
6.2 | 3.4 | 5.4 | 2.3 | virginica | 1 |
4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |
4.8 | 3.4 | 1.6 | 0.2 | setosa | 1 |
4.8 | 3.0 | 1.4 | 0.1 | setosa | 1 |
5.8 | 4.0 | 1.2 | 0.2 | setosa | 1 |
5.7 | 4.4 | 1.5 | 0.4 | setosa | 1 |
5.1 | 3.8 | 1.5 | 0.3 | setosa | 1 |
5.4 | 3.4 | 1.7 | 0.2 | setosa | 1 |
5.1 | 3.7 | 1.5 | 0.4 | setosa | 1 |
4.6 | 3.6 | 1.0 | 0.2 | setosa | 1 |
4.8 | 3.4 | 1.9 | 0.2 | setosa | 1 |
5.0 | 3.0 | 1.6 | 0.2 | setosa | 1 |
5.0 | 3.4 | 1.6 | 0.4 | setosa | 1 |
5.2 | 3.5 | 1.5 | 0.2 | setosa | 1 |
5.2 | 3.4 | 1.4 | 0.2 | setosa | 1 |
5.4 | 3.4 | 1.5 | 0.4 | setosa | 1 |
5.2 | 4.1 | 1.5 | 0.1 | setosa | 1 |
5.5 | 4.2 | 1.4 | 0.2 | setosa | 1 |
4.9 | 3.1 | 1.5 | 0.2 | setosa | 1 |
5.5 | 3.5 | 1.3 | 0.2 | setosa | 1 |
4.9 | 3.6 | 1.4 | 0.1 | setosa | 1 |
4.4 | 3.0 | 1.3 | 0.2 | setosa | 1 |
5.1 | 3.4 | 1.5 | 0.2 | setosa | 1 |
4.4 | 3.2 | 1.3 | 0.2 | setosa | 1 |
5.1 | 3.8 | 1.9 | 0.4 | setosa | 1 |
5.1 | 3.8 | 1.6 | 0.2 | setosa | 1 |
5.3 | 3.7 | 1.5 | 0.2 | setosa | 1 |
5.0 | 3.3 | 1.4 | 0.2 | setosa | 1 |
7.0 | 3.2 | 4.7 | 1.4 | versicolor | 1 |
6.4 | 3.2 | 4.5 | 1.5 | versicolor | 1 |
6.9 | 3.1 | 4.9 | 1.5 | versicolor | 1 |
5.5 | 2.3 | 4.0 | 1.3 | versicolor | 1 |
6.5 | 2.8 | 4.6 | 1.5 | versicolor | 1 |
5.7 | 2.8 | 4.5 | 1.3 | versicolor | 1 |
4.9 | 2.4 | 3.3 | 1.0 | versicolor | 1 |
5.2 | 2.7 | 3.9 | 1.4 | versicolor | 1 |
5.0 | 2.0 | 3.5 | 1.0 | versicolor | 1 |
5.9 | 3.0 | 4.2 | 1.5 | versicolor | 1 |
5.6 | 2.9 | 3.6 | 1.3 | versicolor | 1 |
6.7 | 3.1 | 4.4 | 1.4 | versicolor | 1 |
5.9 | 3.2 | 4.8 | 1.8 | versicolor | 1 |
6.3 | 2.5 | 4.9 | 1.5 | versicolor | 1 |
6.7 | 3.0 | 5.0 | 1.7 | versicolor | 1 |
6.0 | 2.7 | 5.1 | 1.6 | versicolor | 1 |
6.0 | 3.4 | 4.5 | 1.6 | versicolor | 1 |
6.7 | 3.1 | 4.7 | 1.5 | versicolor | 1 |
6.3 | 2.3 | 4.4 | 1.3 | versicolor | 1 |
5.6 | 3.0 | 4.1 | 1.3 | versicolor | 1 |
5.8 | 2.6 | 4.0 | 1.2 | versicolor | 1 |
5.0 | 2.3 | 3.3 | 1.0 | versicolor | 1 |
5.7 | 2.9 | 4.2 | 1.3 | versicolor | 1 |
6.2 | 2.9 | 4.3 | 1.3 | versicolor | 1 |
5.1 | 2.5 | 3.0 | 1.1 | versicolor | 1 |
5.0 | 3.4 | 1.6 | 0.4 | setosa | NA |
4.6 | 3.2 | 1.4 | 0.2 | setosa | NA |
5.1 | 3.4 | 1.5 | 0.2 | setosa | NA |
5.0 | 3.6 | 1.4 | 0.2 | setosa | NA |
5.1 | 3.5 | 1.4 | 0.3 | setosa | NA |
5.1 | 3.4 | 1.5 | 0.2 | setosa | NA |
5.1 | 3.8 | 1.5 | 0.3 | setosa | NA |
4.6 | 3.2 | 1.4 | 0.2 | setosa | NA |
5.0 | 3.5 | 1.6 | 0.6 | setosa | NA |
4.7 | 3.2 | 1.3 | 0.2 | setosa | NA |
6.3 | 2.5 | 4.9 | 1.5 | versicolor | NA |
5.5 | 2.6 | 4.4 | 1.2 | versicolor | NA |
6.0 | 3.4 | 4.5 | 1.6 | versicolor | NA |
4.9 | 2.4 | 3.3 | 1.0 | versicolor | NA |
5.5 | 2.4 | 3.7 | 1.0 | versicolor | NA |
6.9 | 3.1 | 5.1 | 2.3 | virginica | NA |
5.8 | 2.7 | 5.1 | 1.9 | virginica | NA |
5.8 | 2.7 | 5.1 | 1.9 | virginica | NA |
5.8 | 2.8 | 5.1 | 2.4 | virginica | NA |
6.4 | 3.1 | 5.5 | 1.8 | virginica | NA |
5.7 | 3.0 | 4.2 | 1.2 | versicolor | NA |
4.9 | 2.4 | 3.3 | 1.0 | versicolor | NA |
5.5 | 2.6 | 4.4 | 1.2 | versicolor | NA |
6.3 | 2.5 | 4.9 | 1.5 | versicolor | NA |
6.9 | 3.1 | 4.9 | 1.5 | versicolor | NA |
6.2 | 3.4 | 5.4 | 2.3 | virginica | NA |
7.2 | 3.2 | 6.0 | 1.8 | virginica | NA |
6.9 | 3.1 | 5.1 | 2.3 | virginica | NA |
5.6 | 2.8 | 4.9 | 2.0 | virginica | NA |
7.4 | 2.8 | 6.1 | 1.9 | virginica | NA |
or
list.files("D:/data/iris/", pattern = 'iris\\d.csv', full.names = T) %>%
readr::read_csv(show_col_types = F, #readr需2.0.0以上
id = "source")
source | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | gender |
---|---|---|---|---|---|---|
D:/data/iris/iris1.csv | 6.3 | 3.3 | 6.0 | 2.5 | virginica | 1 |
D:/data/iris/iris1.csv | 5.8 | 2.7 | 5.1 | 1.9 | virginica | 1 |
D:/data/iris/iris1.csv | 6.3 | 2.9 | 5.6 | 1.8 | virginica | 1 |
D:/data/iris/iris1.csv | 6.5 | 3.0 | 5.8 | 2.2 | virginica | 1 |
D:/data/iris/iris1.csv | 4.9 | 2.5 | 4.5 | 1.7 | virginica | 1 |
D:/data/iris/iris1.csv | 6.8 | 3.0 | 5.5 | 2.1 | virginica | 1 |
D:/data/iris/iris1.csv | 7.7 | 2.8 | 6.7 | 2.0 | virginica | 1 |
D:/data/iris/iris1.csv | 6.7 | 3.3 | 5.7 | 2.1 | virginica | 1 |
D:/data/iris/iris1.csv | 6.1 | 3.0 | 4.9 | 1.8 | virginica | 1 |
D:/data/iris/iris1.csv | 6.4 | 2.8 | 5.6 | 2.1 | virginica | 1 |
D:/data/iris/iris1.csv | 7.2 | 3.0 | 5.8 | 1.6 | virginica | 1 |
D:/data/iris/iris1.csv | 7.4 | 2.8 | 6.1 | 1.9 | virginica | 1 |
D:/data/iris/iris1.csv | 7.9 | 3.8 | 6.4 | 2.0 | virginica | 1 |
D:/data/iris/iris1.csv | 6.4 | 2.8 | 5.6 | 2.2 | virginica | 1 |
D:/data/iris/iris1.csv | 6.3 | 2.8 | 5.1 | 1.5 | virginica | 1 |
D:/data/iris/iris1.csv | 7.7 | 3.0 | 6.1 | 2.3 | virginica | 1 |
D:/data/iris/iris1.csv | 6.0 | 3.0 | 4.8 | 1.8 | virginica | 1 |
D:/data/iris/iris1.csv | 6.7 | 3.1 | 5.6 | 2.4 | virginica | 1 |
D:/data/iris/iris1.csv | 6.9 | 3.1 | 5.1 | 2.3 | virginica | 1 |
D:/data/iris/iris1.csv | 5.8 | 2.7 | 5.1 | 1.9 | virginica | 1 |
D:/data/iris/iris1.csv | 6.7 | 3.3 | 5.7 | 2.5 | virginica | 1 |
D:/data/iris/iris1.csv | 6.3 | 2.5 | 5.0 | 1.9 | virginica | 1 |
D:/data/iris/iris1.csv | 6.2 | 3.4 | 5.4 | 2.3 | virginica | 1 |
D:/data/iris/iris2.csv | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.8 | 3.4 | 1.6 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.8 | 3.0 | 1.4 | 0.1 | setosa | 1 |
D:/data/iris/iris2.csv | 5.8 | 4.0 | 1.2 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.7 | 4.4 | 1.5 | 0.4 | setosa | 1 |
D:/data/iris/iris2.csv | 5.1 | 3.8 | 1.5 | 0.3 | setosa | 1 |
D:/data/iris/iris2.csv | 5.4 | 3.4 | 1.7 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.1 | 3.7 | 1.5 | 0.4 | setosa | 1 |
D:/data/iris/iris2.csv | 4.6 | 3.6 | 1.0 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.8 | 3.4 | 1.9 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.0 | 3.0 | 1.6 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.0 | 3.4 | 1.6 | 0.4 | setosa | 1 |
D:/data/iris/iris2.csv | 5.2 | 3.5 | 1.5 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.2 | 3.4 | 1.4 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.4 | 3.4 | 1.5 | 0.4 | setosa | 1 |
D:/data/iris/iris2.csv | 5.2 | 4.1 | 1.5 | 0.1 | setosa | 1 |
D:/data/iris/iris2.csv | 5.5 | 4.2 | 1.4 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.9 | 3.1 | 1.5 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.5 | 3.5 | 1.3 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.9 | 3.6 | 1.4 | 0.1 | setosa | 1 |
D:/data/iris/iris2.csv | 4.4 | 3.0 | 1.3 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.1 | 3.4 | 1.5 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 4.4 | 3.2 | 1.3 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.1 | 3.8 | 1.9 | 0.4 | setosa | 1 |
D:/data/iris/iris2.csv | 5.1 | 3.8 | 1.6 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.3 | 3.7 | 1.5 | 0.2 | setosa | 1 |
D:/data/iris/iris2.csv | 5.0 | 3.3 | 1.4 | 0.2 | setosa | 1 |
D:/data/iris/iris3.csv | 7.0 | 3.2 | 4.7 | 1.4 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.4 | 3.2 | 4.5 | 1.5 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.9 | 3.1 | 4.9 | 1.5 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.5 | 2.3 | 4.0 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.5 | 2.8 | 4.6 | 1.5 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.7 | 2.8 | 4.5 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 4.9 | 2.4 | 3.3 | 1.0 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.2 | 2.7 | 3.9 | 1.4 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.0 | 2.0 | 3.5 | 1.0 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.9 | 3.0 | 4.2 | 1.5 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.6 | 2.9 | 3.6 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.7 | 3.1 | 4.4 | 1.4 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.9 | 3.2 | 4.8 | 1.8 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.3 | 2.5 | 4.9 | 1.5 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.7 | 3.0 | 5.0 | 1.7 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.0 | 2.7 | 5.1 | 1.6 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.0 | 3.4 | 4.5 | 1.6 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.7 | 3.1 | 4.7 | 1.5 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.3 | 2.3 | 4.4 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.6 | 3.0 | 4.1 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.8 | 2.6 | 4.0 | 1.2 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.0 | 2.3 | 3.3 | 1.0 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.7 | 2.9 | 4.2 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 6.2 | 2.9 | 4.3 | 1.3 | versicolor | 1 |
D:/data/iris/iris3.csv | 5.1 | 2.5 | 3.0 | 1.1 | versicolor | 1 |
任务07:把指定目录中的所有单sheet的.xlsx文件读入内存并合并为一个数据集(数据集结构和变量名必须相同,如不相同需做前期处理) > 把D:/data/iris/目录下的以s或v开头的.xlsx读入,并同步将子集合并为一个数据集。
list.files("D:/data/iris/", pattern = '^[s|v].+\\.xlsx$', full.names = T) %>%
map_dfr(read.xlsx)
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.0 | 3.4 | 1.6 | 0.4 | setosa |
4.6 | 3.2 | 1.4 | 0.2 | setosa |
5.1 | 3.4 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.1 | 3.5 | 1.4 | 0.3 | setosa |
5.7 | 3.0 | 4.2 | 1.2 | versicolor |
4.9 | 2.4 | 3.3 | 1.0 | versicolor |
5.5 | 2.6 | 4.4 | 1.2 | versicolor |
6.3 | 2.5 | 4.9 | 1.5 | versicolor |
6.9 | 3.1 | 4.9 | 1.5 | versicolor |
6.2 | 3.4 | 5.4 | 2.3 | virginica |
7.2 | 3.2 | 6.0 | 1.8 | virginica |
6.9 | 3.1 | 5.1 | 2.3 | virginica |
5.6 | 2.8 | 4.9 | 2.0 | virginica |
7.4 | 2.8 | 6.1 | 1.9 | virginica |
任务08:把指定目录中的所有多sheet的.xlsx文件读入内存并合并为一个数据集(数据集结构和变量名必须相同,如不相同需做前期处理) > 把D:/data/iris/目录下的iris1.xlsx中的所有sheets读入,并同步将子集合并为一个数据集。
#用sheet名读取各个sheet(通用)
file <- "D:/data/iris/iris1.xlsx"
#也可以用sheet的顺序号map
getSheetNames(file) %>%
map_dfr(~read.xlsx(file, sheet = .))%>%
as_tibble()
x |
---|
5, 4.6, 5.1, 5, 5.1, 3.4, 3.2, 3.4, 3.6, 3.5, 1.6, 1.4, 1.5, 1.4, 1.4, 0.4, 0.2, 0.2, 0.2, 0.3, 1, 1, 1, 1, 1 |
5.7, 4.9, 5.5, 6.3, 6.9, 3, 2.4, 2.6, 2.5, 3.1, 4.2, 3.3, 4.4, 4.9, 4.9, 1.2, 1, 1.2, 1.5, 1.5, 2, 2, 2, 2, 2 |
6.2, 7.2, 6.9, 5.6, 7.4, 3.4, 3.2, 3.1, 2.8, 2.8, 5.4, 6, 5.1, 4.9, 6.1, 2.3, 1.8, 2.3, 2, 1.9, 3, 3, 3, 3, 3 |
5, 4.6, 5.1, 5, 5.1, 3.4, 3.2, 3.4, 3.6, 3.5, 1.6, 1.4, 1.5, 1.4, 1.4, 0.4, 0.2, 0.2, 0.2, 0.3, 1, 1, 1, 1, 1 |
5.7, 4.9, 5.5, 6.3, 6.9, 3, 2.4, 2.6, 2.5, 3.1, 4.2, 3.3, 4.4, 4.9, 4.9, 1.2, 1, 1.2, 1.5, 1.5, 2, 2, 2, 2, 2 |
6.2, 7.2, 6.9, 5.6, 7.4, 3.4, 3.2, 3.1, 2.8, 2.8, 5.4, 6, 5.1, 4.9, 6.1, 2.3, 1.8, 2.3, 2, 1.9, 3, 3, 3, 3, 3 |
5, 4.6, 5.1, 5, 5.1, 3.4, 3.2, 3.4, 3.6, 3.5, 1.6, 1.4, 1.5, 1.4, 1.4, 0.4, 0.2, 0.2, 0.2, 0.3, 1, 1, 1, 1, 1 |
5.7, 4.9, 5.5, 6.3, 6.9, 3, 2.4, 2.6, 2.5, 3.1, 4.2, 3.3, 4.4, 4.9, 4.9, 1.2, 1, 1.2, 1.5, 1.5, 2, 2, 2, 2, 2 |
6.2, 7.2, 6.9, 5.6, 7.4, 3.4, 3.2, 3.1, 2.8, 2.8, 5.4, 6, 5.1, 4.9, 6.1, 2.3, 1.8, 2.3, 2, 1.9, 3, 3, 3, 3, 3 |
or
library(tidyverse)
file <- "D:/data/iris/iris2.xlsx"
readxl::excel_sheets(file) %>%
purrr::set_names() %>%
map_dfr(~openxlsx::read.xlsx(file, sheet = .), .id = "sets")
sets | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
Sheet 1 | 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
Sheet 1 | 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
Sheet 1 | 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
Sheet 1 | 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
Sheet 1 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
Sheet 1 | 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
Sheet 1 | 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
Sheet 1 | 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
Sheet 1 | 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
Sheet 1 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
Sheet 1 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
Sheet 1 | 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
Sheet 1 | 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
Sheet 1 | 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
Sheet 1 | 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
Sheet 1 | 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
Sheet 1 | 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
Sheet 1 | 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
Sheet 1 | 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
Sheet 1 | 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
Sheet 1 | 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
Sheet 1 | 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
Sheet 1 | 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
Sheet 1 | 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
Sheet 1 | 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
Sheet 1 | 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
Sheet 1 | 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
Sheet 1 | 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
Sheet 1 | 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
Sheet 1 | 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
Sheet 1 | 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
Sheet 1 | 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
Sheet 1 | 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
Sheet 1 | 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
Sheet 1 | 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
Sheet 1 | 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
Sheet 1 | 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
Sheet 1 | 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
Sheet 1 | 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
Sheet 1 | 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
Sheet 1 | 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
Sheet 1 | 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
Sheet 1 | 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
Sheet 1 | 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
Sheet 1 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
Sheet 1 | 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
Sheet 1 | 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
Sheet 1 | 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
Sheet 1 | 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
Sheet 1 | 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
任务09:把指定目录中的所有多sheet的多个.xlsx文件读入内存并合并为一个数据集(数据集结构和变量名必须相同,如不相同需做前期处理) > 把D:/data/iris/目录下的iris*.xlsx中的所有sheets读入,并同步将子集合并为一个数据集。
library(tidyverse)
library(readxl)
# # 获取多个Excel文件的路径
files <- list.files("D:/data/iris",
pattern = "iris\\d.xlsx",
full.names = TRUE)
# # 自定义函数读取一个excel的多个sheet
read_exclsheets <- function(file) {
map_dfr(set_names(excel_sheets(file)),
~ read_xlsx(file, sheet = .x),
.id = "sheet_name")
}
# 文件名和sheet名会出现在数据框的相应列
df <- map_dfr(set_names(files),
read_exclsheets,
.id = "path") %>%
mutate(file_name = str_extract(path, "\\w+(?=\\.)"), # "(?<=/)\\w+(?=\\.)"
.before = path)
任务10:把指定目录下的所有.csv文件读入,并合并数据集,写回到指定目录生成一个.csv
把D:/data/iris/目录下所有.csv文件读入,并将数据集合并,然后写回到原目录,生成iris.csv
list.files("D:/data/iris/", pattern = '.csv', full.names = T) %>%
map_dfr(read.csv, header = T) %>%
write.csv("D:/data/iris.csv", row.names = F)
任务11:把指定目录下的所有.csv文件读入,并合并数据集,写回到指定目录生成一个.xlsx
把D:/data/iris/目录下所有.csv文件读入,并将数据集合并,然后写回到原目录,生成iris.xlsx
list.files("D:/data/iris/", pattern = '.csv', full.names = T) %>%
map_dfr(read.csv, header = T) %>%
write.xlsx("D:/data/iris.xlsx", row.names = F)
任务12:把指定目录下的所有.csv文件读入,并合并数据集,写回到指定目录生成一个.xlsx,并将每个数据集写到一个以该文件名为sheet名的sheet里。
把D:/data/iris/目录下所有.csv文件读入,并将数据集合并,然后写回到D:/data目录,生成iris.xlsx。把setosa.csv,versicolor.csv,virginica.csv分别写到sheet名分别为setosa,versicolor,virginica的sheet里。
filelist <- list.files("D:/data/iris/", pattern = '.csv') %>% gsub(".csv", "", .)
set <- list.files("D:/data/iris/", pattern = '.csv', full.names = T) %>%
map(read.csv, header = T)
names(set) <- filelist
write.xlsx(set, "D:/data/iris1.xlsx")
任务13:把list里的每个数据集写出到一个.csv文件。
把iris_list合并成一个数据框,写出生成iris.csv。
iris_list <- iris %>%
split(.$Species)
iris_list %>%
map_dfr(~ .) %>%
write_csv("D:/data/iris.csv")
#只提取部分变量
iris_list %>%
map_dfr(., `[`, c("Sepal.Length", "Sepal.Width")) %>%
write_csv("D:/data/iris.csv")
任务14:把list里的每个数据集写出到多个.csv文件,且.csv的文件名为各数据集的名。
把iris_list合并成一个数据框,写出生成setosa.csv,versicolor.csv,virginica.csv。
iris_list <- iris %>%
split(.$Species)
walk(1:length(iris_list),
~ write_csv(iris_list[[.]],
paste0("D:/data/iris/", names(iris_list)[.], ".csv")))
or
names(iris_list) %>%
walk(
~ write_csv(iris_list[[.x]],
file = str_glue("D:/data/iris/{.x}.csv")))
任务15:把list里的每个数据集写出到一个.xlsx文件,且sheets名分别为各数据集的名。
把iris_list合并成一个数据框,写出生成一个iris2.xlsx数据集,且sheets名分别为setosa,versicolor,virginica。
iris_list <- iris %>%
split(.$Species)
openxlsx::write.xlsx(iris_list, "D:/data/iris/iris2.xlsx")