tidyverse家族

接下來陸續介紹了tidyverse家族，家庭主要成員包括

功能包

數據處理王者 dplyr
數據轉換專家 tidyr
數據加載利器 readr
繪圖顏值擔當 ggplot2
循環加速器 purrr
強化數據框 tibble
字符串處理 stringr
因子處理 forcats

dplyr

dplyr 定義了資料處理的語法，包括6個主要的函數:

select()
mutate()
filter()
summarise()
group_by()
arrange()

其他： + count() + left_join() + full_join()

主要的函數

library(dplyr)
library(tidyverse) # scoped 函数
library(ggplot2)

select()

用途：select()函式可選要分析的欄位。使用方式：select(資料名稱,欄位條件1,欄位條件2,…)

# 以ggplot2 自帶 diamonds dataset 為例
head(diamonds)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

# 查看 diamonds 的簡單統計量
summary(diamonds)

##      carat               cut        color        clarity          depth      
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00  
##                                     J: 2808   (Other): 2531                  
##      table           price             x                y         
##  Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720  
##  Median :57.00   Median : 2401   Median : 5.700   Median : 5.710  
##  Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735  
##  3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540  
##  Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900  
##                                                                   
##        z         
##  Min.   : 0.000  
##  1st Qu.: 2.910  
##  Median : 3.530  
##  Mean   : 3.539  
##  3rd Qu.: 4.040  
##  Max.   :31.800  
##

選擇連續欄位

我們只選擇 carat, cut, color, clarity, depth, table 這些欄位

diamonds_select1 <- diamonds %>%
  select(carat:table)
head(diamonds_select1,5)

## # A tibble: 5 x 6
##   carat cut     color clarity depth table
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl>
## 1 0.23  Ideal   E     SI2      61.5    55
## 2 0.21  Premium E     SI1      59.8    61
## 3 0.23  Good    E     VS1      56.9    65
## 4 0.290 Premium I     VS2      62.4    58
## 5 0.31  Good    J     SI2      63.3    58

選擇不連續的欄位

這裏我們不想選擇 cut 和 price 這兩個欄位

diamonds_select2 <- diamonds %>%
  select(carat, color:z,-price)
head(diamonds_select2,5)

## # A tibble: 5 x 8
##   carat color clarity depth table     x     y     z
##   <dbl> <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.23  E     SI2      61.5    55  3.95  3.98  2.43
## 2 0.21  E     SI1      59.8    61  3.89  3.84  2.31
## 3 0.23  E     VS1      56.9    65  4.05  4.07  2.31
## 4 0.290 I     VS2      62.4    58  4.2   4.23  2.63
## 5 0.31  J     SI2      63.3    58  4.34  4.35  2.75

重新排列欄位的順序，將資料集排列成我們需要的順序

我們希望把x,y,z放到前面，把price放到最後一欄

diamonds_select4 <- diamonds %>%
  select(x,y,z, everything(), price)
head(diamonds_select4)

## # A tibble: 6 x 10
##       x     y     z carat cut       color clarity depth table price
##   <dbl> <dbl> <dbl> <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int>
## 1  3.95  3.98  2.43 0.23  Ideal     E     SI2      61.5    55   326
## 2  3.89  3.84  2.31 0.21  Premium   E     SI1      59.8    61   326
## 3  4.05  4.07  2.31 0.23  Good      E     VS1      56.9    65   327
## 4  4.2   4.23  2.63 0.290 Premium   I     VS2      62.4    58   334
## 5  4.34  4.35  2.75 0.31  Good      J     SI2      63.3    58   335
## 6  3.94  3.96  2.48 0.24  Very Good J     VVS2     62.8    57   336

mutate()

mutate 主要用來新增欄位

直接新增

# 新建一個欄位，可以當作編號欄位
diamonds_mutate <- diamonds %>%
  mutate(num = 1:length(diamonds$cut))
head(diamonds_mutate)

## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z   num
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     1
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     2
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     3
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     4
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     5
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     6

新增運算過後的欄位

透過已知欄位新增欄位

diamonds_mutate2 <- diamonds %>% 
  mutate(newprice = price * 0.95)
head(diamonds_mutate2)

## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z newprice
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     310.
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     310.
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     311.
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     317.
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     318.
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     319.

新建多个欄位，包含剛剛新建出的欄位

diamonds_mutate3 <- diamonds %>% 
  mutate(newprice = price * 0.95, newtotal = newprice + 100)
head(diamonds_mutate3)

## # A tibble: 6 x 12
##   carat cut       color clarity depth table price     x     y     z newprice
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     310.
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     310.
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     311.
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     317.
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     318.
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     319.
## # … with 1 more variable: newtotal <dbl>

新增條件欄位

使用if_else新增條件欄位

#使用if_else新增条件列
diamonds_mutate4 <- diamonds %>% 
  mutate(type = if_else(price > 3933, "> mean", "<= mean"))
head(diamonds_mutate4)

## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z type   
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>  
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 <= mean
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 <= mean
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31 <= mean
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 <= mean
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75 <= mean
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 <= mean

filter()

filter 主要就是以 row的方向進行選擇

單一條件的filter

diamonds_filter <- diamonds %>%
  filter(price > 1000)
head(diamonds_filter)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.7  Ideal     E     SI1      62.5    57  2757  5.7   5.72  3.57
## 2  0.86 Fair      E     SI2      55.1    69  2757  6.45  6.33  3.52
## 3  0.7  Ideal     G     VS2      61.6    56  2757  5.7   5.67  3.5 
## 4  0.71 Very Good E     VS2      62.4    57  2759  5.68  5.73  3.56
## 5  0.78 Very Good G     SI2      63.8    56  2759  5.81  5.85  3.72
## 6  0.7  Good      E     VS2      57.5    58  2759  5.85  5.9   3.38

多個條件的 filter

單一屬性中的多個不同類型的選擇，可以用%in%

# %in%
diamonds_filter2 <- diamonds %>%
  filter(cut %in% c('Ideal','Very Good','Good'))
head(diamonds_filter2)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 4  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
## 5  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
## 6  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53

多個屬性的，多個條件的選擇。

# 每個屬性的條件用逗號隔開即可
diamonds_filter3 <- diamonds %>%
  filter(cut %in% c('Ideal','Very Good','Good'), price>3000)
head(diamonds_filter3)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.9  Very Good I     SI2      62.5    58  3001  6.1   6.15  3.83
## 2  0.9  Ideal     I     SI2      62      56  3001  6.16  6.2   3.83
## 3  0.72 Very Good D     SI1      61.5    58  3001  5.74  5.78  3.54
## 4  0.7  Good      D     VS1      63.6    60  3001  5.61  5.52  3.54
## 5  0.7  Very Good D     VS1      63.4    59  3001  5.58  5.55  3.53
## 6  0.6  Ideal     G     VVS1     62.1    56  3001  5.42  5.43  3.37

# 也可使用 & 和 |等符號串連邏輯
diamonds_filter4 <- diamonds %>%
  filter(cut %in% c('Ideal','Very Good','Good') | price>2000 | color =='D')
head(diamonds_filter4)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 4  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
## 5  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
## 6  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53

篩選rows，可以看出來filter函數中的參數為邏輯值即可，這就非常強大，能夠篩選出複雜的多條件。同樣的，還支持很多運算符： ==，>，>=等等＆，|，！，xor（） is.na（） between（），near（）

summarise()

summarise()函式用來計算統計值單獨使用時一般就是對資料進行一個整體的了解，與group_by()合併使用時就是對每個分組的統計量進行了解

# 我們想查看的統計量是共有多少row，最高價，平均depth，cut的種類
diamonds_sum <- diamonds %>% 
  summarise(nrow = n(), max_price = max(price), mean_depth = mean(depth), nd_cut = n_distinct(cut))
head(diamonds_sum)

## # A tibble: 1 x 4
##    nrow max_price mean_depth nd_cut
##   <int>     <int>      <dbl>  <int>
## 1 53940     18823       61.7      5

summarise函式能對輸入的資料集指定不同屬性進行聚合。除了像上面常用的聚合有求和，求平均值，還支持：

Center: mean()，median()
Spread: sd()， IQR()，mad()
Range: min()，max()， quantile()
Position: first()，last()， nth()
Count: n()， n_distinct()
Logical: any()， all()

group_by()

group_by()函數的功能為設定分組依據，通常會與summarise()函式合併使用

diamonds_groupby <- diamonds %>%
  group_by(color) %>%
  summarise(nrow = n(), mean_price = mean(price))
head(diamonds_groupby)

## # A tibble: 6 x 3
##   color  nrow mean_price
##   <ord> <int>      <dbl>
## 1 D      6775      3170.
## 2 E      9797      3077.
## 3 F      9542      3725.
## 4 G     11292      3999.
## 5 H      8304      4487.
## 6 I      5422      5092.

summarise和groupby的進階用法

做一些簡單的線型回歸，然後提取統計量

# 用常用到的mtcars
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars %>%
  group_by(cyl) %>%
  group_modify(
    ~ broom::tidy(lm(mpg ~ wt, data = .))
  )

## # A tibble: 6 x 6
## # Groups:   cyl [3]
##     cyl term        estimate std.error statistic    p.value
##   <dbl> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1     4 (Intercept)    39.6      4.35       9.10 0.00000777
## 2     4 wt             -5.65     1.85      -3.05 0.0137    
## 3     6 (Intercept)    28.4      4.18       6.79 0.00105   
## 4     6 wt             -2.78     1.33      -2.08 0.0918    
## 5     8 (Intercept)    23.9      3.01       7.94 0.00000405
## 6     8 wt             -2.19     0.739     -2.97 0.0118

# summarise()的進階用法
mtcars %>%
  group_by(cyl) %>%
  summarise(
    broom::tidy(lm(mpg ~ wt))
  )

## `summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.

## # A tibble: 6 x 6
## # Groups:   cyl [3]
##     cyl term        estimate std.error statistic    p.value
##   <dbl> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1     4 (Intercept)    39.6      4.35       9.10 0.00000777
## 2     4 wt             -5.65     1.85      -3.05 0.0137    
## 3     6 (Intercept)    28.4      4.18       6.79 0.00105   
## 4     6 wt             -2.78     1.33      -2.08 0.0918    
## 5     8 (Intercept)    23.9      3.01       7.94 0.00000405
## 6     8 wt             -2.19     0.739     -2.97 0.0118

arrange()

排序功能，預設為遞增排序

diamonds_arrange <- diamonds %>%
  group_by(color) %>%
  summarise(nrow = n(), mean_price = mean(price))%>%
  arrange(desc(mean_price))
head(diamonds_arrange)

## # A tibble: 6 x 3
##   color  nrow mean_price
##   <ord> <int>      <dbl>
## 1 J      2808      5324.
## 2 I      5422      5092.
## 3 H      8304      4487.
## 4 G     11292      3999.
## 5 F      9542      3725.
## 6 D      6775      3170.

按多個屬性排序

diamonds_arrange2 <- diamonds %>%
  arrange(price & desc(depth))
head(diamonds_arrange2)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

count()

其功能就是統計某個變量中各組出現的次數

統計資料筆數

統計所有的資料筆數

diamonds_count <- diamonds %>%
  count()
head(diamonds_count)

## # A tibble: 1 x 1
##       n
##   <int>
## 1 53940

統計類別變數，各個類別的數量

diamonds_count2 <- diamonds %>%
  count(cut)
diamonds_count2

## # A tibble: 5 x 2
##   cut           n
##   <ord>     <int>
## 1 Fair       1610
## 2 Good       4906
## 3 Very Good 12082
## 4 Premium   13791
## 5 Ideal     21551

和 table() 所達到的目的一樣

table(diamonds$cut)

## 
##      Fair      Good Very Good   Premium     Ideal 
##      1610      4906     12082     13791     21551

在count（）中創造新變量, 並統計變量

diamonds_count3 <- diamonds %>%
  count(new = 7 * (price %% 10))
head(diamonds_count3)

## # A tibble: 6 x 2
##     new     n
##   <dbl> <int>
## 1     0  5406
## 2     7  5012
## 3    14  5586
## 4    21  5266
## 5    28  5475
## 6    35  5419

add_count()

當你想要加入一個統計欄位的時候可以用比較傳統的：group_by() + mutate() 也可以用更簡單的add_count()

diamonds_count4 <- diamonds %>%
  group_by(color) %>%
  mutate(total_col = n()) %>%
  ungroup()
head(diamonds_count4)

## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z total_col
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>     <int>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43      9797
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31      9797
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31      9797
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63      5422
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75      2808
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48      2808

add_count()的用法，最後會有一個“n”的欄位就是統計結果

diamonds_count5 <- diamonds %>%
  add_count(color)
head(diamonds_count5)

## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z     n
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43  9797
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31  9797
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31  9797
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63  5422
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75  2808
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48  2808

scoped 函数

dplyr的一些函數（mutate（select（）等）），事實上，這些函數加上後綴_all，_at，_if，形成三組變體函數，可以方便對特定的子集進行操作。

對數據框所有列操作，可以用_all
對數據框指定的幾列操作，可以用_at
對數據框符合條件的幾列進行操作，可以用_if

Operate	_all	_at	_if
select()	select_all()	select_at()	select_if()
mutate()	mutate_all()	mutate_at()	mutate_if()
rename()	rename_all()	rename_at()	rename_if()
arrange()	arrange_all()	arrange_at()	arrange_if()
filter()	filter_all()	filter_at()	filter_if()
distinct()	distinct_all()	distinct_at()	distinct_if()
group_by()	group_by_all()	group_by_at()	group_by_if()
summarise()	summarise_all()	summarise_at()	summarise_if()
map()	map_all()	map_at()	map_if()
modify()	modify_all()	modify_at()	modify_if()

select_if()

可以用 select_if 選擇某一數據類型

# 下看一下資料結構
str(diamonds)

## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

# 挑選出所有factor型態的資料
diamonds_select3 <- diamonds %>%
   select_if(is.factor)
head(diamonds_select3,5)

## # A tibble: 5 x 3
##   cut     color clarity
##   <ord>   <ord> <ord>  
## 1 Ideal   E     SI2    
## 2 Premium E     SI1    
## 3 Good    E     VS1    
## 4 Premium I     VS2    
## 5 Good    J     SI2

R: tidyverse家族 dplyr包

skye ye

3/16/2021