tidyverse家族

接下來陸續介紹了tidyverse家族,家庭主要成員包括

功能 包

dplyr

dplyr 定義了資料處理的語法,包括6個主要的函數:

其他: + count() + left_join() + full_join()

主要的函數

主要的函數

library(dplyr)
library(tidyverse) # scoped 函数
library(ggplot2)

select()

用途:select()函式可選要分析的欄位。 使用方式:select(資料名稱,欄位條件1,欄位條件2,…)

# 以ggplot2 自帶 diamonds dataset 為例
head(diamonds)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
# 查看 diamonds 的簡單統計量
summary(diamonds)
##      carat               cut        color        clarity          depth      
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00  
##                                     J: 2808   (Other): 2531                  
##      table           price             x                y         
##  Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720  
##  Median :57.00   Median : 2401   Median : 5.700   Median : 5.710  
##  Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735  
##  3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540  
##  Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900  
##                                                                   
##        z         
##  Min.   : 0.000  
##  1st Qu.: 2.910  
##  Median : 3.530  
##  Mean   : 3.539  
##  3rd Qu.: 4.040  
##  Max.   :31.800  
## 

選擇連續欄位

我們只選擇 carat, cut, color, clarity, depth, table 這些欄位

diamonds_select1 <- diamonds %>%
  select(carat:table)
head(diamonds_select1,5)
## # A tibble: 5 x 6
##   carat cut     color clarity depth table
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl>
## 1 0.23  Ideal   E     SI2      61.5    55
## 2 0.21  Premium E     SI1      59.8    61
## 3 0.23  Good    E     VS1      56.9    65
## 4 0.290 Premium I     VS2      62.4    58
## 5 0.31  Good    J     SI2      63.3    58

選擇不連續的欄位

這裏我們不想選擇 cut 和 price 這兩個欄位

diamonds_select2 <- diamonds %>%
  select(carat, color:z,-price)
head(diamonds_select2,5)
## # A tibble: 5 x 8
##   carat color clarity depth table     x     y     z
##   <dbl> <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.23  E     SI2      61.5    55  3.95  3.98  2.43
## 2 0.21  E     SI1      59.8    61  3.89  3.84  2.31
## 3 0.23  E     VS1      56.9    65  4.05  4.07  2.31
## 4 0.290 I     VS2      62.4    58  4.2   4.23  2.63
## 5 0.31  J     SI2      63.3    58  4.34  4.35  2.75

重新排列欄位的順序,將資料集排列成我們需要的順序

我們希望把x,y,z放到前面,把price放到最後一欄

diamonds_select4 <- diamonds %>%
  select(x,y,z, everything(), price)
head(diamonds_select4)
## # A tibble: 6 x 10
##       x     y     z carat cut       color clarity depth table price
##   <dbl> <dbl> <dbl> <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int>
## 1  3.95  3.98  2.43 0.23  Ideal     E     SI2      61.5    55   326
## 2  3.89  3.84  2.31 0.21  Premium   E     SI1      59.8    61   326
## 3  4.05  4.07  2.31 0.23  Good      E     VS1      56.9    65   327
## 4  4.2   4.23  2.63 0.290 Premium   I     VS2      62.4    58   334
## 5  4.34  4.35  2.75 0.31  Good      J     SI2      63.3    58   335
## 6  3.94  3.96  2.48 0.24  Very Good J     VVS2     62.8    57   336

mutate()

mutate 主要用來新增欄位

直接新增

# 新建一個欄位,可以當作編號欄位
diamonds_mutate <- diamonds %>%
  mutate(num = 1:length(diamonds$cut))
head(diamonds_mutate)
## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z   num
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     1
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     2
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     3
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     4
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     5
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     6

新增運算過後的欄位

透過已知欄位新增欄位

diamonds_mutate2 <- diamonds %>% 
  mutate(newprice = price * 0.95)
head(diamonds_mutate2)
## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z newprice
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     310.
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     310.
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     311.
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     317.
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     318.
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     319.

新建多个欄位,包含剛剛新建出的欄位

diamonds_mutate3 <- diamonds %>% 
  mutate(newprice = price * 0.95, newtotal = newprice + 100)
head(diamonds_mutate3)
## # A tibble: 6 x 12
##   carat cut       color clarity depth table price     x     y     z newprice
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43     310.
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31     310.
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31     311.
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63     317.
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75     318.
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48     319.
## # … with 1 more variable: newtotal <dbl>

新增條件欄位

使用if_else新增條件欄位

#使用if_else新增条件列
diamonds_mutate4 <- diamonds %>% 
  mutate(type = if_else(price > 3933, "> mean", "<= mean"))
head(diamonds_mutate4)
## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z type   
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>  
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 <= mean
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 <= mean
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31 <= mean
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 <= mean
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75 <= mean
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 <= mean

filter()

filter 主要就是以 row的方向進行選擇

單一條件的filter

diamonds_filter <- diamonds %>%
  filter(price > 1000)
head(diamonds_filter)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.7  Ideal     E     SI1      62.5    57  2757  5.7   5.72  3.57
## 2  0.86 Fair      E     SI2      55.1    69  2757  6.45  6.33  3.52
## 3  0.7  Ideal     G     VS2      61.6    56  2757  5.7   5.67  3.5 
## 4  0.71 Very Good E     VS2      62.4    57  2759  5.68  5.73  3.56
## 5  0.78 Very Good G     SI2      63.8    56  2759  5.81  5.85  3.72
## 6  0.7  Good      E     VS2      57.5    58  2759  5.85  5.9   3.38

多個條件的 filter

單一屬性中的多個不同類型的選擇,可以用%in%

# %in%
diamonds_filter2 <- diamonds %>%
  filter(cut %in% c('Ideal','Very Good','Good'))
head(diamonds_filter2)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 4  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
## 5  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
## 6  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53

多個屬性的,多個條件的選擇。

# 每個屬性的條件用逗號隔開即可
diamonds_filter3 <- diamonds %>%
  filter(cut %in% c('Ideal','Very Good','Good'), price>3000)
head(diamonds_filter3)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.9  Very Good I     SI2      62.5    58  3001  6.1   6.15  3.83
## 2  0.9  Ideal     I     SI2      62      56  3001  6.16  6.2   3.83
## 3  0.72 Very Good D     SI1      61.5    58  3001  5.74  5.78  3.54
## 4  0.7  Good      D     VS1      63.6    60  3001  5.61  5.52  3.54
## 5  0.7  Very Good D     VS1      63.4    59  3001  5.58  5.55  3.53
## 6  0.6  Ideal     G     VVS1     62.1    56  3001  5.42  5.43  3.37
# 也可使用 & 和 |等符號串連邏輯
diamonds_filter4 <- diamonds %>%
  filter(cut %in% c('Ideal','Very Good','Good') | price>2000 | color =='D')
head(diamonds_filter4)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 4  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
## 5  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
## 6  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53

篩選rows,可以看出來filter函數中的參數為邏輯值即可,這就非常強大,能夠篩選出複雜的多條件。同樣的,還支持很多運算符: ==,>,>=等等 &,|,!,xor() is.na() between(),near()

summarise()

summarise()函式用來計算統計值 單獨使用時一般就是對資料進行一個整體的了解,與group_by()合併使用時就是對每個分組的統計量進行了解

# 我們想查看的統計量是共有多少row,最高價,平均depth,cut的種類
diamonds_sum <- diamonds %>% 
  summarise(nrow = n(), max_price = max(price), mean_depth = mean(depth), nd_cut = n_distinct(cut))
head(diamonds_sum)
## # A tibble: 1 x 4
##    nrow max_price mean_depth nd_cut
##   <int>     <int>      <dbl>  <int>
## 1 53940     18823       61.7      5

summarise函式能對輸入的資料集指定不同屬性進行聚合。除了像上面常用的聚合有求和,求平均值,還支持:

  • Center: mean(),median()
  • Spread: sd(), IQR(),mad()
  • Range: min(),max(), quantile()
  • Position: first(),last(), nth()
  • Count: n(), n_distinct()
  • Logical: any(), all()

group_by()

group_by()函數的功能為設定分組依據,通常會與summarise()函式合併使用

diamonds_groupby <- diamonds %>%
  group_by(color) %>%
  summarise(nrow = n(), mean_price = mean(price))
head(diamonds_groupby)
## # A tibble: 6 x 3
##   color  nrow mean_price
##   <ord> <int>      <dbl>
## 1 D      6775      3170.
## 2 E      9797      3077.
## 3 F      9542      3725.
## 4 G     11292      3999.
## 5 H      8304      4487.
## 6 I      5422      5092.

summarise和groupby的進階用法

做一些簡單的線型回歸,然後提取統計量

# 用常用到的mtcars
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mtcars %>%
  group_by(cyl) %>%
  group_modify(
    ~ broom::tidy(lm(mpg ~ wt, data = .))
  )
## # A tibble: 6 x 6
## # Groups:   cyl [3]
##     cyl term        estimate std.error statistic    p.value
##   <dbl> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1     4 (Intercept)    39.6      4.35       9.10 0.00000777
## 2     4 wt             -5.65     1.85      -3.05 0.0137    
## 3     6 (Intercept)    28.4      4.18       6.79 0.00105   
## 4     6 wt             -2.78     1.33      -2.08 0.0918    
## 5     8 (Intercept)    23.9      3.01       7.94 0.00000405
## 6     8 wt             -2.19     0.739     -2.97 0.0118
# summarise()的進階用法
mtcars %>%
  group_by(cyl) %>%
  summarise(
    broom::tidy(lm(mpg ~ wt))
  )
## `summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
## # A tibble: 6 x 6
## # Groups:   cyl [3]
##     cyl term        estimate std.error statistic    p.value
##   <dbl> <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1     4 (Intercept)    39.6      4.35       9.10 0.00000777
## 2     4 wt             -5.65     1.85      -3.05 0.0137    
## 3     6 (Intercept)    28.4      4.18       6.79 0.00105   
## 4     6 wt             -2.78     1.33      -2.08 0.0918    
## 5     8 (Intercept)    23.9      3.01       7.94 0.00000405
## 6     8 wt             -2.19     0.739     -2.97 0.0118

arrange()

排序功能,預設為遞增排序

diamonds_arrange <- diamonds %>%
  group_by(color) %>%
  summarise(nrow = n(), mean_price = mean(price))%>%
  arrange(desc(mean_price))
head(diamonds_arrange)
## # A tibble: 6 x 3
##   color  nrow mean_price
##   <ord> <int>      <dbl>
## 1 J      2808      5324.
## 2 I      5422      5092.
## 3 H      8304      4487.
## 4 G     11292      3999.
## 5 F      9542      3725.
## 6 D      6775      3170.

按多個屬性排序

diamonds_arrange2 <- diamonds %>%
  arrange(price & desc(depth))
head(diamonds_arrange2)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

count()

其功能就是統計某個變量中各組出現的次數

統計資料筆數

統計所有的資料筆數

diamonds_count <- diamonds %>%
  count()
head(diamonds_count)
## # A tibble: 1 x 1
##       n
##   <int>
## 1 53940

統計類別變數,各個類別的數量

diamonds_count2 <- diamonds %>%
  count(cut)
diamonds_count2
## # A tibble: 5 x 2
##   cut           n
##   <ord>     <int>
## 1 Fair       1610
## 2 Good       4906
## 3 Very Good 12082
## 4 Premium   13791
## 5 Ideal     21551

table() 所達到的目的一樣

table(diamonds$cut)
## 
##      Fair      Good Very Good   Premium     Ideal 
##      1610      4906     12082     13791     21551

在count()中創造新變量, 並統計變量

diamonds_count3 <- diamonds %>%
  count(new = 7 * (price %% 10))
head(diamonds_count3)
## # A tibble: 6 x 2
##     new     n
##   <dbl> <int>
## 1     0  5406
## 2     7  5012
## 3    14  5586
## 4    21  5266
## 5    28  5475
## 6    35  5419

add_count()

當你想要加入一個統計欄位的時候可以用比較傳統的:group_by() + mutate() 也可以用更簡單的add_count()

diamonds_count4 <- diamonds %>%
  group_by(color) %>%
  mutate(total_col = n()) %>%
  ungroup()
head(diamonds_count4)
## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z total_col
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>     <int>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43      9797
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31      9797
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31      9797
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63      5422
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75      2808
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48      2808

add_count()的用法,最後會有一個“n”的欄位就是統計結果

diamonds_count5 <- diamonds %>%
  add_count(color)
head(diamonds_count5)
## # A tibble: 6 x 11
##   carat cut       color clarity depth table price     x     y     z     n
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43  9797
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31  9797
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31  9797
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63  5422
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75  2808
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48  2808

scoped 函数

dplyr的一些函數(mutate(select()等)),事實上,這些函數加上後綴_all,_at,_if,形成三組變體函數,可以方便對特定的子集進行操作。

Operate _all _at _if
select() select_all() select_at() select_if()
mutate() mutate_all() mutate_at() mutate_if()
rename() rename_all() rename_at() rename_if()
arrange() arrange_all() arrange_at() arrange_if()
filter() filter_all() filter_at() filter_if()
distinct() distinct_all() distinct_at() distinct_if()
group_by() group_by_all() group_by_at() group_by_if()
summarise() summarise_all() summarise_at() summarise_if()
map() map_all() map_at() map_if()
modify() modify_all() modify_at() modify_if()

select_if()

可以用 select_if 選擇某一數據類型

# 下看一下資料結構
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
# 挑選出所有factor型態的資料
diamonds_select3 <- diamonds %>%
   select_if(is.factor)
head(diamonds_select3,5)
## # A tibble: 5 x 3
##   cut     color clarity
##   <ord>   <ord> <ord>  
## 1 Ideal   E     SI2    
## 2 Premium E     SI1    
## 3 Good    E     VS1    
## 4 Premium I     VS2    
## 5 Good    J     SI2