4. The mutate function: adding derived variables

library(dplyr)  # provides %>%, mutate(), arrange() and glimpse()
exam <- read.csv("csv_exam.csv")
dplyr::glimpse(exam)

## Observations: 20
## Variables: 5
## $ id      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ class   <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, ...
## $ math    <int> 50, 60, 45, 30, 25, 50, 80, 90, 20, 50, 65, 45, 46, 48...
## $ english <int> 98, 97, 86, 98, 80, 89, 90, 78, 98, 98, 65, 85, 98, 87...
## $ science <int> 50, 60, 78, 58, 65, 98, 45, 25, 15, 45, 65, 32, 65, 12...

(1) Adding a derived variable

exam %>% 
  mutate(total = math + english + science) %>% 
  head

##   id class math english science total
## 1  1     1   50      98      50   198
## 2  2     1   60      97      60   217
## 3  3     1   45      86      78   209
## 4  4     1   30      98      58   186
## 5  5     2   25      80      65   170
## 6  6     2   50      89      98   237
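Note that mutate() returns a new data frame and leaves its input untouched. The sketch below illustrates this under a few assumptions: exam_small and exam_total are hypothetical names, and the values are the first four rows of the output above typed in by hand, so the block runs without csv_exam.csv.

```r
library(dplyr)

# First four rows of exam, copied by hand from the output above
# (exam_small is a hypothetical name used only for this sketch)
exam_small <- data.frame(
  id      = 1:4,
  class   = c(1, 1, 1, 1),
  math    = c(50, 60, 45, 30),
  english = c(98, 97, 86, 98),
  science = c(50, 60, 78, 58)
)

# mutate() returns a new data frame; exam_small itself is not modified
exam_small %>% mutate(total = math + english + science)
names(exam_small)  # still has no "total" column

# Assign the result to keep the new column
exam_total <- exam_small %>% mutate(total = math + english + science)
exam_total$total
```

If the derived variable is needed later, assign the result to a name, as the mpg exercises further down do with mpg_total.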

(2) Adding several derived variables at once

exam %>%
  mutate(total = math + english + science,
         mean = (math + english + science) / 3) %>%
  head

##   id class math english science total     mean
## 1  1     1   50      98      50   198 66.00000
## 2  2     1   60      97      60   217 72.33333
## 3  3     1   45      86      78   209 69.66667
## 4  4     1   30      98      58   186 62.00000
## 5  5     2   25      80      65   170 56.66667
## 6  6     2   50      89      98   237 79.00000
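The example above writes the sum of the three scores twice. Within a single mutate() call, a later expression can refer to a column created earlier in the same call, so the mean can be built from total. A minimal sketch, assuming the hypothetical name exam_small and values typed in by hand from the output above:

```r
library(dplyr)

# First three rows of exam, copied by hand from the output above
# (exam_small and result are hypothetical names used only for this sketch)
exam_small <- data.frame(
  id      = 1:3,
  math    = c(50, 60, 45),
  english = c(98, 97, 86),
  science = c(50, 60, 78)
)

# total is created first, then reused to compute mean in the same call
result <- exam_small %>%
  mutate(total = math + english + science,
         mean  = total / 3)

result
```

The order of the arguments matters here: mean = total / 3 has to come after total is defined.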

(3) Using ifelse() inside mutate()

exam %>% 
  mutate(test = ifelse(science >= 60, "pass", "fail")) %>%
  head

##   id class math english science test
## 1  1     1   50      98      50 fail
## 2  2     1   60      97      60 pass
## 3  3     1   45      86      78 pass
## 4  4     1   30      98      58 fail
## 5  5     2   25      80      65 pass
## 6  6     2   50      89      98 pass
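ifelse() handles two outcomes. When more than two categories are needed, one option is dplyr's case_when(), which checks its conditions from top to bottom and uses the first one that matches. The sketch below is only an illustration: the grade column and the cut-offs of 80 and 60 are assumptions made up for this example, and the science scores are the first six values from the output above.

```r
library(dplyr)

# Science scores of the first six students, copied from the output above
# (exam_small and graded are hypothetical names used only for this sketch)
exam_small <- data.frame(
  id      = 1:6,
  science = c(50, 60, 78, 58, 65, 98)
)

# Illustrative cut-offs (assumed, not from the original text):
# 80 and above is "A", 60 to 79 is "B", everything else is "C"
graded <- exam_small %>%
  mutate(grade = case_when(
    science >= 80 ~ "A",
    science >= 60 ~ "B",
    TRUE          ~ "C"
  ))

graded
```

The same result could be written with nested ifelse() calls, but the nesting gets harder to read as the number of categories grows.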

(4) Using the newly added variable directly in the dplyr pipeline

exam %>% 
  mutate(total = math + english + science) %>% 
  arrange(desc(total)) %>% 
  head

##   id class math english science total
## 1 18     5   80      78      90   248
## 2 19     5   89      68      87   244
## 3  6     2   50      89      98   237
## 4 17     5   65      68      98   231
## 5 16     4   58      98      65   221
## 6 20     5   78      83      58   219
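The new variable is available to every verb that follows mutate() in the pipeline, not only arrange(). As a small sketch, the block below filters on total before sorting. The threshold of 200 is an arbitrary value chosen for illustration, and the data are the first six rows of exam typed in by hand under the hypothetical name exam_small.

```r
library(dplyr)

# First six rows of exam, copied by hand from the output in section (1)
# (exam_small and high_scores are hypothetical names used only for this sketch)
exam_small <- data.frame(
  id      = 1:6,
  math    = c(50, 60, 45, 30, 25, 50),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(50, 60, 78, 58, 65, 98)
)

# Create total, keep rows with total of at least 200 (illustrative cut-off),
# then sort from highest to lowest
high_scores <- exam_small %>%
  mutate(total = math + english + science) %>%
  filter(total >= 200) %>%
  arrange(desc(total))

high_scores
```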

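Analysis using the mpg data

The mpg data records fuel economy in two separate variables: hwy (highway fuel economy) and cty (city fuel economy). Instead of working with each variable separately, we will create a single combined fuel economy variable and analyze that.

Q1. Make a copy of the mpg data and add a 'total fuel economy' variable that is the sum of cty and hwy.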

library(ggplot2)  # the mpg data set ships with ggplot2
mpg_total <- mpg %>% mutate(total = cty + hwy)
head(mpg_total)

## # A tibble: 6 x 12
##   manufacturer model displ  year   cyl      trans   drv   cty   hwy    fl
##          <chr> <chr> <dbl> <int> <int>      <chr> <chr> <int> <int> <chr>
## 1         audi    a4   1.8  1999     4   auto(l5)     f    18    29     p
## 2         audi    a4   1.8  1999     4 manual(m5)     f    21    29     p
## 3         audi    a4   2.0  2008     4 manual(m6)     f    20    31     p
## 4         audi    a4   2.0  2008     4   auto(av)     f    21    30     p
## 5         audi    a4   2.8  1999     6   auto(l5)     f    16    26     p
## 6         audi    a4   2.8  1999     6 manual(m5)     f    18    26     p
## # ... with 2 more variables: class <chr>, total <int>

Q2. Divide the 'total fuel economy' variable created above by 2 to add an 'average fuel economy' variable.

mpg_total <- mpg_total %>% mutate(mean = total / 2)
head(mpg_total)

## # A tibble: 6 x 13
##   manufacturer model displ  year   cyl      trans   drv   cty   hwy    fl
##          <chr> <chr> <dbl> <int> <int>      <chr> <chr> <int> <int> <chr>
## 1         audi    a4   1.8  1999     4   auto(l5)     f    18    29     p
## 2         audi    a4   1.8  1999     4 manual(m5)     f    21    29     p
## 3         audi    a4   2.0  2008     4 manual(m6)     f    20    31     p
## 4         audi    a4   2.0  2008     4   auto(av)     f    21    30     p
## 5         audi    a4   2.8  1999     6   auto(l5)     f    16    26     p
## 6         audi    a4   2.8  1999     6 manual(m5)     f    18    26     p
## # ... with 3 more variables: class <chr>, total <int>, mean <dbl>

Q3. Print the data for the three cars with the highest 'average fuel economy'.

mpg_total %>% arrange(desc(mean)) %>% head(3)

## # A tibble: 3 x 13
##   manufacturer      model displ  year   cyl      trans   drv   cty   hwy
##          <chr>      <chr> <dbl> <int> <int>      <chr> <chr> <int> <int>
## 1   volkswagen new beetle   1.9  1999     4 manual(m5)     f    35    44
## 2   volkswagen      jetta   1.9  1999     4 manual(m5)     f    33    44
## 3   volkswagen new beetle   1.9  1999     4   auto(l4)     f    29    41
## # ... with 4 more variables: fl <chr>, class <chr>, total <int>,
## #   mean <dbl>

Q4. Write a single chained dplyr statement that solves questions 1 through 3 and print the result. Use the original mpg data instead of the copy.

mpg %>% mutate(total = cty + hwy, mean = total/2) %>% 
  arrange(desc(mean)) %>% 
  head(3)

## # A tibble: 3 x 13
##   manufacturer      model displ  year   cyl      trans   drv   cty   hwy
##          <chr>      <chr> <dbl> <int> <int>      <chr> <chr> <int> <int>
## 1   volkswagen new beetle   1.9  1999     4 manual(m5)     f    35    44
## 2   volkswagen      jetta   1.9  1999     4 manual(m5)     f    33    44
## 3   volkswagen new beetle   1.9  1999     4   auto(l4)     f    29    41
## # ... with 4 more variables: fl <chr>, class <chr>, total <int>,
## #   mean <dbl>
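As an aside, the arrange(desc(mean)) followed by head(3) pattern can also be written with slice_max(). This is only an alternative sketch, not part of the original exercise. It assumes dplyr 1.0.0 or later and that ggplot2 is installed to supply the mpg data; top3 is a hypothetical name.

```r
library(dplyr)

# mpg is taken from ggplot2 (assumed to be installed)
mpg <- ggplot2::mpg

# slice_max() keeps the n rows with the largest values of mean,
# sorted from highest to lowest; with_ties = FALSE returns exactly
# three rows, like head(3)
top3 <- mpg %>%
  mutate(total = cty + hwy, mean = total / 2) %>%
  slice_max(mean, n = 3, with_ties = FALSE)

top3 %>% select(manufacturer, model, cty, hwy, total, mean)
```

With the default with_ties = TRUE, slice_max() would return more than three rows if several cars shared the third-highest value.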