1 Purpose

  • Introduce the most basic commands of the dplyr package

  • Practise them on the 2014 household living standards survey data (VHLSS 2014)

2 Loading the required packages and importing the data

## -- Attaching packages ---------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.2.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## # A tibble: 36,080 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     4      8    13     2 ""    1     2     " 9"     1957
##  3 1         1     4      8    14     1 ""    2     1     " 4"     1953
##  4 1         1     4      8    14     2 ""    1     7     " 7"     1996
##  5 1         1     4      8    15     1 ""    1     1     " 6"     1979
##  6 1         1     4      8    15     2 ""    2     2     11       1981
##  7 1         1     4      8    15     3 ""    1     3     " 1"     2008
##  8 1         1     4      8    15     4 ""    2     3     " 7"     2010
##  9 1         1     4      8    15     5 ""    2     4     " 8"     1954
## 10 1         1     7      6    13     1 ""    1     1     " 5"     1953
## # ... with 36,070 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • The muc1a data has 36,080 rows and 30 columns
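The original does not show the loading and import code. The block below is a hedged sketch. It loads dplyr and builds a tiny, hypothetical stand-in for muc1a called `muc1a_toy`. The stand-in uses a few of the real column names and the values of printed rows 1 to 4 and 10. Its size is then checked with `dim()`. The `<dbl+lbl>` column types suggest the real file was read from a Stata or SPSS file with the haven package, but that is an assumption, so no file path is given here.

```r
library(dplyr)

# Hypothetical toy stand-in for muc1a: five rows, eight of the real column names,
# values copied from printed rows 1-4 and 10 of the real data
muc1a_toy <- tibble(
  tinh   = c(1, 1, 1, 1, 1),
  huyen  = c(1, 1, 1, 1, 1),
  xa     = c(4, 4, 4, 4, 7),
  diaban = c(8, 8, 8, 8, 6),
  hoso   = c(13, 13, 14, 14, 13),
  matv   = c(1, 2, 1, 2, 1),
  m1ac2  = c(2, 1, 2, 1, 1),
  m1ac4b = c(1958, 1957, 1953, 1996, 1953)
)

# Number of rows and columns (the real muc1a reports 36,080 x 30)
dim(muc1a_toy)
```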

3 Selecting variables

  • The conventional way (without the pipe)
## # A tibble: 36,080 x 1
##    tinh     
##    <dbl+lbl>
##  1 1        
##  2 1        
##  3 1        
##  4 1        
##  5 1        
##  6 1        
##  7 1        
##  8 1        
##  9 1        
## 10 1        
## # ... with 36,070 more rows
  • Using the pipe operator (this second style is used from here on)
## # A tibble: 36,080 x 3
##    tinh      huyen    xa
##    <dbl+lbl> <dbl> <dbl>
##  1 1             1     4
##  2 1             1     4
##  3 1             1     4
##  4 1             1     4
##  5 1             1     4
##  6 1             1     4
##  7 1             1     4
##  8 1             1     4
##  9 1             1     4
## 10 1             1     7
## # ... with 36,070 more rows
  • Select all variables except tinh and xa
## # A tibble: 36,080 x 28
##    huyen diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b m1ac5 m1ac6
##    <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl> <dbl> <dbl>
##  1     1      8    13     1 ""    2     1     11       1958    56 NA   
##  2     1      8    13     2 ""    1     2     " 9"     1957    57 NA   
##  3     1      8    14     1 ""    2     1     " 4"     1953    61 NA   
##  4     1      8    14     2 ""    1     7     " 7"     1996    18 NA   
##  5     1      8    15     1 ""    1     1     " 6"     1979    35 NA   
##  6     1      8    15     2 ""    2     2     11       1981    33 NA   
##  7     1      8    15     3 ""    1     3     " 1"     2008     6 " 1" 
##  8     1      8    15     4 ""    2     3     " 7"     2010     4 " 1" 
##  9     1      8    15     5 ""    2     4     " 8"     1954    60 NA   
## 10     1      6    13     1 ""    1     1     " 5"     1953    61 NA   
## # ... with 36,070 more rows, and 17 more variables: m1ac7a <dbl+lbl>,
## #   m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>, m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>,
## #   m1ac10 <dbl+lbl>, m1ama1 <dbl>, m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>,
## #   m1ac13 <dbl+lbl>, m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select all variables whose names start with “m1ac4”
## # A tibble: 36,080 x 2
##    m1ac4a    m1ac4b
##    <dbl+lbl>  <dbl>
##  1 11          1958
##  2 " 9"        1957
##  3 " 4"        1953
##  4 " 7"        1996
##  5 " 6"        1979
##  6 11          1981
##  7 " 1"        2008
##  8 " 7"        2010
##  9 " 8"        1954
## 10 " 5"        1953
## # ... with 36,070 more rows
  • Select all variables whose names end with “a”
## # A tibble: 36,080 x 5
##       xa m1ac4a    m1ac7a    m1ac14a   m1ac15a  
##    <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
##  1     4 11        NA        NA        " 2"     
##  2     4 " 9"      NA        NA        " 2"     
##  3     4 " 4"      NA        NA        " 2"     
##  4     4 " 7"      NA        NA        " 2"     
##  5     4 " 6"      NA        NA        " 2"     
##  6     4 11        NA        NA        " 2"     
##  7     4 " 1"      " 1"      NA        NA       
##  8     4 " 7"      " 1"      NA        NA       
##  9     4 " 8"      NA        NA        " 2"     
## 10     7 " 5"      NA        NA        " 2"     
## # ... with 36,070 more rows
  • Select all variables whose names contain “ac”
## # A tibble: 36,080 x 22
##    m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b m1ac5 m1ac6 m1ac7a m1ac7b m1ac7c m1ac8
##    <chr> <dbl> <dbl> <dbl+>  <dbl> <dbl> <dbl> <dbl+> <dbl+> <dbl+> <dbl>
##  1 ""    2     1     11       1958    56 NA    NA     NA     NA     " 2" 
##  2 ""    1     2     " 9"     1957    57 NA    NA     NA     NA     " 2" 
##  3 ""    2     1     " 4"     1953    61 NA    NA     NA     NA     " 4" 
##  4 ""    1     7     " 7"     1996    18 NA    NA     NA     NA     " 1" 
##  5 ""    1     1     " 6"     1979    35 NA    NA     NA     NA     " 2" 
##  6 ""    2     2     11       1981    33 NA    NA     NA     NA     " 2" 
##  7 ""    1     3     " 1"     2008     6 " 1"  " 1"   " 2"   NA     NA   
##  8 ""    2     3     " 7"     2010     4 " 1"  " 1"   " 2"   NA     NA   
##  9 ""    2     4     " 8"     1954    60 NA    NA     NA     NA     " 2" 
## 10 ""    1     1     " 5"     1953    61 NA    NA     NA     NA     " 2" 
## # ... with 36,070 more rows, and 11 more variables: m1ac9 <dbl+lbl>,
## #   m1ac10 <dbl+lbl>, m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>,
## #   m1ac13 <dbl+lbl>, m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>
  • Reorder the variables
## # A tibble: 36,080 x 30
##       xa huyen tinh  diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1     4     1 1          8    13     1 ""    2     1     11       1958
##  2     4     1 1          8    13     2 ""    1     2     " 9"     1957
##  3     4     1 1          8    14     1 ""    2     1     " 4"     1953
##  4     4     1 1          8    14     2 ""    1     7     " 7"     1996
##  5     4     1 1          8    15     1 ""    1     1     " 6"     1979
##  6     4     1 1          8    15     2 ""    2     2     11       1981
##  7     4     1 1          8    15     3 ""    1     3     " 1"     2008
##  8     4     1 1          8    15     4 ""    2     3     " 7"     2010
##  9     4     1 1          8    15     5 ""    2     4     " 8"     1954
## 10     7     1 1          6    13     1 ""    1     1     " 5"     1953
## # ... with 36,070 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select a variable and rename it in the same step
## # A tibble: 36,080 x 1
##    gender   
##    <dbl+lbl>
##  1 2        
##  2 1        
##  3 2        
##  4 1        
##  5 1        
##  6 2        
##  7 1        
##  8 2        
##  9 2        
## 10 1        
## # ... with 36,070 more rows
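The code chunks for this section are not shown. The sketch below repeats each selection on a hypothetical two-row tibble so that the resulting column names can be checked. The values come from the first two printed rows, and `m1ac7a` is left missing. `starts_with()`, `ends_with()`, `contains()` and `everything()` are standard dplyr select helpers. Whether the original used exactly these calls is an assumption.

```r
library(dplyr)

# Hypothetical two-row stand-in using some of the muc1a column names
muc1a_toy <- tibble(
  tinh = c(1, 1), huyen = c(1, 1), xa = c(4, 4),
  m1ac2 = c(2, 1), m1ac4a = c(11, 9), m1ac4b = c(1958, 1957),
  m1ac7a = c(NA, NA)
)

# 1. Conventional call: the data frame is the first argument
sel_std <- select(muc1a_toy, tinh)

# 2. The same kind of call written with the pipe
sel_pipe <- muc1a_toy %>% select(tinh, huyen, xa)

# All variables except tinh and xa
sel_drop <- muc1a_toy %>% select(-tinh, -xa)

# Name-based helpers
sel_start <- muc1a_toy %>% select(starts_with("m1ac4"))
sel_end   <- muc1a_toy %>% select(ends_with("a"))
sel_has   <- muc1a_toy %>% select(contains("ac"))

# Put xa, huyen, tinh first and keep the remaining columns in their original order
sel_order <- muc1a_toy %>% select(xa, huyen, tinh, everything())

# Select m1ac2 and rename it to gender in one step
sel_named <- muc1a_toy %>% select(gender = m1ac2)
```

`ends_with("a")` also matches `xa`, which is why `xa` appears in the “end with a” output above.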

4 Selecting rows (observations)

  • Select all observations with m1ac2 == 1
## # A tibble: 17,718 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     2 ""    1     2     9        1957
##  2 1         1     4      8    14     2 ""    1     7     7        1996
##  3 1         1     4      8    15     1 ""    1     1     6        1979
##  4 1         1     4      8    15     3 ""    1     3     1        2008
##  5 1         1     7      6    13     1 ""    1     1     5        1953
##  6 1         1     7      6    13     3 ""    1     6     4        2002
##  7 1         1     7      6    14     1 ""    1     1     1        1954
##  8 1         1     7      6    15     2 ""    1     2     4        1943
##  9 1         1     7      6    15     3 ""    1     3     5        1984
## 10 1         1    16     20    13     2 ""    1     2     3        1963
## # ... with 17,708 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select all observations with m1ac2 not equal to 1
## # A tibble: 18,362 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     4      8    14     1 ""    2     1     " 4"     1953
##  3 1         1     4      8    15     2 ""    2     2     11       1981
##  4 1         1     4      8    15     4 ""    2     3     " 7"     2010
##  5 1         1     4      8    15     5 ""    2     4     " 8"     1954
##  6 1         1     7      6    13     2 ""    2     2     " 5"     1955
##  7 1         1     7      6    14     2 ""    2     2     " 7"     1961
##  8 1         1     7      6    15     1 ""    2     1     11       1947
##  9 1         1     7      6    15     4 ""    2     3     " 8"     1984
## 10 1         1     7      6    15     5 ""    2     6     11       2010
## # ... with 18,352 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select rows matching any of several values
## # A tibble: 683 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     4      8    14     1 ""    2     1     " 4"     1953
##  3 1         1     7      6    13     1 ""    1     1     " 5"     1953
##  4 1         1    28     25    15     2 ""    2     2     10       1953
##  5 1         2    40     12    15     1 ""    1     1     " 1"     1958
##  6 1         4   124     36    13     1 ""    2     1     " 9"     1953
##  7 1         5   167     30    14     1 ""    2     1     10       1953
##  8 1         6   187     10    20     2 ""    2     2     " 8"     1958
##  9 1         6   190     60    15     2 ""    2     2     " 1"     1953
## 10 1         6   199     16    14     2 ""    1     2     " 8"     1953
## # ... with 673 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select the rows that do not match those values (683 + 35,397 = 36,080)
## # A tibble: 35,397 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     2 ""    1     2     " 9"     1957
##  2 1         1     4      8    14     2 ""    1     7     " 7"     1996
##  3 1         1     4      8    15     1 ""    1     1     " 6"     1979
##  4 1         1     4      8    15     2 ""    2     2     11       1981
##  5 1         1     4      8    15     3 ""    1     3     " 1"     2008
##  6 1         1     4      8    15     4 ""    2     3     " 7"     2010
##  7 1         1     4      8    15     5 ""    2     4     " 8"     1954
##  8 1         1     7      6    13     2 ""    2     2     " 5"     1955
##  9 1         1     7      6    13     3 ""    1     6     " 4"     2002
## 10 1         1     7      6    14     1 ""    1     1     " 1"     1954
## # ... with 35,387 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select the rows where m1ac6 is missing
## # A tibble: 26,816 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     4      8    13     2 ""    1     2     " 9"     1957
##  3 1         1     4      8    14     1 ""    2     1     " 4"     1953
##  4 1         1     4      8    14     2 ""    1     7     " 7"     1996
##  5 1         1     4      8    15     1 ""    1     1     " 6"     1979
##  6 1         1     4      8    15     2 ""    2     2     11       1981
##  7 1         1     4      8    15     5 ""    2     4     " 8"     1954
##  8 1         1     7      6    13     1 ""    1     1     " 5"     1953
##  9 1         1     7      6    13     2 ""    2     2     " 5"     1955
## 10 1         1     7      6    14     1 ""    1     1     " 1"     1954
## # ... with 26,806 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Select on conditions involving two variables
## # A tibble: 160 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         2    79     10    15     5 ""    1     6     1        2014
##  2 1         6   190     60    15     5 ""    1     6     5        2014
##  3 1         8   322     15    15     4 ""    1     6     4        2014
##  4 1         9   367     33    14     5 ""    1     6     8        2014
##  5 1        21   607     32    14     6 ""    1     6     6        2014
##  6 1       271  9631      1    15     9 ""    1     6     1        2014
##  7 1       274  9847     17    13     4 ""    1     6     4        2014
##  8 1       278 10174     11    15     5 ""    1     6     8        2014
##  9 1       281 10426      9    13     5 ""    1     6     1        2014
## 10 2        30   955      4    14     4 ""    1     3     1        2014
## # ... with 150 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
## # A tibble: 160 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         2    79     10    15     5 ""    1     6     1        2014
##  2 1         6   190     60    15     5 ""    1     6     5        2014
##  3 1         8   322     15    15     4 ""    1     6     4        2014
##  4 1         9   367     33    14     5 ""    1     6     8        2014
##  5 1        21   607     32    14     6 ""    1     6     6        2014
##  6 1       271  9631      1    15     9 ""    1     6     1        2014
##  7 1       274  9847     17    13     4 ""    1     6     4        2014
##  8 1       278 10174     11    15     5 ""    1     6     8        2014
##  9 1       281 10426      9    13     5 ""    1     6     1        2014
## 10 2        30   955      4    14     4 ""    1     3     1        2014
## # ... with 150 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>

For AND conditions you can use either a comma or &; for OR conditions, use “|”.

## # A tibble: 21,848 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     2 ""    1     2     9        1957
##  2 1         1     4      8    14     2 ""    1     7     7        1996
##  3 1         1     4      8    15     1 ""    1     1     6        1979
##  4 1         1     4      8    15     3 ""    1     3     1        2008
##  5 1         1     4      8    15     4 ""    2     3     7        2010
##  6 1         1     7      6    13     1 ""    1     1     5        1953
##  7 1         1     7      6    13     3 ""    1     6     4        2002
##  8 1         1     7      6    14     1 ""    1     1     1        1954
##  9 1         1     7      6    15     2 ""    1     2     4        1943
## 10 1         1     7      6    15     3 ""    1     3     5        1984
## # ... with 21,838 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
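The block below is a hedged sketch of these row filters, run on a hypothetical five-row tibble with made-up values. The printed counts are internally consistent. The 17,718 rows with `m1ac2 == 1` plus the 18,362 rows with `m1ac2 != 1` make 36,080, and 683 plus 35,397 also make 36,080. The first rows of the 683-row result all have `m1ac4b` equal to 1953 or 1958, so the sketch assumes an `%in%` filter on those two years. The OR condition `m1ac4b > 2000` is purely illustrative, because the original condition is not shown.

```r
library(dplyr)

# Hypothetical toy data, made-up values
muc1a_toy <- tibble(
  m1ac2  = c(2, 1, 2, 1, 2),
  m1ac4b = c(1958, 1953, 2010, 2014, 1970),
  m1ac6  = c(NA, NA, 1, NA, 1)
)

f_eq    <- muc1a_toy %>% filter(m1ac2 == 1)                      # m1ac2 equals 1
f_ne    <- muc1a_toy %>% filter(m1ac2 != 1)                      # m1ac2 differs from 1
f_in    <- muc1a_toy %>% filter(m1ac4b %in% c(1953, 1958))       # any of several values
f_notin <- muc1a_toy %>% filter(!(m1ac4b %in% c(1953, 1958)))    # the complement
f_na    <- muc1a_toy %>% filter(is.na(m1ac6))                    # m1ac6 is missing
f_and1  <- muc1a_toy %>% filter(m1ac2 == 1, m1ac4b == 2014)      # AND written with a comma
f_and2  <- muc1a_toy %>% filter(m1ac2 == 1 & m1ac4b == 2014)     # AND written with &
f_or    <- muc1a_toy %>% filter(m1ac2 == 1 | m1ac4b > 2000)      # OR written with | (illustrative)
```

`filter()` drops rows where the condition evaluates to `NA`. Missing values therefore have to be tested with `is.na()`, not with `== NA`.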

5 Removing duplicate rows

  • Remove rows that are duplicated across all variables
## # A tibble: 36,080 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     4      8    13     2 ""    1     2     " 9"     1957
##  3 1         1     4      8    14     1 ""    2     1     " 4"     1953
##  4 1         1     4      8    14     2 ""    1     7     " 7"     1996
##  5 1         1     4      8    15     1 ""    1     1     " 6"     1979
##  6 1         1     4      8    15     2 ""    2     2     11       1981
##  7 1         1     4      8    15     3 ""    1     3     " 1"     2008
##  8 1         1     4      8    15     4 ""    2     3     " 7"     2010
##  9 1         1     4      8    15     5 ""    2     4     " 8"     1954
## 10 1         1     7      6    13     1 ""    1     1     " 5"     1953
## # ... with 36,070 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Remove rows that are duplicated on a group of variables
## # A tibble: 3,130 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     7      6    13     1 ""    1     1     " 5"     1953
##  3 1         1    16     20    13     1 ""    2     1     12       1963
##  4 1         1    22     19    13     1 ""    2     1     " 2"     1941
##  5 1         1    28     25    13     1 ""    2     1     " 4"     1959
##  6 1         1    34     10    14     1 ""    2     1     " 7"     1959
##  7 1         2    40     12    13     1 ""    1     1     " 4"     1948
##  8 1         2    55     11    13     1 ""    1     1     11       1941
##  9 1         2    67     16    13     1 ""    1     1     " 5"     1961
## 10 1         2    79     10    14     1 ""    1     1     " 3"     1952
## # ... with 3,120 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
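The block below sketches `distinct()` on a hypothetical four-row tibble. The “all variables” output above still has 36,080 rows, so the real file contains no fully duplicated rows. The grouping variables behind the 3,130-row result are not shown. The sketch assumes `tinh, huyen, xa`, which is consistent with the first printed rows. It also uses `.keep_all = TRUE` so that the other columns are kept from the first row of each group.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 are identical in every column
muc1a_toy <- tibble(
  tinh  = c(1, 1, 1, 1),
  huyen = c(1, 1, 1, 1),
  xa    = c(4, 4, 4, 7),
  hoso  = c(13, 13, 14, 13),
  matv  = c(1, 1, 1, 1)
)

# Drop rows duplicated across all columns
d_all <- muc1a_toy %>% distinct()

# Keep the first row of each (tinh, huyen, xa) combination, with all columns
d_grp <- muc1a_toy %>% distinct(tinh, huyen, xa, .keep_all = TRUE)
```

Without `.keep_all = TRUE`, the result would contain only the three grouping columns.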

6 Selecting random rows

  • Sample a fixed number of rows
## # A tibble: 5 x 30
##   tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
## 1 79      784 27592     10    14     1 ""    1     1     " 1"     1935
## 2 96      966 32062     13    20     3 ""    1     3     " 1"     2001
## 3 62      608 23305      6    15     2 ""    1     2     12       1965
## 4 45      462 19358      3    13     2 ""    2     2     " 4"     1968
## 5 36      356 13657     22    15     1 ""    2     1     " 4"     1953
## # ... with 19 more variables: m1ac5 <dbl>, m1ac6 <dbl+lbl>,
## #   m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>, m1ac8 <dbl+lbl>,
## #   m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>, m1ac11 <dbl+lbl>,
## #   m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>, m1ac14a <dbl+lbl>, m1ac14b <dbl>,
## #   m1ac15a <dbl+lbl>, m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>,
## #   m1ac15d <dbl+lbl>, ky <dbl>
  • Sample a fixed proportion of rows
## # A tibble: 361 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 44      454 19093      7    15     2 ""    2     2     " 9"     1969
##  2 12      106  3424      7    13     1 ""    1     1     " 5"     1995
##  3 75      732 26080      8    15     3 ""    1     3     " 3"     2004
##  4 70      690 25336     20    14     2 ""    2     3     " 4"     1998
##  5 56      572 22579      7    15     1 ""    1     1     " 1"     1963
##  6 17      157  5287      3    14     1 ""    1     1     " 8"     1979
##  7 94      947 31729      6    15     1 ""    1     1     11       1977
##  8 49      517 20971     12    13     1 ""    1     1     12       1971
##  9 58      587 22891      6    13     2 ""    2     2     " 7"     1952
## 10 45      464 19363     12    14     1 ""    1     1     " 1"     1957
## # ... with 351 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
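The sketch below shows random sampling with `sample_n()` and `sample_frac()`, both of which exist in the dplyr version loaded above (0.7.8). The 361-row output is consistent with a fraction of 0.01, since 1% of 36,080 is 360.8, but that fraction is an assumption. `set.seed()` is added to make the draw reproducible. The seed value is arbitrary, and the original seed, if any, is not shown. The 1,000-row tibble is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: 1,000 rows with a unique id
muc1a_toy <- tibble(id = 1:1000, m1ac2 = rep(c(1, 2), 500))

set.seed(2014)  # arbitrary seed, only for reproducibility
s_n    <- muc1a_toy %>% sample_n(5)        # exactly 5 rows, without replacement
s_frac <- muc1a_toy %>% sample_frac(0.01)  # 1% of the rows, here 10
```

In dplyr 1.0.0 and later, `slice_sample(n = 5)` and `slice_sample(prop = 0.01)` are the recommended replacements for these two functions.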

7 Selecting specific rows

  • Select rows by position: the first five rows, then rows 2 and 4
## # A tibble: 5 x 30
##   tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
## 1 1         1     4      8    13     1 ""    2     1     11       1958
## 2 1         1     4      8    13     2 ""    1     2     " 9"     1957
## 3 1         1     4      8    14     1 ""    2     1     " 4"     1953
## 4 1         1     4      8    14     2 ""    1     7     " 7"     1996
## 5 1         1     4      8    15     1 ""    1     1     " 6"     1979
## # ... with 19 more variables: m1ac5 <dbl>, m1ac6 <dbl+lbl>,
## #   m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>, m1ac8 <dbl+lbl>,
## #   m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>, m1ac11 <dbl+lbl>,
## #   m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>, m1ac14a <dbl+lbl>, m1ac14b <dbl>,
## #   m1ac15a <dbl+lbl>, m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>,
## #   m1ac15d <dbl+lbl>, ky <dbl>
## # A tibble: 2 x 30
##   tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
## 1 1         1     4      8    13     2 ""    1     2     9        1957
## 2 1         1     4      8    14     2 ""    1     7     7        1996
## # ... with 19 more variables: m1ac5 <dbl>, m1ac6 <dbl+lbl>,
## #   m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>, m1ac8 <dbl+lbl>,
## #   m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>, m1ac11 <dbl+lbl>,
## #   m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>, m1ac14a <dbl+lbl>, m1ac14b <dbl>,
## #   m1ac15a <dbl+lbl>, m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>,
## #   m1ac15d <dbl+lbl>, ky <dbl>
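The sketch below selects rows by position with `slice()` on a hypothetical six-row tibble. Its `m1ac4b` values are the first six printed birth years. The original does not show whether the five-row output came from `slice(1:5)` or `head(5)`; both return the first five rows.

```r
library(dplyr)

# Hypothetical toy data; m1ac4b holds the first six printed birth years
muc1a_toy <- tibble(
  matv   = 1:6,
  m1ac4b = c(1958, 1957, 1953, 1996, 1979, 1981)
)

first5 <- muc1a_toy %>% slice(1:5)      # rows 1 to 5
rows24 <- muc1a_toy %>% slice(c(2, 4))  # rows 2 and 4 only
```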

8 Sorting the data

  • Sort in ascending order (a -> z for text)
## # A tibble: 36,080 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 1         1     4      8    13     1 ""    2     1     11       1958
##  2 1         1     4      8    13     2 ""    1     2     " 9"     1957
##  3 1         1     4      8    14     1 ""    2     1     " 4"     1953
##  4 1         1     4      8    14     2 ""    1     7     " 7"     1996
##  5 1         1     4      8    15     1 ""    1     1     " 6"     1979
##  6 1         1     4      8    15     2 ""    2     2     11       1981
##  7 1         1     4      8    15     3 ""    1     3     " 1"     2008
##  8 1         1     4      8    15     4 ""    2     3     " 7"     2010
##  9 1         1     4      8    15     5 ""    2     4     " 8"     1954
## 10 1         1     7      6    13     1 ""    1     1     " 5"     1953
## # ... with 36,070 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
  • Sort in descending order (z -> a for text)
## # A tibble: 36,080 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 96      964 31999      2    13     1 ""    1     1     " 9"     1961
##  2 96      964 31999      2    13     2 ""    2     2     10       1965
##  3 96      964 31999      2    15     1 ""    2     1     -2       1924
##  4 96      964 31999      2    15     2 ""    1     3     " 3"     1952
##  5 96      964 31999      2    15     3 ""    2     3     12       1969
##  6 96      964 31999      2    15     4 ""    1     6     " 3"     1997
##  7 96      964 31999      2    15     5 ""    1     6     " 4"     1998
##  8 96      964 31999      2    20     1 ""    2     1     -2       1958
##  9 96      964 31999      2    20     2 ""    1     2     -2       1958
## 10 96      964 31999      2    20     3 ""    2     3     " 3"     1983
## # ... with 36,070 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
## # A tibble: 36,080 x 30
##    tinh  huyen    xa diaban  hoso  matv m1ac1 m1ac2 m1ac3 m1ac4a m1ac4b
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl+>  <dbl>
##  1 96      973 32233      6    13     1 ""    1     1     " 5"     1949
##  2 96      973 32233      6    13     2 ""    2     2     " 1"     1948
##  3 96      973 32233      6    14     1 ""    1     1     " 1"     1987
##  4 96      973 32233      6    14     2 ""    2     2     " 6"     1988
##  5 96      973 32233      6    14     3 ""    1     3     " 9"     2008
##  6 96      973 32233      6    14     4 ""    2     3     " 9"     2010
##  7 96      973 32233      6    15     1 ""    1     1     " 4"     1970
##  8 96      973 32233      6    15     2 ""    2     7     " 6"     1982
##  9 96      973 32233      6    15     3 ""    1     7     " 5"     2006
## 10 96      973 32233      6    15     4 ""    2     7     11       2012
## # ... with 36,070 more rows, and 19 more variables: m1ac5 <dbl>,
## #   m1ac6 <dbl+lbl>, m1ac7a <dbl+lbl>, m1ac7b <dbl+lbl>, m1ac7c <dbl+lbl>,
## #   m1ac8 <dbl+lbl>, m1ac9 <dbl+lbl>, m1ac10 <dbl+lbl>, m1ama1 <dbl>,
## #   m1ac11 <dbl+lbl>, m1ac12 <dbl+lbl>, m1ac13 <dbl+lbl>,
## #   m1ac14a <dbl+lbl>, m1ac14b <dbl>, m1ac15a <dbl+lbl>,
## #   m1ac15b <dbl+lbl>, m1ac15c <dbl+lbl>, m1ac15d <dbl+lbl>, ky <dbl>
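The sketch below sorts with `arrange()` and wraps a column in `desc()` to reverse its order. The toy values are hypothetical. The second descending output above starts at `xa` 32233, which suggests a descending sort on more than one variable. The original does not show which variables were used, so `desc(tinh), desc(xa)` is an assumption.

```r
library(dplyr)

# Hypothetical toy data
muc1a_toy <- tibble(
  tinh   = c(1, 96, 1, 96),
  xa     = c(4, 31999, 7, 32233),
  m1ac4b = c(1958, 1961, 1953, 1949)
)

sorted_asc   <- muc1a_toy %>% arrange(tinh, xa)               # ascending, smallest first
sorted_desc  <- muc1a_toy %>% arrange(desc(tinh))             # descending on one variable
sorted_desc2 <- muc1a_toy %>% arrange(desc(tinh), desc(xa))   # descending on two variables
```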

9 Creating new variables (new columns)
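The code that produced the outputs below is not shown. The sketch adds the same five columns with `mutate()` on a hypothetical three-row tibble. Each formula is inferred from the printed values and should be read as an assumption:

- `id` matches pasting the six location codes together (1, 1, 4, 8, 13, 1 gives 1148131).
- `age` matches 2018 minus the birth year `m1ac4b` (1958 gives 60).
- The `age_lg` cut-off of 18 only fits the printed rows (ages 8 and 10 are TRUE, 22 is FALSE).
- `ha_noi_ha_giang` assumes that Ha Giang has province code 2.

The original appears to create each column in a separate step; they are combined here for brevity.

```r
library(dplyr)

# Hypothetical toy data; the third row is made up
muc1a_toy <- tibble(
  tinh = c(1, 1, 2), huyen = c(1, 1, 30), xa = c(4, 4, 955),
  diaban = c(8, 8, 4), hoso = c(13, 15, 14), matv = c(1, 3, 4),
  m1ac4b = c(1958, 2008, 2014)
)

new_vars <- muc1a_toy %>%
  mutate(
    id = paste0(tinh, huyen, xa, diaban, hoso, matv),  # character ID from the location codes
    age = 2018 - m1ac4b,                               # reference year 2018 is an assumption
    age_lg = age < 18,                                 # logical flag; cut-off 18 is an assumption
    ha_noi = tinh == 1,                                # TRUE for province code 1
    ha_noi_ha_giang = tinh %in% c(1, 2)                # TRUE for province codes 1 or 2
  )
```

A column created inside `mutate()`, such as `age`, can be used by later arguments of the same call, as `age_lg` does here.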

## # A tibble: 36,080 x 9
##    tinh      huyen    xa diaban  hoso  matv m1ac2     m1ac4b id     
##    <dbl+lbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lbl>  <dbl> <chr>  
##  1 1             1     4      8    13     1 2           1958 1148131
##  2 1             1     4      8    13     2 1           1957 1148132
##  3 1             1     4      8    14     1 2           1953 1148141
##  4 1             1     4      8    14     2 1           1996 1148142
##  5 1             1     4      8    15     1 1           1979 1148151
##  6 1             1     4      8    15     2 2           1981 1148152
##  7 1             1     4      8    15     3 1           2008 1148153
##  8 1             1     4      8    15     4 2           2010 1148154
##  9 1             1     4      8    15     5 2           1954 1148155
## 10 1             1     7      6    13     1 1           1953 1176131
## # ... with 36,070 more rows
## # A tibble: 36,080 x 9
##    tinh      huyen    xa diaban  hoso  matv m1ac2     m1ac4b   age
##    <dbl+lbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lbl>  <dbl> <dbl>
##  1 1             1     4      8    13     1 2           1958    60
##  2 1             1     4      8    13     2 1           1957    61
##  3 1             1     4      8    14     1 2           1953    65
##  4 1             1     4      8    14     2 1           1996    22
##  5 1             1     4      8    15     1 1           1979    39
##  6 1             1     4      8    15     2 2           1981    37
##  7 1             1     4      8    15     3 1           2008    10
##  8 1             1     4      8    15     4 2           2010     8
##  9 1             1     4      8    15     5 2           1954    64
## 10 1             1     7      6    13     1 1           1953    65
## # ... with 36,070 more rows
## # A tibble: 36,080 x 9
##    tinh      huyen    xa diaban  hoso  matv m1ac2     m1ac4b age_lg
##    <dbl+lbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lbl>  <dbl> <lgl> 
##  1 1             1     4      8    13     1 2           1958 FALSE 
##  2 1             1     4      8    13     2 1           1957 FALSE 
##  3 1             1     4      8    14     1 2           1953 FALSE 
##  4 1             1     4      8    14     2 1           1996 FALSE 
##  5 1             1     4      8    15     1 1           1979 FALSE 
##  6 1             1     4      8    15     2 2           1981 FALSE 
##  7 1             1     4      8    15     3 1           2008 TRUE  
##  8 1             1     4      8    15     4 2           2010 TRUE  
##  9 1             1     4      8    15     5 2           1954 FALSE 
## 10 1             1     7      6    13     1 1           1953 FALSE 
## # ... with 36,070 more rows
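`age_lg` is a logical column. It is TRUE only for the members born in 2008 and 2010. The cutoff the tutorial used is not shown, so the sketch below assumes "younger than 18" purely for illustration. Any comparison written inside `mutate()` produces a TRUE/FALSE column in the same way.

```r
library(dplyr)

# Hypothetical toy data: birth years taken from rows printed above
ho_toy <- data.frame(matv = 1:4, m1ac4b = c(1996, 1979, 2008, 2010))

# Assumption: age measured against 2018 and a cutoff of 18 years
ho_lg <- ho_toy %>%
  mutate(age_lg = (2018 - m1ac4b) < 18)

ho_lg$age_lg
```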
## # A tibble: 36,080 x 9
##    tinh      huyen    xa diaban  hoso  matv m1ac2     m1ac4b ha_noi
##    <dbl+lbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lbl>  <dbl> <lgl> 
##  1 1             1     4      8    13     1 2           1958 TRUE  
##  2 1             1     4      8    13     2 1           1957 TRUE  
##  3 1             1     4      8    14     1 2           1953 TRUE  
##  4 1             1     4      8    14     2 1           1996 TRUE  
##  5 1             1     4      8    15     1 1           1979 TRUE  
##  6 1             1     4      8    15     2 2           1981 TRUE  
##  7 1             1     4      8    15     3 1           2008 TRUE  
##  8 1             1     4      8    15     4 2           2010 TRUE  
##  9 1             1     4      8    15     5 2           1954 TRUE  
## 10 1             1     7      6    13     1 1           1953 TRUE  
## # ... with 36,070 more rows
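`ha_noi` is TRUE in every printed row, and all of those rows have `tinh == 1`. Assuming province code 1 is Hà Nội (as the column name suggests), the flag is a single equality test. In the hypothetical toy data below, the other province codes are invented.

```r
library(dplyr)

# Hypothetical toy data; codes 2 and 4 are invented for contrast
ho_toy <- data.frame(hoso = c(13, 14, 15, 16), tinh = c(1, 1, 2, 4))

# TRUE where the province code is 1 (assumed to be Ha Noi)
ho_hn <- ho_toy %>%
  mutate(ha_noi = tinh == 1)

ho_hn$ha_noi
```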
## # A tibble: 36,080 x 9
##    tinh     huyen    xa diaban  hoso  matv m1ac2    m1ac4b ha_noi_ha_giang
##    <dbl+lb> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lb>  <dbl> <lgl>          
##  1 1            1     4      8    13     1 2          1958 TRUE           
##  2 1            1     4      8    13     2 1          1957 TRUE           
##  3 1            1     4      8    14     1 2          1953 TRUE           
##  4 1            1     4      8    14     2 1          1996 TRUE           
##  5 1            1     4      8    15     1 1          1979 TRUE           
##  6 1            1     4      8    15     2 2          1981 TRUE           
##  7 1            1     4      8    15     3 1          2008 TRUE           
##  8 1            1     4      8    15     4 2          2010 TRUE           
##  9 1            1     4      8    15     5 2          1954 TRUE           
## 10 1            1     7      6    13     1 1          1953 TRUE           
## # ... with 36,070 more rows
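A flag that covers two provinces can be written with `%in%` or with the OR operator `|`. The sketch assumes Hà Giang has province code 2. The printed output only shows Hà Nội rows, so it does not confirm this code.

```r
library(dplyr)

# Hypothetical toy data
ho_toy <- data.frame(hoso = c(13, 14, 15, 16), tinh = c(1, 2, 4, 1))

ho_2t <- ho_toy %>%
  mutate(
    ha_noi_ha_giang = tinh %in% c(1, 2),     # assumption: 1 = Ha Noi, 2 = Ha Giang
    same_with_or    = tinh == 1 | tinh == 2  # equivalent form using OR
  )

ho_2t$ha_noi_ha_giang
```

The two forms differ only when `tinh` is missing. In that case `%in%` returns FALSE, while `==` and `|` return NA.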
## # A tibble: 36,080 x 9
##    tinh      huyen    xa diaban  hoso  matv m1ac2     m1ac4b gender
##    <dbl+lbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lbl>  <dbl> <chr> 
##  1 1             1     4      8    13     1 2           1958 Nu    
##  2 1             1     4      8    13     2 1           1957 Nam   
##  3 1             1     4      8    14     1 2           1953 Nu    
##  4 1             1     4      8    14     2 1           1996 Nam   
##  5 1             1     4      8    15     1 1           1979 Nam   
##  6 1             1     4      8    15     2 2           1981 Nu    
##  7 1             1     4      8    15     3 1           2008 Nam   
##  8 1             1     4      8    15     4 2           2010 Nu    
##  9 1             1     4      8    15     5 2           1954 Nu    
## 10 1             1     7      6    13     1 1           1953 Nam   
## # ... with 36,070 more rows
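`gender` recodes `m1ac2`: 1 becomes "Nam" (male) and 2 becomes "Nu" (female). The sketch below uses `case_when()` and lists both codes explicitly, so an unexpected code becomes NA instead of being silently labelled. A two-way `ifelse()` would map every non-1 value to "Nu". The code 9 in the toy data is a hypothetical invalid value added to show this.

```r
library(dplyr)

# Hypothetical toy data; 9 is an invented invalid code
ho_toy <- data.frame(matv = 1:4, m1ac2 = c(2, 1, 2, 9))

# Explicit mapping: unmatched codes fall through to NA
ho_gt <- ho_toy %>%
  mutate(gender = case_when(
    m1ac2 == 1 ~ "Nam",
    m1ac2 == 2 ~ "Nu"
  ))

ho_gt$gender
```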
## # A tibble: 36,080 x 9
##    tinh      huyen    xa diaban  hoso  matv m1ac2     m1ac4b nhom_tuoi
##    <dbl+lbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl+lbl>  <dbl> <chr>    
##  1 1             1     4      8    13     1 2           1958 >= 29    
##  2 1             1     4      8    13     2 1           1957 >= 29    
##  3 1             1     4      8    14     1 2           1953 >= 29    
##  4 1             1     4      8    14     2 1           1996 >= 29    
##  5 1             1     4      8    15     1 1           1979 >= 29    
##  6 1             1     4      8    15     2 2           1981 >= 29    
##  7 1             1     4      8    15     3 1           2008 >= 29    
##  8 1             1     4      8    15     4 2           2010 >= 29    
##  9 1             1     4      8    15     5 2           1954 >= 29    
## 10 1             1     7      6    13     1 1           1953 >= 29    
## # ... with 36,070 more rows
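In the output above, every row is labelled ">= 29", including members born in 2008 and 2010. This suggests the condition was not applied to age. For example, a test such as `m1ac4b >= 29` would give exactly this result, because every birth year exceeds 29. The sketch below computes `age` first and then tests `age`, again assuming the 2018 reference year seen earlier. A column created inside `mutate()` can be used by later arguments of the same call.

```r
library(dplyr)

# Hypothetical toy data: birth years taken from rows printed above
ho_toy <- data.frame(matv = 1:4, m1ac4b = c(1958, 1996, 2008, 2010))

ho_nt <- ho_toy %>%
  mutate(
    age       = 2018 - m1ac4b,                      # assumption: 2018 reference year
    nhom_tuoi = if_else(age >= 29, ">= 29", "< 29") # test age, not the birth year
  )

ho_nt$nhom_tuoi
```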

10 The pipe operator %>%: running several commands in sequence

## # A tibble: 18,362 x 11
##    tinh  huyen    xa diaban  hoso  matv m1a~ m1ac4b   age gender nhom_tuoi
##    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <db>  <dbl> <dbl> <chr>  <chr>    
##  1 1       282 10441     14    13     2 2      1971    47 Nu     >= 29    
##  2 1       282 10441     14    13     3 2      1995    23 Nu     >= 29    
##  3 1       282 10441     14    13     4 2      2006    12 Nu     >= 29    
##  4 1       282 10441     14    14     2 2      1954    64 Nu     >= 29    
##  5 1       282 10441     14    14     4 2      1983    35 Nu     >= 29    
##  6 1       282 10441     14    14     5 2      2008    10 Nu     >= 29    
##  7 1       282 10441     14    15     2 2      1968    50 Nu     >= 29    
##  8 1       282 10441     14    15     3 2      1994    24 Nu     >= 29    
##  9 1       282 10450     11    13     2 2      1979    39 Nu     >= 29    
## 10 1       282 10450     11    13     4 2      2000    18 Nu     >= 29    
## # ... with 18,352 more rows
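The table above keeps only female members (`gender` is "Nu" in every row), along with the derived columns. `%>%` passes the result of each step as the first argument of the next step, so a chain of verbs reads top to bottom instead of inside out. The sketch below runs the same mutate-then-filter steps on hypothetical toy data, once nested and once piped. It does not reproduce the row order of the printed output.

```r
library(dplyr)

# Hypothetical toy data
ho_toy <- data.frame(
  matv   = 1:4,
  m1ac2  = c(2, 1, 2, 2),
  m1ac4b = c(1958, 1957, 2008, 1995)
)

# Without the pipe: calls are nested and read from the inside out
nested <- filter(
  mutate(ho_toy, age = 2018 - m1ac4b, gender = if_else(m1ac2 == 1, "Nam", "Nu")),
  gender == "Nu"
)

# With %>%: each result flows into the next verb
piped <- ho_toy %>%
  mutate(age = 2018 - m1ac4b, gender = if_else(m1ac2 == 1, "Nam", "Nu")) %>%
  filter(gender == "Nu")

identical(nested, piped)
```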