dplyr

Paquete que permite manipular los datos, ordenándolos, dividiéndolos, limpiándolos y creando nuevas variables, entre otras funciones para poder normalizarlos y usarlos. http://dplyr.tidyverse.org

Verbos funciones más comunes ordenados según donde se ubiquen los datos que queremos ordenar: filas, columnas y grupos de filas.

Funciones para filas

  • filter() elige casos por su valor
  • arrange() = cambia el orden de las filas
  • slice() = elige las filas en base a sus ubicaciones

Funciones para columnas

  • Mutate() Nuevas Variables
  • select() = selecciona variables
  • rename() = cambia los nombres de las columnas
  • relocate() = cambia el orden de las columnas

Grupos de fila

  • summarise() resumen de valores importantes

Notas de valor: Todas las funciones toma el data frame ( o tibble) como primer argumento. Para eso usamos el operador pipa: %>%

Práctica

1. Instalamos el paquete dplyr

# Preliminares
library(tidyverse)

2. Usamos la base de datos de starwars ya incorporada en r

#características de la tabla de datos starwar

dim(starwars)
## [1] 87 14
starwars
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

FILAS

Filter()

starwars %>% filter(skin_color == "light", eye_color =="brown")
## # A tibble: 7 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Leia Org…    150    49 brown      light      brown             19 fema… femin…
## 2 Biggs Da…    183    84 black      light      brown             24 male  mascu…
## 3 Cordé        157    NA brown      light      brown             NA fema… femin…
## 4 Dormé        165    NA brown      light      brown             NA fema… femin…
## 5 Raymus A…    188    79 brown      light      brown             NA male  mascu…
## 6 Poe Dame…     NA    NA brown      light      brown             NA male  mascu…
## 7 Padmé Am…    165    45 brown      light      brown             46 fema… femin…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

arrange()

starwars %>% arrange(height, mass)
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Yoda         66    17 white      green      brown            896 male  mascu…
##  2 Ratts T…     79    15 none       grey, blue unknown           NA male  mascu…
##  3 Wicket …     88    20 brown      brown      brown              8 male  mascu…
##  4 Dud Bolt     94    45 none       blue, grey yellow            NA male  mascu…
##  5 R2-D2        96    32 <NA>       white, bl… red               33 none  mascu…
##  6 R4-P17       96    NA none       silver, r… red, blue         NA none  femin…
##  7 R5-D4        97    32 <NA>       white, red red               NA none  mascu…
##  8 Sebulba     112    40 none       grey, red  orange            NA male  mascu…
##  9 Gasgano     122    NA none       white, bl… black             NA male  mascu…
## 10 Watto       137    NA black      blue, grey yellow            NA male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# orden descendiente 

starwars %>% arrange(desc(height))
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Yarael …    264    NA none       white      yellow          NA   male  mascu…
##  2 Tarfful     234   136 brown      brown      blue            NA   male  mascu…
##  3 Lama Su     229    88 none       grey       black           NA   male  mascu…
##  4 Chewbac…    228   112 brown      unknown    blue           200   male  mascu…
##  5 Roos Ta…    224    82 none       grey       orange          NA   male  mascu…
##  6 Grievous    216   159 none       brown, wh… green, y…       NA   male  mascu…
##  7 Taun We     213    NA none       grey       black           NA   fema… femin…
##  8 Rugor N…    206    NA none       green      orange          NA   male  mascu…
##  9 Tion Me…    206    80 none       grey       black           NA   male  mascu…
## 10 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

slice()

Permite acomodar (integer) por ubicación. Permite: seleccionar, remover y duplicar filas.

starwars %>% slice (5:10)
## # A tibble: 6 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Leia Org…    150    49 brown      light      brown             19 fema… femin…
## 2 Owen Lars    178   120 brown, gr… light      blue              52 male  mascu…
## 3 Beru Whi…    165    75 brown      light      blue              47 fema… femin…
## 4 R5-D4         97    32 <NA>       white, red red               NA none  mascu…
## 5 Biggs Da…    183    84 black      light      brown             24 male  mascu…
## 6 Obi-Wan …    182    77 auburn, w… fair       blue-gray         57 male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# selecciona las primeras o ultimas filas

starwars %>% slice_head( n = 3)
## # A tibble: 3 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…    172    77 blond      fair       blue              19 male  mascu…
## 2 C-3PO        167    75 <NA>       gold       yellow           112 none  mascu…
## 3 R2-D2         96    32 <NA>       white, bl… red               33 none  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# aleatorio: 

starwars %>% slice_sample( n = 8)
## # A tibble: 8 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 R4-P17        96    NA none       silver, r… red, blue         NA none  femin…
## 2 Ric Olié     183    NA brown      fair       blue              NA <NA>  <NA>  
## 3 IG-88        200   140 none       metal      red               15 none  mascu…
## 4 Poggle t…    183    80 none       green      yellow            NA male  mascu…
## 5 Dooku        193    80 white      fair       brown            102 male  mascu…
## 6 Wedge An…    170    77 brown      fair       hazel             21 male  mascu…
## 7 Plo Koon     188    80 none       orange     black             22 male  mascu…
## 8 Nute Gun…    191    90 none       mottled g… red               NA male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# selección de los valores máximos y mínimos y que no contengan NA

starwars %>%
  filter (! is.na(height)) %>%
  slice_max(height, n =3)
## # A tibble: 3 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Yarael P…    264    NA none       white      yellow            NA male  mascu…
## 2 Tarfful      234   136 brown      brown      blue              NA male  mascu…
## 3 Lama Su      229    88 none       grey       black             NA male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

COLUMNAS

select()

por nombre

starwars %>% select(hair_color, skin_color, eye_color)
## # A tibble: 87 × 3
##    hair_color    skin_color  eye_color
##    <chr>         <chr>       <chr>    
##  1 blond         fair        blue     
##  2 <NA>          gold        yellow   
##  3 <NA>          white, blue red      
##  4 none          white       yellow   
##  5 brown         light       brown    
##  6 brown, grey   light       blue     
##  7 brown         light       blue     
##  8 <NA>          white, red  red      
##  9 black         light       brown    
## 10 auburn, white fair        blue-gray
## # ℹ 77 more rows

todas las variables entre dos

starwars %>% select (hair_color:eye_color)
## # A tibble: 87 × 3
##    hair_color    skin_color  eye_color
##    <chr>         <chr>       <chr>    
##  1 blond         fair        blue     
##  2 <NA>          gold        yellow   
##  3 <NA>          white, blue red      
##  4 none          white       yellow   
##  5 brown         light       brown    
##  6 brown, grey   light       blue     
##  7 brown         light       blue     
##  8 <NA>          white, red  red      
##  9 black         light       brown    
## 10 auburn, white fair        blue-gray
## # ℹ 77 more rows

al revés. exceptuando las que estan entre medio de dos variables

starwars %>% select (!(hair_color:eye_color))
## # A tibble: 87 × 11
##    name    height  mass birth_year sex   gender homeworld species films vehicles
##    <chr>    <int> <dbl>      <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
##  1 Luke S…    172    77       19   male  mascu… Tatooine  Human   <chr> <chr>   
##  2 C-3PO      167    75      112   none  mascu… Tatooine  Droid   <chr> <chr>   
##  3 R2-D2       96    32       33   none  mascu… Naboo     Droid   <chr> <chr>   
##  4 Darth …    202   136       41.9 male  mascu… Tatooine  Human   <chr> <chr>   
##  5 Leia O…    150    49       19   fema… femin… Alderaan  Human   <chr> <chr>   
##  6 Owen L…    178   120       52   male  mascu… Tatooine  Human   <chr> <chr>   
##  7 Beru W…    165    75       47   fema… femin… Tatooine  Human   <chr> <chr>   
##  8 R5-D4       97    32       NA   none  mascu… Tatooine  Droid   <chr> <chr>   
##  9 Biggs …    183    84       24   male  mascu… Tatooine  Human   <chr> <chr>   
## 10 Obi-Wa…    182    77       57   male  mascu… Stewjon   Human   <chr> <chr>   
## # ℹ 77 more rows
## # ℹ 1 more variable: starships <list>

que terminen en….

starwars %>% select(ends_with("color"))
## # A tibble: 87 × 3
##    hair_color    skin_color  eye_color
##    <chr>         <chr>       <chr>    
##  1 blond         fair        blue     
##  2 <NA>          gold        yellow   
##  3 <NA>          white, blue red      
##  4 none          white       yellow   
##  5 brown         light       brown    
##  6 brown, grey   light       blue     
##  7 brown         light       blue     
##  8 <NA>          white, red  red      
##  9 black         light       brown    
## 10 auburn, white fair        blue-gray
## # ℹ 77 more rows

NOTA Hay muchas funciones que se pueden utilizar con select() por ejemplo starts_with(), ends_with(), matches() - coincide-, and contains() -contiene-

Podemos renombrar variables con select ( pero mejor usar para esto “rename”)

starwars %>% select (home_world = homeworld)
## # A tibble: 87 × 1
##    home_world
##    <chr>     
##  1 Tatooine  
##  2 Tatooine  
##  3 Naboo     
##  4 Tatooine  
##  5 Alderaan  
##  6 Tatooine  
##  7 Tatooine  
##  8 Tatooine  
##  9 Tatooine  
## 10 Stewjon   
## # ℹ 77 more rows

Mutate()

agregamos nueva variable.

starwars %>% mutate (height_m = height / 100)
## # A tibble: 87 × 15
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>, height_m <dbl>

Para que quede claro que son metros seguimos…

starwars %>%
  mutate(height_m = height / 100) %>%
  select(height_m, height, everything())
## # A tibble: 87 × 15
##    height_m height name    mass hair_color skin_color eye_color birth_year sex  
##       <dbl>  <int> <chr>  <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
##  1     1.72    172 Luke …    77 blond      fair       blue            19   male 
##  2     1.67    167 C-3PO     75 <NA>       gold       yellow         112   none 
##  3     0.96     96 R2-D2     32 <NA>       white, bl… red             33   none 
##  4     2.02    202 Darth…   136 none       white      yellow          41.9 male 
##  5     1.5     150 Leia …    49 brown      light      brown           19   fema…
##  6     1.78    178 Owen …   120 brown, gr… light      blue            52   male 
##  7     1.65    165 Beru …    75 brown      light      blue            47   fema…
##  8     0.97     97 R5-D4     32 <NA>       white, red red             NA   none 
##  9     1.83    183 Biggs…    84 black      light      brown           24   male 
## 10     1.82    182 Obi-W…    77 auburn, w… fair       blue-gray       57   male 
## # ℹ 77 more rows
## # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

dplyt::mutate

es similar a transform() pero permite referirnos a las columnas que ya hayamos creado.

starwars %>%
  mutate(
    height_m = height / 100,
    BMI = mass / (height_m^2)
  ) %>%
  select(BMI, everything())
## # A tibble: 87 × 16
##      BMI name      height  mass hair_color skin_color eye_color birth_year sex  
##    <dbl> <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
##  1  26.0 Luke Sky…    172    77 blond      fair       blue            19   male 
##  2  26.9 C-3PO        167    75 <NA>       gold       yellow         112   none 
##  3  34.7 R2-D2         96    32 <NA>       white, bl… red             33   none 
##  4  33.3 Darth Va…    202   136 none       white      yellow          41.9 male 
##  5  21.8 Leia Org…    150    49 brown      light      brown           19   fema…
##  6  37.9 Owen Lars    178   120 brown, gr… light      blue            52   male 
##  7  27.5 Beru Whi…    165    75 brown      light      blue            47   fema…
##  8  34.0 R5-D4         97    32 <NA>       white, red red             NA   none 
##  9  25.1 Biggs Da…    183    84 black      light      brown           24   male 
## 10  23.2 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male 
## # ℹ 77 more rows
## # ℹ 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>, height_m <dbl>

si queremos solo conservar la nueva variable, usamos . keep = “none”

starwars %>%
  mutate(
    height_m = height / 100,
    BMI = mass / (height_m^2),
    .keep = "none"
  )
## # A tibble: 87 × 2
##    height_m   BMI
##       <dbl> <dbl>
##  1     1.72  26.0
##  2     1.67  26.9
##  3     0.96  34.7
##  4     2.02  33.3
##  5     1.5   21.8
##  6     1.78  37.9
##  7     1.65  27.5
##  8     0.97  34.0
##  9     1.83  25.1
## 10     1.82  23.2
## # ℹ 77 more rows

Cambiamos el orden de las columnas con relocate()

Su sintaxis es similar a select () mueve blocks a otras columnas.

starwars %>% relocate (sex:homeworld, .before = height)
## # A tibble: 87 × 14
##    name      sex   gender homeworld height  mass hair_color skin_color eye_color
##    <chr>     <chr> <chr>  <chr>      <int> <dbl> <chr>      <chr>      <chr>    
##  1 Luke Sky… male  mascu… Tatooine     172    77 blond      fair       blue     
##  2 C-3PO     none  mascu… Tatooine     167    75 <NA>       gold       yellow   
##  3 R2-D2     none  mascu… Naboo         96    32 <NA>       white, bl… red      
##  4 Darth Va… male  mascu… Tatooine     202   136 none       white      yellow   
##  5 Leia Org… fema… femin… Alderaan     150    49 brown      light      brown    
##  6 Owen Lars male  mascu… Tatooine     178   120 brown, gr… light      blue     
##  7 Beru Whi… fema… femin… Tatooine     165    75 brown      light      blue     
##  8 R5-D4     none  mascu… Tatooine      97    32 <NA>       white, red red      
##  9 Biggs Da… male  mascu… Tatooine     183    84 black      light      brown    
## 10 Obi-Wan … male  mascu… Stewjon      182    77 auburn, w… fair       blue-gray
## # ℹ 77 more rows
## # ℹ 5 more variables: birth_year <dbl>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

resumimos datos de valor con summarise()

No es tan útil hasta que no agrupemos con group_by()

starwars %>% summarise ( hight = mean(height, na.rm = TRUE))
## # A tibble: 1 × 1
##   hight
##   <dbl>
## 1  174.

Recordar:

  • El primer argumento es data frame
  • El segundo argumento describe que queremos hacer con el dataframe. Podemos referirnos a la columna del dataframe directamente sin usar $.
  • El resultado es un nuevo data frame con los cambios hechos.

PIPA %>%

El operador pipa nos sirve para realizar varias operaciones de una vez.

starwars %>%
  group_by(species, sex) %>%
  select (height, mass) %>%
  summarise(
    height = mean(height, na.rm = TRUE),
    mass = mean(mass, na.rm = TRUE)
  )
## # A tibble: 41 × 4
## # Groups:   species [38]
##    species   sex    height  mass
##    <chr>     <chr>   <dbl> <dbl>
##  1 Aleena    male      79   15  
##  2 Besalisk  male     198  102  
##  3 Cerean    male     198   82  
##  4 Chagrian  male     196  NaN  
##  5 Clawdite  female   168   55  
##  6 Droid     none     131.  69.8
##  7 Dug       male     112   40  
##  8 Ewok      male      88   20  
##  9 Geonosian male     183   80  
## 10 Gungan    male     209.  74  
## # ℹ 31 more rows