Paquete que permite manipular los datos, ordenándolos, dividiéndolos, limpiándolos y creando nuevas variables, entre otras funciones para poder normalizarlos y usarlos. http://dplyr.tidyverse.org
Verbos funciones más comunes ordenados según donde se ubiquen los datos que queremos ordenar: filas, columnas y grupos de filas.
Notas de valor: Todas las funciones toma el data frame ( o tibble) como primer argumento. Para eso usamos el operador pipa: %>%
1. Instalamos el paquete dplyr
# Preliminares
library(tidyverse)
2. Usamos la base de datos de starwars ya incorporada en r
#características de la tabla de datos starwar
dim(starwars)
## [1] 87 14
starwars
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## 4 Darth V… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Or… 150 49 brown light brown 19 fema… femin…
## 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 8 R5-D4 97 32 <NA> white, red red NA none mascu…
## 9 Biggs D… 183 84 black light brown 24 male mascu…
## 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
starwars %>% filter(skin_color == "light", eye_color =="brown")
## # A tibble: 7 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Leia Org… 150 49 brown light brown 19 fema… femin…
## 2 Biggs Da… 183 84 black light brown 24 male mascu…
## 3 Cordé 157 NA brown light brown NA fema… femin…
## 4 Dormé 165 NA brown light brown NA fema… femin…
## 5 Raymus A… 188 79 brown light brown NA male mascu…
## 6 Poe Dame… NA NA brown light brown NA male mascu…
## 7 Padmé Am… 165 45 brown light brown 46 fema… femin…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
starwars %>% arrange(height, mass)
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Yoda 66 17 white green brown 896 male mascu…
## 2 Ratts T… 79 15 none grey, blue unknown NA male mascu…
## 3 Wicket … 88 20 brown brown brown 8 male mascu…
## 4 Dud Bolt 94 45 none blue, grey yellow NA male mascu…
## 5 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## 6 R4-P17 96 NA none silver, r… red, blue NA none femin…
## 7 R5-D4 97 32 <NA> white, red red NA none mascu…
## 8 Sebulba 112 40 none grey, red orange NA male mascu…
## 9 Gasgano 122 NA none white, bl… black NA male mascu…
## 10 Watto 137 NA black blue, grey yellow NA male mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# orden descendiente
starwars %>% arrange(desc(height))
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Yarael … 264 NA none white yellow NA male mascu…
## 2 Tarfful 234 136 brown brown blue NA male mascu…
## 3 Lama Su 229 88 none grey black NA male mascu…
## 4 Chewbac… 228 112 brown unknown blue 200 male mascu…
## 5 Roos Ta… 224 82 none grey orange NA male mascu…
## 6 Grievous 216 159 none brown, wh… green, y… NA male mascu…
## 7 Taun We 213 NA none grey black NA fema… femin…
## 8 Rugor N… 206 NA none green orange NA male mascu…
## 9 Tion Me… 206 80 none grey black NA male mascu…
## 10 Darth V… 202 136 none white yellow 41.9 male mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
Permite acomodar (integer) por ubicación. Permite: seleccionar, remover y duplicar filas.
starwars %>% slice (5:10)
## # A tibble: 6 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Leia Org… 150 49 brown light brown 19 fema… femin…
## 2 Owen Lars 178 120 brown, gr… light blue 52 male mascu…
## 3 Beru Whi… 165 75 brown light blue 47 fema… femin…
## 4 R5-D4 97 32 <NA> white, red red NA none mascu…
## 5 Biggs Da… 183 84 black light brown 24 male mascu…
## 6 Obi-Wan … 182 77 auburn, w… fair blue-gray 57 male mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# selecciona las primeras o ultimas filas
starwars %>% slice_head( n = 3)
## # A tibble: 3 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sky… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# aleatorio:
starwars %>% slice_sample( n = 8)
## # A tibble: 8 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 R4-P17 96 NA none silver, r… red, blue NA none femin…
## 2 Ric Olié 183 NA brown fair blue NA <NA> <NA>
## 3 IG-88 200 140 none metal red 15 none mascu…
## 4 Poggle t… 183 80 none green yellow NA male mascu…
## 5 Dooku 193 80 white fair brown 102 male mascu…
## 6 Wedge An… 170 77 brown fair hazel 21 male mascu…
## 7 Plo Koon 188 80 none orange black 22 male mascu…
## 8 Nute Gun… 191 90 none mottled g… red NA male mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# selección de los valores máximos y mínimos y que no contengan NA
starwars %>%
filter (! is.na(height)) %>%
slice_max(height, n =3)
## # A tibble: 3 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Yarael P… 264 NA none white yellow NA male mascu…
## 2 Tarfful 234 136 brown brown blue NA male mascu…
## 3 Lama Su 229 88 none grey black NA male mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
starwars %>% select(hair_color, skin_color, eye_color)
## # A tibble: 87 × 3
## hair_color skin_color eye_color
## <chr> <chr> <chr>
## 1 blond fair blue
## 2 <NA> gold yellow
## 3 <NA> white, blue red
## 4 none white yellow
## 5 brown light brown
## 6 brown, grey light blue
## 7 brown light blue
## 8 <NA> white, red red
## 9 black light brown
## 10 auburn, white fair blue-gray
## # ℹ 77 more rows
starwars %>% select (hair_color:eye_color)
## # A tibble: 87 × 3
## hair_color skin_color eye_color
## <chr> <chr> <chr>
## 1 blond fair blue
## 2 <NA> gold yellow
## 3 <NA> white, blue red
## 4 none white yellow
## 5 brown light brown
## 6 brown, grey light blue
## 7 brown light blue
## 8 <NA> white, red red
## 9 black light brown
## 10 auburn, white fair blue-gray
## # ℹ 77 more rows
starwars %>% select (!(hair_color:eye_color))
## # A tibble: 87 × 11
## name height mass birth_year sex gender homeworld species films vehicles
## <chr> <int> <dbl> <dbl> <chr> <chr> <chr> <chr> <lis> <list>
## 1 Luke S… 172 77 19 male mascu… Tatooine Human <chr> <chr>
## 2 C-3PO 167 75 112 none mascu… Tatooine Droid <chr> <chr>
## 3 R2-D2 96 32 33 none mascu… Naboo Droid <chr> <chr>
## 4 Darth … 202 136 41.9 male mascu… Tatooine Human <chr> <chr>
## 5 Leia O… 150 49 19 fema… femin… Alderaan Human <chr> <chr>
## 6 Owen L… 178 120 52 male mascu… Tatooine Human <chr> <chr>
## 7 Beru W… 165 75 47 fema… femin… Tatooine Human <chr> <chr>
## 8 R5-D4 97 32 NA none mascu… Tatooine Droid <chr> <chr>
## 9 Biggs … 183 84 24 male mascu… Tatooine Human <chr> <chr>
## 10 Obi-Wa… 182 77 57 male mascu… Stewjon Human <chr> <chr>
## # ℹ 77 more rows
## # ℹ 1 more variable: starships <list>
starwars %>% select(ends_with("color"))
## # A tibble: 87 × 3
## hair_color skin_color eye_color
## <chr> <chr> <chr>
## 1 blond fair blue
## 2 <NA> gold yellow
## 3 <NA> white, blue red
## 4 none white yellow
## 5 brown light brown
## 6 brown, grey light blue
## 7 brown light blue
## 8 <NA> white, red red
## 9 black light brown
## 10 auburn, white fair blue-gray
## # ℹ 77 more rows
NOTA Hay muchas funciones que se pueden utilizar con select() por ejemplo starts_with(), ends_with(), matches() - coincide-, and contains() -contiene-
starwars %>% select (home_world = homeworld)
## # A tibble: 87 × 1
## home_world
## <chr>
## 1 Tatooine
## 2 Tatooine
## 3 Naboo
## 4 Tatooine
## 5 Alderaan
## 6 Tatooine
## 7 Tatooine
## 8 Tatooine
## 9 Tatooine
## 10 Stewjon
## # ℹ 77 more rows
agregamos nueva variable.
starwars %>% mutate (height_m = height / 100)
## # A tibble: 87 × 15
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## 4 Darth V… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Or… 150 49 brown light brown 19 fema… femin…
## 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 8 R5-D4 97 32 <NA> white, red red NA none mascu…
## 9 Biggs D… 183 84 black light brown 24 male mascu…
## 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## # ℹ 77 more rows
## # ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>, height_m <dbl>
starwars %>%
mutate(height_m = height / 100) %>%
select(height_m, height, everything())
## # A tibble: 87 × 15
## height_m height name mass hair_color skin_color eye_color birth_year sex
## <dbl> <int> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 1.72 172 Luke … 77 blond fair blue 19 male
## 2 1.67 167 C-3PO 75 <NA> gold yellow 112 none
## 3 0.96 96 R2-D2 32 <NA> white, bl… red 33 none
## 4 2.02 202 Darth… 136 none white yellow 41.9 male
## 5 1.5 150 Leia … 49 brown light brown 19 fema…
## 6 1.78 178 Owen … 120 brown, gr… light blue 52 male
## 7 1.65 165 Beru … 75 brown light blue 47 fema…
## 8 0.97 97 R5-D4 32 <NA> white, red red NA none
## 9 1.83 183 Biggs… 84 black light brown 24 male
## 10 1.82 182 Obi-W… 77 auburn, w… fair blue-gray 57 male
## # ℹ 77 more rows
## # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
es similar a transform() pero permite referirnos a las columnas que ya hayamos creado.
starwars %>%
mutate(
height_m = height / 100,
BMI = mass / (height_m^2)
) %>%
select(BMI, everything())
## # A tibble: 87 × 16
## BMI name height mass hair_color skin_color eye_color birth_year sex
## <dbl> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 26.0 Luke Sky… 172 77 blond fair blue 19 male
## 2 26.9 C-3PO 167 75 <NA> gold yellow 112 none
## 3 34.7 R2-D2 96 32 <NA> white, bl… red 33 none
## 4 33.3 Darth Va… 202 136 none white yellow 41.9 male
## 5 21.8 Leia Org… 150 49 brown light brown 19 fema…
## 6 37.9 Owen Lars 178 120 brown, gr… light blue 52 male
## 7 27.5 Beru Whi… 165 75 brown light blue 47 fema…
## 8 34.0 R5-D4 97 32 <NA> white, red red NA none
## 9 25.1 Biggs Da… 183 84 black light brown 24 male
## 10 23.2 Obi-Wan … 182 77 auburn, w… fair blue-gray 57 male
## # ℹ 77 more rows
## # ℹ 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>, height_m <dbl>
starwars %>%
mutate(
height_m = height / 100,
BMI = mass / (height_m^2),
.keep = "none"
)
## # A tibble: 87 × 2
## height_m BMI
## <dbl> <dbl>
## 1 1.72 26.0
## 2 1.67 26.9
## 3 0.96 34.7
## 4 2.02 33.3
## 5 1.5 21.8
## 6 1.78 37.9
## 7 1.65 27.5
## 8 0.97 34.0
## 9 1.83 25.1
## 10 1.82 23.2
## # ℹ 77 more rows
Su sintaxis es similar a select () mueve blocks a otras columnas.
starwars %>% relocate (sex:homeworld, .before = height)
## # A tibble: 87 × 14
## name sex gender homeworld height mass hair_color skin_color eye_color
## <chr> <chr> <chr> <chr> <int> <dbl> <chr> <chr> <chr>
## 1 Luke Sky… male mascu… Tatooine 172 77 blond fair blue
## 2 C-3PO none mascu… Tatooine 167 75 <NA> gold yellow
## 3 R2-D2 none mascu… Naboo 96 32 <NA> white, bl… red
## 4 Darth Va… male mascu… Tatooine 202 136 none white yellow
## 5 Leia Org… fema… femin… Alderaan 150 49 brown light brown
## 6 Owen Lars male mascu… Tatooine 178 120 brown, gr… light blue
## 7 Beru Whi… fema… femin… Tatooine 165 75 brown light blue
## 8 R5-D4 none mascu… Tatooine 97 32 <NA> white, red red
## 9 Biggs Da… male mascu… Tatooine 183 84 black light brown
## 10 Obi-Wan … male mascu… Stewjon 182 77 auburn, w… fair blue-gray
## # ℹ 77 more rows
## # ℹ 5 more variables: birth_year <dbl>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
No es tan útil hasta que no agrupemos con group_by()
starwars %>% summarise ( hight = mean(height, na.rm = TRUE))
## # A tibble: 1 × 1
## hight
## <dbl>
## 1 174.
El operador pipa nos sirve para realizar varias operaciones de una vez.
starwars %>%
group_by(species, sex) %>%
select (height, mass) %>%
summarise(
height = mean(height, na.rm = TRUE),
mass = mean(mass, na.rm = TRUE)
)
## # A tibble: 41 × 4
## # Groups: species [38]
## species sex height mass
## <chr> <chr> <dbl> <dbl>
## 1 Aleena male 79 15
## 2 Besalisk male 198 102
## 3 Cerean male 198 82
## 4 Chagrian male 196 NaN
## 5 Clawdite female 168 55
## 6 Droid none 131. 69.8
## 7 Dug male 112 40
## 8 Ewok male 88 20
## 9 Geonosian male 183 80
## 10 Gungan male 209. 74
## # ℹ 31 more rows