Data frame manipulation means different things to different researchers and analysts.
Sometimes you want to select certain observations (rows) or variables (columns); other times you want to group the data by one or more variables, compute summary statistics for a set of observations, or simply change or transform the values of a variable.
These operations can be carried out with functions from the dplyr package.
library(dplyr)
# datos <- data.frame(starwars)
# datos <- read.csv("ruta/starwars.csv") # local
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Curso-Titulacion-Data-Science-/master/2020/datos/starwars.csv", encoding = "UTF-8") # from the web
The dplyr package was developed by Hadley Wickham of RStudio and is an optimized version of his plyr package.
An important contribution of dplyr is that it provides a "grammar" (in particular, a set of verbs) for manipulating and operating on data frames. https://rsanchezs.gitbooks.io/rprogramming/content/chapter9/dplyr.html
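As a rough illustration of that grammar, the sketch below chains several verbs on a small made-up data frame. The data frame `personajes` and the columns `height_m` and `media_m` are hypothetical names introduced only for this example; they are not part of the starwars file.

```r
library(dplyr)

# Hypothetical toy data, not taken from the starwars file
personajes <- data.frame(
  name   = c("Ana", "Luis", "Eva", "Raul"),
  gender = c("female", "male", "female", "male"),
  height = c(165, 180, 170, 175),
  stringsAsFactors = FALSE
)

resumen <- personajes %>%
  filter(height >= 170) %>%                # keep rows that meet a condition
  mutate(height_m = height / 100) %>%      # add a derived column
  group_by(gender) %>%                     # define the groups
  summarise(media_m = mean(height_m)) %>%  # one summary row per group
  arrange(desc(media_m))                   # sort the summary

resumen
```

Each verb takes a data frame as its first argument and returns a new one, which is what allows the steps to be chained with `%>%`.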
Some of the functions in the dplyr package are:
select(datos, name, gender)
## name gender
## 1 Luke Skywalker male
## 2 C-3PO <NA>
## 3 R2-D2 <NA>
## 4 Darth Vader male
## 5 Leia Organa female
## 6 Owen Lars male
## 7 Beru Whitesun lars female
## 8 R5-D4 <NA>
## 9 Biggs Darklighter male
## 10 Obi-Wan Kenobi male
## 11 Anakin Skywalker male
## 12 Wilhuff Tarkin male
## 13 Chewbacca male
## 14 Han Solo male
## 15 Greedo male
## 16 Jabba Desilijic Tiure hermaphrodite
## 17 Wedge Antilles male
## 18 Jek Tono Porkins male
## 19 Yoda male
## 20 Palpatine male
## 21 Boba Fett male
## 22 IG-88 none
## 23 Bossk male
## 24 Lando Calrissian male
## 25 Lobot male
## 26 Ackbar male
## 27 Mon Mothma female
## 28 Arvel Crynyd male
## 29 Wicket Systri Warrick male
## 30 Nien Nunb male
## 31 Qui-Gon Jinn male
## 32 Nute Gunray male
## 33 Finis Valorum male
## 34 Jar Jar Binks male
## 35 Roos Tarpals male
## 36 Rugor Nass male
## 37 Ric Olié male
## 38 Watto male
## 39 Sebulba male
## 40 Quarsh Panaka male
## 41 Shmi Skywalker female
## 42 Darth Maul male
## 43 Bib Fortuna male
## 44 Ayla Secura female
## 45 Dud Bolt male
## 46 Gasgano male
## 47 Ben Quadinaros male
## 48 Mace Windu male
## 49 Ki-Adi-Mundi male
## 50 Kit Fisto male
## 51 Eeth Koth male
## 52 Adi Gallia female
## 53 Saesee Tiin male
## 54 Yarael Poof male
## 55 Plo Koon male
## 56 Mas Amedda male
## 57 Gregar Typho male
## 58 Cordé female
## 59 Cliegg Lars male
## 60 Poggle the Lesser male
## 61 Luminara Unduli female
## 62 Barriss Offee female
## 63 Dormé female
## 64 Dooku male
## 65 Bail Prestor Organa male
## 66 Jango Fett male
## 67 Zam Wesell female
## 68 Dexter Jettster male
## 69 Lama Su male
## 70 Taun We female
## 71 Jocasta Nu female
## 72 Ratts Tyerell male
## 73 R4-P17 female
## 74 Wat Tambor male
## 75 San Hill male
## 76 Shaak Ti female
## 77 Grievous male
## 78 Tarfful male
## 79 Raymus Antilles male
## 80 Sly Moore female
## 81 Tion Medon male
## 82 Finn male
## 83 Rey female
## 84 Poe Dameron male
## 85 BB8 none
## 86 Captain Phasma female
## 87 Padmé Amidala female
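Besides listing columns by name, `select()` accepts a few other forms. The sketch below shows some of them on a made-up data frame; `ejemplo` and its values are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data with a few starwars-like columns
ejemplo <- data.frame(
  name       = c("A", "B"),
  gender     = c("male", "female"),
  hair_color = c("brown", "black"),
  eye_color  = c("blue", "brown"),
  stringsAsFactors = FALSE
)

por_nombre  <- select(ejemplo, name, gender)           # columns listed by name
sin_genero  <- select(ejemplo, -gender)                # every column except gender
colores     <- select(ejemplo, ends_with("_color"))    # columns matched by a helper
renombrado  <- select(ejemplo, nombre = name, gender)  # rename while selecting

names(colores)
```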
filter() returns the subset of rows that satisfy a condition.
It is similar to subset(), seen in earlier sessions.
The example below uses a temporary variable, datostemp. This is not the recommended style; chaining the steps with pipes is preferable.
datostemp <- select(datos, name, gender)
masculinos <- filter(datostemp, gender == 'male')
masculinos
## name gender
## 1 Luke Skywalker male
## 2 Darth Vader male
## 3 Owen Lars male
## 4 Biggs Darklighter male
## 5 Obi-Wan Kenobi male
## 6 Anakin Skywalker male
## 7 Wilhuff Tarkin male
## 8 Chewbacca male
## 9 Han Solo male
## 10 Greedo male
## 11 Wedge Antilles male
## 12 Jek Tono Porkins male
## 13 Yoda male
## 14 Palpatine male
## 15 Boba Fett male
## 16 Bossk male
## 17 Lando Calrissian male
## 18 Lobot male
## 19 Ackbar male
## 20 Arvel Crynyd male
## 21 Wicket Systri Warrick male
## 22 Nien Nunb male
## 23 Qui-Gon Jinn male
## 24 Nute Gunray male
## 25 Finis Valorum male
## 26 Jar Jar Binks male
## 27 Roos Tarpals male
## 28 Rugor Nass male
## 29 Ric Olié male
## 30 Watto male
## 31 Sebulba male
## 32 Quarsh Panaka male
## 33 Darth Maul male
## 34 Bib Fortuna male
## 35 Dud Bolt male
## 36 Gasgano male
## 37 Ben Quadinaros male
## 38 Mace Windu male
## 39 Ki-Adi-Mundi male
## 40 Kit Fisto male
## 41 Eeth Koth male
## 42 Saesee Tiin male
## 43 Yarael Poof male
## 44 Plo Koon male
## 45 Mas Amedda male
## 46 Gregar Typho male
## 47 Cliegg Lars male
## 48 Poggle the Lesser male
## 49 Dooku male
## 50 Bail Prestor Organa male
## 51 Jango Fett male
## 52 Dexter Jettster male
## 53 Lama Su male
## 54 Ratts Tyerell male
## 55 Wat Tambor male
## 56 San Hill male
## 57 Grievous male
## 58 Tarfful male
## 59 Raymus Antilles male
## 60 Tion Medon male
## 61 Finn male
## 62 Poe Dameron male
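Rows whose gender is `NA` do not appear in the filtered result, because `filter()` keeps only rows where the condition evaluates to `TRUE`. A minimal sketch of that behaviour, using a hypothetical three-row data frame `toy`:

```r
library(dplyr)

# Hypothetical toy data with one missing gender
toy <- data.frame(
  name   = c("Luke", "C-3PO", "Leia"),
  gender = c("male", NA, "female"),
  stringsAsFactors = FALSE
)

solo_male <- filter(toy, gender == "male")                   # NA row is dropped
no_male   <- filter(toy, gender != "male")                   # NA row is dropped here too
male_o_na <- filter(toy, gender == "male" | is.na(gender))   # keep NA rows explicitly
con_base  <- subset(toy, gender == "male")                   # base R equivalent of solo_male

nrow(male_o_na)
```

If the missing values matter for the analysis, they have to be requested explicitly with `is.na()`, as in `male_o_na`.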
masculinos <- datos %>%
  filter(gender == "male") %>%
  select(name, gender)
masculinos
## name gender
## 1 Luke Skywalker male
## 2 Darth Vader male
## 3 Owen Lars male
## 4 Biggs Darklighter male
## 5 Obi-Wan Kenobi male
## 6 Anakin Skywalker male
## 7 Wilhuff Tarkin male
## 8 Chewbacca male
## 9 Han Solo male
## 10 Greedo male
## 11 Wedge Antilles male
## 12 Jek Tono Porkins male
## 13 Yoda male
## 14 Palpatine male
## 15 Boba Fett male
## 16 Bossk male
## 17 Lando Calrissian male
## 18 Lobot male
## 19 Ackbar male
## 20 Arvel Crynyd male
## 21 Wicket Systri Warrick male
## 22 Nien Nunb male
## 23 Qui-Gon Jinn male
## 24 Nute Gunray male
## 25 Finis Valorum male
## 26 Jar Jar Binks male
## 27 Roos Tarpals male
## 28 Rugor Nass male
## 29 Ric Olié male
## 30 Watto male
## 31 Sebulba male
## 32 Quarsh Panaka male
## 33 Darth Maul male
## 34 Bib Fortuna male
## 35 Dud Bolt male
## 36 Gasgano male
## 37 Ben Quadinaros male
## 38 Mace Windu male
## 39 Ki-Adi-Mundi male
## 40 Kit Fisto male
## 41 Eeth Koth male
## 42 Saesee Tiin male
## 43 Yarael Poof male
## 44 Plo Koon male
## 45 Mas Amedda male
## 46 Gregar Typho male
## 47 Cliegg Lars male
## 48 Poggle the Lesser male
## 49 Dooku male
## 50 Bail Prestor Organa male
## 51 Jango Fett male
## 52 Dexter Jettster male
## 53 Lama Su male
## 54 Ratts Tyerell male
## 55 Wat Tambor male
## 56 San Hill male
## 57 Grievous male
## 58 Tarfful male
## 59 Raymus Antilles male
## 60 Tion Medon male
## 61 Finn male
## 62 Poe Dameron male
arrange(masculinos, desc(name))
## name gender
## 1 Yoda male
## 2 Yarael Poof male
## 3 Wilhuff Tarkin male
## 4 Wicket Systri Warrick male
## 5 Wedge Antilles male
## 6 Watto male
## 7 Wat Tambor male
## 8 Tion Medon male
## 9 Tarfful male
## 10 Sebulba male
## 11 San Hill male
## 12 Saesee Tiin male
## 13 Rugor Nass male
## 14 Roos Tarpals male
## 15 Ric Olié male
## 16 Raymus Antilles male
## 17 Ratts Tyerell male
## 18 Qui-Gon Jinn male
## 19 Quarsh Panaka male
## 20 Poggle the Lesser male
## 21 Poe Dameron male
## 22 Plo Koon male
## 23 Palpatine male
## 24 Owen Lars male
## 25 Obi-Wan Kenobi male
## 26 Nute Gunray male
## 27 Nien Nunb male
## 28 Mas Amedda male
## 29 Mace Windu male
## 30 Luke Skywalker male
## 31 Lobot male
## 32 Lando Calrissian male
## 33 Lama Su male
## 34 Kit Fisto male
## 35 Ki-Adi-Mundi male
## 36 Jek Tono Porkins male
## 37 Jar Jar Binks male
## 38 Jango Fett male
## 39 Han Solo male
## 40 Grievous male
## 41 Gregar Typho male
## 42 Greedo male
## 43 Gasgano male
## 44 Finn male
## 45 Finis Valorum male
## 46 Eeth Koth male
## 47 Dud Bolt male
## 48 Dooku male
## 49 Dexter Jettster male
## 50 Darth Vader male
## 51 Darth Maul male
## 52 Cliegg Lars male
## 53 Chewbacca male
## 54 Bossk male
## 55 Boba Fett male
## 56 Biggs Darklighter male
## 57 Bib Fortuna male
## 58 Ben Quadinaros male
## 59 Bail Prestor Organa male
## 60 Arvel Crynyd male
## 61 Anakin Skywalker male
## 62 Ackbar male
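`arrange()` also accepts several sort keys, applied from left to right, and `desc()` can be wrapped around any of them. A small sketch on hypothetical data (`toy` is an assumption for illustration):

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  name   = c("Yoda", "Leia", "Han", "Rey"),
  gender = c("male", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Sort by gender ascending, then by name descending within each gender
ordenado <- arrange(toy, gender, desc(name))
ordenado
```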
masculinos <- datos %>%
  filter(gender == "male") %>%
  select(name, gender) %>%
  arrange(name)
masculinos
## name gender
## 1 Ackbar male
## 2 Anakin Skywalker male
## 3 Arvel Crynyd male
## 4 Bail Prestor Organa male
## 5 Ben Quadinaros male
## 6 Bib Fortuna male
## 7 Biggs Darklighter male
## 8 Boba Fett male
## 9 Bossk male
## 10 Chewbacca male
## 11 Cliegg Lars male
## 12 Darth Maul male
## 13 Darth Vader male
## 14 Dexter Jettster male
## 15 Dooku male
## 16 Dud Bolt male
## 17 Eeth Koth male
## 18 Finis Valorum male
## 19 Finn male
## 20 Gasgano male
## 21 Greedo male
## 22 Gregar Typho male
## 23 Grievous male
## 24 Han Solo male
## 25 Jango Fett male
## 26 Jar Jar Binks male
## 27 Jek Tono Porkins male
## 28 Ki-Adi-Mundi male
## 29 Kit Fisto male
## 30 Lama Su male
## 31 Lando Calrissian male
## 32 Lobot male
## 33 Luke Skywalker male
## 34 Mace Windu male
## 35 Mas Amedda male
## 36 Nien Nunb male
## 37 Nute Gunray male
## 38 Obi-Wan Kenobi male
## 39 Owen Lars male
## 40 Palpatine male
## 41 Plo Koon male
## 42 Poe Dameron male
## 43 Poggle the Lesser male
## 44 Quarsh Panaka male
## 45 Qui-Gon Jinn male
## 46 Ratts Tyerell male
## 47 Raymus Antilles male
## 48 Ric Olié male
## 49 Roos Tarpals male
## 50 Rugor Nass male
## 51 Saesee Tiin male
## 52 San Hill male
## 53 Sebulba male
## 54 Tarfful male
## 55 Tion Medon male
## 56 Wat Tambor male
## 57 Watto male
## 58 Wedge Antilles male
## 59 Wicket Systri Warrick male
## 60 Wilhuff Tarkin male
## 61 Yarael Poof male
## 62 Yoda male
cuantos.Genero <- data.frame(datos %>%
  group_by(gender) %>%
  summarise(cuantos = n()))
## Warning: Factor `gender` contains implicit NA, consider using
## `forcats::fct_explicit_na`
cuantos.Genero
## gender cuantos
## 1 female 19
## 2 hermaphrodite 1
## 3 male 62
## 4 none 2
## 5 <NA> 3
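Counting rows per group is common enough that dplyr offers `count()` as a shortcut for `group_by()` followed by `summarise(n())`. The sketch below compares both on a hypothetical data frame `toy`; note that `count()` names its result column `n` by default. Here `gender` is a character column, whereas the warning shown above refers to a factor with missing values.

```r
library(dplyr)

# Hypothetical toy data with one missing gender
toy <- data.frame(
  gender = c("male", "female", "male", NA, "male"),
  stringsAsFactors = FALSE
)

# Explicit form: group, then count the rows in each group
largo <- toy %>%
  group_by(gender) %>%
  summarise(cuantos = n())

# Shortcut: count() does the grouping and counting in one step
corto <- toy %>%
  count(gender)

corto
```

In both results the missing values form their own group, just as the `<NA>` row does in the output above.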
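# Replace missing homeworld values with "star".
# If homeworld is a factor, ifelse() returns its integer level codes (hence the
# numeric labels in the output); as.character(homeworld) would keep the names.
datos <- datos %>%
  mutate(homeworld = ifelse(is.na(homeworld), "star", homeworld))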
cuantos.Origen <- data.frame(datos %>%
  group_by(homeworld) %>%
  summarise(cuantos = n()))
cuantos.Origen <- arrange(cuantos.Origen, desc(cuantos))
cuantos.Origen
## homeworld cuantos
## 1 28 11
## 2 40 10
## 3 star 10
## 4 1 3
## 5 11 3
## 6 22 3
## 7 10 2
## 8 23 2
## 9 25 2
## 10 33 2
## 11 12 1
## 12 13 1
## 13 14 1
## 14 15 1
## 15 16 1
## 16 17 1
## 17 18 1
## 18 19 1
## 19 2 1
## 20 20 1
## 21 21 1
## 22 24 1
## 23 26 1
## 24 27 1
## 25 29 1
## 26 3 1
## 27 30 1
## 28 31 1
## 29 32 1
## 30 34 1
## 31 35 1
## 32 36 1
## 33 37 1
## 34 38 1
## 35 39 1
## 36 4 1
## 37 41 1
## 38 42 1
## 39 43 1
## 40 44 1
## 41 45 1
## 42 46 1
## 43 47 1
## 44 48 1
## 45 5 1
## 46 6 1
## 47 7 1
## 48 8 1
## 49 9 1
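The numeric labels in the homeworld column are consistent with `homeworld` having been read as a factor: `ifelse()` then returns the integer level codes instead of the level names. The sketch below reproduces the effect in base R with a hypothetical factor `planeta`, and shows that converting to character first keeps the names.

```r
# Hypothetical factor with one missing value; levels are Naboo (1) and Tatooine (2)
planeta <- factor(c("Naboo", NA, "Tatooine"))

# Passing the factor directly: the level codes end up in the result
con_codigos <- ifelse(is.na(planeta), "star", planeta)

# Converting to character first: the level names are preserved
con_nombres <- ifelse(is.na(planeta), "star", as.character(planeta))

con_codigos
con_nombres
```

Reading the file with `stringsAsFactors = FALSE` (the default since R 4.0) would avoid the issue at the source.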