library(tidyverse)
starwars
head(starwars$films)
## [[1]]
## [1] "The Empire Strikes Back" "Revenge of the Sith"
## [3] "Return of the Jedi" "A New Hope"
## [5] "The Force Awakens"
##
## [[2]]
## [1] "The Empire Strikes Back" "Attack of the Clones"
## [3] "The Phantom Menace" "Revenge of the Sith"
## [5] "Return of the Jedi" "A New Hope"
##
## [[3]]
## [1] "The Empire Strikes Back" "Attack of the Clones"
## [3] "The Phantom Menace" "Revenge of the Sith"
## [5] "Return of the Jedi" "A New Hope"
## [7] "The Force Awakens"
##
## [[4]]
## [1] "The Empire Strikes Back" "Revenge of the Sith"
## [3] "Return of the Jedi" "A New Hope"
##
## [[5]]
## [1] "The Empire Strikes Back" "Revenge of the Sith"
## [3] "Return of the Jedi" "A New Hope"
## [5] "The Force Awakens"
##
## [[6]]
## [1] "Attack of the Clones" "Revenge of the Sith" "A New Hope"
# Characters who appear in "Attack of the Clones"
starwars %>%
  filter(map_lgl(films, ~ "Attack of the Clones" %in% .))
# Characters who appear in both films
starwars %>%
  filter(map_lgl(films, ~ all(c("Attack of the Clones", "A New Hope") %in% .)))
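The two filters above rely on “map_lgl()” returning one TRUE/FALSE per row of the list-column. As a minimal sketch of that idea, the toy tibble below (hypothetical data, not part of starwars) has the same shape as the films column, which makes the intermediate logical vector easy to inspect.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data with a list-column, mirroring the shape of starwars$films
toy <- tibble(
  name  = c("A", "B", "C"),
  films = list(c("X", "Y"), c("Y"), c("X", "Y", "Z"))
)

# One logical value per row: does this row's vector contain "X"?
has_x <- map_lgl(toy$films, ~ "X" %in% .x)
has_x

# Keep only rows whose vector contains both "X" and "Z"
both <- toy %>%
  filter(map_lgl(films, ~ all(c("X", "Z") %in% .x)))
both
```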
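# One row per character-film pair
starwars %>%
  select(name, films) %>%
  unnest(films)
# Number of characters per film, fewest first
starwars %>%
  unnest(films) %>%
  count(films) %>%
  arrange(n)
# Keep the most common homeworlds (n = 3) and lump the rest into "Other"
starwars %>%
  filter(!is.na(homeworld)) %>%
  mutate(homeworld = fct_lump(homeworld, n = 3)) %>%
  count(homeworld) %>%
  arrange(n)
# Bar chart of characters per film, in default (alphabetical) order
starwars %>%
  unnest(films) %>%
  ggplot(aes(films)) +
  geom_bar() +
  coord_flip()
# Same chart with films ordered by frequency
starwars %>%
  unnest(films) %>%
  mutate(films = fct_infreq(films)) %>%
  ggplot(aes(films)) +
  geom_bar() +
  coord_flip()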
library(tidyr)
library(reactable)
library(wordcloud)
library(purrr)
library(tibble)
Looking at the variables in this dataset, I will create two subsets based on gender (masculine/feminine). With dplyr's “filter()”, subsetting on the value of a column takes a single line.
masc_starwars <- starwars %>%
  filter(gender == "masculine")
fem_starwars <- starwars %>%
  filter(gender == "feminine")
head(masc_starwars,3)
head(fem_starwars,3)
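One detail worth checking is that “filter()” drops rows where the condition evaluates to NA, so characters with a missing gender end up in neither subset. The following sketch counts the rows per gender value and pulls out the rows with a missing value; the object names gender_counts and missing_gender are introduced here only for illustration.

```r
library(dplyr)

# Count characters per gender value, including NA as its own group
gender_counts <- starwars %>%
  count(gender)
gender_counts

# Rows with a missing gender are excluded by both filters above
missing_gender <- starwars %>%
  filter(is.na(gender))
nrow(missing_gender)
```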
The same approach works for any species listed in this dataset. As an example, we can pull out the characters who are humans or droids.
human <- starwars %>% filter(species == "Human")
droid <- starwars %>% filter(species == "Droid")
head(human, 3)
head(droid, 3)
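When both groups are wanted in a single table, one option is to match against a vector of values with “%in%”. This is a small sketch of that variation; human_or_droid is a name chosen here for illustration.

```r
library(dplyr)

# Keep rows whose species is either "Human" or "Droid"
human_or_droid <- starwars %>%
  filter(species %in% c("Human", "Droid"))

# Check how many characters fall into each group
human_or_droid %>%
  count(species)
```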
With the “reactable()” function from the reactable package, we can generate an interactive table. In this first example, “distinct()” reduces the data to the character names, so the table lets us filter through the name of every character in the dataset. \(^1\)
starwars %>%
  distinct(name) %>%
  # Page-size choices are distinct and include the default of 5
  reactable(bordered = TRUE, striped = TRUE,
            highlight = TRUE, filterable = TRUE, showPageSizeOptions = TRUE,
            showPagination = TRUE, pageSizeOptions = c(5, 10, 25), defaultPageSize = 5)
We can also make the entire dataset searchable by omitting “distinct()” and passing starwars directly. The table now shows every column for each character, and “searchable = TRUE” adds a global search box alongside the per-column filters.
# Page-size choices are distinct and include the default of 5
reactable(starwars, searchable = TRUE, minRows = 1, bordered = TRUE, striped = TRUE,
          highlight = TRUE, filterable = TRUE, showPageSizeOptions = TRUE,
          showPagination = TRUE, pageSizeOptions = c(5, 10, 25), defaultPageSize = 5)
The purrr package from the tidyverse can be useful for cleaning data. As an example, “keep()” retains only the columns of the starwars dataset that satisfy a predicate function. \(^2\)
Numeric variables only:
num_SW <- starwars %>%
keep(is.numeric)
head(num_SW, 3)
summary(num_SW)
##      height           mass           birth_year
##  Min.   : 66.0   Min.   :  15.00   Min.   :  8.00
##  1st Qu.:167.0   1st Qu.:  55.60   1st Qu.: 35.00
##  Median :180.0   Median :  79.00   Median : 52.00
##  Mean   :174.4   Mean   :  97.31   Mean   : 87.57
##  3rd Qu.:191.0   3rd Qu.:  84.50   3rd Qu.: 72.00
##  Max.   :264.0   Max.   :1358.00   Max.   :896.00
##  NA's   :6       NA's   :28        NA's   :44
Character variables only:
# Base is.character() mirrors the is.numeric() predicate used above
char_SW <- starwars %>%
  keep(is.character)
head(char_SW, 3)
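For comparison, dplyr offers a similar column selection through “select()” combined with “where()”. The sketch below assumes a dplyr version that provides “where()” (1.0.0 or later) and simply checks that both approaches pick the same columns; num_keep and num_select are illustrative names.

```r
library(dplyr)
library(purrr)

# purrr: keep the columns for which the predicate returns TRUE
num_keep <- starwars %>%
  keep(is.numeric)

# dplyr: select the columns for which the predicate returns TRUE
num_select <- starwars %>%
  select(where(is.numeric))

# Both approaches should return the same set of column names
identical(names(num_keep), names(num_select))
```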
tidyr is a tidyverse package for tidying and cleaning data. In this example, “drop_na()” removes every row of the starwars dataset that contains at least one NA value. \(^3\)
no_more_NA <- starwars %>% drop_na()
no_more_NA
As the result shows, only 6 rows of the dataset are complete.
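Dropping every incomplete row discards most of the data, so it can help to restrict the check to the columns that matter for a given analysis. “drop_na()” accepts column names for this purpose; the sketch below uses height and mass purely as an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Drop only the rows where height or mass is missing
height_mass <- starwars %>%
  drop_na(height, mass)

# Compare the number of rows kept with the full dataset
nrow(height_mass)
nrow(starwars)
```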
Sources & Citations
\(^1\): https://glin.github.io/reactable/index.html
\(^2\): https://dcl-prog.stanford.edu/purrr-extras.html
\(^3\): https://tidyr.tidyverse.org/reference/drop_na.html