library(tidyverse)
starwars
head(starwars$films)
## [[1]]
## [1] "The Empire Strikes Back" "Revenge of the Sith"
## [3] "Return of the Jedi" "A New Hope"
## [5] "The Force Awakens"
##
## [[2]]
## [1] "The Empire Strikes Back" "Attack of the Clones"
## [3] "The Phantom Menace" "Revenge of the Sith"
## [5] "Return of the Jedi" "A New Hope"
##
## [[3]]
## [1] "The Empire Strikes Back" "Attack of the Clones"
## [3] "The Phantom Menace" "Revenge of the Sith"
## [5] "Return of the Jedi" "A New Hope"
## [7] "The Force Awakens"
##
## [[4]]
## [1] "The Empire Strikes Back" "Revenge of the Sith"
## [3] "Return of the Jedi" "A New Hope"
##
## [[5]]
## [1] "The Empire Strikes Back" "Revenge of the Sith"
## [3] "Return of the Jedi" "A New Hope"
## [5] "The Force Awakens"
##
## [[6]]
## [1] "Attack of the Clones" "Revenge of the Sith" "A New Hope"
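# Characters who appear in "Attack of the Clones"
starwars %>%
  filter(map_lgl(films, ~ "Attack of the Clones" %in% .))
# Characters who appear in both "Attack of the Clones" and "A New Hope"
starwars %>%
  filter(map_lgl(films, ~ all(c("Attack of the Clones", "A New Hope") %in% .)))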
# One row per character-film pair
starwars %>%
  select(name, films) %>%
  unnest(films)
# Number of characters per film, fewest first
starwars %>%
  unnest(films) %>%
  count(films) %>%
  arrange(n)
# Keep the most common homeworlds (n = 3) and lump the rest into "Other"
starwars %>%
  filter(!is.na(homeworld)) %>%
  mutate(homeworld = fct_lump(homeworld, n = 3)) %>%
  count(homeworld) %>%
  arrange(n)
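# Characters per film, with films in their default order
starwars %>%
  unnest(films) %>%
  ggplot(aes(films)) +
  geom_bar() +
  coord_flip()
# Same plot, with films ordered by how often they appear
starwars %>%
  unnest(films) %>%
  mutate(films = fct_infreq(films)) %>%
  ggplot(aes(films)) +
  geom_bar() +
  coord_flip()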
Let’s look at some of the attributes of the characters. I’m interested in knowing which of these folks could float. We can’t know density directly, but we can fudge it a bit by comparing height and mass. I would like to plot this relationship and categorize the results by species.
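As a rough sketch of what "comparing height and mass" could look like numerically, the snippet below computes mass per centimetre of height. This is only a crude stand-in for density, since it ignores body shape entirely, and the names `mass_ratio` and `mass_per_cm` are made up for illustration.

```r
library(dplyr)

# Crude density proxy: kilograms per centimetre of height.
# Assumption: `mass_per_cm` is an illustrative name, not a column in starwars.
mass_ratio <- starwars %>%
  filter(!is.na(height), !is.na(mass)) %>%   # drop characters missing either value
  mutate(mass_per_cm = mass / height) %>%
  select(name, species, height, mass, mass_per_cm) %>%
  arrange(desc(mass_per_cm))                 # heaviest for their height first

mass_ratio
```

A character far above the human range on this ratio is a candidate sinker, and a single extreme value at the top of the table hints at an outlier worth checking.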
Let’s begin by removing outliers. There is one very heavy character, and there are no obvious outliers in height.
hist(starwars$mass)
hist(starwars$height)
sort(starwars$mass)
## [1] 15.0 17.0 20.0 32.0 32.0 40.0 45.0 45.0 48.0 48.0
## [11] 49.0 50.0 50.0 55.0 55.0 56.2 57.0 65.0 66.0 68.0
## [21] 74.0 75.0 75.0 75.0 77.0 77.0 77.0 78.2 79.0 79.0
## [31] 79.0 79.0 80.0 80.0 80.0 80.0 80.0 80.0 82.0 82.0
## [41] 83.0 84.0 84.0 84.0 85.0 87.0 88.0 89.0 90.0 102.0
## [51] 110.0 112.0 113.0 120.0 136.0 136.0 140.0 159.0 1358.0
filter(starwars, mass == 1358)
Of course, our fat outlier is none other than Jabba the Hutt. Where’s the Rancor??? I suppose it doesn’t matter.
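Spotting Jabba by eye is easy here. For a less obvious case, one common rule of thumb (not the method used in this post) flags values more than 1.5 × IQR beyond the quartiles. A minimal sketch on a handful of the masses printed above:

```r
# A few masses taken from the sorted output above.
mass <- c(75, 77, 80, 84, 136, 159, 1358)

# 1.5 * IQR rule of thumb: flag points far outside the quartiles.
q <- quantile(mass, c(0.25, 0.75))
fence <- 1.5 * IQR(mass)
is_outlier <- mass < q[1] - fence | mass > q[2] + fence

mass[is_outlier]
```

On this small sample only 1358 is flagged. The 159 kg value stays in, which matches the judgment call made by eye.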
sw_no_jabba <- starwars %>%
  filter(name != "Jabba Desilijic Tiure")
hist(sw_no_jabba$mass)
With Jabba removed, mass looks roughly normally distributed.
There are so many species in starwars that we should show only the most common and lump the rest into a group called “Other”. Any species with only one specimen goes into “Other”.
# Species represented by at least two characters
top_species <- sw_no_jabba %>%
  group_by(species) %>%
  filter(n() >= 2)
tops <- unique(top_species$species)
# Every other species becomes "Other"
sw_no_jabba <- sw_no_jabba %>%
  mutate(species = fct_other(species, keep = tops, other_level = "Other"))
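The two-step approach above could likely be collapsed into a single call. As a hedged sketch on a toy vector (the values are invented for illustration, not taken from the data), `forcats::fct_lump_min()` lumps every level that appears fewer than `min` times:

```r
library(forcats)

# Toy species vector, invented for illustration only.
species <- c("Human", "Human", "Droid", "Droid", "Wookiee", "Hutt")

# Levels seen fewer than 2 times are collapsed into "Other".
lumped <- fct_lump_min(species, min = 2, other_level = "Other")

table(lumped)
```

How missing species values are treated may differ from the original two-step version, so it is worth checking the result before swapping this in.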
ggplot(sw_no_jabba,
       aes(x = height,
           y = mass,
           col = species)) +
  geom_point() +
  geom_smooth(method = "lm")
Assuming that humans offer a pretty good approximation of things that float, we can probably assume that these droids do not float, but that most Gungans, Mirialans, Wookiees, and Kaminoans do.