What were the three most interesting or useful things you learned this week?
- You start with ggplot(data, aes(...)) and then stack geoms, scales, and
  themes with +. Once that clicks, every plot is just a
  different combination of the same building blocks.
- The position argument completely changes a bar chart or
  histogram. Switching between "stack", "dodge",
  and "fill" answers three different questions from the same
  data: raw totals, side-by-side comparison, and percent share.
- geom_col() and geom_bar(): geom_col() uses values you already
  computed, while geom_bar() counts rows for you.
What were the three muddiest or unclear things from this week?
- fct_inorder() versus manually setting
  factor levels with factor(levels = ...).
- position = "dodge", position = "stack", and position = "fill" in
  bar charts.
These data contain counts of words spoken by characters of different species and genders in the Lord of the Rings movie trilogy. They originally come from the manyeyes data blog, which no longer exists; Jenny Bryan keeps a copy.
These datasets are in the Resources module in Canvas. The code below
loads, restructures, and cleans the data. You need to modify the
read.csv functions and adjust the path to successfully read
the data.
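The loading and restructuring step might look roughly like the sketch below. It assumes each raw file has the columns Film, Race, Female, and Male; the file name and data/ path in the comments are placeholders, and three small hand-typed tibbles stand in for the read.csv() results so the example runs on its own.

```r
library(dplyr)
library(tidyr)
library(forcats)

# In the assignment each of these would come from read.csv(), for example
#   fellowship <- read.csv("data/The_Fellowship_Of_The_Ring.csv")
# The path and file name are placeholders; adjust them to your project.
# The tibbles below are hand-typed stand-ins with the assumed raw columns.
fellowship <- tibble(
  Film = "The Fellowship Of The Ring",
  Race = c("Elf", "Hobbit", "Man"),
  Female = c(1229, 14, 0),
  Male = c(971, 3644, 1995)
)
two_towers <- tibble(
  Film = "The Two Towers",
  Race = c("Elf", "Hobbit", "Man"),
  Female = c(331, 0, 401),
  Male = c(513, 2463, 3589)
)
return_king <- tibble(
  Film = "The Return Of The King",
  Race = c("Elf", "Hobbit", "Man"),
  Female = c(183, 2, 268),
  Male = c(510, 2673, 2459)
)

lotr <- bind_rows(fellowship, two_towers, return_king) %>%
  rename(Species = Race) %>%
  # Keep films in release order instead of alphabetical order
  mutate(Film = fct_inorder(Film)) %>%
  # Reshape to one row per film, species, and gender
  pivot_longer(cols = c(Female, Male), names_to = "Gender", values_to = "Words")
```

Here fct_inorder(Film) gives the same level order as factor(Film, levels = unique(Film)); the manual version is only needed when the order you want differs from the order of first appearance.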
The resulting lotr dataset contains the following
columns:
- Film: The name of the film
- Species: The type of species (elf, hobbit, man)
- Gender: Female and male
- Words: The number of words spoken in each film,
  species, and gender
Does a certain species dominate (i.e. speak the most) the entire trilogy?
Use group_by() and summarize() to create a
data frame named lotr_species that looks like this:
# A tibble: 3 × 2
  Species total
  <chr>   <dbl>
1 Elf      3737
2 Hobbit   8796
3 Man      8712
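One way to get a table of this shape is sketched below. The lotr tibble is typed in by hand here only so the example is self-contained; in the assignment it comes from the loading chunk.

```r
library(dplyr)

# Hand-typed copy of the tidy lotr data (an assumption for this sketch)
films <- c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King")
lotr <- tibble(
  Film = factor(rep(films, each = 6), levels = films),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
)

# Collapse the 18 rows to one row per species, adding up Words
lotr_species <- lotr %>%
  group_by(Species) %>%
  summarize(total = sum(Words))

lotr_species
```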
See the plots in Canvas; each is shown in the details for this assignment.
Use ggplot() and geom_col() in the chunk
below to re-create the plot above using the lotr_species
dataset.
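A minimal sketch of such a plot follows. The target plot lives in Canvas, so the fill mapping and axis label here are guesses rather than a faithful copy; the totals are copied from the lotr_species table shown earlier.

```r
library(ggplot2)

# Totals copied from the lotr_species table
lotr_species <- data.frame(
  Species = c("Elf", "Hobbit", "Man"),
  total = c(3737, 8796, 8712)
)

# geom_col() takes bar heights straight from the precomputed total column
species_plot <- ggplot(lotr_species, aes(x = Species, y = total, fill = Species)) +
  geom_col() +
  labs(x = NULL, y = "Total words spoken")

species_plot
```

With only x = Species mapped, geom_bar() would count rows instead, so every bar in this three-row table would have a height of 1.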
Does a certain gender dominate a movie? (lolz of course it does, but still, graph it).
Use group_by() and summarize() to create a
data frame named lotr_gender_film that looks like this:
# A tibble: 6 × 3
# Groups:   Gender [2]
  Gender Film                       total
  <chr>  <fct>                      <dbl>
1 Female The Fellowship Of The Ring  1243
2 Female The Two Towers               732
3 Female The Return Of The King       453
4 Male   The Fellowship Of The Ring  6610
5 Male   The Two Towers              6565
6 Male   The Return Of The King      5642
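A sketch of the two-variable summary, again with a hand-typed lotr tibble standing in for the one built by the loading chunk. The .groups = "drop_last" argument only makes dplyr's default explicit, which is why the result stays grouped by Gender.

```r
library(dplyr)

# Hand-typed copy of the tidy lotr data (an assumption for this sketch)
films <- c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King")
lotr <- tibble(
  Film = factor(rep(films, each = 6), levels = films),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
)

# One row per gender and film; the last grouping variable (Film) is dropped
lotr_gender_film <- lotr %>%
  group_by(Gender, Film) %>%
  summarize(total = sum(Words), .groups = "drop_last")

lotr_gender_film
```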
Run the code chunk below to see a plot.
Use ggplot() and geom_col() in the chunk
below to re-create the plot above using the
lotr_gender_film dataset.
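The sketch below shows one plausible layout (films on the x-axis, filled by gender) and uses it to compare the three position settings. Which of them matches the plot in Canvas is an assumption; the totals are copied from the lotr_gender_film table.

```r
library(ggplot2)

films <- c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King")

# Totals copied from the lotr_gender_film table
lotr_gender_film <- data.frame(
  Gender = rep(c("Female", "Male"), each = 3),
  Film = factor(rep(films, times = 2), levels = films),
  total = c(1243, 732, 453, 6610, 6565, 5642)
)

base_plot <- ggplot(lotr_gender_film, aes(x = Film, y = total, fill = Gender))

dodged <- base_plot + geom_col(position = "dodge")   # side-by-side comparison
stacked <- base_plot + geom_col(position = "stack")  # raw totals per film
filled <- base_plot + geom_col(position = "fill")    # share of each film's words

dodged
```

Printing stacked or filled instead of dodged shows the same six numbers answering a different question.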
Does the dominant species differ across the three movies?
There’s no recreation here—you have all the code you need above.
Hint: you’ll need to group by Species and
Film.
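Following the hint, a sketch of the summary and a dodged bar chart might look like this. The plot layout is one choice among several, and the lotr tibble is hand-typed so the example runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hand-typed copy of the tidy lotr data (an assumption for this sketch)
films <- c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King")
lotr <- tibble(
  Film = factor(rep(films, each = 6), levels = films),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
)

# One row per species and film
lotr_species_film <- lotr %>%
  group_by(Species, Film) %>%
  summarize(total = sum(Words), .groups = "drop")

# Species side by side within each film
species_film_plot <- ggplot(lotr_species_film,
                            aes(x = Film, y = total, fill = Species)) +
  geom_col(position = "dodge")

species_film_plot
```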
Hobbits dominate The Fellowship of the Ring. Men take over in The Two Towers and The Return of the King (only narrowly ahead of Hobbits in the latter), which matches the story moving from the Shire toward Rohan and Gondor. Elves are the smallest group in the last two films, but in The Fellowship of the Ring they speak slightly more than Men.
Create a plot that visualizes the number of words spoken by species,
gender, and film simultaneously. Use the complete tidy lotr
data frame. You don’t need to create a new summarized dataset (with
group_by(Species, Gender, Film)) because the original data
already have a row for each of those (you could make a summarized
dataset, but it would be identical to the full version).
You need to show Species, Gender, and
Film at the same time, but you only have two possible
aesthetics (x and fill), so you’ll also need
to facet by the third. Play around with different combinations (e.g. try
x = Species, then x = Film) until you find one
that tells the clearest story.
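As a starting point, the sketch below puts Species on the x-axis, fills by Gender, and facets by Film. This is just one of the combinations suggested above, not necessarily the clearest one, and the lotr tibble is hand-typed so the example is self-contained.

```r
library(dplyr)
library(ggplot2)

# Hand-typed copy of the tidy lotr data (an assumption for this sketch)
films <- c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King")
lotr <- tibble(
  Film = factor(rep(films, each = 6), levels = films),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
)

# x and fill carry two variables; facet_wrap() carries the third
full_plot <- ggplot(lotr, aes(x = Species, y = Words, fill = Gender)) +
  geom_col(position = "dodge") +
  facet_wrap(vars(Film))

full_plot
```

Swapping the roles, for example aes(x = Film, ...) with facet_wrap(vars(Species)), only requires changing those two arguments.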
I rebuilt the species, gender, and film plot following Tufte’s “use less ink” principle. I dropped the grey panel background, removed the vertical gridlines, and kept only soft horizontal gridlines so the eye can still read totals without visual noise. The bars were narrowed slightly so each gender pair reads as a distinct unit. A calmer red and blue palette replaces the saturated default colors while still separating Female and Male clearly. The title now states the conclusion itself, so the reader sees the story before reading any axis label. Less decoration, more meaning.
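The changes described above could be sketched as follows. The hex colors, bar width, and title text are illustrative placeholders rather than the values actually used, and the lotr tibble is hand-typed so the example runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hand-typed copy of the tidy lotr data (an assumption for this sketch)
films <- c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King")
lotr <- tibble(
  Film = factor(rep(films, each = 6), levels = films),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
)

tufte_plot <- ggplot(lotr, aes(x = Species, y = Words, fill = Gender)) +
  # Slightly narrower bars so each gender pair reads as one unit
  geom_col(position = "dodge", width = 0.7) +
  facet_wrap(vars(Film)) +
  # Muted red and blue; placeholder hex values
  scale_fill_manual(values = c(Female = "#B5524A", Male = "#4A78A8")) +
  # Placeholder title that states a conclusion instead of naming the axes
  labs(title = "Male characters speak most of the words in every film",
       x = NULL, y = "Words spoken", fill = NULL) +
  # theme_minimal() drops the grey panel background
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),                 # no vertical gridlines
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "grey90")   # soft horizontal lines
  )

tufte_plot
```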