Task 1: Weekly check-in

What were the three most interesting or useful things you learned this week?

  1. ggplot is layered. You start with ggplot(data, aes(...)) and then stack geoms, scales, and themes with +. Once that clicks, every plot is just a different combination of the same building blocks.
  2. The position argument completely changes a bar or histogram. Switching between "stack", "dodge", and "fill" answers three different questions from the same data: raw totals, side-by-side comparison, and percent share.
  3. The difference between geom_col() and geom_bar(): geom_col() uses values you already computed, while geom_bar() counts rows for you.
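To make the three points above concrete, here is a minimal sketch on a hypothetical toy data frame (pets, kind, and owner are invented names, not part of the assignment data). It draws the same bars once with geom_bar() and once with geom_col(), then reuses one base plot with three position settings.

```r
library(ggplot2)

# Hypothetical toy data: one row per pet (not the LOTR data)
pets <- data.frame(
  kind  = c("cat", "cat", "dog", "dog", "dog", "fish"),
  owner = c("A", "B", "A", "A", "B", "B")
)

# geom_bar() counts the rows in each kind/owner group for you
p_bar <- ggplot(pets, aes(x = kind, fill = owner)) +
  geom_bar()

# geom_col() needs the bar heights to exist already, so count first
pet_counts <- as.data.frame(
  table(kind = pets$kind, owner = pets$owner),
  responseName = "n"
)

# Layering: one base plot, then a different geom_col() layer each time
base <- ggplot(pet_counts, aes(x = kind, y = n, fill = owner))
p_stack <- base + geom_col(position = "stack")  # raw totals
p_dodge <- base + geom_col(position = "dodge")  # side-by-side comparison
p_fill  <- base + geom_col(position = "fill")   # share within each kind

print(p_dodge)
```

p_bar and p_stack should show the same picture, since geom_bar() stacks by default and the counts are identical.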

What were the three muddiest or most unclear things from this week?

  1. When to use fct_inorder() versus manually setting factor levels with factor(levels = ...).
  2. The difference between position = "dodge", position = "stack", and position = "fill" in bar charts.
  3. Mosaic plots, treemaps, and parallel sets all show part-to-whole relationships, but it is not obvious which one to reach for in practice.
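On the first muddy point, a small sketch may help. It compares the level order produced by a plain factor(), by fct_inorder(), and by factor(levels = ...) on a short vector of film titles. The vector and the variable names are made up for illustration.

```r
library(forcats)

# Film titles in release order, with one repeat
films <- c("The Fellowship Of The Ring", "The Two Towers",
           "The Return Of The King", "The Two Towers")

# Plain factor(): levels are sorted alphabetically
alphabetical <- factor(films)

# fct_inorder(): levels follow first appearance in the data
by_appearance <- fct_inorder(films)

# factor(levels = ...): levels follow an order you type out by hand
by_hand <- factor(films, levels = c("The Return Of The King",
                                    "The Two Towers",
                                    "The Fellowship Of The Ring"))

levels(alphabetical)
levels(by_appearance)
levels(by_hand)
```

A rough rule of thumb, not a hard rule: fct_inorder() is convenient when the rows already arrive in the order you want on the axis, and explicit levels are the fallback when the order you want does not appear in the data.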

Task 2: Lord of the Rings

Data description and cleaning

These data contain counts of words spoken by characters of different species and genders in the Lord of the Rings movie trilogy. They originally come from the manyeyes data blog, which no longer exists; Jenny Bryan has a copy here.

These datasets are in the Resources module in Canvas. The code below loads, restructures, and cleans the data. You need to modify the read.csv() calls and adjust the file paths so the data load successfully.

The resulting lotr dataset contains the following columns:

  • Film: The name of the film
  • Species: The type of species (elf, hobbit, man)
  • Gender: Female and male
  • Words: The number of words spoken in each film, species, and gender
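The loading chunk itself is not shown in this write-up, so the following is only a sketch of how a wide table could be reshaped into these four columns. It assumes the raw files are wide, with columns Film, Race, Female, and Male. The data frame below is a hand-typed stand-in for the three CSVs bound together, and its cell values are an assumption that should be checked against the real files.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Stand-in for the three CSVs after binding them (assumed layout and values)
lotr_wide <- data.frame(
  Film   = rep(c("The Fellowship Of The Ring", "The Two Towers",
                 "The Return Of The King"), each = 3),
  Race   = rep(c("Elf", "Hobbit", "Man"), times = 3),
  Female = c(1229, 14, 0, 331, 0, 401, 183, 2, 268),
  Male   = c(971, 3644, 1995, 513, 2463, 3589, 510, 2673, 2459)
)

lotr <- lotr_wide %>%
  rename(Species = Race) %>%              # match the column name used here
  mutate(Film = fct_inorder(Film)) %>%    # keep films in release order
  pivot_longer(cols = c(Female, Male),    # one row per film/species/gender
               names_to = "Gender",
               values_to = "Words")

head(lotr)
```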

Species

Does a certain species dominate (i.e. speak the most) the entire trilogy?

Recreate data summary

Use group_by() and summarize() to create a data frame named lotr_species that looks like this:

# A tibble: 3 × 2
  Species total
  <chr>   <dbl>
1 Elf      3737
2 Hobbit   8796
3 Man      8712
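One way to get this summary, sketched on a hand-typed stand-in for lotr so the chunk runs on its own. The cell values are an assumption (check them against the real CSVs); they were picked so the totals agree with the table above.

```r
library(dplyr)

# Hand-typed stand-in for the cleaned lotr data (assumed cell values)
lotr <- data.frame(
  Film = rep(c("The Fellowship Of The Ring", "The Two Towers",
               "The Return Of The King"), each = 6),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
)

# One row per species, adding up Words across films and genders
lotr_species <- lotr %>%
  group_by(Species) %>%
  summarize(total = sum(Words))

lotr_species
```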

Recreate plot

See the plots in Canvas; each is shown in the details for this assignment.

Use ggplot() and geom_col() in the chunk below to re-create the plot above using the lotr_species dataset.
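The target plot lives in Canvas and is not reproduced in this write-up, so the styling below is a guess. This sketch types in the lotr_species totals from the summary table and maps them to bar heights with geom_col().

```r
library(ggplot2)

# Totals copied from the lotr_species summary in this assignment
lotr_species <- data.frame(
  Species = c("Elf", "Hobbit", "Man"),
  total   = c(3737, 8796, 8712)
)

# geom_col() uses the precomputed totals as bar heights
p_species <- ggplot(lotr_species, aes(x = Species, y = total, fill = Species)) +
  geom_col() +
  labs(x = "Species", y = "Words spoken")

print(p_species)
```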

Gender and film

Does a certain gender dominate a movie? (lolz of course it does, but still, graph it).

Recreate data summary

Use group_by() and summarize() to create a data frame named lotr_gender_film that looks like this:

# A tibble: 6 × 3
# Groups:   Gender [2]
  Gender Film                       total
  <chr>  <fct>                      <dbl>
1 Female The Fellowship Of The Ring  1243
2 Female The Two Towers               732
3 Female The Return Of The King       453
4 Male   The Fellowship Of The Ring  6610
5 Male   The Two Towers              6565
6 Male   The Return Of The King      5642
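A sketch of one way to build this table, again on a hand-typed stand-in for lotr (assumed cell values, chosen to agree with the totals shown). Grouping by Gender first and Film second is what leaves the result grouped by Gender, as in the printed header.

```r
library(dplyr)
library(forcats)

# Hand-typed stand-in for the cleaned lotr data (assumed cell values)
lotr <- data.frame(
  Film = rep(c("The Fellowship Of The Ring", "The Two Towers",
               "The Return Of The King"), each = 6),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
) %>%
  mutate(Film = fct_inorder(Film))  # films in release order, not alphabetical

# summarize() drops the last grouping variable (Film), so Gender remains
lotr_gender_film <- lotr %>%
  group_by(Gender, Film) %>%
  summarize(total = sum(Words))

lotr_gender_film
```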

Recreate plot

Run the code chunk below to see a plot.

Use ggplot() and geom_col() in the chunk below to re-create the plot above using the lotr_gender_film dataset.
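The reference plot is not visible in this write-up, so this is a guess at its shape: dodged bars with Film on the x-axis and Gender as the fill. The totals are typed in from the lotr_gender_film table above.

```r
library(ggplot2)

films <- c("The Fellowship Of The Ring", "The Two Towers",
           "The Return Of The King")

# Totals copied from the lotr_gender_film summary in this assignment
lotr_gender_film <- data.frame(
  Gender = rep(c("Female", "Male"), each = 3),
  Film   = factor(rep(films, times = 2), levels = films),
  total  = c(1243, 732, 453, 6610, 6565, 5642)
)

# position = "dodge" puts Female and Male side by side within each film
p_gender_film <- ggplot(lotr_gender_film,
                        aes(x = Film, y = total, fill = Gender)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Words spoken")

print(p_gender_film)
```

Swapping "dodge" for "fill" would show each gender's share of a film instead of the raw totals.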

Species and film

Does the dominant species differ across the three movies?

There’s no recreation here—you have all the code you need above. Hint: you’ll need to group by Species and Film.
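A sketch of the grouping the hint describes, run on a hand-typed stand-in for lotr. The cell values are an assumption; they agree with the species and gender totals shown earlier but should be checked against the real CSVs.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hand-typed stand-in for the cleaned lotr data (assumed cell values)
lotr <- data.frame(
  Film = rep(c("The Fellowship Of The Ring", "The Two Towers",
               "The Return Of The King"), each = 6),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
) %>%
  mutate(Film = fct_inorder(Film))

# Add up Words across genders for every species/film pair
lotr_species_film <- lotr %>%
  group_by(Species, Film) %>%
  summarize(total = sum(Words), .groups = "drop")

# Dodged bars make the within-film ranking easy to read
p_species_film <- ggplot(lotr_species_film,
                         aes(x = Film, y = total, fill = Species)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Words spoken")

print(p_species_film)
```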

Hobbits dominate The Fellowship of the Ring. Men take over in The Two Towers and, by a much narrower margin, in The Return of the King, which matches the story moving from the Shire toward Rohan and Gondor. Elves are the smallest group in the last two films, but in The Fellowship of the Ring they speak more than Men.

Species and gender and film

Create a plot that visualizes the number of words spoken by species, gender, and film simultaneously. Use the complete tidy lotr data frame. You don’t need to create a new summarized dataset (with group_by(Species, Gender, Film)) because the original data already have a row for each of those (you could make a summarized dataset, but it would be identical to the full version).

You need to show Species, Gender, and Film at the same time, but you only have two possible aesthetics (x and fill), so you’ll also need to facet by the third. Play around with different combinations (e.g. try x = Species, then x = Film) until you find one that tells the clearest story.
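One of the combinations described above, as a sketch: Species on x, Gender as fill, and one facet per Film. The lotr data frame is a hand-typed stand-in with assumed cell values, so the chunk runs without the CSVs.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hand-typed stand-in for the cleaned lotr data (assumed cell values)
lotr <- data.frame(
  Film = rep(c("The Fellowship Of The Ring", "The Two Towers",
               "The Return Of The King"), each = 6),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
) %>%
  mutate(Film = fct_inorder(Film))

# x and fill carry two variables; facet_wrap() carries the third
p_all <- ggplot(lotr, aes(x = Species, y = Words, fill = Gender)) +
  geom_col(position = "dodge") +
  facet_wrap(vars(Film))

print(p_all)
```

To try the other arrangement, swap the roles: aes(x = Film, ...) with facet_wrap(vars(Species)).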

Task 3: Extension

  1. Copy the code for one of your plots above and paste it into the chunk below. Apply the concepts described in the Tufte readings to enhance the plot (e.g. annotation, color use, coordinate axes, adding a new geom, refining the appearance of a theme, etc). Importantly, your solution should apply Edward Tufte’s adage “Use less ink” to plot data without any superfluous pixels. Hint: using different ggplot themes can help! This is your chance to demonstrate your knowledge of ggplot() functions and arguments.

  2. Write 75-100 words that summarize your application of Tufte to the plot you made above.

I rebuilt the species, gender, and film plot following Tufte’s “use less ink” principle. I dropped the grey panel background, removed the vertical gridlines, and kept only soft horizontal gridlines so the eye can still read totals without visual noise. The bars were narrowed slightly so each gender pair reads as a distinct unit. A calmer red and blue palette replaces the saturated default colors while still separating Female and Male clearly. The title now states the conclusion itself, so the reader sees the story before reading any axis label. Less decoration, more meaning.
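The changes described above could look roughly like the sketch below. The hex colors, bar widths, and title wording are placeholders chosen for illustration, not necessarily the ones in the submitted plot, and lotr is a hand-typed stand-in with assumed cell values.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hand-typed stand-in for the cleaned lotr data (assumed cell values)
lotr <- data.frame(
  Film = rep(c("The Fellowship Of The Ring", "The Two Towers",
               "The Return Of The King"), each = 6),
  Species = rep(rep(c("Elf", "Hobbit", "Man"), each = 2), times = 3),
  Gender = rep(c("Female", "Male"), times = 9),
  Words = c(1229, 971, 14, 3644, 0, 1995,
            331, 513, 0, 2463, 401, 3589,
            183, 510, 2, 2673, 268, 2459)
) %>%
  mutate(Film = fct_inorder(Film))

p_tufte <- ggplot(lotr, aes(x = Species, y = Words, fill = Gender)) +
  # Narrower bars so each Female/Male pair reads as one unit
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  facet_wrap(vars(Film)) +
  # Placeholder muted red and blue instead of the default palette
  scale_fill_manual(values = c(Female = "#B5534C", Male = "#4C78A8")) +
  # Placeholder title that states a conclusion rather than naming the axes
  labs(title = "Male characters speak most of the words in every film",
       x = NULL, y = "Words spoken", fill = NULL) +
  theme_minimal() +  # no grey panel background
  theme(
    panel.grid.major.x = element_blank(),                 # no vertical gridlines
    panel.grid.minor   = element_blank(),
    panel.grid.major.y = element_line(colour = "grey90")  # soft horizontal lines
  )

print(p_tufte)
```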