suppressPackageStartupMessages(library("tidyverse"))
package ‘tidyverse’ was built under R version 3.6.3
suppressPackageStartupMessages(library("nycflights13"))
package ‘nycflights13’ was built under R version 3.6.3
# These additional packages provide functions used in answering some of the questions.
suppressPackageStartupMessages(library("viridis"))
package ‘viridis’ was built under R version 3.6.2

1. How could you rescale the count dataset above to more clearly show the distribution of cut within color, or color within cut?

To clearly show the distribution of cut within color, calculate a new variable, prop, which is the proportion of each cut within a color. This is done using a grouped mutate.

diamonds %>%
  count(color, cut) %>%
  group_by(color) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = prop)) +
  scale_fill_viridis(limits = c(0, 1)) # from the viridis colour palette library
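As a quick sanity check on the grouped mutate, the proportions should sum to 1 within each color. The sketch below is not part of the original answer; the names props and totals are introduced here purely for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Same grouped mutate as above, kept as a data frame instead of being plotted
props <- diamonds %>%
  count(color, cut) %>%
  group_by(color) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# One row per color; total should be 1 for every color
totals <- props %>%
  group_by(color) %>%
  summarise(total = sum(prop))
totals
```

Grouping by cut instead of color in both places would give the corresponding check for the distribution of color within cut.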

Similarly, to scale by the distribution of color within cut,

diamonds %>%
  count(color, cut) %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = prop)) +
  scale_fill_viridis(limits = c(0, 1))

I add limits = c(0, 1) to fix the color scale to the interval [0, 1], the logical bounds of a proportion. This makes it possible to read each cell against its absolute value, and improves comparisons across multiple plots. However, it compresses the range of colors actually used, which makes it harder to compare cells within the dataset. Using the default limits (the minimum and maximum of the observed values) emphasizes relative differences and makes comparisons within the dataset easier, but makes comparisons across datasets harder.
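To see the trade-off, here is a sketch of the same plot with the default limits, which stretch the palette over the observed range of prop only. The range() call shows how much of the [0, 1] interval the data actually occupy. The names props and p_default are assumptions made for this example.

```r
library(dplyr)
library(ggplot2)
library(viridis)

props <- diamonds %>%
  count(color, cut) %>%
  group_by(color) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Observed minimum and maximum proportion; these become the default scale limits
range(props$prop)

# No limits argument: the palette spans only the observed range
p_default <- ggplot(props, mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = prop)) +
  scale_fill_viridis()
p_default
```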

2. Use geom_tile() together with dplyr to explore how average flight delays vary by destination and month of year. What makes the plot difficult to read? How could you improve it?

flights %>%
  group_by(month, dest) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  ggplot(aes(x = factor(month), y = dest, fill = dep_delay)) +
  geom_tile() +
  labs(x = "Month", y = "Destination", fill = "Departure Delay")

Several things could be done to improve it:

  • sort destinations by a meaningful quantity (distance, number of flights, average delay)
  • remove missing values
  • better color scheme (viridis)

Deciding how to treat missing values is difficult. In this case, missing values correspond to airports that don’t have regular flights from NYC, where regular means at least one flight each month. These are likely smaller airports, whose averages have higher variance because they are based on fewer observations.
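One way to see which destinations are affected is to count the number of distinct months in which each destination appears. This is only an illustrative sketch; irregular and n_months are names chosen here, not part of the original answer.

```r
library(dplyr)
library(nycflights13)

# Destinations that appear in fewer than 12 months of the year
irregular <- flights %>%
  distinct(dest, month) %>%
  count(dest, name = "n_months") %>%
  filter(n_months < 12) %>%
  arrange(n_months)
irregular
```

These are the destinations that the filter(n() == 12) step in the next pipeline drops.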

flights %>%
  group_by(month, dest) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  group_by(dest) %>%
  filter(n() == 12) %>%
  ungroup() %>%
  mutate(dest = reorder(dest, dep_delay)) %>%
  ggplot(aes(x = factor(month), y = dest, fill = dep_delay)) +
  geom_tile() +
  scale_fill_viridis() +
  labs(x = "Month", y = "Destination", fill = "Departure Delay")
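The plot above sorts destinations by average delay, since reorder() uses the mean of its second argument by default. As a sketch of one of the other orderings suggested earlier, the variant below sorts by the total number of flights to each destination instead. The column n_flights and the object monthly are names introduced for this example.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)
library(viridis)

monthly <- flights %>%
  group_by(dest) %>%
  mutate(n_flights = n()) %>%            # total flights per destination over the year
  group_by(month, dest, n_flights) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  group_by(dest) %>%
  filter(n() == 12) %>%                  # keep destinations present in every month
  ungroup() %>%
  mutate(dest = reorder(dest, n_flights))

ggplot(monthly, aes(x = factor(month), y = dest, fill = dep_delay)) +
  geom_tile() +
  scale_fill_viridis() +
  labs(x = "Month", y = "Destination", fill = "Departure Delay")
```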

3. Why is it slightly better to use aes(x = color, y = cut) rather than aes(x = cut, y = color) in the example above?

It’s usually better to use the categorical variable with a larger number of categories or the longer labels on the y axis. If at all possible, labels should be horizontal because that is easier to read.

However, switching the order doesn’t result in overlapping labels.

diamonds %>%
  count(color, cut) %>%
  ggplot(mapping = aes(y = color, x = cut)) +
  geom_tile(mapping = aes(fill = n))

Another justification for preferring x = color and y = cut is that the larger numbers are then at the top of the plot, which lowers the cognitive burden of interpreting it.
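Both arguments can be checked against the data. The sketch below, which is an illustration rather than part of the original answer, compares the label lengths of the two variables and lists the counts per cut in level order. On a discrete y axis ggplot2 draws the first factor level at the bottom, so the last level ends up at the top. cut_counts is a name introduced here.

```r
library(dplyr)
library(ggplot2)

# Number of categories and longest label for each variable
nlevels(diamonds$color)
max(nchar(levels(diamonds$color)))
nlevels(diamonds$cut)
max(nchar(levels(diamonds$cut)))

# Counts per cut, listed in factor-level order (bottom to top on a y axis)
cut_counts <- diamonds %>%
  count(cut)
cut_counts
```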
