Adapted from a lab written by Prof Charlotte Wickham
To experiment with some features of scales, we are going to work with a heatmap of the proportion of flights that have a departure delay of more than 15 minutes (prop_over_15) at the George Bush Intercontinental Airport (IAH) by day of the week (DayOfWeek) and departure hour (DepHour).
These data come from the hflights package, but some summarization is done for you.
library(tidyverse)
iah <- read_csv("http://vis.cwick.co.nz/data/iah-summary.csv")
str(iah)
## tibble [154 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ DepHour : num [1:154] 0 0 0 0 0 0 0 1 1 1 ...
## $ DayOfWeek : chr [1:154] "Mon" "Tue" "Wed" "Thu" ...
## $ avg_delay : num [1:154] 187.6 174.5 173 196.2 31.1 ...
## $ avg_delay_delayed: num [1:154] 187.6 174.5 173 196.2 31.1 ...
## $ prop_over_15 : num [1:154] 1 1 1 1 0.31 ...
## $ nflights : num [1:154] 7 6 10 17 29 3 7 1 3 5 ...
## $ ndests : num [1:154] 5 6 9 14 5 1 7 1 3 5 ...
## - attr(*, "spec")=
## .. cols(
## .. DepHour = col_double(),
## .. DayOfWeek = col_character(),
## .. avg_delay = col_double(),
## .. avg_delay_delayed = col_double(),
## .. prop_over_15 = col_double(),
## .. nflights = col_double(),
## .. ndests = col_double()
## .. )
# make sure days of week are displayed in the right order
iah$DayOfWeek <- factor(iah$DayOfWeek,
  levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
p <- ggplot(iah, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = prop_over_15))
p
## Warning: Removed 7 rows containing missing values (geom_tile).
scale_fill_xxx
We are interested in changing the scale for the fill of the tiles, so we should add a scale_fill_xxx function, where xxx is replaced by a scale name. (If you are working with points and have mapped color to a variable, you would use scale_color_xxx.) When the fill aesthetic is mapped to a continuous variable, the default is scale_fill_gradient.
p + scale_fill_gradient()
## Warning: Removed 7 rows containing missing values (geom_tile).
We can change the colors at the endpoints of the scale by specifying the high and low arguments, for example,
p + scale_fill_gradient("Proportion",
  high = "white", low = "springgreen4")
## Warning: Removed 7 rows containing missing values (geom_tile).
Try different named colors for the high and low arguments to scale_fill_gradient. You can see a list of the named colors with colors(), or a useful .pdf is at: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
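As a small sketch of how you might browse the named colors from the console rather than the PDF, the following uses only base R. The search term "green" is just an illustrative choice.

```r
# colors() returns the names R recognises as color strings
all_colors <- colors()
head(all_colors)

# Search the names for a hue you are interested in ("green" is an arbitrary example)
greens <- grep("green", all_colors, value = TRUE)
greens

# Check that a name is valid before passing it to low = or high =
"springgreen4" %in% all_colors

# Look up the red/green/blue components behind a name
col2rgb(c("white", "springgreen4"))
```

Any name returned by colors() can be passed as a string to the low and high arguments.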
It’s surprisingly hard using scale_fill_gradient to find “nice” color gradients, and there is no guarantee they’ll correspond to any perceptually sound palettes like those you’ve seen in the readings.
# install.packages("colorspace")
library(colorspace)
colorspace has a whole set of named palettes. You can see them all with:
hcl_palettes(plot = TRUE)
Notice they are broken into sections for qualitative, sequential and diverging palettes.
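If you want to inspect a single palette rather than the full plot, one option (a sketch, assuming the colorspace package is installed) is to look it up by name and pull out a handful of its colors as hex codes. The choice of 5 colors is arbitrary.

```r
library(colorspace)

# Show the HCL parameters behind one named palette
hcl_palettes(name = "Mint")

# Extract 5 colors from the "Mint" sequential palette as hex strings
mint <- sequential_hcl(5, palette = "Mint")
mint
```

The hex codes can be handy if you need to reuse the same colors outside of a ggplot2 scale.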
To apply a colorspace palette to a ggplot2 plot, you use a function that takes the form:
scale_<aesthetic>_<datatype>_<colorscale>()
where <datatype> is one of discrete or continuous, and <colorscale> is one of qualitative, sequential, or diverging. You pass the palette name as the argument.
So, for instance, to add the "Mint" palette, you could use:
p + scale_fill_continuous_sequential("Mint")
## Warning: Removed 7 rows containing missing values (geom_tile).
I’ve used fill because our original plot mapped prop_over_15 to fill, continuous because prop_over_15 is continuous, and sequential because "Mint" is a sequential scale.
Try 3-4 other colorspace sequential scales.
You could use a diverging scale, but since colorspace assumes zero is the midpoint you’ll only get one half of the palette:
p + scale_fill_continuous_diverging("Blue-Red")
## Warning: Removed 7 rows containing missing values (geom_tile).
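If you did want both halves of a diverging palette, one possibility is the mid argument of scale_fill_continuous_diverging, which moves the midpoint away from zero. The sketch below uses a small made-up data frame (toy) as a stand-in for iah so that it runs on its own; the values and the midpoint of 0.5 are arbitrary choices for illustration, not a recommendation for these data.

```r
library(ggplot2)
library(colorspace)

# Made-up stand-in for the lab's `iah` data: one tile per hour and day
toy <- expand.grid(
  DepHour = 0:23,
  DayOfWeek = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
)
toy$prop_over_15 <- toy$DepHour / 23  # arbitrary values between 0 and 1

p_toy <- ggplot(toy, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = prop_over_15))

# Centre the diverging palette on 0.5 instead of the default 0
p_mid <- p_toy + scale_fill_continuous_diverging("Blue-Red", mid = 0.5)
p_mid
```

A diverging scale only makes sense when the midpoint is meaningful, so ask yourself whether 0.5 (or any other value) is a natural reference point for a proportion of delayed flights.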
Let’s look at the average departure delay of delayed flights:
ggplot(iah, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = avg_delay_delayed)) +
  scale_fill_gradient()
## Warning: Removed 7 rows containing missing values (geom_tile).
Why is this plot so uninformative?
Try limiting the fill scale to only show values between 0 and 120:
ggplot(iah, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = avg_delay_delayed)) +
  scale_fill_gradient(limits = c(0, 120))
## Warning: Removed 7 rows containing missing values (geom_tile).
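By default, values outside the limits are treated as missing and drawn in the scale's NA color. The sketch below uses the scales package (installed alongside ggplot2) to show that default behaviour, censor, next to an alternative, squish, which clamps values to the nearest limit. The delays vector is made up for illustration.

```r
library(scales)

delays <- c(5, 45, 120, 300)  # made-up average delays, in minutes
limits <- c(0, 120)

# Default out-of-bounds handling: values outside the limits become NA
censored <- censor(delays, limits)
censored   # 5 45 120 NA

# Alternative: clamp out-of-bounds values to the nearest limit
squished <- squish(delays, limits)
squished   # 5 45 120 120
```

To try the clamping behaviour in the plot, you could pass oob = scales::squish to scale_fill_gradient along with limits. The tiles above 120 would then take the color at the top of the scale rather than appearing as missing.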
Instead of using limits, try using trans = "log10". What happens?
Compare that to mapping fill to the log transformed average delay of delayed flights:
ggplot(iah, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = log10(avg_delay_delayed))) +
  scale_fill_gradient()
## Warning: Removed 7 rows containing missing values (geom_tile).
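One way to compare the two approaches side by side is with a tiny made-up data frame, so the differences are easy to spot. The toy data and its delay values below are assumptions for illustration only. Note that recent versions of ggplot2 (3.5.0 and later) call this argument transform, and trans may produce a deprecation message there.

```r
library(ggplot2)

# Made-up data: four tiles with delays spanning several orders of magnitude
toy <- data.frame(x = 1:4, y = 1, delay = c(5, 20, 80, 900))

# Approach 1: transform the scale; the data stay in their original units
p_trans <- ggplot(toy, aes(x, y)) +
  geom_tile(aes(fill = delay)) +
  scale_fill_gradient(trans = "log10")

# Approach 2: transform the data before mapping them to fill
p_logged <- ggplot(toy, aes(x, y)) +
  geom_tile(aes(fill = log10(delay))) +
  scale_fill_gradient()

p_trans
p_logged
```

When you print both, compare the tile colors and then look closely at the legend titles and labels.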
Other transformations are listed in the “See Also” section of ?trans_new.
An alternative approach to deal with a few very large numbers is to turn the continuous variable into a discrete one by binning it:
iah <- iah %>%
  mutate(avg_delay_cut = cut(avg_delay, breaks = c(-5, 0, 15, 30, 60, 1000)))
Then you’ll need to use the discrete form of the scale:
ggplot(iah, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = avg_delay_cut)) +
  scale_fill_discrete_sequential("Mint")
## Warning: Removed 7 rows containing missing values (geom_tile).
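To see what the binning step above produces, it can help to run cut() on a few values by hand. The delays vector here is made up; the breaks are the ones used above.

```r
delays <- c(-2, 0, 10, 45, 200)          # made-up delays, in minutes
breaks <- c(-5, 0, 15, 30, 60, 1000)

# Intervals are open on the left and closed on the right, so 0 falls in (-5,0]
binned <- cut(delays, breaks = breaks)
table(binned)

# The last label prints as (60,1e+03] by default;
# dig.lab = 4 asks for enough digits to show 1000 in full
binned_wide <- cut(delays, breaks = breaks, dig.lab = 4)
levels(binned_wide)
```

The level labels are what appear in the legend, so tidying them here (or supplying your own with the labels argument of cut()) also tidies the plot.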
Experiment: save one of your plots so far as a PNG file, upload it at http://hclwizard.org/cvdemulator/, and click on the “All” tab to see how it may appear to someone with each of the different color vision deficiencies. Are the colors still distinguishable?
Guides control how legends appear. Take a look at http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/ and the examples at https://ggplot2.tidyverse.org/reference/guide_colourbar.html and https://ggplot2.tidyverse.org/reference/guide_legend.html to see what can be changed.
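As a starting point, here is a minimal sketch of adjusting a continuous colorbar and a discrete legend. It again uses a made-up toy data frame in place of iah, and the titles, bin breaks and legend position are arbitrary choices for illustration.

```r
library(ggplot2)

# Made-up stand-in for the lab's `iah` data
toy <- expand.grid(
  DepHour = 0:23,
  DayOfWeek = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
)
toy$prop_over_15 <- toy$DepHour / 23  # arbitrary values between 0 and 1

# Continuous fill: set the colorbar title and flip its direction
p_bar <- ggplot(toy, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = prop_over_15)) +
  guides(fill = guide_colourbar(title = "Proportion delayed", reverse = TRUE))

# Discrete fill: lay the legend keys out in a single row below the plot
toy$prop_cut <- cut(toy$prop_over_15, breaks = c(0, 0.25, 0.5, 1),
                    include.lowest = TRUE)
p_legend <- ggplot(toy, aes(DepHour, DayOfWeek)) +
  geom_tile(aes(fill = prop_cut)) +
  guides(fill = guide_legend(title = "Proportion (binned)", nrow = 1)) +
  theme(legend.position = "bottom")

p_bar
p_legend
```

guide_colourbar applies to continuous color and fill scales, and guide_legend to discrete ones, which is why each plot uses a different guide function.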