Workshop 10: Histograms, positions and facets Exercise

Your name here

2022-12-13

Welcome to your last ggplot2 live workshop!

In this exercise you will be using a subset of the gapminder dataframe. Create a subset called gap_small, which only contains data from 1952 and 2007.

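library(tidyverse)  # provides %>%, filter(), mutate() and ggplot()
library(gapminder)  # provides the gapminder dataframe
gap_small <- gapminder %>% 
  filter(year == 1952 | year == 2007)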
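
Before plotting, it can help to confirm that the subset contains what you expect. The following is a minimal sketch, assuming the dplyr and gapminder packages are installed; it counts the rows per year in gap_small.

```r
library(dplyr)
library(gapminder)

# Same subset as in the exercise
gap_small <- gapminder %>%
  filter(year == 1952 | year == 2007)

# One row per country per year: both years should have the same count
gap_small %>%
  count(year)
```

If the two counts differ, or if more than two years appear, check the filter() condition.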

First, plot a simple histogram showing the distribution of life expectancy (lifeExp) in your dataframe.

ggplot(data =  gap_small , 
       mapping = aes(x = lifeExp)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Set binwidth = 5 to give each bar a 5-year range (this also silences the `stat_bin()` message shown above).

ggplot(data =  gap_small, 
       mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)
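
With binwidth = 5 alone the bars are 5 years wide, but by default ggplot2 appears to center bins on multiples of the bin width, so the edges fall on values such as 27.5 and 32.5. The sketch below is an optional variation, not part of the exercise: it assumes the boundary argument of geom_histogram() and uses the illustrative name p_aligned to place bin edges on multiples of 5.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gap_small <- gapminder %>%
  filter(year == 1952 | year == 2007)

# boundary = 0 asks for bin edges at multiples of binwidth (..., 25, 30, 35, ...)
p_aligned <- ggplot(data = gap_small,
                    mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, boundary = 0)

# Inspect the computed bin edges and counts
layer_data(p_aligned)[, c("xmin", "xmax", "count")]

p_aligned
```

layer_data() returns the values ggplot2 computed for the layer, which is a convenient way to check where the bins actually start and end.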

Use the fill argument to create a stacked histogram with two fill colors, one for each year.

ggplot(data =  gap_small, 
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5)

Hint: year is stored as a number, so ggplot() treats it as a continuous variable rather than as two groups. To get two distinct colors, convert it to a factor with as.factor().
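
The hint can be checked in the console. This short sketch, which assumes only that the gapminder package is installed, shows how year is stored and what as.factor() produces for the two years used here (year_factor is an illustrative name).

```r
library(gapminder)

# year is stored as a number, not as a category
class(gapminder$year)

# Converting to a factor gives one discrete level per distinct year
year_factor <- as.factor(c(1952L, 2007L))
levels(year_factor)
```

Each factor level becomes one entry in the fill legend, which is why the histogram ends up with exactly two colors.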

Change the position argument to overlap the two distributions. Then, add a degree of transparency so that you can see where the bars overlap.

ggplot(data =  gap_small, 
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5, alpha = 0.6, 
                 position = "identity")
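
To see what the position argument changes, you can compare the bar coordinates that ggplot2 computes. The sketch below is a hedged illustration using layer_data(); the names p_base, p_stack and p_overlap are made up for this example. With "stack" (the default for histograms) some bars start above zero because they sit on top of another group, while with "identity" every bar starts at zero.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gap_small <- gapminder %>%
  filter(year == 1952 | year == 2007)

p_base <- ggplot(data = gap_small,
                 mapping = aes(x = lifeExp, fill = as.factor(year)))

# Stacked bars: the second group is drawn on top of the first
p_stack <- p_base +
  geom_histogram(binwidth = 5, position = "stack")

# Overlapping bars: both groups are drawn from the baseline
p_overlap <- p_base +
  geom_histogram(binwidth = 5, alpha = 0.6, position = "identity")

# Lower edge of each bar under the two positions
range(layer_data(p_stack)$ymin)
range(layer_data(p_overlap)$ymin)
```

Because overlapping bars hide each other, position = "identity" is usually combined with alpha, as in the exercise solution.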

Create small multiples of the plot above using facet_wrap(), with one panel for each continent.

ggplot(data =  gap_small, 
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5, alpha = 0.6, 
                 position = "identity") +
  facet_wrap(~continent)

Next, use facet_grid() to further subdivide your plots by year. You should have one column for each continent, and one row for each year.

ggplot(data =  gap_small, 
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5, alpha = 0.6, 
                 position = "identity") +
  facet_grid(year ~ continent, 
             labeller = label_both)

There are only two countries in the continent “Oceania”. Remove these from your dataframe (or if you want an extra challenge, use mutate() and case_when() to change the continent name from “Oceania” to “Asia”).

gap_smaller <- gap_small %>% 
  mutate(continent = case_when(continent == "Oceania" ~ "Asia",
                               TRUE ~ as.character(continent)))
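
The code above solves the extra challenge. For the simpler option of removing Oceania, a minimal sketch could look like the following; gap_no_oceania is an illustrative name, and droplevels() is included on the assumption that you also want the unused factor level removed.

```r
library(dplyr)
library(gapminder)

gap_small <- gapminder %>%
  filter(year == 1952 | year == 2007)

# Drop the Oceania rows, then drop the now-unused factor level
gap_no_oceania <- gap_small %>%
  filter(continent != "Oceania") %>%
  droplevels()

# Check which continents remain
gap_no_oceania %>%
  count(continent)
```

Either dataframe works for the remaining tasks; the recoded version keeps the Oceania countries in the data, while the filtered version discards them.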

Recreate your previous plot with the new dataframe. This time, map continent to fill color.

ggplot(data =  gap_smaller, 
       mapping = aes(x = lifeExp, fill = as.factor(continent))) +
  geom_histogram(binwidth = 5, alpha = 0.6, 
                 position = "identity") +
  facet_grid(year ~ continent, 
             labeller = label_both) +

Lastly, improve the labels of the axes and color legend on this plot.

ggplot(data =  gap_smaller, 
       mapping = aes(x = lifeExp, fill = as.factor(continent))) +
  geom_histogram(binwidth = 5, alpha = 0.6, 
                 position = "identity") +
  facet_grid(year ~ continent, 
             labeller = label_both) +
  xlab("Life Expectancy in years") +
  ylab("Count") +
  ggtitle("Life expectancy in 1952 and 2007 by continent") +
  theme(legend.title = element_blank())
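
The solution above removes the legend title. An alternative, sketched here as one possible approach rather than the expected answer, is to set all labels in a single labs() call and give the legend a readable title. The label texts and the name p_labelled are illustrative choices.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Rebuild the recoded data so this sketch runs on its own
gap_smaller <- gapminder %>%
  filter(year == 1952 | year == 2007) %>%
  mutate(continent = case_when(continent == "Oceania" ~ "Asia",
                               TRUE ~ as.character(continent)))

p_labelled <- ggplot(data = gap_smaller,
                     mapping = aes(x = lifeExp, fill = continent)) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity") +
  facet_grid(year ~ continent) +
  # labs() sets axis, legend and title text in one place
  labs(x = "Life expectancy (years)",
       y = "Number of countries",
       fill = "Continent",
       title = "Life expectancy in 1952 and 2007 by continent")

p_labelled
```

Naming the fill aesthetic inside labs() changes the legend title without touching the theme.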

1 Submission: Upload Rmd and HTML

The final due date for this exercise is Wednesday, December 14th at 23:59 UTC+2.

Once you have finished the tasks above, knit this Rmd to HTML and upload both files to the assignment page as a single ZIP archive.