Welcome to your last ggplot2 live workshop!
In this exercise you will be using a subset of the gapminder dataframe. Create a subset called gap_small, which only contains data from 1952 and 2007.
library(tidyverse)
library(gapminder)
gap_small <- gapminder %>%
  filter(year == 1952 | year == 2007)
First, plot a simple histogram showing the distribution of life expectancy (lifeExp) in your dataframe.
ggplot(data = gap_small,
       mapping = aes(x = lifeExp)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Set binwidth = 5 to give each bar a 5-year range (and get rid of the persistent message about the default number of bins).
ggplot(data = gap_small,
       mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)
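With binwidth alone, the bars are 5 years wide but their edges may not fall on round numbers. As a minimal sketch, assuming the ggplot2 and gapminder packages are installed, the boundary argument of geom_histogram() can be used to line the bin edges up with multiples of 5. The object name p is only illustrative.

```r
library(ggplot2)
library(gapminder)

# Same subset as in the exercise, built here with base R so the sketch stands alone
gap_small <- subset(gapminder, year %in% c(1952, 2007))

# boundary = 0 asks for bin edges at ..., 40, 45, 50, ...
p <- ggplot(data = gap_small,
            mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, boundary = 0)
p

# layer_data() exposes the bin edges ggplot2 computed for the bars
sort(unique(layer_data(p)$xmin))
```

Whether aligned edges are preferable depends on how you want to read the plot; the exercise itself only asks for binwidth = 5.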
Use the fill aesthetic to create a stacked histogram with two fill colors, one for each year.
ggplot(data = gap_small,
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5)
Hint: year is treated as numerical, so ggplot() will try to map it as a continuous variable. To get two distinct colors, you will need to tell R to treat it as a factor.
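To see what the hint means, here is a small base R sketch using a made-up vector of years (the names years and year_factor are illustrative, not part of the exercise). It shows how as.factor() turns numbers into a fixed set of categories, which is what lets ggplot2 pick one distinct color per year.

```r
# A hypothetical numeric vector standing in for the year column
years <- c(1952, 2007, 1952, 2007)

is.numeric(years)        # numeric: ggplot2 would use a continuous color scale

# Convert to a factor: each distinct value becomes a category (level)
year_factor <- as.factor(years)

is.factor(year_factor)   # now categorical
levels(year_factor)      # one level per distinct year
nlevels(year_factor)     # number of categories, i.e. number of fill colors
```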
Change the position argument to overlap the two distributions. Then, add a degree of transparency so that you can see where the bars overlap.
ggplot(data = gap_small,
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity")
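The difference between stacking and overlapping can be checked numerically. The sketch below, which assumes ggplot2 and gapminder are installed and uses illustrative object names (base, p_stack, p_identity), compares where the bars start on the y-axis under each position setting.

```r
library(ggplot2)
library(gapminder)

gap_small <- subset(gapminder, year %in% c(1952, 2007))

base <- ggplot(data = gap_small,
               mapping = aes(x = lifeExp, fill = as.factor(year)))

# Default position for geom_histogram() is "stack"
p_stack <- base + geom_histogram(binwidth = 5)

# position = "identity" draws every bar from zero, so the bars overlap
p_identity <- base +
  geom_histogram(binwidth = 5, alpha = 0.6, position = "identity")

# ymin is the bottom of each bar: stacked bars can start above zero,
# identity bars always start at zero
range(layer_data(p_stack)$ymin)
range(layer_data(p_identity)$ymin)
```

Because overlapping bars hide one another, the alpha value is what makes the overlap readable.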
Create small multiples of the plot above using facet_wrap(), with one panel for each continent.
ggplot(data = gap_small,
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity") +
  facet_wrap(~continent)
Next, use facet_grid() to further subdivide your plots by year. You should have one column for each continent, and one row for each year.
ggplot(data = gap_small,
       mapping = aes(x = lifeExp, fill = as.factor(year))) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity") +
  facet_grid(year ~ continent,
             labeller = label_both)
There are only two countries in the continent “Oceania”. Remove these from your dataframe (or, if you want an extra challenge, use mutate() and case_when() to change the continent name from “Oceania” to “Asia”).
gap_smaller <- gap_small %>%
  mutate(continent = case_when(continent == "Oceania" ~ "Asia",
                               TRUE ~ as.character(continent)))
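The solution above takes the extra-challenge route. For the simpler option of removing Oceania altogether, one possible sketch is shown here, assuming dplyr and gapminder are installed. The name gap_no_oceania is illustrative and not part of the exercise.

```r
library(dplyr)
library(gapminder)

gap_small <- gapminder %>%
  filter(year == 1952 | year == 2007)

# Drop the Oceania rows, then drop the now-empty "Oceania" factor level
# so that it cannot reappear in legends or facets
gap_no_oceania <- gap_small %>%
  filter(continent != "Oceania") %>%
  droplevels()

levels(gap_no_oceania$continent)
```

Note that this version keeps continent as a factor, whereas the case_when() solution converts it to character; either works for the plots that follow.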
Recreate your previous plot with the new dataframe. This time, map continent to fill color.
ggplot(data = gap_smaller,
       mapping = aes(x = lifeExp, fill = as.factor(continent))) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity") +
  facet_grid(year ~ continent,
             labeller = label_both)
Lastly, improve the labels of the axes and color legend on this plot.
ggplot(data = gap_smaller,
       mapping = aes(x = lifeExp, fill = as.factor(continent))) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity") +
  facet_grid(year ~ continent,
             labeller = label_both) +
  xlab("Life Expectancy in years") +
  ylab("Count") +
  ggtitle("Life Expectancy in 1952 and 2007 by continent") +
  theme(legend.title = element_blank())
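As an alternative to hiding the legend title, the axis labels, plot title, and legend title can all be set in a single labs() call. This is only a sketch of one option, assuming dplyr, ggplot2, and gapminder are installed; the label wording is a suggestion, not the expected answer.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Rebuild the dataframe from the previous step so the sketch stands alone
gap_smaller <- gapminder %>%
  filter(year == 1952 | year == 2007) %>%
  mutate(continent = case_when(continent == "Oceania" ~ "Asia",
                               TRUE ~ as.character(continent)))

p <- ggplot(data = gap_smaller,
            mapping = aes(x = lifeExp, fill = continent)) +
  geom_histogram(binwidth = 5, alpha = 0.6,
                 position = "identity") +
  facet_grid(year ~ continent) +
  # labs() names each label after the aesthetic it belongs to;
  # fill = ... sets the title of the fill legend
  labs(x = "Life expectancy (years)",
       y = "Number of countries",
       fill = "Continent",
       title = "Life expectancy in 1952 and 2007 by continent")
p
```

Since the facet columns already name each continent, removing the legend title (as in the solution above) or removing the legend entirely are equally reasonable choices.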
1 Submission: Upload Rmd and HTML
The final due date for this exercise is Wednesday, December 14th at 23:59 UTC+2.
Once you have finished the tasks above, knit this Rmd into an HTML file and upload both files to the assignment page in a single ZIP folder.