library(tidyverse)
library(gapminder)
HDS 5.5.1 and 5.5.2
Begin by loading the tidyverse and gapminder packages in the code chunk above and adding your name as the author.
The dplyr Wrangling Penguins tutorial (up through Section 7) and Chapter 5 of Hello Data Science have shown you how to subset your data by rows (filter()) and columns (select()), how to relocate() and rename() columns, and how to redefine or create new columns (mutate()). It's time to put those tools together: manipulate the gapminder data and visualize it with ggplot, using a series of commands connected with the pipe, |>. Each code chunk below should start with the original gapminder data frame.
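As a quick illustration of how these verbs chain together, here is a minimal sketch on gapminder. The renamed column gdp_per_cap and the choice of Europe in 2007 are illustrative assumptions, not part of the exercises below.

```r
library(tidyverse)
library(gapminder)

# Sketch: chain the dplyr verbs into one pipeline.
# gdp_per_cap and pop_mil are illustrative column names.
europe_2007 <- gapminder |>
  filter(continent == "Europe", year == 2007) |>   # keep matching rows
  select(country, year, pop, gdpPercap) |>         # keep only these columns
  rename(gdp_per_cap = gdpPercap) |>               # new_name = old_name
  mutate(pop_mil = pop / 1e6) |>                   # add a derived column
  relocate(pop_mil, .after = country)              # move it next to country

europe_2007
```

Each step receives the data frame produced by the step before it. This is why the order matters: rename() must come after select() here, because select() still refers to the old name gdpPercap.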
Wrangling and Plotting the gapminder Data
Let's start by making a line plot of lifeExp versus year, colored by country, for all the countries in Europe. Rename country to europe_country and lifeExp to lifeExp_yrs. Modify this code by filling in the ______ to do so:
gapminder |>
  filter(continent == "Europe") |>
  rename(europe_country = country, lifeExp_yrs = lifeExp) |>
  ggplot(mapping = aes(x = year, y = lifeExp_yrs, color = europe_country)) +
  geom_line() +
  labs(title = "Life Expectancy by Year in Europe",
       x = "Year",
       y = "Life Expectancy at Birth (years)",
       color = "Country")
Focusing again on Europe, make a plot containing a series of histograms of gdpPercap, one for each country in Europe.
gapminder |>
  filter(continent == "Europe") |>
  ggplot(mapping = aes(x = gdpPercap)) +
  geom_histogram() +
  facet_wrap(~ country) +
  labs(title = "GDP Per Capita for European Countries",
       x = "GDP Per Capita",
       y = "Count")
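By default, geom_histogram() uses 30 bins and prints a message suggesting you pick a better binwidth. Here is a minimal sketch of setting the width explicitly. The value binwidth = 5000 (dollars per bin) is an assumed, illustrative choice, and checking range() first is one way to judge whether a width is sensible.

```r
library(tidyverse)
library(gapminder)

# Sketch: replace the default 30 bins with an explicit bin width.
# binwidth = 5000 is an illustrative value, not a recommendation.
europe <- gapminder |>
  filter(continent == "Europe")

# Inspect the spread of gdpPercap to judge a reasonable width
range(europe$gdpPercap)

p <- europe |>
  ggplot(mapping = aes(x = gdpPercap)) +
  geom_histogram(binwidth = 5000) +
  facet_wrap(~ country) +
  labs(title = "GDP Per Capita for European Countries",
       x = "GDP Per Capita",
       y = "Count")
p
```

Alternatively, bins = sets the number of bins instead of their width. Use one argument or the other, not both.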
Since gdpPercap is GDP per capita, we can calculate total_GDP for each country in each year by multiplying gdpPercap by pop. Create side-by-side boxplots of total_GDP by continent:
gapminder |>
  mutate(total_GDP = gdpPercap * pop) |>
  ggplot(mapping = aes(x = continent, y = total_GDP)) +
  geom_boxplot() +
  labs(title = "Total GDP per Continent",
       x = "Continent",
       y = "Total GDP")
Let's compare gdpPercap for the countries in Europe and the Americas. Create a line plot of gdpPercap by year for each of the included countries, colored by continent.
gapminder |>
  filter(continent %in% c("Europe", "Americas")) |>
  ggplot(mapping = aes(x = year, y = gdpPercap, group = country, color = continent)) +
  geom_line() +
  labs(title = "GDP per Capita over time: Europe vs Americas",
       x = "Year",
       y = "GDP Per Capita",
       color = "Continent")
Create a new variable, pop_mil, that is the population of each country in millions of people. Make side-by-side boxplots of pop_mil by continent for the last year of data available:
gapminder |>
  mutate(pop_mil = pop / 1e6) |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = continent, y = pop_mil)) +
  geom_boxplot() +
  labs(title = "Population by Continent in 2007",
       x = "Continent",
       y = "Population (Millions)")
Make a scatterplot of lifeExp versus gdpPercap for the last year of data available. Color the points by continent:
gapminder |>
  filter(year == 2007) |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  labs(title = "Life Expectancy vs GDP Per Capita",
       x = "GDP Per Capita",
       y = "Life Expectancy",
       color = "Continent")
Make a series of scatterplots of lifeExp versus gdpPercap, one for each year. Color the points by continent:
gapminder |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  facet_wrap(~ year) +
  labs(title = "Life Expectancy vs GDP Per Capita by Year",
       x = "GDP Per Capita",
       y = "Life Expectancy",
       color = "Continent")
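Two optional refinements apply to the scatterplots above. First, filter(year == max(year)) finds the last year of data from the data itself rather than hard-coding 2007. Second, gdpPercap is strongly right-skewed, so a log10 x-axis often makes its relationship with lifeExp easier to see. The sketch below applies both ideas to the single-year scatterplot. The log scale is a presentation choice that the exercise does not ask for.

```r
library(tidyverse)
library(gapminder)

# Sketch: pick the most recent year from the data instead of typing 2007
latest <- gapminder |>
  filter(year == max(year))

# Plot on a log10 x-axis to spread out the many low-GDP countries
p <- latest |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  labs(title = "Life Expectancy vs GDP Per Capita",
       x = "GDP Per Capita (log scale)",
       y = "Life Expectancy",
       color = "Continent")
p
```

You can add scale_x_log10() the same way to the faceted version that uses facet_wrap(~ year).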