library(tidyverse)
library(gapminder)

HDS 5.5.1 and 5.5.2
Begin by loading the tidyverse and gapminder packages in the code chunk above and adding your name as the author.
The dplyr Wrangling Penguins tutorial (up through Section 7) and Chapter 5 of Hello Data Science have shown you how to subset your data by rows (filter()) and columns (select()), how to relocate() and rename() columns, and how to redefine or create new columns (mutate()). It’s time to put those tools together to manipulate the gapminder data, and visualize it with ggplot, using a series of commands connected with the pipe, |>. Each code chunk below should start with the original gapminder data frame.
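As a quick warm-up, here is a minimal sketch of how those verbs can chain together in one pipeline. The object name europe_2007 and the new column pop_mil are chosen here only for illustration; they are not part of the assignment.

```r
library(dplyr)
library(gapminder)

# Illustrative pipeline: each verb receives the result of the previous step.
europe_2007 <- gapminder |>
  filter(continent == "Europe", year == 2007) |>  # subset rows
  select(country, year, lifeExp, pop) |>          # subset columns
  rename(lifeExp_yrs = lifeExp) |>                # new_name = old_name
  mutate(pop_mil = pop / 1e6) |>                  # create a new column
  relocate(pop_mil, .after = country)             # move it next to country

europe_2007
```

Because every step returns a data frame, the result can be piped straight into ggplot() in the same way.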
Wrangling and Plotting the gapminder Data
Let’s start by making a line plot of lifeExp versus year colored by country for all the countries in Europe. Rename country to europe_country and lifeExp to lifeExp_yrs. Modify this code by filling in the ______ to do so:
gapminder |>
  filter(continent == "Europe") |>
  rename(europe_country = country, lifeExp_yrs = lifeExp) |>
  ggplot(mapping = aes(x = year, y = lifeExp_yrs, color = europe_country)) +
  geom_line() +
  labs(title = "Life Expectancy by Year in Europe",
       x = "Year",
       y = "Life Expectancy at Birth (years)",
       color = "Country")

Focusing again on Europe, make a plot containing a series of histograms of gdpPercap for each country in Europe.
gapminder |>
  filter(continent == "Europe") |>
  ggplot(mapping = aes(x = gdpPercap)) +
  geom_histogram() +
  facet_wrap(~ country) +  # one histogram panel per country
  labs(title = "GDP per Capita for Countries in Europe",
       x = "GDP per Capita",
       y = "Count")

`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
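The stat_bin() message above is ggplot2 noting that it fell back to its default of 30 bins. One way to respond is to set binwidth explicitly. The sketch below assumes a bin width of 5000 (in the units of gdpPercap) purely as an illustrative value, and assumes one panel per country via facet_wrap(~ country); the object name p_hist is also just for illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Same histogram idea, but with an explicit bin width so ggplot2
# no longer has to guess the number of bins.
p_hist <- gapminder |>
  filter(continent == "Europe") |>
  ggplot(mapping = aes(x = gdpPercap)) +
  geom_histogram(binwidth = 5000) +  # 5000 is an assumed, illustrative value
  facet_wrap(~ country) +
  labs(title = "GDP per Capita for Countries in Europe",
       x = "GDP per Capita",
       y = "Count")

p_hist
```

A different binwidth may suit the data better; trying a few values and comparing the panels is a reasonable way to choose.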
If gdpPercap is the per capita GDP, then we can calculate the total_GDP for each country by multiplying by the population. Create side-by-side boxplots of the total_GDP by continent:
gapminder |>
  mutate(total_GDP = gdpPercap * pop) |>  # per capita GDP times population
  ggplot(mapping = aes(x = continent, y = total_GDP)) +
  geom_boxplot() +
  labs(title = "Total GDP by Continent",
       x = "Continent",
       y = "Total GDP")

Let’s compare gdpPercap for the countries in Europe and the Americas. Create a line plot of gdpPercap by year for each of the included countries, colored by continent.
gapminder |>
  # gapminder records North and South America as the single continent "Americas"
  filter(continent == "Europe" |
           continent == "Americas") |>
  # group by country so each country gets its own line; color by continent
  ggplot(mapping = aes(x = year, y = gdpPercap, group = country, color = continent)) +
  geom_line() +
  labs(title = "GDP per Capita in the Americas and Europe by Country",
       x = "Year",
       y = "GDP per Capita",
       color = "Continent")

Create a new variable, pop_mil, that is the population of each country in millions of people. Make side-by-side boxplots of pop_mil by continent for the last year of data available:
gapminder |>
  mutate(pop_mil = pop / 1e6) |>  # convert population to millions of people
  filter(year == 2007) |>         # 2007 is the last year in gapminder
  ggplot(mapping = aes(x = continent, y = pop_mil)) +
  geom_boxplot() +
  labs(title = "Population in Millions for Each Continent in 2007",
       x = "Continent",
       y = "Population (millions)")

Make a scatterplot of lifeExp versus gdpPercap for the last year of data available. Color the points by continent:
gapminder |>
  filter(year == 2007) |>
  # "lifeExp versus gdpPercap": lifeExp on the y-axis, gdpPercap on the x-axis
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  labs(title = "Life Expectancy by GDP per Capita by Continent",
       x = "GDP per Capita",
       y = "Life Expectancy",
       color = "Continent")

Make a series of scatterplots of lifeExp versus gdpPercap for each year. Color the points by continent:
gapminder |>
  # lifeExp on the y-axis, gdpPercap on the x-axis, one panel per year
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  facet_wrap(~ year) +
  labs(title = "Life Expectancy by GDP per Capita by Year",
       x = "GDP per Capita",
       y = "Life Expectancy",
       color = "Continent")