Summarizing the median life expectancy

You’ve seen how to find the mean life expectancy and the total population across a set of observations, but mean() and sum() are only two of the functions R provides for summarizing a collection of numbers. Here, you’ll learn to use the median() function in combination with summarize().

By the way, dplyr displays some messages when it’s loaded that we’ve been hiding so far. They’ll show up in red and start with:

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

This message will appear in future exercises each time you load dplyr. It lists some built-in functions that dplyr masks with its own versions. You won’t need to worry about it within this course.

Use the median() function within a summarize() to find the median life expectancy. Save it into a column called medianLifeExp.

HINT Pipe the gapminder data into the summarize() function, which should contain medianLifeExp = median(lifeExp).
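
As a minimal sketch of what the hint describes, assuming the gapminder and dplyr packages are installed, the pipeline could look like the following. The name `overall_median` is introduced here only for illustration.

```r
library(gapminder)
library(dplyr)

# Collapse the whole dataset into one row holding the median life expectancy
overall_median <- gapminder %>%
  summarize(medianLifeExp = median(lifeExp))

overall_median
```

The result is a one-row table with a single column, medianLifeExp.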

Summarizing the median life expectancy in 1957

Rather than summarizing the entire dataset, you may want to find the median life expectancy for only one particular year. In this case, you’ll find the median in the year 1957.

Filter for the year 1957, then use the median() function within a summarize() to calculate the median life expectancy into a column called medianLifeExp.

HINT This takes two steps: first a filter() step with year == 1957, and then a summarize() step that’s identical to the last exercise.
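
The two steps in the hint can be sketched as follows, assuming the gapminder and dplyr packages are installed. The name `median_1957` is a hypothetical label used only in this example.

```r
library(gapminder)
library(dplyr)

# Keep only the 1957 observations, then summarize them into one row
median_1957 <- gapminder %>%
  filter(year == 1957) %>%
  summarize(medianLifeExp = median(lifeExp))

median_1957
```

Because filter() runs first, median() only sees the rows for 1957.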

Summarizing multiple variables in 1957

The summarize() verb allows you to summarize multiple variables at once. In this case, you’ll use the median() function to find the median life expectancy and the max() function to find the maximum GDP per capita.

Find both the median life expectancy (lifeExp) and the maximum GDP per capita (gdpPercap) in the year 1957, calling them medianLifeExp and maxGdpPercap respectively. You can use the max() function to find the maximum.

HINT This takes two steps: first a filter() step with year == 1957, and then a summarize() step saving both the summaries. The two variable summaries should be separated with a comma.
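
One possible way to write the two summaries in a single summarize() call is sketched below, assuming the gapminder and dplyr packages are installed. The name `summary_1957` is introduced for illustration only.

```r
library(gapminder)
library(dplyr)

# Filter for 1957, then compute two summaries separated by a comma
summary_1957 <- gapminder %>%
  filter(year == 1957) %>%
  summarize(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

summary_1957
```

Each named argument to summarize() becomes one column in the one-row result.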

Summarizing by year

In a previous exercise, you found the median life expectancy and the maximum GDP per capita in the year 1957. Now, you’ll perform those two summaries within each year in the dataset, using the group_by() verb.

Find the median life expectancy (lifeExp) and maximum GDP per capita (gdpPercap) within each year, saving them into medianLifeExp and maxGdpPercap, respectively.

HINT This takes two steps: first pipe the gapminder data into group_by(year), then pipe it into a summarize() similar to the last exercise.
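
The grouped version can be sketched like this, assuming the gapminder and dplyr packages are installed. The name `yearly_summary` is hypothetical and used only in this example.

```r
library(gapminder)
library(dplyr)

# Group the rows by year, then compute both summaries within each group
yearly_summary <- gapminder %>%
  group_by(year) %>%
  summarize(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

yearly_summary
```

The result has one row per year in the dataset, with the grouping column year first, followed by the two summary columns.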

Summarizing by continent

You can group by any variable in your dataset to create a summary. Rather than comparing across time, you might be interested in comparing among continents. You’ll want to do that within one year of the dataset: let’s use 1957.

Filter the gapminder data for the year 1957. Then find the median life expectancy (lifeExp) and maximum GDP per capita (gdpPercap) within each continent, saving them into medianLifeExp and maxGdpPercap, respectively.

HINT This will be almost identical to the last exercise, except that you’ll first need a filter(year == 1957), and then you’ll group by continent instead of grouping by year.
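
A sketch of the filter, group, and summarize sequence, assuming the gapminder and dplyr packages are installed, might look like this. The name `continent_summary_1957` is introduced here for illustration.

```r
library(gapminder)
library(dplyr)

# Restrict to 1957, group by continent, then summarize within each continent
continent_summary_1957 <- gapminder %>%
  filter(year == 1957) %>%
  group_by(continent) %>%
  summarize(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

continent_summary_1957
```

The result has one row per continent, which makes the 1957 values easy to compare side by side.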

Summarizing by continent and year

Instead of grouping just by year, or just by continent, you’ll now group by both continent and year to summarize within each.

Find the median life expectancy (lifeExp) and maximum GDP per capita (gdpPercap) within each combination of continent and year, saving them into medianLifeExp and maxGdpPercap, respectively.

HINT This is similar to the last two exercises, but the group by will look like group_by(continent, year).
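
Grouping by two variables can be sketched as follows, assuming the gapminder and dplyr packages are installed. The name `continent_year_summary` is hypothetical.

```r
library(gapminder)
library(dplyr)

# Group by every continent/year combination, then summarize within each
continent_year_summary <- gapminder %>%
  group_by(continent, year) %>%
  summarize(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

continent_year_summary
```

Depending on the installed dplyr version, this call may print an informational message noting that the output is still grouped by continent; the summary values are unaffected.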

Visualizing median life expectancy over time

In the last chapter, you summarized the gapminder data to calculate the median life expectancy within each year. This code is provided for you, and is saved (with <-) as the by_year dataset.

Now you can use the ggplot2 package to turn this into a visualization of changing life expectancy over time.

Use the by_year dataset to create a scatter plot showing the change in median life expectancy over time, with year on the x-axis and medianLifeExp on the y-axis. Be sure to add expand_limits(y = 0) to make sure the plot’s y-axis includes zero.

HINT The aesthetics in this plot will be aes(x = year, y = medianLifeExp), and don’t forget to add expand_limits(y = 0) after the geom_point().
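
The following is a minimal sketch of the plot, assuming the gapminder, dplyr, and ggplot2 packages are installed. It rebuilds by_year first so the example runs on its own; the name `life_exp_plot` is introduced only for illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Summarized data: one row per year
by_year <- gapminder %>%
  group_by(year) %>%
  summarize(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

# Scatter plot of median life expectancy over time, with the y-axis including zero
life_exp_plot <- ggplot(by_year, aes(x = year, y = medianLifeExp)) +
  geom_point() +
  expand_limits(y = 0)

life_exp_plot
```

Storing the plot in a variable is optional; typing the ggplot() expression directly at the console draws the same plot.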

Visualizing median GDP per capita per continent over time

In the last exercise you were able to see how the median life expectancy of countries changed over time. Now you’ll examine the median GDP per capita instead, and see how the trend differs among continents.

Summarize the gapminder dataset by continent and year, finding the median GDP per capita (gdpPercap) within each and putting it into a column called medianGdpPercap. Use the assignment operator <- to save this summarized data as by_year_continent.

Create a scatter plot showing the change in medianGdpPercap by continent over time. Use color to distinguish between continents, and be sure to add expand_limits(y = 0) so that the y-axis starts at zero.

HINT The scatter plot will have three aesthetics: x (year), y (medianGdpPercap) and color (continent).
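
Both steps can be sketched as below, assuming the gapminder, dplyr, and ggplot2 packages are installed. The name `gdp_plot` is hypothetical and used only so the plot can be inspected afterwards.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Median GDP per capita within each continent/year combination
by_year_continent <- gapminder %>%
  group_by(continent, year) %>%
  summarize(medianGdpPercap = median(gdpPercap))

# One point per continent per year, colored by continent
gdp_plot <- ggplot(by_year_continent,
                   aes(x = year, y = medianGdpPercap, color = continent)) +
  geom_point() +
  expand_limits(y = 0)

gdp_plot
```

Mapping continent to color gives each continent its own series of points, so the trends can be compared on one set of axes.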

Comparing median life expectancy and median GDP per continent in 2007

In these exercises you’ve generally created plots that show change over time. But as another way of exploring your data visually, you can also use ggplot2 to plot summarized data to compare continents within a single year.

Filter the gapminder dataset for the year 2007, then summarize the median GDP per capita and the median life expectancy within each continent, into columns called medianGdpPercap and medianLifeExp. Save this as by_continent_2007.

Use the by_continent_2007 data to create a scatter plot comparing these summary statistics for continents in 2007, putting the median GDP per capita on the x-axis and the median life expectancy on the y-axis. Color the scatter plot by continent. You don’t need to add expand_limits(y = 0) for this plot.

HINT Creating the by_continent_2007 dataset requires three steps: a filter() for the year 2007, a group_by() on continent, and then a summarize() creating two summary columns. Once you have the dataset created, the scatter plot should be relatively straightforward.
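
The three data steps and the plot can be sketched as follows, assuming the gapminder, dplyr, and ggplot2 packages are installed. The name `continent_plot_2007` is introduced here only for illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Filter for 2007, group by continent, then compute the two medians
by_continent_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(medianGdpPercap = median(gdpPercap),
            medianLifeExp = median(lifeExp))

# One point per continent: median GDP per capita against median life expectancy
continent_plot_2007 <- ggplot(by_continent_2007,
                              aes(x = medianGdpPercap, y = medianLifeExp,
                                  color = continent)) +
  geom_point()

continent_plot_2007
```

Since the summarized data has only one row per continent, the plot contains just five points.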
