In this lab activity, we’ll discuss Data Wrangling/Transformation via the dplyr package. We’ll explore ways to choose subsets of data, aggregate data to create summaries, make new variables, and sort your data frames. It is recommended you also explore the RStudio Cheatsheet on Data Transformation as we discuss this content.

Back to gapminder

Here is a look at the gapminder data frame in the gapminder package.

library(gapminder)
rmarkdown::paged_table(gapminder)
# Has been set as the default way to print data frames
# via df_print: paged in the YAML at the top

Say we wanted mean life expectancy across all years for Asia

# Base R
asia <- gapminder[gapminder$continent == "Asia", ]
mean(asia$lifeExp)
[1] 60.0649
library(dplyr)
gapminder %>% 
  filter(continent == "Asia") %>%
  summarize(mean_exp = mean(lifeExp))
gapminder %>% 
  filter(year == 2002 | continent == "Europe")
gapminder %>% 
  filter(year == 2002, continent == "Europe")
gapminder %>% 
  filter(country %in% c("Argentina", "Belgium", "Mexico"),
         year %in% c(1987, 1992))

summarize()

max_exp_1997 <- gapminder %>% 
  filter(year == 1997) %>% 
  summarize(max_exp = max(lifeExp))
max_exp_1997

Combining summarize() with group_by()

When you’d like to determine a numerical summary for all levels of a different categorical variable

max_exp_1997_by_cont <- gapminder %>% 
  filter(year == 1997) %>% 
  group_by(continent) %>%
  summarize(max_exp = max(lifeExp))
max_exp_1997_by_cont

Without the %>%

It’s hard to appreciate the %>% without seeing what the code would look like without it:

max_exp_1997_by_cont <- 
  summarize(
    group_by(
      filter(
        gapminder, 
          year == 1997), 
      continent),
    max_exp = max(lifeExp))
max_exp_1997_by_cont

mutate()

  1. create a new variable based on other variables
gap_w_gdp <- gapminder %>% mutate(gdp = pop * gdpPercap)
gap_w_gdp

mutate()

  1. change the contents of an existing variable
gap_weird <- gapminder %>% mutate(pop = pop + 1000)
gap_weird

arrange()

gapminder %>%
  arrange(year, country)
gapminder %>%
  filter(year > 2000) %>%
  arrange(desc(lifeExp))

Other useful dplyr verbs

Your Task

Determine which African country had the highest GDP per capita in 1982 using the gapminder data in the gapminder package.

library(gapminder)
library(dplyr)

gapminder %>%
  filter(year == 1982, continent == "Africa") %>%
  arrange(desc(gdpPercap))

Libya has the highest GDP per capita in Africa in 1982.

Challenge

For both of these problems below, use the bechdel data frame in the fivethirtyeight package:

library(dplyr)
library(fivethirtyeight)
view(bechdel)
bechdel %>%
  filter(year == 2013) %>%
  count(clean_test)
bechdel %>%
  count(clean_test) %>%
  mutate(percentage = n / sum(n) * 100) %>%
  filter(clean_test == "ok")

Your Task

Determine the top five movies in terms of domestic return on investment for 2013 scaled data using the bechdel data frame in the fivethirtyeight package.

bechdel %>%
  mutate(domestic_roi_2013 = domgross_2013 / budget_2013) %>% 
  filter(year == 2013) %>%
  arrange(desc(domestic_roi_2013)) %>%
  select(title, year, domestic_roi_2013)

The top 5 movies in domestic return on investment in 2013 in order is The Purge, Insidious: Chapter 2, The Conjuring, Despicable Me 2, and Mama.

LS0tDQp0aXRsZTogJ0RhdGEgV3JhbmdsaW5nIExhYjogSW50cm8gdG8gYGRwbHlyYCcNCmF1dGhvcjogIkNoZXN0ZXIgSXNtYXkiDQpvdXRwdXQ6DQogIGh0bWxfZG9jdW1lbnQ6DQogICAgY29kZV9kb3dubG9hZDogdHJ1ZQ0KICAgIGNvZGVfZm9sZGluZzogaGlkZQ0KICAgIGRmX3ByaW50OiBwYWdlZA0KICBwZGZfZG9jdW1lbnQ6IGRlZmF1bHQNCiAgd29yZF9kb2N1bWVudDogZGVmYXVsdA0KLS0tDQoNCmBgYHtyIGluY2x1ZGU9RkFMU0V9DQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmtuaXRyOjpvcHRzX2NodW5rJHNldChtZXNzYWdlPUZBTFNFKQ0KZmlsdGVyIDwtIGRwbHlyOjpmaWx0ZXINCmtuaXRyOjpvcHRzX2NodW5rJHNldCh3YXJuaW5nPUZBTFNFLCBtZXNzYWdlPUZBTFNFLCBmaWcud2lkdGg9OS41LCBmaWcuaGVpZ2h0PTQuNSwgY29tbWVudD1OQSwgcm93cy5wcmludD0xNiwgb3V0LndpZHRoID0gIlxcdGV4dHdpZHRoIikNCnRoZW1lX3NldCh0aGVtZV9ncmF5KGJhc2Vfc2l6ZSA9IDIwKSkNCmBgYA0KDQpJbiB0aGlzIGxhYiBhY3Rpdml0eSwgd2UnbGwgZGlzY3VzcyBEYXRhIFdyYW5nbGluZy9UcmFuc2Zvcm1hdGlvbiB2aWEgdGhlIGBkcGx5cmAgcGFja2FnZS4gV2UnbGwgZXhwbG9yZSB3YXlzIHRvIGNob29zZSBzdWJzZXRzIG9mIGRhdGEsIGFnZ3JlZ2F0ZSBkYXRhIHRvIGNyZWF0ZSBzdW1tYXJpZXMsIG1ha2UgbmV3IHZhcmlhYmxlcywgYW5kIHNvcnQgeW91ciBkYXRhIGZyYW1lcy4gSXQgaXMgcmVjb21tZW5kZWQgeW91IGFsc28gZXhwbG9yZSB0aGUgUlN0dWRpbyBDaGVhdHNoZWV0IG9uIFtEYXRhIFRyYW5zZm9ybWF0aW9uXShodHRwczovL2dpdGh1Yi5jb20vcnN0dWRpby9jaGVhdHNoZWV0cy9yYXcvbWFzdGVyL3NvdXJjZS9wZGZzL2RhdGEtdHJhbnNmb3JtYXRpb24tY2hlYXRzaGVldC5wZGYpIGFzIHdlIGRpc2N1c3MgdGhpcyBjb250ZW50Lg0KDQojIyMgQmFjayB0byBgZ2FwbWluZGVyYA0KDQpIZXJlIGlzIGEgbG9vayBhdCB0aGUgYGdhcG1pbmRlcmAgZGF0YSBmcmFtZSBpbiB0aGUgYGdhcG1pbmRlcmAgcGFja2FnZS4NCg0KYGBge3J9DQpsaWJyYXJ5KGdhcG1pbmRlcikNCnJtYXJrZG93bjo6cGFnZWRfdGFibGUoZ2FwbWluZGVyKQ0KIyBIYXMgYmVlbiBzZXQgYXMgdGhlIGRlZmF1bHQgd2F5IHRvIHByaW50IGRhdGEgZnJhbWVzDQojIHZpYSBkZl9wcmludDogcGFnZWQgaW4gdGhlIFlBTUwgYXQgdGhlIHRvcA0KYGBgDQoNClNheSB3ZSB3YW50ZWQgbWVhbiBsaWZlIGV4cGVjdGFuY3kgYWNyb3NzIGFsbCB5ZWFycyBmb3IgQXNpYQ0KDQpgYGB7cn0NCiMgQmFzZSBSDQphc2lhIDwtIGdhcG1pbmRlcltnYXBtaW5kZXIkY29udGluZW50ID09ICJBc2lhIiwgXQ0KbWVhbihhc2lhJGxpZmVFeHApDQpgYGANCg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyKQ0KZ2FwbWluZGVyICU+JSANCiAgZmlsdGVyKGNvbnRpbmVudCA9PSAiQXNpYSIpICU+JQ0KICBzdW1tYXJpemUobWVhbl9leHAgPSBtZWFuKGxpZmVFeHApKQ0KYGBgDQoNCg0KYGBge3J9DQpnYXBtaW5kZXIgJT4lIA0KICBmaWx0ZXIoeWVhciA9PSAyMDAyIHwgY29udGluZW50ID09ICJFdXJvcGUiKQ0KYGBgDQoNCi0gICBVc2UgYCZgIG9yIGAsYCB0byBjaGVjayBmb3IgYWxsIG9mIG11bHRpcGxlIGZpbHRlcnMgYmVpbmcgdHJ1ZToNCg0KYGBge3J9DQpnYXBtaW5kZXIgJT4lIA0KICBmaWx0ZXIoeWVhciA9PSAyMDAyLCBjb250aW5lbnQgPT0gIkV1cm9wZSIpDQpgYGANCg0KLSAgIFVzZSBgJWluJWAgdG8gY2hlY2sgZm9yIGFueSBiZWluZyB0cnVlIChzaG9ydGN1dCB0byB1c2luZyBgfGAgcmVwZWF0ZWRseSB3aXRoIGA9PWApDQoNCmBgYHtyfQ0KZ2FwbWluZGVyICU+JSANCiAgZmlsdGVyKGNvdW50cnkgJWluJSBjKCJBcmdlbnRpbmEiLCAiQmVsZ2l1bSIsICJNZXhpY28iKSwNCiAgICAgICAgIHllYXIgJWluJSBjKDE5ODcsIDE5OTIpKQ0KYGBgDQoNCiMjIGBzdW1tYXJpemUoKWANCg0KLSAgIEFueSBudW1lcmljYWwgc3VtbWFyeSB0aGF0IHlvdSB3YW50IHRvIGFwcGx5IHRvIGEgY29sdW1uIG9mIGEgZGF0YSBmcmFtZSBpcyBzcGVjaWZpZWQgd2l0aGluIGBzdW1tYXJpemUoKWAuDQoNCmBgYHtyfQ0KbWF4X2V4cF8xOTk3IDwtIGdhcG1pbmRlciAlPiUgDQogIGZpbHRlcih5ZWFyID09IDE5OTcpICU+JSANCiAgc3VtbWFyaXplKG1heF9leHAgPSBtYXgobGlmZUV4cCkpDQptYXhfZXhwXzE5OTcNCmBgYA0KDQojIyMgQ29tYmluaW5nIGBzdW1tYXJpemUoKWAgd2l0aCBgZ3JvdXBfYnkoKWANCg0KV2hlbiB5b3UnZCBsaWtlIHRvIGRldGVybWluZSBhIG51bWVyaWNhbCBzdW1tYXJ5IGZvciBhbGwgbGV2ZWxzIG9mIGEgZGlmZmVyZW50IGNhdGVnb3JpY2FsIHZhcmlhYmxlDQoNCmBgYHtyfQ0KbWF4X2V4cF8xOTk3X2J5X2NvbnQgPC0gZ2FwbWluZGVyICU+JSANCiAgZmlsdGVyKHllYXIgPT0gMTk5NykgJT4lIA0KICBncm91cF9ieShjb250aW5lbnQpICU+JQ0KICBzdW1tYXJpemUobWF4X2V4cCA9IG1heChsaWZlRXhwKSkNCm1heF9leHBfMTk5N19ieV9jb250DQpgYGANCg0KIyMjIFdpdGhvdXQgdGhlIGAlPiVgDQoNCkl0J3MgaGFyZCB0byBhcHByZWNpYXRlIHRoZSBgJT4lYCB3aXRob3V0IHNlZWluZyB3aGF0IHRoZSBjb2RlIHdvdWxkIGxvb2sgbGlrZSB3aXRob3V0IGl0Og0KDQpgYGB7cn0NCm1heF9leHBfMTk5N19ieV9jb250IDwtIA0KICBzdW1tYXJpemUoDQogICAgZ3JvdXBfYnkoDQogICAgICBmaWx0ZXIoDQogICAgICAgIGdhcG1pbmRlciwgDQogICAgICAgICAgeWVhciA9PSAxOTk3KSwgDQogICAgICBjb250aW5lbnQpLA0KICAgIG1heF9leHAgPSBtYXgobGlmZUV4cCkpDQptYXhfZXhwXzE5OTdfYnlfY29udA0KYGBgDQoNCiMjIGBtdXRhdGUoKWANCg0KLSAgIEFsbG93cyB5b3UgdG8NCiAgICAxLiAgY3JlYXRlIGEgbmV3IHZhcmlhYmxlIGJhc2VkIG9uIG90aGVyIHZhcmlhYmxlcyBPUg0KICAgIDIuICBjaGFuZ2UgdGhlIGNvbnRlbnRzIG9mIGFuIGV4aXN0aW5nIHZhcmlhYmxlDQoNCjEuICBjcmVhdGUgYSBuZXcgdmFyaWFibGUgYmFzZWQgb24gb3RoZXIgdmFyaWFibGVzDQoNCmBgYHtyfQ0KZ2FwX3dfZ2RwIDwtIGdhcG1pbmRlciAlPiUgbXV0YXRlKGdkcCA9IHBvcCAqIGdkcFBlcmNhcCkNCmdhcF93X2dkcA0KYGBgDQoNCiMjIGBtdXRhdGUoKWANCg0KMy4gIGNoYW5nZSB0aGUgY29udGVudHMgb2YgYW4gZXhpc3RpbmcgdmFyaWFibGUNCg0KYGBge3J9DQpnYXBfd2VpcmQgPC0gZ2FwbWluZGVyICU+JSBtdXRhdGUocG9wID0gcG9wICsgMTAwMCkNCmdhcF93ZWlyZA0KYGBgDQoNCiMjIGBhcnJhbmdlKClgDQoNCi0gICBSZW9yZGVycyB0aGUgcm93cyBpbiBhIGRhdGEgZnJhbWUgYmFzZWQgb24gdGhlIHZhbHVlcyBvZiBvbmUgb3IgbW9yZSB2YXJpYWJsZXMNCg0KYGBge3J9DQpnYXBtaW5kZXIgJT4lDQogIGFycmFuZ2UoeWVhciwgY291bnRyeSkNCmBgYA0KDQotICAgQ2FuIGFsc28gcHV0IGludG8gZGVzY2VuZGluZyBvcmRlcg0KDQpgYGB7ciBkZXNjfQ0KZ2FwbWluZGVyICU+JQ0KICBmaWx0ZXIoeWVhciA+IDIwMDApICU+JQ0KICBhcnJhbmdlKGRlc2MobGlmZUV4cCkpDQpgYGANCg0KIyMgT3RoZXIgdXNlZnVsIGBkcGx5cmAgdmVyYnMNCg0KLSAgIGBzZWxlY3RgDQotICAgYHRvcF9uYA0KLSAgIGBzYW1wbGVfbmANCi0gICBgc2xpY2VgDQotICAgYGdsaW1wc2VgDQotICAgYHJlbmFtZWANCg0KIyMgWW91ciBUYXNrDQoNCkRldGVybWluZSB3aGljaCBBZnJpY2FuIGNvdW50cnkgaGFkIHRoZSBoaWdoZXN0IEdEUCBwZXIgY2FwaXRhIGluIDE5ODIgdXNpbmcgdGhlIGBnYXBtaW5kZXJgIGRhdGEgaW4gdGhlIGBnYXBtaW5kZXJgIHBhY2thZ2UuDQoNCmBgYHtyfQ0KbGlicmFyeShnYXBtaW5kZXIpDQpsaWJyYXJ5KGRwbHlyKQ0KDQpnYXBtaW5kZXIgJT4lDQogIGZpbHRlcih5ZWFyID09IDE5ODIsIGNvbnRpbmVudCA9PSAiQWZyaWNhIikgJT4lDQogIGFycmFuZ2UoZGVzYyhnZHBQZXJjYXApKQ0KDQpgYGANCkxpYnlhIGhhcyB0aGUgaGlnaGVzdCBHRFAgcGVyIGNhcGl0YSBpbiBBZnJpY2EgaW4gMTk4Mi4NCg0KKipDaGFsbGVuZ2UqKg0KDQpGb3IgYm90aCBvZiB0aGVzZSBwcm9ibGVtcyBiZWxvdywgdXNlIHRoZSBgYmVjaGRlbGAgZGF0YSBmcmFtZSBpbiB0aGUgYGZpdmV0aGlydHllaWdodGAgcGFja2FnZToNCg0KLSAgIFVzZSB0aGUgYGNvdW50YCBmdW5jdGlvbiBpbiB0aGUgYGRwbHlyYCBwYWNrYWdlIHRvIGRldGVybWluZSBob3cgbWFueSBtb3ZpZXMgaW4gMjAxMyBmZWxsIGludG8gZWFjaCBvZiB0aGUgZGlmZmVyZW50IGNhdGVnb3JpZXMgZm9yIGBjbGVhbl90ZXN0YA0KLSAgIERldGVybWluZSB0aGUgcGVyY2VudGFnZSBvZiBtb3ZpZXMgdGhhdCByZWNlaXZlZCB0aGUgdmFsdWUgb2YgYCJvayJgIGFjcm9zcyBhbGwgeWVhcnMNCg0KDQpgYGB7cn0NCmxpYnJhcnkoZHBseXIpDQpsaWJyYXJ5KGZpdmV0aGlydHllaWdodCkNCnZpZXcoYmVjaGRlbCkNCmJlY2hkZWwgJT4lDQogIGZpbHRlcih5ZWFyID09IDIwMTMpICU+JQ0KICBjb3VudChjbGVhbl90ZXN0KQ0KDQpgYGANCg0KYGBge3J9DQpiZWNoZGVsICU+JQ0KICBjb3VudChjbGVhbl90ZXN0KSAlPiUNCiAgbXV0YXRlKHBlcmNlbnRhZ2UgPSBuIC8gc3VtKG4pICogMTAwKSAlPiUNCiAgZmlsdGVyKGNsZWFuX3Rlc3QgPT0gIm9rIikNCg0KYGBgDQoNCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0KDQojIyBZb3VyIFRhc2sNCg0KRGV0ZXJtaW5lIHRoZSB0b3AgZml2ZSBtb3ZpZXMgaW4gdGVybXMgb2YgZG9tZXN0aWMgcmV0dXJuIG9uIGludmVzdG1lbnQgZm9yIDIwMTMgc2NhbGVkIGRhdGEgdXNpbmcgdGhlIGBiZWNoZGVsYCBkYXRhIGZyYW1lIGluIHRoZSBgZml2ZXRoaXJ0eWVpZ2h0YCBwYWNrYWdlLg0KDQpgYGB7cn0NCmJlY2hkZWwgJT4lDQogIG11dGF0ZShkb21lc3RpY19yb2lfMjAxMyA9IGRvbWdyb3NzXzIwMTMgLyBidWRnZXRfMjAxMykgJT4lIA0KICBmaWx0ZXIoeWVhciA9PSAyMDEzKSAlPiUNCiAgYXJyYW5nZShkZXNjKGRvbWVzdGljX3JvaV8yMDEzKSkgJT4lDQogIHNlbGVjdCh0aXRsZSwgeWVhciwgZG9tZXN0aWNfcm9pXzIwMTMpDQpgYGANClRoZSB0b3AgNSBtb3ZpZXMgaW4gZG9tZXN0aWMgcmV0dXJuIG9uIGludmVzdG1lbnQgaW4gMjAxMyBpbiBvcmRlciBpcyBUaGUgUHVyZ2UsIEluc2lkaW91czogQ2hhcHRlciAyLCBUaGUgQ29uanVyaW5nLCBEZXNwaWNhYmxlIE1lIDIsIGFuZCBNYW1hLg==