suppressPackageStartupMessages(library("tidyverse"))
1. Using prose, describe how the variables and observations are organized in each of the sample tables.
In table table1, each row represents a (country, year) combination. The columns cases and population contain the values for those variables.
table1
In table2, each row represents a (country, year, variable) combination. The column type names the variable (cases or population), and the column count contains that variable's value, so cases and population appear in separate rows.
table2
In table3, each row represents a (country, year) combination. The column rate provides the values of both cases and population in a string formatted like cases / population.
table3
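As a small illustration of that string format, the sketch below splits rate into two numeric columns with tidyr's separate(). The column names passed to into and the object name table3_split are choices made for this example, not part of the original solution.

```r
library(tidyr)  # provides separate() and the example data set table3

# Split the "cases/population" string on "/" and convert both parts to numbers.
table3_split <- separate(
  table3,
  rate,
  into = c("cases", "population"),  # assumed column names
  sep = "/",
  convert = TRUE
)
table3_split
```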
Table 4 is split into two tables, one table for each variable. The table table4a contains the values of cases and table4b contains the values of population. Within each table, each row represents a country, each column represents a year, and the cells are the value of the table’s variable for that country and year.
table4a
table4b
2. Compute the rate for table2, and table4a + table4b. You will need to perform four operations:
1. Extract the number of TB cases per country per year.
2. Extract the matching population per country per year.
3. Divide cases by population, and multiply by 10000.
4. Store back in the appropriate place.
Which representation is easiest to work with? Which is hardest? Why?
To calculate cases per person, we need to divide cases by population for each country and year. This is easiest if the cases and population variables are two columns in a data frame in which rows represent (country, year) combinations.
Table 2: First, create separate tables for cases and population and ensure that they are sorted in the same order.
t2_cases <- filter(table2, type == "cases") %>%
  rename(cases = count) %>%
  arrange(country, year)
t2_population <- filter(table2, type == "population") %>%
  rename(population = count) %>%
  arrange(country, year)
Then create a new data frame with the population and cases columns, and calculate the cases per capita in a new column.
t2_cases_per_cap <- tibble(
  year = t2_cases$year,
  country = t2_cases$country,
  cases = t2_cases$cases,
  population = t2_population$population
) %>%
  mutate(cases_per_cap = (cases / population) * 10000) %>%
  select(country, year, cases_per_cap)
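The construction above depends on both tables being sorted identically. A minimal alternative sketch matches rows by key with inner_join() instead of relying on row order. The object names cases_tbl, population_tbl, and t2_joined are hypothetical and used only for this illustration.

```r
library(dplyr)
library(tidyr)  # provides the example data set table2

# One table per variable, keeping only the key columns and the value.
cases_tbl <- table2 %>%
  filter(type == "cases") %>%
  select(country, year, cases = count)
population_tbl <- table2 %>%
  filter(type == "population") %>%
  select(country, year, population = count)

# Match rows on (country, year) rather than relying on row order.
t2_joined <- inner_join(cases_tbl, population_tbl, by = c("country", "year")) %>%
  mutate(cases_per_cap = cases / population * 10000)
t2_joined
```

Because the join is keyed on country and year, the result does not change if either input is reordered.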
To store this new variable in the appropriate location, we will add new rows to table2.
t2_cases_per_cap <- t2_cases_per_cap %>%
  mutate(type = "cases_per_cap") %>%
  rename(count = cases_per_cap)
bind_rows(table2, t2_cases_per_cap) %>%
  arrange(country, year, type, count)
Note that after adding the cases_per_cap rows, the type of count is coerced to numeric (double) because cases_per_cap is not an integer.
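The coercion can be seen in isolation with a toy example. The tibbles ints and dbls below are made up for illustration and are not part of the exercise data.

```r
library(dplyr)

# Hypothetical inputs: one integer column and one double column with the same name.
ints <- tibble(count = 1L)
dbls <- tibble(count = 0.5)

# bind_rows() promotes the combined column to the common type.
combined <- bind_rows(ints, dbls)

typeof(ints$count)
typeof(combined$count)
```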
For table4a and table4b, create a new table for cases per capita, which we’ll name table4c, with country rows and year columns.
table4c <-
  tibble(
    country = table4a$country,
    `1999` = table4a[["1999"]] / table4b[["1999"]] * 10000,
    `2000` = table4a[["2000"]] / table4b[["2000"]] * 10000
  )
table4c
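The calculation above is written out once per year column. One possible way to avoid that repetition, sketched here on the assumption that tidyr 1.0.0 or later is installed (for pivot_longer()), is to reshape both tables so that year becomes a column and then join them. The names cases_long, population_long, and table4_rate are hypothetical.

```r
library(dplyr)
library(tidyr)  # provides pivot_longer() and the data sets table4a, table4b

# Reshape each wide table so that the year headers become values in a year column.
cases_long <- pivot_longer(
  table4a, -country,
  names_to = "year", values_to = "cases"
)
population_long <- pivot_longer(
  table4b, -country,
  names_to = "year", values_to = "population"
)

# Join on (country, year) and compute the rate once for all years.
table4_rate <- inner_join(cases_long, population_long, by = c("country", "year")) %>%
  mutate(cases_per_cap = cases / population * 10000)
table4_rate
```

Note that year is a character column in this result because it comes from the column names.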
Neither table is particularly easy to work with. Since table2 has separate rows for cases and population, we needed to generate a table with columns for cases and population where we could calculate cases per capita. table4a and table4b split the cases and population variables into different tables, which made it easy to divide cases by population. However, we had to repeat this calculation for each year column.
The ideal format of a data frame to answer this question is one with columns country, year, cases, and population. Then the problem could be answered with a single mutate() call.
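table1 already has this shape, so a sketch of that single call might look like the following. The column name cases_per_cap mirrors the name used earlier, and table1_rate is a hypothetical object name.

```r
library(dplyr)
library(tidyr)  # provides the example data set table1

# cases and population are already columns, so one mutate() is enough.
table1_rate <- table1 %>%
  mutate(cases_per_cap = cases / population * 10000)
table1_rate
```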
3. Recreate the plot showing change in cases over time using table2 instead of table1. What do you need to do first?
Before creating the plot with change in cases over time, we need to filter table2 to only include rows representing cases of TB.
table2 %>%
  filter(type == "cases") %>%
  ggplot(aes(year, count)) +
  geom_line(aes(group = country), colour = "grey50") +
  geom_point(aes(colour = country)) +
  scale_x_continuous(breaks = unique(table2$year)) +
  ylab("cases")
