Harold Nelson
10/2/2018
## ── Attaching packages ──────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## Warning: package 'dplyr' was built under R version 3.5.1
## ── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
We want to look at ways to visualize a relationship in which the explanatory variable is categorical and the response variable is continuous.
In this set of examples, we start by treating the number of organ donations as a function of country and year alone. First, glimpse the data.
## Observations: 238
## Variables: 21
## $ country <chr> "Australia", "Australia", "Australia", "Austr...
## $ year <date> NA, 1991-01-01, 1992-01-01, 1993-01-01, 1994...
## $ donors <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59,...
## $ pop <int> 17065, 17284, 17495, 17667, 17855, 18072, 183...
## $ pop_dens <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0...
## $ gdp <int> 16774, 17171, 17914, 18883, 19849, 21079, 219...
## $ gdp_lag <int> 16591, 16774, 17171, 17914, 18883, 19849, 210...
## $ health <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 194...
## $ health_lag <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 184...
## $ pubhealth <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, ...
## $ roads <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 1...
## $ cerebvas <int> 682, 647, 630, 611, 631, 592, 576, 525, 516, ...
## $ assault <int> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 1...
## $ external <int> 444, 425, 406, 376, 387, 371, 395, 385, 410, ...
## $ txp_pop <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0...
## $ world <chr> "Liberal", "Liberal", "Liberal", "Liberal", "...
## $ opt <chr> "In", "In", "In", "In", "In", "In", "In", "In...
## $ consent_law <chr> "Informed", "Informed", "Informed", "Informed...
## $ consent_practice <chr> "Informed", "Informed", "Informed", "Informed...
## $ consistent <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Ye...
## $ ccode <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz...
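The listing above is the kind of output `glimpse()` prints. The call that produced it is not shown in the source, so the following is only a sketch. It assumes `organdata` comes from Healy's `socviz` package and that the package is installed.

```r
library(dplyr)    # provides glimpse()
library(socviz)   # assumption: organdata is the data set shipped with socviz

# One line per variable: name, type, and the first few values
glimpse(organdata)
```

Each row of the data is one country in one year, so `country` is the categorical explanatory variable and `donors` is the continuous response used in the plots that follow.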
Here is Healy’s first reasonable graph.
p <- ggplot(data = organdata,
            mapping = aes(x = reorder(country, donors, na.rm = TRUE),
                          y = donors))
p + geom_boxplot() +
    labs(x = NULL) +
    coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
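Two details of this graph can be checked directly. `reorder(country, donors, na.rm = TRUE)` orders the countries by their mean `donors` value, because `mean` is the default summary function of `reorder()` and `na.rm = TRUE` is passed on to it. The warning reports rows in which `donors` is missing. The sketch below reproduces both by hand. It assumes `organdata` comes from the `socviz` package, and `country_means` is a name introduced here only for illustration.

```r
library(dplyr)
library(socviz)   # assumption: organdata is the data set shipped with socviz

# Mean donation value per country, ignoring missing values.
# Sorting by this mean gives the order that reorder() uses for the factor levels.
country_means <- organdata %>%
  group_by(country) %>%
  summarize(mean_donors = mean(donors, na.rm = TRUE)) %>%
  arrange(mean_donors)

country_means

# Number of rows with a missing donors value; the boxplot drops these rows,
# so this count should agree with the number in the warning
sum(is.na(organdata$donors))
```

After `coord_flip()`, the country with the lowest mean is drawn at the bottom of the plot and the one with the highest mean at the top.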
It is instructive to examine an ultra-simple version of this graph to see how what I call the first reasonable graph improves on it.
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
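The code for the ultra-simple version is not shown in the source. One plausible reading, offered only as a sketch, drops the reordering, the axis-label change, and the coordinate flip, leaving a bare boxplot of `donors` by `country`. The name `p_simple` is introduced here for illustration, and `organdata` is again assumed to come from the `socviz` package.

```r
library(ggplot2)
library(socviz)   # assumption: organdata is the data set shipped with socviz

# Bare-bones version: countries stay in their default (alphabetical) order
# on the x-axis, with no reordering, relabeling, or flipping
p_simple <- ggplot(data = organdata,
                   mapping = aes(x = country, y = donors))
p_simple + geom_boxplot()
```

In a plot like this the country names tend to crowd each other along the x-axis and the boxes appear in no meaningful order, which is what `coord_flip()` and `reorder()` address in the earlier graph.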