Harold Nelson
2/3/2021
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.5 ✓ dplyr 1.0.3
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
We want to look at ways to visualize a relationship where the explanatory variable is categorical and the response variable is continuous.
In this set of examples, we start by thinking of the number of organ donations as just a function of country and year. First, glimpse the data.
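The rendered page shows only the output of this step, not the call that produced it. A minimal sketch of that call, assuming `organdata` is the data set shipped with the `socviz` package that accompanies Healy's book (the package name is an assumption; the source shows only the output):

```r
# Assumption: organdata comes from the socviz package.
# If it is not installed, run install.packages("socviz") first.
library(tidyverse)
library(socviz)

# Print one line per column: its name, its type, and the first few values.
glimpse(organdata)
```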
## Rows: 238
## Columns: 21
## $ country <chr> "Australia", "Australia", "Australia", "Australia", …
## $ year <date> NA, 1991-01-01, 1992-01-01, 1993-01-01, 1994-01-01,…
## $ donors <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59, 10.26,…
## $ pop <int> 17065, 17284, 17495, 17667, 17855, 18072, 18311, 185…
## $ pop_dens <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0.230648…
## $ gdp <int> 16774, 17171, 17914, 18883, 19849, 21079, 21923, 229…
## $ gdp_lag <int> 16591, 16774, 17171, 17914, 18883, 19849, 21079, 219…
## $ health <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948, 2077…
## $ health_lag <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948…
## $ pubhealth <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, 6.1, 6.…
## $ roads <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 107.9809…
## $ cerebvas <int> 682, 647, 630, 611, 631, 592, 576, 525, 516, 493, 47…
## $ assault <int> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 15, 14, …
## $ external <int> 444, 425, 406, 376, 387, 371, 395, 385, 410, 409, 39…
## $ txp_pop <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0.896107…
## $ world <chr> "Liberal", "Liberal", "Liberal", "Liberal", "Liberal…
## $ opt <chr> "In", "In", "In", "In", "In", "In", "In", "In", "In"…
## $ consent_law <chr> "Informed", "Informed", "Informed", "Informed", "Inf…
## $ consent_practice <chr> "Informed", "Informed", "Informed", "Informed", "Inf…
## $ consistent <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Ye…
## $ ccode <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz"…
Here is Healy’s first reasonable graph.
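# Order the countries by their mean donors value, ignoring missing values.
p <- ggplot(data = organdata,
            mapping = aes(x = reorder(country, donors, na.rm = TRUE),
                          y = donors))
# Draw the boxplots, drop the x-axis label, and flip the axes so the
# country names run down the vertical axis.
p + geom_boxplot() +
  labs(x = NULL) +
  coord_flip()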
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
It is instructive to compare this with an ultra-simple version of the same graph, to see what the first reasonable graph improved on.
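The code for the simple version is hidden in the rendered output. A plausible reconstruction, assuming it maps `country` and `donors` directly, with no reordering and no flipping (the object name `p_simple` is hypothetical, and `socviz` is again assumed to be the source of `organdata`):

```r
library(ggplot2)
library(socviz)  # assumed source of organdata

# Countries stay on the x-axis in their default alphabetical order,
# and the axes are not flipped.
p_simple <- ggplot(data = organdata,
                   mapping = aes(x = country, y = donors))
p_simple + geom_boxplot()
```

With this version the country names sit along the horizontal axis, and the boxes appear in alphabetical order rather than in order of donation level.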
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
What was the big improvement?
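Flipping: coord_flip() puts the countries on the vertical axis, where their names are easy to read.
Reordering: reorder() sorts the countries by their mean donors value instead of alphabetically.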
This combination is described as a Cleveland plot.
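To see what the reordering step does on its own, here is a small sketch that uses made-up values rather than `organdata`. The data frame `toy` and the name `ordered_group` are hypothetical and exist only for illustration:

```r
# Toy data, invented for illustration (not taken from organdata).
toy <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(5, 7, 1, NA, 3, 4)
)

# Without reordering, factor levels are alphabetical.
levels(factor(toy$group))
# "a" "b" "c"

# reorder() sorts the levels by a summary of value (the mean by default).
# na.rm = TRUE is passed on to mean(), so the NA in group b is ignored.
ordered_group <- reorder(toy$group, toy$value, na.rm = TRUE)
levels(ordered_group)
# "b" "c" "a"
```

The group means are 6 for `a`, 1 for `b`, and 3.5 for `c`, so the levels come out from lowest mean to highest. ggplot2 places categories along an axis in level order, which is why the reordered boxplots appear sorted.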