One Categorical and One Quantitative

Harold Nelson

2/3/2021

Setup

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ dplyr   1.0.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
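The startup messages above are what loading the tidyverse prints. The setup chunk itself is not shown, so the following is only a sketch of what it likely contains. It assumes that `organdata` comes from Kieran Healy's socviz package, which is not stated in these notes.

```r
# Load the plotting and data-manipulation packages
library(tidyverse)

# Assumption: organdata is provided by the socviz package
library(socviz)
```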

Categorical -> Quantitative

We want to look at ways to visualize a relationship where the explanatory variable is categorical and the response variable is continuous.

In this set of examples, we start by thinking of the number of organ donations as just a function of country and year. First, glimpse the data.

glimpse(organdata)
## Rows: 238
## Columns: 21
## $ country          <chr> "Australia", "Australia", "Australia", "Australia", …
## $ year             <date> NA, 1991-01-01, 1992-01-01, 1993-01-01, 1994-01-01,…
## $ donors           <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59, 10.26,…
## $ pop              <int> 17065, 17284, 17495, 17667, 17855, 18072, 18311, 185…
## $ pop_dens         <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0.230648…
## $ gdp              <int> 16774, 17171, 17914, 18883, 19849, 21079, 21923, 229…
## $ gdp_lag          <int> 16591, 16774, 17171, 17914, 18883, 19849, 21079, 219…
## $ health           <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948, 2077…
## $ health_lag       <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 1846, 1948…
## $ pubhealth        <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, 6.1, 6.…
## $ roads            <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 107.9809…
## $ cerebvas         <int> 682, 647, 630, 611, 631, 592, 576, 525, 516, 493, 47…
## $ assault          <int> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 15, 14, …
## $ external         <int> 444, 425, 406, 376, 387, 371, 395, 385, 410, 409, 39…
## $ txp_pop          <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0.896107…
## $ world            <chr> "Liberal", "Liberal", "Liberal", "Liberal", "Liberal…
## $ opt              <chr> "In", "In", "In", "In", "In", "In", "In", "In", "In"…
## $ consent_law      <chr> "Informed", "Informed", "Informed", "Informed", "Inf…
## $ consent_practice <chr> "Informed", "Informed", "Informed", "Informed", "Inf…
## $ consistent       <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Ye…
## $ ccode            <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz"…

Here is Healy’s first reasonable graph.

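# Order countries by mean donation rate; na.rm = TRUE is passed on to mean()
p <- ggplot(data = organdata,
            mapping = aes(x = reorder(country, donors, na.rm = TRUE),
                          y = donors))
# Draw the boxplots, drop the axis label, and flip so countries run down the side
p + geom_boxplot() +
    labs(x = NULL) +
    coord_flip()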
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
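The warning means that `geom_boxplot()` dropped rows where `donors` is not a finite number before computing the box statistics. A quick way to see where those rows come from is sketched below. It assumes `organdata` is the data set glimpsed earlier, loaded from the socviz package, and that the dropped values are all `NA`. The names `n_missing` and `missing_by_country` are made up for this illustration.

```r
library(dplyr)
library(socviz)  # assumption: organdata comes from this package

# Count the rows with a missing donation rate; if every dropped value
# is NA, this should match the number reported in the warning
n_missing <- sum(is.na(organdata$donors))
n_missing

# See how the missing values are spread across countries
missing_by_country <- organdata %>%
  filter(is.na(donors)) %>%
  count(country, name = "n_missing")
missing_by_country
```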

It is instructive to compare this with an ultra-simple version of the same graph to see what the first reasonable graph improves on.

p <- ggplot(data = organdata,
            mapping = aes(x = country, y = donors))
p + geom_boxplot()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

Question

What was the big improvement?

Answer

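Flipping: coord_flip() puts the countries on the vertical axis, where their names are easy to read.
Reordering: reorder() sorts the countries by their mean value of donors instead of alphabetically.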

This combination is described as a Cleveland plot.
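To see what the reordering step does on its own, here is a minimal sketch using a small made-up data frame (`toy`, `group`, and `value` are hypothetical names, not part of `organdata`). It shows that `reorder()` changes the factor levels, which is what ggplot2 uses to place the categories along the axis.

```r
# Toy data, made up for illustration: three groups with different means
toy <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(5, 7, 1, NA, 3, 4)
)

# Without reordering, the factor levels are alphabetical
default_levels <- levels(factor(toy$group))
default_levels
# "a" "b" "c"

# reorder() sorts the levels by a summary of value (the mean by default).
# na.rm = TRUE is passed through to mean(), so the NA in group "b" is ignored.
# Group means: a = 6, b = 1, c = 3.5
reordered_levels <- levels(reorder(toy$group, toy$value, na.rm = TRUE))
reordered_levels
# "b" "c" "a"
```

The levels come out in ascending order of the group mean. After `coord_flip()`, the first level sits at the bottom of the plot, so the group with the largest mean ends up at the top.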