Healy Chapter 5 Part 2

Harold Nelson

10/2/2018

Setup

## ── Attaching packages ──────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## Warning: package 'dplyr' was built under R version 3.5.1
## ── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
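The code for this setup chunk is not shown, only its messages. The following is a minimal sketch of what it probably contained. It assumes the data come from Healy's socviz package, which provides `organdata`, and that tidyverse supplies ggplot2 and dplyr.

```r
# Assumed setup: tidyverse attaches ggplot2, dplyr, and friends
library(tidyverse)
# socviz is the companion package to Healy's book and ships organdata
library(socviz)
```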

Categorical → Quantitative

We want to look at ways to visualize a relationship in which the explanatory variable is categorical and the response variable is continuous.

In this set of examples, we start by treating the organ donation rate (donors per million population) as simply a function of country and year. First, glimpse the data.

glimpse(organdata)
## Observations: 238
## Variables: 21
## $ country          <chr> "Australia", "Australia", "Australia", "Austr...
## $ year             <date> NA, 1991-01-01, 1992-01-01, 1993-01-01, 1994...
## $ donors           <dbl> NA, 12.09, 12.35, 12.51, 10.25, 10.18, 10.59,...
## $ pop              <int> 17065, 17284, 17495, 17667, 17855, 18072, 183...
## $ pop_dens         <dbl> 0.2204433, 0.2232723, 0.2259980, 0.2282198, 0...
## $ gdp              <int> 16774, 17171, 17914, 18883, 19849, 21079, 219...
## $ gdp_lag          <int> 16591, 16774, 17171, 17914, 18883, 19849, 210...
## $ health           <dbl> 1300, 1379, 1455, 1540, 1626, 1737, 1846, 194...
## $ health_lag       <dbl> 1224, 1300, 1379, 1455, 1540, 1626, 1737, 184...
## $ pubhealth        <dbl> 4.8, 5.4, 5.4, 5.4, 5.4, 5.5, 5.6, 5.7, 5.9, ...
## $ roads            <dbl> 136.59537, 122.25179, 112.83224, 110.54508, 1...
## $ cerebvas         <int> 682, 647, 630, 611, 631, 592, 576, 525, 516, ...
## $ assault          <int> 21, 19, 17, 18, 17, 16, 17, 17, 16, 15, 16, 1...
## $ external         <int> 444, 425, 406, 376, 387, 371, 395, 385, 410, ...
## $ txp_pop          <dbl> 0.9375916, 0.9257116, 0.9145470, 0.9056433, 0...
## $ world            <chr> "Liberal", "Liberal", "Liberal", "Liberal", "...
## $ opt              <chr> "In", "In", "In", "In", "In", "In", "In", "In...
## $ consent_law      <chr> "Informed", "Informed", "Informed", "Informed...
## $ consent_practice <chr> "Informed", "Informed", "Informed", "Informed...
## $ consistent       <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Ye...
## $ ccode            <chr> "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz", "Oz...
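The `donors` column has missing values (the first Australian row is `NA`), and ggplot2 drops those rows when it draws boxplots. A quick per-country count shows where the missing values are. This is a minimal sketch using dplyr, assuming `organdata` is loaded from socviz; `donor_missing` is a hypothetical name chosen for illustration.

```r
library(dplyr)
library(socviz)

# For each country, count its rows and how many of them lack a donors value
donor_missing <- organdata %>%
  group_by(country) %>%
  summarize(n_rows = n(),
            n_missing = sum(is.na(donors)))

donor_missing
```

The total of `n_missing` should equal the number of rows that `stat_boxplot` reports removing in the warnings below.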

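Here is Healy’s first reasonable graph. It draws one boxplot of donors per country, orders the countries by their mean donation rate (reorder() uses mean by default, and na.rm = TRUE is passed through to it), and flips the axes so the country names are readable.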

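p <- ggplot(data = organdata,
            # order countries by their mean donation rate, ignoring NAs
            mapping = aes(x = reorder(country, donors, na.rm = TRUE),
                          y = donors))
p + geom_boxplot() +
    labs(x = NULL) +   # drop the axis title; the country names speak for themselves
    coord_flip()       # put countries on the vertical axis so the labels don't collide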
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
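To see why `na.rm = TRUE` matters inside `reorder()`, here is a small base R sketch on made-up toy data (the countries "A", "B", "C" and their values are hypothetical). `reorder()` computes `FUN` (by default `mean`) for each group, passes extra arguments such as `na.rm` on to it, and sorts groups whose score is `NA` to the end.

```r
# Toy data: three hypothetical countries, one with a missing value
country <- c("A", "A", "B", "B", "C", "C")
donors  <- c(10, 12, NA, 8, 5, 7)

# na.rm = TRUE is forwarded to mean(), so B scores 8 instead of NA
with_na_rm    <- reorder(country, donors, na.rm = TRUE)
without_na_rm <- reorder(country, donors)

levels(with_na_rm)     # "C" "B" "A": group means 6, 8, 11
levels(without_na_rm)  # "C" "A" "B": B's mean is NA, so it goes last

# The per-group scores used for the ordering are kept as an attribute
attr(with_na_rm, "scores")
```

Passing `FUN = median` instead would order the countries by the midlines of their boxes rather than by their means.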

It is instructive to examine an ultra-simple version of this graph to see how the first reasonable graph improves on it.

p <- ggplot(data = organdata,
            mapping = aes(x = country, y = donors))
p + geom_boxplot()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
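One way to see the improvements is to apply them one at a time. This is a minimal sketch, assuming ggplot2 and socviz are loaded; the object names `p_simple`, `p_flipped`, and `p_ordered` are hypothetical. Flipping the axes alone fixes the overlapping country labels but leaves them in alphabetical order; adding `reorder()` and dropping the axis title gives the first reasonable graph.

```r
library(ggplot2)
library(socviz)

# The ultra-simple version: countries in alphabetical order on the x axis
p_simple <- ggplot(data = organdata,
                   mapping = aes(x = country, y = donors))

# Improvement 1: flip the axes so the long country names no longer collide
p_flipped <- p_simple + geom_boxplot() + coord_flip()

# Improvement 2: also order countries by mean donation rate and drop the
# redundant axis title, which reproduces the first reasonable graph
p_ordered <- ggplot(data = organdata,
                    mapping = aes(x = reorder(country, donors, na.rm = TRUE),
                                  y = donors)) +
  geom_boxplot() +
  labs(x = NULL) +
  coord_flip()

print(p_flipped)
print(p_ordered)
```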