```r
knitr::opts_chunk$set(echo = FALSE,
                      fig.align = "center",
                      warning = FALSE,
                      message = FALSE,
                      fig.height = 6,
                      fig.width = 8)
## Set the default size of figures
# knitr::opts_chunk$set(fig.width=8, fig.height=5)
## Load the libraries we will be using
pacman::p_load(gapminder, socviz, tidyverse)
# Change the default theme to black/white instead of grey
theme_set(theme_bw())
```

```r
# Load the organdata data set into the global environment and open its help page
data(organdata)
help(organdata)
# Skim the organdata
skimr::skim(organdata)
```
| Name | organdata |
|---|---|
| Number of rows | 238 |
| Number of columns | 21 |
| Column type frequency: | |
| character | 7 |
| Date | 1 |
| numeric | 13 |
| Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
country | 0 | 1.00 | 5 | 14 | 0 | 17 | 0 |
world | 14 | 0.94 | 6 | 11 | 0 | 3 | 0 |
opt | 28 | 0.88 | 2 | 3 | 0 | 2 | 0 |
consent_law | 0 | 1.00 | 8 | 8 | 0 | 2 | 0 |
consent_practice | 0 | 1.00 | 8 | 8 | 0 | 2 | 0 |
consistent | 0 | 1.00 | 2 | 3 | 0 | 2 | 0 |
ccode | 0 | 1.00 | 2 | 4 | 0 | 17 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
year | 34 | 0.86 | 1991-01-01 | 2002-01-01 | 1996-07-02 | 12 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
donors | 34 | 0.86 | 16.48 | 5.11 | 5.20 | 13.00 | 15.10 | 19.60 | 33.90 | ▁▇▅▂▁ |
pop | 17 | 0.93 | 39921.29 | 62219.22 | 3514.00 | 6938.00 | 15531.00 | 57301.00 | 288369.00 | ▇▁▁▁▁ |
pop_dens | 17 | 0.93 | 12.00 | 11.09 | 0.22 | 1.94 | 9.49 | 19.11 | 38.89 | ▇▃▃▂▁ |
gdp | 17 | 0.93 | 22986.18 | 4665.92 | 12917.00 | 19546.00 | 22756.00 | 26180.00 | 36554.00 | ▂▇▇▃▁ |
gdp_lag | 0 | 1.00 | 22574.92 | 4790.71 | 11434.00 | 19034.25 | 22158.00 | 25886.50 | 36554.00 | ▂▇▇▃▁ |
health | 0 | 1.00 | 2073.75 | 733.59 | 791.00 | 1581.00 | 1956.00 | 2407.50 | 5665.00 | ▆▇▂▁▁ |
health_lag | 0 | 1.00 | 1972.99 | 699.24 | 727.00 | 1542.00 | 1850.50 | 2290.25 | 5267.00 | ▆▇▂▁▁ |
pubhealth | 21 | 0.91 | 6.19 | 0.92 | 4.30 | 5.50 | 6.00 | 6.90 | 8.80 | ▂▇▅▃▁ |
roads | 17 | 0.93 | 113.04 | 36.33 | 58.21 | 83.46 | 111.22 | 139.57 | 232.48 | ▇▇▆▂▁ |
cerebvas | 17 | 0.93 | 610.80 | 144.45 | 300.00 | 500.00 | 604.00 | 698.00 | 957.00 | ▂▅▇▃▂ |
assault | 17 | 0.93 | 16.53 | 17.33 | 4.00 | 9.00 | 11.00 | 16.00 | 103.00 | ▇▁▁▁▁ |
external | 17 | 0.93 | 450.06 | 118.19 | 258.00 | 367.00 | 421.00 | 534.00 | 853.00 | ▆▇▅▁▁ |
txp_pop | 17 | 0.93 | 0.72 | 0.20 | 0.22 | 0.63 | 0.71 | 0.83 | 1.12 | ▁▂▇▃▃ |
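This section covers three functions for adding text to graphs:

- `geom_text()`
- `geom_text_repel()`
- `annotate()`

All three take the text to display through the `label` argument. In `geom_text()` and `geom_text_repel()`, `label` is an aesthetic mapped to a variable inside `aes()`. In `annotate()`, it is a fixed value supplied directly.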
Let’s create a scatterplot of the organdata with `x = roads_mean` and `y = donors_mean`.

To start, create and save a data set called `by_country` that has the average road deaths and average organ donors by `country` and `consent_law`. Keep in mind that there are missing values in the `roads` and `donors` columns!
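Here is a minimal sketch of one way to build `by_country` with dplyr. Grouping by `country` and then `consent_law` is a choice made to match the column order of the printed result. `na.rm = TRUE` handles the missing values, and `.groups = "drop"` returns an ungrouped tibble. The author's own code may differ.

```r
# Packages used below (socviz supplies organdata; tidyverse supplies dplyr)
library(socviz)
library(tidyverse)

# Average road deaths and organ donors for each country and consent_law.
# na.rm = TRUE skips missing roads/donors values instead of returning NA.
by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(roads_mean  = mean(roads, na.rm = TRUE),
            donors_mean = mean(donors, na.rm = TRUE),
            .groups = "drop")

by_country
```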
```
## # A tibble: 17 × 4
##    country        consent_law roads_mean donors_mean
##    <chr>          <chr>            <dbl>       <dbl>
##  1 Australia      Informed         105.         10.6
##  2 Austria        Presumed         150.         23.5
##  3 Belgium        Presumed         155.         21.9
##  4 Canada         Informed         109.         14.0
##  5 Denmark        Informed         102.         13.1
##  6 Finland        Presumed          93.6        18.4
##  7 France         Presumed         156.         16.8
##  8 Germany        Informed         113.         13.0
##  9 Ireland        Informed         118.         19.8
## 10 Italy          Presumed         122.         11.1
## 11 Netherlands    Informed          76.1        13.7
## 12 Norway         Presumed          70.0        15.4
## 13 Spain          Presumed         161.         28.1
## 14 Sweden         Presumed          72.3        13.1
## 15 Switzerland    Presumed          96.4        14.2
## 16 United Kingdom Informed          67.9        13.5
## 17 United States  Informed         155.         20.0
```
Use the `by_country` data set to create and save the scatterplot. Change the x- and y-axis labels to be more descriptive!
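Here is a sketch of the saved scatterplot. The object name `organ_plot` is hypothetical and the axis titles are only suggestions. `by_country` is rebuilt so the chunk runs on its own.

```r
library(socviz)
library(tidyverse)

# Rebuild by_country so this chunk is self-contained
by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(roads_mean  = mean(roads, na.rm = TRUE),
            donors_mean = mean(donors, na.rm = TRUE),
            .groups = "drop")

# Save the base scatterplot (hypothetical name organ_plot) so text
# layers can be added to it later
organ_plot <- ggplot(data = by_country,
                     mapping = aes(x = roads_mean, y = donors_mean)) +
  geom_point() +
  labs(x = "Average road deaths",
       y = "Average organ donors")

organ_plot
```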
Next, add the country name next to each point in the graph by adding `geom_text(mapping = aes(label = label_var))`, where `label_var` is the variable holding the text (here, `country`).
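Here is a sketch of the `geom_text()` step, with `label_var` replaced by `country`. The `hjust = 0` setting is an assumed choice: it left-justifies each label so the text starts at its point. The base plot is rebuilt so the chunk is self-contained, and `organ_text_plot` is a hypothetical name.

```r
library(socviz)
library(tidyverse)

by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(roads_mean  = mean(roads, na.rm = TRUE),
            donors_mean = mean(donors, na.rm = TRUE),
            .groups = "drop")

organ_plot <- ggplot(data = by_country,
                     mapping = aes(x = roads_mean, y = donors_mean)) +
  geom_point() +
  labs(x = "Average road deaths", y = "Average organ donors")

# Map country to the label aesthetic; hjust = 0 left-justifies each label
organ_text_plot <- organ_plot +
  geom_text(mapping = aes(label = country), hjust = 0)

organ_text_plot
```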
If the points on a graph are dense, the labels can overlap and become difficult to read. The `geom_text_repel()` function in the ggrepel package solves this overcrowding by pushing overlapping labels away from each other and from the points.

First, install ggrepel if you haven’t already, and load it. Next, copy the code for the graph in the previous code chunk, then replace `geom_text()` with `geom_text_repel()`, without setting `hjust` or `vjust`.
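Here is a sketch of the repelled version, assuming ggrepel is installed. It is the same plot with `geom_text()` swapped for `geom_text_repel()` and no `hjust`. `organ_repel_plot` is a hypothetical name.

```r
library(socviz)
library(tidyverse)
library(ggrepel)   # install.packages("ggrepel") first if it is missing

by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(roads_mean  = mean(roads, na.rm = TRUE),
            donors_mean = mean(donors, na.rm = TRUE),
            .groups = "drop")

organ_plot <- ggplot(data = by_country,
                     mapping = aes(x = roads_mean, y = donors_mean)) +
  geom_point() +
  labs(x = "Average road deaths", y = "Average organ donors")

# geom_text_repel() positions the labels itself, so no hjust/vjust is set
organ_repel_plot <- organ_plot +
  geom_text_repel(mapping = aes(label = country))

organ_repel_plot
```

ggrepel places labels with a randomized algorithm, so exact positions can change between renders unless a `seed` is passed to `geom_text_repel()`.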
From the organdata, create a data set named `organ_newest` that contains only the rows from the most recent year for each country in the data:
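Here is a minimal sketch. It assumes `year` is stored as a Date, as the skim output shows, and that the numeric `year` column printed next comes from converting it with `lubridate::year()`. The author's actual code may differ.

```r
library(socviz)
library(tidyverse)

# Convert the Date column to a numeric year, then keep each country's
# most recent year. Rows with a missing year compare as NA and are
# dropped by filter().
organ_newest <- organdata %>%
  mutate(year = lubridate::year(year)) %>%
  group_by(country) %>%
  filter(year == max(year, na.rm = TRUE)) %>%
  ungroup()

organ_newest
```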
```
## # A tibble: 17 × 21
##    country         year donors    pop pop_dens   gdp gdp_lag health health_lag
##    <chr>          <dbl>  <dbl>  <int>    <dbl> <int>   <int>  <dbl>      <dbl>
##  1 Australia       2002   10.5  19663    0.254 28168   27461   2629       2504
##  2 Austria         2002   23.8   8053    9.60  28842   28457   2220       2174
##  3 Belgium         2002   21.7  10333   31.2   27652   27113   2515       2441
##  4 Canada          2002   13    31414    0.315 30429   29235   2931       2743
##  5 Denmark         2002   12.7   5376   12.5   29228   29203   2580       2520
##  6 Finland         2002   17.1   5201    1.54  26616   26376   1943       1841
##  7 France          2002   20    59486   10.8   28094   27394   2736       2588
##  8 Germany         2002   12.2  82489   23.1   25843   25436   2830       2735
##  9 Italy           2002   18.1  57994   19.2   25569   25359   2166       2107
## 10 Ireland         2002   21     3932    5.60  32571   29703   2367       2059
## 11 Netherlands     2002   12.6  16149   38.9   28983   28756   2643       2455
## 12 Norway          2002   13.7   4538    1.40  35531   36554   3083       3258
## 13 Spain           2002   33.7  41874    8.28  21592   20864   1646       1567
## 14 Sweden          2002   11     8925    1.98  27255   26902   2517       2370
## 15 Switzerland     2002   10.4   7290   17.7   30725   30134   3445       3288
## 16 United Kingdom  2002   13    59232   24.4   27959   26720   2160       2012
## 17 United States   2002   21.5 288369    2.99  36006   35118   5267       4869
## # ℹ 12 more variables: pubhealth <dbl>, roads <dbl>, cerebvas <int>,
## #   assault <int>, external <int>, txp_pop <dbl>, world <chr>, opt <chr>,
## #   consent_law <chr>, consent_practice <chr>, consistent <chr>, ccode <chr>
```
Using the `organ_newest` data set:

- Create and save a scatterplot with `x = gdp` and `y = health`.
- Use a `scale_` function to display the x- and y-axis labels in USD.
- Add `geom_text_repel()` to the saved scatterplot, but supply the `data =` argument with the same data set, filtered with `filter()` to only the countries with a GDP above $32,000 or below $24,000.
- Map the name of each outlier country to its corresponding point on the scatterplot.

The result should be the same scatterplot, but with the four outlier points labeled Spain, Ireland, Norway, and United States.
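Here is a sketch of these steps. A few choices are assumptions: `scales::label_dollar()` for the USD formatting, the axis titles, and the hypothetical names `health_plot`, `organ_outliers`, and `outlier_plot`. Because `organ_newest` has one row per country, the filter is applied directly to `gdp`.

```r
library(socviz)
library(tidyverse)
library(ggrepel)

# Rebuild organ_newest: each country's most recent year (year is a Date)
organ_newest <- organdata %>%
  mutate(year = lubridate::year(year)) %>%
  group_by(country) %>%
  filter(year == max(year, na.rm = TRUE)) %>%
  ungroup()

# Saved base scatterplot; label_dollar() formats both axes as US dollars
health_plot <- ggplot(data = organ_newest,
                      mapping = aes(x = gdp, y = health)) +
  geom_point() +
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(x = "GDP", y = "Health spending")

# Same data set, filtered to the outlier countries only
organ_outliers <- organ_newest %>%
  filter(gdp > 32000 | gdp < 24000)

# The repel layer gets its own data; x and y are inherited from health_plot
outlier_plot <- health_plot +
  geom_text_repel(data = organ_outliers,
                  mapping = aes(label = country))

outlier_plot
```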
`annotate()` adds ink to the plot that doesn’t depend on the data. You need to specify which geom to use with `geom = "geom_name"`, writing the name without the `geom_` prefix. Then add the remaining arguments for that geom. None of these arguments go inside `aes()`, since they don’t depend on the data.
For example, to add a red point to a plot at `x = 2` and `y = 5`:

```r
+ annotate(geom = "point",
           x = 2,
           y = 5,
           color = "red")
```
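To see the fragment above run end to end, here is a minimal sketch on a small hypothetical data frame, `toy`, created only so there is a plot to annotate.

```r
library(ggplot2)

# Hypothetical toy data, used only to have something to plot
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 1))

# annotate() takes fixed values rather than aes() mappings,
# so this red point does not come from the toy data
annotated_plot <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  annotate(geom = "point", x = 2, y = 5, color = "red")

annotated_plot
```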
Add an annotation to the plot below at (2, 43) that states “Two high highway mpg”. Include `hjust = 0` inside `annotate()`.
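The plot for this exercise isn't reproduced in this section. This sketch assumes it is a scatterplot of ggplot2's built-in `mpg` data, with `displ` on the x-axis and `hwy` on the y-axis; `mpg_plot` and `mpg_annotated` are hypothetical names.

```r
library(ggplot2)

# Assumed base plot: engine displacement vs. highway mpg from ggplot2's mpg
mpg_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# geom = "text" draws the words; hjust = 0 starts the text at x = 2
mpg_annotated <- mpg_plot +
  annotate(geom = "text", x = 2, y = 43,
           label = "Two high highway mpg", hjust = 0)

mpg_annotated
```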
If you want to add words to a plot, which geom should you use?
We’ll start by looking at data on US presidential elections.
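Here is a sketch that assumes the data is `elections_historic` from socviz, whose columns match the output printed here. It prints the full tibble and then only the year, winner, and winning party.

```r
library(socviz)
library(tidyverse)

# Assumption: the election data is socviz's elections_historic
elections_historic

# Keep only the year, winner, and winning party
elections_historic %>%
  select(year, winner, win_party)
```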
```
## # A tibble: 49 × 19
##    election  year winner      win_party ec_pct popular_pct popular_margin  votes
##       <int> <int> <chr>       <chr>      <dbl>       <dbl>          <dbl>  <int>
##  1       10  1824 John Quinc… D.-R.      0.322       0.309        -0.104  1.13e5
##  2       11  1828 Andrew Jac… Dem.       0.682       0.559         0.122  6.43e5
##  3       12  1832 Andrew Jac… Dem.       0.766       0.547         0.178  7.03e5
##  4       13  1836 Martin Van… Dem.       0.578       0.508         0.142  7.63e5
##  5       14  1840 William He… Whig       0.796       0.529         0.0605 1.28e6
##  6       15  1844 James Polk  Dem.       0.618       0.495         0.0145 1.34e6
##  7       16  1848 Zachary Ta… Whig       0.562       0.473         0.0479 1.36e6
##  8       17  1852 Franklin P… Dem.       0.858       0.508         0.0695 1.61e6
##  9       18  1856 James Buch… Dem.       0.588       0.453         0.122  1.84e6
## 10       19  1860 Abraham Li… Rep.       0.594       0.396         0.101  1.86e6
## # ℹ 39 more rows
## # ℹ 11 more variables: margin <int>, runner_up <chr>, ru_part <chr>,
## #   turnout_pct <dbl>, winner_lname <chr>, winner_label <chr>, ru_lname <chr>,
## #   ru_label <chr>, two_term <lgl>, ec_votes <dbl>, ec_denom <dbl>
```

```
## # A tibble: 49 × 3
##     year winner                 win_party
##    <int> <chr>                  <chr>
##  1  1824 John Quincy Adams      D.-R.
##  2  1828 Andrew Jackson         Dem.
##  3  1832 Andrew Jackson         Dem.
##  4  1836 Martin Van Buren       Dem.
##  5  1840 William Henry Harrison Whig
##  6  1844 James Polk             Dem.
##  7  1848 Zachary Taylor         Whig
##  8  1852 Franklin Pierce        Dem.
##  9  1856 James Buchanan         Dem.
## 10  1860 Abraham Lincoln        Rep.
## # ℹ 39 more rows
```
Follow the comments in the code chunk below to create a graph of US presidential elections showing the winner’s percentage of the popular vote and of the electoral vote.
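Here is one possible version of the basic graph, as a sketch. It assumes the data is socviz's `elections_historic`, with `popular_pct` on the x-axis and `ec_pct` on the y-axis. The gray 50% reference lines, the axis titles, and the name `election_plot` are illustrative choices.

```r
library(socviz)
library(tidyverse)

# Winner's share of the popular vote (x) vs. share of electoral votes (y);
# gray reference lines mark 50% on each axis
election_plot <- ggplot(data = elections_historic,
                        mapping = aes(x = popular_pct, y = ec_pct)) +
  geom_hline(yintercept = 0.5, color = "gray80") +
  geom_vline(xintercept = 0.5, color = "gray80") +
  geom_point() +
  scale_x_continuous(labels = scales::label_percent()) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Winner's share of popular vote",
       y = "Winner's share of electoral college votes")

election_plot
```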
Now let’s make the graph a little fancier.
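Here is a sketch of one way to make it fancier, under the same assumptions as the basic graph. Each point is labeled with the `winner_label` column through `geom_text_repel()`, and a title plus a year-range subtitle are added. The title wording, label size, and `fancy_election_plot` name are illustrative.

```r
library(socviz)
library(tidyverse)
library(ggrepel)

# Base plot: winner's popular vote share vs. electoral college share
election_plot <- ggplot(data = elections_historic,
                        mapping = aes(x = popular_pct, y = ec_pct)) +
  geom_hline(yintercept = 0.5, color = "gray80") +
  geom_vline(xintercept = 0.5, color = "gray80") +
  geom_point() +
  scale_x_continuous(labels = scales::label_percent()) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Winner's share of popular vote",
       y = "Winner's share of electoral college votes")

# Label points with winner_label; repelling reduces overlap, though ggrepel
# may leave some points unlabeled if they are too crowded (see max.overlaps)
fancy_election_plot <- election_plot +
  geom_text_repel(mapping = aes(label = winner_label), size = 2.5) +
  labs(title = "US presidential elections: popular vs. electoral college share",
       subtitle = paste(range(elections_historic$year), collapse = "-"))

fancy_election_plot
```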