Set Up Your Project and Load Libraries

knitr::opts_chunk$set(echo = FALSE,
                      fig.align = "center",
                      warning = FALSE,
                      message = FALSE,
                      fig.height = 6,
                      fig.width = 8)

## Set the default size of figures
# knitr::opts_chunk$set(fig.width=8, fig.height=5)  

## Load the libraries we will be using
pacman::p_load(gapminder, socviz, tidyverse)

# Changing the default theme to black/white instead of grey
theme_set(theme_bw())

# Display the organ data in the global environment
data(organdata)
help(organdata)

# Skimming the organdata
skimr::skim(organdata)
Data summary
Name organdata
Number of rows 238
Number of columns 21
_______________________
Column type frequency:
character 7
Date 1
numeric 13
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
country 0 1.00 5 14 0 17 0
world 14 0.94 6 11 0 3 0
opt 28 0.88 2 3 0 2 0
consent_law 0 1.00 8 8 0 2 0
consent_practice 0 1.00 8 8 0 2 0
consistent 0 1.00 2 3 0 2 0
ccode 0 1.00 2 4 0 17 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
year 34 0.86 1991-01-01 2002-01-01 1996-07-02 12

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
donors 34 0.86 16.48 5.11 5.20 13.00 15.10 19.60 33.90 ▁▇▅▂▁
pop 17 0.93 39921.29 62219.22 3514.00 6938.00 15531.00 57301.00 288369.00 ▇▁▁▁▁
pop_dens 17 0.93 12.00 11.09 0.22 1.94 9.49 19.11 38.89 ▇▃▃▂▁
gdp 17 0.93 22986.18 4665.92 12917.00 19546.00 22756.00 26180.00 36554.00 ▂▇▇▃▁
gdp_lag 0 1.00 22574.92 4790.71 11434.00 19034.25 22158.00 25886.50 36554.00 ▂▇▇▃▁
health 0 1.00 2073.75 733.59 791.00 1581.00 1956.00 2407.50 5665.00 ▆▇▂▁▁
health_lag 0 1.00 1972.99 699.24 727.00 1542.00 1850.50 2290.25 5267.00 ▆▇▂▁▁
pubhealth 21 0.91 6.19 0.92 4.30 5.50 6.00 6.90 8.80 ▂▇▅▃▁
roads 17 0.93 113.04 36.33 58.21 83.46 111.22 139.57 232.48 ▇▇▆▂▁
cerebvas 17 0.93 610.80 144.45 300.00 500.00 604.00 698.00 957.00 ▂▅▇▃▂
assault 17 0.93 16.53 17.33 4.00 9.00 11.00 16.00 103.00 ▇▁▁▁▁
external 17 0.93 450.06 118.19 258.00 367.00 421.00 534.00 853.00 ▆▇▅▁▁
txp_pop 17 0.93 0.72 0.20 0.22 0.63 0.71 0.83 1.12 ▁▂▇▃▃

Adding text to graphs

This section covers three ways to add text to graphs: geom_text(), geom_text_repel(), and annotate().

All three take the text to display through label =, which is an aesthetic in the two geoms and a plain argument in annotate().

Organ Data by Country

Let’s create a scatterplot for the organdata with x = roads_mean and y = donors_mean.

To start, create and save a data set called by_country that holds the average road deaths (roads_mean) and average organ donors (donors_mean) for each combination of country and consent_law. Keep in mind that the roads and donors columns contain missing values, so the means need na.rm = TRUE!
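One way to build this summary is sketched below. It assumes the socviz and dplyr packages are installed; the column names roads_mean and donors_mean are taken from the output that follows, and `.groups = "drop"` is my addition so the result is not left grouped.

```r
library(socviz)  # provides organdata
library(dplyr)

by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(
    # na.rm = TRUE skips the missing values in roads and donors
    roads_mean  = mean(roads,  na.rm = TRUE),
    donors_mean = mean(donors, na.rm = TRUE),
    .groups = "drop"
  )

by_country
```

Without na.rm = TRUE, every country with at least one missing year would get NA for its mean.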

## # A tibble: 17 × 4
##    country        consent_law roads_mean donors_mean
##    <chr>          <chr>            <dbl>       <dbl>
##  1 Australia      Informed         105.         10.6
##  2 Austria        Presumed         150.         23.5
##  3 Belgium        Presumed         155.         21.9
##  4 Canada         Informed         109.         14.0
##  5 Denmark        Informed         102.         13.1
##  6 Finland        Presumed          93.6        18.4
##  7 France         Presumed         156.         16.8
##  8 Germany        Informed         113.         13.0
##  9 Ireland        Informed         118.         19.8
## 10 Italy          Presumed         122.         11.1
## 11 Netherlands    Informed          76.1        13.7
## 12 Norway         Presumed          70.0        15.4
## 13 Spain          Presumed         161.         28.1
## 14 Sweden         Presumed          72.3        13.1
## 15 Switzerland    Presumed          96.4        14.2
## 16 United Kingdom Informed          67.9        13.5
## 17 United States  Informed         155.         20.0

Use the by_country data set to create and save the scatterplot. Change the labels on the x and y-axis to be more descriptive!

Next, add the country name next to each point by adding geom_text(mapping = aes(label = label_var)), where label_var stands for the column that holds the country names.
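The two steps above might look like the sketch below. The object names gg_roads_donors and gg_labeled are hypothetical, the axis wording is my reading of help(organdata) and is worth double-checking, and hjust and nudge_x are just one way to keep the text off the points. by_country is rebuilt here so the sketch runs on its own.

```r
library(socviz)
library(dplyr)
library(ggplot2)

by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(roads_mean  = mean(roads,  na.rm = TRUE),
            donors_mean = mean(donors, na.rm = TRUE),
            .groups = "drop")

# Save the base scatterplot with more descriptive axis labels
gg_roads_donors <- ggplot(by_country,
                          aes(x = roads_mean, y = donors_mean)) +
  geom_point() +
  labs(x = "Average road deaths (per 100,000 population)",
       y = "Average organ donors (per million population)")

# label = country maps each country name to its point;
# hjust = 0 left-aligns the text and nudge_x shifts it right of the point
gg_labeled <- gg_roads_donors +
  geom_text(mapping = aes(label = country), hjust = 0, nudge_x = 1.5)

gg_labeled
```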

ggrepel: When Text Gets Crowded

If the points on a graph are dense, the text can overlap and become difficult to read.

The geom_text_repel() function in ggrepel is a solution to the overcrowding.

First, install ggrepel if you haven’t already, and load it. Next, copy the code for the graph from the previous code chunk, then replace geom_text() with geom_text_repel() without setting hjust or vjust.
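A minimal sketch of the swap, assuming ggrepel is installed and that the previous graph plotted roads_mean against donors_mean from by_country (rebuilt here so the chunk is self-contained). The name gg_repel is hypothetical.

```r
# install.packages("ggrepel")  # run once if ggrepel is not installed
library(socviz)
library(dplyr)
library(ggplot2)
library(ggrepel)

by_country <- organdata %>%
  group_by(country, consent_law) %>%
  summarize(roads_mean  = mean(roads,  na.rm = TRUE),
            donors_mean = mean(donors, na.rm = TRUE),
            .groups = "drop")

# Same plot as before, but geom_text_repel() positions the labels itself,
# so no hjust, vjust, or nudging is supplied
gg_repel <- ggplot(by_country, aes(x = roads_mean, y = donors_mean)) +
  geom_point() +
  geom_text_repel(mapping = aes(label = country)) +
  labs(x = "Average road deaths", y = "Average organ donors")

gg_repel
```

The label placement involves some randomness, so the exact positions can change between runs unless a seed is set.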

Labeling Particular Values, e.g., Outliers

From organdata, create a data set named organ_newest that contains only the rows from the most recent year in the data (2002), which leaves one row per country:
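One possible way to get this result is sketched below. The printed output shows year as a number (<dbl>) while organdata stores it as a Date, so the conversion step is my assumption about what the original code did; the filter itself works either way.

```r
library(socviz)
library(dplyr)

organ_newest <- organdata %>%
  # Assumption: turn the Date column into a plain year number (e.g. 2002)
  mutate(year = as.numeric(format(year, "%Y"))) %>%
  # Keep only rows from the most recent year; rows with a missing year are dropped
  filter(year == max(year, na.rm = TRUE))

organ_newest
```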

## # A tibble: 17 × 21
##    country         year donors    pop pop_dens   gdp gdp_lag health health_lag
##    <chr>          <dbl>  <dbl>  <int>    <dbl> <int>   <int>  <dbl>      <dbl>
##  1 Australia       2002   10.5  19663    0.254 28168   27461   2629       2504
##  2 Austria         2002   23.8   8053    9.60  28842   28457   2220       2174
##  3 Belgium         2002   21.7  10333   31.2   27652   27113   2515       2441
##  4 Canada          2002   13    31414    0.315 30429   29235   2931       2743
##  5 Denmark         2002   12.7   5376   12.5   29228   29203   2580       2520
##  6 Finland         2002   17.1   5201    1.54  26616   26376   1943       1841
##  7 France          2002   20    59486   10.8   28094   27394   2736       2588
##  8 Germany         2002   12.2  82489   23.1   25843   25436   2830       2735
##  9 Italy           2002   18.1  57994   19.2   25569   25359   2166       2107
## 10 Ireland         2002   21     3932    5.60  32571   29703   2367       2059
## 11 Netherlands     2002   12.6  16149   38.9   28983   28756   2643       2455
## 12 Norway          2002   13.7   4538    1.40  35531   36554   3083       3258
## 13 Spain           2002   33.7  41874    8.28  21592   20864   1646       1567
## 14 Sweden          2002   11     8925    1.98  27255   26902   2517       2370
## 15 Switzerland     2002   10.4   7290   17.7   30725   30134   3445       3288
## 16 United Kingdom  2002   13    59232   24.4   27959   26720   2160       2012
## 17 United States   2002   21.5 288369    2.99  36006   35118   5267       4869
## # ℹ 12 more variables: pubhealth <dbl>, roads <dbl>, cerebvas <int>,
## #   assault <int>, external <int>, txp_pop <dbl>, world <chr>, opt <chr>,
## #   consent_law <chr>, consent_practice <chr>, consistent <chr>, ccode <chr>

Using the organ_newest data set:

  1. create and SAVE a scatterplot of gdp and health named gg_gdp_health
  2. add the correct scale_ functions to change the labels on the x and y-axis to be in USD
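A sketch of these two steps, assuming gdp goes on the x-axis and health on the y-axis (the name gg_gdp_health suggests this order, but the text does not confirm it). `scales::label_dollar()` is one way to format axis labels as USD, and the axis titles are placeholders.

```r
library(socviz)
library(dplyr)
library(ggplot2)

# Most recent year only; comparing the Date column directly works here
organ_newest <- filter(organdata, year == max(year, na.rm = TRUE))

gg_gdp_health <- ggplot(organ_newest, aes(x = gdp, y = health)) +
  geom_point() +
  # label_dollar() prints tick labels such as $30,000
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(x = "GDP", y = "Health spending")

gg_gdp_health
```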

Add geom_text_repel() to the saved scatterplot, but

  1. supply the data = argument with the same data set, but restricted with filter() to countries with a gdp above $32,000 or below $24,000

  2. map the name of the outlier countries to their corresponding points on the scatterplot

The result should be the same as the scatterplot above, except that the 4 points for Spain, Ireland, Norway, and the United States have the country name next to them.
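The labeling step could be sketched as follows. The helper name outliers and the x/y assignment are my assumptions; organ_newest and gg_gdp_health are rebuilt so the chunk runs by itself.

```r
library(socviz)
library(dplyr)
library(ggplot2)
library(ggrepel)

organ_newest <- filter(organdata, year == max(year, na.rm = TRUE))

gg_gdp_health <- ggplot(organ_newest, aes(x = gdp, y = health)) +
  geom_point() +
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_dollar())

# Only the countries with unusually high or low gdp get a label
outliers <- filter(organ_newest, gdp > 32000 | gdp < 24000)

# The layer-specific data = argument overrides the plot's data for this geom only;
# x and y are inherited from the saved plot
gg_gdp_health +
  geom_text_repel(data = outliers, mapping = aes(label = country))
```

Because every point is still drawn by geom_point() from the full data set, only the text layer is restricted to the filtered rows.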

Using annotate

annotate() adds ink to the plot that doesn’t depend on the data.

You need to specify which geom to use with geom = "geom_name" (the geom's name without the geom_ prefix), followed by the remaining arguments for that geom. None of them go inside aes(), since the annotation doesn’t depend on the data.

For example, if you want to add a red point to a plot at x = 2 and y = 5:

+ annotate(geom = "point", x = 2, y = 5, color = "red")

Add an annotation to the plot below at (2, 43) that states “Two high highway mpg”. Include hjust = 0 inside annotate(). If you want to add words to a plot, which geom should you use?
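A sketch of the annotation, assuming the plot in question is a scatterplot of displ against hwy from ggplot2's mpg data (the original plot is not shown, so this is a guess). Words are drawn with the text geom, so geom = "text" answers the question above.

```r
library(ggplot2)

# Assumed base plot: engine displacement vs highway mpg
gg_mpg <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# hjust = 0 left-aligns the text, so it starts at x = 2 and extends to the right
gg_mpg_note <- gg_mpg +
  annotate(geom = "text", x = 2, y = 43,
           label = "Two high highway mpg", hjust = 0)

gg_mpg_note
```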

Fancy Example with annotate

Election Data

We’ll start by looking at data on US presidential elections.
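Judging by the column names in the output, the data appear to be the elections_historic data set from socviz; that identification is my assumption. Under that assumption, the two printouts could be produced like this (the name elections_small is hypothetical):

```r
library(socviz)  # assumed source of elections_historic
library(dplyr)

# Full data set
elections_historic

# Just the year, the winner, and the winner's party
elections_small <- elections_historic %>%
  select(year, winner, win_party)

elections_small
```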

## # A tibble: 49 × 19
##    election  year winner      win_party ec_pct popular_pct popular_margin  votes
##       <int> <int> <chr>       <chr>      <dbl>       <dbl>          <dbl>  <int>
##  1       10  1824 John Quinc… D.-R.      0.322       0.309        -0.104  1.13e5
##  2       11  1828 Andrew Jac… Dem.       0.682       0.559         0.122  6.43e5
##  3       12  1832 Andrew Jac… Dem.       0.766       0.547         0.178  7.03e5
##  4       13  1836 Martin Van… Dem.       0.578       0.508         0.142  7.63e5
##  5       14  1840 William He… Whig       0.796       0.529         0.0605 1.28e6
##  6       15  1844 James Polk  Dem.       0.618       0.495         0.0145 1.34e6
##  7       16  1848 Zachary Ta… Whig       0.562       0.473         0.0479 1.36e6
##  8       17  1852 Franklin P… Dem.       0.858       0.508         0.0695 1.61e6
##  9       18  1856 James Buch… Dem.       0.588       0.453         0.122  1.84e6
## 10       19  1860 Abraham Li… Rep.       0.594       0.396         0.101  1.86e6
## # ℹ 39 more rows
## # ℹ 11 more variables: margin <int>, runner_up <chr>, ru_part <chr>,
## #   turnout_pct <dbl>, winner_lname <chr>, winner_label <chr>, ru_lname <chr>,
## #   ru_label <chr>, two_term <lgl>, ec_votes <dbl>, ec_denom <dbl>
## # A tibble: 49 × 3
##     year winner                 win_party
##    <int> <chr>                  <chr>    
##  1  1824 John Quincy Adams      D.-R.    
##  2  1828 Andrew Jackson         Dem.     
##  3  1832 Andrew Jackson         Dem.     
##  4  1836 Martin Van Buren       Dem.     
##  5  1840 William Henry Harrison Whig     
##  6  1844 James Polk             Dem.     
##  7  1848 Zachary Taylor         Whig     
##  8  1852 Franklin Pierce        Dem.     
##  9  1856 James Buchanan         Dem.     
## 10  1860 Abraham Lincoln        Rep.     
## # ℹ 39 more rows

Fancy graph, with labels

Follow the comments in the code chunk below to create a graph of US presidential elections by popular vote and electoral vote percentage.
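A basic version of such a graph might look like the sketch below. It assumes the data are socviz's elections_historic and uses its popular_pct, ec_pct, and winner_label columns; the name gg_elections and the axis titles are my own.

```r
library(socviz)
library(ggplot2)
library(ggrepel)

gg_elections <- ggplot(elections_historic,
                       aes(x = popular_pct, y = ec_pct,
                           label = winner_label)) +
  # One point per election
  geom_point() +
  # Label each point with the winner, repelled to limit overlap
  geom_text_repel() +
  # Both variables are proportions, so show the axes as percentages
  scale_x_continuous(labels = scales::label_percent()) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Winner's share of the popular vote",
       y = "Winner's share of the Electoral College vote")

gg_elections
```

With 49 labels on one plot, ggrepel may warn that some labels could not be placed; enlarging the figure usually helps.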

Now let’s make the graph a little fancier
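One way to dress the graph up is sketched below, again assuming socviz's elections_historic. The reference lines at 50%, the annotation text and its position, and the title are illustrative choices of mine, not the original design.

```r
library(socviz)
library(ggplot2)
library(ggrepel)

gg_fancy <- ggplot(elections_historic,
                   aes(x = popular_pct, y = ec_pct,
                       label = winner_label)) +
  # Reference lines at 50%, added first so they sit behind the points
  geom_hline(yintercept = 0.5, color = "gray80") +
  geom_vline(xintercept = 0.5, color = "gray80") +
  geom_point() +
  geom_text_repel() +
  # annotate() takes fixed values, not aes(), because it ignores the data
  annotate(geom = "text", x = 0.505, y = 0.33,
           label = "50% of the popular vote",
           hjust = 0, color = "gray40") +
  scale_x_continuous(labels = scales::label_percent()) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Winner's share of the popular vote",
       y = "Winner's share of the Electoral College vote",
       title = "US presidential elections: popular vs electoral vote")

gg_fancy
```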