Get Started

Load Libraries

A code chunk:
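The contents of this chunk are not preserved here. Going by the functions and datasets used later in these notes (`as_tibble()`, `read_csv()`, `ggplot()`, `titanic`, `gapminder`), a setup chunk along the following lines would be needed. This package list is an assumption, not the original chunk.

```r
# Assumed setup chunk: these packages supply functions and data used below
library(tidyverse)  # ggplot(), read_csv(), as_tibble(), and related tools
library(socviz)     # the titanic data frame
library(gapminder)  # the gapminder dataset
```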

Things to know about R

Everything has a name
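A short sketch of two naming rules that often trip up new users: names are case-sensitive, and reserved words cannot be reused as names. The object names `my_value` and `My_value` are hypothetical; `make.names()` is part of base R.

```r
# Names are case-sensitive: these are two different objects
my_value <- 10
My_value <- 20
my_value == My_value
## [1] FALSE

# Reserved words such as TRUE, if, and function cannot be used as names;
# uncommenting the next line would stop with an error
# TRUE <- 5

# make.names() turns an arbitrary string into a syntactically valid name
make.names("my numbers")
## [1] "my.numbers"
```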

Everything is an object

c(1, 2, 3, 1, 3, 5, 25)

## [1]  1  2  3  1  3  5 25

my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

your_numbers <- c(5, 31, 71, 1, 3, 21, 6)

my_numbers

## [1]  1  2  3  1  3  5 25
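Once an object has a name, you can assign it to another name. A sketch (the name `their_numbers` is hypothetical) showing that the new name behaves as a separate copy: changing one does not change the other.

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

# Assign the same vector to a second name, then modify the second one
their_numbers <- my_numbers
their_numbers <- their_numbers * 2

their_numbers
## [1]  2  4  6  2  6 10 50

# The original object is unchanged
my_numbers
## [1]  1  2  3  1  3  5 25

# ls() lists the objects that now exist in your workspace
ls()
```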

You do things using functions

my_numbers

## [1]  1  2  3  1  3  5 25

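Calling mean() on its own produces an error, because the function needs an argument: x, the object whose mean you want. Supply it inside the parentheses: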

mean(x = my_numbers)

## [1] 5.714286

mean(x = your_numbers)

## [1] 19.71429

You can also supply the argument by position, without writing its name:

mean(my_numbers)

## [1] 5.714286
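Functions can take more than one argument. As an illustration beyond the original notes, `mean()` also accepts `na.rm`, which controls what happens when the data contain a missing value (`NA`):

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

# A missing value (NA) makes the default result NA
mean(c(my_numbers, NA))
## [1] NA

# Supplying the na.rm argument by name drops the NA before averaging
mean(c(my_numbers, NA), na.rm = TRUE)
## [1] 5.714286
```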

my_summary <- summary(my_numbers)

my_summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.500   3.000   5.714   4.000  25.000

Functions are bundled into thematic packages, which you load with the `library()` command

Datasets can also be bundled into packages. Here we load the drat package, which lets us add the kjhealy drat repository and install two data packages from it: covdata and nycdogs.

library(drat)
addRepo("kjhealy")

install.packages(c("covdata", "nycdogs"))

## Installing packages into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

## Warning in install.packages(c("covdata", "nycdogs")): installation of package
## 'covdata' had non-zero exit status

You only have to install a package once, and again whenever you want to update it. But to use it, you must load it with library() in every R session. A good place to do this is the first chunk of your R Markdown file.
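A minimal sketch of that pattern for a setup chunk: install only when the package is missing, then load it every time. It uses gapminder (a CRAN package used later in these notes) purely as an example; the install-if-missing check is one common convention, not something the notes prescribe.

```r
# Install only if the package is not already available on this machine
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}

# Load it in every session (e.g. in the first chunk of an R Markdown file)
library(gapminder)
```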

table(my_numbers)

## my_numbers
##  1  2  3  5 25 
##  2  1  2  1  1

sd(my_numbers)

## [1] 8.616153

my_numbers * 5

## [1]   5  10  15   5  15  25 125

my_numbers + 1

## [1]  2  3  4  2  4  6 26

my_numbers + my_numbers

## [1]  2  4  6  2  6 10 50
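Arithmetic like this is applied element by element. The same holds between two different vectors of the same length, and for comparisons, which return one TRUE or FALSE per element:

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)

# First element plus first element, second plus second, and so on
my_numbers + your_numbers
## [1]  6 33 74  2  6 26 31

# Comparisons are vectorized too
my_numbers > 2
## [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
```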

If you’re not sure what an object is, ask for its class

class(my_numbers)

## [1] "numeric"

class(my_summary)

## [1] "summaryDefault" "table"

class(summary)

## [1] "function"

my_new_vector <- c(my_numbers, "Apple")
my_new_vector

## [1] "1"     "2"     "3"     "1"     "3"     "5"     "25"    "Apple"

class(my_new_vector)

## [1] "character"
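A vector can hold only one type of value, so adding the string "Apple" coerced every element to character. A sketch of converting back (the name `back_to_numbers` is hypothetical): the digits survive, but "Apple" has no numeric value and becomes NA.

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
my_new_vector <- c(my_numbers, "Apple")

# as.numeric() normally warns "NAs introduced by coercion";
# suppressWarnings() silences that warning here
back_to_numbers <- suppressWarnings(as.numeric(my_new_vector))
back_to_numbers
## [1]  1  2  3  1  3  5 25 NA

class(back_to_numbers)
## [1] "numeric"
```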

titanic

##       fate    sex    n percent
## 1 perished   male 1364    62.0
## 2 perished female  126     5.7
## 3 survived   male  367    16.7
## 4 survived female  344    15.6

class(titanic)

## [1] "data.frame"

titanic$percent

## [1] 62.0  5.7 16.7 15.6
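Each column of a data frame is itself a vector, so the vector functions shown earlier work on it directly. To keep the sketch runnable without the socviz package, it rebuilds the printed titanic values as a plain data frame under the hypothetical name `titanic_df`:

```r
# Rebuilt from the printed titanic output (hypothetical name titanic_df)
titanic_df <- data.frame(
  fate    = c("perished", "perished", "survived", "survived"),
  sex     = c("male", "female", "male", "female"),
  n       = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

# $ extracts a column as a vector, so sum() works on it
sum(titanic_df$n)
## [1] 2201

# [[ ]] with the column name is equivalent to $
identical(titanic_df[["percent"]], titanic_df$percent)
## [1] TRUE
```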

titanic_tb <- as_tibble(titanic)
titanic_tb

## # A tibble: 4 × 4
##   fate     sex        n percent
##   <fct>    <fct>  <dbl>   <dbl>
## 1 perished male    1364    62  
## 2 perished female   126     5.7
## 3 survived male     367    16.7
## 4 survived female   344    15.6

To see inside an object, ask for its structure, or use RStudio’s object inspector

str(my_numbers)

##  num [1:7] 1 2 3 1 3 5 25

str(my_summary)

##  'summaryDefault' Named num [1:6] 1 1.5 3 5.71 4 ...
##  - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
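str() shows that my_summary is a named numeric vector, which means its elements can be pulled out by name:

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
my_summary <- summary(my_numbers)

# [[ ]] with a name returns a single element
my_summary[["Median"]]
## [1] 3

my_summary[["Max."]]
## [1] 25

# names() lists every available element name
names(my_summary)
```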

Be patient with R, and with yourself

Here are three very specific things to watch out for:

1. Make sure parentheses are balanced: every opening “(” must have a corresponding closing “)”.
2. Make sure you complete your expressions. If you think you have finished typing your code, but at the console you see the + character instead of the > command prompt, R probably thinks your expression is not complete yet. Press Esc or Ctrl-C to get back to the prompt, and try typing your code again.
3. In ggplot specifically, as you will see, we build up plots a piece at a time by adding expressions to one another. When doing this, make sure the + character goes at the end of the line, not at the beginning. That is, write this:

ggplot(data = mpg, aes(x = displ, y = hwy)) +
    geom_point()

and not this:

ggplot(data = mpg, aes(x = displ, y = hwy))
+ geom_point()
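The same rule applies to any R expression, not only ggplot. In the second version above, the first line is already a complete expression, so the stray + geom_point() line is not added to the plot. A sketch with plain arithmetic (object names hypothetical):

```r
# A trailing + tells R the expression continues on the next line
total <- 1 +
  2
total
## [1] 3

# Here the first line is already complete, so R stops there;
# "+ 2" is then evaluated on its own as a separate expression
total_2 <- 1
+ 2
total_2
## [1] 1
```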

Get data into R

Remotely:

url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"

organs <- read_csv(file = url)

Or locally:

organs <- read_csv(file = "data/organdonation.csv")

## Rows: 238 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): country, world, opt, consent.law, consent.practice, consistent, ccode
## dbl (14): year, donors, pop, pop.dens, gdp, gdp.lag, health, health.lag, pub...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
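As the message suggests, you can declare column types yourself, which documents what you expect and quiets the message. A sketch using readr's `cols()` helpers on a tiny stand-in CSV written to a temporary file (a few values copied from the organs output; it is not the real dataset, and `small` is a hypothetical name):

```r
library(readr)

# Write a small stand-in CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("country,year,donors",
             "Australia,1991,12.1",
             "Australia,1992,12.4"), path)

# Declare each column's type explicitly instead of letting readr guess
small <- read_csv(path,
                  col_types = cols(country = col_character(),
                                   year    = col_integer(),
                                   donors  = col_double()))
small
```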

organs

## # A tibble: 238 × 21
##    country  year donors   pop pop.dens   gdp gdp.lag health health.lag pubhealth
##    <chr>   <dbl>  <dbl> <dbl>    <dbl> <dbl>   <dbl>  <dbl>      <dbl>     <dbl>
##  1 Austra…    NA  NA    17065    0.220 16774   16591   1300       1224       4.8
##  2 Austra…  1991  12.1  17284    0.223 17171   16774   1379       1300       5.4
##  3 Austra…  1992  12.4  17495    0.226 17914   17171   1455       1379       5.4
##  4 Austra…  1993  12.5  17667    0.228 18883   17914   1540       1455       5.4
##  5 Austra…  1994  10.2  17855    0.231 19849   18883   1626       1540       5.4
##  6 Austra…  1995  10.2  18072    0.233 21079   19849   1737       1626       5.5
##  7 Austra…  1996  10.6  18311    0.237 21923   21079   1846       1737       5.6
##  8 Austra…  1997  10.3  18518    0.239 22961   21923   1948       1846       5.7
##  9 Austra…  1998  10.5  18711    0.242 24148   22961   2077       1948       5.9
## 10 Austra…  1999   8.67 18926    0.244 25445   24148   2231       2077       6.1
## # ℹ 228 more rows
## # ℹ 11 more variables: roads <dbl>, cerebvas <dbl>, assault <dbl>,
## #   external <dbl>, txp.pop <dbl>, world <chr>, opt <chr>, consent.law <chr>,
## #   consent.practice <chr>, consistent <chr>, ccode <chr>

Make your first figure

gapminder

## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ℹ 1,694 more rows

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + 
  geom_smooth(mapping = aes(color = continent, fill = continent)) + 
  scale_x_log10(labels = scales::dollar)

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Life expectancy plotted against GDP per capita for a large number of country-years.
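The figure is built by adding layers with +. A sketch of a similar plot assembled one step at a time, keeping each intermediate object (the names p_points, p_smooth, and p_final are hypothetical). Unlike the original, it fits one smoother for all countries rather than one per continent, and passing method and formula explicitly should suppress the geom_smooth() message.

```r
library(ggplot2)
library(gapminder)

# Store the data and aesthetic mappings once...
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# ...then add layers step by step
p_points <- p + geom_point(alpha = 0.3)
p_smooth <- p_points +
  geom_smooth(method = "loess", formula = y ~ x)  # explicit smoothing choice
p_final  <- p_smooth + scale_x_log10(labels = scales::dollar)

p_final
```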