Load Libraries

A code chunk:

Things to know about R

Everything has a name

Everything is an object

c(1, 2, 3, 1, 3, 5, 25)
## [1]  1  2  3  1  3  5 25
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

your_numbers <- c(5, 31, 71, 1, 3, 21, 6)
my_numbers
## [1]  1  2  3  1  3  5 25

You do things using functions

my_numbers
## [1]  1  2  3  1  3  5 25
mean()  # fails when called with no arguments: argument "x" is missing, with no default
mean(x = my_numbers)
## [1] 5.714286
mean(x = your_numbers)
## [1] 19.71429
mean(my_numbers)
## [1] 5.714286
my_summary <- summary(my_numbers)
my_summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.500   3.000   5.714   4.000  25.000
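
Most functions accept more than one argument. As a small sketch (the with_missing vector is made up for illustration and is not part of the original walkthrough), here is how the na.rm argument of mean() changes the result when a value is missing:

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

# Hypothetical vector: the same numbers plus one missing value
with_missing <- c(my_numbers, NA)

# By default, a single missing value makes the mean missing too
default_mean <- mean(x = with_missing)

# na.rm = TRUE drops missing values before averaging
clean_mean <- mean(x = with_missing, na.rm = TRUE)

default_mean
clean_mean
```

Naming the arguments (x =, na.rm =) is optional, but it makes clear which value goes where.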

Functions are bundled into thematic packages, which you load with the library() command

Datasets can also be bundled into packages. Here we load the drat package, which lets us add a repository outside CRAN and install two data packages from it: covdata and nycdogs.

library(drat)
addRepo("kjhealy")

install.packages(c("covdata", "nycdogs"))
## Installing packages into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
## Warning in install.packages(c("covdata", "nycdogs")): installation of package
## 'covdata' had non-zero exit status

You only have to install a package once, and again whenever it is updated. But to use it, you must load it (with library()) at the start of each R session. You can do this in the first chunk of your R Markdown file, for example.
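
One way to check whether a package is already installed before loading it is sketched below. It uses stats and utils, which ship with R, purely as stand-ins; swap in the packages you actually need.

```r
# Stand-in package names; both ship with every R installation
pkgs <- c("stats", "utils")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
is_installed

# Packages that would still need install.packages()
missing_pkgs <- pkgs[!is_installed]
missing_pkgs

# Load an installed package for the current session
library(stats)
```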

table(my_numbers)
## my_numbers
##  1  2  3  5 25 
##  2  1  2  1  1
sd(my_numbers)
## [1] 8.616153
my_numbers * 5
## [1]   5  10  15   5  15  25 125
my_numbers + 1
## [1]  2  3  4  2  4  6 26
my_numbers + my_numbers
## [1]  2  4  6  2  6 10 50
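
Arithmetic also works element by element across two different vectors. A short sketch using the two vectors defined earlier:

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)

# Each element of my_numbers is added to the element in the same
# position of your_numbers
our_numbers <- my_numbers + your_numbers
our_numbers

# Comparisons are vectorized too, giving one TRUE or FALSE per element
my_numbers > your_numbers
```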

If you’re not sure what an object is, ask for its class

class(my_numbers)
## [1] "numeric"
class(my_summary)
## [1] "summaryDefault" "table"
class(summary)
## [1] "function"
my_new_vector <- c(my_numbers, "Apple")
my_new_vector
## [1] "1"     "2"     "3"     "1"     "3"     "5"     "25"    "Apple"
class(my_new_vector)
## [1] "character"
titanic
##       fate    sex    n percent
## 1 perished   male 1364    62.0
## 2 perished female  126     5.7
## 3 survived   male  367    16.7
## 4 survived female  344    15.6
class(titanic)
## [1] "data.frame"
titanic$percent
## [1] 62.0  5.7 16.7 15.6
titanic_tb <- as_tibble(titanic)
titanic_tb
## # A tibble: 4 × 4
##   fate     sex        n percent
##   <fct>    <fct>  <dbl>   <dbl>
## 1 perished male    1364    62  
## 2 perished female   126     5.7
## 3 survived male     367    16.7
## 4 survived female   344    15.6
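
The my_new_vector example shows that a vector holds only one type of value, so adding "Apple" turned every number into text. As a sketch of going the other way, as.numeric() converts the text back, and anything that is not a number becomes NA (with a warning, suppressed here):

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
my_new_vector <- c(my_numbers, "Apple")

# Convert back to numbers; "Apple" cannot be converted and becomes NA
back_to_numbers <- suppressWarnings(as.numeric(my_new_vector))
back_to_numbers
class(back_to_numbers)
```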

To see inside an object, ask for its structure, or use RStudio’s object inspector

str(my_numbers)
##  num [1:7] 1 2 3 1 3 5 25
str(my_summary)
##  'summaryDefault' Named num [1:6] 1 1.5 3 5.71 4 ...
##  - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
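
str() is also useful for data frames, where it lists each column with its type. A minimal sketch, rebuilding the titanic table shown earlier by hand (this copy is typed in only for illustration, and its text columns are plain character vectors rather than factors):

```r
titanic_copy <- data.frame(
  fate    = c("perished", "perished", "survived", "survived"),
  sex     = c("male", "female", "male", "female"),
  n       = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

# One line per column: name, type, and the first few values
str(titanic_copy)

# names() and dim() answer narrower questions about the same object
names(titanic_copy)
dim(titanic_copy)
```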

Be patient with R, and with yourself

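Here is one very specific thing to watch out for. When you build up a plot in pieces, the + that joins them goes at the end of a line, not at the start of the next one. Write this: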

ggplot(data = mpg, aes(x = displ, y = hwy)) +
    geom_point()

and not this:

ggplot(data = mpg, aes(x = displ, y = hwy))
+ geom_point()
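
The position of the + matters because R runs a line as soon as it forms a complete expression. A sketch of the same behavior with plain arithmetic, which needs no plotting packages:

```r
# The trailing + tells R the expression continues on the next line
complete <- 1 +
  2
complete

# Here the first line is already complete, so 1 is assigned on its own.
# The second line is evaluated separately as "positive 2" and is never added.
incomplete <- 1
+ 2
incomplete
```

With ggplot the consequence is more visible: the first line draws an empty plot, and the stray + geom_point() line produces an error instead of adding the layer.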

Get data into R

Remotely:

url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"

organs <- read_csv(file = url)

Or locally:

organs <- read_csv(file = "data/organdonation.csv")
## Rows: 238 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): country, world, opt, consent.law, consent.practice, consistent, ccode
## dbl (14): year, donors, pop, pop.dens, gdp, gdp.lag, health, health.lag, pub...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
organs
## # A tibble: 238 × 21
##    country  year donors   pop pop.dens   gdp gdp.lag health health.lag pubhealth
##    <chr>   <dbl>  <dbl> <dbl>    <dbl> <dbl>   <dbl>  <dbl>      <dbl>     <dbl>
##  1 Austra…    NA  NA    17065    0.220 16774   16591   1300       1224       4.8
##  2 Austra…  1991  12.1  17284    0.223 17171   16774   1379       1300       5.4
##  3 Austra…  1992  12.4  17495    0.226 17914   17171   1455       1379       5.4
##  4 Austra…  1993  12.5  17667    0.228 18883   17914   1540       1455       5.4
##  5 Austra…  1994  10.2  17855    0.231 19849   18883   1626       1540       5.4
##  6 Austra…  1995  10.2  18072    0.233 21079   19849   1737       1626       5.5
##  7 Austra…  1996  10.6  18311    0.237 21923   21079   1846       1737       5.6
##  8 Austra…  1997  10.3  18518    0.239 22961   21923   1948       1846       5.7
##  9 Austra…  1998  10.5  18711    0.242 24148   22961   2077       1948       5.9
## 10 Austra…  1999   8.67 18926    0.244 25445   24148   2231       2077       6.1
## # ℹ 228 more rows
## # ℹ 11 more variables: roads <dbl>, cerebvas <dbl>, assault <dbl>,
## #   external <dbl>, txp.pop <dbl>, world <chr>, opt <chr>, consent.law <chr>,
## #   consent.practice <chr>, consistent <chr>, ccode <chr>
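
The first row of organs has NA for year and donors, so it is worth checking for missing values right after import. A self-contained sketch follows. It uses a tiny made-up CSV written to a temporary file, and base R's read.csv() in place of read_csv() so that it runs without extra packages:

```r
# Write a small, made-up CSV to a temporary file (values are illustrative only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,donors",
             "A,NA,NA",
             "A,1991,12.1",
             "B,1992,NA"),
           csv_path)

# read.csv() treats the string NA as a missing value by default
small <- read.csv(csv_path)
str(small)

# Count missing values in each column
missing_per_column <- colSums(is.na(small))
missing_per_column
```

The same colSums(is.na(...)) check works on a tibble returned by read_csv().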

Make your first figure

gapminder
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ℹ 1,694 more rows
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + 
  geom_smooth(mapping = aes(color = continent, fill = continent)) + 
  scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Life expectancy plotted against GDP per capita for a large number of country-years.
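
In the figure, color and fill are mapped inside geom_smooth(), so they apply to that layer only and the points stay black. A sketch of the same pattern on the mpg data that ships with ggplot2 (the choice of drv as the grouping variable is just for illustration):

```r
library(ggplot2)

# Shared x and y mappings go in ggplot()
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy))

# color and fill are mapped for the smoother only, so points are not colored
p_layered <- p + geom_point() +
  geom_smooth(mapping = aes(color = drv, fill = drv))

# Moving the mapping into ggplot() would color the points as well
p_global <- ggplot(data = mpg,
                   mapping = aes(x = displ, y = hwy,
                                 color = drv, fill = drv)) +
  geom_point() +
  geom_smooth()

p_layered
```

Mappings given in ggplot() are inherited by every layer, while mappings given inside a geom apply to that geom alone.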
