A code chunk:
c(1, 2, 3, 1, 3, 5, 25)
## [1] 1 2 3 1 3 5 25
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)
my_numbers
## [1] 1 2 3 1 3 5 25
mean()
mean(x = my_numbers)
## [1] 5.714286
mean(x = your_numbers)
## [1] 19.71429
mean(my_numbers)
## [1] 5.714286
my_summary <- summary(my_numbers)
my_summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.500 3.000 5.714 4.000 25.000
Datasets can also be bundled into packages. Here we use the library()
command to load the drat library, which allows us to install two data
packages: covdata and nycdogs.
library(drat)
addRepo("kjhealy")
install.packages(c("covdata", "nycdogs"))
## Installing packages into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
## Warning in install.packages(c("covdata", "nycdogs")): installation of package
## 'covdata' had non-zero exit status
You only have to install a package once, or whenever it is
updated. But if you want to make use of it, you must load it
(with library()) at the start of your R session. You can do
this in the first chunk of your RMarkdown file, for example.
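A minimal sketch of that install-once, load-every-session pattern follows. The helper name load_or_install is hypothetical and does not come from any package. It uses base R's requireNamespace() to check whether a package is already installed. It is demonstrated on stats, which ships with R, so nothing is downloaded. For covdata or nycdogs, the drat repository would still have to be added first, as above.

```r
# Hypothetical helper (not part of any package): install a package
# only if it is missing, then attach it for the current session.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # runs only when pkg is not yet installed
  }
  library(pkg, character.only = TRUE)  # needed in every new session
}

# "stats" ships with R, so this call only attaches it.
load_or_install("stats")
```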
table(my_numbers)
## my_numbers
## 1 2 3 5 25
## 2 1 2 1 1
sd(my_numbers)
## [1] 8.616153
my_numbers * 5
## [1] 5 10 15 5 15 25 125
my_numbers + 1
## [1] 2 3 4 2 4 6 26
my_numbers + my_numbers
## [1] 2 4 6 2 6 10 50
class(my_numbers)
## [1] "numeric"
class(my_summary)
## [1] "summaryDefault" "table"
class(summary)
## [1] "function"
my_new_vector <- c(my_numbers, "Apple")
my_new_vector
## [1] "1" "2" "3" "1" "3" "5" "25" "Apple"
class(my_new_vector)
## [1] "character"
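The result above shows that a vector holds only one type of value: adding "Apple" turned every number into text. As a small sketch of going the other way, base R's as.numeric() converts the strings back, and anything that cannot be read as a number becomes NA (with a warning, silenced here). The name back_to_numbers is made up for illustration.

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
my_new_vector <- c(my_numbers, "Apple")

# Convert back to numbers; "Apple" cannot be converted and becomes NA.
back_to_numbers <- suppressWarnings(as.numeric(my_new_vector))
class(back_to_numbers)

# Ignore the NA when summarising.
mean(back_to_numbers, na.rm = TRUE)
```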
titanic
## fate sex n percent
## 1 perished male 1364 62.0
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
class(titanic)
## [1] "data.frame"
titanic$percent
## [1] 62.0 5.7 16.7 15.6
titanic_tb <- as_tibble(titanic)
titanic_tb
## # A tibble: 4 × 4
## fate sex n percent
## <fct> <fct> <dbl> <dbl>
## 1 perished male 1364 62
## 2 perished female 126 5.7
## 3 survived male 367 16.7
## 4 survived female 344 15.6
str(my_numbers)
## num [1:7] 1 2 3 1 3 5 25
str(my_summary)
## 'summaryDefault' Named num [1:6] 1 1.5 3 5.71 4 ...
## - attr(*, "names")= chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
Here are three very specific things to watch out for:
1. Make sure parentheses are balanced, so that every opening "("
has a corresponding closing ")".
2. Make sure you complete your expressions. If, instead of the >
command prompt at the console, you see the + character,
that may mean R thinks you haven't written a complete
expression yet. You can hit Esc or Ctrl-C to
force your way back to the console and try typing your code again.
3. When building up a ggplot a piece at a time, make sure the
+ character goes at the end of the line, and
not the beginning. That is, write this:
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point()
and not this:
ggplot(data = mpg, aes(x = displ, y = hwy))
+ geom_point()
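The same rule can be seen without ggplot. In this sketch, which uses plain arithmetic and made-up variable names, a trailing + tells R that the expression continues on the next line, while a leading + starts a new, separate expression. The last lines use base R's parse() to show that an unbalanced parenthesis leaves an expression incomplete.

```r
# Trailing +: R keeps reading, so both lines form one expression.
total_end <- 1 +
  2

# Leading +: the first line is already complete, so total_start is 1.
# The second line is evaluated on its own as +2.
total_start <- 1
+ 2

# An unbalanced "(" is an incomplete expression; parse() signals an error.
unbalanced <- tryCatch(
  parse(text = "mean(c(1, 2, 3)"),
  error = function(e) "incomplete expression"
)
```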
Remotely:
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
organs <- read_csv(file = url)
Or locally:
organs <- read_csv(file = "data/organdonation.csv")
## Rows: 238 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): country, world, opt, consent.law, consent.practice, consistent, ccode
## dbl (14): year, donors, pop, pop.dens, gdp, gdp.lag, health, health.lag, pub...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
organs
## # A tibble: 238 × 21
## country year donors pop pop.dens gdp gdp.lag health health.lag pubhealth
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austra… NA NA 17065 0.220 16774 16591 1300 1224 4.8
## 2 Austra… 1991 12.1 17284 0.223 17171 16774 1379 1300 5.4
## 3 Austra… 1992 12.4 17495 0.226 17914 17171 1455 1379 5.4
## 4 Austra… 1993 12.5 17667 0.228 18883 17914 1540 1455 5.4
## 5 Austra… 1994 10.2 17855 0.231 19849 18883 1626 1540 5.4
## 6 Austra… 1995 10.2 18072 0.233 21079 19849 1737 1626 5.5
## 7 Austra… 1996 10.6 18311 0.237 21923 21079 1846 1737 5.6
## 8 Austra… 1997 10.3 18518 0.239 22961 21923 1948 1846 5.7
## 9 Austra… 1998 10.5 18711 0.242 24148 22961 2077 1948 5.9
## 10 Austra… 1999 8.67 18926 0.244 25445 24148 2231 2077 6.1
## # ℹ 228 more rows
## # ℹ 11 more variables: roads <dbl>, cerebvas <dbl>, assault <dbl>,
## # external <dbl>, txp.pop <dbl>, world <chr>, opt <chr>, consent.law <chr>,
## # consent.practice <chr>, consistent <chr>, ccode <chr>
gapminder
## # A tibble: 1,704 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ℹ 1,694 more rows
p <- ggplot(data = gapminder,
mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() +
geom_smooth(mapping = aes(color = continent, fill = continent)) +
scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Life expectancy plotted against GDP per capita for a large number of country-years.