install.packages("tidyverse", repos = "http://cran.us.r-project.org", dependencies = TRUE)
library(tidyverse)
What we do:
Review data structures in R
Learn a few implicit behaviors: coercion, recycling, and subsetting
Why we do it:
r4ds starts with the most complex data structure, data frames, and works down to the basic data structures.
Many base R functions and packages work on vectors. Without understanding vectors, we are likely to run into confusing behavior from time to time.
Atomic vectors: elements are homogeneous (i.e., all of the same type).
Lists: elements can be heterogeneous. Also called recursive vectors because they can contain other lists as elements.
NULL: often used to represent the absence of a vector; it typically behaves like a vector of length zero.
# check the type of atomic vectors
typeof(letters)
typeof(1:10)
# check the length of a list
x <- list("a", "b", 1:10)
length(x)
Augmented vectors: attributes (i.e., metadata) attached to basic vectors give them additional behavior.
Factors: built from integer vectors
Dates/date-times: built from numeric vectors
Dataframes: built from lists
No2. In the second variant of rescale01(), infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0, and Inf is mapped to 1. (page 273) (Hint: experiment with is.finite(), is.infinite(), is.na(), or is.nan().)
# the original function
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
# second variant: the range is computed from finite values only
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
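One possible answer to the exercise, as a minimal sketch: compute the rescaled values from the finite range as before, then overwrite the infinite positions. The name rescale01_inf() is hypothetical and only used here to avoid overwriting rescale01().

```r
# Hypothetical variant of rescale01(): -Inf is mapped to 0 and Inf to 1.
rescale01_inf <- function(x) {
  # range of the finite, non-missing values only
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  y <- (x - rng[1]) / (rng[2] - rng[1])
  # is.infinite() is FALSE for NA, so missing values are left as NA
  y[is.infinite(x) & x < 0] <- 0
  y[is.infinite(x) & x > 0] <- 1
  y
}

rescale01_inf(c(-Inf, 0, 5, 10, Inf, NA))
# 0.0 0.0 0.5 1.0 1.0 NA
```

Using is.infinite() together with the sign keeps NA out of the logical index, which is why missing values pass through untouched.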
When types are mixed in a vector, all elements are coerced to the most flexible type (logical < integer < double < character). Type is a property of the complete vector, not of its individual elements.
typeof(c(TRUE, 1L)) # logical + integer
typeof(c(1L, 1.5)) # integer + double
typeof(c(1.5, "a")) # double + character
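Implicit coercion is also what makes counting with logical vectors work. A small sketch with made-up values: TRUE becomes 1 and FALSE becomes 0 when a logical vector is used in a numeric context.

```r
x <- c(3, 12, 7, 25, 1)   # illustrative values only

# x > 5 is a logical vector; sum() and mean() coerce it to numeric
sum(x > 5)    # how many elements are greater than 5: 3
mean(x > 5)   # what proportion are greater than 5: 0.6

# explicit coercion does the same conversion by hand
as.integer(c(TRUE, FALSE, TRUE))   # 1 0 1

# mixing all four types ends up as character, the most flexible type
typeof(c(TRUE, 1L, 1.5, "a"))      # "character"
```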
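When operating on two vectors of different lengths, R recycles the shorter vector, repeating its values until it matches the length of the longer one.
1:10 + 1:2
1:10 + 1:3 # warning: the longer length is not an integer multiple of the shorter length
Recycling happens implicitly and can silently hide mistakes, so tibble() throws an error in this case (it only recycles vectors of length 1).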
tibble(x = 1:4, y = 1:2)
# instead, you need to explicitly repeat elements.
tibble(x = 1:4, y = rep(1:2, 2))
tibble(x = 1:4, y = rep(1:2, each = 2))
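The recycling rules can be checked in base R alone. The sketch below uses tryCatch() to capture the warning as text instead of printing it; the variable names are only illustrative.

```r
# a length-1 vector is recycled against every element
1:6 + 10           # 11 12 13 14 15 16

# a length-2 vector is repeated three times to match length 6
1:6 * c(1, 100)    # 1 200 3 400 5 600

# length 10 is not a multiple of length 3: R still recycles, but warns
msg <- tryCatch(
  1:10 + 1:3,
  warning = function(w) conditionMessage(w)
)
msg                # the text of the warning

# rep() makes the two recycling patterns explicit
rep(1:2, times = 2)   # 1 2 1 2
rep(1:2, each = 2)    # 1 1 2 2
```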
Four ways to extract part of a vector:
# with an integer vector
x <- c("one", "two", "three", "four", "five")
x[c(3, 2, 5)] # select by position
x[c(1, 1, 5, 5, 5, 2)] # repeating positions gives an output longer than the input
x[c(-1, -3, -5)] # negative values drop elements
x[c(1, -1)] # error: positive and negative integers cannot be mixed
x[0] # returns an empty vector; not super useful
# with a logical vector
x <- c(10, 3, NA, 5, 8, 1, NA)
x[!is.na(x)]
x[x %% 2 == 0]
# with a character vector, if your vector has names for elements.
x <- c(abc = 1, def = 2, xyz = 5)
x[c("xyz", "def")]
# how to assign names to a vector? two ways
c(x = 1, y = 2, z = 3)
set_names(1:3, c("a", "b", "c")) # set_names() comes from purrr (loaded with tidyverse)
# with nothing
x[] # select all elements
# if x is 2D (e.g., a matrix or data frame)
x[1, ] # first row and all the columns
x[, -1] # all rows and all columns except the first
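The x defined above is a one-dimensional named vector, so the 2D forms need a two-dimensional object to run. A small sketch with a hypothetical matrix m:

```r
# hypothetical 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

m[1, ]     # first row, all columns: a = 1, b = 3, c = 5
m[, -1]    # all rows, every column except the first (columns b and c)
m[, "b"]   # a column selected by name: r1 = 3, r2 = 4

# a single row or column is simplified to a vector by default;
# drop = FALSE keeps the result as a matrix
dim(m[1, , drop = FALSE])   # 1 3
```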
No4. Create functions that take a vector as input and return:
The last value. Should you use [ or [[?
The elements at even numbered positions.
Every element except the last value.
Only even numbers (and no missing values).
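One way the four functions could look, as a sketch rather than the reference answer. All function names are hypothetical. The first uses [[ because it always returns exactly one element and drops names.

```r
# The last value: [[ extracts a single element
last_value <- function(x) {
  x[[length(x)]]
}

# The elements at even-numbered positions
even_positions <- function(x) {
  x[seq_along(x) %% 2 == 0]
}

# Every element except the last value
drop_last <- function(x) {
  x[-length(x)]
}

# Only even numbers, with missing values removed
even_numbers <- function(x) {
  x[!is.na(x) & x %% 2 == 0]
}

x <- c(10, 3, NA, 5, 8, 1, NA)
last_value(x)       # NA
even_positions(x)   # 3 5 1
drop_last(x)        # 10 3 NA 5 8 1
even_numbers(x)     # 10 8
```

Note that last_value() fails on a zero-length input, because x[[0]] is an error; a stricter version would check length(x) first.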
Lists can contain other lists as elements. This makes lists suitable for hierarchical or tree-like structures.
x <- list(1, 2, 3)
x
str(x)
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
y <- list("a", 1L, 1.5, TRUE)
str(y)
z <- list(list(1, 2), list(3, 4))
str(z)
x1 <- list(c(1,2), c(3,4))
x2 <- list(list(1,2), list(3,4))
x3 <- list(1, list(2, list(3)))
Drawing rules:
Round vs. square corners
Light vs. dark colors
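Without the diagrams, the difference between x1, x2, and x3 can also be checked in code. A minimal sketch using only base R:

```r
x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))

# x1 holds two atomic vectors; x2 holds two lists
is.list(x1[[1]])   # FALSE
is.list(x2[[1]])   # TRUE

# both have two top-level elements, so length() alone cannot tell them apart
length(x1)         # 2
length(x2)         # 2

# x3 nests three levels deep; each [[ goes one level down
x3[[2]][[2]][[1]]  # 3

# str() prints the nesting as an indented outline
str(x3)
```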
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
# [ extracts a sublist
str(a[1:2])
str(a[4])
# [[ extracts a single component from a list
str(a[[1]])
str(a[[2]])
str(a[[3]])
str(a[[4]])
# $ is a shorthand for extracting named elements of a list
a$a
a[["a"]]
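The difference between [ and [[ matters most with nested lists. The sketch below repeats the definition of a so it runs on its own, then drills into element d.

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# [ keeps the list wrapper: a list of length 1 whose element is the inner list
typeof(a[4])     # "list"
length(a[4])     # 1

# [[ removes one level: the inner list itself, of length 2
typeof(a[[4]])   # "list"
length(a[[4]])   # 2

# chain [[ (or combine it with $) to reach a value inside the inner list
a[[4]][[1]]      # -1
a$d[[2]]         # -5
```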
Augmented vectors are vectors with additional attributes (i.e., metadata attached to the vector), including class, which controls how generic functions work.
That is, augmented vectors behave differently from the atomic vectors on which they are built.
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(x)
attributes(x)
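To see the integer vector underneath a factor, the attributes can be stripped or bypassed. A short sketch using the same factor:

```r
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

# the underlying integer codes index into the levels
as.integer(x)       # 1 2 1
levels(x)           # "ab" "cd" "ef"

# unclass() removes the class attribute, leaving an integer vector with levels
typeof(unclass(x))  # "integer"

# generic functions use the class: table() also counts the unused level "ef"
table(x)
```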
Skip
Tibbles and data frames are lists whose elements (columns) are vectors of the same length; ordinary lists have no such constraint.
# compare the following two (almost the same)
tb <- tibble::tibble(x = 1:5, y = 5:1)
typeof(tb)
attributes(tb)
df <- data.frame(x = 1:5, y = 5:1)
typeof(df)
attributes(df)
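That a data frame is a list with an equal-length constraint can be checked directly. The sketch below uses base R only, so it covers data.frame() but not tibble(); the error message is captured with tryCatch() rather than printed.

```r
df <- data.frame(x = 1:5, y = 5:1)

is.list(df)    # TRUE: a data frame is a list of columns
length(df)     # 2: the number of columns, not rows
lengths(df)    # each column has length 5
df[["x"]]      # list-style subsetting returns the column vector 1:5
class(df)      # "data.frame": the class attribute changes how it prints

# columns whose lengths cannot be reconciled are rejected
msg <- tryCatch(
  data.frame(x = 1:4, y = 1:3),
  error = function(e) conditionMessage(e)
)
msg            # the text of the error
```

Unlike tibble(), data.frame() still recycles a shorter column when its length divides the longer one evenly, which is why the failing example uses lengths 4 and 3.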