Disclaimer: The contents of this document come from Chapter 16, "Vectors," of R for Data Science (Wickham & Grolemund, 2017). This document was prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any questions, contact the instructor, Yongsung Lee, Ph.D., via yongsung.lee(at)gatech.edu.
This document is also published on RPubs.
install.packages("tidyverse", repos = "http://cran.us.r-project.org", dependencies = TRUE)
library(tidyverse)

1. Intro

What we do:

  1. Review data structures in R

  2. Learn a few implicit behaviors: coercion, recycling, and subsetting

Why we do it:

  1. r4ds starts with the most complex data structure, data frames, and works down to the basic data structures.

  2. Many base R functions and packages work on vectors. Without understanding vectors, their behavior can be occasionally frustrating.

2. Vector Basics

  1. Atomic vectors: elements are homogeneous (i.e., all of the same type).

  2. Lists: elements can be heterogeneous. Lists are also called recursive vectors because they can contain other lists as elements.

  3. NULL: represents the absence of a vector; it typically behaves like a vector of length zero.

# check the type of atomic vectors 
typeof(letters)
typeof(1:10)

# check the length of a list 
x <- list("a", "b", 1:10)
length(x)
  4. Augmented vectors: attributes (metadata) are attached to basic vectors to give them additional behavior.

    Factors: built on integer vectors

    Dates/date-times: built on numeric vectors

    Data frames: built on lists
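The "built on" relationships above can be checked with typeof(), which reports the underlying storage type and ignores the class attribute. This is a minimal sketch; the object names f, d, and df are illustrative only.

```r
# typeof() shows the underlying type; class() shows the augmented behavior
f  <- factor(c("low", "high", "low"))
d  <- as.Date("2019-01-07")
df <- data.frame(x = 1:3)

typeof(f)   # "integer"
class(f)    # "factor"
typeof(d)   # "double"
class(d)    # "Date"
typeof(df)  # "list"
class(df)   # "data.frame"
```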

3. Atomic Vectors

Exercises

No2. In the second variant of rescale01(), infinite values are left unchanged. Rewrite rescale01() so that -Inf is mapped to 0, and Inf is mapped to 1. (page 273) (Hint: experiment with is.finite(), is.infinite(), is.na(), or is.nan().)

# the original function 
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# second variant: the range is computed from finite values only, so infinite values are left unchanged
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
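One possible answer to the exercise is sketched below. It is not the book's official solution. The name rescale01_inf is hypothetical, chosen so that the definitions above are not overwritten. Using is.infinite() rather than x == Inf keeps the logical index free of NA values.

```r
# Hypothetical solution sketch: rescale finite values to [0, 1],
# then map -Inf to 0 and Inf to 1. Missing values stay NA.
rescale01_inf <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  y <- (x - rng[1]) / (rng[2] - rng[1])
  y[is.infinite(x) & x < 0] <- 0
  y[is.infinite(x) & x > 0] <- 1
  y
}

rescale01_inf(c(1, 5.5, 10, -Inf, Inf, NA))
```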
Coercion

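When types are mixed in an atomic vector, every element is coerced to the most complex (most flexible) type present, in the order logical < integer < double < character. Type is a property of the complete vector, not of its individual elements.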

typeof(c(TRUE, 1L)) # logical + integer
typeof(c(1L, 1.5)) # integer + double 
typeof(c(1.5, "a")) # double + character
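Implicit coercion is also what makes it possible to count or average a logical vector: TRUE is treated as 1 and FALSE as 0. A small sketch, with x and big as illustrative names:

```r
x <- c(3, 8, 12, 1)
big <- x > 5          # logical vector: FALSE TRUE TRUE FALSE

sum(big)              # 2: number of elements greater than 5
mean(big)             # 0.5: proportion of elements greater than 5

# mixing all four types ends at the most flexible one
typeof(c(TRUE, 1L, 1.5, "a"))   # "character"
```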
Recycling

When an operation combines two vectors of different lengths, R recycles the shorter vector: it repeats its values until it matches the length of the longer one.

1:10 + 1:2 
1:10 + 1:3 # R warns when the longer length is not an integer multiple of the shorter length

Because recycling happens implicitly, it can silently hide mistakes, so tibble() throws an error in this case: it recycles only vectors of length 1.

tibble(x = 1:4, y = 1:2) # error: only length-1 vectors are recycled
# instead, repeat the elements explicitly. 
tibble(x = 1:4, y = rep(1:2, 2))
tibble(x = 1:4, y = rep(1:2, each = 2))
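To see what implicit recycling computes, the shorter vector can be expanded by hand and the two results compared. The sketch below uses base R only. The names short, long, and expanded are illustrative.

```r
short <- 1:2
long  <- 1:10

# rep_len() repeats a vector until it reaches the requested length
expanded <- rep_len(short, length(long))
identical(long + short, long + expanded)   # TRUE

# a length-1 vector is recycled in the same way
identical(long * 2L, long * rep(2L, 10))   # TRUE
```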
Subsetting

There are four ways to extract part of a vector: with an integer vector, a logical vector, a character vector, or nothing (x[]).

# with an integer vector 
x <- c("one", "two", "three", "four", "five")
x[c(3, 2, 5)] # select elements by position 
x[c(1, 1, 5, 5, 5, 2)] # repeating positions gives an output longer than the input
x[c(-1, -3, -5)] # negative values drop elements 
x[c(1, -1)] # error: positive and negative integers cannot be mixed 
x[0] # returns an empty vector; not super useful 
# with a logical vector
x <- c(10, 3, NA, 5, 8, 1, NA) 
x[!is.na(x)] # all non-missing values
x[x %% 2 == 0] # all even values; an NA in x produces an NA in the output
# with a character vector, if your vector has names for elements. 
x <- c(abc = 1, def = 2, xyz = 5) 
x[c("xyz", "def")]

# two ways to assign names to a vector 
c(x = 1, y = 2, z = 3) 
set_names(1:3, c("a", "b", "c")) # purrr::set_names(), loaded with the tidyverse
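# with nothing: x[] returns the complete x
x[] # select all elements 
# this is more useful for 2D objects such as matrices
m <- matrix(1:6, nrow = 2)
m[1, ] # first row and all the columns 
m[, -1] # all rows and all columns except the first 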
Exercises

No4. Create functions that take a vector as input and return:

  1. The last value. Should you use [ or [[?

  2. The elements at even numbered positions.

  3. Every element except the last value.

  4. Only even numbers (and no missing values).
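The four functions could be sketched as follows. The function names are hypothetical, and this is one possible answer rather than the book's solution. For the last value, [[ is used because it always returns exactly one element and drops any name.

```r
# Hypothetical helper names; each takes a vector x as input
last_value <- function(x) {
  if (length(x) == 0) return(NULL)
  x[[length(x)]]                 # [[ extracts a single element
}
even_positions <- function(x) {
  x[seq_along(x) %% 2 == 0]      # positions 2, 4, 6, ...
}
drop_last <- function(x) {
  if (length(x) == 0) return(x)
  x[-length(x)]                  # a negative index drops that position
}
even_numbers <- function(x) {
  x[!is.na(x) & x %% 2 == 0]     # exclude NA before testing for evenness
}

x <- c(10, 3, NA, 5, 8, 1, NA)
last_value(x)       # NA
even_positions(x)   # 3 5 1
drop_last(x)        # 10 3 NA 5 8 1
even_numbers(x)     # 10 8
```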

4. Recursive Vectors (Lists)

Lists can contain other lists as elements. This makes lists suitable for hierarchical or tree-like structures.

x <- list(1, 2, 3)
x
str(x)
x_named <- list(a = 1, b = 2, c = 3) 
str(x_named)
y <- list("a", 1L, 1.5, TRUE)
str(y) 
z <- list(list(1, 2), list(3, 4))
str(z)
Subsetting
x1 <- list(c(1,2), c(3,4))
x2 <- list(list(1,2), list(3,4))
x3 <- list(1, list(2, list(3)))

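Drawing rules (for the list diagrams in r4ds):

  1. Round vs. square corners: lists have rounded corners; atomic vectors have square corners.

  2. Light vs. dark colors: children are drawn inside their parent with a slightly darker background, which makes the hierarchy easier to see.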
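Without the diagrams, str() conveys the same hierarchy in text form. The sketch below also includes a small hypothetical helper, list_depth(), which is not part of base R or the tidyverse; it counts how many levels of lists are nested.

```r
x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))

str(x1)   # a list of two numeric vectors
str(x2)   # a list of two lists
str(x3)   # a list that contains a list that contains a list

# Hypothetical helper: number of nested list levels
list_depth <- function(x) {
  if (!is.list(x)) return(0L)
  if (length(x) == 0) return(1L)
  1L + max(vapply(x, list_depth, integer(1)))
}

list_depth(x1)  # 1
list_depth(x2)  # 2
list_depth(x3)  # 3
```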

a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
# [ extracts a sublist
str(a[1:2])
str(a[4])

# [[ extracts a single component from a list  
str(a[[1]])
str(a[[2]])
str(a[[3]])
str(a[[4]]) 

# $ is a shorthand for extracting named elements of a list 
a$a 
a[["a"]]
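The practical difference between [ and [[ shows up as soon as the result is used: [ always returns a list, while [[ returns the component itself. A short sketch, using the same list a as above:

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

is.list(a["a"])      # TRUE: [ keeps the list wrapper
is.list(a[["a"]])    # FALSE: [[ returns the integer vector itself

# [[ can be chained to reach into nested lists
a[["d"]][[2]]        # -5
a$d[[1]]             # -1

# arithmetic needs the component, not the sublist
sum(a[["a"]])        # 6
```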

5. Augmented Vectors

Augmented vectors are vectors with additional attributes (i.e., metadata attached to the vector), including a class, which controls how generic functions work.

That is, augmented vectors behave differently from the vectors on which they are built.

Factors
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(x) 
attributes(x)
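Removing the class shows what a factor stores: integer codes plus a levels attribute that maps each code to a label. A minimal sketch, reusing the factor x from above:

```r
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

unclass(x)                 # integer codes 1 2 1 with a "levels" attribute
as.integer(x)              # 1 2 1
levels(x)                  # "ab" "cd" "ef"
levels(x)[as.integer(x)]   # "ab" "cd" "ab": codes mapped back to labels

# an unused level ("ef") still appears in the counts, with 0
table(x)
```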
Dates and Date-Times

Skip

Tibbles

Tibbles and data frames are lists whose elements are vectors of the same length; ordinary lists have no such constraint.

# compare the following two (almost the same) 
tb <- tibble::tibble(x = 1:5, y = 5:1) 
typeof(tb)
attributes(tb)

df <- data.frame(x = 1:5, y = 5:1) 
typeof(df)
attributes(df)
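The attributes printed above (names, class, row.names) are essentially all that separates a data frame from a plain list. The sketch below attaches them by hand in base R. It is for illustration only, not a recommended way to create data frames, and lst, manual_df, and ragged are illustrative names.

```r
# A plain list with equal-length elements ...
lst <- list(x = 1:5, y = 5:1)

# ... becomes a data frame once class and row.names are attached
manual_df <- structure(lst, class = "data.frame", row.names = 1:5)
is.data.frame(manual_df)   # TRUE
nrow(manual_df)            # 5

# A regular data frame is still a list underneath
df <- data.frame(x = 1:5, y = 5:1)
is.list(df)                # TRUE
lengths(df)                # every column has length 5

# An ordinary list may hold elements of different lengths
ragged <- list(x = 1:2, y = 1:10)
lengths(ragged)            # 2 and 10
```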