CH20 Vectors

1. Introduction

2. Vector Basics

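There are two types of vectors: atomic vectors and lists. Atomic vectors come in six types: logical, integer, double, character, complex, and raw. Integer and double vectors are collectively known as numeric vectors. Lists are sometimes called recursive vectors because a list can contain other lists. The chief difference between the two is that atomic vectors are homogeneous, while lists can be heterogeneous. Every vector has two key properties: its type, which you can determine with typeof(), and its length, which you can determine with length(). Vectors can also carry attributes, which are used to create augmented vectors. There are three important types of augmented vector: factors are built on top of integer vectors, dates and date-times are built on top of numeric vectors, and data frames and tibbles are built on top of lists.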
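As a minimal sketch of the two key properties, the base R snippet below checks the type and length of an atomic vector and of a list. The variable names dbl_vec and mixed are made up for illustration.

```r
# An atomic vector is homogeneous: every element has the same type.
dbl_vec <- c(1.5, 2, 3)
typeof(dbl_vec)   # "double"
length(dbl_vec)   # 3

# A list can mix types and can even contain another list.
mixed <- list("a", 1L, TRUE, list(2.5))
typeof(mixed)     # "list"
length(mixed)     # 4

# is.atomic() and is.list() distinguish the two kinds of vector.
is.atomic(dbl_vec)  # TRUE
is.list(mixed)      # TRUE
```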

3. Important Types of Atomic Vector

The four most important types of atomic vector are logical, integer, double, and character. Raw and complex vectors are rarely used in data analysis.

Logical: Logical vectors are the simplest type of atomic vector because they can take only three possible values: FALSE, TRUE, and NA. They are usually constructed with comparison operators, but you can also create them by hand with c().

1:10 %% 3 == 0

##  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

c(TRUE, TRUE, FALSE, NA)

## [1]  TRUE  TRUE FALSE    NA

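Double: Doubles are approximations. They represent floating point numbers, which cannot always be represented precisely with a fixed amount of memory, so every double should be treated as an approximation. Squaring the square root of two, for instance, prints 2 even though the stored value differs from 2 by a tiny rounding error.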

x <- sqrt(2) ^ 2
x

## [1] 2
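The rounding error hidden by the printed 2 can be made visible with a short base R sketch. It compares x to 2 exactly and then with a tolerance using all.equal(); R for Data Science suggests dplyr::near() for the same purpose, which is not used here so that the snippet needs no packages.

```r
x <- sqrt(2) ^ 2

# Exact comparison fails because x is only approximately 2.
x == 2                    # FALSE
x - 2                     # a tiny nonzero difference

# Compare with a tolerance instead of using == on doubles.
isTRUE(all.equal(x, 2))   # TRUE
abs(x - 2) < 1e-8         # TRUE
```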

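Integer: Integers have one special value, NA, while doubles have four: NA, NaN, Inf, and -Inf. The last three can arise during division, as the example below shows. Avoid using == to check for these special values; use the helper functions is.finite(), is.infinite(), is.na(), and is.nan() instead.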

c(-1, 0, 1) / 0

## [1] -Inf  NaN  Inf
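A short sketch of the helper functions applied to the same division, with an NA added so that all four special values appear. The variable name y is illustrative only.

```r
# -Inf, NaN, Inf and NA in one double vector
y <- c(-1, 0, 1, NA) / 0

is.finite(y)    # FALSE FALSE FALSE FALSE
is.infinite(y)  # TRUE  FALSE TRUE  FALSE
is.na(y)        # FALSE TRUE  FALSE TRUE   (NaN also counts as missing)
is.nan(y)       # FALSE TRUE  FALSE FALSE

# Comparing with == does not work for missing values: the result is NA.
NA == NA        # NA
```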

Character: Character vectors are the most complex type of atomic vector because each element is a string, and a string can contain an arbitrary amount of data. R uses a global string pool, so each unique string is stored in memory only once and every use of the string points to that single representation. This reduces the memory needed by duplicated strings. In the example below, y does not take up 1,000 times as much memory as x, because each element of y is just an 8-byte pointer to the same string.

x <- "This is a reasonably long string."
pryr::object_size(x)

## 152 B

y <- rep(x, 1000)
pryr::object_size(y)

## 8.14 kB

4. Using Atomic Vectors

Coercion: There are two ways to convert, or coerce, one type of vector to another. Explicit coercion happens when you call a function such as as.logical(), as.integer(), as.double(), or as.character(). It is rarely needed; whenever you find yourself using it, check whether the type can be fixed upstream instead. Implicit coercion happens when you use a vector in a context that expects a different type of vector, for example a logical vector in a numeric summary function. It also happens when you combine several types with c(): the most complex type always wins, as the calls below show.

typeof(c(TRUE, 1L))

## [1] "integer"

typeof(c(1L, 1.5))

## [1] "double"

typeof(c(1.5, "a"))

## [1] "character"
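The following base R sketch illustrates both kinds of coercion on made-up data. In a numeric context TRUE becomes 1 and FALSE becomes 0, so sum() counts the TRUE values and mean() gives their proportion.

```r
x <- c(3, 12, 18, 7, 15)
over_10 <- x > 10        # a logical vector

# Implicit coercion: logical used where a number is expected.
sum(over_10)             # 3   (how many values are greater than 10)
mean(over_10)            # 0.6 (the proportion greater than 10)

# Explicit coercion: you ask for the conversion yourself.
as.integer("42")         # 42
as.character(1.5)        # "1.5"
as.logical(c("TRUE", "false", "yes"))  # TRUE FALSE NA
```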

5. Recursive Vectors (lists)

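Lists are a step up in complexity from atomic vectors because a list can contain other lists, which makes lists suitable for representing hierarchical or tree-like structures. Unlike atomic vectors, a list can hold a mix of object types. You create a list with list(), and str() is a useful tool for inspecting one because it focuses on the structure rather than the contents: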

y <- list("a", 1L, 1.5, TRUE)
str(y)

## List of 4
##  $ : chr "a"
##  $ : int 1
##  $ : num 1.5
##  $ : logi TRUE

There are three principles for visualizing lists in diagrams: lists have rounded corners, while atomic vectors have square corners; children are drawn inside their parent, with a slightly darker background to make the hierarchy easier to see; and the orientation of the children (i.e. rows or columns) isn’t important.

There are three ways to subset a list. A single bracket, [, extracts a sub-list; the result is always a list.

Double brackets, [[, extract a single component from a list, removing a level of hierarchy.

The dollar sign, $, is shorthand for extracting named elements of a list. It works like [[ except that you don’t need to use quotes.
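The three subsetting operators can be compared on one small example list. The list a below follows the pattern used in R for Data Science; the variable name sub is illustrative.

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# [ keeps the list structure: the result is a shorter list.
sub <- a[1:2]
typeof(sub)     # "list"
names(sub)      # "a" "b"

# [[ drops one level of hierarchy and returns the component itself.
a[["a"]]        # 1 2 3
a[[4]][[1]]     # -1

# $ is shorthand for [[ with a name, no quotes needed.
a$a             # 1 2 3
identical(a$a, a[["a"]])  # TRUE
```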

6. Attributes

Attributes are additional metadata that can be attached to any object; you can think of them as a named list of vectors. You can get and set individual attribute values with attr(), or see them all at once with attributes().

x <- 1:10
attr(x, "greeting")

## NULL

attr(x, "greeting") <- "Hi!"
attr(x, "farewell") <- "Bye!"
attributes(x)

## $greeting
## [1] "Hi!"
## 
## $farewell
## [1] "Bye!"

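Three very important attributes are used to implement fundamental parts of R: names (used to name the elements of a vector), dimensions (dims for short, which make a vector behave like a matrix or array), and class (used to implement the S3 object-oriented system). The class controls how generic functions behave, and methods() lists all the methods of a generic, here as.Date():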

methods("as.Date")

## [1] as.Date.character   as.Date.default     as.Date.factor     
## [4] as.Date.numeric     as.Date.POSIXct     as.Date.POSIXlt    
## [7] as.Date.vctrs_sclr* as.Date.vctrs_vctr*
## see '?methods' for accessing help and source code
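The other two attributes, names and dimensions, can be sketched in base R as follows. The vectors scores and m are made up for illustration.

```r
# Names: label the elements of a vector and subset by label.
scores <- c(a = 1, b = 2, c = 3)
names(scores)        # "a" "b" "c"
scores[["b"]]        # 2
attributes(scores)   # a list with a single $names entry

# Dimensions: a dim attribute makes a plain vector behave like a matrix.
m <- 1:6
dim(m) <- c(2, 3)    # 2 rows, 3 columns, filled column by column
is.matrix(m)         # TRUE
m[2, 3]              # 6
```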

7. Augmented Vectors

Augmented vectors are vectors with additional attributes, including class. Because they have a class, they behave differently from the atomic vector on which they are built. The four important augmented vectors are factors, dates, date-times, and tibbles.

Factors: Factors are designed to represent categorical data that can take a fixed set of possible values.

x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(x)

## [1] "integer"

attributes(x)

## $levels
## [1] "ab" "cd" "ef"
## 
## $class
## [1] "factor"
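To see how the integer codes and the levels attribute fit together, the sketch below takes the same factor apart with base R functions.

```r
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

# The underlying integer codes index into the levels.
as.integer(x)              # 1 2 1
levels(x)                  # "ab" "cd" "ef"
levels(x)[as.integer(x)]   # "ab" "cd" "ab"

# Unused levels are kept, so they appear in counts with a zero.
table(x)                   # ab: 2, cd: 1, ef: 0
```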

Dates: Dates in R are numeric vectors that represent the number of days since 1 January 1970.

x <- as.Date("1971-01-01")
unclass(x)

## [1] 365

typeof(x)

## [1] "double"

attributes(x)

## $class
## [1] "Date"
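A brief base R sketch of the day-count representation, converting in both directions. The variable name d is illustrative.

```r
# From date to number of days since 1970-01-01
d <- as.Date("1970-01-10")
unclass(d)                             # 9

# From number of days back to a date
as.Date(365, origin = "1970-01-01")    # "1971-01-01"

# Subtracting dates works on the underlying day counts.
as.numeric(as.Date("1971-01-01") - as.Date("1970-01-01"))  # 365
```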

Date-times: Date-times are numeric values that represent the number of seconds since 1 January 1970.

x <- lubridate::ymd_hm("1970-01-01 01:00")
unclass(x)

## [1] 3600
## attr(,"tzone")
## [1] "UTC"

typeof(x)

## [1] "double"

attributes(x)

## $class
## [1] "POSIXct" "POSIXt" 
## 
## $tzone
## [1] "UTC"
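The tzone attribute only controls how the time is printed, not which instant it refers to. The sketch below shows this with base R's as.POSIXct() instead of lubridate, so it needs no packages; the time zone name "America/New_York" is an assumption about what the system's time zone database provides.

```r
x <- as.POSIXct("1970-01-01 01:00", tz = "UTC")
as.numeric(x)      # 3600 seconds since 1970-01-01 00:00 UTC

# Changing tzone changes the printed clock time only.
attr(x, "tzone") <- "America/New_York"
as.numeric(x)      # still 3600: the stored number is unchanged
class(x)           # "POSIXct" "POSIXt"
```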

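Tibbles: Tibbles are augmented lists: they have the classes “tbl_df”, “tbl”, and “data.frame”, plus names (column names) and row.names attributes. The difference between a tibble or data frame and a list is that all the elements of a data frame must be vectors of the same length. All functions that work with tibbles enforce this constraint. A traditional data frame, shown below, has a very similar structure; the main difference is its class, which is just “data.frame”.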

df <- data.frame(x = 1:5, y = 5:1)
typeof(df)

## [1] "list"

attributes(df)

## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3 4 5
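The equal-length constraint can be checked with base R's data.frame(). In this sketch a plain list accepts columns of different lengths, while data.frame() raises an error that tryCatch() captures as text. The names ragged, result, and recycled are illustrative.

```r
# A plain list places no constraint on element lengths.
ragged <- list(x = 1:3, y = 1:2)
lengths(ragged)    # x: 3, y: 2

# data.frame() refuses lengths that cannot be matched up.
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
result             # the error message, as a character string

# data.frame() does recycle when one length divides the other evenly.
recycled <- data.frame(x = 1:4, y = 1:2)
nrow(recycled)     # 4
```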

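CH21 Iteration

1. Introduction

Iteration helps when you need to do the same thing to multiple inputs. There are two main iteration paradigms: imperative programming, which uses tools such as for loops and while loops and makes the iteration explicit, and functional programming, which extracts the common looping pattern into its own function, such as the map functions.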

2. For Loops

# example from the cheatsheet 

for (i in 1:4){
    j <- i + 10
    print(j) 
}

## [1] 11
## [1] 12
## [1] 13
## [1] 14

# example 1: numeric calculation - add 10
x <- 11:15

for (i in seq_along(x)){
    j <- x[i] + 10
    print(j)
}

## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25

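# save output: preallocate a vector of the right length before the loop
# ("double" because x[i] + 10 is a double; using 10L would keep it integer)
y <- vector("double", length(x))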

for (i in seq_along(x)) {
    y[i] <- x[i] + 10
    print(y[i])
}

## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25

# output
y

## [1] 21 22 23 24 25

# example 2: string operation - extract first letter
# str_extract() and the %>% pipe are available once the tidyverse is loaded
library(tidyverse)

x <- c("abc", "xyz")

y <- vector("character", length(x))

for (i in seq_along(x)){
    y[i] <- x[i] %>% str_extract("[a-z]")
    print(y[i])
}

## [1] "a"
## [1] "x"

# output

y

## [1] "a" "x"
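Every loop above has the same three parts: an output vector allocated before the loop, a sequence to iterate over, and a body. The sketch below shows why the sequence is written with seq_along(x) and not 1:length(x): the two differ when the input is empty. The counters n_iter and n_bad are illustrative names.

```r
empty <- character(0)

seq_along(empty)    # integer(0): nothing to loop over
1:length(empty)     # 1 0: counts down from 1 to 0

# With seq_along() the body never runs for an empty input.
n_iter <- 0
for (i in seq_along(empty)) n_iter <- n_iter + 1
n_iter              # 0

# With 1:length() the body runs twice, on elements that do not exist.
n_bad <- 0
for (i in 1:length(empty)) n_bad <- n_bad + 1
n_bad               # 2
```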

3. The Map Functions

# example 1 again, as a for loop, for comparison with map below
x <- 11:15

for (i in seq_along(x)){
    j <- x[i] + 10
    print(j)
}

## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25

# output: this loop did not assign to y, so y still holds the result of example 2
y

## [1] "a" "x"

# using map function (map() and map_dbl() come from purrr, part of the tidyverse)
x

## [1] 11 12 13 14 15

map(.x = x, .f = ~.x + 10)

## [[1]]
## [1] 21
## 
## [[2]]
## [1] 22
## 
## [[3]]
## [1] 23
## 
## [[4]]
## [1] 24
## 
## [[5]]
## [1] 25

map_dbl(.x = x, .f = ~.x + 10)

## [1] 21 22 23 24 25

add_10 <- function(x) {x + 10}
11 %>% add_10()

## [1] 21

map_dbl(.x = x, .f = add_10)

## [1] 21 22 23 24 25
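To connect the two sections, here is a simplified sketch of what a function like map_dbl() does for you: it wraps the preallocate, loop, and assign pattern from the for loop examples into one reusable function. The helper my_map_dbl() is hypothetical and written in base R for illustration; it is not purrr's actual implementation, which also handles names, formulas, and type checks.

```r
# Hypothetical helper: apply .f to each element of .x, return a double vector.
my_map_dbl <- function(.x, .f) {
  out <- vector("double", length(.x))   # 1. output
  for (i in seq_along(.x)) {            # 2. sequence
    out[i] <- .f(.x[[i]])               # 3. body
  }
  out
}

add_10 <- function(x) {x + 10}
x <- 11:15

my_map_dbl(x, add_10)   # 21 22 23 24 25
```

The loop is written once inside the helper, so the calling code only has to state what to do to each element.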

Week 14 : Code Along 13

R for Data Science: Chapter 20/21

Eoin Hamell-Kelleher

2023-1-9