suppressPackageStartupMessages(library("tidyverse"))

1. What does mean(is.na(x)) tell you about a vector x? What about sum(!is.finite(x))?

I’ll use the numeric vector x to compare the behaviors of is.na() and is.finite(). It contains numbers (-1, 0, 1) as well as all the special numeric values: infinity (Inf), missing (NA), and not-a-number (NaN).

x <- c(-Inf, -1, 0, 1, Inf, NA, NaN)

The expression mean(is.na(x)) calculates the proportion of missing values in a vector. Note that is.na() returns TRUE for both NA and NaN, so both count as missing here (2 of the 7 elements).

mean(is.na(x))
[1] 0.2857143

The expression sum(!is.finite(x)) calculates the number of elements in the vector that are missing (NA), not-a-number (NaN), or infinite (Inf or -Inf).

sum(!is.finite(x))
[1] 4

Review the Numeric section for the differences between is.na() and is.finite().

2. Carefully read the documentation of is.vector(). What does it actually test for? Why does is.atomic() not agree with the definition of atomic vectors above?

The function is.vector() checks whether the object is a vector of the requested mode (by default any atomic vector, list, or expression) with no attributes other than names. Thus a list is a vector:

is.vector(list(a = 1, b = 2))
[1] TRUE

But any object that has an attribute (other than names) is not:

x <- 1:10
attr(x, "something") <- TRUE
is.vector(x)
[1] FALSE

The idea behind this is that object-oriented classes will include attributes, including, but not limited to, “class”.

The function is.atomic() explicitly checks whether an object is one of the atomic types (“logical”, “integer”, “numeric”, “complex”, “character”, and “raw”). In R versions before 4.4.0 it also returns TRUE for NULL.

is.atomic(1:10)
[1] TRUE
is.atomic(list(a = 1))
[1] FALSE

The function is.atomic() will consider objects to be atomic even if they have extra attributes.

is.atomic(x)
[1] TRUE

3. Compare and contrast setNames() with purrr::set_names().

The function setNames() takes two arguments, a vector to be named and a vector of names to apply to its elements.

setNames(1:4, c("a", "b", "c", "d"))
a b c d 
1 2 3 4 

If you supply only the nm argument, setNames() uses the values of the vector as its names.

setNames(nm = c("a", "b", "c", "d"))
  a   b   c   d 
"a" "b" "c" "d" 

The function set_names() has more ways to set the names than setNames(). The names can be specified in the same manner as setNames().

purrr::set_names(1:4, c("a", "b", "c", "d"))
a b c d 
1 2 3 4 

The names can also be specified as unnamed arguments:

purrr::set_names(1:4, "a", "b", "c", "d")
a b c d 
1 2 3 4 

The function set_names() will name an object with itself if no nm argument is provided. With setNames() it is the other way around: the object argument is the one that can be omitted.

purrr::set_names(c("a", "b", "c", "d"))
  a   b   c   d 
"a" "b" "c" "d" 

The biggest difference between set_names() and setNames() is that set_names() allows for using a function or formula to transform the existing names.

purrr::set_names(c(a = 1, b = 2, c = 3), toupper)
A B C 
1 2 3 
purrr::set_names(c(a = 1, b = 2, c = 3), ~ toupper(.))
A B C 
1 2 3 

The set_names() function also checks that the length of the names argument is the same length as the vector that is being named, and will raise an error if it is not.

#purrr::set_names(1:4, c("a", "b"))
#>Error: `nm` must be `NULL` or a character vector the same length as `x`

The setNames() function will allow the names to be shorter than the vector being named, and will set the missing names to NA.

setNames(1:4, c("a", "b"))
   a    b <NA> <NA> 
   1    2    3    4 

4. Create functions that take a vector as input and return:

   1. The last value. Should you use [ or [[?
   2. The elements at even-numbered positions.
   3. Every element except the last value.
   4. Only even numbers (and no missing values).

The answers to the parts follow.

1. This function finds the last value in a vector.

last_value <- function(x) {
  # check for case with no length
  if (length(x)) {
    x[[length(x)]]
  } else {
    x
  }
}
last_value(numeric())
numeric(0)
last_value(1)
[1] 1
last_value(1:10)
[1] 10

The function uses [[ in order to extract a single element.

2. This function returns the elements at even-numbered positions.

even_indices <- function(x) {
  if (length(x)) {
    x[seq_along(x) %% 2 == 0]
  } else {
    x
  }
}
even_indices(numeric())
numeric(0)
even_indices(1)
numeric(0)
even_indices(1:10)
[1]  2  4  6  8 10
# test with letters to confirm that values, not indices,
# are being returned
even_indices(letters)
 [1] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" "v" "x" "z"

3. This function returns a vector with every element except the last.

not_last <- function(x) {
  n <- length(x)
  if (n) {
    x[-n]
  } else {
    # n == 0
    x
  }
}
not_last(1:3)
[1] 1 2

We should also confirm that the function works with some edge cases, like a vector with one element, and a vector with zero elements.

not_last(1)
numeric(0)
not_last(numeric())
numeric(0)

In both these cases, not_last() correctly returns an empty vector.

4. This function returns the elements of a vector that are even numbers.

even_numbers <- function(x) {
  x[x %% 2 == 0]
}
even_numbers(-4:4)
[1] -4 -2  0  2  4

We could improve this function by handling the special numeric values: NA, NaN, and Inf. First, however, we need to decide how to handle them. Neither NaN nor Inf is an ordinary number, so neither is even or odd, and both should be dropped. What about NA? NA stands for a number whose value we don’t know, so it could be even or odd. Returning NA for those elements reflects that uncertainty, and it is consistent with the behavior of other R functions, which generally return NA values instead of dropping them. Note that this choice departs from the literal wording of the exercise, which asks for no missing values.

even_numbers2 <- function(x) {
  x[!is.infinite(x) & !is.nan(x) & (x %% 2 == 0)]
}
even_numbers2(c(0:4, NA, NaN, Inf, -Inf))
[1]  0  2  4 NA

5. Why is x[-which(x > 0)] not the same as x[x <= 0]?

These expressions differ in the way that they treat missing values. Let’s test how they work by creating a vector with positive and negative integers, and special values (NA, NaN, and Inf). These values should encompass all relevant types of values that these expressions would encounter.

x <- c(-1:1, Inf, -Inf, NaN, NA)
x[-which(x > 0)]
[1]   -1    0 -Inf  NaN   NA
x[x <= 0]
[1]   -1    0 -Inf   NA   NA

The expressions x[-which(x > 0)] and x[x <= 0] return the same values except for a NaN instead of an NA in the which()-based expression.

So what is going on here? Let’s work through each part of these expressions and see where the difference occurs. Let’s start with the expression x[x <= 0].

x <= 0
[1]  TRUE  TRUE FALSE FALSE  TRUE    NA    NA

Recall how the relational operators (<, <=, ==, !=, >, >=) treat NA values. Any relational operation that includes an NA returns an NA. Is NA <= 0? We don’t know because it depends on the unknown value of NA, so the answer is NA. The same argument applies to NaN. Asking whether NaN <= 0 does not make sense because you can’t compare a number to “Not a Number”.

Now recall how indexing treats NA values. Indexing can take a logical vector as input, in which case it will include those elements where the logical vector is TRUE, and will not return those elements where the logical vector is FALSE. Logical vectors can also include NA values, and it is not clear how they should be treated. Since the value is NA, it could be TRUE or FALSE; we don’t know. Keeping elements with NA would treat the NA as TRUE, and dropping them would treat the NA as FALSE.

R resolves this by treating NA differently from both TRUE and FALSE: it includes an element wherever the indexing vector is NA, but sets that element’s value to NA.

Now consider the expression x[-which(x > 0)]. As before, to understand this expression we’ll work from the inside out. Consider x > 0.

x > 0
[1] FALSE FALSE  TRUE  TRUE FALSE    NA    NA

As with x <= 0, it returns NA for comparisons involving NA and NaN.

What does which() do?

which(x > 0)
[1] 3 4

The which() function returns the indexes for which the argument is TRUE. This means that it is not including the indexes for which the argument is FALSE or NA.

Now consider the full expression x[-which(x > 0)]. The which() function returned a vector of integers. How does indexing treat negative integers?

x[1:2]
[1] -1  0
x[-(1:2)]
[1]    1  Inf -Inf  NaN   NA

If indexing gets a vector of positive integers, it will select those indexes; if it receives a vector of negative integers, it will drop those indexes. Thus, x[-which(x > 0)] ends up dropping the elements for which x > 0 is true, and keeps all the other elements and their original values, including NA and NaN.

There’s one other special case that we should consider. How do these two expressions work with an empty vector?

x <- numeric()
x[x <= 0]
numeric(0)
x[-which(x > 0)]
numeric(0)

Thankfully, they both handle empty vectors the same.

This exercise is a reminder to always test your code. Even though these two expressions looked equivalent, they are not in practice. And when you do test code, consider both how it works on typical values as well as special values and edge cases, like a vector with NA or NaN or Inf values, or an empty vector. These are where unexpected behavior is most likely to occur.

6. What happens when you subset with a positive integer that’s bigger than the length of the vector? What happens when you subset with a name that doesn’t exist?

Let’s consider the named vector,

x <- c(a = 10, b = 20)

If we subset it by an integer larger than its length, it returns a vector of missing values.

x[3]
<NA> 
  NA 

This also applies to ranges.

x[3:5]
<NA> <NA> <NA> 
  NA   NA   NA 

If some indexes are larger than the length of the vector, those elements are NA.

x[1:5]
   a    b <NA> <NA> <NA> 
  10   20   NA   NA   NA 

Likewise, when [ is provided names not in the vector’s names, it will return NA for those elements.

x["c"]
<NA> 
  NA 
x[c("c", "d", "e")]
<NA> <NA> <NA> 
  NA   NA   NA 
x[c("a", "b", "c")]
   a    b <NA> 
  10   20   NA 

Though not yet discussed much in this lecture, [[ behaves differently. With an atomic vector, if [[ is given an index outside the range of the vector or an invalid name, it raises an error.

---
title: "Using atomic vectors"
output: 
  html_notebook:
    toc: true
    toc_float: true
---

```{r}
suppressPackageStartupMessages(library("tidyverse"))
```

### 1. What does `mean(is.na(x))` tell you about a vector `x`? What about `sum(!is.finite(x))`?

I’ll use the numeric vector `x` to compare the behaviors of `is.na()` and `is.finite()`. It contains numbers (`-1`, `0`, `1`) as well as all the special numeric values: infinity (`Inf`), missing (`NA`), and not-a-number (`NaN`).

```{r}
x <- c(-Inf, -1, 0, 1, Inf, NA, NaN)
```

The expression `mean(is.na(x))` calculates the proportion of missing values in a vector. Note that `is.na()` returns `TRUE` for both `NA` and `NaN`, so both count as missing here (2 of the 7 elements).

```{r}
mean(is.na(x))
```

The expression `sum(!is.finite(x))` calculates the number of elements in the vector that are missing (`NA`), not-a-number (`NaN`), or infinite (`Inf` or `-Inf`).

```{r}
sum(!is.finite(x))
```

Review the Numeric section for the differences between `is.na()` and `is.finite()`.
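To make the contrast concrete, the sketch below tabulates the related predicate functions on the same kinds of values. The vector name `special` is introduced here only for illustration and is not part of the exercise.

```{r}
# Illustrative vector holding ordinary and special numeric values
special <- c(-Inf, -1, 0, 1, Inf, NA, NaN)

# One row per value, one column per predicate
data.frame(
  value       = special,
  is_na       = is.na(special),       # TRUE for NA and NaN
  is_nan      = is.nan(special),      # TRUE for NaN only
  is_finite   = is.finite(special),   # FALSE for NA, NaN, Inf, -Inf
  is_infinite = is.infinite(special)  # TRUE for Inf and -Inf only
)
```

Reading across the rows shows why `mean(is.na(x))` and `sum(!is.finite(x))` count different sets of elements.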

### 2. Carefully read the documentation of `is.vector()`. What does it actually test for? Why does `is.atomic()` not agree with the definition of atomic vectors above?

The function `is.vector()` checks whether the object is a vector of the requested mode (by default any atomic vector, list, or expression) with no attributes other than names. Thus a list is a vector:

```{r}
is.vector(list(a = 1, b = 2))
```

But any object that has an attribute (other than names) is not:

```{r}
x <- 1:10
attr(x, "something") <- TRUE
is.vector(x)
```

The idea behind this is that object-oriented classes will include attributes, including, but not limited to, "class".

The function `is.atomic()` explicitly checks whether an object is one of the atomic types (“logical”, “integer”, “numeric”, “complex”, “character”, and “raw”). In R versions before 4.4.0 it also returns `TRUE` for `NULL`.

```{r}
is.atomic(1:10)
is.atomic(list(a = 1))
```

The function `is.atomic()` will consider objects to be atomic even if they have extra attributes.

```{r}
is.atomic(x)
```
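As a further illustration, factors and matrices are stored as atomic types but carry attributes beyond names, so the two functions disagree on them. The objects `f` and `m` are hypothetical examples, not part of the exercise.

```{r}
# A factor is an integer vector with "levels" and "class" attributes
f <- factor(c("a", "b"))
is.vector(f)  # FALSE: has attributes other than names
is.atomic(f)  # TRUE: the underlying type is integer

# A matrix is an atomic vector with a "dim" attribute
m <- matrix(1:4, nrow = 2)
is.vector(m)  # FALSE: the dim attribute disqualifies it
is.atomic(m)  # TRUE
```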

### 3. Compare and contrast `setNames()` with `purrr::set_names()`.

The function `setNames()` takes two arguments, a vector to be named and a vector of names to apply to its elements.

```{r}
setNames(1:4, c("a", "b", "c", "d"))
```

If you supply only the `nm` argument, `setNames()` uses the values of the vector as its names.

```{r}
setNames(nm = c("a", "b", "c", "d"))
```

The function `set_names()` has more ways to set the names than `setNames()`. The names can be specified in the same manner as `setNames()`.

```{r}
purrr::set_names(1:4, c("a", "b", "c", "d"))
```

The names can also be specified as unnamed arguments:

```{r}
purrr::set_names(1:4, "a", "b", "c", "d")
```

The function `set_names()` will name an object with itself if no `nm` argument is provided. With `setNames()` it is the other way around: the `object` argument is the one that can be omitted.

```{r}
purrr::set_names(c("a", "b", "c", "d"))
```

The biggest difference between `set_names()` and `setNames()` is that `set_names()` allows for using a function or formula to transform the existing names.

```{r}
purrr::set_names(c(a = 1, b = 2, c = 3), toupper)
purrr::set_names(c(a = 1, b = 2, c = 3), ~ toupper(.))
```

The `set_names()` function also checks that the length of the names argument is the same length as the vector that is being named, and will raise an error if it is not.

```{r}
#purrr::set_names(1:4, c("a", "b"))
#>Error: `nm` must be `NULL` or a character vector the same length as `x`
```

The `setNames()` function will allow the names to be shorter than the vector being named, and will set the missing names to `NA`.

```{r}
setNames(1:4, c("a", "b"))
```
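For comparison, here is a minimal base R sketch of renaming by transformation. Since `setNames()` does not accept a function, the transformed names have to be computed explicitly. The vector `v` is an illustrative name.

```{r}
v <- c(a = 1, b = 2, c = 3)

# Compute the new names first, then pass them to setNames()
setNames(v, toupper(names(v)))
```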

### 4. Create functions that take a vector as input and return:

**1. The last value. Should you use `[` or `[[`?**

**2. The elements at even-numbered positions.**

**3. Every element except the last value.**

**4. Only even numbers (and no missing values).**

The answers to the parts follow.

**1.** This function finds the last value in a vector.

```{r}
last_value <- function(x) {
  # check for case with no length
  if (length(x)) {
    x[[length(x)]]
  } else {
    x
  }
}
last_value(numeric())
last_value(1)
last_value(1:10)
```

The function uses `[[` in order to extract a single element.
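A short sketch of how the two operators differ when extracting the last element. The objects `named` and `lst` are hypothetical examples introduced here for illustration.

```{r}
named <- c(a = 1, b = 2)
named[length(named)]    # keeps the name "b" alongside the value 2
named[[length(named)]]  # returns the bare value 2, name dropped

lst <- list(1, "z")
lst[length(lst)]        # still a list, of length one
lst[[length(lst)]]      # the element itself: "z"
```

With a list the difference matters most: `[` returns a sub-list, while `[[` returns the element it contains.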

**2.** This function returns the elements at even-numbered positions.

```{r}
even_indices <- function(x) {
  if (length(x)) {
    x[seq_along(x) %% 2 == 0]
  } else {
    x
  }
}
even_indices(numeric())
even_indices(1)
even_indices(1:10)
# test with letters to confirm that values, not indices,
# are being returned
even_indices(letters)
```

**3.** This function returns a vector with every element except the last.
```{r}
not_last <- function(x) {
  n <- length(x)
  if (n) {
    x[-n]
  } else {
    # n == 0
    x
  }
}
not_last(1:3)
```

We should also confirm that the function works with some edge cases, like a vector with one element, and a vector with zero elements.

```{r}
not_last(1)
not_last(numeric())
```

In both these cases, `not_last()` correctly returns an empty vector.

**4.** This function returns the elements of a vector that are even numbers.

```{r}
even_numbers <- function(x) {
  x[x %% 2 == 0]
}
even_numbers(-4:4)
```

We could improve this function by handling the special numeric values: `NA`, `NaN`, and `Inf`. First, however, we need to decide how to handle them. Neither `NaN` nor `Inf` is an ordinary number, so neither is even or odd, and both should be dropped. What about `NA`? `NA` stands for a number whose value we don’t know, so it could be even or odd. Returning `NA` for those elements reflects that uncertainty, and it is consistent with the behavior of other R functions, which generally return `NA` values instead of dropping them. Note that this choice departs from the literal wording of the exercise, which asks for no missing values.

```{r}
even_numbers2 <- function(x) {
  x[!is.infinite(x) & !is.nan(x) & (x %% 2 == 0)]
}
even_numbers2(c(0:4, NA, NaN, Inf, -Inf))
```
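If the exercise's wording ("no missing values") is taken literally, one possible variant wraps the condition in `which()`, which keeps only the positions where the condition is `TRUE`. This is a sketch under the hypothetical name `even_numbers3()`, not the author's chosen solution.

```{r}
even_numbers3 <- function(x) {
  # x %% 2 == 0 is NA for NA, NaN, Inf, and -Inf;
  # which() drops NA positions, so only true even numbers remain
  x[which(x %% 2 == 0)]
}
even_numbers3(c(0:4, NA, NaN, Inf, -Inf))
```

The trade-off is that missing inputs disappear silently, which is why propagating `NA` can be the safer default.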

### 5. Why is `x[-which(x > 0)]` not the same as `x[x <= 0]`?

These expressions differ in the way that they treat missing values. Let’s test how they work by creating a vector with positive and negative integers, and special values (`NA`, `NaN`, and `Inf`). These values should encompass all relevant types of values that these expressions would encounter.

```{r}
x <- c(-1:1, Inf, -Inf, NaN, NA)
x[-which(x > 0)]
x[x <= 0]
```

The expressions `x[-which(x > 0)]` and `x[x <= 0]` return the same values except for a `NaN` instead of an `NA` in the `which()`-based expression.

So what is going on here? Let’s work through each part of these expressions and see where the difference occurs. Let’s start with the expression `x[x <= 0]`.

```{r}
x <= 0
```

Recall how the relational operators (`<`, `<=`, `==`, `!=`, `>`, `>=`) treat `NA` values. Any relational operation that includes an `NA` returns an `NA`. Is `NA <= 0`? We don’t know because it depends on the unknown value of `NA`, so the answer is `NA`. The same argument applies to `NaN`. Asking whether `NaN <= 0` does not make sense because you can’t compare a number to “Not a Number”.

Now recall how indexing treats `NA` values. Indexing can take a logical vector as input, in which case it will include those elements where the logical vector is `TRUE`, and will not return those elements where the logical vector is `FALSE`. Logical vectors can also include `NA` values, and it is not clear how they should be treated. Since the value is `NA`, it could be `TRUE` or `FALSE`; we don’t know. Keeping elements with `NA` would treat the `NA` as `TRUE`, and dropping them would treat the `NA` as `FALSE`.

R resolves this by treating `NA` differently from both `TRUE` and `FALSE`: it includes an element wherever the indexing vector is `NA`, but sets that element’s value to `NA`.
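A minimal sketch of this rule in isolation. The vector `vals` is an illustrative name.

```{r}
vals <- c(10, 20, 30)

# TRUE keeps the element, FALSE drops it, NA yields an NA in the result
vals[c(TRUE, NA, FALSE)]  # 10 NA
```

The second element is neither kept as `20` nor dropped; it appears in the result as `NA`.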

Now consider the expression `x[-which(x > 0)]`. As before, to understand this expression we’ll work from the inside out. Consider `x > 0`.

```{r}
x > 0
```

As with `x <= 0`, it returns `NA` for comparisons involving `NA` and `NaN`.

What does `which()` do?

```{r}
which(x > 0)
```

The `which()` function returns the indexes for which the argument is `TRUE`. This means that it is not including the indexes for which the argument is `FALSE` or `NA`.

Now consider the full expression `x[-which(x > 0)]`. The `which()` function returned a vector of integers. How does indexing treat negative integers?

```{r}
x[1:2]
x[-(1:2)]
```

If indexing gets a vector of positive integers, it will select those indexes; if it receives a vector of negative integers, it will drop those indexes. Thus, `x[-which(x > 0)]` ends up dropping the elements for which `x > 0` is true, and keeps all the other elements and their original values, including `NA` and `NaN`.

There’s one other special case that we should consider. How do these two expressions work with an empty vector?

```{r}
x <- numeric()
x[x <= 0]
x[-which(x > 0)]
```

Thankfully, they both handle empty vectors the same.
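A related edge case, added here as a supplementary check, is a non-empty vector with no positive elements. In that case `which()` returns `integer(0)`, and negating an empty index still gives an empty index, so nothing is selected. The vector `all_nonpositive` is a hypothetical example.

```{r}
all_nonpositive <- c(-2, -1, 0)

which(all_nonpositive > 0)                    # integer(0)
all_nonpositive[-which(all_nonpositive > 0)]  # numeric(0): every element is lost
all_nonpositive[all_nonpositive <= 0]         # -2 -1 0
```

Here the two expressions disagree even though the vector contains no special values at all.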

This exercise is a reminder to always test your code. Even though these two expressions looked equivalent, they are not in practice. And when you do test code, consider both how it works on typical values as well as special values and edge cases, like a vector with `NA` or `NaN` or `Inf` values, or an empty vector. These are where unexpected behavior is most likely to occur.

### 6. What happens when you subset with a positive integer that’s bigger than the length of the vector? What happens when you subset with a name that doesn’t exist?

Let’s consider the named vector,

```{r}
x <- c(a = 10, b = 20)
```

If we subset it by an integer larger than its length, it returns a vector of missing values.

```{r}
x[3]
```

This also applies to ranges.

```{r}
x[3:5]
```

If some indexes are larger than the length of the vector, those elements are `NA`.

```{r}
x[1:5]
```

Likewise, when `[` is provided names not in the vector’s names, it will return `NA` for those elements.

```{r}
x["c"]
x[c("c", "d", "e")]
x[c("a", "b", "c")]
```

Though not yet discussed much in this lecture, `[[` behaves differently. With an atomic vector, if `[[` is given an index outside the range of the vector or an invalid name, it raises an error.

```{r, error = TRUE}
x[["c"]]
x[[5]]
```
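To inspect these errors without interrupting a script, one option is to catch them with `tryCatch()` and return the message as a string. This is a sketch, and the vector `pair` is an illustrative stand-in for the named vector used in this exercise.

```{r}
pair <- c(a = 10, b = 20)

# Return the error message as a string instead of stopping
msg_name  <- tryCatch(pair[["c"]], error = function(e) conditionMessage(e))
msg_index <- tryCatch(pair[[5]],   error = function(e) conditionMessage(e))

msg_name
msg_index
```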
