---
title: "Week 2 Part1 (Object Types)"
author: "Raul Roces"
output: html_notebook
---

In this class, you will learn how to:

* Save new types of data, like character strings and logical values
* Save a data set as a vector, matrix, array, list, or data frame
* Load and save your own data sets with R
* Extract individual values from a data set
* Change individual values within a data set

# Atomic Vectors
We create an atomic vector, die, that stores 6 elements.
```{r}
die <- c(1, 2, 3, 4, 5, 6)
die
## 1 2 3 4 5 6

```


Is it a vector?
```{r}
is.vector(die)
```
Yes, it is a vector.

We create an atomic vector that stores 5 (one element).
```{r}
five <- 5
five
```


Is object five a vector?
```{r}
is.vector(five)
```
Yes, five is a vector with just one element.

The length function gets or sets the length of vectors (including lists) and factors, and of any other R object for which a method has been defined. In simple terms, length returns the number of elements in an atomic vector.
```{r}
length(five)
length(die)
```
Vector five has one element while vector die has 6 elements.

Each atomic vector stores its values as a one-dimensional vector, and each atomic vector can only store one type of data. R recognizes six basic types of atomic vectors: doubles, integers, characters, logicals, complex, and raw.
```{r}
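int <- 1L        # integer: the L suffix marks a whole number
text <- "ace"    # character string
do_uble <- 30    # double: each value uses 64 bits of storage
logic <- TRUE    # logical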
```
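
As a quick check of the types described above, typeof can be applied to each kind of value. This is a minimal sketch; the `ex_` names are made up for illustration so that nothing defined in the lesson is overwritten.

```{r}
# Hypothetical example objects, one per common atomic type
ex_int <- 1L
ex_text <- "ace"
ex_dbl <- 30
ex_lgl <- TRUE

# typeof reports how R stores each value
ex_basic_types <- c(
  int  = typeof(ex_int),
  text = typeof(ex_text),
  dbl  = typeof(ex_dbl),
  lgl  = typeof(ex_lgl)
)
ex_basic_types
```

Note that 30 without the L suffix is stored as a double, even though it looks like a whole number.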


Floating-point errors arise because each double is accurate only to about 16 significant digits. This introduces a small amount of rounding error. In most cases, this rounding error will go unnoticed. However, in some
situations, the rounding error can cause surprising results. For example, you may expect
the result of the expression below to be zero, but it is not:
```{r}
sqrt(2)^2 - 2
```
The problem is that the square root of 2 cannot be represented exactly, so R stores a rounded approximation of it. Squaring that approximation gives a number that is very close to 2 but not exactly 2, and the subtraction exposes the small difference.
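
One common way to deal with this rounding error is to compare doubles with a tolerance instead of with `==`. The sketch below uses base R's all.equal for that purpose; the `fp_` names are made up for illustration.

```{r}
fp_result <- sqrt(2)^2

# Exact comparison fails because of the rounding error
fp_exact <- fp_result == 2

# all.equal compares with a small tolerance; wrap it in isTRUE
# because it returns a description (not FALSE) when values differ
fp_near <- isTRUE(all.equal(fp_result, 2))

c(exact = fp_exact, near = fp_near)
```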

The two remaining types, complex and raw, are created like this:
```{r}
comp <- c(1 + 1i, 1 + 2i, 1 + 3i)
comp
r_raw <- raw(3)   # a raw vector of three empty bytes
r_raw             # print the object (typing raw alone prints the function)
## 00 00 00
```


# Attributes
The most common attributes to give an atomic vector are names, dimensions (dim),
and classes. Notice how object die has no names after we created the object.
```{r}
names(die)
```


We assign names to the elements.
```{r}
names(die) <- c("one", "two", "three", "four", "five", "six")
names(die)
```


Let's check the result with the attributes function.
```{r}
attributes(die)
```


Names do not affect the values.
```{r}
names(die) <- c("uno", "dos", "tres", "quatro", "cinco", "seis")
die
```
We can also remove names.

```{r}
names(die) <- NULL
```
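
To illustrate that names are only labels, the sketch below does arithmetic on a named vector and then strips the names. The `nm_` objects are hypothetical and separate from die.

```{r}
nm_die <- c(one = 1, two = 2, three = 3)

nm_total <- sum(nm_die)      # names do not change the values being summed
nm_plus <- nm_die + 1        # arithmetic keeps the names attached
nm_bare <- unname(nm_die)    # same values with the names removed

nm_plus
nm_bare
```

Here unname returns a copy without names, whereas assigning NULL to names(x) changes the object itself.
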
# Creating n-Dimensional Structures


A plain vector is one-dimensional. A matrix is a two-dimensional array, so in R a two-dimensional array and a matrix are the same thing.
You can turn an atomic vector into a matrix, or into an array with three or more dimensions, by setting its dim attribute.


For example, you can reorganize die into a 2 × 3 matrix.
```{r}
dim(die) <- c(2, 3)
die
```


R will always use the first value in dim for the number of rows and the second value for
the number of columns. In general, rows always come first in R operations that deal
with both rows and columns.
```{r}
dim(die) <- c(3, 2)
die
```


Notice that, by default, R fills each matrix column by column. Giving dim three values turns the same vector into a three-dimensional array:
```{r}
# Reshape die into a 1 x 2 x 3 array (a "hypercube")
dim(die) <- c(1, 2, 3)
class(die)

```
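
One detail worth checking when setting dim is that the product of the dimensions has to equal the length of the vector. The following sketch uses a hypothetical vector, `ex_shape`, and tryCatch to show a failing and a working assignment.

```{r}
ex_shape <- 1:6

# 4 * 2 = 8 does not match the 6 elements, so this assignment errors
ex_bad_ok <- tryCatch(
  {
    dim(ex_shape) <- c(4, 2)
    TRUE
  },
  error = function(e) FALSE
)

# 3 * 2 = 6 matches, so this assignment succeeds
dim(ex_shape) <- c(3, 2)

ex_bad_ok
ex_shape
```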


If you’d like more control over how the data is stored, you can use one of R's helper functions, matrix or array.
They do the same thing as changing the dim attribute, but they provide extra arguments to customize the process.

# Matrix Function
```{r}
m <- matrix(die, nrow = 2)
m
```

If you specify byrow = TRUE, R fills the matrix row by row instead:
```{r}
m <- matrix(die, nrow = 2, byrow = TRUE)
m
```
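
The two fill orders can be compared side by side. This sketch uses hypothetical matrices built from 1:6 rather than die; it also shows that filling by row gives the same values as filling a 3 × 2 matrix by column and transposing it with t.

```{r}
fill_bycol <- matrix(1:6, nrow = 2)                 # default: column by column
fill_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)   # row by row

# Transposing the column-filled 3 x 2 matrix gives the row-filled 2 x 3 one
fill_match <- all(fill_byrow == t(matrix(1:6, nrow = 3)))

fill_bycol
fill_byrow
fill_match
```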


# Array Function
The array function creates an n-dimensional array.
```{r}
ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
ar
```
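
To see how the three dimensions are addressed, the sketch below rebuilds the same array under a hypothetical name, `ex_ar`, and picks out one value and one slice. The index order is row, column, slice.

```{r}
ex_ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))

ex_ar_dim <- dim(ex_ar)         # 2 rows, 2 columns, 3 slices
ex_ar_value <- ex_ar[1, 2, 3]   # row 1, column 2 of the third slice
ex_ar_slice <- ex_ar[, , 2]     # the whole second slice, a 2 x 2 matrix

ex_ar_value
ex_ar_slice
```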

Notice that changing the dimensions of your object will not change the type of the object,
but it will change the object’s class attribute:
```{r}
dim(die) <- c(2, 3)
typeof(die)
class(die)
```

Note that an object’s class attribute will not always appear when you run attributes;
you may need to specifically search for it with class:
```{r}
attributes(die)
```

You can apply class to objects that do not have a class attribute. class will return a
value based on the object’s atomic type. Notice that the “class” of a double is “numeric,”
an odd deviation, but one I am thankful for. I think that the most important property
of a double vector is that it contains numbers, a property that “numeric” makes obvious:
```{r}
class("Hello")
class(5)
```
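
The difference between typeof and class can be tabulated for a few values. This is an illustrative sketch; `ex_values` and `ex_kinds` are made-up names, and only the first class is kept for objects that report more than one.

```{r}
ex_values <- list(5, 5L, "Hello", TRUE, matrix(1:4, nrow = 2))

ex_kinds <- data.frame(
  type  = sapply(ex_values, typeof),
  class = sapply(ex_values, function(v) class(v)[1]),
  stringsAsFactors = FALSE
)
ex_kinds
```

Only the double shows a class name ("numeric") that differs from its type, and the matrix reports a class based on its dimensions rather than its type.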

```{r}
now <- Sys.time()
now
typeof(now)
class(now)
```
POSIXct is a framework for representing dates and times. A time is stored as the number of seconds that have passed between that time and 12:00 AM on January 1st, 1970, in Coordinated Universal Time (UTC).
You can see this number by removing the class attribute of now, or by using the unclass
function, which does the same thing:
```{r}
unclass(now)
```
R then gives the double vector a class attribute that contains two classes, "POSIXct"
and "POSIXt". This attribute alerts R functions that they are dealing with a POSIXct
time, so they can treat it in a special way. For example, R functions will use the POSIXct
standard to convert the time into a user-friendly character string before displaying it.
You can take advantage of this system by giving the POSIXct class to random R objects.
For example, have you ever wondered what day it was a million seconds after 12:00 a.m.
Jan. 1, 1970?
```{r}
mil <- 1000000
mil
class(mil) <- c("POSIXct", "POSIXt")
mil
```
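
Setting the class by hand works, but the printed result depends on the local time zone. As an alternative sketch, as.POSIXct can build the same time with an explicit origin and time zone; the `ex_time` names are made up for illustration.

```{r}
# One million seconds after the start of 1970, interpreted in UTC
ex_time <- as.POSIXct(1e6, origin = "1970-01-01", tz = "UTC")

# Format explicitly in UTC so the result does not depend on the local zone
ex_time_label <- format(ex_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
ex_time_label

# The underlying double is still the number of seconds
ex_time_secs <- as.numeric(ex_time)
ex_time_secs
```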

# Factors
In the vector below, two values, male and female, are repeated. Stored as a factor, gender has two levels.
```{r}
gender <- factor(c("male", "female", "female", "male"))
typeof(gender)
attributes(gender)
```


```{r}
unclass(gender)
```
When R prints the factor, it displays the labels and lists the two levels, female and male:
```{r}
gender
```

The as.character function converts the factor to a character vector:
```{r}
gender2 <- as.character(gender)
gender2

```
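
Because a factor stores integer codes, converting one straight to a number returns the codes rather than the labels. The sketch below shows this with a hypothetical factor, `ex_size`, whose labels look like numbers.

```{r}
ex_size <- factor(c("10", "5", "10"))

# Levels are sorted as text, so "10" comes before "5"
ex_size_levels <- levels(ex_size)

# as.integer returns the internal codes, not the labels
ex_size_codes <- as.integer(ex_size)

# Going through as.character first recovers the labelled values
ex_size_numbers <- as.numeric(as.character(ex_size))

ex_size_codes
ex_size_numbers
```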


# Coercion
When logicals are used in arithmetic, R treats TRUE as 1 and FALSE as 0:
```{r}
sum(c(TRUE, TRUE, FALSE, FALSE))
#will become:
sum(c(1, 1, 0, 0))
```


The number 1 becomes "1" as a character string and TRUE as a logical. Likewise, the number 0 becomes FALSE as a logical, and FALSE becomes 0 as a numeric.
```{r}
as.character(1)
## "1"
as.logical(1)
## TRUE
as.numeric(FALSE)
## 0
```
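
The same coercion happens automatically when values of different types are combined into one atomic vector. The following is a small sketch with made-up `mix_` names.

```{r}
# A character string forces every element to become a character string
mix_chr <- c(1, "a", TRUE)

# Without strings, logicals are converted to numbers
mix_num <- c(1, TRUE, FALSE)

# Because TRUE counts as 1, mean gives the proportion of TRUE values
mix_share <- mean(c(TRUE, TRUE, FALSE, FALSE))

mix_chr
mix_num
mix_share
```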

# Lists
Lists do not group together individual values; they group together R objects. Lists are used as building blocks to create many more sophisticated types of R objects.
```{r}
list1 <- list(100:130, "R", list(TRUE, FALSE))
list1
```
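
To confirm that each list element keeps its own type and length, the sketch below inspects a copy of the same list under a hypothetical name, `ex_list`.

```{r}
ex_list <- list(100:130, "R", list(TRUE, FALSE))

ex_list_len <- length(ex_list)             # number of top-level elements
ex_list_types <- sapply(ex_list, typeof)   # type of each element
ex_list_first <- length(ex_list[[1]])      # length of the integer vector

ex_list_len
ex_list_types
ex_list_first
```

The list has three elements even though the first one alone holds 31 values and the third is itself a list.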

# Data Frames
Data frames are the two-dimensional version of a list. They are far and away the most
useful storage structure for data analysis, and they provide an ideal way to store an entire
deck of cards. You can think of a data frame as R’s equivalent to the Excel spreadsheet
because it stores data in a similar format.
```{r}
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
df
```
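All columns of a data frame must have the same length. The stringsAsFactors argument controls whether character columns are stored as character strings (FALSE) or converted to factors (TRUE):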
```{r}
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
stringsAsFactors = FALSE)
```
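
The length requirement can be checked directly. In this sketch, with made-up `ex_cards` names, a column of the wrong length makes data.frame fail, while a single value is recycled to fill every row.

```{r}
# 3 faces but only 2 values: data.frame raises an error
ex_cards_ok <- tryCatch(
  {
    data.frame(face = c("ace", "two", "six"), value = c(1, 2))
    TRUE
  },
  error = function(e) FALSE
)

# A length-one column is repeated for each row
ex_cards <- data.frame(face = c("ace", "two", "six"), suit = "clubs",
                       stringsAsFactors = FALSE)

ex_cards_ok
ex_cards
```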


```{r}
typeof(df)
class(df)
str(df)
```



```{r}
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
stringsAsFactors = FALSE)
df
```