In this class, you will learn how to:
We create an atomic vector, die, that stores six elements.
die <- c(1, 2, 3, 4, 5, 6)
die
## [1] 1 2 3 4 5 6
The output shows the numbers 1 through 6.
Is it a vector?
is.vector(die)
## [1] TRUE
This confirms that die is indeed a vector.
We create an atomic vector, five, that stores a single element: 5.
five <- 5
five
## [1] 5
Assigning the single value 5 to five also produces a vector. R has no separate scalar type, so a single value is simply a vector of length one.
Is object five a vector?
is.vector(five)
## [1] TRUE
It is.
The function length gets or sets the length of vectors (including lists) and factors, and of any other R object for which a method has been defined. In simple terms, length returns the number of elements in an atomic vector.
length(five)
## [1] 1
length(die)
## [1] 6
The length of five is 1, while the length of die is 6.
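Because length can also set a length, a short sketch of the "sets" half may help. The vector x is an illustrative name, not from the original notes: growing a vector pads it with NA, and shrinking it drops trailing elements.

```r
x <- c(1, 2, 3)
length(x)          # 3
length(x) <- 5     # growing pads the new slots with NA
x                  # 1 2 3 NA NA
length(x) <- 2     # shrinking drops the trailing elements
x                  # 1 2
```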
Each atomic vector stores its values as a one-dimensional vector, and each atomic vector can only store one type of data. R recognizes six basic types of atomic vectors: doubles, integers, characters, logicals, complex, and raw.
int <- 1L
text <- "ace"
do_uble <- 30 # doubles are stored in 64 bits
logic <- TRUE
This creates four atomic vectors: an integer (note the L suffix), a character string, a double, and a logical.
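To confirm which atomic type R chose for each of the four vectors, typeof can be called on each one. This is a minimal check that reuses the same assignments; note that 30 written without the L suffix is stored as a double, not an integer.

```r
int <- 1L
text <- "ace"
do_uble <- 30
logic <- TRUE
# typeof() reports the atomic type R uses to store each vector
typeof(int)      # "integer"
typeof(text)     # "character"
typeof(do_uble)  # "double"
typeof(logic)    # "logical"
```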
Floating-point errors arise because each double is accurate to only about 16 significant digits, which introduces a small amount of error. In most cases, this rounding error will go unnoticed. However, in some situations, the rounding error can cause surprising results. For example, you may expect the result of the expression below to be zero, but it is not:
sqrt(2)^2 - 2
## [1] 4.440892e-16
The result is approximately 4.44e-16 rather than exactly zero.
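A practical consequence is that exact equality tests on doubles can fail. As a sketch (x is an illustrative name), all.equal compares values within a small tolerance, and an explicit tolerance check works as well; the 1e-9 threshold below is an arbitrary choice for this example.

```r
x <- sqrt(2)^2
x == 2                    # FALSE: exact comparison sees the rounding error
isTRUE(all.equal(x, 2))   # TRUE: all.equal() allows a small tolerance
abs(x - 2) < 1e-9         # TRUE: an explicit tolerance works too
```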
The two remaining types are complex and raw:
comp <- c(1 + 1i, 1 + 2i, 1 + 3i)
comp
## [1] 1+1i 1+2i 1+3i
r_raw <- raw(3)
r_raw
## [1] 00 00 00
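A short sketch to inspect these two rarer types: typeof confirms the storage type, Re and Im pull apart the complex numbers, and as.raw shows that raw vectors hold bytes printed in hexadecimal.

```r
comp <- c(1 + 1i, 1 + 2i, 1 + 3i)
r_raw <- raw(3)
typeof(comp)     # "complex"
typeof(r_raw)    # "raw"
Re(comp)         # real parts: 1 1 1
Im(comp)         # imaginary parts: 1 2 3
as.raw(255)      # the byte 255, printed as ff
```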
#Attributes
The most common attributes to give an atomic vector are names, dimensions (dim), and classes. Notice that die has no names when it is first created.
names(die)
## NULL
NULL means that die has no names yet.
We assign names to the elements.
names(die) <- c("one", "two", "three", "four", "five", "six")
names(die)
## [1] "one" "two" "three" "four" "five" "six"
The names are assigned, and names(die) confirms that they are stored.
Let’s check again with the attributes function.
attributes(die)
## $names
## [1] "one" "two" "three" "four" "five" "six"
The names can also be confirmed with the attributes function.
Names do not affect the values.
names(die) <- c("uno", "dos", "tres", "quatro", "cinco", "seis")
die
## uno dos tres quatro cinco seis
## 1 2 3 4 5 6
The names are reassigned to Spanish words, and the values are not affected.
We can also remove names.
names(die) <- NULL
Assigning NULL to names(die) removes the names.
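The whole life cycle of the names attribute fits in a few lines. In this sketch, third and plus_one are illustrative names: names allow lookup by label, arithmetic keeps the names while changing the values, and assigning NULL removes them.

```r
die <- c(1, 2, 3, 4, 5, 6)
names(die) <- c("one", "two", "three", "four", "five", "six")
third <- die[["three"]]   # look up an element by its name -> 3
plus_one <- die + 1       # values change, names are carried along
names(die) <- NULL        # drop the names again
names(die)                # NULL
```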
A vector is a one-dimensional array, and a matrix is a two-dimensional array; in R, a two-dimensional array is the same thing as a matrix. Modifying the dim attribute of an atomic vector turns it into either a matrix or an array with three or more dimensions.
For example you can reorganize die into a 2 × 3 matrix.
dim(die) <- c(2, 3)
dim(die)
## [1] 2 3
The dim attribute is now 2 3, so die has been reorganized into a 2 × 3 matrix.
R will always use the first value in dim for the number of rows and the second value for the number of columns. In general, rows always come first in R operations that deal with both rows and columns.
dim(die) <- c(3, 2)
die is now a 3 × 2 matrix.
Notice that, by default, R fills a matrix column by column.
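Printing the 3 × 2 version of die makes the column-major fill visible. This sketch rebuilds die from scratch so it runs on its own.

```r
die <- c(1, 2, 3, 4, 5, 6)
dim(die) <- c(3, 2)
die          # column 1 holds 1, 2, 3; column 2 holds 4, 5, 6
die[, 1]     # 1 2 3
die[, 2]     # 4 5 6
```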
#hypercube
dim(die) <- c(1, 2, 3)
class(die)
## [1] "array"
With three dimensions, the class of die is now "array".
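A hypercube is indexed with one subscript per dimension. In this sketch, die is rebuilt and given dimensions 1, 2, 3; the number of elements stays at six because dim only changes how the values are laid out.

```r
die <- c(1, 2, 3, 4, 5, 6)
dim(die) <- c(1, 2, 3)   # 1 row, 2 columns, 3 layers
dim(die)                 # 1 2 3
length(die)              # still 6
die[1, 2, 3]             # row 1, column 2, layer 3 -> 6
```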
If you’d like more control over how the data is stored, you can use one of R’s helper functions, matrix or array. They do the same thing as changing the dim attribute, but they provide extra arguments to customize the process.
#Matrix Function
m <- matrix(die, nrow = 2)
m
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
A matrix with 2 rows and 3 columns is created, filled column by column.
m <- matrix(die, nrow = 2, byrow = TRUE)
m
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
With byrow = TRUE, the matrix is filled row by row, so 1, 2, 3 form the first row and 4, 5, 6 form the second.
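One way to see what byrow does: filling row by row into a 2 × 3 matrix gives the same result as filling column by column into a 3 × 2 matrix and then transposing with t. The names by_row and by_col_t are illustrative.

```r
die <- c(1, 2, 3, 4, 5, 6)
by_row <- matrix(die, nrow = 2, byrow = TRUE)
by_col_t <- t(matrix(die, nrow = 3))   # fill by column into 3 x 2, then transpose
identical(by_row, by_col_t)            # TRUE
```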
The array function creates an n-dimensional array.
ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
ar
## , , 1
##
## [,1] [,2]
## [1,] 11 13
## [2,] 12 14
##
## , , 2
##
## [,1] [,2]
## [1,] 21 23
## [2,] 22 24
##
## , , 3
##
## [,1] [,2]
## [1,] 31 33
## [2,] 32 34
An array with 2 rows, 2 columns, and 3 layers is created; each layer is a 2 × 2 matrix.
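Elements of ar are selected with three subscripts (row, column, layer), and leaving the first two blank selects a whole layer. A brief sketch using the same array:

```r
ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
dim(ar)        # 2 2 3
ar[1, 2, 3]    # row 1, column 2, layer 3 -> 33
ar[, , 2]      # the whole second layer, returned as a 2 x 2 matrix
```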
Notice that changing the dimensions of your object will not change the type of the object, but it will change the object’s class attribute:
dim(die) <- c(2, 3)
typeof(die)
## [1] "double"
class(die)
## [1] "matrix" "array"
The type is still double, but the class is now c("matrix", "array").
Note that an object’s class attribute will not always appear when you run attributes; you may need to ask for it specifically with class. For example, attributes(die) lists only the dimensions:
attributes(die)
## $dim
## [1] 2 3
Only the dim attribute (2, 3) is listed; the matrix class is implied by dim, so it shows up only through class.
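The difference between a stored class attribute and an implied class can be checked directly. In this sketch, attr(die, "class") looks for a stored attribute and finds none, while class still reports the implicit class derived from dim. The two-element result c("matrix", "array") assumes R 4.0.0 or later; older versions report only "matrix".

```r
die <- c(1, 2, 3, 4, 5, 6)
dim(die) <- c(2, 3)
attr(die, "class")   # NULL: no class attribute is stored
class(die)           # "matrix" "array": implied by the dim attribute
```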
You can apply class to objects that do not have a class attribute. class will return a value based on the object’s atomic type. Notice that the “class” of a double is “numeric,” an odd deviation, but one I am thankful for. I think that the most important property of a double vector is that it contains numbers, a property that “numeric” makes obvious:
class("Hello")
## [1] "character"
class(5)
## [1] "numeric"
The class of "Hello" is character, and the class of 5 is numeric.
now <- Sys.time()
now
## [1] "2022-11-29 19:46:29 EST"
typeof(now)
## [1] "double"
class(now)
## [1] "POSIXct" "POSIXt"
Built-in system functions such as Sys.time let you interact with the computer system. Sys.time returns the current date and time; its type is double and its class is POSIXct (together with POSIXt).
POSIXct is a framework for representing dates and times. Time is represented by the number of seconds that have passed between 12:00 AM January 1st 1970 (in the Coordinated Universal Time (UTC) zone) and now. You can see this number by removing the class attribute of now, or by using the unclass function, which does the same thing:
unclass(now)
## [1] 1669769190
Unclassing now returns a large double: the number of seconds elapsed since 12:00 AM January 1st 1970 UTC.
R then gives the double vector a class attribute that contains two classes, “POSIXct” and “POSIXt”. This attribute alerts R functions that they are dealing with a POSIXct time, so they can treat it in a special way. For example, R functions will use the POSIXct standard to convert the time into a user-friendly character string before displaying it. You can take advantage of this system by giving the POSIXct class to random R objects. For example, have you ever wondered what day it was a million seconds after 12:00 a.m. Jan. 1, 1970?
mil <- 1000000
mil
## [1] 1e+06
class(mil) <- c("POSIXct", "POSIXt")
mil
## [1] "1970-01-12 08:46:40 EST"
Giving one million the POSIXct class turns it into a date-time: one million seconds after the start of 1970 falls on January 12, 1970 (shown here in Eastern time).
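The printed time above depends on the computer's local time zone. As a sketch, the same one million seconds can be attached to the POSIXct class with structure and then formatted explicitly in UTC, which makes the result independent of where the code runs; as.numeric shows the underlying seconds are unchanged.

```r
mil <- structure(1e6, class = c("POSIXct", "POSIXt"))
as.numeric(mil)                        # 1e+06: the seconds are still there
format(mil, tz = "UTC", usetz = TRUE)  # "1970-01-12 13:46:40 UTC"
```

The UTC time is five hours ahead of the EST output shown above, as expected.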
#Factors
gender <- factor(c("male", "female", "female", "male"))
typeof(gender)
## [1] "integer"
attributes(gender)
## $levels
## [1] "female" "male"
##
## $class
## [1] "factor"
The type of gender is integer. Its attributes are levels ("female" and "male") and class ("factor").
unclass(gender)
## [1] 2 1 1 2
## attr(,"levels")
## [1] "female" "male"
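Unclassing gender reveals how the factor is stored: the integer codes 2 1 1 2 plus a levels attribute, where code 1 means "female" and code 2 means "male".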
gender
## [1] male female female male
## Levels: female male
Printed normally, gender shows the labels male female female male, followed by its two levels, female and male.
as.character(gender)
## [1] "male" "female" "female" "male"
as.character converts the factor into a character vector, so the values now print in quotation marks.
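Because a factor is stored as integer codes, converting it straight to a number returns the codes, not the labels. This sketch (f and g are illustrative names) shows the common workaround of going through as.character first, and how the levels argument of factor controls which label gets which code.

```r
f <- factor(c("10", "20", "10"))
as.integer(f)                  # 1 2 1: the internal codes
as.numeric(as.character(f))    # 10 20 10: the values the labels spell out
g <- factor(c("male", "female", "female", "male"), levels = c("male", "female"))
as.integer(g)                  # 1 2 2 1: codes follow the order given in levels
```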
#Coercion
sum(c(TRUE, TRUE, FALSE, FALSE))
## [1] 2
#will become:
sum(c(1, 1, 0, 0))
## [1] 2
Because TRUE is coerced to 1 and FALSE to 0, both sums equal 2.
as.character(1)
## [1] "1"
as.logical(1)
## [1] TRUE
as.numeric(FALSE)
## [1] 0
These functions convert explicitly between types: 1 becomes the string "1" or the logical TRUE, and FALSE becomes the number 0.
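Coercion also happens implicitly when c combines values of different types: R converts everything to the most flexible type present (logical, then integer, then double, then character). A small sketch with illustrative names mixed and nums:

```r
mixed <- c(1, "a", TRUE)
typeof(mixed)   # "character": everything becomes a string
mixed           # "1" "a" "TRUE"
nums <- c(1, TRUE, FALSE)
typeof(nums)    # "double": TRUE -> 1, FALSE -> 0
```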
#Lists
Lists do not group together individual values; they group together R objects. They are used as building blocks to create many more sophisticated types of R objects.
list1 <- list(100:130, "R", list(TRUE, FALSE))
list1
## [[1]]
## [1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
## [20] 119 120 121 122 123 124 125 126 127 128 129 130
##
## [[2]]
## [1] "R"
##
## [[3]]
## [[3]][[1]]
## [1] TRUE
##
## [[3]][[2]]
## [1] FALSE
list1 has three elements: the integers 100 through 130, the string "R", and a nested list that holds TRUE and FALSE.
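Elements of a list are retrieved with double brackets, which return the element itself, while single brackets return a smaller list. A sketch using the same list1, including a step into the nested list:

```r
list1 <- list(100:130, "R", list(TRUE, FALSE))
length(list1)        # 3 top-level elements
length(list1[[1]])   # 31 integers, 100 through 130
list1[[2]]           # "R": double brackets return the element itself
list1[2]             # a list of length 1 that contains "R"
list1[[3]][[2]]      # FALSE: index into the nested list
```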
#Data Frames
Data frames are the two-dimensional version of a list. They are far and away the most useful storage structure for data analysis, and they provide an ideal way to store an entire deck of cards. You can think of a data frame as R’s equivalent to the Excel spreadsheet because it stores data in a similar format.
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
df
The data frame df has three columns: face, suit, and value.
Data frames cannot combine columns of different lengths.
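To see this rule in action, the sketch below tries to build a data frame from columns of length 3 and 2 and catches the resulting error with tryCatch; the column names a and b and the variable result are illustrative.

```r
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),            # 3 rows vs 2 rows
  error = function(e) conditionMessage(e)  # return the error text instead of stopping
)
result   # a message about differing numbers of rows
```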
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
stringsAsFactors = FALSE)
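The call is the same as before except for stringsAsFactors = FALSE, which keeps the string columns as character vectors instead of converting them to factors. Since R 4.0.0, FALSE is the default.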
typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"
str(df)
## 'data.frame': 3 obs. of 3 variables:
## $ face : chr "ace" "two" "six"
## $ suit : chr "clubs" "clubs" "clubs"
## $ value: num 1 2 3
The type of df is list and its class is data.frame. str shows that face and suit are character columns and value is numeric.
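Since a data frame is a list of equal-length columns, columns can be pulled out with $, and single cells can be selected with [row, column] as in a matrix. A short sketch on the same df:

```r
df <- data.frame(face = c("ace", "two", "six"),
                 suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
                 stringsAsFactors = FALSE)
df$face             # one column, as a character vector
df[2, "value"]      # row 2 of the value column -> 2
nrow(df)            # 3
names(df)           # "face" "suit" "value"
```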