Activity 11

Michael Marx

In this class, you will learn how to:

  • Save new types of data, like character strings and logical values
  • Save a data set as a vector, matrix, array, list, or data frame
  • Load and save your own data sets with R
  • Extract individual values from a data set
  • Change individual values within a data set

Atomic Vectors

We create an atomic vector die that stores 6 elements.

die <- c(1, 2, 3, 4, 5, 6)
die
## [1] 1 2 3 4 5 6

output shows numbers 1 through 6

Is it a vector?

is.vector(die)
## [1] TRUE

-> confirms that die is indeed a vector

We create an atomic vector that stores 5 (one element).

five <- 5
five
## [1] 5

the simple assignment of 5 to variable five also results in a vector -> R stores even a single value as an atomic vector of length one

Is object five a vector?

is.vector(five)
## [1] TRUE

it is!

Function length gets or sets the length of vectors (including lists) and factors, and of any other R object for which a method has been defined. In simple terms, length returns the length of an atomic vector.

length(five)
## [1] 1
length(die)
## [1] 6

while the length of ‘five’ is 1, the length of ‘die’ is 6

Each atomic vector stores its values as a one-dimensional vector, and each atomic vector can only store one type of data. R recognizes six basic types of atomic vectors: doubles, integers, characters, logicals, complex, and raw.

int <- 1L
text <- "ace"
do_uble <- 30 #64 bits to store
logic <- TRUE

-> assigning 4 vectors and initializing them
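
To confirm which of the six basic types each of these vectors uses, typeof can be called on each one. This is a minimal sketch that recreates the four vectors above; the name types is only an illustrative helper and not part of the activity.

```r
# Recreate the four example vectors from the activity
int <- 1L
text <- "ace"
do_uble <- 30
logic <- TRUE

# typeof() reports the underlying storage type of each vector
# ('types' is a hypothetical helper name used for this sketch)
types <- c(typeof(int), typeof(text), typeof(do_uble), typeof(logic))
types
## [1] "integer"   "character" "double"    "logical"
```

Note that 30 without the L suffix is stored as a double, while 1L is stored as an integer.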

Floating-point errors arise because each double is accurate to only about 16 significant digits. This introduces a little bit of error. In most cases, this rounding error will go unnoticed. However, in some situations, the rounding error can cause surprising results. For example, you may expect the result of the expression below to be zero, but it is not:

sqrt(2)^2 - 2
## [1] 4.440892e-16

the output shows the calculation to equal approximately 4.44e-16
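
One common way to deal with this rounding error is to compare doubles within a small tolerance instead of testing for exact equality. The sketch below uses all.equal from base R for that purpose; the names x, exact, and near are illustrative assumptions.

```r
x <- sqrt(2)^2

# Exact comparison fails because of the rounding error
exact <- x == 2
exact
## [1] FALSE

# all.equal() compares within a small tolerance (about 1.5e-8 by default);
# wrapping it in isTRUE() gives a plain TRUE/FALSE answer
near <- isTRUE(all.equal(x, 2))
near
## [1] TRUE
```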

Other types

comp <- c(1 + 1i, 1 + 2i, 1 + 3i)
comp
## [1] 1+1i 1+2i 1+3i
r_raw <- raw(3)
r_raw
## [1] 00 00 00

Attributes

The most common attributes to give an atomic vector are names, dimensions (dim), and classes. Notice how object die has no names after we created the object.

names(die)
## NULL

null -> there are no names

We assign names to the elements.

names(die) <- c("one", "two", "three", "four", "five", "six")
names(die)
## [1] "one"   "two"   "three" "four"  "five"  "six"

names are assigned and it is confirmed that they are stored

Let’s recheck the attributes function.

attributes(die)
## $names
## [1] "one"   "two"   "three" "four"  "five"  "six"

names can also be confirmed with the attributes function

Names do not affect the values.

names(die) <- c("uno", "dos", "tres", "quatro", "cinco", "seis")
die
##    uno    dos   tres quatro  cinco   seis 
##      1      2      3      4      5      6

names are reassigned to Spanish -> values are not affected

We can also remove names.

names(die) <- NULL

names are removed by assigning NULL to them

Creating n dimensional Structures

You can think of a vector as a one-dimensional array and a matrix as a two-dimensional array. Modifying the dim attribute of an atomic vector turns it into either a matrix or an array with three or more dimensions.

For example, you can reorganize die into a 2 × 3 matrix.

print(dim(die) <- c(2, 3))
## [1] 2 3

shows output of 2 and 3 -> reorganized to 2 x 3 matrix

R will always use the first value in dim for the number of rows and the second value for the number of columns. In general, rows always come first in R operations that deal with both rows and columns.

dim(die) <- c(3, 2)

-> 3 x 2 matrix

Notice how by default R fills up each matrix by columns.
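
The column-wise fill can be seen by printing a small example. This sketch uses a separate vector d (a hypothetical copy of die, so the object from the activity is left untouched) and reshapes it into a 3 × 2 matrix.

```r
# 'd' is an illustrative copy of die, not an object from the activity
d <- c(1, 2, 3, 4, 5, 6)
dim(d) <- c(3, 2)   # 3 rows, 2 columns
d
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

# The first column is filled completely before the second one starts,
# so the element in row 1, column 2 is the fourth value
d[1, 2]
## [1] 4
```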

#hypercube
dim(die) <- c(1, 2, 3)
class(die)
## [1] "array"

now, the class of die is an array

If you’d like more control over how the data is stored, you can use one of R’s helper functions, matrix or array. They do the same thing as changing the dim attribute, but they provide extra arguments to customize the process.

Matrix Function

m <- matrix(die, nrow = 2)
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

matrix with 2 rows and 3 columns is created

m <- matrix(die, nrow = 2, byrow = TRUE)
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

matrix is now organized by row, therefore, 1 2 3 is the first row and 4 5 6 is the second row

Array Function

The array function creates an n-dimensional array.

ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
ar
## , , 1
## 
##      [,1] [,2]
## [1,]   11   13
## [2,]   12   14
## 
## , , 2
## 
##      [,1] [,2]
## [1,]   21   23
## [2,]   22   24
## 
## , , 3
## 
##      [,1] [,2]
## [1,]   31   33
## [2,]   32   34

an array with 2 rows, 2 columns, and 3 slices is created
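
Since the dimensions are ordered as rows, columns, slices, a single value can be picked out with three indices. The following is a small sketch on the same array; it only uses standard bracket indexing.

```r
ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))

# dim() returns rows, columns, slices
dim(ar)
## [1] 2 2 3

# Row 1, column 2 of the third slice
ar[1, 2, 3]
## [1] 33

# Leaving the first two indices empty returns a whole slice as a matrix
ar[, , 2]
##      [,1] [,2]
## [1,]   21   23
## [2,]   22   24
```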

Notice that changing the dimensions of your object will not change the type of the object, but it will change the object’s class attribute:

dim(die) <- c(2, 3)
typeof(die)
## [1] "double"
class(die)
## [1] "matrix" "array"

the type is double and class is matrix, array

Note that an object’s class attribute will not always appear when you run attributes; you may need to specifically search for it with class:

attributes(die)
## $dim
## [1] 2 3

the attributes displayed are the dimensions 2 and 3
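
The difference between the stored attributes and what class reports can be checked directly. This is a minimal sketch on a fresh matrix m (an illustrative object, separate from die); has_class_attr is a hypothetical helper name.

```r
# 'm' is an illustrative matrix, not the die object from the activity
m <- matrix(1:6, nrow = 2)

# No class attribute is actually stored on the object ...
has_class_attr <- !is.null(attr(m, "class"))
has_class_attr
## [1] FALSE

# ... but class() and inherits() still treat it as a matrix,
# because the class is derived from the dim attribute
inherits(m, "matrix")
## [1] TRUE
is.matrix(m)
## [1] TRUE
```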

You can apply class to objects that do not have a class attribute. class will return a value based on the object’s atomic type. Notice that the “class” of a double is “numeric,” an odd deviation, but one I am thankful for. I think that the most important property of a double vector is that it contains numbers, a property that “numeric” makes obvious:

class("Hello")
## [1] "character"
class(5)
## [1] "numeric"

the class of "Hello" is character; the class of 5 is numeric

now <- Sys.time()
now
## [1] "2022-11-29 19:46:29 EST"
typeof(now)
## [1] "double"
class(now)
## [1] "POSIXct" "POSIXt"

built-in functions like Sys.time allow you to interact with the computer system -> Sys.time returns the current date and time -> the type is double and the class is POSIXct (together with POSIXt)

POSIXct is a framework for representing dates and times. A time is represented by the number of seconds that have passed between it and 12:00 AM on January 1st, 1970, in Coordinated Universal Time (UTC). You can see this number by removing the class attribute of now, or by using the unclass function, which does the same thing:

unclass(now)
## [1] 1669769190

unclassing now returns the underlying double: the number of seconds since the start of 1970 (UTC)

R then gives the double vector a class attribute that contains two classes, “POSIXct” and “POSIXt”. This attribute alerts R functions that they are dealing with a POSIXct time, so they can treat it in a special way. For example, R functions will use the POSIXct standard to convert the time into a user-friendly character string before displaying it. You can take advantage of this system by giving the POSIXct class to random R objects. For example, have you ever wondered what day it was a million seconds after 12:00 a.m. Jan. 1, 1970?

mil <- 1000000
mil
## [1] 1e+06
class(mil) <- c("POSIXct", "POSIXt")
mil
## [1] "1970-01-12 08:46:40 EST"

one million can be displayed as a date and time by giving it the POSIXct class
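
The printed time above depends on the time zone of the computer (EST here). As a sketch of an alternative, as.POSIXct can build the same time with an explicit time zone, which makes the result reproducible on any machine. The names mil_utc and stamp are illustrative assumptions.

```r
# Build the time from a number of seconds, with the time zone fixed to UTC
mil_utc <- as.POSIXct(1e6, origin = "1970-01-01", tz = "UTC")

# format() turns it into a character string in that time zone
stamp <- format(mil_utc, "%Y-%m-%d %H:%M:%S")
stamp
## [1] "1970-01-12 13:46:40"

# The underlying value is still one million seconds
as.numeric(mil_utc)
## [1] 1e+06
```

13:46:40 UTC is the same moment as 08:46:40 EST, since EST is five hours behind UTC.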

Factors

gender <- factor(c("male", "female", "female", "male"))
typeof(gender)
## [1] "integer"
attributes(gender)
## $levels
## [1] "female" "male"  
## 
## $class
## [1] "factor"

the type of gender is integer; its attributes are the levels (female and male) and the class (factor)

unclass(gender)
## [1] 2 1 1 2
## attr(,"levels")
## [1] "female" "male"

unclassing gender returns the integer codes 2 1 1 2 along with the levels attribute -> each code is a position in the levels (1 = female, 2 = male)

gender
## [1] male   female female male  
## Levels: female male

-> printing gender shows male female female male and the two levels, female and male (sorted alphabetically)

as.character(gender)
## [1] "male"   "female" "female" "male"

as.character converts the factor to a character vector, so the values are printed in quotes
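
The link between the integer codes and the labels can be made explicit by indexing the levels with the codes. This is a minimal sketch; codes, labs, and rebuilt are illustrative names.

```r
gender <- factor(c("male", "female", "female", "male"))

codes <- as.integer(gender)   # the stored integer codes
codes
## [1] 2 1 1 2

labs <- levels(gender)        # the labels, sorted alphabetically
labs
## [1] "female" "male"

# Looking up each code in the levels gives back the original strings,
# which is the same result as as.character(gender)
rebuilt <- labs[codes]
rebuilt
## [1] "male"   "female" "female" "male"
```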

Coercion

sum(c(TRUE, TRUE, FALSE, FALSE))
## [1] 2
#will become:
sum(c(1, 1, 0, 0))
## [1] 2

-> TRUE is coerced to 1 and FALSE to 0, so the sum is 2

as.character(1)
## [1] "1"
as.logical(1)
## [1] TRUE
as.numeric(FALSE)
## [1] 0

-> 1 can be converted to a character string or to the logical TRUE, and FALSE can be converted to the number 0
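
Coercion also happens automatically when different types are combined in one atomic vector, because a vector can hold only one type. The sketch below shows this with c; the names mixed and share are illustrative assumptions.

```r
# Logical combined with double becomes double
typeof(c(1, TRUE))
## [1] "double"

# As soon as a character string is present, everything becomes character
mixed <- c(1, TRUE, "ace")
mixed
## [1] "1"    "TRUE" "ace"

# Because TRUE counts as 1 and FALSE as 0, mean() gives the share of TRUE values
share <- mean(c(TRUE, TRUE, FALSE, FALSE))
share
## [1] 0.5
```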

Lists

Lists do not group together individual values; lists group together R objects. They are used as building blocks to create many more sophisticated types of R objects.

list1 <- list(100:130, "R", list(TRUE, FALSE))
list1
## [[1]]
##  [1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
## [20] 119 120 121 122 123 124 125 126 127 128 129 130
## 
## [[2]]
## [1] "R"
## 
## [[3]]
## [[3]][[1]]
## [1] TRUE
## 
## [[3]][[2]]
## [1] FALSE

-> list1 now contains the integers 100 through 130 as its first element, the string "R" as its second element, and a nested list of TRUE and FALSE as its third element
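
The double brackets in the printed output are also how the elements are extracted. This is a short sketch on the same list; first, inner, and sub are illustrative names.

```r
list1 <- list(100:130, "R", list(TRUE, FALSE))

# [[ ]] returns the element itself; a second index then works on that element
first <- list1[[1]][1]
first
## [1] 100

# Nested lists are reached by chaining [[ ]]
inner <- list1[[3]][[2]]
inner
## [1] FALSE

# Single brackets return a smaller list instead of the element
sub <- list1[2]
is.list(sub)
## [1] TRUE
```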

Data Frames

Data frames are the two-dimensional version of a list. They are far and away the most useful storage structure for data analysis, and they provide an ideal way to store an entire deck of cards. You can think of a data frame as R’s equivalent to the Excel spreadsheet because it stores data in a similar format.

df <- data.frame(face = c("ace", "two", "six"),
  suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
df

-> the dataframe df has three columns face, suit, and value

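Every column in a data frame must have the same length. data.frame recycles a shorter vector only when its length divides the number of rows evenly; otherwise it raises an error.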

df <- data.frame(face = c("ace", "two", "six"),
  suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
  stringsAsFactors = FALSE)

while the rest stays the same, stringsAsFactors = FALSE keeps the strings as character vectors instead of converting them to factors (since R 4.0.0 this is already the default)

typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"
str(df)
## 'data.frame':    3 obs. of  3 variables:
##  $ face : chr  "ace" "two" "six"
##  $ suit : chr  "clubs" "clubs" "clubs"
##  $ value: num  1 2 3

the type is list and the class is data.frame; str shows the structure: face -> character, suit -> character, value -> numeric
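
Because a data frame is a list of equal-length columns, the list tools and the matrix-style indexing both work on it. The sketch below rebuilds the same df; col_lengths is an illustrative name.

```r
df <- data.frame(face = c("ace", "two", "six"),
  suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
  stringsAsFactors = FALSE)

# As a list, df has one element per column
length(df)
## [1] 3

# Every column has the same length, which equals the number of rows
col_lengths <- sapply(df, length)
col_lengths
##  face  suit value 
##     3     3     3

# $ extracts a whole column; [row, column] extracts a single value
df$value
## [1] 1 2 3
df[2, "face"]
## [1] "two"
```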
