In this class, you will learn how to:
We create an atomic vector, die, that stores 4 elements.
#An atomic vector is the simplest kind of vector; all of its elements must be of the same type (homogeneous).
#Create a vector with the elements 1 through 4
die <- c(1,2,3,4)
die
## [1] 1 2 3 4
Is it a vector?
is.vector(die)
## [1] TRUE
We create an atomic vector that stores a single element, the value 4.
four <- 4
four
## [1] 4
Is the object four a vector?
is.vector(four)
## [1] TRUE
The function length gets or sets the length of vectors (including lists) and factors, and of any other R object for which a method has been defined. In simple terms, length returns the number of elements in an atomic vector.
#find the number of elements in each vector created above
length(four)
## [1] 1
length(die)
## [1] 4
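Because length can also set the length, the short sketch below shows what happens when a vector is grown or shrunk this way. The names x and padded are hypothetical scratch objects introduced only for this illustration, so die and four stay untouched.

```r
# x is a hypothetical scratch vector used only for this illustration
x <- c(1, 2, 3, 4)

# Growing the vector pads the new positions with NA
length(x) <- 6
padded <- x
padded
## [1]  1  2  3  4 NA NA

# Shrinking the vector drops the trailing elements
length(x) <- 2
x
## [1] 1 2
```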
Each atomic vector stores its values as a one-dimensional vector, and each atomic vector can only store one type of data. R recognizes six basic types of atomic vectors: doubles, integers, characters, logicals, complex, and raw.
#int - a single integer element (the L suffix makes it an integer)
int <- 1L
#text - a character element
text <- "ace"
#do_uble - a double (a floating-point number)
do_uble <- 30 #stored in 64 bits
#logic - a logical (Boolean) value
logic <- TRUE
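One way to confirm which of the six types each object has is to ask typeof. The sketch below recreates the four objects from above so that it runs on its own, and adds one complex and one raw value for completeness.

```r
# Recreate the objects from the text so this block is self-contained
int <- 1L
text <- "ace"
do_uble <- 30
logic <- TRUE

typeof(int)      # "integer"
typeof(text)     # "character"
typeof(do_uble)  # "double"
typeof(logic)    # "logical"
typeof(1 + 1i)   # "complex"
typeof(raw(1))   # "raw"
```

Note that 30 without the L suffix is stored as a double, even though it looks like a whole number.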
Floating-point errors arise because each double is accurate to only about 16 significant digits. This introduces a little bit of rounding error. In most cases, this rounding error will go unnoticed. However, in some situations, it can cause surprising results. For example, you may expect the result of the expression below to be zero, but it is not:
#sqrt(): calculates the mathematical square root of the value passed to it as an argument.
sqrt(2)^2 - 2
## [1] 4.440892e-16
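A practical consequence is that doubles should usually be compared with a tolerance rather than with ==. The sketch below shows two common ways to do that; the names x, exact, and close are hypothetical, and the 1e-9 threshold is an arbitrary choice for illustration.

```r
x <- sqrt(2)^2

# Exact comparison fails because of the rounding error
exact <- x == 2
exact
## [1] FALSE

# all.equal() compares with a small tolerance; wrap it in isTRUE()
# because it returns a description string when the values differ
close <- isTRUE(all.equal(x, 2))
close
## [1] TRUE

# A manual tolerance check works too (1e-9 is an arbitrary threshold)
abs(x - 2) < 1e-9
## [1] TRUE
```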
Other types
#comp- creates a complex vector
comp <- c(1 + 1i, 1 + 2i, 1 + 3i)
comp
## [1] 1+1i 1+2i 1+3i
#creates a raw vector- A raw vector is used to represent a "raw" sequence of bytes
r_raw <- raw(3)
r_raw
## [1] 00 00 00
#Attributes
The most common attributes to give an atomic vector are names, dimensions (dim), and classes. Notice how the object die has no names right after we create it.
#names() looks up the names of the elements of die; no names have been assigned yet, so the result is NULL.
names(die)
## NULL
We assign names to the elements.
names(die) <- c("one", "two", "three", "four")
names(die)
## [1] "one" "two" "three" "four"
Let’s check the attributes of die again, this time with the attributes function.
#attributes() returns all of an object's attributes as a list; here the only attribute is names
attributes(die)
## $names
## [1] "one" "two" "three" "four"
Names do not affect the values.
#gives each element a new name
#the values are unchanged; only the names differ
names(die) <- c("uno", "dos", "tres", "quatro")
die
## uno dos tres quatro
## 1 2 3 4
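Names are also useful for looking up elements. The sketch below works on a hypothetical copy called named_die, so die itself is not modified, and shows that arithmetic ignores the names.

```r
# named_die is a hypothetical copy used only for this illustration
named_die <- c(uno = 1, dos = 2, tres = 3, quatro = 4)

# Single brackets keep the name; double brackets return the bare value
named_die["dos"]
named_die[["tres"]]
## [1] 3

# Arithmetic works on the values and ignores the names
sum(named_die)
## [1] 10

# unname() returns the values without the names attribute
unname(named_die)
## [1] 1 2 3 4
```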
We can also remove names.
#assigning NULL removes the names of the elements.
names(die) <- NULL
die
## [1] 1 2 3 4
#Creating n dimensional Structures
A vector is one-dimensional. A matrix is a two-dimensional array, so a two-dimensional array is the same thing as a matrix. You can transform an atomic vector into an n-dimensional array by giving it a dim attribute: two dimensions give a matrix, and three or more give a higher-dimensional array.
For example, you can reorganize die into a 2 × 2 matrix.
#dim() is used to get or set the dimensions of a matrix, array, or data frame.
dim(die) <- c(2, 2)
die
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
R will always use the first value in dim for the number of rows and the second value for the number of columns. In general, rows always come first in R operations that deal with both rows and columns.
#same result as above: with only 4 elements, c(2, 2) means 2 rows and 2 columns either way
dim(die) <- c(2, 2)
die
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
Notice how by default R fills up each matrix by columns.
#hypercube
#with three dimensions (1 x 2 x 2) the object is an array rather than a matrix
dim(die) <- c(1, 2, 2)
class(die)
## [1] "array"
If you’d like more control over how the data is stored, you can use one of R’s helper functions, matrix or array. They do the same thing as changing the dim attribute, but they provide extra arguments to customize the process.
#Matrix Function
#Creates a matrix with 2 rows
m <- matrix(die, nrow = 2)
m
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
#byrow = TRUE fills the matrix row by row; byrow = FALSE (the default) fills it column by column.
m <- matrix(die, nrow = 2, byrow = TRUE)
m
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
#Array Function
The array function creates an n-dimensional array.
#Creates a 2 x 2 x 3 array: three slices, each a matrix with 2 rows and 2 columns.
ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
ar
## , , 1
##
## [,1] [,2]
## [1,] 11 13
## [2,] 12 14
##
## , , 2
##
## [,1] [,2]
## [1,] 21 23
## [2,] 22 24
##
## , , 3
##
## [,1] [,2]
## [1,] 31 33
## [2,] 32 34
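To see how the three dimensions map onto the printed slices, it can help to index the array directly. The sketch below rebuilds ar so it runs on its own; the indices follow the order row, column, slice.

```r
ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))

# Row 1, column 2 of the third slice
ar[1, 2, 3]
## [1] 33

# Leaving an index empty selects everything along that dimension,
# so this returns the whole second slice as a 2 x 2 matrix
ar[, , 2]
dim(ar[, , 2])
## [1] 2 2
```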
Notice that changing the dimensions of your object will not change the type of the object, but it will change the object’s class attribute:
#typeof() returns the storage type of its argument.
#"double" - the elements are stored as double-precision numbers.
#"matrix" "array" - a matrix is a two-dimensional array, so R reports both classes (R 4.0.0 and later).
dim(die) <- c(2, 2)
typeof(die)
## [1] "double"
class(die)
## [1] "matrix" "array"
Note that an object’s class attribute will not always appear when you run attributes; you may need to specifically search for it with class. Here, attributes lists only dim:
#class not shown.
attributes(die)
## $dim
## [1] 2 2
You can apply class to objects that do not have a class attribute. class will return a value based on the object’s atomic type. Notice that the “class” of a double is “numeric,” an odd deviation, but one I am thankful for. I think that the most important property of a double vector is that it contains numbers, a property that “numeric” makes obvious:
#showing the class of a character string and of a double
class("Hello")
## [1] "character"
class(5)
## [1] "numeric"
#Sys.time() returns the current date and time as a POSIXct value; the same instant can be displayed in different time zones, where it may fall on a different day.
#class- POSIXct stores both a date and time with an associated time zone.
now <- Sys.time()
now
## [1] "2022-11-22 03:14:17 UTC"
typeof(now)
## [1] "double"
class(now)
## [1] "POSIXct" "POSIXt"
POSIXct is a framework for representing dates and times. Each time is represented by the number of seconds that have passed between that time and 12:00 AM January 1st 1970 (in the Coordinated Universal Time (UTC) zone). You can see this number by removing the class attribute of now, or by using the unclass function, which does the same thing:
#unclass() returns a copy of now without its class attribute, which reveals the underlying number of seconds.
unclass(now)
## [1] 1669086858
R then gives the double vector a class attribute that contains two classes, “POSIXct” and “POSIXt”. This attribute alerts R functions that they are dealing with a POSIXct time, so they can treat it in a special way. For example, R functions will use the POSIXct standard to convert the time into a user-friendly character string before displaying it. You can take advantage of this system by giving the POSIXct class to random R objects. For example, have you ever wondered what day it was a million seconds after 12:00 a.m. Jan. 1, 1970?
#The POSIXct class stores date/time values as the number of seconds since January 1, 1970,
#while the POSIXlt class stores them as a list with elements for second, minute, hour, day, month, and year, among others.
mil <- 1000000
mil
## [1] 1e+06
class(mil) <- c("POSIXct", "POSIXt")
mil
## [1] "1970-01-12 13:46:40 UTC"
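Assigning the class by hand is a neat demonstration, but the same conversion can be sketched with as.POSIXct, which takes the number of seconds together with an explicit origin and time zone. The name mil_time is hypothetical and is used so that mil is left as it is.

```r
# mil_time is a hypothetical name for this illustration
mil_time <- as.POSIXct(1000000, origin = "1970-01-01", tz = "UTC")

# Format explicitly in UTC so the result does not depend on the local time zone
format(mil_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
## [1] "1970-01-12 13:46:40"

# The underlying value is still the number of seconds since the origin
as.numeric(mil_time)
## [1] 1e+06
```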
#Factors
#Creates a factor named gender with the levels female and male
gender <- factor(c("male", "female", "female", "male"))
typeof(gender)
## [1] "integer"
attributes(gender)
## $levels
## [1] "female" "male"
##
## $class
## [1] "factor"
#unclass() shows how the factor is stored: integer codes plus a levels attribute
unclass(gender)
## [1] 2 1 1 2
## attr(,"levels")
## [1] "female" "male"
#displaying vector 'gender'
gender
## [1] male female female male
## Levels: female male
#as.character() converts the factor to a character vector of its labels.
as.character(gender)
## [1] "male" "female" "female" "male"
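By default the levels are sorted alphabetically, which is why female is coded as 1 above. The sketch below shows one way to control the order with the levels argument; gender2 is a hypothetical name, so the original gender factor is left alone.

```r
# gender2 is a hypothetical factor with an explicit level order
gender2 <- factor(c("male", "female", "female", "male"),
                  levels = c("male", "female"))

levels(gender2)
## [1] "male"   "female"

# The integer codes now follow the chosen order: male = 1, female = 2
as.integer(gender2)
## [1] 1 2 2 1

# The labels themselves are unchanged
as.character(gender2)
## [1] "male"   "female" "female" "male"
```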
#Coercion
#sum() adds up the elements; the logicals are coerced to numbers (TRUE = 1, FALSE = 0), so the total is 2
sum(c(TRUE, TRUE, FALSE, FALSE))
## [1] 2
#will become:
sum(c(1, 1, 0, 0))
## [1] 2
#as.character() converts an object to a character vector.
#as.logical() converts an object to a logical vector.
#as.numeric() converts an object to a numeric (double) vector.
as.character(1)
## [1] "1"
as.logical(1)
## [1] TRUE
as.numeric(FALSE)
## [1] 0
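Coercion also happens implicitly whenever values of different types are combined into one atomic vector, because an atomic vector can hold only one type. A minimal sketch of the usual direction (logical to integer to double to character) follows; the name mixed is hypothetical.

```r
# Combining types with c() coerces everything to the most flexible type
typeof(c(TRUE, 1L))   # "integer"
typeof(c(1L, 2.5))    # "double"
typeof(c(1, "ace"))   # "character"

# mixed is a hypothetical vector for this illustration
mixed <- c(1, TRUE, "ace")
mixed
## [1] "1"    "TRUE" "ace"

# Because TRUE counts as 1, mean() gives the proportion of TRUE values
mean(c(TRUE, TRUE, FALSE, FALSE))
## [1] 0.5
```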
#Lists
Lists do not group together individual values; they group together R objects. They are used as building blocks to create many more sophisticated types of R objects.
#Creates a list with 3 elements: the integers 100 to 130, the string "R", and a nested list holding TRUE and FALSE.
list1 <- list(100:130, "R", list(TRUE, FALSE))
list1
## [[1]]
## [1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
## [20] 119 120 121 122 123 124 125 126 127 128 129 130
##
## [[2]]
## [1] "R"
##
## [[3]]
## [[3]][[1]]
## [1] TRUE
##
## [[3]][[2]]
## [1] FALSE
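The double brackets in the printout are also how you pull elements back out of a list. The sketch below rebuilds list1 so it runs on its own and contrasts [[ ]], which returns the element itself, with [ ], which returns a smaller list.

```r
list1 <- list(100:130, "R", list(TRUE, FALSE))

# A list's length is its number of top-level elements
length(list1)
## [1] 3

# [[ ]] extracts the element itself
list1[[2]]
## [1] "R"

# Chain [[ ]] to reach inside the nested list
list1[[3]][[1]]
## [1] TRUE

# [ ] keeps the result wrapped in a list
class(list1[2])
## [1] "list"
```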
#Data Frames
Data frames are the two-dimensional version of a list. They are far and away the most useful storage structure for data analysis, and they provide an ideal way to store an entire deck of cards. You can think of a data frame as R’s equivalent to the Excel spreadsheet because it stores data in a similar format.
#Creates a data frame with 3 columns of 3 elements each, which gives 3 rows.
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
df
Data frames cannot combine columns of different lengths.
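The sketch below illustrates what happens when the column lengths do not match. It wraps the call in tryCatch so the error message is captured instead of stopping the script. The names result and recycled are hypothetical. Note one exception worth knowing: a column of length 1 is recycled to fill every row.

```r
# Columns of length 3 and 2 cannot be combined; capture the error message
result <- tryCatch(
  data.frame(face = c("ace", "two", "six"), value = c(1, 2)),
  error = function(e) conditionMessage(e)
)
result

# A length-1 column is recycled to match the other columns
recycled <- data.frame(face = c("ace", "two"), suit = "clubs")
nrow(recycled)
## [1] 2
```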
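#Using stringsAsFactors = FALSE. Before R 4.0.0, columns that contain characters (i.e., text) were coerced (converted) to factors by default when building or importing a data frame; since R 4.0.0 the default is stringsAsFactors = FALSE, so setting it explicitly just guarantees character columns on any version.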
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
stringsAsFactors = FALSE)
df
typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"
str(df)
## 'data.frame': 3 obs. of 3 variables:
## $ face : chr "ace" "two" "six"
## $ suit : chr "clubs" "clubs" "clubs"
## $ value: num 1 2 3
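Since typeof reports that a data frame is a list, each column can be treated as a list element. The sketch below rebuilds df so it runs on its own and shows a few ways to inspect the columns.

```r
df <- data.frame(face = c("ace", "two", "six"),
                 suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# As a list, the data frame has one element per column
length(df)
## [1] 3

# Columns can be extracted with $ or [[ ]], just like list elements
df$face
## [1] "ace" "two" "six"
df[["value"]]
## [1] 1 2 3

# Each column is an atomic vector with its own type
sapply(df, typeof)
```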