paste() concatenates character strings. Any numbers are cast as character strings by the paste() command. Arguments are separated by a space unless indicated otherwise using the sep= (separator) argument.
paste('Georgia',"Bulldogs")
## [1] "Georgia Bulldogs"
y=2
paste("The value of y is",y)
## [1] "The value of y is 2"
labs = paste(c('X','Y'),1:10,sep='')
labs
## [1] "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
labs2 = paste('X','Y',1:10,sep='')
labs2
## [1] "XY1" "XY2" "XY3" "XY4" "XY5" "XY6" "XY7" "XY8" "XY9" "XY10"
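As a further sketch (these calls are illustrative and not part of the examples above), sep= can be any string, paste0() is shorthand for sep='', and the collapse= argument glues the resulting vector into a single string. The names labs3 and one_string are made up for this example.

```r
## sep= can be any string, not just '' or a space
paste("Georgia", "Bulldogs", sep = "-")    # "Georgia-Bulldogs"
## paste0() is shorthand for paste(..., sep = "")
labs3 = paste0("X", 1:3)                   # "X1" "X2" "X3"
## collapse= joins the elements of the result into one string
one_string = paste(labs3, collapse = "+")  # "X1+X2+X3"
one_string
```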
Logical vector elements take the values TRUE (T), FALSE (F), and NA. These vectors are generated by conditions, for instance temp = x>14 (where x is a numeric vector). Here, elements meeting the condition are set to TRUE, whereas those that do not are set to FALSE.
Potential logical operators include (disregard the colon):
<, >, <=, >=: same as in SAS
==: strict equality
!=: not equal to
&&: and (for single values; & works element-wise on vectors)
||: or (for single values; | works element-wise on vectors)
!: not
If used in arithmetic, FALSE is cast as a 0 and TRUE is cast as a 1.
x=c(1,1,2,3,4,1,7,1)
## When is x equal to 1?
x==1
## [1] TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
## Count the number of 1's in x
sum(x==1)
## [1] 4
Conversely, a numeric value of 0 converts to FALSE and all other values convert to TRUE.
x=0 # FALSE
y=17 # TRUE
x&&y # TRUE if both are TRUE
## [1] FALSE
x||y # TRUE if at least one is TRUE
## [1] TRUE
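The && and || operators above compare single values. For whole vectors, the element-wise versions & and | are usually what is wanted. A minimal sketch, using a made-up vector v:

```r
v = c(1, 15, 20, 3)
## Element-wise "and": which elements lie strictly between 2 and 18?
in_range = v > 2 & v < 18    # FALSE TRUE FALSE TRUE
## Because TRUE counts as 1, sum() counts the matches
sum(in_range)                # 2
## any() and all() reduce a logical vector to a single value
any(v > 18)                  # TRUE
all(v > 0)                   # TRUE
```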
Functions for changing the mode of an object:
as.logical(): makes all elements Boolean (logical)
as.numeric(): makes all elements numeric
as.character(): makes all elements character strings
If an element cannot be transformed, it will be set to NA.
x=c(T,F)
y=c(0,1)
z=c('a','b')
as.numeric(x)
## [1] 1 0
as.character(x)
## [1] "TRUE" "FALSE"
as.logical(y)
## [1] FALSE TRUE
as.character(y)
## [1] "0" "1"
as.logical(z)
## [1] NA NA
as.numeric(z)
## Warning: NAs introduced by coercion
## [1] NA NA
Now try this on your own with z=c('a','1').
Factors allow for the creation of indicator vectors. factor(x) creates levels based on the values in x.
x=c(1,1,1,1,2,2,2,2,9,9,9,9)
factor(x)
## [1] 1 1 1 1 2 2 2 2 9 9 9 9
## Levels: 1 2 9
This bit of code creates an unordered factor with levels 1, 2, and 9 and treats them like treatment levels.
x=c('like','dislike','hate','like',"don't know", 'like','dislike')
factor(x,levels=c('hate','dislike',"don't know",'like'),ordered=T)
## [1] like dislike hate like don't know like
## [7] dislike
## Levels: hate < dislike < don't know < like
This bit of code uses the ordering of the factor levels as given. If you omit levels, R will order them alphabetically. If you leave one or more of the values out of levels, R will return NA for those elements. Try these.
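A short sketch of what a factor stores, using a made-up vector dose: each element is kept as an integer code that points into the vector of levels, and table() counts the elements per level.

```r
dose = factor(c("low", "high", "low", "mid"),
              levels = c("low", "mid", "high"))
levels(dose)       # "low" "mid" "high"
## The underlying integer codes follow the order of the levels
as.integer(dose)   # 1 3 1 2
## Counts per level, in level order
table(dose)
```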
Index vectors select subsets of the elements of a vector.
x = c(1:3,NA,4)
x
## [1] 1 2 3 NA 4
l=c(T,T,T,F,T)
## Selecting the non-missing values of x
x[l]
## [1] 1 2 3 4
x=1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
x[5]
## [1] 5
x[2:7]
## [1] 2 3 4 5 6 7
x[c(9,1,1,4)]
## [1] 9 1 1 4
c("x","y")[rep(c(1,1,2,2),times=2)]
## [1] "x" "x" "y" "y" "x" "x" "y" "y"
x
## [1] 1 2 3 4 5 6 7 8 9 10
x[c(-2,-5)]
## [1] 1 3 4 6 7 8 9 10
fruit=c(5,10,1,20)
names(fruit) = c('orange','banana','apple','peach')
fruit
## orange banana apple peach
## 5 10 1 20
fruit[c('apple','orange')]
## apple orange
## 1 5
x[c(T,T,T,F,T)]
## [1] 1 2 3 5 6 7 8 10
x[c(T,T,T,F,T)]=0
y
## [1] 0 1
y=y-2
y
## [1] -2 -1
y[y<0] = -y[y<0] # This is equivalent to an absolute value
y
## [1] 2 1
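The logical index used earlier to drop the missing value was typed by hand. A sketch of how it could be built from the data instead, using a made-up vector v; note that a comparison involving NA yields NA, which then shows up in the subset.

```r
v = c(1:3, NA, 4)
## Build the logical index from the data instead of typing it
v[!is.na(v)]   # 1 2 3 4
## A condition on NA gives NA, so the NA is carried into the subset
v[v > 2]       # 3 NA 4
## which() returns the positions that are TRUE and skips NA
which(v > 2)   # 3 5
```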
1:30 is equivalent to c(1,2,...,30). The : operator has higher priority than arithmetic operators such as *, + and -, meaning that it is performed first!
2*1:15
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
n=10
1:n-1
## [1] 0 1 2 3 4 5 6 7 8 9
1:(n-1)
## [1] 1 2 3 4 5 6 7 8 9
30:1 generates the sequence backwards. The function seq() is a more general way to generate sequences.
seq(10)
## [1] 1 2 3 4 5 6 7 8 9 10
## Start at -5 and stop at 5 by increments of 0.2
seq(from=-5,to=5,by=0.2)
## [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4
## [15] -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4
## [29] 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2
## [43] 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0
## Another way to generate the same sequence
seq(length=51,from=-5,by=0.2)
## [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4
## [15] -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4
## [29] 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2
## [43] 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0
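A few related calls, shown as a sketch with a made-up variable n0: the length.out= argument fixes the number of points rather than the step size, and seq_len() and seq_along() avoid the surprise that 1:0 counts down instead of being empty.

```r
## Fix the number of points instead of the increment
seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
n0 = 0
1:n0                           # 1 0 (counts down)
seq_len(n0)                    # integer(0), an empty sequence
## One index per element of a vector
seq_along(c("a", "b", "c"))    # 1 2 3
```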
Similarly, we have rep() (replicate).
x=1:10
rep(x,times=5)
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3
## [24] 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6
## [47] 7 8 9 10
rep(x,each=5)
## [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5
## [24] 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9 9 10
## [47] 10 10 10 10
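As a small sketch beyond the two calls above, times= may itself be a vector (one count per element), and each= and times= can be combined. The names grp and blocks are made up for this example.

```r
## One repeat count per element
grp = rep(c("a", "b"), times = c(2, 3))   # "a" "a" "b" "b" "b"
## each= is applied first, then the whole result is repeated times= times
blocks = rep(1:2, each = 2, times = 2)    # 1 1 2 2 1 1 2 2
grp
blocks
```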
NA = not available or missing. In general, an operation returns NA if an NA is present. Most statistical functions have an option to ignore NAs.
mean(x)
## [1] 5.5
x[4] = NA
mean(x)
## [1] NA
mean(x,na.rm=T)
## [1] 5.666667
is.na(): returns a logical vector that is TRUE where the argument is NA, and FALSE otherwise.
z=c(1:3,NA)
z
## [1] 1 2 3 NA
is.na(z)
## [1] FALSE FALSE FALSE TRUE
The expression x==NA does not work because NA is not a value; it’s a marker.
NaN: Not a (representable) Number. Too big: Inf. Too small: -Inf.
0/0
## [1] NaN
Inf-Inf
## [1] NaN
is.na() returns TRUE for both NA and NaN. is.nan() returns TRUE for only NaN (FALSE for NA).
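The differences between these checks can be sketched on one made-up vector w that holds an ordinary number, NA, NaN, and Inf:

```r
w = c(1, NA, NaN, Inf)
is.na(w)        # FALSE TRUE TRUE FALSE
is.nan(w)       # FALSE FALSE TRUE FALSE
is.finite(w)    # TRUE FALSE FALSE FALSE
## Comparing with NA gives NA, not TRUE or FALSE
NA == 1         # NA
## Count the missing values (NA and NaN)
sum(is.na(w))   # 2
```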
SAS datasets are to SAS what objects are to R.
R objects include vectors (“atomic” = components are all the same type), lists, arrays, matrices, tables, and dataframes.
is() returns the type of a named R object. We can recast an object (change its type). This will affect the behavior of R functions. Objects have modes (numeric, character, logical, etc.), just like vectors do. The mode can also be recast.
Properties of objects can be observed with these common functions. Their behaviors will depend on the object type.
length(object)
attributes(object): e.g. creation date
attr(object,name): selects a specific attribute
## Change a vector to a matrix
z=1:25
is(z)
## [1] "integer" "numeric" "vector"
## [4] "data.frameRowLabels"
attr(z,'dim') = c(5,5)
z
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 6 11 16 21
## [2,] 2 7 12 17 22
## [3,] 3 8 13 18 23
## [4,] 4 9 14 19 24
## [5,] 5 10 15 20 25
is(z)
## [1] "matrix" "array" "structure" "vector"
class(): numeric, logical, character, matrix, array, factor, list, data.frame, etc. An object’s class will affect how functions (e.g. plot() and summary()) process the object.
age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
village=data.frame(age=age,Var_Name2=height)
print(village)
## age Var_Name2
## 1 18 76.1
## 2 19 77.0
## 3 20 78.1
## 4 21 78.2
## 5 22 78.8
## 6 23 79.7
## 7 24 79.9
## 8 25 81.1
## 9 26 81.2
## 10 27 81.8
## 11 28 82.8
## 12 29 83.5
plot(village)
See what happens with these functions: is(), length(), class(), attributes(), and summary(). unclass() prints the dataframe as its base parts.
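A minimal sketch of how class changes what a function does, using made-up data: the same values are summarised as numbers when the class is numeric and as counts when they are recast as a factor.

```r
scores = c(3, 1, 2, 3)
class(scores)                # "numeric"
summary(scores)              # min, quartiles, median, mean, max
## Recast the same values as a factor
scores_f = factor(scores)
class(scores_f)              # "factor"
summary(scores_f)            # counts per level: 1 1 2
```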
Vectors are the bottom of the object hierarchy. A single value is still considered a vector! Vectors can be extended without special considerations like compatible size.
a=c(4,6,8)
a[5] = 9
## Automatically inserts NA at position 4
a
## [1] 4 6 8 NA 9
An array is a data structure whose elements are all of one type. Vectors and matrices are special types of arrays. Caution: A 10 by 1 matrix is NOT a vector of length 10! These objects are different! Matrices can be built in many ways:
matrix(vector,nrow=n,ncol=p) creates an \(n\times p\) matrix by filling it column by column, from left to right.
matrix(1:6,nrow=2,ncol=3)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
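Two points from above, sketched with made-up names: the byrow= argument of matrix() fills row by row instead of column by column, and a 10 by 1 matrix carries a dim attribute that a plain vector of length 10 does not.

```r
## Fill by rows instead of by columns
by_rows = matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_rows[1, ]              # 1 2 3
## A vector versus a 10 by 1 matrix with the same values
vec = 1:10
col_mat = matrix(vec, ncol = 1)
dim(vec)                  # NULL
dim(col_mat)              # 10 1
is.matrix(vec)            # FALSE
## Dropping the dim attribute recovers the vector
identical(as.vector(col_mat), vec)   # TRUE
```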
cbind() binds vectors together as columns (side by side) and rbind() binds them as rows (stacked).
x=11:13
y=c(55,33,12)
rbind(x,y) ## Creates a 2 by 3 matrix
## [,1] [,2] [,3]
## x 11 12 13
## y 55 33 12
cbind(x,y) ## Creates a 3 by 2 matrix
## x y
## [1,] 11 55
## [2,] 12 33
## [3,] 13 12
Typically, mathematical functions are performed element-wise. But there are matrix-specific functions:
det(): matrix determinant
t(): matrix transpose
solve(): matrix inverse
%*%: matrix multiplication
dim(): returns the dimensions of the matrix
Matrix elements can be referenced by specifying the row and column numbers.
print(z)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 6 11 16 21
## [2,] 2 7 12 17 22
## [3,] 3 8 13 18 23
## [4,] 4 9 14 19 24
## [5,] 5 10 15 20 25
z[,3] ## The third column of z
## [1] 11 12 13 14 15
z[1,] ## The first row of z
## [1] 1 6 11 16 21
z[5,3] = 13 ## Replaces the element in the 5th row & 3rd column with 13
z[1,] = c(2,2,3,4,5) ## replaces the 1st row
print(z)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 2 2 3 4 5
## [2,] 2 7 12 17 22
## [3,] 3 8 13 18 23
## [4,] 4 9 14 19 24
## [5,] 5 10 13 20 25
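The matrix-specific functions listed earlier can be sketched on a small made-up 2 by 2 matrix A. Note the difference between * (element-wise) and %*% (matrix multiplication).

```r
A = matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)
dim(A)           # 2 2
det(A)           # 2*3 - 1*1 = 5
t(A)             # transpose (A is symmetric, so t(A) equals A)
A_inv = solve(A) # matrix inverse
## A times its inverse is the identity matrix, up to rounding error
A %*% A_inv
## Element-wise product: squares each entry instead
A * A
```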
Clearly, dimensions in the appropriate directions must line up. Arrays may be multi-dimensional. For example, an array may be \(3 \times 4\times 2\) and viewed as a stack of two matrices that are \(3\times4\). Or, if you need to index space and time, your array needs to be 4-dimensional: (x,y,z,t).
a=matrix(8,2,3) ## Creates a 2 by 3 matrix of 8's
b=matrix(9,2,3) ## Creates a 2 by 3 matrix of 9's
array(c(a,b),c(2,3,2))
## , , 1
##
## [,1] [,2] [,3]
## [1,] 8 8 8
## [2,] 8 8 8
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 9 9 9
## [2,] 9 9 9
The first argument in array(c(a,b),c(2,3,2)) (c(a,b)) indicates the data used to fill the array. The second argument (c(2,3,2)) represents the dimension attribute and is a vector giving the max indices in each direction.
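Array elements are referenced with one index per dimension, just as matrix elements are referenced by row and column. A sketch with a made-up \(3\times4\times2\) array arr, filled with 1 to 24 so the positions are easy to follow:

```r
arr = array(1:24, dim = c(3, 4, 2))
dim(arr)          # 3 4 2
## Row 2, column 3 of the first 3 by 4 matrix in the stack
arr[2, 3, 1]      # 8
## The whole second matrix in the stack (values 13 to 24)
arr[, , 2]
## First row of the second matrix
arr[1, , 2]       # 13 16 19 22
```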
apply() helps you use arrays. It applies a given function to each row (1st dimension), column (2nd dimension), or level of a higher dimension.
a = matrix(1:6,2,3)
a
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
apply(a,2,max) ## Finds the maximum of each column
## [1] 2 4 6
apply(a,1,sum) ## Row sums of a
## [1] 9 12
The apply() functions (there are more similar functions not noted here) are extremely handy because explicit loops in R tend to be inefficient.
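As a sketch of the related tools, apply() also accepts a function written on the spot, sapply() does the same job for the elements of a vector, and rowSums() and colMeans() are built-in shortcuts for two common cases. The names m, props, and squares are made up here.

```r
m = matrix(1:6, 2, 3)
## Each column divided by its own total
props = apply(m, 2, function(col) col / sum(col))
colSums(props)                              # 1 1 1
## sapply() applies a function to each element of a vector
squares = sapply(1:3, function(i) i^2)      # 1 4 9
## Shortcuts for common row and column summaries
rowSums(m)                                  # 9 12
colMeans(m)                                 # 1.5 3.5 5.5
```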
Lists are different from arrays in that their elements can be of differing kinds. In fact, they are so flexible, that you can have a list of lists! A list could include:
The function names() displays the named items of a list.
names(airquality)
## [1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
airquality$Temp ## The specific item called Temp in the list airquality
## [1] 67 72 74 62 56 66 65 59 61 69 74 69 66 68 58 64 66 57 68 62 59 73 61
## [24] 61 57 58 57 67 81 79 76 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79
## [47] 77 72 65 73 76 77 76 76 76 75 78 73 80 77 83 84 85 81 84 83 83 88 92
## [70] 92 89 82 73 81 91 80 81 82 84 87 85 74 81 82 86 85 82 86 88 86 83 81
## [93] 81 81 82 86 85 87 89 90 90 92 86 86 82 80 79 77 79 76 78 78 77 72 75
## [116] 79 81 86 88 97 94 96 94 91 92 93 93 87 84 80 78 75 73 81 76 77 71 71
## [139] 78 67 76 68 82 64 71 81 69 63 70 77 75 76 68
If the list items are not named, we must use double brackets to extract them. For example, L[[4]] extracts the 4th item from list L and L[[4]]='b' replaces the 4th item with the character string 'b'. Lists are similar to vectors in that we can add to them with no regard for size.
## First, create an empty list
L = list()
## Inserts a vector into the list & names it Coefficients
L$Coefficients = c(1,4,6,8)
## Extract the names of L
names(L)
## [1] "Coefficients"
## Inserting a vector into the 4th position. Empty elements are set to NULL
L[[4]] = c(5,8,9)
L
## $Coefficients
## [1] 1 4 6 8
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## [1] 5 8 9
names(L)
## [1] "Coefficients" "" "" ""
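The difference between single and double brackets on a list can be sketched with a made-up list lst: single brackets return a smaller list, while double brackets and $ return the item itself.

```r
lst = list(a = 1:3, b = "text")
## Single brackets keep the list wrapper
class(lst["a"])      # "list"
length(lst["a"])     # 1
## Double brackets (or $) extract the item itself
lst[["a"]]           # 1 2 3
lst$b                # "text"
## Only the extracted item can be used directly in arithmetic
sum(lst[["a"]])      # 6
```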
You can easily create a list using objects in your workspace:
k = c(1,4,6,8,10)
v = 64
L = list(coefficients=k,variance=v)
L
## $coefficients
## [1] 1 4 6 8 10
##
## $variance
## [1] 64
Most of R’s objects are actually lists.
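A quick way to check this, sketched with the built-in airquality data used above and a linear model fit (the model itself is only an illustration, not part of the notes): both a data frame and the result of lm() pass is.list(), and their pieces are extracted with $ like any other list.

```r
## A data frame is a list of equal-length columns
is.list(airquality)        # TRUE
## A fitted model is a list as well
fit = lm(Temp ~ Wind, data = airquality)
is.list(fit)               # TRUE
names(fit)                 # includes "coefficients" and "residuals"
## Extract one item, exactly as with any named list
fit$coefficients
```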