Character Strings:

  1. A character string is text enclosed in double or single quotes; "x-axis" and 'x-axis' are equivalent.
  2. paste() concatenates character strings. Any numbers are cast as characters by paste(). Arguments are separated by a space unless indicated otherwise using the sep= (separator) argument.
paste('Georgia',"Bulldogs")
## [1] "Georgia Bulldogs"
y=2
paste("The value of y is",y)
## [1] "The value of y is 2"
labs = paste(c('X','Y'),1:10,sep='')
labs
##  [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9"  "Y10"
labs2 = paste('X','Y',1:10,sep='')
labs2
##  [1] "XY1"  "XY2"  "XY3"  "XY4"  "XY5"  "XY6"  "XY7"  "XY8"  "XY9"  "XY10"
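
As a small sketch of the sep= behavior described above (the strings and the name q_labs are made up for illustration):

```r
# Illustrative values only; none of these come from the notes above.
dash <- paste("a", "b", sep = "-")     # custom separator
trials <- paste("Trial", 1:3)          # default separator is a single space
q_labs <- paste("Q", 1:4, sep = "")    # numbers are cast to character

print(dash)
print(trials)
print(q_labs)
is.character(q_labs)                   # the result is always a character vector
```

The shorter argument is recycled, which is why "Trial" and "Q" are repeated for every number.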

Logical Mode:

Logical vector elements take the values TRUE (T), FALSE (F), and NA. These vectors are generated by conditions, for instance temp = x>14 (where x is a numeric vector). Elements meeting the condition are set to TRUE, and those that do not are set to FALSE.

Potential logical operators include (disregard the colon):

  1. <, >, <=, >=: same as in SAS
  2. ==: strict equality
  3. !=: not equal to
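  4. &, &&: and (& works element-wise on vectors; && compares single values)
  5. |, ||: or (| works element-wise on vectors; || compares single values)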
  6. !: not

If used in arithmetic, FALSE is cast as a 0 and TRUE is cast as a 1.

x=c(1,1,2,3,4,1,7,1)
## When is x equal to 1?
x==1
## [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE
## Count the number of 1's in x
sum(x==1)
## [1] 4
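
Because TRUE counts as 1, the same idea gives proportions as well as counts. A minimal sketch reusing the vector x from above (the names prop_ones and n_big are made up for illustration):

```r
x <- c(1, 1, 2, 3, 4, 1, 7, 1)

# mean() of a logical vector is the proportion of TRUE values
prop_ones <- mean(x == 1)
print(prop_ones)   # 4 of the 8 elements equal 1

# sum() of a logical vector counts the TRUE values
n_big <- sum(x > 2)
print(n_big)       # the elements 3, 4, and 7
```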

Conversely, a numeric value of 0 converts to FALSE and all other values convert to TRUE.

x=0 # FALSE
y=17 # TRUE
x&&y # TRUE if both are TRUE
## [1] FALSE
x||y # TRUE if at least one is TRUE
## [1] TRUE
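
The example above compares single values. For whole vectors, the single-character forms & and | work element by element. A short sketch with made-up vectors a and b:

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

both <- a & b        # element-wise "and"
either <- a | b      # element-wise "or"
print(both)
print(either)

# && and || are meant for single values, e.g. inside if() conditions
single <- a[1] && b[2]
print(single)
```

Recent versions of R raise an error when && or || is given a vector longer than one, so & and | are the safer choice for vector conditions.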

Changing Modes:

Functions for changing the mode of an object:

  1. as.logical(): makes all elements Boolean (logical)
  2. as.numeric(): makes all elements numeric
  3. as.character(): makes all elements character string

If an element cannot be transformed, it will be set to NA.

x=c(T,F)
y=c(0,1)
z=c('a','b')
as.numeric(x)
## [1] 1 0
as.character(x)
## [1] "TRUE"  "FALSE"
as.logical(y)
## [1] FALSE  TRUE
as.character(y)
## [1] "0" "1"
as.logical(z)
## [1] NA NA
as.numeric(z)
## Warning: NAs introduced by coercion
## [1] NA NA

Now try on your own with z=c('a','1').

Factor Class:

This allows for the creation of indicator vectors. factor(x) creates levels based on the values in x.

x=c(1,1,1,1,2,2,2,2,9,9,9,9)
factor(x)
##  [1] 1 1 1 1 2 2 2 2 9 9 9 9
## Levels: 1 2 9

This bit of code creates an unordered factor with levels 1, 2, and 9 and treats them like treatment levels.

x=c('like','dislike','hate','like',"don't know", 'like','dislike')
factor(x,levels=c('hate','dislike',"don't know",'like'),ordered=T)
## [1] like       dislike    hate       like       don't know like      
## [7] dislike   
## Levels: hate < dislike < don't know < like

This bit of code will assume the ordering of the factor levels as given. If you omit levels, R will order them alphabetically. If you leave one or more of the values out of levels, R will set the matching elements to NA. Try these.
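
One possible way to check both cases, reusing the vector x from above (the names f_default and f_partial are made up for illustration):

```r
x <- c('like', 'dislike', 'hate', 'like', "don't know", 'like', 'dislike')

# No levels given: R sorts the distinct values alphabetically
f_default <- factor(x)
print(levels(f_default))

# "don't know" left out of levels: that element becomes NA
f_partial <- factor(x, levels = c('hate', 'dislike', 'like'))
print(f_partial)
print(which(is.na(f_partial)))   # position of the element that was dropped
```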

Indexing Vectors:

These are subsets of elements of vectors.

  1. Logical Index: This index will keep the elements that are TRUE and omit those that are FALSE.
x = c(1:3,NA,4)
x
## [1]  1  2  3 NA  4
l=c(T,T,T,F,T)
## Selecting the non-missing values of x
x[l]
## [1] 1 2 3 4
  2. Positive Integers: Each index value must be in the set {1,…,length(x)}.
x=1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x[5]
## [1] 5
x[2:7]
## [1] 2 3 4 5 6 7
x[c(9,1,1,4)]
## [1] 9 1 1 4
c("x","y")[rep(c(1,1,2,2),times=2)]
## [1] "x" "x" "y" "y" "x" "x" "y" "y"
  3. Negative Integers: These exclude the elements at the given positions.
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x[c(-2,-5)]
## [1]  1  3  4  6  7  8  9 10
  4. Character Index: This can only be used when a name attribute identifies components.
fruit=c(5,10,1,20)
names(fruit) = c('orange','banana','apple','peach')
fruit
## orange banana  apple  peach 
##      5     10      1     20
fruit[c('apple','orange')]
##  apple orange 
##      1      5
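  5. Also: a logical index shorter than the vector is recycled, and indexed elements can be assigned new values.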
x[c(T,T,T,F,T)]
## [1]  1  2  3  5  6  7  8 10
x[c(T,T,T,F,T)]=0
y
## [1] 0 1
y=y-2
y
## [1] -2 -1
y[y<0] = -y[y<0] # This is equivalent to an absolute value
y
## [1] 2 1
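
In practice the logical index is usually built from a condition rather than typed by hand. A brief sketch (the vector and the cutoffs are arbitrary choices for illustration):

```r
x <- 1:10

# Keep the elements strictly between 3 and 8
middle <- x[x > 3 & x < 8]
print(middle)

# Negative indices can also be given as a range: drop the first three elements
tail_part <- x[-(1:3)]
print(tail_part)
```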

Generating Sequences:

1:30 is equivalent to c(1,2,...,30). The : operator has high priority in an expression, meaning that it is performed before arithmetic such as * and -!

2*1:15
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30
n=10
1:n-1
##  [1] 0 1 2 3 4 5 6 7 8 9
1:(n-1)
## [1] 1 2 3 4 5 6 7 8 9

30:1 generates the sequence backwards. The function seq() is a more general way to generate sequences.

seq(10)
##  [1]  1  2  3  4  5  6  7  8  9 10
## Start at -5 and stop at 5 by increments of 0.2
seq(from=-5,to=5,by=0.2)
##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4
## [15] -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4
## [29]  0.6  0.8  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2
## [43]  3.4  3.6  3.8  4.0  4.2  4.4  4.6  4.8  5.0
## Another way to generate the same sequence
seq(length.out=51,from=-5,by=0.2)
##  [1] -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4
## [15] -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4
## [29]  0.6  0.8  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0  3.2
## [43]  3.4  3.6  3.8  4.0  4.2  4.4  4.6  4.8  5.0

Similarly, we have rep() (replicate).

x=1:10
rep(x,times=5)
##  [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2  3
## [24]  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6
## [47]  7  8  9 10
rep(x,each=5)
##  [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5
## [24]  5  5  6  6  6  6  6  7  7  7  7  7  8  8  8  8  8  9  9  9  9  9 10
## [47] 10 10 10 10
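
A few more combinations of seq() and rep(), as a sketch with arbitrary values. Note that length.out is the full name of the argument that can be abbreviated as length:

```r
# Two ways to describe the same sequence
s1 <- seq(from = 0, to = 1, by = 0.25)
s2 <- seq(from = 0, by = 0.25, length.out = 5)
print(all.equal(s1, s2))

# times may be a vector: one repeat count per element
r1 <- rep(c("a", "b"), times = c(3, 1))
print(r1)

# each and times can be combined: each is applied first
r2 <- rep(1:2, each = 2, times = 2)
print(r2)
```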

Missing Values:

NA = not available or missing. In general, an operation on data containing an NA returns NA. Most statistical functions have an option to ignore NAs.

mean(x)
## [1] 5.5
x[4] = NA
mean(x)
## [1] NA
mean(x,na.rm=T)
## [1] 5.666667
  1. is.na(): returns a logical vector that is TRUE where the argument is NA, and FALSE otherwise.
z=c(1:3,NA)
z
## [1]  1  2  3 NA
is.na(z)
## [1] FALSE FALSE FALSE  TRUE

The expression x==NA does not work because NA is not a value; it is a marker. The comparison returns NA for every element.

  2. NaN: Not a (representable) Number. Values too large to represent become Inf, and values too large in the negative direction become -Inf.
0/0
## [1] NaN
Inf-Inf
## [1] NaN

is.na() returns TRUE for both NA and NaN. is.nan() returns TRUE for only NaN (FALSE for NA).
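
The points above can be checked side by side. A minimal sketch with a made-up vector z:

```r
z <- c(1, NA, 3, NaN)

print(z == NA)      # every comparison with NA gives NA
print(is.na(z))     # TRUE for both NA and NaN
print(is.nan(z))    # TRUE only for NaN

# Use is.na() as a logical index to keep the non-missing values
clean <- z[!is.na(z)]
print(clean)

# Count the missing values
n_missing <- sum(is.na(z))
print(n_missing)
```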

ASSIGNMENT 8

Objects

SAS is to SAS datasets as R is to objects.

R objects include vectors (“atomic” = components are all the same type), lists, arrays, matrices, tables, and dataframes.

is() returns the class of a named R object, along with the classes it extends. We can recast an object (change its type). This will affect the behavior of R functions. Objects have modes (numeric, character, logical, etc.), just like vectors do. The mode can also be recast.

Properties

Properties of objects can be observed with these common functions. Their behaviors will depend on the object type.

  1. length(object)
  2. attributes(object): lists all attributes, e.g. names, dim, or class
  3. attr(object,name): selects a specific attribute
## Change a vector to a matrix
z=1:25
is(z)
## [1] "integer"             "numeric"             "vector"             
## [4] "data.frameRowLabels"
attr(z,'dim') = c(5,5)
z
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25
is(z)
## [1] "matrix"    "array"     "structure" "vector"
  4. class(): numeric, logical, character, matrix, array, factor, list, data.frame, etc. An object’s class will affect how functions (e.g. plot() and summary()) process the object.
age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
village=data.frame(age=age,Var_Name2=height)
print(village)
##    age Var_Name2
## 1   18      76.1
## 2   19      77.0
## 3   20      78.1
## 4   21      78.2
## 5   22      78.8
## 6   23      79.7
## 7   24      79.9
## 8   25      81.1
## 9   26      81.2
## 10  27      81.8
## 11  28      82.8
## 12  29      83.5
plot(village)

See what happens with these functions: is(), length(), class(), attributes(), and summary(). unclass() prints the dataframe as its base parts.
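
To see how class changes what a function does, here is a small sketch that summarizes the same values twice, once as numbers and once as a factor (the vector scores is made up for illustration):

```r
scores <- c(1, 2, 2, 3, 3, 3)

print(class(scores))       # "numeric"
print(summary(scores))     # minimum, quartiles, mean, maximum

scores_f <- factor(scores)
print(class(scores_f))     # "factor"
print(summary(scores_f))   # a count for each level instead
```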

Vectors

Vectors are the bottom of the object hierarchy. A single value is still considered a vector! Vectors can be extended without special considerations like compatible size.

a=c(4,6,8)
a[5] = 9
## Automatically inserts NA at position 4
a
## [1]  4  6  8 NA  9

Arrays

This is a data structure of all one type. Vectors and matrices are special types of arrays. Caution: A 10 by 1 matrix is NOT a vector of length 10! These objects are different! Matrices can be built in many ways:

  1. From a Vector: matrix(vector,nrow=n,ncol=p) creates an \(n\times p\) matrix by filling columns from left to right.
matrix(1:6,nrow=2,ncol=3)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
  2. From Two or More Vectors: cbind() binds the vectors together as columns and rbind() binds them together as rows.
x=11:13
y=c(55,33,12)
rbind(x,y) ## Creates a 2 by 3 matrix
##   [,1] [,2] [,3]
## x   11   12   13
## y   55   33   12
cbind(x,y) ## Creates a 3 by 2 matrix
##       x  y
## [1,] 11 55
## [2,] 12 33
## [3,] 13 12
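
On the earlier caution that a 10 by 1 matrix is not a vector of length 10, the difference shows up in the dim attribute. A minimal sketch (the names v and m are made up for illustration):

```r
v <- 1:10
m <- matrix(v, nrow = 10, ncol = 1)

print(dim(v))         # NULL: a plain vector has no dimensions
print(dim(m))         # 10 rows and 1 column
print(is.matrix(v))
print(is.matrix(m))

# as.vector() drops the dimensions again
back <- as.vector(m)
print(identical(back, v))
```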

Typically, mathematical functions are performed element-wise. But there are matrix-specific functions:

  • det(): matrix determinant
  • t(): matrix transpose
  • solve(): matrix inverse
  • %*%: matrix multiplication
  • dim(): returns the dimensions of the matrix
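
A short sketch contrasting element-wise arithmetic with the matrix functions listed above (the matrices A and B are arbitrary examples):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # rows are (2, 1) and (1, 3)
B <- matrix(1:6, nrow = 2, ncol = 3)

elementwise <- A * A      # squares each entry
product <- A %*% A        # true matrix multiplication
print(elementwise)
print(product)

print(det(A))             # 2*3 - 1*1
A_inv <- solve(A)
print(A %*% A_inv)        # identity matrix, up to rounding error

print(dim(B))             # 2 by 3
print(dim(t(B)))          # transpose is 3 by 2
```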

Matrix elements can be referenced by specifying the row and column numbers.

print(z)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25
z[,3] ## The third column of z
## [1] 11 12 13 14 15
z[1,] ## The first row of z
## [1]  1  6 11 16 21
z[5,3] = 13 ## Replaces the element in the 5th row & 3rd column with 13
z[1,] = c(2,2,3,4,5) ## replaces the 1st row
print(z)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    2    2    3    4    5
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   13   20   25

Clearly, dimensions in the appropriate directions must line up. Arrays may be multi-dimensional. For example, an array may be \(3 \times 4\times 2\) and viewed as a stack of two matrices that are \(3\times4\). Or, if you need to index space and time, your array needs to be 4-dimensional: (x,y,z,t).

a=matrix(8,2,3) ## Creates a 2 by 3 matrix of 8's
b=matrix(9,2,3) ## Creates a 2 by 3 matrix of 9's
array(c(a,b),c(2,3,2))
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    8    8    8
## [2,]    8    8    8
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    9    9    9
## [2,]    9    9    9

The first argument in array(c(a,b),c(2,3,2)), namely c(a,b), indicates the data used to fill the array. The second argument, c(2,3,2), is the dimension attribute: a vector giving the max indices in each direction.
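
Indexing a multi-dimensional array works like matrix indexing with one more position. A sketch using the \(3 \times 4\times 2\) shape mentioned earlier (the fill values 1:24 are arbitrary):

```r
arr <- array(1:24, dim = c(3, 4, 2))
print(dim(arr))

# Leave a position empty to keep everything in that direction
slice2 <- arr[, , 2]     # the second 3 by 4 matrix in the stack
print(slice2)

# A single element: row 2, column 3, first matrix
one <- arr[2, 3, 1]
print(one)
```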

apply() helps you use arrays. This applies a given function to each row (1st dimension), column (2nd dimension), or level of a higher dimension.

a = matrix(1:6,2,3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
apply(a,2,max) ## Finds the maximum of each column
## [1] 2 4 6
apply(a,1,sum) ## Row sums of a
## [1]  9 12

The apply() functions (there are more similar functions not noted) are extremely handy because explicit loops in R are often slower and wordier than the equivalent apply() call.
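
apply() also accepts a function you write yourself. A brief sketch reusing the matrix a from above; sapply() is one of the similar functions, shown here only as an illustration:

```r
a <- matrix(1:6, 2, 3)

# Range (max minus min) of each row
row_range <- apply(a, 1, function(r) max(r) - min(r))
print(row_range)

# Rescale each column so that it sums to 1; the result is again 2 by 3
col_props <- apply(a, 2, function(col) col / sum(col))
print(col_props)

# sapply() applies a function to each element of a vector
squares <- sapply(1:3, function(i) i^2)
print(squares)
```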

Lists

Lists are different from arrays in that their elements can be of differing kinds. In fact, they are so flexible, that you can have a list of lists! A list could include:

  • A vector of coefficients
  • Var-cov matrix
  • List of descriptive character strings

The function names() displays the named items of a list.

names(airquality)
## [1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"
airquality$Temp ## The specific item called Temp in the list airquality
##   [1] 67 72 74 62 56 66 65 59 61 69 74 69 66 68 58 64 66 57 68 62 59 73 61
##  [24] 61 57 58 57 67 81 79 76 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79
##  [47] 77 72 65 73 76 77 76 76 76 75 78 73 80 77 83 84 85 81 84 83 83 88 92
##  [70] 92 89 82 73 81 91 80 81 82 84 87 85 74 81 82 86 85 82 86 88 86 83 81
##  [93] 81 81 82 86 85 87 89 90 90 92 86 86 82 80 79 77 79 76 78 78 77 72 75
## [116] 79 81 86 88 97 94 96 94 91 92 93 93 87 84 80 78 75 73 81 76 77 71 71
## [139] 78 67 76 68 82 64 71 81 69 63 70 77 75 76 68

If the list items are not named, we must use double-brackets to extract. For example, L[[4]] extracts the 4th item from list L and L[[4]]='b' replaces the 4th item with the character string 'b'. Lists are similar to vectors in that we can add to them with no regard for size.

## First, create an empty list
L = list()
## Inserts a vector into the list & names it Coefficients
L$Coefficients = c(1,4,6,8)
## Extract the names of L
names(L)
## [1] "Coefficients"
## Inserting a vector into the 4th position. Empty elements are set to NULL
L[[4]] = c(5,8,9)
L
## $Coefficients
## [1] 1 4 6 8
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## [1] 5 8 9
names(L)
## [1] "Coefficients" ""             ""             ""
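
Single and double brackets behave differently on lists: single brackets return a shorter list, while double brackets (or $) return the item itself. A minimal sketch with a made-up two-item list:

```r
L <- list(Coefficients = c(1, 4, 6, 8), c(5, 8, 9))

sub_list <- L[1]      # still a list, of length 1
item <- L[[1]]        # the numeric vector itself
print(class(sub_list))
print(class(item))

# Once extracted, the item can be indexed like any vector
print(L$Coefficients[2])
print(L[[2]][3])
```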

You can easily create a list using objects in your workspace:

k = c(1,4,6,8,10)
v = 64
L = list(coefficients=k,variance=v)
L
## $coefficients
## [1]  1  4  6  8 10
## 
## $variance
## [1] 64

Many of R’s objects, including data frames, are actually lists.
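
This can be checked with is.list(). The sketch below uses the built-in airquality data; the call to lm() and the model Temp ~ Wind are assumptions chosen only to produce a fitted-model object to inspect:

```r
# A data frame is a list of equal-length columns
print(is.list(airquality))

# A fitted model (illustrative choice of model) is also stored as a list
fit <- lm(Temp ~ Wind, data = airquality)
print(is.list(fit))
print(names(fit))          # the named items of the list
print(fit$coefficients)    # extracted with $ like any other list item
```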