Lecture 2 Notes

March 23, 2016

R Data Types

• Numeric
• Character
• Logical
• Factor
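The sketch below illustrates the four types listed above. It creates one value of each type and checks it with class(). The variable names are arbitrary and chosen only for this example.

```r
# One value of each basic type (names are illustrative only)
num_val <- 3.14                              # Numeric (double precision)
chr_val <- "hello"                           # Character
lgl_val <- TRUE                              # Logical
fct_val <- factor(c("low", "high", "low"))   # Factor

class(num_val)   # "numeric"
class(chr_val)   # "character"
class(lgl_val)   # "logical"
class(fct_val)   # "factor"

# A factor stores its categories as levels (sorted alphabetically by default)
# and each element as an integer code pointing at one of those levels
levels(fct_val)      # "high" "low"
as.integer(fct_val)  # 2 1 2
```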

R Data Structures

• Vector
• Matrix
• Dataframe
• List
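Here is a minimal sketch that builds one of each structure listed above. The object names and values are made up for illustration.

```r
v  <- c(1, 2, 3)                        # Vector: one type, one dimension
m  <- matrix(1:6, nrow = 2)             # Matrix: one type, two dimensions
df <- data.frame(id = 1:3,              # Dataframe: columns may differ in type
                 name = c("x", "y", "z"))
l  <- list(v, m, df)                    # List: elements can be anything

dim(m)       # 2 3  (rows, columns)
dim(df)      # 3 2
length(l)    # 3
is.list(df)  # TRUE: a dataframe is a list of equal-length columns
```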

Most of our work will be concerned with vectors and dataframes.

Specifying an individual element of a vector

First create a vector with c(), supply names and examine the vector.

vec <- c(6,7,8,9)
names(vec) <- c("a","b","c","d")
# Simple display
vec
## a b c d
## 6 7 8 9
# Look at the properties.
str(vec)
##  Named num [1:4] 6 7 8 9
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
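The same named vector can also be built in a single step, either by supplying names inside c() or by using setNames(). This sketch uses only base R. The names vec2 and vec3 are illustrative and were chosen so that the running vec is left untouched.

```r
# Supply the names inside c() ...
vec2 <- c(a = 6, b = 7, c = 8, d = 9)
# ... or attach them with setNames()
vec3 <- setNames(c(6, 7, 8, 9), c("a", "b", "c", "d"))

identical(vec2, vec3)  # TRUE
vec2["c"]              # element selected by name
vec2[3]                # the same element selected by position
```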

All of the elements of a vector must be of the same type. R will let you put any atomic value into a vector, but it enforces this restriction by coercing the entire vector to a single common type.

vec["a"] = "six"
# Simple display
vec
##     a     b     c     d
## "six"   "7"   "8"   "9"
# Look at the properties.
str(vec)
##  Named chr [1:4] "six" "7" "8" "9"
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
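The coercion follows a fixed order: logical, then integer, then double (numeric), then character. When values of different types are combined, the result takes the most flexible type among them. A small base-R sketch:

```r
# logical < integer < double (numeric) < character
class(c(TRUE, FALSE))  # "logical"
class(c(TRUE, 2L))     # "integer"   (TRUE becomes 1L)
class(c(2L, 3.5))      # "numeric"
class(c(3.5, "a"))     # "character"

c(TRUE, 2L)            # 1 2
c(FALSE, 7)            # 0 7
c(1, "a")              # "1" "a"
```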

Return vec to being numeric.

# Let's try a simple fix. Note that we can use the positional index or the name to refer to an element of a vector.
vec[1] <- 6
# Simple display
vec
##   a   b   c   d
## "6" "7" "8" "9"
# Look at the properties.
str(vec)
##  Named chr [1:4] "6" "7" "8" "9"
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
# Since these character strings are all numbers, we can manually coerce.
vec <- as.numeric(vec)
# Simple display
vec
## [1] 6 7 8 9
# Look at the properties.
str(vec)
##  num [1:4] 6 7 8 9

Note that as.numeric() drops attributes, so when we replaced the entire vector with its result we lost the names.
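To keep the names, one approach is to save them before converting and reassign them afterward. This is a sketch, not the only way to do it. The names chr_vec, saved_names, and num_vec are hypothetical and were chosen so that the running vec is not changed.

```r
chr_vec <- c(a = "6", b = "7", c = "8", d = "9")  # character version, as above
saved_names <- names(chr_vec)     # remember the names
num_vec <- as.numeric(chr_vec)    # as.numeric() drops the names here
names(num_vec) <- saved_names     # put them back
str(num_vec)
```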

Now let’s see what happens when we replace a numeric element with a logical value.

vec[1] <- FALSE
# Simple display
vec
## [1] 0 7 8 9
# Look at the properties.
str(vec)
##  num [1:4] 0 7 8 9

Now let’s try converting vec to a logical vector. as.logical() maps 0 to FALSE and any nonzero number to TRUE.

vec <- as.logical(vec)
# Simple display
vec
## [1] FALSE  TRUE  TRUE  TRUE
# Look at the properties.
str(vec)
##  logi [1:4] FALSE TRUE TRUE TRUE
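Conversion to logical follows a few simple rules:

- 0 becomes FALSE.
- Any other number becomes TRUE.
- NA stays NA.
- Only a few strings are recognized, such as "TRUE", "true", "T", "FALSE", "false" and "F". Any other string becomes NA.

The sketch below checks these cases. It also shows the reverse coercion, which lets sum() count the TRUE values.

```r
as.logical(c(0, 1, -2.5, NA))                # FALSE TRUE TRUE NA
as.logical(c("TRUE", "false", "T", "yes"))   # TRUE FALSE TRUE NA

# Going the other way, TRUE/FALSE become 1/0, so sum() counts the TRUEs
sum(c(FALSE, TRUE, TRUE, TRUE))              # 3
```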

Specifying and using subvectors of a vector

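If x is a vector, x[sub-vector specification] defines a subvector of x. The specification may be any of the following:

- a vector of positive integers giving positions in x
- negative integers giving positions to leave out
- a logical vector
- a character vector of names (if names have been assigned)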

x <- 11:20
names(x) = c("a","b","c","d","e",
"f","g","h","i","j")
x
##  a  b  c  d  e  f  g  h  i  j
## 11 12 13 14 15 16 17 18 19 20
x[c(2,3,4)]
##  b  c  d
## 12 13 14
x[c("f","b","e")]
##  f  b  e
## 16 12 15
x[c(2,2,2)]
##  b  b  b
## 12 12 12
x[7:9]
##  g  h  i
## 17 18 19
x[-c(7:9)] # "-" means everything but 
##  a  b  c  d  e  f  j
## 11 12 13 14 15 16 20
x[c(rep(TRUE,4),rep(FALSE,4),rep(TRUE,2))]
##  a  b  c  d  i  j
## 11 12 13 14 19 20
x[c(TRUE,FALSE)] # Note the recycling
##  a  c  e  g  i
## 11 13 15 17 19
x > 15 & x <= 18 # Create a logical vector
##     a     b     c     d     e     f     g     h     i     j
## FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
x[x > 15 & x <= 18] # Use it to specify a subvector
##  f  g  h
## 16 17 18
x[] # No specification means "everything."
##  a  b  c  d  e  f  g  h  i  j
## 11 12 13 14 15 16 17 18 19 20
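A logical specification and a positional one are closely related, because which() returns the positions where a logical vector is TRUE. This sketch uses the same x as above. It checks that both specifications select the same subvector. It also shows that asking for a position past the end returns NA rather than an error.

```r
x <- 11:20
names(x) <- letters[1:10]       # same names "a" through "j" as above

keep <- x > 15 & x <= 18        # logical specification
which(keep)                     # positions where keep is TRUE: f = 6, g = 7, h = 8
x[which(keep)]                  # same subvector as x[keep]
x[12]                           # beyond the end of x: NA
```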

Note that a subvector specification can be used in three ways:

- to display the subvector
- on the right side of an assignment, to extract the elements into a new object
- on the left side of an assignment, to replace those elements

x <- 11:20
x[7:9]
## [1] 17 18 19
y <- x[7:9]
y
## [1] 17 18 19
x[7:9] <- 5
x
##  [1] 11 12 13 14 15 16  5  5  5 20
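Replacement on the left side also works with logical specifications and with a vector of replacement values. In this sketch the values are chosen only for illustration. The conditional replacement is done first so that the 100 and 200 inserted afterward are not caught by x > 15.

```r
x <- 11:20
x[x > 15] <- 0              # every element greater than 15 becomes 0
x[c(1, 3)] <- c(100, 200)   # position 1 gets 100, position 3 gets 200
x
# 100 12 200 14 15 0 0 0 0 0
```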

Subsets of a dataframe

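Most of the same principles apply as with vectors, but there are two dimensions rather than one. In general we write df[Row Spec, Col Spec], with the two specifications separated by a comma.

To select every row (or every column), leave that specification blank but keep the comma, as in df[ , "mpg"] or df[1:3, ].

The result is usually a new dataframe. However, if the column specification picks out a single column, R simplifies the result to a vector unless you add drop = FALSE.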
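The blank-dimension and single-column rules can be checked with the built-in mtcars dataframe. Here is a short sketch:

```r
mtcars[1:3, ]                           # rows 1-3, all columns (a dataframe)
head(mtcars[ , "mpg"])                  # one column: simplified to a vector
head(mtcars[ , "mpg", drop = FALSE])    # drop = FALSE keeps a dataframe
mtcars[mtcars$cyl == 6, c("mpg", "hp")] # logical row spec, named column spec
```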

Recall the dataframe mtcars.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
str(NewDF)
## 'data.frame':    5 obs. of  2 variables:
##  $ cyl: num  4 6 8 6 8
##  $ mpg: num  22.8 21.4 18.7 18.1 14.3

Note that we can use specified subsets of a dataframe the same way we used specified subvectors.

mtcars[3:7,c("cyl","mpg")]
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
NewDF[1:2,"mpg"] <- 100
NewDF
##                   cyl   mpg
## Datsun 710          4 100.0
## Hornet 4 Drive      6 100.0
## Hornet Sportabout   8  18.7
## Valiant             6  18.1
## Duster 360          8  14.3
NewDF[NewDF$mpg > 50,"class"] <- "High MPG"
NewDF # Note that we added a new column and it has NA values where we supplied nothing.
##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7     <NA>
## Valiant             6  18.1     <NA>
## Duster 360          8  14.3     <NA>

Let’s fix the NA values. Look at the problem rows first.

NewDF[is.na(NewDF$class),]
##                   cyl  mpg class
## Hornet Sportabout   8 18.7  <NA>
## Valiant             6 18.1  <NA>
## Duster 360          8 14.3  <NA>
# Replace the NA values
NewDF[is.na(NewDF$class),"class"] <- "Low MPG"
NewDF
##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7  Low MPG
## Valiant             6  18.1  Low MPG
## Duster 360          8  14.3  Low MPG
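The two-step approach above can also be written as a single assignment with ifelse(). In the two-step version we first assign "High MPG" and then fill in the NA rows. The ifelse() version is an alternative sketch, not the method used above. NewDF2 is a hypothetical name chosen so that NewDF is left as it is.

```r
NewDF2 <- mtcars[3:7, c("cyl", "mpg")]
NewDF2[1:2, "mpg"] <- 100
# One vectorized test labels every row, so no NA values are left to fix
NewDF2$class <- ifelse(NewDF2$mpg > 50, "High MPG", "Low MPG")
NewDF2
```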