Lecture 2 Notes

Harold Nelson

March 23, 2016

R Data Types

R Data Structures

Most of our work will be concerned with vectors and dataframes.

Specifying an individual element of a vector

First create a vector with c(), supply names and examine the vector.

vec <- c(6,7,8,9)
names(vec) <- c("a","b","c","d")
# Simple display
vec
## a b c d 
## 6 7 8 9
# Look at the properties.
str(vec)
##  Named num [1:4] 6 7 8 9
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"

All of the elements of a vector must be of the same type. R lets you assign any atomic value to an element of a vector, but it coerces the whole vector to a common type to enforce this restriction.

vec["a"] <- "six"
# Simple display
vec
##     a     b     c     d 
## "six"   "7"   "8"   "9"
# Look at the properties.
str(vec)
##  Named chr [1:4] "six" "7" "8" "9"
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
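
The coercion follows a fixed ordering: logical values become numeric when mixed with numbers, and anything mixed with a character string becomes character. A minimal sketch, using hypothetical vectors that are not part of the lecture:

```r
# Hypothetical vectors (names are illustrative only) mixing atomic types.
lgl_num <- c(TRUE, 2.5)   # logical + numeric   -> numeric
num_chr <- c(1, "a")      # numeric + character -> character
lgl_chr <- c(TRUE, "a")   # logical + character -> character

class(lgl_num)
class(num_chr)
class(lgl_chr)

# TRUE becomes 1 in a numeric vector and "TRUE" in a character vector.
lgl_num
lgl_chr
```

The same ordering explains why assigning "six" to one element turned every element of vec into a character string.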

Return vec to being numeric.

# Let's try a simple fix: assign a number back to the first element.
# Note that we can use the positional index or the name to refer to an element of a vector.
vec[1] <- 6
# Simple display
vec
##   a   b   c   d 
## "6" "7" "8" "9"
# Look at the properties.
str(vec)
##  Named chr [1:4] "6" "7" "8" "9"
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
# The simple fix failed: the vector is still character. Since these strings all represent numbers, we can manually coerce.
vec <- as.numeric(vec)
# Simple display
vec
## [1] 6 7 8 9
# Look at the properties.
str(vec)
##  num [1:4] 6 7 8 9

Note that when we replaced the entire vector we lost the names, because as.numeric() returns a new vector without the attributes of its argument.
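
If the names matter, one option is to reattach them during the conversion. The sketch below assumes a hypothetical character vector chr_vec and uses setNames() from the stats package, which is loaded by default. It also shows what as.numeric() does with a string that is not a number.

```r
# Hypothetical named character vector (not the lecture's vec).
chr_vec <- c(a = "6", b = "7", c = "8", d = "9")

# Coerce to numeric and put the original names back in one step.
num_vec <- setNames(as.numeric(chr_vec), names(chr_vec))
num_vec
str(num_vec)

# A string that does not represent a number becomes NA (with a warning,
# suppressed here so the example runs quietly).
bad <- suppressWarnings(as.numeric("six"))
is.na(bad)
```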

Now let’s see what happens when we replace a numeric element with a logical value. The vector stays numeric, and FALSE is coerced to 0.

vec[1] <- FALSE
# Simple display
vec
## [1] 0 7 8 9
# Look at the properties.
str(vec)
##  num [1:4] 0 7 8 9

Let’s try to make a logical vector with as.logical(). Zero becomes FALSE and any nonzero number becomes TRUE.

vec <- as.logical(vec)
# Simple display
vec
## [1] FALSE  TRUE  TRUE  TRUE
# Look at the properties.
str(vec)
##  logi [1:4] FALSE TRUE TRUE TRUE
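
The conversion also works in the other direction, which is why arithmetic on logical vectors is useful for counting. A small sketch with hypothetical values:

```r
# Hypothetical numeric input: zero, positive, negative and missing.
flags <- as.logical(c(0, 7, -1, NA))
flags                  # FALSE for zero, TRUE for nonzero, NA stays NA

# In arithmetic, TRUE counts as 1 and FALSE as 0.
hits <- c(FALSE, TRUE, TRUE, TRUE)
sum(hits)              # number of TRUE values
mean(hits)             # proportion of TRUE values
```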

Specifying and using subvectors of a vector

If x is a vector, x[sub-vector specification] defines a sub-vector of x. The specification may be a vector of numbers indicating positions in x, a logical vector, or a vector of names if names have been assigned.

x <- 11:20
names(x) <- c("a","b","c","d","e",
             "f","g","h","i","j")
x
##  a  b  c  d  e  f  g  h  i  j 
## 11 12 13 14 15 16 17 18 19 20
x[c(2,3,4)]
##  b  c  d 
## 12 13 14
x[c("f","b","e")]
##  f  b  e 
## 16 12 15
x[c(2,2,2)]
##  b  b  b 
## 12 12 12
x[7:9]
##  g  h  i 
## 17 18 19
x[-c(7:9)] # "-" means everything but 
##  a  b  c  d  e  f  j 
## 11 12 13 14 15 16 20
x[c(rep(TRUE,4),rep(FALSE,4),rep(TRUE,2))]
##  a  b  c  d  i  j 
## 11 12 13 14 19 20
x[c(TRUE,FALSE)] # Note the recycling
##  a  c  e  g  i 
## 11 13 15 17 19
x > 15 & x <= 18 # Create a logical vector
##     a     b     c     d     e     f     g     h     i     j 
## FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
x[x > 15 & x <= 18] # Use it to specify a subvector
##  f  g  h 
## 16 17 18
x[] # No specification means "everything."
##  a  b  c  d  e  f  g  h  i  j 
## 11 12 13 14 15 16 17 18 19 20
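
The three kinds of specification are interchangeable. As a sketch, which() converts a logical specification into the equivalent positional one; the vector s below is a hypothetical example, not the lecture's x.

```r
# Hypothetical named vector.
s <- c(a = 11, b = 12, c = 13, d = 14)

# which() returns the positions where the condition is TRUE.
idx <- which(s > 12)
idx

# Positional and logical specifications select the same elements.
by_position <- s[idx]
by_logical  <- s[s > 12]
identical(by_position, by_logical)

# Names may be given in any order; a negative position drops an element.
s[c("d", "a")]
s[-1]
```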

Note that a subvector specification may be used in three ways: on its own to display the subvector, on the right side of an assignment to copy it, or on the left side of an assignment to replace it.

x <- 11:20
x[7:9]
## [1] 17 18 19
y <- x[7:9]
y
## [1] 17 18 19
x[7:9] <- 5
x
##  [1] 11 12 13 14 15 16  5  5  5 20
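
Replacement works with any kind of specification, and a shorter replacement value is recycled to fill the selected positions. A brief sketch with a hypothetical vector z:

```r
# Hypothetical integer vector.
z <- 11:20

# A logical specification on the left side replaces only matching elements.
z[z > 15] <- 0L
z

# A replacement shorter than the selection is recycled (1 2 1 2 here).
z[1:4] <- c(1L, 2L)
z
```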

Subsets of a dataframe

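Most of the principles are the same as with vectors, but there are two dimensions rather than one. The two specifications are separated by a comma; in general we have df[Row Spec,Col Spec]. The result is usually a new dataframe, although selecting a single column this way returns a vector by default. To take everything along a dimension, leave its specification empty but keep the comma, as in df[Row Spec, ] or df[ ,Col Spec].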

Recall the dataframe mtcars.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
str(NewDF)
## 'data.frame':    5 obs. of  2 variables:
##  $ cyl: num  4 6 8 6 8
##  $ mpg: num  22.8 21.4 18.7 18.1 14.3
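
An empty specification on either side of the comma selects everything along that dimension. The sketch below also shows that a single selected column is returned as a vector unless drop = FALSE is supplied; the object names are illustrative only.

```r
# Rows 1 to 3, every column: the column specification is left empty.
all_cols <- mtcars[1:3, ]
dim(all_cols)

# Every row, one column: by default the result is a plain numeric vector.
one_col_vec <- mtcars[, "mpg"]
is.data.frame(one_col_vec)

# drop = FALSE keeps the result as a one-column dataframe.
one_col_df <- mtcars[, "mpg", drop = FALSE]
is.data.frame(one_col_df)
```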

Note that we can use specified subsets of a dataframe the same way we used specified subvectors: to display them, on the right side of an assignment, or on the left side of an assignment.

mtcars[3:7,c("cyl","mpg")]
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
NewDF[1:2,"mpg"] <- 100
NewDF
##                   cyl   mpg
## Datsun 710          4 100.0
## Hornet 4 Drive      6 100.0
## Hornet Sportabout   8  18.7
## Valiant             6  18.1
## Duster 360          8  14.3
NewDF[NewDF$mpg > 50,"class"] <- "High MPG"
NewDF # Note that we added a new column and it has NA values where we supplied nothing.
##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7     <NA>
## Valiant             6  18.1     <NA>
## Duster 360          8  14.3     <NA>
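
A logical row specification such as NewDF$mpg > 50 works equally well for display. As a sketch on the full mtcars data, with thresholds chosen only for illustration:

```r
# Rows where both conditions hold, restricted to two columns.
# The cutoffs (4 cylinders, more than 30 mpg) are arbitrary choices.
efficient <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "cyl")]
efficient

# Every selected row satisfies the conditions.
all(efficient$mpg > 30)
all(efficient$cyl == 4)
```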

Dealing with NA values

Let’s fix the NA values. Look at the problem rows first.

NewDF[is.na(NewDF$class),]
##                   cyl  mpg class
## Hornet Sportabout   8 18.7  <NA>
## Valiant             6 18.1  <NA>
## Duster 360          8 14.3  <NA>
# Replace the NA values
NewDF[is.na(NewDF$class),"class"] <- "Low MPG"
NewDF
##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7  Low MPG
## Valiant             6  18.1  Low MPG
## Duster 360          8  14.3  Low MPG
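
When both labels are known in advance, the NA values can be avoided altogether by filling the column in one step. The sketch below uses ifelse() on a fresh copy of the same rows; the name demo and the cutoff of 20 are assumptions made for this example, not part of the lecture.

```r
# Fresh copy of the rows used above (hypothetical name).
demo <- mtcars[3:7, c("cyl", "mpg")]

# ifelse() returns the first label where the test is TRUE, the second otherwise.
demo$class <- ifelse(demo$mpg > 20, "High MPG", "Low MPG")
demo

# Confirm that no NA values remain in the new column.
anyNA(demo$class)
```

The two-step approach in the notes is still useful when a column arrives with NA values already in it.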