Lecture 2 Notes

Harold Nelson

March 23, 2016

R Data Types

Numeric
Character
Logical
Factor

R Data Structures

Vector
Matrix
Dataframe
List

Most of our work will be concerned with vectors and dataframes.

Specifying an individual element of a vector

First create a vector with c(), supply names and examine the vector.

vec <- c(6,7,8,9)
names(vec) <- c("a","b","c","d")
# Simple display
vec

## a b c d 
## 6 7 8 9

# Look at the properties.
str(vec)

##  Named num [1:4] 6 7 8 9
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"

All of the elements of a vector must be of the same type. R will let you assign a value of any atomic type to an element, but it then coerces the whole vector to a common type to enforce this restriction.

vec["a"] <- "six"
# Simple display
vec

##     a     b     c     d 
## "six"   "7"   "8"   "9"

# Look at the properties.
str(vec)

##  Named chr [1:4] "six" "7" "8" "9"
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"

Now try to return vec to being numeric.

# Let's try a simple fix: assign the number 6 back. Note that we can use either the positional index or the name to refer to an element of a vector.
vec[1] <- 6
# Simple display
vec

##   a   b   c   d 
## "6" "7" "8" "9"

# Look at the properties.
str(vec)

##  Named chr [1:4] "6" "7" "8" "9"
##  - attr(*, "names")= chr [1:4] "a" "b" "c" "d"

# The simple fix failed: the vector is still character, so the 6 was coerced to "6". Since these character strings all represent numbers, we can coerce manually.

vec <- as.numeric(vec)
# Simple display
vec

## [1] 6 7 8 9

# Look at the properties.
str(vec)

##  num [1:4] 6 7 8 9

Note that as.numeric() returned a new vector without the names attribute, so when we replaced the entire vector we lost the names.
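
One way to keep the names through a whole-vector conversion is to reattach them with setNames(). This is a minimal sketch, not part of the session above; chr_vec and num_vec are illustrative names.

```r
# Illustrative named character vector (chr_vec is a made-up name).
chr_vec <- c(a = "6", b = "7", c = "8", d = "9")

# as.numeric() drops the names, so reattach them explicitly.
num_vec <- setNames(as.numeric(chr_vec), names(chr_vec))

# Look at the properties: numeric values, names preserved.
str(num_vec)
```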

Now let’s see what happens when we replace a numeric element with a logical value.

vec[1] <- FALSE
# Simple display
vec

## [1] 0 7 8 9

# Look at the properties.
str(vec)

##  num [1:4] 0 7 8 9

Let’s try to make a logical vector. as.logical() converts 0 to FALSE and every other number to TRUE.

vec <- as.logical(vec)
# Simple display
vec

## [1] FALSE  TRUE  TRUE  TRUE

# Look at the properties.
str(vec)

##  logi [1:4] FALSE TRUE TRUE TRUE
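
The coercions seen above follow a fixed order: logical is the least general type, then integer, then numeric (double), then character. The sketch below illustrates this with c(); the variable names are illustrative and not used elsewhere in these notes.

```r
# c() coerces its arguments to the most general type present.
mixed_int <- c(TRUE, 2L)    # logical + integer
mixed_dbl <- c(2L, 3.5)     # integer + double
mixed_chr <- c(3.5, "a")    # double + character

class(mixed_int)
class(mixed_dbl)
class(mixed_chr)

# Going the other way is only possible when every value can be converted.
# A string that is not a number becomes NA (R also issues a warning,
# suppressed here to keep the output short).
back <- suppressWarnings(as.numeric(c("6", "six")))
back
```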

Specifying and using subvectors of a vector

If x is a vector, x[sub-vector specification] defines a sub-vector of x. The specification may be a vector of numbers indicating positions in x, a logical vector, or a vector of names if names have been assigned.

x <- 11:20
names(x) <- c("a","b","c","d","e",
             "f","g","h","i","j")
x

##  a  b  c  d  e  f  g  h  i  j 
## 11 12 13 14 15 16 17 18 19 20

x[c(2,3,4)]

##  b  c  d 
## 12 13 14

x[c("f","b","e")]

##  f  b  e 
## 16 12 15

x[c(2,2,2)]

##  b  b  b 
## 12 12 12

x[7:9]

##  g  h  i 
## 17 18 19

x[-c(7:9)] # "-" means everything but

##  a  b  c  d  e  f  j 
## 11 12 13 14 15 16 20

x[c(rep(TRUE,4),rep(FALSE,4),rep(TRUE,2))]

##  a  b  c  d  i  j 
## 11 12 13 14 19 20

x[c(TRUE,FALSE)] # Note the recycling

##  a  c  e  g  i 
## 11 13 15 17 19

x > 15 & x <= 18 # Create a logical vector

##     a     b     c     d     e     f     g     h     i     j 
## FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE

x[x > 15 & x <= 18] # Use it to specify a subvector

##  f  g  h 
## 16 17 18

x[] # No specification means "everything."

##  a  b  c  d  e  f  g  h  i  j 
## 11 12 13 14 15 16 17 18 19 20
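
The three kinds of specification are interchangeable when they pick out the same elements. A small sketch to check this, using an illustrative vector s built the same way as x:

```r
# s is an illustrative copy of the named vector used above.
s <- 11:20
names(s) <- c("a","b","c","d","e",
              "f","g","h","i","j")

by_pos     <- s[c(6, 7, 8)]          # positions
by_name    <- s[c("f", "g", "h")]    # names
by_logical <- s[s > 15 & s <= 18]    # logical vector

# All three select the same named elements.
identical(by_pos, by_name)
identical(by_pos, by_logical)
```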

Note that a subvector specification may be used in three ways: on its own to display the subvector, on the right side of an assignment to copy the subvector, or on the left side of an assignment to replace the subvector.

x <- 11:20
x[7:9]

## [1] 17 18 19

y <- x[7:9]
y

## [1] 17 18 19

x[7:9] <- 5
x

##  [1] 11 12 13 14 15 16  5  5  5 20
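
Replacement is not limited to a single value or to positions. The sketch below, on an illustrative vector z, replaces elements chosen by a logical condition and then fills several positions from a vector of the same length.

```r
# z is an illustrative vector, separate from x above.
z <- 11:20

# A logical specification on the left side replaces every matching element.
z[z > 15] <- 0

# A replacement vector of the same length fills the positions in order.
z[1:3] <- c(100, 200, 300)
z
```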

Subsets of a dataframe

Most of the principles are the same as with vectors, but there are two dimensions rather than one. The specifications are separated by a comma. In general we have df[Row Spec,Col Spec]. The result is generally a new dataframe, although selecting a single column returns a vector by default. To select everything along a dimension, leave its specification blank but keep the comma.

Recall the dataframe mtcars.

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF

##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3

str(NewDF)

## 'data.frame':    5 obs. of  2 variables:
##  $ cyl: num  4 6 8 6 8
##  $ mpg: num  22.8 21.4 18.7 18.1 14.3
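
A short sketch of the blank-dimension rule, and of what comes back when only one column is selected. The variable names are illustrative; drop = FALSE is the standard argument for keeping a one-column result as a dataframe.

```r
# Blank column specification: the first three rows, all columns.
first_rows <- mtcars[1:3, ]
dim(first_rows)

# Blank row specification: all rows, two columns.
two_cols <- mtcars[, c("mpg", "cyl")]
dim(two_cols)

# A single column is returned as a plain vector by default...
mpg_vec <- mtcars[, "mpg"]
class(mpg_vec)

# ...unless drop = FALSE asks for a one-column dataframe.
mpg_df <- mtcars[, "mpg", drop = FALSE]
class(mpg_df)
```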

Note that we can use specified subsets of a dataframe the same way we used specified subvectors: for display, on the right side of an assignment, or on the left side of an assignment.

mtcars[3:7,c("cyl","mpg")]

##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3

NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF

##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3

NewDF[1:2,"mpg"] <- 100
NewDF

##                   cyl   mpg
## Datsun 710          4 100.0
## Hornet 4 Drive      6 100.0
## Hornet Sportabout   8  18.7
## Valiant             6  18.1
## Duster 360          8  14.3

NewDF[NewDF$mpg > 50,"class"] <- "High MPG"
NewDF # Note that we added a new column and it has NA values where we supplied nothing.

##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7     <NA>
## Valiant             6  18.1     <NA>
## Duster 360          8  14.3     <NA>
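
A logical row specification works for selecting rows as well as for replacement. As a sketch, the following picks the four-cylinder cars in mtcars with mpg above 30; the name efficient is illustrative.

```r
# Rows where both conditions hold; only two columns are kept.
efficient <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "cyl")]
efficient
```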

Dealing with NA values

Let’s fix the NA values. Look at the problem rows first.

NewDF[is.na(NewDF$class),]

##                   cyl  mpg class
## Hornet Sportabout   8 18.7  <NA>
## Valiant             6 18.1  <NA>
## Duster 360          8 14.3  <NA>

# Replace the NA values
NewDF[is.na(NewDF$class),"class"] <- "Low MPG"
NewDF

##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7  Low MPG
## Valiant             6  18.1  Low MPG
## Duster 360          8  14.3  Low MPG
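
An alternative that avoids creating NA values in the first place is to label every row in one step with ifelse(). This is a sketch on an illustrative copy called cars_df, not the approach used above. It also shows why is.na() is needed to find NA values.

```r
# cars_df is an illustrative copy, separate from NewDF above.
cars_df <- mtcars[3:7, c("cyl", "mpg")]
cars_df[1:2, "mpg"] <- 100

# ifelse() picks a label for every row, so no NA values are created.
cars_df$class <- ifelse(cars_df$mpg > 50, "High MPG", "Low MPG")
cars_df

# NA cannot be found with ==, because any comparison with NA is itself NA.
NA == NA
is.na(NA)
```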