Harold Nelson
March 23, 2016
Most of our work will be concerned with vectors and dataframes.
First, create a vector with c(), supply names, and examine the vector.
vec <- c(6,7,8,9)
names(vec) <- c("a","b","c","d")
# Simple display
vec
## a b c d
## 6 7 8 9
# Look at the properties.
str(vec)
## Named num [1:4] 6 7 8 9
## - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
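Names can also be supplied as argument tags inside c() itself. The sketch below builds the same named vector both ways; v1 and v2 are illustrative names not used elsewhere in these notes.

```r
# Hypothetical names v1/v2: build the same named vector two ways
v1 <- c(6, 7, 8, 9)
names(v1) <- c("a", "b", "c", "d")   # attach names after creation

v2 <- c(a = 6, b = 7, c = 8, d = 9)  # names given as argument tags to c()

identical(v1, v2)                    # TRUE: same values, same names
```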
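All of the elements of a vector must be of the same type. R will let you put any atomic value (logical, numeric, character, and so on) into a vector, but it will coerce the elements to a common type to enforce this restriction. Below, assigning a character string to one element turns the whole vector into a character vector.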
vec["a"] = "six"
# Simple display
vec
##     a     b     c     d
## "six"   "7"   "8"   "9"
# Look at the properties.
str(vec)
## Named chr [1:4] "six" "7" "8" "9"
## - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
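The coercion follows a fixed order: when types are mixed, R converts everything to the most general type present (logical, then integer, then double, then character). A small sketch of that ordering using base R's typeof(); the variable name mixed is illustrative only.

```r
# Mixed atomic types are coerced to the most general type present:
# logical < integer < double < character
typeof(c(TRUE, FALSE))       # "logical"
typeof(c(TRUE, 2L))          # "integer" (TRUE becomes 1L)
typeof(c(TRUE, 2L, 3.5))     # "double"

mixed <- c(TRUE, 2L, 3.5, "a")  # hypothetical example vector
typeof(mixed)                # "character"
mixed                        # elements are "TRUE" "2" "3.5" "a"
```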
Return vec to being numeric.
# Let's try a simple fix. Note that we can use the positional index or the name to refer to an element of a vector.
vec[1] <- 6
# Simple display
vec
##   a   b   c   d
## "6" "7" "8" "9"
# Look at the properties.
str(vec)
## Named chr [1:4] "6" "7" "8" "9"
## - attr(*, "names")= chr [1:4] "a" "b" "c" "d"
# Since these character strings all represent numbers, we can coerce them explicitly.
vec <- as.numeric(vec)
# Simple display
vec
## [1] 6 7 8 9
# Look at the properties.
str(vec)
## num [1:4] 6 7 8 9
Note that when we replaced the entire vector we lost the names: as.numeric() strips attributes, including names, from its result.
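If you want to keep the names through the conversion, one option is to reattach them. A minimal sketch, assuming a named character vector like the one above; chr_vec and num_vec are hypothetical names. setNames() comes from the stats package, which R attaches by default.

```r
# Hypothetical named character vector, like vec before conversion
chr_vec <- c(a = "6", b = "7", c = "8", d = "9")

as.numeric(chr_vec)          # values 6 7 8 9, but the names are gone

# Reattach the original names to the converted values
# (equivalently: num_vec <- as.numeric(chr_vec); names(num_vec) <- names(chr_vec))
num_vec <- setNames(as.numeric(chr_vec), names(chr_vec))
num_vec                      # a=6 b=7 c=8 d=9
```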
Now let’s see what happens when we replace a numeric element with a logical (Boolean) value. FALSE is coerced to 0, so vec stays numeric.
vec[1] <- FALSE
# Simple display
vec
## [1] 0 7 8 9
# Look at the properties.
str(vec)
## num [1:4] 0 7 8 9
Now let’s convert the whole vector to logical. as.logical() turns 0 into FALSE and every nonzero number into TRUE.
vec <- as.logical(vec)
# Simple display
vec
## [1] FALSE TRUE TRUE TRUE
# Look at the properties.
str(vec)
## logi [1:4] FALSE TRUE TRUE TRUE
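The same rule applies to any number, and character strings follow their own rule: only a fixed set of spellings ("T", "TRUE", "true", "True" and their FALSE counterparts) is recognised, and anything else becomes NA. Going the other way, TRUE counts as 1, which is why sum() on a logical vector counts the TRUE values. A short sketch; lgl is an illustrative name.

```r
# Numeric -> logical: 0 is FALSE, any other number is TRUE
as.logical(c(0, 7, -1, 0.5))                          # FALSE TRUE TRUE TRUE

# Character -> logical: unrecognised strings (even "1") become NA
as.logical(c("TRUE", "true", "T", "F", "yes", "1"))   # TRUE TRUE TRUE FALSE NA NA

# Logical -> numeric: TRUE is 1, so sum() counts the TRUE values
lgl <- as.logical(c(0, 7, 8, 9))
sum(lgl)                                              # 3
```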
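If x is a vector, x[sub-vector specification] defines a subvector of x. The specification may be a vector of positive integers giving positions in x, a vector of negative integers giving positions to leave out, a logical vector (recycled if it is shorter than x), or a character vector of names if names have been assigned.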
x <- 11:20
names(x) = c("a","b","c","d","e",
"f","g","h","i","j")
x
##  a  b  c  d  e  f  g  h  i  j
## 11 12 13 14 15 16 17 18 19 20
x[c(2,3,4)]
##  b  c  d
## 12 13 14
x[c("f","b","e")]
##  f  b  e
## 16 12 15
x[c(2,2,2)]
##  b  b  b
## 12 12 12
x[7:9]
##  g  h  i
## 17 18 19
x[-c(7:9)] # "-" means everything but
##  a  b  c  d  e  f  j
## 11 12 13 14 15 16 20
x[c(rep(TRUE,4),rep(FALSE,4),rep(TRUE,2))]
##  a  b  c  d  i  j
## 11 12 13 14 19 20
x[c(TRUE,FALSE)] # Note the recycling
##  a  c  e  g  i
## 11 13 15 17 19
x > 15 & x <= 18 # Create a logical vector
##     a     b     c     d     e     f     g     h     i     j
## FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
x[x > 15 & x <= 18] # Use it to specify a subvector
##  f  g  h
## 16 17 18
x[] # No specification means "everything."
##  a  b  c  d  e  f  g  h  i  j
## 11 12 13 14 15 16 17 18 19 20
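Note that a subvector specification may be used to display the subvector, on the right side of an assignment (to extract values), or on the left side of an assignment (to replace them). In the last case a single value such as 5 is recycled to fill every selected position.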
x <- 11:20
x[7:9]
## [1] 17 18 19
y <- x[7:9]
y
## [1] 17 18 19
x[7:9] <- 5
x
## [1] 11 12 13 14 15 16 5 5 5 20
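Two related behaviours of replacement are worth knowing: a shorter replacement value is recycled across the selected positions, and assigning to a position past the end extends the vector, filling any gap with NA. A small sketch; z is an illustrative name.

```r
z <- 1:6
# The single value 0 is recycled over the positions picked by the
# (itself recycled) logical index: 1, 3 and 5
z[c(TRUE, FALSE)] <- 0
z                    # 0 2 0 4 0 6

# Assigning beyond the current length extends the vector;
# position 7 was never set, so it becomes NA
z[8] <- 8
z                    # 0 2 0 4 0 6 NA 8
```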
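Most of the same principles apply to dataframes, but there are two dimensions rather than one. The row and column specifications are separated by a comma; in general we write df[Row Spec, Col Spec]. The result is usually a new dataframe, although selecting a single column returns a vector unless you add drop = FALSE. To select every row or every column, leave that specification empty but keep the comma, as in df[1:3, ].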
Recall the dataframe mtcars.
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
str(NewDF)
## 'data.frame': 5 obs. of 2 variables:
## $ cyl: num 4 6 8 6 8
## $ mpg: num 22.8 21.4 18.7 18.1 14.3
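The rules for an empty specification and for a single column can be checked directly on mtcars, which ships with R in the datasets package. A minimal sketch; first3, mpg_vec and mpg_df are illustrative names.

```r
# Leave a dimension empty (but keep the comma) to select all of it
first3 <- mtcars[1:3, ]                     # first three rows, all columns
dim(first3)                                 # 3 11

# Selecting a single column simplifies the result to a vector...
mpg_vec <- mtcars[1:3, "mpg"]
class(mpg_vec)                              # "numeric"

# ...unless drop = FALSE is given, which keeps a one-column dataframe
mpg_df <- mtcars[1:3, "mpg", drop = FALSE]
class(mpg_df)                               # "data.frame"
```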
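Note that we can use specified subsets of a dataframe the same way we used specified subvectors: to display them, on the right side of an assignment, or on the left side of an assignment to replace values.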
mtcars[3:7,c("cyl","mpg")]
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
NewDF <- mtcars[3:7,c("cyl","mpg")]
NewDF
##                   cyl  mpg
## Datsun 710          4 22.8
## Hornet 4 Drive      6 21.4
## Hornet Sportabout   8 18.7
## Valiant             6 18.1
## Duster 360          8 14.3
NewDF[1:2,"mpg"] <- 100
NewDF
##                   cyl   mpg
## Datsun 710          4 100.0
## Hornet 4 Drive      6 100.0
## Hornet Sportabout   8  18.7
## Valiant             6  18.1
## Duster 360          8  14.3
NewDF[NewDF$mpg > 50,"class"] <- "High MPG"
NewDF # Note that we added a new column; rows we did not assign a value to contain NA.
##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7     <NA>
## Valiant             6  18.1     <NA>
## Duster 360          8  14.3     <NA>
NewDF[is.na(NewDF$class),]
##                   cyl  mpg class
## Hornet Sportabout   8 18.7  <NA>
## Valiant             6 18.1  <NA>
## Duster 360          8 14.3  <NA>
# Replace the NA values
NewDF[is.na(NewDF$class),"class"] <- "Low MPG"
NewDF
##                   cyl   mpg    class
## Datsun 710          4 100.0 High MPG
## Hornet 4 Drive      6 100.0 High MPG
## Hornet Sportabout   8  18.7  Low MPG
## Valiant             6  18.1  Low MPG
## Duster 360          8  14.3  Low MPG
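The two-step approach above (assign one label, then fill the NAs) can also be collapsed into a single assignment with base R's ifelse(), which evaluates the condition for every row and picks the matching label. A sketch that rebuilds NewDF the same way as above; note that if mpg itself contained NA, ifelse() would return NA for that row.

```r
# Rebuild NewDF as in the steps above
NewDF <- mtcars[3:7, c("cyl", "mpg")]
NewDF[1:2, "mpg"] <- 100

# One vectorised assignment: "High MPG" where mpg > 50, otherwise "Low MPG"
NewDF$class <- ifelse(NewDF$mpg > 50, "High MPG", "Low MPG")
NewDF
```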