Personal notes for future reference, written after completing the JHU Data Science Specialization, offered online through Coursera. The notes are paraphrased from Roger D. Peng’s book Mastering Software Development in R.

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

x <- 1
print(x)
## [1] 1
x
## [1] 1
msg <- "hello"
msg
## [1] "hello"

Note that adding the echo = FALSE parameter to the code chunk would prevent the R code itself from being printed; only its output would appear.

When a vector is printed, the index of the first element on each line appears in square brackets on the left. Note that the : operator is used to create integer sequences.

x <- 11:30
x
##  [1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The most basic type of R object is a vector. A vector can only contain objects of the same class. The one exception is a list (reviewed later), which is represented as a vector but can contain objects of different classes.

Numbers in R are generally treated as numeric objects (double precision real numbers). Even if you see a number like “1” or “2” in R, which you might think of as integers, they are likely represented behind the scenes as numeric objects (“1.00” or “2.00”).

If you explicitly want an integer, you need to specify the L suffix. So entering 1 in R gives you a numeric object; entering 1L explicitly gives you an integer object.
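
A quick way to check the numeric/integer distinction, as a minimal sketch. The variable names a and b are my own and do not come from the book:

```r
a <- 1    # no suffix: stored as a double ("numeric")
b <- 1L   # L suffix: stored as an integer
class(a)
## [1] "numeric"
class(b)
## [1] "integer"
a == b            # the values compare as equal...
## [1] TRUE
identical(a, b)   # ...but the objects are not identical, because the types differ
## [1] FALSE
```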

Creating Vectors

The c() function can be used to create vectors of objects by concatenating things together.

x <- c(0.5, 0.6)
x
## [1] 0.5 0.6

You can also use the vector() function to initialize vectors:

x <- vector("numeric", length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0
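
vector() accepts other modes as well, and each mode comes with its own default fill value. A small sketch; the names lg, ch and int are mine, chosen only for illustration:

```r
lg <- vector("logical", length = 3)     # logical vectors start as FALSE
lg
## [1] FALSE FALSE FALSE
ch <- vector("character", length = 2)   # character vectors start as empty strings
ch
## [1] "" ""
int <- vector("integer", length = 4)    # integer vectors start as 0
int
## [1] 0 0 0 0
```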

Mixing Objects

When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class. R tries to find a way to represent all of the objects in the vector in a reasonable fashion.
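
To see implicit coercion in action, here is a short sketch. The variable names y1, y2 and y3 are my own; the comments describe the class R settles on in each case:

```r
y1 <- c(1.7, "a")    # numeric + character -> character
y1
## [1] "1.7" "a"
y2 <- c(TRUE, 2)     # logical + numeric -> numeric (TRUE becomes 1)
y2
## [1] 1 2
y3 <- c("a", TRUE)   # character + logical -> character
y3
## [1] "a"    "TRUE"
```

In each case R picks the class that can represent every element, which usually means the more general one wins.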

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

x <- 0:6
class(x)
## [1] "integer"
as.numeric(x)
## [1] 0 1 2 3 4 5 6
as.logical(x)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
as.character(x)
## [1] "0" "1" "2" "3" "4" "5" "6"
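
When R cannot find a sensible conversion, explicit coercion produces NA values. A minimal sketch, assuming a character vector z of my own choosing; suppressWarnings() is used here only to keep the output quiet:

```r
z <- c("a", "b", "c")
# Without suppressWarnings(), R warns "NAs introduced by coercion"
z_num <- suppressWarnings(as.numeric(z))
z_num
## [1] NA NA NA
z_lgl <- as.logical(z)   # strings that are not recognized as logical values become NA
z_lgl
## [1] NA NA NA
```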

Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (number of rows, number of columns).

m <- matrix(nrow = 2, ncol = 3)
m
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
dim(m)
## [1] 2 3
attributes(m)
## $dim
## [1] 2 3

Matrices are constructed column-wise, so entries can be thought of as starting in the “upper left” corner and running down the columns.

m <- matrix(1:6, nrow = 2, ncol = 3)
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
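
For contrast, matrix() also has a byrow argument that fills across the rows instead. A short sketch; the name m_row is mine:

```r
# byrow = TRUE fills the first row left to right, then the second row
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```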

Matrices can also be created directly from vectors by adding a dimension attribute.

m <- 1:10
m
##  [1]  1  2  3  4  5  6  7  8  9 10
dim(m) <- c(2, 5)
m
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.

x <- 1:3
y <- 10:12
cbind(x, y)
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
rbind(x, y)
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12

Lists

Lists are a special type of vector that can contain elements of different classes. Lists are very important in R, and they become especially powerful in combination with the various “apply” functions discussed later.

Lists can be explicitly created using the list() function, which takes an arbitrary number of arguments.

x <- list(1, "a", TRUE, 1 + 4i)
x
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 1+4i
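
As a quick check that each element keeps its own class, here is a sketch that rebuilds the same list and inspects it with sapply(), one of the “apply” functions. The name classes is my own:

```r
x <- list(1, "a", TRUE, 1 + 4i)
classes <- sapply(x, class)   # apply class() to every element of the list
classes
## [1] "numeric"   "character" "logical"   "complex"
x[[2]]                        # double brackets extract a single element
## [1] "a"
length(x)
## [1] 4
```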

We can also create an empty list of a prespecified length with the vector() function.

x <- vector("list", length = 5)
x
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL

Factors

Factors represent categorical data and can be unordered or ordered. A factor can be thought of as an integer vector where each integer has a label.

Using factors with labels is better than using integers because factors are self-describing. Having a variable that has values “Male” and “Female” is better than a variable with values 1 and 2.

Factors are often created automatically when you read a dataset in with a function like read.table(). Converting strings to factors was the default before R 4.0.0; since then, stringsAsFactors defaults to FALSE, so strings stay as character vectors unless you ask otherwise.

Factor objects can be created with the factor() function. Levels are automatically put in alphabetical order. Note that the underlying representation of a factor is a vector of integer codes that follow the order of the levels. Here, 1 = “no”, 2 = “yes”.

x <- factor(c("yes", "yes", "no", "yes", "no"))
x
## [1] yes yes no  yes no 
## Levels: no yes
table(x)
## x
##  no yes 
##   2   3
unclass(x)
## [1] 2 2 1 2 1
## attr(,"levels")
## [1] "no"  "yes"

Alternatively, as shown below, the order of the levels can be set explicitly with the levels argument to factor(), overriding the alphabetical default. This can be important in both linear modeling and plots: by default the first level is used as the baseline level, and factors are plotted in the order of their levels.

x <- factor(c("yes", "yes", "no", "yes", "no"),
            levels = c("yes", "no"))
x
## [1] yes yes no  yes no 
## Levels: yes no
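
To make the effect of level order concrete, here is a sketch comparing the integer codes under both orderings. The names x_default and x_custom are mine:

```r
vals <- c("yes", "yes", "no", "yes", "no")
x_default <- factor(vals)                            # alphabetical: no = 1, yes = 2
x_custom  <- factor(vals, levels = c("yes", "no"))   # explicit: yes = 1, no = 2
as.integer(x_default)
## [1] 2 2 1 2 1
as.integer(x_custom)
## [1] 1 1 2 1 2
levels(x_custom)[1]   # the first level, which modeling functions treat as the baseline by default
## [1] "yes"
```

The labels are the same in both factors; only the underlying codes and the baseline change.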

Missing Values

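Missing values are denoted by NA, or by NaN for undefined mathematical operations. is.na() tests for both, while is.nan() tests only for NaN: a NaN value is also NA, but the converse is not true.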

x <- c(1, 2, NA, 10, 3)
is.na(x)
## [1] FALSE FALSE  TRUE FALSE FALSE
is.nan(x)
## [1] FALSE FALSE FALSE FALSE FALSE
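
The difference between the two tests only shows up once a vector contains a NaN. A small sketch with a vector y of my own choosing:

```r
y <- c(1, 2, NaN, NA, 4)
is.na(y)      # TRUE for both NaN and NA
## [1] FALSE FALSE  TRUE  TRUE FALSE
is.nan(y)     # TRUE only for NaN
## [1] FALSE FALSE  TRUE FALSE FALSE
nan_val <- 0 / 0   # an undefined operation yields NaN
nan_val
## [1] NaN
```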

Data Frames

Data frames are used to store tabular data (data in table form) in R. They are central to R and are used in a wide variety of operations; the tidyverse packages in particular are built around data frames.

Unlike matrices, data frames can store different classes of objects in each column. Matrices must have every element be the same class (e.g. all integers, all numeric).

A data frame can be created explicitly with data.frame():

x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
x
##   foo   bar
## 1   1  TRUE
## 2   2  TRUE
## 3   3 FALSE
## 4   4 FALSE
nrow(x)
## [1] 4
ncol(x)
## [1] 2
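
The column-class difference between data frames and matrices can be checked directly. A sketch, with the names df and mixed being my own:

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
class(df$foo)   # each column of a data frame keeps its own class
## [1] "integer"
class(df$bar)
## [1] "logical"
# Binding the same kind of mixed data into a matrix coerces everything to one type
mixed <- cbind(1:2, c("a", "b"))
typeof(mixed)
## [1] "character"
```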

Names

R objects can have names, which is useful for writing readable code and self-describing objects. Here is an example of assigning names to an integer vector:

x <- 1:3
names(x)
## NULL
names(x) <- c("New York", "Seattle", "Los Angeles")
x
##    New York     Seattle Los Angeles 
##           1           2           3

Attributes

In general, R objects can have attributes, which are like metadata for the object. Attributes for an object (if any) can be accessed using the attributes() function.

attributes(x)
## $names
## [1] "New York"    "Seattle"     "Los Angeles"
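
Names are only one kind of attribute. The matrices and factors from earlier sections carry attributes too, which this sketch lists by name. The objects m2 and f are my own examples:

```r
m2 <- matrix(1:6, nrow = 2)
names(attributes(m2))   # a matrix carries a dim attribute
## [1] "dim"
attr(m2, "dim")         # attr() fetches a single attribute by name
## [1] 2 3
f <- factor(c("yes", "no"))
names(attributes(f))    # a factor carries its levels and a class
## [1] "levels" "class"
```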