[2014 Epiworkshop] R basics and Sequencing data analysis

Part I: R basics

Lecturer: Sheng Li

Weill Cornell Medical College

1. Computations in R

10/3  # division
## [1] 3.333
10 + 3  # sum
## [1] 13
10 - 3  # subtract
## [1] 7
10 * 3  # product
## [1] 30
10%%3  # modulo
## [1] 1
3 + 5 * 6  # multiplication is evaluated before addition
## [1] 33
log(100)  # natural logarithm (base e)
## [1] 4.605
log(100, 10)  # logarithm with base 10
## [1] 2
2^5
## [1] 32
abs(2 - 10)
## [1] 8
exp(3)
## [1] 20.09
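
A few more operators in the same spirit, as a small sketch that is not part of the original lecture: integer division pairs with the modulo operator, and parentheses override the default precedence. The variable names are illustrative only.

```r
# Integer division and modulo together recover the original number
quotient = 10 %/% 3          # 3
remainder = 10 %% 3          # 1
quotient * 3 + remainder     # 10

# Parentheses change the order of evaluation
grouped = (3 + 5) * 6        # 48, compared with 33 for 3 + 5 * 6
grouped

# Square root and rounding are base R functions
root = sqrt(16)              # 4
rounded = round(10 / 3, 2)   # 3.33
root
rounded
```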

2. Get help from R

help(log)
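
Other ways to look things up, sketched here as an addition to the lecture: `?log` is shorthand for `help(log)`, `args()` shows a function's arguments, and `apropos()` lists objects whose names match a pattern. The variable names are illustrative only.

```r
# Shorthand for help(log)
?log

# Show the arguments of a function without opening the help page
log_args = args(log)
log_args

# List objects whose names start with "log"
matches = apropos("^log")
matches
```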

3. Vectors

a = c(1, 5, 8, 9)
a
## [1] 1 5 8 9
b = 1:4
b
## [1] 1 2 3 4
b + 4
## [1] 5 6 7 8
3 * b
## [1]  3  6  9 12
b^3
## [1]  1  8 27 64
2^b
## [1]  2  4  8 16
b1 = 3 * b
b1
## [1]  3  6  9 12
a + b
## [1]  2  7 11 13
a * b
## [1]  1 10 24 36
a - b
## [1] 0 3 5 5
a/b
## [1] 1.000 2.500 2.667 2.250
a/(b + 1)
## [1] 0.500 1.667 2.000 1.800
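
Beyond element-wise arithmetic, vectors are often summarized and subset. A minimal sketch, added for illustration and reusing the vector `a` from above; note that when two vectors differ in length, R recycles the shorter one.

```r
a = c(1, 5, 8, 9)

n = length(a)          # number of elements: 4
total = sum(a)         # 23
avg = mean(a)          # 5.75

second = a[2]          # indexing starts at 1, so this is 5
large = a[a > 4]       # logical subsetting: 5 8 9

# The shorter vector c(10, 20) is recycled to match the length of a
recycled = a + c(10, 20)   # 11 25 18 29
recycled
```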

4. Strings

x = c("A", "B")
x
## [1] "A" "B"
y = 1:2
y
## [1] 1 2
z1 = paste(x, y, sep = "-")
z1
## [1] "A-1" "B-2"
gsub("-", "", z1)
## [1] "A1" "B2"
z2 = paste0(x, y)
z2
## [1] "A1" "B2"
grep("B", z2)
## [1] 2
grep("B", z2, value = TRUE)
## [1] "B2"
grep("B", z2, value = TRUE, invert = TRUE)
## [1] "A1"
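
A few related base R string functions, sketched here as an addition to the lecture: `nchar()` counts characters, `substr()` extracts a range, `strsplit()` splits on a separator and returns a list, and `grepl()` returns a logical vector instead of indices. The variable names are illustrative only.

```r
z = c("A-1", "B-2")

lens = nchar(z)               # 3 3
first = substr(z, 1, 1)       # "A" "B"

# strsplit() returns a list with one character vector per input string
parts = strsplit(z, "-")
second = sapply(parts, `[`, 2)   # "1" "2"

# grepl() gives TRUE/FALSE per element, which is handy for subsetting
has_b = grepl("B", z)         # FALSE TRUE
z[has_b]
```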

5. Matrices

a = 1:4
b = 6:9
m1 = cbind(a, b)
m1
##      a b
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
t(m1)
##   [,1] [,2] [,3] [,4]
## a    1    2    3    4
## b    6    7    8    9
dim(m1)
## [1] 4 2
v = 1:12
m2 = matrix(v, ncol = 3, byrow = TRUE)
m2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
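
Matrix elements are addressed as `[row, column]`. The sketch below, added for illustration, rebuilds `m2` and also shows that `matrix()` fills column by column when `byrow` is not set.

```r
m2 = matrix(1:12, ncol = 3, byrow = TRUE)

elem = m2[2, 3]        # row 2, column 3: 6
col2 = m2[, 2]         # second column: 2 5 8 11

# Without byrow = TRUE the values fill the matrix column by column
m_col = matrix(1:12, ncol = 3)
first_row = m_col[1, ]     # 1 5 9

row_totals = rowSums(m2)   # 6 15 24 33
col_totals = colSums(m2)   # 22 26 30
row_totals
col_totals
```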

6. Data frames

v = c(1, 1, 2, 2)
chr = paste("chr", v, sep = "")
start = c(50, 100, 200, 300)
end = c(75, 125, 275, 400)
df = data.frame(chr = chr, start = start, end = end, id = 4:1)
df
##    chr start end id
## 1 chr1    50  75  4
## 2 chr1   100 125  3
## 3 chr2   200 275  2
## 4 chr2   300 400  1
df[1:3, ]
##    chr start end id
## 1 chr1    50  75  4
## 2 chr1   100 125  3
## 3 chr2   200 275  2
df[, 1:2]
##    chr start
## 1 chr1    50
## 2 chr1   100
## 3 chr2   200
## 4 chr2   300
df[, "chr"]
## [1] chr1 chr1 chr2 chr2
## Levels: chr1 chr2
df$start
## [1]  50 100 200 300
df2 = data.frame(gene = c("a", "b", "c", "d"), id = 1:4)
df2
##   gene id
## 1    a  1
## 2    b  2
## 3    c  3
## 4    d  4
df3 = data.frame(chr = "chr3", start = 500, end = 600, id = 5)
df3
##    chr start end id
## 1 chr3   500 600  5
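
Common follow-up steps on a data frame are adding a computed column, filtering rows, and sorting. A minimal sketch, added for illustration, that rebuilds `df` from above; `width` is a column name chosen only for this example. Note that since R 4.0.0 `data.frame()` keeps strings as character vectors by default, so `df[, "chr"]` prints without the `Levels:` line shown above unless `stringsAsFactors = TRUE` is set.

```r
df = data.frame(chr = c("chr1", "chr1", "chr2", "chr2"),
                start = c(50, 100, 200, 300),
                end = c(75, 125, 275, 400),
                id = 4:1,
                stringsAsFactors = FALSE)

# Add a computed column (the name "width" is illustrative)
df$width = df$end - df$start      # 25 25 75 100

# Keep only the rows on chr2
chr2_rows = df[df$chr == "chr2", ]

# Sort the rows by id in increasing order
df_sorted = df[order(df$id), ]
df_sorted
```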

7. Combining data frames

rbind(df, df3)
##    chr start end id
## 1 chr1    50  75  4
## 2 chr1   100 125  3
## 3 chr2   200 275  2
## 4 chr2   300 400  1
## 5 chr3   500 600  5
cbind(df, df2[4:1, 1])
##    chr start end id df2[4:1, 1]
## 1 chr1    50  75  4           d
## 2 chr1   100 125  3           c
## 3 chr2   200 275  2           b
## 4 chr2   300 400  1           a
merge(df, df2, by = "id")
##   id  chr start end gene
## 1  1 chr2   300 400    a
## 2  2 chr2   200 275    b
## 3  3 chr1   100 125    c
## 4  4 chr1    50  75    d
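
By default `merge()` keeps only the rows whose key appears in both data frames. The sketch below, added for illustration with made-up data frames `left` and `right`, shows how the `all` and `all.x` arguments keep unmatched rows and fill the gaps with `NA`.

```r
# Hypothetical example data: id 1 is only in left, id 4 is only in right
left = data.frame(id = 1:3, gene = c("a", "b", "c"))
right = data.frame(id = 2:4, score = c(0.5, 0.7, 0.9))

# Default: only ids present in both (2 and 3)
inner = merge(left, right, by = "id")

# all = TRUE keeps every id from both sides (1 to 4)
outer = merge(left, right, by = "id", all = TRUE)

# all.x = TRUE keeps every row of the first data frame (1 to 3)
left_kept = merge(left, right, by = "id", all.x = TRUE)
outer
```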

8. Lists

l = list(a = "seq", n = 1:3, m = matrix(1:10, ncol = 2))
l
## $a
## [1] "seq"
## 
## $n
## [1] 1 2 3
## 
## $m
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
l[[3]]
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
l[["n"]]
## [1] 1 2 3
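
List elements can also be reached by name with `$`, and single brackets behave differently from double brackets. A minimal sketch, added for illustration and reusing the list `l` from above; the element name `extra` is made up for this example.

```r
l = list(a = "seq", n = 1:3, m = matrix(1:10, ncol = 2))

element_names = names(l)      # "a" "n" "m"
by_dollar = l$n               # same result as l[["n"]]

# Single brackets return a sub-list, double brackets return the element itself
class_single = class(l["n"])      # "list"
class_double = class(l[["n"]])    # "integer"

# Add a new element by assigning to a new name
l$extra = TRUE

# Apply a function to every element
lens = sapply(l, length)      # a = 1, n = 3, m = 10, extra = 1
lens
```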