[2014 Epiworkshop] R basics and Sequencing data analysis
Part I: R basics
Lecturer: Sheng Li
Weill Cornell Medical College
1. Computations in R
10/3 # division
## [1] 3.333
10 + 3 # sum
## [1] 13
10 - 3 # difference
## [1] 7
10 * 3 # product
## [1] 30
10%%3 # modulo
## [1] 1
3 + 5 * 6 # multiplication is evaluated before addition
## [1] 33
log(100)
## [1] 4.605
log(100, 10)
## [1] 2
2^5
## [1] 32
abs(2 - 10)
## [1] 8
exp(3)
## [1] 20.09
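A few related operations, as a minimal sketch extending the arithmetic above. The variable names (grouped, quotient, and so on) are illustrative and not part of the original session.

```r
# Parentheses override the default precedence (multiplication before addition)
grouped <- (3 + 5) * 6     # 48, whereas 3 + 5 * 6 gives 33

# Integer division complements the modulo operator
quotient <- 10 %/% 3       # 3
remainder <- 10 %% 3       # 1

# Shortcuts for common logarithm bases
log_base10 <- log10(100)   # 2, same as log(100, 10)
log_base2 <- log2(32)      # 5

# exp() inverts the natural logarithm (up to floating-point rounding)
round_trip <- log(exp(3))  # 3
```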
2. Get help from R
help(log)
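Other ways to look things up, sketched here with base R functions only. At the interactive prompt, ?log is shorthand for help(log). The names log_args and log_like are illustrative.

```r
# Show the argument list of a function without opening its help page
log_args <- args(log)

# List objects on the search path whose names start with "log"
log_like <- apropos("^log")
log_like
```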
3. Vectors
a = c(1, 5, 8, 9)
a
## [1] 1 5 8 9
b = 1:4
b
## [1] 1 2 3 4
b + 4
## [1] 5 6 7 8
3 * b
## [1] 3 6 9 12
b^3
## [1] 1 8 27 64
2^b
## [1] 2 4 8 16
b1 = 3 * b
b1
## [1] 3 6 9 12
a + b
## [1] 2 7 11 13
a * b
## [1] 1 10 24 36
a - b
## [1] 0 3 5 5
a/b
## [1] 1.000 2.500 2.667 2.250
a/(b + 1)
## [1] 0.500 1.667 2.000 1.800
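The operations above work element by element. When the two vectors differ in length, R recycles the shorter one, which is also why b + 4 works. A minimal sketch, where short, recycled, total, avg, and n are illustrative names:

```r
b <- 1:4
short <- c(10, 20)

# short is reused as 10, 20, 10, 20 to match the length of b
recycled <- b + short   # 11 22 13 24

# Common summaries of a numeric vector
total <- sum(b)         # 10
avg <- mean(b)          # 2.5
n <- length(b)          # 4
```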
4. Strings
x = c("A", "B")
x
## [1] "A" "B"
y = 1:2
y
## [1] 1 2
z1 = paste(x, y, sep = "-")
z1
## [1] "A-1" "B-2"
gsub("-", "", z1)
## [1] "A1" "B2"
z2 = paste0(x, y)
z2
## [1] "A1" "B2"
grep("B", z2)
## [1] 2
grep("B", z2, value = TRUE)
## [1] "B2"
grep("B", z2, value = TRUE, invert = TRUE)
## [1] "A1"
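A few more base R string helpers, as a sketch that rebuilds z2 so it runs on its own. The names has_b, lens, first_char, and lower are illustrative.

```r
z2 <- paste0(c("A", "B"), 1:2)     # "A1" "B2"

# grepl() returns TRUE/FALSE per element instead of matching positions
has_b <- grepl("B", z2)            # FALSE TRUE

# Number of characters in each string
lens <- nchar(z2)                  # 2 2

# Extract characters from position 1 to position 1
first_char <- substr(z2, 1, 1)     # "A" "B"

# Change case
lower <- tolower(z2)               # "a1" "b2"
```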
5. Matrices
a = 1:4
b = 6:9
m1 = cbind(a, b)
m1
##      a b
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
t(m1)
##   [,1] [,2] [,3] [,4]
## a    1    2    3    4
## b    6    7    8    9
dim(m1)
## [1] 4 2
v = 1:12
m2 = matrix(v, ncol = 3, byrow = TRUE)
m2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
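By default matrix() fills column by column; byrow = TRUE switches to row by row. The sketch below contrasts the two and shows basic indexing. The names by_col, by_row, first_col, and row_totals are illustrative.

```r
v <- 1:12
by_col <- matrix(v, ncol = 3)                 # default: fill down each column
by_row <- matrix(v, ncol = 3, byrow = TRUE)   # fill across each row

# Index with [row, column]
by_col[2, 3]                  # 10
by_row[2, 3]                  # 6

# Leaving an index empty selects everything along that dimension
first_col <- by_row[, 1]      # 1 4 7 10

# Row-wise sums
row_totals <- rowSums(by_row) # 6 15 24 33
```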
6. Data frames
v = c(1, 1, 2, 2)
chr = paste("chr", v, sep = "")
start = c(50, 100, 200, 300)
end = c(75, 125, 275, 400)
df = data.frame(chr = chr, start = start, end = end, id = 4:1)
df
##    chr start end id
## 1 chr1    50  75  4
## 2 chr1   100 125  3
## 3 chr2   200 275  2
## 4 chr2   300 400  1
df[1:3, ]
##    chr start end id
## 1 chr1    50  75  4
## 2 chr1   100 125  3
## 3 chr2   200 275  2
df[, 1:2]
##    chr start
## 1 chr1    50
## 2 chr1   100
## 3 chr2   200
## 4 chr2   300
df[, "chr"] # a factor in R < 4.0; a character vector in R >= 4.0 unless stringsAsFactors = TRUE
## [1] chr1 chr1 chr2 chr2
## Levels: chr1 chr2
df$start
## [1] 50 100 200 300
df2 = data.frame(gene = c("a", "b", "c", "d"), id = 1:4)
df2
##   gene id
## 1    a  1
## 2    b  2
## 3    c  3
## 4    d  4
df3 = data.frame(chr = "chr3", start = 500, end = 600, id = 5)
df3
##    chr start end id
## 1 chr3   500 600  5
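Adding a computed column and filtering rows, as a minimal sketch that rebuilds df so it runs on its own. stringsAsFactors = FALSE is set here so that chr is a character vector in any R version. The column width and the names chr2_rows and long_rows are illustrative.

```r
df <- data.frame(chr = paste0("chr", c(1, 1, 2, 2)),
                 start = c(50, 100, 200, 300),
                 end = c(75, 125, 275, 400),
                 id = 4:1,
                 stringsAsFactors = FALSE)

# New column computed from existing ones
df$width <- df$end - df$start           # 25 25 75 100

# Keep rows where a logical condition holds
chr2_rows <- df[df$chr == "chr2", ]

# subset() expresses the same kind of filter with column names directly
long_rows <- subset(df, width > 50)
long_rows$id                            # 2 1
```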
7. Combine two data frames
rbind(df, df3)
##    chr start end id
## 1 chr1    50  75  4
## 2 chr1   100 125  3
## 3 chr2   200 275  2
## 4 chr2   300 400  1
## 5 chr3   500 600  5
cbind(df, df2[4:1, 1])
##    chr start end id df2[4:1, 1]
## 1 chr1    50  75  4           d
## 2 chr1   100 125  3           c
## 3 chr2   200 275  2           b
## 4 chr2   300 400  1           a
merge(df, df2, by = "id")
##   id  chr start end gene
## 1  1 chr2   300 400    a
## 2  2 chr2   200 275    b
## 3  3 chr1   100 125    c
## 4  4 chr1    50  75    d
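merge() matches rows by key rather than by position, so row order does not matter. When some keys appear in only one table, the all, all.x, and all.y arguments control which rows are kept. A minimal sketch with two small made-up tables (left and right are illustrative, not from the session above):

```r
left <- data.frame(id = 1:4,
                   chr = c("chr2", "chr2", "chr1", "chr1"),
                   stringsAsFactors = FALSE)
right <- data.frame(id = c(1L, 2L, 5L),
                    gene = c("a", "b", "e"),
                    stringsAsFactors = FALSE)

# Default: keep only ids present in both tables (1 and 2)
inner <- merge(left, right, by = "id")

# Keep every row of left; gene is NA where right has no match
left_all <- merge(left, right, by = "id", all.x = TRUE)

# Keep every id from either table (1 to 5)
full <- merge(left, right, by = "id", all = TRUE)
```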
8. Lists
l = list(a = "seq", n = 1:3, m = matrix(1:10, ncol = 2))
l
## $a
## [1] "seq"
##
## $n
## [1] 1 2 3
##
## $m
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
l[[3]]
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
l[["n"]]
## [1] 1 2 3
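Other ways to reach into a list, sketched on the same list l. Single brackets return a smaller list, while double brackets or $ return the element itself. The names element_names, n_values, sub_list, cell, and lens are illustrative.

```r
l <- list(a = "seq", n = 1:3, m = matrix(1:10, ncol = 2))

element_names <- names(l)   # "a" "n" "m"

# $ is shorthand for [[ ]] with a name
n_values <- l$n             # 1 2 3

# Single brackets keep the list wrapper
sub_list <- l["n"]          # a list of length 1

# Indexing can be chained: row 2, column 2 of the matrix
cell <- l$m[2, 2]           # 7

# Apply a function to every element
lens <- sapply(l, length)   # a = 1, n = 3, m = 10
```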