Studying the R Reference Card 2.0
Date: 2013-04-01
by_daigazi

1. sweep

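sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
Description: Return an array obtained from an input array by sweeping out a summary statistic.
x: an array. MARGIN: the dimension that STATS corresponds to (1 for rows, 2 for columns). STATS: the summary statistic to sweep out, e.g. a single number or a vector. FUN: the function applied, "-" by default; it can also be "/" and so on.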

A <- array(1:24, dim = 4:2)
A
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   13   17   21
## [2,]   14   18   22
## [3,]   15   19   23
## [4,]   16   20   24
## no warnings in normal use
sweep(A, 1, 5)
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]   -4    0    4
## [2,]   -3    1    5
## [3,]   -2    2    6
## [4,]   -1    3    7
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    8   12   16
## [2,]    9   13   17
## [3,]   10   14   18
## [4,]   11   15   19
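
A common use of sweep is centering or rescaling a matrix with one statistic per row or column. The sketch below uses a small made-up matrix m (not from the notes above) to show both a subtraction and a division.

```r
# Toy matrix: 2 rows, 3 columns (illustrative data only).
m <- matrix(c(1, 3, 2, 6, 10, 20), nrow = 2)

# One statistic per column, swept out with the default FUN = "-".
col_means <- colMeans(m)
centered <- sweep(m, 2, col_means)

# One statistic per row, swept out with "/": each row now sums to 1.
scaled <- sweep(m, 1, rowSums(m), "/")

colMeans(centered)
rowSums(scaled)
```

Note that STATS is always a value computed beforehand (here colMeans(m) and rowSums(m)), never a function.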

2. prop.table

prop.table(x, margin = NULL)
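Description: This is really sweep(x, margin, margin.table(x, margin), "/") for newbies, except that if margin has length zero, then one gets x/sum(x).
prop stands for proportion and table for the table of sums: sum over each row or column, then express every element as a share of its row or column total.
prop.table(x, margin = 1) is not equivalent to sweep(x, 1, sum, FUN = "/"): that call is wrong, because STATS must be a value (such as the vector of row sums), not a function.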

x <- matrix(1:4, 2)
x
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
prop.table(x, margin = 1)
##        [,1]   [,2]
## [1,] 0.2500 0.7500
## [2,] 0.3333 0.6667
# The call above is equivalent to
sweep(x, 1, margin.table(x, 1), FUN = "/")  # correct
##        [,1]   [,2]
## [1,] 0.2500 0.7500
## [2,] 0.3333 0.6667
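
The other two settings of margin can be checked the same way. This is a small sketch on the same 2 x 2 matrix; the variable names whole, by_col and same are only illustrative.

```r
x <- matrix(1:4, 2)

whole  <- prop.table(x)       # margin = NULL: every cell divided by sum(x)
by_col <- prop.table(x, 2)    # each column sums to 1

# The sweep equivalent for columns, with the column sums as STATS.
same <- sweep(x, 2, colSums(x), "/")

sum(whole)
colSums(by_col)
all.equal(by_col, same)
```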

3. split

split(x, f, drop = FALSE, ...)
Description: split divides the data in the vector x into the groups defined by f. The replacement forms replace values corresponding to such a division. unsplit reverses the effect of split.

split groups x according to f. x is a vector or data frame; f is a factor, or anything that as.factor() can coerce to one (a list of such objects is treated as their interaction).
x is partitioned by the levels of f, and the result is a list whose names are those levels.

set.seed(1234)
x = sample(1:30, 30, replace = T)
x
##  [1]  4 19 19 19 26 20  1  7 20 16 21 17  9 28  9 26  9  9  6  7 10 10  5
## [24]  2  7 25 16 28 25  2
y = as.factor(sample(1:3, 30, replace = T))
levels(y)
## [1] "1" "2" "3"
split(x, f = y)
## $`1`
##  [1] 19 19 26  1  7  9  9  6 10 10  7 25
## 
## $`2`
##  [1]  4 19 21 17 28 26  9  2 25 16
## 
## $`3`
## [1] 20 20 16  9  7  5 28  2
names(split(x, f = y))  # names of the returned list
## [1] "1" "2" "3"
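
A typical follow-up is to summarize each group and, if needed, put the pieces back together with unsplit. The sketch below uses made-up vectors vals and grp; note that grp is a plain character vector, which split coerces to a factor.

```r
vals <- c(10, 20, 30, 40, 50, 60)
grp  <- c("a", "b", "a", "c", "b", "a")   # not a factor; coerced by split

groups <- split(vals, grp)       # list with elements a, b, c
group_means <- sapply(groups, mean)

# unsplit() reverses split(), restoring the original order.
restored <- unsplit(groups, grp)

group_means
restored
```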

4. choose

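choose(n, k) computes the binomial coefficient: the number of ways to choose k items from n when order does not matter (combinations, not permutations).

choose(5, 2)  # the result is 10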
## [1] 10
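
To make the combination/permutation distinction concrete, the sketch below checks choose against the factorial formula and against an explicit enumeration with combn. The variable names are illustrative.

```r
n <- 5
k <- 2

combos <- choose(n, k)                                          # 10
by_formula <- factorial(n) / (factorial(k) * factorial(n - k))  # same value

# Ordered selections (permutations) are k! times as many.
perms <- combos * factorial(k)

# combn() lists the combinations themselves, one per column.
listed <- ncol(combn(n, k))
```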

5. xtabs

Description: Create a contingency table (optionally a sparse matrix) from cross-classifying factors, usually contained in a data frame, using a formula interface.
In other words, it builds a new table from the cross-classifying factors.

It works much like a pivot table in Excel. For a concrete example see http://cos.name/cn/topic/11566#post-157688

cbind(1,1:7)  # bind as columns
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    2
## [3,]    1    3
## [4,]    1    4
## [5,]    1    5
## [6,]    1    6
## [7,]    1    7
rbind(1,1:7)  # bind as rows
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    1    1    1    1    1    1
## [2,]    1    2    3    4    5    6    7

text <- "V1 V2   V3
2006 1 1871
2006 2 1828
2006 3 2126
2006 4 2172
2006 5 2340
2006 6 2397
2006 7 2389
2006 8 2444
2006 9 2430
2006 10 2490
2006 11 2554
2006 12 2736
2007 1 2404
2007 2 2289
2007 3 2604
2007 4 2646
2007 5 2741
2007 6 2889
2007 7 2811
2007 8 2796
2007 9 2890
2007 10 2854
2007 11 2878
2007 12 2958"
text;class(text)
## [1] "character"
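text <- gsub(" +", " ", text)  # gsub replaces every regex match: here each run of spaces becomes a single space
tab <- read.table(textConnection(text), sep=" ", header=TRUE)
dt.tab <- xtabs(V3~V2+V1, tab)  # tab is the data set
# one cell per combination of V2 and V1, filled with the sum of V3
addmargins(dt.tab)  # add marginal totals (a Sum row and a Sum column)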
##      V1
## V2     2006  2007   Sum
##   1    1871  2404  4275
##   2    1828  2289  4117
##   3    2126  2604  4730
##   4    2172  2646  4818
##   5    2340  2741  5081
##   6    2397  2889  5286
##   7    2389  2811  5200
##   8    2444  2796  5240
##   9    2430  2890  5320
##   10   2490  2854  5344
##   11   2554  2878  5432
##   12   2736  2958  5694
##   Sum 27777 32760 60537
# Another example
head(esoph);str(esoph)
## 'data.frame':    88 obs. of  5 variables:
##  $ agegp    : Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ alcgp    : Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ tobgp    : Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ ncases   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ ncontrols: num  40 10 6 5 27 7 4 7 2 1 ...
xtabs(cbind(ncases, ncontrols) ~ ., data = esoph)
## , , tobgp = 0-9g/day,  = ncases
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34         0     0      0    0
##   35-44         0     0      0    2
##   45-54         1     6      3    4
##   55-64         2     9      9    5
##   65-74         5    17      6    3
##   75+           1     2      1    2
## 
## , , tobgp = 10-19,  = ncases
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34         0     0      0    1
##   35-44         1     3      0    0
##   45-54         0     4      6    3
##   55-64         3     6      8    6
##   65-74         4     3      4    1
##   75+           2     1      1    1
## 
## , , tobgp = 20-29,  = ncases
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34         0     0      0    0
##   35-44         0     1      0    2
##   45-54         0     5      1    2
##   55-64         3     4      3    2
##   65-74         2     5      2    1
##   75+           0     0      0    0
## 
## , , tobgp = 30+,  = ncases
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34         0     0      0    0
##   35-44         0     0      0    0
##   45-54         0     5      2    4
##   55-64         4     3      4    5
##   65-74         0     0      1    1
##   75+           1     1      0    0
## 
## , , tobgp = 0-9g/day,  = ncontrols
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34        40    27      2    1
##   35-44        60    35     11    3
##   45-54        46    38     16    4
##   55-64        49    40     18   10
##   65-74        48    34     13    4
##   75+          18     5      1    2
## 
## , , tobgp = 10-19,  = ncontrols
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34        10     7      1    1
##   35-44        14    23      6    3
##   45-54        18    21     14    4
##   55-64        22    21     15    7
##   65-74        14    10     12    2
##   75+           6     3      1    1
## 
## , , tobgp = 20-29,  = ncontrols
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34         6     4      0    1
##   35-44         7    14      2    4
##   45-54        10    15      5    3
##   55-64        12    17      6    3
##   65-74         7     9      3    1
##   75+           0     3      0    0
## 
## , , tobgp = 30+,  = ncontrols
## 
##        alcgp
## agegp   0-39g/day 40-79 80-119 120+
##   25-34         5     7      2    2
##   35-44         8     8      1    0
##   45-54         4     7      4    4
##   55-64         6     6      4    6
##   65-74         2     0      1    1
##   75+           3     1      0    0
## Output is not really helpful ... flat tables are better:
ftable(xtabs(cbind(ncases, ncontrols) ~ ., data = esoph))
##                           ncases ncontrols
## agegp alcgp     tobgp                     
## 25-34 0-39g/day 0-9g/day       0        40
##                 10-19          0        10
##                 20-29          0         6
##                 30+            0         5
##       40-79     0-9g/day       0        27
##                 10-19          0         7
##                 20-29          0         4
##                 30+            0         7
##       80-119    0-9g/day       0         2
##                 10-19          0         1
##                 20-29          0         0
##                 30+            0         2
##       120+      0-9g/day       0         1
##                 10-19          1         1
##                 20-29          0         1
##                 30+            0         2
## 35-44 0-39g/day 0-9g/day       0        60
##                 10-19          1        14
##                 20-29          0         7
##                 30+            0         8
##       40-79     0-9g/day       0        35
##                 10-19          3        23
##                 20-29          1        14
##                 30+            0         8
##       80-119    0-9g/day       0        11
##                 10-19          0         6
##                 20-29          0         2
##                 30+            0         1
##       120+      0-9g/day       2         3
##                 10-19          0         3
##                 20-29          2         4
##                 30+            0         0
## 45-54 0-39g/day 0-9g/day       1        46
##                 10-19          0        18
##                 20-29          0        10
##                 30+            0         4
##       40-79     0-9g/day       6        38
##                 10-19          4        21
##                 20-29          5        15
##                 30+            5         7
##       80-119    0-9g/day       3        16
##                 10-19          6        14
##                 20-29          1         5
##                 30+            2         4
##       120+      0-9g/day       4         4
##                 10-19          3         4
##                 20-29          2         3
##                 30+            4         4
## 55-64 0-39g/day 0-9g/day       2        49
##                 10-19          3        22
##                 20-29          3        12
##                 30+            4         6
##       40-79     0-9g/day       9        40
##                 10-19          6        21
##                 20-29          4        17
##                 30+            3         6
##       80-119    0-9g/day       9        18
##                 10-19          8        15
##                 20-29          3         6
##                 30+            4         4
##       120+      0-9g/day       5        10
##                 10-19          6         7
##                 20-29          2         3
##                 30+            5         6
## 65-74 0-39g/day 0-9g/day       5        48
##                 10-19          4        14
##                 20-29          2         7
##                 30+            0         2
##       40-79     0-9g/day      17        34
##                 10-19          3        10
##                 20-29          5         9
##                 30+            0         0
##       80-119    0-9g/day       6        13
##                 10-19          4        12
##                 20-29          2         3
##                 30+            1         1
##       120+      0-9g/day       3         4
##                 10-19          1         2
##                 20-29          1         1
##                 30+            1         1
## 75+   0-39g/day 0-9g/day       1        18
##                 10-19          2         6
##                 20-29          0         0
##                 30+            1         3
##       40-79     0-9g/day       2         5
##                 10-19          1         3
##                 20-29          0         3
##                 30+            1         1
##       80-119    0-9g/day       1         1
##                 10-19          1         1
##                 20-29          0         0
##                 30+            0         0
##       120+      0-9g/day       2         2
##                 10-19          1         1
##                 20-29          0         0
##                 30+            0         0
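
The two sides of the xtabs formula are easier to see on a tiny data set. In this sketch the data frame sales and its columns are made up: with a left-hand side the cells are sums of that variable, and with an empty left-hand side they are counts of rows.

```r
# Hypothetical data: two rows share the combination Q1 / 2007.
sales <- data.frame(
  year    = c(2006, 2006, 2007, 2007, 2007),
  quarter = c("Q1", "Q2", "Q1", "Q1", "Q2"),
  amount  = c(10, 20, 30, 5, 40)
)

totals <- xtabs(amount ~ quarter + year, data = sales)  # sums of amount
counts <- xtabs(~ quarter + year, data = sales)         # row counts

totals
counts
```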

6. ftable

Description: Create 'flat' contingency tables.
Usage: ftable(x, ...)
x R objects which can be interpreted as factors (including character strings), or a list (or data frame) whose components can be so interpreted, or a contingency table object of class "table" or "ftable".
That is, x must be factors, a list or data frame whose components can be treated as factors, or a table/ftable object.

head(Titanic)
str(Titanic)
## Start with a contingency table.
ftable(Titanic, row.vars = 1:3)  # row.vars picks the variables shown as rows; the remaining variable (the 4th, Survived) forms the columns
ftable(Titanic, row.vars = 1:2, col.vars = "Survived")  # col.vars picks the column variables
ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
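
When both row.vars and col.vars are given, any variable left out is summed over. The sketch below checks this on the built-in Titanic table, where Age is the variable that gets collapsed; the name ft is illustrative.

```r
# Class (4 levels) x Sex (2 levels) as rows, Survived (2 levels) as columns.
ft <- ftable(Titanic, row.vars = c("Class", "Sex"), col.vars = "Survived")

dim(ft)        # 8 row combinations, 2 columns
sum(ft)        # nothing is lost: same total as the full table
sum(Titanic)
```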
## Start with a data frame.
x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
x
levels(as.factor(mtcars$gear))
ftable(x, row.vars = c(2, 4))
## Start with expressions, use table()'s 'dnn' to change labels
ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4), dnn = c("Cylinders", 
    "V/S", "Transmission", "Gears"))  # dnn names the dimensions, i.e. the row and column variables

7. replace

replace(x, list, values)
x vector
list an index vector
values replacement values
replace replaces the values in x with indices given in list by those given in values.
If necessary, the values in values are recycled.
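Using the indices in list and the replacement values in values, replace returns a copy of x with all of the given positions changed in one step.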

set.seed(123)
x = sample(1:10, 5, replace = T)
x
## [1]  3  8  5  9 10
l = c(1, 3, 4)
value = c(100, 200, 300)
# list(a=matrix(sample(1:20,30),ncol=4,nrow=5,byrow=T))
x = replace(x, l, value)
x
## [1] 100   8 200 300  10
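
Two details are worth checking: the index can be a logical vector, and replace does not change its argument unless the result is assigned back. A minimal sketch with a made-up vector v:

```r
v <- c(5, NA, 7, NA, 9)

# Logical index; the single value 0 is recycled over both positions.
filled <- replace(v, is.na(v), 0)

filled       # the copy with NAs replaced
anyNA(v)     # TRUE: v itself is untouched
```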

8. addmargins

Puts Arbitrary Margins on Multidimensional Tables or Arrays
addmargins(A, margin = seq_along(dim(A)), FUN = sum, quiet = FALSE)

Aye <- sample(c("Yes", "Si", "Oui"), 177, replace = TRUE)
Bee <- sample(c("Hum", "Buzz"), 177, replace = TRUE)
Sea <- sample(c("White", "Black", "Red", "Dead"), 177, replace = TRUE)
(A <- table(Aye, Bee, Sea))  # the outer parentheses assign to A and print it; a bare A <- table(Aye, Bee, Sea) only assigns
## , , Sea = Black
## 
##      Bee
## Aye   Buzz Hum
##   Oui    8   5
##   Si    10   8
##   Yes    9  11
## 
## , , Sea = Dead
## 
##      Bee
## Aye   Buzz Hum
##   Oui    6   5
##   Si     9   8
##   Yes    5  10
## 
## , , Sea = Red
## 
##      Bee
## Aye   Buzz Hum
##   Oui    8  10
##   Si     7   3
##   Yes    4   3
## 
## , , Sea = White
## 
##      Bee
## Aye   Buzz Hum
##   Oui    3   9
##   Si     8  12
##   Yes    8   8
addmargins(A)
## , , Sea = Black
## 
##      Bee
## Aye   Buzz Hum Sum
##   Oui    8   5  13
##   Si    10   8  18
##   Yes    9  11  20
##   Sum   27  24  51
## 
## , , Sea = Dead
## 
##      Bee
## Aye   Buzz Hum Sum
##   Oui    6   5  11
##   Si     9   8  17
##   Yes    5  10  15
##   Sum   20  23  43
## 
## , , Sea = Red
## 
##      Bee
## Aye   Buzz Hum Sum
##   Oui    8  10  18
##   Si     7   3  10
##   Yes    4   3   7
##   Sum   19  16  35
## 
## , , Sea = White
## 
##      Bee
## Aye   Buzz Hum Sum
##   Oui    3   9  12
##   Si     8  12  20
##   Yes    8   8  16
##   Sum   19  29  48
## 
## , , Sea = Sum
## 
##      Bee
## Aye   Buzz Hum Sum
##   Oui   25  29  54
##   Si    34  31  65
##   Yes   26  32  58
##   Sum   85  92 177

ftable(A)  # equivalent to ftable(A, row.vars = c(1, 2), col.vars = 3)
##          Sea Black Dead Red White
## Aye Bee                          
## Oui Buzz         8    6   8     3
##     Hum          5    5  10     9
## Si  Buzz        10    9   7     8
##     Hum          8    8   3    12
## Yes Buzz         9    5   4     8
##     Hum         11   10   3     8
ftable(addmargins(A))  # equivalent to ftable(addmargins(A), row.vars = c(1, 2), col.vars = 3)
##          Sea Black Dead Red White Sum
## Aye Bee                              
## Oui Buzz         8    6   8     3  25
##     Hum          5    5  10     9  29
##     Sum         13   11  18    12  54
## Si  Buzz        10    9   7     8  34
##     Hum          8    8   3    12  31
##     Sum         18   17  10    20  65
## Yes Buzz         9    5   4     8  26
##     Hum         11   10   3     8  32
##     Sum         20   15   7    16  58
## Sum Buzz        27   20  19    19  85
##     Hum         24   23  16    29  92
##     Sum         51   43  35    48 177

# Non-commutative functions - note differences between resulting tables:
ftable(addmargins(A, c(1, 3), FUN = list(Sum = sum, list(Min = min, Max = max))))
## Margins computed over dimensions
## in the following order:
## 1: Aye
## 2: Sea
##          Sea Black Dead Red White Min Max
## Aye Bee                                  
## Oui Buzz         8    6   8     3   3   8
##     Hum          5    5  10     9   5  10
## Si  Buzz        10    9   7     8   7  10
##     Hum          8    8   3    12   3  12
## Yes Buzz         9    5   4     8   4   9
##     Hum         11   10   3     8   3  11
## Sum Buzz        27   20  19    19  19  27
##     Hum         24   23  16    29  16  29
ftable(addmargins(A, c(3, 1), FUN = list(list(Min = min, Max = max), Sum = sum)))
## Margins computed over dimensions
## in the following order:
## 1: Sea
## 2: Aye
##          Sea Black Dead Red White Min Max
## Aye Bee                                  
## Oui Buzz         8    6   8     3   3   8
##     Hum          5    5  10     9   5  10
## Si  Buzz        10    9   7     8   7  10
##     Hum          8    8   3    12   3  12
## Yes Buzz         9    5   4     8   4   9
##     Hum         11   10   3     8   3  11
## Sum Buzz        27   20  19    19  14  27
##     Hum         24   23  16    29  11  33
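# Same layout as the previous call, but the margins are computed in the opposite order, so the Min and Max entries of the Sum rows differ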
ftable(addmargins(A, c(1, 3), FUN = list(list(Min = min, Max = max), Sum = sum)))
## Margins computed over dimensions
## in the following order:
## 1: Aye
## 2: Sea
##          Sea Black Dead Red White Sum
## Aye Bee                              
## Oui Buzz         8    6   8     3  25
##     Hum          5    5  10     9  29
## Si  Buzz        10    9   7     8  34
##     Hum          8    8   3    12  31
## Yes Buzz         9    5   4     8  26
##     Hum         11   10   3     8  32
## Min Buzz         8    5   4     3  20
##     Hum          5    5   3     8  21
## Max Buzz        10    9   8     8  35
##     Hum         11   10  10    12  43
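# Similar to the first of the three calls, except that the sums now run across the columns (Sea) and the min/max across the rows (Aye)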
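
The margin argument is easier to follow on a 2 x 2 table. The sketch below uses a made-up table tab; margin = 1 extends dimension 1, i.e. it appends a Sum row holding the column totals.

```r
m <- matrix(1:4, 2, dimnames = list(r = c("a", "b"), c = c("x", "y")))
tab <- as.table(m)

with_totals <- addmargins(tab)              # Sum row and Sum column
row_only    <- addmargins(tab, margin = 1)  # only the extra Sum row

with_totals
row_only
```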

9. seq/seq.Date

seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)), length.out = NULL, along.with = NULL, ...)
seq.int(from, to, by, length.out, along.with, ...)
seq_along(along.with)
seq_len(length.out)
seq.Date: seq(from, to, by, length.out = NULL, along.with = NULL, ...)
by can be specified in several ways.
A number, taken to be in days.
An object of class difftime (see the example below).
A character string, containing one of "day", "week", "month" or "year", so the unit may be days, weeks, months or years, e.g. "1 day" or "1 days".
This can optionally be preceded by a (positive or negative) integer and a space, or followed by "s".

seq_along(dim(A))
## [1] 1 2 3
dim(A)
## [1] 3 2 4
seq_len(length.out = 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
seq_along(along.with = 10)
## [1] 1
seq(along.with = 10)
## [1] 1
seq(along.with = seq_len(length.out = 10))
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(along.with = rep(1, 5))
## [1] 1 2 3 4 5
seq.int(from = 1.1, to = 10, by = 1)  # the fractional part is kept
## [1] 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1
seq.int(from = 1.1, by = 1, length.out = 10)  # unlike the previous call, the output recorded here is whole numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(as.Date("2000/1/1"), as.Date("2000/1/2"), by = "0.5 days")
## Error: invalid '(to - from)/by' in 'seq'
seq(as.Date("2000/1/1"), as.Date("2000/1/2"), by = "1 days")
## [1] "2000-01-01" "2000-01-02"
t = seq(as.POSIXlt("2012-12-01 00:00:00"), as.POSIXlt("2012-12-02 00:00:00"), 
    by = difftime(as.POSIXlt("2012-12-01 00:15:00"), as.POSIXlt("2012-12-01 00:00:00"), 
        units = "mins"))
length(t)
## [1] 97
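
The character forms of by described above can be sketched as follows. The dates are arbitrary and the variable names are illustrative.

```r
# Four month starts, using by = "month" together with length.out.
month_starts <- seq(as.Date("2000-01-01"), by = "month", length.out = 4)

# An integer and a space before the unit, optionally followed by "s".
fortnightly <- seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by = "2 weeks")

# A negative step counts backwards.
countdown <- seq(as.Date("2000-01-03"), as.Date("2000-01-01"), by = "-1 day")

format(month_starts)
format(fortnightly)
format(countdown)
```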

10. strsplit/unlist

Note: the function is spelled strsplit, not strisplit.
Split the elements of a character vector x into substrings according to the matches to substring split within them.
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
With fixed = TRUE, split is matched as a literal string rather than as a regular expression.

noquote(strsplit("A text I want to display with spaces", NULL)[[1]])  # with split = NULL or "", x is split into single characters
##  [1] A   t e x t   I   w a n t   t o   d i s p l a y   w i t h   s p a c e
## [36] s
x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")  # the pieces on either side of each "e" are kept; the "e" itself is dropped
## $as
## [1] "asf" "f"  
## 
## $qu
## [1] "qw"  "rty"
## 
## [[3]]
## [1] "yuiop["
## 
## [[4]]
## [1] "b"
## 
## [[5]]
## [1] "stuff.blah.y" "ch"
strsplit(x, "e", fixed = T)
## $as
## [1] "asf" "f"  
## 
## $qu
## [1] "qw"  "rty"
## 
## [[3]]
## [1] "yuiop["
## 
## [[4]]
## [1] "b"
## 
## [[5]]
## [1] "stuff.blah.y" "ch"
# Example, see http://cos.name/cn/topic/104102#post-217293
library(igraph)
b = "1-2,1-3,1-4,1-6,1-8,1-10,2-3,2-4,2-5,2-7,2-9,3-4,3-5,3-6,3-10,4-6,4-7,4-9,4-10,5-7,5-8,5-10,6-7,6-8,6-9,7-9,7-10,8-10,9-10"
b = strsplit(b, ",")[[1]]
b = as.numeric(unlist(strsplit(b, "-")))
# unlist example
l.ex <- list(a = list(1:5, LETTERS[1:5]), b = "Z", c = NA)
unlist(l.ex, recursive = FALSE)
## $a1
## [1] 1 2 3 4 5
## 
## $a2
## [1] "A" "B" "C" "D" "E"
## 
## $b
## [1] "Z"
## 
## $c
## [1] NA
unlist(l.ex, recursive = TRUE)
##  a1  a2  a3  a4  a5  a6  a7  a8  a9 a10   b   c 
## "1" "2" "3" "4" "5" "A" "B" "C" "D" "E" "Z"  NA
b1 = b[seq(1, 57, by = 2)]
b2 = b[seq(2, 58, by = 2)]
r = data.frame(from = b1, to = b2)
g = graph.data.frame(d = r, directed = F)  # builds an igraph graph object
plot(g, layout = layout.fruchterman.reingold, vertex.label = 1:10)

[Figure: plot of the graph g, drawn with the Fruchterman-Reingold layout and vertices labeled 1 to 10]
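
The splits on "e" above give the same result with and without fixed = TRUE because "e" has no special meaning in a regular expression. The difference shows up with a character such as ".", as in this sketch on one of the strings used earlier:

```r
s <- "stuff.blah.yech"

# "." as a regular expression matches every character,
# so only empty strings are left between the matches.
regex_pieces <- strsplit(s, ".")[[1]]

# Literal matching, either via fixed = TRUE or by escaping the dot.
literal_pieces <- strsplit(s, ".", fixed = TRUE)[[1]]
escaped_pieces <- strsplit(s, "\\.")[[1]]

length(regex_pieces)
literal_pieces
```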

11. which

which(x, arr.ind = FALSE, useNames = TRUE)
arrayInd(ind, .dim, .dimnames = NULL, useNames = FALSE)
x a logical vector or array. NAs are allowed and omitted (treated as if FALSE).
arr.ind logical; should array indices be returned when x is an array?
ind integer-valued index vector, as resulting from which(x).
.dim dim(.) integer vector
.dimnames optional list of character dimnames(.), of which only .dimnames[[1]] is used.
useNames logical indicating if the value of arrayInd() should have (non-null) dimnames at all.

df = data.frame(u = c(5, 10, 15, 20, 30, 40, 60, 80, 100), lot1 = c(118, 58, 
    42, 35, 27, 25, 21, 19, 18), lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12))
which(df$u == 15)
## [1] 3
mat = as.matrix(df)
which(mat%%3 == 0, arr.ind = T)  # returns the row and column of every matching element
##       row col
##  [1,]   3   1
##  [2,]   5   1
##  [3,]   7   1
##  [4,]   3   2
##  [5,]   5   2
##  [6,]   7   2
##  [7,]   9   2
##  [8,]   1   3
##  [9,]   4   3
## [10,]   5   3
## [11,]   8   3
## [12,]   9   3
arrayInd(which(mat%%3 == 0))
## Error: argument ".dim" is missing, with no default
arrayInd(which(mat%%3 == 0), .dim = c(1:3))  # ind comes from which(x); .dim should be the dimensions of the original array, i.e. dim(mat). With .dim = 1:3 the indices are read as positions in a 1 x 2 x 3 array, so the rows below do not describe mat
##       [,1] [,2] [,3]
##  [1,]    1    1    2
##  [2,]    1    1    3
##  [3,]    1    1    1
##  [4,]    1    2    3
##  [5,]    1    2    1
##  [6,]    1    2    2
##  [7,]    1    2    3
##  [8,]    1    1    1
##  [9,]    1    2    2
## [10,]    1    1    3
## [11,]    1    2    1
## [12,]    1    1    2
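
For comparison, passing the dimensions of the matrix itself makes arrayInd agree with which(..., arr.ind = TRUE). This is a small sketch with a made-up 3 x 2 matrix m2 rather than the mat used above.

```r
m2 <- matrix(c(5, 10, 15, 20, 30, 40), nrow = 3)

hits <- which(m2 %% 3 == 0)                      # linear indices: 3 and 5
by_arr_ind  <- which(m2 %% 3 == 0, arr.ind = TRUE)
by_arrayInd <- arrayInd(hits, .dim = dim(m2))    # .dim taken from the matrix

by_arr_ind
by_arrayInd
```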

12. table

13. ceiling_date/floor_date/round_date

library(lubridate)
## Attaching package: 'lubridate'
## The following object(s) are masked from 'package:igraph':
## 
## %--%
floor_date(as.POSIXlt("2012-01-31 23:46:09"), "day")
## [1] "2012-01-31 CST"
ceiling_date(as.POSIXct("2012-01-31 23:46:09"), "day")
## [1] "2012-02-01 CST"
floor_date(as.POSIXct("2012-01-31 23:46:09"), "month")
## [1] "2012-01-01 CST"
ceiling_date(as.POSIXct("2012-01-31 23:46:09"), "month")
## [1] "2012-02-01 CST"
t = class(floor_date(as.POSIXct("2012-01-31 23:46:09"), "month"))
t
## [1] "POSIXct" "POSIXt"
mode(t)  # t is the character vector returned by class(), so its mode is "character", not "function"
## [1] "character"
## Build a timestamp string with second precision
library(stringr)
str_c(ceiling_date(as.POSIXct("2012-01-31 23:46:09"), "month"), "00:00:00", 
    sep = " ")
## [1] "2012-02-01 00:00:00"
str_c(t, "00:00:00", sep = " ")  # no error, but not the intended result: t holds the class names, so the time is appended to each of them
## [1] "POSIXct 00:00:00" "POSIXt 00:00:00"
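
As a side note, rounding down can also be done without lubridate or stringr. The sketch below is a base-R alternative, not the approach used above: it relies on trunc() and format(), and fixes the time zone to UTC as an assumption so that the result does not depend on the local setting.

```r
x <- as.POSIXct("2012-01-31 23:46:09", tz = "UTC")

# Floor to the day, then print with second precision.
day_floor <- trunc(x, units = "days")
day_stamp <- format(day_floor, "%Y-%m-%d %H:%M:%S")

# Floor to the month by fixing the day and time in the format string.
month_stamp <- format(x, "%Y-%m-01 00:00:00")

day_stamp
month_stamp
```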

14. attr/attributes

attributes(object) returns a list of the special attributes of object; the intrinsic attributes mode and length are not included.
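
A minimal sketch of the difference between the two functions, using a made-up matrix m3 and a made-up attribute called "note": attributes() lists everything at once, attr() reads or sets a single attribute.

```r
m3 <- matrix(1:6, nrow = 2)

all_attrs <- attributes(m3)      # a list; for a plain matrix just $dim
dims <- attr(m3, "dim")          # one attribute by name

attr(m3, "note") <- "demo"       # attach a custom attribute
attr_names <- names(attributes(m3))

# mode and length are intrinsic and do not appear in the list.
mode(m3)
length(m3)
```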

15. atomic vectors

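R objects fall into two kinds: atomic and recursive. All elements of an atomic object share one basic type (numeric, character and so on) and are not themselves objects. The elements of a recursive object may be objects of different types, each element being an object in its own right. A function is a recursive object. Example: see item 13.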
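
The distinction can be checked with is.atomic() and is.recursive(). A short sketch; the variable names are illustrative:

```r
nums  <- 1:3
mixed <- c(1, "a", TRUE)     # an atomic vector: everything is coerced to character
lst   <- list(1, "a", TRUE)  # a list keeps each element's own type

is.atomic(nums)       # TRUE
is.recursive(nums)    # FALSE
is.recursive(lst)     # TRUE
is.recursive(mean)    # TRUE: functions count as recursive objects
mixed
```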

16. within/with

Evaluate an R expression in an environment constructed from data, possibly modifying the original data.
with(data, expr, ...)
within(data, expr, ...)

str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
aq = within(airquality, {
    a = log(Ozone)
})
str(aq)  # within() returns a modified copy with the new column a; airquality itself is unchanged
## 'data.frame':    153 obs. of  7 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ a      : num  3.71 3.58 2.48 2.89 NA ...
df = data.frame(u = c(5, 10, 15, 20, 30, 40, 60, 80, 100), lot1 = c(118, 58, 
    42, 35, 27, 25, 21, 19, 18), lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12))
with(df, list(summary(glm(lot1 ~ log(u), family = Gamma)), summary(glm(lot2 ~ 
    log(u), family = Gamma))))  # returns both summaries; list() is used rather than {} because {} would return only its last value
## [[1]]
## 
## Call:
## glm(formula = lot1 ~ log(u), family = Gamma)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.0401  -0.0376  -0.0264   0.0290   0.0864  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.016554   0.000928   -17.9  4.3e-07 ***
## log(u)       0.015343   0.000415    37.0  2.8e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for Gamma family taken to be 0.002446)
## 
##     Null deviance: 3.51283  on 8  degrees of freedom
## Residual deviance: 0.01673  on 7  degrees of freedom
## AIC: 37.99
## 
## Number of Fisher Scoring iterations: 3
## 
## 
## [[2]]
## 
## Call:
## glm(formula = lot2 ~ log(u), family = Gamma)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.0557  -0.0293   0.0103   0.0171   0.0637  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.023908   0.001326   -18.0  4.0e-07 ***
## log(u)       0.023599   0.000577    40.9  1.4e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for Gamma family taken to be 0.001813)
## 
##     Null deviance: 3.118557  on 8  degrees of freedom
## Residual deviance: 0.012672  on 7  degrees of freedom
## AIC: 27.03
## 
## Number of Fisher Scoring iterations: 3
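
The contrast between the two functions is easier to see on a tiny data frame. In this sketch d, ratio and d2 are made-up names: with() returns the value of the expression, while within() returns a modified copy of the data.

```r
d <- data.frame(x = 1:3, y = c(2, 4, 6))

# with(): evaluate an expression using the columns as variables.
ratio <- with(d, y / x)

# within(): variables created in the block become new columns of the copy.
d2 <- within(d, {
  z <- x + y
})

ratio
names(d2)
names(d)    # the original data frame is unchanged
```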

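17. Cluster analysis

There are two main families of methods: hierarchical clustering and dynamic clustering.
Hierarchical clustering, steps: center and standardize the data -> compute distances -> cluster hierarchically -> plot the result with plot.
One open question along the way is how many clusters to use; no fully satisfactory method exists yet. R provides rect.hclust(), which essentially determines the clusters from a given number of clusters or a given height threshold.
Dynamic clustering: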

temp = lf.111[, c(4, 7)]  # lf.111 is the author's data frame, not shown in these notes

temp = temp[temp$traveltime < 240, ]
cor(temp$traveltime, temp$weight)  # weak correlation
hist(temp$traveltime)
hist(temp$weight, breaks = 200)
plot(density(temp$weight))
plot(temp$weight, temp$traveltime)
scaled = scale(temp)  # k-means is run on the standardized data
km = kmeans(scaled, 4, nstart = 20)
plot(scaled[, c("weight", "traveltime")], col = km$cluster)  # plot on the scale the centers live on
points(km$centers[, c("weight", "traveltime")], col = 1:4, pch = 8, cex = 2)  # one color per cluster
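
The hierarchical steps listed above (standardize -> distances -> cluster -> plot) can be sketched on the built-in iris data. The choice of iris, of complete linkage and of k = 3 are assumptions made for illustration only.

```r
# 1. Center and standardize the four numeric columns.
x_std <- scale(iris[, 1:4])

# 2. Distances between observations (Euclidean by default).
d_mat <- dist(x_std)

# 3. Hierarchical clustering.
hc <- hclust(d_mat, method = "complete")

# 4. Cut the tree into a chosen number of clusters.
groups <- cutree(hc, k = 3)
table(groups)

# 5. Plot the dendrogram and mark the clusters (interactive sessions only).
if (interactive()) {
  plot(hc, labels = FALSE)
  rect.hclust(hc, k = 3)
}
```

rect.hclust() also accepts a height threshold h instead of k, which is the second way of fixing the clusters mentioned above.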

18. pmatch

Partial string matching. Not to be confused with charmatch (item 24).

pmatch("", "")  # returns NA
## [1] NA
pmatch("m", c("mean", "median", "mode"))  # returns NA
## [1] NA
pmatch("med", c("mean", "median", "mode"))  # returns 2
## [1] 2

pmatch(c("", "ab", "ab"), c("abc", "ab"), dup = FALSE)
## [1] NA  2  1
pmatch(c("", "ab", "ab"), c("abc", "ab"), dup = TRUE)
## [1] NA  2  2
## compare
charmatch(c("", "ab", "ab"), c("abc", "ab"))
## [1] 0 2 2

newiris <- iris
newiris$Species <- NULL
kc <- kmeans(newiris, 3)
plot(newiris[c("Sepal.Length", "Sepal.Width")], col = kc$cluster)
points(kc$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 2)

19. paste

Example: http://www.cookbook-r.com/Strings/Creating_strings_from_variables/
Usage
paste (..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL)
Arguments
... one or more R objects, to be converted to character vectors.
sep a character string to separate the terms. Not NA_character_.
collapse an optional character string to separate the results. Not NA_character_.
sep joins the corresponding elements of the arguments; collapse then joins those results into one single string.

a = LETTERS[1:5]
b = letters[1:5]
paste(a, b, sep = ",")
## [1] "A,a" "B,b" "C,c" "D,d" "E,e"
paste(a, b, sep = "\t")
## [1] "A\ta" "B\tb" "C\tc" "D\td" "E\te"
paste(a, b, sep = ",", collapse = "-")
## [1] "A,a-B,b-C,c-D,d-E,e"
paste("hello", 2, collapse = "")
## [1] "hello 2"
paste("hello", 2, sep = "", collapse = "")
## [1] "hello2"
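
Two more cases round this out: paste0, which is paste with sep = "", and the recycling of a short argument against a longer one. The names ids and joined are illustrative.

```r
# The single string "id_" is recycled against 1:3.
ids <- paste0("id_", 1:3)

# sep is applied first (element by element), collapse afterwards.
joined <- paste("x", 1:3, sep = "-", collapse = "+")

ids
joined
```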

20. apply

x = seq(0.1, 2.5, length = 10)
m = 10000
z = rnorm(m)
dim(x) = length(x)
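p = apply(x, 1, FUN = function(x, y) {
    mean(y <= x)
}, y = z)
# z is the sample drawn by z = rnorm(m). The point is how it reaches the function: as the extra argument y = z, written outside the braces and passed on by apply
phi = pnorm(x)
print(round(rbind(x, p, phi), 3))  # round(x, 3) rounds to three decimal places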
##     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
## x   0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
## p   0.54 0.644 0.736 0.816 0.882 0.926 0.956 0.976 0.987 0.994
## phi 0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994
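
The example above applies a function over a one-dimensional array. On a matrix, the MARGIN argument chooses rows or columns, and extra arguments are forwarded in the same way. A small sketch with a made-up matrix m4:

```r
m4 <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)

row_sums <- apply(m4, 1, sum)   # MARGIN = 1: one result per row
col_max  <- apply(m4, 2, max)   # MARGIN = 2: one result per column

# k is not an argument of apply itself, so it is passed on to the function.
times_ten <- apply(m4, 2, function(col, k) col * k, k = 10)

row_sums
col_max
times_ten
```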

21. String-processing functions in R

For detailed examples see http://cos.name/cn/topic/12987#post-159604

# String concatenation:
paste()

# String splitting:
strsplit()  # strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

# Number of characters in a string:
nchar()

# Substring extraction and replacement:
substr(x, start, stop)
substring(text, first, last = 1e+06)
substr(x, start, stop) <- value
substring(text, first, last = 1e+06) <- value

# Character translation and case conversion:
chartr(old, new, x)
tolower(x)
toupper(x)
casefold(x, upper = FALSE)
# Pattern matching:
grep()
# Approximate (fuzzy) matching:
agrep()
# Pattern replacement:
gsub()
# grep(), sub(), gsub(), regexpr() and gregexpr() treat pattern as a regular expression by default; perl = TRUE switches to Perl-compatible regular expressions and fixed = TRUE to literal matching.
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, 
    fixed = FALSE, useBytes = FALSE)

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, 
    fixed = FALSE, useBytes = FALSE)

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, 
    fixed = FALSE, useBytes = FALSE)

regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, 
    useBytes = FALSE)

gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, 
    fixed = FALSE, useBytes = FALSE)
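
A quick tour of these functions on one made-up string s may help to tell them apart; the result names are illustrative.

```r
s <- "Hello, World"

n_chars  <- nchar(s)                       # number of characters
first5   <- substr(s, 1, 5)                # characters 1 to 5
upper    <- toupper(s)
swapped  <- chartr("lo", "01", s)          # l -> 0 and o -> 1, character by character

first_o  <- sub("o", "0", s)               # only the first match is replaced
every_o  <- gsub("o", "0", s)              # every match is replaced

where    <- grep("World", c(s, "other"))   # index of the elements that match
pos_one  <- as.integer(regexpr("o", s))    # position of the first match
pos_all  <- as.integer(gregexpr("o", s)[[1]])  # positions of all matches
```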

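22. assign

Example: http://cos.name/cn/topic/6021

# Simplest case: assigning a vector
assign(a, c(1:5))  # wrong: the name must be a quoted string; with an unquoted a, the value of a is taken as the name, which triggers the warning below
## Warning: only the first element is used as variable name
assign("a", c(1:5))  # ok
a = 1:4
assign("a[1]", 5)
a  # a is still 1:4: assign("a[1]", 5) creates a separate variable literally named a[1] instead of changing an element of a
## [1] 1 2 3 4
get("a[1]") == 2  # FALSE: the variable named a[1] holds 5
## [1] FALSE
# Slightly more complex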
eval(parse(text = paste("assign('a',", 1, ")", sep = "")))
a
## [1] 1
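
Where assign is genuinely useful is in building variable names at run time; changing one element of an existing vector is done with ordinary indexing instead. A minimal sketch, with made-up names var_1 to var_3 and v4:

```r
# Create var_1, var_2, var_3 holding 1, 4, 9.
for (i in 1:3) {
  assign(paste0("var_", i), i^2)
}
second <- get("var_2")   # look a variable up by its name

# To modify an element, index the vector directly.
v4 <- 1:4
v4[1] <- 5L

second
v4
```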

23. parse

<>=
set.seed(1213)  # for reproducibility
x = cumsum(rnorm(100))
mean(x)  # mean of x
plot(x, type = 'l')  # Brownian motion
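
The notes give no parse example of their own, so here is a minimal sketch of the usual pairing with eval, the same pattern as the eval(parse(text = ...)) call in item 22. The expression text is made up.

```r
# parse() turns text into an unevaluated expression object.
expr <- parse(text = "1 + 2 * 3")

class(expr)                   # "expression"
result <- eval(expr)          # evaluating it gives the value
back <- deparse(expr[[1]])    # and deparse() turns the call back into text

result
back
```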

24. charmatch

charmatch("", "")  # returns 1
## [1] 1
charmatch("m", c("mean", "median", "mode"))  # returns 0
## [1] 0
charmatch("med", c("mean", "median", "mode"))  # returns 2
## [1] 2
charmatch("med", c("mean", "median", "mode"), nomatch = 5)
## [1] 2
charmatch("m", c("mean", "median", "mode"), nomatch = 5)
## [1] 0
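
The three kinds of return value (an index, 0 for an ambiguous match, NA or nomatch for no match) can be wrapped in a small helper. resolve_option below is a hypothetical function written for this sketch, not part of R.

```r
# Hypothetical helper: expand an abbreviation to the full option name.
resolve_option <- function(abbrev, options) {
  i <- charmatch(abbrev, options)
  if (is.na(i)) stop("no option matches '", abbrev, "'")
  if (i == 0)   stop("'", abbrev, "' is ambiguous")
  options[i]
}

opts <- c("mean", "median", "mode")
chosen <- resolve_option("med", opts)   # unique partial match

# "m" matches all three options, so the helper signals an error.
ambiguous <- tryCatch(resolve_option("m", opts), error = function(e) conditionMessage(e))

chosen
ambiguous
```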