Working with Data Frames

do() in dplyr compared to summarise

150612: Note first that for the main manipulation functions such as summarise(), as well as for do(), the first argument is not called data but .data. do() mainly makes sense with grouped data frames. Inside do(), the dot (.) refers to the current group, i.e. the data frame filtered to that group's rows. With an unnamed argument, each do() expression must return a data frame; with named arguments it can return arbitrary objects, which are stored in a list-column.
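To make the contrast concrete, here is a minimal sketch (not from the original notes) where the unnamed do() expression returns two rows per group, the two most fuel-efficient cars for each cyl value. summarise(), by contrast, is meant to reduce each group to a single row. It assumes dplyr is installed; top2 is an illustrative name.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# `.` is the subset of mtcars for the current cyl group; the expression sorts it
# by descending mpg and keeps the first two rows, so each group yields 2 rows
top2 <- do(by_cyl, head(arrange(., desc(mpg)), 2))
top2
```

Because the returned data frame already contains cyl, do() does not add a second copy of the grouping column.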

library(dplyr)
by_cyl <- group_by(mtcars, cyl)
summarise(.data = by_cyl, Mean = mean(disp))
## Source: local data frame [3 x 2]
## 
##   cyl     Mean
## 1   4 105.1364
## 2   6 183.3143
## 3   8 353.1000
# summarise() evaluates its arguments inside the data, so columns such as disp
# can be referenced by name directly
do(.data = by_cyl, data.frame(Mean = mean(.$disp)))
## Source: local data frame [3 x 2]
## Groups: cyl
## 
##   cyl     Mean
## 1   4 105.1364
## 2   6 183.3143
## 3   8 353.1000
# inside do(), the dot (.) is the data frame for the current group; the unnamed
# expression must return a data frame, and do() adds the grouping column (cyl) to it
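For comparison, the split-apply-combine pattern that do() automates can be sketched in base R without dplyr. This is an illustrative equivalent, not how dplyr implements do(); the names groups, pieces and res are made up for the example.

```r
# Base R only: split mtcars into one data frame per cyl value ("4", "6", "8")
groups <- split(mtcars, mtcars$cyl)

# Evaluate the per-group expression; each call returns a one-row data frame
pieces <- lapply(groups, function(d) data.frame(Mean = mean(d$disp)))

# Prepend the group label to each piece and stack them, similar to do()'s output
res <- do.call(rbind, Map(function(key, piece) cbind(cyl = as.numeric(key), piece),
                          names(pieces), pieces))
rownames(res) <- NULL
res
```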

# with a named argument, do() can store arbitrary objects (here lm fits) in a
# list-column of the result, one model per group
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
models
## Source: local data frame [3 x 2]
## Groups: <by row>
## 
##   cyl     mod
## 1   4 <S3:lm>
## 2   6 <S3:lm>
## 3   8 <S3:lm>
# models is grouped by row, so summarise() evaluates summary(mod) once per model
summarise(models, rsq = summary(mod)$r.squared)
## Source: local data frame [3 x 1]
## 
##          rsq
## 1 0.64840514
## 2 0.01062604
## 3 0.27015777
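In current dplyr (1.0.0 and later) do() is marked superseded, and its documentation points to nest_by(), reframe() and pick() instead. A minimal sketch of the same per-cylinder regressions with nest_by(), assuming a recent dplyr; models2 and rsq are illustrative names. It should reproduce the R² values above, now with the cyl column kept alongside.

```r
library(dplyr)

# nest_by() returns a rowwise data frame: one row per cyl value, with that
# group's rows stored in a list-column called `data`
models2 <- mtcars %>%
  nest_by(cyl) %>%
  mutate(mod = list(lm(mpg ~ disp, data = data)))

# On a rowwise data frame, summarise() works one row (one model) at a time;
# .groups = "drop" returns an ungrouped result
rsq <- models2 %>%
  summarise(rsq = summary(mod)$r.squared, .groups = "drop")
rsq
```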
# on row-grouped data, . is a single row, so .$mod is that row's lm fit
models %>% do(data.frame(var = names(coef(.$mod)), coef(summary(.$mod))))
## Source: local data frame [6 x 5]
## Groups: <by row>
## 
##           var     Estimate  Std..Error    t.value     Pr...t..
## 1 (Intercept) 40.871955322 3.589605400 11.3861973 1.202715e-06
## 2        disp -0.135141815 0.033171608 -4.0740206 2.782827e-03
## 3 (Intercept) 19.081987419 2.913992892  6.5483988 1.243968e-03
## 4        disp  0.003605119 0.015557115  0.2317344 8.259297e-01
## 5 (Intercept) 22.032798914 3.345241115  6.5863112 2.588765e-05
## 6        disp -0.019634095 0.009315926 -2.1075838 5.677488e-02
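Along the same lines, the coefficient table can be built without do() using reframe() (dplyr 1.1.0 or later), which allows several result rows per input row. This is a sketch assuming that version; the models are rebuilt with nest_by() so the block runs on its own, and models2 and coefs are illustrative names.

```r
library(dplyr)

# One row per cyl value, each holding its own lm fit in the `mod` list-column
models2 <- mtcars %>%
  nest_by(cyl) %>%
  mutate(mod = list(lm(mpg ~ disp, data = data)))

# reframe() may return any number of rows per input row; the unnamed data frame
# (Estimate, Std. Error, t value, Pr(>|t|)) is spliced into separate columns
coefs <- models2 %>%
  reframe(var = names(coef(mod)), as.data.frame(coef(summary(mod))))
coefs
```

Unlike the do() version, the column names keep their original spelling (for example "Std. Error"), because as.data.frame() on the coefficient matrix does not pass them through make.names().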

The examples on the do() help page (?do) show more useful patterns.