Why is dplyr (http://cran.r-project.org/web/packages/dplyr/dplyr.pdf) so great? It simplifies the code you write for managing dataframes.
Let’s say you were using the dataframe “mtcars” and wanted to calculate the mean, median, and standard deviation of miles per gallon, aggregated by the number of cylinders.
Here’s one way you could do that using the aggregate function in base R (though I admit there may be simpler ways…):
Means <- aggregate(mtcars$mpg, by = list(mtcars$cyl), FUN = mean) ## Calculate aggregated means
Medians <- aggregate(mtcars$mpg, by = list(mtcars$cyl), FUN = median) ## Repeat with medians
SDs <- aggregate(mtcars$mpg, by = list(mtcars$cyl), FUN = sd) ## (sigh) again with sd
Final <- cbind(Means, Medians[,2], SDs[,2]) ## Combine, keeping only one copy of the grouping column
names(Final) <- c("cyl", "mpg.mean", "mpg.median", "mpg.sd") ## Rename
Final
##   cyl mpg.mean mpg.median mpg.sd
## 1   4    26.66       26.0  4.510
## 2   6    19.74       19.7  1.454
## 3   8    15.10       15.2  2.560
Note that you have to run the aggregate command for each function separately and then combine them into one dataframe later. Kind of a pain.
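For what it’s worth, one of those "simpler ways" in base R is to hand aggregate a function that returns several statistics at once. The sketch below is my own illustration, not part of the original comparison; the name `Stats` is made up for the example. Note that the statistics come back packed into a single matrix column, so an extra flattening step is still needed.

```r
# Sketch: one aggregate() call whose FUN returns a named vector of statistics.
# `Stats` is a hypothetical name chosen for this example.
Stats <- aggregate(mpg ~ cyl, data = mtcars,
                   FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x)))

# The three statistics are stored as one matrix column called "mpg";
# rebuilding the data frame spreads them into ordinary columns
# (mpg.mean, mpg.median, mpg.sd).
Stats <- do.call(data.frame, Stats)
Stats
```

It is fewer calls, but the matrix-column detour is still the kind of bookkeeping you would rather not think about.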
Here’s the much, much simpler version in dplyr:
library(dplyr) ## Load dplyr (provides %>%, group_by, summarise_each, funs)
Final <- mtcars %>% ## Define the dataframe once
  group_by(cyl) %>% ## Define one (or more!) grouping variables
  summarise_each(funs(mean, median, sd), mpg) ## Define the aggregation functions
Final
## Source: local data frame [3 x 4]
##
##   cyl  mean median    sd
## 1   4 26.66   26.0 4.510
## 2   6 19.74   19.7 1.454
## 3   8 15.10   15.2 2.560
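A note if you are on a recent dplyr: summarise_each() and funs() have since been deprecated. A minimal sketch of the same summary, assuming dplyr 1.0.0 or later (where across() is available), might look like this. The output column names differ from the ones printed above: with the default naming, across() produces mpg_mean, mpg_median, and mpg_sd.

```r
library(dplyr)

# Sketch, assuming dplyr >= 1.0.0: same grouped summary written with across()
Final <- mtcars %>%        # Define the dataframe once
  group_by(cyl) %>%        # Define one (or more) grouping variables
  summarise(across(mpg, list(mean = mean, median = median, sd = sd)))  # Named list of functions
Final
```

Grouping by more than one variable only changes one line: group_by(cyl, am), for example, would give one row per combination of cylinders and transmission type.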
Isn’t that so much nicer?!