The help page ?n_distinct claims that n_distinct(x) is faster than length(unique(x)).
Generate some data.
set.seed(2015-08-05)  # evaluated as arithmetic: 2015 - 8 - 5 = 2002
x <- sample(1:1000, 1000, replace=TRUE)
Try both functions.
library(dplyr)
n_distinct(x)
## [1] 630
length(unique(x))
## [1] 630
Benchmark each expression 1000 times.
library(microbenchmark)
m <- microbenchmark(dplyr_n_distinct = n_distinct(x),
                    base_length_unique = length(unique(x)),
                    times = 1000L)
m
## Unit: microseconds
##                expr    min      lq      mean  median       uq      max neval
##    dplyr_n_distinct 90.272 91.5445 102.10565 94.4695 101.1120 1114.321  1000
##  base_length_unique 11.961 13.7180  19.84094 14.6910  17.9735 1529.025  1000
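In this run the claim does not hold: the median for n_distinct(x) is larger than the median for length(unique(x)). To put a number on the gap, the two medians printed above can be compared directly. This is a minimal sketch using only base R; the names median_us and ratio are illustrative and not part of the original analysis, and the result applies only to this session (dplyr 0.4.2, a 1000-element integer vector).

```r
# Median timings in microseconds, copied from the benchmark output above.
median_us <- c(dplyr_n_distinct = 94.4695, base_length_unique = 14.6910)

# How many times slower n_distinct() was than length(unique()) in this run.
ratio <- unname(median_us["dplyr_n_distinct"] / median_us["base_length_unique"])
round(ratio, 1)
```

The ratio comes out at roughly 6.4, so length(unique(x)) was the faster of the two here.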
library(ggplot2)
autoplot(m)
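A single vector length says little about how the two approaches scale. One way to check is to repeat the same comparison for several input sizes. The sketch below is an assumption-laden extension, not part of the original benchmark: bench_size is a hypothetical helper, and the sizes and the reduced repetition count are arbitrary choices. No timings are shown because they depend on the machine and the package versions.

```r
library(dplyr)
library(microbenchmark)

# Hypothetical helper: median timings (microseconds) for a vector of length n.
bench_size <- function(n, times = 100L) {
  x <- sample(seq_len(n), n, replace = TRUE)
  m <- microbenchmark(dplyr_n_distinct = n_distinct(x),
                      base_length_unique = length(unique(x)),
                      times = times)
  s <- summary(m, unit = "us")
  setNames(s$median, as.character(s$expr))
}

# Arbitrary illustrative sizes; one row of medians per size.
sizes <- c(1e3, 1e4, 1e5)
res <- t(sapply(sizes, bench_size))
rownames(res) <- sizes
res
```

Each row of res holds the two medians for one input size, so a change in the ranking between sizes would show up as a change in which column is smaller.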
sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.4 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_1.0.1 microbenchmark_1.4-2 dplyr_0.4.2
## [4] BiocInstaller_1.18.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.0 knitr_1.10.5 magrittr_1.5 MASS_7.3-43
## [5] munsell_0.4.2 colorspace_1.2-6 R6_2.1.0 stringr_1.0.0
## [9] plyr_1.8.3 tools_3.2.1 parallel_3.2.1 grid_3.2.1
## [13] gtable_0.1.2 DBI_0.3.1 htmltools_0.2.6 yaml_2.1.13
## [17] assertthat_0.1 digest_0.6.8 reshape2_1.4.1 formatR_1.2
## [21] evaluate_0.7 rmarkdown_0.7 stringi_0.5-5 scales_0.2.5
## [25] proto_0.3-10