Data Science ≈
Using R & Python +
November 13, 2016
General Side
Academic Side
Industry Side
Tedious Installation
Hard to Repeat
Quality and Agile
R | Python
| Levels | R | Python |
|---|---|---|
| Package and Environment | pacman (Rinker and Kurkiewicz 2015) + packrat (Ushey et al. 2016) + .RData | pip + virtualenv / conda |
| Content | R Markdown (Allaire et al. 2016) | Jupyter Notebook |
| OS | ? | ? |
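packrat::init()                    # create a project-private package library
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(tidyverse,          # install (if missing) and load each package
               xgboost,
               rstanarm,
               liftr)
load(".RData")                     # restore the saved workspace objects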
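virtualenv xxx_project              # create an isolated environment
source xxx_project/bin/activate     # activate it (Linux/macOS)
pip freeze > requirements.txt       # record the exact package versions
pip install -r requirements.txt     # reinstall the same versions elsewhere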
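Because `pip freeze` pins exact versions, you can check that an environment really matches `requirements.txt`. The following is a minimal Python sketch. It assumes Python 3.8+ (for `importlib.metadata`) and a file made of simple `name==version` pins. The helper names `parse_pins`, `check_pins` and `report` are hypothetical. Version ranges and URL requirements are skipped, and any extras or environment markers are stripped.

```python
"""Check installed packages against the exact pins in a requirements.txt.

Assumption: only `name==version` lines are checked; ranges such as
`pkg>=1.0`, `-r` includes and URL requirements are ignored.
"""
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Return {name: version} for every exact `name==version` line."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and non-exact requirements
        name, _, ver = line.partition("==")
        name = name.split("[", 1)[0].strip()  # drop extras such as pkg[extra]
        ver = ver.split(";", 1)[0].strip()    # drop environment markers
        pins[name] = ver
    return pins


def check_pins(pins):
    """Return (name, wanted, installed) for every package that does not match."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is missing entirely
        if installed != wanted:
            problems.append((name, wanted, installed))
    return problems


def report(path="requirements.txt"):
    """Print every mismatch and return True if the environment matches."""
    with open(path) as fh:
        problems = check_pins(parse_pins(fh))
    for name, wanted, got in problems:
        print(f"{name}: wanted {wanted}, found {got}")
    return not problems
```

Run `report()` inside the activated `xxx_project` environment after `pip install -r requirements.txt`. A `False` result means the environment has drifted from the recorded versions.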
Docker ≈ Virtual Machine
| Item | Docker | Virtual Machine |
|---|---|---|
| Launch Speed | <1 s | >1 min |
| Base Size | < 20 MB | > 200 MB |
| Performance | 100% Native | 80% Native |
| Cross Platform | True | True |
| Social Collaboration | True | False |
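1. Download the Docker App
brew install --cask docker        # macOS (Docker Desktop via Homebrew; Homebrew refuses to run under sudo)
sudo apt-get install docker.io    # Ubuntu (the Docker engine package is docker.io)
# Windows: download Docker Desktop from the official Docker website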
2. One Line to Pull and Run
docker run -d -p 8787:8787 --name sparklyr index.tenxcloud.com/7harryprince/sparkr-rstudio
3. Open Chrome
localhost:8787 # if localhost does not respond, use the host IP printed by: ifconfig | grep 0xfffff000 | awk '{print $2}'
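`docker run -d` returns as soon as the container starts, so RStudio Server may not be listening on port 8787 when you first open the browser. The Python sketch below polls the port until it accepts a TCP connection. It uses a hypothetical helper named `wait_for_port`. It only confirms that something is listening. It does not confirm that the RStudio login page is ready. The 30-second default timeout is an assumption.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on the port
        except OSError:
            pass  # refused or timed out: the server is not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_port("localhost", 8787)` right after the `docker run` line and open Chrome only once it returns `True`.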
4. Witness the Miracle
user / password: harryzhu
Feel free to contact me at 7harryprince@gmail.com
Allaire, JJ, Joe Cheng, Yihui Xie, Jonathan McPherson, Winston Chang, Jeff Allen, Hadley Wickham, Aron Atkins, and Rob Hyndman. 2016. Rmarkdown: Dynamic Documents for R. http://rmarkdown.rstudio.com.
Rinker, Tyler W., and Dason Kurkiewicz. 2015. pacman: Package Management for R. Buffalo, New York: University at Buffalo/SUNY. http://github.com/trinker/pacman.
Ushey, Kevin, Jonathan McPherson, Joe Cheng, Aron Atkins, and JJ Allaire. 2016. Packrat: A Dependency Management System for Projects and Their R Package Dependencies. https://CRAN.R-project.org/package=packrat.