install.packages("devtools")
devtools::install_github("sahilseth/flowr")
## OR
install.packages("flowr")
library(flowr) ## load the library
setup() ## copy flowr bash script
run('sleep', execute=TRUE, platform='moab')
## OR from terminal
# flowr run sleep execute=TRUE platform=moab
a deluge of data
flowr: streamlining computing workflows
two ingredients
flow_mat
flow_def
Submission types: scatter, sequential (serial)
Dependency types: gather, serial, burst
Given a bunch of shell commands for a step, how should the jobs be submitted?
serial (or sequential): one after the other
scatter: submit all of them at the same time
In what fashion should the downstream job wait for the previous step(s)?
gather: if there are multiple jobs in the previous step, wait for all of them.
(Figure: submission type serial)
serial: when one completes, start the next.
(Figure: submission type scatter)
burst: wait for one job, then submit.
(Figure: submission type scatter)
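The three dependency types can be made concrete with a small base-R sketch. `dependency_map()` is a hypothetical helper and is not part of flowr. It only encodes the rules stated above: for each job in the current step, it lists which jobs of the previous step that job waits on. For burst, it assumes the previous step produced a single job.

```r
# Hypothetical helper (not part of flowr): for a step with n_cur jobs that
# depends on a previous step with n_prev jobs, return, for each current job,
# the indices of the previous jobs it waits on.
dependency_map <- function(dep_type, n_prev, n_cur) {
  dep_type <- match.arg(dep_type, c("serial", "gather", "burst"))
  if (dep_type == "serial") stopifnot(n_prev == n_cur)  # one-to-one pairing
  if (dep_type == "burst")  stopifnot(n_prev == 1)      # assumed: one upstream job
  lapply(seq_len(n_cur), function(i) {
    switch(dep_type,
      serial = i,               # job i waits only for job i of the previous step
      gather = seq_len(n_prev), # every job waits for all previous jobs
      burst  = 1L)              # every job waits for the single previous job
  })
}

dependency_map("gather", n_prev = 3, n_cur = 1)  # e.g. merge waiting on 3 tmp jobs
dependency_map("serial", n_prev = 3, n_cur = 3)  # e.g. tmp_i waiting on sleep_i
```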
#install.packages("devtools")
#devtools::install_github("sahilseth/flowr")
## OR
#install.packages("flowr")
library(flowr)
setup()
Consider adding ~/bin to your PATH variable in .bashrc.
export PATH=$PATH:$HOME/bin
You may now use all R functions using 'flowr' from shell.
exdata = file.path(system.file(package = "flowr"), "extdata")
flow_mat = read_sheet(file.path(exdata, "example1_flow_mat.txt"))
Using 'samplename'' as id_column
## this has a bunch of samples, so let us subset one of them
flow_mat = subset(flow_mat, samplename == "sample1")
flow_def = read_sheet(file.path(exdata, "example1_flow_def.txt"))
Using 'jobname'' as id_column
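If the example files are not at hand, a toy flow_mat can be built in memory to see its shape: one row per shell command. The column names (samplename, jobname, cmd) are assumed to match what read_sheet() returns for example1_flow_mat.txt. The commands themselves are made up, and only base R is used.

```r
# Toy flow_mat (illustrative values, not the contents of example1_flow_mat.txt)
flow_mat <- data.frame(
  samplename = c("sample1", "sample1", "sample2", "sample1"),
  jobname    = c("sleep",   "sleep",   "sleep",   "merge"),
  cmd        = c("sleep 5", "sleep 10", "sleep 5", "echo done"),
  stringsAsFactors = FALSE
)

# keep a single sample, as in the tutorial
flow_mat <- subset(flow_mat, samplename == "sample1")

# number of commands (i.e. jobs) per step for this sample
jobs_per_step <- vapply(split(flow_mat$cmd, flow_mat$jobname), length, integer(1))
print(jobs_per_step)
```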
fobj <- to_flow(x = flow_mat, def = flow_def,
flowname = "example1", platform = "moab")
input x is data.frame
plot_flow(fobj)
submit_flow(fobj) ## dry run: nothing is submitted
submit_flow(fobj, execute = TRUE) ## submit the jobs
Flow has been submitted. Track it from terminal using:
flowr::status(x="~/flowr/type1-20150520-15-18-46-sySOzZnE")
OR
flowr status x=~/flowr/type1-20150520-15-18-46-sySOzZnE
$ flowr status x=~/flowr/sample1-20150619-07-43-28-OTpuKaMz
Flowr: streamlining workflows
Showing status of: ~/flowr/sample1-20150619-07-43-28-OTpuKaMz
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 3| 3| 1| 0|
|002.tmp | 3| 1| 1| 0|
|003.merge | 1| 0| 0| 0|
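Sometimes only the printed status table is available, for example in a terminal log. The sketch below reads such a table back into a data.frame and derives how many jobs have not started and how many are still pending. `parse_status()` is a hypothetical helper, not a flowr function. The input is the first status table shown above.

```r
# The first status table above, pasted in as plain text
status_txt <- c(
  "| | total| started| completed| exit_status|",
  "|:---------|-----:|-------:|---------:|-----------:|",
  "|001.sleep | 3| 3| 1| 0|",
  "|002.tmp | 3| 1| 1| 0|",
  "|003.merge | 1| 0| 0| 0|"
)

# Hypothetical helper: turn the printed table back into a data.frame
parse_status <- function(lines) {
  rows <- lines[-(1:2)]                       # drop header and alignment rows
  fields <- lapply(strsplit(rows, "|", fixed = TRUE), function(f) {
    f <- trimws(f)
    f[nzchar(f)]                              # drop the empty cell before the leading "|"
  })
  col <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(
    step        = col(1),
    total       = as.integer(col(2)),
    started     = as.integer(col(3)),
    completed   = as.integer(col(4)),
    exit_status = as.integer(col(5)),
    stringsAsFactors = FALSE
  )
}

st <- parse_status(status_txt)
st$not_started <- st$total - st$started
st$pending     <- st$total - st$completed
print(st)
```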
flow object! Here is an example:
flowr run sleep execute=TRUE platform=moab
$ flowr status x=sample1
Showing status of: ./sample1-20150619-07-34-17-lykJ4pdf
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 3| 3| 3| 0|
|002.tmp | 3| 3| 3| 0|
|003.merge | 1| 1| 1| 0|
Showing status of: ./sample1-20150619-07-43-28-OTpuKaMz
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 3| 3| 3| 0|
|002.tmp | 3| 3| 3| 0|
|003.merge | 1| 0| 0| 0|
status shows a summary of all the flows in the folder.
status() is designed to work similarly to how ls works in the terminal.
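The ls-like behaviour can be illustrated in a few lines of base R. Here `x=sample1` is treated as a prefix, like `ls sample1*`. This only mimics the behaviour described above and is not flowr's implementation of status(). `match_flows()` is hypothetical, and the `sample2-demo` folder is made up for the demonstration.

```r
# Illustration only: emulate "ls <prefix>*" inside a flow base folder
match_flows <- function(base, prefix) {
  hits <- Sys.glob(file.path(base, paste0(prefix, "*")))
  hits[dir.exists(hits)]                 # keep folders (flows) only
}

base <- file.path(tempdir(), "flowr_demo")
dir.create(base, showWarnings = FALSE)
for (d in c("sample1-20150619-07-34-17-lykJ4pdf",
            "sample1-20150619-07-43-28-OTpuKaMz",
            "sample2-demo")) {             # made-up folder for contrast
  dir.create(file.path(base, d), showWarnings = FALSE)
}

basename(match_flows(base, "sample1"))   # the two sample1 flows
```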
flowr run sleep execute=TRUE flow_base_path="~/flowr/sleep"
flowr status x=~/flowr/sleep ## parent folder with 3 flows inside
Showing status of: /rsrch2/iacs/iacs_dep/sseth/flowr/sleep
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 9| 9| 6| 0|
|002.tmp | 9| 6| 6| 0|
|003.merge | 3| 1| 1| 0|
|004.size | 3| 1| 1| 0|
flowr status x=~/flowr/sleep/sample1* ## get the status of all of them
kill_flow: fetch the job ID of each job and kill it
flowr kill_flow wd=~/flowr/sample1-20150619-07-53-58-ySuYo5t0
flowr rerun_flow x=~/flowr/sample1-20150619-11-41-50-eXa0insg start_from=tmp
Extracting commands from previous run.
Hope the reason for previous failure was fixed...
Subsetting... get stuff to run starting tmp
Using flow_base_path default: ~/flowr
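With `start_from=tmp`, the rerun picks up at the tmp step and continues through everything after it. The sketch below shows that selection for a flow whose steps form a simple chain, as sleep, tmp, merge and size do in this example. `steps_to_rerun()` is hypothetical. It assumes a linear ordering of steps and may not match how rerun_flow handles branching flows.

```r
# Hypothetical helper: steps that a rerun starting at `start_from` would run,
# assuming the steps form a linear chain in the given order
steps_to_rerun <- function(steps, start_from) {
  i <- match(start_from, steps)
  if (is.na(i)) stop("unknown step: ", start_from)
  steps[i:length(steps)]
}

steps <- c("sleep", "tmp", "merge", "size")
steps_to_rerun(steps, "tmp")
```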
├── 001.sleep
│ ├── 001.sleep
│ ├── sleep_cmd_1.sh
│ ├── sleep_cmd_2.sh
│ └── sleep_cmd_3.sh
├── 002.tmp
│ ├── 002.tmp
│ ├── tmp_cmd_1.sh
│ ├── tmp_cmd_2.sh
│ └── tmp_cmd_3.sh
├── 003.merge
│ ├── 003.merge
│ └── merge_cmd_1.sh
├── 004.size
│ ├── 004.size
│ └── size_cmd_1.sh
├── example1-flow_design.pdf
├── flow_details.rda
├── flow_details.txt
├── flow_status.txt
├── tmp
│ ├── merge1
│ ├── tmp1_1
│ ├── tmp1_2
│ └── tmp1_3
└── trigger
├── trigger_001.sleep_1.txt
├── trigger_001.sleep_2.txt
├── trigger_001.sleep_3.txt
├── trigger_002.tmp_1.txt
├── trigger_002.tmp_2.txt
├── trigger_002.tmp_3.txt
├── trigger_003.merge_1.txt
└── trigger_004.size_1.txt
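Trigger file names follow the pattern `trigger_<step>_<job number>.txt`, so the number of jobs per step can be recovered from the listing alone. The sketch below does that in base R with the file names copied from the tree above. It assumes one trigger file per job. On that assumption the counts (3, 3, 1, 1) agree with the per-flow totals in the status tables.

```r
# File names copied from the trigger/ folder in the listing above
trigger_files <- c(
  "trigger_001.sleep_1.txt", "trigger_001.sleep_2.txt", "trigger_001.sleep_3.txt",
  "trigger_002.tmp_1.txt",   "trigger_002.tmp_2.txt",   "trigger_002.tmp_3.txt",
  "trigger_003.merge_1.txt", "trigger_004.size_1.txt"
)

# "trigger_<step>_<job number>.txt" -> "<step>"
step <- sub("^trigger_(.*)_[0-9]+\\.txt$", "\\1", trigger_files)

# number of trigger files per step (assumed: one per job)
jobs_per_step <- vapply(split(step, step), length, integer(1))
print(jobs_per_step)
```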
Platforms: local, moab
syntax: flowr <function> <parameters>
-h, or a missing argument, loads the R help file
flowr rnorm n=100
Loading required package: shape
Flowr: streamlining workflows
2.277249 0.3188005 -0.9658285 0.4719445
....
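The syntax above maps `flowr rnorm n=100` onto the R call `rnorm(n = 100)`. A minimal sketch of such a mapping follows. `call_from_cli()` is hypothetical and is not how flowr parses its arguments internally. It converts each `key=value` string with `type.convert()`, so `100` becomes an integer and `TRUE` a logical. Because it uses `match.fun()`, it only handles plain function names, not `pkg::fun`.

```r
# Hypothetical sketch: turn c("<function>", "key=value", ...) into an R call
call_from_cli <- function(args) {
  fun  <- match.fun(args[1])              # e.g. "rnorm"
  kv   <- args[-1]                        # e.g. "n=100"
  keys <- sub("=.*$", "", kv)             # text before the first "="
  vals <- sub("^[^=]*=", "", kv)          # text after the first "="
  params <- lapply(vals, type.convert, as.is = TRUE)  # "100" -> 100L, "TRUE" -> TRUE
  names(params) <- keys
  do.call(fun, params)
}

x <- call_from_cli(c("rnorm", "n=100"))
length(x)
```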
## load help file for knitr
rfun knitr::knit
## OR use
rfun knitr::knit -h