You may visit docs.flowr.space for more details.

If you face any issues, please feel free to raise an issue on GitHub.

1 Set up flowr

Requirements:

#install.packages("params", repos = "http://cran.rstudio.com")
## for the latest stable version (updated every few days):
install.packages("flowr", repos = c(CRAN= "http://cran.rstudio.com", DRAT="http://sahilseth.github.io/drat"))

After installation, run setup(); this copies flowr's helper script to ~/bin. Please make sure that this folder is in your $PATH variable.

library(flowr)
setup()
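
To confirm the setup, you can check from R that ~/bin is on your PATH and that the helper script is visible; a minimal sketch, assuming a Unix-like shell and the default ~/bin location:

## minimal sketch: verify ~/bin is on the PATH and the flowr helper script is found
grepl(path.expand("~/bin"), Sys.getenv("PATH"), fixed = TRUE)
Sys.which("flowr")  ## should return the path to the helper script, not ""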

Then we need to test whether we can submit jobs to the cluster properly.

## run a test on the local platform
run(x='sleep_pipe', platform='local', execute=FALSE)
## run a test on the HPCC platform (torque, sge, moab, slurm, lsf)
run(x='sleep_pipe', platform='torque', execute=TRUE)

NOTE: If the test is not successful, please refer to the advanced configuration page for details on how to resolve the issues.
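
Once a test flow has been submitted, its status can be monitored from R; the sketch below assumes the default execution path ~/flowr/runs (the exact run folder name will differ on your system):

## check the status of a submitted (test) flow; adjust the wildcard to your run folder
library(flowr)
status(x = "~/flowr/runs/sleep_pipe*")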

2 Fetch and download the pipeline

Next, we will download a pipeline which processes multiple fastq files of a sample into a single aligned and merged BAM file.

cd ~/flowr/pipelines
base=https://raw.githubusercontent.com/sahilseth/flowr/devel/inst/pipelines
wget $base/fastq_bam_bwa.R
wget $base/fastq_bam_bwa.conf
wget $base/fastq_bam_bwa.def
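
Alternatively, the downloaded pipeline can be located from within R; the sketch below assumes the files were placed in ~/flowr/pipelines, one of the folders fetch_pipes searches by default:

library(flowr)
## list the pipelines flowr can find, then narrow down to the one we just downloaded
fetch_pipes()
fetch_pipes("fastq_bam_bwa")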

2.1 Download data/genome

2.1.1 Reference Genome

One can download the reference genome, including indexes for various alignment tools, from Illumina’s iGenomes website.

You may skip this step if you already have the genome FASTA and related files.

mkdir ~/flowr/genomes; cd ~/flowr/genomes
url=ussd-ftp.illumina.com/Homo_sapiens/NCBI/build37.2/Homo_sapiens_NCBI_build37.2.tar.gz
ftp ftp://igenome:G3nom3s4u@$url
tar -zxvf Homo_sapiens_NCBI_build37.2.tar.gz

A typical NGS pipeline consists of many steps, each with several parameters. You can modify fastq_bam_bwa.conf, specifying paths to various tools and their default options (samtools, bwa, picard and reference genome indexes).

## customize parameters, including paths to samtools, bwa, reference genomes etc.
vi fastq_bam_bwa.conf
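
To confirm that the edited configuration is picked up, it can be loaded and inspected from R; a sketch, assuming the file sits in ~/flowr/pipelines (load_opts and get_opts come from the params package, which flowr uses for its options):

library(flowr)
## load the tab-delimited configuration; warnings flag paths that do not exist
load_opts("~/flowr/pipelines/fastq_bam_bwa.conf")
## print the currently loaded options to double-check tool paths and genome indexes
get_opts()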

2.1.2 Example data

You may skip this step if you already have raw reads for a sample in FASTQ format.

mkdir ~/flowr/genomes; cd ~/flowr/genomes
## for testing purposes one may download example fastq files:
wget http://omixon-download.s3.amazonaws.com/target_brca_example.zip
unzip target_brca_example.zip
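
A quick sanity check that the example reads were extracted where later steps expect them (the folder name below is taken from the paths used in the submission step):

## list the example fastq files and confirm they exist
fqs = list.files("~/flowr/genomes/target_brca_example",
                 pattern = "fastq$", full.names = TRUE)
fqs
file.exists(fqs)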

3 Customize flow definition, describing the computing cluster

Next, we need to customize the resource requirements based on the computing platform. You may refer to the flow definition format for more details.

## customize the resource requirements in flowdef:
##   need to change: queue, platform
##   may change: walltime, memory, CPUs etc.
vi fastq_bam_bwa.def

## read and check flowdef (shell)
flowr as.flowdef x=fastq_bam_bwa.def

## OR from R
as.flowdef(x='fastq_bam_bwa.def')

A flow definition with default values has already been supplied; the commands above read and check it. The required columns can also be edited directly from R, as sketched below.
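
A sketch of making those changes programmatically instead of via vi; it treats the flowdef as a table and only touches the two required columns noted above (column names follow the flow definition format, so treat them as assumptions if your flowr version differs):

## tweak required columns of the flowdef from R and save it back
def = as.flowdef("fastq_bam_bwa.def")
def$queue    = "long"     ## a queue available on your cluster (assumption)
def$platform = "torque"   ## torque, lsf, sge, moab or slurm
write.table(def, file = "fastq_bam_bwa.def",
            sep = "\t", row.names = FALSE, quote = FALSE)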

4 Submit to cluster

4.1 Single step cluster submission

## get input fastqs
fqs1=~/flowr/genomes/target_brca_example/brca.example.illumina.0.1.fastq
fqs2=~/flowr/genomes/target_brca_example/brca.example.illumina.0.2.fastq

## submit to the cluster
flowr run x=fastq_bam_bwa fqs1=$fqs1 fqs2=$fqs2 samplename=samp execute=TRUE

## change the platform specified in flowdef
flowr run x=fastq_bam_bwa fqs1=$fqs1 fqs2=$fqs2 samplename=samp execute=TRUE platform=slurm

OR from R:

library(flowr)
fqpath = "~/flowr/genomes/target_brca_example"
## demonstrating that multiple fastq files per sample can be supplied
fobj = run(x = "fastq_bam_bwa", samplename = "samp1", execute = TRUE,
        fqs1 = rep(paste0(fqpath, "/brca.example.illumina.0.1.fastq"), 2), 
        fqs2 = rep(paste0(fqpath, "/brca.example.illumina.0.2.fastq"), 2))

Refer to the help pages for more details on the run function.

4.2 Details regarding cluster submission

The run function performs several steps, finally submitting the commands to the cluster. It may be useful to go through these steps to understand the details.

1. Get user inputs

Using the name of the pipeline, run fetches it from several places, including ~/flowr/pipelines.

library(flowr)
setwd("~/flowr/pipelines")
source("fastq_bam_bwa.R")
## this may throw a warning if paths do not exist
## if you have used modules instead of full paths please ignore the warnings
load_opts("fastq_bam_bwa.conf") 

## Get example input
## these can be a vector of multiple paired-end files
## OR multiple single-end files
fqs1 = "~/flowr/genomes/target_brca_example/brca.example.illumina.0.1.fastq"
fqs2 = "~/flowr/genomes/target_brca_example/brca.example.illumina.0.2.fastq"
samp = "samplename"

## optionally specify the center, lane, platform etc.
set_opts(rg_center = "the_institute", rg_lane = "1")

## Note: load_opts checks whether variables ending with
## _exe, _path, _dir etc. point to files/folders that exist;
## make sure they are all correct.
## Ignore the warnings if you are using the module command
## instead of specifying the full path to a tool.

Refer to the help page of fetch_pipes for more details.

2. Read flow definition

def = as.flowdef("fastq_bam_bwa.def")
## def seems to be a file, reading it...
## 
## checking if required columns are present...
## checking if resources columns are present...
## checking if dependency column has valid names...
## checking if submission column has valid names...
## checking for missing rows in def...
## checking for extra rows in def...
## checking submission and dependency types...

Optionally, one can edit all configuration files on their own machine, debug and resolve any issues there; when done, upload them to the cluster and submit.

plot_flow(def) ## plot the flow definition as a flow chart

The plot works only if X11 graphics are enabled, i.e. if you logged into the cluster using ssh -X (or ssh -Y).
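
If X11 forwarding is not available, a simple workaround is to render the chart to a file and copy it over; a sketch using base R graphics devices:

## write the flow chart to a PDF so no X11 display is needed
pdf("fastq_bam_bwa_flowdef.pdf")
plot_flow(def)
dev.off()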