Aaron, you can access the VM via lugh:
ssh administrator@217.74.56.249
Gal2018
If you have any problems pass on your IP address and I’ll add it to the firewall rule.
If you want to deploy the workflow, you will need to run sudo chmod 666 /var/run/docker.sock each time you shell into the VM.
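Putting the access steps together (the commented-out usermod line is just a suggestion from me and is not something I have set up on the VM):
# on your own machine
ssh administrator@217.74.56.249
# then, once on the VM, make the Docker socket writable for this session
sudo chmod 666 /var/run/docker.sock
# optional, untested here: adding the user to the docker group would avoid
# having to re-run the chmod after every login
# sudo usermod -aG docker administrator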
There are two errors I came across in the workflow, which are described below. I’ve made three branches for you: the first reproduces error #1, the second reproduces error #2, and the third runs the workflow to completion.
nextflow run -bg -r kelp_err1 BarryDigby/ampliseq -profile docker --input "18S_kreuger_samplesheet.tsv" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" --ignore_failed_trimming
The workflow fails at the NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION step with the error:
Parameter 'steps' received 0 as an argument, which is incompatible with parameter type: Int % Range(2, None)
Scratch directory where the process failed: /home/administrator/work/35/65eda9245cb4dfc78ff54e4484ab47
A look at the .command.sh script called for this process:
#!/bin/bash -euo pipefail
export XDG_CONFIG_HOME="${PWD}/HOME"
maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)
#check values
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
qiime diversity alpha-rarefaction --i-table filtered-table.qza --i-phylogeny rooted-tree.qza --p-max-depth $maxdepth --m-metadata-file 18S_kreuger_metadata.tsv --p-steps $maxsteps --p-iterations 10 --o-visualization alpha-rarefaction.qzv
qiime tools export --input-path alpha-rarefaction.qzv --output-path alpha-rarefaction
cat <<-END_VERSIONS > versions.yml
"NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION":
    qiime2: $( qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g" )
END_VERSIONS
What happens here is that the maxdepth variable is calculated using the Python script count_table_minmax_reads.py, which returns the maximum column sum (sum(cols)) of filtered-table.tsv - in this case 15, for sample LHHF2_2:
filtered_table <- read.table("/data/KELP_ME/filtered.tsv", header=F, sep="\t")
cols <- c("ID", "LHST1_3", "LHST2_1", "LHST3_1B", "LHST3_2", "LHHF1_2", "LHHF2_1", "LHHF2_2", "LHHF3_1", "LOST2_1", "LOST2_2", "LOST2_3", "LOST3_3", "LOHF1_2", "LOHF3_2", "LOHF3_3")
colnames(filtered_table) <- cols
DT::datatable(filtered_table, options = list(scrollX = TRUE, pageLength = 19, scroller = TRUE))
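For a quick sanity check outside of R, something like the awk one-liner below should reproduce the number the helper script reports. It is only a rough stand-in for count_table_minmax_reads.py, assuming samples sit in columns 2 onwards and that header/comment lines start with '#':
# sum each sample column of filtered-table.tsv and print the largest sum
awk -F'\t' '/^#/ { next }
            { for (i = 2; i <= NF; i++) sum[i] += $i }
            END { max = 0; for (i in sum) if (sum[i] > max) max = sum[i]; print max }' filtered-table.tsv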
When calculating the maxsteps parameter in the if/else statement, the integer division 15/20 produces 0, which is passed to the --p-steps $maxsteps argument in qiime, resulting in the Int % Range(2, None) error.
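To make the failure mode concrete, here is the relevant arithmetic in isolation - bash arithmetic is integer division, so any maxdepth below 40 collapses to a step count of 0 or 1, both of which violate Range(2, None):
maxdepth=15                                   # the value reported for LHHF2_2
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
echo "$maxsteps"                              # prints 0, which qiime rejects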
My fix was to hard-code the --p-steps parameter to the default value of 10, but I imagine they are doing this dynamically for good reason. Documentation on the parameter:
--p-steps INTEGER Range(2, None): The number of rarefaction depths to include between min-depth and max-depth. [default: 10]
The updated version containing the hard-coded parameter for --p-steps is reflected on the kelp_err2 branch, which is used to produce error #2 below.
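For what it’s worth, the change amounts to something like the snippet below. This is sketched from the .command.sh above rather than copied from the kelp_err2 diff, so treat it as approximate:
maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)
#check values
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
# hard-code the step count to the Qiime2 default instead of deriving it from
# maxdepth, so it always satisfies Range(2, None)
maxsteps=10
# the qiime diversity alpha-rarefaction call is unchanged and still receives
# --p-steps $maxsteps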
nextflow run -bg -r kelp_err2 BarryDigby/ampliseq -profile docker --input "18S_kreuger_samplesheet.tsv" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" --ignore_failed_trimming
The workflow fails at the NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_DIVERSITY_ALPHA (evenness_vector) step with the error:
All numbers are identical in kruskal
Scratch directory where the process failed: /home/administrator/work/20/061ffdf0bbc9aa46c9ef95524bfb82
Looking at the .command.sh file executed:
#!/bin/bash -euo pipefail
export XDG_CONFIG_HOME="${PWD}/HOME"
qiime diversity alpha-group-significance --i-alpha-diversity evenness_vector.qza --m-metadata-file 18S_kreuger_metadata.tsv --o-visualization evenness_vector-vis.qzv
qiime tools export --input-path evenness_vector-vis.qzv --output-path "alpha_diversity/evenness_vector"
cat <<-END_VERSIONS > versions.yml
"NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_DIVERSITY_ALPHA":
    qiime2: $( qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g" )
END_VERSIONS
The inputs to this process are the 18S_kreuger_metadata.tsv file (below) and evenness_vector.qza, which is a special Qiime2 file. I couldn’t make sense of the data structures used in the qza file - it looks like a JSON file in a provenance graph. The file is attached in the email.
Fairly sure this is a straight-up Python error where the Kruskal-Wallis test decides that all of the sample values in a group are identical. It is discussed on the Qiime2 forums here: https://forum.qiime2.org/t/error-plugin-error-from-diversity-all-numbers-are-identical-in-kruskal/15033/3
It only fails for ‘evenness’, so I think my metadata file is OK?
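The message itself comes from SciPy’s Kruskal-Wallis implementation rather than from Qiime2, and is easy to reproduce in isolation (assuming a Python environment with scipy available, e.g. inside the Qiime2 container):
# raises "ValueError: All numbers are identical in kruskal"
python3 -c "from scipy.stats import kruskal; kruskal([1, 1, 1], [1, 1, 1])"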
The failure is confined to the NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_DIVERSITY_ALPHA (evenness_vector) process, hinting that there was no knock-on effect downstream.
nextflow run -bg -r kelp_completion BarryDigby/ampliseq -profile docker --input "18S_kreuger_samplesheet.tsv" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" --ignore_failed_trimming
The kelp_completion branch ignores the error produced by the Kruskal test in error #2 and runs to completion.
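For context, a generic way to let a bash script running under set -euo pipefail tolerate this one failing step is sketched below. I am not claiming this is exactly how the kelp_completion branch handles it - it is just the pattern:
# run the group-significance test, but do not let a Kruskal failure kill the job
if qiime diversity alpha-group-significance \
        --i-alpha-diversity evenness_vector.qza \
        --m-metadata-file 18S_kreuger_metadata.tsv \
        --o-visualization evenness_vector-vis.qzv; then
    qiime tools export --input-path evenness_vector-vis.qzv --output-path "alpha_diversity/evenness_vector"
else
    echo "WARNING: alpha-group-significance failed for evenness_vector, skipping export" >&2
fi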
samplesheet <- read.table("/data/KELP_ME/18S_kreuger_samplesheet.tsv", header = T, sep="\t")
samplesheet$forwardReads <- sapply(strsplit(samplesheet$forwardReads, split = "/", fixed=T), tail, 1L)
samplesheet$reverseReads <- sapply(strsplit(samplesheet$reverseReads, split = "/", fixed=T), tail, 1L)
DT::datatable(samplesheet, options = list(scrollX = TRUE, pageLength = 40, scroller = TRUE))
metadata <- read.table("/data/KELP_ME/18S_kreuger_metadata.tsv", header=T, sep="\t")
DT::datatable(metadata, options = list(scrollX = TRUE, pageLength = 40, scroller = TRUE))
These warnings popped up during all of the runs, but personally I don’t think they are causing the errors (open to correction, of course).
WARN: No DADA2 cutoffs were specified (`--trunclenf` & `--trunclenr`), therefore reads will be truncated where median quality drops below 25 (defined by `--trunc_qmin`) but at least a fraction of 0.75 (defined by `--trunc_rmin`) of the reads will be retained.
The chosen cutoffs do not account for required overlap for merging, therefore DADA2 might have poor merging efficiency or even fail.
It yells at me for using default parameters for DADA2. This can be ignored, as DADA2 runs without issue.
WARN: The following samples had too small file size (<1KB) after trimming with cutadapt:
LHST3_3
Ignoring failed samples and continue!
Trimming greatly reduces the depth of this sample - however, given that multiple samples and runs contribute to the representative L. Hyperborea Stipe sample, I don’t think it has a negative effect downstream. It’s not like it’s adding junk, it’s just adding very little. It definitely does not cause error #1, IMO.
WARN: Probably everything is fine, but this is a reminder that `--trunclenf` was set automatically to 241 and `--trunclenr` to 222. If this doesnt seem reasonable, then please change `--trunc_qmin` (and `--trunc_rmin`), or set `--trunclenf` and `--trunclenr` directly.
Triggered when the default truncation parameters are used (i.e. no --trunclenf/--trunclenr supplied).
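If the warning is a nuisance, the automatically chosen values can simply be pinned on the command line. The 241/222 below are taken straight from the WARN message above - I have not assessed whether they are sensible cutoffs:
nextflow run -bg -r kelp_completion BarryDigby/ampliseq -profile docker \
    --input "18S_kreuger_samplesheet.tsv" \
    --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" \
    --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" \
    --ignore_failed_trimming --trunclenf 241 --trunclenr 222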
For reference, here is the pipeline’s nextflow.config:
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nf-core/ampliseq Nextflow config file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Default config options for all compute environments
----------------------------------------------------------------------------------------
*/
// Global default params, used in configs
params {
// Input options
input = null
extension = "/*_R{1,2}_001.fastq.gz"
pacbio = false
iontorrent = false
FW_primer = null
RV_primer = null
classifier = null
metadata = null
// Other options
trunc_qmin = 25
trunc_rmin = 0.75
trunclenf = null
trunclenr = null
max_ee = 2
max_len = null
min_len = 50
metadata_category = null
double_primer = false
retain_untrimmed = false
exclude_taxa = "mitochondria,chloroplast"
min_frequency = 1
min_samples = 1
multiple_sequencing_runs = false
single_end = false
sample_inference = "independent"
illumina_pe_its = false
concatenate_reads = false
cut_its = "none"
its_partial = 0
picrust = false
sbdiexport = false
dada_tax_agglom_min = 2
dada_tax_agglom_max = 7
qiime_tax_agglom_min = 2
qiime_tax_agglom_max = 6
ignore_failed_trimming = false
ignore_empty_input_files = false
qiime_adonis_formula = null
seed = 100
filter_ssu = null
// Skipping options
skip_cutadapt = false
skip_barrnap = false
skip_qiime = false
skip_fastqc = false
skip_alpha_rarefaction = false
skip_abundance_tables = false
skip_barplot = false
skip_taxonomy = false
skip_dada_addspecies = false
skip_diversity_indices = false
skip_ancom = false
skip_multiqc = false
// Database options
dada_ref_taxonomy = "silva=138"
cut_dada_ref_taxonomy = false
qiime_ref_taxonomy = null
// MultiQC options
multiqc_config = null
multiqc_title = null
max_multiqc_email_size = '25.MB'
// Boilerplate options
outdir = null
tracedir = "${params.outdir}/pipeline_info"
publish_dir_mode = 'copy'
email = null
email_on_fail = null
plaintext_email = false
monochrome_logs = false
help = false
validate_params = true
show_hidden_params = false
schema_ignore_params = 'dada_ref_databases,qiime_ref_databases,igenomes_base'
enable_conda = false
// Config options
custom_config_version = 'master'
custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
config_profile_description = null
config_profile_contact = null
config_profile_url = null
config_profile_name = null
// Max resource options
// Defaults only, expecting to be overwritten
max_memory = '56.GB'
max_cpus = 16
max_time = '240.h'
}
// Load base.config by default for all pipelines
includeConfig 'conf/base.config'
// Load nf-core custom profiles from different Institutions
try {
includeConfig "${params.custom_config_base}/nfcore_custom.config"
} catch (Exception e) {
System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config")
}
// Load nf-core/ampliseq custom profiles from different institutions.
// Warning: Uncomment only if a pipeline-specific instititutional config already exists on nf-core/configs!
// try {
// includeConfig "${params.custom_config_base}/pipeline/ampliseq.config"
// } catch (Exception e) {
// System.err.println("WARNING: Could not load nf-core/config/ampliseq profiles: ${params.custom_config_base}/pipeline/ampliseq.config")
// }
profiles {
debug { process.beforeScript = 'echo $HOSTNAME' }
conda {
params.enable_conda = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
docker {
docker.enabled = true
docker.runOptions = '-u \$(id -u):\$(id -g)'
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
test { includeConfig 'conf/test.config' }
test_single { includeConfig 'conf/test_single.config' }
test_multi { includeConfig 'conf/test_multi.config' }
test_doubleprimers { includeConfig 'conf/test_doubleprimers.config' }
test_pacbio_its { includeConfig 'conf/test_pacbio_its.config' }
test_iontorrent { includeConfig 'conf/test_iontorrent.config' }
test_fasta { includeConfig 'conf/test_fasta.config' }
test_full { includeConfig 'conf/test_full.config' }
}
// Export these variables to prevent local Python/R libraries from conflicting with those in the container
// The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container.
// See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable.
env {
PYTHONNOUSERSITE = 1
R_PROFILE_USER = "/.Rprofile"
R_ENVIRON_USER = "/.Renviron"
JULIA_DEPOT_PATH = "/usr/local/share/julia"
}
// Capture exit codes from upstream processes when piping
process.shell = ['/bin/bash', '-euo', 'pipefail']
def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
timeline {
enabled = true
file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html"
}
report {
enabled = true
file = "${params.tracedir}/execution_report_${trace_timestamp}.html"
}
trace {
enabled = true
file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt"
}
dag {
enabled = true
file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg"
}
manifest {
name = 'nf-core/ampliseq'
author = 'Daniel Straub, Alexander Peltzer'
homePage = 'https://github.com/nf-core/ampliseq'
description = 'Amplicon sequencing analysis workflow using DADA2 and QIIME2'
mainScript = 'main.nf'
nextflowVersion = '!>=21.10.3'
version = '2.3.1'
}
// Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config'
// Load ref_databases.config for reference taxonomy
includeConfig 'conf/ref_databases.config'
// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
    if (type == 'memory') {
        try {
            if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
                return params.max_memory as nextflow.util.MemoryUnit
            else
                return obj
        } catch (all) {
            println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'time') {
        try {
            if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
                return params.max_time as nextflow.util.Duration
            else
                return obj
        } catch (all) {
            println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'cpus') {
        try {
            return Math.min( obj, params.max_cpus as int )
        } catch (all) {
            println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
            return obj
        }
    }
}
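For reference, the max_memory / max_cpus / max_time values above are only caps - check_max() trims each process’s request down to them. They can be lowered from the command line without editing the config if the VM struggles, e.g. (placeholder limits, pick whatever suits the VM):
nextflow run -bg -r kelp_completion BarryDigby/ampliseq -profile docker \
    --input "18S_kreuger_samplesheet.tsv" \
    --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" \
    --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" \
    --ignore_failed_trimming \
    --max_memory '32.GB' --max_cpus 8 --max_time '48.h'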
Comments
rarecurve(merged.otus, step=1, xlab = "Seq. Depth", ylab = "No. of OTUs") suggests using a merged OTU table - albeit Luke did not use Qiime2.