• 18S Errors
  • Accessing the VM
  • Error 1
    • Traceback
    • Comments
    • Solution
  • Error 2
    • Traceback
    • Comments
    • Solution
  • Run to completion
  • Samplesheet
  • Metadata File
  • Workflow warnings
    • DADA2 Parameters
    • Insufficient Depth After Trimming
    • Default Trimming Parameters
  • Default Parameters used

18S Errors

Accessing the VM

Aaron, you can access the VM via lugh:

ssh administrator@217.74.56.249
Gal2018

If you have any problems, pass on your IP address and I’ll add it to the firewall rule.


If you want to run the workflow, you will need to run sudo chmod 666 /var/run/docker.sock each time you shell into the VM.
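
As a side note, a one-time alternative to the chmod (just a suggestion, I have not set this up on the VM) is to add the user to the docker group:

# add the administrator user to the docker group; log out and back in for it to take effect
sudo usermod -aG docker administrator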


I came across two errors in the workflow, described below. I’ve made three branches for you: one reproduces error 1, one reproduces error 2, and one runs the workflow to completion.

Error 1

nextflow run -bg -r kelp_err1 BarryDigby/ampliseq -profile docker --input "18S_kreuger_samplesheet.tsv" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" --ignore_failed_trimming

The workflow fails at the NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION step with the error:

Parameter 'steps' received 0 as an argument, which is incompatible with parameter type: Int % Range(2, None)

Scratch directory where process failed: /home/administrator/work/35/65eda9245cb4dfc78ff54e4484ab47.
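
For reference, the standard Nextflow task files in that directory are the quickest way to dig into a failed process:

cd /home/administrator/work/35/65eda9245cb4dfc78ff54e4484ab47
cat .command.sh    # the script Nextflow generated for the task (shown below)
cat .command.err   # stderr, containing the qiime error above
cat .command.log   # combined stdout/stderr
cat .exitcode      # the task's exit status
bash .command.run  # re-execute the task in place (inside its container) for debugging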

A look at the .command.sh script generated for this process:

#!/bin/bash -euo pipefail
export XDG_CONFIG_HOME="${PWD}/HOME"

maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)

#check values
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
qiime diversity alpha-rarefaction          --i-table filtered-table.qza          --i-phylogeny rooted-tree.qza          --p-max-depth $maxdepth          --m-metadata-file 18S_kreuger_metadata.tsv          --p-steps $maxsteps          --p-iterations 10          --o-visualization alpha-rarefaction.qzv
qiime tools export --input-path alpha-rarefaction.qzv          --output-path alpha-rarefaction

cat <<-END_VERSIONS > versions.yml
"NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION":
    qiime2: $( qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g" )
END_VERSIONS

Traceback

What happens here is that the maxdepth variable is calculated by the Python script count_table_minmax_reads.py, which returns the maximum column sum of filtered-table.tsv; in this case 15, for sample LHHF2_2:

# Display the filtered feature table that count_table_minmax_reads.py operates on;
# the file is read without a header, so column names are assigned manually.
filtered_table <- read.table("/data/KELP_ME/filtered.tsv", header=F, sep="\t")
cols <- c("ID", "LHST1_3",  "LHST2_1",  "LHST3_1B", "LHST3_2",  "LHHF1_2",  "LHHF2_1",  "LHHF2_2",  "LHHF3_1",  "LOST2_1",  "LOST2_2",  "LOST2_3",  "LOST3_3",  "LOHF1_2",  "LOHF3_2",  "LOHF3_3")
colnames(filtered_table) <- cols
DT::datatable(filtered_table, options = list(scrollX = TRUE, pageLength = 19, scroller = TRUE))

When the maxsteps parameter is calculated in the if/else statement, the integer division 15/20 produces 0, which is passed to the --p-steps $maxsteps argument in qiime, resulting in the Int % Range(2, None) error.
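
A minimal sketch of the failing arithmetic, using the value returned for this dataset (bash integer division truncates towards zero):

maxdepth=15                             # max column sum returned by count_table_minmax_reads.py
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
echo $maxsteps                          # prints 0, violating the Int % Range(2, None) constraint on --p-steps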

Comments

  1. Not every sample is being used at this step. I am unsure if this is a ‘scatter gather’ approach to calculating alpha rarefaction; I don’t know enough about this analysis step. Luke’s code on GitHub (rarecurve(merged.otus, step=1, xlab = "Seq. Depth", ylab = "No. of OTUs")) suggests using a merged OTU table, albeit Luke did not use Qiime2.

Solution

  1. I edited the workflow to hard-code the --p-steps parameter to its default value of 10, but I imagine the pipeline calculates it dynamically for good reason. Documentation on the parameter:

--p-steps (INTEGER, Range(2, None)): The number of rarefaction depths to include between min-depth and max-depth. [default: 10]
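
The change on the branch simply swaps $maxsteps for the default in the alpha-rarefaction call, roughly as below (the actual edit lives in the QIIME2_ALPHARAREFACTION module on the kelp_err2 branch):

qiime diversity alpha-rarefaction \
    --i-table filtered-table.qza \
    --i-phylogeny rooted-tree.qza \
    --p-max-depth $maxdepth \
    --m-metadata-file 18S_kreuger_metadata.tsv \
    --p-steps 10 \
    --p-iterations 10 \
    --o-visualization alpha-rarefaction.qzv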

The updated version containing the hard-coded parameter for --p-steps is reflected on the kelp_err2 branch, which is used to produce error #2 below.

Error 2

nextflow run -bg -r kelp_err2 BarryDigby/ampliseq -profile docker --input "18S_kreuger_samplesheet.tsv" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" --ignore_failed_trimming

The workflow fails at the NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_DIVERSITY_ALPHA (evenness_vector) step with the error:

All numbers are identical in kruskal

Scratch directory where process failed: /home/administrator/work/20/061ffdf0bbc9aa46c9ef95524bfb82.

Traceback

Looking at the .command.sh file executed:

#!/bin/bash -euo pipefail
export XDG_CONFIG_HOME="${PWD}/HOME"

qiime diversity alpha-group-significance             --i-alpha-diversity evenness_vector.qza             --m-metadata-file 18S_kreuger_metadata.tsv             --o-visualization evenness_vector-vis.qzv
qiime tools export --input-path evenness_vector-vis.qzv             --output-path "alpha_diversity/evenness_vector"

cat <<-END_VERSIONS > versions.yml
"NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_DIVERSITY_ALPHA":
    qiime2: $( qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g" )
END_VERSIONS

The inputs to this process are the 18S_kreuger_metadata.tsv file (below) and evenness_vector.qza, which is a Qiime2 artifact. I couldn’t make sense of the data structures used in the qza file; it looks like JSON files wrapped in a provenance graph. File attached in the email.
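
If it helps, a qza file is just a zip archive with the actual values stored under a data/ directory, so the evenness values can be inspected directly (the file name inside the archive is assumed from Qiime2’s SampleData[AlphaDiversity] format):

unzip -l evenness_vector.qza                               # list the archive contents
unzip -p evenness_vector.qza '*/data/alpha-diversity.tsv'  # print the per-sample evenness values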

Comments

  1. Fairly sure this is a straight-up Python error: the Kruskal-Wallis test refuses to run when all of the evenness values being compared are identical.

  2. Discussed on Qiime2 forums here: https://forum.qiime2.org/t/error-plugin-error-from-diversity-all-numbers-are-identical-in-kruskal/15033/3

  3. Only fails for ‘evenness’, so I think my metadata file is OK?

Solution

  1. I edited the configuration file to ignore non-zero exit status codes for this step. Not recommended in general, but in the completed run the only process to fail was NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_DIVERSITY_ALPHA (evenness_vector), hinting that there was no knock-on effect downstream.
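
One way to verify that claim is nextflow log, which reports the exit status of every task in a run:

nextflow log                           # list previous runs and their names
nextflow log last -f name,status,exit  # per-task status and exit code for the most recent run; only the evenness task should show a failure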

Run to completion

nextflow run -bg -r kelp_completion BarryDigby/ampliseq -profile docker --input "18S_kreuger_samplesheet.tsv" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "18S_kreuger_metadata.tsv" --outdir "18S_results" --ignore_failed_trimming

This ignores the error produced by the Kruskal-Wallis test in error 2 and runs to completion.

Samplesheet

# Display the samplesheet; strip the directory paths so only the fastq file names are shown.
samplesheet <- read.table("/data/KELP_ME/18S_kreuger_samplesheet.tsv", header = T, sep="\t")
samplesheet$forwardReads <- sapply(strsplit(samplesheet$forwardReads, split = "/", fixed=T), tail, 1L)
samplesheet$reverseReads <- sapply(strsplit(samplesheet$reverseReads, split = "/", fixed=T), tail, 1L)

DT::datatable(samplesheet, options = list(scrollX = TRUE, pageLength = 40, scroller = TRUE))

Metadata File

metadata <- read.table("/data/KELP_ME/18S_kreuger_metadata.tsv", header=T, sep="\t")
DT::datatable(metadata, options = list(scrollX = TRUE, pageLength = 40, scroller = TRUE))

Workflow warnings

These popped up during all of the runs, but personally I don’t think they are causing the errors (open to correction, of course).

DADA2 Parameters

WARN: No DADA2 cutoffs were specified (`--trunclenf` & --`trunclenr`), therefore reads will be truncated where median quality drops below 25 (defined by `--trunc_qmin`) but at least a fraction of 0.75 (defined by `--trunc_rmin`) of the reads will be retained.
The chosen cutoffs do not account for required overlap for merging, therefore DADA2 might have poor merging efficiency or even fail.

This is a complaint about relying on the default DADA2 truncation behaviour. It can be ignored, as DADA2 runs without issue.
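
If we ever want to silence the warning, the cutoffs can be passed explicitly on the command line; the values below are just placeholders taken from what the pipeline chose automatically (see the warning under Default Trimming Parameters):

nextflow run -bg -r kelp_completion BarryDigby/ampliseq -profile docker \
    --input "18S_kreuger_samplesheet.tsv" \
    --FW_primer "GTGYCAGCMGCCGCGGTAA" \
    --RV_primer "GGACTACNVGGGTWTCTAAT" \
    --metadata "18S_kreuger_metadata.tsv" \
    --outdir "18S_results" \
    --ignore_failed_trimming \
    --trunclenf 241 --trunclenr 222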

Insufficient Depth After Trimming

WARN: The following samples had too small file size (<1KB) after trimming with cutadapt:
LHST3_3
Ignoring failed samples and continue!

Trimming greatly reduces the depth of this sample; however, given that multiple samples and runs contribute to the representative L. hyperborea stipe sample, I don’t think it has a negative effect downstream. It’s not adding junk, it’s just adding very little.

In my opinion, this definitely does not cause error 1.

Default Trimming Parameters

WARN: Probably everything is fine, but this is a reminder that `--trunclenf` was set automatically to 241 and `--trunclenr` to 222. If this doesnt seem reasonable, then please change `--trunc_qmin` (and `--trunc_rmin`), or set `--trunclenf` and `--trunclenr` directly.

Triggered when --trunclenf and --trunclenr are left unset and the truncation lengths are chosen automatically (same root cause as the DADA2 warning above).

Default Parameters used

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    nf-core/ampliseq Nextflow config file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Default config options for all compute environments
----------------------------------------------------------------------------------------
*/

// Global default params, used in configs
params {

    // Input options
    input                      = null
    extension                  = "/*_R{1,2}_001.fastq.gz"
    pacbio                     = false
    iontorrent                 = false
    FW_primer                  = null
    RV_primer                  = null
    classifier                 = null
    metadata                   = null

    // Other options
    trunc_qmin        = 25
    trunc_rmin        = 0.75
    trunclenf         = null
    trunclenr         = null
    max_ee            = 2
    max_len           = null
    min_len           = 50
    metadata_category = null
    double_primer     = false
    retain_untrimmed  = false
    exclude_taxa      = "mitochondria,chloroplast"
    min_frequency     = 1
    min_samples       = 1
    multiple_sequencing_runs = false
    single_end        = false
    sample_inference  = "independent"
    illumina_pe_its   = false
    concatenate_reads = false
    cut_its           = "none"
    its_partial       = 0
    picrust           = false
    sbdiexport        = false
    dada_tax_agglom_min = 2
    dada_tax_agglom_max = 7
    qiime_tax_agglom_min = 2
    qiime_tax_agglom_max = 6
    ignore_failed_trimming = false
    ignore_empty_input_files = false
    qiime_adonis_formula = null
    seed              = 100
    filter_ssu        = null

    // Skipping options
    skip_cutadapt          = false
    skip_barrnap           = false
    skip_qiime             = false
    skip_fastqc            = false
    skip_alpha_rarefaction = false
    skip_abundance_tables  = false
    skip_barplot           = false
    skip_taxonomy          = false
    skip_dada_addspecies   = false
    skip_alpha_rarefaction = false
    skip_diversity_indices = false
    skip_ancom             = false
    skip_multiqc           = false

    // Database options
    dada_ref_taxonomy     = "silva=138"
    cut_dada_ref_taxonomy = false
    qiime_ref_taxonomy    = null

    // MultiQC options
    multiqc_config             = null
    multiqc_title              = null
    max_multiqc_email_size     = '25.MB'

    // Boilerplate options
    outdir                     = null
    tracedir                   = "${params.outdir}/pipeline_info"
    publish_dir_mode           = 'copy'
    email                      = null
    email_on_fail              = null
    plaintext_email            = false
    monochrome_logs            = false
    help                       = false
    validate_params            = true
    show_hidden_params         = false
    schema_ignore_params       = 'dada_ref_databases,qiime_ref_databases,igenomes_base'
    enable_conda               = false

    // Config options
    custom_config_version      = 'master'
    custom_config_base         = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
    config_profile_description = null
    config_profile_contact     = null
    config_profile_url         = null
    config_profile_name        = null

    // Max resource options
    // Defaults only, expecting to be overwritten
    max_memory                 = '56.GB'
    max_cpus                   = 16
    max_time                   = '240.h'

}

// Load base.config by default for all pipelines
includeConfig 'conf/base.config'

// Load nf-core custom profiles from different Institutions
try {
    includeConfig "${params.custom_config_base}/nfcore_custom.config"
} catch (Exception e) {
    System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config")
}

// Load nf-core/ampliseq custom profiles from different institutions.
// Warning: Uncomment only if a pipeline-specific instititutional config already exists on nf-core/configs!
// try {
//   includeConfig "${params.custom_config_base}/pipeline/ampliseq.config"
// } catch (Exception e) {
//   System.err.println("WARNING: Could not load nf-core/config/ampliseq profiles: ${params.custom_config_base}/pipeline/ampliseq.config")
// }


profiles {
    debug { process.beforeScript = 'echo $HOSTNAME' }
    conda {
        params.enable_conda    = true
        docker.enabled         = false
        singularity.enabled    = false
        podman.enabled         = false
        shifter.enabled        = false
        charliecloud.enabled   = false
    }
    docker {
        docker.enabled         = true
        docker.runOptions = '-u \$(id -u):\$(id -g)'
        singularity.enabled    = false
        podman.enabled         = false
        shifter.enabled        = false
        charliecloud.enabled   = false
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
        docker.enabled         = false
        podman.enabled         = false
        shifter.enabled        = false
        charliecloud.enabled   = false
    }
    podman {
        podman.enabled         = true
        docker.enabled         = false
        singularity.enabled    = false
        shifter.enabled        = false
        charliecloud.enabled   = false
    }
    shifter {
        shifter.enabled        = true
        docker.enabled         = false
        singularity.enabled    = false
        podman.enabled         = false
        charliecloud.enabled   = false
    }
    charliecloud {
        charliecloud.enabled   = true
        docker.enabled         = false
        singularity.enabled    = false
        podman.enabled         = false
        shifter.enabled        = false
    }
    test               { includeConfig 'conf/test.config'               }
    test_single        { includeConfig 'conf/test_single.config'        }
    test_multi         { includeConfig 'conf/test_multi.config'         }
    test_doubleprimers { includeConfig 'conf/test_doubleprimers.config' }
    test_pacbio_its    { includeConfig 'conf/test_pacbio_its.config'    }
    test_iontorrent    { includeConfig 'conf/test_iontorrent.config'    }
    test_fasta         { includeConfig 'conf/test_fasta.config'         }
    test_full          { includeConfig 'conf/test_full.config'          }
}

// Export these variables to prevent local Python/R libraries from conflicting with those in the container
// The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container.
// See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable.

env {
    PYTHONNOUSERSITE = 1
    R_PROFILE_USER   = "/.Rprofile"
    R_ENVIRON_USER   = "/.Renviron"
    JULIA_DEPOT_PATH = "/usr/local/share/julia"
}

// Capture exit codes from upstream processes when piping
process.shell = ['/bin/bash', '-euo', 'pipefail']

def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
timeline {
    enabled = true
    file    = "${params.tracedir}/execution_timeline_${trace_timestamp}.html"
}
report {
    enabled = true
    file    = "${params.tracedir}/execution_report_${trace_timestamp}.html"
}
trace {
    enabled = true
    file    = "${params.tracedir}/execution_trace_${trace_timestamp}.txt"
}
dag {
    enabled = true
    file    = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg"
}

manifest {
    name            = 'nf-core/ampliseq'
    author          = 'Daniel Straub, Alexander Peltzer'
    homePage        = 'https://github.com/nf-core/ampliseq'
    description     = 'Amplicon sequencing analysis workflow using DADA2 and QIIME2'
    mainScript      = 'main.nf'
    nextflowVersion = '!>=21.10.3'
    version         = '2.3.1'
}

// Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config'

// Load ref_databases.config for reference taxonomy
includeConfig 'conf/ref_databases.config'

// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
    if (type == 'memory') {
        try {
            if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
                return params.max_memory as nextflow.util.MemoryUnit
            else
                return obj
        } catch (all) {
            println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'time') {
        try {
            if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
                return params.max_time as nextflow.util.Duration
            else
                return obj
        } catch (all) {
            println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'cpus') {
        try {
            return Math.min( obj, params.max_cpus as int )
        } catch (all) {
            println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
            return obj
        }
    }
}