Preparing a computer with Ubuntu to run the pipeline

After installing Ubuntu, several applications need to be installed and set up before our pipeline can run.

Packages installed with the apt package manager are not always the most up-to-date versions. However, I prefer to install everything possible with apt, because Conda often results in dependency conflicts. Using pip on a computer that also uses Conda sometimes creates problems too, so it is better to avoid that. In any case, before installing anything, run

sudo apt update && sudo apt upgrade

Install available packages with apt


sudo apt install vsearch
sudo apt install fastqc
sudo apt install multiqc
sudo apt install cutadapt
sudo apt install git
sudo apt install swarm
sudo apt install gawk

sudo apt install r-base r-base-dev -y
sudo apt install littler
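
To confirm that the apt installs worked, something like the following can be used. It is only a sketch: missing_tools is a helper name made up here, and the list of command names assumes that each package installs a command of the same name (R and Rscript come from r-base).

```shell
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of the pipeline.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names for the packages installed above.
missing_tools vsearch fastqc multiqc cutadapt git swarm gawk R Rscript
```

If nothing is printed, every command was found.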

Install Usearch

  • Download the binary from Robert Edgar’s website (Drive5).
  • Copy / move to somewhere in your PATH.
  • Renaming it to simply “usearch” is convenient for scripts.
  • Do a chmod +x on it.
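
The copy, rename and chmod steps above can be sketched as one small function. The download file name and the ~/.local/bin destination in the example call are assumptions, so adjust them to whatever you downloaded and to a directory that is really in your PATH.

```shell
# Copy a downloaded binary into a directory under a fixed name and make it executable.
# $1: downloaded file, $2: destination directory (should be in PATH), $3: name to install as.
# (install_binary is a hypothetical helper name.)
install_binary() {
    mkdir -p "$2" &&
    cp "$1" "$2/$3" &&
    chmod +x "$2/$3"
}

# Example call (hypothetical download name):
# install_binary ~/Downloads/usearch_downloaded_file "$HOME/.local/bin" usearch
```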

(optional) Install Miniconda & Bioconda

Some of the programs we need are available through Conda. There is a guide here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html

The four commands below download the latest 64-bit installer, run it quietly, and then clean up after themselves.

To install a different version or architecture of Miniconda for Linux, change the name of the .sh installer in the wget command.

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

After installing, initialize your newly-installed Miniconda:

~/miniconda3/bin/conda init bash

Set up channels for conda:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

(End of pre-processing requirements)

For BLASTing, install NCBI BLAST+ according to the site’s instructions.


Configuring Git with Bitbucket

Set up the configuration using git config
  • Set your username

  • Directions are here, but not all of this is necessary: https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup

Next, connect it to the Bitbucket account

Clone the repo using SSH. Before doing that, though, you have to set up SSH key pairs on your local computer and Bitbucket account. On the local computer:

ssh-keygen

  • Press Enter to save the keypair with the default names/locations.

  • Press Enter for an empty passphrase.

  • It will show you a randomart image.

  • Now go into your Bitbucket account settings. Under SSH keys, click on new key.

  • Give it a name, probably something to do with the computer.

  • Paste the contents of the public key file (the .pub file that ssh-keygen created, by default in ~/.ssh/) into the box. Save, and it’s done.

  • Now, you can clone the repository with SSH.

  • Find it on Bitbucket, and go to the Clone menu (in the right corner).

  • Copy the SSH clone command (not the HTTPS one).

  • In your local terminal, navigate to the directory in which you want to put the pipeline-utils repository. By convention, this is ~/Desktop/

  • Paste the git clone ssh command there. It should work. 💫
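
The key and clone steps above can be sketched as follows. This is a rough outline rather than the exact commands for our account: list_public_keys and bitbucket_ssh_url are helper names made up here, my-workspace is a placeholder for the real Bitbucket workspace, and the commands that need a network connection are left as comments.

```shell
# List the public key files (*.pub) in a directory (default: ~/.ssh).
# (Hypothetical helper; the key file name depends on the ssh-keygen version.)
list_public_keys() {
    dir=${1:-$HOME/.ssh}
    for key in "$dir"/*.pub; do
        [ -e "$key" ] && printf '%s\n' "$key"
    done
    return 0
}

# Build an SSH clone URL of the form git@bitbucket.org:<workspace>/<repo>.git
bitbucket_ssh_url() {
    printf 'git@bitbucket.org:%s/%s.git\n' "$1" "$2"
}

# Typical sequence (commented out: needs network access and a Bitbucket account):
# ssh-keygen                                   # press Enter to accept the defaults
# cat "$(list_public_keys | head -n 1)"        # paste this output into Bitbucket
# ssh -T git@bitbucket.org                     # check that Bitbucket accepts the key
# cd ~/Desktop && git clone "$(bitbucket_ssh_url my-workspace pipeline-utils)"
```

In practice it is simpler to copy the clone command from the Bitbucket Clone menu, since that already contains the correct workspace and repository names.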

Set up the folders in which things are expected to be.

Some large files are not in the pipeline-utils repo and need to be added manually. They are stored on SwarmCluster. For example, my taxonkit_files folder is here:

/home/laur/Desktop/pipeline_utils/NCBI_taxonomy/taxonkit_files

and one of the files inside it is here:

/home/laur/Desktop/pipeline_utils/NCBI_taxonomy/taxonkit_files/2023_refresh/acc_taxid.tmp

The RDP classifier is also on SC. Here is where I put my executable, for example:

/home/laur/Desktop/rdp_classifier_2.13/dist/classifier.jar

And the trained model for COI:

/home/laur/Desktop/rdp_classifier_2.13/mydata_trained_v4.0.1/rRNAClassifier.properties

There are files needed for running RDP and the wrapper script. I’m not sure where they are.

  • CUSTOM_STRING_TEMPLATE.txt
  • edit_custom_string_for_consensus_taxonomy.sh
  • Filtered_Animalia_COI_BIN_species_countries_July_2023.tsv
  • RDP_classify_COI.sh
  • redlist_Mueller_2020_grouped.tsv
  • taxa_nubkeys_uniq.tsv
  • run_VB_02_08_2023.sh
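
Since the location of these files is not known, one way to track them down is to search a directory tree for each name. The sketch below does that; locate_required_files is a name made up for this example, the file names are the ones listed above, and the search root in the example call is only a guess.

```shell
# File names needed for running RDP and the wrapper script (from the list above).
REQUIRED_FILES="CUSTOM_STRING_TEMPLATE.txt
edit_custom_string_for_consensus_taxonomy.sh
Filtered_Animalia_COI_BIN_species_countries_July_2023.tsv
RDP_classify_COI.sh
redlist_Mueller_2020_grouped.tsv
taxa_nubkeys_uniq.tsv
run_VB_02_08_2023.sh"

# For each required file, print the first place it is found under the
# search root ($1), or a MISSING line if it is not found there.
locate_required_files() {
    printf '%s\n' "$REQUIRED_FILES" | while IFS= read -r name; do
        hit=$(find "$1" -type f -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            printf 'found    %s\n' "$hit"
        else
            printf 'MISSING  %s\n' "$name"
        fi
    done
}

# Example call (assumed search root):
# locate_required_files "$HOME/Desktop"
```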

Post-processing

Krona

Follow the directions on the GitHub documentation: https://github.com/marbl/Krona/releases

otusamples2krona

Here is the GitHub repository: https://github.com/GenomicaMicrob/OTUsamples2krona

The installation instructions say: clone this repository on your Linux machine, make it executable, and you are ready to go.

git clone https://github.com/GenomicaMicrob/OTUsamples2krona.git
cd OTUsamples2krona
chmod +x OTUsamples2krona.v0.2.3.sh

(optional) Install Anydesk

Disable Wayland, if necessary for using Anydesk
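
On Ubuntu systems that log in through GDM, Wayland is usually switched off by uncommenting the WaylandEnable=false line in /etc/gdm3/custom.conf and then rebooting. A minimal sketch of that edit follows, assuming that file and that commented-out line exist on your system; disable_wayland is a name made up for this example.

```shell
# Uncomment the "WaylandEnable=false" line in a GDM configuration file ($1).
# The file is rewritten in place, so make a backup copy first.
disable_wayland() {
    sed 's/^#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$1" > "$1.tmp" &&
    cat "$1.tmp" > "$1" &&
    rm -f "$1.tmp"
}

# Example, run from a root shell (assumed path), followed by a reboot:
# cp /etc/gdm3/custom.conf /etc/gdm3/custom.conf.bak
# disable_wayland /etc/gdm3/custom.conf
```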