Now activate the environment you created last week.

conda activate bfblab

Install the tools that we will use:

#Install FASTQC for sequence quality check
conda install -c bioconda fastqc

#Install Megahit for de novo assembly
conda install -c bioconda megahit
#Then display the MEGAHIT help text to confirm that the installation works:
megahit -h

#Install gdown for file downloading
conda install -c conda-forge gdown
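
Before moving on, it can help to confirm that all three tools are actually available in the active environment. The following is a minimal sketch; the function name check_tools is a hypothetical helper for this lab, not part of conda or of any of the installed tools.

```shell
# Report whether each named command can be found on the PATH.
# Returns a non-zero status if at least one command is missing.
check_tools() {
    local missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage for this lab:
# check_tools fastqc megahit gdown
```

If a tool is reported as missing, check that the bfblab environment is active and rerun the corresponding conda install command.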

Obtain the sequencing reads and check their quality

First, create a separate folder for the lab exercises:

cd ~

mkdir Lab07

cd Lab07

Download the data from Google Drive using gdown:

#Download the first read file (the URL is quoted so the shell does not interpret the '?'):
gdown "https://drive.google.com/uc?id=1CiRkrUcP3S_oNiluGQ1Mh1qSU-3FpKdF"

#Download the second read file:
gdown "https://drive.google.com/uc?id=14di4CJ_J8TrRwISRQm0Nbt9_tILcYF9N"
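
A download can occasionally be incomplete or contain something other than the expected file, so it may be worth testing that each file is a valid gzip archive before using it. This is a sketch only; verify_gzip is a hypothetical helper name, and it relies on the standard gzip -t integrity test.

```shell
# Test each given file with 'gzip -t' and report the result.
# Returns a non-zero status if any file fails the test.
verify_gzip() {
    local status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "NOT a valid gzip file: $f"
            status=1
        fi
    done
    return "$status"
}

# Usage for this lab:
# verify_gzip *.gz
```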

View the contents of the FASTQ reads file. The file is gzip-compressed, so decompress it to standard output and send the result to less using ‘|’ (pipe). Press q to quit less.

gunzip -c read_1.fq.gz | less
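
The same pipe technique can be used to count the reads in a file. Each FASTQ record occupies four lines (header, sequence, '+' separator, quality string), so the number of reads is the line count divided by four. The sketch below assumes a well-formed FASTQ file with exactly four lines per record; count_reads is a hypothetical helper name.

```shell
# Count the reads in a gzipped FASTQ file.
# Assumes exactly 4 lines per record (no wrapped sequences).
count_reads() {
    local lines
    lines=$(gunzip -c "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Usage for this lab:
# count_reads read_1.fq.gz
```

For paired-end data, both files should report the same number of reads.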

Check the quality of sequences:

Run the FASTQC software on both read_1 and read_2 files as follows:

fastqc *.gz

Open the generated HTML reports in a web browser and investigate the results.

De novo genome assembly using MEGAHIT

megahit -1 read_1.fq.gz -2 read_2.fq.gz -o Unknown_genome_megahit

This command will create a folder called ‘Unknown_genome_megahit’. Navigate into that folder and list its contents:

cd Unknown_genome_megahit/

ls -al 

#And open the final contigs fasta file
less final.contigs.fa
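
Scrolling through the file with less is fine for a first look, but a summary is easier to get with awk. In a FASTA file every contig starts with a header line beginning with '>', and all other lines are sequence. The sketch below counts headers and sums the sequence lengths; contig_stats is a hypothetical helper name, not a MEGAHIT command.

```shell
# Print the number of contigs and the total number of bases in a FASTA file.
# Header lines start with '>'; every other line is treated as sequence.
contig_stats() {
    awk '/^>/ { n++; next }
         { total += length($0) }
         END { printf "contigs=%d total_bp=%.0f\n", n, total }' "$1"
}

# Usage for this lab:
# contig_stats final.contigs.fa
```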

Questions to answer:

1. How many contigs did the assembly produce?
2. What is the total length of the assembled genome?
3. What is the coverage (C) of this genome?
4. Which organism does this genome belong to? How can you find out?
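
For question 3, a common estimate of coverage is C = (N x L) / G, where N is the number of reads, L is the read length, and G is the genome length. N x L is simply the total number of sequenced bases. The sketch below assumes that the total assembly length is a reasonable stand-in for G, which is an approximation; total_bases and estimate_coverage are hypothetical helper names.

```shell
# Sum the bases in one or more gzipped FASTQ files.
# The sequence is the 2nd line of each 4-line record.
total_bases() {
    gunzip -c "$@" | awk 'NR % 4 == 2 { b += length($0) } END { printf "%.0f\n", b }'
}

# Coverage estimate: sequenced bases divided by genome length (both in bp).
estimate_coverage() {
    awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1f\n", bases / genome }'
}

# Usage from inside the assembly folder (reads are one level up):
# bases=$(total_bases ../read_1.fq.gz ../read_2.fq.gz)
# estimate_coverage "$bases" GENOME_LENGTH_IN_BP
```

Replace GENOME_LENGTH_IN_BP with the total assembly length you found for question 2.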