In this tutorial, I will perform translational genomics. I will take a gene from one genome and identify its location in another using Blast.

Download data

Here, I went to NCBI.nlm.nih.gov and searched for Seriola rivoliana then clicked on the Assembly link under Genomes. The download link can be found on the right hand side under Download the GenBank assembly.

# clear workspace
# rm(list = ls())

# set working directory
setwd("/Users/nnthieu/Downloads/GenomeProject1000/Blast/")

# Define the PLINK command
get_data_command <- "wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/994/505/GCA_002994505.1_ASM299450v1/GCA_002994505.1_ASM299450v1_genomic.fna.gz"

# Run the command
system(get_data_command)

# unzip data
system("gunzip GCA_002994505.1_ASM299450v1_genomic.fna.gz")

Obtain the gene sequence

From NCBI.nlm.nih.gov I search for . Manually type out the C-lectin gene in the Paper and add a fasta header. I chose the protein sequence as there were fewer letters to type out.

C-LECTIN MKTLLILSVVLCAALSVRAAAVVPAEAATAQLGDKAAPEPEAVKDTAVEDTAVEETAVEDTAVEETAVEDTAVEETAVEDTAVEETAVEDTAVEDTAVEDTAVEDTAVEDTAVEETAVEDTAVEDTAVEDTAVAAGRPAGLRQTRLSFCLDGWQSFSGKCYFLANHPDSWANAERFCASYEGSLASVGSIWEYNFLQRMVKTGGHAFAWIGGYYFQGEWRWEDGSRFDY SNWDTPRSTAYYQCLLLNSQVSMGWSNNGCNMNFPFVCQVRQLNC

In Terminal I code ‘nano benediniaGene.fasta’ to create fasta file ‘benediniaGene.fast’

Create a database to reference

# create a database
system("makeblastdb -in GCA_002994505.1_ASM299450v1_genomic.fna -dbtype nucl -input_type fasta -out SerRivdb")

Blast

# blast
system("tblastn -db SerRivdb -query benediniaGene.fasta  -out blastout.txt")

make format more readable

# change output for more readable
system("tblastn -db SerRivdb -query benediniaGene.fasta  -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs salltitles' -num_threads 16 -out blastout2.txt")
# percent > 50
system("more blastout2.txt | awk '$3>50' ")

Multiple alignment blast

Download data from NCBI

Search for sequeneces of KJ677130.1, KJ726376.1, MN402681.1, MT341277.1, MT341284.1 and save them into .fna files.

I combine these 5 files as follow:

’cat *.1.fna >input.fna’

Multiple alignment

I get to website of https://www.ebi.ac.uk/jdispatcher/msa/clustalo/ to perform multiple alignment using Blast

input file input.fna then wait and download the output file as “clustalo-I20240826-081537-0648-80894241-p1m.aln-fasta”

Plot Phylogenetic Tree

library(ape)
library(phangorn)
alignment <- read.dna("/Users/nnthieu/Downloads/GenomeProject1000/Blast/clustalo-I20240826-081537-0648-80894241-p1m.aln-fasta",format ="fasta")
dist_matrix <- dist.dna(alignment)
tree <- nj(dist_matrix)
plot(tree, main = "Phylogenetic Tree")

Genes KJ677130.1 and KJ726376.1 are closed together in phylogenie. Other genes are far from phylogenes.