In this tutorial, I will perform translational genomics. I will take a gene from one genome and identify its location in another using Blast.
Here, I went to NCBI.nlm.nih.gov and searched for Seriola rivoliana then clicked on the Assembly link under Genomes. The download link can be found on the right hand side under Download the GenBank assembly.
# clear workspace
# rm(list = ls())
# set working directory
setwd("/Users/nnthieu/Downloads/GenomeProject1000/Blast/")
# Define the PLINK command
get_data_command <- "wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/994/505/GCA_002994505.1_ASM299450v1/GCA_002994505.1_ASM299450v1_genomic.fna.gz"
# Run the command
system(get_data_command)
# unzip data
system("gunzip GCA_002994505.1_ASM299450v1_genomic.fna.gz")
From NCBI.nlm.nih.gov I search for . Manually type out the C-lectin gene in the Paper and add a fasta header. I chose the protein sequence as there were fewer letters to type out.
C-LECTIN MKTLLILSVVLCAALSVRAAAVVPAEAATAQLGDKAAPEPEAVKDTAVEDTAVEETAVEDTAVEETAVEDTAVEETAVEDTAVEETAVEDTAVEDTAVEDTAVEDTAVEDTAVEETAVEDTAVEDTAVEDTAVAAGRPAGLRQTRLSFCLDGWQSFSGKCYFLANHPDSWANAERFCASYEGSLASVGSIWEYNFLQRMVKTGGHAFAWIGGYYFQGEWRWEDGSRFDY SNWDTPRSTAYYQCLLLNSQVSMGWSNNGCNMNFPFVCQVRQLNC
In Terminal I code ‘nano benediniaGene.fasta’ to create fasta file ‘benediniaGene.fast’
# create a database
system("makeblastdb -in GCA_002994505.1_ASM299450v1_genomic.fna -dbtype nucl -input_type fasta -out SerRivdb")
# blast
system("tblastn -db SerRivdb -query benediniaGene.fasta -out blastout.txt")
make format more readable
# change output for more readable
system("tblastn -db SerRivdb -query benediniaGene.fasta -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs salltitles' -num_threads 16 -out blastout2.txt")
# percent > 50
system("more blastout2.txt | awk '$3>50' ")
Search for sequeneces of KJ677130.1, KJ726376.1, MN402681.1, MT341277.1, MT341284.1 and save them into .fna files.
I combine these 5 files as follow:
’cat *.1.fna >input.fna’
I get to website of https://www.ebi.ac.uk/jdispatcher/msa/clustalo/ to perform multiple alignment using Blast
input file input.fna then wait and download the output file as “clustalo-I20240826-081537-0648-80894241-p1m.aln-fasta”
library(ape)
library(phangorn)
alignment <- read.dna("/Users/nnthieu/Downloads/GenomeProject1000/Blast/clustalo-I20240826-081537-0648-80894241-p1m.aln-fasta",format ="fasta")
dist_matrix <- dist.dna(alignment)
tree <- nj(dist_matrix)
plot(tree, main = "Phylogenetic Tree")
Genes KJ677130.1 and KJ726376.1 are closed together in phylogenie. Other genes are far from phylogenes.