BLASTing an unknown protein sequence against all existing protein sequences

You have been given the protein sequence below and want to know what it is and what it does. Go to http://blast.ncbi.nlm.nih.gov/, choose the appropriate BLAST tool, and search for it.

>Unknown BFB protein 1
MDLIRGVLLRLLLLASSLGPGAVSLRAAIRKPGKVGPPLDIKLGALNCTAFSIQWKMPRHPGSPILGYTVFYSEVGADKS
LQEQLHSVPLSRDIPTTEEVIGDLKPGTEYRVSIAAYSQAGKGRLSSPRHVTTLSQDSCLPPAAPQQPHVIVVSDSEVAL
SWKPGASEGSAPIQYYSVEFIRPDFDKKWTSIHERIQMDSMVIKGLDPDTNYQFAVRAMNSHGPSPRSWPSDIIRTLCPE
EAGSGRYGPRYITDMGAGEDDEGFEDDLDLDISFEEVKPLPATKGGNKKFLVESKKMSISNPKTISRLIPPTSASLPVTT
VAPQPIPIQRKGKNGVAIMSRLFDMPCDETLCSADSFCVNDYTWGGSRCQCTLGKGGESCSEDIVIQYPQFFGHSYVTFE
PLKNSYQAFQITLEFRAEAEDGLLLYCGENEHGRGDFMSLAIIRRSLQFRFNCGTGVAIIVSETKIKLGGWHTVMLYRDG
LNGLLQLNNGTPVTGQSQGQYSKITFRTPLYLGGAPSAYWLVRATGTNRGFQGCVQSLAVNGRRIDMRPWPLGKALSGAD
VGECSSGICDEASCIHGGTCTAIKADSYICLCPLGFKGRHCEDAFTLTIPQFRESLRSYAATPWPLEPQHYLSFMEFEIT
FRPDSGDGVLLYSYDTGSKDFLSINLAGGHVEFRFDCGSGTGVLRSEDPLTLGNWHELRVSRTAKNGILQVDKQKIVEGM
AEGGFTQIKCNTDIFIGGVPNYDDVKKNSGVLKPFSGSIQKIILNDRTIHVKHDFTSGVNVENAAHPCVRAPCAHGGSCR
PRKEGYDCDCPLGFEGLHCQKECGNYCLNTIIEAIEIPQFIGRSYLTYDNPDILKRVSGSRSNVFMRFKTTAKDGLLLWR
GDSPMRPNSDFISLGLRDGALVFSYNLGSGVASIMVNGSFNDGRWHRVKAVRDGQSGKITVDDYGARTGKSPGMMRQLNI
NGALYVGGMKEIALHTNRQYMRGLVGCISHFTLSTDYHISLVEDAVDGKNINTCGAK

Once you are done with the results, please share your findings.

Perform a second web BLAST, this time restricting the search database to the genome identified in the step above. Are there any duplicated copies of this protein in its host genome?

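Install conda

Download the Miniconda installer from the link below, then run it with bash (the command assumes the file was saved to ~/Downloads).

https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh

bash ~/Downloads/Miniconda3-py37_4.10.3-Linux-x86_64.sh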

Create an environment named bfblab

conda create --name bfblab

Check out this conda cheat sheet for more options: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf

Now activate the environment you just created.

conda activate bfblab

To leave the environment, use conda deactivate

Installing tools with conda

conda install -c bioconda blast
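After the install finishes, it is worth confirming that the BLAST+ programs are actually reachable from the active environment. A minimal sketch (the variable name blast_status is only for illustration):

```shell
# Check whether blastp is on the PATH of the currently active environment
if command -v blastp >/dev/null 2>&1; then
    blast_status="installed"
    # Print the installed BLAST+ version
    blastp -version
else
    blast_status="missing"
    echo "blastp not found: is the bfblab environment active?"
fi
echo "BLAST status: $blast_status"
```

If the status is missing, activate the environment with conda activate bfblab and try again.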

Creating our lab folder

cd ~
mkdir Lab06
cd Lab06

Downloading protein files from NCBI FTP servers using wget and unzipping a compressed file using gunzip (gzip for compression)

wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz

gunzip *.gz

ls

Let’s check the file using the head command

head human.1.protein.faa
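Besides looking at the first lines, you can count how many sequences a FASTA file holds, since every record starts with a '>' header line. A minimal sketch on a small made-up file (toy.faa and its contents are hypothetical, so the example runs anywhere); in the lab, replace toy.faa with human.1.protein.faa:

```shell
# Build a tiny example FASTA file (made-up content, for illustration only)
printf '>seq1 example protein\nMDLIRGVLL\n>seq2 another example\nMKTAYIAKQR\n' > toy.faa

# Count records: every FASTA record begins with a '>' header line
n_seqs=$(grep -c '^>' toy.faa)
echo "Number of sequences: $n_seqs"

# Show only the header lines
grep '^>' toy.faa
```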

Create a FASTA file named query_fasta.fa using gedit, save your protein of interest (or the example above) in it, and then view it

less query_fasta.fa
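A copy-paste slip is easy to make with long sequences, so a quick check of the query file can save a failed search later. The sketch below counts the records and the residues in a toy file (toy_query.fa and its short sequence are made up for illustration); use query_fasta.fa in the lab and compare the residue count with the length of the sequence you pasted:

```shell
# Toy query file (made-up sequence split over two lines, as FASTA allows)
printf '>toy_query\nMDLIRGVLLR\nLLLLASSLGP\n' > toy_query.fa

# Number of records in the file (should be 1 for a single query)
n_records=$(grep -c '^>' toy_query.fa)

# Total residues: drop header lines, remove line breaks, count characters
n_residues=$(grep -v '^>' toy_query.fa | tr -d '\n' | wc -c | tr -d ' ')

echo "records: $n_records, residues: $n_residues"
```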

If you have more than one FASTA file, merge them into a single file before creating a database, because makeblastdb is run on a single input file

cat file1.fa file2.fa file3.fa file4.fa > mergedfile.fa
#or
cat *.fa > mergedfile.fa
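One caveat with the wildcard form: if the merged file already exists from an earlier run and its name also matches *.fa, it becomes one of the inputs, which can cause an error or unexpected output. Writing to a name that the pattern does not match avoids this. A minimal sketch in a scratch directory with toy files (all file names and sequences here are hypothetical):

```shell
# Work in a scratch directory so the example does not touch your lab files
workdir=$(mktemp -d)
cd "$workdir"

# Two toy FASTA files (made-up content)
printf '>a1\nMKT\n>a2\nMLL\n' > file1.fa
printf '>b1\nMDL\n' > file2.fa

# Merge into a name that the *.fa pattern does not match
cat *.fa > merged.fasta

# The merged file should hold as many records as the inputs combined
n_inputs=$(cat file1.fa file2.fa | grep -c '^>')
n_merged=$(grep -c '^>' merged.fasta)
echo "inputs: $n_inputs, merged: $n_merged"
```

Comparing the two counts is a cheap way to confirm that nothing was lost or duplicated during the merge.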

In order to do a local search with BLAST we need to create a local database

makeblastdb -dbtype prot -in human.1.protein.faa -out human_prot_1
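To confirm that the database was built, you can list the files makeblastdb wrote and ask blastdbcmd (part of the same BLAST+ package) for a summary. A minimal sketch, assuming the database name human_prot_1 from the command above; the db and db_checked variables are only for illustration, and the check is skipped if BLAST+ or the database is not present:

```shell
db=human_prot_1

# Only run the check when blastdbcmd is installed and database files exist
if command -v blastdbcmd >/dev/null 2>&1 && ls "$db".p* >/dev/null 2>&1; then
    # Protein databases are stored as several files sharing the same base name
    ls "$db".*
    # Print a summary: database title, number of sequences, total residues
    blastdbcmd -db "$db" -info
    db_checked="yes"
else
    db_checked="no"
    echo "Database $db or blastdbcmd not found; nothing to check."
fi
```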

After creating our own local database, we can now use BLAST locally (blastp for protein queries, blastn for nucleotide queries)

#makeblastdb creates multiple files; for -db we give only the database name, without the file suffixes
#the query is the FASTA file holding our protein of interest
blastp -db human_prot_1 -query query_fasta.fa

You can change the output format using -outfmt; run blastp -help to view the available formats

blastp -db human_prot_1 -query query_fasta.fa -outfmt 6
#tabular output
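With -outfmt 6 each hit is one tab-separated line with twelve default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. That makes the results easy to process with standard tools. A minimal sketch that picks the hit with the highest bit score from a toy results file (toy_hits.tsv and the hit names hitA, hitB, hitC are made up; in the lab you would save real results with -out or a redirect first):

```shell
# Toy tabular results in the default -outfmt 6 column order (made-up values)
printf 'query1\thitB\t45.0\t180\t90\t4\t5\t184\t10\t189\t2e-30\t120\n' > toy_hits.tsv
printf 'query1\thitA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\n' >> toy_hits.tsv
printf 'query1\thitC\t30.2\t90\t60\t2\t20\t109\t40\t129\t0.5\t35\n' >> toy_hits.tsv

# Sort by bit score (column 12), highest first, and keep the subject ID (column 2)
tab=$(printf '\t')
best_hit=$(sort -t "$tab" -k12,12nr toy_hits.tsv | head -n 1 | cut -f 2)
echo "best hit: $best_hit"
```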

You can use -evalue to set an E-value threshold; only hits with an E-value at or below the threshold are reported

blastp -db human_prot_1 -query query_fasta.fa -outfmt 6 -evalue 1e-6
#tabular output with e-value threshold
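-evalue sets the largest E-value that BLAST will report (the default is 10), so smaller thresholds are stricter. The same kind of cutoff can also be applied afterwards to saved tabular results, since the E-value is column 11 of the default -outfmt 6 layout. A minimal sketch with made-up hits (toy_hits_evalue.tsv, the hit names, and the 1e-6 cutoff are chosen for illustration):

```shell
# Toy tabular results in the default -outfmt 6 column order (made-up values)
printf 'query1\thitA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\n' > toy_hits_evalue.tsv
printf 'query1\thitB\t45.0\t180\t90\t4\t5\t184\t10\t189\t2e-30\t120\n' >> toy_hits_evalue.tsv
printf 'query1\thitC\t30.2\t90\t60\t2\t20\t109\t40\t129\t0.5\t35\n' >> toy_hits_evalue.tsv

# Keep only hits whose E-value (column 11) is at or below the cutoff;
# adding 0 makes awk treat the field as a number rather than a string
awk -F '\t' '$11 + 0 <= 1e-6' toy_hits_evalue.tsv > toy_hits_filtered.tsv

n_kept=$(wc -l < toy_hits_filtered.tsv | tr -d ' ')
echo "hits kept: $n_kept"
```

Filtering afterwards lets you try several cutoffs without rerunning the search, as long as the original search used a threshold at least as permissive.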