Go to http://blast.ncbi.nlm.nih.gov/ and choose the appropriate BLAST tool; since the query below is a protein sequence, that is Protein BLAST (blastp). What is this protein, and what does it do? Search for it and find out.
>Unknown BFB protein 1
MDLIRGVLLRLLLLASSLGPGAVSLRAAIRKPGKVGPPLDIKLGALNCTAFSIQWKMPRHPGSPILGYTVFYSEVGADKS
LQEQLHSVPLSRDIPTTEEVIGDLKPGTEYRVSIAAYSQAGKGRLSSPRHVTTLSQDSCLPPAAPQQPHVIVVSDSEVAL
SWKPGASEGSAPIQYYSVEFIRPDFDKKWTSIHERIQMDSMVIKGLDPDTNYQFAVRAMNSHGPSPRSWPSDIIRTLCPE
EAGSGRYGPRYITDMGAGEDDEGFEDDLDLDISFEEVKPLPATKGGNKKFLVESKKMSISNPKTISRLIPPTSASLPVTT
VAPQPIPIQRKGKNGVAIMSRLFDMPCDETLCSADSFCVNDYTWGGSRCQCTLGKGGESCSEDIVIQYPQFFGHSYVTFE
PLKNSYQAFQITLEFRAEAEDGLLLYCGENEHGRGDFMSLAIIRRSLQFRFNCGTGVAIIVSETKIKLGGWHTVMLYRDG
LNGLLQLNNGTPVTGQSQGQYSKITFRTPLYLGGAPSAYWLVRATGTNRGFQGCVQSLAVNGRRIDMRPWPLGKALSGAD
VGECSSGICDEASCIHGGTCTAIKADSYICLCPLGFKGRHCEDAFTLTIPQFRESLRSYAATPWPLEPQHYLSFMEFEIT
FRPDSGDGVLLYSYDTGSKDFLSINLAGGHVEFRFDCGSGTGVLRSEDPLTLGNWHELRVSRTAKNGILQVDKQKIVEGM
AEGGFTQIKCNTDIFIGGVPNYDDVKKNSGVLKPFSGSIQKIILNDRTIHVKHDFTSGVNVENAAHPCVRAPCAHGGSCR
PRKEGYDCDCPLGFEGLHCQKECGNYCLNTIIEAIEIPQFIGRSYLTYDNPDILKRVSGSRSNVFMRFKTTAKDGLLLWR
GDSPMRPNSDFISLGLRDGALVFSYNLGSGVASIMVNGSFNDGRWHRVKAVRDGQSGKITVDDYGARTGKSPGMMRQLNI
NGALYVGGMKEIALHTNRQYMRGLVGCISHFTLSTDYHISLVEDAVDGKNINTCGAK
Once you are done with the results, please share your findings.
Perform a second web BLAST, this time restricting the search database to the genome identified in the step above. Are there any duplicated copies of this protein in its host genome?
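Download the Miniconda installer (Python 3.7, 64-bit Linux) from https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh, then run it (the command below assumes the file was saved to ~/Downloads):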
bash ~/Downloads/Miniconda3-py37_4.10.3-Linux-x86_64.sh
Create a conda environment for this lab:
conda create --name bfblab
Check out this conda cheat sheet for more options: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
Now activate the environment you just created.
conda activate bfblab
To leave the environment later, run conda deactivate.
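With bfblab active, install BLAST+ from the bioconda channel (bioconda packages also rely on conda-forge):
conda install -c conda-forge -c bioconda blast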
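Before moving on, it can help to confirm that the environment is active and that the BLAST+ programs are on your PATH. A minimal sketch in bash; check_tools is a hypothetical helper name, and it relies on conda setting CONDA_DEFAULT_ENV when an environment is activated:

```shell
#!/usr/bin/env bash
# Sketch: report the active conda environment and whether each named tool is on PATH.
# conda sets CONDA_DEFAULT_ENV to the environment name on activation.
check_tools() {
  echo "active conda env: ${CONDA_DEFAULT_ENV:-none}"
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: missing"
    fi
  done
}

check_tools blastp makeblastdb
```

If blastp or makeblastdb shows as missing, run conda activate bfblab again and repeat the install command.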
Creating our lab folder
cd ~
mkdir Lab06
cd Lab06
Downloading a protein FASTA file from the NCBI FTP server with wget, then decompressing it with gunzip (gzip is the matching compression command)
wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz
gunzip *.gz
ls
Check the first lines of the file with the head command
head human.1.protein.faa
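head only shows the first lines. To see how many protein records the file holds, count the header lines (those starting with >); a quick version is grep -c '^>' human.1.protein.faa. A slightly fuller sketch in bash/awk that also totals the residues; fasta_stats is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: count records (header lines starting with ">") and total residues in a FASTA file.
fasta_stats() {
  awk '/^>/ { n++; next }
       { gsub(/[[:space:]]/, ""); len += length($0) }
       END { printf "records=%d residues=%d\n", n, len }' "$1"
}

# Usage on the downloaded proteome:
# fasta_stats human.1.protein.faa
```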
Using gedit, create a FASTA file named query_fasta.fa and save your protein of interest in it (or the example sequence above). Then view it with less:
less query_fasta.fa
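If gedit is not available (for example when working over SSH), the same file can be written from the terminal with a here-document. A sketch that saves the example sequence above as query_fasta.fa; quoting 'EOF' stops the shell from expanding anything inside the block:

```shell
#!/usr/bin/env bash
# Sketch: write the example query to query_fasta.fa without opening an editor.
cat > query_fasta.fa << 'EOF'
>Unknown BFB protein 1
MDLIRGVLLRLLLLASSLGPGAVSLRAAIRKPGKVGPPLDIKLGALNCTAFSIQWKMPRHPGSPILGYTVFYSEVGADKS
LQEQLHSVPLSRDIPTTEEVIGDLKPGTEYRVSIAAYSQAGKGRLSSPRHVTTLSQDSCLPPAAPQQPHVIVVSDSEVAL
SWKPGASEGSAPIQYYSVEFIRPDFDKKWTSIHERIQMDSMVIKGLDPDTNYQFAVRAMNSHGPSPRSWPSDIIRTLCPE
EAGSGRYGPRYITDMGAGEDDEGFEDDLDLDISFEEVKPLPATKGGNKKFLVESKKMSISNPKTISRLIPPTSASLPVTT
VAPQPIPIQRKGKNGVAIMSRLFDMPCDETLCSADSFCVNDYTWGGSRCQCTLGKGGESCSEDIVIQYPQFFGHSYVTFE
PLKNSYQAFQITLEFRAEAEDGLLLYCGENEHGRGDFMSLAIIRRSLQFRFNCGTGVAIIVSETKIKLGGWHTVMLYRDG
LNGLLQLNNGTPVTGQSQGQYSKITFRTPLYLGGAPSAYWLVRATGTNRGFQGCVQSLAVNGRRIDMRPWPLGKALSGAD
VGECSSGICDEASCIHGGTCTAIKADSYICLCPLGFKGRHCEDAFTLTIPQFRESLRSYAATPWPLEPQHYLSFMEFEIT
FRPDSGDGVLLYSYDTGSKDFLSINLAGGHVEFRFDCGSGTGVLRSEDPLTLGNWHELRVSRTAKNGILQVDKQKIVEGM
AEGGFTQIKCNTDIFIGGVPNYDDVKKNSGVLKPFSGSIQKIILNDRTIHVKHDFTSGVNVENAAHPCVRAPCAHGGSCR
PRKEGYDCDCPLGFEGLHCQKECGNYCLNTIIEAIEIPQFIGRSYLTYDNPDILKRVSGSRSNVFMRFKTTAKDGLLLWR
GDSPMRPNSDFISLGLRDGALVFSYNLGSGVASIMVNGSFNDGRWHRVKAVRDGQSGKITVDDYGARTGKSPGMMRQLNI
NGALYVGGMKEIALHTNRQYMRGLVGCISHFTLSTDYHISLVEDAVDGKNINTCGAK
EOF

# Quick check: the file should contain exactly one header line.
grep -c '^>' query_fasta.fa
```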
If you have more than one FASTA file, merge them into a single file first, because the makeblastdb command below takes a single input file:
cat file1.fa file2.fa file3.fa file4.fa > mergedfile.fa
#or
cat *.fa > mergedfile.fa
#caution: *.fa also matches query_fasta.fa and any mergedfile.fa left from an earlier run
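A minimal sketch that merges an explicit list of files (so the query file stays out of the database) and then checks two things: that no records were lost (a file without a trailing newline glues its last line onto the next file's first header), and whether any sequence ID occurs more than once. merge_fasta is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: merge FASTA files into one, then sanity-check the result.
# merge_fasta is a hypothetical helper: merge_fasta OUTPUT INPUT...
merge_fasta() {
  local out=$1 f n expected=0
  shift
  cat "$@" > "$out"
  # Sum the records in each input separately, then count them in the merged file.
  for f in "$@"; do
    n=$(grep -c '^>' "$f")
    expected=$((expected + n))
  done
  echo "input records=$expected merged records=$(grep -c '^>' "$out")"
  # Print any sequence ID (first word after ">") that occurs more than once.
  grep '^>' "$out" | awk '{ print substr($1, 2) }' | sort | uniq -d
}

# Usage: merge every .fa file except the query and any earlier output.
# files=()
# for f in *.fa; do
#   case $f in query_fasta.fa|mergedfile.fa) continue ;; esac
#   files+=("$f")
# done
# merge_fasta mergedfile.fa "${files[@]}"
```

If the two counts differ, add the missing final newline to the offending input file and merge again.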
To search locally with BLAST, we first need to build a local database with makeblastdb:
makeblastdb -dbtype prot -in human.1.protein.faa -out human_prot_1
After creating the local database, we can run BLAST locally (blastp for protein queries, blastn for nucleotide queries)
#makeblastdb writes several files (e.g. .phr, .pin, .psq); pass only the base name given to -out
blastp -db human_prot_1 -query query_fasta.fa
You can change the output format with -outfmt; run blastp -help to list the available formats
blastp -db human_prot_1 -query query_fasta.fa -outfmt 6
#tabular output
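In the default -outfmt 6 layout, each line holds 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal sketch that prints that header and lists the best hits by bit score (column 12); top_hits and results.tsv are hypothetical names, and blastp's -out option is used to save the table to a file:

```shell
#!/usr/bin/env bash
# Sketch: print a header for the 12 default -outfmt 6 columns, then the N best hits
# ranked by bit score (column 12, general-numeric sort, highest first).
top_hits() {
  local file=$1 n=${2:-10}
  printf 'qseqid\tsseqid\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\n'
  sort -t $'\t' -k12,12gr "$file" | head -n "$n"
}

# Usage (results.tsv is a hypothetical file name):
# blastp -db human_prot_1 -query query_fasta.fa -outfmt 6 -out results.tsv
# top_hits results.tsv 5
```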
Use -evalue to set an E-value threshold; only hits with an E-value at or below it are reported, so smaller values are stricter
blastp -db human_prot_1 -query query_fasta.fa -outfmt 6 -evalue 1e-6
#tabular output with e-value threshold
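-evalue filters while the search runs. If you already saved tabular output without a threshold, roughly the same cut-off can be applied afterwards with awk, since column 11 is the E-value in the default -outfmt 6 layout. A minimal sketch; filter_evalue and results.tsv are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: keep only -outfmt 6 rows whose E-value (column 11) is at or below a threshold.
filter_evalue() {
  local file=$1 max=$2
  # "+ 0" forces numeric comparison, so values like 1e-10 or 0.0 compare correctly.
  awk -F'\t' -v max="$max" '($11 + 0) <= (max + 0)' "$file"
}

# Usage (results.tsv is a hypothetical file name):
# filter_evalue results.tsv 1e-6 > results.filtered.tsv
```

Because this works on a file you already have, you can try several cut-offs without re-running the search.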