Overview

In today's workshop we will look at:

Simple pairwise alignment - Global and local

By now you have done some alignments by hand, which takes a while even for short sequences. Fortunately, researchers have developed computer programs to do this for you. Our goal here is to use the EBI web interface to perform a global alignment with an implementation of the Needleman-Wunsch algorithm and a local alignment with an implementation of the Smith-Waterman algorithm.

Direct your browser to the EBI Tools page for pairwise sequence alignment (PSA): https://www.ebi.ac.uk/Tools/psa/

The exercises below use these two sequences:

>NP_005359.1 myoglobin isoform 1 [Homo sapiens]
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVL
TALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFR
KDMASNYKELGFQG
>NP_000509.1 hemoglobin subunit beta [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
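
If you would like to handle these two records in a script as well as in the web forms, the sketch below shows one way to read them. It is a minimal FASTA reader written for this handout, not part of any EBI tool; the names parse_fasta and FASTA_TEXT are assumptions made for illustration.

```python
def parse_fasta(text):
    """Return a dict mapping each record ID (first word after '>') to its sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the ID is everything up to the first space
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: collect it under the most recent header
            records[current_id].append(line)
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}


FASTA_TEXT = """\
>NP_005359.1 myoglobin isoform 1 [Homo sapiens]
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVL
TALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFR
KDMASNYKELGFQG
>NP_000509.1 hemoglobin subunit beta [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
"""

sequences = parse_fasta(FASTA_TEXT)
for rec_id, seq in sequences.items():
    print(rec_id, len(seq))
```

Printing the lengths is a quick check that each record was pasted completely, from the > to the last character.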

Global alignment

We’ll start with EMBOSS Needle, an implementation of the Needleman-Wunsch algorithm. On the page you’ve been directed to, click Launch Needle.

First, select the type of sequence being aligned (PROTEIN/DNA). Enter each FASTA sequence entry (from the > to the last character) into its corresponding text entry window (there should be two). After entering your two sequences, go ahead and submit your job. As you will see, this takes a minute or two since you are using public computing resources. Take a look at your output and make notes on what you see.
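
To see what a global aligner does behind the form, here is a minimal sketch of the Needleman-Wunsch idea. It rests on simplifying assumptions: a flat match/mismatch score and a single linear gap penalty, rather than the substitution matrix and separate gap open/extend penalties that the Needle form exposes. The function name needleman_wunsch and the scoring values are illustrative choices, not EMBOSS settings, so the scores will not match the web output.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    def sub(i, j):
        # Assumed toy scoring: identical residues score `match`, others `mismatch`
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy usage on the first residues of the two workshop sequences
total, top, bottom = needleman_wunsch("MGLSDGEWQLVLNVWGKV", "MVHLTPEEKSAVTALWGKV")
print(total)
print(top)
print(bottom)
```

Because the alignment is global, both sequences appear in full in the result, with - characters inserted wherever the algorithm chose a gap.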

When interpreting alignments, note the gaps in the alignment. These are shown as - characters in either sequence of the output alignment. Identical residues are marked with | characters, while similar but non-identical residues are marked with : and . characters, with : indicating the more conservative substitutions.
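
The counts reported with an alignment can be reproduced from the two gapped lines. The sketch below is a simplified illustration: the function name summarize_alignment is an assumption, and every non-identical pair is marked with . because telling : from . would require the scoring matrix, which is left out here.

```python
def summarize_alignment(aligned_a, aligned_b):
    """Build a markup line and count identities and gaps for two gapped sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    markup = []
    identical = 0
    gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gaps += 1
            markup.append(" ")  # gap column: no markup character
        elif x == y:
            identical += 1
            markup.append("|")  # identical residues
        else:
            # Simplification: a real tool consults the scoring matrix
            # to decide between ':' and '.'
            markup.append(".")
    return {
        "markup": "".join(markup),
        "identical": identical,
        "gaps": gaps,
        "length": len(aligned_a),
    }


# Toy usage with a hand-made alignment
summary = summarize_alignment("MGLSDGEW", "MVHL-TPE")
print("MGLSDGEW")
print(summary["markup"])
print("MVHL-TPE")
print(summary["identical"], "identical,", summary["gaps"], "gap column(s) of", summary["length"])
```

Dividing the identical count by the alignment length gives a percent identity, which is one quick way to compare two runs with different parameters.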

Question 1: Are these sequences similar? Why or why not?

Now, let’s go back to the input form (link at the top of the page) and input the sequences again. This time, let’s adjust some parameters and see how they influence our alignment. Let’s change our scoring matrix to BLOSUM30 and reduce the gap open/end penalties to 1. You can change these parameters by clicking on “More options…”. Go ahead and submit.

Question 2: What is something you notice that is very different in this alignment?

Local alignment

Local alignment is a much more powerful tool than global alignment for searching databases, detecting protein domains, and assessing similarity between distantly homologous genes/proteins. Here, we’ll use the EBI interface for an implementation of the Smith-Waterman algorithm called EMBOSS Water.

Go back to the EBI pairwise sequence alignment page linked above and click Launch Water.
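
The difference from global alignment can be sketched in a few lines. The version below is a minimal illustration of the Smith-Waterman idea under assumed toy scoring (flat match/mismatch values and a linear gap penalty), not the matrix and gap settings offered in the Water form; the function name smith_waterman is likewise an assumption. The two changes from a global aligner are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell instead of the corner.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Locally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Assumed toy scoring: identical residues score `match`, others `mismatch`
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0,                                # start a fresh alignment here
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy usage: a shared core flanked by unrelated residues
local_score, local_a, local_b = smith_waterman("XXXACGTXXX", "YYACGTYY")
print(local_score, local_a, local_b)
```

In the toy call only the shared core is reported and the unrelated flanks are dropped, which is the behaviour that makes local alignment suited to finding domains inside longer sequences.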

First, let’s input our sequences and submit our alignment request using the default parameters. You might notice that there doesn’t seem to be much difference between this alignment and our first alignment above.

Question 3: Why do you think this might be true? (Hint: Think about sequence length and similarity)

Next, let’s adjust some parameters. This can be done in the “More options…” section. Let’s reduce our gap opening penalty to 1 but increase the gap extension penalty to 10. Select the BLOSUM90 scoring matrix.

Question 4: Is this a useful alignment? What can you conclude from it?

Other ways to visualize alignments

Alignment output is sometimes difficult to interpret, especially when we want to assess regions of similarity between proteins/nucleotides at a glance. Another way of visualizing a pairwise alignment is a dot plot. This type of graphic is best understood by using it, so navigate to a simple dot plot tool via the link below.

https://www.ebi.ac.uk/Tools/seqstats/emboss_dotmatcher/

As before, enter the two sequences and submit using the default parameters. Now, let’s look at the output. Each protein is plotted along its own axis, and matches between residues are indicated by dots. Regions of similarity between these sequences show up as diagonal runs across the plot.
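
The principle behind a dot plot can be sketched as a text grid. The version below is a simplified illustration that counts identical residues in a sliding window; the function name dot_plot and the parameters window and min_matches are assumptions for this sketch and do not correspond to the settings of the web tool, which scores windows with a substitution matrix.

```python
def dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Return text rows with '*' where windows of seq_x and seq_y match well enough.

    seq_x runs along the horizontal axis (columns) and seq_y along the
    vertical axis (rows).
    """
    rows = []
    for j in range(len(seq_y) - window + 1):
        row = []
        for i in range(len(seq_x) - window + 1):
            # Count identical residues between the two windows
            matches = sum(
                1 for k in range(window) if seq_x[i + k] == seq_y[j + k]
            )
            row.append("*" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows


# Toy usage: the shared segment WGKV produces a short diagonal run
for line in dot_plot("NVWGKVEA", "ALWGKVNV", window=2, min_matches=2):
    print(line)
```

A larger window with a stricter threshold removes isolated chance matches and leaves only the diagonal runs, at the cost of missing short or weak similarities.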

Question 5: How does the dot plot compare to the global alignment you generated above?