In today’s workshop we will be looking to:
There are a wide variety of ways to represent RNA secondary structure. We have already seen a few in this Module.
These are all useful in a variety of contexts. However, if we need to store predicted structures in a database or compare them systematically, most of these formats are not well suited to the task. The most useful is the dot-bracket text representation of RNA structure. This notation is sometimes referred to as Vienna format. Read more about dot-bracket notation here.
In dot-bracket notation, we use dots (periods) to indicate that a
nucleotide in the molecule is unpaired and brackets
((), [], {}, <>,
and so on) to indicate that specific nucleotides are base-paired together.
While less visually appealing than other representations, dot-bracket
(or Vienna) notation enables us to rapidly compare structural
predictions in a systematic way using alignment algorithms and can be
easily stored in a database as flat text.
Understanding dot-bracket notation is best served by example. Consider the two structures below.
>Hairpin
AAAGGCCUUCCGGAAGAUGUUGGAAGGCCAAA
...((((((((((......))))))))))...
>Pseudoknot
GAAUUCCGGUCGACUCCGGAGAAACAAAGUCAA
....((((([[[[[.)))))]......]]]]..
Note that for both structures, the number of left and right brackets is balanced. This is critical for this representation. For the hairpin, we can use just one bracket type since there is no ambiguity about the structure and its relationship to nucleotides upstream or downstream. This is the output of most RNA folding algorithms (more on this in a bit). However, RNA can adopt far more complicated structures, such as pseudoknots.
One might imagine representing these twisted structures is a real
challenge. However, the dot-bracket notation does an excellent job with
these structures. We simply need to add more bracket characters to make
distinct relationships between nucleotides internal to one structure
with nucleotides upstream or downstream. Here, we’re using
() and [] to denote two sets of distinct
structural relationships between nucleotides.
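The bracket-matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming one stack per bracket type; the function name `parse_dot_bracket` is hypothetical and is not part of Forna or ViennaRNA.

```python
# Bracket types recognized in this sketch; each opening character maps to its closer.
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in BRACKETS.items()}

def parse_dot_bracket(structure):
    """Return a sorted list of 1-based (i, j) base pairs.

    Raises ValueError if any bracket is left unmatched.
    """
    stacks = {open_: [] for open_ in BRACKETS}  # one stack per bracket type
    pairs = []
    for pos, char in enumerate(structure, start=1):
        if char == ".":
            continue  # unpaired nucleotide
        if char in BRACKETS:
            stacks[char].append(pos)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched '{char}' at position {pos}")
            pairs.append((stack.pop(), pos))
        else:
            raise ValueError(f"unexpected character '{char}' at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{open_}' at position {stack[-1]}")
    return sorted(pairs)

# The two example structures from this section.
hairpin = "...((((((((((......))))))))))..."
pseudoknot = "....((((([[[[[.)))))]......]]]].."
print(len(parse_dot_bracket(hairpin)), "pairs in the hairpin")
print(parse_dot_bracket(pseudoknot))
```

Keeping a separate stack for each bracket type is what lets the `[]` pairs of the pseudoknot cross the `()` pairs without being mistaken for an imbalance.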
There are other text formats used to store RNA structure information, such as Ct; however, we will focus our attention on dot-bracket given its visual nature. Ct files store additional information about base pairing that is useful for some other RNA structure software.
The utility of the dot-bracket notation is the ability to convert this flat text into visually appealing representations of RNA structure. Let’s take a look at some tools to perform this task.
Two useful tools (among many) are Forna and RNAstructure.
We will use Forna to help us get a feel for the two structures above. RNAstructure is a more comprehensive program for editing RNA structure images, should this be of use in your future work.
Input the hairpin and pseudoknot structures into Forna, and let us take a look.
Question 1: Input the following structure a labmate gave you into Forna.
>structure
AAGGGGCCAAAAAAGGCCCCUA
..((((((......))))))).
Why is this not working? What is wrong with this structure?
Predicting RNA secondary structure is a mature computational problem but is not without problems. Let us recall some of the parameters and methods used in RNA folding algorithms.
These components allow for robust prediction of highly favorable structures. However, because the dynamic programming algorithm used to find MFE structures works along the linear sequence, many complex structures cannot be predicted reliably. Additionally, many longer-range interactions are challenging to predict because the dynamic programming algorithm favors local interactions first during folding.
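To make the dynamic programming idea concrete, here is a toy sketch in Python. It is a Nussinov-style recursion that maximizes the number of base pairs, which is a deliberate simplification and not the energy minimization performed by mfold or RNAfold. The function name, the table of allowed pairs, and the minimum loop size of 3 are assumptions chosen for illustration.

```python
# Pairs allowed in this toy model: Watson-Crick plus G-U wobble (an assumption).
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for seq using base-pair maximization."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of nested pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in CAN_PAIR:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Traceback: recover one structure that achieves the optimal count.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue  # interval too short to hold a pair
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CAN_PAIR:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)

print(nussinov("GGGAAACCC"))
```

Each subproblem covers one contiguous interval that is split into non-overlapping pieces, so only nested pairs can ever be produced. This is one way to see why pseudoknots are out of reach for this style of algorithm.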
This exercise is adapted from Dr. Wendy Olivas.
R2 elements are a class of retrotransposons that are found in most arthropods 1. During retrotransposition, the 3’ UTR of the message RNA is specifically recognized by the reverse transcriptase during target-primed reverse transcription 2 3. The secondary structure of the 3’ UTR was predicted for Drosophila with comparative sequence analysis of 10 sequences 4. The sequence of the R2 element from D. sucinea, which can adopt the comparative analysis structure, was later determined 5. This sequence has been chosen for this example because it has a known secondary structure and the prediction of this secondary structure by free energy minimization is less accurate than average, so that the usefulness of color annotation is demonstrated 6 7.
>R2 UTR
UGAUCUCUGUAUUUGUUUCUAUUUUGAACAUUUGCCUGCUACCUUGGCAUAACAUCAAUAAGGUACAAACAUCGCAAAAAGUCAUCAUAAGGUGGGUUUUAGUACGUAGGCGCUGUAGAACUUAAUUGUUCUGAUAGAGCAGCGAGUCGUGCAUGCUAGUCUAGCAUUUCUUGCUACCUAGUAUCUUUAGAAGAUUUCCCUCCCUUAGCGGUCAAA
Access the UNAFold web server,
navigate to the mfold input page, choose RNA folding form
from the left, and paste the sucinea R2 element sequence into the large
input sequence field. Scroll to the
bottom of the web page, to the section marked “Choose structure
annotation.” Select the button after “p-num” to choose a color
annotation that reflects how well determined the base pairs are. Keep the
default settings for all other fields. Note, however, that there are
links to a help page with an explanation of each user-definable
setting.
Click the “Fold RNA” button at the bottom of the form. This sequence is short enough that the default immediate job can be performed, so the Web browser will move quickly to the results page. The results remain available on the server for 24 hours. Note that the energy dot plot can be viewed by following a hyperlink under Output. Furthermore, a zip or tar file can be downloaded that contains all the predicted structures. On the results page, view the first individual structure by clicking jpg under Structure 1.
Question 2: In the color-coding scheme, which color means that the base-pair has the highest probability? Which color corresponds to the lowest probability?
Now, let’s use another folding algorithm called RNAfold
that is part of the ViennaRNA Suite 8. ViennaRNA is a comprehensive set of RNA
analysis tools that is an excellent starting point for analyzing any
RNA structure. In fact, we’ve already used a tool from it:
Forna.
Go to ViennaRNA Web
Services. Proceed to RNAfold and paste the sucinea R2
element sequence in the input box. Scroll to the bottom and click on
Proceed to generate the prediction.
Question 3: Are there similarities between the structures predicted by mfold and RNAfold?
Question 4: How do the predicted structures compare to the structure shown above? Provide a descriptive comparison (i.e., which parts are most similar or most different?).
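Alongside a visual comparison, two dot-bracket predictions for the same sequence can be compared by counting the base pairs they share. The sketch below assumes balanced structures that use only `(` and `)`. The function names and the two toy structures are hypothetical and are not real mfold or RNAfold output.

```python
def base_pairs(structure):
    """Collect 1-based (i, j) pairs from a balanced dot-bracket string using '(' and ')'."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.add((stack.pop(), pos))
    return pairs

def compare_structures(a, b):
    """Return counts of (shared, only in a, only in b) base pairs."""
    if len(a) != len(b):
        raise ValueError("structures must describe the same sequence length")
    pairs_a, pairs_b = base_pairs(a), base_pairs(b)
    return len(pairs_a & pairs_b), len(pairs_a - pairs_b), len(pairs_b - pairs_a)

# Hypothetical toy predictions for one 12-nt sequence.
pred_a = "((((....))))"
pred_b = ".(((....)))."
print(compare_structures(pred_a, pred_b))
```

To use it for this exercise, paste the dot-bracket strings reported by the two servers in place of the toy structures.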
There are a variety of other tools to help you with RNA structure prediction or with calculating RNA structure free energies. Explore the ViennaRNA Web Services to learn more.
Simple RNA folding can provide insights into very strong, obvious
structures. A drawback, as we’ve seen, is that subtle differences in
algorithm, folding parameters (stacking energies, etc.), and sequence
length can strongly influence outcomes. For this reason, we often need
to constrain the RNA folding algorithm. As we saw in mfold,
constraints can be added manually based on interactions that have been
defined experimentally.
An alternative to using experimental data to constrain folding is to use comparative methods. The underlying hypothesis of comparative RNA folding is that functionally important RNA structures should be preserved. That means, in part, that the RNA sequences forming these structures should remain conserved. Let’s see if we can make sense of this with a real-world example.
Our first example will be to fold some tRNA. Below is a multiple
sequence alignment we can enter into the RNAalifold
program.
CLUSTAL W (1.83) multiple sequence alignment

Seq1            GGGCCUGUAGCUCAGAGGAUUAGAGCACGUGGCUACGAACCACGGUGUCGGGGGUUCGAA
Seq2            GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAA
Seq3            GGCGCCGUGGCGCAGUGGA--AGCGCGCAGGGCUCAUAACCCUGAUGUCCUCGGAUCGAA
Seq5            GCGUUGGUGGUAUAGUGGUG-AGCAUAGCUGCCUUCCAAGCA-GUUGACCCGGGUUCGAU
Seq4            ACUCCCUUAGUAUAAUU----AAUAUAACUGACUUCCAAUUA-GUAGAUUCUGAAU-AAA
                       * *   *       *          **   **    *  *     *  *  *

Seq1            UCCCUCCUCGCCCA
Seq2            UUCAGCAUAGCCCA
Seq3            ACCGAGCGGCGCUA
Seq5            UCCCGGCCAACGCA
Seq4            CCCAGAAGAGAGUA
                  *          *
Paste this alignment text into the RNAalifold window and
click Proceed. A consensus structure will be provided. We can see the
constraints of sequence alignment helped predict the multiple hairpin
cloverleaf structure found in tRNA.
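The asterisks under each Clustal block mark columns where every sequence carries the same nucleotide. As a rough illustration of that rule, the Python sketch below recomputes the marks for the tRNA alignment above, with the two blocks joined for each sequence. The function name `conservation_line` is hypothetical, and the sketch only reproduces the `*` marks. It is not a reimplementation of Clustal or RNAalifold.

```python
# The tRNA alignment from above; the two Clustal blocks are concatenated per sequence.
alignment = {
    "Seq1": "GGGCCUGUAGCUCAGAGGAUUAGAGCACGUGGCUACGAACCACGGUGUCGGGGGUUCGAA" "UCCCUCCUCGCCCA",
    "Seq2": "GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAA" "UUCAGCAUAGCCCA",
    "Seq3": "GGCGCCGUGGCGCAGUGGA--AGCGCGCAGGGCUCAUAACCCUGAUGUCCUCGGAUCGAA" "ACCGAGCGGCGCUA",
    "Seq5": "GCGUUGGUGGUAUAGUGGUG-AGCAUAGCUGCCUUCCAAGCA-GUUGACCCGGGUUCGAU" "UCCCGGCCAACGCA",
    "Seq4": "ACUCCCUUAGUAUAAUU----AAUAUAACUGACUUCCAAUUA-GUAGAUUCUGAAU-AAA" "CCCAGAAGAGAGUA",
}

def conservation_line(seqs):
    """Mark a column '*' when every sequence has the same non-gap residue."""
    marks = []
    for column in zip(*seqs):  # iterate over alignment columns
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)

line = conservation_line(list(alignment.values()))
print(line)
print(line.count("*"), "fully conserved columns out of", len(line))
```

Only a small fraction of columns are identical across all five sequences, yet the alignment still supports a shared fold. This suggests that the pairing pattern, rather than strict sequence identity, carries much of the signal.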
Now, a more challenging example. The sequences below are for an important regulatory element in the lentiviruses HIV and SIV called the Rev response element, or RRE. The sequences have the same function in each virus, and we hypothesize that they may have similar structures.
>HIV-1
AGACCCAACAACAAUACAAGAAAAAGAAUCCGUAUCCAGAGAGGACCAGGGAGAGCAUUUGUUACAAUAGGAAAAAUAGGAAAUAUGAGACAAGCACAUUGUAACAUUAGUAGAGCAAAAUGGAAUAACACUUUAAAACAGAUAGCUAGCAAAUUAAGAGAACAAUUUGGAAAUAAUAAAACAAUAAUCUUUAAGCAAUCCUCAGGAGGGGACCCAGAAAUUGUAACGCACAG
>HIV-2
GUGCUAGGGUUCUUGGGUUUUCUCGCGACAGCAGGUUCUGCAAUGGGCGCGCGGUCCCUGACGCUGUCAGCCCAGUCCCGGACUUUACUGGCCGGGAUAGUGCAGCAACAGCAACAGCUGUUGGACGUAGUCAAGAGACAACAAGAAAUGUUGCGACUGACCGUCUGGGGAACGAAAAACCUCCAGGCAAGAGUCACUGCUAUCGAGAAGUACCUAAAGCAUCAGGCAC
>SIV
UCGUGCUAGGGUUUCUAGGCUUCUUGGGAGCUGCUGGAACUGCAAUGGGCGCAGCGGCAACAACGCUGACAGUCCAGUCUCGGCAUUUGCUUGCUGGGAUAUUGCAGCAGCAGAAGAACUUGCUGGCGGCUGUGGAACAGCAACAACAGUUGUUGAAGCUGACCAUUUGGGGUGUGAAAAACCUCAAUGCCCGCGUCACAGCUCUCGAGAAGUACCUAGAGGAUCAGGCACGG
Question 5: Fold each sequence using RNAfold.
Are there any similarities between the HIV-1, HIV-2, and SIV
RREs?
To help determine if the sequences share a common structure, it may help to identify regions of high similarity and predict the structure of just those regions. We will do this by performing a multiple sequence alignment on these RNA sequences.
Go to Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and enter the three sequences. Make sure to select that you are entering RNA sequences. Use the program with default parameters to identify any regions of similarity.
Copy the entire Clustal output (like the example above) and enter
this into RNAalifold.
Question 6: How does the structure predicted by
RNAalifold compare to the RNAfold
structures?
From our analysis, it should be somewhat apparent that HIV-2 and SIV
are much more closely related to each other than either is to HIV-1. We can use this observation to guide a more
tailored alignment and structure estimate. In addition to
RNAalifold, there is another consensus folding program
called RNAz. The RNAz algorithm follows an
iterative fold-align method that is seeded with a multiple sequence
alignment.
Let’s go ahead and rerun our Clustal alignment with just HIV-2 and
SIV. In the ViennaRNA web services, select RNAz and input
the Clustal alignment as before. Proceed with all standard settings.
We should see a set of smaller MFE structures that make up the larger RRE RNA structure.
There are a wide variety of applications where we need to predict the thermodynamics of RNA-RNA interactions. RNA folding is one. Examining the ViennaRNA suite we see there are myriad other applications where the same algorithms are useful.
As with many bioinformatic methods, it is important to remember that these RNA structures are predictions. They can serve as a guide for forming hypotheses. Validating these predictions is the hard work of experimental science.
1. Eickbush, TH (2002). In Mobile DNA II (Craig, NL, Craigie, R, Gellert, M, and Lambowitz, AM, eds).
2. Luan, DD et al. (1993). Cell 72, 595-605.
3. Luan, DD and Eickbush, TH (1995). Mol. Cell. Biol. 15, 3882-3891.
4. Mathews, DH et al. (1997). RNA 3, 1-16.
5. Lathe, WC and Eickbush, TH (1997). Mol. Biol. Evol. 14, 1232-1241.
6. Zuker, M and Jacobson, AB (1995). Nucl. Acids Res. 23, 2791-2798.
7. Zuker, M and Jacobson, AB (1998). RNA 4, 669-679.
8. Lorenz et al. (2011). Algorithms for Molecular Biology 6, 26.