Overview

In today’s workshop we will be looking to:

Use online resources for RNA structure prediction
Decipher dot-bracket notation to represent RNA structure
Use a multiple sequence alignment (MSA) program to perform MSA of nucleotide sequences

Representing RNA structure as text

There are a wide variety of ways to represent RNA secondary structure. We have already seen a few in this Module.

Conventional interaction diagrams
Mountain plots
Circle plots
Dot-bracket text
Ct text

These are all useful in a variety of contexts. However, if we need to store predicted structures in a database or make systematic comparisons, most of these formats are poorly suited to the task. The most useful is the dot-bracket text representation of RNA structure. This notation is sometimes referred to as Vienna format. Read more about dot-bracket notation here.

In dot-bracket notation, we use dots (periods) to indicate that a nucleotide in the molecule is unpaired and brackets ((),[],{},<>, and so on) to indicate that specific nucleotides are base-paired together. While less visually appealing than other representations, dot-bracket (or Vienna) notation enables us to rapidly compare structural predictions in a systematic way using alignment algorithms, and it can be easily stored in a database as flat text.

Understanding dot-bracket notation is best served by example. Consider the two structures below.

>Hairpin
AAAGGCCUUCCGGAAGAUGUUGGAAGGCCAAA
...((((((((((......))))))))))...

>Pseudoknot
GAAUUCCGGUCGACUCCGGAGAAACAAAGUCAA
....((((([[[[[.)))))]......]]]]..

Note that for both structures, the number of left and right brackets is balanced. This is critical for this representation. For the hairpin, we can use just one bracket character since there is no ambiguity about the structure and its relationship to nucleotides upstream or downstream. This is the output of most RNA folding algorithms (more on this in a bit). However, RNA can adopt far more complicated structures like pseudoknots.

One might imagine representing these twisted structures is a real challenge. However, the dot-bracket notation does an excellent job with these structures. We simply need to add more bracket characters to make distinct relationships between nucleotides internal to one structure with nucleotides upstream or downstream. Here, we’re using () and [] to denote two sets of distinct structural relationships between nucleotides.
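
To make the bracket bookkeeping concrete, here is a minimal Python sketch that turns a dot-bracket string into a list of base pairs. It is an illustration rather than the method any particular tool uses: the function name parse_pairs and the choice to keep one stack per bracket family are assumptions made for this example.

```python
# Bracket families allowed in this sketch; each family gets its own stack,
# which is what lets () and [] cross each other in a pseudoknot.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_pairs(structure):
    """Return a sorted list of 1-based (i, j) base pairs from dot-bracket text."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, char in enumerate(structure, start=1):
        if char in stacks:                  # opening bracket: remember its position
            stacks[char].append(pos)
        elif char in BRACKETS:              # closing bracket: pair with the latest opener
            opener = BRACKETS[char]
            if not stacks[opener]:
                raise ValueError(f"unmatched '{char}' at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character '{char}' at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{opener}' at position {stack[-1]}")
    return sorted(pairs)


hairpin = "...((((((((((......))))))))))..."
pseudoknot = "....((((([[[[[.)))))]......]]]].."

print(parse_pairs(hairpin))
print(parse_pairs(pseudoknot))
```

In the pseudoknot output, pairs such as (9, 16) and (10, 31) cross each other, which is exactly why a second bracket character is needed.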

There are other text formats used to store RNA structure information, such as Ct; however, we will focus our attention on dot-bracket given its visual nature. Ct files store additional information about base-pairing that is useful for some other RNA structure software.
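
For comparison, the sketch below converts the hairpin example into Ct-style lines. It assumes the commonly documented layout: a header line with the sequence length and a title, then one line per nucleotide holding its index, the base, the previous index, the next index, the index of its pairing partner (0 if unpaired), and the original numbering. The function name, the tab separator, and the restriction to () brackets are choices made for this example; check the documentation of the program you are targeting before relying on the details.

```python
def dot_bracket_to_ct(name, sequence, structure):
    """Build Ct-style text lines from a sequence and a '()'-only structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    n = len(sequence)
    partner = [0] * (n + 1)                 # 1-based; 0 means unpaired
    stack = []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            opener = stack.pop()
            partner[opener], partner[pos] = pos, opener
        elif char != ".":
            raise ValueError(f"unsupported character '{char}' at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")

    lines = [f"{n}\t{name}"]                # header: length and title
    for pos, base in enumerate(sequence, start=1):
        next_index = pos + 1 if pos < n else 0   # assumed: 0 marks the last base
        lines.append(f"{pos}\t{base}\t{pos - 1}\t{next_index}\t{partner[pos]}\t{pos}")
    return lines


ct = dot_bracket_to_ct(
    "Hairpin",
    "AAAGGCCUUCCGGAAGAUGUUGGAAGGCCAAA",
    "...((((((((((......))))))))))...",
)
print("\n".join(ct[:6]))
```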

Using RNA text representations to make other visual representations

The utility of the dot-bracket notation is the ability to convert this flat text into visually appealing representations of RNA structure. Let’s take a look at some tools to perform this task.

Two useful tools (among many) are:

RNAstructure is a stand-alone program you will need to install on your computer
Forna is a web-based utility that is simple to use

We will use Forna to help us get a feel for the two structures above. RNAstructure is a more comprehensive program for editing RNA structure images, should this be of use in your future work.

Input the hairpin and pseudoknot structures into Forna, and let us take a look.

Question 1: Input the following structure a labmate gave you into Forna.

>structure
AAGGGGCCAAAAAAGGCCCCUA
..((((((......))))))).

Why is this not working? What is wrong with this structure?

Basics of RNA folding

Predicting RNA secondary structure is a mature computational problem, but it still has limitations. Let us recall some of the parameters and methods used in RNA folding algorithms.

Nucleotide stacking energies to predict interactions
Loop and bulge energetic costs
RNA molecules are folded as a contiguous string of nucleotides
Dynamic programming algorithm (much like sequence alignment) used to predict minimum free energy (MFE) structures
RNA folding algorithms attempt to minimize the free energy of a structure

These components allow for robust prediction of highly favorable structures. However, due to the linear nature of the dynamic programming algorithm that finds MFE structures, many complex structures cannot be predicted reliably. Additionally, many longer-range interactions are challenging to predict because the dynamic programming algorithm favors local interactions first during folding.
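
To show what a dynamic programming recursion over a contiguous string of nucleotides looks like, here is a small Python sketch of the Nussinov base-pair maximization algorithm. This is a deliberately simplified stand-in: it counts base pairs instead of summing stacking and loop energies, so it is not the energy model used by mfold or RNAfold. The function name, the minimum loop size of 3, and the toy sequence are assumptions for illustration. Like the MFE algorithms described above, it can only produce nested structures, so pseudoknots are out of reach.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(sequence, min_loop=3):
    """Return (max_pairs, dot_bracket) for a nested structure that maximizes base pairs."""
    n = len(sequence)
    # best[i][j] = maximum number of pairs within positions i..j (0-based, inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                       # case 1: j is unpaired
            for k in range(i, j - min_loop):             # case 2: j pairs with k
                if (sequence[k], sequence[j]) in CANONICAL:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal solution, using an explicit stack instead of recursion.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (sequence[k], sequence[j]) in CANONICAL:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    max_pairs = best[0][n - 1] if n else 0
    return max_pairs, "".join(structure)


count, folded = nussinov("GGGAAAUCC")   # toy sequence, not from the exercise
print(count, folded)
```

Because every cell best[i][j] is built from shorter subsequences, the table fills in from local to long-range interactions, which mirrors the limitation described in the paragraph above.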

Using RNA folding software

This exercise is adapted from Dr. Wendy Olivas

Example 1

R2 elements are a class of retrotransposons that are found in most arthropods ¹. During retrotransposition, the 3’ UTR of the message RNA is specifically recognized by the reverse transcriptase during target-primed reverse transcription ² ³. The secondary structure of the 3’ UTR was predicted for Drosophila with comparative sequence analysis of 10 sequences ⁴. The sequence of the R2 element from D. sucinea, which can adopt the comparative analysis structure, was later determined ⁵. This sequence has been chosen for this example because it has a known secondary structure and the prediction of this secondary structure by free energy minimization is less accurate than average, so that the usefulness of color annotation is demonstrated ⁶ ⁷.

>R2 UTR
UGAUCUCUGUAUUUGUUUCUAUUUUGAACAUUUGCCUGCUACCUUGGCAUAACAUCAAUAAGGUACAAACAUCGCAAAAAGUCAUCAUAAGGUGGGUUUUAGUACGUAGGCGCUGUAGAACUUAAUUGUUCUGAUAGAGCAGCGAGUCGUGCAUGCUAGUCUAGCAUUUCUUGCUACCUAGUAUCUUUAGAAGAUUUCCCUCCCUUAGCGGUCAAA

Access the UNAFold web server, navigate to the mfold input page, choose RNA folding form from the left, and paste the D. sucinea R2 element sequence into the large input sequence field. Scroll to the bottom of the web page, to the section marked “Choose structure annotation.” Select the button after “p-num” to choose a color annotation that reflects how well determined base pairs are. Keep the default settings for all other fields. Note, however, that there are links to a help page with an explanation of each user-definable setting.

Click the “Fold RNA” button at the bottom of the form. This sequence is short enough that the default immediate job can be performed, so the Web browser will move quickly to the results page. The results remain available on the server for 24 hours. Note that the energy dot plot can be viewed by following a hyperlink under Output. Furthermore, a zip or tar file can be downloaded that contains all the predicted structures. On the results page, view the first individual structure by clicking jpg under Structure 1.

Question 2: In the color-coding scheme, which color means that the base-pair has the highest probability? Which color corresponds to the lowest probability?

Now, let’s use another folding algorithm called RNAfold that is part of the ViennaRNA Suite ⁸. ViennaRNA is a comprehensive set of RNA analysis tools that is an excellent starting point for analyzing any RNA structure. In fact, we’ve already used a tool from it: Forna.

Go to ViennaRNA Web Services. Proceed to RNAfold and paste the D. sucinea R2 element sequence in the input box. Scroll to the bottom and click on Proceed to generate the prediction.

Question 3: Are there similarities between the structures predicted by mfold and RNAfold?

Question 4: How do the predicted structures compare to the structure shown above? Provide a descriptive comparison (i.e., what parts are most similar/different?)

There are a variety of other tools to help you with RNA structure prediction or calculating RNA structure free energies. Explore the ViennaRNA Web Services to learn more.

Use of comparative methods to guide RNA folding

Simple RNA folding can provide insights into very strong, obvious structures. A drawback, as we’ve seen, is that subtle differences in algorithm, folding parameters (stacking energies, etc.), and sequence length can strongly influence outcomes. For this reason, we often need to constrain the RNA folding algorithm. As we saw in mfold, constraints can be added manually based on interactions that have been defined experimentally.

An alternative to experimental data to constrain folding is to use comparative methods. The underlying hypothesis with comparative RNA folding is that functionally important RNA structures should be preserved. That means, in part, the RNA sequences of these structures should remain conserved. Let’s see if we can make sense of this with a real-world example.
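
One signal comparative methods can look for is a base pair that stays intact across an alignment even when the nucleotides themselves change. The Python sketch below tallies, for each pair in a consensus structure, how many aligned sequences can form a canonical pair and which pair types occur. The alignment and structure here are made-up toy data, and the function name pair_support is hypothetical; this is not how RNAalifold scores alignments, only an illustration of the idea.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_support(alignment, structure):
    """For each '()' pair, return (i, j, canonical_count, pair_types) with 1-based columns."""
    stack, report = [], []
    for col, char in enumerate(structure):
        if char == "(":
            stack.append(col)
        elif char == ")":
            i = stack.pop()
            observed = [seq[i] + seq[col] for seq in alignment]
            canonical = [pair for pair in observed if pair in CANONICAL]
            report.append((i + 1, col + 1, len(canonical), sorted(set(canonical))))
    return sorted(report)


# Made-up toy alignment (three aligned sequences) and consensus structure.
toy_alignment = [
    "GGCAAAAGCC",
    "GACAAAAGUC",
    "CGCAAAAGCG",
]
toy_structure = "(((....)))"

for row in pair_support(toy_alignment, toy_structure):
    print(row)
```

Columns where all sequences pair but with more than one pair type, such as columns 2 and 9 in the toy data, are the ones that support a conserved structure most strongly.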

Our first example will be to fold some tRNA. Below is a multiple sequence alignment we can enter into the RNAalifold program.

CLUSTAL W (1.83) multiple sequence alignment

Seq1            GGGCCUGUAGCUCAGAGGAUUAGAGCACGUGGCUACGAACCACGGUGUCGGGGGUUCGAA
Seq2            GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAA
Seq3            GGCGCCGUGGCGCAGUGGA--AGCGCGCAGGGCUCAUAACCCUGAUGUCCUCGGAUCGAA
Seq5            GCGUUGGUGGUAUAGUGGUG-AGCAUAGCUGCCUUCCAAGCA-GUUGACCCGGGUUCGAU
Seq4            ACUCCCUUAGUAUAAUU----AAUAUAACUGACUUCCAAUUA-GUAGAUUCUGAAU-AAA
                       * *   *       *          **   **    *  *     *  *  * 

Seq1            UCCCUCCUCGCCCA
Seq2            UUCAGCAUAGCCCA
Seq3            ACCGAGCGGCGCUA
Seq5            UCCCGGCCAACGCA
Seq4            CCCAGAAGAGAGUA
                  *          *

Paste this alignment text into the RNAalifold window and click Proceed. A consensus structure will be provided. We can see the constraints of sequence alignment helped predict the multiple hairpin cloverleaf structure found in tRNA.
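
If you want to work with a Clustal alignment outside the web form, the blocks need to be joined back into one aligned string per sequence. The following Python sketch does this for text laid out like the alignment above. It assumes a header line beginning with CLUSTAL, sequence lines of the form name, whitespace, residues, and conservation lines that begin with whitespace. The function name and the short example alignment are made up for illustration.

```python
def parse_clustal(text):
    """Join the blocks of a Clustal-format alignment into {name: aligned_sequence}."""
    sequences = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines (which start with a space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


# Made-up two-block example in the same layout as the tRNA alignment.
example = """CLUSTAL W (1.83) multiple sequence alignment

SeqA            GGGCCUGUAG
SeqB            GGGCU--UAG
                ****   ***

SeqA            CUCA
SeqB            CUCG
                ***
"""

for name, seq in parse_clustal(example).items():
    print(name, seq, len(seq))
```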

Now, a more challenging example. The sequences below are for an important regulatory element in the lentiviruses HIV and SIV called the Rev response element, or RRE. The sequences have the same function in these viruses, and we hypothesize that they may have similar structures.

>HIV-1
AGACCCAACAACAAUACAAGAAAAAGAAUCCGUAUCCAGAGAGGACCAGGGAGAGCAUUUGUUACAAUAGGAAAAAUAGGAAAUAUGAGACAAGCACAUUGUAACAUUAGUAGAGCAAAAUGGAAUAACACUUUAAAACAGAUAGCUAGCAAAUUAAGAGAACAAUUUGGAAAUAAUAAAACAAUAAUCUUUAAGCAAUCCUCAGGAGGGGACCCAGAAAUUGUAACGCACAG
>HIV-2
GUGCUAGGGUUCUUGGGUUUUCUCGCGACAGCAGGUUCUGCAAUGGGCGCGCGGUCCCUGACGCUGUCAGCCCAGUCCCGGACUUUACUGGCCGGGAUAGUGCAGCAACAGCAACAGCUGUUGGACGUAGUCAAGAGACAACAAGAAAUGUUGCGACUGACCGUCUGGGGAACGAAAAACCUCCAGGCAAGAGUCACUGCUAUCGAGAAGUACCUAAAGCAUCAGGCAC
>SIV
UCGUGCUAGGGUUUCUAGGCUUCUUGGGAGCUGCUGGAACUGCAAUGGGCGCAGCGGCAACAACGCUGACAGUCCAGUCUCGGCAUUUGCUUGCUGGGAUAUUGCAGCAGCAGAAGAACUUGCUGGCGGCUGUGGAACAGCAACAACAGUUGUUGAAGCUGACCAUUUGGGGUGUGAAAAACCUCAAUGCCCGCGUCACAGCUCUCGAGAAGUACCUAGAGGAUCAGGCACGG

Question 5: Fold each sequence using RNAfold. Are there any similarities between the HIV-1, HIV-2, and SIV RREs?

To help determine if the sequences share a common structure, it may help to identify regions of high similarity and predict the structure of just those regions. We will do this by performing a multiple sequence alignment on these RNA sequences.

Go to Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and enter the three sequences. Make sure to select that you are entering RNA sequences. Use the program with default parameters to identify any regions of similarity.

Copy the entire Clustal output (like the example above) and enter this into RNAalifold.

Question 6: How does the structure predicted by RNAalifold compare to the RNAfold structures?

From our analysis, it should be somewhat apparent that HIV-2 and SIV are much more closely related to each other than either is to HIV-1. We can use this observation to guide a more tailored alignment and structure estimate. In addition to RNAalifold, there is another consensus folding program called RNAz. RNAz takes a multiple sequence alignment as input and evaluates it for thermodynamically stable and evolutionarily conserved secondary structure.
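
To put a number on how closely related aligned sequences are, one simple option is pairwise percent identity over the aligned columns. The Python sketch below uses made-up toy sequences, not the RRE alignment, and the decision to skip any column containing a gap is an assumption; other definitions of percent identity exist.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Made-up aligned toy sequences; substitute the rows of your own Clustal output.
toy_aligned = {
    "toy1": "GGCAUUGC-A",
    "toy2": "GGCAUAGCUA",
    "toy3": "AGC-UAGGUU",
}

for name_a, name_b in combinations(toy_aligned, 2):
    identity = percent_identity(toy_aligned[name_a], toy_aligned[name_b])
    print(name_a, name_b, round(identity, 1))
```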

Let’s go ahead and rerun our Clustal alignment with just HIV-2 and SIV. In the ViennaRNA web services, select RNAz and input the Clustal alignment as before. Proceed with all standard settings.

We should see a set of smaller MFE structures that make up the larger RNA RRE structure.

Wrapping up

There are a wide variety of applications where we need to predict the thermodynamics of RNA-RNA interactions. RNA folding is one. Examining the ViennaRNA suite we see there are myriad other applications where the same algorithms are useful.

Like many bioinformatic methods, it is important to remember that these RNA structures are predictions. They can serve as a guide for forming hypotheses. Validating these predictions is the hard work of experimental science.

1. Eickbush, TH (2002). In Mobile DNA II (Craig, NL, Craigie, R, Gellert, M, and Lambowitz, AM, eds).
2. Luan, DD et al. (1993). Cell 72, 595-605.
3. Luan, DD and Eickbush, TH (1995). Mol. Cell. Biol. 15, 3882-3891.
4. Mathews, DH et al. (1997). RNA 3, 1-16.
5. Lathe, WC and Eickbush, TH (1997). Mol. Biol. Evol. 14, 1232-1241.
6. Zuker, M and Jacobson, AB (1995). Nucl. Acids Res. 23, 2791-2798.
7. Zuker, M and Jacobson, AB (1998). RNA 4, 669-679.
8. Lorenz, R et al. (2011). Algorithms for Molecular Biology 6, 26.
