---
title: "Module 10 Workshop"
output: html_notebook
---

# Overview

In today's workshop we will be looking to:

- Explore the Protein Data Bank (PDB) and NCBI Structure databases.
- Use a protein structure prediction system to determine structures of some unknowns.

# Protein structural databases

As we've discussed, there are three primary ways we can determine the 3D (tertiary/quaternary) structure of biological molecules, typically proteins:

- X-ray crystallography
- Nuclear magnetic resonance (NMR)
- Cryogenic electron microscopy (Cryo-EM)

Each technique has its pros and cons, such as how well it captures the dynamic states of molecules. NMR and Cryo-EM are well suited to studying how dynamic a structure is, whereas X-ray crystallography is poorly suited to this purpose. The trade-off is often resolution: most crystal structures have considerably higher resolution, although advances in NMR and Cryo-EM methods are narrowing the gap.

## Protein data bank (PDB)

Navigate to: http://www.rcsb.org

Let's go ahead and check out the "November Molecule of the Month"! Feel free to read through a bit about this enzyme. Let's navigate to the entry 6VZ8 (the plant acetohydroxyacid synthase complex).

Let's look at the entry's information.

---

*Question*

What method was used to generate this structure?

---

A few key variables should be assessed for any structure prior to use.

- *Resolution* - What is the minimum distance at which two atoms in the structure can be reliably resolved, on average?
- *wwPDB* validation - How do these scores rank relative to other structures?

There are many more parameters in the full report, but these are a good starting point for any general inquiry. If the validation scores are poor or the resolution value is large (resolution is reported in Angstroms, and a larger value means a coarser structure), proceed with caution.
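The entry page shows the method and resolution directly, but the same two values can be read from a downloaded file. The following is a minimal sketch, assuming a legacy PDB-format file in which the method is stored in the `EXPDTA` record and the resolution in `REMARK   2`. The header lines are made up for illustration and are not taken from 6VZ8, and the function name `summarize_header` is hypothetical. Very large entries may only be distributed in mmCIF format, which this sketch does not handle.

```python
import re

# Made-up header lines in legacy PDB format, for illustration only.
# They are NOT taken from entry 6VZ8.
SAMPLE_HEADER = """\
HEADER    HYPOTHETICAL PROTEIN                    01-JAN-00   XXXX
EXPDTA    X-RAY DIFFRACTION
REMARK   2
REMARK   2 RESOLUTION.    2.10 ANGSTROMS.
"""


def summarize_header(pdb_text):
    """Return (method, resolution) from PDB-format text.

    The resolution is a float in Angstroms, or None if no number is given.
    """
    method = None
    resolution = None
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # The experimental technique follows the record name.
            method = line[6:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+(?:\.\d+)?)\s+ANGSTROMS", line)
            if match:  # NMR entries report "NOT APPLICABLE" instead of a number
                resolution = float(match.group(1))
    return method, resolution


method, resolution = summarize_header(SAMPLE_HEADER)
print(method, resolution)
```

Returning `None` for a missing resolution, rather than 0, keeps NMR-style entries from looking like perfectly resolved structures.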

We can also visualize the data in a web-based structure viewer. Alternatively, you can download the PDB file and open it with any number of locally installed viewer options. A traditional viewer is RasMol.

Use your mouse/trackpad to move this structure around. Quite beautiful, no? The viewer defaults to a cartoon depiction. This should seem somewhat intuitive since it is a common depiction in a lot of literature. We can modify this in the settings window, under Components. Here, we can add, remove, or change visualized components. You will need to play with these settings to develop some intuition for them and for the associated menu.

Now, let's hover our pointer over some of the subunits. You'll note that you can highlight specific amino acids on each subunit. As you hover, the bottom right of the viewer shows which peptide chain each amino acid belongs to. This can be very important in working out how genes/proteins correspond to the tertiary/quaternary structure of a multi-subunit protein.
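The chain assignment the viewer shows on hover is also stored in the structure file itself. As a rough sketch, assuming legacy PDB-format `ATOM` records (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27), the residues in each chain can be tallied like this. The sample records are invented and truncated after the residue number, and `residues_per_chain` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented ATOM/HETATM records, truncated after the residue number
# (coordinate columns omitted). Illustration only, not from 6VZ8.
SAMPLE_ATOMS = """\
ATOM      1  N   MET A   1
ATOM      2  CA  MET A   1
ATOM      3  N   SER A   2
ATOM      4  CA  SER A   2
ATOM      5  N   GLY B   1
ATOM      6  CA  GLY B   1
HETATM    7  O   HOH B 101
"""


def residues_per_chain(pdb_text):
    """Count distinct residues per chain using ATOM records only."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chain_id = line[21]      # column 22 (1-based)
            residue_id = line[22:27]  # residue number plus insertion code
            residues[chain_id].add(residue_id)
    return {chain: len(ids) for chain, ids in sorted(residues.items())}


print(residues_per_chain(SAMPLE_ATOMS))
```

`HETATM` records (waters, ligands) are skipped on purpose so that the counts reflect polymer residues only.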

Finally, we can use our viewer to produce nice animations of the structure based on the settings we have made in the Components menu. These are often nice for presentations or your own viewing.

## NCBI Structure database

Navigate to: https://www.ncbi.nlm.nih.gov/Structure/index.shtml

NCBI is also a great resource for protein structures, though most of its content is redundant with PDB. However, the NCBI Structure database is strongly integrated with other NCBI tools like BLAST, as well as with the search integration we have seen in environments like R, Python, etc.

Go ahead and search for: 6VZ8

Like PDB, NCBI provides a 3D viewer. However, NCBI also provides some additional information, such as domains and an interaction network. These can help you better understand the function of each subunit of a protein and how that function relates to its structure.
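The same search can, in principle, be scripted through NCBI's E-utilities. The sketch below only builds the request URL and does not contact NCBI. It assumes the ESearch endpoint with `db=structure`, and the helper name `structure_search_url` is hypothetical. Check the current E-utilities documentation and usage guidelines before sending requests.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def structure_search_url(term):
    """Build (but do not send) an ESearch query against the Structure database."""
    query = urlencode({"db": "structure", "term": term, "retmode": "json"})
    return f"{EUTILS_ESEARCH}?{query}"


print(structure_search_url("6VZ8"))
```

Opening the printed URL in a browser, or fetching it with an HTTP client, should return the matching record identifiers, which can then be passed to other E-utilities calls.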

# Structure prediction

Navigate to: https://robetta.bakerlab.org/

Use your username and password to log in. Alternatively, register yourself with Robetta so you can proceed.

Robetta is an interface to a number of protein structure tools developed by David Baker's lab at the University of Washington in Seattle, WA. 

Let's first look at the Example Results. You can see a 3D rendering of the structure and typically 4 or 5 additional model outputs for your protein. Each model comes with a confidence score (1=awesome, 0=terrible). Error estimates over the length of the predicted peptide are also provided: low numbers indicate good predictions, high numbers less reliable ones. How you interpret these values is subjective and depends on the problem being addressed.

Let's go ahead and submit a job.

Submit the following protein sequence for prediction:

    >tr|B5XXV1|B5XXV1_KLEP3 Regulatory protein SoxS OS=Klebsiella pneumoniae (strain 342) OX=507522 GN=soxS PE=4 SV=1
    MSHQDIIQTLIEWIDEHIDQPLNIDIVARKSGYSKWYLQRMFRTVMHQTLGDYIRQRRLL
    LAAEALRTTQRPIFDIAMDLGYVSQQTFSRVFRREFDRTPSDYRHQISA
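Before submitting, it can be worth checking that the record is well formed. Here is a small sketch, assuming a single-record FASTA input and the 20 standard amino acid letters. The helper name `parse_fasta_record` is hypothetical and this is not part of Robetta.

```python
# The SoxS record from above, copied verbatim.
FASTA_RECORD = """\
>tr|B5XXV1|B5XXV1_KLEP3 Regulatory protein SoxS OS=Klebsiella pneumoniae (strain 342) OX=507522 GN=soxS PE=4 SV=1
MSHQDIIQTLIEWIDEHIDQPLNIDIVARKSGYSKWYLQRMFRTVMHQTLGDYIRQRRLL
LAAEALRTTQRPIFDIAMDLGYVSQQTFSRVFRREFDRTPSDYRHQISA
"""

# One-letter codes for the 20 standard amino acids.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_record(text):
    """Split a single FASTA record into (header, sequence) and validate it."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    # Sequence lines are wrapped in the file; join them into one string.
    sequence = "".join(lines[1:]).upper()
    unexpected = set(sequence) - VALID_RESIDUES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return header, sequence


header, sequence = parse_fasta_record(FASTA_RECORD)
accession = header.split("|")[1]  # UniProt-style header: db|accession|entry name
print(accession, len(sequence))
```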

You will see a number of options to select below. For now, we'll stick with RoseTTAFold, an integrated, homology-based predictive algorithm.

What does it do?

- Searches for known homologs in sequence databases like NCBI and for the corresponding structural information in databases like PDB. This is an HMM-based approach.
- Reconstructs the structure from "known" pieces in those databases and performs ab initio folding to fill in the gaps.
- These different modes can be altered based on input settings (CM vs AB settings).

Let's take a look at some of my outputs. Search for "lchubiz" in the queue.

Take a look at the MarA, MarA-Rob, and MarA-Rob modified structures. The MarA-Rob structure is a fusion between the N-terminal and C-terminal regions of two regulatory proteins in E. coli (both have known structures). The prediction for this fusion is reasonably accurate. To modify this structure, I changed a number of prolines in the primary AA sequence to glycines. This is a traditional experimental method for disrupting structures. Let's look closely at the predicted effects.
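If you want to try a similar edit yourself, the substitution is easy to script. This is only a sketch: the MarA-Rob fusion sequence is not given here, so the SoxS sequence from the submission step stands in for it, and the helper `substitute_residue` is hypothetical rather than the procedure actually used for the models in the queue.

```python
def substitute_residue(sequence, old="P", new="G", positions=None):
    """Replace `old` with `new` in a protein sequence.

    `positions` is an optional set of 1-based positions to restrict the
    change to; None means every occurrence of `old` is replaced.
    """
    residues = list(sequence)
    for index, residue in enumerate(residues, start=1):
        if residue == old and (positions is None or index in positions):
            residues[index - 1] = new
    return "".join(residues)


# Stand-in sequence: SoxS from the submission step, NOT the MarA-Rob fusion.
original = (
    "MSHQDIIQTLIEWIDEHIDQPLNIDIVARKSGYSKWYLQRMFRTVMHQTLGDYIRQRRLL"
    "LAAEALRTTQRPIFDIAMDLGYVSQQTFSRVFRREFDRTPSDYRHQISA"
)
mutant = substitute_residue(original)

# List the 1-based positions that differ between the two sequences.
changed = [
    i for i, (a, b) in enumerate(zip(original, mutant), start=1) if a != b
]
print(changed)
```

Submitting both `original` and `mutant` as separate jobs gives a side-by-side comparison of the predicted models.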

---

*Question*

Did changing P to G make any difference in the MarA-Rob structure prediction? Why do you think this is the case? (Hint: Think about how RoseTTAFold is determining the structure.)

---



