In today’s workshop we will be looking to:
As we’ve discussed, there are three primary ways we can determine the 3D (tertiary/quaternary) structure of biological molecules, typically proteins: X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (Cryo-EM).
Each technique has its pros and cons, such as how well it captures the dynamic states of molecules. NMR and Cryo-EM are suited to studying how dynamic a structure may be, whereas X-ray crystallography cannot really be used for this purpose. The trade-off is often resolution: most crystal structures have considerably higher resolution, although advances in NMR and Cryo-EM methods are narrowing the gap.
Navigate to: http://www.rcsb.org
Let’s go ahead and check out the “November Molecule of the Month”! Feel free to read a bit about this enzyme. Then navigate to the entry 6VZ8 (the plant acetohydroxyacid synthase complex).
Let’s look at the entry’s information. A few key variables should be assessed for any structure prior to use.
There are many more parameters in the full report, but these are a good starting point for any general inquiry. If the validation scores are poor or the resolution value is large (resolution is reported in Ångströms, and a larger number means coarser detail), proceed with caution.
Question 1: What method was used to generate 6VZ8 and what is the resolution?
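If you would rather pull the method and resolution programmatically than read them off the entry page, here is a minimal sketch. It assumes the RCSB Data API entry endpoint and the `exptl` / `rcsb_entry_info.resolution_combined` field names as I understand them; the function names and the example record are made up for illustration, so check the live output against the web page.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the RCSB Data API for entry-level records.
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def fetch_entry(pdb_id):
    """Download the entry-level JSON record for one PDB ID (needs network access)."""
    url = RCSB_ENTRY_URL.format(pdb_id=pdb_id.upper())
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize_entry(entry):
    """Pull the experimental method(s) and resolution(s) out of an entry record."""
    methods = [item.get("method") for item in entry.get("exptl", [])]
    info = entry.get("rcsb_entry_info", {})
    resolutions = info.get("resolution_combined") or []
    return {"methods": methods, "resolution_angstrom": list(resolutions)}


# Made-up record with the assumed shape, so the parser can be tried offline.
# These values are NOT those of 6VZ8.
example = {
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
    "rcsb_entry_info": {"resolution_combined": [2.0]},
}
print(summarize_entry(example))
```

With a network connection, `summarize_entry(fetch_entry("6VZ8"))` should give you the values to compare against your answer to Question 1.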
We can also visualize the data in a web-based structure viewer. Alternatively, you can download the PDB file and open it with any number of locally installed viewer options. A traditional viewer is RasMol.
Use your mouse/trackpad to move this structure around. Quite beautiful, no? The viewer defaults to a cartoon depiction. This should seem somewhat intuitive since it is a common depiction in a lot of literature. We can modify this in the settings window, under Components. Here, we can add, remove, or change visualized components. You will need to play with these settings to develop some intuition for them and the associated menu.
Now, let’s hover our pointer over some of the subunits. You’ll note you can highlight specific amino acids on each subunit. As you hover over these subunits, you can see in the bottom right which peptide chain these amino acids are associated with. This can be very important in working out how genes/proteins correspond to the tertiary/quaternary structure of a multi-subunit protein.
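The chain labels you see while hovering are also stored in the structure file itself. As a rough sketch, assuming you have the text of a PDB-format file in hand, the following counts residues per chain from the fixed-width ATOM records. The function names and the toy records are hypothetical, and the coordinates are placeholders.

```python
from collections import defaultdict


def residues_per_chain(pdb_text):
    """Count distinct residues per chain from the ATOM records of PDB-format text."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 26:
            chain_id = line[21]        # column 22: chain identifier
            residue_id = line[22:27]   # columns 23-27: residue number plus insertion code
            residues[chain_id].add(residue_id)
    return {chain: len(ids) for chain, ids in sorted(residues.items())}


def atom_line(serial, atom, residue, chain, number):
    """Build a simplified fixed-width ATOM record; coordinates are placeholders."""
    return (f"ATOM  {serial:>5} {atom:<4} {residue:>3} {chain}{number:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Toy input: two residues in chain A, one residue in chain B.
toy_pdb = "\n".join([
    atom_line(1, "N", "MET", "A", 1),
    atom_line(2, "CA", "MET", "A", 1),
    atom_line(3, "N", "SER", "A", 2),
    atom_line(4, "N", "MET", "B", 1),
])
print(residues_per_chain(toy_pdb))  # {'A': 2, 'B': 1}
```

For a real entry, read the downloaded file into a string and pass it to `residues_per_chain`. Very large complexes may only be distributed in mmCIF format, which this sketch does not handle.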
Finally, we can use our viewer to produce nice animations of the structure based on the settings we have made in the Components menu. These are often nice for presentations or your own viewing.
Navigate to: https://www.ncbi.nlm.nih.gov/Structure/index.shtml
NCBI is also a great resource for protein structures, though almost everything is redundant with PDB. However, the NCBI Structure database is tightly integrated with other NCBI tools like BLAST, as well as with the search integration we have seen in environments like R, Python, etc.
Go ahead and search for: 6VZ8
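The same search can be scripted. Here is a minimal sketch using the NCBI E-utilities `esearch` endpoint with `db=structure`; the function names are mine, and the returned IDs are NCBI's internal identifiers rather than PDB IDs, so treat this as a starting point only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_structure_search_url(term):
    """Build an esearch URL that queries the NCBI Structure database."""
    params = {"db": "structure", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_structure(term):
    """Run the search (needs network access) and return the list of matching IDs."""
    with urlopen(build_structure_search_url(term), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Building the URL works offline; paste it into a browser to see the raw JSON.
print(build_structure_search_url("6VZ8"))
```

Calling `search_structure("6VZ8")` performs the actual request. If you script many queries, please keep to NCBI's usage guidelines on request rates.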
Like PDB, NCBI offers a 3D viewer. However, NCBI also provides some additional information, such as domains and an interaction network. These can help you better understand the function of each subunit of a protein and how that function relates to its structure.
Navigate to: https://robetta.bakerlab.org/
Use your username and password to log in. If you do not have an account yet, register with Robetta so you can proceed.
Robetta is an interface to a number of protein structure tools developed by David Baker’s lab at the University of Washington in Seattle, WA.
Once you are logged in, there are two menus at the top that are relevant for our work: Project and Structure Prediction.
Let’s first look at the Example Results in the Structure Prediction dropdown menu. You can see there is a 3D rendering of the structure and typically 4 or 5 additional model outputs for your protein. Each model also comes with a confidence score (1 = awesome, 0 = terrible). Error estimates over the length of the predicted peptide are provided as well: low numbers mean good predictions, high numbers mean poorer ones. How you interpret these is subjective and depends on the problem being addressed, but a lot of bad scores generally means a poorly predicted structure.
Note, bad scores may actually have biological meaning! Various proteins have intrinsically disordered regions (or IDRs). These cannot be resolved very well since they do not form ordered crystals or adopt any stable, long-term configuration. However, IDRs are very important in the self-assembly of various biological macromolecules.
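To make the idea of reading the per-residue error estimates a bit more concrete, here is a small sketch that flags stretches of consecutive residues whose error exceeds a cutoff. The cutoff, the minimum stretch length, and the numbers in the example are all made up for illustration; they are not Robetta outputs or recommended values.

```python
def high_error_regions(errors, threshold, min_length=3):
    """Return 1-based (start, end) residue ranges where error stays above threshold."""
    regions = []
    start = None
    for position, value in enumerate(errors, start=1):
        if value > threshold:
            if start is None:
                start = position          # a new high-error stretch begins here
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(errors) - start + 1 >= min_length:
        regions.append((start, len(errors)))
    return regions


# Hypothetical per-residue error estimates for an 11-residue peptide.
toy_errors = [1, 1, 6, 7, 8, 1, 9, 1, 5, 5, 5]
print(high_error_regions(toy_errors, threshold=4))  # [(3, 5), (9, 11)]
```

A flagged stretch is only a prompt to look closer: it could be a poorly predicted region, or it could be a genuine IDR.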
Let’s go ahead and submit a job.
Go to Submit in the Structure Prediction menu. Submit the following protein sequence for prediction. Only paste in the amino acid sequence, not the header.
>tr|B5XXV1|B5XXV1_KLEP3 Regulatory protein SoxS OS=Klebsiella pneumoniae (strain 342) OX=507522 GN=soxS PE=4 SV=1
MSHQDIIQTLIEWIDEHIDQPLNIDIVARKSGYSKWYLQRMFRTVMHQTLGDYIRQRRLL
LAAEALRTTQRPIFDIAMDLGYVSQQTFSRVFRREFDRTPSDYRHQISA
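If you are preparing many sequences, stripping the header by hand gets tedious. Here is a minimal sketch that drops the header line and joins the wrapped sequence lines into a single string ready to paste; the function name is hypothetical.

```python
def sequence_from_fasta(fasta_text):
    """Return the sequence of a single-record FASTA string, without the header."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


soxs_fasta = """>tr|B5XXV1|B5XXV1_KLEP3 Regulatory protein SoxS OS=Klebsiella pneumoniae (strain 342) OX=507522 GN=soxS PE=4 SV=1
MSHQDIIQTLIEWIDEHIDQPLNIDIVARKSGYSKWYLQRMFRTVMHQTLGDYIRQRRLL
LAAEALRTTQRPIFDIAMDLGYVSQQTFSRVFRREFDRTPSDYRHQISA
"""

sequence = sequence_from_fasta(soxs_fasta)
print(sequence)       # one continuous line of amino acids
print(len(sequence))  # number of residues
```

This sketch assumes one record per input. A file with several records would need to be split on the `>` lines first.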
You will see a number of options to select below. For now, we’ll stick to RoseTTAFold, a deep-learning prediction method that draws on homologous sequences. Once you have submitted, you can find your job in the Queue, under the Structure Prediction menu.
While we wait, what does RoseTTAFold do?
While your prediction is calculated, let’s take a look at some of my outputs. Search for “lchubiz” in the queue. You can search by user name at the top of the queue. This can help you find your jobs.
Take a look at the MarA, MarA-Rob, and MarA-Rob modified Ps structures. The MarA-Rob structure is a fusion between the N-terminal and C-terminal regions of two regulatory proteins in E. coli (both have known structures). This fusion is reasonably accurate. To modify this structure, I changed a number of prolines in the primary amino acid sequence to glycines. Experimentally, this is a traditional way to disrupt structure. Let’s look closely at the predicted effects.
Question 2: Did changing P to G make any difference in the MarA-Rob structure prediction? Why do you think this is the case? (Hint: Think about how RoseTTAFold is determining the structure.)
Here is the CLUSTALO alignment of these proteins so you can see where Ps were modified.
CLUSTAL O(1.2.4) multiple sequence alignment

MarA                  MSRRNTDAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRS 60
MarA-Rob_Fusion       MSRRNTDAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRS 60
MarA-Rob_Fusion_ModP  MSRRNTDAITIHSILDWIEDNLESGLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRS 60
                      ************************ ***********************************

MarA                  RKMTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRMTNMQGESRFLH 120
MarA-Rob_Fusion       RKMTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRRSPEWSAFGIRP 120
MarA-Rob_Fusion_ModP  RKMTEIAQKLKESNEGILYLAERYGFESQQTLTRTFKNYFDVGGHKYRRSPEWSAFGIRG 120
                      *************** **************************  **** :   .   :

MarA                  PLNH---------YNS-------------------------------------------- 127
MarA-Rob_Fusion       PLRLGEFTMPEHKFVTLEDTPLIGVTQSYSCSLEQISDFRHEMRYQFWHDFLGNAPTIPP 180
MarA-Rob_Fusion_ModP  GLRLGEFTMPEHKFVTLEDTPLIGVTQSYSCSLEQISDFRHEMRYQFWHDFLGNAPTIPP 180
                       *.          : :

MarA                  ------------------------------------------------------------ 127
MarA-Rob_Fusion       VLYGLNETRPSQDKDDEQEVFYTTALAQDQADGYVLTGHPVMLQGGEYVMFTYEGLGTGV 240
MarA-Rob_Fusion_ModP  VLYGLNETRPSQDKDDEQEVFYTTALAQDQADGYVLTGHPVMLQGGEYVMFTYEGLGTGV 240

MarA                  ----------------------------------------------------- 127
MarA-Rob_Fusion       QEFILTVYGTCMPMLNLTRRKGQDIERYYPAEDAKAGDRPINLRCELLIPIRR 293
MarA-Rob_Fusion_ModP  QEFILTVYGTCMPMLNLTRRKGQDIERYYPAEDAKAGDRPINLRCELLIPIRR 293
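If spotting the changes by eye in the alignment is fiddly, a few lines of code can list them. This sketch compares the first 180 residues of the fusion and the modified fusion (the two sequences are identical after that point in the alignment above) and reports each substituted position. The function and variable names are mine.

```python
def substitutions(reference, variant):
    """List (position, reference_residue, variant_residue) for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (no gaps handled here)")
    return [
        (position, ref, var)
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Residues 1-180, copied from the first three blocks of the alignment.
fusion = (
    "MSRRNTDAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRS"
    "RKMTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRRSPEWSAFGIRP"
    "PLRLGEFTMPEHKFVTLEDTPLIGVTQSYSCSLEQISDFRHEMRYQFWHDFLGNAPTIPP"
)
fusion_mod_p = (
    "MSRRNTDAITIHSILDWIEDNLESGLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRS"
    "RKMTEIAQKLKESNEGILYLAERYGFESQQTLTRTFKNYFDVGGHKYRRSPEWSAFGIRG"
    "GLRLGEFTMPEHKFVTLEDTPLIGVTQSYSCSLEQISDFRHEMRYQFWHDFLGNAPTIPP"
)

for position, ref, var in substitutions(fusion, fusion_mod_p):
    print(f"{ref}{position}{var}")
```

Each printed item uses the usual substitution shorthand (original residue, position, new residue), which is handy when you map the changes onto the predicted structures.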