In this simple tutorial, we will use PyMOL to create a few high-quality figures of a recent crystal structure of the nucleosome bound to a linker histone protein, a complex usually referred to as a chromatosome. I will not go into the details of the chromatosome itself, since the purpose of the tutorial is to explore some of the features of PyMOL. You can learn more about the structure, and find the links to download it, in Appendix I.

PyMOL fundamentals

We will use the Ubuntu virtual machine that we created inside VirtualBox to follow along with these instructions. The first thing to do is to open PyMOL, for example by double-clicking on its icon. From here on, we will use PyMOL by typing commands in the command line of the small window on top, where it says ‘PyMOL>’. Feel free to explore the graphical interface on your own.

The next thing we are going to do is navigate to the directory (folder) where we want our files to be located. A directory named ‘PyMOL-example’, which I created, already exists under ‘/home/student/Documents’. You can explore those directories and create new ones using ‘File Manager’, which has an interface you should already be familiar with. Inside PyMOL, we are going to navigate to that directory with typed commands. We do so by using the ‘cd’ command, which stands for ‘change directory’, and providing the full path of the directory:

cd /home/student/Documents/PyMOL-example

Now if you type ‘ls’, which stands for ‘list (files and directories)’, you will see the files and folders that exist in that directory listed in the space above the command line.

As you can see, the file ‘4qlc.pdb’ that contains our structure has already been downloaded into the directory.

Note that the above commands, using ‘cd’ and ‘ls’, will work as written if you are using the Ubuntu virtual machine that we introduced for the demo, or another Linux or macOS environment. If you installed PyMOL directly on your Windows machine, those commands will need to be adapted to Windows (at a minimum, the directory path).

Loading a molecule

To load the molecule into PyMOL from a file that we have already downloaded, the syntax is:

load file-name

So in our case we will type:

load 4qlc.pdb

This will create an object called ‘4qlc’ in our PyMOL session.

An alternative way of loading molecules is the ‘fetch’ command. If you are connected to the internet and you know the 4-character PDB ID of the structure that you are interested in (in this case: 4qlc), you can combine the downloading and the loading steps into one. Instead of typing the command above, you could type:

fetch 4qlc

This will download the file from the Protein Data Bank into the current directory and load the molecule into PyMOL.

Molecule representations

By default the molecule will be shown as ‘lines’. There are many different ways of visualizing the molecules and features you can show on the screen that you can explore on your own. Some of the most commonly used ones are:

lines, ribbon, surface, everything, mesh, volume, cartoon, spheres, labels, sticks

For now let’s hide the ‘lines’ representation and display the molecule as ‘cartoon’. The syntax for the ‘hide’ and ‘show’ commands is:

hide [representation], [selection-name]
show [representation], [selection-name]

Do not worry about the ‘selection-name’ part for now, since that is an optional argument. Without the ‘selection-name’ expression, PyMOL will hide or show the specified representation for all the objects loaded. And if you don’t specify the ‘representation’ either, ‘hide’ will hide everything, while ‘show’ will show ‘lines’ as a default. So let’s hide everything and show our molecule as cartoon using the two commands:

hide
show cartoon

Depending on your preference, you might want to change the background color to something other than black. Black is usually good for visual inspection and for exploring different features, but when you are about to generate figures, they look nicer with a white background. You can change the background color by typing:

bg_color white

At this point our PyMOL session would look like this:

Select/create commands

One of the most useful features of the PyMOL interface is creating atom selections and new objects from selected components (groups of atoms) of your molecule. As we hinted before, when showing/hiding different representations or assigning colors, we might want to apply those settings only to certain components of the molecule. Creating named atom selections saves us from typing the long expressions (we will get to those just below) every time we need them. The syntax for creating atom selections is:

select selection-name, selection-expression

The ‘selection-name’ is just a name that you assign to that group of selected atoms. The ‘selection-expression’ describes the list of atoms using simple or complex expressions that use identifiers, property selectors, etc. that we will explore a little more in depth.

Two important categories of selectors are single word selectors and property selectors.

Single word selectors

Single word selectors are pretty much shortcuts to entire classes of atoms, provided by the PyMOL interface that could be useful to you. They can be combined with other keywords (property selectors) using logical expressions. The table below comes straight from the PyMOL documentation.

Single Word Selector	Short Form	Selector Description
all	*	All atoms currently loaded into PyMOL
none	none	No atoms (empty selection)
hydro	h.	All hydrogen atoms currently loaded into PyMOL
hetatm	het	All atoms loaded from Protein Data Bank HETATM records
visible	v.	All atoms in enabled objects with at least one visible representation
present	pr.	All atoms with defined coordinates in the current state (used in creating movies)

As an example of single word selectors, just earlier we typed ‘hide’ to remove the lines. To produce the same result, instead we could have typed:

hide all

Property selectors

Property selectors, on the other hand, take more general expressions using selectors and identifiers. Let’s use them first, and then explain them. In our case, let’s create some selections by typing:

select dna, chain I+J
select histonecore, not chain I+J+U
select linkerhistone, chain U

In the first line, ‘chain’ is the selector word and ‘I+J’ is the identifier for the group of atoms that we are interested in. In this case we want to select just the DNA atoms and, from exploring the PDB file, we can see that the I and J chains correspond to the two DNA strands. In the same way, the linker histone protein is under chain U and the rest of the atoms belong to the histone core. Note that we used a ‘+’ sign to list more than one identifier. Further down, we will see that to specify a range (for example, of residue numbers) we can use a dash: ‘-’. Note also that we used the logical operator ‘not’ in the expression. Other logical operators like ‘and’ and ‘or’ work the same way when we want to combine different selectors into a more complex (and therefore more specific) expression. The table below, which contains the syntax for some of the most useful property selectors, also comes from the PyMOL documentation.

Matching Property Selector	Short Form Selector	Identifier	Example
symbol	e.	chemical-symbol-list: list of 1- or 2-letter chemical symbols from the periodic table	PyMOL> select polar, symbol o+n
name	n.	atom-name-list: list of up to 4-letter codes for atoms in proteins or nucleic acids	PyMOL> select carbons, name ca+cb+cg+cd
resn	r.	residue-name-list: list of 3-letter codes for amino acids, or up to 2-letter codes for nucleic acids	PyMOL> select aas, resn asp+glu+asn+gln; PyMOL> select bases, resn a+g
resi	i.	residue-identifier-list or residue-identifier-range: list of up to 4-digit residue numbers, or range of residue numbers	PyMOL> select mults10, resi 1+10+100+1000; PyMOL> select nterm, resi 1-10
chain	c.	chain-identifier-list: list of single letters or sometimes numbers	PyMOL> select firstch, chain a
ss	ss	secondary-structure-type: list of single letters	PyMOL> select allstrs, ss h+s+l+""

You can learn more about the usage of the commands above and many more in the official PyMOL documentation. You can go to the PyMOL wiki or to the old documentation in the SourceForge open source project site. More specifically, you can find the selection documentation here.
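To make the selection logic above a little more concrete, here is a minimal Python sketch of how ‘chain I+J’, ‘chain U’ and ‘not chain I+J+U’ split a set of atoms by chain identifier. It does not use PyMOL at all: the toy atom records and the helper name `select_chain` are made up for illustration, and only the chain letters come from the text above.

```python
# Toy atoms as (chain, residue name, atom name); the values are invented.
atoms = [
    ("A", "ALA", "CA"),
    ("B", "LYS", "CB"),
    ("I", "DA", "N6"),
    ("J", "DT", "O4"),
    ("U", "SER", "CA"),
]

def select_chain(atoms, identifiers, negate=False):
    """Mimic 'chain X+Y', or 'not chain X+Y' when negate is True."""
    wanted = set(identifiers.split("+"))  # '+' separates listed identifiers
    return [a for a in atoms if (a[0] in wanted) != negate]

dna = select_chain(atoms, "I+J")
linkerhistone = select_chain(atoms, "U")
histonecore = select_chain(atoms, "I+J+U", negate=True)

print(len(dna), len(histonecore), len(linkerhistone))  # 2 2 1
```

The three groups do not overlap and together cover every atom, which is why ‘not chain I+J+U’ is a convenient way to describe the histone core without listing all of its chains.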

One thing I wanted to add here is that the ‘create’ command works in an almost identical way to the ‘select’ command in terms of syntax. The big difference is that ‘create’ actually makes a brand new object in your PyMOL session from the atoms in the selection expression. This new object behaves as if you had loaded a new molecule into the session, in contrast with the ‘select’ command, which merely creates a pointer to the atoms in the original object.
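The difference between a pointer and a new object can be sketched with a plain Python analogy. This is only an analogy under my own assumptions, not how PyMOL is implemented: the atom records and colors below are invented, a list of references stands in for ‘select’, and a deep copy stands in for ‘create’.

```python
import copy

# A toy "object": a list of atom records (values invented for illustration).
obj_4qlc = [
    {"chain": "I", "name": "N6", "color": "green"},
    {"chain": "U", "name": "CA", "color": "green"},
]

# 'select'-like: a named list that refers to the SAME atom records.
selection = [atom for atom in obj_4qlc if atom["chain"] == "U"]

# 'create'-like: a new object made of independent COPIES of those atoms.
new_object = copy.deepcopy(selection)

# Recoloring through the selection changes the original object...
selection[0]["color"] = "pink"
# ...while recoloring the copy leaves the original untouched.
new_object[0]["color"] = "orange"

print(obj_4qlc[1]["color"])    # pink
print(new_object[0]["color"])  # orange
```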

Figure 1 - cartoon representation

So now that we know a little bit about the basics, let’s try to improve our cartoon representation. First, let’s use the selections that we created before to assign different colors to the components of our system. We will use the ‘color’ command, which works like this:

color color-name, selection-expression

By now you are probably getting an idea of how commands work in PyMOL. You can find a list of possible ‘color-name’ options in the PyMOL documentation, or by clicking on the colored ‘C’ button to the right of any of your session objects/selections and hovering the mouse over the different color groups. The ‘selection-expression’ is exactly the same as above: you can either use one of the selectors or write the name of an existing selection. So let’s try:

color paleyellow, dna
color palecyan, histonecore
color pink, linkerhistone

To make the cartoon look even better we can play around with some other settings. These are usually a little more obscure and less documented, but you can find information about them with a simple web search. For example, let’s try:

set cartoon_ring_mode, 3
cartoon oval, dna

The first line will make the bases of the DNA stand out a little more, while the second command will make the cartoon representation of the DNA backbone look like a helix, instead of the loop that is shown by default. You can play around with different values for the ‘cartoon_ring_mode’ setting and see the effect of changing it.

Another thing that is commonly used when generating images of protein alpha helices is:

set cartoon_fancy_helices, on

So if we were to save the image as it is at this point using the command

png image-name.png

the resulting figure would look something like this:

We can make our figures look much better by using something called ray-tracing. You can read about it online but, basically, it means simulating the path of the light rays from a source (lamp, sun, etc.) to the object and its surroundings and into the camera lens. The result is an image that looks a little more animated. The syntax for ray-tracing is simply:

ray [width, height]

The width and height are the dimensions, in pixels, of the desired image. They are optional: if they are left blank, the generated image will have the size and aspect ratio of the current display. You can resize the display by dragging it with your mouse, or you can set a size from the command line. This may not seem important now, but it is very useful if you want to generate images automatically using scripts and maintain a consistent size and view across your images. Note that ray-tracing, depending on the settings and the size of the image, could take quite some time, so you have to be patient and let PyMOL finish. So let’s change the display size and ratio to:

viewport 450, 600

Now if we want to generate an image of the same size we can simply type:

ray

The image you just generated will look like this (but with a different aspect ratio):

As you can see, it already looks much better than the previous one, with softer edges and better depth of field. To improve the ray-traced images even further, we can change some settings that for now, you can just take my word for.

unset depth_cue
set antialias, 2
set ray_shadow_decay_factor, 0.2
set ray_shadow_decay_range, 2
set ambient_occlusion_mode, 0.2
set ray_trace_fog, 0

You can read more about some of these settings in the online documentation (see the ray-tracing, gallery and tips-and-tricks links in Appendix I), where you might find some useful examples and tricks. Just remember that making and improving figures is a never-ending process. You could change the settings so as to generate an amazing figure that takes an entire week to render, but that is beyond what we are trying to do.

In our case, if we want to generate a bigger image (say 2x the size of our display), so that we have a nice high-resolution figure for papers/presentations, we can type something like:

ray 900, 1200
png fig-cartoon.png

Remember that the ‘ray’ command just renders a high-quality image in the PyMOL session. You still need to save it to a file using the ‘png’ command, as in the second line above. Now you have a high-resolution figure of a cartoon representation.
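The ‘ray’ dimensions above are simply the viewport dimensions multiplied by the same factor, which keeps the aspect ratio unchanged. A small Python sketch of that arithmetic follows. The helper name `scaled_size` is hypothetical and the script only prints a command string, it does not talk to PyMOL.

```python
def scaled_size(width, height, factor):
    """Return (width, height) scaled by factor, preserving the aspect ratio."""
    return round(width * factor), round(height * factor)

viewport = (450, 600)          # the 'viewport 450, 600' used above
w, h = scaled_size(*viewport, 2)

print(f"ray {w}, {h}")         # ray 900, 1200
```

Scaling only one of the two numbers would change the aspect ratio, so the rendered image would no longer match what you see in the display.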

Figure 2 - surface representation

Now let’s try to generate another type of image with the same size and point of view as the previous one, but this time using a surface representation. We can keep working in the current session, saving ourselves from the trouble of reloading everything and resetting some of the settings that we want to keep. Let’s get right to it, by hiding everything and showing the surface.

hide
show surface

Now you should have something that looks like this displayed in your session:

The different representations (cartoon, surface, etc.) usually have their own set of use cases which they are best suited for. In our case, let’s use the surface representation to highlight the major and minor grooves of the DNA. To do that we need to first create selections from the group of atoms found in the major/minor grooves. As we can see from the figure below, atoms N4, N6, O4, O6, C5, C7, N7 are found inside the major groove, while atoms N2, C2, O2, N3 are found inside the minor groove.

With that in mind let’s create two different selections from our DNA (remember that chains I and J represent the DNA in our molecule):

select minorgroove_at, chain I+J and name N2+C2+O2+N3
select majorgroove_at, chain I+J and name N4+N6+O4+O6+C5+C7+N7

Now that we have the selections, let’s change the previous colors a little bit so that we can highlight the major/minor grooves:

color paleyellow, dna
color palecyan, histonecore
color lime, linkerhistone
color skyblue, minorgroove_at
color pink, majorgroove_at

In order to make the surface representation look a little smoother and more refined let’s change a setting that you can play around with in order to see the effects (WARNING: changing this value to 2 or more could take a really long time, depending on your computer, and slow down the ray-tracing process even more!)

set surface_quality, 1

Now we can ray-trace our scene (‘ray’ command might take some time - be patient) to generate a high resolution image and save it in our directory.

ray 900, 1200
png fig-surface.png

The resulting image will look something like this:

Figure 3 - mixed representation with labels

The tricks we learned from creating figures 1 and 2 should take us pretty far in terms of creating high-quality images for use in papers and presentations. Now let’s try to combine some of these tools to generate a more complex image with different features highlighted. One thing to note here is that adding features to an image comes with its own risks, the major one being that the image becomes too complicated to read and interpret. So keep in mind that the purpose of an image is to convey its message in a form that is easy to interpret; the purpose of this tutorial, however, is to illustrate some of the things that PyMOL allows you to do.

Let’s start with the surface representation that we have so far and add in some information about how the linker histone interacts with the nucleosome, more specifically with the nucleosomal DNA. When we are trying to highlight certain features, it is good to make the rest of the molecule fade, so let’s make the surface representation more transparent.

set transparency, 0.4, 4qlc

This will make just the 4qlc object, as well as all of the selections that point to the atoms in the object, 40% transparent. As you may have noticed by now, most of the settings that can be changed with ‘set’ work the same way:

set setting-name, setting-value, [selection]

The ‘selection’ expression is, again, optional. If it is not provided, the specified setting will be changed to the given value for all objects loaded in the session. Now that we have done that, let’s create a new object for the linker histone. We do that because we want to show a different representation (a cartoon in this case) for that component in addition to the existing surface, and a separate object lets us style it independently of the original. So let’s do:

create gh5, 4qlc and chain U
create dnabp, 4qlc and chain I+J

As we said earlier, the new objects we created are entirely new sets of atoms, just like the 4qlc object, and are completely independent of 4qlc. We also created a new object for the nucleosomal DNA, since we want to highlight the contacting bases on the DNA as well; we will later show those contacts as ‘sticks’. Before showing new representations, we have to hide the surface representation for the newly created objects. This is because, when we create a new object from an existing one, the representations shown in the old object carry over to the new one. So at this point we have two sets of surfaces shown for the linker histone and the DNA. Let’s hide those and show the cartoon for the linker histone:

hide surface, gh5
hide surface, dnabp
show cartoon, gh5
set transparency, 0.7, linkerhistone

Just to make the cartoon stand out more, we also set the transparency a little higher for the surface of the linker histone from the original 4qlc object. So at this point our session should look similar to this:

We can drag the molecule with the mouse to zoom and reposition it so that it focuses on the linker histone. Once we have oriented our molecule and are happy with the view, there is a neat feature we can use to obtain the matrix for the current view. This way we can reuse it when we write a script to recreate the same image, or a different image of the molecule in exactly the same orientation. The command is:

get_view

This command produces a matrix with the position, orientation and zoom information of the current display, and prints in the results area (the same window you type in, just above the command line) a complete command (set_view (...)) containing that matrix, which you can copy and paste into your script. You can also copy and paste the complete block directly into the command line. For example, try copying and pasting the block of lines below as it is. You should obtain the view I used for the final image of this part of the tutorial. First, change the viewport to a different aspect ratio, then set the view.

viewport 600, 600
set_view (\
     0.963745475,   -0.009863121,   -0.266564101,\
     0.075177506,    0.968832552,    0.235966802,\
     0.255937904,   -0.247460246,    0.934470892,\
    -0.002535433,   -0.005029410, -193.080993652,\
    31.255285263,  168.931381226,   51.533031464,\
   -85.735649109,  471.749694824,  -20.000000000 )
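If you are curious about what the 18 numbers mean, the short Python sketch below pulls them out of the block above and splits them into groups. The grouping follows my reading of the PyMOL wiki entry for ‘get_view’ (a 3x3 rotation matrix, then the camera position, the origin of rotation, and finally the two clipping planes plus an orthoscopic/field-of-view value), so treat the variable names as assumptions rather than an official description. The script does not use PyMOL.

```python
import re

# The set_view block from the text, copied verbatim.
set_view_text = r"""
set_view (\
     0.963745475,   -0.009863121,   -0.266564101,\
     0.075177506,    0.968832552,    0.235966802,\
     0.255937904,   -0.247460246,    0.934470892,\
    -0.002535433,   -0.005029410, -193.080993652,\
    31.255285263,  168.931381226,   51.533031464,\
   -85.735649109,  471.749694824,  -20.000000000 )
"""

# Extract every decimal number, keeping the sign.
values = [float(x) for x in re.findall(r"-?\d+\.\d+", set_view_text)]
assert len(values) == 18

rotation = [values[0:3], values[3:6], values[6:9]]  # 3x3 rotation matrix
camera_position = values[9:12]    # camera position relative to the origin of rotation
rotation_origin = values[12:15]   # origin of rotation
front_clip, rear_clip, ortho_flag = values[15:18]

# Sanity check: each row of a rotation matrix should have length close to 1.
row_norms = [sum(v * v for v in row) ** 0.5 for row in rotation]
print([round(n, 3) for n in row_norms])
```

A check like this is a cheap way to catch a copy-and-paste accident (a dropped digit or a missing line) before you spend time ray-tracing with a broken view.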

Now let’s try something more complicated in terms of selection expressions. Since we want to see the contacts between the linker histone and the DNA in a way that is easy to visualize, we will focus on the residues that are making contact and not show specific atoms (that might complicate the figure a little too much). If you go back to the documentation for the property selectors and scroll down, you can find some keywords we can use in this case. When you select ‘a within X of b’, PyMOL will select all atoms in ‘a’ that are within ‘X’ Angstroms of ‘b’. The keyword ‘byres’, in turn, takes the atoms that satisfy the condition and returns the entire residues that contain those atoms. So we type:

select dnacontacts, byres (dnabp within 4 of gh5)
select gh5contacts, byres (gh5 within 4 of dnacontacts)
select gh5c_min, byres (gh5 within 4 of (dnacontacts and name N2+C2+O2+N3))
select gh5c_maj, byres (gh5 within 4 of (dnacontacts and name N4+N6+O4+O6+C5+C7+N7))

In the first line, we are selecting the residues (bases in this case) in the DNA chains (dnabp selection we already created) that contain atoms located within 4 Angstroms from any of the linker histone (gh5) atoms. Now, in the second line we are selecting the residues in the linker histone that contain atoms located within 4 Angstroms from any of the DNA bases we just selected above. In the third and fourth lines, we are doing the same thing, but more specifically, we are selecting the residues in the linker histone that “make contact” with just the minor/major groove atoms in the DNA. We can show all the contacting bases in the DNA as sticks, and simply color the contacting residues on the linker histone protein - blue for the ones making contact with the minor groove, red for the ones making contact with the major groove and orange for the rest of them.
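The logic of ‘byres (a within X of b)’ can be sketched in a few lines of plain Python. This is only a toy model of the idea, not PyMOL’s implementation: the coordinates, residue labels and the helper names `within` and `byres` are invented for illustration.

```python
import math

# Toy atoms: (object, residue id, atom name, (x, y, z)); coordinates are invented.
atoms = [
    ("dnabp", "I:10", "N2", (0.0, 0.0, 0.0)),
    ("dnabp", "I:10", "P",  (-9.0, 0.0, 0.0)),
    ("dnabp", "I:11", "N7", (20.0, 0.0, 0.0)),
    ("gh5",   "U:40", "NZ", (3.0, 0.0, 0.0)),
    ("gh5",   "U:40", "CA", (8.0, 0.0, 0.0)),
    ("gh5",   "U:90", "CA", (50.0, 0.0, 0.0)),
]

def within(a, cutoff, b):
    """Atoms of a that lie within cutoff of at least one atom of b."""
    return [x for x in a if any(math.dist(x[3], y[3]) <= cutoff for y in b)]

def byres(subset, pool):
    """Expand subset to every atom in pool that shares a residue with it."""
    residues = {(x[0], x[1]) for x in subset}
    return [x for x in pool if (x[0], x[1]) in residues]

dnabp = [x for x in atoms if x[0] == "dnabp"]
gh5 = [x for x in atoms if x[0] == "gh5"]

dnacontacts = byres(within(dnabp, 4.0, gh5), atoms)
gh5contacts = byres(within(gh5, 4.0, dnacontacts), atoms)

print([x[2] for x in dnacontacts])  # ['N2', 'P']
print([x[2] for x in gh5contacts])  # ['NZ', 'CA']
```

Only the N2 atom of residue I:10 is actually within 4 units of the protein, but the ‘byres’ step pulls in the rest of that residue (the P atom), which is the behavior we rely on when we want whole bases or whole amino acids in the figure.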

show sticks, dnacontacts
color orange, gh5contacts
color blue, gh5c_min
color red, gh5c_maj

At this point we should have a figure that looks like this:

One more thing we can add to images in PyMOL (last for this tutorial) is labels. We can add labels using the command:

label selection, [expression]

In this case the ‘selection’ means the exact same thing we have seen so far when writing selections - it refers to the thing that is being labeled. The ‘expression’ is how the labeling will be done and, in a sense, what will be written. The identifiers used for this part are pretty much the ones you would expect and have used before and they can be combined in a very simple way that we will demonstrate just below.

So let’s label the residues on the linker histone that make contact with the minor or major groove, just as an example. We can write the residue type and the residue number for each one of them, assign different colors depending on whether the contact is with the minor or the major groove, and make the labels relatively big.

label gh5c_min and name CB, resn+","+resi
label gh5c_maj and name CB, resn+","+resi
set label_size, 20
set float_labels, on
set label_color, blue, gh5c_min
set label_color, red, gh5c_maj
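The label expression resn+","+resi used above is ordinary string concatenation: the residue name, a comma, and the residue number. The plain Python sketch below shows the text it would produce, assuming both values are handled as strings. The two residues are invented for illustration and are not taken from the structure.

```python
# Toy (resn, resi) pairs; the values are invented for illustration.
residues = [("LYS", "40"), ("ARG", "42")]

# Same expression as in the label command: resn + "," + resi
labels = [resn + "," + resi for resn, resi in residues]

print(labels)  # ['LYS,40', 'ARG,42']
```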

Now we can ray-trace the image and save it.

ray 900, 900
png fig-composite.png

If you still have questions about using PyMOL or the virtual machine in VirtualBox, feel free to email me at stefjord.todolli@rutgers.edu.

Appendix I - websites/resources mentioned in the text

Chromatosome structure from Bai, et al. - [http://www.rcsb.org/pdb/explore/explore.do?structureId=4qlc]
PyMOL wiki site - [http://www.pymolwiki.org]
PyMOL documentation (old) on the open source hosting site (SourceForge) - [http://pymol.sourceforge.net/newman/user/toc.html]
PyMOL documentation, selection syntax - [http://pymolwiki.org/index.php/Select]
PyMOL color values - [http://pymolwiki.org/index.php/Color_Values]
Ray-tracing documentation - [http://www.pymolwiki.org/index.php/Ray]
PyMOL image gallery - [http://www.pymolwiki.org/index.php/Gallery]
PyMOL tips and tricks from the University of Cambridge, Department of Chemistry - [http://www-cryst.bioc.cam.ac.uk/members/zbyszek/figures_pymol]

Appendix II - creating/running scripts

We can take what we just did above and write a script (or one for each image) that we can reuse to recreate the images without having to type everything over again. To do that, we create a plain text document in which we write all those commands verbatim. You can give it any name that you wish, with a simple ‘.txt’ ending or the PyMOL convention ‘.pml’ ending. It will not make a difference; it is just a way of keeping track of what kind of file it is. Inside that text file you can write comments that start with a ‘#’ sign; these can contain any explanatory text and will not be executed by PyMOL. I have created four different scripts for this tutorial. Three of them reproduce the images we just created above and the fourth generates an alternative to the third figure, with a different representation for the DNA bases. You can download those scripts and place them in the directory where the PDB file ‘4qlc.pdb’ is located. After starting PyMOL and navigating to that directory, just like we did at the start, you can run a script by using the command:

@demo1-cartoon.pml

So in this case, the script is called ‘demo1-cartoon.pml’ and you run it by just preceding it with an ‘@’ sign. You can use these scripts as templates to create your own figures and learn new tricks by modifying them as you please. You can find the PyMOL scripts on the Sakai site, or you can download them from here.
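If you would rather generate such a script than type it by hand, a few lines of Python can write the commands to a ‘.pml’ file. The sketch below is only an illustration: the helper name `write_pml` and the file name ‘my-cartoon.pml’ are hypothetical, the file is written to a temporary directory, and the result is not the ‘demo1-cartoon.pml’ script distributed with this tutorial. The commands themselves are copied from the Figure 1 section.

```python
import tempfile
from pathlib import Path

# Commands copied from the Figure 1 walkthrough in this tutorial.
commands = [
    "load 4qlc.pdb",
    "hide",
    "show cartoon",
    "bg_color white",
    "select dna, chain I+J",
    "select histonecore, not chain I+J+U",
    "select linkerhistone, chain U",
    "color paleyellow, dna",
    "color palecyan, histonecore",
    "color pink, linkerhistone",
    "viewport 450, 600",
    "ray 900, 1200",
    "png fig-cartoon.png",
]

def write_pml(path, commands, header="# cartoon figure, generated from Python"):
    """Write one command per line, preceded by a '#' comment line."""
    text = "\n".join([header, *commands]) + "\n"
    Path(path).write_text(text)
    return text

out_dir = Path(tempfile.mkdtemp())
script = out_dir / "my-cartoon.pml"  # hypothetical file name
write_pml(script, commands)

print(script.read_text())
```

Once the file is in the same directory as ‘4qlc.pdb’, it can be run from the PyMOL command line with the ‘@’ sign, exactly as described above.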

Introduction to visualization with PyMOL

Stefjord Todolli

October 20, 2015

Appendix III - complete figures generated at each stage (also, by each script)

Figure 1 - cartoon (`demo1-cartoon.pml`)

Figure 2 - surface (`demo2-surface.pml`)

Figure 3 - composite/mixed (`demo3-composite.pml`)

Figure 4 - alternative to Fig 3, not explained in previous text (`demo4-composite.pml`)
