Guide to `protag`: Search Tagged Peptides & Draw Highlighted Mass Spectra

In this tutorial, I’ll illustrate the use of the protag package to determine the peptide N-terminus by analyzing MALDI-TOF MS data.

1. Background

In a typical protein labelling experiment, proteins are chemically tagged with a functional group, usually at specific sites, and then digested into peptides. The peptides are analyzed using matrix-assisted laser desorption ionization - time of flight mass spectrometry (MALDI-TOF MS) to generate a peptide fingerprint. Relative to the control, peptides that are heavier by the mass of the labelling group are informative for sequence determination.

This package aims to facilitate the search for such tagged peptides with expected mass shift(s). It takes two or more MALDI-TOF MS mass lists as input, makes pairwise comparisons between each labeled group and the control, and reconstructs centroid mass spectra with the peaks of interest highlighted for easier visual examination.

2. Basic setting

2.1. Input dataset

The input dataset must contain at least two columns, named “group” and “mass”. The “group” column contains the names of the experiments, and one of its levels must be named “control”.

If an “intensity” column is also included, mass spectra will be drawn based on the given intensities; if not, all peaks will be drawn with equal intensity. The input dataset may contain any other columns.
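As a minimal sketch of a conforming input, the data frame below has the two required columns plus the optional “intensity” column. The values are made up for illustration only, and check_input() is a hypothetical helper, not a protag function.

```r
# Made-up peak list for illustration; not real measurements
peaks <- data.frame(
  group     = c("control", "control", "label1", "label1"),
  mass      = c(1729.0, 1894.1, 1757.0, 1894.1),
  intensity = c(1200, 800, 950, 780)
)

# Hypothetical helper (not a protag function): checks the two requirements
# stated above, i.e. the required columns and a "control" group
check_input <- function(d) {
  all(c("group", "mass") %in% names(d)) && "control" %in% d$group
}

check_input(peaks)
```

A dataset without a “control” group, or without one of the two required columns, would fail this check.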

We’ll use myoglobin, a simulated MALDI-TOF peptide fingerprint mass list, to demonstrate the use of this package. This dataset is included in the protag package.

2.2. Peak searching

With the tag.search() function, the input dataset is augmented with additional columns containing the search results. This output dataset may be of little direct interest to end users in itself. Instead, it is fed into the plotting functions to draw highlighted mass spectra for easy use.

library(protag)
search.result = tag.search(myoglobin, 
                           # specify expected mass shift (Dalton)
                           delta = 28,
                           
                           # error tolerance, controlled in Dalton, 0.5 by default
                           error.Da.pair = .5, 
                           error.Da.match = .5)
search.result

[[1]]
# A tibble: 195 × 14
# Groups:   group [4]
    mass position no.MC peptide.sequence  group err.ppm     err intensity status
   <dbl> <chr>    <dbl> <chr>             <chr>   <dbl>   <dbl>     <dbl> <chr> 
 1 2860. 18-43        1 VEADIAGHGQEVLIRL… cont…   125.   0.358      2791. misma…
 2 2232. 120-140      1 HPGDFGADAQGAMTKA… cont…   128.  -0.287      2334. misma…
 3  736. 98-103       1 HKIPIK            cont…    61.4  0.0451     1411. misma…
 4 2844. 120-146      2 HPGDFGADAQGAMTKA… cont…   101.  -0.286      3509. misma…
 5 1362. 47-57        2 FKHLKTEAEMK       cont…   105.   0.143       632. misma…
 6 1155. <NA>        NA <NA>              cont…   116.   0.134       902. misma…
 7 1045. <NA>        NA <NA>              cont…   111.  -0.116      1247. misma…
 8 1882. <NA>        NA <NA>              cont…   102.   0.193       804. misma…
 9 1897. <NA>        NA <NA>              cont…   102.  -0.193       966. misma…
10 1751. <NA>        NA <NA>              cont…   114.   0.199       585. misma…
# ℹ 185 more rows
# ℹ 5 more variables: pair.tracker <dbl>, intensity.scaled <dbl>,
#   spectra.dividor <dbl>, `intensity.scaled.+.-` <dbl>, sign <dbl>

[[2]]
[1] "Found both paired peaks (mass differentiate by expected delta) and matched peaks (of the same mass)."

2.3. Plotting

In the mass spectra drawing, three types of peaks will/can be highlighted, differentiated and annotated:

pair: peaks of tagged peptides with the expected mass shift(s) (i.e., the delta input) and the corresponding control peaks. Peaks of the same pair will be drawn in the same color, and different pairs in different colors.

match: peaks with the same mass as the control. All matched peaks will be drawn in a single color, default grey, regardless of their masses, as these peaks are often of less research interest.

mismatch: neither of the prior two cases. All mismatched peaks will be drawn in a single color, default grey, as thinner and more transparent peaks, as they are usually the least interesting.
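The three categories can be illustrated with a small base R sketch. This is not protag's implementation: classify_peak() is a hypothetical helper, the masses are made up, and checking for a pair before a match is an assumption made here for simplicity. It classifies a single labelled-group mass against the control masses.

```r
# Hypothetical helper (not protag's code): classify one labelled-group mass
# against the control masses, given expected shift(s) and a Dalton tolerance
classify_peak <- function(mass, control, delta, tol = 0.5) {
  # pair: the mass equals some control mass plus an expected shift
  shifted <- as.vector(outer(control, delta, "+"))
  if (any(abs(mass - shifted) <= tol)) return("pair")
  # match: the mass equals some control mass
  if (any(abs(mass - control) <= tol)) return("match")
  # neither of the two cases above
  "mismatch"
}

control <- c(1729.0, 1894.1)   # made-up control masses
sapply(c(1757.1, 1894.0, 2100.0), classify_peak,
       control = control, delta = 28)
```

With these made-up values, 1757.1 lies 28 Da above the control peak at 1729.0 and is a pair, 1894.0 coincides with a control peak and is a match, and 2100.0 is a mismatch.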

2.3.1. listplot

This display format presents spectra in a listed manner, i.e., one spectrum on top of another. It is suitable for comparison of multiple spectra.

tag.spectra.listplot(search.result)

2.3.2. butterfly / mirrored plot

This format is designed for comparing two mass spectra, displayed in a mirrored (butterfly) manner. It gives clearer annotations than listplot. The general settings of butterflyplot are mostly the same as those of listplot.

# create subset containing only two groups
myoglobin.subset = myoglobin[myoglobin$group %in% c("control", "label1"), ]

search.result = tag.search(myoglobin.subset, delta = 28,
                           error.Da.pair = .5, error.Da.match = .5)
tag.spectra.butterflyplot(search.result)

To reduce overlap among peaks, annotations and the central divider, see annotation space control (Section 3.2.3).

3. Advanced setting

In this section, we’ll see how to fine-tune the mass search criteria and the display of the mass spectra.

3.1. Peak search

3.1.1. Stepwise / multiple labelling

When multiple steps of labelling are of interest, simply supply the delta argument as a numeric vector. Stepwise-labelled paired peaks derived from the same control will be highlighted in the same color. See the example below, and note the stepwise double labelling of the peaks of the 1729 series (blue-green), the 1894 series (blue) and the 1937 series (pink).

search.result = tag.search(myoglobin, delta = c(28, 56), 
                           error.Da.pair = .5, error.Da.match = .5)

tag.spectra.listplot(search.result) +
  # zoom in on the mass range of interest
  ggplot2::coord_cartesian(xlim = c(1700, 2000))

For more on zooming in on a mass range and marking stepwise labelling with arrows, see customization with ggplot2 (Section 3.2.7).
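The arithmetic behind a stepwise series can be sketched in base R: the control mass plus each expected shift gives the masses at which the labelled peaks are sought. The nominal mass 1729 is taken from the series mentioned above; the variable names are illustrative only.

```r
control.mass <- 1729   # nominal mass of the control peak in the 1729 series
delta <- c(28, 56)     # expected shifts for single and double labelling

# masses at which the stepwise-labelled peaks are expected
control.mass + delta
```

This gives 1757 and 1785, the singly and doubly labelled members of the 1729 series.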

3.1.2. Error tolerance

By default, the mass error tolerance is controlled in Dalton, at 0.5, for both the paired and the matched peak search. When the error is proportional to the mass, specifying the tolerance in ppm can be helpful. When both Dalton and ppm control are turned on, the tolerance applied at each mass is whichever of the two is more stringent. Example code follows (output not shown).

search.result = tag.search(myoglobin, delta = c(28, 56),
                           # turn off error control by Dalton by setting an infinite Da error tolerance
                           error.Da.pair = Inf, error.Da.match = Inf,
                           # set up error control by ppm
                           error.ppm.pair = 200, error.ppm.match = 200)
search.result
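How the two limits interact can be sketched with a little arithmetic. effective_tol() below is a hypothetical helper, not a protag function, and it assumes the ppm limit is taken relative to the mass being compared; the package may compute this differently.

```r
# Hypothetical helper (not a protag function): effective tolerance in Dalton
# when both a Dalton limit and a ppm limit are active
effective_tol <- function(mass, error.Da, error.ppm) {
  tol.from.ppm <- mass * error.ppm / 1e6   # ppm limit converted to Dalton
  pmin(error.Da, tol.from.ppm)             # the stricter (smaller) limit wins
}

# at 1000 Da the ppm limit (0.2 Da) is stricter;
# at 3000 Da the Dalton limit (0.5 Da) is stricter
effective_tol(mass = c(1000, 3000), error.Da = 0.5, error.ppm = 200)

# an infinite Dalton limit leaves the ppm limit in sole control
effective_tol(mass = 1000, error.Da = Inf, error.ppm = 200)
```

This is why setting the Dalton tolerance to Inf, as in the example above, amounts to controlling the error by ppm alone.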

3.2. Plotting

3.2.1. Peak scale transform

Sometimes it is helpful to log-transform the intensities to make weak peaks more visible. This does not affect the peak search result; it only changes the appearance of the final plot.

search.result.logscale = tag.search(myoglobin, delta = c(28, 56),
                                    error.Da.pair = .5, error.Da.match = .5,
                                    intens.log.transfrom = TRUE)
tag.spectra.listplot(search.result.logscale)
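The effect of a log transform on relative peak heights can be sketched in base R. The intensities are made up, and the use of log10 and of scaling to the tallest peak are assumptions for illustration; the transform applied inside protag may differ.

```r
intensity <- c(100, 1000, 10000)   # made-up raw intensities

# height of each peak relative to the tallest one, before and after log10
raw.rel <- intensity / max(intensity)
log.rel <- log10(intensity) / max(log10(intensity))

round(raw.rel, 2)
round(log.rel, 2)
```

On the raw scale the weakest peak is 1% of the tallest; after the log10 transform it is half as tall, so it is much easier to see.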

3.2.2. Selective show up

We can also select which types of peaks and mass annotations to show for a clearer view (now switching back to the normal scale without log transformation). Peaks and annotations can be controlled separately.

tag.spectra.listplot(search.result, 
                     # adjust mass annotation space
                     peak.height.shrink = .4, gap.annotation = .3, 
                     # do not show matched and mismatched peaks
                     show.peak.match = FALSE, show.peak.mismatch = FALSE, 
                     # do not show mass annotations for "matched" peaks; annotations for mismatched peaks are hidden by default
                     show.annotation.match = FALSE)

3.2.3. Annotation space control

Mass annotations can be crowded, as shown above, with much overlap between peaks, annotations and central dividers. We can “shrink” the peaks to make more space for annotations, and increase the gap between peaks and annotations to avoid overlap.

tag.spectra.listplot(search.result.logscale, 
                     # the smaller the shrink factor, the more the peaks are shrunk
                     # the shrink argument applies only to listplot; butterflyplot does not need it
                     peak.height.shrink = .4,
                     # adjust gap between mass annotations and peaks
                     gap.annotation = .3)

Meanwhile, we can adjust the annotation size as well for better space utilization.

3.2.4. Color

Peak and annotation colors can be easily changed. Attach the RColorBrewer package to change the colors of paired peaks. Matched and mismatched peaks can each be set to only a single color, as these peaks are usually of less research interest. For best clarity, peaks and their annotations are always drawn in the same color. Argument names are self-explanatory.

When RColorBrewer palettes are used, each run randomly shuffles the peak and annotation colors drawn from the designated palette. This is helpful when adjacent but different pairs happen to receive similar colors in a single run: rerunning the plot reassigns them.

library(RColorBrewer)
tag.spectra.listplot(search.result,
                     # color based on palettes of RColorBrewer
                     # "PuRd", purple-red, further finely divided into different gradients
                     color.pair = "PuRd", 
                     # monocolors for "matched" and "mismatched" peaks
                     color.match = "steelblue",
                     color.mismatch = "dark green")

3.2.5. Size and transparency

Similarly, peak width and transparency can be easily changed. Let’s make the matched and mismatched peaks more visible by applying broader peak width and more color opacity, and annotations for the paired peaks a bit bigger.

tag.spectra.listplot(search.result,
                     color.pair = "PuRd",
                     color.match = "steelblue",
                     color.mismatch = "dark green",
                     
                     # peak & annotation transparency for mismatched peaks
                     alpha.peak.mismatch = .8, 
                     # peak width for matched peaks
                     size.peak.match = 2,
                     # mass annotation size for paired peaks
                     size.annotation.pair = 5)

3.2.6. Groupname position

The groupname position can be adjusted as below.

tag.spectra.listplot(search.result,
                     angle.groupname = 0, 
                     # negative gap values, typically in [-1, 0), shift groupnames to the right;
                     # positive gap values shift groupnames to the left
                     gap.groupname = -.8,
                     # label size
                     size.groupname = 7)

3.2.7. Customization with ggplot2

When ggplot2 is attached, more flexible annotation becomes possible. For example, we can zoom in on the mass range of interest; for stepwise labelling, we can draw arrows to highlight the “paired” peaks.

## Attach ggplot2 for more plot customization
library(ggplot2)

## For data wrangling, to make the double.label dataset,
## in particular the function mutate() and the pipe %>%
library(dplyr) 

## Create dataset of the arrow coordinates
double.label = 
  # arrow starting point coordinates
  data.frame(x1 = c(1729, 1757, 1785, 1757, 1785), 
             y1 = c(.8, 2.8, 2.8, 3.8, 3.8)) %>%
  # arrow ending point coordinates
  mutate(x2 = x1, 
         y2 = y1 - 0.1)  

## Plot adding arrows
tag.spectra.listplot(search.result) +
  # zoom in on the mass range of interest
  coord_cartesian(xlim = c(1500, 2000)) + 
  # draw arrows
  geom_segment(data = double.label,
               aes(x = x1, xend = x2, y = y1, yend = y2),
               # linewidth replaces size for lines in ggplot2 >= 3.4.0
               color = "firebrick", linewidth = 1,
               arrow = arrow(length = unit(0.3, "cm")))

We can also work further on the theme. For example, we can flip the mass spectra to fit our paper layout.

library(ggplot2)
tag.spectra.listplot(search.result, 
                     angle.groupname = 0, 
                     angle.annotation = 0) +
  coord_flip() +
  # note that the m/z scale is still the x axis despite the flip
  scale_x_continuous(breaks = seq(500, 3000, by = 500),
                     labels = function(x){paste(x/1000, "K m/z")}) +
  theme(axis.title = element_blank(),
        # note that the m/z scale becomes the y axis in the theme settings
        axis.text.y = element_text(face = "bold", color = "firebrick",
                                   angle = 90, vjust = -10),
        # linewidth replaces size for lines in ggplot2 >= 3.4.0
        panel.grid = element_line(linewidth = .2))

Sometimes setting the background to black gives a sharper view, and a different look as well.

tag.spectra.listplot(search.result,
                     color.groupname = "white",
                     color.divider = "grey",
                     size.divider = .1,
                     angle.groupname = 0,
                     gap.annotation = .25) +
  
  # set background to black
  theme(panel.background = element_rect(fill = "black")) +
  # add title
  annotate(geom = "text", x = 1600, y = 3.9, 
           label = "Fluorescent Spectra", 
           color = "white", fontface = "bold")

4. References

The R code was developed with reference to R for Data Science (2e), the official tidyverse documentation, and DataBrewer.co. The relevant modules are listed below:

Data visualization with ggplot2 (tutorial of the fundamentals, and data visualization gallery).

Data wrangling with the following packages: tidyr: transform (e.g., pivot) the dataset into a tidy structure; dplyr: the basic tools to work with data frames; stringr: work with strings; regular expression: search and match a string pattern; purrr: functional programming (e.g., iterating functions across elements of columns); and tibble: work with data frames in the modern tibble structure.

Updated on June 16, 2024, Boston, MA
Published on August 11, 2019, Kalamazoo, MI