Guide to protag: Search Tagged Peptides & Draw Highlighted Mass Spectra
In this tutorial, I’ll illustrate how to use the protag package to determine the peptide N-terminus by analyzing MALDI-TOF MS data.
1. Background
In a typical protein labelling experiment, proteins are chemically tagged with a functional group, usually at specific sites, and then digested into peptides. The peptides are analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to generate a peptide fingerprint. Relative to the control, peptides that are heavier by the mass of the labelling group are informative for sequence determination.
This package aims to facilitate the search for such tagged peptides with expected mass shift(s). It takes two or more MALDI-TOF MS mass lists as input, makes pairwise comparisons between each labeled group and the control, and reconstructs centroid mass spectra with the peaks of interest highlighted for easier visual examination.
2. Basic setting
2.1. Input dataset
The input dataset must contain at least two columns, named “group” and “mass”. The “group” column contains the names of the experiments, and one of its levels must be named “control”.
If an “intensity” column is also included, mass spectra will be drawn based on the given intensities; if not, all peaks will be drawn with equal intensity. The input dataset may contain any other columns.
We’ll use myoglobin, a dataset of simulated MALDI-TOF peptide fingerprint mass lists, to demonstrate the use of this package. This dataset is included in the protag package.
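To make the input requirements concrete, here is a minimal sketch of a dataset with the expected layout. The data frame toy.masslist and all of its values are made up for illustration; they are not taken from the myoglobin dataset.

```r
# Hypothetical toy mass list; all values are invented for illustration.
toy.masslist = data.frame(
  group = c("control", "control", "label1", "label1"),
  mass = c(1000.5, 1500.8, 1028.5, 1500.8),
  intensity = c(800, 1200, 650, 1100)  # optional column
)

# Check the two requirements: "group" and "mass" columns are present,
# and one of the groups is named "control".
required.columns = c("group", "mass")
has.required.columns = all(required.columns %in% names(toy.masslist))
has.control.level = "control" %in% toy.masslist$group
```

Both checks should evaluate to TRUE for a dataset that is ready to be searched.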
2.2. Peak searching
With the tag.search() function, the input dataset is augmented with additional columns containing the search results. This output dataset per se may be of less direct interest to end users; instead, it is fed into the plotting functions to draw highlighted mass spectra.
library(protag)

search.result = tag.search(
  myoglobin,
  # specify expected mass shift (Dalton)
  delta = 28,
  # error tolerance, default to Dalton control at 0.5
  error.Da.pair = .5,
  error.Da.match = .5
)
search.result
[[1]]
# A tibble: 195 × 14
# Groups: group [4]
mass position no.MC peptide.sequence group err.ppm err intensity status
<dbl> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>
1 2860. 18-43 1 VEADIAGHGQEVLIRL… cont… 125. 0.358 2791. misma…
2 2232. 120-140 1 HPGDFGADAQGAMTKA… cont… 128. -0.287 2334. misma…
3 736. 98-103 1 HKIPIK cont… 61.4 0.0451 1411. misma…
4 2844. 120-146 2 HPGDFGADAQGAMTKA… cont… 101. -0.286 3509. misma…
5 1362. 47-57 2 FKHLKTEAEMK cont… 105. 0.143 632. misma…
6 1155. <NA> NA <NA> cont… 116. 0.134 902. misma…
7 1045. <NA> NA <NA> cont… 111. -0.116 1247. misma…
8 1882. <NA> NA <NA> cont… 102. 0.193 804. misma…
9 1897. <NA> NA <NA> cont… 102. -0.193 966. misma…
10 1751. <NA> NA <NA> cont… 114. 0.199 585. misma…
# ℹ 185 more rows
# ℹ 5 more variables: pair.tracker <dbl>, intensity.scaled <dbl>,
# spectra.dividor <dbl>, `intensity.scaled.+.-` <dbl>, sign <dbl>
[[2]]
[1] "Found both paired peaks (mass differentiate by expected delta) and matched peaks (of the same mass)."
2.3. Plotting
In the mass spectra drawing, three types of peaks will/can be highlighted, differentiated and annotated:
pair: peaks of tagged peptides with expected mass shift(s) (i.e., the delta input) and the corresponding control. Same paired peaks will be drawn in the same color, and different pairs in different colors.
match: peaks with the same mass as the control. All matched peaks will be drawn in a single color, default grey, regardless of their masses, as these peaks are often of less research interest.
mismatch: neither of the prior two cases. All mismatched peptides will be drawn in a single color, default grey, as thinner and more transparent peaks, as they are usually the least important peaks of interest.
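The three peak types can be illustrated with a small base R sketch. This is not protag’s implementation: the function classify.peak, the toy masses, and the rule that a pair takes priority over a match are all assumptions made for illustration.

```r
# Hypothetical helper, not part of protag: label each peak of a labeled
# group as "pair", "match" or "mismatch" relative to the control masses.
classify.peak = function(labeled.mass, control.mass, delta, tolerance = 0.5) {
  vapply(labeled.mass, function(m) {
    # pair: some control peak plus the expected shift lands on this mass
    is.pair = any(abs(m - (control.mass + delta)) <= tolerance)
    # match: some control peak has the same mass
    is.match = any(abs(m - control.mass) <= tolerance)
    if (is.pair) "pair" else if (is.match) "match" else "mismatch"
  }, character(1))
}

control.mass = c(1000.5, 1500.8)        # made-up control peaks
labeled.mass = c(1028.6, 1500.9, 1777)  # made-up labeled peaks
status = classify.peak(labeled.mass, control.mass, delta = 28)
```

Here the first labeled peak is 28 Da heavier than a control peak (pair), the second coincides with a control peak (match), and the third is neither (mismatch).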
2.3.1. listplot
This display format presents spectra in a listed manner, i.e., one spectrum on top of another. It is suitable for comparison of multiple spectra.
tag.spectra.listplot(search.result)
2.3.2. butterfly / mirrored plot
This is especially designed for comparison of two mass spectra to be displayed in a mirrored / butterfly manner. This plot gives higher annotation clarity than listplot. The general setting of butterflyplot is mostly the same as listplot.
# create subset containing only two groups
myoglobin.subset = myoglobin[myoglobin$group %in% c("control", "label1"), ]

search.result = tag.search(
  myoglobin.subset,
  delta = 28,
  error.Da.pair = .5,
  error.Da.match = .5
)

tag.spectra.butterflyplot(search.result)
In this section, we’ll see how to fine-tune the mass search criteria and the display of the mass spectra.
3.1. Peak search
3.1.1. Stepwise / multiple labelling
When multiple steps of labelling are of interest, simply supply the delta argument as a numeric vector. Stepwise-labelled paired peaks derived from the same control will be highlighted in the same color. See the example below, and note the stepwise double labelling of peaks of the 1729 series (blue-green), 1894 series (blue) and 1937 series (pink).
search.result = tag.search(
  myoglobin,
  delta = c(28, 56),
  error.Da.pair = .5,
  error.Da.match = .5
)

tag.spectra.listplot(search.result) +
  # zoom in over interested mass range
  ggplot2::coord_cartesian(xlim = c(1700, 2000))
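As a quick worked example of what a vector of mass shifts implies, the sketch below computes the masses at which singly and doubly labelled peaks would be expected for a control peak at 1729. The variable names are only illustrative.

```r
control.mass = 1729  # control peak of the 1729 series
delta = c(28, 56)    # single and double labelling (Dalton)

# Expected masses of the stepwise-labelled peaks: 1757 and 1785
expected.mass = control.mass + delta
```

Peaks found near these masses in a labeled group, within the error tolerance, belong to the same series as the 1729 control peak.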
The mass error tolerance is by default specified in Dalton, at 0.5, for both the paired and the matched peak search. When the error is proportional to the mass magnitude, specifying the tolerance in ppm can be helpful. When both Dalton and ppm tolerances are in effect, the tolerance at each mass measurement is whichever of the two is more stringent. Example code follows (output not shown).
search.result = tag.search(
  myoglobin,
  delta = c(28, 56),
  # turn off error control by dalton by setting infinite Da error tolerance
  error.Da.pair = Inf,
  error.Da.match = Inf,
  # set up error control by ppm
  error.ppm.pair = 200,
  error.ppm.match = 200
)
search.result
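The interplay of the two tolerances can be sketched in base R. The helper effective.tolerance and its argument names are hypothetical and only illustrate the "most stringent wins" rule described above; they are not protag’s internals.

```r
# Hypothetical helper, not part of protag: combine a Dalton tolerance and a
# ppm tolerance into the effective tolerance (in Dalton) at each mass.
effective.tolerance = function(mass, tolerance.Da = 0.5, tolerance.ppm = 200) {
  tolerance.from.ppm = mass * tolerance.ppm / 1e6  # convert ppm to Dalton
  pmin(tolerance.Da, tolerance.from.ppm)           # keep the stricter limit
}

mass = c(1000, 2500, 5000)

# Both limits active: 0.2, 0.5, 0.5 Dalton
tolerance.both = effective.tolerance(mass)

# Dalton limit switched off with Inf, so only ppm applies: 0.2, 0.5, 1.0 Dalton
tolerance.ppm.only = effective.tolerance(mass, tolerance.Da = Inf)
```

The same conversion read backwards gives the error in ppm as error / mass * 1e6. For instance, an error of 0.358 Da at a mass of about 2860 corresponds to roughly 125 ppm, which is consistent with the err and err.ppm columns in the first row of the search output shown earlier.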
3.2. Plotting
3.2.1. Peak scale transform
Sometimes it is helpful to log-transform the intensities to make weak peaks more visible. This does not affect the peak search result; it only changes the final display.
We can also select which types of peaks and mass annotations to show for a clearer view (now switching back to the normal scale without log transformation). Peaks and annotations can be controlled separately.
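To see why a log transform helps weak peaks, consider the small sketch below. It uses made-up intensities and assumes a base-10 logarithm; the exact transform applied by protag is not stated here and may differ.

```r
# Made-up intensities spanning two orders of magnitude.
intensity = c(100, 1000, 10000)
intensity.log = log10(intensity)  # 2, 3, 4

# Ratio of the strongest to the weakest peak, before and after the transform.
ratio.raw = max(intensity) / min(intensity)          # 100
ratio.log = max(intensity.log) / min(intensity.log)  # 2
```

On the raw scale the weakest peak is 1% of the height of the strongest; on the log scale it is half as tall, so it remains clearly visible in the plot.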
tag.spectra.listplot(
  search.result,
  # adjust mass annotation space
  peak.height.shrink = .4,
  gap.annotation = .3,
  # do not show matched and mismatched peaks
  show.peak.match = FALSE,
  show.peak.mismatch = FALSE,
  # do not show mass annotations for "matched" peaks;
  # annotations for mismatched peaks are not shown by default
  show.annotation.match = FALSE
)
3.2.3. Annotation space control
Mass annotations can be crowded, as shown above, with much overlap between peaks, annotations and central dividers. We can “shrink” the peaks to make more space for annotations, and increase the gap between peaks and annotations to avoid overlap.
tag.spectra.listplot(
  search.result.logscale,
  # the smaller the shrink factor, the more the peaks are shrunk;
  # the shrink argument applies only to listplot, as butterflyplot does not need it
  peak.height.shrink = .4,
  # adjust gap between mass annotations and peaks
  gap.annotation = .3
)
Peak and annotation colors can be easily changed. Attach the RColorBrewer package to change the colors of paired peaks. Matched and mismatched peaks can each be set to only a single color, as these peaks are usually of less research interest. For best clarity, peaks and their corresponding annotations always share the same color. Argument names are self-explanatory.
When using RColorBrewer palettes, each run randomly shuffles the peak and annotation colors drawn from the designated palette. If adjacent peaks receive similar colors in one run, rerunning the code reshuffles them and can make the peaks easier to tell apart.
library(RColorBrewer)

tag.spectra.listplot(
  search.result,
  # color based on palettes of RColorBrewer
  # "PuRd", purple-red, further finely divided into different gradients
  color.pair = "PuRd",
  # monocolors for "matched" and "mismatched" peaks
  color.match = "steelblue",
  color.mismatch = "dark green"
)
3.2.5. Size and transparency
Similarly, peak width and transparency can be easily changed. Let’s make the matched and mismatched peaks more visible by applying a broader peak width and more color opacity, and make the annotations for the paired peaks a bit bigger.
tag.spectra.listplot(
  search.result,
  color.pair = "PuRd",
  color.match = "Steelblue",
  color.mismatch = "dark green",
  # peak & annotation transparency for mismatched peaks
  alpha.peak.mismatch = .8,
  # peak width for matched peaks
  size.peak.match = 2,
  # mass annotation size for paired peaks
  size.annotation.pair = 5
)
3.2.6. Groupname position
The groupname position can be adjusted as below.
tag.spectra.listplot(
  search.result,
  angle.groupname = 0,
  # negative gap values typically [-1, 0) shift groupnames to the right;
  # positive gap values shift groupnames to the left
  gap.groupname = -.8,
  # label size
  size.groupname = 7
)
3.2.7. Customization with ggplot2
When ggplot2 is attached, more flexible annotation becomes possible. For example, we can zoom in on the mass range of interest; for stepwise labelling, we can draw arrows to highlight the “paired” peaks.
## Attach ggplot2 for more plot customization
library(ggplot2)
## For data wrangling, to make the double.label dataset
## Particularly to use functions "mutate()" and pipeline "%>%"
library(dplyr)

## Create dataset of the arrow coordinates
double.label =
  # arrow starting point coordinates
  data.frame(x1 = c(1729, 1757, 1785, 1757, 1785),
             y1 = c(.8, 2.8, 2.8, 3.8, 3.8)) %>%
  # arrow ending point coordinates
  mutate(x2 = x1, y2 = y1 - 0.1)

## Plot adding arrows
tag.spectra.listplot(search.result) +
  # zoom in over interested mass range
  coord_cartesian(xlim = c(1500, 2000)) +
  # draw arrows
  geom_segment(data = double.label,
               aes(x = x1, xend = x2, y = y1, yend = y2),
               color = "firebrick", size = 1,
               arrow = arrow(length = unit(0.3, "cm")))
We can also work more on the theme. For example, we can flip over the mass spectra to fit into our paper layout.
library(ggplot2)

tag.spectra.listplot(search.result, angle.groupname = 0, angle.annotation = 0) +
  coord_flip() +
  scale_x_continuous(breaks = seq(500, 3000, by = 500),
                     labels = function(x) {paste(x/1000, "K m/z")}) +
  # note that m/z scale is still the x axis despite the flip
  theme(axis.title = element_blank(),
        # note here that m/z scale becomes y axis in theme setting
        axis.text.y = element_text(face = "bold", color = "firebrick",
                                   angle = 90, vjust = -10),
        panel.grid = element_line(size = .2))
Sometimes setting the background to black gives a sharper view - and a different taste as well.
tag.spectra.listplot(search.result,
                     color.groupname = "white",
                     color.divider = "grey",
                     size.divider = .1,
                     angle.groupname = 0,
                     gap.annotation = .25) +
  # set background to black
  theme(panel.background = element_rect(fill = "black")) +
  # add title
  annotate(geom = "text", x = 1600, y = 3.9, label = "Fluorescent Spectra",
           color = "white", fontface = "bold")
Updated on June 16, 2024, Boston, MA
Published on August 11, 2019, Kalamazoo, MI