Overview

In today's workshop we will:

Use online tools to analyze proteomic data
Interpret output related to proteomic analytical methods

MASCOT

MASCOT is an online tool for analyzing tandem MS (or MS/MS) data in proteomic analysis. As you will see, MASCOT and other proteomic tools take MGF (Mascot Generic Format) input files. This file type contains the parameters and mass spectral data from the samples you have analyzed.
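To make the MGF format more concrete, here is a minimal Python sketch that reads MGF text into spectra. It assumes the common layout in which each spectrum sits between `BEGIN IONS` and `END IONS`, header fields are `KEY=value` lines, and peaks are whitespace-separated m/z and intensity pairs. The `EXAMPLE_MGF` text and the `parse_mgf` helper are made up for illustration and are not taken from the workshop files.

```python
from typing import Dict, List

# A tiny, made-up MGF fragment (illustrative only, not from the workshop data).
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum 1
PEPMASS=500.25 12000
CHARGE=2+
100.1 50
200.2 150.5
END IONS
BEGIN IONS
TITLE=example spectrum 2
PEPMASS=650.3
CHARGE=3+
300.3 75
END IONS
"""


def parse_mgf(text: str) -> List[Dict]:
    """Split MGF text into spectra with header fields and (m/z, intensity) peaks."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


spectra = parse_mgf(EXAMPLE_MGF)
for s in spectra:
    print(s["params"]["TITLE"], s["params"]["PEPMASS"], len(s["peaks"]))
```

Opening one of the workshop `.mgf` files in a text editor should show the same kind of blocks, one per MS/MS spectrum.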

Navigate to MASCOT here: http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS

Today, we will use MASCOT to:

identify proteins in a sample
quantify proteins in a mixture (emPAI)
determine differential abundance of proteins (iTRAQ)

MASCOT Parameters

Proteolytic enzyme
- You need to select the one used for protein digestion
- Cleavages
  - Acidic sites: AspN (D), GluC (E), V8 (D, E)
  - Basic sites: Trypsin (R,K), Lys-C (K), Arg-C (R)
  - Chemical: CNBr (M), W oxidation
Missed-cleavage
- Accounts for peptides produced by incomplete digestion; usually set to 1 or 2.
Modifications
- Fixed or variable
  - Fixed modifications are applied to every instance of the specified residue(s) or terminus. For example, selecting Carboxymethyl (C) means that all calculations will add 58 Da to the mass of cysteine.
  - Variable modifications are those which may or may not be present.
Peptide and fragment mass tolerance
- Depends on the mass spectrometer used. This will be a known instrument parameter.
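The enzyme and missed-cleavage settings above can be illustrated with a small in silico digest. This is only a sketch: it assumes trypsin cuts on the C-terminal side of K or R except when the next residue is P (a commonly used rule), and the `cleavage_sites` and `digest` helpers and the example sequence are hypothetical, not part of MASCOT.

```python
from typing import List


def cleavage_sites(protein: str, residues: str = "KR", no_cut_before: str = "P") -> List[int]:
    """Positions after which the enzyme cuts (C-terminal side of the given residues)."""
    sites = []
    for i, aa in enumerate(protein[:-1]):
        if aa in residues and protein[i + 1] not in no_cut_before:
            sites.append(i + 1)
    return sites


def digest(protein: str, missed_cleavages: int = 1) -> List[str]:
    """Return all peptides containing up to `missed_cleavages` uncut sites."""
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for skipped in range(missed_cleavages + 1):
            end = start + 1 + skipped
            if end >= len(bounds):
                break
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


sequence = "AGKLLRPEKTTRSS"  # made-up sequence for illustration
print(digest(sequence, missed_cleavages=0))
print(digest(sequence, missed_cleavages=1))
```

Each extra missed cleavage adds longer candidate peptides to the search, which is why raising this setting enlarges the search space.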

Confidence and quality criteria in MASCOT

Confidence Criteria

Number of peptide sequences: minimum of 2 required
Protein sequence coverage
Total MASCOT score and individual ion score
The quality of the MS/MS spectra judged by a full length y-ion series of peptides comprising at least six consecutive amino acids and no missed cleavages

False Discovery Rate (FDR)

Using a controlled FDR, we can find an optimum between maximizing true positives and minimizing false positives and false negatives
FDR = (number of false discoveries) / (number of total discoveries)
Common acceptance thresholds are 5% or 1%
Modify the significance threshold to get close to the optimum FDR
How do we calculate FDR?
- To determine the number of false discoveries we search against a “Decoy” protein database that has nonsense proteins
- Common decoy strategies: reversed database, shuffled database and randomized database.
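The FDR calculation above can be sketched in a few lines of Python. This sketch assumes the target and decoy databases are searched separately, counts decoy matches above a score threshold as the estimate of false discoveries, and counts target matches above the same threshold as total discoveries. The scores are made up, and `estimate_fdr` and `lowest_threshold_for_fdr` are hypothetical helpers, not MASCOT functions.

```python
from typing import List, Optional


def estimate_fdr(target_scores: List[float], decoy_scores: List[float], threshold: float) -> float:
    """Estimated FDR = decoy matches / target matches at or above the threshold."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(
    target_scores: List[float], decoy_scores: List[float], max_fdr: float
) -> Optional[float]:
    """Scan observed scores in ascending order and return the first threshold meeting max_fdr."""
    for t in sorted(set(target_scores) | set(decoy_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Made-up ion scores for illustration only.
target_scores = [80, 72, 65, 60, 55, 50, 45, 40, 35, 30]
decoy_scores = [42, 33, 28, 20, 15]

for threshold in (30, 35, 45):
    print(threshold, estimate_fdr(target_scores, decoy_scores, threshold))
print(lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.05))
```

Raising the threshold removes decoy matches but also discards some target matches, which is the trade-off behind adjusting the significance threshold.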

Exercise 1: Identification

Locate the file, “identification.mgf”
Enter your name and email
For this data set use
- enzyme = trypsin
- 1 missed cleavage
- oxidation (M) as a variable modification
- +/- 0.2 Da for peptide ion tolerance
- +/- 0.2 Da fragment ion tolerance
- monoisotopic values
- charge state 1+, 2+, 3+
- instrument type = ESI-Quad ToF
- check Decoy Database
For Data File, browse to the location of “identification.mgf”
Click “Start Search…”

Question 1: Are there any proteins identified?

Click “Re-search” near the top of the output page. This will take you back to your query with settings as you set them previously.
Add “SwissProt” to the database you are searching.
Click “Start Search…”

Question 2: What protein is the best hit?

Click on your top search hit.

Question 3: Examine the regions of this protein that have been matched from your data. Is the entire protein detected?

Try changing some of the following:

Peptide and fragment ion tolerances
Database/taxonomy filter
Missed cleavages
Modifications
Try loosening and tightening the mass tolerances
Try decreasing the search space by adding a taxonomy filter and/or choosing a different database
Now let’s see if we can match more peptides to proteins by adding possible modifications or missed cleavages

Exercise 2: Quantifying (emPAI)

What is emPAI?

emPAI = exponentially modified Protein Abundance Index
\(emPAI = 10^{PAI}-1\)
\(PAI = N_{\textrm{observed peptides}}/N_{\textrm{observable peptides}}\)
\(\textrm{Mole fraction of a protein} = emPAI/\sum_{i=1}^{n}emPAI_i\)
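As a worked illustration of these formulas, the following Python sketch computes emPAI and mole fractions for three hypothetical proteins. The peptide counts are invented for the example and do not come from the workshop data; `empai` and `mole_fractions` are illustrative helper names.

```python
from typing import Dict


def empai(n_observed: int, n_observable: int) -> float:
    """emPAI = 10^PAI - 1, where PAI = observed peptides / observable peptides."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


def mole_fractions(empai_by_protein: Dict[str, float]) -> Dict[str, float]:
    """Mole fraction of each protein = its emPAI divided by the sum of all emPAI values."""
    total = sum(empai_by_protein.values())
    return {name: value / total for name, value in empai_by_protein.items()}


# Hypothetical (observed, observable) peptide counts.
counts = {"protein_A": (5, 10), "protein_B": (1, 20), "protein_C": (3, 15)}

empai_values = {name: empai(obs, possible) for name, (obs, possible) in counts.items()}
fractions = mole_fractions(empai_values)

for name in counts:
    print(f"{name}: emPAI = {empai_values[name]:.3f}, mole fraction = {fractions[name]:.3f}")
```

Because the index is exponential in PAI, a protein with half of its observable peptides detected scores far higher than one with a twentieth of them detected.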

Let’s make some measurements!

Locate the file, “emPAI.mgf”
For this data set use
- enzyme = trypsin
- 2 missed cleavages
- oxidation (M) (variable)
- Carbamidomethyl (C) (fixed)
- +/- 0.7 Da for peptide ion tolerance
- +/- 0.2 Da fragment ion tolerance
- monoisotopic values
- charge state 1+, 2+, 3+
- instrument type = ESI-Quad ToF
- check Decoy Database
For Data File, browse to the location of “emPAI.mgf”
For Database choose SwissProt; for Taxonomy choose human
Click “Start Search…”

Question 4: What is the top hit? What is its emPAI value?

Go to hit YES_HUMAN, emPAI = 1.17; open the protein report in a new tab (by clicking on YES_HUMAN)
Go to hit RPN2, emPAI = 0.13; open the protein report in a new tab
- Notice the difference in sequence coverage and how that is reflected in the emPAI value
- Let’s check the peptides from RPN2, since it is on the border of the confidence criteria

Question 5: What do you notice about proteins with high emPAI vs. low emPAI values with respect to coverage?

Go to hit #1
Go to hit #2
- Notice the shared peptides between them.
- Notice that they both have uniquely identifying peptides
- Which protein is really present?
- Are both present?
- How do we know which peptides go with which isoform?

This is the “Protein Inference Problem”: how do we infer the identity or quantity of a protein present in a complex mixture when some peptides are shared among different proteins or protein forms?
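One way to see the problem is to separate the peptides that point to a single protein from those shared between candidates. The sketch below uses invented isoform names and peptide strings; it only illustrates the bookkeeping and is not how MASCOT groups proteins.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical identified peptides per candidate protein (made-up sequences).
protein_peptides: Dict[str, Set[str]] = {
    "isoform_1": {"LVNELTEFAK", "SHAREDPEPK", "AEFVEVTK"},
    "isoform_2": {"SHAREDPEPK", "AEFVEVTK", "QTALVELLK"},
}

# Invert the mapping: which proteins could each peptide have come from?
peptide_to_proteins: Dict[str, Set[str]] = defaultdict(set)
for protein, peptides in protein_peptides.items():
    for peptide in peptides:
        peptide_to_proteins[peptide].add(protein)

# Shared peptides match more than one protein; unique peptides match exactly one.
shared = {p for p, prots in peptide_to_proteins.items() if len(prots) > 1}
unique = {
    protein: {p for p in peptides if len(peptide_to_proteins[p]) == 1}
    for protein, peptides in protein_peptides.items()
}

print("shared:", sorted(shared))
for protein, peptides in unique.items():
    print(protein, "unique:", sorted(peptides))
```

When each candidate has at least one unique peptide, as here, both are supported by the data; the shared peptides alone cannot say how much signal belongs to each.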

Exercise 3: Differential abundance (iTRAQ)

Locate the file, “iTRAQ.mgf”
Search against SwissProt with
- Drosophila as the taxonomy filter
- iTRAQ 4-plex quantitation
- +/- 0.7 Da peptide/fragment ion tolerances
- 3 missed cleavages
- +1, +2 and +3 charge states
- fixed methylthio (C)
- variable oxidation (M)
- Check Decoy
- monoisotopic values
When the search is complete, filter by unique only, select automatic outlier removal (in the new yellow box), then press “Format As”.
Go to hit PDI_DROME
Notice the variation for K.SVFEGELNEENLK.K
Notice the general trend for all peptides and proteins

Question 6: Consider VIT3_DROME. Examining the variation in iTRAQ ratios of the peptides detected, which sample (115, 116, or 117) is creating problems in analysis? Why?

Important considerations:

There are variations in the measurements
There is a problem with interference
This is an example where we are asking many questions at once knowing there will be many “fuzzy” answers
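To make the idea of variation in iTRAQ ratios concrete, here is a small Python sketch that computes per-peptide ratios against the 114 channel and summarizes each channel by its median and coefficient of variation. The reporter intensities are invented, the choice of 114 as the reference is an assumption, and this is not the calculation MASCOT itself performs.

```python
from statistics import mean, median, stdev
from typing import Dict, List

# Made-up reporter ion intensities for three peptides of one protein.
peptide_reporters: List[Dict[str, float]] = [
    {"114": 1000.0, "115": 1000.0, "116": 1200.0, "117": 500.0},
    {"114": 2000.0, "115": 2100.0, "116": 2300.0, "117": 4000.0},
    {"114": 1500.0, "115": 1450.0, "116": 1800.0, "117": 1500.0},
]


def channel_ratios(peptides: List[Dict[str, float]], reference: str = "114") -> Dict[str, List[float]]:
    """Per-peptide ratio of each reporter channel to the reference channel."""
    ratios: Dict[str, List[float]] = {}
    for channel in peptides[0]:
        if channel == reference:
            continue
        ratios[channel] = [p[channel] / p[reference] for p in peptides]
    return ratios


def summarize(ratios: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Median ratio and coefficient of variation (stdev / mean) for each channel."""
    return {
        channel: {"median": median(values), "cv": stdev(values) / mean(values)}
        for channel, values in ratios.items()
    }


summary = summarize(channel_ratios(peptide_reporters))
for channel, stats in summary.items():
    print(f"{channel}/114: median = {stats['median']:.2f}, CV = {stats['cv']:.2f}")

most_variable = max(summary, key=lambda ch: summary[ch]["cv"])
print("most variable channel:", most_variable)
```

A channel whose peptide ratios disagree strongly with each other, like 117 in this invented data, is the kind of pattern to look for when answering Question 6.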

Module 9 Lab Exercises

Lon Chubiz

2023-11-09