Overview

In today's workshop we will be looking to:

MASCOT

MASCOT is an online tool for analyzing tandem MS (MS/MS) data from proteomic experiments. As you will see, MASCOT and other proteomic tools take MGF (Mascot Generic Format) input files. An MGF file is a plain-text file that contains the search parameters and mass spectral data from the samples you have analyzed.
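To make the MGF layout concrete, here is a minimal Python sketch of a reader for such files. It assumes the common layout in which each spectrum sits between BEGIN IONS and END IONS lines, with KEY=value parameter lines followed by "m/z intensity" peak lines. The function name parse_mgf and the example spectrum are made up for illustration and are not part of MASCOT.

```python
from io import StringIO


def parse_mgf(handle):
    """Yield one dict per spectrum from an MGF-formatted text stream."""
    spectrum = None
    for raw in handle:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";", "!", "/")):
            continue
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: m/z followed by an optional intensity.
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                spectrum["peaks"].append((mz, intensity))


# Hypothetical two-peak spectrum, invented for illustration only.
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=500.25
CHARGE=2+
175.119 1200.0
288.203 850.5
END IONS
"""

spectra = list(parse_mgf(StringIO(EXAMPLE_MGF)))
print(len(spectra), spectra[0]["params"]["CHARGE"], spectra[0]["peaks"])
```

To inspect one of the workshop files, the same function could be called on an open file handle instead of the in-memory example.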

Navigate to MASCOT here: http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS

Today, we will use MASCOT to:

MASCOT Parameters

  1. Proteolytic enzyme
    • You need to select the one used for protein digestion
    • Cleavages
      • Acidic sites: AspN (D), GluC (E), V8 (D, E)
      • Basic sites: Trypsin (R,K), Lys-C (K), Arg-C (R)
      • Chemical: CNBr (M), W oxidation
  2. Missed-cleavage
    • Allows for peptides produced by incomplete digestion; usually set to 1 or 2.
  3. Modifications
    • Fixed or variable
      • Fixed modifications are applied to every instance of the specified residue(s) or terminus. For example, selecting Carboxymethyl (C) means that all calculations will add 58 Da to the mass of cysteine.
      • Variable modifications are those which may or may not be present.
  4. Peptide and fragment mass tolerance
    • Depends on the mass spectrometer used. This will be a known instrument parameter.
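The enzyme and missed-cleavage settings above can be illustrated with a small in-silico digestion. This is a simplified sketch, not MASCOT's own algorithm: it cleaves after every K or R (the trypsin sites listed above) and ignores refinements such as the rule that trypsin often fails to cut before proline. The function name digest and the example sequence are hypothetical.

```python
def digest(sequence, cleave_after="KR", missed_cleavages=1):
    """Return peptides from cleaving after the given residues.

    Peptides containing up to `missed_cleavages` uncut sites are included.
    """
    # First cut the sequence at every cleavage site (complete digestion).
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Then join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(fragments):
                peptides.append("".join(fragments[i:i + extra + 1]))
    return peptides


# Hypothetical sequence, invented for illustration only.
print(digest("MKTAYRAGK", missed_cleavages=0))
# ['MK', 'TAYR', 'AGK']
print(digest("MKTAYRAGK", missed_cleavages=1))
# ['MK', 'MKTAYR', 'TAYR', 'TAYRAGK', 'AGK']
```

Raising the missed-cleavage setting enlarges the list of candidate peptides, which is why it is usually kept at 1 or 2.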

Confidence and quality criteria in MASCOT

Confidence Criteria

  • Number of peptide sequences: minimum of 2 required
  • Protein sequence coverage
  • Total MASCOT score and individual ion score
  • The quality of the MS/MS spectra judged by a full length y-ion series of peptides comprising at least six consecutive amino acids and no missed cleavages

False Discovery Rate (FDR)

  • With a controlled FDR, we can find an optimum between maximizing true positives and minimizing false positives and false negatives
  • FDR = # False Discoveries / # Total Discoveries
  • Common acceptance thresholds are 5% or 1%
  • Modify the significance threshold to get close to the optimum FDR
  • How do we calculate FDR?
    • To determine the number of false discoveries we search against a “Decoy” protein database that has nonsense proteins
    • Common decoy strategies: reversed database, shuffled database and randomized database.
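The decoy idea can be sketched in a few lines of Python. The example below assumes the simplest estimate, in which every decoy match above the score threshold counts as one false discovery and every target match above the threshold counts as one discovery. The function names and all scores are hypothetical, and MASCOT's own calculation may differ in detail.

```python
def reversed_decoy(sequence):
    """Build a decoy entry by reversing a protein sequence."""
    return sequence[::-1]


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as decoy matches / target matches at a score threshold."""
    targets = sum(1 for score in target_scores if score >= threshold)
    decoys = sum(1 for score in decoy_scores if score >= threshold)
    if targets == 0:
        return 0.0  # nothing was accepted, so there are no discoveries
    return decoys / targets


# Hypothetical ion scores, invented for illustration only.
target_scores = [72, 65, 58, 41, 33, 30, 25, 22]
decoy_scores = [31, 24, 19, 12]

print(reversed_decoy("PEPTIDE"))
for threshold in (20, 30, 40):
    fdr = estimate_fdr(target_scores, decoy_scores, threshold)
    print(f"threshold {threshold}: estimated FDR = {fdr:.1%}")
```

Raising the threshold lowers the estimated FDR but also discards target matches, which is the trade-off behind adjusting the significance threshold.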

Exercise 1: Identification

  1. Locate the file, “identification.mgf”
  2. Enter your name and email
  3. For this data set use
    • enzyme = trypsin
    • 1 missed cleavage
    • oxidation (M) as a variable modification
    • +/- 0.2 Da for peptide ion tolerance
    • +/- 0.2 Da fragment ion tolerance
    • monoisotopic values
    • charge state 1+, 2+, 3+
    • instrument type = ESI-Quad ToF
    • check Decoy Database
  4. For Data File, browse to the location of “identification.mgf”
  5. Click “Start Search…”

Question 1: Are there any proteins identified?

  1. Click “Re-search” near the top of the output page. This will take you back to your query with settings as you set them previously.
  2. Add “SwissProt” to the database you are searching.
  3. Click “Start Search…”

Question 2: What protein is the best hit?

  1. Click on your top search hit.

Question 3: Examine the regions of this protein that have been matched from your data. Is the entire protein detected?

Try changing some of the following:

Exercise 2: Quantifying (emPAI)

What is emPAI?

  • emPAI = exponentially modified Protein Abundance Index
  • \(emPAI = 10^{PAI}-1\)
  • \(PAI = N_{\textrm{observed peptides}}/N_{\textrm{observable peptides}}\)
  • \(\textrm{Mole Fraction of a protein} = emPAI/\sum_{i=1}^{n}emPAI_i\), where the sum runs over all \(n\) identified proteins
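The formulas above translate directly into code. The sketch below is only an illustration: the function names, protein labels, and peptide counts are hypothetical and are not taken from the workshop data.

```python
def empai(n_observed, n_observable):
    """emPAI = 10^PAI - 1, where PAI = observed / observable peptides."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


def mole_fractions(empai_by_protein):
    """Divide each protein's emPAI by the sum of emPAI over all proteins."""
    total = sum(empai_by_protein.values())
    return {name: value / total for name, value in empai_by_protein.items()}


# Hypothetical (observed, observable) peptide counts per protein.
counts = {"protein_A": (5, 10), "protein_B": (1, 20), "protein_C": (3, 12)}

empai_values = {name: empai(obs, total) for name, (obs, total) in counts.items()}
fractions = mole_fractions(empai_values)

for name in counts:
    print(f"{name}: emPAI = {empai_values[name]:.2f}, "
          f"mole fraction = {fractions[name]:.2f}")
```

Because PAI sits in the exponent, a protein whose observable peptides are all observed gets an emPAI of 9, while one with no observed peptides gets 0.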

Let’s make some measurements!

  1. Locate the file, “emPAI.mgf”
  2. For this data set use
    • enzyme = trypsin
    • 2 missed cleavages
    • oxidation (M) (variable)
    • Carbamidomethyl (C) (fixed)
    • +/- 0.7 Da for peptide ion tolerance
    • +/- 0.2 Da fragment ion tolerance
    • monoisotopic values
    • charge state 1+, 2+, 3+
    • instrument type = ESI-Quad ToF
    • check Decoy Database
  3. For Data File, browse to the location of “emPAI.mgf”
  4. For Database choose SwissProt; for Taxonomy choose Homo sapiens (human)
  5. Click “Start Search…”

Question 4: What is the top hit? What is its emPAI value?

  1. Go to hit YES_HUMAN, emPAI = 1.17; open the protein report in a new tab (by clicking on YES_HUMAN)
  2. Go to hit RPN2, emPAI = 0.13; open the protein report in a new tab
    • Notice the difference in sequence coverage and how that is reflected in the emPAI value
    • Let’s check the peptides from RPN2, since it is on the border of the confidence criteria

Question 5: What do you notice about proteins with high emPAI vs. low emPAI values with respect to coverage?

  1. Go to hit #1
  2. Go to hit #2
    • Notice the shared peptides between them.
    • Notice that they both have uniquely identifying peptides
    • Which protein is really present?
    • Are both present?
    • How do we know which peptides go with which isoform?

This is the “Protein Inference Problem”: how do we infer the identity or quantity of a protein present in a complex mixture when some peptides are shared among different proteins or protein forms?
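A first step toward reasoning about this problem is to separate peptides that map to a single protein from those shared by several. The Python sketch below does only that bookkeeping; it is not a full protein inference algorithm, and the isoform names and peptide sequences are hypothetical.

```python
from collections import defaultdict


def classify_peptides(protein_to_peptides):
    """Split peptides into those unique to one protein and those shared."""
    # Invert the mapping: which proteins could each peptide come from?
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins[peptide].add(protein)

    unique = {protein: set() for protein in protein_to_peptides}
    shared = set()
    for peptide, proteins in peptide_to_proteins.items():
        if len(proteins) == 1:
            unique[next(iter(proteins))].add(peptide)
        else:
            shared.add(peptide)
    return unique, shared


# Hypothetical isoforms and peptides, invented for illustration only.
hits = {
    "isoform_1": {"AAK", "CCR", "DDK"},
    "isoform_2": {"AAK", "CCR", "EEK"},
}

unique, shared = classify_peptides(hits)
print("unique:", {protein: sorted(peps) for protein, peps in unique.items()})
print("shared:", sorted(shared))
```

In this made-up case each isoform has one uniquely identifying peptide, so both are supported by the data; the shared peptides alone could not distinguish them.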

Exercise 3: Differential abundance (iTRAQ)

  1. Locate the file, “iTRAQ.mgf”
  2. Search against SwissProt with
    • Drosophila as the taxonomy filter
    • iTRAQ 4-plex quantitation
    • +/- 0.7 Da peptide/fragment ion tolerances
    • 3 missed cleavages
    • charge state 1+, 2+, 3+
    • fixed methylthio (C)
    • variable oxidation (M)
    • Check Decoy
    • monoisotopic values
  3. When the search is complete, filter by unique only, select automatic outlier removal (in the new yellow box), then press “Format As”.
  4. Go to hit PDI_DROME
  5. Notice the variation for K.SVFEGELNEENLK.K
  6. Notice the general trend for all peptides and proteins

Question 6: Consider VIT3_DROME. Examining the variation in iTRAQ ratios of the peptides detected, which sample (115, 116, or 117) is creating problems in analysis? Why?
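One way to put a number on this kind of variation is to compute each reporter channel's ratio to a reference channel for every peptide, and then the coefficient of variation (standard deviation divided by mean) of those ratios across peptides. The sketch below assumes 114 as the reference channel. The function names and all intensities are invented for illustration; they are not the workshop data and do not hint at the answer.

```python
from statistics import mean, stdev


def channel_ratios(reporters, reference="114"):
    """Return each channel's intensity relative to the reference channel."""
    ref = reporters[reference]
    return {ch: value / ref for ch, value in reporters.items() if ch != reference}


def ratio_cv(peptide_reporters, reference="114"):
    """Coefficient of variation of each channel's ratio across peptides."""
    per_channel = {}
    for reporters in peptide_reporters:
        for ch, ratio in channel_ratios(reporters, reference).items():
            per_channel.setdefault(ch, []).append(ratio)
    return {ch: stdev(ratios) / mean(ratios) for ch, ratios in per_channel.items()}


# Hypothetical reporter-ion intensities for three peptides of one protein.
peptides = [
    {"114": 1000.0, "115": 1100.0, "116": 900.0, "117": 1000.0},
    {"114": 800.0, "115": 800.0, "116": 760.0, "117": 880.0},
    {"114": 1200.0, "115": 1380.0, "116": 1020.0, "117": 1140.0},
]

cv = ratio_cv(peptides)
for channel in sorted(cv):
    print(f"{channel}/114: CV = {cv[channel]:.1%}")
```

A channel whose ratios disagree strongly between peptides of the same protein shows up with a large coefficient of variation.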

Important considerations: