This vignette shows the functionality available in this package and gives a minimal demo.
# Load the package
library(MassSpectrometry)

mzXML files

mzXML is a common mass spectrometry data format. The example below reads two mzXML files shipped with the msdata package.
mzFns <- system.file(c("threonine/threonine_i2_e35_pH_tree.mzXML",
"lockmass/LockMass_test.mzXML"),
package = "msdata")
## Read all the mzXML data
allIntensities <- readMZ(mzFns)
#> Warning in readMZ(mzFns): Using different scanCounts number in files!
## Read the peak data within the given ranges
starts <- c(50, 70, 80)
ends <- c(55, 75, 85)
rangedIntensities <- readMZ(mzFns, starts=starts, ends=ends)
#> Warning in readMZ(mzFns, starts = starts, ends = ends): Using different
#> scanCounts number in files!

normaliseSpectrum() normalises a spectrum with one of three methods: “sum”, “max” and “unit”.
Rasmussen and Isenhour (1979) compared these three normalisation methods and found that all the normalisation and search methods gave similar results, although the “sum” method appeared to be the best.
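To make the three methods concrete, here is a minimal base R sketch that reproduces the outputs printed for normaliseSpectrum() below: “sum” divides by the total intensity, “max” by the largest peak, and “unit” by the Euclidean norm. The helper normaliseSketch() is hypothetical, is not part of the package, and was inferred from the printed results rather than from the package source.

```r
# Hypothetical helper (not part of MassSpectrometry): a base R sketch of the
# three normalisation methods, inferred from the normaliseSpectrum() output.
normaliseSketch <- function(x, method = c("sum", "max", "unit")) {
  method <- match.arg(method)
  switch(method,
         sum  = x / sum(x),          # intensities add up to 1
         max  = x / max(x),          # base peak scaled to 1
         unit = x / sqrt(sum(x^2)))  # vector has Euclidean length 1
}

x <- c(50, 100, 10, 200)
normaliseSketch(x, "sum")
normaliseSketch(x, "max")
normaliseSketch(x, "unit")
```

The “unit” method is the one that places a spectrum on the unit sphere used by the geometric distance metric.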
x <- c(50,100,10,200)
normaliseSpectrum(x, method="sum")
#> [1] 0.13888889 0.27777778 0.02777778 0.55555556
normaliseSpectrum(x, method="max")
#> [1] 0.25 0.50 0.05 1.00
normaliseSpectrum(x, method="unit")
#> [1] 0.21801036 0.43602072 0.04360207 0.87204144

geometricMF() measures the similarity of two spectra through the geometric distance between them. Each spectrum is first normalised to unit length, so that it can be treated as a single point on a sphere of unit radius in an \(n\)-dimensional space, where \(n\) is the number of components of the vector. The more similar two spectra are, the smaller the geometric distance between them (Alfassi 2004).
The function returns the reciprocal of one plus the squared geometric distance as a similarity score. A score of 1 means a perfect match; by the formula below, two spectra with non-negative intensities and no peaks in common score 1/3, the lowest possible value. Because both normalised vectors have unit length, the squared distance equals \(2(1 - \cos\theta)\), so the score is a monotonic function of the cosine similarity \(\cos\theta\).
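\[ MF_g = \frac{1}{1 + \sum_{i=1}^{n} \left( \frac{u_i}{\sqrt{\sum_{j} u_j^2}} - \frac{s_i}{\sqrt{\sum_{j} s_j^2}} \right)^2} \]

where \(u_i\) and \(s_i\) are the intensities of the \(i\)-th component of the two spectra.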
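The formula can be checked with a short base R sketch. geometricSketch() and cosineSim() are hypothetical helpers written from the formula above, not the package's geometricMF() source. The sketch also illustrates the link to cosine similarity: for unit vectors the squared distance is \(2(1 - \cos\theta)\).

```r
# Hypothetical re-implementation of MF_g written from the formula above;
# it is not the package's geometricMF() code.
geometricSketch <- function(u, s) {
  uNorm <- u / sqrt(sum(u^2))  # project spectrum u onto the unit sphere
  sNorm <- s / sqrt(sum(s^2))  # project spectrum s onto the unit sphere
  1 / (1 + sum((uNorm - sNorm)^2))
}

# Cosine similarity of the two raw intensity vectors
cosineSim <- function(u, s) sum(u * s) / sqrt(sum(u^2) * sum(s^2))

a <- c(1, 10, 5, 8)
b <- c(2, 10, 5, 8)
geometricSketch(a, b)

# Same score computed from the cosine similarity,
# using squared distance = 2 * (1 - cos) for unit vectors
1 / (1 + 2 * (1 - cosineSim(a, b)))

# Spectra with no shared peaks (orthogonal vectors) give the minimum, 1/3
geometricSketch(c(1, 0), c(0, 1))
```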
a <- c(1, 10, 5, 8)
b <- c(2, 10, 5, 8)
c <- c(1, 10, 5, 9)
geometricMF(a, b)
#> [1] 0.9948658
geometricMF(a, c)
#> [1] 0.996804
geometricMF(b, c)
#> [1] 0.9912964

Given a set of x values (the “mass” or “mass/charge”) and y values (the peak “intensities”), generatePseudoGaussianSpectrum() generates a series of points for plotting a smooth, Gaussian-shaped spectrum.
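The idea behind such a spectrum can be sketched in base R. The following are assumptions and may differ from what generatePseudoGaussianSpectrum() actually does: each peak is drawn as a Gaussian curve centred at its m/z with standard deviation sd and an apex equal to its intensity, overlapping curves are summed, and the result is evaluated on a regular grid over xlim in steps of step. The function pseudoGaussianSketch() is hypothetical.

```r
# Hypothetical sketch (not the package implementation): sum one Gaussian
# curve per peak, scaled so each curve's apex equals the peak intensity.
pseudoGaussianSketch <- function(x, y, sd = 5, xlim = range(x), step = 1) {
  grid <- seq(xlim[1], xlim[2], by = step)
  # Matrix with one row per grid point and one column per peak
  curves <- outer(grid, seq_along(x), function(g, i) {
    y[i] * exp(-(g - x[i])^2 / (2 * sd^2))
  })
  list(x = grid, y = rowSums(curves))
}

peaksMz <- c(100, 500, 800)
peaksInt <- c(50, 100, 20)
sketch <- pseudoGaussianSketch(peaksMz, peaksInt, sd = 5,
                               xlim = c(1, 1000), step = 1)
plot(sketch$x, sketch$y, type = "l", xlab = "m/z", ylab = "Intensity",
     xaxs = "i", yaxs = "i")
```

Smaller values of sd give narrower peaks, and step trades plotting resolution against the number of points generated.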
x <- c(100, 500, 800)
y <- c(50, 100, 20)
ans <- generatePseudoGaussianSpectrum(x, y, sd=5L,
xlim=c(1, 1000), step = 1L)
plot(ans$x, ans$y, type="l", xlab="m/z", ylab="Intensity",
xaxs="i", yaxs="i")

sessionInfo()
#> R version 3.3.0 (2016-05-03)
#> Platform: x86_64-apple-darwin15.5.0 (64-bit)
#> Running under: OS X 10.11.5 (El Capitan)
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MassSpectrometry_1.0.4 BiocInstaller_1.23.5
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.5 codetools_0.2-14 digest_0.6.9
#> [4] formatR_1.4 magrittr_1.5 evaluate_0.9
#> [7] stringi_1.1.1 readxl_0.1.1 rmarkdown_0.9.6
#> [10] mzR_2.7.3 tools_3.3.0 stringr_1.0.0
#> [13] Biobase_2.33.0 ProtGenerics_1.5.0 yaml_2.1.13
#> [16] parallel_3.3.0 BiocGenerics_0.19.1 htmltools_0.3.5
#> [19] knitr_1.13

References

Alfassi, Zeev B. 2004. “On the Normalization of a Mass Spectrum for Comparison of Two Spectra.” Journal of the American Society for Mass Spectrometry 15 (3): 385–87. doi:10.1016/j.jasms.2003.11.008.
Rasmussen, G. T., and T. L. Isenhour. 1979. “The Evaluation of Mass Spectral Search Algorithms.” Journal of Chemical Information and Computer Sciences 19 (3): 179–86. doi:10.1021/ci60019a014.