Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments
 Ole Schulz-Trieglaff^{1, 2},
 Egidijus Machtejevas^{3},
 Knut Reinert^{2},
 Hartmut Schlüter^{4},
 Joachim Thiemann^{4} and
 Klaus Unger^{3}
DOI: 10.1186/1756-0381-2-4
© Schulz-Trieglaff et al; licensee BioMed Central Ltd. 2009
Received: 08 September 2008
Accepted: 07 April 2009
Published: 07 April 2009
Abstract
Background
Quality assessment methods, which are commonplace in engineering and industrial production, are not yet widespread in large-scale proteomics experiments. Modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS), however, produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important.
Results
We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis.
Conclusion
We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
Background
This work addresses the problem of quality assessment in large-scale LC-MS studies. So far, this is a relatively unexplored topic. There are some publications on the quality assessment of MS fragmentation spectra [6–12], but their focus is different: the aim of these methods is to detect and remove low-quality spectra from an LC-MS/MS run. The rationale is that these spectra would not be identified by identification algorithms anyway and that their removal leads to a significant speedup of the data analysis. To give some examples, Bern et al. [6] pioneered the quality filtering of MS/MS spectra. They described an MS/MS spectrum using various descriptors, such as the number of isotopic peaks or the number of peaks that could be clearly attributed to b or y fragments. Bern et al. applied support vector machines and linear discriminants and removed MS/MS spectra classified as poor before the database search. Since then, several works have tried to improve on this approach. Notable developments include the application of self-convolution for MS/MS quality assessment [7] and an iterative strategy to detect high-quality spectra that could not be identified in the database search [9].
Our work, however, addresses a different problem. In a quantitative LC-MS experiment, the aim is to obtain abundance estimates of all peptides and proteins contained in a sample. Fragmentation and sequencing of the peptides using MS/MS is usually an additional experimental step and is not the focus of this work. Controlling data quality in high-throughput experiments as early as possible is important since numerous problems can affect the quality of an LC-MS run. Among these are instabilities of the chromatography, degradation of the peptides, and artifacts in the mass spectra caused by the LC mobile phase or buffer molecules.
Little work on the quality and reproducibility assessment of mass spectrometry data has been published so far [13–16]. Prakash et al. [15] use a distance measure computed by an alignment algorithm to highlight problems of reproducibility in several mass spectrometry studies. Their method succeeds in visualizing the time order in which the LC-MS runs were performed and reveals patterns caused by the different columns or instrument settings used. However, it does not provide direct information on outlier runs and when to discard them. Whistler et al. [16], Coombes et al. [13] and Harezlak et al. [14] address the problem of noise removal and quality assessment but focus solely on SELDI-TOF spectra, which are less complex than LC-MS data.
The analysis of LC-MS data is a sophisticated task and requires several computational steps such as denoising, peptide feature detection, alignment and statistical analysis [17]. After differentially expressed peptide features have been found, they need to be sequenced using MS/MS-based identification and have their abundances and sequences mapped to the parent protein. These are general steps which usually have to be adapted depending on the aim of the study. Each of these computational steps has its own difficulties, and a typical workflow is complex and error prone [18]. It is therefore desirable to identify poor LC-MS runs as early as possible. This would allow us to exclude these runs from further analysis, to repeat them, or at least to downweight these measurements to reflect our reduced confidence. In contrast to mass spectrometry-based proteomics, quality assessment and control methods are more common in gene expression studies [19–21]. Brown et al. [19] applied image metrics to find poor quality microarrays in a batch of experiments. Cohen et al. [20] applied the Mahalanobis distance to detect outlier runs in large-scale gene expression studies. Finally, Model et al. [21] borrowed methods from the field of statistical process control to detect critical differences among replicate microarray measurements.
In this work, we investigate how classical methods from outlier detection and quality control can be extended and applied to LC-MS data. Our approach is based on sound statistical principles and we demonstrate that we can precisely detect dubious LC-MS runs in large-scale studies.
Results
This work addresses the quality assessment of raw LC-MS maps. By "raw", we mean the unprocessed spectra before any noise filtering, peak detection or centroiding has been performed. Most statistical methods for quality assessment expect that each item is described by one (univariate) or several (multivariate) variables. For LC-MS maps, it is not clear what suitable variables could be. One straightforward approach is to describe an LC-MS map by all its data points. This has two drawbacks. First, the number of data points (not peaks) in an unprocessed LC-MS map is huge, easily several million. Second, many of the raw data points in a map will be caused by noise and might distort the results of an automatic outlier detection.
Consequently, we devised a list of quality descriptors to describe an LC-MS map. Some of these descriptors were taken from the literature, where they have been shown to be useful criteria for spectra mining and filtering tasks; others are new. Using these quality descriptors, we can now describe each map as a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ and apply statistical methods to detect runs of poor quality.
We emphasize that we define quality in terms of reproducibility, i.e., an LC-MS map is of poor quality if its quality descriptors differ significantly from the descriptors of the other maps. It is thus important to compare only maps that represent the same subsets of a sample. As an example, in a multidimensional chromatography experiment, we can only compare LC-MS recordings of the same chromatography steps; it does not make sense to compare LC-MS maps obtained from different salt pulses. On the other hand, even for time series or differential quantitative measurements, we would still expect the key characteristics of the LC-MS maps, such as noise level or chromatography, to remain stable during the study.
Algorithms
We use a set of quality descriptors to describe an LC-MS map. These descriptors capture various aspects of the map, such as peaks and noise level of the spectra, as well as shape and reproducibility of the total ion chromatogram (TIC). The descriptors are:

Median of the Euclidean distances $D_E(s, s') = \sqrt{\sum_i (s_i - s'_i)^2}$ between baseline-removed spectrum $s'$ and original spectrum $s$, taken over all spectra. The baseline or background noise in a mass spectrum is usually caused by molecules from the mobile phase of the column. Spectra with a large amount of background noise are difficult to analyze automatically. The rationale of this descriptor is that spectra with a strong baseline signal will be very different after baseline removal and thus have a large distance $D_E$. We perform the baseline removal using a top-hat filter, which is a standard method for this task.

Median of the Euclidean distances $D_E$ between smoothed mass spectrum and original spectrum, taken over all spectra. A noisy spectrum will exhibit a large distance $D_E$ to its smoothed version. We performed the smoothing using a Gaussian filter with a kernel width of ≈2.0, depending on the mass spectral peak width. The first two quality descriptors were first suggested by Windig et al. [22], but they applied them to chromatograms to remove noisy mass traces from the LC-MS map.

The Xrea value. This measure for the quality of a mass spectrum was proposed by Na et al. [8]. They developed it to filter MS/MS spectra before submitting them to a sequence database search. We will show that the Xrea criterion can equally be applied to MS spectra. The Xrea value is based on a cumulative intensity normalization. First, we normalize the spectral intensities by dividing by the total intensity. The cumulative normalized intensity of each data point in the spectrum is defined as the sum of the normalized intensities of all points with intensities smaller than or equal to the intensity of this point. Accordingly, the cumulative normalized intensity of the n-th highest data point x is given by:
$$\mathrm{CNI}(x) = \frac{\sum_{x' :\, \mathrm{Rank}(x') \geq \mathrm{Rank}(x)} I(x')}{\sum_{x'} I(x')} \qquad (1)$$
where I(x) is the intensity of point x and Rank(x) represents the order of points if sorted by intensity in descending order. That is, the most intense point has rank 1, the next rank 2, etc. The numerator is divided by the sum of all intensities. This normalization is relatively stable and less dependent on the most intense peak, which is a disadvantage of a normalization by the intensity sum. But in contrast to other methods such as a rank-based normalization, it does not discard the entire information contained in the spectral intensities.
The Xrea value itself is computed from these cumulative normalized intensities and includes a correction term α to account for cases in which the highest point is significantly larger than the rest. Following [12], we set α to the relative intensity of the most abundant data point.

Median of the number of data points with intensity ≥ 0 in each scan. This descriptor accounts for variations in the number of recorded intensities.

Summary statistics for m/z, intensity and signal-to-noise ratio of all scans. The summary statistics consist of minimum, maximum, mean and median. We estimate the noise level using an iterative sliding window approach. We move a window of size 25 Th across each spectrum and calculate a noise level for each window. We compute mean and standard deviation σ of all intensities in the current window and discard all points with an intensity higher than 3 × σ. We repeat this procedure and estimate the local noise level as the median intensity after three iterations.

Skewness and kurtosis of the TIC. For good and reproducible LC-MS runs, the TICs should exhibit similar shapes. Skewness and kurtosis describe the asymmetry and peakedness of a distribution, respectively. The skewness is the third standardized moment of a distribution. For a sample of size n, it is defined as $\mathrm{skew} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}$, where $\bar{x}$ is the sample mean. The skewness is positive for distributions with a tail to the right, and negative for left-tailed distributions, as illustrated in Figure 3. The kurtosis is defined as $\mathrm{kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}} - 3$. It measures how sharply peaked a distribution is, relative to its width. We subtract 3 to achieve a kurtosis of zero for the Gaussian distribution. A distribution with positive kurtosis has more probability mass around the mean than the Gaussian distribution, whereas a distribution with negative kurtosis has less probability mass around the mean and is therefore less peak-shaped. We give an example in Figure 4.

Minimum and maximum intensity of the TIC. We store the maximum and minimum intensity over the whole LC-MS run.
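The TIC shape descriptors can be illustrated with a minimal sketch. It assumes that the TIC is available as a plain list of summed intensities (one value per scan) and that skewness and kurtosis are computed over these values as a sample; the function names are hypothetical and not part of OpenMS.

```python
def central_moment(values, order):
    """Return the central moment of the given order, using 1/n normalisation."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness(values):
    """Third standardized moment: positive for a tail to the right."""
    m2 = central_moment(values, 2)
    if m2 == 0.0:
        raise ValueError("skewness is undefined for constant data")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a Gaussian scores zero."""
    m2 = central_moment(values, 2)
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(values, 4) / m2 ** 2 - 3.0


def tic_shape_descriptors(tic):
    """Collect the TIC descriptors (assumed layout: one intensity per scan)."""
    return {
        "skewness": skewness(tic),
        "kurtosis": excess_kurtosis(tic),
        "min_intensity": min(tic),
        "max_intensity": max(tic),
    }
```

For a symmetric sequence such as 1, 2, 3, 4, 5 the skewness is zero, while a sequence with a single large value at the upper end yields a positive skewness.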
Using the descriptors above, we can now represent an LC-MS map as a vector $\mathbf{x}$ in which each entry is one of the quality descriptors.
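Two of the spectrum-level descriptors can be sketched as follows. This is an illustration under stated assumptions, not the OpenMS implementation: each spectrum is taken to be a list of intensities on a regular m/z grid, the Gaussian kernel width is interpreted in units of data points, and all function names are hypothetical.

```python
import math
from statistics import median


def gaussian_smooth(intensities, sigma=2.0):
    """Smooth with a truncated Gaussian kernel (sigma assumed in data points)."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(intensities)
    smoothed = []
    for i in range(n):
        acc = 0.0
        weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalise the kernel at the spectrum borders
                acc += w * intensities[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed


def euclidean_distance(a, b):
    """Euclidean distance D_E between two equally long intensity lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smoothing_distance_descriptor(spectra, sigma=2.0):
    """Median over all spectra of D_E(spectrum, smoothed spectrum)."""
    return median(euclidean_distance(s, gaussian_smooth(s, sigma)) for s in spectra)


def cumulative_normalized_intensity(intensities):
    """For each point, the summed intensity of all points that are not more
    intense, divided by the total intensity. Ties are broken by sort order."""
    total = sum(intensities)
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    cni = [0.0] * len(intensities)
    running = 0.0
    for i in order:
        running += intensities[i]
        cni[i] = running / total
    return cni
```

A flat spectrum is unchanged by smoothing and therefore has a distance of zero, whereas a spectrum that alternates between low and high intensities yields a large distance. The most intense point of any spectrum always has a cumulative normalized intensity of 1.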
Outlier Detection using the Mahalanobis Distance
To decide whether an LC-MS map is an outlier compared to the rest of the measurements, we use the Mahalanobis distance [23]. It has previously been applied in numerous tasks, such as the quality assessment of microarray experiments [20] or face recognition [24]. It is related to the Euclidean distance but differs in that each dimension is weighted by its variation.
Using the Mahalanobis distance, we can therefore measure the distance of each LC-MS run, described by the vector of its quality descriptors $\mathbf{x}$, to the distribution of all other n runs, characterized by their mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$: $D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$. For normally distributed data, the squared Mahalanobis distance of a vector with dimension p follows a $\chi^2$ distribution with p degrees of freedom. This allows us to define cutoffs for suspiciously large distances for a given confidence level α as in any statistical test. Note that the Mahalanobis distance is equal to the Euclidean distance if the covariance matrix is the identity matrix. In this case, each dimension has unit variance and each pair of dimensions is uncorrelated.
Note that if we use this criterion for outlier detection, we effectively classify a map as an outlier if its vector of quality descriptors differs by a large extent from the rest. This is reasonable since even for non-replicate LC-MS runs, we would expect most quality descriptors to be similar.
However, this approach suffers from two drawbacks. First, for fewer LC-MS runs than descriptors (n < p), the covariance matrix $\Sigma$ is singular and cannot be inverted. Second, outliers in the data might distort our estimates of $\boldsymbol{\mu}$ and $\Sigma$ and lead to incorrect estimates of the distance. We solve the first problem by applying a Principal Component Analysis to reduce the dimensionality of our data to a dimension p' ≪ p while trying to retain the essential information. We solve the second problem by using robust estimators for location and scale.
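The distance-based criterion can be sketched in a few lines. The sketch below uses the classical (non-robust) mean and covariance, solves the linear system by Gaussian elimination instead of forming the inverse, and replaces the exact $\chi^2$ quantile by the Wilson-Hilferty approximation so that only the Python standard library is needed. All function names and the default confidence level of 0.975 are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of row-wise observations."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    centered = [[r[j] - mu[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, p))
        z[i] = (aug[i][p] - tail) / aug[i][i]
    return z


def squared_mahalanobis(x, mu, sigma):
    """(x - mu)^T Sigma^{-1} (x - mu), computed without an explicit inverse."""
    diff = [a - b for a, b in zip(x, mu)]
    return sum(d * w for d, w in zip(diff, solve(sigma, diff)))


def chi2_quantile(q, dof):
    """Wilson-Hilferty approximation of the chi-square quantile (not exact)."""
    z = NormalDist().inv_cdf(q)
    return dof * (1 - 2 / (9 * dof) + z * math.sqrt(2 / (9 * dof))) ** 3


def flag_outliers(rows, confidence=0.975):
    """Flag rows whose squared distance exceeds the chi-square cutoff."""
    mu = mean_vector(rows)
    sigma = covariance_matrix(rows)
    cutoff = chi2_quantile(confidence, len(mu))
    return [squared_mahalanobis(r, mu, sigma) > cutoff for r in rows]
```

With the identity matrix as covariance, `squared_mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` reduces to the squared Euclidean distance of 25. Because the outlier itself enters the classical mean and covariance, its distance is bounded by $(n-1)^2/n$, so in small data sets an outlier can be masked.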
Robust Principal Component Analysis
Principal Component Analysis (PCA) [25] is a method for dimensionality reduction and feature extraction. More formally, our aim is to represent a vector $\mathbf{x}$ by a lower-dimensional representation $\mathbf{y} = M\mathbf{x}$, where M is a matrix with dimension dim(y) × dim(x) and dim(y) < dim(x). M represents a projection from the higher-dimensional space of $\mathbf{x}$ to a lower-dimensional space of $\mathbf{y}$. In the case of PCA, M is an orthogonal linear projection.
The standard PCA works as follows: the data, in our case the vectors of quality descriptors for each map, are stored in an n × p matrix X with a row for each of the n maps and a column for each of the p quality descriptors. This matrix is centered by subtracting the column-wise mean. The covariance matrix $\Sigma$ of the data is then given, up to a factor of 1/(n − 1), by $X^T X$. It contains the variance of each of the p dimensions on its diagonal and the covariances in the remaining entries. We compute the eigenvectors of the covariance matrix and choose as coordinates the eigenvectors with the largest eigenvalues. We project the data into a lower-dimensional space by computing $\mathbf{y} = E^T \mathbf{x}$, where E contains the chosen eigenvectors as column vectors.
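The steps of the standard (non-robust) PCA can be sketched as follows. To stay within the Python standard library, the sketch finds the leading eigenvectors by power iteration with deflation, which is a simple substitute for a full eigendecomposition and is only meant for illustration; the function names and the fixed start vector are assumptions.

```python
import math


def center_columns(rows):
    """Subtract the column-wise mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows]


def covariance(centered):
    """X^T X / (n - 1) for an already centered n x p data matrix."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=1000):
    """Power iteration; assumes the start vector is not orthogonal to the
    leading eigenvector."""
    p = len(matrix)
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(x * y for x, y in zip(v, mv))
    return eigenvalue, v


def pca(rows, n_components):
    """Return eigenvalues, eigenvectors and projected data (scores)."""
    centered = center_columns(rows)
    cov = covariance(centered)
    p = len(cov)
    eigenvalues, eigenvectors = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(cov)
        eigenvalues.append(lam)
        eigenvectors.append(v)
        # Deflation: remove the component that has just been found.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in eigenvectors]
              for r in centered]
    return eigenvalues, eigenvectors, scores
```

For points lying exactly on the line y = 2x, the first eigenvector points along (1, 2) and its eigenvalue equals the total variance of the data.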
The $L_1$ median is simply the point θ, not necessarily a data point, which minimizes the sum of the Euclidean distances to all data points. In contrast to the simple component-wise mean or median for multivariate data, the $L_1$ median gives a robust estimate of the center and is invariant to orthogonal linear transformations such as PCA. Efficient algorithms for its computation exist [27].
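The robust measure of scale is the median absolute deviation (MAD), $\mathrm{MAD}(z_1, \ldots, z_n) = c \cdot \mathrm{median}_i\, |z_i - \mathrm{median}_j\, z_j|$, where c is a correction factor reflecting our assumption that the data is normally distributed and equals 1.4826. The projection pursuit approach to PCA also gives a robust estimate of the covariance matrix [26], which we use for the computation of the Mahalanobis distance.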
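The two robust estimators can be sketched as follows. The $L_1$ median is computed here with the classical Weiszfeld iteration, which is a simple alternative chosen for illustration and not necessarily the algorithm of [27]. The scale estimate is the median absolute deviation scaled by the factor 1.4826 mentioned above; function names are hypothetical.

```python
import math
from statistics import median


def l1_median(points, iterations=200, tol=1e-9):
    """Weiszfeld iteration for the point minimising the summed distances."""
    p = len(points[0])
    theta = [sum(pt[j] for pt in points) / len(points) for j in range(p)]
    for _ in range(iterations):
        numerator = [0.0] * p
        denominator = 0.0
        for pt in points:
            dist = math.dist(pt, theta)
            if dist < 1e-12:
                continue  # skip a point that coincides with the estimate
            w = 1.0 / dist
            denominator += w
            for j in range(p):
                numerator[j] += w * pt[j]
        if denominator == 0.0:
            break
        new_theta = [v / denominator for v in numerator]
        shift = math.dist(new_theta, theta)
        theta = new_theta
        if shift < tol:
            break
    return theta


def mad(values, c=1.4826):
    """Median absolute deviation, scaled to match sigma for Gaussian data."""
    med = median(values)
    return c * median(abs(v - med) for v in values)
```

For the values 1, 2, 3, 4, 100 the MAD is 1.4826, unaffected by the single extreme value, and for four points on the unit square plus one distant point at (100, 100) the $L_1$ median stays inside the square, whereas the mean is pulled to about (20.4, 20.4).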
After projecting the data into the subspace of the direction with the highest robust variance, projection pursuit searches for an approximation of the next eigenvector, which is assumed to be orthogonal to the previous one. One needs to decide on the dimensionality of the subspace the data is projected into. We always choose a number of components that explains 90% of the variance, which is usually around 6. Consequently, we can now describe each LC-MS map using a vector in 6 dimensions instead of 20, the covariance matrix $\Sigma$ is invertible, and we can search for suspicious maps by plotting the Mahalanobis distance for each map. Note that each dimension no longer represents a unique descriptor, but a linear combination of all descriptors. By inspecting the weights (also called loadings in PCA terms), however, we can gain insights into which descriptor contributed the most to a particular dimension.
To summarize, our method aims at the identification of poor runs among a group of LC-MS experiments. Our method assumes that the majority of the runs are good and that poor runs differ significantly in their quality descriptors from the rest. Consequently, what constitutes a good and a poor run is determined by the majority of the experiments. Since our method is robust, it suffices if at least half of the LC-MS runs are good.
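The rule for choosing the number of components can be written as a short helper. It assumes that the (robust) variances along the components are already available as a list; the function name is hypothetical.

```python
def components_for_variance(variances, threshold=0.90):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold."""
    total = sum(variances)
    cumulative = 0.0
    for k, var in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += var
        if cumulative / total >= threshold:
            return k
    return len(variances)
```

For component variances of 5, 2.5, 2, 0.3 and 0.2, the first two components explain 75% and the first three explain 95% of the variance, so three components are kept.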
Implementation
The programs to compute the quality descriptors for an LC-MS map were written using OpenMS [28], our software library for computational mass spectrometry. We performed the statistical analysis and visualization of the results using the mathematical software package R (http://www.r-project.org).
Testing
We present three use cases to demonstrate how our approach can be applied to automatically detect outlier runs among a set of LC-MS maps. We start with a set of simulated maps. The simulation allows us to probe the capabilities of our approach on a detailed level. The second and third use cases comprise a tryptic digest of bovine serum albumin (BSA) and urine samples from a healthy volunteer recorded using LC-ESI-MS.
Details of the full mass spectrometry analysis concept and chromatographic setup are described elsewhere [29]. In short, we employed a restricted access sulphonic acid strong cation-exchanger (RAM SCX) (Merck KGaA, Darmstadt, Germany) column followed by a peptide transfer and solvent switch through a trap column (Chromolith Guard, 5 × 4.6 mm, Merck KGaA). We performed a subsequent analysis using an analytical column (Chromolith CapRod RP-18e, 150 × 0.2 mm, Merck KGaA) by means of column switching to perform two-dimensional orthogonal separations. Online mass spectrometric detection was performed using an Esquire Series 3000 PLUS ESI ion trap mass spectrometer (Bruker Daltonics, Bremen, Germany).
The BSA digest was prepared according to a standard procedure (Proteo Extract All-in-one Trypsin Digest Kit, Merck Chemicals Ltd, Nottingham, UK) with a final concentration of 2 mg/ml and stored at -20°C. Urine samples from healthy volunteers were pooled and stored at -20°C. Before analysis, samples were defrosted at room temperature for an hour and then filtered through 0.22 μm pore size low protein binding membrane filters (Durapore, Millipore), and the clear sample was transferred to autosampler tubes. The prepared samples were stored in an autosampler at 4°C for no longer than 24 h before injection.
Simulated LC-MS runs
Tryptic Digest of Bovine Serum Albumin
This data set consists of replicate LC-MS recordings of a tryptic digest of bovine serum albumin (BSA). The peptide mixture was measured in 43 replicates; details of sample preparation and LC-MS analytics are described elsewhere [29]. Using the algorithms implemented in TOPP/OpenMS [28, 31], we performed peptide feature detection, alignment and statistical analysis for these runs. After manual inspection, we classified 5 runs as outliers for various reasons: 3 exhibited peptide feature intensities that deviated by a large extent from the other replicates. The remaining two revealed significant shifts in retention time as compared to the remaining runs. This made an alignment difficult and required manual fine-tuning of the alignment algorithm.
This is of course a time-consuming procedure. It would be preferable to have a method that allows us to remove outliers before feature detection and alignment are performed, to save time and computer resources. Consequently, we applied our quality assessment method to these runs.
As we can see from Figure 7 (left), the combination of spectral quality descriptors, robust principal component analysis and Mahalanobis distance is accurate and classifies all outlier maps correctly. It also classifies some additional maps as mild outliers, namely the first LC-MS map and the maps with numbers 36 and 37. Manual inspection of the PCA loadings revealed that their larger Mahalanobis distances are mainly due to a higher noise level in some spectra and minor fluctuations in the TIC. For illustration, Figure 7 (right) shows the TIC of four maps. Again, normal runs in the upper row are colored in green, outlier runs are colored in red. Both outlier maps exhibit TICs that contain a significant amount of noise peaks and clearly deviate from the two good runs in the top row.
Urine Samples of a Healthy Volunteer
This data set consists of 54 LC-MS runs. A manual inspection indicated that five of these runs are clear outliers. Four of these five runs were measured after a break of several days, which seems to have led to disturbances in the chromatography and sample composition. The fifth outlier has a significantly elevated noise level.
Discussion
Quality assessment and control are commonplace in fields where many items are produced at a rapid pace and where quality is crucial, be it tools in factories or data in high-throughput biological experiments. The application of statistical quality assessment to quantitative mass spectrometry data is still an underexplored field. We expect that, with the growth of this field, this is going to change as much as it has changed for gene expression studies.
We presented a statistical method for outlier detection in large-scale mass spectrometry studies. It is based on quality descriptors capturing different aspects of the quality of an LC-MS map and on a statistically robust version of the Mahalanobis distance. We demonstrated that our approach works well with large data sets and can accurately detect poor LC-MS runs. This is of special importance in high-throughput experiments, where many LC-MS maps are generated and there is no time to perform a manual quality assessment.
We evaluated our approach on simulated LC-MS runs and two real data sets consisting of around 50 replicates each. In all cases, we were able to detect outlier data sets that were confirmed by manual validation. When dealing with outliers, we have two choices: to either remove them or to repeat the corresponding LC-MS run. Clearly, this depends on the time and lab resources available. In either case, outlier detection and removal as early as possible during the data analysis will make the results more reliable and save a lot of time and computational effort.
Declarations
Acknowledgements
Part of this work was performed at the Walter and Eliza Hall Institute in Melbourne, Australia. We thank the members of the Bioinformatics Division, especially Terry Speed and Mark Robinson, for the fruitful discussions.
O.S.T. acknowledges funding by the International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC) and by a grant of the German Federal Ministry for Education and Research (BMBF), grant no. 031369C.
We are indebted to Chris Bielow, who implemented the noise estimation routine during a lab rotation project.
We also would like to thank the reviewers who helped to improve this paper with their suggestions.
Authors’ Affiliations
References
1. Mann M, Aebersold R: Mass spectrometry-based proteomics. Nature. 2003, 422: 198-207.
2. Cappadona S, Levander F, Jansson M, James P, Cerutti S, Pattini L: Wavelet-Based Method for Noise Characterization and Rejection in High-Performance Liquid Chromatography Coupled to Mass Spectrometry. Analytical Chemistry. 2008
3. Leptos KC, Sarracino DA, Jaffe JD, Krastins B, Church GM: MapQuant: Open-Source software for large-scale protein quantification. Proteomics. 2006, 6 (6): 1770-1782.
4. Schulz-Trieglaff O, Hussong R, Gröpl C, Hildebrandt A, Reinert K: A fast and accurate algorithm for the quantification of peptides from LC-MS data. Research in Computational Molecular Biology, 11th Annual International Conference, RECOMB 2007, Oakland, CA, USA, April 21–25, 2007, Proceedings, Lecture Notes in Computer Science. Edited by: Speed TP, Huang H. 2007, Springer, 4453: 473-487.
5. Mayr BM, Kohlbacher O, Reinert K, Sturm M, Gröpl C, Lange E, Klein C, Huber C: Absolute Myoglobin Quantitation in Serum by Combining Two-Dimensional Liquid Chromatography-Electrospray Ionization Mass Spectrometry and Novel Data Analysis Algorithms. J Proteome Res. 2006, 5: 414-421.
6. Bern M, Goldberg D, McDonald WH, Yates JR III: Automatic Quality Assessment of Peptide Tandem Mass Spectra. Bioinformatics. 2004, 20: i49-54.
7. Choo K, Tham W: Tandem mass spectrometry data quality assessment by self-convolution. BMC Bioinformatics. 2007, 8: 352.
8. Na S, Paek E: Quality Assessment of Tandem Mass Spectra Based on Cumulative Intensity Normalization. Journal of Proteome Research. 2006, 5 (12): 3241-3248.
9. Nesvizhskii AI, Roos FF, Grossmann J, Vogelzang M, Eddes JS, Gruissem W, Baginsky S, Aebersold R: Dynamic Spectrum Quality Assessment and Iterative Computational Analysis of Shotgun Proteomic Data: Toward More Efficient Identification of Post-translational Modifications, Sequence Polymorphisms, and Novel Peptides. Mol Cell Proteomics. 2006, 5 (4): 652-670.
10. Moore RE, Young MK, Lee TD: Method for screening peptide fragment ion mass spectra prior to database searching. Journal of the American Society for Mass Spectrometry. 2000, 11 (5): 422-426.
11. Xu M, Geer L, Bryant S, Roth J, Kowalak J, Maynard D, Markey S: Assessing Data Quality of Peptide Mass Spectra Obtained by Quadrupole Ion Trap Mass Spectrometry. Journal of Proteome Research. 2005, 4 (2): 300-305.
12. Flikka K, Martens L, Vandekerckhove J, Gevaert K, Eidhammer I: Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering. Proteomics. 2006, 6 (7): 2086-2094.
13. Coombes KR, Fritsche J, Herbert A, Clarke C, Chen Jn, Baggerly KA, Morris JS, Xiao Lc, Hung MC, Kuerer HM: Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization. Clin Chem. 2003, 49 (10): 1615-1623.
14. Harezlak J, Wang M, Christiani D, Lin X: Quantitative quality-assessment techniques to compare fractionation and depletion methods in SELDI-TOF mass spectrometry experiments. Bioinformatics. 2007, 23 (18): 2441-2448.
15. Prakash A, Piening B, Whiteaker J, Zhang H, Shaffer SA, Martin D, Hohmann L, Cooke K, Olson JM, Hansen S, Flory MR, Lee H, Watts J, Goodlett DR, Aebersold R, Paulovich A, Schwikowski B: Assessing bias in experiment design for large-scale mass spectrometry-based quantitative proteomics. Mol Cell Proteomics. 2007, M600470-MCP200.
16. Whistler T, Rollin D, Vernon S: A method for improving SELDI-TOF mass spectrometry data quality. Proteome Science. 2007, 5: 14.
17. Listgarten J, Emili A: Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics. 2005, 4 (4): 419-434.
18. Stead DA, Paton NW, Missier P, Embury SM, Hedeler C, Jin B, Brown AJP, Preece A: Information quality in proteomics. Brief Bioinform. 2008, 9 (2): 174-188.
19. Brown CS, Goodwin PC, Sorger PK: Image metrics in the statistical analysis of DNA microarray data. Proceedings of the National Academy of Sciences. 2001, 98 (16): 8944-8949.
20. Cohen Freue GV, Hollander Z, Shen E, Zamar RH, Balshaw R, Scherer A, McManus B, Keown P, McMaster WR, Ng RT: MDQC: a new quality assessment method for microarrays based on quality control reports. Bioinformatics. 2007, 23 (23): 3162-3169.
21. Model F, Konig T, Piepenbrock C, Adorjan P: Statistical process control for large scale microarray experiments. Bioinformatics. 2002, 18: S155-163.
22. Windig W, Phalp J, Payne A: A Noise and Background Reduction Method for Component Detection in Liquid Chromatography/Mass Spectrometry. Analytical Chemistry. 1996, 68: 3602-3603.
23. Mahalanobis P: On the generalized distance in statistics. Proceedings of the National Institute of Science of India. 1936, 12: 49-55.
24. Fraser A, Hengartner N, Vixie K, Wohlberg B: Incorporating invariants in Mahalanobis distance based classifiers: application to face recognition. Proceedings of the International Joint Conference on Neural Networks. 2003, 4: 3118-3123.
25. Pearson K: On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine. 1901, 2: 559-572.
26. Croux C, Ruiz-Gazen A: A fast algorithm for robust principal components based on projection pursuit. COMPSTAT: Proceedings in Computational Statistics. Edited by: Prat A. 1996, Physica-Verlag, 211-216.
27. Hössjer O, Croux C: Generalizing univariate signed rank statistics for testing and estimating a multivariate location parameter. Journal of Nonparametric Statistics. 1995, 4 (3): 293-308.
28. Sturm M, Bertsch A, Groepl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O: OpenMS – An open-source software framework for mass spectrometry. BMC Bioinformatics. 2008, 9:
29. Machtejevas E, Andrecht S, Lubda D, Unger KK: Monolithic silica columns of various format in automated sample clean-up/multidimensional liquid chromatography/mass spectrometry for peptidomics. Journal of Chromatography A. 2007, 1144: 97-101.
30. Schulz-Trieglaff O, Pfeifer N, Groepl C, Kohlbacher O, Reinert K: LC-MSsim: a simulation software for Mass Spectrometry-Liquid Chromatography Experiments. BMC Bioinformatics. 2008, 9: 423.
31. Kohlbacher O, Reinert K, Gröpl C, Lange E, Pfeifer N, Schulz-Trieglaff O, Sturm M: TOPP - the OpenMS proteomics pipeline. Bioinformatics. 2007, 23 (2): e191-197.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.