Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments
© Schulz-Trieglaff et al; licensee BioMed Central Ltd. 2009
- Received: 08 September 2008
- Accepted: 07 April 2009
- Published: 07 April 2009
Quality assessment methods that are commonplace in engineering and industrial production are not widely used in large-scale proteomics experiments. Yet modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important.
We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis.
We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
- Mahalanobis Distance
- Outlier Detection
- Quality Descriptor
- Robust Principal Component Analysis
- Quality Assessment Method
This work addresses the problem of quality assessment in large scale LC-MS studies. So far, this is a relatively unexplored topic. There are some publications on the quality assessment of MS fragmentation spectra [6–12]. But their focus is different: the aim of these methods is to detect and remove low quality spectra from an LC-MS/MS run. The rationale is that these spectra would not be identified by identification algorithms anyway and that their removal will lead to a significant speed-up of the data analysis. To give some examples, Bern et al. pioneered the quality filtering of MS/MS spectra. They used various descriptors to describe an MS/MS spectrum, such as the number of isotopic peaks or the number of peaks that could be clearly attributed to b or y fragments. Bern et al. applied support vector machines and linear discriminants and removed MS/MS spectra classified as poor before the database search. Since then, several works have tried to improve on this approach. Notable developments include the application of self-convolution for MS/MS quality assessment and an iterative strategy to detect high-quality spectra that could not be identified in the database search.
Our work, however, addresses a different problem. In a quantitative LC-MS experiment, the aim is to obtain abundance estimates of all peptides and proteins contained in a sample. Fragmentation and sequencing of the peptides using MS/MS is usually an additional experimental step and is not the focus of this work. Controlling data quality in high-throughput experiments as early as possible is important since numerous problems can affect the quality of an LC-MS run. Among these are instabilities of the chromatography, degradation of the peptides or artifacts in the mass spectra caused by the LC mobile phase or buffer molecules.
Little work on the quality and reproducibility assessment of mass spectrometry data has been published so far [13–16]. Prakash et al. use a distance measure computed by an alignment algorithm to highlight problems of reproducibility in several mass spectrometry studies. Their method is successful in visualizing the time order in which the LC-MS runs were performed and reveals patterns caused by the different columns or instrument settings used. But their method does not provide direct information on outlier runs and when to discard them. Whistler et al., Coombes et al. and Harezlak et al. address the problem of noise removal and quality assessment but focus solely on SELDI-TOF spectra, which are less complex than LC-MS data.
The analysis of LC-MS data is a sophisticated task and requires several computational steps such as denoising, peptide feature detection, alignment and statistical analysis. After differentially expressed peptide features have been found, they need to be sequenced using MS/MS-based identification and have their abundances and sequences mapped to the parent protein. These are general steps which usually have to be adapted depending on the aim of the study. But each of these computational steps has its own difficulties and a typical workflow is complex and error prone. It is therefore desirable to identify poor LC-MS runs as early as possible. This would allow us to either exclude these runs from the further analysis, to repeat them or at least to downweight these measurements to reflect our reduced confidence. In contrast to mass spectrometry-based proteomics, quality assessment and control methods are more common in gene expression studies [19–21]. Brown et al. applied image metrics to find poor quality microarrays in a batch of experiments. Cohen et al. applied the Mahalanobis distance to detect outlier runs in large-scale gene expression studies. Finally, Model et al. borrowed methods from the field of statistical process control to detect critical differences among replicate microarray measurements.
In this work, we investigate how classical methods from outlier detection and quality control can be extended and applied to LC-MS data. Our approach is based on sound statistical principles and we demonstrate that we can precisely detect dubious LC-MS runs in large scale studies.
This work addresses the quality assessment of raw LC-MS maps. By "raw", we mean the unprocessed spectra before any noise filtering, peak detection or centroiding has been performed. Most statistical methods for quality assessment expect that each item is described by one (univariate) or several (multivariate) variables. For LC-MS maps, it is not clear what suitable variables could be. One straightforward approach is to describe an LC-MS map by all its data points. This has two problems. First, the number of data points (not peaks) in an unprocessed LC-MS map is huge, easily several million points. Second, many of the raw data points in a map will be caused by noise and might distort the results of an automatic outlier detection.
Consequently, we devised a list of quality descriptors to describe an LC-MS map. Some of these descriptors were taken from the literature, where they have been shown to be useful criteria for spectra mining and filtering tasks; others are new. Using these quality descriptors, we can now describe each map as a vector $\vec{x} = (x_1, x_2, \ldots, x_n)^T$ and apply statistical methods to detect runs of poor quality.
We emphasize that we define quality in terms of reproducibility, i.e., an LC-MS map is of poor quality if its quality descriptors differ significantly from the descriptors of the other maps. It is thus important to compare only maps that represent the same subsets of a sample. As an example, in a multidimensional chromatography experiment, we can only compare LC-MS recordings of the same chromatography steps; it does not make sense to compare LC-MS maps obtained from different salt pulses. On the other hand, even for time series or differential quantitative measurements, we would still expect the key characteristics of the LC-MS maps, such as noise level or chromatography, to remain stable during the study.
We use a set of quality descriptors to describe an LC-MS map. These descriptors capture various aspects of the map, such as peaks and noise level of the spectra, as well as shape and reproducibility of the total ion chromatogram (TIC). The descriptors are:
Median of the Euclidean distances $D_E(s, s') = \sqrt{\sum_i (s_i - s'_i)^2}$ between baseline-removed spectrum s' and original spectrum s for all spectra. The baseline or background noise in a mass spectrum is usually caused by molecules from the mobile phase of the column. Spectra with a large amount of background noise are difficult to analyze automatically. The rationale of this descriptor is that spectra with a strong baseline signal will be very different after baseline removal and thus have a large distance $D_E$. We perform the baseline removal using a TopHat filter, which is a standard method for this task.
Median of the Euclidean distances $D_E$ between smoothed mass spectrum and original spectrum for all spectra. Smoothing removes high-frequency noise, so a noisy spectrum will exhibit a large distance $D_E$ to its smoothed version. We performed the smoothing using a Gaussian filter with a kernel width of ≈2.0, depending on the mass spectral peak width. The first two quality descriptors were first suggested by Windig et al., but they applied them to chromatograms to remove noisy mass traces from the LC-MS map.
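The smoothing-based descriptor can be illustrated with a minimal sketch. It assumes each spectrum is given as a plain list of intensities on a regular grid and that the kernel width is expressed in data points; both are assumptions made for illustration, and the function names are hypothetical rather than part of OpenMS.

```python
import math
from statistics import median


def gaussian_smooth(intensities, sigma=2.0):
    """Smooth intensities with a truncated Gaussian kernel (sigma in data points, an assumption)."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(intensities)
    smoothed = []
    for i in range(n):
        acc, weight = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize at the borders instead of padding
                acc += w * intensities[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed


def euclidean_distance(a, b):
    """Euclidean distance D_E between two equally long intensity lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smoothing_descriptor(spectra, sigma=2.0):
    """Median over all spectra of D_E(original, smoothed)."""
    return median(euclidean_distance(s, gaussian_smooth(s, sigma)) for s in spectra)
```

A flat spectrum is unchanged by smoothing and yields a distance close to zero, whereas a rapidly fluctuating spectrum yields a large distance.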
The Xrea value. This measure for the quality of a mass spectrum was proposed by Na et al. They developed it to filter MS/MS spectra before submitting them to a sequence database search. We will show that the Xrea criterion can equally be applied to MS spectra. The Xrea value is based on a cumulative intensity normalization. First, we normalize the spectral intensities by dividing by the total intensity. The cumulative normalized intensity of each data point in the spectrum is defined as the sum of the normalized intensities of all points with intensities smaller than or equal to the intensity of this point. Accordingly, the cumulative normalized intensity of the n-th highest data point $x_n$ is given by:
$$\text{cumulative normalized intensity}(x_n) = \frac{\sum \{\, I(x) \mid \mathrm{Rank}(x) \geq n \,\}}{\sum \{\, I(x) \mid x \in \text{spectrum} \,\}}$$
where I(x) is the intensity of point x and Rank(x) represents the order of points if sorted by intensity in descending order. That is, the most intense point has rank 1, the next rank 2 etc. The numerator is divided by the sum of all intensities. This normalization is relatively stable and less dependent on the most intense peak, which is a disadvantage of a normalization by the intensity sum. But in contrast to other methods such as a rank-based normalization, it does not discard the entire information contained in the spectral intensities.
where α is a correction term to account for cases in which the highest point is significantly larger than the rest. Following Na et al., we set α to the relative intensity of the most abundant data point.
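The cumulative intensity normalization on which the Xrea value is built can be sketched as follows. This is only an illustration of the normalization step as described in the text, not of the full Xrea computation; the function name is hypothetical and ties between equal intensities are broken arbitrarily by sort order.

```python
def cumulative_normalized_intensities(intensities):
    """For each point, sum the intensities of all points that are not more intense,
    divided by the total intensity of the spectrum."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum needs a positive total intensity")
    # Visit points from least to most intense and keep a running sum.
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    result = [0.0] * len(intensities)
    running = 0.0
    for i in order:
        running += intensities[i]
        result[i] = running / total
    return result
```

By construction the most intense point always receives the value 1, and the least intense point receives its own share of the total intensity.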
Median of the number of data points with intensity ≥ 0 in each scan. This descriptor accounts for variations in the number of recorded intensities.
Summary statistics for m/z, intensity and signal-to-noise ratio of all scans. The summary statistics consist of minimum, maximum, mean and median. We estimate the noise level using an iterative sliding window approach. We move a window of size 25 Th across each spectrum and calculate a noise level for each window. We compute mean and standard deviation σ of all intensities in the current window and discard all points with an intensity higher than 3 × σ. We repeat this procedure and estimate the local noise level as the median intensity after three iterations.
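A minimal sketch of the iterative noise estimation is given below. Two details are assumptions made for illustration: points are discarded when they exceed the window mean by more than three standard deviations, and the windows are placed side by side rather than overlapping. The function names are hypothetical and do not correspond to the OpenMS routine.

```python
from statistics import mean, median, pstdev


def window_noise_level(intensities, iterations=3, k=3.0):
    """Iteratively trim intense points, then return the median of what is left."""
    kept = list(intensities)
    for _ in range(iterations):
        if len(kept) < 2:
            break
        m, s = mean(kept), pstdev(kept)
        # Assumption: the cutoff is mean + k * sigma of the current window content.
        kept = [x for x in kept if x <= m + k * s]
    return median(kept)


def noise_levels(mz, intensities, window=25.0):
    """Return (window start, noise level) pairs for consecutive windows of `window` Th."""
    if not mz:
        return []
    levels = []
    start, end = min(mz), max(mz)
    while start <= end:
        in_window = [i for m, i in zip(mz, intensities) if start <= m < start + window]
        if in_window:
            levels.append((start, window_noise_level(in_window)))
        start += window
    return levels
```

The trimming step removes isolated intense peaks, so the remaining median reflects the background rather than the signal.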
Skewness and kurtosis of the TIC. For good and reproducible LC-MS runs, the TICs should exhibit similar shapes. Skewness and kurtosis describe the asymmetry and peakedness of a distribution, respectively. The skewness is the third standardized moment of a distribution. For a sample $x_1, \ldots, x_n$ of size n with mean $\bar{x}$, it is defined as $\text{skew} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}$. The skewness is positive for distributions with a tail to the right, and negative for left-tailed distributions as illustrated in Figure 3. The kurtosis is defined as $\text{kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}} - 3$. It measures how sharply peaked a distribution is, relative to its width. We subtract 3 to achieve a kurtosis of zero for the Gaussian distribution. A distribution with positive kurtosis has more probability mass around the mean than the Gaussian distribution, whereas a distribution with negative kurtosis has less probability mass around the mean and is therefore less peak-shaped. We give an example in Figure 4.
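The two shape descriptors can be computed directly from the sample moments. The sketch below uses the standard moment-based definitions with a 1/n normalization, which is an assumption about the exact estimator; the function names are hypothetical.

```python
def central_moment(values, order):
    """Return the `order`-th central sample moment with 1/n normalization."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** order for x in values) / n


def skewness(values):
    """Third standardized moment: positive for a tail to the right."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so that a Gaussian gives zero."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(values, 4) / m2 ** 2 - 3.0
```

For the symmetric sample 1, 2, 3, 4, 5 the skewness is zero and the excess kurtosis is negative, reflecting a flat shape with little mass concentrated at the center.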
Minimum and maximum intensity of the TIC. We store the maximum and minimum intensity over the whole LC-MS run.
Using these descriptors, we can now represent an LC-MS map as a vector in which each entry corresponds to one of the quality descriptors described above.
Outlier Detection using the Mahalanobis Distance
To decide whether an LC-MS map is an outlier compared to the rest of the measurements, we use the Mahalanobis distance. It has previously been applied in numerous tasks, such as the quality assessment of microarray experiments or face recognition. It is related to the Euclidean distance but differs in the fact that each dimension is weighted by its variation. For a vector $\vec{x}$, a mean vector $\vec{\mu}$ and a covariance matrix Σ, it is defined as $D_M(\vec{x}) = \sqrt{(\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})}$.
Using the Mahalanobis distance, we can therefore measure the distance of each LC-MS run, described by the vector of its quality descriptors $\vec{x}$, to the distribution of all other n runs, characterized by their mean vector $\vec{\mu}$ and covariance matrix Σ. For normally distributed data, the squared Mahalanobis distance of a vector with dimension p follows a $\chi^2$ distribution with p degrees of freedom. This allows us to define cutoffs for suspiciously large distances for a given confidence level α as in any statistical test. Note that the Mahalanobis distance is equal to the Euclidean distance if the covariance matrix is the identity matrix. In this case, each dimension has unit variance and each pair of dimensions is uncorrelated.
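The distance and its cutoff can be sketched with the standard library alone. The sketch below solves the linear system by Gaussian elimination instead of inverting Σ explicitly, and it approximates the $\chi^2$ quantile with the Wilson-Hilferty formula; both are choices made for this illustration, as is the default quantile of 0.975. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, p))
        z[i] = (aug[i][p] - tail) / aug[i][i]
    return z


def squared_mahalanobis(x, mu, cov):
    """Return (x - mu)^T cov^{-1} (x - mu)."""
    d = [a - b for a, b in zip(x, mu)]
    return sum(a * b for a, b in zip(d, solve(cov, d)))


def chi2_cutoff(p, quantile=0.975):
    """Approximate chi-square quantile with p degrees of freedom (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(quantile)
    return p * (1 - 2 / (9 * p) + z * math.sqrt(2 / (9 * p))) ** 3


def is_outlier(x, mu, cov, quantile=0.975):
    """Flag x if its squared distance exceeds the chi-square cutoff."""
    return squared_mahalanobis(x, mu, cov) > chi2_cutoff(len(x), quantile)
```

With an identity covariance matrix the squared Mahalanobis distance reduces to the squared Euclidean distance, which gives a quick sanity check of an implementation.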
Note that if we use this criterion for outlier detection, we effectively classify a map as an outlier if its vector of quality descriptors differs to a large extent from the rest. This is reasonable since even for non-replicate LC-MS runs, we would expect most quality descriptors to be similar.
However, this approach suffers from two drawbacks. First, for fewer LC-MS runs than descriptors (n < p), the covariance matrix Σ is singular and cannot be inverted. Second, outliers in the data might distort our estimates of the mean vector $\vec{\mu}$ and of Σ and lead to incorrect estimates of the distance. We solve the first problem by applying a Principal Component Analysis to reduce the dimensionality of our data to a dimension p' ≪ p while trying to retain the essential information. We solve the second problem by using robust estimators for location and scale.
Robust Principal Component Analysis
Principal Component Analysis (PCA) is a method for dimensionality reduction and feature extraction. More formally, our aim is to represent a vector $\vec{x}$ by a lower dimensional representation $\vec{y}$ given by $\vec{y} = M\vec{x}$, where M is a matrix with dimension dim(y) × dim(x) and dim(y) < dim(x). M represents a projection from the higher dimensional space of $\vec{x}$ to a lower dimensional space of $\vec{y}$. In the case of PCA, M is an orthogonal linear projection.
The standard PCA works as follows: the data, in our case the vectors of quality descriptors for each map, are stored in an n × p matrix X with a row for each of the n maps and a column for each of the p quality descriptors. This matrix is centered by subtracting the column-wise mean. The covariance matrix Σ of the centered data is given, up to the constant factor 1/(n − 1), by $X^T X$. It contains the variance of each of the p dimensions on its diagonal and the covariances in the remaining entries. We compute the eigenvectors of the covariance matrix and choose as coordinates the eigenvectors with the largest eigenvalues. We project the data into a lower dimensional space by computing the product $XE$, where E contains the chosen eigenvectors as column vectors.
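The steps of the standard PCA can be sketched without external libraries. The sketch below finds the leading eigenvectors by power iteration with deflation, which is one simple choice made for this illustration; a numerical library would normally be used instead. The function names and the start vector are hypothetical, and power iteration can fail if the start vector happens to be orthogonal to the leading eigenvector.

```python
import math


def center(rows):
    """Subtract the column-wise mean from every row."""
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(r, mu)] for r in rows]


def covariance(centered):
    """Sample covariance matrix X^T X / (n - 1) of centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    p = len(matrix)
    v = [1.0 + a for a in range(p)]  # arbitrary start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(rows, k):
    """Return the k leading (eigenvalue, eigenvector) pairs and the projected data."""
    centered = center(rows)
    cov = covariance(centered)
    p = len(cov)
    components = []
    for _ in range(k):
        lam, v = leading_eigenpair(cov)
        components.append((lam, v))
        # Deflation: remove the variance already explained by v.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(x * c for x, c in zip(r, v)) for _, v in components]
              for r in centered]
    return components, scores
```

For data lying exactly on a line, the first component points along that line and its eigenvalue equals the total variance.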
The $L_1$ median is simply the point θ, not necessarily a data point, which minimizes the sum of the Euclidean distances to all data points. In contrast to the simple component-wise mean or median for multivariate data, the $L_1$ median gives a robust estimate of the center and is invariant to orthogonal linear transformations such as PCA. Efficient algorithms for its computation exist.
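As an illustration, the $L_1$ median can be approximated with the classical Weiszfeld iteration. This is a simple, well-known scheme chosen for brevity and is not the algorithm referenced in the text; the function name, iteration count and tolerance are hypothetical.

```python
import math


def l1_median(points, iterations=200, tol=1e-10):
    """Approximate the point minimizing the sum of Euclidean distances (Weiszfeld)."""
    p = len(points[0])
    # Start from the component-wise mean.
    theta = [sum(col) / len(points) for col in zip(*points)]
    for _ in range(iterations):
        numerator = [0.0] * p
        denominator = 0.0
        for pt in points:
            d = math.dist(pt, theta)
            if d < 1e-12:
                continue  # simple safeguard when theta coincides with a data point
            w = 1.0 / d
            denominator += w
            for k in range(p):
                numerator[k] += w * pt[k]
        if denominator == 0.0:
            break
        new_theta = [v / denominator for v in numerator]
        shift = math.dist(new_theta, theta)
        theta = new_theta
        if shift < tol:
            break
    return theta
```

For the four corners of the unit square plus one far outlier at (100, 100), the mean is dragged to about (20.4, 20.4), whereas the $L_1$ median stays inside the square.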
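As a robust measure of scale, we use the median absolute deviation, $\mathrm{MAD}(x_1, \ldots, x_n) = c \cdot \mathrm{med}_j \, |x_j - \mathrm{med}_i \, x_i|$, where c is a correction factor reflecting our assumption that the data is normally distributed and equals 1.4826. The projection pursuit approach to PCA also gives a robust estimate of the covariance matrix, which we use for the computation of the Mahalanobis distance.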
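The correction factor 1.4826 is the usual consistency constant of the median absolute deviation (MAD) under normality, so the sketch below assumes that the robust scale estimate meant here is the MAD. The function name is hypothetical.

```python
from statistics import median


def mad(values, c=1.4826):
    """Median absolute deviation, scaled to estimate the standard deviation of normal data."""
    values = list(values)
    med = median(values)
    return c * median(abs(v - med) for v in values)
```

A single extreme value barely affects this estimate: for 1, 2, 3, 4, 100 the MAD is 1.4826, while the ordinary standard deviation is dominated by the outlier.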
After projecting the data into the subspace of the direction with the highest robust variance, projection pursuit searches for an approximation of the next eigenvector, which is assumed to be orthogonal to the previous one. Obviously, one needs to decide on the dimensionality of the subspace the data is projected into. We always choose the number of components needed to explain 90% of the variance, which is usually around 6. Consequently, we can now describe each LC-MS map using a vector in 6 dimensions instead of 20, the covariance matrix Σ is invertible and we can search for suspicious maps by plotting the Mahalanobis distance for each map. Note that each dimension no longer represents a unique descriptor, but a linear combination of all descriptors. But by inspecting the weights (also called loadings in PCA terms) we can gain insights into which descriptor contributed the most to a particular dimension.
To summarize, our method aims at the identification of poor runs among a group of LC-MS experiments. Our method assumes that the majority of the runs are good and that poor runs differ significantly in their quality descriptors from the rest. Consequently, what constitutes a good and a poor run is determined by the majority of the experiments. Since our method is robust, it suffices if at least half of the LC-MS runs are good.
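A projection pursuit search of this kind can be sketched as follows. The sketch makes several assumptions for illustration: candidate directions are the normalized centered data points, the robust scale is the MAD, and the data are centered at the component-wise median as a simple stand-in for the $L_1$ median. The function names are hypothetical and the code is not the implementation used for the results reported here.

```python
import math
from statistics import median


def mad(values, c=1.4826):
    """Median absolute deviation with the normal consistency factor."""
    values = list(values)
    med = median(values)
    return c * median(abs(v - med) for v in values)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def robust_pca(rows):
    """Return (robust variance, direction) pairs found by projection pursuit."""
    # Assumption: component-wise median as robust center.
    center = [median(col) for col in zip(*rows)]
    data = [[x - m for x, m in zip(r, center)] for r in rows]
    components = []
    for _ in range(min(len(center), len(rows) - 1)):
        best_scale, best_dir = 0.0, None
        for r in data:
            norm = math.sqrt(dot(r, r))
            if norm < 1e-12:
                continue
            direction = [x / norm for x in r]
            scale = mad(dot(row, direction) for row in data)
            if scale > best_scale:
                best_scale, best_dir = scale, direction
        if best_dir is None:
            break  # no spread left in the remaining subspace
        components.append((best_scale ** 2, best_dir))
        # Deflate: keep only the part of each row orthogonal to best_dir.
        data = [[x - dot(row, best_dir) * d for x, d in zip(row, best_dir)]
                for row in data]
    return components


def components_for_variance(components, explained=0.90):
    """Smallest number of leading components reaching the requested variance share."""
    total = sum(var for var, _ in components)
    running = 0.0
    for k, (var, _) in enumerate(components, start=1):
        running += var
        if running >= explained * total:
            return k
    return len(components)
```

The squared robust scales play the role of eigenvalues, so the 90% rule can be applied to them in the same way as in a classical PCA.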
The programs to compute the quality descriptors for an LC-MS map were written using OpenMS, our software library for computational mass spectrometry. We performed the statistical analysis and visualization of the results using the mathematical software package R (http://www.r-project.org).
We present three use cases to demonstrate how our approach can be applied to automatically detect outlier runs among a set of LC-MS maps. We start with a set of simulated maps. The simulation allows us to probe the capabilities of our approach on a detailed level. The second and third use cases comprise a tryptic digest of bovine serum albumin (BSA) and urine samples from a healthy volunteer recorded using LC-ESI-MS.
Details of the full mass spectrometry analysis concept and chromatographic setup are described elsewhere. In short, we employed a restricted access sulphonic acid strong cation-exchanger (RAM-SCX) (Merck KGaA, Darmstadt, Germany) column followed by a peptide transfer and solvent switch through a trap column (Chromolith Guard, 5 × 4.6 mm, Merck KGaA). We performed a subsequent analysis using an analytical column (Chromolith CapRod RP18e, 150 × 0.2 mm, Merck KGaA) by means of column switching to perform two dimensional orthogonal separations. On-line mass spectrometric detection was performed using an Esquire Series 3000 PLUS ESI Iontrap mass spectrometer (Bruker Daltonics, Bremen, Germany).
The BSA digest was prepared according to a standard procedure (Proteo Extract All-in-one Trypsin Digest Kit, Merck Chemicals Ltd, Nottingham, UK) with a final concentration of 2 mg/ml and stored at −20°C. Urine samples from healthy volunteers were pooled and stored at −20°C. Before analysis, samples were defrosted at room temperature for an hour and then filtered through 0.22 μm pore size low protein binding membrane filters (Durapore, Millipore), and the clear sample was transferred to autosampler tubes. The prepared samples were stored in an autosampler at 4°C for no longer than 24 h before injection.
Simulated LC-MS runs
Tryptic Digest of Bovine Serum Albumin
This data set consists of replicate LC-MS recordings of a tryptic digest of bovine serum albumin (BSA). The peptide mixture was measured in 43 replicates; details of sample preparation and LC-MS analytics are described elsewhere. Using the algorithms implemented in TOPP/OpenMS [28, 31], we performed peptide feature detection, alignment and statistical analysis for these runs. After manual inspection, we classified 5 runs as outliers for various reasons: 3 exhibited peptide feature intensities that deviated to a large extent from the other replicates. The remaining two revealed significant shifts in retention time as compared to the remaining runs. This made an alignment difficult and required manual fine-tuning of the alignment algorithm.
This is of course a time-consuming procedure. It would be preferable to have a method that allows us to remove outliers before feature detection and alignment are performed, to save time and computer resources. Consequently, we applied our quality assessment method to these runs.
As we can see from Figure 7 (left), the combination of spectral quality descriptors, robust principal component analysis and Mahalanobis distance is accurate and classifies all outlier maps correctly. It also classifies some additional maps as mild outliers, namely the first LC-MS map and the maps with numbers 36 and 37. Manual inspection of the PCA loadings revealed that their larger Mahalanobis distances are mainly due to a higher noise level in some spectra and minor fluctuations in the TIC. For illustration, Figure 7 (right) shows the TIC of four maps. Again, normal runs in the upper row are colored in green, outlier runs are colored in red. Both outlier maps exhibit TICs that contain a significant amount of noise peaks and clearly deviate from the two good runs in the top row.
Urine Samples of a Healthy Volunteer
This data set consists of 54 LC-MS runs. A manual inspection indicated that five of these runs are clear outliers. Four of these five runs were measured after a break of several days, which seems to have led to disturbances in the chromatography and sample composition. The fifth outlier has a significantly elevated noise level.
Quality assessment and control are commonplace in fields where many items are produced at a rapid pace and where quality is crucial, be it tools in factories or data in high-throughput biological experiments. The application of statistical quality assessment to quantitative mass spectrometry data is still an underexplored field. We expect that, with the growth of this field, this is going to change as much as it has changed for gene expression studies.
We presented a statistical method for outlier detection in large scale mass spectrometry studies. It is based on quality descriptors capturing different aspects of the quality of an LC-MS map and on a statistically robust version of the Mahalanobis distance. We demonstrated that our approach works well with large data sets and can accurately detect poor LC-MS runs. This is of special importance in high-throughput experiments, where many LC-MS maps are generated and there is not enough time for a manual quality assessment.
We evaluated our approach on simulated LC-MS runs and two real data sets consisting of around 50 replicates each. In all cases, we were able to detect outlier data sets, and these outliers were confirmed by manual validation. When dealing with outliers, we have two choices: to either remove them or to repeat the corresponding LC-MS run. Clearly, this depends on the time and lab resources available. In either case, outlier detection and removal as early as possible during the data analysis will make the results more reliable and save a lot of time and computational effort.
Part of this work was performed at the Walter and Eliza Hall Institute in Melbourne, Australia. We thank the members of the Bioinformatics Division, especially Terry Speed and Mark Robinson, for the fruitful discussions.
O.S.-T. acknowledges funding by the International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC) and by a grant of the German Federal Ministry for Education and Research (BMBF), grant no. 031369C.
We are indebted to Chris Bielow, who implemented the noise estimation routine during a lab rotation project.
We also would like to thank the reviewers who helped to improve this paper with their suggestions.
- Mann M, Aebersold R: Mass spectrometry-based proteomics. Nature. 2003, 422: 198-207.
- Cappadona S, Levander F, Jansson M, James P, Cerutti S, Pattini L: Wavelet-Based Method for Noise Characterization and Rejection in High-Performance Liquid Chromatography Coupled to Mass Spectrometry. Analytical Chemistry. 2008.
- Leptos KC, Sarracino DA, Jaffe JD, Krastins B, Church GM: MapQuant: Open-Source software for large-scale protein quantification. Proteomics. 2006, 6 (6): 1770-1782.
- Schulz-Trieglaff O, Hussong R, Gröpl C, Hildebrandt A, Reinert K: A fast and accurate algorithm for the quantification of peptides from LC-MS data. Research in Computational Molecular Biology, 11th Annual International Conference, RECOMB 2007, Oakland, CA, USA, April 21–25, 2007, Proceedings, of Lecture Notes in Computer Science. Edited by: Speed TP, Huang H. 2007, Springer, 4453: 473-487.
- Mayr BM, Kohlbacher O, Reinert K, Sturm M, Gröpl C, Lange E, Klein C, Huber C: Absolute Myoglobin Quantitation in Serum by Combining Two-Dimensional Liquid Chromatography-Electrospray Ionization Mass Spectrometry and Novel Data Analysis Algorithms. J Proteome Res. 2006, 5: 414-421.
- Bern M, Goldberg D, McDonald WH, Yates JR III: Automatic Quality Assessment of Peptide Tandem Mass Spectra. Bioinformatics. 2004, 20: i49-54.
- Choo K, Tham W: Tandem mass spectrometry data quality assessment by self-convolution. BMC Bioinformatics. 2007, 8: 352.
- Na S, Paek E: Quality Assessment of Tandem Mass Spectra Based on Cumulative Intensity Normalization. Journal of Proteome Research. 2006, 5 (12): 3241-3248.
- Nesvizhskii AI, Roos FF, Grossmann J, Vogelzang M, Eddes JS, Gruissem W, Baginsky S, Aebersold R: Dynamic Spectrum Quality Assessment and Iterative Computational Analysis of Shotgun Proteomic Data: Toward More Efficient Identification of Post-translational Modifications, Sequence Polymorphisms, and Novel Peptides. Mol Cell Proteomics. 2006, 5 (4): 652-670.
- Moore RE, Young MK, Lee TD: Method for screening peptide fragment ion mass spectra prior to database searching. Journal of the American Society for Mass Spectrometry. 2000, 11 (5): 422-426.
- Xu M, Geer L, Bryant S, Roth J, Kowalak J, Maynard D, Markey S: Assessing Data Quality of Peptide Mass Spectra Obtained by Quadrupole Ion Trap Mass Spectrometry. Journal of Proteome Research. 2005, 4 (2): 300-305.
- Flikka K, Martens L, Vandekerckhove J, Gevaert K, Eidhammer I: Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering. Proteomics. 2006, 6 (7): 2086-2094.
- Coombes KR, Fritsche J, Herbert A, Clarke C, Chen Jn, Baggerly KA, Morris JS, Xiao Lc, Hung MC, Kuerer HM: Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization. Clin Chem. 2003, 49 (10): 1615-1623.
- Harezlak J, Wang M, Christiani D, Lin X: Quantitative quality-assessment techniques to compare fractionation and depletion methods in SELDI-TOF mass spectrometry experiments. Bioinformatics. 2007, 23 (18): 2441-2448.
- Prakash A, Piening B, Whiteaker J, Zhang H, Shaffer SA, Martin D, Hohmann L, Cooke K, Olson JM, Hansen S, Flory MR, Lee H, Watts J, Goodlett DR, Aebersold R, Paulovich A, Schwikowski B: Assessing bias in experiment design for large-scale mass spectrometry-based quantitative proteomics. Mol Cell Proteomics. 2007, M600470-MCP200.
- Whistler T, Rollin D, Vernon S: A method for improving SELDI-TOF mass spectrometry data quality. Proteome Science. 2007, 5: 14.
- Listgarten J, Emili A: Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics. 2005, 4 (4): 419-434.
- Stead DA, Paton NW, Missier P, Embury SM, Hedeler C, Jin B, Brown AJP, Preece A: Information quality in proteomics. Brief Bioinform. 2008, 9 (2): 174-188.
- Brown CS, Goodwin PC, Sorger PK: Image metrics in the statistical analysis of DNA microarray data. Proceedings of the National Academy of Sciences. 2001, 98 (16): 8944-8949.
- Cohen Freue GV, Hollander Z, Shen E, Zamar RH, Balshaw R, Scherer A, McManus B, Keown P, McMaster WR, Ng RT: MDQC: a new quality assessment method for microarrays based on quality control reports. Bioinformatics. 2007, 23 (23): 3162-3169.
- Model F, Konig T, Piepenbrock C, Adorjan P: Statistical process control for large scale microarray experiments. Bioinformatics. 2002, 18: S155-163.
- Windig W, Phalp J, Payne A: A Noise and Background Reduction Method for Component Detection in Liquid Chromatography/Mass Spectrometry. Analytical Chemistry. 1996, 68: 3602-3603.
- Mahalanobis P: On the generalized distance in statistics. Proceedings of the National Institute of Science of India. 1936, 12: 49-55.
- Fraser A, Hengartner N, Vixie K, Wohlberg B: Incorporating invariants in Mahalanobis distance based classifiers: application to face recognition. Proceedings of the International Joint Conference on Neural Networks. 2003, 4: 3118-3123.
- Pearson K: On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine. 1901, 2: 559-572.
- Croux C, Ruiz-Gazen A: A fast algorithm for robust principal components based on projection pursuit. COMPSTAT: Proceedings in Computational Statistics. Edited by: Prat A. 1996, Physica-Verlag, 211-216.
- Hössjer O, Croux C: Generalizing univariate signed rank statistics for testing and estimating a multivariate location parameter. Journal of Nonparametric Statistics. 1995, 4 (3): 293-308.
- Sturm M, Bertsch A, Groepl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O: OpenMS – An open-source software framework for mass spectrometry. BMC Bioinformatics. 2008, 9:
- Machtejevas E, Andrecht S, Lubda D, Unger KK: Monolithic silica columns of various format in automated sample clean-up/multidimensional liquid chromatography/mass spectrometry for peptidomics. Journal of Chromatography A. 2007, 1144: 97-101.
- Schulz-Trieglaff O, Pfeifer N, Groepl C, Kohlbacher O, Reinert K: LC-MSsim: a simulation software for Mass Spectrometry-Liquid Chromatography Experiments. BMC Bioinformatics. 2008, 9: 423.
- Kohlbacher O, Reinert K, Gröpl C, Lange E, Pfeifer N, Schulz-Trieglaff O, Sturm M: TOPP-the OpenMS proteomics pipeline. Bioinformatics. 2007, 23 (2): e191-197.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.