Blood plasma sample preparation
Venous blood was collected from three volunteers (33-, 24-, and 23-year-old males) into EDTA Vacutainer plasma tubes (BD, USA). Blood samples were processed according to the manufacturer’s instructions. The resultant blood plasma was stored at −80 °C until analysis. The analyzed samples were subjected to one freeze/thaw cycle. Plasma (10 μL) was mixed with 10 μL of water (LiChrosolv; Merck KGaA, Darmstadt, Germany) and 80 μL of methanol (Fluka, Munich, Germany). After incubation at room temperature for 15 min, the samples were centrifuged at 13,000 × g (MiniSpin plus centrifuge; Eppendorf AG, Hamburg, Germany) for 10 min. Supernatants were then transferred to clean plastic Eppendorf tubes, and fifty volumes of methanol containing 0.1% formic acid (Fluka) were added to each tube. The resultant solution was subjected to mass spectrometric analysis. The study design was approved by the relevant ethical review committee.
To confirm that the blood plasma composition of the volunteers was not substantially distorted, their basic biochemical and blood parameters were measured using routine automatic analyzers. Most of the parameter values fell within the normal ranges used in clinical laboratory practice (see Additional file 1: Table S1).
Mass spectrometry analysis
Samples were analyzed by direct infusion mass spectrometry with a maXis hybrid quadrupole time-of-flight mass spectrometer (Bruker Daltonics, Billerica, MA, USA), a micrOTOF-Q hybrid quadrupole time-of-flight mass spectrometer (Bruker Daltonics, Billerica, MA, USA), an OrbiTrap Elite mass spectrometer (Thermo Scientific, USA), a Fourier transform ion cyclotron resonance mass spectrometer (Apex Ultra, Bruker Daltonics, USA), and an IFunnel Q-ToF mass spectrometer 6550 (Agilent Technologies, USA) equipped with electrospray ion sources. Details are described in the supplementary material [see Additional file 2]. The resultant metabolite ion masses were pooled and processed using Matlab version R2010a (MathWorks, Natick, MA, USA); all other calculations were also performed in Matlab.
Mass spectra standardization by the SantaOmics algorithm
A fragment with a width of m/z 50 was selected at the start of the mass spectrum (i.e., at the edge with the lowest m/z values). The mass spectrometric peaks located inside the fragment were arranged in order of decreasing intensity. A curve approximating the intensity values was built using the fit function (here and below, all mentioned mathematical functions are from Matlab) with the power equation ($y = ax^{b} + c$) as the approximation type. The knee point of this curve was established by finding the first and second derivatives (the source code is presented in the data repository). The knee point determined the normalization value (see Fig. 1) for the m/z value in the middle of the selected fragment. The fragment was then shifted by m/z 1 and all calculations were repeated, iteratively, until the entire mass spectrum had been processed. The calculated normalization points were approximated by a curve (called the normalization curve) using the fit function (smoothing spline as the approximation type). To obtain a mass spectrum on a dimensionless, instrument-independent scale, the intensity of each mass peak was divided by the value of the normalization curve at the m/z of the corresponding peak.
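The procedure above can be sketched in a few dozen lines. The following is an illustrative Python sketch, not the authors' Matlab code (which is in the data repository). Several details are assumptions made for the example: the power fit scans the exponent b on a fixed grid instead of calling Matlab's fit; the knee is taken as the point of maximum curvature after both axes are rescaled to [0, 1]; the normalization value is taken as the fitted intensity at the knee; and linear interpolation stands in for the smoothing spline. The names fit_power, knee_value, and standardize are hypothetical.

```python
def fit_power(xs, ys):
    """Fit y = a*x**b + c by scanning b on a grid and solving a, c linearly."""
    n = len(xs)
    best = None
    for k in range(1, 61):                      # assumed grid: b = -0.05 ... -3.00
        b = -k / 20.0
        u = [x ** b for x in xs]
        mu, my = sum(u) / n, sum(ys) / n
        suu = sum((ui - mu) ** 2 for ui in u)
        a = sum((ui - mu) * (yi - my) for ui, yi in zip(u, ys)) / suu
        c = my - a * mu
        sse = sum((a * ui + c - yi) ** 2 for ui, yi in zip(u, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]


def knee_value(intensities):
    """Fitted intensity at the point of maximum curvature of the ranked curve."""
    ys = sorted(intensities, reverse=True)      # decreasing peak intensities
    n = len(ys)
    a, b, c = fit_power(range(1, n + 1), ys)
    span = a * (1.0 - n ** b)                   # fitted drop from rank 1 to rank n
    if span <= 0:                               # flat window: no knee to find
        return ys[n // 2]
    best_x, best_kappa = 1, -1.0
    for x in range(1, n + 1):
        # first and second derivatives of the fit, with both axes rescaled to [0, 1]
        d1 = a * b * x ** (b - 1) * (n - 1) / span
        d2 = a * b * (b - 1) * x ** (b - 2) * (n - 1) ** 2 / span
        kappa = abs(d2) / (1.0 + d1 * d1) ** 1.5
        if kappa > best_kappa:
            best_x, best_kappa = x, kappa
    return a * best_x ** b + c


def standardize(peaks, width=50.0, step=1.0, min_peaks=5):
    """peaks: list of (mz, intensity). Returns a list of (mz, normalized intensity)."""
    peaks = sorted(peaks)
    lo, hi = peaks[0][0], peaks[-1][0]
    points = []                                 # (window centre, normalization value)
    start = lo
    while start + width <= hi:
        window = [i for mz, i in peaks if start <= mz < start + width]
        if len(window) >= min_peaks:            # min_peaks is an assumed safeguard
            points.append((start + width / 2.0, knee_value(window)))
        start += step
    if not points:
        raise ValueError("no window contained enough peaks")

    def curve(mz):                              # linear interpolation, flat beyond the ends
        if mz <= points[0][0]:
            return points[0][1]
        if mz >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= mz <= x1:
                return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)

    return [(mz, i / curve(mz)) for mz, i in peaks]
```

Because every normalization value scales with the intensities inside its window, multiplying all intensities by a constant leaves the output of standardize unchanged in this sketch, which is the behavior expected of an instrument-independent scale.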
Test for knee point stability
The mass spectrometric data for the human blood plasma metabolome at m/z 225–275 were taken as an example of omics data. The mass peak intensities were perturbed with random noise (10 iterations, using the normrnd function) so that the peak intensity CV equaled 46%, the average CV for biological variation of metabolites in blood plasma [6]. The CV of the knee point was measured at the same time to estimate its stability.
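The stability test can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: random.gauss plays the role of Matlab's normrnd, negative perturbed intensities are clipped to zero (the text does not say how they were handled), and a simple maximum-distance-to-chord rule (the hypothetical knee_by_chord) stands in for the derivative-based knee detection.

```python
import random
import statistics


def knee_by_chord(intensities):
    """Stand-in knee detector: ranked point farthest below the first-to-last chord."""
    ys = sorted(intensities, reverse=True)
    n = len(ys)
    span = ys[0] - ys[-1]
    if n < 3 or span == 0:
        return ys[0]
    # With both axes rescaled to [0, 1] the chord is Y = 1 - X, so the
    # distance of a point below the chord is proportional to 1 - X - Y.
    best = max(range(n), key=lambda i: 1 - i / (n - 1) - (ys[i] - ys[-1]) / span)
    return ys[best]


def perturb(intensities, cv, rng):
    """Add Gaussian noise with standard deviation cv * intensity to each peak."""
    return [max(0.0, rng.gauss(y, cv * y)) for y in intensities]


def knee_cv(intensities, cv=0.46, iterations=10, seed=0):
    """CV of the knee value over repeated random perturbations of the intensities."""
    rng = random.Random(seed)
    knees = [knee_by_chord(perturb(intensities, cv, rng)) for _ in range(iterations)]
    return statistics.stdev(knees) / statistics.mean(knees)
```

Comparing the value returned by knee_cv with the 46% imposed on the individual peaks indicates how much of the peak-level variation propagates to the knee point.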
Tests for the SantaOmics algorithm
The first test assessed the capacity of the SantaOmics algorithm to correct undesired variability in mass spectra. The mass list from the mass spectrum of sample #3, obtained in the ‘low range’ mode of detection, was substantially distorted in three ways: the mass peak intensities were multiplied by a constant (10×), linearly increased (from 1× in the low m/z area to 10× in the high m/z area), or nonlinearly distorted using a Gaussian function (intensities in the low and high m/z areas were decreased (1/4×), while intensities in the center of the mass spectrum were increased (4×)). These distortions were selected to imitate constant, linear, and nonlinear types of variability. The distorted mass lists were standardized by the SantaOmics algorithm and then compared with the standardized nondistorted mass list of sample #3 by calculating the R² value for a linear approximation of the peak intensities and the correlation coefficient (r).
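The three distortions and the comparison metrics can be sketched as follows. This Python sketch is illustrative only: the function names are hypothetical, and the exact form of the Gaussian distortion is an assumption (a Gaussian in m/z chosen so that the factor is exactly 4× at the center of the m/z range and 1/4× at both edges). In the study the metrics were computed on standardized mass lists; here they are shown on plain intensity lists.

```python
import math


def distort_constant(peaks, factor=10.0):
    """Multiply every intensity by the same factor."""
    return [(mz, i * factor) for mz, i in peaks]


def distort_linear(peaks, low=1.0, high=10.0):
    """Factor rises linearly from `low` at the lowest m/z to `high` at the highest."""
    lo = min(mz for mz, _ in peaks)
    hi = max(mz for mz, _ in peaks)
    return [(mz, i * (low + (high - low) * (mz - lo) / (hi - lo))) for mz, i in peaks]


def distort_gaussian(peaks, edge=0.25, centre=4.0):
    """Gaussian-shaped factor: `centre` in the middle of the range, `edge` at both ends."""
    lo = min(mz for mz, _ in peaks)
    hi = max(mz for mz, _ in peaks)
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    out = []
    for mz, i in peaks:
        t = (mz - mid) / half                   # -1 at the low edge, +1 at the high edge
        # centre * exp(-t**2 * ln(centre / edge)), written as a power
        out.append((mz, i * centre * (edge / centre) ** (t * t)))
    return out


def pearson_r(xs, ys):
    """Pearson correlation coefficient; R^2 of a simple linear fit equals r**2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For a straight-line fit with an intercept, R² is the square of r, so pearson_r(a, b) ** 2 gives the R² value for two matched intensity lists a and b.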
The second test assessed the capacity of the SantaOmics algorithm to standardize substantially divergent data obtained from the same instrument (intra-instrumental experiment). Such divergence may result from the variability that appears when different measurement options are used. For this purpose, the mass spectra obtained by maXis in different ranges of detection (‘wide range’ and ‘high range’) were standardized. The overlapping areas of these mass spectra were used to estimate their identity (by calculating R² and r).
The third test, which demonstrated the capacity of the algorithm to convert mass spectra to the same scale, was performed using mass spectra from different mass spectrometers (inter-instrumental experiment with maXis, micrOTOF-Q, OrbiTrap Elite, Apex Ultra, and IFunnel Q-ToF mass spectrometers). This set provided data for mass spectrometers from the same manufacturer and of the same design, as well as from different manufacturers and of different designs. The mass spectra were overlaid to compare the quality of scaling visually. Additionally, Spearman correlation coefficients were determined and Passing-Bablok analysis was performed to measure the slopes and intercepts in three independent experiments, corresponding to the blood plasma samples from the three volunteers.
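For readers unfamiliar with the two statistics, a minimal Python sketch is given below. It is not the implementation used in the study. The Passing-Bablok part is simplified: it returns point estimates only (no confidence intervals), skips pairs with identical x values, and assumes positively correlated data. The function names are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def passing_bablok(xs, ys):
    """Simplified Passing-Bablok regression; returns (slope, intercept)."""
    slopes = []
    n = len(xs)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = xs[j] - xs[i], ys[j] - ys[i]
            if dx == 0:                         # simplification: skip vertical pairs
                continue
            s = dy / dx
            if s != -1:                         # slopes of exactly -1 are excluded
                slopes.append(s)
    slopes.sort()
    k = sum(1 for s in slopes if s < -1)        # offset for the shifted median
    m = len(slopes)
    if m % 2:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept
```

When two standardized spectra are on the same scale, the matched peak intensities are expected to give a slope close to 1 and an intercept close to 0.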