Reproducibility
First, we examined the reproducibility of our experimental procedures and analysis. Colon tissues were carefully isolated from 3 healthy wild-type mice. Protein was extracted, separated by SDS-PAGE, stained and sliced as shown in Figure 1. The tryptic peptide mixture was eluted, pre-fractionated on a C18 separation column and analyzed by Nano-LC-ESI-IT-TOF-MS/MS. Reproducibility of the mass analysis was confirmed by matching total ion chromatograms (TIC) and peak retention times, together with regular monitoring of shifts between the theoretical and calculated molecular weights of precursor ions (Additional file 1). Moreover, the stability of the mass spectrometer was confirmed by duplicate analysis of the same sample, which yielded 80-87% similarity of identified proteins (data not shown). A total of 1237 high-confidence proteins with 2 or more peptide matches and FDR < 2 were identified after merging the three data sets and removing redundancy (based on the IPI accession number). The identified proteins, including IPI accession number, protein name, peptide matches, theoretical and calculated MW and pI, are listed in Additional file 2.
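The merging and filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the placeholder accessions, the rule of keeping the highest peptide count per accession, and the union-based similarity measure are all assumptions made for illustration, since the text does not specify these details.

```python
from collections import defaultdict


def merge_replicates(replicates, min_peptides=2):
    """Merge per-replicate protein hits into one non-redundant mapping.

    replicates: list of lists of (accession, peptide_matches) tuples.
    Assumption: for each accession the highest peptide count seen in any
    replicate is kept, then entries below min_peptides are dropped.
    """
    best = defaultdict(int)
    for hits in replicates:
        for accession, n_peptides in hits:
            best[accession] = max(best[accession], n_peptides)
    return {acc: n for acc, n in best.items() if n >= min_peptides}


def percent_shared(run_a, run_b):
    """Percentage of accessions shared by two runs, relative to their union.

    Assumption: this is only one possible definition of "similarity".
    """
    a, b = set(run_a), set(run_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Hypothetical placeholder accessions, not real identifications.
replicates = [
    [("ACC_1", 3), ("ACC_2", 1)],
    [("ACC_1", 2), ("ACC_3", 4)],
    [("ACC_2", 1)],
]
print(merge_replicates(replicates))
```

Keying on the accession mirrors the redundancy removal based on the IPI accession number; the peptide-count cutoff mirrors the requirement of 2 or more peptide matches.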
Mouse colon proteome
Using the protein data sets generated from the analysis of 42 slices, we created a proteome catalogue of mouse colon by merging the outputs and removing redundant entries and low-confidence protein candidates with 1 peptide match. A protein candidate was accepted as a confident hit if at least 2 corresponding peptides passed the identity and homology thresholds of the Mascot (MOWSE) algorithm [16]. Over 48,000 MS/MS spectra corresponding to 1237 protein candidates were detected and used to construct the murine colon proteome (Additional file 2). MS/MS spectra of annotated peptides are freely accessible through the proteomic data repository PRIDE (the PRoteomics IDEntifications database; http://www.ebi.ac.uk/pride/) under accession number [16857]. The constructed dataset was used for further characterization of mouse colon proteins.
Characteristics of murine colon proteins
As illustrated in Figures 2 and 3, we characterized the identified proteins based on their pI, Mw and hydrophobicity. While ~68% (n = 843) of proteins fell within a pI range of 5–9 and a theoretical Mw of 10 to 100 kDa, a number of proteins with extreme pI or Mw exist in our database. For instance, ~12.5% of identified proteins (n = 155) had an acidic pI in the range 3–5 (e.g., Anp32a; pI 3.99). Conversely, basic proteins with a pI in the range 9–12 represented ~19.5% (n = 239), such as Sfrs7 (pI 11.83). In addition, we resolved proteins across a wide Mw range, from 4 to 600 kDa, for Hspb1 and Ahnak, respectively. Such extreme proteins would rarely be resolved by 2DE. To view the cellular localization of the constructed colon proteome, we further categorized protein candidates as globular or membranous based on amino acid hydrophobicity calculated by the Kyte-Doolittle and Hopp-Woods formulas. Of the 1237 proteins, 996 (80.5%) were globular (cytoplasmic) while 241 (19.5%) were membranous (Figure 3). The relatively small number of membrane-located proteins might be due to their hydrophobicity, which makes them resistant to dissolution in the lysis buffer. Further optimization of hydrophobic protein extraction is required.
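The Kyte-Doolittle part of this classification can be sketched as a GRAVY (grand average of hydropathy) calculation, i.e. the mean Kyte-Doolittle hydropathy value over all residues of a sequence. The sketch below is illustrative only: the cutoff of 0, the function names and the toy sequences are assumptions, because the text does not state the threshold used, and the Hopp-Woods scale is omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def classify(sequence, cutoff=0.0):
    """Label a sequence by its GRAVY score.

    Assumption: scores above the cutoff are called "membranous" and the
    rest "globular"; the cutoff value is hypothetical.
    """
    return "membranous" if gravy(sequence) > cutoff else "globular"


# Toy sequences chosen only to show both outcomes.
for seq in ("LLIIVVAA", "KKDDEESS"):
    print(seq, round(gravy(seq), 2), classify(seq))
```

A positive GRAVY score indicates a predominantly hydrophobic sequence, which is why it is used here as a rough proxy for membrane association.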
Subcellular localization of identified proteins
We assigned the protein dataset to subcellular localizations using BiNGO. As shown in Figure 4, 786 of the identified proteins (64%) were located in the cytoplasm. This major compartment consisted mainly of mitochondrial, cytoskeletal, and endoplasmic reticulum proteins (right bar panel). Equal numbers of identified proteins were located in the nucleus and the plasma membrane (17%; 208 entries each). A small percentage (2%) of extracellular space proteins was also detected in our dataset.
GO annotation and functional network analysis
Enrichment and depletion analyses of the mouse colon proteome were applied in the current experiment to identify over- and under-represented proteins. As illustrated in Figure 5, GO enrichment annotation showed successful retrieval of a wide diversity of proteins (shown by node size and color intensity), reflecting a comprehensive, unbiased LC-MS/MS approach. This observation is consistent with the non-significant under-representation of most parent GO categories (colorless nodes in Figure 5B), with the exception of some daughter categories. For instance, membranous and nuclear proteins were under-represented with a P value less than 0.01 (see also Additional file 3). Once again, this finding supports the GRAVY index result and is likely due to the known difficulty of extracting membranous proteins because of their hydrophobic nature. Functional analysis and the protein family network illustrated in Figure 6 showed the involvement of 1199 identifiers in essential metabolic processes and pathways of the colon tissue (38 identifiers were not recognized) and reflect the relationships between terms based on the similarity of their associated genes. As summarized in Figure 7, ten potential groups were recognized, each defined by a leading term (to reduce the complexity of the GO tree) based on the highest significance score within the group. These enriched groups indicate their relevance to essential metabolic functions and processes in colon tissue. Detailed sub-functional term classification can be found in Additional file 1.
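Over- and under-representation of a GO category can be illustrated with a hypergeometric test, which is one common choice for this kind of analysis. The text does not state which test or multiple-testing correction was used, so the sketch below is an assumption, and the counts in the usage example are invented placeholders rather than values from this study.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items from N, of which K carry the annotation."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p(k, N, K, n):
    """Upper tail P(X >= k): probability of seeing at least k annotated hits."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def depletion_p(k, N, K, n):
    """Lower tail P(X <= k): probability of seeing at most k annotated hits."""
    lower = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))


# Hypothetical counts: a reference of 1000 genes, 200 in the category,
# and a sample of 100 identified proteins with 8 in the category.
N, K, n, k = 1000, 200, 100, 8
print("enrichment p:", enrichment_p(k, N, K, n))
print("depletion p:", depletion_p(k, N, K, n))
```

A small lower-tail probability corresponds to an under-represented category, as reported above for membranous and nuclear proteins; a small upper-tail probability corresponds to an enriched one.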
Regulation of actin cytoskeleton/tight junction and related family groups
Representing over one third (563) of identified terms, as illustrated in Figure 6, this family holds 43 GO terms (shown by nodes). The most prominent is actin cytoskeleton, which contains 47 identifiers. A wide variety of essential pathways could be recognized in this group, for example the MAPK (proliferation and apoptosis), VEGF (angiogenesis), and Toll-like and T-like receptor (immune barrier) signaling pathways, among others, which reflect the active metabolic processes taking place in colonic cells. Moreover, several protein candidates for cancer pathways could be reported, most notably Mapk8 and 9, Rac1 and 2, and Rhoa for colorectal cancer; Cdc42, Fh1, Rac1, Rap1a, and Tceb2 for renal cell carcinoma; Cdc42, Mapk8 and 9, Pld1, and Rac1 and 2 for pancreatic carcinoma; and Fgfr2, Gstp1, Hsp90aa1, Hsp90ab1, and Hsp90b1 for prostate cancer.
Glycolysis and Gluconeogenesis and related family groups
Two major ubiquitous processes, glucose breakdown (glycolysis) and its generation from non-carbohydrate sources (gluconeogenesis), were detected. In the colon proteome database, we report 500 identifiers, representing 30.5% of the total colon proteome, belonging to the glycolysis pathway. This family holds a wide range of enzymes including aldolases A and B, alcohol dehydrogenase family members, enolases (α, β, and γ), lactate dehydrogenases, and others.
Proximal tubule bicarbonate reclamation and related family groups
This family includes ATPase, Na+/K+ transporters [alpha1-4], glutamate dehydrogenase and malate dehydrogenase 1 (NAD), and represents 9.6% of the colon database. These catalytic enzymes are essential for exchanging sodium and potassium ions and providing energy for active transport of various nutrients in the gut. Other ATPase transporters that contribute to salivary and gastric acid secretion were also identified, such as ATP1b1 and ATP4a.
Pyruvate metabolism and related family groups
An essential family that contains key enzymes of the citric acid cycle, including dehydrogenases such as pyruvate, lactate, and malate dehydrogenase. This group is mainly responsible for cellular respiration and release of energy via NADH. We were also able to recognize a wide variety of enzymes that contribute to amino acid metabolism and participate in the synthesis and breakdown of cysteine, methionine, valine, leucine, isoleucine, arginine, proline, histidine and tryptophan, including LAP3, OAT, GOT2, and ALDH2 (Additional file 3). Several enzymes involved in fatty acid metabolism and elongation were also reported, such as acetyl-CoA acyltransferase 1 and 2, alcohol dehydrogenase 1 and 2, and others.
Glutathione metabolism and related family groups
This family includes glutathione peroxidase (GPX 1–5) and glutathione S-transferase (GST 1–5) enzymes, which protect cells and other enzymes from oxidative damage by catalyzing the reduction of hydrogen peroxide, lipid peroxides and organic hydroperoxides.
Other families
These families represent less than 15% of the colon database and include the chemokine signaling pathway, nicotinate and nicotinamide metabolism, amino sugar and nucleotide sugar metabolism, phenylalanine metabolism and amoebiasis.
Relative abundance of identified murine colon proteome
In addition to protein identification and annotation, we also profiled the relative protein abundance of our murine colon proteome using 3 different approaches, NSAF, PAF, and emPAI, owing to the importance of such measures in comparative proteomics studies. These algorithms rely on the fact that spectral count (for NSAF and PAF) and peptide rank and score (for emPAI) correlate with relative protein abundance [17–19]. In Figure 8A, the individual NSAF values in the analyzed samples showed a similar pattern. Following construction of the colon proteome catalogue, protein abundance fell within the range 1.2 × 10⁻⁵ to 1.8 × 10⁻². This wide dynamic range further attests to our unbiased LC-MS approach. When calculated using the PAF algorithm, most proteins ranged from 0.1 × 10⁻⁴ to 1.0 × 10⁻⁴ (Figure 8B). We noticed several ubiquitous proteins in high abundance (~5-fold), such as transgelin, actin, and β-globin, whose depletion is recommended when investigating low-abundance proteins. Several low-abundance proteins were also identified, for instance dipeptidyl-peptidase 1 (0.03 × 10⁻⁴) and polymeric immunoglobulin receptor (0.05 × 10⁻⁴). A similar pattern was also observed when emPAI was used (Figure 8C). To compare these 3 approaches, we sorted the top 30 abundant proteins (Additional file 4). Although several of the most abundant proteins were shared among the 3 algorithms, other candidates differed, probably because emPAI relies on peptide score while NSAF and PAF depend on spectral count.
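For orientation, the three label-free indices can be sketched in Python using their commonly cited definitions: NSAF as the length-normalized spectral count divided by the sum of such values over all proteins, PAF as the spectral count divided by the molecular weight, and emPAI as 10 raised to the ratio of observed to observable peptides, minus 1. The exact variants and scaling used for the values reported here are not given in the text, so the function names, inputs and toy numbers below are assumptions.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts and lengths are dicts keyed by protein identifier.
    SAF = spectral count / protein length; NSAF = SAF / sum of all SAFs.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def paf(spectral_count, molecular_weight_da):
    """Protein abundance factor: spectral count per dalton of protein."""
    return spectral_count / molecular_weight_da


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index."""
    return 10 ** (n_observed / n_observable) - 1


# Toy inputs with hypothetical protein names, for illustration only.
counts = {"protein_a": 10, "protein_b": 10}
lengths = {"protein_a": 100, "protein_b": 200}
print(nsaf(counts, lengths))
print(paf(2, 20000))
print(empai(3, 12))
```

Because NSAF values sum to 1 across a sample, they are directly comparable between runs, whereas PAF and emPAI are computed per protein without reference to the rest of the sample.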
Comparison of murine colon proteome to gene expression database of mouse colon
Finally, we compared our murine colon proteome with the gene expression dataset for mouse colon in order to confirm the suitability of this dataset as a standard reference for further colon experimentation. For that purpose, the complete set of mouse colon genes was extracted from the reference expression dataset (RefEx) repository. The latter was compared with our proteome database (based on its gene ontology). As shown in Figure 9, the current murine colon proteome showed an overlap of around 35.6% with the known mouse colon genes. This result can be explained by the selectivity and general limitations of mass spectrometry. Moreover, 13 genes were identified in our proteome dataset that have not been recognized in the colon genome. These genes possibly represent contamination from non-colon tissue.
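A comparison of this kind reduces to set operations on gene identifiers. The following sketch is hypothetical: the function name, the case-insensitive matching and the choice of the reference set as the denominator for the overlap percentage are assumptions, since the text does not say how the 35.6% figure was normalized.

```python
def compare_gene_sets(proteome_genes, reference_genes):
    """Compare a proteome-derived gene list with a reference expression list.

    Returns the shared genes, the genes found only in the proteome, and the
    overlap as a percentage of the reference set (an assumed denominator).
    """
    proteome = {gene.upper() for gene in proteome_genes}
    reference = {gene.upper() for gene in reference_genes}
    shared = proteome & reference
    percent = 100.0 * len(shared) / len(reference) if reference else 0.0
    return {
        "shared": shared,
        "proteome_only": proteome - reference,
        "percent_of_reference": percent,
    }


# Placeholder gene symbols, not taken from the study.
result = compare_gene_sets(
    ["gene_a", "gene_b", "gene_x"],
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
print(result["percent_of_reference"], sorted(result["proteome_only"]))
```

The "proteome_only" set corresponds to entries such as the 13 genes found in the proteome but not in the reference expression data.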
Biological insights
Recent advances in mass spectrometric methodologies have enabled direct analysis of complex protein mixtures in a shotgun approach for global protein identification and biomarker discovery. The data presented in this article provide not only a comprehensive normal colon proteome database but also various label-free quantification methods for researchers' guidance, especially when monitoring cancer-related colon protein expression. Furthermore, the functional network analysis of the colon proteome is expected to provide valuable information for clarifying the relationships between possible predicted biomarkers. For instance, several candidates of gastrointestinal tract carcinoma showed a correlated pattern: Mapk8 and Rac1,2 of colorectal cancer, Cdc42 of renal cell carcinoma, Pld1 in pancreatic cancer, and Hsp90b1 and Hsp90ab1 in prostate cancer. These data might help elucidate cell signaling and pathophysiological pathways.