Data resources, edits and nomenclature
We merged three large datasets. First, we accessed expression data from massively parallel signature sequencing (MPSS) covering 182,719 tag signatures across 32 tissues. The tissues in the MPSS data comprised nine central nervous system (CNS) areas (amygdala, caudate nucleus, cerebellum, corpus callosum, fetal brain, hypothalamus, thalamus, spinal cord, and pituitary gland) and 23 non-CNS tissues and cell types (adrenal gland, bladder, bone marrow, heart, kidney, lung, mammary gland, pancreas, placenta, prostate, retina, salivary gland, small intestine, spleen, stomach, testis, thymus, thyroid, trachea, uterus, colon, monocytes and peripheral blood lymphocytes). A total of 18,677 unique genes were represented in the MPSS data; the number of expressed genes per tissue averaged 8,943 and ranged from 5,845 in pancreas to 12,267 in testis.
Second, we downloaded a set of 55,606 true positive interactions among 7,197 genes that were defined from functional studies. This interaction dataset was built from 2,788 confirmed, direct, physical protein-protein interactions derived from the Biomolecular Interaction Network Database (BIND; http://binddb.org), 18,176 confirmed human protein interactions from the Human Protein Reference Database (HPRD; http://www.hprd.org/), 22,012 direct functional interactions from the Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg), and 16,295 interactions derived from Reactome (http://www.reactome.org).
Finally, we used the microarray expression results generated during the profiling of systemic inflammation across 44,924 probe sets, from which 126,543 interactions among 7,090 genes were reported. The microarray experiment used 92 Affymetrix GeneChips (Affymetrix, Santa Clara, CA) to examine gene expression profiles in whole blood leukocytes immediately before and at 2, 4, 6, 9 and 24 h after intravenous administration of bacterial lipopolysaccharide (LPS) endotoxin to four healthy human subjects. For the control (placebo) data, four additional subjects were studied under identical conditions but without LPS administration.
To enable the merging of the three datasets for the present study, we applied the following edits. For the MPSS data, tags that did not exceed 5 transcripts per million (tpm) in at least one tissue were discarded. The 5 tpm threshold corresponds to the sensitivity of MPSS technology as claimed by the manufacturers and independently assessed in our laboratory. When the same gene was represented by more than one MPSS tag, the reading from the most abundant tag, summed across all tissues, was assigned to that gene. Finally, for the true positive interactions and the inflammation datasets, interactions involving genes not surveyed in the MPSS data were discarded.
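The MPSS edits described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the dictionary-based data layout, and the decision to skip tags without a gene annotation are all assumptions made for the example.

```python
TPM_THRESHOLD = 5  # sensitivity threshold stated in the text (tpm)


def collapse_tags_to_genes(tag_expression, tag_to_gene, threshold=TPM_THRESHOLD):
    """Filter MPSS tags and keep one tag per gene.

    tag_expression: dict mapping tag -> list of tpm values, one per tissue
                    (hypothetical layout).
    tag_to_gene:    dict mapping tag -> gene symbol (hypothetical layout).
    Returns a dict mapping gene -> per-tissue tpm of its most abundant tag.
    """
    best = {}  # gene -> (summed tpm across tissues, per-tissue tpm)
    for tag, tpm in tag_expression.items():
        # Discard tags that never exceed the threshold in any tissue.
        if max(tpm) <= threshold:
            continue
        gene = tag_to_gene.get(tag)
        if gene is None:
            continue  # assumption: tags without a gene annotation are skipped
        total = sum(tpm)
        # Keep the tag with the largest expression summed across all tissues.
        if gene not in best or total > best[gene][0]:
            best[gene] = (total, tpm)
    return {gene: tpm for gene, (_, tpm) in best.items()}


def filter_interactions(interactions, surveyed_genes):
    """Drop interactions that involve a gene absent from the MPSS gene set."""
    return [(a, b) for a, b in interactions
            if a in surveyed_genes and b in surveyed_genes]
```

For example, if three tags map to the same gene, the one with the largest total across tissues supplies that gene's expression profile, and any interaction with a partner outside the resulting gene set is removed.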
These criteria resulted in 15,050 genes [see Additional file 1], of which 5,198 and 4,950 were included in the true positive interactions and the inflammation datasets, respectively, with 2,499 genes in common. In addition, 6,151 (41%) of the genes were associated with disease according to the OMIM database as of September 19, 2007, and 1,445 of them were defined as disease-causing (i.e., associated with either a known disease phenotype or a known polymorphic sequence).
Hereafter, DIS denotes the 6,151 genes in our resulting dataset that are disease-associated according to OMIM, and NDIS denotes the remaining 8,899 genes that are not. Similarly, INT and NINT denote genes in our dataset for which interactions have and have not been reported, respectively.
Data mining approaches
To further characterize the relationship between tissue specificity, gene connectivity and disease association, the 15,050 genes were classified as either tissue-specific (TS) or housekeeping (HK). To ensure that these two categories together represented the majority of the genes, we searched for category limits from either extreme of the distribution of the number of genes expressed in one, two, and up to 32 tissues, until two categories of equivalent size were defined that together represented > 50% of the total number of genes. This yielded 4,232 (28%) TS genes expressed in 1 to 4 tissues, and 4,006 (27%) HK genes expressed in more than 25 tissues. The remaining 6,812 (45%) genes were classified as non-specific (NS).
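Applying the reported category limits to a single gene might look like the sketch below. The function names are hypothetical, and counting a gene as expressed in a tissue when it exceeds 5 tpm is an assumption; the text does not state the per-tissue expression criterion used for this count.

```python
def tissue_breadth(tpm, threshold=5):
    """Number of tissues in which a gene is counted as expressed.

    Assumption: 'expressed' means exceeding the 5 tpm sensitivity threshold.
    """
    return sum(1 for value in tpm if value > threshold)


def classify_gene(n_tissues, ts_max=4, hk_min=26):
    """Assign TS, HK or NS from the number of expressing tissues.

    Defaults follow the limits reported in the text:
    TS = 1 to 4 tissues, HK = more than 25 tissues, NS = everything between.
    """
    if n_tissues < 1:
        raise ValueError("gene must be expressed in at least one tissue")
    if n_tissues <= ts_max:
        return "TS"
    if n_tissues >= hk_min:
        return "HK"
    return "NS"
```

The limits are left as parameters so that other cut-offs along the distribution can be tried with the same function.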
Finally, to identify novel candidate genes impacting disease, we developed a guilt-by-association algorithm. Two selection thresholds were determined from DIS genes: the average number of known interactions per gene, and the average proportion of DIS genes among a gene's interactors. These thresholds were then applied to genes in the NDIS category, and genes exceeding both thresholds were identified as likely disease-associated candidates.
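One possible reading of this guilt-by-association procedure is sketched below. It is an illustration only: the function names are hypothetical, interactions are treated as undirected gene pairs, and both averages are taken over DIS genes that have at least one reported interaction (an assumption, since the proportion is undefined for genes without interactors).

```python
from collections import defaultdict


def build_neighbors(interactions):
    """Map each gene to the set of its interactors (undirected, no self-loops)."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def gene_stats(gene, neighbors, dis):
    """Return (number of interactors, proportion of interactors that are DIS)."""
    partners = neighbors[gene]
    degree = len(partners)
    return degree, sum(1 for p in partners if p in dis) / degree


def find_candidates(interactions, dis):
    """Return NDIS genes exceeding both DIS-derived thresholds, sorted by name."""
    neighbors = build_neighbors(interactions)
    dis_stats = [gene_stats(g, neighbors, dis) for g in neighbors if g in dis]
    if not dis_stats:
        return []
    # Thresholds: averages over DIS genes with at least one interaction.
    degree_threshold = sum(d for d, _ in dis_stats) / len(dis_stats)
    fraction_threshold = sum(f for _, f in dis_stats) / len(dis_stats)
    candidates = []
    for gene in neighbors:
        if gene in dis:
            continue  # thresholds are applied to NDIS genes only
        degree, fraction = gene_stats(gene, neighbors, dis)
        if degree > degree_threshold and fraction > fraction_threshold:
            candidates.append(gene)
    return sorted(candidates)
```

In a toy network where an NDIS gene interacts with three DIS genes while each DIS gene has only two interactors, that NDIS gene exceeds both thresholds and is returned as a candidate.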