Network construction and structure detection with metagenomic count data
 Zhenqiu Liu^{1}Email author,
 Shili Lin^{2} and
 Steven Piantadosi^{1}
Received: 5 June 2015
Accepted: 18 November 2015
Published: 12 December 2015
Abstract
Background
The human microbiome plays a critical role in human health. Massive amounts of metagenomic data have been generated with advances in nextgeneration sequencing technologies that characterize microbial communities via direct isolation and sequencing. How to extract, analyze, and transform these vast amounts of data into useful knowledge is a great challenge to bioinformaticians. Microbial biodiversity research has focused primarily on taxa composition and abundance and less on the cooccurrences among different taxa. However, taxa cooccurrences and their relationships to environmental and clinical conditions are important because network structure may help to understand how microbial taxa function together.
Results
We propose a systematic robust approach for bacteria network construction and structure detection using metagenomic count data. Pairwise similarity/distance measures between taxa are proposed by adapting distance measures for samples in ecology. We also extend the sparse inverse covariance approach to a sparse inverse of a similarity matrix from count data for network construction. Our approach is efficient for large metagenomic count data with thousands of bacterial taxa. We evaluate our method with real and simulated data. Our method identifies true and biologically significant network structures efficiently.
Conclusions
Network analysis is crucial for detecting subnetwork structures with metagenomic count data. We developed a software tool in MATLAB for network construction and biologically significant module detection. Software MetaNet can be downloaded from http://biostatistics.csmc.edu/MetaNet/.
Keywords
Metagenomics data Networks analysis ModulesBackground
Our human body is a host to various of microbes. Over 90 % of the cells in human body are bacterial or other nonhuman cells. These microbes have great influence on human physiology and nutrition, and are crucial for our health [1]. Metagenomics, which is the study of genetic material recovered directly from uncultured microorganisms, has accelerated the analysis of functional biodiversity relevant to its ecology. The objectives of human microbiome research are to explore the hostmicrobiota interactions, associate differences in microbial communities with differences in metabolic functions and diseases, and understand how microbiota changes may affect human health [1]. Massive amounts of metagenomic sequencing data have been generated with advances in nextgeneration sequencing (NGS) technologies. There are two NGS methods for metagenomics: whole metagenomic shotgun sequencing (WMGSS) and 16S rRNA gene sequencing. 16S rRNA sequencing is an amplicon sequencing method for identifying and comparing bacteria present within a given sample, while WMGSS comprehensively sample all genes in all organisms present in a given complex sample. The two techniques are quite different and intend for answer different biological questions. It has been shown that 16S rRNA sequencing contains hundreds of thousands of 16S RNAs fragments and is an efficient tool to infer bacterial communities, while WMGSS is mainly used for functional delineation and it is generally not deep enough to detect rare species in complex communities [2, 3]. In this paper, we infer network structures and taxa cooccurrence with 16S rRNA sequencing. By examining the relationship of genome structure and function across many different taxa with NGS data, the scope of microbiology and of microbial evolution studies has been greatly broadened, and the field of systems biology has emerged [4, 5].
There have been great strides in determining the taxonomical and functional contents of a sample in the last several years. Many software packages including MOTHUR [6], UniFrac [7], QIIME [8], and SILVAngs [9] have been designed primarily for the analysis of 16S rRNA sequencing data, while the other software packages including MEGAN [5, 10], Phymm [11], NBC [12] were developed mainly for shotgun metagenomic sequencing data. Those tools provide different approaches for the comparison of microbial communities with metagenomic sequence data. One output from some of the software is the abundance counts (sequence reads) for each taxa. These taxa abundance counts can be further analyzed to identify taxa and microbial communities that are associated with human diseases by comparing taxa counts from two or more groups with different disease status. Study of the link between characteristics of a microbiome and human disease is a active area of research. Current approaches such as MetaStats [13] and MetaDistance [14] mainly focus on variations in abundance across different clinical conditions, ignoring the interactions and structural variations among taxa. However, bacteria taxa do not act alone, rather they form part of large interacting (cooccurrence) networks and may function together. Variations in network structures and taxon interactions may be associated with disease status and clinical phenotypes [15–18]. Therefore network methods specifically designed for metagenomic count need to be developed.
Networks methods and graph theory have been widely applied to gene regulatory network construction with expression data [19–21]. Network analysis has also proved powerful for studying the characteristics of metabolic networks and their impact on various functional and evolutionary properties [22–24]. RNASeq is a NGS approach to transcriptome profiling. It provides a far more precise measurement of levels of transcripts and their isoforms than other methods [25]. Local Poisson graphical (loglinear) model and Bayesian generalized graphical model for network construction have been developed with RNAseq data recently [26, 27]. However, the loglinear model is not valid when there are zero counts or measures in the data, which is common in metagenomics. Also, the Bayesian Poisson graphical model is slow when the network size is large. It usually takes hours to construct a network with hundreds of nodes. Those parametric methods can not be applied to metagenomic count data without modification. Moreover, even though there are a few methods available for network construction with microbiome data [28–32], most methods for network analysis are based on pairwise correlations (or distance) and ignore highorder correlations. However, highorder (partial) correlation has the advantage over pairwise correlation, because it measures the conditional dependency between two taxa given the effect of other taxa being removed or fixed, and reflects direct correlation between taxa and excludes the betweentaxon dependency due to other taxa. In addition, variance heterogeneity and nonnormality of metagenomic count data make standard correlations invalid (e.g. Pearson correlation). One way to deal with the problem is to use proportion and logratio transformations [33]. However the logratio is not defined when there are zeros in the data and approximation methods have to be used.
In this paper, we propose a nonparametric approach for cooccurrence network construction and subnetwork structure detection. We propose similarity (or distance) measures between taxa derived by adapting distance measures between samples with abundance counts defined in ecology [34]. We also expand the sparse inverse covariance method to sparse inverse of general similarity matrices for highorder correlation. The performance of our methods are evaluated through simulation and publicly available metagenomic data sets. The proposed methods are efficient for detecting true network structures. Even though the cooccurrence network is just a description analysis from temporal snapshots, it may be informative regarding how microbial taxa function together.
Methods
where X is the count matrix with n samples and m taxa, and x_{ ij } denotes the total number of reads of taxon j in sample i. In case there have been known disease status or clinical conditions (y) available, we will also discuss methods for detecting structural variations across clinical conditions. Constructing a human metagenomic network requires several sequential steps: (i) estimating pairwise similarity (or distance) measures between different taxa, (ii) adjacency matrix construction, (iii) network structure (module) detection and differentiated networks. We will discuss each of these steps.
Pairwise similarity measures

Spearman rankorder correlation:We will take the average of the scores when multiple elements have the tied ranks.$$R (\mathbf{x}, \mathbf{y}) = 1  \frac{6\sum_{i=1}^{n} (\mathbf{R}_{xi} \mathbf{R}_{yi})^{2}}{n(n^{2}1)} $$

Kendall’s τ rank correlation:where T_{0}=n(n−1)/2, \(T_{1} = \sum _{k} t_{k} (t_{k} 1)/2\), and \(T_{2} = \sum _{l} u_{l} (u_{l}1)/2\). The t_{ k } is the number of tied x values in the kth group of the tied x values, u_{ l } is the number of tied y values in the lth group of tied y values, and sign (z) is defined as:$$\tau(\mathbf{x}, \mathbf{y})= \frac{\sum_{i< j} \text{sgn}(x_{i} < x_{j}) \text{sgn}(y_{i} < y_{j})}{\sqrt{(T_{0} T_{1})(T_{0} T_{2})}}, $$$$\text{sgn}(z) = \left\{ \begin{array}{rl} 1 & \text{if} \,x_{i} < x_{j}, \\ 0 & \text{if} \,x_{i} = x_{j}, \\ 1 & \text{if} \,x_{i} > x_{j}. \end{array} \right. $$
Our similarity matrix S can be defined with either \(S = \left [\sin \left (\frac {\pi }{2}R\left (\mathbf {x}_{i}, \mathbf {x}_{j}\right)\right)\right ]_{m\times m}\) or \(S = \left [\sin (\frac {\pi }{2}\tau (\mathbf {x}_{i}, \mathbf {x}_{j}))\right ]_{m\times m}\) [35]. Those distributionfree correlations only utilize rank information, and are more robust than the parametric approach. Even though they are slight less efficiency than Pearson correlation under normal distribution, both Spearman and Kendall correlation coefficients provide a good compromise between robustness and efficiency [36].

Hellinger distance:where \(\mathbf {x}_{+1} = \sum _{i=1}^{n} x_{i1}\), and \(\mathbf {x}_{+2} = \sum _{i=1}^{n} x_{i2}\).$$D(\mathbf{x}_{1}, \mathbf{x}_{2}) =\sqrt{\sum_{i=1}^{n}\left(\sqrt{\frac{x_{i1}}{\mathbf{x}_{+1}}} \sqrt{\frac{x_{i2}}{\mathbf{x}_{+2}}}\right)^{2}}, $$

The χ^{2} distance:$$D\left(\mathbf{x}_{1}, \mathbf{x}_{2}\right) = \sqrt{\sum_{i=1}^{n} \frac{\left(\mathbf{x}_{+1} + \mathbf{x}_{+2}\right)}{(x_{i1} + x_{i2})} \left(\sqrt{\frac{x_{i1}}{\mathbf{x}_{+1}}}  \sqrt{\frac{x_{i2}}{\mathbf{x}_{+2}}}\right)^{2}}. $$

BrayCurtis dissimilarity:$$D(\mathbf{x}_{1}, \mathbf{x}_{2}) = 1  2 \frac{\sum_{i=1}^{n}\min(x_{i1}, x_{i2}) }{\sum_{i=1}^{n}(x_{i1} + x_{i2})} = 12\sum_{i=1}^{n} \frac{\min(x_{i1}, x_{i2}) }{(\mathbf{x}_{+1} + \mathbf{x}_{+2})}. $$
Similarity to adjacency matrix
After finding the derivative, we initialize A^{0}=(S+λI)^{−1}, and then use the standard decent gradient algorithm to update each row/column repeatedly until the algorithm converges. After obtaining the sparse A and taking the absolute value A=A, we set the diagonal value of A to zero with A=A−diag(A) to get the final adjacency matrix A. The adjacency matrix A is a representation of a graph, where the value of a_{ ij } represents the connectivity between taxa i and j.
λ Determination
Final network is constructed using whole data X and \(\hat {{\lambda }}\).
Network structure detection and differentiated networks
MetaNet package
MetaNet toolbox in MATLAB was implemented to construct sparse network from metagenomic count data. The toolbox was tested under MATLAB 2013a, but should also work on the later versions of MATLAB. Implemented functions in this toolbox include several similarity (distance) measures, simulated distributions and network models, sparse inverse covariance estimation, network structure detection and differential networks, and network visualization. The goal of this distribution is to provide an easytouse tool for network construction and analysis. Although the package is till under development, the users can construct, analyze, and visualize a network from their own data without much difficulty. MetaNet is provided as is without warranty of any kind. More information and the toolbox can be downloaded from http://biostatistics.csmc.edu/MetaNet/.
Data sets
Simulated data
Simulated count data with different numbers of nodes and sample sizes are generated from a negative binomial (NB) distribution. More specifically, the data sets are generated from a NB distribution X_{ ij }∼NB(λ_{ ij },γ) with mean λ_{ ij } and dispersion parameter γ, and \(\log ({\lambda }_{i})\) is from a multivariate distribution \(\log ({\lambda }_{i}) \sim N(\mu, \Sigma)\) with mean μ and covariance matrix Σ. The graphical structures are constructed through A=Σ^{−1}, where A is an adjacency matrix with additional diagonal elements \(a_{\textit {ii}}=\sum _{j, i\neq j} a_{\textit {ij}}+1\), i=1,…,n. The adjacency matrices are generated using three different models include small world, scale free, and range dependent networks. The small world network we use only allows a node to connect with its neighborhood node, while the scalefree network has the number of nodes of degree 2 following a power law, and the range dependent network has an edge between nodes i and j with probability 0.9.0.3^{j−i−1}. These three networks are known to mimic the behavior of real biological networks. All count data sets in this paper are generated by setting μ=3 and the overdispersion parameter γ=2.
Real metagenomic data from body habitats
The real data was collected from six body habitats including external auditory canal (EAC), gut, hair, nostril, oral cavity, and skin [46]. The objective of the original study was to estimate the microbial community composition and detect the differentiation in abundance among body habitats. A total of 815 samples were collected for 6 categories of habitat. Networks were constructed from gut, oral cavity (OC), and skin samples with the sample sizes of 45, 54, and 612 respectively. There were total 1713 taxa at the genus level.
Results
Results with simulation data
The proposed similarity or dissimilarity measures including Spearman correlation coefficients, Hellinger, and BrayCurtis performed well to detect the true structures as shown in Fig. 1. The area under ROC curves (AUC) was used to evaluate the performance of detecting proposed network structures, where the specificity for a network measures the proportion of no edges that are correctly detected, while the sensitivity for a network is the proportion of edges that are correctly identified. With the optimal λ^{∗}=0.2,0.55, and 0.65 for Spearman, Hellinger, and BrayCurtis, we have the predicted AUCs of 0.91, 0.96, and 0.92 respectively with the small world network. The BayCurtis distance performed the best, while the other two measures also performed well (≥0.91). Similarly, our proposed approach also performed reasonable well with both scalefree and range dependent networks. We achieved the best predicted AUC of 0.851 and 0.902 with Hellinger distance and λ^{∗}=0.6 and 0.45 respectively for scalefree and range dependent networks. Overall Spearman has the lowest predicted AUCs for all models with the negative binomial simulated data as shown on the bottom of Fig. 1, but the differences among all measures are not very significant.
Predicted AUCs with different network structures and sample sizes for large networks with 500 nodes
n  Similarity  Small world  Scalefree  Rangedependent 

50  Spearman  0.909 (±.015)  0.930 (±.035)  0.730 (±.021) 
BrayCurtis  0.894 (±.014)  0.786 (±.017)  0.693 (±.016)  
Hellinger  0.936 (±.018)  0.844 (±.019)  0.709 (±.015)  
100  Spearman  0.982 (±.009)  0.972 (±.020)  0.849 (±.014) 
BrayCurtis  0.975 (±.011)  0.820 (±.016)  0.794 (±.019)  
Hellinger  0.986 (±.016)  0.883 (±.013)  0.812 (±.011)  
200  Spearman  0.999 (±.012)  0.994 (±.015)  0.872 (±.015) 
BrayCurtis  0.996 (±.009)  0.864 (±0.014)  0.828 (±.018)  
Hellinger  0.998 (±.006)  0.944 (±.012)  0.846 (±.015) 
Table 1 indicates that the predicted AUCs increase and the performance gets better as the sample size increases. The proposed method performs reasonable well for different network structures. While Hellinger distance achieves the best result in smallworld network, Spearman’s rank correlation has the best performance with scalefree and range dependent networks. Therefore, Spearman’s rank correlation is more robust with large networks generated from Poisson distribution, even though differences among different similarity measures are not always statistically significant.
Results with real body habitats data
Different colors of the nodes indicate different network modules, and the node size represents relative abundance: The larger the nodes, the higher the relative abundance of a genera. The edges indicate the direct coexistence (cooccurrence) between two genera. The skin subnetwork on the top panel of Fig. 2 has 4 modules colored in green, blue, orange, and red with 6, 6, 2, and 2 genera respectively. Several genera on the network are known to cause skin infections. For instance, Streptococcus on the green module is a wellknown bacteria directly related to several skin infections including Impetigo, Cellulitis, and Erysipelas. Actinomcyes genus causes a chronic (slowly progressive) infection named Actinomycosis, and Fusobacterium genus has been known to cause tropical phagedenic ulcer (http://dermnetnz.org/bacterial/). More interestingly, the direct association of Actinomcyes and Fusobacterium on the green module was verified by a recent study experimentally [47]. The mixture and coinfection of two bacteria genera cause mastoiditis. In addition, Pasteurella genus has also been shown to cause skin disease [48]. Other genera directly connected to Pasteurella including Rothia, Granulicatella, Gemella, and Haemophilus may function together biologically and clinically through ‘guilt by association’. Even though the cooccurrences among genera are only verified statistically, their biological and medical implications need to be further validated in a wet lab, our methods provide a guidance for investigators in their research.
The gut network constructed with 45 samples is shown in the bottom left panel (II) of Fig. 2. Two modules colored with green and yellow respectively are identified, each of them with 4 genera. All 4 high abundance genera and their interactions on the green module are associated with gut related diseases. Faecalibacterium and Bacteroidetes genera are both associated with type 2 diabetics and obeisity [49]. Another recently study also indicates that Blautia and Faecalibacterium vary together during antibiotic therapy [50], and it has been shown that both Roseburia and Faecalibacterium have lower abundance in patients with Crohn’s Disease compared with their healthy siblings [51]. All these studies support our results that Faecalibacterium, Blautia, Roseburia, and Bacteroidetes interact with each other. However, interactions among the 4 genera (Subdoligranulum, Coprococcus, Anaerotruncus, and Oscillibacter) with lower abundance in the yellow module have not been well studied, even though individual genus has been reported in recent literature.
Bacteria genera in oral cavity (OC) play a key role in mouth infections and periodontal diseases. Oral cavity network built with genera from 54 oral cavity samples is shown in the bottom right panel (III) of Fig. 2. Three modules colored with green, red, and yellow were identified with 2, 2, and 3 genera on each subnetwork. Interactions and cooccurrences are identified among both high abundance genera (Fusobacterium, Veillonella, Neisseria, Prevotella) and low abundance genera (Atopobium, and Megasphaera) in OC. Genera such as Fusobacterium, Neisseria, Porphyromonas, and Prevotella have been known to be significantly different in abundance with different clinical conditions and disease status [52, 53]. The cooccurrences between Porphyromonas and Fusobacterium on the yellow module has been verified through a mouse model experimentally [54]. However, cooccurrences among Fusobacterium, Porphyromonas, and Neisseria together have not been explored. One interesting finding is the common interactions between Prevotella and Veillonella and between Megasphaera and Atopobium in the oral cavity and skin, indicating that some interactions and cooccurrences may be shared at different body habitats. Therefore, cooccurrence network analysis with proposed approach is useful for determining novel biological interactions that may help to decipher the structure of complex microbial communities. It is also useful for systematically exploring coexistence patterns in big metegenomics data that standard tools may fail to detect.
Conclusions
We have developed a systematic approach for constructing networks and detecting subnetwork structures with metagenomic count data. Our contributions are in two areas: (1) we adapt distance measures between samples from ecology to compute similarity between taxa, and (2) we extend sparse inverse covariance methods for Gaussian models to determine highorder interactions with general similarity matrices. Based on both simulated and real data, our method can identify true and biologically important interactions and associations with limited computational experiments. One advantage of our approach is that it detects the partial (highorder) correlations among the taxa. Unlike pairwise correlation, partial correlation measures the conditional dependency between two taxa given the effect of rest taxa being removed or fixed. Therefore, networks constructed from partial correlation usually have lower false positive connections than those from pairwise correlation. In addition, modularity function maximization for structure detection in MetaNet automatically determines the number of network clusters, while other popular approaches such as Kmeans and hierarchical clustering require the number of clusters or a cutoff point to be predefined. Even though MetaNet is slightly computational intensive when comparing to the popular pairwise approach, it only takes minutes to construct a network with thousands of nodes. While our method has been developed for 16S rRNA sequencing data from human body, it can be applied to 16S sequencing data from other organisms. It may also be used to analyze whole metagenomic shotgun sequencing or RNAseq, as long as the sequences are properly aligned. Note that our method constructs networks solely based on their statistical similarity or dissimilarity among taxa, the biological significance of the results has to be further validated in a web lab. Future works wledge and pathway information into our network constructions.
Appendix: network statistics

Degree: number of links connected to a taxon (node) i, \(k_{i} = \sum _{j} a_{\textit {ij}}\).

Number of triangles around a taxon i: \(T_{i} = \frac {1}{2} \sum _{j,k}a_{\textit {ij}}a_{\textit {ik}}a_{\textit {jk}}\).

Clustering coefficient: Measuring the segregation of a network, \(C = \frac {1}{m}\sum _{i}C_{i} = \frac {1}{m}\sum _{i}\frac {2T_{i}}{K_{i}(K_{i}1)}.\)

Participation coefficient of taxon i, \(Y_{i} = 1  \sum _{m\in M}\left (\frac {K_{i}(m)}{K_{i}}\right)^{2}\), where k_{ i }(m) is the withinmodule degree of taxon i in module m.
We will apply network statistics to rank the bacteria taxa. The larger the network statistics, the stronger connectivity the taxa, and the more important the taxa statistically. The biological importance of those taxa with larger network statistics can be further validated in the lab.
Declarations
Acknowledgements
We thank the Associate Editor and anonymous referees for their constructive comments. This work was partially supported by the National Science Foundation (NSF) grants DMS222381 (ZL) and DMS1220772 (SL).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Turnbaugh P, Ley R, Hamady M, FraserLiggett C, Knight R, Gordon JI. The human microbiome project. Nature. 2007; 449:804–10. doi:10.1038/nature06244.PubMedPubMed CentralView ArticleGoogle Scholar
 Mande SS, Mohammed MH, Ghosh TS. Classification of metagenomic sequences: methods and challenges. Brief Bioinform. 2012; 13(6):669–81. doi:10.1093/bib/bbs054.PubMedView ArticleGoogle Scholar
 Keller A, Horn H, Förster F, Schultz J. Computational integration of genomic traits into 16S rDNA microbiota sequencing studies. Gene. 2014; 549(1):186–91.PubMedView ArticleGoogle Scholar
 Wooley J, Godzik A, Friedberg I. A primer on metagenomics. PLoS Comput Biol. 2010; 6(2):e1000667. doi:10.1371/journal.pcbi.1000667.PubMedPubMed CentralView ArticleGoogle Scholar
 Huson D, Auch A, Qi J, Schuster S. Megan analysis of metagenomic data. Genome Res. 2007; 17:377–86.PubMedPubMed CentralView ArticleGoogle Scholar
 Schloss P, Westcott S, Ryabin T, Hall J, Hartmann M, Hollister EB, et al.Introducing mothur: opensource, platformindependent, communitysupported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009; 75:7537–41.PubMedPubMed CentralView ArticleGoogle Scholar
 Lozupone C, Lladser M, Knights D, Stombaugh J, Knight R. Unifrac: an effective distance metric for microbial community comparison. ISME J. 2010; 5:169172. doi:10.1128/aem.0154109.Google Scholar
 Caporaso J, Kuczynski J, Stombaugh J, Bittinger K, Bushman F, Costello EK, et al.Qiime allows analysis of highthroughput community sequencing data. Nat Methods. 2010; 7:335–36. doi:10.1038/nmeth.f.303.PubMedPubMed CentralView ArticleGoogle Scholar
 Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al.The SILVA ribosomal RNA gene data base project: improved data processing and webbased tools. Nucleic Acids Res. 2013; 41(D1):D590–96.PubMedView ArticleGoogle Scholar
 Huson D, Mitra S, Weber N, Ruscheweyh H, Schuster S. Integrative analysis of environmental sequences using megan4. Genome Res. 2011; 21:1552–60.PubMedPubMed CentralView ArticleGoogle Scholar
 Brady A, Salzberg S. Phymm and phymmbl: metagenomic phylogenetic classification with interpolated markov models. Nat Methods. 2009; 6:673–76.PubMedPubMed CentralView ArticleGoogle Scholar
 Rosen G, Reichenberger E, Rosenfeld A. Nbc: the naive bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics. 2010; 27:127–29.PubMedPubMed CentralView ArticleGoogle Scholar
 White J, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009; 5:1000352. doi:10.1038/nmeth.f.303.View ArticleGoogle Scholar
 Liu Z, Hsiao W, Cantarel B, Drbek E, FraserLiggett C. Sparse distancebased learning for simultaneous multiclass classification and feature selection of metagenomic data. Bioinformatics. 2011; 27(23):3242–49. doi:10.1093/bioinformatics/btr547.PubMedPubMed CentralView ArticleGoogle Scholar
 Gevers D, Kugathasan S, Denson LA, VázquezBaeza Y, Van Treuren W, Ren B, et al.The treatmentnaive microbiome in newonset Crohn’s disease. Cell Host Microbe. 2014; 15(3):382–92. doi:10.1016/j.chom.2014.02.005.PubMedPubMed CentralView ArticleGoogle Scholar
 Boutin S, Bernatchez L, Audet C, Derôme N. Network analysis highlights complex interactions between pathogen, host and commensal microbiota. PLoS One. 2013; 8(12):e84772. doi:10.1371/journal.pone.0084772.PubMedPubMed CentralView ArticleGoogle Scholar
 Hurwitz BL, Westveld AH, Brum JR, Sullivan MB. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses. Proc Natl Acad Sci U S A. 2014; 111(29):10714–9. doi:10.1073/pnas.1319778111.PubMedPubMed CentralView ArticleGoogle Scholar
 Liu Z, Sun F, Braun J, McGovern D, Piantadosi S. Multilevel Regularized Regression for Simultaneous Taxa Selection and Network Construction with Metagenomic Count Data. Bioinformatics. 2015; 31(7):1067–74.PubMedView ArticleGoogle Scholar
 Horvath S, Dong J. Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol. 2008; 4(8):e1000117. doi:10.1371/journal.pcbi.1000117.PubMedPubMed CentralView ArticleGoogle Scholar
 Barabási AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004; 5(2):101–13.PubMedView ArticleGoogle Scholar
 Krämer N, Schäfer J, Boulesteix AL. Regularized estimation of largescale gene association networks using graphical Gaussian models. BMC Bioinformatics. 2009; 10:384.PubMedPubMed CentralView ArticleGoogle Scholar
 Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED. Metabolic network structure determines key aspects of functionality and regulation. Nature. 2002; 420(6912):190–3.PubMedView ArticleGoogle Scholar
 Guimera R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005; 433(7028):895–900.PubMedPubMed CentralView ArticleGoogle Scholar
 Kreimer A, Borenstein E, Gophna U, Ruppin E. The evolution of modularity in bacterial metabolic networks. Proc Natl Acad Sci U S A. 2008; 105(19):6976–81.PubMedPubMed CentralView ArticleGoogle Scholar
 Wang Z, Gerstein M, Snyder M. RNASeq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009; 10(1):57–63.PubMedPubMed CentralView ArticleGoogle Scholar
 Allen GI, Liu Z. A Local Poisson Graphical Model for inferring networks from sequencing data. IEEE Trans Nanobioscience. 2013; 12(3):189–98.PubMedView ArticleGoogle Scholar
 Zhang L, Mallick BK. Inferring gene networks from discrete expression data. Biostatistics. 2013; 14(4):708–22.PubMedView ArticleGoogle Scholar
 Faust K, Raes J. Microbial interactions: from networks to models. Nat Rev Microbiol. 2012; 10(8):538–50.PubMedView ArticleGoogle Scholar
 Barberán A, Bates ST, Casamayor EO, Fierer N. Using network analysis to explore cooccurrence patterns in soil microbial communities. ISME J. 2012; 6(2):343–51.PubMedView ArticleGoogle Scholar
 McMurdie PJ. Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One. 2013; 8(4):e61217.PubMedPubMed CentralView ArticleGoogle Scholar
 Oberauner L, Zachow C, Lackner S, Högenauer C, Smolle KH, Berg G. The ignored diversity: complex bacterial communities in intensive care units revealed by 16S pyrosequencing. Sci Rep. 2013; 3:1413.PubMedPubMed CentralView ArticleGoogle Scholar
 Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R. Diversity, stability and resilience of the human gut microbiota. Nature. 2013; 489(7415):220–30.View ArticleGoogle Scholar
 Friedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012; 8(9):e1002687.PubMedPubMed CentralView ArticleGoogle Scholar
 Clarke KR, Somerfield PJ, Chapman MG. On resemblance measures for ecological studies, including taxonomic dissimilarities and a zeroadjusted BrayCurtis coefficient for denuded assemblages. J Exp Mar Biol Ecol. 2006; 330:55–80.View ArticleGoogle Scholar
 Liu H, Han F, Yuan M, Lafferty J, Wasserman L. High dimensional semiparametric gaussian copula graphical models. 2012. Technical report, Johns Hopkins University.Google Scholar
 Croux C, Dehon C. Influence functions of the Spearman and Kendall Correlation measures. Stat Methods Appl. 2010; 19(4):497–515.View ArticleGoogle Scholar
 Legendre P, Gallagher ED. Ecologically meaningful transformations for ordination of species data. Oecologia. 2001; 129:271–80.View ArticleGoogle Scholar
 Mitra S, Gilbert JA, Field D, Huson DH. Comparison of multiple metagenomes using phylogenetic networks based on ecological indices. ISME J. 2010; 4(10):1236–42.PubMedView ArticleGoogle Scholar
 Parks DH, Beiko RG. Measuring community similarity with phylogenetic networks. Mol Biol Evol. 2012; 29(12):3947–58.PubMedView ArticleGoogle Scholar
 Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008; 9(3):432–41.PubMedView ArticleGoogle Scholar
 Banerjee O, El Ghaoui L. Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. J. Mach. Learn. Res. 2008; 9:485–516.Google Scholar
 Liu H, Roeder K, Wasserman L. Stability approach to regularization selection for high dimensional graphical models. Advances Neural Inf Process Syst. 2010;:1432–1440.Google Scholar
 Meinshausen N, Bühlmann P. Stability selection. JR Stat Soc Series B Stat Methodol. 2010; 72:417–73.View ArticleGoogle Scholar
 Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech Theory Exp. 2008; 10:P10008.View ArticleGoogle Scholar
 Newman ME. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E Stat Nonlin Soft Matter Phys. 2006; 74(3 Pt 2):036104.PubMedView ArticleGoogle Scholar
 Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial community variation in human body habitats across space and time. Science. 2009; 326:1694–97.PubMedPubMed CentralView ArticleGoogle Scholar
 Salipante SJ1, Hoogestraat DR, Abbott AN, Sengupta DJ, Cummings LA, ButlerWu SM, et al. Coinfection of Fusobacterium nucleatum and Actinomyces israelii in mastoiditis diagnosed by nextgeneration DNA sequencing. J Clin Microbiol. 2014; 52(5):1789–92.PubMedPubMed CentralView ArticleGoogle Scholar
 Hazelton BJ, Axt MW, Jones CA. Pasteurella canis osteoarticular infections in childhood: review of bone and joint infections due to pasteurella species over 10 years at a tertiary pediatric hospital and in the literature. J Pediatr Orthop. 2013; 33(3):e34–8. doi:10.1097/BPO.0b013e318287ffe6. Review. PMID: 23482278.PubMedView ArticleGoogle Scholar
 Remely M, Aumueller E, Jahn D, Hippe B, Brath H, Haslberger AG. Microbiota and epigenetic regulation of inflammatory mediators in type 2 diabetes and obesity. Benef Microbes. 2014; 5(1):33–43.PubMedView ArticleGoogle Scholar
 Ferrer M, Martins Dos Santos VA, Ott SJ, Moya A. Gut microbiota disturbance during antibiotic therapy: A multiomic approach. Gut Microbes. 2013; 5(1):64–70.PubMedPubMed CentralView ArticleGoogle Scholar
 Hedin CR, McCarthy NE, Louis P, Farquharson FM, McCartney S, Taylor K, et al. Altered intestinal microbiota and blood T cell phenotype are shared by patients with Crohn’s disease and their unaffected siblings. Gut. 2014. doi:10.1136/gutjnl2013306226.
 Sassone LM, Fidel RA, Faveri M, Figueiredo L, Fidel SR, Feres M. A microbiological profile of unexposed and exposed pulp space of primary endodontic infections by checkerboard DNADNA hybridization. J Endod. 2012; 38(7):889–93.PubMedView ArticleGoogle Scholar
 Nascimento Cd, Pita MS, Fernandes FH, Pedrazzi V. de Albuquerque Junior RF, Ribeiro RF. Bacterial adhesion on the titanium and zirconia abutment surfaces. Clin Oral Implants Res. 2014; 25(3):337–43. doi:10.1111/clr.12093.PubMedView ArticleGoogle Scholar
 Metzger Z, Lin YY, Dimeo F, Ambrose WW, Trope M, Arnold RR. Synergistic pathogenicity of Porphyromonas gingivalis and Fusobacterium nucleatum in the mouse subcutaneous chamber model. J Endod. 2009; 35(1):86–94.PubMedView ArticleGoogle Scholar
 Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations. Neuroimage. 2010; 52(3):1059–69.PubMedView ArticleGoogle Scholar