Mango: combining and analyzing heterogeneous biological networks
© The Author(s) 2016
Received: 18 March 2016
Accepted: 20 June 2016
Published: 2 August 2016
Heterogeneous biological data such as sequence matches, gene expression correlations, protein-protein interactions, and biochemical pathways can be merged and analyzed via graphs, or networks. Existing software for network analysis has limited scalability to large data sets or is only accessible to software developers as libraries. In addition, the polymorphic nature of the data sets requires a more standardized method for integration and exploration.
Mango facilitates large network analyses with its Graph Exploration Language, automatic graph attribute handling, and real-time 3-dimensional visualization. On a personal computer Mango can load, merge, and analyze networks with millions of links and can connect to online databases to fetch and merge biological pathways.
Mango is written in C++ and runs on Mac OS, Windows, and Linux. The stand-alone distributions, including the Graph Exploration Language integrated development environment, are freely available for download from http://www.complex.iastate.edu/download/Mango. The Mango User Guide listing all features can be found at http://www.gitbook.com/book/j23414/mango-user-guide.
Keywords: Systems biology; Heterogeneous data integration; Biological pathway analysis; 3D visualization; Graph mathematics
In the present Big Data era, one of the great challenges is to compare or integrate diverse data types. Modern biological research produces large and heterogeneous data sets, and there are many ways to categorize or display each type of data. The 2014 Nucleic Acids Research Database Special Issue counted 1552 online biological databases. It is often illuminating, even essential, to examine important biological problems using different types of data. For example, new discoveries often emerge when a biologist is able to interrogate gene expression in the context of biological pathways. A common method to analyze related data relies on graphs, or networks, where data of various types are linked and key network features or subsets are identified [3–5].
Comparison of graph visualization software
Graph analysis features
· Many algorithms for systems biology
· 2D predetermined layout
· Can only merge 2 graphs at a time
· Can add GO or KEGG attributes
· 3D predetermined layout (via plug-in)
· 6 min to load a network with 4 M links but no visual afterward
· Plug-ins available
· Intuitive graph statistics
· 2D and 3D layouts but graphs cannot be rotated in 3D
· Cannot display multiple graphs on one
· Automated graph algorithm citation
· Generalized for all types of graphs
· Graph layout animation helps maintain
· Limited by JVM constraints; cannot load a network with 4 M links
· Plug-ins available
· GYTHON, a language for graph analysis
· 2D layout only
· Cannot be run on MacOS 10.9, Windows 7, or Redhat Linux 6.0
· Can map information attributes to visual
· Update with user commands
· No graph analysis capabilities
· Rich set of predetermined 2D layouts
· Not an interactive system
· Streamlined command line interface
· Cannot efficiently handle graphs over
· Graph database system
· Relies on JSON for visualization
· Designed as a backend to database support rather than for visualization
· Cypher graph query language
· 2D layouts only
· Queries are based on a combination of topology and attributes
· Have to click a node or link to see its attributes on a separate panel
· Nodes are only labeled by numbers
· The whole database is one huge graph
· A set of C++ libraries for graph analysis
· 2D visualization
· More useful to users who program C++ or Python directly
· Can also be run as stand-alone program
· 3D is available through plug-in
· Plug-ins can be created in Python
· Had some 3D layout algorithms
· More analysis than visualization features
· Python module for graph analysis
· Must export to other software or modules for visualization
· Useful only as an analysis tool
· Rich set of network algorithms
· Provides general graph mathematics
· Interactive 3D layouts and controls
· Does not yet have plug-in feature
· Heterogeneous graph analysis with ease
· Real-time large graph visualization
· Does not yet use GPU speedup
· Takes ∼30 s to load a 4 M link network
· User customizable visual attributes
· Limited set of preset layouts
To address these limitations, we have developed a stand-alone graph analysis and visualization software environment called Mango to help biologists and other researchers efficiently integrate and explore heterogeneous networks larger than previously possible. A 4 million link network can be loaded into Mango in 30 seconds on a Mid 2010 Mac mini computer with a 2.4 GHz (Gigahertz) Intel Core 2 Duo processor and 8 GB RAM (random access memory). For comparison, Cytoscape took 6 minutes to load the same network file on the same computer using its default configuration. Mango offers the scalability to handle larger networks, the expressive power of a new Graph Exploration Language (Gel), and the convenience of unlimited graph attributes with automatic attribute merging and promotion. Within the integrated development environment, Gel commands can be edited, run line by line, or saved as scripts to reproduce results. Script files enhance the speed and reproducibility of analysis. Mango provides both comprehensive graph analyses and real-time 3-dimensional (3D) visualization. Mango is a cross-platform C++ program that runs on Mac OS X 10.9 or later, Windows 7 or later, and many Linux variants. It is freely available from our website (http://www.complex.iastate.edu/download/Mango) and the Mango User Guide is hosted at GitBook (http://www.gitbook.com/book/j23414/mango-user-guide).
The Mango user interface
The Graph Exploration Language (Gel)
In addition to graphs defined directly in Gel, Mango can read graph data in tabular or CSV (comma separated values) format using the import command. A properly formatted graph file lists nodes with their attributes, then a single line containing only a hyphen, then links with their attributes. The full description of the import command is in the Mango User Guide.
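As a rough illustration of the file layout just described, the following Python sketch splits such a listing into a node table and a link list. The comma delimiter, the column order (node identifier first on node lines, the two endpoint identifiers first on link lines), the sample data, and the function name parse_graph_text are all assumptions made for this example; the authoritative format is the one documented in the Mango User Guide.

```python
import csv
import io


def parse_graph_text(text, delimiter=","):
    """Split a node/link listing into (nodes, links).

    Assumed layout (hypothetical, for illustration only):
      node lines:  id,attr1,attr2,...
      separator:   a line containing only "-"
      link lines:  source,target,attr1,...
    """
    nodes = {}        # node id -> list of attribute strings
    links = []        # (source, target, [attribute strings])
    in_links = False  # becomes True once the hyphen line is seen
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        row = [cell.strip() for cell in row]
        if not row or all(cell == "" for cell in row):
            continue  # skip blank lines
        if row == ["-"]:
            in_links = True
            continue
        if in_links:
            if len(row) < 2:
                raise ValueError(f"link line needs two endpoints: {row}")
            links.append((row[0], row[1], row[2:]))
        else:
            nodes[row[0]] = row[1:]
    return nodes, links


# Made-up sample data: three nodes with one attribute, two weighted links.
example = """\
a,kinase
b,transporter
d,regulator
-
a,b,0.7
b,d,0.4
"""
nodes, links = parse_graph_text(example)
```

Attribute values are kept as strings here; converting them to typed values would depend on the node and link type declarations, which this sketch does not model.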
Mango's system-defined graph attributes are appended to user-defined attributes. The system-defined attributes control the 3D visualization of a network, for example node position, node color, and link width. Generating a 3D visualization is therefore a matter of mapping user-defined information attributes to system-defined visualization attributes. By changing these mappings dynamically, animations and simulations can be produced in Mango. A full listing of the visualization attributes is in the Mango User Guide.
Standards for combining heterogeneous graphs
The above mathematics can be extended across multiple graphs to create unions (G_A .+ G_B .+ G_C), differences (G_A .− G_B .− G_C or G_A − G_B − G_C), intersections (G_A .& G_B .& G_C), and inverse graphs (G_A * G_A − G_A). The graph operations can be mixed and matched to produce more complex results. Figure 3b illustrates several of these operations.
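The set ideas behind these operations can be sketched in Python on plain link sets. This is not Gel and does not reproduce its operators; in particular, the distinction between the dotted and undotted forms is left to the User Guide. The inverse is computed as the complete graph on the same nodes minus the graph itself, which is one reading of the inverse expression above and should be treated as an assumption. All names and sample graphs are hypothetical.

```python
from itertools import combinations


def undirected(links):
    """Represent undirected links as a set of frozensets so that a-b equals b-a."""
    return {frozenset(pair) for pair in links}


def node_set(links):
    """Collect every node that appears as an endpoint of some link."""
    return {node for link in links for node in link}


def inverse(links):
    """Links of the complete graph on the same nodes that are absent from `links`."""
    complete = {frozenset(pair) for pair in combinations(sorted(node_set(links)), 2)}
    return complete - links


g_a = undirected([("a", "b"), ("b", "d"), ("c", "d")])
g_b = undirected([("b", "d"), ("d", "e")])
g_c = undirected([("d", "b"), ("a", "e")])

union = g_a | g_b | g_c          # links present in any graph
intersection = g_a & g_b & g_c   # links present in every graph
difference = g_a - g_b - g_c     # links of g_a found in neither g_b nor g_c
inverse_a = inverse(g_a)         # missing links among the nodes of g_a
```

Because each operation returns another link set, the results can be chained, which mirrors the way the graph operations above are mixed and matched.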
When graphs are combined in mathematical operations, attributes from two graphs might conflict. For example, the link between nodes b and d in G_A may have a weight attribute of 0.4 while the link between nodes b and d in G_B may have a weight attribute of 0.3. Gel handles attribute conflicts by giving preference to the left operand. During the operation G_A .+ G_B, the left operand G_A takes precedence and the resulting graph will have weight value 0.4. An exception to this rule is when the conflicting attributes in G_A happen to be at their default values (default values can be defined by users). In those cases, the attributes of graph G_B will be copied. This automatically merges useful non-default information from G_B into the resulting graph.
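The conflict rule in this paragraph can be written down as a small Python function. It is a sketch of the stated rule only, not Mango's implementation: attributes are modeled as dictionaries, the function name merge_attributes is hypothetical, and copying an attribute that exists only in the right operand is an added assumption.

```python
_MISSING = object()  # sentinel for "no value" so that None stays a legal value


def merge_attributes(left, right, defaults):
    """Merge attributes of one node or link, giving preference to `left`.

    The left value wins unless it equals the declared default for that
    attribute, in which case the right value is copied.
    """
    merged = dict(left)
    for name, right_value in right.items():
        left_value = merged.get(name, _MISSING)
        default = defaults.get(name, _MISSING)
        left_is_default = default is not _MISSING and left_value == default
        if left_value is _MISSING or left_is_default:
            merged[name] = right_value
    return merged


defaults = {"weight": 0.0, "tag": ""}

# The b-d link from the text: the left operand's weight 0.4 wins over 0.3.
conflict = merge_attributes({"weight": 0.4}, {"weight": 0.3}, defaults)

# The left operand holds only the default weight, so 0.3 is copied.
promoted = merge_attributes({"weight": 0.0}, {"weight": 0.3}, defaults)

# Mixed case: weight is kept from the left, the empty tag is filled from the right.
mixed = merge_attributes({"weight": 0.4, "tag": ""},
                         {"weight": 0.3, "tag": "ppi"}, defaults)
```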
However, simply writing G_C = G_A .+ G_B will not work, because the tag attribute from G_B is lost when G_B is added to G_A, before the result is assigned to G_C. The correct steps to preserve graph attributes during heterogeneous graph mathematics are demonstrated below (Fig. 3a):
Flexible node and link type definitions, coupled with an intuitive set of attribute promotion and merging rules, ease the combination of heterogeneous graphs in Gel. Users can thus focus on graph-level operations instead of attribute-level selection, sorting, and merging.
Many graph analyses require traversing all nodes and links to perform a calculation based on graph attributes or topology. Gel provides the select command to pull out a subgraph based on user-defined conditions. These conditions can be related to stored attribute values or topology properties. Gel also allows mapping or computing new attribute values across a graph on a per-node or per-link basis with the foreach command, which efficiently applies a set of user-defined calculations across all nodes or links that optionally meet certain conditions. The same command can also be used to tally attribute values across all nodes and links. The following demonstrates the two types of Gel commands:
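For readers unfamiliar with Gel, the two ideas can be approximated in Python. The sketch below is an analogue rather than Gel syntax: select_links and for_each_node are hypothetical helper names, the sample graph is made up, and the choice to keep only the endpoints of retained links in the selected subgraph is an assumption.

```python
def select_links(nodes, links, keep):
    """Return the subgraph formed by the links for which keep(u, v, attrs) is true."""
    kept = [(u, v, attrs) for u, v, attrs in links if keep(u, v, attrs)]
    used = {name for u, v, _ in kept for name in (u, v)}
    return {name: nodes[name] for name in nodes if name in used}, kept


def for_each_node(nodes, update, where=lambda name, attrs: True):
    """Apply update(name, attrs) to every node passing `where`; return how many."""
    count = 0
    for name, attrs in nodes.items():
        if where(name, attrs):
            update(name, attrs)
            count += 1
    return count


nodes = {"a": {"expr": 2.5}, "b": {"expr": 0.4}, "c": {"expr": 1.8}, "d": {"expr": 0.1}}
links = [("a", "b", {"weight": 0.9}),
         ("a", "c", {"weight": 0.2}),
         ("c", "d", {"weight": 0.7})]

# Selection: keep only links whose weight reaches a threshold.
strong_nodes, strong_links = select_links(
    nodes, links, lambda u, v, attrs: attrs["weight"] >= 0.5)

# Per-node computation: store each node's degree as a new attribute.
degree = {name: 0 for name in nodes}
for u, v, _ in links:
    degree[u] += 1
    degree[v] += 1
visited = for_each_node(nodes, lambda name, attrs: attrs.update(degree=degree[name]))

# Conditional traversal used as a tally: count nodes with expr above 1.0.
highly_expressed = for_each_node(
    nodes, lambda name, attrs: None, where=lambda name, attrs: attrs["expr"] > 1.0)
```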
In addition to the data types, graph mathematics, automatic attribute handling, and traversal commands, Gel also provides commands for object modification, data examination, input and output, code execution, graph construction, and simulation. A growing set of built-in functions for mathematics, visualization control, graph layouts, and statistical reporting is also provided. To explore all Gel commands and functions, type the help command in Mango or consult the online User Guide.
The Mango system and its Graph Exploration Language are data agnostic, meaning that any type of network can be loaded and analyzed – users have total control of node and link attribute definitions and their associations within Mango. Our goal is to make this software widely available to all researchers and promote its use in solving ever more complex biological research problems.
Results and discussion
We present a few network analysis examples to illustrate the use of Mango in this section. Examples of comparing different types of biological networks and the scalability of Mango to large networks are provided.
Network data collection
Summary of 4 large heterogeneous biological networks for E. coli
· WGCNA correlation weight
· count and string of shared GO terms
· source of evidence (Y2H, LIT or both)
Large heterogeneous network comparison
For all networks, nodes are identified by gene names with no additional attributes, thus the following node type declaration can be shared among the networks:
All networks have undirected links but differ in their link attributes (the path network does not contain any link attributes), thus the following 4 link type declarations are used to load the different networks:
After the node and link type declarations, the corr network, path network, go network, and ppi network can be imported into Mango for all-to-all network comparisons:
For the integration of the networks, a common link type including all available link attributes is declared:
Once the networks are loaded into Mango, Gel mathematics allows network integration and comparison. For example, the comparison of the corr and path networks is visualized in the top two panels in the left column of Fig. 1. The top middle panel in Fig. 1 is the result of the following Gel intersect operation.
Benchmarking the speed of Gel mathematics on massive graphs
Operation: Time (in seconds)
4 M += 8 K: 0.92, 0.35, 0.27, 0.60, 0.56
8 K += 4 M: 1.25, 1.15, 1.03, 1.02, 1.02
4 M −= 8 K: 0.52, 0.33, 0.62, 0.33, 0.25
8 K −= 4 M: 1.09, 1.28, 1.09, 1.16, 1.19
4 M .+= 8 K: 0.69, 0.60, 0.57, 0.31, 0.40
8 K .+= 4 M: 12.06, 12.09, 12.05, 12.23, 12.32
4 M .−= 8 K: 0.55, 0.41, 0.25, 0.26, 0.32
8 K .−= 4 M: 0.90, 0.85, 0.83, 0.98, 0.74
4 M *= 8 K: 22.94, 23.74, 23.35, 22.98, 23.03
8 K *= 4 M: 36.75, 35.33, 35.23, 35.38
(operation label missing): 7.90, 7.76, 7.85, 7.73, 7.87
(operation label missing): 0.30, 0.52, 0.45, 0.34, 0.29
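The repeated timings listed above are easier to compare once reduced to a mean and a range per operation. The Python sketch below does only that arithmetic. The values are copied from the benchmark listing; the two rows without an operation label are left out, and the dictionary keys are plain-text labels chosen for this example.

```python
from statistics import mean

# Timings in seconds, copied from the benchmark listing (labeled rows only).
timings = {
    "4M += 8K":  [0.92, 0.35, 0.27, 0.60, 0.56],
    "8K += 4M":  [1.25, 1.15, 1.03, 1.02, 1.02],
    "4M -= 8K":  [0.52, 0.33, 0.62, 0.33, 0.25],
    "8K -= 4M":  [1.09, 1.28, 1.09, 1.16, 1.19],
    "4M .+= 8K": [0.69, 0.60, 0.57, 0.31, 0.40],
    "8K .+= 4M": [12.06, 12.09, 12.05, 12.23, 12.32],
    "4M .-= 8K": [0.55, 0.41, 0.25, 0.26, 0.32],
    "8K .-= 4M": [0.90, 0.85, 0.83, 0.98, 0.74],
    "4M *= 8K":  [22.94, 23.74, 23.35, 22.98, 23.03],
    "8K *= 4M":  [36.75, 35.33, 35.23, 35.38],  # only four runs are listed
}

# Mean, fastest, and slowest run for each operation.
summary = {op: (mean(runs), min(runs), max(runs)) for op, runs in timings.items()}

for op, (average, fastest, slowest) in summary.items():
    print(f"{op:10s} mean={average:6.2f} s  range={fastest:.2f}-{slowest:.2f} s")
```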
Flexible real-time network exploration and visualization
Over-plotting of nodes and links becomes more of a challenge as networks grow. For example, the corr and path networks and their combination can be visualized in Mango but offer limited biological interpretation (the left column of panels in Fig. 1). In this example, we continue to explore the intersection of the two networks by querying certain node and link attributes, imposing thresholds to reveal important features, and mapping these features to the network visualization.
First we arrange all nodes in the intersection network along a circle in the x-y plane and map the node connectivity to their z-axis coordinates. Nodes are assigned random colors and higher z-axis node colors are bled down the links to emphasize hubs. Nodes above a threshold are emphasized by increasing their radius and labeling them with gene names and connectivity.
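The geometry of this layout can be sketched in a few lines of Python: nodes are spaced evenly on a circle in the x-y plane and lifted along z by their connectivity. This only illustrates the coordinate mapping; it is not the Gel code used for Fig. 1. The function name crown_layout, the alphabetical node ordering, the scaling parameters, and the sample links are assumptions, and node coloring and color bleeding along links are omitted.

```python
import math


def crown_layout(links, radius=1.0, z_scale=1.0):
    """Place nodes on a circle in the x-y plane with z set by connectivity."""
    degree = {}
    for u, v in links:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    names = sorted(degree)  # fixed ordering so the layout is reproducible
    positions = {}
    for index, name in enumerate(names):
        angle = 2.0 * math.pi * index / len(names)
        positions[name] = (radius * math.cos(angle),
                           radius * math.sin(angle),
                           z_scale * degree[name])
    return positions, degree


links = [("h", "a"), ("h", "b"), ("h", "c"), ("a", "b")]
positions, degree = crown_layout(links, radius=2.0)

# Nodes at or above the threshold get a label with their name and connectivity.
threshold = 3
labels = {name: f"{name} ({count})"
          for name, count in degree.items() if count >= threshold}
```

Highly connected nodes rise above the ring, which makes a sensible threshold easy to read off the picture.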
The resulting network layout, called a crown-plot, is shown in the top panel in the middle column of Fig. 1. The hub genes and their links can be pulled into a new sub-network. This sub-network, called hubs, is then flattened and spread out using a force-directed layout built into the graph panel. The hub genes are raised one level. Genes that are not themselves hubs but connect two or more hubs are raised to a third level. The following Gel code accomplishes all of these steps except the force-directed layout, which is performed by right-clicking on the panel:
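The layering logic described in this paragraph can be sketched in Python as follows. It is an analogue of the described steps, not the Gel listing, and it leaves out the force-directed layout. The function name extract_hub_layers, the threshold value, and the sample links are hypothetical; layer numbers follow the text, with other genes on layer 1, hubs on layer 2, and genes connecting two or more hubs on layer 3.

```python
def extract_hub_layers(links, hub_threshold):
    """Return (hubs, hub_links, layers) for an undirected link list."""
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Hubs are the nodes whose connectivity reaches the threshold.
    hubs = {name for name, adjacent in neighbors.items()
            if len(adjacent) >= hub_threshold}

    # The hubs sub-network keeps every link that touches at least one hub.
    hub_links = [(u, v) for u, v in links if u in hubs or v in hubs]

    layers = {}
    for u, v in hub_links:
        for name in (u, v):
            if name in hubs:
                layers[name] = 2          # hub genes, middle layer
            elif len(neighbors[name] & hubs) >= 2:
                layers[name] = 3          # in-betweeners, top layer
            else:
                layers[name] = 1          # other genes, bottom layer
    return hubs, hub_links, layers


links = [("h1", "a"), ("h1", "b"), ("h1", "x"),
         ("h2", "c"), ("h2", "d"), ("h2", "x"),
         ("e", "f")]
hubs, hub_links, layers = extract_hub_layers(links, hub_threshold=3)
```

In this toy graph, x has only two links and would rank low in a list ordered by connectivity, yet it ends up on the top layer because both of its links go to hubs.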
The 3-layer hubs network is shown in the lower panel in the middle column of Fig. 1, which contains other genes on the bottom layer, hub genes on the middle layer and in-betweener genes on the top layer. It is worth mentioning that the in-betweener genes on layer 3 would have been obscured by other genes in a simple list of genes ordered by connectivity. We can further pull out the hubs and in-betweeners into another sub-network for closer inspection with the following Gel code:
This sub-network is laid out as a bipartite graph shown on the right panel in Fig. 1, with hubs on the left and the in-betweeners on the right. This example shows how to map informational attributes of a graph to its visual attributes using Mango. The resulting visual displays help the user decide threshold values, extract sub-networks of interest, and further explore the data.
Microarray expression combined with KEGG biological pathways
E. coli gene expression under control and multiple treatment conditions was measured by microarrays (GSE61736). A subset of the data containing expression values for one control and one treatment was loaded into Mango and overlaid onto downloaded E. coli KEGG biological pathways. The expression data, E. coli KEGG pathways, and Gel script are available for download from https://github.com/j23414/Mango_Workshop.
Beyond coloring nodes in a network, we are able to color the links and thereby highlight entire pathways that are up- or down-regulated. This is possible because KEGG pathways also contain gene-to-gene links, not just gene-to-compound links.
The final network can be saved and reloaded to regenerate the same 3D visualization.
Mango networks are saved natively as Gel commands, so running the saved code recreates the original graphs in Mango. In addition, the networks can be exported to tabular data using the export command. The tabular data can then be read by many other software programs, e.g., Excel, R, Matlab, Cytoscape, and other graph software or databases. Full descriptions of the interoperability and other features of Mango are available in the User Guide.
We have developed Mango, a powerful new program for multi-network analysis and visualization. Mango enables scientists to test hypotheses on large heterogeneous networks, identify crucial features, and extract analysis results, all within its integrated environment. Compared with existing programs, Mango extends the capability and convenience of large heterogeneous data analysis on a personal computer.
The Mango system was designed to be data agnostic, meaning that any type of network data can be loaded and analyzed: users have total control over node and link attribute definitions and their associations within Mango. Mango can load networks with millions of links, integrate and explore large amounts of data following Gel commands, and help users deduce predictions or outcomes that can be validated in labs. It is our goal to make this software widely available to all researchers to promote its use in solving ever more complex biological research problems. As Mango developers, we will continue to provide support and further develop the software according to user needs.
Availability and requirements
Project name: Mango 1.24.
Project home page: http://www.complex.iastate.edu/download/Mango/
Operating system(s): Mac OS X 10.9 or later, Windows 7 or later, and Linux variants. Both 32- and 64-bit operating systems are supported.
Programming language: C++
Other requirements: An Internet connection for online database access.
License: Free versions available; specific license agreement included with each distribution.
Any restriction to use by non-academics: Specific restrictions included with each distribution and license agreement.
2D, 2-dimensional; 3D, 3-dimensional; CSV, comma separated values; Gel, graph exploration language; GO, gene ontology; GHz, Gigahertz; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction; RAM, random access memory; WGCNA, weighted gene correlation network analysis
We thank Dr. Jo Anne Powell-Coffman, Zebulun Arendsee, and Kannan Sankar for proof-reading the draft manuscript and offering valuable suggestions.
This work is partially supported by the National Science Foundation grant DBI-0850195 and the Iowa State University Plant Sciences Institute Scholar grant to HC. JC is partially supported by the James Cornette Research Fellowship. None of these funding agencies had any role in the design of the study, data collection, analysis and interpretation, or in writing the manuscript.
Availability of data and materials
JC developed the Mango system and drafted the manuscript. HJC carried out the E. coli studies and collected the microarray data. HHC developed the Gel language and revised the manuscript. All authors read and approved the final manuscript.
JC and HHC have founded a software company and have licensed Mango from Iowa State University for further development. A free and functional Mango will always be made available to the public which can be downloaded and used by anyone including commercial entities.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Fernández-Suárez XM, Rigden DJ, Galperin MY. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection. Nucleic Acids Res. 2014; 42(D1):1–6.
- Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012; 8(2):1002375.
- Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001; 411(6833):41–2.
- Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Modern Phys. 2002; 74(1):47.
- Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, Schneider R, Bagos PG, et al. Using graph theory to analyze biological networks. BioData Mining. 2011; 4(1):10.
- Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011; 27(3):431–2.
- Jarukasemratana S, Murata T. Recent large graph visualization tools: a review. Inf Media Technol. 2013; 8(4):944–60.
- Adar E. GUESS: a language and interface for graph exploration. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Montreal, Quebec, Canada: ACM; 2006. p. 791–800.
- Bastian M, Heymann S, Jacomy M, et al. Gephi: an open source software for exploring and manipulating networks. ICWSM. 2009; 8:361–2.
- Auber D. Tulip: a huge graph visualization framework. In: Graph Drawing Software. Berlin Heidelberg: Springer; 2004. p. 105–26.
- Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible computational research. PLoS Comput Biol. 2013; 9(10):1003285.
- Wilkinson L. The grammar of graphics. New York: Springer; 2006.
- Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 2008; 9(1):559.
- Cho H, Chou HH. Thermodynamically optimal whole-genome tiling microarray design and validation. BMC Res Notes. 2016; 9(1):305.
- Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A, et al. The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol. 2014; 32(3):285–90.