SoyBase was originally developed as the USDA-ARS soybean genetics database. Since its inception, SoyBase has matured from a genetic map-based database to include the just-released soybean genomic sequence and its annotation. The SoyBase database contains numerous data types, including QTL, locus and phenotypic data for soybean. Gramene [2, 3] is a database for comparative genetics and genomics of the grasses, offering a suite of tools and data for comparing different grass species and holding numerous classes of data related to the molecular biology, genetics and genomics of the species it covers. The Legume Information System (LIS) is a USDA-ARS funded information resource for comparative genetics across legume species. The mission of LIS is to help basic science researchers translate and leverage information from the data-rich model and crop legume plants to fill knowledge gaps across other legume species, and to provide the ability to traverse interrelated data types. While SoyBase is a species-specific database for soybean (Glycine max (L.) Merr.), it contains many of the same data classes as the Gramene database, including information on agronomically important plant phenotypes (traits) and genetically mapped quantitative traits, commonly referred to as QTLs (Quantitative Trait Loci). Soybeans are not grasses; they are legumes. Thus, while SoyBase shares data types with Gramene, it shares data and evolutionary relevance with LIS. SoyBase contains QTL and genetic information for soybean; Gramene includes QTLs identified for numerous agronomic traits in the grasses, with information on associated traits and coordinates for their loci on various genetic maps; and LIS includes cross-legume comparative data. All three major information resources allow their users to go to their respective websites and search or browse sequences, genes, traits and other data from major cereal crops (Gramene) or legumes (SoyBase, LIS), yet cross-website scientific integration is laborious and largely unstructured.
Two challenges that SoyBase, Gramene, and LIS face today are the discovery of relevant external web services and the integration of external data sources into unified presentations for their users. To illustrate the discovery challenge from the perspective of a user, consider a researcher searching the web for QTL services: other than going to particular resources already known to the researcher, such as SoyBase or Gramene, there are few discovery options beyond general web search engines such as Google. However, the results returned by Google when searching on the string 'QTL' or similar keys vary widely in their context and relevance. Rather than searching for web resources with the string 'QTL', which carries little contextual relevance, we seek a method that lets users find services operating on formal QTL objects and other data types based on well-defined data models. When we offer such contextualized services, they can be invoked directly by users with appropriate front-ends as well as integrated by us into our own informatic offerings. Additionally and importantly, analysis of our requirements shows that the term "users" needs to be interpreted broadly: we seek not just the capability for people to better find data and services, but for computers, without human assistance, to both discover and engage such resources. With this capability, SoyBase, Gramene, and LIS can deploy automated programs to discover, engage, assess, assimilate, and variously integrate disparate data and services to provide more productive resources for our users.
An immediate and well-known difficulty in discovering and engaging disparate data and services is the non-standard, idiosyncratic structure of data resources and the inconsistent ways common data are described. Under normal circumstances, a database administrator who wanted to transfer data between databases would first have to examine the external database schema, inspect the sometimes cryptic labels for the fields in each of the external database's tables, and try to determine whether there is an analogous field in the local database before loading the data. This process would, of course, have to be repeated with each external database the administrator wished to integrate. Documentation helps direct this process, but relying on human readers to interpret that documentation is low-throughput and cannot be automated. Web service application programming interfaces (APIs) can alleviate some of the low-level issues in data retrieval and transfer, but they do not standardize discovery and invocation across providers.
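To make the field-matching burden concrete, the following Python sketch (with invented table and column names, purely for illustration) shows the kind of hand-curated mapping an administrator must build and maintain for every external schema before any data can be transferred.

```python
# Hypothetical, hand-curated mapping from an external schema to a local schema.
# Every external database requires its own such mapping, discovered by reading
# that database's documentation and inspecting its table and column names.
EXTERNAL_TO_LOCAL = {
    "qtl_nm":     "qtl_name",       # is "qtl_nm" really the QTL name, or a lab code?
    "lg":         "linkage_group",  # "lg" must be recognized as a linkage group
    "pos_cm":     "map_position",   # units (cM) are implied, never declared
    "trait_desc": "trait",          # free text; no shared trait vocabulary
}

def translate(external_record: dict) -> dict:
    """Rename fields from the external schema into the local schema."""
    return {EXTERNAL_TO_LOCAL[k]: v for k, v in external_record.items()
            if k in EXTERNAL_TO_LOCAL}
```

The mapping itself carries no machine-interpretable meaning; it simply encodes one person's reading of another group's documentation, and must be rebuilt for every new source.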
Major information resources such as NCBI eUtils and EMBL-EBI [6, 7] offer web services, and to a lesser extent Gramene and SoyBase are moving in this direction with services such as Distributed Annotation System (DAS) servers and Genomic Diversity and Phenotype Connection (GDPC) servers. These interfaces allow programmatic search and retrieval of data and engagement of services. Furthermore, one can discover these services using specialized search engines such as BioCatalogue. But a limitation of this approach is that the underlying technologies do not lend themselves easily to semantic markup [11, 12]. Without semantic markup, it is virtually impossible to write generic programs to discover and engage services without low-throughput human intervention. Without infrastructural support, non-semantic services require programmers to write custom programs or scripts to engage and parse each individual web service, a process that inherently does not scale to thousands of resources on the web. Efforts to address this limitation exist. For example, BioMoby uses a classification scheme of data and service ontologies to allow web services and their data to be tagged using publicly available terms. Research into more formalized web service composition expands the domain into automated on-demand workflows [14, 15] and Service-Oriented Computing. Still, many of these approaches are built upon an ad hoc semantic that either does not lend itself to formalized reasoning, or is promising yet not sufficiently developed from research to production grade.
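The per-service programming burden can be illustrated with a minimal client for one such non-semantic service. The sketch below queries NCBI eUtils ESearch and extracts identifiers from its XML response (the endpoint, parameters, and element names follow the public ESearch documentation); a DAS server, a GDPC server, or any other provider would each require its own hand-written client.

```python
# Minimal sketch: a custom client for one non-semantic web service (NCBI eUtils ESearch).
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

params = urlencode({"db": "gene", "term": "QTL AND Glycine max[Organism]"})
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + params

with urlopen(url) as response:
    tree = ET.parse(response)

# The meaning of each <Id> element (a Gene UID) is known only from human-readable
# documentation; nothing in the response is machine-interpretable across providers.
gene_ids = [elem.text for elem in tree.findall(".//Id")]
print(gene_ids)
```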
The issue of semantics and ontologies is non-trivial. It has long been noted that it is difficult to agree on what to call something as (seemingly) simple as plant anatomy or genes. Terms may become overloaded, such as "locus" or "marker," yielding different meanings in different contexts. Ontologies, and the discussions surrounding their creation, have helped alleviate some of the ambiguity, or at least aided in the establishment of common terms for the purpose of knowledge classification and data exchange. Yet the nature of classification itself raises substantial conceptual challenges beyond simple agreement on terms. It is unclear if any static ontological approach can ever fully capture the rich diversity of concepts and instantiations seen in biology. Data schemas used by information resources will likely share major concepts, but it is equally likely that specific implementations will differ in ways that make integration of their contents laborious. We reject an approach in which database designers must mold their web service offerings around a universal model; instead, we enable a model in which a shared and vibrant semantic builds upon existing ontologies and, as appropriate, extends or creates new ontologies under a formal semantic.
To address this, we sought a system based on a simple REST (REpresentational State Transfer) architecture that uses industry-standard semantic web technologies rather than web service technologies. Semantic web technologies, such as the W3C-sanctioned language OWL, provide a formal semantic and logic for grounding web resource descriptions. We describe here our use of the Simple Semantic Web Architecture and Protocol (SSWAP) [18, 19] to build a system that allows semantically robust description, discovery, and invocation of semantic web services. SSWAP affords data and service providers the ability to describe their resources with semantics in a way that is both consistent with the use of shared community ontologies and amenable to formalized reasoning. This is particularly important when trying to transfer analogous data between distinct and independently created databases. SSWAP is a semantic web service architecture, protocol, and platform that allows users to re-use or create ontologies for their data, thus leveraging the efforts of many groups while still allowing web resources to describe data that is unique to their offering or not addressed by other ontologies. SSWAP enables services to be described, discovered, and engaged based on an extensible and formalized semantic, rather than the ad hoc conventions of simple lexical token matching.
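As a rough illustration of the difference this makes, the following Python sketch uses the rdflib library to describe a hypothetical QTL lookup service in RDF/OWL terms and then discover it by the type of data it operates on. The SSWAP-style class and property names and all URIs shown are illustrative placeholders, not the normative SSWAP vocabulary.

```python
# Illustrative sketch: describing a QTL lookup service semantically so it can be
# discovered by the *type* of data it operates on, rather than by matching the
# lexical string "QTL".  All URIs below are hypothetical.
from rdflib import Graph, Namespace, URIRef, Literal, RDF, RDFS

SSWAP = Namespace("http://sswapmeet.sswap.info/sswap/")   # protocol terms (illustrative)
QTLNS = Namespace("http://example.org/ontology/qtl/")     # hypothetical shared QTL ontology

g = Graph()
service = URIRef("http://example.org/services/soybeanQtlLookup")  # hypothetical service URI

g.add((service, RDF.type, SSWAP.Resource))
g.add((service, RDFS.comment, Literal("Returns map positions for soybean QTLs")))
g.add((service, SSWAP.operatesOn, QTLNS.QTL))  # the service consumes formal QTL objects

# A client, human- or machine-driven, can now query for services by data type:
qtl_services = list(g.subjects(SSWAP.operatesOn, QTLNS.QTL))
print(qtl_services)
```

Because the description is grounded in a formal semantic, a reasoner can also recognize that a service operating on a superclass or subclass of the requested type may satisfy the query, something lexical matching on the string 'QTL' cannot do.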