Filling the gap between biology and computer science
BioData Mining volume 1, Article number: 1 (2008)
This editorial introduces BioData Mining, a new journal which publishes research articles related to advances in computational methods and techniques for the extraction of useful knowledge from heterogeneous biological data. We outline the aims and scope of the journal, introduce the publishing model and describe the open peer review policy, which fosters interaction within the research community.
Aim and scope
BioData Mining is an open-access, open peer-reviewed, online journal that publishes articles on the development of data mining techniques applied to biological data. The journal stems from the gap between biology and computer science and covers topics at the intersection of these fields. One of its main interests is the advancement of computational methods and theoretical informatics to support the discovery of new knowledge in the biomedical sciences.
Data mining techniques have traditionally been used in many varied contexts. Datasets usually contained many examples (thousands) and few attributes (at most several tens). Algorithms were developed with these characteristics in mind and validated by means of statistical tests on synthetic and real-world data. Statistics has supported the analysis of biological data for many years. However, biological data has changed over time in size and, above all, in structure, and many challenges arise from genetic, transcriptomic, genomic, proteomic and metabolomic data.
The enormous growth of biological data adds a further difficulty: statistics, without losing its relevance, has moved to the background, leaving the foreground to complex heuristics. In addition, the curse of dimensionality plays an important role in the design of new data mining algorithms. However, the most important challenge comes from the intrinsic characteristics of the new problems to be solved. Because of the high volume of data, optimization and efficiency are key aspects in the design of new heuristics, which often provide only approximate solutions.
In this sense, BioData Mining aims to publish articles that not only adapt, evaluate or apply traditional data mining techniques, but also develop, evaluate or apply novel methods from the data mining or machine learning fields for the analysis of complex biological data.
Moreover, the situation has changed substantially during the last decade. Nowadays, biological information is distributed and comes in different formats. It is not trivial to combine different types of data that are located in different databases and present various levels of structure or heterogeneity. In some cases the effort is focused on facilitating the management of biological information, dealing with semantic aspects of the information through the Internet.
To promote the advance of science, many research groups are making their software development projects publicly available as open-source software, which encourages researchers to develop extensions of verified software applications, such as interfaces, packages or specific services.
BioData Mining aims to publish articles that design, develop and integrate databases, software and web services for the storage, management and retrieval of complex biological data, with emphasis on open-source software for the application of data mining to the analysis of this type of information.
The role of biologists, geneticists, physicians, etc. is critical in the correct interpretation of results obtained by data mining algorithms. In many cases, data needs to be pre-processed before useful knowledge can be extracted and, in some cases, algorithms produce models that must be post-processed to gain insight into the knowledge hidden in the information. In the end, experimental validation is crucial to show the research community the quality of the approaches. In this field, statistics offers robust tools that can be applied directly, although new developments are also needed to deal with biological data.
BioData Mining aims at publishing articles that present new methods for pre-processing, post-processing and validation of data mining algorithms for the analysis of genetic, transcriptomic, genomic, proteomic, and metabolomic data.
In the expectation of filling the gap between biology and computer science, we believe that BioData Mining will contribute to the development of theoretical and practical aspects of new methodologies driven by biological data.
Open access and open peer review publishing model
The time interval between the date an article is written and the date it is read should be as short as possible. Long intervals are mainly due to a slow reviewing process and limited access to articles. BioData Mining will put much effort into reducing the reviewing process to several weeks, and will remove the second cause through the open-access nature of the journal: articles will be fully accessible online to any reader immediately upon publication.
To make the peer review process transparent, BioData Mining has adopted an open-review policy. Reviewers' names are included on the peer review reports, which are made publicly available upon acceptance of an article. We believe that this will foster constructive reviews and therefore enrich the criticism. This policy will contribute greatly to driving young researchers to improve the quality of their articles.
In recent years, many journals have adopted an open-access policy, and its success is now unquestionable. We expect the open peer review policy to follow a similar path in the near future. Some experiences already show enthusiasm for the concept, such as PLoS ONE, which strongly urges reviewers to relinquish anonymity in order to promote open decision-making.
Finally, to facilitate the search for topics or related research in articles published in BioData Mining, readers will find all the articles archived in PubMed Central.
The journal is run by two Editors-in-Chief, who sign this editorial. Marylyn D. Ritchie acts as Managing Editor. The members of the Editorial Board cover a wide range of research fields related to Biology and Computer Science. Their expertise ranges from Biomedical Informatics (M. Ramoni, F. Azuaje) to Structural Bioinformatics (R. Casadio, D. Jones, J. M. Carazo), from Soft Computing (O. Cordon, I. Zwir) to Clinical Research (M. Eppstein), from Machine Learning (E. Marchiori, K. Cios) to Evolutionary Computation (P. Larrañaga), from Cancer Research (S. Volinia) to Data Mining (J. Aguilar-Ruiz, D. Simovici, D. Gamberger, H. Toivonen), from Biostatistics (J. Rahnenfuhrer) to High-performance Technologies (R. Schneider), from Immunology (B.A. McKinney) to Computational Genetics (J.H. Moore, M.D. Ritchie), from Database Integration (M. Kanehisa) to Functional Genomics (S. Kasif), from Software Technologies (A. Omicini) to Stem Cells (B. Soria).
References
BioData Mining. [http://www.biodatamining.org]
Hand DJ, Mannila H, Smyth P: Principles of Data Mining. 2001, The MIT Press
PLoS ONE. [http://www.plosone.org]
PubMed Central. [http://www.pubmedcentral.nih.gov]
Schachter A, Ramoni M: Clinical forecasting in drug development. Nat Rev Drug Disc. 2007, 6: 107-108. 10.1038/nrd2246.
Wang H, Zheng H, Simpson D, Azuaje F: Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data. BMC Bioinformatics. 2006, 7: 116-10.1186/1471-2105-7-116.
Bartoli L, Calabrese R, Fariselli P, Mita D, Casadio R: A computational approach for detecting peptidases and their specific inhibitors at the genome level. BMC Bioinformatics. 2007, 8: S3-10.1186/1471-2105-8-S1-S3.
Lise S, Walker-Taylor A, Jones DT: Docking protein domains in contact space. BMC Bioinformatics. 2006, 7: 310-10.1186/1471-2105-7-310.
Scheres S, Gao H, Valle M, Herman G, Eggermont P, Frank J, Carazo J: Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization. Nature Methods. 2007, 4: 27-29. 10.1038/nmeth992.
Romero R, Rubio C, Cordon O, Cobb J, Herrera F, Zwir I: A multi-objective evolutionary conceptual clustering methodology for gene annotation within structural databases: A case of study on the Gene Ontology database. IEEE Transactions on Evolutionary Computation. [http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/4235/4358751/04469888.pdf?tp=&arnumber=4469888&isnumber=4358751]
Zwir I, Shin D, Kato A, Nishino K, Latifi T, Solomon F, Hare JM, Huang H, Groisman EA: Dissecting the PhoP regulatory network of Escherichia coli and Salmonella enterica. PNAS. 2005, 102 (8): 2862-2867. 10.1073/pnas.0408238102.
Eppstein M, Molofsky J: Invasiveness in plant communities with feedbacks. Ecology Letters. 2007, 10: 253-263. 10.1111/j.1461-0248.2007.01017.x.
Vanhoutte K, Laarakkers C, Marchiori E, Pickkers P, Wetzels J, Willems J, Heuvel van den L, Russel F, Masereeuw R: Biomarker discovery with SELDI-TOF MS in human urine associated with early renal injury: evaluation with computational analytical tools. Nephrol Dial Transplant. 2007, 22 (10): 2932-2943. 10.1093/ndt/gfm170.
Swiercz W, Cios KJ, Staley K, Kurgan LA, Accurso F, Sagel S: A new synaptic plasticity rule for networks of spiking neurons. IEEE Transactions on Neural Networks. 2006, 17: 94-105. 10.1109/TNN.2005.860834.
Larrañaga P, Lozano J: Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. 2002, Kluwer Academic Publishers, [http://www.springer.com/computer/artificial/book/978-0-7923-7466-4]
Petrocca F, Visone R, Onelli M, Shah M, Nicoloso M, de Martino I, Iliopoulos D, Pilozzi E, Liu C, Negrini M, Cavazzini L, Volinia S, Alder H, Ruco L, Baldassarre G, Croce C, Vecchione A: E2F1-regulated microRNAs impair TGFbeta-dependent cell-cycle arrest and apoptosis in gastric cancer. Cancer Cell. 2008, 13 (3): 272-286. 10.1016/j.ccr.2008.02.013.
Aguilar-Ruiz JS: Shifting and scaling patterns from gene expression data. Bioinformatics. 2005, 21 (20): 3840-3845. 10.1093/bioinformatics/bti641.
Jaroszewicz S, Simovici D, Kuo W, Ohno-Machado L: The Goodman-Kruskal coefficient and its applications in genetic diagnosis of cancer. IEEE Trans Biomed Eng. 2004, 51 (7): 1095-1102. 10.1109/TBME.2004.827267.
Gamberger D, Lavrac N, Krstacic A, Krstacic G: Clinical data analysis based on iterative subgroup discovery: Experiments in brain ischaemia data analysis. Applied Intelligence. 2007, 27: 205-217. 10.1007/s10489-007-0068-9.
Landwehr N, Mielikainen T, Eronen L, Toivonen H, Mannila H: Constrained Hidden Markov Models for Population-based Haplotyping. Bioinformatics. 2007, 8: S9-10.1186/1471-2105-8-59.
Schlicker A, Rahnenfuhrer J, Albrecht M, Lengauer T, Domingues FS: GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biology. 2007, 8 (3): R33-10.1186/gb-2007-8-3-r33.
Barbosa-Silva A, Satagopam VP, Schneider R, Ortega JM: Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence. BMC Bioinformatics. 2008, 9: 141-10.1186/1471-2105-9-141.
Kallewaard N, McKinney B, Gu Y, Chen A, Venkataram B, Crowe JE Jr: Functional maturation of the human antibody response to rotavirus. The Journal of Immunology. 2008, 180: 3980-3989.
Moore JH, Barney N, Tsai C, Chiang F, Gui J, White B: Symbolic modeling of epistasis. Hum Hered. 2007, 63 (2): 120-33. 10.1159/000099184.
Schwarz U, Ritchie M, Bradford Y, Li C, Dudek S, Frye-Anderson A, Kim R, Roden D, Stein C: Genetic determinants of response to warfarin during initial anticoagulation. The New England Journal of Medicine. 2008, 358 (10): 999-1008. 10.1056/NEJMoa0708078.
Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36: D480-D484. 10.1093/nar/gkm882.
Murali T, Wu C, Kasif S: The art of gene function prediction. Nat Biotechnol. 2006, 24 (12): 1474-5. 10.1038/nbt1206-1474.
Viroli M, Ricci A, Omicini A: Operating Instructions for Intelligent Agent Coordination. The Knowledge Engineering Review. 2006, 21: 49-69. 10.1017/S0269888906000774.
Vaca P, Martin F, Vegara-Meseguer J, Rovira J, Berna G, Soria B: Induction of differentiation of embryonic stem cells into insulin-secreting cells by fetal soluble factors. Stem Cells. 2006, 24 (2): 258-65. 10.1634/stemcells.2005-0058.