Articles


Soft document clustering using a novel graph covering approach

In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organi...

Authors: Jens Dörpinghaus, Sebastian Schaaf and Marc Jacobs

Citation: BioData Mining 2018 11:11

Content type: Methodology Published on: 14 June 2018

Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi

The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources (“assays”), such as in vitro bioas...

Authors: Kimberly T. To, Rebecca C. Fry and David M. Reif

Citation: BioData Mining 2018 11:10

Content type: Research Published on: 13 June 2018

Feature selection for gene prediction in metagenomic fragments

Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene...

Authors: Amani Al-Ajlan and Achraf El Allali

Citation: BioData Mining 2018 11:9

Content type: Methodology Published on: 7 June 2018

Gene set analysis methods: a systematic comparison

Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have be...

Authors: Ravi Mathur, Daniel Rotroff, Jun Ma, Ali Shojaie and Alison Motsinger-Reif

Citation: BioData Mining 2018 11:8

Content type: Research Published on: 31 May 2018

Connecting genetics and gene expression data for target prioritisation and drug repositioning

Developing new drugs continues to be a highly inefficient and costly business. By repurposing an existing compound for a different indication, drug repositioning offers an attractive alternative to traditional...

Authors: Enrico Ferrero and Pankaj Agarwal

Citation: BioData Mining 2018 11:7

Content type: Short report Published on: 31 May 2018

Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV)

Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning...

Authors: Elizabeth R. Piette and Jason H. Moore

Citation: BioData Mining 2018 11:6

Content type: Methodology Published on: 19 April 2018

Collective feature selection to identify crucial epistatic variants

Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remai...

Authors: Shefali S. Verma, Anastasia Lucas, Xinyuan Zhang, Yogasudha Veturi, Scott Dudek, Binglan Li, Ruowang Li, Ryan Urbanowicz, Jason H. Moore, Dokyoon Kim and Marylyn D. Ritchie

Citation: BioData Mining 2018 11:5

Content type: Research Published on: 19 April 2018

Pairwise gene GO-based measures for biclustering of high-dimensional expression data

Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be use...

Authors: Juan A. Nepomuceno, Alicia Troncoso, Isabel A. Nepomuceno-Chamorro and Jesús S. Aguilar-Ruiz

Citation: BioData Mining 2018 11:4

Content type: Research Published on: 27 March 2018

A novel joint analysis framework improves identification of differentially expressed genes in cross disease transcriptomic analysis

Detecting differentially expressed (DE) genes between disease and normal control group is one of the most common analyses in genome-wide transcriptomic data. Since most studies don’t have a lot of samples, res...

Authors: Wenyi Qin and Hui Lu

Citation: BioData Mining 2018 11:3

Content type: Methodology Published on: 20 February 2018

Investigating the parameter space of evolutionary algorithms

Evolutionary computation (EC) has been widely applied to biological and biomedical data. The practice of EC involves the tuning of many parameters, such as population size, generation count, selection size, an...

Authors: Moshe Sipper, Weixuan Fu, Karuna Ahuja and Jason H. Moore

Citation: BioData Mining 2018 11:2

Content type: Research Published on: 17 February 2018

The Correction to this article has been published in BioData Mining 2019 12:22

Identification of influential observations in high-dimensional cancer survival data through the rank product test

Survival analysis is a statistical technique widely used in many fields of science, in particular in the medical area, and which studies the time until an event of interest occurs. Outlier detection in this co...

Authors: Eunice Carrasquinha, André Veríssimo, Marta B. Lopes and Susana Vinga

Citation: BioData Mining 2018 11:1

Content type: Research Published on: 14 February 2018

Scalable non-negative matrix tri-factorization

Matrix factorization is a well established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disea...

Authors: Andrej Čopar, Marinka Žitnik and Blaž Zupan

Citation: BioData Mining 2017 10:41

Content type: Research Published on: 29 December 2017

An automated pipeline for bouton, spine, and synapse detection of in vivo two-photon images

In the nervous system, the neurons communicate through synapses. The size, morphology, and connectivity of these synapses are significant in determining the functional properties of the neural network. Therefo...

Authors: Qiwei Xie, Xi Chen, Hao Deng, Danqian Liu, Yingyu Sun, Xiaojuan Zhou, Yang Yang and Hua Han

Citation: BioData Mining 2017 10:40

Content type: Research Published on: 20 December 2017

Sparse generalized linear model with L₀ approximation for feature selection and prediction with big omics data

Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L₁, SCAD and MC+. However, none of...

Authors: Zhenqiu Liu, Fengzhu Sun and Dermot P. McGovern

Citation: BioData Mining 2017 10:39

Content type: Methodology Published on: 19 December 2017

TSPmap, a tool making use of traveling salesperson problem solvers in the efficient and accurate construction of high-density genetic linkage maps

Recent advances in nucleic acid sequencing technologies have led to a dramatic increase in the number of markers available to generate genetic linkage maps. This increased marker density can be used to improve...

Authors: J. Grey Monroe, Zachariah A. Allen, Paul Tanger, Jack L. Mullen, John T. Lovell, Brook T. Moyers, Darrell Whitley and John K. McKay

Citation: BioData Mining 2017 10:38

Content type: Software article Published on: 19 December 2017

Cluster ensemble based on Random Forests for genetic data

Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data...

Authors: Luluah Alhusain and Alaaeldin M. Hafez

Citation: BioData Mining 2017 10:37

Content type: Methodology Published on: 15 December 2017

PMLB: a large benchmark suite for machine learning evaluation and comparison

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world ...

Authors: Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz and Jason H. Moore

Citation: BioData Mining 2017 10:36

Content type: Research Published on: 11 December 2017

Ten quick tips for machine learning in computational biology

Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experi...

Authors: Davide Chicco

Citation: BioData Mining 2017 10:35

Content type: Review Published on: 8 December 2017

Artificial intelligence: more human with human

Authors: Moshe Sipper and Jason H. Moore

Citation: BioData Mining 2017 10:34

Content type: Editorial Published on: 1 December 2017

OCDD: an obesity and co-morbid disease database

Obesity is a medical condition that is known for increased body mass index (BMI). It is also associated with chronic low level inflammation. Obesity disrupts the immune-metabolic homeostasis by changing the se...

Authors: Indrani Ray, Anindya Bhattacharya and Rajat K. De

Citation: BioData Mining 2017 10:33

Content type: Research Published on: 21 November 2017

Metrics to estimate differential co-expression networks

Detecting the differences in gene expression data is important for understanding the underlying molecular mechanisms. Although the differentially expressed genes are a large component, differences in correlati...

Authors: Elpidio-Emmanuel Gonzalez-Valbuena and Víctor Treviño

Citation: BioData Mining 2017 10:32

Content type: Methodology Published on: 10 November 2017

Methods for enhancing the reproducibility of biomedical research findings using electronic health records

The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion o...

Authors: Spiros Denaxas, Kenan Direk, Arturo Gonzalez-Izquierdo, Maria Pikoula, Aylin Cakiroglu, Jason Moore, Harry Hemingway and Liam Smeeth

Citation: BioData Mining 2017 10:31

Content type: Review Published on: 11 September 2017

RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study

Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess glo...

Authors: Bork A. Berghoff, Torgny Karlsson, Thomas Källman, E. Gerhart H. Wagner and Manfred G. Grabherr

Citation: BioData Mining 2017 10:30

Content type: Research Published on: 5 September 2017

Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network

The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects ...

Authors: Mina Moradi Kordmahalleh, Mohammad Gorji Sefidmazgi, Scott H. Harrison and Abdollah Homaifar

Citation: BioData Mining 2017 10:29

Content type: Methodology Published on: 3 August 2017

Genetically improved BarraCUDA

BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised u...

Authors: W. B. Langdon and Brian Yee Hong Lam

Citation: BioData Mining 2017 10:28

Content type: Short Report Published on: 2 August 2017

nRC: non-coding RNA Classifier based on structural features

Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically...

Authors: Antonino Fiannaca, Massimo La Rosa, Laura La Paglia, Riccardo Rizzo and Alfonso Urso

Citation: BioData Mining 2017 10:27

Content type: Research Published on: 1 August 2017

Evolutionary computation: the next major transition of artificial intelligence?

Authors: Moshe Sipper, Randal S. Olson and Jason H. Moore

Citation: BioData Mining 2017 10:26

Content type: Editorial Published on: 29 July 2017

Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals

The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits...

Authors: Emily R. Holzinger, Shefali S. Verma, Carrie B. Moore, Molly Hall, Rishika De, Diane Gilbert-Diamond, Matthew B. Lanktree, Nathan Pankratz, Antoinette Amuzu, Amber Burt, Caroline Dale, Scott Dudek, Clement E. Furlong, Tom R. Gaunt, Daniel Seung Kim, Helene Riess…

Citation: BioData Mining 2017 10:25

Content type: Research Published on: 24 July 2017

The Dark Proteome Database

Recently we surveyed the dark-proteome, i.e., regions of proteins never observed by experimental structure determination and inaccessible to homology modelling. Surprisingly, we found that most of the dark pro...

Authors: Nelson Perdigão, Agostinho C. Rosa and Seán I. O’Donoghue

Citation: BioData Mining 2017 10:24

Content type: Software article Published on: 20 July 2017

epiACO - a method for identifying epistasis based on ant Colony optimization algorithm

Identifying epistasis or epistatic interactions, which refer to nonlinear interaction effects of single nucleotide polymorphisms (SNPs), is essential to understand disease susceptibility and to detect genetic ...

Authors: Yingxia Sun, Junliang Shang, Jin-Xing Liu, Shengjun Li and Chun-Hou Zheng

Citation: BioData Mining 2017 10:23

Content type: Methodology Published on: 6 July 2017

Arete – candidate gene prioritization using biological network topology with additional evidence types

Refinement of candidate gene lists to select the most promising candidates for further experimental verification remains an essential step between high-throughput exploratory analysis and the discovery of spec...

Authors: Artem Lysenko, Keith Anthony Boroevich and Tatsuhiko Tsunoda

Citation: BioData Mining 2017 10:22

Content type: Software article Published on: 6 July 2017

EFS: an ensemble feature selection tool implemented as R-package and web-application

Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies...

Authors: Ursula Neumann, Nikita Genze and Dominik Heider

Citation: BioData Mining 2017 10:21

Content type: Software Article Published on: 27 June 2017

Computational dynamic approaches for temporal omics data with applications to systems medicine

Modeling and predicting biological dynamic systems and simultaneously estimating the kinetic structural and functional parameters are extremely important in systems and computational biology. This is key for u...

Authors: Yulan Liang and Arpad Kelemen

Citation: BioData Mining 2017 10:20

Content type: Review Published on: 17 June 2017

Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases

Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had ...

Authors: Jason H. Moore, Peter C. Andrews, Randal S. Olson, Sarah E. Carlson, Curt R. Larock, Mario J. Bulhoes, James P. O’Connor, Ellen M. Greytak and Steven L. Armentrout

Citation: BioData Mining 2017 10:19

Content type: Methodology Published on: 30 May 2017

Gene Set Enrichment Analyses: lessons learned from the heart failure phenotype

Genetic studies for complex diseases have predominantly discovered main effects at individual loci, but have not focused on genomic and environmental contexts important for a phenotype. Gene Set Enrichment Ana...

Authors: Vinicius Tragante, Johannes M. I. H. Gho, Janine F. Felix, Ramachandran S. Vasan, Nicholas L. Smith, Benjamin F. Voight, Colin Palmer, Pim van der Harst, Jason H. Moore and Folkert W. Asselbergs

Citation: BioData Mining 2017 10:18

Content type: Methodology Published on: 26 May 2017

Vinasse fertirrigation alters soil resistome dynamics: an analysis based on metagenomic profiles

Every year around 300 Gl of vinasse, a by-product of ethanol distillation in sugarcane mills, are flushed into more than 9 Mha of sugarcane cropland in Brazil. This practice links fermentation waste management...

Authors: Lucas P. P. Braga, Rafael F. Alves, Marina T. F. Dellias, Acacio A. Navarrete, Thiago O. Basso and Siu M. Tsai

Citation: BioData Mining 2017 10:17

Content type: Short report Published on: 23 May 2017

The optimal crowd learning machine

Any family of learning machines can be combined into a single learning machine using various methods with myriad degrees of usefulness.

Authors: Bilguunzaya Battogtokh, Majid Mojirsheibani and James Malley

Citation: BioData Mining 2017 10:16

Content type: Research Published on: 19 May 2017

Study of Meta-analysis strategies for network inference using information-theoretic approaches

Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data ha...

Authors: Ngoc C. Pham, Benjamin Haibe-Kains, Pau Bellot, Gianluca Bontempi and Patrick E. Meyer

Citation: BioData Mining 2017 10:15

Content type: Methodology Published on: 6 May 2017

Feature analysis for classification of trace fluorescent labeled protein crystallization images

Large number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers for predicting the presence of crystals or phases of the crystallization process. The exc...

Authors: Madhav Sigdel, Imren Dinc, Madhu S. Sigdel, Semih Dinc, Marc L. Pusey and Ramazan S. Aygun

Citation: BioData Mining 2017 10:14

Content type: Research Published on: 27 April 2017

Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery

A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model comp...

Authors: Nathaniel M. Crabtree, Jason H. Moore, John F. Bowyer and Nysia I. George

Citation: BioData Mining 2017 10:13

Content type: Methodology Published on: 24 April 2017

Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution

Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs...

Authors: Nestor Rodriguez and Sergio Rojas–Galeano

Citation: BioData Mining 2017 10:12

Content type: Methodology Published on: 15 March 2017

Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces

Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities....

Authors: Elishai Ezra Tsur

Citation: BioData Mining 2017 10:11

Content type: Software article Published on: 11 March 2017

Label-free data standardization for clinical metabolomics

In metabolomics, thousands of substances can be detected in a single assay. This capacity motivates the development of metabolomics testing, which is currently a very promising option for improving laboratory ...

Authors: Petr G. Lokhov, Dmitri L. Maslov, Oleg N. Kharibin, Elena E. Balashova and Alexander I. Archakov

Citation: BioData Mining 2017 10:10

Content type: Methodology Published on: 28 February 2017

Variant Set Enrichment: an R package to identify disease-associated functional genomic regions

Genetic predispositions to diseases populate the noncoding regions of the human genome. Delineating their functional basis can inform on the mechanisms contributing to disease development. However, this remain...

Authors: Musaddeque Ahmed, Richard C. Sallari, Haiyang Guo, Jason H. Moore, Housheng Hansen He and Mathieu Lupien

Citation: BioData Mining 2017 10:9

Content type: Methodology Published on: 22 February 2017

Erratum to: Meta-analytic support vector machine for integrating multiple omics data

Authors: SungHwan Kim, Jae-Hwan Jhong, JungJun Lee and Ja-Yong Koo

Citation: BioData Mining 2017 10:8

Content type: Erratum Published on: 14 February 2017

The original article was published in BioData Mining 2017 10:2

Semantics-based plausible reasoning to extend the knowledge coverage of medical knowledge bases for improved clinical decision support

Capturing complete medical knowledge is challenging, often due to incomplete patient Electronic Health Records (EHR), but also because of valuable, tacit medical knowledge hidden away in physicians’ experiences...

Authors: Hossein Mohammadhassanzadeh, William Van Woensel, Samina Raza Abidi and Syed Sibte Raza Abidi

Citation: BioData Mining 2017 10:7

Content type: Methodology Published on: 10 February 2017

Elevated transcriptional levels of aldolase A (ALDOA) associates with cell cycle-related genes in patients with NSCLC and several solid tumors

Aldolase A (ALDOA) is one of the glycolytic enzymes primarily found in the developing embryo and adult muscle. Recently, a new role of ALDOA in several cancers has been proposed. However, the underlying mechan...

Authors: Fan Zhang, Jie-Diao Lin, Xiao-Yu Zuo, Yi-Xuan Zhuang, Chao-Qun Hong, Guo-Jun Zhang, Xiao-Jiang Cui and Yu-Kun Cui

Citation: BioData Mining 2017 10:6

Content type: Research Published on: 7 February 2017

Gene set analysis controlling for length bias in RNA-seq experiments

In gene set analysis, the researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. With the rapid development of high through...

Authors: Xing Ren, Qiang Hu, Song Liu, Jianmin Wang and Jeffrey C. Miecznikowski

Citation: BioData Mining 2017 10:5

Content type: Methodology Published on: 6 February 2017

A feature selection method based on multiple kernel learning with expression profiles of different types

With the development of high-throughput technology, the researchers can acquire large number of expression data with different types from several public databases. Because most of these data have small number ...

Authors: Wei Du, Zhongbo Cao, Tianci Song, Ying Li and Yanchun Liang

Citation: BioData Mining 2017 10:4

Content type: Methodology Published on: 2 February 2017

Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data

The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Even if many earlier studies worked on iden...

Authors: Hyeonjeong Lee and Miyoung Shin

Citation: BioData Mining 2017 10:3

Content type: Methodology Published on: 1 February 2017
