Visual programming for next-generation sequencing data analytics
© Milicchio et al. 2016
Received: 22 January 2016
Accepted: 21 April 2016
Published: 27 April 2016
High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a ‘cultural’ gap between the end user and the developer.
Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal developing environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the run of standardized NGS pipelines and the development of new workflows based on technological advancements or users’ needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations.
In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.
KeywordsNext-generation sequencing High-throughput sequencing Big data Template library Generic programming Visual programming Graphical user interface Software suite
High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework for basic and translational research in biomedical sciences and clinical diagnostics [1–3]. The applications of NGS are almost endless, spanning many ‘–omics’ fields, such as genomics, transcriptomics, and metabolomics [3–11]. Nowadays, it is possible to sequence any microbial organism or metagenomic sample within hours and to obtain human genomes in weeks. By sequencing the entire genome in targeted patients, it is possible to identify genes and regulatory elements related to pathophysiological conditions. Genome-wide association studies and analysis of gene expression, usually made via well-established microarray techniques, can now be done via NGS, e.g. RNA-Seq[uencing]. NGS allows for full genome characterization of other organisms besides the human genome, including known pathogens, and yet-to-be-identified bacterial, viral, or fungal species that may pose a public health threat . Another growing application of NGS is microbial community analysis. The diverse host-associated microbiota has received intense research interests for its potential associations with human health outcomes . With few modifications in sample preparation protocols, a single NGS machine can offer the scientist an abundance of data for exploring multi-domain research questions.
Several NGS platforms and sequencing technologies are available . Technology providers include Illumina Inc. , Thermo Fisher Scientific , Roche , and Pacific Biosciences . NGS services are available at a comparable price to established sequencing methods such as Sanger, although with considerably greater data output [19–21].
Whereas the traditional Sanger  approach produces contiguous nucleotide sequence reads between 400 and 700 bases with a throughput of 50-30,000 kilobases per hour, NGS reaches a throughput of 10-600 gigabases per hour, producing reads up to 700 nucleotide bases long , and Pacific Biosciences broke the 10,000+ bases length record. The terabyte-size of nucleotide sequence data per run is becoming a reality, which will further lower per-sample sequencing cost [23, 24]. The fourth-generation of Oxford’s Nanopore-based sequencers have the potential to reduce the cost for sequencing an entire human genome from the fairly recent $1,000 target  to an astounding $100 [26, 27]. The decreasing trend of the cost-per-base of DNA sequence since 2008 even exceeded Moore’s law , i.e. the exponential growth of computing hardware capabilities, where the number of transistors in an integrated circuit doubles approximately every two years.
Ever since the first NGS machine was commercialized in 2004 by 454, the development of robust, intuitive, and easy to use analytic tools has been behind data generation capabilities. This state was defined with the evocative term “analysis paralysis” in 2010 . A landmark paper in 2012 by Vyverman et al. highlighted the limitations and needs of bioinformatics tools for a variety of complex string problems that are at the base of most NGS analytics . Five years later, analysis is no longer paralyzed. A plethora of NGS data analysis software has emerged, with considerable redundancy. Nevertheless, software development must adapt to handle fast-pace evolving technology, e.g. further data inflation resulting from the Nanopore platform [31–34].
Most of the current NGS software requires dedicated bioinformaticians with access to comprehensive computational infrastructure. Just a few years ago, there was a bottleneck between data generation and inference (analyzing and making sense of the data), but nowadays, access to these bioinformatics resources is more common and affordable. The new bottleneck is the evolution of software in accordance with technological advances and users’ needs.
Comprehensive software suites for NGS analytics must be supported by an appropriate development environment. The lack of an organized programming base slows down the development of innovative applications that can be handled directly by the investigators generating the data. Biological scientists carrying out experiments at times undergo delays and difficulties in analyzing NGS data because tools customized to their needs and abilities are not readily available. Current software for NGS analytics requires medium-to-advanced level of computational proficiency. One reason is the compulsory use of high-performance computing infrastructure for analyzing most NGS data sets. Such computational arrangements should not be necessary when sequencing individual fungal, microbial or viral pathogens or when performing targeted phylogenetic studies (e.g. 16S ribosomal RNA); a desktop computer should be sufficient for analyzing bacterial data generated by platforms such as Illumina’s MiSeq. When users need to move onto a high-performance computing infrastructure for projects involving large numbers of human genome sequences, they may benefit from the availability of software they are already familiar with (i.e. the one running on their desktop machine), rather than being required to learn an entirely new set of programs. An example in statistical analytics is the SAS software system (SAS Institute Inc.), which does not require the users to change the programming syntax when migrating across different components or installations (including desktop, server, and distributed editions).
At present, software engineers who develop new algorithms and analytical tools for NGS face a lack of dedicated libraries and interoperable software, and they have to write new tools which in turn cannot be interoperable. From a developer’s perspective, many existing programs could be rewritten to be more efficient or to be parallelized homogeneously, as in hierarchical build of programs, for easy integration across various platforms. With a common software layer that abstracts interactions between data and algorithms, integrating procedures that exploit multithreading or distributed computing may be achieved without in-depth modifications of the algorithms themselves. In addition, the adoption of generic programming template libraries can homogenize programmers’ work and permit a more community-engaged software development.
Template libraries and generic programming
Summary of programming libraries/toolkits for analysis of (next-generation) sequencing data
Sequence alignment; rapid database search; protein motif identification; nucleotide sequence pattern analysis; codon usage analysis for small genomes; rapid identification of sequence patterns in large scale sequence sets; presentation tools for publication.
Data structures (e.g. graphs); nucleotide string methods (e.g. Fourier transform, Needleman-Wunsch alignment).
Access sequence data from local/remote data bases; manage data base formats; data base search; manipulating sequences/sequence alignments; gene annotations.
Repository of multiple libraries for analysis and comprehension of genomic and –omics data, including NGS.
DNA and protein sequence analysis, sequence alignment.
Parsing, compression, k-mer, suffix trees, annotation, error correction and other sequence analytics (FASTA, FASTQ)
GNU Lesser GPL
Compressed indices, text collections
Sequence analysis, phylogenetics, molecular evolution; population genetics.
GNU Lesser GPL
Manipulate biological sequences; file parse; DAS client/server support; access to BioSQL/Ensembl data bases; tools for making sequence analysis GUIs; statistical routines; dynamic programming toolkit.
Extensive set of algorithms and data structures for the analysis of nucleotide sequences, with emphasis on NGS data; includes index, compression, data base search, support for NGS-specific file formats (fastq, SAM/BAM, VCF, BED).
Sequence input/output; alignment input/output; population genetics; structural bioinformatics; SQL interface.
Read, write, edit, index, view SAM/BAM/CRAM formats; read, write BCF2/VCF/gVCF files; call, filter, summarize SNP/short indels.
DNA and protein sequence analysis, sequence alignment, biological database parsing, ontology, structural biology.
Read, write, manipulate BAM formats
Handle SAM/BAM, fastq, GLF, VCF, ASP.
GNU Lesser GPL
Read, write, manipulate multiple genomic file formats and data associated with BED type files (epigenomics).
GNU Lesser GPL
Parse of Genbank, Uniprot XML, fasta, fastq formats; wrappers for BLAST, signalP, TMHMM; index files for random access, lazy processing of sequences from very large files.
What is template generic programming?
A generic programming framework provides traits classes , i.e. an abstract representation of data types and algorithms . As a real-life example, take the LEGO® toy construction kits. A child (end user) wants a LEGO® model of a brick that resists trampling. A company (developer) may be contracted to produce new stomp-resistant brick types (data structures/functions) that respect the general specification (traits class) of LEGO® bricks (e.g. hole spacing and size), thus ensuring that, before reaching the end-user, the new brick (structure) will work with any other LEGO® toy piece. In essence, a traits class may be seen as a prescription, a specification for data structures or functions: it enforces clients to respect a list of prerequisites. More technically, a traits class is the gateway for calling a function on a generic and a priori unknown data structure, employed at compile time (that is parametric polymorphism). If the given data structure provides types and methods definitions required by the traits, then any algorithm employing such a structure may be applied to any data structure that respects the requirements. This allows developers to write algorithms that can be applied at no runtime cost to any data structure, without knowing a priori which data structure will be employed or the types involved in the process.
Template generic programming enables provision of seamless infrastructure for computational tasks and replacement of serial algorithms, or data types with multithreaded/distributed ones, without significant changes to the program structure and without additional runtime overhead, provided the adherence to a traits class. For instance, when reading molecular sequence files and associated metadata (e.g. fastq files with nucleotides and quality scores), the compiler generates an ad hoc class given a chosen structure (e.g. a list or a suffix tree); no virtual functions and inheritance will be involved, reducing the runtime overhead. One could replace a serial constructor with a distributed one (e.g. reading sequence files in parallel), or change its associated methods (e.g. a sequence corrector based on two different algorithms) without changing the program; since the software employs traits to access a generic data structure, the compiler will ensure that the change will be a valid one. Hence, when a container does not respect the requirements, the software will refuse to compile, allowing developers to assess the interoperability of data types and algorithms without any performance degradation, and to write software libraries that may be applied to any traits-abiding data structure. The advantage of a library on generic programming templates is straightforward: the library can be updated for handling new data types, increase in data size, and technology changes in computing, like a new graphic processing unit chip or a computation accelerator (e.g. Intel® Xeon Phi).
General-purpose software suites and graphical user interfaces
Low-level libraries and command-line toolsets are appropriate for software developers and programming-savvy users; they form the base to develop high-level ensemble software suites that feature a graphical user interface (GUI) with menus and premade workflows or analysis pipelines. Suites bring complex analytical procedures to the general user. For instance, user-friendly statistical software with powerful GUI includes Statistica (Dell Inc.), SAS Enterprise (SAS Institute Inc.), and SPSS (IBM Corporation). This easily accessible interface is advantageous as it can facilitate the applied and translational aspects of NGS research for those without programming knowledge, but the software must be carefully designed in order to prevent users from making errors due to lack of specialized expertise. In addition to a GUI featuring single functions and premade workflows, some high-level suites offer visual workflow builders. Visual builders are different form GUIs because they permit combination (more or less flexibly) of existing program functions into new data processing pipelines, which may not be available as pull-down menus or icons in a GUI.
Summary of all-purpose software suites for analysis of next-generation sequencing data offered with a graphical user interface option
+ third party
GNU GPL (Rabix)
+ third party
Illumina’s BaseSpace provides an exclusively cloud-based environment hosted by Amazon Inc., direct integration with sequencing instruments, and workflows as mobile-touch apps.
Galaxy, a web-browser application, is one of the most popular, open source NGS suites and effectively offers a unique developer’s platform, permitting the integration and collation of different programs. Galaxy has a substantial support from the developers’ community; both researchers and federal agencies are investing in this promising platform, which also features a visual workflow builder. However, Galaxy does not permit development of new algorithms and standalone software by itself, and it must be supported by a proper collection of low-level programs.
UGENE is another free and open source platform. The suite is multiplatform, i.e. it runs on computers with any operating system, such as MS Windows or Mac OSX by using the Qt C++ framework, incorporates multiple pipelines for NGS, and allows visual design of new ones with its built-in workflow builder, as in Galaxy (with similar limitations).
Not all procedures implementable from the command-line can be represented in the Galaxy’s workflow builder. For instance, workflows require a fixed set of inputs and therefore looping on files within is not yet available (as of November 2015). ‘Enhanced’ Galaxy frameworks like Globus Genomics or SevenBridges can overcome such workflow limitations, but require a higher technical expertise . UGENE’s builder has limitations similar to Galaxy. New features in UGENE can be requested to the developers’ team, which also provides tech and other types of support for a price. UGENE is equipped for parallel computation, but the capabilities to distribute jobs are limited by the capabilities of programs embedded (also the case for Galaxy). Nonetheless, UGENE developers’ team has steadily published new workflows for NGS, providing extensive walkthroughs for the non-specialists: recently released pipelines include: “Variant Calling with SAMtools,” “Tuxedo Pipeline for RNA-seq Data Analysis,” and “Cistrome Pipeline for ChIP-seq Data Analysis,” all currently integrated into the Unipro UGENE desktop toolkit .
We have highlighted the importance of creating a solid low-level base for NGS programming and a high-level base to scale up analytics, especially with the usage of visual tools.
In computer science, a visual programming (VP) language is a medium for implementing computer programs that makes uses of graphical operators and elements rather than textual ones. VP is not a new concept [71–74]; it has been envisioned in several ways starting from the early 1960s and has been the object of philosophical debates [75, 76]. VP is different from GUI. A GUI aids users executing programs via visual menu items in contrast to command-line (i.e. terminal) text scripting. In general, GUI menus are premade and users cannot create new programs or combine menu functions within the GUI. Conversely, a VP language has the same power as a textual programming language or a library, if it features the same functional elements (e.g. data structures and methods); therefore, new algorithms and programs can be designed and compiled within a VP, and VP can even be used to implement GUIs. Visual approaches to programming have been explored in diverse environments, including education, multimedia, system simulation and automation, data warehousing, and business intelligence, with probably the most successful example being the computer-aided design (CAD) software industry. Another extremely popular area for VP is video game design [77, 78]. Although in principle VP can be used to create algorithms starting from the lowest hierarchy of programming language elements, in practice, VP is employed for creating higher-level applications using libraries. This facilitates developers’ work when a large amount of coding (and redundant coding) is required.
How can visual programming benefit NGS software development?
Currently, there are no ‘pure’ VP approaches being developed for NGS applications. Galaxy or UGENE workflow builders can be considered rudimental VP environments, but as discussed previously, they do not offer the same set of functions as the command-line and have limited interoperability (i.e. they work only within their parent environment and cannot build independent programs). However, there is potential for improving the workflow builders using the VP approach.
Visual programming entities
The main elements of a VP language are: building blocks, block engines, block connectors, and meta-blocks. Building blocks are the basic VP pieces, like LEGO®; they can have different functions, like the different shapes of LEGO® blocks. Building blocks can represent a file parser, a read trimmer, a mapping algorithm, a k-mer graph builder, a SAM/BAM file converter, et cetera. They can be data structures, constructors, methods. Technically, building blocks are filters modelled functionally, linking connectors with input-output control, in accordance to the domain-driven design paradigm, e.g. C# or Microsoft.NET [79, 80]. Block engines perform computational procedures within a building block; for example, they read a fastq file or they map a read set to a reference genome using a specific algorithm. In the generic template framework, the algorithms can be transparently replaced (given that implementations respect devised traits classes). Block connectors are relational mappings between data structures and algorithms within functional visual blocks, i.e. the communication links among building blocks. For instance, a read mapper requires both a reference sequence (which could be indexed upon parsing) and a read set (which could be read as single- or paired-end). A metagenomics classifier requires a genomic database and the read sets. Block connectors encode these relations. Finally, meta-blocks are blocks made of multiple building blocks and connectors, i.e. whole data analysis pipelines, replaced by a single building block of a higher hierarchy (given a defined ontology for the blocks). If a user is not a developer, there is no need for them to access the whole block workflow. They can use the blocks at the highest level hierarchy which would correspond to a single icon/menu in the GUI; for instance, a “metagenomics analysis pipeline” meta-block would ask for a specific set of input files (fastq) and write a standardized output (fasta, KEGG, SEED). The more a user is acquainted with NGS algorithms, the more they can get trained into the visual programming interface, without resorting to traditional text-based programming. This is a convenient trade-off between black-box and informed design/use.
Physiognomy of visual programming
As an example, Galaxy has strengths in expertise interoperability (thanks to a nourished developers’ and users’ community), software interoperability (by incorporating a plethora of different software and toolsets), and transparency (through the web-browser interface). There is relevant, yet limited, modularity (graphic workflow builder), and scalability (tool parallelization and server installation). Flexibility is enabled by the web-browser GUI. Scalability and interoperability, however, are also subject to original tools’ capabilities and may not keep up with the pace of the current technology evolution. Recent efforts in interoperability for development and standardization of NGS pipelines (i.e. importable in Galaxy and other frameworks) have been concretized in the open source Rabix toolkit  for developing and running portable workflows based on the Common Workflow Language specification . Illumina’s BaseSpace and UGENE are strong in usability and flexibility, but are behind Galaxy in other requisites; none of the available suites has a proper VP interface.
NGS data science (or analytics) is an interdisciplinary and critical field of bioinformatics research that has gained increased attention and visibility upon the explosion of NGS technology. It is a sector which has to keep up with the tremendous advancement in sequencing yield and ‘next-NGS’ (future generation) technologies. NGS data science is feasible when proper software is available and can routinely and reliably be used, with reasonable resource spending. The development of software tools for NGS analytics is challenging, given multiple practical hurdles that include the large data sizes, data heterogeneity, and data errors. Despite the glut of NGS software released in the past years, the toolsets are not yet homogenized as they are in other fields (e.g. statistics, automation). We have reviewed the current panorama of low-level software for NGS (i.e. libraries and toolsets) used mainly by developers, and high-level suites (i.e. all-purpose programs with GUIs) used by stakeholders, biological scientists performing experiments for instance. Among the reviewed software libraries, we have identified a positive effort of the Open Bioinformatics Foundation in promoting the ‘Bio’ extensions to programming languages, such as BioJava, BioPerl, BioRuby, BioPHP, et cetera. However, these toolsets are often not well calibrated for the NGS needs (e.g. scalability of methods). Among the NGS-specific libraries, we have identified SeqAn (open source, C++) as the most promising one, because on top of its specificity, it is a generic programming template framework that can seamlessly upgrade itself. SeqAn has already been used in many proof-of-concept works providing efficient/optimized re-implementation of existing methods. Still, low-level libraries are instruments for software developers, not for end users. For the latter, high-level software suites with user-friendly GUIs are available. We have reviewed both commercial and free suites, including Galaxy and Geneious. This general-purpose software usually wraps around existing command-line tools, which may not have been programmed using consistent libraries or programming languages. The suites offer premade pipelines to analyze specific data sets (e.g. RNASeq) with a simple click, combining different programs together. Usability should be the key feature of these graphical suites. Some of the suites also offer the possibility to create ad hoc workflows, but the functionalities are limited; new programs are hard to develop. At the moment workflow design is bound to existing programs present in the suite, but some workflow builders could be modified to incorporate NGS libraries in the near future.
Visual programming is used in many sectors of software development, such as education, architecture, and video game design. The visual programming philosophy linked to generic template libraries, seen as a powerful extension of workflow builders, can be a valuable aid for improving NGS tool and workflow development. VisPro is a conceptual visual programming framework covering a number of requisites that make it appropriate for development of NGS software (in the need of scalability, transparency, usability, interoperability), especially if coupled with a powerful generic template library, like SeqAn. However, instantiating a brand new VisPro for NGS may require a tremendous development effort. On the other hand, existing general-purpose suites are supported by a large community of developers, users, and investors; even so, they have flaws and may be stalled by further technological changes.
In conclusion, visual programming could effectively bridge the gap between software developers and users needing cutting-edge software, making NGS data science fully translational. While an ex novo development of VP software specific for NGS may be unfeasible, trying to improve the visual programming capabilities of existing software and the interoperability with low-level libraries could be a preferred course of action.
The VIROGENESIS project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Xuan J, Yu Y, Qing T, Guo L, Shi L. Next-generation sequencing in the clinic: promises and challenges. Cancer Lett. 2013;340(2):284–95.View ArticlePubMedGoogle Scholar
- van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C. Ten years of next-generation sequencing technology. Trends Genet. 2014;30(9):418–26.View ArticlePubMedGoogle Scholar
- Ohashi H, Hasegawa M, Wakimoto K, Miyamoto-Sato E. Next-generation technologies for multiomics approaches including interactome sequencing. BioMed Res Int. 2015;2015:104209.View ArticlePubMedPubMed CentralGoogle Scholar
- Beggs AD, Dilworth MP. Surgery in the era of the 'omics revolution. Br J Surg. 2015;102(2):e29–40.View ArticlePubMedPubMed CentralGoogle Scholar
- Mensaert K, Denil S, Trooskens G, Van Criekinge W, Thas O, De Meyer T. Next-generation technologies and data analytical approaches for epigenomics. Environ Mol Mutagen. 2014;55(3):155–70.View ArticlePubMedGoogle Scholar
- Mason CE, Porter SG, Smith TM. Characterizing multi-omic data in systems biology. Adv Exp Med Biol. 2014;799:15–38.View ArticlePubMedGoogle Scholar
- Grada A, Weinbrecht K. Next-generation sequencing: methodology and application. J Invest Dermatol. 2013;133(8):e11.View ArticlePubMedGoogle Scholar
- Berger B, Peng J, Singh M. Computational solutions for omics data. Nat Rev Genet. 2013;14(5):333–46.View ArticlePubMedPubMed CentralGoogle Scholar
- Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11(1):31–46.View ArticlePubMedGoogle Scholar
- Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155(1):27–38.View ArticlePubMedPubMed CentralGoogle Scholar
- Hawkins RD, Hon GC, Ren B. Next-generation genomics: an integrative approach. Nat Rev Genet. 2010;11(7):476–86.PubMedPubMed CentralGoogle Scholar
- Azarian T, Cook RL, Johnson JA, Guzman N, McCarter YS, Gomez N, McCarter YS, Gomez N, Rathore MH, Morris JGJ, Salemi M. Whole-Genome Sequencing for Outbreak Investigations of Methicillin-Resistant Staphylococcus aureus in the Neonatal Intensive Care Unit: Time for Routine Practice? Infect Control Hosp Epidemiol. 2015;FirstView:1–9.Google Scholar
- Berger G, Bitterman R, Azzam ZS. The human microbiota: the rise of an “empire”. Rambam Maimonides Med J. 2015;6(2):e0018.View ArticlePubMedPubMed CentralGoogle Scholar
- Buermans HP, den Dunnen JT. Next generation sequencing technology: Advances and applications. Biochim Biophys Acta. 2014;1842(10):1932–41.View ArticlePubMedGoogle Scholar
- Illumina Inc. [http://www.illumina.com/]. Accessed 25 Apr 2016.
- James F. Welles Replies. J Infor Ethics. 2012;21(1):5–6.Google Scholar
- Roche Sequencing. [http://sequencing.roche.com/]. Accessed 25 Apr 2016.
- Pacific Biosciences. [http://www.pacb.com/].
- Facio FM, Lee K, O’Daniel JM. A genetic counselor’s guide to using next-generation sequencing in clinical practice. J Genet Couns. 2014;23(4):455–62.View ArticlePubMedGoogle Scholar
- Aronson N. Making personalized medicine more affordable. Ann N Y Acad Sci. 2015;1346(1):81-9. doi:https://doi.org/10.1111/nyas.12614. Epub 2015 Feb 27.
- Desai AN, Jere A. Next-generation sequencing: ready for the clinics? Clin Genet. 2012;81(6):503–10.View ArticlePubMedGoogle Scholar
- Sanger F, Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol. 1975;94(3):441–8.View ArticlePubMedGoogle Scholar
- Niedringhaus TP, Milanova D, Kerby MB, Snyder MP, Barron AE. Landscape of next-generation sequencing technologies. Anal Chem. 2011;83(12):4327–41.View ArticlePubMedPubMed CentralGoogle Scholar
- el Bahassi M, Stambrook PJ. Next-generation sequencing technologies: breaking the sound barrier of human genetics. Mutagenesis. 2014;29(5):303–10.View ArticleGoogle Scholar
- Service RF. Gene sequencing. The race for the $1000 genome. Science. 2006;311(5767):1544–6.View ArticlePubMedGoogle Scholar
- Feng Y, Zhang Y, Ying C, Wang D, Du C. Nanopore-based Fourth-generation DNA Sequencing Technology. Genomics Proteomics Bioinformatics. 2015;13(1):4–16.View ArticlePubMedPubMed CentralGoogle Scholar
- Ying YL, Zhang J, Gao R, Long YT. Nanopore-based sequencing and detection of nucleic acids. Angew Chem Int Ed Engl. 2013;52(50):13154–61.View ArticlePubMedGoogle Scholar
- DNA Sequencing Costs. [http://www.genome.gov/sequencingcosts/]. Accessed 25 Apr 2016.
- Baker M. Next-generation sequencing: adjusting to data overload. Nat Meth. 2010;7(7):495–9.View ArticleGoogle Scholar
- Vyverman M, De Baets B, Fack V, Dawyndt P. Prospects and limitations of full-text index structures in genome analysis. Nucleic Acids Res. 2012;40(15):6993–7015.View ArticlePubMedPubMed CentralGoogle Scholar
- Bao R, Huang L, Andrade J, Tan W, Kibbe WA, Jiang H, Feng G. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Informat. 2014;13 Suppl 2:67–82.Google Scholar
- Finotello F, Di Camillo B. Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis. Brief Funct Genomics. 2015;14(2):130–42.View ArticlePubMedGoogle Scholar
- Yu B. Setting up next-generation sequencing in the medical laboratory. Methods Mol Biol. 2014;1168:195–206.View ArticlePubMedGoogle Scholar
- Shyr C, Kushniruk A, Wasserman WW. Usability study of clinical exome analysis software: top lessons learned and recommendations. J Biomed Inform. 2014;51:129–36.View ArticlePubMedGoogle Scholar
- SEQanswers’ List of Next Generation Sequencing Software. [http://seqanswers.com/wiki/Software/list]. Accessed 25 Apr 2016.
- Barnett DW, Garrison EK, Quinlan AR, Stromberg MP, Marth GT. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011;27(12):1691–2.View ArticlePubMedPubMed CentralGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Nordell Markovits A, Joly Beauparlant C, Toupin D, Wang S, Droit A, Gevry N. NGS++: a library for rapid prototyping of epigenomics software tools. Bioinformatics. 2013;29(15):1893–4.View ArticlePubMedGoogle Scholar
- Plieskatt J, Rinaldi G, Brindley PJ, Jia X, Potriquet J, Bethony J, Mulvenna J. Bioclojure: a functional library for the manipulation of biological sequences. Bioinformatics. 2014;30(17):2537–9.View ArticlePubMedPubMed CentralGoogle Scholar
- libStatGen. [https://github.com/statgen/libStatGen/]. Accessed 25 Apr 2016.
- Pitt WR, Williams MA, Steven M, Sweeney B, Bleasby AJ, Moss DS. The Bioinformatics Template Library--generic components for biocomputing. Bioinformatics. 2001;17(8):729–37.View ArticlePubMedGoogle Scholar
- Dutheil J, Gaillard S, Bazin E, Glemin S, Ranwez V, Galtier N, Belkhir K. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinf. 2006;7:188.View ArticleGoogle Scholar
- Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16(6):276–7.View ArticlePubMedGoogle Scholar
- Goto N, Prins P, Nakao M, Bonnal R, Aerts J, Katayama T. BioRuby: bioinformatics software for the Ruby programming language. Bioinformatics. 2010;26(20):2617–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–3.View ArticlePubMedPubMed CentralGoogle Scholar
- Holland RC, Down TA, Pocock M, Prlic A, Huen D, James K, Foisy S, Drager A, Yates A, Heuer M, et al. BioJava: an open-source framework for bioinformatics. Bioinformatics. 2008;24(18):2096–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12(10):1611–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Open Bioinformatics foundation. [http://www.open-bio.org/]. Accessed 25 Apr 2016.
- Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015;12(2):115–21.View ArticlePubMedPubMed CentralGoogle Scholar
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10):R80. Epub 2004 Sep 15.Google Scholar
- Mangalam H. The Bio* toolkits--a brief overview. Brief Bioinform. 2002;3(3):296–302.View ArticlePubMedGoogle Scholar
- Doring A, Weese D, Rausch T, Reinert K. SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinf. 2008;9:11.View ArticleGoogle Scholar
- Gogol-Döring A, Reinert K. Biological sequence analysis using the SeqAn C++ library. Boca Raton: CRC Press; 2010.Google Scholar
- Mason CE, Zumbo P, Sanders S, Folk M, Robinson D, Aydt R, Gollery M, Welsh M, Olson NE, Smith TM. Standardizing the Next Generation of Bioinformatics Software Development with BioHDF (HDF5). Adv Comput Biol. 2010;680:693–700.View ArticleGoogle Scholar
- Rahn R, Weese D, Reinert K. Journaled string tree-a scalable data structure for analyzing thousands of similar genomes on your laptop. Bioinformatics. 2014;30(24):3499–505.View ArticlePubMedGoogle Scholar
- Schulz MH, Weese D, Holtgrewe M, Dimitrova V, Niu S, Reinert K, Richard H. Fiona: a parallel and automatic strategy for read error correction. Bioinformatics. 2014;30(17):i356–363.View ArticlePubMedPubMed CentralGoogle Scholar
- Hauswedell H, Singer J, Reinert K. Lambda: the local aligner for massive biological data. Bioinformatics. 2014;30(17):i349–355.View ArticlePubMedPubMed CentralGoogle Scholar
- Gremme G, Steinbiss S, Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform. 2013;10(3):645–56.View ArticlePubMedGoogle Scholar
- Stroustrup B. The C++ Programming Language (4th Edition). Boston, MA, USA: Addison-Wesley Professional; 2013.Google Scholar
- Pataki N, Porkolab Z. Extension of iterator traits in the C++ Standard Template Library. In: Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on: 18-21 Sept. 2011. 2011. p. 911–4.Google Scholar
- Illumina’s BaseSpace. [https://basespace.illumina.com/]
- CLCBio. [http://www.clcbio.com/]
- DNASTAR. [http://www.dnastar.com/]
- Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005;15(10):1451–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Goecks J, Nekrutenko A, Taylor J, Galaxy T. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86.View ArticlePubMedPubMed CentralGoogle Scholar
- Madduri RK, Sulakhe D, Lacinski L, Liu B, Rodriguez A, Chard K, Dave UJ, Foster IT. Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services. Concurr Comput. 2014;26(13):2266–79.View ArticlePubMedPubMed CentralGoogle Scholar
- Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, Gillespie JJ, Gough R, Hix D, Kenyon R, et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 2014;42(Database issue):D581–591.View ArticlePubMedGoogle Scholar
- Golosova O, Henderson R, Vaskin Y, Gabrielian A, Grekhov G, Nagarajan V, Oler AJ, Quinones M, Hurt D, Fursov M, et al. Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses. PeerJ. 2014;2:e644.View ArticlePubMedPubMed CentralGoogle Scholar
- Okonechnikov K, Golosova O, Fursov M, Team U. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166–7.View ArticlePubMedGoogle Scholar
- Glinert EP. Visual Programming Environments: Paradigms and Systems. Los Alamitos, CA, USA: IEEE Computer Society Press; 1990.Google Scholar
- Shu N. Visual Programming Languages: A Perspective and a Dimensional Analysis. In: Chang S-K, Ichikawa T, Ligomenides P, editors. Visual Languages. US: Springer; 1986. p. 11–34.View ArticleGoogle Scholar
- Cypher A, editor. Watch what I do: programming by demonstration. Cambridge, MA, USA: MIT Press; 1993.Google Scholar
- Lieberman H, editor. Your wish is my command: programming by example. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc; 2001.Google Scholar
- Brooks R. Watch What I Do - Programming by Demonstration - Cypher,A. Int J Man Mach Stud. 1993;39(6):1054–5.Google Scholar
- Green TRG, Petre M. Usability analysis of visual programming environments: A ‘cognitive dimensions’ framework. J Visual Lang Comput. 1996;7(2):131–74.View ArticleGoogle Scholar
- MacLaurin M. The Design of Kodu: A Tiny Visual Programming Language for Children on the Xbox 360. Acm Sigplan Notices. 2011;46(1):241–5.View ArticleGoogle Scholar
- Busby J, Parrish Z, Wilson J. Mastering Unreal technology. Indianapolis: Sams; 2010.Google Scholar
- Evans E. Domain-driven design : tackling complexity in the heart of software. Boston: Addison-Wesley; 2004.Google Scholar
- Nilsson J. Applying domain-driven design and patterns: with examples in C# and.NET. Upper Saddle River: Addison-Wesley; 2006.Google Scholar
- Jain R. Agile Software Development: Adaptive Systems Principles and Best Practices. Inf Syst Manag. 2006;23(3):19–30.View ArticleGoogle Scholar
- Memon AM, Pollack ME, Soffa ML. Using a goal-driven approach to generate test cases for GUIs. In: Proceedings of the 21st international conference on Software engineering; Los Angeles, California, USA. 302632: ACM 1999: 257-266Google Scholar
- IEEE 1012. [https://standards.ieee.org/findstds/standard/1012-2012.html]. Accessed 25 Apr 2016.
- SEQanswers. [http://seqanswers.com/]. Accessed 25 Apr 2016.
- GitHub. [https://github.com/]. Accessed 25 Apr 2016.
- Rabix: Reproducible Analyses for Bioinformatics. [https://www.rabix.org/]. Accessed 25 Apr 2016.
- The Common Workflow Language (CWL). [http://www.commonwl.org]. Accessed 25 Apr 2016.
- Milicchio F, Paoluzzi A, Bertoli C. A Visual Approach To Geometric Programming. Comput-Aided Des Applic. 2005;2:411–20.View ArticleGoogle Scholar
- Bottaro A, Marino E, Milicchio F, Paoluzzi A, Rosina M, Spini F. Visual Programming of Location-Based Services. In: Smith M, Salvendy G, editors. Human Interface and the Management of Information Interacting with Information, vol. 6771. Berlin Heidelberg: Springer; 2011. p. 3–12.View ArticleGoogle Scholar
- Dimou A, Verborgh R, Sande MV, Mannens E, Walle RVd. Machine-interpretable dataset and service descriptions for heterogeneous data access and retrieval. In: Proceedings of the 11th International Conference on Semantic Systems; Vienna, Austria. 2814873: ACM 2015: 145-152Google Scholar
- Lanthaler M, Gütl C. On using JSON-LD to create evolvable RESTful services. In: Proceedings of the Third International Workshop on RESTful Design; Lyon, France. 2307827: ACM 2012: 25-32Google Scholar
- Liu HJ, Luo P, Wang DS. A distributed expansible authentication model based on Kerberos. J Netw Comput Appl. 2008;31(4):472–86.View ArticleGoogle Scholar
- Butler F, Cervesato I, Jaggard AD, Scedrov A, Walstad C. Formal analysis of Kerberos 5. Theor Comput Sci. 2006;367(1-2):57–87.View ArticleGoogle Scholar
- Makinen V. Compressed Full-Text Indexes. Acm Comput Surv. 2007;39(1):1–61.View ArticleGoogle Scholar