Table 1 Summary of programming libraries/toolkits for analysis of (next-generation) sequencing data

From: Visual programming for next-generation sequencing data analytics

| Library | Release | Language | License | Features |
| --- | --- | --- | --- | --- |
| EMBOSS [43] | 2000 | C | GNU GPL | Sequence alignment; rapid database search; protein motif identification; nucleotide sequence pattern analysis; codon usage analysis for small genomes; rapid identification of sequence patterns in large-scale sequence sets; presentation tools for publication. |
| BTL [41] | 2001 | C++ | GNU GPL | Data structures (e.g. graphs); nucleotide string methods (e.g. Fourier transform, Needleman-Wunsch alignment). |
| Bioperl [47] | 2002 | Perl | Artistic License; GNU GPL | Access to sequence data from local/remote databases; management of database formats; database search; manipulation of sequences and sequence alignments; gene annotations. |
| Bioconductor [50] | 2003 | R | GNU GPL | Repository of multiple libraries for analysis and comprehension of genomic and -omics data, including NGS. |
| BioPHP | 2003 | PHP | GNU GPL | DNA and protein sequence analysis; sequence alignment. |
| GenomeTools [58] | 2003 | C | Open BSD | Parsing, compression, k-mers, suffix trees, annotation, error correction and other sequence analytics (FASTA, FASTQ). |
| Pizza&Chili [94] | 2005 | C/C++ | GNU Lesser GPL | Compressed indices; text collections. |
| Bio++ [42] | 2006 | C++ | CeCILL GPL | Sequence analysis; phylogenetics; molecular evolution; population genetics. |
| Biojava [46] | 2008 | Java | GNU Lesser GPL | Manipulation of biological sequences; file parsing; DAS client/server support; access to BioSQL/Ensembl databases; tools for building sequence analysis GUIs; statistical routines; dynamic programming toolkit. |
| SeqAn [52] | 2008 | C++ | BSD 3-clause | Extensive set of algorithms and data structures for the analysis of nucleotide sequences, with emphasis on NGS data; includes indexing, compression, database search, and support for NGS-specific file formats (FASTQ, SAM/BAM, VCF, BED). |
| Biopython [45] | 2009 | Python, C | Biopython License | Sequence input/output; alignment input/output; population genetics; structural bioinformatics; SQL interface. |
| BCFtools [37] | 2009 | C | MIT/Expat; Modified BSD | Read, write, edit, index, view SAM/BAM/CRAM formats; read, write BCF2/VCF/gVCF files; call, filter, summarize SNPs/short indels. |
| BioRuby [44] | 2010 | Ruby | GNU GPL | DNA and protein sequence analysis; sequence alignment; biological database parsing; ontology; structural biology. |
| BAMTools [36] | 2011 | C++ | MIT | Read, write, manipulate BAM files. |
| libStatGen [40] | 2011 | C++ | GNU GPL | Handling of SAM/BAM, FASTQ, GLF, VCF, ASP. |
| NGS++ [38] | 2013 | C++ | GNU Lesser GPL | Read, write, manipulate multiple genomic file formats and data associated with BED-type files (epigenomics). |
| Bioclojure [39] | 2014 | Clojure | GNU Lesser GPL | Parsing of GenBank, UniProt XML, FASTA and FASTQ formats; wrappers for BLAST, SignalP, TMHMM; index files for random access; lazy processing of sequences from very large files. |
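
Several of the libraries listed above include FASTQ parsing and k-mer analysis among their features. As a rough illustration of what those two operations involve, the following is a minimal sketch in plain Python. It does not use or reproduce the API of any library in the table; the function names `read_fastq` and `count_kmers` and the two-read example input are hypothetical, and the parser assumes the simple four-lines-per-record FASTQ layout with no line wrapping.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Basic sanity checks: header marker and matching sequence/quality lengths.
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), seq, qual


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical two-read input, made up for illustration only.
EXAMPLE = "@read1\nACGTAC\n+\nIIIIII\n@read2\nGGGA\n+\nIIII\n"

records = list(read_fastq(StringIO(EXAMPLE)))
kmers = count_kmers(records[0][1], 2)  # 2-mers of the first read
```

Production libraries such as those in the table additionally handle compressed input, quality-score encodings, and indexing for very large files, none of which this sketch attempts.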