Table 1 Summary of programming libraries/toolkits for analysis of (next-generation) sequencing data

From: Visual programming for next-generation sequencing data analytics

| Library Name | Release Date | Programming Language | License | Website | Features |
|---|---|---|---|---|---|
| EMBOSS [43] | 2000 | C, C++ BTL, others | GNU GPL | http://emboss.sourceforge.net/ | Sequence alignment; rapid database search; protein motif identification; nucleotide sequence pattern analysis; codon usage analysis for small genomes; rapid identification of sequence patterns in large scale sequence sets; presentation tools for publication. |
| BTL [41] | 2001 | C++ | GNU GPL | http://www.cryst.bbk.ac.uk/~classlib/ | Data structures (e.g. graphs); nucleotide string methods (e.g. Fourier transform, Needleman-Wunsch alignment). |
| Bioperl [47] | 2002 | Perl | Artistic License, GNU GPL | http://bioperl.org/ | Access sequence data from local/remote data bases; manage data base formats; data base search; manipulating sequences/sequence alignments; gene annotations. |
| Bioconductor [50] | 2003 | R (C/C++) | Artistic, BSD, GNU GPL | https://www.bioconductor.org/ | Repository of multiple libraries for analysis and comprehension of genomic and –omics data, including NGS. |
| BioPHP | 2003 | PHP | GNU GPL | http://biophp.org/ | DNA and protein sequence analysis, sequence alignment. |
| GenomeTools [58] | 2003 | C | Open BSD | http://genometools.org/ | Parsing, compression, k-mer, suffix trees, annotation, error correction and other sequence analytics (FASTA, FASTQ) |
| Pizza&Chili [94] | 2005 | C/C++ | GNU Lesser GPL | http://pizzachili.di.unipi.it/ | Compressed indices, text collections |
| Bio++ [42] | 2006 | C++ | CeCILL GPL | http://kimura.univ-montp2.fr/BioPP | Sequence analysis, phylogenetics, molecular evolution; population genetics. |
| Biojava [46] | 2008 | Java | GNU Lesser GPL | www.biojava.org/ | Manipulate biological sequences; file parse; DAS client/server support; access to BioSQL/Ensembl data bases; tools for making sequence analysis GUIs; statistical routines; dynamic programming toolkit. |
| SeqAn [52] | 2008 | C++ | BSD 3-clause | http://www.seqan.de/ | Extensive set of algorithms and data structures for the analysis of nucleotide sequences, with emphasis on NGS data; includes index, compression, data base search, support for NGS-specific file formats (fastq, SAM/BAM, VCF, BED). |
| Biopython [45] | 2009 | Python, C | Biopython | http://biopython.org/ | Sequence input/output; alignment input/output; population genetics; structural bioinformatics; SQL interface. |
| htslib, SAMtools, BCFtools [37] | 2009 | C | MIT Expat, Modified BSD | http://www.htslib.org/ | Read, write, edit, index, view SAM/BAM/CRAM formats; read, write BCF2/VCF/gVCF files; call, filter, summarize SNP/short indels. |
| BioRuby [44] | 2010 | Ruby | GNU GPL | http://bioruby.open-bio.org/ | DNA and protein sequence analysis, sequence alignment, biological database parsing, ontology, structural biology. |
| BAMTools [36] | 2011 | C++ | MIT | https://github.com/pezmaster31/bamtools | Read, write, manipulate BAM formats |
| libStatGen [40] | 2011 | C++ | GNU GPL | https://github.com/statgen/libStatGen | Handle SAM/BAM, fastq, GLF, VCF, ASP. |
| NGS++ [38] | 2013 | C++ | GNU Lesser GPL | https://github.com/NGS-lib/NGSplusplus | Read, write, manipulate multiple genomic file formats and data associated with BED type files (epigenomics). |
| Bioclojure [39] | 2014 | Clojure | GNU Lesser GPL | https://github.com/s312569/clj-biosequence | Parse of Genbank, Uniprot XML, fasta, fastq formats; wrappers for BLAST, signalP, TMHMM; index files for random access, lazy processing of sequences from very large files. |