
Table 1 Summary of programming libraries/toolkits for analysis of (next-generation) sequencing data

From: Visual programming for next-generation sequencing data analytics

Library Name

Release Date

Programming Language

License

Website

Features

EMBOSS [43]

2000

C, C++ BTL, others

GNU GPL

http://emboss.sourceforge.net/

Sequence alignment; rapid database search; protein motif identification; nucleotide sequence pattern analysis; codon usage analysis for small genomes; rapid identification of sequence patterns in large-scale sequence sets; presentation tools for publication.

BTL [41]

2001

C++

GNU GPL

http://www.cryst.bbk.ac.uk/~classlib/

Data structures (e.g. graphs); nucleotide string methods (e.g. Fourier transform, Needleman-Wunsch alignment).

Bioperl [47]

2002

Perl

Artistic License, GNU GPL

http://bioperl.org/

Access sequence data from local/remote databases; manage database formats; database search; manipulate sequences/sequence alignments; gene annotations.

Bioconductor [50]

2003

R (C/C++)

Artistic, BSD, GNU GPL

https://www.bioconductor.org/

Repository of multiple libraries for analysis and comprehension of genomic and –omics data, including NGS.

BioPHP

2003

PHP

GNU GPL

http://biophp.org/

DNA and protein sequence analysis, sequence alignment.

GenomeTools [58]

2003

C

Open BSD

http://genometools.org/

Parsing, compression, k-mer, suffix trees, annotation, error correction and other sequence analytics (FASTA, FASTQ).

Pizza&Chili [94]

2005

C/C++

GNU Lesser GPL

http://pizzachili.di.unipi.it/

Compressed indices, text collections

Bio++ [42]

2006

C++

CeCILL GPL

http://kimura.univ-montp2.fr/BioPP

Sequence analysis, phylogenetics, molecular evolution; population genetics.

Biojava [46]

2008

Java

GNU Lesser GPL

www.biojava.org/

Manipulate biological sequences; file parsing; DAS client/server support; access to BioSQL/Ensembl databases; tools for making sequence analysis GUIs; statistical routines; dynamic programming toolkit.

SeqAn [52]

2008

C++

BSD 3-clause

http://www.seqan.de/

Extensive set of algorithms and data structures for the analysis of nucleotide sequences, with emphasis on NGS data; includes indexing, compression, database search, and support for NGS-specific file formats (FASTQ, SAM/BAM, VCF, BED).

Biopython [45]

2009

Python, C

Biopython

http://biopython.org/

Sequence input/output; alignment input/output; population genetics; structural bioinformatics; SQL interface.

htslib, SAMtools, BCFtools [37]

2009

C

MIT Expat, Modified BSD

http://www.htslib.org/

Read, write, edit, index, view SAM/BAM/CRAM formats; read, write BCF2/VCF/gVCF files; call, filter, summarize SNPs/short indels.

BioRuby [44]

2010

Ruby

GNU GPL

http://bioruby.open-bio.org/

DNA and protein sequence analysis, sequence alignment, biological database parsing, ontology, structural biology.

BAMTools [36]

2011

C++

MIT

https://github.com/pezmaster31/bamtools

Read, write, manipulate BAM files.

libStatGen [40]

2011

C++

GNU GPL

https://github.com/statgen/libStatGen

Handle SAM/BAM, FASTQ, GLF, VCF, ASP.

NGS++ [38]

2013

C++

GNU Lesser GPL

https://github.com/NGS-lib/NGSplusplus

Read, write, manipulate multiple genomic file formats and data associated with BED type files (epigenomics).

Bioclojure [39]

2014

Clojure

GNU Lesser GPL

https://github.com/s312569/clj-biosequence

Parsing of GenBank, UniProt XML, FASTA, FASTQ formats; wrappers for BLAST, SignalP, TMHMM; index files for random access; lazy processing of sequences from very large files.
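
As an illustration of one feature named in the table, the Needleman-Wunsch alignment listed for BTL can be sketched in a few lines. This is a minimal plain-Python sketch of the scoring recurrence only, not the API of BTL or of any other library in the table; the function name `needleman_wunsch_score` and the default scores (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Hypothetical helper for illustration; scoring values are assumed,
    and a linear gap penalty is used.
    """
    # prev[j] is the best score for aligning the rows processed so far
    # (a prefix of a) against b[:j]; row 0 aligns the empty prefix with gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: a[:i] aligned against the empty string costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Only two rows of the dynamic programming matrix are kept, so memory grows with the length of `b` rather than with the product of both lengths. Recovering the alignment itself would require the full matrix or a traceback structure, which the libraries above provide in optimized form.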