With the development of next-generation sequencing, efficient tools are needed to handle millions of sequences in a reasonable amount of time. Sumatra is a program developed by the LECA that aims to compare sequences both quickly and exactly. It is designed for the type of data generated by DNA metabarcoding, i.e. short, entirely sequenced markers. Sumatra computes pairwise alignment scores within one dataset or between two datasets, and can take a similarity threshold below which pairs of sequences are not reported. The output can then go through a classification process with programs such as MCL or MOTHUR. Sumatra is currently available as a program that you can download and install on Unix-like machines.
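To make the idea of thresholded pairwise comparison concrete, here is a minimal Python sketch. It is not Sumatra's implementation: the scoring (length of the longest common subsequence divided by the length of the longer sequence), the function names and the toy sequences are all assumptions chosen for illustration. It has none of the optimizations a dedicated tool would need for millions of sequences.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (exact DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed score: LCS length divided by the longer sequence's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def pairwise_scores(seqs, threshold=0.0):
    """Yield (id1, id2, score) for every pair scoring at least `threshold`."""
    for (id1, s1), (id2, s2) in combinations(seqs.items(), 2):
        score = similarity(s1, s2)
        if score >= threshold:
            yield id1, id2, score


if __name__ == "__main__":
    # Hypothetical toy dataset; real input would be read from a sequence file.
    toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
    for row in pairwise_scores(toy, threshold=0.8):
        print(*row, sep="\t")
```

With a threshold of 0.8, only the pair s1/s2 (score 0.875) is reported. The two pairs involving s3 score 0.25 and are dropped, which mirrors the way a similarity threshold keeps low-similarity pairs out of the output.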
Latest Updates
Version 1.0.36: New version of the libsuma library with more robust sequence parsing.
Version 1.0.34: Compilation with the libsuma library (see https://git.metabarcoding.org/obitools/sumalibs).
Version 1.0.31: Fixed a memory bug with similarity thresholds of 100%.
Version 1.0.20: The input can now be the standard input when there is only one dataset to analyze.
Version 1.0.10: Sumatra and Sumaclust have been split into two packages.
Installing Sumatra
Download the archive from this page, untar it, go into the newly created directory and compile:
tar -zxvf sumatra_v[x.x.xx].tar.gz
cd sumatra_v[x.x.xx]
make -C sumalibs install
make install
See the user manual, downloadable from this page and included in the archive, for complete documentation.
See also
Sumaclust clusters sequences using the same alignment methods as Sumatra.