Commit cc0f8ca8 by Aurélie Bonin

--no commit message

parent 4d9d881c
......@@ -7,28 +7,28 @@
Format of the sequence file. Possible formats are:
- ``raw``: for regular or :doc:`OBITools extended fasta <../fasta>` files (default value).
- ``raw``: for regular ``OBITools`` extended :doc:`fasta <../fasta>` files (default value).
- ``UNITE``: for fasta files downloaded from the `UNITE web site <http://unite.ut.ee/>`_.
- ``UNITE``: for :doc:`fasta <../fasta>` files downloaded from the `UNITE web site <http://unite.ut.ee/>`_.
- ``SILVA``: for fasta files downloaded from the `SILVA web site <http://www.arb-silva.de/>`_.
- ``SILVA``: for :doc:`fasta <../fasta>` files downloaded from the `SILVA web site <http://www.arb-silva.de/>`_.
.. cmdoption:: -k <KEYNAME>, --key-name=<KEYNAME>
.. cmdoption:: -k <KEY>, --key-name=<KEY>
Key of the attribute containing the taxon name in sequence files in
:doc:`OBITools extended fasta <../fasta>` format.
Key of the attribute containing the taxon name in sequence files in the ``OBITools`` extended
:doc:`fasta <../fasta>` format.
.. cmdoption:: -a <ANCESTOR>, --restricting_ancestor=<ANCESTOR>
Enables to restrict the search of taxids under a specified ancestor.
Enables to restrict the search of *taxids* under a specified ancestor.
``<ANCESTOR>`` can be a taxid (integer) or a key (string).
``<ANCESTOR>`` can be a *taxid* (integer) or a key (string).
- If it is a taxid, this taxid is used to restrict the search for all the sequence
- If it is a *taxid*, this *taxid* is used to restrict the search for all the sequence
records.
- If it is a key, :py:mod:`obiaddtaxids`: looks for the ancestor taxid in the
- If it is a key, :py:mod:`obiaddtaxids` looks for the ancestor *taxid* in the
corresponding attribute. This allows having a different ancestor restriction
for each sequence record.
......
#!/usr/local/bin/python
'''
:py:mod:`obiaddtaxids`: Adds taxids to sequence records using an ecopcr database
================================================================================
:py:mod:`obiaddtaxids`: adds *taxids* to sequence records using an ecopcr database
==================================================================================
.. codeauthor:: Celine Mercier <celine.mercier@metabarcoding.org>
The :py:mod:`obiaddtaxids` command annotates sequence records with a taxid based on
The :py:mod:`obiaddtaxids` command annotates sequence records with a *taxid* based on
a taxon scientific name stored in the sequence record header.
Taxonomic information linking a taxid to a taxon scientific name is stored in a
database formatted as an ecopcr database (see :doc:`obitaxonomy <obitaxonomy>`) or
Taxonomic information linking a *taxid* to a taxon scientific name is stored in a
database formatted as an ecoPCR database (see :doc:`obitaxonomy <obitaxonomy>`) or
a NCBI taxdump (see NCBI ftp site).
The way to extract the taxon scientific name from the sequence record header can be
......@@ -19,10 +19,10 @@ specified by two options:
by spaces before looking for the taxon scientific name into the taxonomic
database.
- If the input file is an :doc:`OBITools extended fasta format <../fasta>`, the ``-k`` option
- If the input file is an ``OBITools`` extended :doc:`fasta <../fasta>` format, the ``-k`` option
specifies the attribute containing the taxon scientific name.
- If the input file is a fasta file imported from the UNITE or from the SILVA web sites,
- If the input file is a :doc:`fasta <../fasta>` file imported from the UNITE or from the SILVA web sites,
the ``-f`` option allows specifying this source and parsing correctly the associated
taxonomic information.
......@@ -30,7 +30,7 @@ specified by two options:
For each sequence record, :py:mod:`obiaddtaxids` tries to match the extracted taxon scientific name
with those stored in the taxonomic database.
- If a match is found, the sequence record is annotated with the corresponding ``taxid``.
- If a match is found, the sequence record is annotated with the corresponding *taxid*.
Otherwise,
......@@ -53,11 +53,11 @@ Otherwise,
my_sequences.fasta > identified.fasta
Tries to match the value associated with the ``species_name`` key of each sequence record
from the ``my_sequences.fasta`` file with a taxon name from the ecopcr database ``my_ecopcr_database``.
from the ``my_sequences.fasta`` file with a taxon name from the ecoPCR database ``my_ecopcr_database``.
- If there is an exact match, the sequence record is stored in the ``identified.fasta`` file.
- If not and the ``species_name`` value is composed of two words, :py:mod:`obiaddtaxids`:
- If not and the ``species_name`` value is composed of two words, :py:mod:`obiaddtaxids`
considers the first word as a genus name and tries to find it into the taxonomic database.
- If a genus is found, the sequence record is stored in the ``genus_identified.fasta``
......
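The matching cascade described in the documentation text above (exact name match first, then a genus-level fallback for two-word names) can be sketched as follows. This is only an illustrative sketch, not the actual `obiaddtaxids` implementation: the `classify_record` function, the plain dictionary standing in for the ecoPCR database, and the `unidentified` bucket are all assumptions made for the example, as is returning the genus *taxid* in the fallback case.

```python
def classify_record(name, taxonomy):
    """Decide which output bucket a taxon name falls into.

    `taxonomy` is a hypothetical dict mapping scientific names to taxids;
    it stands in for the ecoPCR database and is not the OBITools API.
    Returns a (bucket, taxid) tuple.
    """
    # Step 1: exact match on the full scientific name.
    taxid = taxonomy.get(name)
    if taxid is not None:
        return "identified", taxid

    # Step 2: for two-word names, treat the first word as a genus name.
    words = name.split()
    if len(words) == 2:
        genus_taxid = taxonomy.get(words[0])
        if genus_taxid is not None:
            return "genus_identified", genus_taxid

    # Assumed bucket name for records that match nothing.
    return "unidentified", None


# Illustrative data only.
taxonomy = {"Homo sapiens": 9606, "Homo": 9605, "Canis": 9611}

for name in ("Homo sapiens", "Canis lupus", "Unknown thing"):
    print(name, classify_record(name, taxonomy))
```

Names with more or fewer than two words skip the genus fallback in this sketch, which mirrors the "composed of two words" condition in the text.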