Commit 03215e4d by Celine Mercier

obiaddtaxids : added the genus constraint associated with the -g option and fixed some typos

parent 799ed5ec
......@@ -9,13 +9,13 @@ The :py:mod:`obiaddtaxids` command annotates sequence records with a taxid based
 a taxon scientific name stored in the sequence record header.
 Taxonomic information linking a taxid to a taxon scientific name is stored in a
-database formated as an ecopcr database (see :doc:`obitaxonomy <obitaxonomy>`) or
+database formatted as an ecopcr database (see :doc:`obitaxonomy <obitaxonomy>`) or
 a NCBI taxdump (see NCBI ftp site).
 The way to extract the taxon scientific name from the sequence record header can be
-refined by two options:
+specified by two options:
-- By default, the sequence identifier is used. Underscore characters (_) are substituted
+- By default, the sequence identifier is used. Underscore characters (``_``) are substituted
 by spaces before looking for the taxon scientific name into the taxonomic
 database.
......@@ -32,12 +32,12 @@ with those stored in the taxonomic database.
 - If a match is found, the sequence record is annotated with the corresponding ``taxid``.
-Otherwise
+Otherwise,
 - If the ``-g`` option is set and the taxon name is composed of two words and only the
-first one is found in the taxonomic database, :py:mod:`obiaddtaxids` considers that
-it found the genus associated with this sequence record and it stores this sequence
-record in the file specified by the ``-g`` option.
+first one is found in the taxonomic database at the 'genus' rank, :py:mod:`obiaddtaxids`
+considers that it found the genus associated with this sequence record and it stores this
+sequence record in the file specified by the ``-g`` option.
 - If the ``-u`` option is set and no taxonomic information is retrieved from the
 scientific taxon name, the sequence record is stored in the file specified by the
......@@ -210,11 +210,14 @@ def dirtyLookForSimilarNames(name, tax, ancestor):
 def getGenusTaxid(tax, species_name, ancestor):
     genus_sp = species_name.split(' ')
-    return getTaxid(tax, genus_sp[0], ancestor)
+    genus_taxid = getTaxid(tax, genus_sp[0], ancestor)
+    if tax.getRank(genus_taxid) != 'genus' :
+        raise KeyError()
+    return genus_taxid
 def getTaxid(tax, name, ancestor):
     taxid = tax.findTaxonByName(name)[0]
     if ancestor != None and not tax.isAncestor(ancestor, taxid) :
         raise KeyError()
......
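For readers who want to try the genus constraint outside the tool, here is a minimal, self-contained sketch of the lookup order described in the documentation hunk (exact match first, then the genus fallback, then unidentified). It is an illustration under assumptions, not the obiaddtaxids implementation: `ToyTaxonomy`, `get_genus_taxid` and `classify` are hypothetical names, the toy class only mimics the two method names visible in the diff (`findTaxonByName`, `getRank`), the taxonomy entries are sample data, and the `ancestor` restriction is left out.

```python
class ToyTaxonomy:
    """Hypothetical stand-in for the taxonomy object used in the diff."""

    def __init__(self, taxa):
        # taxa maps a scientific name to a (taxid, rank) pair
        self._by_name = dict(taxa)
        self._rank = {taxid: rank for taxid, rank in taxa.values()}

    def findTaxonByName(self, name):
        # Raises KeyError when the name is unknown, as the diff's code expects
        return self._by_name[name]

    def getRank(self, taxid):
        return self._rank[taxid]


def get_genus_taxid(tax, species_name):
    """Return the taxid of the first word, but only if its rank is 'genus'."""
    genus = species_name.split(' ')[0]
    taxid = tax.findTaxonByName(genus)[0]
    if tax.getRank(taxid) != 'genus':
        # Same constraint as the commit: a higher or lower rank is rejected
        raise KeyError(genus)
    return taxid


def classify(tax, identifier):
    """Return ('match', taxid), ('genus', taxid) or ('unidentified', None)."""
    # Underscores are replaced by spaces before the lookup
    name = identifier.replace('_', ' ')
    try:
        return 'match', tax.findTaxonByName(name)[0]
    except KeyError:
        pass
    # Genus fallback applies only to two-word names
    if len(name.split(' ')) == 2:
        try:
            return 'genus', get_genus_taxid(tax, name)
        except KeyError:
            pass
    return 'unidentified', None


# Sample entries for illustration only
tax = ToyTaxonomy({
    'Homo sapiens': (9606, 'species'),
    'Homo': (9605, 'genus'),
    'Canis': (9611, 'genus'),
    'Felidae': (9681, 'family'),
})

print(classify(tax, 'Homo_sapiens'))     # exact match
print(classify(tax, 'Canis_unknownus'))  # first word found at genus rank
print(classify(tax, 'Felidae_sp'))       # first word found, but not a genus
```

With the genus check in place, a two-word name whose first word matches a family (the last call) is no longer treated as a genus hit; before this commit the equivalent lookup would have accepted it.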