#!/usr/local/bin/python

'''
:py:mod:`obiannotate`: adds/edits sequence record annotations
=============================================================

.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>

:py:mod:`obiannotate` is the command that allows adding/modifying/removing 
annotation attributes attached to sequence records.

Once such attributes are added, the other OBITools commands can use them for
filtering or for computing statistics.

*Example 1:*

    .. code-block:: bash
        
        > obiannotate -S short:'len(sequence)<100' seq1.fasta > seq2.fasta

    The above command adds a boolean attribute named *short* that indicates
    whether the sequence is shorter than 100 bp.

*Example 2:*

    .. code-block:: bash
        
        > obiannotate --seq-rank seq1.fasta | \\
          obiannotate -C --set-identifier '"'FungA'_%05d" % seq_rank' \\
          > seq2.fasta

    The above command adds a new attribute whose value is the sequence record 
    entry number in the file. Then it clears all the sequence record attributes 
    and sets the identifier to a string beginning with *FungA_* followed by a 
    suffix with 5 digits containing the sequence entry number.

*Example 3:*

    .. code-block:: bash
        
        > obiannotate -d my_ecopcr_database \\
          --with-taxon-at-rank=genus seq1.fasta > seq2.fasta

    The above command adds taxonomic information at the *genus* rank to the 
    sequence records. 

*Example 4:*

    .. code-block:: bash
        
        > obiannotate -S 'new_seq:str(sequence).replace("a","t")' \\
          seq1.fasta | obiannotate --set-sequence new_seq > seq2.fasta

    The above command edits the *sequence* object itself, replacing every
    *a* nucleotide with *t*. It first creates a new attribute named *new_seq*
    that contains the modified sequence, then replaces the original sequence
    with the modified one.
    
'''

from obitools.options import getOptionManager
from obitools.options.bioseqfilter import addSequenceFilteringOptions
from obitools.options.bioseqfilter import filterGenerator
from obitools.options.bioseqedittag import addSequenceEditTagOptions
from obitools.options.bioseqedittag import sequenceTaggerGenerator
from obitools.format.options import addInOutputOption, sequenceWriterGenerator
        
    
if __name__=='__main__':
    
    optionParser = getOptionManager([addSequenceFilteringOptions,
                                     addSequenceEditTagOptions,
                                     addInOutputOption], progdoc=__doc__)

    (options, entries) = optionParser()
    
    writer = sequenceWriterGenerator(options)
    
    sequenceTagger = sequenceTaggerGenerator(options)
    goodFasta = filterGenerator(options)
    
    for seq in entries:
        # Annotate only the records that pass the filtering options;
        # every record, annotated or not, is written to the output.
        if goodFasta(seq):
            sequenceTagger(seq)
        writer(seq)