Commit 69a7c2c9 by Frédéric Boyer

Edited the doc

parent 3374a79c
Sequence annotations
====================
:py:mod:`obiannotate` is the command that lets you add, edit or clear the annotations attached to sequences.
Once such tags are added, they can be used by the other OBITools commands to filter sequences or to compute statistics.
Annotate sequence files
-----------------------
.. toctree::
   :maxdepth: 2
......
Reverse complement the sequences from a sequence file
=====================================================
.. automodule:: obicomplement
Print the reverse complement of the sequences in a sequence file.
.. include:: ../optionsSet/defaultoptions.txt
.. include:: ../optionsSet/inputformat.txt
Count the number of sequences
=============================
.. automodule:: obicount
Count the number of sequences in a sequence file.
.. include:: ../optionsSet/defaultoptions.txt
.. include:: ../optionsSet/inputformat.txt
:py:mod:`obicount` specific options
--------------------------------------
.. cmdoption:: -a, --all

   Print the total count of the sequences (if a sequence has no 'count' tag, its default count is 1) [default : False]

.. cmdoption:: -s, --sequence

   Print the number of sequences and their total count
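The counting rule behind these two options can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the obicount code: records are hypothetical dicts whose annotation tags sit under a 'tags' key, the helper name count_sequences is made up, and a record without a 'count' tag contributes 1 to the total, as documented above.

```python
from typing import Iterable, Mapping, Tuple


def count_sequences(records: Iterable[Mapping]) -> Tuple[int, int]:
    """Return (number of records, total count).

    Each record is assumed (hypothetically) to be a dict with a 'tags'
    dict; a record without a 'count' tag contributes a count of 1.
    """
    n_records = 0
    total_count = 0
    for record in records:
        n_records += 1
        total_count += record.get("tags", {}).get("count", 1)
    return n_records, total_count


records = [
    {"id": "seq1", "tags": {"count": 12}},
    {"id": "seq2", "tags": {}},            # no 'count' tag -> counts as 1
    {"id": "seq3", "tags": {"count": 3}},
]
print(count_sequences(records))  # (3, 16)
```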
......
.. include:: ../optionsSet/defaultoptions.txt
.. include:: ../optionsSet/inputformat.txt
:py:mod:`obihead` specific options
----------------------------------
.. cmdoption:: -n <INT>, --sequence-count <INT>

   Number of sequences to be printed [default : 10]
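Taking the first records of a stream maps naturally onto Python's itertools.islice. The sketch below only illustrates the idea: the head function is hypothetical, records can be any Python iterable (plain identifier strings here), and it is not taken from the obihead source.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def head(records: Iterable[T], n: int = 10) -> Iterator[T]:
    """Yield the first n records (default 10, mirroring -n)."""
    # islice stops pulling items after n, so the rest of a large input
    # never has to be read.
    return islice(records, n)


ids = ["seq%d" % i for i in range(1, 26)]
print(list(head(ids, 3)))  # ['seq1', 'seq2', 'seq3']
```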
Extract a random subset of sequences from a sequence file
=========================================================
.. automodule:: obisample
.. include:: ../optionsSet/defaultoptions.txt
.. include:: ../optionsSet/inputformat.txt
:py:mod:`obisample` specific options
------------------------------------
.. cmdoption:: -s <INT>, --sample-size <INT>

   Total count of sequences to be printed [default : number of provided sequences].
   If the -a option is set, the size is expressed as a fraction.

.. cmdoption:: -a, --approx-sampling

   Switch to an approximate algorithm, useful for large files

.. cmdoption:: -w, --without-replacement

   Ask for sampling without replacement
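One way to picture the exact (non-approximate) sampling is to treat every record as `count` identical reads and to draw reads from that pool. The sketch below assumes that interpretation of the 'count' tag and uses a hypothetical list of (identifier, count) pairs and a made-up function name, sample_reads. It illustrates sampling with and without replacement only; it is not the obisample implementation, and the -a approximate algorithm is not shown.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def sample_reads(records: List[Tuple[str, int]],
                 size: int,
                 with_replacement: bool = True,
                 rng: Optional[random.Random] = None) -> Dict[str, int]:
    """Draw `size` reads and return the new count of each sampled record.

    `records` holds hypothetical (identifier, count) pairs; a record with
    count k is treated as k identical reads.
    """
    rng = rng or random.Random()
    # Expand each record into one pool entry per read.
    pool = [seq_id for seq_id, count in records for _ in range(count)]
    if with_replacement:
        drawn = rng.choices(pool, k=size)
    else:
        # Without replacement, the sample cannot exceed the pool size.
        drawn = rng.sample(pool, k=min(size, len(pool)))
    return dict(Counter(drawn))


records = [("seq1", 5), ("seq2", 1), ("seq3", 4)]
sample = sample_reads(records, size=6, with_replacement=False,
                      rng=random.Random(42))
print(sample, sum(sample.values()))
```

Sampling without replacement can never give a record a count larger than its original one, whereas sampling with replacement can.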
.. automodule:: obigrep
.. automodule:: obiselect
.. include:: ../optionsSet/defaultoptions.txt
......
Extract the last sequences from a sequence file
===============================================
.. automodule:: obitail
.. include:: ../optionsSet/defaultoptions.txt
.. include:: ../optionsSet/inputformat.txt
Print the last sequences of a sequence file. The number of sequences to be printed can be specified.
:py:mod:`obitail` specific options
-------------------------------------
.. cmdoption:: -n <INT>, --sequence-count <INT>

   Number of sequences to be printed [default : 10]
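Because a sequence file is read as a stream, the last n records are only known once the whole input has been consumed. A bounded deque is a simple way to keep only the most recent n records; the sketch below is illustrative only, uses a hypothetical tail function over any Python iterable, and does not claim to be the obitail source.

```python
from collections import deque
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def tail(records: Iterable[T], n: int = 10) -> List[T]:
    """Return the last n records (default 10, mirroring -n)."""
    # A deque with maxlen=n drops its oldest item whenever a new one is
    # appended, so memory stays bounded by n while streaming the input.
    return list(deque(records, maxlen=n))


ids = ["seq%d" % i for i in range(1, 26)]
print(tail(ids, 3))  # ['seq23', 'seq24', 'seq25']
```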
#!/usr/local/bin/python
'''
:py:mod:`obiannotate` : Add/Edit annotations of sequences
=========================================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obiannotate` is the command that lets you add, modify or clear the annotation tags attached to sequences, and more generally edit the identifier and the definition of the sequences in a file.
Once such tags are added, they can be used by the other OBITools commands for filtering or for computing statistics.
The basic usage lets you specify `tag:value` annotations to be added to the definition of the sequences, `value`
being a valid *Python expression*. The `tag:value` annotations are added to the definition of each of the sequences and can
......
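The `tag:value` mechanism can be illustrated with Python's built-in eval. In this sketch, which is an assumption about the general idea rather than the obiannotate code, the expression is evaluated with the record's existing tags available as variables; the function name apply_tag and that namespace convention are hypothetical.

```python
from typing import Any, Dict


def apply_tag(tags: Dict[str, Any], spec: str) -> Dict[str, Any]:
    # Split 'tag:value' on the first ':' only, so the expression itself
    # may contain ':' (for example a slice or a dict literal).
    tag, expression = spec.split(":", 1)
    # Evaluate the Python expression with the existing tags as variables
    # (hypothetical convention). Builtins are disabled to limit what the
    # expression can do; eval remains unsafe for untrusted input.
    value = eval(expression, {"__builtins__": {}}, dict(tags))
    updated = dict(tags)
    updated[tag] = value
    return updated


tags = {"count": 4, "sample": "A01"}
print(apply_tag(tags, "double_count:count * 2"))
# {'count': 4, 'sample': 'A01', 'double_count': 8}
```

The input dict is copied rather than modified, so the original annotations are left untouched.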
#!/usr/local/bin/python
"""
:py:mod:`obicomplement` : Reverse complement sequences
======================================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obicomplement` reverse complements the sequences in a file.
Note that the sequence identifiers are modified: a sequence whose identifier is 'ID' gets the identifier 'ID_CMP'.
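Reverse complementing amounts to complementing each base and reading the result backwards. The following sketch uses Python's str.translate with an IUPAC complement table and appends the '_CMP' suffix described above; the function name reverse_complement and the choice to leave unlisted characters unchanged are assumptions for illustration, not the obicomplement implementation.

```python
from typing import Tuple

# IUPAC complement table. Characters not listed (S, W, gaps, ...) are
# left unchanged; S and W are their own complements anyway.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn",
                            "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq_id: str, sequence: str) -> Tuple[str, str]:
    # Complement every base, then reverse the result.
    rc = sequence.translate(_COMPLEMENT)[::-1]
    # Rename 'ID' to 'ID_CMP', as described above.
    return seq_id + "_CMP", rc


print(reverse_complement("seq1", "AACGTTN"))  # ('seq1_CMP', 'NAACGTT')
```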
......
#!/usr/local/bin/python
'''
Created on 1 nov. 2009
:py:mod:`obicount` : Count the number of sequences
==================================================
@author: coissac
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obicount` counts the number of sequences in a sequence file and/or their total count (computed from the 'count' annotation tag).
Example: to get the number of sequences in a file:

.. code-block:: bash

   > obicount seq.fasta

Example: to get the total count of the sequences in a file:

.. code-block:: bash

   > obicount -a seq.fasta

Example: to get both the number and the total count of the sequences in a file:

.. code-block:: bash

   > obicount -s seq.fasta
'''
from obitools.options import getOptionManager
......
optionManager.add_option('-s','--sequence',
action="store_true", dest="sequence",
default=False,
help="Print the number of sequences and their total count"
)
optionManager.add_option('-a','--all',
action="store_true", dest="all",
default=False,
help="Print the total count of the sequences (if a sequence has no 'count' tag, its default count is 1)"
)
......
#!/usr/local/bin/python
'''
:py:mod:`obigrep` : Filter sequences
====================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obigrep` is in some ways analogous to the standard Unix `grep` command.
It selects a subset of sequences from :ref:`a sequence file<the-sequence-files>`.
But instead of working text line by text line as the standard tool does, selection is done sequence by sequence.
Moreover, :py:mod:`obigrep` lets the user specify several conditions (each of which evaluates to TRUE or FALSE), and only the sequences
that fulfill all the conditions (all conditions are TRUE) are printed.
A sequence is more complex than a single text line and it can be split into several parts:

- the identifier (the character string just after the '>' character)
- the definition (the set of [tag=value ;]* strings just after the sequence identifier)
- the sequence itself

Depending on the options, the conditions can concern the sequence part (e.g. some pattern to be found in the DNA sequence), the identifier, or
any of the (tag, value) pairs contained in the definition part of the sequence (see :ref:`the OBITools extension of the FASTA and FASTQ formats<obitools-fasta>`).
A large set of options allows the selection to be refined on each of these elements.
Note that the '-v' option inverts the selection.
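The selection rule (keep a record only when all conditions are TRUE, with -v inverting the result) can be sketched as a generator over records. The record layout, the function names grep, has_poly_a and at_least_10_bp, and the example conditions are hypothetical; this illustrates the logic only and is not the obigrep code.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical record layout: {'id': ..., 'seq': ...}
Record = Dict[str, str]
Condition = Callable[[Record], bool]


def grep(records: Iterable[Record], conditions: List[Condition],
         invert: bool = False) -> Iterator[Record]:
    for record in records:
        # A record is selected only when every condition is TRUE ...
        selected = all(condition(record) for condition in conditions)
        # ... and invert (the role played by -v) flips that decision.
        if selected != invert:
            yield record


def has_poly_a(record: Record) -> bool:
    # Stretch of at least 10 'A' anywhere in the sequence.
    return re.search("A{10,}", record["seq"]) is not None


def at_least_10_bp(record: Record) -> bool:
    return len(record["seq"]) >= 10


records = [
    {"id": "s1", "seq": "CGT" + "A" * 12 + "GC"},
    {"id": "s2", "seq": "ACGTACGT"},
]
print([r["id"] for r in grep(records, [has_poly_a, at_least_10_bp])])  # ['s1']
print([r["id"] for r in grep(records, [has_poly_a], invert=True)])     # ['s2']
```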
Example: Keep only the sequences that contain a stretch of at least 10 'A':

.. code-block:: bash

   > obigrep -s '[CGT]*A{10,}[CGT]*' seq1.fasta > seq2.fasta

Example: Keep only the sequences whose length is greater than 100 bp:

.. code-block:: bash

   > obigrep -l 100 seq1.fasta > seq2.fasta

Example: Keep only the sequences whose length is lower than 100 bp:

.. code-block:: bash

   > obigrep -L 100 seq1.fasta > seq2.fasta

Example: Keep only the sequences that contain a stretch of at least 5 'N' flanked by A, C, G or T bases:

.. code-block:: bash

   > obigrep -s '[ACGT]+N{5,}[ACGT]+' seq1.fasta > seq2.fasta
'''
......
#!/usr/local/bin/python
'''
:py:mod:`obihead` : Extract the first sequences
===============================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obihead` is in some ways analogous to the standard Unix `head` command.
It selects the head of :ref:`a sequence file<the-sequence-files>`.
......
#!/usr/local/bin/python
'''
Created on 1 nov. 2009
:py:mod:`obisample` : Extract a random subset of sequences from a sequence file
===============================================================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obisample` randomly samples sequences.
The '-s' option specifies the final 'count' of sequences (by default, the number of sequences provided). When the '-a' option is set,
the argument of the '-s' option is interpreted as a fraction of the total number of sequences provided.

.. code-block:: bash

   > obisample -s 100 seq1.fasta > seq2.fasta
@author: coissac
'''
from obitools.options import getOptionManager
......
type="float",
default=None,
help="Size of the generated sample. "
"If the -a option is set, the size is expressed as a fraction"
)
optionManager.add_option('-a','--approx-sampling',
action="store_true", dest="approx",
......
#!/usr/local/bin/python
"""
:py:mod:`obiselect` : Select and print sequences identified by a list of IDs
============================================================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
......
If an ID is not found, no warning will be produced.
The -v option inverts the selection, thus selecting the sequences whose IDs are not in the list.
Example:

.. code-block:: bash

   > obiselect --identifier IDs.list seq.fasta > selectedSeq.fasta
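Selection by identifier amounts to a set-membership test. The sketch assumes, hypothetically, that the ID list holds one identifier per line and that records are (identifier, sequence) pairs; the function names read_id_list and select_by_id are made up. It illustrates the behaviour described above and is not the obiselect source.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_id_list(lines: Iterable[str]) -> Set[str]:
    # One identifier per line; blank lines are skipped.
    return {line.strip() for line in lines if line.strip()}


def select_by_id(records: Iterable[Tuple[str, str]], wanted: Set[str],
                 invert: bool = False) -> Iterator[Tuple[str, str]]:
    for seq_id, sequence in records:
        # Listed IDs that never occur in the file are simply not matched,
        # so no warning is produced for them.
        if (seq_id in wanted) != invert:
            yield seq_id, sequence


wanted = read_id_list(["seq2", "seq9", ""])
records = [("seq1", "ACGT"), ("seq2", "GGCC"), ("seq3", "TTAA")]
print([i for i, _ in select_by_id(records, wanted)])               # ['seq2']
print([i for i, _ in select_by_id(records, wanted, invert=True)])  # ['seq1', 'seq3']
```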
......
#!/usr/local/bin/python
'''
Created on 15 dec. 2009
:py:mod:`obitail` : Extract the last sequences
==============================================
.. codeauthor:: Eric Coissac <eric.coissac@metabarcoding.org>
:py:mod:`obitail` is in some ways analogous to the standard Unix `tail` command.
It selects the tail of :ref:`a sequence file<the-sequence-files>`.
But instead of working text line by text line as the standard tool does,
selection is done at the sequence level. The number of sequences to be printed can be set with the '-n' option.

.. code-block:: bash

   > obitail -n 150 seq1.fasta > seq2.fasta
@author: coissac
'''
from obitools.format.options import addInOutputOption, sequenceWriterGenerator
from obitools.options import getOptionManager
import collections
......