Commit 870e1d62 by Eric Coissac

--no commit message

parent abf86aaa
Sequence selection options
--------------------------
Sequence record selection options
---------------------------------
.. cmdoption:: -s <REGULAR_PATTERN>, --sequence=<REGULAR_PATTERN>
Regular expression pattern used to select the
sequence. The pattern is case insensitive.
Regular expression pattern to be tested against the
sequence itself. The pattern is case insensitive.
*Examples:*
Keeps only the sequences that contain a stretch of at least 10 'A':
.. code-block:: bash
> obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
Keeps only the sequence records that contain an *EcoRI* restriction site.
.. code-block:: bash
.. code-block:: bash
> obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
Keeps only the sequences that do not contain ambiguous nucleotides:
Keeps only the sequence records that contain a stretch of at least 10 ``A``.
.. code-block:: bash
.. code-block:: bash
> obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
Keeps only the sequence records that do not contain ambiguous nucleotides.
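The three ``-s`` examples above can be mimicked with a minimal Python sketch. It assumes the pattern is searched anywhere in the sequence and that case is ignored, as the option description states; ``select_by_sequence`` and the toy records are hypothetical names used for illustration and are not part of obigrep.

```python
import re


def select_by_sequence(records, pattern):
    """Keep (identifier, sequence) pairs whose sequence matches `pattern`.

    Assumption: the pattern is searched anywhere in the sequence and the
    match ignores case, mirroring the description of the -s option.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [(ident, seq) for ident, seq in records if regex.search(seq)]


# Toy records, invented for this illustration only.
records = [
    ("seq1", "ccgaattcgg"),                  # lower case, contains an EcoRI site
    ("seq2", "ACGT" + "A" * 12 + "CGT"),     # contains a stretch of 12 'A'
    ("seq3", "ACGTNNACGT"),                  # contains ambiguous nucleotides
]

ecori = [ident for ident, _ in select_by_sequence(records, "GAATTC")]
poly_a = [ident for ident, _ in select_by_sequence(records, "A{10,}")]
unambiguous = [ident for ident, _ in select_by_sequence(records, "^[ACGT]+$")]
```

Because the match ignores case, ``seq1`` is selected by ``GAATTC`` even though it is written in lower case.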
.. cmdoption:: -D <REGULAR_PATTERN>, --definition=<REGULAR_PATTERN>
Regular expression pattern matched against :doc:`the
definition of the sequence <../fasta>`. The pattern is case
Regular expression pattern to be tested against :doc:`the
definition of the sequence record<../fasta>`. The pattern is case
sensitive.
*Example:*
Keeps only the sequences whose definition contains 'chloroplast':
.. code-block:: bash
.. code-block:: bash
> obigrep -D 'chloroplast' seq1.fasta > seq2.fasta
> obigrep -D '[Cc]hloroplast' seq1.fasta > seq2.fasta
Keeps only the sequence records whose definition contains ``chloroplast`` or
``Chloroplast``.
.. cmdoption:: -I <REGULAR_PATTERN>, --identifier=<REGULAR_PATTERN>
Regular expression pattern matched against :doc:`the
identifier of the sequence <../fasta>`. The pattern is case
Regular expression pattern to be tested against :doc:`the
identifier of the sequence record <../fasta>`. The pattern is case
sensitive.
*Example:*
Keeps only the sequences whose identifier contains 'GH':
.. code-block:: bash
.. code-block:: bash
> obigrep -I 'GH' seq1.fasta > seq2.fasta
> obigrep -I '^GH' seq1.fasta > seq2.fasta
Keeps only the sequence records whose identifier begins with ``GH``.
.. cmdoption:: --id-list=<FILENAME>
A file containing a list of :doc:`sequence identifiers <../fasta>` to
be selected. The file is a text file with a single
identifier per line.
``<FILENAME>`` points to a text file containing the list of :doc:`sequence
record identifiers <../fasta>` to be selected.
The file contains a single identifier per line.
*Example:*
.. code-block:: bash
> obigrep --idlist=my_id_list.txt seq1.fasta > seq2.fasta
> obigrep --id-list=my_id_list.txt seq1.fasta > seq2.fasta
Keeps only the sequence records whose identifier is present in the
``my_id_list.txt`` file.
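As a rough illustration of the identifier list described above, the following Python sketch reads one identifier per line and keeps the records whose identifier is in the list. The helper names are hypothetical, and skipping blank lines is an assumption rather than documented obigrep behavior.

```python
def load_id_list(lines):
    """Build a set of identifiers from an iterable of text lines.

    Assumption: surrounding whitespace is stripped and blank lines are skipped.
    """
    return {line.strip() for line in lines if line.strip()}


def select_by_id(records, wanted):
    """Keep (identifier, sequence) pairs whose identifier is in `wanted`."""
    return [record for record in records if record[0] in wanted]


# The list of lines stands in for the content of a file such as my_id_list.txt.
wanted_ids = load_id_list(["GH001\n", "GH003\n", "\n"])
records = [("GH001", "ACGT"), ("GH002", "TTGA"), ("GH003", "CCGG")]
selected_ids = [ident for ident, _ in select_by_id(records, wanted_ids)]
```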
.. cmdoption:: -a <ATTRIBUTE_NAME>:<REGULAR_PATTERN>, --attribute=<ATTRIBUTE_NAME>:<REGULAR_PATTERN>
Regular expression pattern matched against the
:doc:`attributes of the sequence <../fasta>`. The value of this attribute
:doc:`attributes of the sequence record <../fasta>`. The value of this attribute
is of the form ``key:regular_pattern``. The
pattern is case sensitive. Several -a options can be
pattern is case sensitive. Several ``-a`` options can be
used on the same command line; in that case,
the selected sequences will match all constraints.
the selected sequence records will match all constraints.
*Example:*
Keeps only the sequences belonging to species from the *Asteraceae* family:
.. code-block:: bash
.. code-block:: bash
> obigrep -a family_sn:Asteraceae seq1.fasta > seq2.fasta
> obigrep -a 'family_name:Asteraceae' seq1.fasta > seq2.fasta
Selects the sequence records containing an attribute whose key is ``family_name`` and value
is ``Asteraceae``.
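The way several ``-a`` constraints combine can be sketched in Python as follows. This is an illustration under stated assumptions, not the obigrep implementation: ``matches_all`` is a hypothetical helper, each constraint is split on its first colon, the pattern is searched case-sensitively in the attribute value, and a record lacking the key is rejected.

```python
import re


def matches_all(attributes, constraints):
    """Return True if the record attributes satisfy every 'key:pattern' constraint."""
    for constraint in constraints:
        # Split on the first colon only, so the pattern itself may contain colons.
        key, pattern = constraint.split(":", 1)
        value = attributes.get(key)
        # Assumption: a missing attribute never matches.
        if value is None or re.search(pattern, str(value)) is None:
            return False
    return True


# Toy attribute dictionary, invented for this illustration.
attributes = {"family_name": "Asteraceae", "rank": "species"}

single = matches_all(attributes, ["family_name:Asteraceae"])
wrong_case = matches_all(attributes, ["family_name:asteraceae"])
combined = matches_all(attributes, ["family_name:Asteraceae", "rank:genus"])
```

Here ``wrong_case`` is false because the pattern is case sensitive, and ``combined`` is false because every constraint must hold.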
.. cmdoption:: -A <ATTRIBUTE_NAME>, --has-attribute=<KEY>
Selects sequences having attribute <KEY> defined.
Selects sequence records having an attribute whose key is ``<KEY>``.
*Example:*
Keeps only the sequences having a *taxid* defined:
.. code-block:: bash
.. code-block:: bash
> obigrep -A taxid seq1.fasta > seq2.fasta
Keeps only the sequence records having a ``taxid`` attribute defined.
.. cmdoption:: -p <PYTHON_EXPRESSION>, --predicat=<PYTHON_EXPRESSION>
Python boolean expression to be evaluated for each
sequence. The attribute keys defined for the sequence
sequence record. The attribute keys defined for each sequence record
can be used in the expression as variable names.
An extra variable named 'sequence' refers to the
sequence object itself.
sequence record itself.
Several ``-p`` options can be used on the same command
line; in that case,
the selected sequences will match all constraints.
the selected sequence records will match all constraints.
*Example:*
Keeps only the sequences with less than two errors in the forward and reverse primers:
.. code-block:: bash
.. code-block:: bash
> obigrep -p "(forward_error<2) and (reverse_error<2)" seq1.fasta > seq2.fasta
> obigrep -p '(forward_error<2) and (reverse_error<2)' seq1.fasta > seq2.fasta
Keeps only the sequence records whose ``forward_error`` and ``reverse_error``
attributes have a value smaller than two.
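To make the evaluation model of ``-p`` more concrete, here is a minimal Python sketch. It assumes each expression is evaluated with the record attributes exposed as variables plus an extra ``sequence`` variable, and that all expressions must be true. ``keep_record`` is a hypothetical helper, and how obigrep itself handles an expression that refers to a missing attribute is not covered here.

```python
def keep_record(attributes, sequence, predicates):
    """Return True if every predicate expression is true for the record.

    Assumption: attribute keys become variable names and `sequence` refers
    to the record itself (a plain string in this sketch).
    """
    namespace = dict(attributes)
    namespace["sequence"] = sequence
    # eval() is used only for illustration; never call it on untrusted input.
    return all(bool(eval(expr, {}, namespace)) for expr in predicates)


# Toy records, invented for this illustration.
good = keep_record(
    {"forward_error": 1, "reverse_error": 0},
    "ACGTACGT",
    ["(forward_error<2) and (reverse_error<2)"],
)
bad = keep_record(
    {"forward_error": 3, "reverse_error": 0},
    "ACGTACGT",
    ["(forward_error<2) and (reverse_error<2)"],
)
# Two predicates: both must hold, and the second one fails for an 8 bp sequence.
both = keep_record(
    {"forward_error": 1, "reverse_error": 0},
    "ACGTACGT",
    ["forward_error<2", "len(sequence)>=10"],
)
```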
.. cmdoption:: -L <##>, --lmax=<##>
Keeps sequences shorter than lmax.
Keeps sequence records whose sequence length is
shorter than or equal to ``lmax``.
*Example:*
Keeps only the sequences that have a sequence length equal or shorter than 100bp:
.. code-block:: bash
.. code-block:: bash
> obigrep -L 100 seq1.fasta > seq2.fasta
Keeps only the sequence records that have a sequence
length shorter than or equal to 100 bp.
.. cmdoption:: -l <##>, --lmin=<##>
Keeps sequences longer than lmin.
Keeps sequence records whose sequence length is
longer than or equal to ``lmin``.
*Example:*
Keeps only the sequences that have a sequence length equal or longer than 100bp:
.. code-block:: bash
.. code-block:: bash
> obigrep -l 100 seq1.fasta > seq2.fasta
Keeps only the sequence records that have a sequence length
longer than or equal to 100 bp.
.. cmdoption:: -v, --inverse-match
Inverts the sequence selection.
Inverts the sequence record selection.
*Example:*
Keeps only the sequences that have a sequence length shorter than 100bp:
.. code-block:: bash
.. code-block:: bash
> obigrep -v -l 100 seq1.fasta > seq2.fasta
Keeps only the sequence records that have a sequence length shorter than 100bp.
\ No newline at end of file
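The interplay between ``-l``, ``-L`` and ``-v`` can be summarized with a short Python sketch. It assumes inclusive bounds, as stated in the updated descriptions, and assumes that ``-v`` negates the outcome of the whole selection. ``keep_by_length`` is a hypothetical helper, not an obigrep function.

```python
def keep_by_length(sequence, lmin=None, lmax=None, inverse=False):
    """Apply inclusive length bounds, then optionally invert the decision."""
    selected = True
    if lmin is not None:
        selected = selected and len(sequence) >= lmin
    if lmax is not None:
        selected = selected and len(sequence) <= lmax
    # Assumption: inverse-match flips the final selection result.
    return not selected if inverse else selected


exactly_100 = "A" * 100
only_99 = "A" * 99

kept_at_bound = keep_by_length(exactly_100, lmin=100)
dropped_below = keep_by_length(only_99, lmin=100)
inverted_short = keep_by_length(only_99, lmin=100, inverse=True)
inverted_bound = keep_by_length(exactly_100, lmin=100, inverse=True)
```

Under these assumptions, ``-v -l 100`` keeps a 99 bp sequence and rejects a 100 bp one, which matches the "shorter than 100bp" wording of the last example.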
......@@ -6,7 +6,7 @@
.. cmdoption:: -n <INTEGER>, --sequence-count=<INTEGER>
Number of sequences to be selected (default value : 10).
Number of sequence records to be selected (default value: 10).
.. include:: ../optionsSet/inputformat.txt
......
......@@ -6,7 +6,7 @@
.. cmdoption:: -n <INTEGER>, --sequence-count <INTEGER>
Number of sequences to be selected (default value : 10).
Number of sequence records to be selected (default value: 10).
.. include:: ../optionsSet/inputformat.txt
......