Commit 7d969494 by Eric Coissac

Continuing the first release of the documentation.

parent dbf4a082
......@@ -4,4 +4,6 @@ Assembling the reads
.. toctree::
:maxdepth: 2
commands/seeds
commands/buildgraph
The ORGanelle ASseMbler commands
================================
.. toctree::
:maxdepth: 2
preparing
assembling
finishing
unfolding
utilities
......@@ -7,6 +7,16 @@ The :ref:`organelle assembler <oa>`'s :program:`buildgraph`
performs the assembly of the reads by building the De Bruijn graph, which
is the central data structure used by the :ref:`organelle assembler <oa>`.
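As an illustration of the data structure only, here is a minimal sketch of a De Bruijn graph built from a few toy reads. The function name ``build_de_bruijn``, the dictionary representation and the toy reads are assumptions made for this example; they do not describe how :program:`buildgraph` is actually implemented.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Hypothetical helper for illustration; not part of the assembler.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of size k over the read; each k-mer is one edge
        # going from its (k-1)-base prefix to its (k-1)-base suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "CGTACG"]  # made-up reads
    for node, successors in sorted(build_de_bruijn(toy_reads, 4).items()):
        print(node, "->", sorted(successors))
```

Overlapping reads share edges in this representation, so a sequence can be recovered by walking the graph rather than by comparing the reads pairwise.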
.. figure:: ../oa-buildgraph.*
:align: center
:figwidth: 80 %
:width: 500
The :ref:`organelle assembler <oa>`'s :program:`buildgraph` command
executes all the colored tasks, starting with the green one and ending
with the red one
command prototype
-----------------
......@@ -14,7 +24,8 @@ command prototype
.. code-block:: none
usage: oa buildgraph [-h] [--seeds seeds]
usage: oa buildgraph [-h]
[--seeds seeds] [--kup ORGASM:KUP]
[--adapt5 adapt5] [--adapt3 adapt3]
[--coverage BUILDGRAPH:COVERAGE]
[--lowcomplexity]
......@@ -44,27 +55,11 @@ Graph initialisation options
.. _buildgraph.seeds:
.. option:: --seeds seeds
Seed sequences; either a fasta file containing seed
sequences (nucleic or protein) or the name of an internal
set of seeds among:
- ``nucrRNAAHypogastrura``
- ``nucrRNAArabidopsis``
- ``protChloroArabidopsis``
- ``protMitoCapra``
- ``protMitoMachaon``
.. code-block:: bash
$ oa buildgraph --seeds protChloroArabidopsis seqindex
.. include:: ../options/seeds.txt
.. _buildgraph.kup:
.. option:: --kup ORGASM:KUP
The word size used to identify the seed reads
[default: protein=4, DNA=12].
.. include:: ../options/kup.txt
Graph extension options
+++++++++++++++++++++++
......
......@@ -3,6 +3,15 @@
The :program:`fillgaps` command
===============================
.. figure:: ../oa-fillgaps.*
:align: center
:figwidth: 80 %
:width: 500
The :ref:`organelle assembler <oa>`'s :program:`fillgaps` command
executes all the colored tasks, starting with the green one and ending
with the red one
command prototype
-----------------
......@@ -15,47 +24,153 @@ command prototype
[--minoverlap BUILDGRAPH:MINOVERLAP]
[--smallbranches BUILDGRAPH:SMALLBRANCHES]
[--lowcomplexity] [--back ORGASM:BACK] [--snp]
[--adapt5 adapt5] [--adapt3 adapt3] [--seeds seeds]
[--adapt5 adapt5] [--adapt3 adapt3]
[--seeds seeds] [--kup ORGASM:KUP]
index [output]
.. include:: ../options/positional.txt
optional arguments:
-h, --help show this help message and exit
--minread BUILDGRAPH:MINREAD
the minimum count of read to consider [default:
<estimated>]
--coverage BUILDGRAPH:COVERAGE
the expected sequencing coverage [default:
<estimated>]
--minratio BUILDGRAPH:MINRATIO
minimum ratio between occurrences of an extension and
the occurrences of the most frequent extension to keep
it. [default: <estimated>]
--mincov BUILDGRAPH:MINCOV
minimum occurrences of an extension to keep it.
[default: 1]
--minoverlap BUILDGRAPH:MINOVERLAP
minimum length of the overlap between the sequence and
reads to participate in the extension. [default:
<estimated>]
--smallbranches BUILDGRAPH:SMALLBRANCHES
maximum length of the branches to cut during the
cleaning process [default: <estimated>]
--lowcomplexity Use also low complexity probes
--back ORGASM:BACK the number of bases taken at the end of contigs to
jump with pared-ends [default: <estimated>]
--snp desactivate the SNP clearing mode
--adapt5 adapt5 adapter sequences used to filter reads beginning by
such sequences; either a fasta file containing adapter
sequences or internal set of adapter sequences among
['adapt5ILLUMINA'] [default: adapt5ILLUMINA]
--adapt3 adapt3 adapter sequences used to filter reads ending by such
sequences; either a fasta file containing adapter
sequences or internal set of adapter sequences among
['adapt3ILLUMINA'] [default: adapt3ILLUMINA]
--seeds seeds protein seeds; either a file containing seeds proteic
sequences or internal set of seeds among
['nucrRNAAHypogastrura', 'nucrRNAArabidopsis',
'protChloroArabidopsis', 'protMitoCapra',
'protMitoMachaon']
optional arguments
------------------
General option
++++++++++++++
.. option:: -h, --help
Shows the help message and exits
Graph initialisation options
++++++++++++++++++++++++++++
.. _fillgaps.seeds:
.. _seeds.kup:
.. include:: ../options/kup.txt
Graph extension options
+++++++++++++++++++++++
.. figure:: ../extension.*
:align: center
:figwidth: 80 %
:width: 500
The assembling stack
.. option:: --minread BUILDGRAPH:MINREAD
The minimum number of reads to consider [default: <estimated>]
.. code-block:: bash
$ oa fillgaps --minread 5 seqindex
Consider an extension if at least five reads are present in the extension
stack.
.. option:: --coverage BUILDGRAPH:COVERAGE
the expected sequencing coverage [default:
<estimated>]
.. option:: --minoverlap BUILDGRAPH:MINOVERLAP
minimum length of the overlap between the sequence and
reads to participate in the extension. [default:
<estimated>]
.. option:: --minratio BUILDGRAPH:MINRATIO
minimum ratio between occurrences of an extension and
the occurrences of the most frequent extension to keep
it. [default: <estimated>]
.. option:: --mincov BUILDGRAPH:MINCOV
minimum occurrences of an extension to keep it.
[default: 1]
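The way ``--minratio`` and ``--mincov`` combine can be sketched as a filter over the candidate extensions. This is only a reading of the two option descriptions above: the function name ``filter_extensions``, the dictionary of counts and the use of inclusive comparisons are assumptions, not the assembler's actual code.

```python
def filter_extensions(counts, minratio, mincov=1):
    """Keep the extensions that pass both thresholds.

    counts   -- dict mapping an extension (e.g. a base) to its occurrences
    minratio -- minimum ratio to the most frequent extension (assumed inclusive)
    mincov   -- minimum number of occurrences (assumed inclusive)
    """
    if not counts:
        return {}
    best = max(counts.values())
    return {
        extension: n
        for extension, n in counts.items()
        if n >= mincov and n / best >= minratio
    }


if __name__ == "__main__":
    # Made-up counts: "G" is seen once and is dropped by both thresholds.
    print(filter_extensions({"A": 20, "C": 5, "G": 1}, minratio=0.2, mincov=2))
```

With these made-up counts, ``C`` is kept because 5 / 20 = 0.25 is above the ratio of 0.2, while ``G`` fails both tests.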
Graph filtering options
+++++++++++++++++++++++
.. option:: --lowcomplexity
Also use low complexity probes
.. option:: --adapt5 adapt5
adapter sequences used to filter reads beginning with
such sequences; either a fasta file containing adapter
sequences or internal set of adapter sequences among
['adapt5ILLUMINA'] [default: adapt5ILLUMINA]
.. option:: --adapt3 adapt3
adapter sequences used to filter reads ending with such
sequences; either a fasta file containing adapter
sequences or internal set of adapter sequences among
['adapt3ILLUMINA'] [default: adapt3ILLUMINA]
Graph limit option
++++++++++++++++++
.. option:: --assmax BUILDGRAPH:ASSMAX
Maximum number of base pairs assembled
Graph cleaning options
++++++++++++++++++++++
.. option:: --smallbranches BUILDGRAPH:SMALLBRANCHES
After a cycle of extension, if you look at the assembling graph
you can observe a main path and many small aborted branches surrounding
this main path. They correspond to paths initiated by a sequencing
error or by a nuclear copy of a chloroplast region not sufficiently covered by
the skimming sequencing to be successfully extended.
One of the cleaning steps consists in deleting these small branches.
This option indicates up to which length branches have to be deleted.
By default this length is automatically estimated from the graph.
.. code-block:: bash
$ oa buildgraph --seeds protChloroArabidopsis \
--smallbranches 15 seqindex
During the cleaning steps, all the branches with a length
shorter than or equal to 15 base pairs will be deleted.
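The idea of deleting small aborted branches can be sketched on a toy graph. The code below is a simplified illustration under several assumptions: the graph is a plain dictionary of successors, a branch is a linear dead end hanging off a node with several successors, and its length is counted in nodes (one base per node in a De Bruijn graph). The function name ``prune_small_branches`` is hypothetical and the real cleaning process may differ.

```python
def prune_small_branches(graph, max_length):
    """Return a copy of `graph` without dead-end branches of at most
    `max_length` nodes. `graph` maps each node to its set of successors."""
    nodes = set(graph) | {n for succ in graph.values() for n in succ}
    succs = {n: set(graph.get(n, ())) for n in nodes}
    preds = {n: set() for n in nodes}
    for node, nexts in succs.items():
        for nxt in nexts:
            preds[nxt].add(node)

    to_delete = set()
    for tip in (n for n in nodes if not succs[n]):
        branch = [tip]
        # Walk backwards while the path stays linear.
        while len(preds[branch[-1]]) == 1:
            parent = next(iter(preds[branch[-1]]))
            if len(succs[parent]) > 1:
                # Branching point reached: the dead end is complete.
                if len(branch) <= max_length:
                    to_delete.update(branch)
                break
            branch.append(parent)

    return {n: succs[n] - to_delete for n in nodes if n not in to_delete}


if __name__ == "__main__":
    # Main path a..f with a two-node aborted branch x -> y starting at c.
    toy = {"a": {"b"}, "b": {"c"}, "c": {"d", "x"}, "d": {"e"},
           "e": {"f"}, "x": {"y"}}
    print(sorted(prune_small_branches(toy, max_length=2)))
```

All decisions are taken on the unmodified graph before anything is deleted, so the result does not depend on the order in which the dead ends are visited.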
.. option:: --snp
When the data set corresponds to a pool of individuals, it is possible
that natural polymorphisms artificially complicate the assembling graph.
To help the assembling process of such data sets, this option
clears the graph of such SNPs by keeping only the most abundant allele
present in the dataset. The generated sequence can be considered as a
kind of consensus. Reads can be remapped afterwards on this
consensus using classical software like `BWA <bwa_>`_ to recover the lost SNP
information.
By default this option is deactivated.
.. code-block:: bash
$ oa fillgaps --seeds protChloroArabidopsis \
--snp seqindex
Run the assembling, ignoring the SNPs.
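The notion of keeping only the most abundant allele can be illustrated on a handful of already aligned reads. This is a deliberately naive sketch: the assembler works on its graph, not on aligned columns, and the function name ``majority_consensus`` and the toy reads are assumptions made for the example.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return, for each column of equally long aligned reads,
    the most frequent base."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )


if __name__ == "__main__":
    # Made-up pool: one individual carries a T at the second position.
    print(majority_consensus(["ACGT", "ACGT", "ATGT"]))
```

The minority allele disappears from the consensus, which is why the reads have to be remapped if the SNP information is needed.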
Gap filling option
++++++++++++++++++
.. option:: --back ORGASM:BACK
The number of bases taken at the end of contigs to
jump with paired ends [default: <estimated>]
.. _`bwa`: http://sourceforge.net/projects/bio-bwa/
.. _`yed`: https://www.yworks.com/en/products/yfiles/yed/
.. _oa_seeds:
The :program:`seeds` command
============================
The :ref:`organelle assembler <oa>`'s :program:`seeds` computes the set
of seed reads. The main purpose of this command is to write a new version
of the file containing the set of seed reads, because its format changed.
.. figure:: ../oa-seeds.*
:align: center
:figwidth: 80 %
:width: 500
The :ref:`organelle assembler <oa>`'s :program:`seeds` command
executes only the red task
.. note::
Most users do not need this command, because this task is performed automatically
by the :ref:`oa buildgraph <oa_buildgraph>` command.
command prototype
-----------------
.. program:: oa seeds
.. code-block:: none
usage: oa seeds [-h] [--seeds seeds] [--kup ORGASM:KUP]
index [output]
.. include:: ../options/positional.txt
optional arguments
------------------
General option
++++++++++++++
.. option:: -h, --help
Shows the help message and exits
Graph initialisation options
++++++++++++++++++++++++++++
.. _seeds.seeds:
.. include:: ../options/seeds.txt
.. _seeds.kup:
.. include:: ../options/kup.txt
......@@ -11,6 +11,7 @@ Contents:
.. toctree::
:maxdepth: 3
install
strategies
oa
mitochondrion
......
Installing |orgasm|
===================
Availability of |orgasm|
........................
|Orgasm| is open source and released under the CeCILL 2.1 license
(`http://www.cecill.info/licences/Licence_CeCILL_V2.1-en.html <http://www.cecill.info/licences/Licence_CeCILL_V2.1-en.html>`_).
|Orgasm| is deposited on the Python Package Index (PyPI: `https://pypi.python.org/pypi/ORG.asm <https://pypi.python.org/pypi/ORG.asm>`_)
and all the sources can be downloaded from the `metabarcoding.org <http://metabarcoding.org>`_ GitLab server
(`https://git.metabarcoding.org/org-asm/org-asm <https://git.metabarcoding.org/org-asm/org-asm>`_).
Prerequisites
.............
To install |orgasm|, the following software must be installed on your
system:
* Python 3.4 (installed by default on most ``Unix`` systems, available from
`the Python website <http://www.python.org/>`_)
* ``gcc`` (installed by default on most ``Unix`` systems, available from the
GNU sites dedicated to `GCC <https://www.gnu.org/software/gcc/>`_ and
`GMake <https://www.gnu.org/software/make/>`_)
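A quick way to check these prerequisites before installing is sketched below. It relies only on the Python standard library; the function name ``check_prerequisites`` and the shape of the returned report are choices made for this example, not something shipped with |orgasm|.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 4), tools=("gcc",)):
    """Report whether the running Python is recent enough and whether
    each required tool is found on the PATH."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not found.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(name, "OK" if ok else "MISSING")
```

Run it with the same ``python3`` interpreter that you intend to use for the installation, since the version test applies to the interpreter executing the script.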
On a Linux system
^^^^^^^^^^^^^^^^^
You have to take care that the Python development packages are installed.
On MacOSX
^^^^^^^^^
The C compiler and all the other compilation tools are included in the `XCode <https://itunes.apple.com/fr/app/xcode/id497799835?mt=12>`_
application, which is not installed by default. Python 3 is not provided by default either. You have to install a complete distribution
of Python, which you can download as a `MacOSX package from the Python website <https://www.python.org/downloads/>`_.
Downloading and installing |orgasm|
...................................
|orgasm| is downloaded and installed using the :download:`get-orgasm.py <../../../get_orgasm/get-orgasm.py>` script.
This is a user-level installation that does not need administrator privileges.
Once downloaded, move the file :download:`get-orgasm.py <../../../get_orgasm/get-orgasm.py>` into the directory where you want to install
|orgasm|. From a Unix terminal, you must now run the command:
.. code-block:: bash
> python3 get-orgasm.py
The script will create a new directory, in the place where you run it, in which
|orgasm| will be installed. No system privileges are required, and your system will not
be altered in any way by the |orgasm| installation.
The newly created directory is named ``ORG.asm-VERSION``, where ``VERSION`` is replaced by the
latest version number available.
|orgasm| is entirely installed inside the newly created directory. Next to this directory
there is a shell script named ``orgasm``. Running this script activates |orgasm|
by reconfiguring your Unix environment.
.. code-block:: bash
> ./orgasm
Once activated, you can deactivate |orgasm| by typing the command ``exit``.
.. code-block:: bash
> exit
ORG.asm are no more activated, Bye...
=====================================
System level installation
.........................
To install |orgasm| at the system level you have two options:
- Copy the ``orgasm`` script into a usual directory for installing programs, like ``/usr/local/bin``,
but never move the ``ORG.asm`` directory itself after the installation by the
:download:`get-orgasm.py <../../../get_orgasm/get-orgasm.py>` script.
- The other solution is to add the ``export/bin`` directory located in the ``ORG.asm`` directory
to the ``PATH`` environment variable.
Retrieving the sources of |orgasm|
..................................
If you want to compile |orgasm| yourself, you will need to install the same
prerequisites, together with the following Python packages:
.. code-block:: bash
> pip3 install -U pip
> pip3 install -U sphinx
> pip3 install -U cython
Moreover, you need to install a git client (a list of clients is available from the `GIT website <https://git-scm.com/downloads>`_).
Then you can download the sources:
.. code-block:: bash
> git clone https://git.metabarcoding.org/org-asm/org-asm.git
This command will create a new directory called ``org-asm``.
Compiling and installing |orgasm|
.................................
From the directory where you retrieved the sources, execute the following commands:
.. code-block:: bash
> cd org-asm
> python3 setup.py --serenity install
Once installed, you can test your installation by running the commands of the
:doc:`tutorials <./tutorials>`.
......@@ -4,7 +4,34 @@ The ORGanelle ASseMbler command line
====================================
The ORGanelle ASseMbler commands
--------------------------------
.. figure:: command-flowgram.*
:align: center
:figwidth: 80 %
:width: 500
The :ref:`organelle assembler <oa>` provides a set of commands.
Several of these commands (at least three) have to be executed to complete
an assembling process.
- The bold green path indicates the minimal succession of
commands you need to run to assemble a sequence from a set of Illumina reads.
- The green dotted path indicates an alternative succession of commands
commonly run to achieve the assembling process.
- The fine blue dotted arrows indicate the data used by each of the commands.
- The fine red dotted arrows indicate the final results provided by commands.
- The orange boxed commands correspond to utility commands not required for the
assembling but sometimes useful to get or restore some information.
.. toctree::
:maxdepth: 3
:maxdepth: 2
commands
preparing
assembling
finishing
unfolding
utilities
.. option:: --kup ORGASM:KUP
The word size used to identify the seed reads
[default: protein=4, DNA=12].
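To make the role of the word size concrete, the sketch below selects the reads sharing at least one word of length ``k`` with a seed sequence. It is a simplified illustration for DNA seeds only (forward strand, exact word matches); the function names ``kmers`` and ``select_seed_reads`` are hypothetical and the way the assembler really matches reads, in particular against protein seeds, is not described here.

```python
def kmers(sequence, k):
    """Return the set of all words of length k found in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def select_seed_reads(reads, seed_sequences, k=12):
    """Keep the reads sharing at least one word of length k with a seed."""
    seed_words = set()
    for seed in seed_sequences:
        seed_words |= kmers(seed, k)
    return [read for read in reads if kmers(read, k) & seed_words]


if __name__ == "__main__":
    seeds = ["ACGTACGTACGTAAAA"]                      # made-up seed
    reads = ["TTACGTACGTACGTTT", "GGGGGGGGGGGGGGGG"]  # made-up reads
    print(select_seed_reads(reads, seeds, k=12))
```

In such a scheme a larger word size makes the selection stricter, while a smaller one lets more reads through.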
.. option:: --seeds seeds
Seed sequences; either a fasta file containing seed
sequences (nucleic or protein) or the name of an internal
set of seeds among:
- ``nucrRNAAHypogastrura``
- ``nucrRNAArabidopsis``
- ``protChloroArabidopsis``
- ``protMitoCapra``
- ``protMitoMachaon``
.. code-block:: bash
$ oa buildgraph --seeds protChloroArabidopsis seqindex