Commit dd4aea99 by Eric Coissac

continue documentation

parent 8a6849e8
The ORGanelle ASeMbler algorithms
==================================
The organelle assembler commands
The ORGanelle ASseMbler commands
================================
.. toctree::
......
......@@ -6,3 +6,150 @@ The :program:`buildgraph` command
The :ref:`organelle assembler <oa>`'s :program:`buildgraph` command
assembles the reads by building the De Bruijn graph, which
is the central data structure used by the :ref:`organelle assembler <oa>`.
command prototype
-----------------
.. program:: oa buildgraph
.. code-block:: none
usage: oa buildgraph [-h] --seeds seeds
[--adapt5 adapt5] [--adapt3 adapt3]
[--coverage BUILDGRAPH:COVERAGE]
[--lowcomplexity]
[--minread BUILDGRAPH:MINREAD]
[--minoverlap BUILDGRAPH:MINOVERLAP]
[--minratio BUILDGRAPH:MINRATIO]
[--mincov BUILDGRAPH:MINCOV]
[--assmax BUILDGRAPH:ASSMAX]
[--smallbranches BUILDGRAPH:SMALLBRANCHES]
[--back ORGASM:BACK] [--snp]
index [output]
positional arguments
--------------------
.. option:: index
index root filename (produced by the oa index command)
.. option:: output
output prefix
optional arguments
------------------
General option
++++++++++++++
.. option:: -h, --help
show the help message and exit
Assembling initiation option
++++++++++++++++++++++++++++
.. figure:: ../extension.*
:align: center
:figwidth: 80 %
:width: 500
The assembling stack
.. option:: --seeds seeds
Seed sequences; either a fasta file containing seeds
sequences (nucleic or proteic) or the name of an internal
set of seeds among:
- ``nucrRNAAHypogastrura``
- ``nucrRNAArabidopsis``
- ``protChloroArabidopsis``
- ``protMitoCapra``
- ``protMitoMachaon``
.. code-block:: bash
$ oa buildgraph --seeds protChloroArabidopsis seqindex
Graph extension options
+++++++++++++++++++++++
.. option:: --minread BUILDGRAPH:MINREAD
minimum number of reads to consider [default:
<estimated>]
.. code-block:: bash
$ oa buildgraph --seeds protChloroArabidopsis --minread 5 seqindex
Consider an extension if at least five reads are present in the extension
stack.
.. option:: --coverage BUILDGRAPH:COVERAGE
the expected sequencing coverage [default:
<estimated>]
.. option:: --minratio BUILDGRAPH:MINRATIO
minimum ratio between occurrences of an extension and
the occurrences of the most frequent extension to keep
it. [default: <estimated>]
.. option:: --mincov BUILDGRAPH:MINCOV
minimum occurrences of an extension to keep it.
[default: 1]
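As a rough illustration of how ``--minratio`` and ``--mincov`` could act together, the Python sketch below filters candidate extensions by their occurrence counts. The helper name ``keep_extensions`` and the way the two thresholds are combined are assumptions made for this example; they are not taken from the assembler's code.

```python
def keep_extensions(counts, minratio, mincov=1):
    """Return the extensions passing both thresholds (hypothetical helper).

    counts   -- dict mapping an extension (e.g. a nucleotide) to its occurrences
    minratio -- minimum ratio relative to the most frequent extension
    mincov   -- minimum absolute number of occurrences
    """
    if not counts:
        return {}
    best = max(counts.values())
    if best == 0:
        return {}
    # Assumption: an extension is kept only if it satisfies both criteria.
    return {ext: n for ext, n in counts.items()
            if n >= mincov and float(n) / best >= minratio}


kept = keep_extensions({"A": 40, "C": 3, "G": 12}, minratio=0.25, mincov=5)
print(sorted(kept))
```

In this toy case ``C`` is dropped because it occurs fewer than ``mincov`` times, while ``G`` is kept because 12/40 is above the requested ratio.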
.. option:: --assmax BUILDGRAPH:ASSMAX
maximum number of base pairs assembled
.. option:: --minoverlap BUILDGRAPH:MINOVERLAP
minimum length of the overlap between the sequence and
reads to participate in the extension. [default:
<estimated>]
.. option:: --lowcomplexity
Also use low-complexity probes
.. option:: --adapt5 adapt5
adapter sequences used to filter reads beginning with
such sequences; either a fasta file containing adapter
sequences or an internal set of adapter sequences among
['adapt5ILLUMINA'] [default: adapt5ILLUMINA]
.. option:: --adapt3 adapt3
adapter sequences used to filter reads ending with such
sequences; either a fasta file containing adapter
sequences or an internal set of adapter sequences among
['adapt3ILLUMINA'] [default: adapt3ILLUMINA]
Graph cleaning
++++++++++++++
.. option:: --smallbranches BUILDGRAPH:SMALLBRANCHES
maximum length of the branches to cut during the
cleaning process [default: <estimated>]
.. option:: --back ORGASM:BACK
The number of bases taken at the end of contigs to
jump with paired-ends [default: <estimated>]
.. option:: --snp
Deactivate the SNP clearing mode
.. _`yed`: https://www.yworks.com/en/products/yfiles/yed/
......@@ -3,17 +3,16 @@
The :program:`index` command
============================
The :ref:`organelle assembler <oa>`'s :program:`index` command indexes
|Orgasm|'s :program:`index` command indexes
lexicographically a set of sequence reads to make them usable by the
assembler.
The constraints of the :ref:`organelle assembler <oa>`
------------------------------------------------------
The constraints of the |orgasm|
-------------------------------
The :ref:`organelle assembler <oa>` was developed to deal with Illumina
paired-end reads. Consequently, the algorithm of the
:ref:`organelle assembler <oa>` requires that the indexed reads respect
several constraints.
|Orgasm| was developed to deal with Illumina
paired-end reads. Consequently, the algorithm of |orgasm|
requires that the indexed reads respect several constraints.
- The reads must be paired.
- They must all have the same length.
......@@ -40,14 +39,17 @@ These options allow for indicating:
They also make it possible to define the read length following one of three strategies.
Setting up the length of the indexed reads
------------------------------------------
The default strategy
++++++++++++++++++++
When nothing is specified the :ref:`organelle assembler <oa>`
:program:`index <oa_index>` command considers that all the reads from the
dataset have the same length. Considering this, the actual read length of
the dataset is estimated from the first read of the forward file.
If the read length is even, it is decreased by one.
When nothing is specified |orgasm| :program:`index <oa_index>` command
considers that all the reads from the dataset have the same length.
Considering this, the actual read length of the dataset is estimated
from the first read of the forward file. If the read length is even,
it is decreased by one.
During the indexing procedure, reads shorter than the limit are discarded,
reads longer than the limit are trimmed on their 3' end to fit the required
......@@ -57,16 +59,24 @@ code are also discarded.
The user defined read length
++++++++++++++++++++++++++++
Using the option :ref:`--length <opt_idx_length>`
Using the option :ref:`--length <opt_idx_length>`, users can specify the read
length to index. If the specified read length is even, it is decreased by one.
As for the default strategy, reads shorter than the specified limit are
discarded, and reads longer than the limit are trimmed on their 3' end to fit
the required length. After trimming, reads containing :ref:`IUPAC <iupac_code>`
ambiguity codes are also discarded.
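The length rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour (odd read length, discarding of short reads, 3' trimming, rejection of ambiguous reads); the function names ``effective_length`` and ``prepare_read`` are hypothetical and do not come from the :program:`index` implementation.

```python
# IUPAC nucleotide ambiguity codes (everything except A, C, G, T).
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def effective_length(length):
    """An even read length is decreased by one, an odd one is kept."""
    return length if length % 2 == 1 else length - 1


def prepare_read(seq, length):
    """Return the read as it would be indexed, or None if it is discarded."""
    limit = effective_length(length)
    if len(seq) < limit:
        return None                      # too short: discarded
    trimmed = seq[:limit]                # trimming the 3' end keeps the 5' prefix
    if any(base in IUPAC_AMBIGUOUS for base in trimmed.upper()):
        return None                      # ambiguity code left after trimming
    return trimmed


print(effective_length(100))
print(prepare_read("ACGTACGT", 6))
```

Note that the ambiguity check is applied after trimming, so a read whose only ambiguous base lies in the trimmed 3' part is still kept in this sketch.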
The estimated read length
+++++++++++++++++++++++++
Running the :program:`index`
One of the most common ways of running the :program:`index` command looks like
the following command:
The most common way to run the index command
--------------------------------------------
The basic unix command for running the :program:`index` command looks like
this:
.. code-block:: bash
......@@ -78,41 +88,65 @@ to the following command:
The :program:`index` command creates four files:
- <index>.ogx : contains information concerning the index
- <index>.ofx : contains the sequences themselves and the forward index
- <index>.orx : contains reverse index
- <index>.opx : contains read pairing data
- ``<index>.ogx`` : contains information concerning the index
- ``<index>.ofx`` : contains the sequences themselves and the forward index
- ``<index>.orx`` : contains reverse index
- ``<index>.opx`` : contains read pairing data
The :ref:`organelle assembler <oa>` needs all of these files to perform the assembly.
`<index>` represents the name of the index that will be used later by the assembler.
|Orgasm| needs all of these files to perform the assembly.
``<index>`` represents the name of the index that will be used later by the assembler.
A fifth file named ``<index>.log`` contains the traces generated by the indexation
process.
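Before launching an assembly, it can be handy to verify that the four index files are present. The following Python sketch does so from the extensions listed above; the helper ``missing_index_files`` is a hypothetical convenience function, not part of the ``oa`` tool.

```python
import os

# Extensions of the four files written by the index command.
INDEX_EXTENSIONS = (".ogx", ".ofx", ".orx", ".opx")


def missing_index_files(index):
    """Return the index files that do not exist for the given index name."""
    return [index + ext for ext in INDEX_EXTENSIONS
            if not os.path.isfile(index + ext)]


missing = missing_index_files("seqindex")
if missing:
    print("Incomplete index, missing: %s" % ", ".join(missing))
else:
    print("Index is complete")
```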
command prototype
-----------------
.. code-block:: none
$ oa index [--single | --mate-pairs] \
[--check-ids] [--check-pairing] \
[--max-read ###] \
[--length ### | --estimate-length #.##] \
[--fasta | --forward-fasta | --reverse-fasta] \
[--no-pipe] \
<index> <forward_fastq_file> [reverse_fastq_file]
usage: $ oa index [-h] [--single | --mate-pairs]
[--check-ids] [--check-pairing]
[--max-read ###]
[--length ### | --estimate-length #.##]
[--fasta | --forward-fasta | --reverse-fasta]
[--no-pipe]
<index> <forward_fastq_file> [reverse_fastq_file]
positional arguments
--------------------
.. option:: index
Name of the produced index
options
-------
.. option:: forward
Filename of the forward reads
.. option:: reverse
Filename of the reverse reads
optional arguments
------------------
.. program:: oa index
General option
++++++++++++++
.. option:: -h, --help
show the help message and exit
Sequencing strategy
+++++++++++++++++++
.. cmdoption:: --single
.. option:: --single
Single read mode.
.. cmdoption:: --mate-pairs
.. option:: --mate-pairs
Indicates that the two read files were obtained using a mate pair
sequencing strategy.
......@@ -120,13 +154,13 @@ Sequencing strategy
Sequence file checking
++++++++++++++++++++++
.. cmdoption:: --check-ids
.. option:: --check-ids
Checks that forward and reverse ids are identical.
The two sequence ids `seqid/1` and `seqid/2` are considered as
identical.
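The identifier comparison can be pictured with the short Python sketch below, which ignores a trailing ``/1`` or ``/2`` before comparing the two ids. The function names are hypothetical and the exact rule applied by ``--check-ids`` may differ.

```python
def base_id(seqid):
    """Strip a trailing /1 or /2 pair marker from a sequence id."""
    if seqid.endswith("/1") or seqid.endswith("/2"):
        return seqid[:-2]
    return seqid


def same_pair(forward_id, reverse_id):
    """True when both ids designate the same read pair."""
    return base_id(forward_id) == base_id(reverse_id)


print(same_pair("seqid/1", "seqid/2"))
```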
.. cmdoption:: --check-pairing
.. option:: --check-pairing
Ensure that forward and reverse files are correctly paired.
The pairing is checked based on the sequence identifier.
......@@ -136,7 +170,7 @@ Sequence file checking
Limit for the indexation
++++++++++++++++++++++++
.. cmdoption:: --max-read ###
.. option:: --max-read ###
`###` indicates the number of millions of reads to index. If not
specified all the reads are indexed within the limit imposed by
......@@ -150,7 +184,7 @@ Limit for the indexation
.. _opt_idx_length:
.. cmdoption:: --length ###
.. option:: --length ###
`###` represents the read length to consider. Only reads
with a length greater or equal to `###` will be indexed. Reads longer
......@@ -167,7 +201,7 @@ Limit for the indexation
from the length of the first read of the forward file or through the
:option:`--estimate-length #.##` option.
.. cmdoption:: --estimate-length #.##
.. option:: --estimate-length #.##
`#.##` ranging between 0.0 and 1.0, indicates which fraction
of the overall dataset we want to use. When this option is used
......@@ -183,7 +217,7 @@ Limit for the indexation
Sequence file format
++++++++++++++++++++
.. cmdoption:: --fasta
.. option:: --fasta
Indicates that the two sequence files to index are :ref:`fasta <fasta>` files.
......@@ -191,7 +225,7 @@ Sequence file format
$ oa index --fasta seqindex forward.fasta reverse.fasta
.. cmdoption:: --forward-fasta
.. option:: --forward-fasta
Indicates that the forward file is a fasta file
......@@ -199,7 +233,7 @@ Sequence file format
$ oa index --forward-fasta seqindex forward.fasta reverse.fastq
.. cmdoption:: --reverse-fasta
.. option:: --reverse-fasta
Indicates that the reverse file is a fasta file
......@@ -224,7 +258,7 @@ compressed with `bzip2`_.
System option
+++++++++++++
.. cmdoption:: --no-pipe
.. option:: --no-pipe
By default the :ref:`organelle assembler <oa>` uses named pipes to transfer
data among programs. Using this option you can enforce to use
......
......@@ -65,6 +65,11 @@ master_doc = 'index'
project = u'Organelle Assembler'
copyright = u'2014, Frédéric Boyer, Eric Coissac, Alain Viari'
rst_epilog = """
.. |orgasm| replace:: :ref:`the ORGanelle ASeMbler <oa>`
.. |Orgasm| replace:: :ref:`The ORGanelle ASeMbler <oa>`
"""
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
......
......@@ -3,18 +3,18 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Organelle Assembler's documentation!
===============================================
Org.Asm: |Orgasm|
=================
Contents:
.. toctree::
:maxdepth: 2
:maxdepth: 3
Sequencing strategies and file formats <strategies>
The ORGanelle ASeMbler ``oa`` command line tool <oa>
|Orgasm| ``oa`` command line tool <oa>
Assembling a mitochondrion genome <mitochondrion>
Algorithms <algorithms>
algorithms
Indices and tables
......
.. _oa:
The organelle assembler command line
The ORGanelle ASseMbler command line
====================================
......
......@@ -37321,9 +37321,7 @@ def installModules():
except:
print >>sys.stderr,'Unable to install %s on your system' % SOFT
print install_zip
def cleanModules():
if tmpdir:
#!/usr/bin/env python
import os
import optparse
import sys
import re
from pip.exceptions import InstallationError, CommandError, PipError
from pip.log import logger
from pip.util import get_installed_distributions, get_prog
from pip.vcs import git, mercurial, subversion, bazaar # noqa
from pip.baseparser import ConfigOptionParser, UpdatingDefaultsHelpFormatter
from pip.commands import commands, get_summaries, get_similar_commands
# This fixes a peculiarity when importing via __import__ - as we are
# initialising the pip module, "from pip import cmdoptions" is recursive
# and appears not to work properly in that situation.
import pip.cmdoptions
cmdoptions = pip.cmdoptions
# The version as used in the setup.py and the docs conf.py
__version__ = "1.5.6"
def autocomplete():
    """Command and option completion for the main option parser (and options)
    and its subcommands (and options).
    Enable by sourcing one of the completion shell scripts (bash or zsh).
    """
    # Don't complete if user hasn't sourced bash_completion file.
    if 'PIP_AUTO_COMPLETE' not in os.environ:
        return
    cwords = os.environ['COMP_WORDS'].split()[1:]
    cword = int(os.environ['COMP_CWORD'])
    try:
        current = cwords[cword - 1]
    except IndexError:
        current = ''
    subcommands = [cmd for cmd, summary in get_summaries()]
    options = []
    # subcommand
    try:
        subcommand_name = [w for w in cwords if w in subcommands][0]
    except IndexError:
        subcommand_name = None
    parser = create_main_parser()
    # subcommand options
    if subcommand_name:
        # special case: 'help' subcommand has no options
        if subcommand_name == 'help':
            sys.exit(1)
        # special case: list locally installed dists for uninstall command
        if subcommand_name == 'uninstall' and not current.startswith('-'):
            installed = []
            lc = current.lower()
            for dist in get_installed_distributions(local_only=True):
                if dist.key.startswith(lc) and dist.key not in cwords[1:]:
                    installed.append(dist.key)
            # if there are no dists installed, fall back to option completion
            if installed:
                for dist in installed:
                    print(dist)
                sys.exit(1)
        subcommand = commands[subcommand_name]()
        options += [(opt.get_opt_string(), opt.nargs)
                    for opt in subcommand.parser.option_list_all
                    if opt.help != optparse.SUPPRESS_HELP]
        # filter out previously specified options from available options
        prev_opts = [x.split('=')[0] for x in cwords[1:cword - 1]]
        options = [(x, v) for (x, v) in options if x not in prev_opts]
        # filter options by current input
        options = [(k, v) for k, v in options if k.startswith(current)]
        for option in options:
            opt_label = option[0]
            # append '=' to options which require args
            if option[1]:
                opt_label += '='
            print(opt_label)
    else:
        # show main parser options only when necessary
        if current.startswith('-') or current.startswith('--'):
            opts = [i.option_list for i in parser.option_groups]
            opts.append(parser.option_list)
            opts = (o for it in opts for o in it)
            subcommands += [i.get_opt_string() for i in opts
                            if i.help != optparse.SUPPRESS_HELP]
        print(' '.join([x for x in subcommands if x.startswith(current)]))
    sys.exit(1)
def create_main_parser():
    parser_kw = {
        'usage': '\n%prog <command> [options]',
        'add_help_option': False,
        'formatter': UpdatingDefaultsHelpFormatter(),
        'name': 'global',
        'prog': get_prog(),
    }
    parser = ConfigOptionParser(**parser_kw)
    parser.disable_interspersed_args()
    pip_pkg_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    parser.version = 'pip %s from %s (python %s)' % (
        __version__, pip_pkg_dir, sys.version[:3])
    # add the general options
    gen_opts = cmdoptions.make_option_group(cmdoptions.general_group, parser)
    parser.add_option_group(gen_opts)
    parser.main = True  # so the help formatter knows
    # create command listing for description
    command_summaries = get_summaries()
    description = [''] + ['%-27s %s' % (i, j) for i, j in command_summaries]
    parser.description = '\n'.join(description)
    return parser
def parseopts(args):
    parser = create_main_parser()
    # Note: parser calls disable_interspersed_args(), so the result of this call
    # is to split the initial args into the general options before the
    # subcommand and everything else.
    # For example:
    #  args: ['--timeout=5', 'install', '--user', 'INITools']
    #  general_options: ['--timeout==5']
    #  args_else: ['install', '--user', 'INITools']
    general_options, args_else = parser.parse_args(args)
    # --version
    if general_options.version:
        sys.stdout.write(parser.version)
        sys.stdout.write(os.linesep)
        sys.exit()
    # pip || pip help -> print_help()
    if not args_else or (args_else[0] == 'help' and len(args_else) == 1):
        parser.print_help()
        sys.exit()
    # the subcommand name
    cmd_name = args_else[0].lower()
    #all the args without the subcommand
    cmd_args = args[:]
    cmd_args.remove(args_else[0].lower())
    if cmd_name not in commands:
        guess = get_similar_commands(cmd_name)
        msg = ['unknown command "%s"' % cmd_name]
        if guess:
            msg.append('maybe you meant "%s"' % guess)
        raise CommandError(' - '.join(msg))
    return cmd_name, cmd_args
def main(initial_args=None):
    if initial_args is None:
        initial_args = sys.argv[1:]
    autocomplete()
    try:
        cmd_name, cmd_args = parseopts(initial_args)
    except PipError:
        e = sys.exc_info()[1]
        sys.stderr.write("ERROR: %s" % e)
        sys.stderr.write(os.linesep)
        sys.exit(1)
    command = commands[cmd_name]()
    return command.main(cmd_args)
def bootstrap():
    """
    Bootstrapping function to be called from install-pip.py script.
    """
    pkgs = ['pip']
    try:
        import setuptools
    except ImportError:
        pkgs.append('setuptools')
    return main(['install', '--upgrade'] + pkgs + sys.argv[1:])