Commit 949fca88 by Frédéric Boyer

Modification of 'Orgasm principles'

parent 7be51454
Raw sequencing results (after adapter trimming) are usually provided in the fastq format, the raw result of the assembly in fasta format and the annotated result (with CDS, tRNA, ...) in the EMBL format.
.. toctree::
:maxdepth: 2
fasta
fastq
embl
@@ -3,26 +3,7 @@
The ORGanelle ASseMbler principles
==================================
Sequencing strategies and file formats
--------------------------------------
Low-coverage shotgun sequencing of genomic DNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The data resulting from low-coverage shotgun sequencing of genomic DNA (gDNA), also known as genome skimming, are the primary data used by ``ORG.asm``. If we hypothesize that the organelle genomes represent several percent of the total gDNA, even with a modest depth of sequencing of the nuclear genome (around 1x coverage), one can hope to get more than 100x coverage for the organelle genomes and repeated regions (such as rDNA clusters). This allows the reconstruction of organelle genomes and repeated regions for up to 48 samples loaded in the same HiSeq 2500 lane.
Raw sequencing results (after adapter trimming) are usually provided in the fastq format, the raw result of the assembly in fasta format and the annotated result (with CDS, tRNA, ...) in the EMBL format.
The file formats
^^^^^^^^^^^^^^^^
.. toctree::
:maxdepth: 2
fasta
fastq
embl
.. include:: ./strategy.txt
The ORGanelle ASseMbler commands
@@ -53,7 +34,6 @@ an assembling process.
- The orange boxed commands correspond to utility commands not required for the
assembling but sometimes useful to get or restore some information.
The set of sub-commands can be split into several categories corresponding to
the main steps of the assembling procedure.
@@ -65,3 +45,11 @@ the main steps of the assembling procedure.
finishing
unfolding
utilities
The file formats
================
.. include:: ./formats.txt
Sequencing strategy: Low-coverage shotgun sequencing of genomic DNA
-------------------------------------------------------------------
The data resulting from low-coverage shotgun sequencing of genomic DNA (gDNA), also known as genome skimming, are the primary data used by ``ORG.asm``. If we hypothesize that the organelle genomes represent several percent of the total gDNA (organellar genomes can be present in more than 1000 copies in a single cell), even with a modest depth of sequencing of the nuclear genome (around 1x coverage), one can hope to get more than 100x coverage for the organelle genomes and repeated regions (such as rDNA clusters). This allows the reconstruction of organelle genomes and repeated regions for up to 48 samples loaded in the same HiSeq 2500 lane.
For example, suppose you sequence 3 × 10^6 paired-end reads, i.e. 6 × 10^6 reads of 100 bp:
====================  ============  ============
Organelle             Chloroplast   Mitochondria
====================  ============  ============
Fraction of reads     5%            0.5%
Effective reads       300,000       30,000
Base pairs            30 × 10^6     3 × 10^6
Genome size           150 kb        16 kb
Sequencing depth      200X          187X
====================  ============  ============
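
The arithmetic behind the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name ``organelle_depth`` and its parameters are hypothetical and are not part of ``ORG.asm``. The read fractions and genome sizes are the ones assumed in the table.

```python
def organelle_depth(total_reads, read_length, organelle_fraction, genome_size):
    """Estimate the sequencing depth of an organelle genome.

    Hypothetical helper, not an ORG.asm function.
    Returns a tuple (effective_reads, base_pairs, depth).
    """
    # Reads expected to originate from the organelle
    effective_reads = total_reads * organelle_fraction
    # Total number of sequenced bases belonging to the organelle
    base_pairs = effective_reads * read_length
    # Average number of times each genome position is covered
    depth = base_pairs / genome_size
    return effective_reads, base_pairs, depth


# 3 x 10^6 read pairs give 6 x 10^6 single reads of 100 bp
TOTAL_READS = 2 * 3_000_000
READ_LENGTH = 100

# Fractions and genome sizes taken from the table (assumed values)
chloroplast = organelle_depth(TOTAL_READS, READ_LENGTH, 0.05, 150_000)
mitochondria = organelle_depth(TOTAL_READS, READ_LENGTH, 0.005, 16_000)

for name, (reads, bases, depth) in [("Chloroplast", chloroplast),
                                    ("Mitochondria", mitochondria)]:
    print(f"{name}: {reads:.0f} reads, {bases:.0f} bp, {depth:.1f}X")
```

Changing the organelle fraction or the number of reads shows how quickly the expected depth drops when more samples share a lane.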
This can be further observed in the k-mer frequency spectrum of a low-coverage shotgun sequencing of a plant genome.
The spectrum shows a particular shape, with a high number of low-frequency k-mers and a bimodal distribution at intermediate and high frequencies.
This shape can be explained by the mix of low-coverage sequencing of the nuclear genome and high-coverage sequencing of the chloroplast genome. Indeed, the nuclear genome sequencing is responsible for a large number of unique or nearly unique k-mers, while the high-coverage sequencing of the chloroplast genome translates into the bimodal distribution of moderately and highly encountered k-mers, the two modes being due to the large duplicated region (Inverted Repeat) typical of the chloroplast genome.
.. figure:: ./Kmer-histogram.*
:align: center
:figwidth: 80 %
:width: 500
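
As a rough illustration of what a k-mer frequency spectrum is, the sketch below counts k-mers in a handful of toy reads and then tallies how many distinct k-mers occur once, twice, and so on. It is a simplified assumption-laden example, not the way ``ORG.asm`` computes anything: the function name ``kmer_spectrum`` is hypothetical, reverse-complement strands are not merged, and real data would be read from a fastq file.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a Counter mapping an occurrence count to the number of
    distinct k-mers observed that many times (hypothetical helper)."""
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of length k along the read
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Histogram of the counts: the k-mer frequency spectrum
    return Counter(kmer_counts.values())


# Toy reads for illustration only: two overlapping reads from a
# "high-coverage" region and one read from a "low-coverage" region
toy_reads = ["ACGTACGT", "CGTACGTA", "TTTTGGGG"]
spectrum = kmer_spectrum(toy_reads, k=4)

for occurrences in sorted(spectrum):
    print(occurrences, spectrum[occurrences])
```

On real genome-skimming data, the k-mers seen only once or twice would mostly come from the nuclear genome, and the peaks at higher counts from the organelle genomes and other repeated regions.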