Sequencing strategies and file formats
======================================

Low-coverage shotgun sequencing of genomic DNA
----------------------------------------------

Data from low-coverage shotgun sequencing of genomic DNA (gDNA), also known as genome skimming, are the primary input of ``ORG.asm``. If we hypothesize that the organelle genomes represent several percent of the total gDNA, then even with a modest sequencing depth for the nuclear genome (around 1x coverage), one can hope to get more than 100x coverage for the organelle genomes and for repeated regions (such as rDNA clusters). This allows the reconstruction of organelle genomes and repeated regions for up to 48 samples loaded in the same HiSeq 2500 lane.
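
The coverage argument above can be checked with a back-of-the-envelope calculation. The sketch below is only an illustration: the function name, the 3% organelle fraction, the 500 Mb nuclear genome and the 150 kb organelle genome are assumed values chosen for the example, not figures taken from ``ORG.asm``, and it assumes that every non-organelle read comes from the nuclear genome.

.. code-block:: python

   def organelle_coverage(nuclear_coverage, organelle_fraction,
                          nuclear_genome_bp, organelle_genome_bp):
       """Estimate organelle coverage from the nuclear coverage.

       organelle_fraction is the share of sequenced bases that come
       from the organelle genome (hypothetical value, e.g. 0.03).
       """
       # Bases sequenced from the nuclear genome.
       nuclear_bases = nuclear_coverage * nuclear_genome_bp
       # If a fraction f of all bases is organellar, the organelle bases
       # amount to f / (1 - f) times the nuclear bases.
       organelle_bases = (nuclear_bases * organelle_fraction
                          / (1.0 - organelle_fraction))
       return organelle_bases / organelle_genome_bp

   # Assumed example: 1x nuclear coverage, 3% organelle DNA,
   # 500 Mb nuclear genome, 150 kb organelle genome.
   coverage = organelle_coverage(1.0, 0.03, 500_000_000, 150_000)
   print(f"estimated organelle coverage: {coverage:.0f}x")

With these assumed values the estimate is on the order of 100x, because the organelle genome is several thousand times smaller than the nuclear genome while still contributing a few percent of the reads.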

Raw sequencing results (after adapter trimming) are usually provided in the fastq format. The raw result of the assembly is stored in the fasta format, and the annotated result (with CDS, tRNA, ...) in the EMBL format (see "3. structure of an entry" in the ENA user manual, ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt).


The file formats
----------------

.. toctree::
   :maxdepth: 2

   fasta
   fastq
   embl