Commit 1d831e48 by Eric Coissac

--no commit message

parent 2f770c29
......@@ -17,8 +17,8 @@ ASCII character for brevity. It was originally developed at the `Wellcome Trust
Institute` to bundle a
:ref:`fasta <classical-fasta>` sequence and its quality data, but has recently
become the *de facto* standard for storing the output of high throughput
sequencing instruments such as the Illumina Genome
Analyzer. [1]_
Format
------
......@@ -67,14 +67,14 @@ Quality
A quality value *Q* is an integer mapping of *p* (i.e., the probability
that the corresponding base call is incorrect). Two different equations
have been in use. The first is the standard Sanger variant to assess
reliability of a base call, otherwise known as Phred quality
score:
:math:`Q_\text{sanger} = -10 \, \log_{10} p`
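
As a small worked illustration of the formula above, the Phred mapping and its inverse :math:`p = 10^{-Q/10}` can be sketched in Python as follows. This is not taken from the ``OBITools`` code base, and the function names are hypothetical:

.. code-block:: python

   import math


   def phred_from_error_probability(p: float) -> float:
       """Sanger/Phred quality: Q = -10 * log10(p), for 0 < p <= 1."""
       if not 0.0 < p <= 1.0:
           raise ValueError("p must lie in (0, 1]")
       return -10.0 * math.log10(p)


   def error_probability_from_phred(q: float) -> float:
       """Inverse mapping: p = 10 ** (-Q / 10)."""
       return 10.0 ** (-q / 10.0)


   # One wrong call in 1000 corresponds to Q = 30 once rounded to an integer.
   print(round(phred_from_error_probability(0.001)))  # 30

Because stored quality values are integers, a score computed this way is rounded before it is written to a file.
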
The Solexa pipeline (i.e., the software delivered with the Illumina
Genome Analyzer) earlier used a different mapping, encoding the
odds *p*/(1-*p*) instead of the probability *p*:
:math:`Q_\text{solexa-prior to v.1.3} = -10 \, \log_{10} \frac{p}{1-p}`
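
The Solexa mapping can be sketched in the same way. Both scores are defined from the same error probability *p*, so a Solexa score can be converted to a Phred score by solving for *p*. From :math:`p/(1-p) = 10^{-Q_\text{solexa-prior to v.1.3}/10}` it follows that :math:`1/p = 10^{Q_\text{solexa-prior to v.1.3}/10} + 1`, and therefore :math:`Q_\text{sanger} = 10 \, \log_{10}\left(10^{Q_\text{solexa-prior to v.1.3}/10} + 1\right)`. The helper names below are hypothetical:

.. code-block:: python

   import math


   def solexa_from_error_probability(p: float) -> float:
       """Solexa quality (pipeline before v1.3): Q = -10 * log10(p / (1 - p))."""
       if not 0.0 < p < 1.0:
           raise ValueError("p must lie in (0, 1)")
       return -10.0 * math.log10(p / (1.0 - p))


   def phred_from_solexa(q_solexa: float) -> float:
       """Convert a Solexa score to a Sanger/Phred score via the error probability."""
       # p / (1 - p) = 10 ** (-Qs / 10)  =>  1 / p = 10 ** (Qs / 10) + 1
       return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


   for q in (-5, 0, 10, 40):
       print(q, round(phred_from_solexa(q), 2))

The two scales are close at high quality and diverge at low quality. For example, a Solexa score of 0 corresponds to a Phred score of about 3.01, whereas a Solexa score of 40 corresponds to about 40.0.
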
......@@ -90,16 +90,16 @@ values, they differ at lower quality levels (i.e., approximately *p* >
Encoding
~~~~~~~~
- Sanger format can encode a Phred quality
  score from 0 to 93 using ASCII 33 to 126
  (although in raw read data the Phred quality score rarely exceeds 60;
  higher scores are possible in assemblies or read maps).
- Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score
  from -5 to 62 using ASCII 59 to 126 (although in raw read
  data only Solexa scores from -5 to 40 are expected).
- Starting with Illumina 1.3 and before Illumina 1.8, the format
  encoded a Phred quality score from 0 to 62
  using ASCII 64 to 126 (although in raw read data only Phred
  scores from 0 to 40 are expected).
- Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0
  to 2 have a slightly different meaning. The values 0 and 1 are no
......@@ -137,16 +137,13 @@ a given run.
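
The ASCII ranges listed under *Encoding* amount to a fixed offset subtracted from each character code. The offset is 33 for the Sanger format and 64 for both the Solexa/Illumina 1.0 and Illumina 1.3 to 1.8 formats, since ASCII 59 encodes Solexa score -5 and ASCII 64 encodes Phred score 0. A minimal decoding sketch follows. The function names and the offset-guessing heuristic are illustrative assumptions, not an ``OBITools`` API:

.. code-block:: python

   from typing import Iterable, List


   def decode_qualities(quality_line: str, offset: int = 33) -> List[int]:
       """Convert a FASTQ quality line to integer scores: score = ord(char) - offset."""
       return [ord(char) - offset for char in quality_line.rstrip("\n")]


   def guess_offset(quality_lines: Iterable[str]) -> int:
       """Heuristically guess the ASCII offset (33 or 64) of a set of quality lines.

       Any character below ASCII 59 (the lowest Solexa/Illumina 1.0 code)
       rules out the offset-64 encodings. If no such character is seen the
       answer is ambiguous, and 64 is returned as a best guess only.
       """
       lowest = min(ord(char) for line in quality_lines for char in line.rstrip("\n"))
       return 33 if lowest < 59 else 64


   print(decode_qualities("II?5+"))          # [40, 40, 30, 20, 10]
   print(decode_qualities("h@", offset=64))  # [40, 0]
   print(guess_offset(["II?5+", "IIII"]))    # 33

Decoding with offset 64 yields Solexa scores for Solexa/Illumina 1.0 files and Phred scores for Illumina 1.3 to 1.8 files, so the pipeline version still matters after decoding. The guess is only reliable when at least one character below ASCII 59 occurs. A Sanger file that contains only calls of quality 26 or more looks like an offset-64 file.
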
File extension
--------------
There is no standard file extension for a FASTQ
file, but ``.fq`` and ``.fastq`` are commonly used.
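
Because the extension is only a convention, a reader may combine it with a look at the file content. The sketch below accepts the two usual extensions and otherwise checks that the file starts with ``@``, the character that opens every FASTQ record. The function name is hypothetical:

.. code-block:: python

   from pathlib import Path


   def looks_like_fastq(path: str) -> bool:
       """Heuristic FASTQ check based on the extension, then on the content."""
       # .fq and .fastq are common (but not standard) extensions.
       if Path(path).suffix.lower() in {".fq", ".fastq"}:
           return True
       # Otherwise peek at the first character: FASTQ records start with '@'.
       with open(path, "r") as handle:
           return handle.read(1) == "@"

A file without one of the usual extensions is opened, so in that case the path must exist.
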
See also
--------
- :ref:`fasta <classical-fasta>`
References
----------
......@@ -175,8 +172,6 @@ External links
.. [1]
   Cock et al. (2009) The Sanger FASTQ file format for sequences with
......
......@@ -59,6 +59,9 @@ The innovation of the ``OBITools`` is their ability to take into account the
taxonomic annotations, ultimately allowing sorting and filtering of sequence
records based on the taxonomy.
|Pipeline example for a standard biodiversity survey|
References
..........
......@@ -195,6 +198,7 @@ output format, and the ``OBITools`` are also able to write sequences in the
programs. In the :doc:`fasta <fasta>` or :doc:`fastq <fastq>` format, the attributes are written in the header
line just after the *id*, following a ``key=value;`` format (Figure 2).
|The structure of an OBITools sequence record and its representation in fasta and fastq formats|
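
A title line written in this layout can be split back into its parts with a small regular expression. The sketch below is not the ``OBITools`` parser. The function name is hypothetical, and the sketch assumes that the *id* is separated from the attributes by a single space, that no value contains a ``;``, and that any text after the last attribute is returned untouched. Every value is kept as a raw string:

.. code-block:: python

   import re
   from typing import Dict, Tuple

   # One attribute: a key, '=', a value without ';', then ';' (leading spaces allowed).
   ATTRIBUTE = re.compile(r"\s*([^\s=;]+)=([^;]*);")


   def parse_header(title: str) -> Tuple[str, Dict[str, str], str]:
       """Split a fasta/fastq title line into (id, attributes, trailing text)."""
       # Drop the '>' (fasta) or '@' (fastq) marker, then split off the id.
       seq_id, _, rest = title.lstrip(">@").rstrip("\n").partition(" ")
       attributes: Dict[str, str] = {}
       pos = 0
       # Consume consecutive key=value; pairs placed right after the id.
       while (match := ATTRIBUTE.match(rest, pos)) is not None:
           attributes[match.group(1)] = match.group(2).strip()
           pos = match.end()
       # Whatever follows the last attribute is returned as-is (stripped).
       return seq_id, attributes, rest[pos:].strip()


   print(parse_header(">seq1 count=12; sample=A1; some free text"))
   # ('seq1', {'count': '12', 'sample': 'A1'}, 'some free text')

Keeping values as strings keeps the sketch simple. Typed values such as numbers, lists or dictionaries would need an extra conversion step.
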
Taxonomical aspects
...................
......@@ -273,3 +277,9 @@ Genomics, 11, 434.
Riaz T, Shehzad W, Viari A, Pompanon F, Taberlet P, Coissac E (2011) ecoPrimers:
inference of new DNA barcode markers from whole genome sequence analysis.
Nucleic Acids Research, 39, e145.
.. |Pipeline example for a standard biodiversity survey| image:: fig-Pipeline.pdf
.. |The structure of an OBITools sequence record and its representation in fasta and fastq formats| image:: fig-Record.pdf