Commit 64163e50 by Pierre Taberlet

--no commit message

parent 00c89239
......@@ -118,8 +118,8 @@ And the result is:
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattattataacaaaatcattcgccagagtgtagc
gggagtaggttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
ddddddddddddddddddddddcddddcacdddddddddddddc\d~b~~~b~~~~~~b`ryK~|uxyXk`}~
ccBccBcccBcBcccBcBccccccc~~~~b|~~xdbaddaaWcccdaaddddadacddddddcddadbbddddddddddd
ddddddddddddddddddddddcddddcacdddddddddddddc\d~b~~~b~~~~~~b`ryK~|uxyXk`}~ccBccBc
ccBcBcccBcBccccccc~~~~b|~~xdbaddaaWcccdaaddddadacddddddcddadbbddddddddddd
......@@ -275,7 +275,7 @@ the result in the unix commands ``sort`` and ``head`` we keep only the counting
.. code-block:: bash
> obistat -c count CL-b.uniq.fasta | \
> obistat -c count wolf.ali.ngs.uniq.fasta | \
sort -nk1 | head -20
This prints the output:
......@@ -305,20 +305,31 @@ This print the output:
The dataset contains 3504 sequences occurring only once.
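The tally that ``obistat -c count`` feeds into ``sort`` and ``head`` can be approximated with standard Unix tools. The sketch below is not how ``obistat`` is implemented. It assumes that every FASTA header sits on a single line and carries a ``count=N;`` attribute, and the function name ``count_distribution`` is hypothetical.

```shell
# Hypothetical helper, not part of OBITools: print "<count value> <number of records>"
# for each distinct count= attribute found in the FASTA headers of file $1.
count_distribution() {
    grep '^>' "$1" |                                   # keep header lines only
        sed -n 's/.*count=\([0-9][0-9]*\).*/\1/p' |    # extract the numeric count
        sort -n | uniq -c |                            # tally identical count values
        awk '{ print $2, $1 }'                         # reorder: value first, tally second
}
```

Calling ``count_distribution wolf.ali.ngs.uniq.fasta | head -20`` would then list the lowest count values together with how many sequence records carry each of them.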
TODO: Stopped here
Keep only the sequences having a count greater or equal to 10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Keep only the sequences having a count greater or equal to 10 and a length of at least 80 bp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Based on the previous observation, we set the cut-off for keeping sequences for further analysis to a count of 10. To do this, we use the :doc:`obigrep <scripts/obigrep>` command.
The ``-p 'count>=10'`` option means that the ``python`` expression :py:mod:`count>=10` must be evaluated to :py:mod:`True` for each sequence to be kept.
The ``-p 'count>=10'`` option means that the Python expression ``count>=10`` must evaluate to ``True`` for a sequence to be kept. We also remove
sequences shorter than 80 bp (option ``-l 80``).
.. code-block:: bash
> obigrep -p 'count>=10' CL-b.uniq.fasta > CL-b.uniq.10.fasta
> obigrep -l 80 -p 'count>=10' wolf.ali.ngs.uniq.fasta \
> wolf.ali.ngs.uniq.c10.l80.fasta
The first sequence record of ``wolf.ali.ngs.uniq.c10.l80.fasta`` is:
.. code-block:: bash
>HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/1_CONS_SUB_SUB count=12335;
merged_sample={'29a_F260619': 4697, '15a_F730814': 7638};
aagggtataaagcaccgccaagtcctttgagttttaagctattgccggtagtactctggc
gaataattttgttatattaattacttgtgtttagggctaa
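To make the two criteria concrete, the sketch below mimics the combined filter with ``awk``. It is only an illustration of the logic, not the ``obigrep`` implementation: the function name ``filter_fasta`` is hypothetical, and it assumes each header is a single line containing a ``count=N;`` attribute. The record shown above passes both tests, since its count is 12335 and its sequence is 100 bp long.

```shell
# Hypothetical stand-in for `obigrep -l 80 -p 'count>=10'`; not the OBITools code.
# Reads FASTA on stdin, writes kept records on stdout (sequence joined on one line).
# Usage: filter_fasta MIN_LENGTH MIN_COUNT < input.fasta > output.fasta
filter_fasta() {
    awk -v min_len="$1" -v min_count="$2" '
        # Print the buffered record if it satisfies both thresholds.
        function emit_record() {
            if (header != "" && count >= min_count && length(seq) >= min_len) {
                print header
                print seq
            }
        }
        /^>/ {
            emit_record()                 # finish the previous record, if any
            header = $0
            seq = ""
            count = 0                     # records without count= are dropped
            if (match($0, /count=[0-9]+/)) {
                count = substr($0, RSTART + 6, RLENGTH - 6) + 0
            }
            next
        }
        { seq = seq $0 }                  # accumulate wrapped sequence lines
        END { emit_record() }
    '
}
```

For example, ``filter_fasta 80 10 < wolf.ali.ngs.uniq.fasta`` would keep the records with at least 10 occurrences and at least 80 bases, which is the same selection the ``obigrep`` call above is meant to make.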
Clean the sequences for PCR/sequencing errors (sequence variants)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -329,7 +340,21 @@ sequences with no variants with 20-fold greater (``-r 0.05`` option).
.. code-block:: bash
> obiclean -s merged_sample -r 0.05 -H \
CL-b.uniq.10.fasta > CL-b.uniq.10.heads.fasta
wolf.ali.ngs.uniq.c10.l80.fasta > wolf.ali.ngs.uniq.c10.l80.clean.fasta
The first sequence record of ``wolf.ali.ngs.uniq.c10.l80.clean.fasta`` is:
.. code-block:: bash
>HELIUM_000100422_612GNAAXX:7:22:8540:14708#0/1_CONS_SUB_SUB count=12335;
merged_sample={'29a_F260619': 4697, '15a_F730814': 7638};
aagggtataaagcaccgccaagtcctttgagttttaagctattgccggtagtactctggc
gaataattttgttatattaattacttgtgtttagggctaa
TODO: Stopped here
Taxonomic assignment of sequences
......