OBITools issues
https://git.metabarcoding.org/obitools/obitools/-/issues
2017-09-29T05:28:39Z

https://git.metabarcoding.org/obitools/obitools/-/issues/2
Installer does not support UTF8
2017-09-29T05:28:39Z
Eric Coissac

The get-obitools.py script behaves strangely when the computer uses the UTF8 encoding by default, which leads to an error during installation.
If the `UTF8` environment variable is unset, the script runs normally.
Eric Coissac

https://git.metabarcoding.org/obitools/obitools/-/issues/4
obiclean: add an example using -s to the documentation
2017-09-29T05:28:39Z
Ghost User

https://git.metabarcoding.org/obitools/obitools/-/issues/10
obitools activating script loses local settings
2017-09-29T05:28:39Z
Eric Coissac

The obitools script generated by the get-obitools.py installer loses the local settings (path, alias...) when obitools is activated :-(

https://git.metabarcoding.org/obitools/obitools/-/issues/12
ngsfilter produces assigned-file with "forward_tag=None" or "reverse_tag=None"
2017-09-29T05:28:39Z
Tobias Frøslev

My colleague discovered that running ngsfilter results in an "assigned" file in which many reads are assigned to samples but carry the attribute "forward_tag=None" or "reverse_tag=None" in the header.
I have now checked one of my recent "assigned.fastq" files (a product of running ngsfilter), and it also contains many of these "pseudo-assigned" reads (approx. 4% of the total!).
This indicates that ngsfilter assigns reads on the basis of only one matching tag.
Here you can see what my ngsfilter file looks like:
Lichen F017R008 TACGACT:ATCGCGA GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
Lichen F103R064 CTTCCTT:GCATGGA GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
Lichen F020R009 ATCAGTC:CGCTCTC GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
Lichen F027R014 CTCTGCT:GCGTCAG GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
And here are the statistics on the tags:
$ obistat -c forward_tag -c reverse_tag lichen.assigned.fastq
lichen.assigned.fastq 100.0 % |#################################################| ] remain : 00:00:00
forward_tag reverse_tag count total
tacgact atcgcga 82157 82157
None gcgtcag 8303 8303
None atcgcga 1101 1101
None cgctctc 12055 12055
atcagtc cgctctc 667541 667541
cttcctt gcatgga 240579 240579
cttcctt None 4504 4504
None gcatgga 3343 3343
atcagtc None 9706 9706
tacgact None 945 945
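The "approx. 4%" figure can be checked directly from the counts in the table above. The following is a minimal sketch; the function name `one_tag_share` is made up for illustration, and the counts are copied by hand from the obistat output.

```python
# Counts copied from the obistat table above: (forward_tag, reverse_tag, count).
rows = [
    ("tacgact", "atcgcga", 82157),
    (None, "gcgtcag", 8303),
    (None, "atcgcga", 1101),
    (None, "cgctctc", 12055),
    ("atcagtc", "cgctctc", 667541),
    ("cttcctt", "gcatgga", 240579),
    ("cttcctt", None, 4504),
    (None, "gcatgga", 3343),
    ("atcagtc", None, 9706),
    ("tacgact", None, 945),
]


def one_tag_share(rows):
    """Return (reads with a missing tag, all reads, percentage)."""
    total = sum(count for _, _, count in rows)
    missing = sum(count for fwd, rev, count in rows if fwd is None or rev is None)
    return missing, total, 100.0 * missing / total


missing, total, percent = one_tag_share(rows)
print(f"{missing} of {total} reads ({percent:.1f} %) have only one tag")
# prints: 39957 of 1030234 reads (3.9 %) have only one tag
```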
PS: I also tested it on the wolf tutorial. Here it is also evident, although only one sample:
$ obistat -c forward_tag -c reverse_tag wolf.ali.assigned.fastq
wolf.ali.assigned.fastq 99.3 % |#################################################- ] remain : 00:00:00
forward_tag reverse_tag count total
gcctcct gcctcct 9851 9851
gaatatc gaatatc 14700 14700
aattaac aattaac 9056 9056
gcctcct None 1 1
gaagtag gaagtag 9724 9724
Looking at actual corresponding (raw) reads, it is evident that the tag (sequence) indicated as “None” is actually not present in the raw reads, and that the sequence should not be included as assigned
Are you aware of this potential problem?
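Until the behaviour is clarified, such reads could be screened out after the fact. The sketch below only illustrates the check on the header text; it assumes the "key=value;" annotation layout visible in the headers quoted in this feed, the two header lines are made up, and it is not part of OBITools.

```python
import re

# Matches OBITools-style "key=value;" annotations in a header line.
# Assumption: the tag values themselves contain no ";".
ATTRIBUTE = re.compile(r"(\w+)=([^;]*);")


def header_attributes(header):
    """Return the key=value annotations of a header line as a dict of strings."""
    return {key: value.strip() for key, value in ATTRIBUTE.findall(header)}


def has_both_tags(header):
    """True only if both tags are present and neither is the string 'None'."""
    attrs = header_attributes(header)
    return all(attrs.get(key, "None") != "None"
               for key in ("forward_tag", "reverse_tag"))


# Hypothetical header lines, for illustration only.
good = "@read1 forward_tag=atcagtc; reverse_tag=cgctctc; sample=F020R009;"
bad = "@read2 forward_tag=None; reverse_tag=cgctctc; sample=F020R009;"
print(has_both_tags(good), has_both_tags(bad))  # True False
```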
(edit: changed the at-signs in the ngsfilter file (this post) to "at-sign", as they interfered with the layout, and added some double newlines to ease reading)

https://git.metabarcoding.org/obitools/obitools/-/issues/15
SILVA reference database parsing
2017-09-29T05:28:39Z
Celine Mercier

Someone would be interested in having `obiaddtaxids` parse SILVA reference databases (it used to be able to, but the format has probably changed). I'm not sure whether there are several SILVA formats or only this one:
```
>AAAA02038008.4342.6342 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Insecta;Pterygota;Neoptera;Diptera;Oryza sativa Indica Group
UCUGGUUGAUCCUGCCAGUAGUU.......
>AAAB01001705.32.640 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Insecta;Pterygota;Neoptera;Diptera;Anopheles gambiae str. PEST
UGAUAUACGCUCGUCUCAAAGGU.....
```
Should I write a parser for that format in `obiaddtaxids`?
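As a rough idea of what parsing this header layout involves, here is a minimal sketch. It assumes every header follows the `>ACCESSION.START.STOP rank1;rank2;...;organism` pattern of the two examples; the function name and the returned keys are invented for illustration, and mapping the names to taxids (the actual job of `obiaddtaxids`) is not covered.

```python
def parse_silva_header(line):
    """Split a SILVA-style FASTA header into accession, coordinates and lineage.

    Assumes the layout shown in the issue:
    >ACCESSION.START.STOP rank1;rank2;...;organism name
    """
    identifier, _, path = line.lstrip(">").strip().partition(" ")
    accession, start, stop = identifier.rsplit(".", 2)
    lineage = [name.strip() for name in path.split(";") if name.strip()]
    return {
        "accession": accession,
        "start": int(start),
        "stop": int(stop),
        "lineage": lineage[:-1],
        "organism": lineage[-1] if lineage else None,
    }


header = (">AAAB01001705.32.640 Eukaryota;Opisthokonta;Holozoa;"
          "Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;"
          "Insecta;Pterygota;Neoptera;Diptera;Anopheles gambiae str. PEST")
record = parse_silva_header(header)
print(record["accession"], record["organism"])
```

Splitting the identifier from the right keeps accessions intact even if they contain a dot of their own.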

https://git.metabarcoding.org/obitools/obitools/-/issues/19
ngsfilter crash if there is one (or two?) series of "n" longer than the primers
2022-03-18T21:46:01Z
Ghost User

ngsfilter crashes when a sequence record contains one or two series of "n" longer than the primers.
I obtained the following message:
[RocheNoire:~/Desktop/co1_16s] pierretaberlet% ngsfilter -t ngs_co1_16s.txt --fasta -e 0 L002.obi.fasta > L002.obi.ngs.fasta
L002.obi.fasta 2.8 % |#/ ] remain : 13:14:02
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/ngsfilter", line 451, in <module>
good,seq = annotate(seq,options)
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/ngsfilter", line 320, in annotate
reversematch = [(p,p(sequence)) for p in primers if p is not None]
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/ngsfilter", line 148, in __call__
tag=str(sequence[end:end+self.taglength].complement())
File "obitools/_obitools.pyx", line 553, in obitools._obitools.WrappedBioSequence.complement (src/obitools/_obitools.c:13753)
File "obitools/_obitools.pyx", line 562, in obitools._obitools.WrappedBioSequence.complement (src/obitools/_obitools.c:13703)
AttributeError
and the sequence record causing the crash was:
>NS500639:5:H2WMCAFXX:2:11102:21949:20400_CONS ali_length=46; direction=left; seq_ab_match=29; score=13.2920129752; seq_a_mismatch=3; seq_b_deletion=0; seq_b_mismatch=10; seq_a_deletion=5; score_norm=0.288956803808; seq_b_insertion=1; seq_a_insertion=3; mode=alignment; seq_a_single=106; seq_b_single=103;
agctggctgctgaacgccctcntaaggatattcgcgangnnnnnnnnnnnnnnnnnnnnnnaggtcttaaggnngannnnnnnnnnnnnnnnnnnnnnnnnnnnatgcaatcacgtaggcggcctaccgattcggcgtgtgatgaatgccaaggcgttgcgggactattttcgggatattggtcggatggttcttgctgccgagggtcgcaaggctaatgattcacacgccgactgctcgcagtatttttgtgtg
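Records of this kind can be spotted before running ngsfilter by measuring the longest run of "n" in each sequence. This is only a sketch of that check: the function names are made up, the test sequence is synthetic, and the primer length passed in is an arbitrary illustrative value.

```python
import re


def longest_n_run(sequence):
    """Length of the longest uninterrupted run of 'n' (case-insensitive)."""
    runs = re.findall(r"n+", sequence.lower())
    return max((len(run) for run in runs), default=0)


def exceeds_primer_length(sequence, primer_length):
    """True if some run of 'n' is longer than the given primer length."""
    return longest_n_run(sequence) > primer_length


# Synthetic sequence with one run of 7 and one run of 2.
demo = "acgt" + "n" * 7 + "acg" + "nn" + "t"
print(longest_n_run(demo))             # 7
print(exceeds_primer_length(demo, 5))  # True
```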

https://git.metabarcoding.org/obitools/obitools/-/issues/22
Error in the documentation of the --merge option of obiselect
2017-09-29T05:28:38Z
Aurélie Bonin

```
--merge=<KEY>
Attribute to merge.
Example:
> obiselect -c seq_length -n 2 -m sample seq1.fasta > seq2.fasta
This command keeps two sequences per sequence length, and records how many times they were observed for each
sample in the new attribute merged_sample.
```
Should be --merge and not -m in the example

https://git.metabarcoding.org/obitools/obitools/-/issues/25
installation problem
2023-08-09T17:26:52Z
Camila Duarte

Dear,
I'm trying to install obitools but I have this error message:
IOError: [Errno 2] No such file or directory: '/home/camila/sequencing_results/18S/Otus/Obitools/OBITools-1.2.9/bin/activate'
The OBItools file I downloaded has no bin directory.
Best wishes,

https://git.metabarcoding.org/obitools/obitools/-/issues/28
OBITools not running on 32bits Ubuntu
2017-09-29T05:28:38Z
Celine Mercier

Running any OBITools script on any(?) 32-bit Ubuntu system results in an immediate segmentation fault.

https://git.metabarcoding.org/obitools/obitools/-/issues/33
obiclean --without-progress-bar not working
2017-09-29T05:28:37Z
sproft

The option --without-progress-bar does not seem to work for obiclean in version 1.2.9.

https://git.metabarcoding.org/obitools/obitools/-/issues/35
ngsfilter discarding many records as "unidentified"
2019-06-05T17:53:22Z
Patrick Shea

I am analyzing COI metabarcoding data. When I try to assign sequences to samples following "illuminapairedend", many records (just over half) end up in the "unidentified" file. Although some unidentified records end up with "error=No primer match", most have "error=Cannot assign sequence to a sample". When I look at one such sequence from the aligned fastq, I can start to see why it doesn't end up assigned to a sample:
natacctctgggtgaccgaagaaccaaaacaaatgttgatataacactgggtcacctccacctgcaggatcaaaaaaagttgtattgaaatttctatctgttagaagcatcgtaatagctccagctaaaacaggtaatgataaaagcaataatacagctgttattaaaacagaccatgcaaaaagaggtaacttatgcataaataatccttttgttcgcatatttaaaattgtacagataaagttgatagctcctaaaatagaagaagctccagataaatgcaaactaaaaatagctaggtcgactgctggtcctgaatgagaatttccacttgataatggagggtatactgttcaacctgtaccatagcatn
The first 26 nucleotides correspond to the reverse primer's reverse complement, the last 8 to the identifying tag's RC, and the 26 preceding the tag correspond to the forward primer RC. Looking at the reverse complement of the above sequence:
natgctatggtacaggttgaacagtataccctccattatcaagtggaaattctcattcaggaccagcagtcgacctagctatttttagtttgcatttatctggagcttcttctattttaggagctatcaactttatctgtacaattttaaatatgcgaacaaaaggattatttatgcataagttacctctttttgcatggtctgttttaataacagctgtattattgcttttatcattacctgttttagctggagctattacgatgcttctaacagatagaaatttcaatacaactttttttgatcctgcaggtggaggtgacccagtgttatatcaacatttgttttggttcttcggtcacccagaggtatn
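The reverse complement shown above can be reproduced with a few lines of code. This is a minimal sketch for lowercase DNA with "n" as the only ambiguity code; it uses only the last 8 nucleotides of the aligned read quoted in this issue.

```python
# Complement table for lowercase DNA; 'n' (unknown base) maps to itself.
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(sequence):
    """Reverse-complement a lowercase DNA string."""
    return sequence.translate(COMPLEMENT)[::-1]


# Last 8 nucleotides of the aligned read quoted in the issue.
read_end = "atagcatn"
start_of_rc = reverse_complement(read_end)
print(start_of_rc)       # natgctat
print(start_of_rc[1:8])  # atgctat, the 7-nt tag from the ngsfilter file
```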
Now everything looks fine - the first 8 nucleotides are the sample tag (I'm using 7 nucleotides "atgctat" in the ngsfilter text file to get around the "n"), followed by the forward primer, with the reverse primer at the very end. However, when I run the ngsfilter command, the aligned sequence (the first one above) ends up unidentified and looks like this:
attatcaagtggaaattctcattcaggaccagcagtcgacctagctatttttagtttgcatttatctggagcttcttctattttaggagctatcaactttatctgtacaattttaaatatgcgaacaaaaggattatttatgcataagttacctctttttgcatggtctgttttaataacagctgtattattgcttttatcattacctgttttagctggagctattacgatgcttctaacagatagaaatttcaatacaactttttttgatcctgcaggtggaggtgacccagtgttatatcaacatttgttt
So, the ngsfilter command managed to use the reverse complement of the aligned record in order to identify the primers, then remove these primers and the identifying tag, and produce the expected 313bp fragment afterward - all of which is great to see. BUT, it did NOT successfully assign the sequence to its sample, even though I can clearly see that the reverse complement of the tag sequence exists in the original record. In addition, looking at this record in the unidentified.fastq, there is an attribute for "forward_tag=atgctat", which matches one of my sample tags from the associated text file. Oddly enough, the attribute of "reverse_tag=gagggta" is also given to this unidentified record, even though I used the "T" operator in the text file for forward primer and tag only. Successfully identified records have "reverse_tag=None" as expected.
It seems the ngsfilter command isn't able to truly recognize the reverse complement of the tag sequence, thus leaving these records unidentified. Is there an obvious workaround to this? Would somehow ensuring the direction of aligned sequences prevent this? Is there an option in either the illuminapairedend or ngsfilter commands that I'm overlooking?

https://git.metabarcoding.org/obitools/obitools/-/issues/36
obiannotate -C working partially
2022-11-03T06:41:34Z
Lucie Zinger

The obiannotate -C option seems to work only partially when the header contains Python dictionaries.
Example of input:
>HISEQ:204:C8E5RANXX:7:1308:3148:82868_CONS_SUB_SUB ali_length=31; seq_ab_match=31; species_name=None; family=46569; class_name=Insecta; phylum_name=Arthropoda; seq_a_deletion=0; rank=subfamily; cluster=HISEQ:204:C8E5RANXX:7:1308:3148:82868_CONS_SUB_SUB; best_identity={'order_filtered_embl_r136_noenv_COL': 0.9030303030303031}; phylum=6656; forward_match=acgctgttatcccttagg; forward_primer=acgctgttatccctwagg; reverse_primer=gacgataagaccctwtaga; species=None; merged_sample={'H20_Cs_r4': 1, 'H20_Cs_r3': 11}; forward_score=72.0; seq_a_mismatch=0; start=taatt; forward_tag=atatagcg; seq_b_mismatch=0; scientific_name=Apicotermitinae; experiment=litiere_colle; species_list={'order_filtered_embl_r136_noenv_COL': ['Anoplotermes group sp. 1 TB-2017', 'Anoplotermes group nr. E1 TB-2014', 'Anoplotermes schwarzi', 'Ruptitermes nr. xanthochiton FG-ND2-26', 'Patawatermes sp. A TB-2017', 'Patawatermes nigripunctatus', 'Aparatermes sp. A TB-2017', 'Patawatermes turricola', 'Anoplotermes banksi', 'Humutermes krishnai', 'Grigiotermes hageni', 'Longustitermes manni']}; seq_a_single=94; cluster_weight=12; reverse_score=76.0; count=12; seq_b_insertion=0; taxid_by_db={'order_filtered_embl_r136_noenv_COL': 92739}; id_status={'order_filtered_embl_r136_noenv_COL': True}; seq_b_deletion=0; genus_name=None; cluster_center=True; seq_a_insertion=0; seq_length_ori=219; reverse_tag=agactatg; class=50557; goodAli=Alignement; match_count={'order_filtered_embl_r136_noenv_COL': 13}; family_name=Termitidae; best_match={'order_filtered_embl_r136_noenv_COL': 'KY224594; order_name=Blattodea; taxid=92739; scientific_name_by_db={'order_filtered_embl_r136_noenv_COL': 'Apicotermitinae'}; cluster_score=1.0; genus=None; order=85823; '}; order_name=Blattodea; rank_by_db={'order_filtered_embl_r136_noenv_COL': 'subfamily'}; seq_length=162; seq_b_single=94; status=full; mode=alignment; position=08_01D; genus=None; order=85823; distance=0.0;
taatttaatcttataatcaaaataaatggatcaaaaaactataaacaaatatatagcagt
aaagaggagttaaataaattcctcccatcgccccaacaaaacacctaaatcacttaataa
aacaaaacaaacaaaataataaaaagtaaataaaatgttaac
command:
obiannotate -C toto.fasta
Example corresponding output:
>HISEQ:204:C8E5RANXX:7:1308:3148:82868_CONS_SUB_SUB '}; order_name=Blattodea; rank_by_db={'order_filtered_embl_r136_noenv_COL': 'subfamily'}; seq_length=162; seq_b_single=94; status=full; mode=alignment; position=08_01D; genus=None; order=85823; distance=0.0;
taatttaatcttataatcaaaataaatggatcaaaaaactataaacaaatatatagcagt
aaagaggagttaaataaattcctcccatcgccccaacaaaacacctaaatcacttaataa
aacaaaacaaacaaaataataaaaagtaaataaaatgttaac

https://git.metabarcoding.org/obitools/obitools/-/issues/43
ecotag bug with short sequences
2019-10-04T05:43:41Z
Celine Mercier

ecotag computes erroneous alignment results when the sequences are shorter than 8 bp (I think).
It can result in something that looks like an infinite loop with a memory build-up.
Filter short sequences out with obigrep to avoid this problem.
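The length filter itself is simple. obigrep is the tool to use for it; the sketch below only illustrates the idea in plain Python. The 8 bp threshold comes from the tentative figure in this issue, and the helper names and records are made up.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_length=8):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Made-up records for illustration.
example = [">long", "acgtacgt", "acgt", ">short", "acgta"]
kept = drop_short(read_fasta(example))
print([header for header, _ in kept])  # ['>long']
```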

https://git.metabarcoding.org/obitools/obitools/-/issues/44
extractread2
2019-08-06T01:30:52Z
Emily Dziedzic

Is extractread2 still included in the OBITools package?

https://git.metabarcoding.org/obitools/obitools/-/issues/45
Please port to Python3
2019-08-21T19:43:54Z
Andreas Tille

Hello,
The Debian Med team is maintaining OBITools for official Debian. The recently released Debian 10 was the last Debian release featuring Python2, since this programming language is EOL. If you would like us to continue maintaining OBITools in official Debian (and would like users of other modern distributions to have no problems installing OBITools on their systems), I'd recommend that you port your code to Python3. The 2to3 tool might be of great help here.
Kind regards, Andreas.

https://git.metabarcoding.org/obitools/obitools/-/issues/46
ngsfilter: AssertionError
2022-11-01T09:19:27Z
Christina P

Hello,
I am trying to use ngsfilter and I get this error
`Traceback (most recent call last):
File "/mnt/big/Metagenomics/OBITools-1.2.0/bin/ngsfilter", line 426, in <module>
primers=readTagfile(options.taglist)
File "/mnt/big/Metagenomics/OBITools-1.2.0/bin/ngsfilter", line 249, in readTagfile
"partial fragment are not usable with primer pair : (%s,%s)" % (forward,reverse)
AssertionError: partial fragment are not usable with primer pair : (D: nnnncgctcttatggtgcatggccgttcttagt,R: nnnnatacagcgcatctaagggcatcacagacc)
`
I am using OBITools-1.2.0/ and a command that I have previously used with a different set of samples and a different sample description file

https://git.metabarcoding.org/obitools/obitools/-/issues/47
obisilva not working
2019-09-12T02:40:34Z
Christina T

Hi,
I have been using obitools for my analysis of 16S metabarcoding data. Just trying to create an ecoPCR database from the SILVA SSU database, prior to taxonomic assignment.
As per the manual, I have tried the following:
```
obisilva --ssu --parc --local=/home/silva_db
```
the folder silva_db contains:
```
SILVA_132_SSUParc_tax_silva.fasta
/taxonomy/tax_slv_ssu_132.txt
```
this executes fine and the progress bar shows 100% and I get the following output files:
```
-rw-rw---- 1 user user 2814245955 Sep 2 13:08 silva_132_ssuparc_full_001.sdx
-rw-rw---- 1 user user 162328 Sep 2 12:53 silva_132_ssuparc_full.adx
-rw-rw---- 1 user user 256770210 Sep 2 13:08 silva_132_ssuparc_full.ldx
-rw-rw---- 1 user user 637224 Sep 2 12:53 silva_132_ssuparc_full.ndx
-rw-rw---- 1 user user 296 Sep 2 12:53 silva_132_ssuparc_full.rdx
-rw-rw---- 1 user user 434319 Sep 2 12:53 silva_132_ssuparc_full.tdx
```
But it doesn't create a pdx file, which it looks for later on when using
```
ecotag -d silva_132_ssuparc_full -R SILVA_132_SSUParc_tax_silva.fasta input.fasta > output.fasta
```
Error message:
```
Reading binary taxonomy database...
[INFO : Taxon alias file found]
[INFO : Local taxon file found] : 6073181 added taxa
Taxonomical tree read
[Errno 2] No such file or directory: 'silva_132_ssuparc_full.pdx'
[INFO : Preferred taxon name file not found]
ok
Reading reference DB ... : 6073181
Traceback (most recent call last):
File "/usr/bin/ecotag", line 346, in <module>
taxonlink[seqid]=int(seq['taxid'])
File "src/obitools/_obitools.pyx", line 263, in obitools._obitools.BioSequence.__getitem__
File "src/obitools/_obitools.pyx", line 217, in obitools._obitools.BioSequence.getKey
KeyError: 'taxid'
```
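The traceback ends in `seq['taxid']`, which suggests that at least one record in the reference file carries no `taxid` annotation. A quick way to check a FASTA file for such records is sketched below. It assumes the "key=value;" header convention seen elsewhere in this feed; the function name and the example headers are made up.

```python
import re

# Assumption: annotations follow the OBITools "key=value;" header convention.
TAXID = re.compile(r"\btaxid=(\d+);")


def records_without_taxid(lines):
    """Return the identifiers of FASTA records whose header has no taxid."""
    missing = []
    for line in lines:
        if line.startswith(">") and not TAXID.search(line):
            # Fall back to an empty identifier for a bare ">" line.
            missing.append((line[1:].split() or [""])[0])
    return missing


# Made-up headers: the first mimics an annotated record, the second a raw SILVA one.
example = [
    ">seq1 taxid=9606; rank=species;",
    "acgt",
    ">AAAA02038008.4342.6342 Eukaryota;Opisthokonta",
    "ucug",
]
print(records_without_taxid(example))  # ['AAAA02038008.4342.6342']
```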
I have also tried
```
obisilva --ssu --parc
```
where it is supposed to download the database and create the indexing files, but the progress bar stopped at 68% and didn't continue the job at all, it just stalled and it resulted in the same set of output files as mentioned above. Plus, it didn't download the fasta file, which you need later on.
Any help, would be much appreciated, thanks!
Any help, would be much appreciated, thanks!https://git.metabarcoding.org/obitools/obitools/-/issues/50Error in ngsfilter: AssertionError: tag pair (None, None) is already used wit...2020-01-18T00:09:37ZLotte SkovmandError in ngsfilter: AssertionError: tag pair (None, None) is already used with primer pairsHello!
I am using OBITools for a diet analysis of fecal samples, and I am running into an error with the `ngsfilter` command:
`(OBITools-1.2.13) ngsfilter -t /allmap_obi_trnl_trial_troubleshooting.txt -u unidentified.fastq Sample1.ali.fastq > Sample1.ali.assigned.fastq`
This is the output when we used the flag `--DEBUG`.
```
Traceback (most recent call last):
  File "/Users/username/Documents/trial/OBITools-1.2.13/export/bin/ngsfilter", line 426, in <module>
    primers=readTagfile(options.taglist)
  File "/Users/username/Documents/trial/OBITools-1.2.13/export/bin/ngsfilter", line 227, in readTagfile
    "tag pair %s is already used with primer pairs : (%s,%s)" % (str(tags),forward,reverse)
AssertionError: tag pair (None, None) is already used with primer pairs : (D: gggcaatcctgagccaa,R: ccattgagtctctgcacctatc)
(OBITools-1.2.13) ngsfilter -t /allmap_obi_trnl_trial_troubleshooting.txt -u unidentified.fastq Sample1.ali.fastq > Sample1.ali.assigned.fastq --DEBUG
gggcaatcctgagccaa : 17 * 4.0 + 2 * -2.0 = 56.0
ccattgagtctctgcacctatc : 22 * 4.0 + 2 * -2.0 = 76.0
gggcaatcctgagccaa : 17 * 4.0 + 2 * -2.0 = 56.0
ccattgagtctctgcacctatc : 22 * 4.0 + 2 * -2.0 = 76.0
gggcaatcctgagccaa : 17 * 4.0 + 2 * -2.0 = 56.0
ccattgagtctctgcacctatc : 22 * 4.0 + 2 * -2.0 = 76.0
gggcaatcctgagccaa : 17 * 4.0 + 2 * -2.0 = 56.0
ccattgagtctctgcacctatc : 22 * 4.0 + 2 * -2.0 = 76.0
Traceback (most recent call last):
  File "/Users/username/Documents/trial/OBITools-1.2.13/export/bin/ngsfilter", line 426, in <module>
    primers=readTagfile(options.taglist)
  File "/Users/username/Documents/trial/OBITools-1.2.13/export/bin/ngsfilter", line 227, in readTagfile
    "tag pair %s is already used with primer pairs : (%s,%s)" % (str(tags),forward,reverse)
AssertionError: tag pair (None, None) is already used with primer pairs : (D: gggcaatcctgagccaa,R: ccattgagtctctgcacctatc)
```
Because the barcodes had been removed during the original sequence processing and the reads were already demultiplexed, `ngsfilter` was run with `-:-` in the mapping file in place of the barcode tag sequences. An example mapping file is attached.
[allmap_obi_trnl_trial_troubleshooting.txt](/uploads/99c943a034647ad10a2616b746b9a338/allmap_obi_trnl_trial_troubleshooting.txt)
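The assertion message indicates that the same tag pair is being registered more than once for the same primer pair. With `-:-` in every row, each row appears to yield the tag pair `(None, None)`, so any two rows that share primers would collide. Here is a minimal sketch for spotting such rows, assuming the mapping file columns are experiment, sample, tags, forward primer, reverse primer, separated by whitespace. `find_tag_collisions` is a hypothetical helper, not part of OBITools, and the two rows are made up for illustration.

```python
from collections import defaultdict


def find_tag_collisions(lines):
    """Group sample names by (tags, forward primer, reverse primer).

    Returns only the combinations that are used by more than one row.
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 5:
            continue  # row does not have the assumed five leading columns
        experiment, sample, tags, forward, reverse = fields[:5]
        seen[(tags.lower(), forward.lower(), reverse.lower())].append(sample)
    return {key: samples for key, samples in seen.items() if len(samples) > 1}


# Made-up rows: two samples, no tags, same primer pair.
rows = [
    "trnl Sample1 -:- gggcaatcctgagccaa ccattgagtctctgcacctatc F",
    "trnl Sample2 -:- gggcaatcctgagccaa ccattgagtctctgcacctatc F",
]
for (tags, fwd, rev), samples in find_tag_collisions(rows).items():
    print(tags, fwd, rev, "->", samples)
```

If the real mapping file produces collisions like this, one mapping file per already-demultiplexed sample, each with a single row, may be a way to avoid the assertion.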
Does anyone have a similar problem or suggestion?
https://git.metabarcoding.org/obitools/obitools/-/issues/54
ecotag blocked
2020-03-18T23:20:23Z
Luca Turolo
Hello
I ran ecotag and I don't know why it stopped working at 16.5%, just after printing the cache size. Here is the command:
ecotag -t /home/lt/TAXO/ -R TRNL\ DATABASE/renl_db.fasta Plate_9_uniq_c10_l80_clean.fasta > Plate_9_tag.fasta
Here is the output:
Reading taxonomy dump file...
List all taxonomy rank...
Indexing taxonomy...
Indexing parent and rank...
Adding scientific name...
Adding taxid alias...
Adding deleted taxid...
Reading reference DB ... : 19083
Cache size : 1000000
Plate_9_uniq_c10_l80_clean.fasta 16.5 % |########- ] remain : 00:01:34
Thanks in advance.
Luca Turolo
https://git.metabarcoding.org/obitools/obitools/-/issues/55
asymmetry in simpleLCS()
2020-05-04T00:22:37Z
Lara Urban
Dear OBItools team,
I am trying to understand exactly how the OBITools `ecotag` command finds the best matching hit, i.e. how it determines the longest common substring (LCS) and the shortest alignment corresponding to this LCS.
I think I found most of the source code here: https://git.metabarcoding.org/obitools/obitools/tree/master/src. If I compare two identical sequences with 'simpleLCS' (in src/obitools/align/_lcs.ext.1.c) and a base is added to the beginning of one of the sequences, this is evaluated as a mismatch (i.e. the LCS is reduced by one), whereas a base added to the end of a sequence is simply ignored (i.e. the LCS stays the same). There seems to be an issue with symmetry; e.g. if:
s1 = 'acccctttgcccatatcggccctagctctc'
s2 = 'acccctttgcccatatcggccctagctct'
s3 = 'cccctttgcccatatcggccctagctctc'
then simpleLCS(s1,s1), simpleLCS(s1,s2), simpleLCS(s2,s1), and simpleLCS(s1,s3) all return the same value, but simpleLCS(s3,s1) does not.
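For comparison, here is a minimal sketch of a textbook dynamic-programming LCS length, assuming LCS is read as longest common subsequence. It is not the OBITools `simpleLCS` implementation, and `lcs_length` is a hypothetical name. It only shows what an unbanded, symmetric computation gives for the three sequences.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (textbook DP)."""
    previous = [0] * (len(b) + 1)  # DP row for the prefix of a seen so far
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Drop the last character of either a or b.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


s1 = "acccctttgcccatatcggccctagctctc"
s2 = "acccctttgcccatatcggccctagctct"
s3 = "cccctttgcccatatcggccctagctctc"

for x, y in [(s1, s1), (s1, s2), (s2, s1), (s1, s3), (s3, s1)]:
    print(len(x), len(y), lcs_length(x, y))
```

With this definition the result is 30 for (s1,s1) and 29 for every other pair, whichever sequence comes first, so a leading and a trailing extra base are treated identically.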
Could you help me understand this, please?
Best regards,
Lara