OBITools3 issues
https://git.metabarcoding.org/obitools/obitools3/-/issues

Issue #139: More with already demultiplexed data
https://git.metabarcoding.org/obitools/obitools3/-/issues/139
Jackie Doyle, 2024-03-19

Hi! I am also working with already demultiplexed data and wanted to ask about Celine's response to Jason's question here:
https://git.metabarcoding.org/obitools/obitools3/-/issues/127
Part of Celine's response is that "You can merge them (the demultiplexed reads) with `obi cat` (just annotate the samples before)".
Could you provide an example of how `obi cat` is used? I've looked through the tutorials but haven't found one. Would we use `obi cat` at the very beginning of the pipeline (before even importing the reads) and concatenate all the R1 files together and all the R2 files together?
Could you also clarify what you mean by "annotate the samples before"? Is this something you would still do with ngsfilter? I'm confused about how the different steps in the pipeline would be tied together...
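To make the question concrete, here is the kind of workflow I imagine, with option names guessed from the built-in help rather than verified, so please correct anything that is wrong:

```shell
# Hypothetical sketch (options unverified): import each demultiplexed
# sample, tag its reads with the sample name, then concatenate.
obi import --fastq-input sample1.fastq demux/sample1_reads
obi annotate -S sample:sample1 demux/sample1_reads demux/sample1_tagged
obi import --fastq-input sample2.fastq demux/sample2_reads
obi annotate -S sample:sample2 demux/sample2_reads demux/sample2_tagged
obi cat -c demux/sample1_tagged -c demux/sample2_tagged demux/all_reads
```

Is this roughly the intended order, i.e. does annotating replace what ngsfilter would normally add?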
Many thanks,
Jacqueline Doyle
Issue #138: Error trying to use multithreading with obi clean
https://git.metabarcoding.org/obitools/obitools3/-/issues/138
ROBERT MULDOWNEY, 2024-01-04

obiclean is running with no errors using OBITools 1.0.0.
We wanted to try the parallel version with 60 threads on a system with 112 threads running RedHat 8.
We are receiving the following error:
```
2024-01-04 13:06:26,296 [clean : INFO ] obi clean
2024-01-04 13:06:26,297 [clean : INFO ] Opened file: /local/home/danielf/obi3/merged.uniq.c10.l80.fasta
Traceback (most recent call last):
  File "/local/home/danielf/anaconda3/envs/obi3/bin/obi", line 62, in <module>
    config[root_config_name]['module'].run(config)
  File "python/obitools3/commands/clean.pyx", line 101, in obitools3.commands.clean.run
  File "python/obitools3/files/uncompress.pyx", line 129, in __iter__
TypeError: 'obitools3.files.uncompress.MagicKeyFile' object is not iterable
```
The version is OBITools 3.0.1b25, installed in Anaconda with Python 3.7.16.
Any suggestions?
Issue #137: Cannot import rawdata
https://git.metabarcoding.org/obitools/obitools3/-/issues/137
李明家, 2023-12-07

Hi,
When I import NGS raw data (either my own file or the wolf tutorial file), an error occurs.
I have tried different options, such as --fastq-input, --input-na-string OBI:INPUTNASTRING, and --no-quality, but none of them run, and the same error is printed each time. I don't know what to try next; I hope the author or someone else can give me some suggestions. Thanks a lot!
My Python version: 3.9.18

My command:

```
obi import --quality-solexa raw_data/ZTPSN23LB979-WXY_1_R1.fq temp/try.reads1
```

The output:

```
2023-12-07 09:39:07,830 [import : INFO ] obi import: imports an object (file(s), obiview, taxonomy...) into a DMS
2023-12-07 09:39:07,847 [import : INFO ] Opened file: raw_data/ZTPSN23LB979-WXY_1_R1.fq
2023-12-07 09:39:08,028 [import : INFO ] Importing 60229 entries
Traceback (most recent call last):
  File "/home/mjli/obi3-env/bin/obi", line 62, in <module>
    config[root_config_name]['module'].run(config)
  File "python/obitools3/commands/import.pyx", line 231, in obitools3.commands.import.run
NotImplementedError
```
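One generic check that might narrow this down (my own sanity check, not an official OBITools diagnostic): modern Illumina FASTQ files use Sanger/phred+33 qualities, so `--quality-solexa` is rarely appropriate. The offset can be guessed from the quality characters:

```python
def guess_fastq_offset(quality_lines):
    """Guess the FASTQ quality encoding from quality strings.

    Characters below ';' (ASCII 59) cannot occur in Solexa or
    phred+64 data, so seeing any of them implies a phred+33
    (Sanger/modern Illumina) file.
    """
    min_ord = min(ord(c) for line in quality_lines for c in line)
    return 33 if min_ord < 59 else 64

# Typical modern Illumina qualities contain '#' (ASCII 35):
print(guess_fastq_offset(["IIIIFFFF##"]))  # 33
```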
Issue #135: EMBL importing issue
https://git.metabarcoding.org/obitools/obitools3/-/issues/135
Thomas D Hughes, 2023-09-27

Hi,
I have an issue which seems to be similar or the same as issue #129.
I am trying to import the latest EMBL release using the methods outlined in the wolf tutorial, but I get the error below while parsing the file STD_PLN_4.dat.gz. I have tried removing that file to see whether it alone was the problem, but I also get the same fault with STD_VRL_11.dat.gz.
I have tried updating OBITools using pip and currently have version 3.0.1b22, which I hope is the latest version, but it doesn't seem to resolve the issue for me at least.
The sequence that cannot be imported is very long, so I just attached the start and the end, which I think are the only bits that may have useful information.
```
DEBUG /private/var/folders/hx/z6wbn36d04l48hw501jt40lm0000gn/T/pip-install-gxjlmvbt/obitools3_e27c130ed86d4de0aeeea00f0794a369/src/obiavl.c:1669:obi_create_avl, obi_errno = 20, errno = 24 :
Error creating an AVL tree file
DEBUG /private/var/folders/hx/z6wbn36d04l48hw501jt40lm0000gn/T/pip-install-gxjlmvbt/obitools3_e27c130ed86d4de0aeeea00f0794a369/src/obiavl.c:1013:add_new_avl_in_group, obi_errno = 20, errno = 24 :
Error creating a new AVL tree in a group
Could not import sequence:
{b'ID': b'LR812263', b'NUC_SEQ': b'CAAATCTAGCAAGAATAGGCCAAAACTACAAGTTTTGAGGAGTTCCCCGTAACTGGACCCCGAGGTTCCCGAAATGTTTGGATCACAGCGGGACACTAAATCAGTGACTAATAACATACAAAATTGTCTGCAGTAGTCCTAAACTGCGAGTTTTGACGAGTTCCACGTAACCGGACCCCGAGGTTCCTGAAACATCCGGATCGTAGCGGGACCCAAAATCAATGAGTAATTGCATACAAAACTGGCAAGAACAGGCCAAAACTGCGAGTTTTAACGAGTTTTCGGTAACCGGACCCCGGGGGTCCCGAAACATTCGTATCGCAGGGAAACCAAAAACAGTGACTAATGGCCTAGAAAACTAGCCAGAACAGGCCAACACTGTGAGTTTTCACGAATTCCCCGTAACCCCACCTCGGGGTTCCCGAAACGTTCGGATCGCAGCGGGACCCAAAATGAGTGAGTAATAGCATATAAAACTAGCCAGAATAGGCCAAAACTGCGAGTTTGAAGAGTTCCCCGTAACCACACCCCGGGGTTCCCGAAATGTTCGGATCGTAGTAGGACCCCAAATCAGTCAGTAATAGCATACAAATCTAGCAAGAATAGGCCAAAACTACAAGTTTTGAGGAGTTCCCCGTAACTGGACCCCGAG
.........
GACACGCTCGTCGGACACCGGCAGGGCAGCTAGCTAGTCTGTGCGCGTCGCGACTCCCTTCACCAGCCGTCTTCTTCGTCAACGCTCGCAAGTTGTTCGACGGTTTGCCAAGGTACAAAATGGACTCCGTCGACGAGTTCTTTTTTCACAATTTCCTTTGCGACTCCGACGATTCGTCATCCGATGACGAGGAGGAGGTATTGGCTGCCGTGTTGGTCCATCACCTGCTCAATAGCTAGCGGCCGTTGTTCCGTGGCTCCATTCCGAGCCACCTTCCGGTGTTGA', b'DEFINITION': b'Hordeum vulgare subsp. vulgare genome assembly, chromosome: 3H', b'TAXID': 112509, b'organism': b'Hordeum vulgare subsp. vulgare'}
Error raised: Problem setting a value in a column
/!\ Check if '--input-na-string' option needs to be set
zsh: segmentation fault obi import --embl-input EMBL Guate/embl_refs
```
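If it helps: errno = 24 in the DEBUG lines above is EMFILE ("Too many open files"), so the AVL-tree creation probably failed because the process ran out of file descriptors. A quick way to inspect and raise the limit from Python on a Unix system (the `resource` module is in the standard library):

```python
import errno
import resource

# errno 24 on Linux and macOS is EMFILE: "Too many open files".
print(errno.errorcode[24])

# Per-process limit on simultaneously open file descriptors:
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)

# Raising the soft limit up to the hard limit sometimes lets a large
# import finish; `ulimit -n` in the shell before launching obi does
# the same thing.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```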
I hope I have included the relevant info, and any help would be greatly appreciated!
Best,
Tom
Issue #132: Error getting the taxid column in ecopcr
https://git.metabarcoding.org/obitools/obitools3/-/issues/132
Tiff DeGroot, 2023-09-15

I have NCBI reference sequence data in a fasta format that was downloaded using [nsdpy](https://pypi.org/project/nsdpy/). I was able to import my reference file (sequences.fasta) and import the taxdump file as specified in the wolf tutorial. But when I run obi ecopcr, I get "Error getting the taxid column". See below for my code and the error. I do not know how to verify the "my_tax" files: because of the obi format, I cannot view the contents.
Code:
`obi import /ref_dbs/nsdpy/NSDPY_results/2023-08-11_12-06-02/fasta/sequences.fasta iDNAtest/nsdpy_refs`
`wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz`
`obi import --taxdump taxdump.tar.gz iDNAtest/taxonomy/my_tax`
`obi ecopcr -e 3 -l 50 -L 160 -F CGGTTGGGGTGACCTCGGA -R GCTGTTATCCCTAGGGTAACT --taxonomy iDNAtest/taxonomy/my_tax iDNAtest/nsdpy_refs iDNAtest/16SMam_refs`
The error message:

```
[ecopcr : INFO ] obi ecopcr
DEBUG /tmp/pip-install-q9f2v4ku/obitools3_4d00853c37a9478b8d299eb6e1ee07f1/src/obi_ecopcr.c:816:obi_ecopcr, obi_errno = 0, errno = 2 :
Error getting the taxid column
Traceback (most recent call last):
  File "/tools/OBITools3/obi3-env/bin/obi", line 62, in <module>
    config[root_config_name]['module'].run(config)
  File "python/obitools3/commands/ecopcr.pyx", line 217, in obitools3.commands.ecopcr.run
Exception: Error running ecopcr
```
Format of the nsdpy fasta file:

```
>OK183856.1 Cercopithecus erythrotis voucher T1768 16S large subunit ribosomal RNA gene, partial sequence; mitochondrial
CTGCCTGCCCAGTGACACACGTTTAACGGCCGCGGTACCCTGACCGTGCAAAGGTAGCATAATCACTTGT
TCTTTAAATAGGGACTCGTATGAATGGCATCACGAGGGTTTAACTGTCTCTTACTTTCAACCAGTGAAAT
TGACCTGTCCGTGAAGAGACGGACATGAAACAATAAGACGAGAAGACCCTGTGGAGCTTCAATTTATTAG
TACAACTAAAAACAACACAAACCAACAGGCCCTAAACCCCTACATCTGTGCTAAAAATTTTGGTTGGGGC
GACCTCGGAGCACAACCAAACCTCCGAATAATCCACGCTAAGACTACACAAGTCAAAGCAAACTAACACC
TACAATTGACCCAATAATTTGATCAACGGAACAAGTTACCCCAGGGATAACAGCGCAATTCTATTCTAGA
GTCCATATCAACAATAGAGTTTACGACCTCGATGTTGGATCAGGATATCCTAATGGTGCAGCAGCTATCA
AG
```
It looks like this differs from the format of the EMBL reference sequences in the wolf tutorial. But I do not know how to get my reference database fasta into the OBITools3 format, and I don't see any info on how to download NCBI data for OBITools3. Any help in resolving this is much appreciated!
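In case it is useful to others, my current workaround idea (a sketch; the `taxid=N;` key=value header syntax is my understanding of the OBITools convention, and the accession-to-taxid mapping below is made up: in practice it would come from NCBI's accession2taxid files):

```python
# Sketch: pre-annotate NCBI-style fasta headers with a taxid attribute
# so that the import can populate a TAXID column for ecopcr.
acc2taxid = {"OK183856.1": 12345}  # made-up taxid, for illustration only

def annotate_header(line, mapping):
    """Insert 'taxid=N;' after the accession of a fasta header line."""
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    acc, _, rest = line[1:].partition(" ")
    taxid = mapping.get(acc)
    if taxid is None:
        return line
    return ">%s taxid=%d; %s" % (acc, taxid, rest)

print(annotate_header(">OK183856.1 Cercopithecus erythrotis 16S", acc2taxid))
# -> >OK183856.1 taxid=12345; Cercopithecus erythrotis 16S
```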
Issue #130: Obiconvert command not found
https://git.metabarcoding.org/obitools/obitools3/-/issues/130
Gustavo Fick, 2023-09-21

I've installed OBITools3 following these instructions (https://git.metabarcoding.org/obitools/obitools3/-/wikis/Installing-the-OBITools3). The program installs, and when I run obi --help, I get the list of commands. However, I can't use the command 'obiconvert'; when I type it, I get: command not found.
I use Ubuntu 11.3.
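If I understand correctly, the obitools1 standalone commands like obiconvert were folded into `obi` subcommands in OBITools3, so there is no obiconvert binary; conversion goes through `obi import` and `obi export` instead. A sketch, with the exact options unverified:

```shell
# List the available subcommands (obiconvert is not among them):
obi --help
# Hypothetical equivalent of an obiconvert fasta export:
obi export --fasta-output my_dms/my_view > my_view.fasta
```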
Issue #129: EMBL import
https://git.metabarcoding.org/obitools/obitools3/-/issues/129
Jan Gogarten, 2023-08-22

I am trying to follow the instructions from the wolf tutorial and am running into an issue with obi import.
I ran:

```
obi import --embl EMBL kibale/embl_refs
```

The import seems to work through 9800000 entries, and then I get this error:
```
DEBUG /private/var/folders/c1/15vq0bls6y1d1br45m_5wvjm0000gp/T/pip-install-_c3f4t5e/obitools3_8adb6055325446cca289afec61803104/src/obiavl.c:1669:obi_create_avl, obi_errno = 20, errno = 24 :
Error creating an AVL tree file
DEBUG /private/var/folders/c1/15vq0bls6y1d1br45m_5wvjm0000gp/T/pip-install-_c3f4t5e/obitools3_8adb6055325446cca289afec61803104/src/obiavl.c:1013:add_new_avl_in_group, obi_errno = 20, errno = 24 :
Error creating a new AVL tree in a group
Could not import sequence:
{b'ID': b'OW388299', b'NUC_SEQ': b' ACCCTAACCCTACACCCTCACCACCCTACTACCCAACCCTACACCCTAACCCTAACCCTACACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTACCCCTAACCCTAACCCTAACCCTAACCCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA [keeps going for a long time, with what look like some new lines in there]
', b'DEFINITION': b'Gibbula magus genome assembly, chromosome: 11', b'TAXID': 703304, b'organism': b'Gibbula magus'}
Error raised: Problem setting a value in a column
/!\ Check if '--input-na-string' option needs to be set
zsh: segmentation fault obi import --embl EMBL kibale/embl_refs
```
I also tried `obi import --input-na-string OBI:INPUTNASTRING --embl-input EMBL kibale/embl_refs`, but I get the same error.
I guess one option would be to unzip the STD_INV_13.dat.gz file and try to remove the problematic entry, but as the file is massive (134gb) this seems like it might take a very long time.
Any advice would be greatly appreciated and many thanks for your work on obitools.
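One more data point: errno = 24 in the DEBUG output is EMFILE ("Too many open files"), which suggests a file-descriptor limit rather than a corrupt entry. Raising the limit in the shell before re-running the import may be enough (generic shell usage, nothing OBITools-specific):

```shell
# errno 24 is EMFILE: the obi process hit its open-file limit while
# creating AVL tree files.
ulimit -n        # current soft limit
ulimit -Hn       # hard limit: the ceiling the soft limit can be raised to
# Raise the soft limit for this shell session, then re-run the import:
ulimit -S -n "$(ulimit -Hn)" 2>/dev/null || true
```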
Jan
Issue #127: Starting with already demultiplexed data
https://git.metabarcoding.org/obitools/obitools3/-/issues/127
Jason Hill, 2024-03-19

I'm having some trouble figuring out how to start with data that has already been demultiplexed. The data I have has already been split into fastq files for each sample, with the Illumina and sample-specific tags already removed. I've spent some time with the tutorial data and my own data using the obi tools but can't quite figure out the best way to hop into the pipeline. My specific questions would be:
1) Should I start at the initial import and simply skip over the ngsfilter step, and if so, do I need to manually add tags to the DMS that would be added by ngsfilter and that are needed later?
2) Is there a way to merge the samples so they are all analyzed at once as they would have been if they started multiplexed or is doing them each individually the only option?
An additional complication is that within each demultiplexed sample are up to 3 different primer pairs.
3) Are the forward and reverse primer annotations that are added by ngsfilter needed in the downstream analyses?
4) Does each primer pair need to be used in ecopcr separately or can/should they be somehow run together?
Thanks in advance for your response, and I look forward to using what seems to be a great tool.
-Jason
Issue #126: Issue Demultiplexing Samples with ngsfile
https://git.metabarcoding.org/obitools/obitools3/-/issues/126
Samuel Hervey, 2022-11-13

Hello,
I recently received data back to analyze wolf diet composition from scat samples. The data contained 384 dual-indexed samples. Using OBITools3 (Version 3.0.1b18), I have been able to import my fastq files, alignpairedend, and filter based on alignment score, but I am having difficulty with demultiplexing the data. When I run the following command:
`obi ngsfilter -t wolf/ngsfile -u wolf/unidentified_sequences wolf/good_sequences wolf/identified_sequences`
It runs, but the output is zero remaining sequences. I am still unsure whether I am formatting my ngsfile correctly; the first line of the file is below:

```
TX1 MTU_001_1 CGAGAGTT:ATCGTACG TTAGATACCCCACTATGC TAGAACAGGCTCCTCTAG F @
```
In the third column of the ngsfile, should the i5 or the i7 index be listed first, and do either of the indexes need to be reported in the reverse complement?
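While waiting for an answer, I wrote a tiny helper to generate the reverse complements of the indexes, so both orientations can be tried in the ngsfile:

```python
# Produce the reverse complement of an index sequence, to test whether
# ngsfilter expects the i5/i7 tags in the other orientation.
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    return seq.translate(_COMP)[::-1]

print(revcomp("CGAGAGTT"))  # AACTCTCG
print(revcomp("ATCGTACG"))  # CGTACGAT
```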
I also used Illumina's BaseSpace software to export my data already demultiplexed to make sure that my data contained indexes. I was able to retrieve most of the samples where I received a fastq file for each sample. I tried running the OBITools3 pipeline on a single fastq file from a single sample, but ran into the same issue as before where the ngsfilter command resulted in zero remaining reads. I used the following for the ngsfile:
```
TX1 MTU_001_1 -:- TTAGATACCCCACTATGC TAGAACAGGCTCCTCTAG F @
```
Thanks in advance,
Sam
Issue #125: Installing obitools3 on new Mac M1
https://git.metabarcoding.org/obitools/obitools3/-/issues/125
Eric Coissac, 2022-07-15

I succeeded, but I cannot open an OBIDMS created on an Intel platform :-(
Sniff
Issue #122: build_ref_db: taxonomy needs to be in the same DMS as the ref sequences but it shouldn't be the case
https://git.metabarcoding.org/obitools/obitools3/-/issues/122
Celine Mercier, 2022-04-20

When running the build_ref_db command, the taxonomy is not read if it is not in the same DMS as the reference sequences. It was a choice at the time, but it feels more annoying now. (At the least, proper error handling should be implemented.)

Issue #113: obi rename
https://git.metabarcoding.org/obitools/obitools3/-/issues/113
Eric Coissac, 2021-07-21

I have a prankster keyboard that likes to duplicate characters, so I regularly make typos. When using `obi ngsfilter` I used the `-u foo/uunidentified` option instead of `-u foo/unidentified`. I would like to rename the `foo/uunidentified` table to `foo/unidentified`, but I can't find a solution. Can we expect an `obi rename` or `obi mv` that will do this?
Thanks ;-)
Issue #107: import EMBL database into DMS stalling
https://git.metabarcoding.org/obitools/obitools3/-/issues/107
Gert-Jan Jeunen, 2023-04-12

Hello Celine,
I'm running into some issues when trying to import the EMBL database into a DMS and was wondering if you could help. I ran through the tutorial with the wolf data without any problem. However, when I try to make my own reference database using the code provided, the obi import command stalls after a while. It does not provide any error messages, unfortunately.
The code I used to download the EMBL database and import the database into the DMS:
```
mkdir EMBL
cd EMBL
wget -nH --cut-dirs=5 -A rel_std_*.dat.gz -R rel_std_hum_*.dat.gz,rel_std_env_*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/
cd ..
obi import --embl EMBL wolf/embl_refs
```
I have also tried to run the obi import command in the following fashion:
```
obi import --embl EMBL/ wolf/embl_refs
```
I've rerun the code several times and it stalls at different places in the various files, suggesting the files are not corrupted but rather that something is happening with the code. I've attached a screenshot of the output.
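In case it helps rule things out: a stalled import can also be caused by a truncated download, and gzip can verify each archive without extracting it (generic gzip usage, nothing OBITools-specific):

```shell
# Verify that every downloaded EMBL flat file decompresses cleanly.
for f in EMBL/*.dat.gz; do
  [ -e "$f" ] || continue   # the glob matched nothing: skip
  gzip -t "$f" || echo "corrupt: $f"
done
```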
Thanks in advance,
Gert-Jan
![Screen_Shot_2021-05-14_at_8.33.39_AM](/uploads/3d7c9ea3b9fa41cf09032174bc7f4370/Screen_Shot_2021-05-14_at_8.33.39_AM.png)
https://git.metabarcoding.org/obitools/obitools3/-/issues/105
Error when a folder has the same path and name as a DMS (2021-04-15T20:17:57Z, Celine Mercier)
If a folder has the same path and name as a DMS (root name of the DMS), the URI is not properly decoded because the program tries to follow the path inside the folder.
The URI decoding algorithm should be able to recognize that a DMS also matches, and go with whichever interpretation works (opening a view, or opening a DMS).
https://git.metabarcoding.org/obitools/obitools3/-/issues/104
WSL bug (Fatal Python error: pyinit_main: can't initialize time) (2021-11-11T21:45:19Z, Celine Mercier)
This bug seems to affect Python on certain versions of the WSL, after the computer has been running for several days:
```
>python
Fatal Python error: pyinit_main: can't initialize time
Python runtime state: core initialized
OverflowError: timestamp too large to convert to C _PyTime_t
Current thread 0x00007f53c21c0740 (most recent call first):
<no Python frame>
```
More details here: https://bugs.python.org/issue33965
The solution is to reboot the computer (or update the WSL if you can).
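The crash happens because the system clock has run so far ahead that the timestamp no longer fits in C's `_PyTime_t`. When the interpreter does start, a minimal sanity check on the clock looks like the sketch below; the `clock_looks_sane` helper and the year-2100 threshold are assumptions for illustration, not anything shipped with Python or the WSL.

```python
# If the system clock reports a timestamp far in the future, the WSL clock
# has drifted and a reboot (or WSL update) is needed.
import time

def clock_looks_sane(now=None, limit=4102444800):  # 4102444800 s = 2100-01-01 UTC
    now = time.time() if now is None else now
    return 0 < now < limit

print("clock looks sane" if clock_looks_sane() else "reboot or update the WSL")
```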
https://git.metabarcoding.org/obitools/obitools3/-/issues/99
ngsfilter: identify primer dimers (2021-03-21T20:53:15Z, Celine Mercier)
`ngsfilter` currently tags primer dimers with the 'reverse primer not found' error. Ideally it should detect the reverse primer overlapping the forward primer and report a dedicated 'primer dimer' error. This is not the case at present because the forward primer is cut out of the sequence as soon as it is found.
https://git.metabarcoding.org/obitools/obitools3/-/issues/94
Read the PR2 database (2021-02-21T03:39:54Z, Celine Mercier)
Add a command or option in `import` to read/import the [PR2 database](https://pr2-database.org/).
https://git.metabarcoding.org/obitools/obitools3/-/issues/93
Add a findtaxon sub command (2021-02-11T11:09:26Z, Eric Coissac)
That command would play the same role as the `ecofind` program from the ecoPCR package.
https://git.metabarcoding.org/obitools/obitools3/-/issues/91
Problem of quoting quotes when reporting obi commands including quotes in the history (2021-02-11T10:59:08Z, Eric Coissac)
When you run a command like this one, which includes single and double quotes:
```bash
obi grep -p "len(sequence)>=80 and sequence['COUNT']>=10" \
wolf/cleaned_metadata_sequences \
wolf/denoised_sequences
```
the history reports it like this:
```bash
obi grep -p len(sequence)>=80 and sequence['COUNT']>=10 wolf/cleaned_metadata_sequences wolf/denoised_sequences
```
The reported command no longer follows correct shell syntax because the quotes have been lost.
It also induces errors in the dot file generated with the `-d` option of the `obi history` command.
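A sketch of how the history could avoid this, assuming it keeps the original argument vector: Python's `shlex.join` re-quotes each argument so the reported line parses back to the same argv. The `argv` list here is just the example command from this report, not how OBITools3 actually stores history entries.

```python
# Re-quote an argument vector into a line that is valid shell syntax again.
import shlex

argv = [
    "obi", "grep", "-p",
    "len(sequence)>=80 and sequence['COUNT']>=10",
    "wolf/cleaned_metadata_sequences",
    "wolf/denoised_sequences",
]
line = shlex.join(argv)  # Python >= 3.8; quotes each argument as needed
print(line)
```

Splitting `line` with a shell (or with `shlex.split`) recovers exactly the original arguments, quotes included.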
https://git.metabarcoding.org/obitools/obitools3/-/issues/79
Dictionary efficiency issue (2020-04-23T14:02:58Z, Celine Mercier)
Handling of huge dictionaries (typically merged information such as merged taxids in reference databases with hundreds of thousands of taxids, or merged samples in datasets with thousands of samples) is not efficient, as it creates big files that are not mapping-friendly and occupy a lot of disk space.
A solution is already half implemented in the form of dictionaries stored as character strings, but the C API to parse them is not implemented, so it is rarely or never used. Finishing that implementation would be the fastest fix, but eventually a better solution could be developed (e.g. hash tables implemented in a way that makes the most of the mapping behaviour).
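To illustrate the half-implemented idea: a dictionary flattened into a single character string maps to one contiguous column value instead of many small files. The `key:value;` encoding below is an assumption for illustration, not the actual on-disk format used by OBITools3.

```python
# Flatten a merged-counts dictionary into one string and parse it back.
# Limitation of this toy format: keys must not contain ':' or ';'.
def encode_merged(d):
    return ";".join(f"{k}:{v}" for k, v in sorted(d.items()))

def decode_merged(s):
    if not s:
        return {}
    return {k: int(v) for k, v in (item.split(":", 1) for item in s.split(";"))}

merged = {"MERGED_sample-A": 12, "MERGED_sample-B": 3}
print(encode_merged(merged))
```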