OBITools issues
https://git.metabarcoding.org/obitools/obitools/-/issues
2017-09-29T05:28:38Z

Add a new tool selecting the most abundant sequences per sample
https://git.metabarcoding.org/obitools/obitools/-/issues/16
2017-09-29T05:28:38Z, Eric Coissac

To build the reference database, it would be nice to be able to extract the n most abundant sequences per sample.

SILVA reference database parsing
https://git.metabarcoding.org/obitools/obitools/-/issues/15
2017-09-29T05:28:39Z, Celine Mercier

Someone would be interested in having `obiaddtaxids` parse SILVA reference databases (it used to be able to do so, but the format has probably changed). I'm not sure whether there are several SILVA formats or only this one:
```
>AAAA02038008.4342.6342 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Insecta;Pterygota;Neoptera;Diptera;Oryza sativa Indica Group
UCUGGUUGAUCCUGCCAGUAGUU.......
>AAAB01001705.32.640 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;Insecta;Pterygota;Neoptera;Diptera;Anopheles gambiae str. PEST
UGAUAUACGCUCGUCUCAAAGGU.....
```
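
For discussion, here is a minimal sketch of how a header in the format shown above could be split into its parts. It is plain Python and does not use the OBITools API. The function and field names (`parse_silva_header`, `accession`, `start`, `stop`, `lineage`, `organism`) are hypothetical, and the reading of the identifier as `ACCESSION.START.STOP` is an assumption based only on the two example headers.

```python
from typing import List, NamedTuple


class SilvaHeader(NamedTuple):
    """Hypothetical container for the parts of one SILVA-style header."""
    accession: str
    start: int
    stop: int
    lineage: List[str]
    organism: str


def parse_silva_header(line: str) -> SilvaHeader:
    """Split a SILVA-style FASTA header line into its components."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    # The identifier ends at the first space; the rest is the taxonomy path.
    identifier, _, taxonomy = line[1:].strip().partition(" ")
    # Assumption: the identifier is ACCESSION.START.STOP. Splitting from the
    # right keeps any dot inside the accession itself intact.
    accession, start, stop = identifier.rsplit(".", 2)
    names = [name.strip() for name in taxonomy.split(";") if name.strip()]
    if not names:
        raise ValueError("no taxonomy path in header: %r" % line)
    # Assumption: the last element of the path is the organism name.
    return SilvaHeader(accession, int(start), int(stop), names[:-1], names[-1])


example = (">AAAB01001705.32.640 Eukaryota;Opisthokonta;Holozoa;"
           "Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Hexapoda;"
           "Insecta;Pterygota;Neoptera;Diptera;Anopheles gambiae str. PEST")
header = parse_silva_header(example)
print(header.accession, header.organism)
```

The organism name would still have to be matched against a taxonomy to obtain a taxid; that step is not shown here.
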
Should I write a parser for that format in `obiaddtaxids`?

obiaddtaxids: add a parser for the UNITE 'general FASTA release' format
https://git.metabarcoding.org/obitools/obitools/-/issues/9
2017-09-29T05:28:38Z, Celine Mercier

There are 2 formats used by UNITE for the FASTA headers.
`obiaddtaxids` parses the one used for the 'Full UNITE+INSD dataset'.
Another parser could be added for the format used by the 'general FASTA release'.
Example for the header format used by the 'general FASTA release':
```
>Glomeraceae|AM076560|SH146432.05FU|refs|k__Fungi;p__Glomeromycota;c__Glomeromycetes;o__Glomerales;f__Glomeraceae;g__;s__uncultured_Glomus
```
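
As a starting point, a parser for the 'general FASTA release' header shown above might look like the sketch below. It is plain Python, not tied to the existing `obiaddtaxids` code. The function name, the dictionary keys, and the interpretation of the five `|`-separated fields (name, accession, SH code, status, lineage) are assumptions drawn from the single example header. The mapping from rank prefixes to rank names is also an assumption.

```python
# Assumed meaning of the one-letter rank prefixes seen in the example header.
RANK_PREFIXES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_unite_general_header(line):
    """Split a UNITE 'general FASTA release' header into a dictionary."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    fields = line[1:].strip().split("|")
    if len(fields) != 5:
        raise ValueError("expected 5 '|'-separated fields: %r" % line)
    name, accession, sh_code, status, lineage = fields

    taxonomy = {}
    for item in lineage.split(";"):
        prefix, separator, value = item.partition("__")
        if not separator:
            continue  # skip anything that is not of the form prefix__value
        rank = RANK_PREFIXES.get(prefix, prefix)
        # An empty value (e.g. "g__") means the rank is not assigned.
        taxonomy[rank] = value or None

    return {
        "name": name,
        "accession": accession,
        "sh_code": sh_code,
        "status": status,
        "taxonomy": taxonomy,
    }


example = (">Glomeraceae|AM076560|SH146432.05FU|refs|k__Fungi;"
           "p__Glomeromycota;c__Glomeromycetes;o__Glomerales;"
           "f__Glomeraceae;g__;s__uncultured_Glomus")
record = parse_unite_general_header(example)
print(record["accession"], record["taxonomy"]["family"])
```

Keeping unassigned ranks as `None` instead of dropping them makes it easy to find the lowest assigned rank when looking up a taxid.
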