Release 4.0.3

May 2nd, 2023. Release 4.0.3

New features

  • Added the function contains to the expression language. It tests whether a map contains a given key. It can be used with obigrep to select only the sequences occurring in a given sample:

    obigrep -p 'contains(annotations.merged_sample,"15a_F730814")' wolf_new_tag.fasta
  • Added a new command, obitagpcr. It tags raw Illumina reads with the identifier of their corresponding sample. The tags added are the same as those added by obimultiplex. The resulting forward and reverse files can then be split into per-sample files using the obidistribute command.

    obitagpcr -F library_R1.fastq \
              -R library_R2.fastq \
              -t sample_ngsfilter.txt \
              --out tagged_library.fastq \
              --unidentified not_assigned.fastq

    The command produces four files: tagged_library_R1.fastq and tagged_library_R2.fastq, containing the assigned reads, and not_assigned_R1.fastq and not_assigned_R2.fastq, containing the reads that could not be assigned.

    The tagged library files can then be split using obidistribute:

    mkdir pcr_reads
    obidistribute --pattern "pcr_reads/sample_%s_R1.fastq" -c sample tagged_library_R1.fastq
    obidistribute --pattern "pcr_reads/sample_%s_R2.fastq" -c sample tagged_library_R2.fastq
  • Added two options, --add-lca-in and --lca-error, to obiannotate. These options help when building a reference database with obipcr. obiuniq is commonly run on the obipcr output. To merge identical sequences annotated with different taxids, the following strategy can now be used:

    obiuniq -m taxid myrefdb.obipcr.fasta \
    | obiannotate -t taxdump --lca-error 0.05 --add-lca-in taxid \
    > myrefdb.obipcr.unique.fasta

    The obiuniq call merges identical sequences while keeping track of the diversity of their taxonomic annotations in the merged_taxid slot. obiannotate then loads an NCBI taxdump and computes the lowest common ancestor (LCA) of the taxids represented in merged_taxid. Specifying --lca-error 0.05 allows at most 5% of the taxids to disagree with the computed LCA. The computed LCA is stored in the slot given as the parameter of the --add-lca-in option. The scientific name and the actual error rate corresponding to the estimated LCA are also stored in the sequence annotation.
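
The effect of the --lca-error tolerance described above can be illustrated with a minimal Python sketch. This is not the obiannotate implementation: the function names, the toy taxonomy and its taxids are hypothetical, and weighting each taxid by its count in merged_taxid is an assumption made for the illustration.

```python
from collections import Counter


def lineage(taxid, parent):
    """Return the path from taxid up to the root (the root is its own parent)."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lca_with_error(merged_taxid, parent, max_error=0.0):
    """Deepest taxon covering at least (1 - max_error) of the observations.

    merged_taxid maps taxid -> count (assumed weighting, see lead-in).
    Returns (taxid, actual_error).
    """
    if not 0.0 <= max_error < 0.5:
        raise ValueError("max_error must be in [0, 0.5)")
    total = sum(merged_taxid.values())
    covered = Counter()  # taxon -> observations at or below it
    depth = {}           # taxon -> distance from the root
    for taxid, count in merged_taxid.items():
        path = lineage(taxid, parent)
        for i, node in enumerate(path):
            covered[node] += count
            depth[node] = len(path) - 1 - i
    # Keep taxa whose share of disagreeing observations is within tolerance.
    candidates = [n for n in covered if total - covered[n] <= max_error * total]
    best = max(candidates, key=depth.get)
    return best, (total - covered[best]) / total


# Hypothetical toy taxonomy (child -> parent); these are not NCBI taxids.
TOY_PARENT = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
# Hypothetical merged_taxid content for one dereplicated sequence.
TOY_MERGED = {5: 60, 6: 38, 4: 2}

strict = lca_with_error(TOY_MERGED, TOY_PARENT, 0.0)
tolerant = lca_with_error(TOY_MERGED, TOY_PARENT, 0.05)
```

In this toy case the strict LCA is taxon 2, because the two observations of taxon 4 force the ancestor up one level. With a 5% tolerance the result is the more precise taxon 3, with an actual error rate of 2%.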

Enhancement

  • Renamed the forward_mismatches and reverse_mismatches tags set by obimultiplex to forward_error and reverse_error, for consistency with the tags set by obipcr.
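
Downstream scripts that read these tags may need to accept both names for a while. The following is a minimal Python sketch, assuming the sequence annotations have already been parsed into a dictionary; the helper name primer_errors is hypothetical and not part of OBITools.

```python
def primer_errors(annotations):
    """Return (forward, reverse) primer error counts from an annotation dict.

    Prefers the new tag names and falls back to the pre-4.0.3 names.
    A missing tag yields None.
    """
    def pick(new_name, old_name):
        if new_name in annotations:
            return annotations[new_name]
        return annotations.get(old_name)

    return (
        pick("forward_error", "forward_mismatches"),
        pick("reverse_error", "reverse_mismatches"),
    )
```

Checking the new name first means files produced by 4.0.3 and later never depend on the old tags.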

Corrected bugs

  • Correction of a bug in memory management and Slice recycling.
  • Correction of the help text and logging information of the --fragmented option.
  • Correction of a bug in obiconsensus that deleted a base close to the beginning of the consensus sequence.