Asymmetry in simpleLCS()
Dear OBItools team,
I am trying to understand exactly how OBItools' ecotag finds the best-matching hit, i.e., how it determines the longest common subsequence (LCS) and the shortest alignment corresponding to this LCS.
I think I found most of the relevant source code here: https://git.metabarcoding.org/obitools/obitools/tree/master/src

When I compare two identical sequences with 'simpleLCS' (in src/obitools/align/_lcs.ext.1.c) and then add one base to the beginning of one of the sequences, the extra base is counted as a mismatch (i.e. the LCS is reduced by one). An extra base added to the end of a sequence, however, is simply ignored (i.e. the LCS stays the same). On top of that, the result seems to depend on the order of the arguments. For example, with:

s1 = 'acccctttgcccatatcggccctagctctc'
s2 = 'acccctttgcccatatcggccctagctct'
s3 = 'cccctttgcccatatcggccctagctctc'

simpleLCS(s1,s1), simpleLCS(s1,s2), simpleLCS(s2,s1) and simpleLCS(s1,s3) all deliver the same value, but simpleLCS(s3,s1) does not.
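For comparison, here is a minimal Python sketch of the textbook dynamic-programming LCS length, which is what I was expecting. This is not the OBItools implementation, and the function name lcs_length is my own (hypothetical). It only illustrates that, in the textbook definition, the result does not depend on the argument order, and a base missing at the start is treated the same way as a base missing at the end.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (textbook DP).

    Assumption: plain, unbanded LCS with no end-gap special cases;
    this is NOT taken from the OBItools source.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the diagonal value by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best of "skip ch_a" / "skip ch_b".
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


s1 = "acccctttgcccatatcggccctagctctc"
s2 = "acccctttgcccatatcggccctagctct"   # s1 without its last base
s3 = "cccctttgcccatatcggccctagctctc"   # s1 without its first base

# Print both lengths and the LCS length for each pair from the question.
for x, y in [(s1, s1), (s1, s2), (s2, s1), (s1, s3), (s3, s1)]:
    print(len(x), len(y), lcs_length(x, y))
```

In this sketch, swapping the two arguments never changes the result, and the LCS of s1 with either s2 or s3 equals the length of the shorter sequence, because both are subsequences of s1.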
Could you help me understand this, please?
Best regards, Lara