oa buildgraph - clarify if seeds must be proteins
The help text suggests the seeds must be proteins, but there appear to be built-in nucleotide seed sets, and the description of the next argument suggests the seeds can be DNA:
    $ oa buildgraph -h
    ...
      --seeds seeds         protein seeds; either a fasta file containing seeds
                            proteic sequences or internal set of seeds among
                            ['nucrRNAAHypogastrura', 'nucrRNAArabidopsis',
                            'protChloroArabidopsis', 'protMitoCapra',
                            'protMitoMachaon']
      --kup ORGASM:KUP      The word size used to identify the seed reads
                            [default: protein=4, DNA=12]
Looking at python/orgasm/indexer/_orgasm.pyx, the code appears to attempt to auto-detect protein vs DNA seeds:
    cpdef dict lookForSeeds(self, dict sequences, int kup=-1, int mincov=1,object logger=None):
        cdef AhoCorasick patterns
        cdef dict matches
        cdef str k
        cdef bint nuc

        nuc = all([isDNA(sequences[k]) for k in sequences])

        if nuc:
            if logger is not None:
                logger.info('Matching against nucleic probes')
            patterns = NucAhoCorasick()
            kup = 12 if kup < 0 else kup
        else:
            if logger is not None:
                logger.info('Matching against protein probes')
            patterns = ProtAhoCorasick()
            kup = 4 if kup < 0 else kup

        for k in sequences:
            patterns.addSequence(sequences[k],k,kup)

        patterns.finalize()

        #minmatch = 50 if nuc else 15
        minmatch = int(self.getReadSize() // (2 if nuc else 6))
        matches = patterns.scanIndex(self,minmatch,-1,mincov)

        return matches
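To make my reading of that function concrete, here is a minimal pure-Python sketch of the branch as I understand it. It is not the ORG.asm implementation: `is_dna` is a hypothetical stand-in for `isDNA` (whose actual rule I have not checked), `seed_defaults` is a name invented for this illustration, and `read_size` replaces the call to `self.getReadSize()`.

```python
# Hypothetical stand-in for ORG.asm's isDNA(); the real rule is not shown above.
# Assumption: a sequence counts as DNA if it only contains A, C, G, T or N.
DNA_ALPHABET = set("ACGTN")


def is_dna(seq):
    return all(ch in DNA_ALPHABET for ch in seq.upper())


def seed_defaults(sequences, kup=-1, read_size=100):
    """Mirror the nuc/protein branch of lookForSeeds.

    Returns (seed_type, kup, minmatch). read_size is an assumed
    replacement for self.getReadSize().
    """
    # The whole seed set is treated as DNA only if every sequence looks like DNA.
    nuc = all(is_dna(seq) for seq in sequences.values())
    if nuc:
        kup = 12 if kup < 0 else kup
    else:
        kup = 4 if kup < 0 else kup
    minmatch = read_size // (2 if nuc else 6)
    return ("nucleic" if nuc else "protein", kup, minmatch)


print(seed_defaults({"rrnS": "ACGTTGCAACGT"}))         # ('nucleic', 12, 50)
print(seed_defaults({"cox1": "MFINRWLFSTNHKDIGTLY"}))  # ('protein', 4, 16)
print(seed_defaults({"a": "ACGT", "b": "MFINRW"}))     # ('protein', 4, 16)
```

If this reading is right, a seed file mixing DNA and protein sequences would be matched entirely as protein, which might also be worth mentioning in the help text.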
Assuming the seeds can be DNA, please clarify the --seeds help text (and explain, for example, whether an entire reference mitochondrial genome could be used, or whether the seeds should be fragments only, such as genes from a known mitochondrion of a related species).