Issues aligning paired-end sequences with alignpairedend
Hi everyone.
I am having an issue with the alignpairedend command. When I run the following code
'obi alignpairedend -R skink/reads2 skink/reads1 skink/aligned_reads'
'obi alignpairedend -R skink/reads2 skink/reads1 > aligned_reads.fastq'
for many reads the output contains only the overlapping portion of the two reads, and the non-overlapping parts appear to be trimmed. As a result, no primer or tag sequence is left to assign the read to a sample in the subsequent ngsfilter command, even though the primer and tag sequences are present in the raw reads. For example, the read pair below contains the forward and reverse tag and primer sequences, but the aligned sequence does not.
Fwd read
@M05779:46:000000000-G6W6W:1:1101:14359:1799 1:N:0:CGATGT
TGCTAGTCGCATGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTTATTGGATAATTTTCTAACTGTCCAACTAACTTCTTAGAGGGACAAGCGGTGTTCAGC
+
3>A?FFFBD@AGFGFFCFFCCFEGGFGHFDGHHHGGG2FHFFHHHFHHHHFHHHFHHHFH?EEGHGAEEC?EEEGFHHHGGHFGHFBBGFHHHFHHHGFHGFHHHHHHHFHHHHHHFGGHFBGHHHHHFHBGHDCFFEFCECGGGGHHFF 1>1>1B>>A>FB1F1A1FE10B0EFCFH21GGBHE0100A/FGGFGFGFBGHF1GHGF0/EAABEAA>E///EGFCA@@EGCCGFB1FB1FGFECA//<<BC@E?A?CFC//@@CC@.DFGHFHHFD0D/..<<<AGF0.:EGEG0C::B-
Rev read
@M05779:46:000000000-G6W6W:1:1101:14359:1799 2:N:0:CGATGT
CATATGTCAGCATCTAAGGGCATCACAGACCTGTTATTGCTCAGCTTCGTGCGGCTGAACACCGCTTGTCCCTCTAAGAAGTTAGTTGGACAGTTAGAAAATTATCCAATAACTATTTAGTAGGCTAGAGTCTCGCTCGTTATCGGAATA
+
1A1AFFFBDCBGFGB131CCEGF1A1A1FGGHHHDHHCFGHFEEHHGEHFEE?EEEHEGFGECEGGEHHHHHHHBDDBDHHHFFGHFFBGBFHHFFFFFFHHEHHFGFGFFGHFHHHGHHFBBGHHGFFHGHGFEGECGGHHHHG?EFF
The same sequence as recorded after alignpairedend has been run:
@M05779:46:000000000-G6W6W:1:1101:14359:1799 score=97.0; COUNT=1; shift=53; seq_length=97; mode=alignment; overlap_length=97; score_norm=1.0; ali_direction=right; 1:N:0:CGATGT
gctgaacaccgcttgtccctctaagaagttagttggacagttagaaaattatccaataactatttagtaggctagagtctcgctcgttatcggaata
+
C<C<DGFG@DDDGFDC;;8DDEGF<D<D<GFGHGECGGEGGGGFFGGGFGGFFCFFFGFGGGFEEGGFGEGHHHHEEFBEHFEFFFGDEBGDFGHGF
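The example above can be checked independently of OBITools with a minimal Python sketch. It looks for the longest exact match between the end of the forward read and the start of the reverse-complemented reverse read, then compares the aligned output with that overlap. The helper names `revcomp` and `exact_overlap` are illustrative only (they are not OBITools functions), and the sketch assumes an overlap without mismatches, which is stricter than what a real aligner does.

```python
# Minimal sketch: exact-match overlap check for one read pair.
# Not an OBITools function; helper names are made up for illustration.

def revcomp(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def exact_overlap(fwd, rev):
    """Length of the longest suffix of fwd equal to a prefix of revcomp(rev).

    Assumption: the overlap has no mismatches. That holds for this example
    but would be too strict for reads with sequencing errors.
    """
    rc = revcomp(rev)
    for n in range(min(len(fwd), len(rc)), 0, -1):
        if fwd[-n:] == rc[:n]:
            return n
    return 0

# Sequences copied from the post.
fwd = "TGCTAGTCGCATGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTTATTGGATAATTTTCTAACTGTCCAACTAACTTCTTAGAGGGACAAGCGGTGTTCAGC"
rev = "CATATGTCAGCATCTAAGGGCATCACAGACCTGTTATTGCTCAGCTTCGTGCGGCTGAACACCGCTTGTCCCTCTAAGAAGTTAGTTGGACAGTTAGAAAATTATCCAATAACTATTTAGTAGGCTAGAGTCTCGCTCGTTATCGGAATA"
aligned = "gctgaacaccgcttgtccctctaagaagttagttggacagttagaaaattatccaataactatttagtaggctagagtctcgctcgttatcggaata"

n = exact_overlap(fwd, rev)
# Length the merged read would have if both non-overlapping ends were kept.
merged_len = len(fwd) + len(rev) - n

print("overlap length:", n)
print("output is just the overlapping part of the reverse read:",
      aligned.upper() == rev[len(rev) - n:])
print("length with both ends kept:", merged_len, "| length reported:", len(aligned))
print("forward 5' end missing from output:", fwd[:len(fwd) - n])
print("reverse 5' end missing from output:", rev[:len(rev) - n])
```

For this pair the exact overlap is 97 bases, which agrees with the overlap_length=97 in the output header, and the aligned sequence is identical to the overlapping part of the reverse read. The two 5' ends printed at the bottom are the parts of the raw reads that are absent from the aligned record.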
This happens for a high proportion of my sequences. I assume I am doing something wrong, but I cannot work out what!
Best,
Tom