ngsfilter produces assigned-file with "forward_tag=None" or "reverse_tag=None"
My colleague discovered that running ngsfilter results in an "assigned-file" in which many reads are assigned to samples but carry the attribute “forward_tag=None” or “reverse_tag=None” in the header.
I have now checked one of my recent “assigned.fastq” files (the product of running ngsfilter), and it also contains many of these “pseudo-assigned” reads (approx. 4% of the total!).
This indicates that ngsfilter assigns reads on the basis of only one matching tag.
Here is what my ngsfilter file looks like:
Lichen F017R008 TACGACT:ATCGCGA GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
Lichen F103R064 CTTCCTT:GCATGGA GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
Lichen F020R009 ATCAGTC:CGCTCTC GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
Lichen F027R014 CTCTGCT:GCGTCAG GTGARTCATCGARTCTTTG TCCTCCGCTTATTGATATGC F at-sign
And here are the stats on the tags:
$ obistat -c forward_tag -c reverse_tag lichen.assigned.fastq
lichen.assigned.fastq 100.0 % |#################################################| ] remain : 00:00:00
forward_tag reverse_tag count total
tacgact atcgcga 82157 82157
None gcgtcag 8303 8303
None atcgcga 1101 1101
None cgctctc 12055 12055
atcagtc cgctctc 667541 667541
cttcctt gcatgga 240579 240579
cttcctt None 4504 4504
None gcatgga 3343 3343
atcagtc None 9706 9706
tacgact None 945 945
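To show where the "approx. 4%" comes from, here is a small sketch that sums the counts from the table above. The counts are copied by hand from the obistat output; the function name `missing_tag_fraction` is just my own helper and not part of OBITools.

```python
# Counts copied from the obistat output above: (forward_tag, reverse_tag, count).
rows = [
    ("tacgact", "atcgcga", 82157),
    ("None", "gcgtcag", 8303),
    ("None", "atcgcga", 1101),
    ("None", "cgctctc", 12055),
    ("atcagtc", "cgctctc", 667541),
    ("cttcctt", "gcatgga", 240579),
    ("cttcctt", "None", 4504),
    ("None", "gcatgga", 3343),
    ("atcagtc", "None", 9706),
    ("tacgact", "None", 945),
]


def missing_tag_fraction(rows):
    """Return the fraction of reads where either tag is reported as None."""
    total = sum(count for _, _, count in rows)
    missing = sum(count for fwd, rev, count in rows if "None" in (fwd, rev))
    return missing / total


fraction = missing_tag_fraction(rows)
print(f"{fraction:.1%} of the assigned reads lack one tag")
```

With these numbers, 39957 of 1030234 assigned reads have one tag reported as None.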
PS: I also tested it on the wolf tutorial. The effect is evident there too, although in only one sample:
$ obistat -c forward_tag -c reverse_tag wolf.ali.assigned.fastq
wolf.ali.assigned.fastq 99.3 % |#################################################- ] remain : 00:00:00
forward_tag reverse_tag count total
gcctcct gcctcct 9851 9851
gaatatc gaatatc 14700 14700
aattaac aattaac 9056 9056
gcctcct None 1 1
gaagtag gaagtag 9724 9724
Looking at the corresponding raw reads, it is evident that the tag reported as “None” is indeed not present in the read, so these sequences should not be included as assigned.
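As a possible stopgap, the pseudo-assigned reads could be dropped after ngsfilter by looking at the headers. The following is only a rough sketch: it assumes plain 4-line FASTQ records and that the attribute appears literally as "forward_tag=None" or "reverse_tag=None" in the header line. The function names and the example reads are made up for illustration and are not OBITools code.

```python
import re

# Matches the attribute as it appears in the headers of the pseudo-assigned reads.
MISSING_TAG = re.compile(r"\b(?:forward_tag|reverse_tag)=None\b")


def fastq_records(lines):
    """Group an iterable of lines into 4-line FASTQ records (assumed layout)."""
    it = iter(lines)
    return zip(it, it, it, it)


def keep_fully_tagged(lines):
    """Yield only the lines of records whose header has both tags set."""
    for record in fastq_records(lines):
        if not MISSING_TAG.search(record[0]):
            yield from record


# Hypothetical example reads, not taken from my data.
example = [
    "@read1 forward_tag=atcagtc; reverse_tag=cgctctc; sample=F020R009;",
    "ACGTACGT",
    "+",
    "IIIIIIII",
    "@read2 forward_tag=None; reverse_tag=cgctctc; sample=F020R009;",
    "ACGTTTGT",
    "+",
    "IIIIIIII",
]

kept = list(keep_fully_tagged(example))
print("\n".join(kept))
```

For a real file, the same generator could be fed an open file handle and the result written to a new file. This only hides the symptom, of course; the assignment itself would still need to be fixed in ngsfilter.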
Are you aware of this potential problem?
(edit: replaced the at-signs in the ngsfilter file shown in this post with "at-sign", as they interfered with the layout, and added some blank lines to ease reading)