Issue Demultiplexing Samples with ngsfile
Hello,
I recently received sequencing data for analyzing wolf diet composition from scat samples. The data contain 384 dual-indexed samples. Using OBITools3 (version 3.0.1b18), I have been able to import my fastq files, run alignpairedend, and filter on alignment score, but I am having difficulty demultiplexing the data. When I run the following command:
obi ngsfilter -t wolf/ngsfile -u wolf/unidentified_sequences wolf/good_sequences wolf/identified_sequences
It runs, but the output contains zero sequences. I am unsure whether I am formatting my ngsfile correctly, so I have included the first line of the file below:
TX1 MTU_001_1 CGAGAGTT:ATCGTACG TTAGATACCCCACTATGC TAGAACAGGCTCCTCTAG F @
In the third column of the ngsfile, should the i5 or the i7 index be listed first, and does either index need to be given as its reverse complement?
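Since the question is which order and orientation the two indexes should take, a small Python sketch can at least enumerate the candidates to try in the third column. It makes no claim about which one ngsfilter expects; the function names are hypothetical, and the two index sequences are taken from the ngsfile line above.

```python
# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tag_orientations(index_a, index_b):
    """List every order/orientation of two indexes as 'x:y' tag pairs."""
    pairs = []
    # Try both orders (a first, then b first).
    for first, second in ((index_a, index_b), (index_b, index_a)):
        # For each position, try the index as given and reverse-complemented.
        for x in (first, reverse_complement(first)):
            for y in (second, reverse_complement(second)):
                pairs.append(f"{x}:{y}")
    return pairs


if __name__ == "__main__":
    # Index pair copied from the first line of the ngsfile.
    for pair in tag_orientations("CGAGAGTT", "ATCGTACG"):
        print(pair)
```

This prints eight candidate tag pairs, starting with the pair exactly as written in the ngsfile.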
I also used Illumina's BaseSpace software to export my data already demultiplexed, to make sure that my data contained indexes. I was able to retrieve most of the samples, receiving a fastq file for each one. I tried running the OBITools3 pipeline on a single fastq file from a single sample, but ran into the same issue as before: the ngsfilter command resulted in zero remaining reads. I used the following for the ngsfile:
TX1 MTU_001_1 -:- TTAGATACCCCACTATGC TAGAACAGGCTCCTCTAG F @
Thanks in advance,
Sam
Hi Celine,
Thank you for your quick response and I am sorry it took so long to respond! I am pasting the first five sequences from the forward reads in the fastq file output for one of my samples that were demultiplexed using Illumina BaseSpace software:
@M00573:55:000000000-GB75C:1:1101:14566:2408 1:N:0:GATCTACG+TCATCGAG
TTAGATACCCCACTATGCTTAGCCCTAAACATAGATAATTTTACAACAAAATAATTCGCCAGAGGACTACTAGCAATAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATATCCCTCTAGAGGAGCCTGTTCTA
+
BBBBBFFFFFBBGFGGGGGGGGFHHHHHHHHHHHHHHHHHHFHGHHHHHGHHHHHHHGGFGEHHGHHHHHGHHHHHHHHHHHHHGHHHFHHHHHHHGHHHGGG>EEFHHHHF4EGHHHHEHHHHHGHGBAGHGHE
@M00573:55:000000000-GB75C:1:1101:17182:2713 1:N:0:GATCTACG+TCATCGAG
TTAGATACCCCACTATGCTTAGCCCTAAACATAGATAATTTTACAACAAAATAATTCGCCAGAGGACTACTAGCAATAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATATCCCTCTAGAGGAGCCTGTTCTG
+
ABBBAFFFFFBBGGGGGGGFGGHHHHHHHHHHHHHFHHHHHHHHHHHHHHGHHHHHHGGGGGHHGHHHHFGHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHEEEECGFHHHHGHHHGHHHHFHHHDEGHHHHHHH
@M00573:55:000000000-GB75C:1:1101:14303:2790 1:N:0:GATCTACG+TCATCGAG
TTAGATACCCCACTATGCTTAGCCCTAAACATAGATAATTTTACAACAAAATAATTCGCCAGAGGACTACTAGCAATAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATATCCCTCTAGAGGAGCCTGTTCTG
+
DDDDDFFFFFDDGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGEGGHHHHHHHHHHHHGHHHGHHGFGHHHHHHH
@M00573:55:000000000-GB75C:1:1101:17626:2899 1:N:0:GATCTACG+TCATCGAG
TTAGATACCCCACTATGCTTAGCCCTAAACATAGATAATTTTACAACAAAATAATTCGCCAGAGGACTACTAGCAATAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATATCCCTCTAGAGGAGCCTGTTCTG
+
CCCCBFFFFFCCGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGEGGHHHHGHHHHHHGHHGHHHFGFGHHHHHHH
@M00573:55:000000000-GB75C:1:1101:19385:2922 1:N:0:GATCTACG+TCATCGAG
TTAGATACCCCACTATGCTTAGCCCTAAACATAGATAATTTTACAACAAAATAATTCGCCAGAGGACTACTAGCAATAGCTTAAAACTCAAAGGACTTGGCGGTGCTTTATATCCCTCTAGAGGAGCCTGTTCTA
+
AAABBFFFFFBBGGGGGGGGGGHHHHHHHGFHHGHHHHHHHHFHHHHHHHHHHHHHHGGGGGGHGHHHHHHHHHHGHHHHHHHGHHHHHHHGHHGHHHHHHGGEEEHHHHHHFFHHHHHFFFFHHGEFGGF34HF
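In the reads pasted above, the index pair appears in the header comment (after the last colon) rather than in the sequence itself. As a rough illustration, the following Python sketch pulls that pair out of a header line. The function name is hypothetical, and the sketch assumes the header has the same space-separated layout as the ones shown; it does not say which of the two values is i5 and which is i7.

```python
def header_indexes(header):
    """Return the index sequences found in a FASTQ header line as a tuple.

    Assumes a header shaped like the ones pasted above:
    '@<read id> <read>:<filter>:<control>:<index1>+<index2>'.
    """
    parts = header.strip().split(" ", 1)
    if len(parts) != 2:
        raise ValueError("header has no comment field after the read id")
    # The index field is whatever follows the last colon of the comment.
    index_field = parts[1].split(":")[-1]
    return tuple(index_field.split("+"))


if __name__ == "__main__":
    # Header copied from the first read pasted above.
    line = "@M00573:55:000000000-GB75C:1:1101:14566:2408 1:N:0:GATCTACG+TCATCGAG"
    print(header_indexes(line))
```

Running this over every header of a file and counting the distinct pairs would show whether a per-sample fastq really contains a single index combination.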
The Illumina BaseSpace software demultiplexed each sample and trimmed sequences beyond a specified trimming adapter so all that remains of the sequence are the locus-specific primers and the target region. For this study we used the following primers:
Forward primer: TTAGATACCCCACTATGC
Reverse primer: YAGAACAGGCTCCTCTAG
The "Y" in the reverse primer is a degenerate base that can be either a "C" or a "T".
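To check that the reads really do carry both primers, including the degenerate base, a minimal Python sketch can expand the IUPAC codes into a regular expression and test a forward read. It assumes the forward read begins with the forward primer and ends with the reverse complement of the reverse primer, and it only tests exact matches (no mismatches allowed). All function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of every IUPAC code (e.g. Y = C/T pairs with R = G/A).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement a sequence that may contain IUPAC codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a primer with IUPAC codes into a regular-expression string."""
    return "".join(IUPAC[base] for base in primer.upper())


def check_read(read, forward, reverse):
    """Return (starts_with_forward, ends_with_reverse_complement)."""
    starts = re.match(primer_regex(forward), read) is not None
    tail = primer_regex(reverse_complement(reverse)) + "$"
    ends = re.search(tail, read) is not None
    return starts, ends


if __name__ == "__main__":
    forward = "TTAGATACCCCACTATGC"
    reverse = "YAGAACAGGCTCCTCTAG"
    print(reverse_complement(reverse))  # CTAGAGGAGCCTGTTCTR
```

Calling `check_read` on each sequence line of the forward-read file with these two primers reports, per read, whether each primer is present at the expected end.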
I have also attached the fastq files for this specific sample for reference.
MTU1825replicate1_S209_L001_R1_001.fastq
MTU1825replicate1_S209_L001_R2_001.fastq
Thank you for your help with this!
Best,
Sam