If you happen to see a question you know the answer to, please chime in and help your fellow community members. We encourage our forum members to be more involved: jump in and help your fellow researchers with their questions. The GATK forum is a community forum, and helping each other use GATK tools and do research is the cornerstone of our success as a genomics research community. We appreciate your help!
Is the mark duplicates step needed for pooled samples?
I am using pooled RNA-seq samples and have a question about marking duplicate reads.
From the literature and the manuals of SNP-calling tools, I have read that the step after mapping is marking duplicate reads. Because my samples are pooled, I have very high sequence duplication, so I used Picard to remove the duplicates and then called SNPs.
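For context, here is a minimal, simplified sketch of position-based duplicate marking. It is not Picard's implementation: the `Read` class and its fields are hypothetical, reads are treated as single-end, and mate positions, libraries and optical duplicates (all of which Picard MarkDuplicates takes into account) are ignored. Reads that share chromosome, 5' position and strand are grouped, and every read except the one with the highest base-quality sum is flagged.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical read record for illustration; not a pysam or Picard type.
    name: str
    chrom: str
    pos5: int            # 5' alignment position, assumed already computed
    reverse: bool        # True if the read maps to the reverse strand
    base_qual_sum: int   # sum of base qualities, used to pick the kept read


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing (chrom, 5' position, strand) form one group; the read with
    the highest base-quality sum is kept and the rest are flagged.
    """
    groups = {}
    for r in reads:
        groups.setdefault((r.chrom, r.pos5, r.reverse), []).append(r)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.base_qual_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 100, False, 300),
    Read("r2", "chr1", 100, False, 280),  # same start and strand as r1
    Read("r3", "chr1", 100, True, 290),   # same start, opposite strand
    Read("r4", "chr1", 150, False, 310),
]
print(mark_duplicates(reads))  # {'r2'}
```

Because the rule looks only at positions, it cannot tell PCR copies apart from independent fragments that happen to start at the same place, which becomes common when many reads pile up on a highly expressed transcript.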
I am interested in SNPs in only 10 genes, and I found about 30 SNPs in those genes.
Then I tried calling SNPs while skipping the mark duplicates step. For the same 10 genes I found about 60 SNPs.
When I compared the two sets, all 30 of the earlier SNPs were among the 60, with higher SNP quality and read depth.
I am confused about whether the mark duplicates step is needed in my case. Some examples are below. Please advise which result is correct.
SNPs found after marking duplicates (position, ref, alt, QUAL, DP):
153333117 C A 249.43 DP=12
153333354 C T 49.68 DP=27
74606669 T G 62.62 DP=3
SNPs found when skipping mark duplicates (position, ref, alt, QUAL, DP):
153333117 C A 1496.54 DP=56
153333354 C T 105.08 DP=62
74606669 T G 425.86 DP=15
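One way to quantify the comparison is a small sketch that checks the deduplicated calls are a subset of the non-deduplicated ones and computes how much depth survives duplicate marking at each shared site. It uses only the three example sites listed above. Because the post shows no chromosome, sites are keyed on (position, ref, alt); with real VCFs you would key on CHROM as well. The depth ratio assumes the only difference between the two runs is duplicate marking.

```python
# Example sites from the post: (position, ref, alt) -> (QUAL, DP)
with_markdup = {
    (153333117, "C", "A"): (249.43, 12),
    (153333354, "C", "T"): (49.68, 27),
    (74606669, "T", "G"): (62.62, 3),
}
without_markdup = {
    (153333117, "C", "A"): (1496.54, 56),
    (153333354, "C", "T"): (105.08, 62),
    (74606669, "T", "G"): (425.86, 15),
}


def compare_call_sets(dedup, raw):
    """Return (dedup-only sites, raw-only sites, depth retained per shared site)."""
    missing = sorted(set(dedup) - set(raw))  # empty if dedup calls are a subset
    extra = sorted(set(raw) - set(dedup))    # calls gained by skipping dedup
    retained = {}
    for site in sorted(set(dedup) & set(raw)):
        # Fraction of reads kept at this site after marking duplicates.
        retained[site] = dedup[site][1] / raw[site][1]
    return missing, extra, retained


missing, extra, retained = compare_call_sets(with_markdup, without_markdup)
print("dedup-only calls:", missing)
for (pos, ref, alt), frac in retained.items():
    print(f"{pos} {ref}>{alt}: {frac:.0%} of depth kept after marking duplicates")
# dedup-only calls: []
# 74606669 T>G: 20% of depth kept after marking duplicates
# 153333117 C>A: 21% of depth kept after marking duplicates
# 153333354 C>T: 44% of depth kept after marking duplicates
```

Applied to the full 30 and 60 call sets, an empty `missing` list would confirm the subset relationship described above, and `extra` would list the calls that appear only without duplicate marking.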
Thanks in advance.