If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Is removing duplicates appropriate with pooled data?
Hello, I am a graduate student in lab that studies evolution, and I am relatively new to NGS. I have been given reads from pooled moth samples, and I am hoping to identify variants with the ultimate goal of quantifying the genetic differentiation between two strains of moths. I am wondering 1) if it is appropriate/recommended to remove duplicates with pooled data and 2) more broadly, are there particular situations in which removing duplicates is not suggested? For example, I have another data set in which the fragments were not generated by random shearing but rather by multiplex PCR of 17 particular amplicons for 42 different individual moths (not pooled). I'm guessing that removing duplicates doesn't make sense in this case because there will be lots of reads that start at the exact same position relative to the reference. Is this right?
Thanks a bunch!