Is removing duplicates appropriate with pooled data?
Hello, I am a graduate student in a lab that studies evolution, and I am relatively new to NGS. I have been given reads from pooled moth samples, and I am hoping to identify variants, with the ultimate goal of quantifying the genetic differentiation between two strains of moths. I am wondering 1) whether it is appropriate/recommended to remove duplicates with pooled data and 2) more broadly, whether there are particular situations in which removing duplicates is not recommended. For example, I have another data set in which the fragments were generated not by random shearing but by multiplex PCR of 17 particular amplicons for 42 different individual moths (not pooled). I'm guessing that removing duplicates doesn't make sense in this case because many reads will start at the exact same position relative to the reference, even when they come from different original molecules. Is this right?
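To make my reasoning about the amplicon case concrete, here is a minimal Python sketch. It assumes, as I understand it, that duplicate removal roughly amounts to keeping one read per (chromosome, start, strand). Real tools consider more than this (for example mate position), and the read coordinates and the function name `count_after_position_dedup` are made up purely for illustration.

```python
def count_after_position_dedup(reads):
    """Count reads left if only one read is kept per (chrom, start, strand).

    This is a simplified stand-in for position-based duplicate removal,
    not the actual algorithm of any particular tool.
    """
    return len({(chrom, start, strand) for chrom, start, strand in reads})


# Hypothetical randomly sheared library: 1000 reads with distinct start positions.
sheared = [("chr1", 100 + 7 * i, "+") for i in range(1000)]

# Hypothetical amplicon library: 17 amplicons, 1000 reads each.
# Every read from a given amplicon starts at the same primer-defined position.
amplicon = [("chr1", 1000 + 5000 * a, "+") for a in range(17) for _ in range(1000)]

sheared_kept = count_after_position_dedup(sheared)
amplicon_kept = count_after_position_dedup(amplicon)

print(f"sheared:  kept {sheared_kept} of {len(sheared)} reads")
print(f"amplicon: kept {amplicon_kept} of {len(amplicon)} reads")
```

In this toy example the sheared library keeps all 1000 reads, while the amplicon library collapses from 17000 reads to 17, one per amplicon, which is why I suspect duplicate removal would throw away nearly all of my amplicon data.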
Thanks a bunch!