Any extra filters when using Haloplex or amplicon sequencing data?
Our group has a Haloplex dataset from the old days, when Agilent wasn't yet marking PCR duplicates with barcodes, so unfortunately deduplication is out of the question. 61 genes were sequenced in 150 individuals. I ran the GATK4 Best Practices workflow and used hard-filtering settings. To set the filters, I visualized the difference between dbSNP and novel variants (thanks for the tutorials, they helped a lot). The data look good in the sense that 99% of the variants are in dbSNP and not too many variants were filtered, so I am confident that the false positive rate is low. I am more worried about genotyping; more precisely, I am worried that there are too many homozygous calls because of PCR biases.
Are there any other filtering options that could be useful for such a dataset? Suggestions from amplicon sequencing would also help, because the problems that could arise are the same. Are there extra measures one can use to make sure that homozygotes called from amplicon sequencing data really are homozygotes? Is it worth running the analysis with UnifiedGenotyper (UG) as well? (I read that a couple of years ago HaplotypeCaller (HC) used to mess up genotypes while UG didn't, but I guess things have changed for the better.)
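
To make the homozygote concern concrete, the kind of sanity check I have in mind is a per-site comparison of observed heterozygosity with what Hardy-Weinberg proportions would predict. Below is a minimal Python sketch of that idea, not anything taken from the GATK tutorials. The function name `excess_homozygosity` and the genotype counts in the example are made up for illustration, and the sketch assumes biallelic sites and unrelated individuals.

```python
from typing import Optional


def excess_homozygosity(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Optional[float]:
    """Return F = 1 - H_obs / H_exp for one biallelic site.

    F near 0 means genotype counts match Hardy-Weinberg proportions;
    F > 0 means fewer heterozygotes (more homozygotes) than expected.
    Returns None when the site has no calls or is monomorphic.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return None
    # Reference allele frequency estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    h_exp = 2 * p * (1 - p)  # expected heterozygote fraction under HWE
    if h_exp == 0:
        return None  # monomorphic site: F is undefined
    h_obs = n_het / n  # observed heterozygote fraction
    return 1 - h_obs / h_exp


if __name__ == "__main__":
    # Hypothetical genotype counts (hom-ref, het, hom-alt) for two sites.
    print(excess_homozygosity(81, 18, 1))   # counts in HWE proportions
    print(excess_homozygosity(90, 0, 10))   # no heterozygotes at all
```

If F were clearly above 0 across many sites, that might point to heterozygotes being called as homozygotes, although population structure or relatedness in the cohort could give the same signal, so I would treat it only as a flag for closer inspection.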