Missed variant calls in amplicon-based sequencing data using HaplotypeCaller

Hi all,
I am using GATK v3.7 HaplotypeCaller (HC) to genotype 2000 dbSNP variants, both SNPs and INDELs, from amplicon-based sequencing data.
I pass --alleles to HC, but nearly 100 SNPs are not called in the output VCF, on either simulated or real data (700~1000X coverage).
We also tested UnifiedGenotyper (UG) in our pipeline: it calls all the variants, although realignment and BQSR are time-consuming.
What is the right HC command line to call all of our variants?
Or, if UG is still the best option, can realignment and BQSR be skipped in our situation?
Many thanks,
Junfeng
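For reference, here is a minimal sketch of how an HC run restricted to a fixed allele list might be assembled for GATK 3.x. It only builds and prints the command; it does not run GATK. All file names are placeholders, and whether GENOTYPE_GIVEN_ALLELES recovers the missing SNPs on this data is an open question in this thread, not something the sketch demonstrates.

```python
import shlex


def build_hc_gga_command(jar, reference, bam, alleles_vcf, out_vcf):
    """Assemble a GATK 3.x HaplotypeCaller command that genotypes given alleles."""
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        # Genotype only the alleles listed in the supplied VCF.
        "--alleles", alleles_vcf,
        "--genotyping_mode", "GENOTYPE_GIVEN_ALLELES",
        # Restrict traversal to the sites in the same VCF.
        "-L", alleles_vcf,
        # Ask for a record at every requested site, not only at variant calls.
        "--output_mode", "EMIT_ALL_SITES",
        "-o", out_vcf,
    ]


# Placeholder paths, not taken from the original post.
cmd = build_hc_gga_command(
    "GenomeAnalysisTK.jar", "ref.fasta", "amplicons.bam",
    "dbsnp_targets.vcf", "hc_gga.vcf",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Emitting all sites makes it easier to tell a site that HC evaluated and genotyped as reference from a site that is simply absent from the output.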
Answers
Hi @asaki,
did you try the common tips for running HaplotypeCaller? You can find them in this article: Article1235. There are also some threads dealing with HaplotypeCaller and amplicon data. See the following threads:
haplotypecaller-not-catching-variants, why-is-haplotypecaller-not-calling-this-sanger-sequencing-confirmed-variant#latest, two-validated-variants-missed-by-haplotypecaller-using-mip-data-amplicon-like-data.
In general, you should explicitly disable downsampling in all workflow steps.
Hope that helps,
Greetings EADG
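The advice above about disabling downsampling could be applied to each step roughly as follows. This is a sketch, assuming the GATK 3.x engine options -dt, -dcov and -dfrac; the helper name and the example argument list are hypothetical. It only rewrites an argument list and does not run GATK. HC also has its own per-region read limits, so it is worth checking the tool documentation for your version as well.

```python
# Engine-level downsampling options in GATK 3.x; each takes exactly one value.
DOWNSAMPLING_FLAGS = {
    "-dt", "--downsampling_type",
    "-dcov", "--downsample_to_coverage",
    "-dfrac", "--downsample_to_fraction",
}


def without_downsampling(gatk_args):
    """Return a new argument list with engine downsampling switched off."""
    cleaned = []
    skip_value = False
    for arg in gatk_args:
        if skip_value:
            # This is the value belonging to a downsampling flag just dropped.
            skip_value = False
            continue
        if arg in DOWNSAMPLING_FLAGS:
            skip_value = True
            continue
        cleaned.append(arg)
    # Explicitly request no downsampling.
    return cleaned + ["-dt", "NONE"]


# Hypothetical step arguments with placeholder file names.
step_args = ["-T", "HaplotypeCaller", "-R", "ref.fasta",
             "-I", "amplicons.bam", "-dcov", "250"]
print(without_downsampling(step_args))
```

Removing any existing -dcov or -dfrac setting avoids passing a coverage target together with -dt NONE.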
I would actually like to use HC; however, it seems that reassembling the reads around the target region into haplotypes can disrupt the mapping. If a newer version of HC fixes this, please correct me.
Changing the kmer size seems to partially fix the problem, but it can increase false positives.
Also, if reads can map to multiple regions (e.g., pseudogenes or repeat regions), the variants will not be called because of low or zero mapping quality.
Any suggestions?
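One way to check whether mapping quality explains a missed site is to tally MAPQ over the reads covering it. Below is a minimal sketch that works on SAM-format text lines, for example the output of samtools view for a small region. The threshold of 20 is an assumption (it is, as far as I know, the default minimum mapping quality in GATK 3.x HC, adjustable with -mmq), and the three records are made-up toy data, not reads from this thread.

```python
def mapq_summary(sam_lines, min_mapq=20):
    """Count mapped reads, MAPQ-0 reads, and reads below min_mapq in SAM text."""
    mapped = zero = below = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
        mapped += 1
        if mapq == 0:
            zero += 1
        if mapq < min_mapq:
            below += 1
    return {"mapped": mapped, "mapq_zero": zero, "below_threshold": below}


# Toy records for illustration only.
toy_records = [
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tchr1\t101\t10\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(mapq_summary(toy_records))
```

If most reads at a missed site fall below the threshold, the caller is probably filtering them before assembly, which would point to an alignment issue and not to the kmer size.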
@asaki
Hi Junfeng,
Did you mark duplicates? Can you post some IGV screenshots of the BAM files where HaplotypeCaller is not calling a variant?
Thanks,
Sheila
@Sheila
Hi Sheila, we skipped MarkDuplicates because we used a PCR-based target enrichment method.
I have just uploaded the screenshots in another thread; you can take a look there.