To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

Unusual False Positives from HC/UG on edges of small, very high-coverage regions

nhvanlienhvanlie Member
edited April 2013 in Ask the GATK team

Hey there,

I'm currently working on a project that uses the MiSeq system to perform very deep sequencing (5000x coverage) on a small set of genes related to a particular cancer type. I have had some odd calls with HaplotypeCaller and UnifiedGenotyper, especially near the edges of the sequenced regions. It seems that often Haplotype calls very large deletions and Unified calls SNPs, e.g. (from the VCFs):

Haplotype Caller:

chr2 212576988 . TATTTTTAATTGTA T 13.95 LowQual AC=1;AF=0.500;AN=2;BaseQRankSum=1.805;ClippingRankSum=-0.916;DP=1000;FS=8.285;MLEAC=1;MLEAF=0.500;MQ=59.98;MQ0=0;MQRankSum=0.836;QD=0.00;ReadPosRankSum=0.528 GT:AD:GQ:PL 0/1:308,53:42:42,0,8418

Unified Genotyper:

chr2 212576990 . T C 555.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.991;DP=1000;Dels=0.00;FS=0.000;HaplotypeScore=22.6027;MLEAC=1;MLEAF=0.500;MQ=59.98;MQ0=0;MQRankSum=0.073;QD=0.56;ReadPosRankSum=0.164 GT:AD:DP:GQ:PL 0/1:860,130:3743:99:584,0,32767

A screenshot from IGV showing some of the (few) BAM records in the area of these calls is attached.

Many of these false variant calls can be removed through hard filtering, but I was a little surprised to see them called in the first place, because our raw results had been quite good with lower-coverage data. Is this to be expected with high coverage data?

I am also wondering if there is a hard cap after which downsampling occurs. I set -dcov 10000 for all our samples, but the depth reported in the DP tag seemed max out at 1000. Is there another parameter I need to change? In some areas, due to the very small regions being sequenced, I have coverage in excess of 15,000 reads, so downsampling to 1000 might skew the results.

Any guidance is appreciated!

Best Answer


Sign In or Register to comment.