All ALT fields are the same after GenomicsDBImport and GenotypeGVCFs

bdemaree, San Francisco, CA, Member
Hi GATK team,

I'm running GATK to perform targeted single-cell genotyping, where each sample in the output VCF is a single cell (10-20k samples total per VCF). I've noticed something strange in the final VCF: for all cells genotyped as WT for a given variant, all ALT allele depths are the same.

A snippet of the VCF is provided below. In the first sample, for example, all ALTs have a depth of 6. Furthermore, DP always equals the sum of the REF and first ALT allele depths (suggesting all other ALT depths should probably be 0). At the second site (chr1:115256518), there is a HET call with the correct depths listed, so this seems to affect only the 0/0 calls.

chr1 115256516 . A G,T,*,C 32261.26 . AC=29,8,1,2;AF=1.566e-03,4.319e-04,5.399e-05,1.080e-04;AN=18522;BaseQRankSum=0.282;DP=4450248;ExcessHet=3.1936;FS=0.000;InbreedingCoeff=0.1634;MLEAC=28,8,1,2;MLEAF=1.512e-03,4.319e-04,5.399e-05,1.080e-04;MQ=41.96;MQRankSum=0.00;QD=2.19;ReadPosRankSum=0.00;SOR=0.291 GT:AD:DP:GQ:PL 0/0:717,6,6,6,6:723:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:841,8,8,8,8:849:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:292,1,1,1,1:293:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:1034,9,9,9,9:1043:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:134,0,0,0,0:134:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:130,0,0,0,0:130:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800
chr1 115256518 . T C,A 33446.22 . AC=19,11;AF=1.026e-03,5.939e-04;AN=18522;BaseQRankSum=-1.180e+00;DP=4451040;ExcessHet=3.1125;FS=0.000;InbreedingCoeff=0.1986;MLEAC=19,11;MLEAF=1.026e-03,5.939e-04;MQ=41.96;MQRankSum=0.066;QD=2.08;ReadPosRankSum=0.023;SOR=0.021 GT:AD:DP:GQ:PL 0/0:722,1,1:723:99:0,120,1800,120,1800,1800 0/0:839,10,10:849:99:0,120,1800,120,1800,1800 ... 0/1:692,85,1:778:99:965,0,26646,3041,26933,30335 0/0:658,4,4:662:99:0,120,1800,120,1800,1800
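For anyone who wants to check their own VCF for the same pattern, here is a minimal Python sketch. It flags a sample field when the call is 0/0, every ALT depth in AD is the same non-zero value, and DP equals the REF depth plus one ALT depth. The function name `suspicious_hom_ref` and the default FORMAT string are my own choices for illustration and are not part of GATK; the test values are copied from the records above.

```python
def suspicious_hom_ref(sample_field, fmt="GT:AD:DP:GQ:PL"):
    """Return True for a 0/0 call whose ALT depths are all equal and
    non-zero, and whose DP equals the REF depth plus one ALT depth."""
    values = dict(zip(fmt.split(":"), sample_field.split(":")))
    if values.get("GT") not in ("0/0", "0|0"):
        return False
    ad_raw = values.get("AD", ".").split(",")
    dp_raw = values.get("DP", ".")
    if "." in ad_raw or dp_raw == ".":
        return False  # missing depths: nothing to compare
    ad = [int(x) for x in ad_raw]
    ref_depth, alt_depths = ad[0], ad[1:]
    # Need at least two ALT alleles to see the repetition, and a zero
    # depth repeated across ALTs is not suspicious.
    if len(alt_depths) < 2 or alt_depths[0] == 0:
        return False
    all_equal = len(set(alt_depths)) == 1
    return all_equal and int(dp_raw) == ref_depth + alt_depths[0]


# Sample fields taken from the two records in this post.
fields = [
    "0/0:717,6,6,6,6:723:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800",
    "0/0:134,0,0,0,0:134:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800",
    "0/1:692,85,1:778:99:965,0,26646,3041,26933,30335",
]
for field in fields:
    print(field.split(":")[1], suspicious_hom_ref(field))
```

Only the first of the three fields matches the pattern; the 0/0 call with zero ALT depths and the HET call do not.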

In terms of the pipeline, I'm using HaplotypeCaller, GenomicsDBImport, and GenotypeGVCFs all from GATK I've observed the same behavior in as well. Strangely, using CombineGVCFs (which is very slow and requires iterative merging) does not produce the repeated ALT depths.

Here are the exact commands used for each of the three programs (I'm copying from my Python script so the string formatting is there):

'gatk HaplotypeCaller -R %s -I %s -O %s -L %s ' \
'--emit-ref-confidence BP_RESOLUTION ' \
'--verbosity ERROR ' \
'--native-pair-hmm-threads 1 ' \
'--max-alternate-alleles 2 ' \
'--max-reads-per-alignment-start 0 ' \

'gatk --java-options "-Xmx4g" GenomicsDBImport ' \
'--genomicsdb-workspace-path %s ' \
'--batch-size 50 ' \
'--reader-threads 2 ' \
'--validate-sample-name-map true ' \
'-L %s ' \
'--sample-name-map %s ' \

'gatk --java-options "-Xmx4g" GenotypeGVCFs ' \
'-V %s ' \
'-R %s ' \
'-L %s ' \
'-D %s ' \
'-O %s ' \
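
To make the commands easier to reproduce without the string formatting, the same three invocations can be sketched as argument lists. This is only an illustration: the function names and every file name in the usage lines are hypothetical placeholders, and the flags are exactly the ones shown above, nothing more.

```python
import shlex


def haplotype_caller_cmd(reference, bam, out_gvcf, intervals):
    """Argument list for the per-cell HaplotypeCaller step."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference, "-I", bam, "-O", out_gvcf, "-L", intervals,
        "--emit-ref-confidence", "BP_RESOLUTION",
        "--verbosity", "ERROR",
        "--native-pair-hmm-threads", "1",
        "--max-alternate-alleles", "2",
        "--max-reads-per-alignment-start", "0",
    ]


def genomicsdb_import_cmd(workspace, intervals, sample_map):
    """Argument list for the GenomicsDBImport step."""
    return [
        "gatk", "--java-options", "-Xmx4g", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "--batch-size", "50",
        "--reader-threads", "2",
        "--validate-sample-name-map", "true",
        "-L", intervals,
        "--sample-name-map", sample_map,
    ]


def genotype_gvcfs_cmd(variants, reference, intervals, dbsnp, out_vcf):
    """Argument list for the joint-genotyping step."""
    return [
        "gatk", "--java-options", "-Xmx4g", "GenotypeGVCFs",
        "-V", variants, "-R", reference, "-L", intervals,
        "-D", dbsnp, "-O", out_vcf,
    ]


# Hypothetical file names, for illustration only.
cmd = genomicsdb_import_cmd("workspace_dir", "targets.bed", "sample_map.txt")
print(shlex.join(cmd))  # requires Python 3.8+
```

Each list can be handed to `subprocess.run(cmd, check=True)` as is, which avoids shell quoting problems around `--java-options`.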

I couldn't find any similar issues on the forum. I am fairly sure it's an issue with the GenomicsDB, given I have no issues when using CombineGVCFs instead (but, I could be wrong). Any ideas on what might be going on? Thanks!

