All ALT fields are the same after GenomicsDBImport and GenotypeGVCFs

bdemaree (San Francisco, CA), Member
Hi GATK team,

I'm running GATK 4.1.3.0 to perform targeted single-cell genotyping, where each sample in the output VCF is a single cell (10-20k samples total per VCF). I've noticed something strange in the final VCF: for all cells genotyped as WT for a given variant, all ALT allele depths are the same.

A snippet of the VCF is provided below. At the first site, for instance, every ALT in the first sample has a depth of 6. Furthermore, DP always equals the REF depth plus the first ALT depth (717 + 6 = 723), which suggests that all the other ALT depths should really be 0. At the second site (chr1:115256518), there is a HET call with the correct depths, so the problem seems to affect only the 0/0 calls.

```
chr1 115256516 . A G,T,*,C 32261.26 . AC=29,8,1,2;AF=1.566e-03,4.319e-04,5.399e-05,1.080e-04;AN=18522;BaseQRankSum=0.282;DP=4450248;ExcessHet=3.1936;FS=0.000;InbreedingCoeff=0.1634;MLEAC=28,8,1,2;MLEAF=1.512e-03,4.319e-04,5.399e-05,1.080e-04;MQ=41.96;MQRankSum=0.00;QD=2.19;ReadPosRankSum=0.00;SOR=0.291 GT:AD:DP:GQ:PL 0/0:717,6,6,6,6:723:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:841,8,8,8,8:849:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:292,1,1,1,1:293:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:1034,9,9,9,9:1043:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:134,0,0,0,0:134:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:130,0,0,0,0:130:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800
...
chr1 115256518 . T C,A 33446.22 . AC=19,11;AF=1.026e-03,5.939e-04;AN=18522;BaseQRankSum=-1.180e+00;DP=4451040;ExcessHet=3.1125;FS=0.000;InbreedingCoeff=0.1986;MLEAC=19,11;MLEAF=1.026e-03,5.939e-04;MQ=41.96;MQRankSum=0.066;QD=2.08;ReadPosRankSum=0.023;SOR=0.021 GT:AD:DP:GQ:PL 0/0:722,1,1:723:99:0,120,1800,120,1800,1800 0/0:839,10,10:849:99:0,120,1800,120,1800,1800 ... 0/1:692,85,1:778:99:965,0,26646,3041,26933,30335 0/0:658,4,4:662:99:0,120,1800,120,1800,1800
```
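To find every affected call rather than eyeballing the file, a small scan over the sample columns can flag the pattern described above: a 0/0 call with two or more ALT depths that are all equal and nonzero, plus whether DP equals REF + first ALT. This is a minimal sketch, assuming an uncompressed VCF data line whose FORMAT column contains GT, AD and DP; the function name `suspicious_homref_samples` is hypothetical and not part of GATK.

```python
def suspicious_homref_samples(vcf_line):
    """Find hom-ref calls whose ALT allele depths are all identical and nonzero.

    Hypothetical helper. Returns a list of
    (sample_index, ad, dp, dp_is_ref_plus_first_alt) tuples, where
    sample_index counts from the first sample column (0-based).
    """
    fields = vcf_line.split()            # VCF columns (tabs or spaces)
    fmt_keys = fields[8].split(":")      # FORMAT column, e.g. GT:AD:DP:GQ:PL
    gt_i = fmt_keys.index("GT")
    ad_i = fmt_keys.index("AD")
    dp_i = fmt_keys.index("DP")
    hits = []
    for idx, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Skip truncated sample fields and anything that is not hom-ref.
        if len(values) <= max(gt_i, ad_i, dp_i) or values[gt_i] not in ("0/0", "0|0"):
            continue
        # Skip samples with missing AD or DP.
        if values[ad_i] == "." or values[dp_i] == ".":
            continue
        ad = [int(x) for x in values[ad_i].split(",")]
        dp = int(values[dp_i])
        alts = ad[1:]
        # Repeated-depth pattern: 2+ ALTs, all equal, and not simply all zero.
        if len(alts) > 1 and len(set(alts)) == 1 and alts[0] > 0:
            hits.append((idx, ad, dp, dp == ad[0] + ad[1]))
    return hits
```

To scan a whole file, call it on every line that does not start with `#`.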

In terms of the pipeline, I'm using HaplotypeCaller, GenomicsDBImport, and GenotypeGVCFs all from GATK 4.1.3.0. I've observed the same behavior in 4.1.2.0 as well. Strangely, using CombineGVCFs (which is very slow and requires iterative merging) does not produce the repeated ALT depths.
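One way to pin down the difference between the two routes is to compare per-sample AD at the same site in the GenomicsDB-based VCF and the CombineGVCFs-based VCF. A minimal sketch, assuming uncompressed VCF data lines and the sample names taken from each file's `#CHROM` header line (the two callsets may order samples differently); both function names are hypothetical. It refuses to compare when REF/ALT differ, since AD values are only meaningful against the same ordered allele list.

```python
def ad_by_sample(sample_names, vcf_line):
    """Map each sample name to its raw AD string on one VCF data line (hypothetical helper)."""
    fields = vcf_line.split()
    ad_i = fields[8].split(":").index("AD")
    result = {}
    for name, sample in zip(sample_names, fields[9:]):
        values = sample.split(":")
        result[name] = values[ad_i] if ad_i < len(values) else "."
    return result


def ad_differences(names_a, line_a, names_b, line_b):
    """Return {sample: (ad_a, ad_b)} for samples whose AD differs at one site."""
    a_fields, b_fields = line_a.split(), line_b.split()
    # AD is only comparable if CHROM, POS, REF and ALT (in order) all agree.
    if a_fields[:2] != b_fields[:2] or a_fields[3:5] != b_fields[3:5]:
        raise ValueError("sites or allele lists differ; AD values are not comparable")
    a = ad_by_sample(names_a, line_a)
    b = ad_by_sample(names_b, line_b)
    # Only samples present in both files are compared.
    return {s: (a[s], b[s]) for s in sorted(a.keys() & b.keys()) if a[s] != b[s]}
```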

Here are the exact commands used for each of the three programs (I'm copying from my Python script so the string formatting is there):

```
'gatk HaplotypeCaller -R %s -I %s -O %s -L %s ' \
'--emit-ref-confidence BP_RESOLUTION ' \
'--verbosity ERROR ' \
'--native-pair-hmm-threads 1 ' \
'--max-alternate-alleles 2 ' \
'--max-reads-per-alignment-start 0 '

'gatk --java-options "-Xmx4g" GenomicsDBImport ' \
'--genomicsdb-workspace-path %s ' \
'--batch-size 50 ' \
'--reader-threads 2 ' \
'--validate-sample-name-map true ' \
'-L %s ' \
'--sample-name-map %s'

'gatk --java-options "-Xmx4g" GenotypeGVCFs ' \
'-V %s ' \
'-R %s ' \
'-L %s ' \
'-D %s ' \
'-O %s ' \
'--include-non-variant-sites'
```
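For reference, GenotypeGVCFs reads a GenomicsDB workspace through a `gendb://` URI passed to `-V`. The sketch below fills the `%s` placeholders of the GenotypeGVCFs command as an argument list instead of a formatted string; the helper name and all example paths are hypothetical. Passing a list to `subprocess.run(argv, check=True)` sidesteps shell quoting of `"-Xmx4g"`.

```python
def genotype_gvcfs_argv(workspace, reference, intervals, dbsnp, output):
    """Build the GenotypeGVCFs command as an argv list (hypothetical helper)."""
    # Accept either a bare workspace path or one already prefixed with gendb://.
    uri = workspace if workspace.startswith("gendb://") else "gendb://" + workspace
    return [
        "gatk", "--java-options", "-Xmx4g", "GenotypeGVCFs",
        "-V", uri,
        "-R", reference,
        "-L", intervals,
        "-D", dbsnp,
        "-O", output,
        "--include-non-variant-sites",
    ]
```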

I couldn't find any similar issues on the forum. I'm fairly sure the problem lies in GenomicsDB, since I don't see it when using CombineGVCFs instead (but I could be wrong). Any ideas on what might be going on? Thanks!
