Understanding the HaplotypeCaller output VCF format

Hi there,

I am running GATK 4.1.0.0 on germline paired-end Illumina WGS data with the following command:

```
gatk4.1.0.0 --java-options '-Xmx5G' HaplotypeCaller \
  -R broad_hg38_v0_Homo_sapiens_assembly38.fasta \
  -I sample1.final.cram \
  -L chr19 \
  -O sample1_chr19_SA_newqual.g.vcf.gz \
  --use-new-qual-calculator \
  -ERC GVCF \
  -G StandardAnnotation -G StandardHCAnnotation -G AS_StandardAnnotation \
  -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90
```

After uncompressing the sampleA_chr19_SA_newqual.g.vcf.gz file, I get:
```
chr19 3104631 . T C,<NON_REF> 1242.03 . AS_RAW_BaseQRankSum=||;AS_RAW_MQ=0.00|115200.00|0.00;AS_RAW_MQRankSum=||;AS_RAW_ReadPosRankSum=||;AS_SB_TABLE=0,0|18,14|0,0;DP=32;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=115200,32 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 1|1:0,32,0:32:96:0|1:3104631_T_C:1256,96,0,1256,96,1256:3104631:0,0,18,14
chr19 3104654 . T C,<NON_REF> 458.60 . AS_RAW_BaseQRankSum=|0.0,1|NaN;AS_RAW_MQ=50400.00|57600.00|0.00;AS_RAW_MQRankSum=|0.0,1|NaN;AS_RAW_ReadPosRankSum=|-0.4,1|NaN;AS_SB_TABLE=9,5|10,6|0,0;BaseQRankSum=0.000;DP=30;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=108000,30;ReadPosRankSum=-0.396 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:14,16,0:30:99:0|1:3104631_T_C:466,0,396,508,445,953:3104631:9,5,10,6
```

I also looked at the same sites in a few other samples.
SampleB
```
chr19 3104631 . T C,<NON_REF> 456.60 . AS_RAW_BaseQRankSum=|-0.9,1|NaN;AS_RAW_MQ=39600.00|46800.00|0.00;AS_RAW_MQRankSum=|0.0,1|NaN;AS_RAW_ReadPosRankSum=|-1.2,1|NaN;AS_SB_TABLE=4,7|7,6|0,0;BaseQRankSum=-0.836;DP=25;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=90000,25;ReadPosRankSum=-1.130 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:11,13,0:24:99:0|1:3104631_T_C:464,0,690,501,729,1229:3104631:4,7,7,6
chr19 3104654 . T C,<NON_REF> 491.60 . AS_RAW_BaseQRankSum=|-0.9,1|NaN;AS_RAW_MQ=39600.00|46800.00|0.00;AS_RAW_MQRankSum=|0.0,1|NaN;AS_RAW_ReadPosRankSum=|-0.7,1|NaN;AS_SB_TABLE=3,8|7,6|0,0;BaseQRankSum=-0.836;DP=25;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=90000,25;ReadPosRankSum=-0.637 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:11,13,0:24:99:0|1:3104631_T_C:499,0,655,532,697,1229:3104631:3,8,7,6
```
SampleC
```
chr19 3104631 . T C,<NON_REF> 1684.03 . AS_RAW_BaseQRankSum=||;AS_RAW_MQ=0.00|154800.00|0.00;AS_RAW_MQRankSum=||;AS_RAW_ReadPosRankSum=||;AS_SB_TABLE=0,0|16,27|0,0;DP=43;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=154800,43 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 1|1:0,43,0:43:99:0|1:3104631_T_C:1698,129,0,1698,129,1698:3104631:0,0,16,27
chr19 3104654 . T C,<NON_REF> 582.60 . AS_RAW_BaseQRankSum=|-1.0,1|NaN;AS_RAW_MQ=79200.00|75600.00|0.00;AS_RAW_MQRankSum=|0.0,1|NaN;AS_RAW_ReadPosRankSum=|0.6,1|NaN;AS_SB_TABLE=8,14|10,11|0,0;BaseQRankSum=-0.977;DP=43;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=154800,43;ReadPosRankSum=0.693 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:22,21,0:43:99:0|1:3104631_T_C:590,0,635,656,699,1355:3104631:8,14,10,11
```

I know I can open the BAM files in IGV to check the haplotypes of sampleA, sampleB and sampleC. However, I have too many samples to inspect by hand, so I would appreciate suggestions on how to automate the process.
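
To make the question concrete, here is a rough sketch of the kind of automation I have in mind. It only pulls the GT, PGT and PID values out of each variant record so they can be compared across samples. The function name `phase_table` is made up for illustration, and the sketch assumes a tab-delimited, single-sample g.vcf (the records pasted above show spaces only because of how the forum renders them).

```shell
# Hypothetical helper (name is illustrative): read VCF records on stdin and
# print CHROM, POS, GT, PGT and PID for the first sample column.
# Assumes tab-delimited input with FORMAT in column 9 and the sample in column 10.
phase_table() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                    # skip header lines
    {
      n = split($9, keys, ":")       # FORMAT keys, e.g. GT:AD:DP:GQ:PGT:PID:PL:PS:SB
      split($10, vals, ":")          # matching values for the sample
      gt = "."; pgt = "."; pid = "."
      for (i = 1; i <= n; i++) {
        if (keys[i] == "GT")  gt  = vals[i]
        if (keys[i] == "PGT") pgt = vals[i]
        if (keys[i] == "PID") pid = vals[i]
      }
      print $1, $2, gt, pgt, pid     # records without PGT/PID print "."
    }'
}
```

It could be run as `gzip -dc sampleA_chr19_SA_newqual.g.vcf.gz | phase_table`. Reference blocks in the g.vcf have no PGT or PID, so they show up with `.` in those columns and can be filtered out afterwards.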

In later steps I plan to run CombineGVCFs and GenotypeGVCFs on all subjects. I wonder whether the haplotype phasing predicted at this step can be carried over into the combined g.vcf.gz file. If so, are there any particular options I need to turn on for those two steps? Any suggestions are appreciated.

Xin
