HaplotypeCaller: -G Standard -G AS_Standard

edited July 2016 in GenomeSTRiP

Hi everyone,

I am running HaplotypeCaller from GATK 3.6. My command line is:
java -Xmx20g -jar ./GATK-3.6/GenomeAnalysisTK.jar -T HaplotypeCaller \
-nct 30 -rf BadCigar -log $LOG/file.log \
-R $REF \
-I $BQSR/BQSR_Realign_Dedup_Sort_sample_PE.bam \
-D $KNOWN/dbsnp_138.b37.vcf \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-G Standard -G AS_Standard \
-o $OUTPUT/RAW_sample_snp_indels_AS.g.vcf

In the output VCF file, "ClippingRankSum" is missing from both the INFO header section and the variant-specific INFO fields. Moreover, the following allele-specific tags are declared in the INFO header section:
INFO=<ID=AS_InbreedingCoeff,Number=A,Type=Float,Description="allele specific heterozygosity as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg ex
INFO=<ID=AS_QD,Number=1,Type=Float,Description="Allele-specific Variant Confidence/Quality by Depth">
INFO=<ID=AS_RAW_BaseQRankSum,Number=1,Type=String,Description="raw data for allele specific rank sum test of base qualities">
INFO=<ID=AS_RAW_MQ,Number=A,Type=Float,Description="Allele-specfic raw data for RMS Mapping Quality">
INFO=<ID=AS_RAW_MQRankSum,Number=1,Type=String,Description="Allele-specific raw data for Mapping Quality Rank Sum">
INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests">

But they are missing from the INFO field of the variant records:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10 Sample11 Sample12

1 12783 . G A 1436.11 . AC=5;AF=0.278;AN=18;BaseQRankSum=3.76;DP=136;ExcessHet=9.5122;FS=0.000;MLEAC=7;MLEAF=0.389;MQ=26.36;MQRankSum=-7.480e-01;QD=12.60;ReadPosRankSum=0.067;SOR=0.765 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:15,0:15:0:0,0,306 0/0:3,0:3:0:0,0,22 0/0:1,0:1:3:0,3,29 0/0:3,0:3:0:0,0,30 0/1:10,4:.:80:80,0,243 0/1:12,9:.:99:217,0,254 ./.:0,0:0:.:0,0,0 0/1:7,10:.:99:262,0,134 0/1:6,5:.:99:124,0,142 0/1:20,31:.:99:790,0,432

1 12807 . C T 222.05 . AC=2;AF=0.100;AN=20;BaseQRankSum=3.75;DP=222;ExcessHet=3.2451;FS=0.000;InbreedingCoeff=-0.1126;MLEAC=2;MLEAF=0.100;MQ=26.32;MQRankSum=-1.806e+00;QD=2.27;ReadPosRankSum=1.74;SOR=0.061 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:27,0:27:81:0,81,855 0/0:13,0:13:39:0,39,451 0/0:6,0:6:18:0,18,199 0/0:12,0:12:36:0,36,407 0/0:24,0:24:42:0,42,741 0/1:29,8:.:99:115,0,790 0/0:7,0:7:21:0,21,242 0/0:20,0:20:60:0,60,672 0/0:15,0:15:45:0,45,427 0/1:50,11:.:99:148,0,1325

So my questions are:

  1. Is it normal for ClippingRankSum to be missing when running with -G Standard -G AS_Standard?
  2. Why were the allele-specific annotations that are declared in the INFO header not added to the variant records?
  3. Why is InbreedingCoeff not calculated for every variant?
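
To check question 2 across a whole file rather than by eye, a small script can list every INFO ID that is declared in the header but never appears in any record. This is only a rough sketch: the function name `declared_but_unused` is made up for illustration, it accepts header lines with or without the leading `##` (as pasted above), and it splits records on any whitespace, which assumes no INFO value contains a space.

```python
import re

# Matches INFO declarations such as '##INFO=<ID=AS_QD,...' (the '##' is optional
# so that header lines pasted without it are still recognised).
INFO_HEADER = re.compile(r"^(?:##)?INFO=<ID=([^,>]+)")


def declared_but_unused(vcf_lines):
    """Return the sorted INFO IDs declared in the header but absent from all records."""
    declared, used = set(), set()
    for raw in vcf_lines:
        line = raw.strip()
        match = INFO_HEADER.match(line)
        if match:
            declared.add(match.group(1))
        elif line and not line.startswith("#"):
            fields = line.split()
            if len(fields) >= 8:
                # Column 8 is INFO: 'KEY=value' pairs or bare flags separated by ';'.
                used.update(entry.split("=", 1)[0] for entry in fields[7].split(";"))
    return sorted(declared - used)
```

Since an open file handle iterates line by line, it can be passed to `declared_but_unused` directly. For the joint-genotyped VCF described above, the AS_* IDs would be expected to show up in the result.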



    edited July 2016

    The VCF records shown above are the result of joint genotyping with GenotypeGVCFs, using this command line:
    java -Xmx20g -jar ./GATK-3.6/GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -nt 50 -rf BadCigar -log $LOG/QJ-Joint-HC.log \
    -R $REF \
    -V $OUTPUT/RAW_Sample1_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample2_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample3_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample4_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample5_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample6_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample7_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample8_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample9_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample10_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample11_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample12_snp_indels_AS.g.vcf \
    -D $KNOWN/dbsnp_138.b37.vcf \
    -o $JOINT/RAW_All_snp_indels_12SAMPLE.vcf

    Maybe the allele-specific annotations are absent because I did not pass the -G parameter on the command line for joint genotyping. But what about the ClippingRankSum and InbreedingCoeff annotations?
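
One thing that may be worth checking for the InbreedingCoeff question is how many samples actually have a called genotype at each site. The sketch below counts them from the GT field of a record. The threshold `MIN_SAMPLES = 10` is an assumption (my understanding is that the annotation needs at least 10 called samples, but please verify this against the documentation for your GATK version), and the function names are hypothetical.

```python
MIN_SAMPLES = 10  # assumption: minimum number of called samples for InbreedingCoeff


def called_samples(record_line):
    """Count samples in one VCF record whose GT has at least one called allele."""
    fields = record_line.split()
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    count = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        alleles = genotype.replace("|", "/").split("/")
        if any(allele != "." for allele in alleles):
            count += 1
    return count


def enough_samples(record_line, minimum=MIN_SAMPLES):
    """True if the record has at least `minimum` called samples."""
    return called_samples(record_line) >= minimum
```

For the two records shown above, this counts 9 called samples at position 12783 and 10 at position 12807, which agrees with AN=18 and AN=20. Only the second record carries InbreedingCoeff, so a minimum sample count would be consistent with what the output shows.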

    I also noticed that when -G Standard -G AS_Standard is not given on the command line, some annotations that appeared by default in previous GATK versions, such as MQ, are missing from the INFO field in GATK-3.4-46.

This discussion has been closed.