
HaplotypeCaller: -G Standard -G AS_Standard

edited July 2016 in GenomeSTRiP

Hi everyone,

I am running HaplotypeCaller from GATK 3.6. My command line is:
java -Xmx20g -jar ./GATK-3.6/GenomeAnalysisTK.jar -T HaplotypeCaller \
-nct 30 -rf BadCigar -log $LOG/file.log \
-R $REF \
-I $BQSR/BQSR_Realign_Dedup_Sort_sample_PE.bam \
-D $KNOWN/dbsnp_138.b37.vcf \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-G Standard -G AS_Standard \
-o $OUTPUT/RAW_sample_snp_indels_AS.g.vcf

In the output VCF file, "ClippingRankSum" is missing from both the INFO header section and the variant-specific INFO field. Moreover, the INFO header section declares these allele-specific tags:
INFO=<ID=AS_InbreedingCoeff,Number=A,Type=Float,Description="allele specific heterozygosity as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg ex
INFO=<ID=AS_QD,Number=1,Type=Float,Description="Allele-specific Variant Confidence/Quality by Depth">
INFO=<ID=AS_RAW_BaseQRankSum,Number=1,Type=String,Description="raw data for allele specific rank sum test of base qualities">
INFO=<ID=AS_RAW_MQ,Number=A,Type=Float,Description="Allele-specfic raw data for RMS Mapping Quality">
INFO=<ID=AS_RAW_MQRankSum,Number=1,Type=String,Description="Allele-specific raw data for Mapping Quality Rank Sum">
INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests">

But they are missing from the INFO field of the variant records.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10 Sample11 Sample12

1 12783 . G A 1436.11 . AC=5;AF=0.278;AN=18;BaseQRankSum=3.76;DP=136;ExcessHet=9.5122;FS=0.000;MLEAC=7;MLEAF=0.389;MQ=26.36;MQRankSum=-7.480e-01;QD=12.60;ReadPosRankSum=0.067;SOR=0.765 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:15,0:15:0:0,0,306 0/0:3,0:3:0:0,0,22 0/0:1,0:1:3:0,3,29 0/0:3,0:3:0:0,0,30 0/1:10,4:.:80:80,0,243 0/1:12,9:.:99:217,0,254 ./.:0,0:0:.:0,0,0 0/1:7,10:.:99:262,0,134 0/1:6,5:.:99:124,0,142 0/1:20,31:.:99:790,0,432

1 12807 . C T 222.05 . AC=2;AF=0.100;AN=20;BaseQRankSum=3.75;DP=222;ExcessHet=3.2451;FS=0.000;InbreedingCoeff=-0.1126;MLEAC=2;MLEAF=0.100;MQ=26.32;MQRankSum=-1.806e+00;QD=2.27;ReadPosRankSum=1.74;SOR=0.061 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:27,0:27:81:0,81,855 0/0:13,0:13:39:0,39,451 0/0:6,0:6:18:0,18,199 0/0:12,0:12:36:0,36,407 0/0:24,0:24:42:0,42,741 0/1:29,8:.:99:115,0,790 0/0:7,0:7:21:0,21,242 0/0:20,0:20:60:0,60,672 0/0:15,0:15:45:0,45,427 0/1:50,11:.:99:148,0,1325
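To double-check which annotations are declared in the header but never written to a record, a comparison along these lines may help. This is only a minimal sketch: `missing_info_keys` is a hypothetical helper name, and it assumes an uncompressed VCF whose header lines carry the standard `##INFO=<ID=...>` prefix (the `##` is not shown in the excerpt above).

```shell
# Sketch (hypothetical helper): list INFO IDs that are declared in the
# header of a VCF but do not appear in the INFO column of any record.
# Assumes an uncompressed, tab-delimited VCF.
missing_info_keys() {
    vcf="$1"
    declared=$(mktemp)
    used=$(mktemp)

    # IDs declared in "##INFO=<ID=...>" header lines.
    grep '^##INFO=<ID=' "$vcf" \
        | sed 's/^##INFO=<ID=\([^,>]*\).*/\1/' \
        | sort -u > "$declared"

    # Keys actually present in column 8 (INFO) of the variant records.
    grep -v '^#' "$vcf" \
        | cut -f8 \
        | tr ';' '\n' \
        | sed 's/=.*//' \
        | sort -u > "$used"

    # Print lines found only in the first list: declared but never used.
    comm -23 "$declared" "$used"

    rm -f "$declared" "$used"
}
```

For example, `missing_info_keys "$JOINT/RAW_All_snp_indels_12SAMPLE.vcf"` would print one ID per line.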

So my questions are:

  1. Is it normal for ClippingRankSum to be missing when -G Standard -G AS_Standard is used?
  2. Why were the allele-specific annotations that are declared in the INFO header not added to the variant records?
  3. Why is InbreedingCoeff not calculated for every variant?



    edited July 2016

    The VCF records shown above were produced after joint genotyping with GenotypeGVCFs, using this command line:
    java -Xmx20g -jar ./GATK-3.6/GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -nt 50 -rf BadCigar -log $LOG/QJ-Joint-HC.log \
    -R $REF \
    -V $OUTPUT/RAW_Sample1_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample2_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample3_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample4_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample5_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample6_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample7_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample8_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample9_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample10_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample11_snp_indels_AS.g.vcf \
    -V $OUTPUT/RAW_Sample12_snp_indels_AS.g.vcf \
    -D $KNOWN/dbsnp_138.b37.vcf \
    -o $JOINT/RAW_All_snp_indels_12SAMPLE.vcf

    Maybe the allele-specific annotations are absent because I did not pass the -G parameter on the command line during joint genotyping. But what about the ClippingRankSum and InbreedingCoeff annotations?
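To test that guess, the joint-genotyping step could be re-run with the same annotation groups. The sketch below only prints the command for review and does not run GATK. `build_joint_cmd` is a hypothetical helper, the file names follow the pattern used above, and it is an assumption here that GenotypeGVCFs in this GATK version accepts `-G Standard -G AS_Standard`, which should be checked against the tool documentation.

```shell
# Dry-run sketch (hypothetical helper): print a GenotypeGVCFs command line
# with the -G annotation groups added. Nothing is executed; the $REF,
# $OUTPUT, $KNOWN, $LOG and $JOINT placeholders are left unexpanded on purpose.
build_joint_cmd() {
    n_samples="$1"
    cmd="java -Xmx20g -jar ./GATK-3.6/GenomeAnalysisTK.jar -T GenotypeGVCFs"
    cmd="$cmd -nt 50 -rf BadCigar -log \$LOG/QJ-Joint-HC.log -R \$REF"

    # One -V argument per per-sample gVCF.
    i=1
    while [ "$i" -le "$n_samples" ]; do
        cmd="$cmd -V \$OUTPUT/RAW_Sample${i}_snp_indels_AS.g.vcf"
        i=$((i + 1))
    done

    # Assumption: the same annotation groups as in the HaplotypeCaller run.
    cmd="$cmd -G Standard -G AS_Standard"
    cmd="$cmd -D \$KNOWN/dbsnp_138.b37.vcf -o \$JOINT/RAW_All_snp_indels_12SAMPLE.vcf"

    printf '%s\n' "$cmd"
}

build_joint_cmd 12
```

Printing the command first makes it easy to confirm that all twelve `-V` inputs and both `-G` groups are present before starting a long run.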

    I also noticed that, when -G Standard -G AS_Standard is absent from the command line, some annotations that appeared by default in previous GATK versions, such as MQ, are missing from the INFO field in GATK-3.4-46.
