
GenotypeGVCFs

Hi,
I am unsure how to interpret the output of GenotypeGVCFs when genotyping my ~600 samples.

Genotype format at a given locus with 2 alternate alleles:
0/2:16,17,0:33:99:453,501,972,0,471,420
--This is what I am expecting for each sample

Genotype format at another given locus with 2 alternate alleles:
0/2:4,1,5,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0:15:20:85,46,133,0,111,358,82,64,23,190,20,60,49,41,45,20,60,49,41,45,45
--I really don't understand this

I have written several downstream scripts that rely on this format being consistent, so why do some of these lines contain so many values?
I apologize if this is obvious and documented somewhere, but my searches turned up nothing.

Thanks,
-Briana
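For reference, the AD field is meant to hold one read depth per allele (REF plus each ALT, i.e. Number=R in VCF terms), and PL holds one value per possible genotype (Number=G), which for a diploid sample with n alleles is n(n+1)/2. The sketch below, assuming diploid samples and the GT:AD:DP:GQ:PL FORMAT shown in the records later in this thread, checks a sample column against those counts. The function names are hypothetical and not part of GATK.

```python
# Hypothetical helpers, assuming diploid samples and the GT:AD:DP:GQ:PL FORMAT.
FORMAT = "GT:AD:DP:GQ:PL"


def expected_lengths(n_alt):
    """Return (AD length, PL length) for a diploid site with n_alt ALT alleles."""
    n_alleles = n_alt + 1  # REF plus each ALT
    # AD: one depth per allele; PL: one value per unordered diploid genotype.
    return n_alleles, n_alleles * (n_alleles + 1) // 2


def check_sample(sample, n_alt, fmt=FORMAT):
    """Return a list of length problems found in one sample column."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    want_ad, want_pl = expected_lengths(n_alt)
    problems = []
    for key, want in (("AD", want_ad), ("PL", want_pl)):
        raw = values.get(key, ".")
        if raw == ".":
            continue  # missing value: nothing to count
        got = len(raw.split(","))
        if got != want:
            problems.append(f"{key}: {got} values, expected {want}")
    return problems


expected = "0/2:16,17,0:33:99:453,501,972,0,471,420"
puzzling = ("0/2:4,1,5,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0:15:20:"
            "85,46,133,0,111,358,82,64,23,190,20,60,49,41,45,20,60,49,41,45,45")
print(check_sample(expected, n_alt=2))  # []
print(check_sample(puzzling, n_alt=2))  # flags both AD and PL
```

Note that 21 PL values is exactly the count expected for five ALT alleles (6 × 7 / 2), so with n_alt=5 only AD is flagged; it may be worth double-checking the ALT column of that record.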


Answers

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi Briana,

    The line with many values must be a site where the program considered many other alleles. If only two alternate alleles are reported in the ALT column, this may be a bug in how the values are output. Can you please post the full command lines you used at each step (HaplotypeCaller, CombineGVCFs if applicable, and GenotypeGVCFs), and the actual VCF records at the problem site in each output file? For the HaplotypeCaller step, just include a few as examples; there is no need to post all 600.
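The PL vector alone indicates how many alleles the likelihoods were computed over: a diploid sample at a site with n alleles has n(n+1)/2 PL values, so n can be recovered from the PL length. A minimal sketch (the function name is hypothetical), applied to the PL from the puzzling line in the question:

```python
def alleles_from_pl_length(n_pl):
    """Number of alleles (REF + ALTs) implied by a diploid PL of length n_pl.

    A diploid site with n alleles has n * (n + 1) / 2 genotypes, so this
    searches for that n; returns None if n_pl is not such a count.
    """
    n = 1
    while n * (n + 1) // 2 < n_pl:
        n += 1
    return n if n * (n + 1) // 2 == n_pl else None


pl = "85,46,133,0,111,358,82,64,23,190,20,60,49,41,45,20,60,49,41,45,45"
print(alleles_from_pl_length(len(pl.split(","))))  # 6, i.e. REF + 5 ALTs
```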

  • bvecchio Member

    HaplotypeCaller:
    -T HaplotypeCaller -stand_call_conf 15 -stand_emit_conf 10 -dcov 400 -R human_g1k_v37_decoy.fasta --dbsnp dbsnp_138.b37.excluding_sites_after_129.vcf -L ReSeq2014.bed -nct 30
    --emitRefConfidence GVCF --variant_index_type LINEAR
    --variant_index_parameter 128000
    -I in.bam -o out.gvcf

    GenotypeGVCFs:
    -T GenotypeGVCFs -R human_g1k_v37_decoy.fasta --dbsnp dbsnp_138.b37.excluding_sites_after_129.vcf --out CombinedGVCFs.vcf -nt 30 -V sample1.gvcf -V sample2.gvcf (...etc up to 600)
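Typing out 600 -V arguments by hand is error-prone, so a short script can build the list instead. The sketch below reproduces the flags of the GenotypeGVCFs command above; the sampleN.gvcf naming is a hypothetical stand-in for the elided list of inputs, and the resulting arguments still need to be prefixed with however you normally launch GATK.

```python
import shlex


def genotype_gvcfs_args(gvcfs, reference, dbsnp, out, threads=30):
    """Build the GenotypeGVCFs arguments above, with one -V per input gVCF."""
    args = ["-T", "GenotypeGVCFs", "-R", reference, "--dbsnp", dbsnp,
            "--out", out, "-nt", str(threads)]
    for path in gvcfs:
        args += ["-V", path]
    return args


# Hypothetical file naming; substitute the real list of gVCF paths.
gvcfs = [f"sample{i}.gvcf" for i in range(1, 601)]
args = genotype_gvcfs_args(gvcfs, "human_g1k_v37_decoy.fasta",
                           "dbsnp_138.b37.excluding_sites_after_129.vcf",
                           "CombinedGVCFs.vcf")
print(shlex.join(args[:14]))  # first two -V inputs shown, for inspection
```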

    A problematic locus with 2 alternate alleles:
    1 205884645 . C CGT,CGTGT 26489.74 PASS AC=47,2;AF=0.040,1.715e-03;AN=1166;BaseQRankSum=-4.114e+00;ClippingRankSum=-1.533e+00;DP=69613;FS=0.000;InbreedingCoeff=-0.0379;MLEAC=47,2;MLEAF=0.040,1.715e-03;MQ=59.64;MQ0=0;MQRankSum=0.533;QD=4.45;ReadPosRankSum=0.985 GT:AD:DP:GQ:PL
    0/0:.:88:99:0,108,1620,108,1620,1620
    0/1:33,0,6,3,0,0,0,0,0:42:77:126,0,2298,77,2301,2809
    0/2:91,0,13,37,0,2,0,0,0:143:99:1652,937,7067,0,6080,7057

    A 3 alternate allele locus:
    1 205916937 rs56320248 T TGAGGTACCCGAGGCCCCA,TGAGATACCCGAGGCCCCA,TGAGATACCTGAGGCCCCA
    0/1:76,100,0,0,0,0,0,0,0,0,0,0,0:176:99:3874,0,12412,4158,12713,16871,4158,12713,16871,16871

    A 5 alternate allele locus:
    1 205881376 rs61004872 CTT CT,C,CTTT,CTTTT,CTTTTT 37907.29 PASS AC=176,38,313,130,45;AF=0.151,0.033,0.268,0.111,0.039;AN=1166;BaseQRankSum=-1.500e-02;ClippingRankSum=0.058;DB;DP=71532;FS=3.644;InbreedingCoeff=0.4387;MLEAC=144,23,297,88,32;MLEAF=0.123,0.020,0.255,0.075,0.027;MQ=58.17;MQ0=0;MQRankSum=0.080;QD=2.96;ReadPosRankSum=-3.400e-01 GT:AD:DP:GQ:PL
    0/2:34,0,5,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0:41:47:47,92,667,0,584,569,92,667,584,667,92,667,584,667,667,92,667,584,667,667,667
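In these records the PL vectors already have the lengths expected from the ALT column (6 values for two ALTs, 10 for three, 21 for five), which fits the idea that only AD is affected. The VCF specification orders diploid genotypes j/k (j <= k) as F(j/k) = k(k+1)/2 + j, so the called genotype should normally point at the PL entry normalised to 0. A sketch with hypothetical helper names, assuming every sample has a called diploid GT (no "." alleles):

```python
def pl_index(gt):
    """PL index of a diploid genotype j/k, using the VCF ordering k*(k+1)/2 + j."""
    j, k = sorted(int(a) for a in gt.replace("|", "/").split("/"))
    return k * (k + 1) // 2 + j


def called_genotype_pl(sample, fmt="GT:AD:DP:GQ:PL"):
    """Return the PL value of the called genotype (assumes a called diploid GT)."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    pl = [int(x) for x in values["PL"].split(",")]
    return pl[pl_index(values["GT"])]


# Sample columns from the first record above (1:205884645).
samples = [
    "0/0:.:88:99:0,108,1620,108,1620,1620",
    "0/1:33,0,6,3,0,0,0,0,0:42:77:126,0,2298,77,2301,2809",
    "0/2:91,0,13,37,0,2,0,0,0:143:99:1652,937,7067,0,6080,7057",
]
for s in samples:
    print(s.split(":")[0], called_genotype_pl(s))  # 0 for each sample
```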

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    OK, this is a known bug in the AD field trimming that we've fixed internally. The fix will be in the next release; in the meantime, you can use the latest nightly build (see the Downloads page) if you need the fixed version immediately. Sorry about the inconvenience.
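Until the fixed version is in use, downstream scripts can at least avoid misreading the untrimmed values. A stopgap sketch, assuming the extra AD entries cannot be mapped back to the ALT alleles from the record alone: it replaces any AD whose length is not n_alt + 1 with the VCF missing value ".", leaving the other fields untouched. The function name is hypothetical, and this only hides the symptom; the fixed build Geraldine mentions is what actually corrects AD.

```python
def mask_untrimmed_ad(sample, n_alt, fmt="GT:AD:DP:GQ:PL"):
    """Replace AD with '.' (missing) when it does not have one value per allele."""
    keys = fmt.split(":")
    values = sample.split(":")
    if "AD" in keys:
        i = keys.index("AD")
        if (i < len(values) and values[i] != "."
                and len(values[i].split(",")) != n_alt + 1):
            values[i] = "."
    return ":".join(values)


print(mask_untrimmed_ad("0/2:91,0,13,37,0,2,0,0,0:143:99:1652,937,7067,0,6080,7057", n_alt=2))
# 0/2:.:143:99:1652,937,7067,0,6080,7057
```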

  • bvecchio Member

    Thanks Geraldine!! Always so helpful
