All annotations in BP_RESOLUTION mode

adammaikai, San Diego, CA (Member)

Hello,

I was wondering if there is a way to output all annotations for all sites when running HaplotypeCaller with BP_RESOLUTION. Currently it outputs the full set of annotations only for called variants. Thanks in advance.

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @adammaikai
    Hi,

    Having all the annotations output would result in an outrageously large file. Which specific annotations are you interested in?

    -Sheila

  • adammaikai, San Diego, CA (Member)

    Hi Sheila,

    File size is not a particular concern for the type of project I am currently working on; otherwise I'd agree. I am interested in AF, BaseQRankSum, ClippingRankSum, DP, FS, MLEAC, MLEAF, MQ, MQ0, MQRankSum, ReadPosRankSum, SOR, GT, AD, GQ, PL.

    Adam

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @adammaikai
    Hi Adam,

    I think the big issue is that the BP_RESOLUTION output is an intermediate file that is not meant to be used as a final VCF or for any analysis. The final VCF will not have an alternate allele (to follow VCF spec guidelines). Without an alternate allele, many of the annotations cannot be calculated.

    -Sheila

  • adammaikai, San Diego, CA (Member)

    Hi,

    Thanks for the response. Yes, that makes sense. What I am mainly concerned about are annotations for variants that were not called but have alternate allele counts > 0. Is there an alternative method for obtaining this information?

    Adam
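One possible way to approach the question above is to post-process the BP_RESOLUTION output directly. The following is a minimal sketch, not a GATK feature: it assumes a single-sample file whose records carry GT and AD in the FORMAT column (as in the GT:AD:DP:GQ:PL records shown in this thread) and reports sites genotyped as homozygous reference that still have non-reference read support. The function name `hom_ref_sites_with_alt_reads` is made up for illustration.

```python
def hom_ref_sites_with_alt_reads(lines):
    """Return (chrom, pos, ref_depth, alt_depth) for hom-ref records with alt reads.

    Assumes single-sample VCF/gVCF text lines with whitespace-separated columns
    and a FORMAT column that includes GT and AD (an assumption, not guaranteed).
    """
    hits = []
    for line in lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Need the 8 fixed columns plus FORMAT and one sample column.
        if len(fields) < 10 or not fields[1].isdigit():
            continue
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        genotype = sample.get("GT", "").replace("|", "/")
        allele_depths = sample.get("AD", "")
        if genotype != "0/0" or not allele_depths:
            continue
        # Treat missing values (".") as zero reads.
        depths = [int(x) if x.isdigit() else 0 for x in allele_depths.split(",")]
        alt_depth = sum(depths[1:])
        if alt_depth > 0:
            hits.append((fields[0], int(fields[1]), depths[0], alt_depth))
    return hits
```

The function could be fed the lines of an uncompressed g.vcf, for example `hom_ref_sites_with_alt_reads(open("sample.g.vcf"))`, where the file name is a placeholder. It only surfaces read counts; it does not recompute annotations such as FS or the rank-sum tests.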

  • mqbpkkd2, Manchester (Member)

    Hi Sheila,

    I am also running HaplotypeCaller in BP_RESOLUTION mode, but the output does not contain all positions. Are the non-called sites considered non-variant by default? Below are my command and some output lines; as you can see, there are positions missing. (I am running GATK 3.5 on a cluster.)

    gatk -T HaplotypeCaller -R ~/scratch/Barley_morex_pseudomolecules_reference/barley_morex_pseudomolecules.fasta -I ./ERR753130_filtered_trimmed.RG.bam
    -gt_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o ERR753130_filtered_trimmed.RG.bam.g.vcf -ERC BP_RESOLUTION -drf DuplicateRead

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleName

    chr1H 14766 . T C 46.74 . AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.80;QD=23.37;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:74,6,0
    chr1H 41709 . C T 62.74 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.37;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:90,6,0
    chr1H 41718 . C G 62.74 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.37;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:90,6,0
    chr1H 41744 . CGTT C 53.70 . AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.85;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:90,6,0
    chr1H 41926 . T TTCCTC 1186.73 . AC=2;AF=1.00;AN=2;DP=32;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.43;SOR=6.736 GT:AD:DP:GQ:PL 1/1:0,28:28:83:1224,83,0
    chr1H 42252 . C CT 159.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.532;ClippingRankSum=0.118;DP=16;ExcessHet=3.0103;FS=7.404;MLEAC=1;MLEAF=0.500;MQ=59.53;MQRankSum=-0.825;QD=14.52;ReadPosRankSum=-1.061;SOR=1.155 GT:AD:DP:GQ:PL 0/1:2,9:11:21:197,0,21
    chr1H 42262 . G T 919.77 . AC=2;AF=1.00;AN=2;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.69;QD=27.88;SOR=1.051 GT:AD:DP:GQ:PL 1/1:0,24:24:72:948,72,0
    chr1H 44079 . C T 1274.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.921;ClippingRankSum=0.546;DP=244;ExcessHet=3.0103;FS=23.015;MLEAC=1;MLEAF=0.500;MQ=48.13;MQRankSum=-8.858;QD=5.25;ReadPosRankSum=2.515;SOR=1.345 GT:AD:DP:GQ:PL 0/1:197,46:243:99:1303,0,10429
    chr1H 44082 . C T 1277.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.254;ClippingRankSum=-1.143;DP=240;ExcessHet=3.0103;FS=20.908;MLEAC=1;MLEAF=0.500;MQ=48.58;MQRankSum=-8.985;QD=5.35;ReadPosRankSum=2.299;SOR=1.296 GT:AD:DP:GQ:PL 0/1:194,45:239:99:1306,0,10406
    chr1H 44099 . T A 293.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.589;ClippingRankSum=-2.312;DP=225;ExcessHet=3.0103;FS=10.615;MLEAC=1;MLEAF=0.500;MQ=50.84;MQRankSum=-8.298;QD=1.31;ReadPosRankSum=0.120;SOR=0.811 GT:AD:DP:GQ:PL 0/1:192,33:225:99:322,0,6985
    chr1H 45251 . T C 511.77 . AC=2;AF=1.00;AN=2;DP=14;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.22;SOR=3.912 GT:AD:DP:GQ:PL 1/1:0,14:14:41:540,41,0
    chr1H 47364 . ATC A 470.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.962;ClippingRankSum=0.092;DP=87;ExcessHet=3.0103;FS=11.837;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.046;QD=7.59;ReadPosRankSum=2.188;SOR=0.051 GT:AD:DP:GQ:PL 0/1:43,19:62:99:508,0,1383
    chr1H 49336 . C G 950.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.128;ClippingRankSum=0.848;DP=113;ExcessHet=3.0103;FS=7.727;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.360;QD=8.41;ReadPosRankSum=-2.326;SOR=1.576 GT:AD:DP:GQ:PL 0/1:79,34:113:99:979,0,2739
    chr1H 49350 . A T 1046.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.582;ClippingRankSum=1.291;DP=121;ExcessHet=3.0103;FS=8.490;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.633;QD=8.65;ReadPosRankSum=-1.195;SOR=1.511 GT:AD:DP:GQ:PL 0/1:84,37:121:99:1075,0,2989
    chr1H 49389 . C A 1102.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.675;ClippingRankSum=-0.786;DP=110;ExcessHet=3.0103;FS=4.455;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.550;QD=10.03;
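
To quantify how many positions are absent from output like the above, a small script can list the gaps between consecutive records. This is a rough sketch under stated assumptions: records are sorted by position within each contig, columns are whitespace-separated, and a record is taken to cover POS through POS + len(REF) - 1. The name `find_position_gaps` is hypothetical, and the sketch does not explain why positions are missing.

```python
def find_position_gaps(lines):
    """Return (chrom, first_missing, last_missing) ranges between records.

    Assumes records are position-sorted within each contig and that each
    record covers POS .. POS + len(REF) - 1 (so deletions count as covered).
    """
    gaps = []
    last_covered = {}  # contig name -> last position covered so far
    for line in lines:
        # Skip blank lines and '#' header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Skip anything that is not a data record (e.g. a bare column header).
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        previous = last_covered.get(chrom)
        if previous is not None and pos > previous + 1:
            gaps.append((chrom, previous + 1, pos - 1))
        end = pos + len(ref) - 1
        last_covered[chrom] = end if previous is None else max(previous, end)
    return gaps
```

Applied to the first two records quoted above (positions 14766 and 41709 on chr1H), this would report the range 14767 to 41708 as missing. Gaps before the first record or after the last record of a contig are not reported, since the sketch does not read contig lengths.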

  • shlee, Cambridge (Member, Broadie, Moderator)

    @mqbpkkd2, I believe this is a duplicate of your post at https://gatkforums.broadinstitute.org/gatk/discussion/12356/cant-get-variant-and-invariant-sites-with-haplotypecaller#latest. Please do not post duplicate questions. Please read this blog post on forum etiquette.
