maxAltAlleles in GenotypeGVCFs

bd338 · Cambridge, UK · Member

Hi Geraldine,

We have used the GATK 3.x workflow to process targeted resequencing data for a large cohort. At the GenotypeGVCFs stage, we get the following warning in the output:

ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at chr:pos has NN alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument

Because our cohort is very large, it is possible, though unlikely, that more than 6 indel alleles exist at a given locus. Allowing GenotypeGVCFs to consider more alleles therefore seems warranted.

While --max_alternate_alleles is an option for HaplotypeCaller, it is not recognized by GenotypeGVCFs. Is this parameter fixed for GenotypeGVCFs, or is there a way to change it at that stage? If it is fixed, can the limit be raised by editing the combined GVCF header, or by rerunning HaplotypeCaller with a higher value for --max_alternate_alleles?

Thanks for all your hard work in this forum!
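For context on what raising the cap costs: a diploid genotype is an unordered pair of alleles, so each sample needs one PL value per possible pair, and that count grows quadratically with the number of alleles considered. The sketch below only does that arithmetic; `diploid_genotype_count` is a hypothetical helper, not part of GATK.

```python
from math import comb


def diploid_genotype_count(n_alt: int) -> int:
    """Number of unordered diploid genotypes for REF plus n_alt ALT alleles.

    Hypothetical helper: this equals the number of PL values each sample needs.
    """
    n_alleles = n_alt + 1  # count the REF allele too
    # Unordered pairs drawn with repetition from n alleles: n(n+1)/2 == comb(n + 1, 2)
    return comb(n_alleles + 1, 2)


for n_alt in (1, 2, 6, 11):
    print(f"{n_alt} ALT alleles -> {diploid_genotype_count(n_alt)} genotypes per sample")
```

Under this formula the default cap of 6 ALT alleles means 28 PL values per sample, while a site with 11 ALT alleles would need 78.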

Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I think this may be an oversight that occurred during development -- I do believe that the desired behavior is to be able to set the max alt alleles from command line. Let me check with the devs; if appropriate I'll put in a feature request.

  • bd338 · Cambridge, UK · Member

    Thanks for looking into this, Geraldine! Is it the same story for other options that were previously controlled at the HaplotypeCaller stage, such as --maxNumHaplotypesInPopulation?

    Best,
    bd

  • Kurt · Member ✭✭✭

    Hi @Geraldine_VdAuwera,

    Are the emit/call confidence thresholds among the options to be made available on the command line for GenotypeGVCFs? In HaplotypeCaller GVCF mode both are set to zero, but GenotypeGVCFs seems to emit only at 30 and above.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi @Kurt,

    We're reviewing which options need to be made user-adjustable, and I do think that will be one of them as well, yes.
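As a rough model of what the two thresholds do (in GATK 3.x they are the -stand_emit_conf and -stand_call_conf arguments, both compared against the Phred-scaled QUAL), here is a minimal sketch. It assumes that sites below the emit threshold are dropped and sites between the two thresholds are written with the LowQual filter; the exact boundary comparisons are illustrative, and `classify_site` is a hypothetical name.

```python
def classify_site(qual: float, emit_conf: float = 30.0, call_conf: float = 30.0) -> str:
    """Simplified model of the emit/call confidence thresholds.

    Assumptions (hypothetical helper, not GATK code): QUAL below emit_conf is
    not written; QUAL between the two thresholds is written but flagged LowQual;
    anything else is written unflagged. Boundary handling is illustrative.
    """
    if qual < emit_conf:
        return "not emitted"
    if qual < call_conf:
        return "emitted, LowQual"
    return "emitted"


print(classify_site(25.0))                  # -> not emitted (both thresholds at 30)
print(classify_site(25.0, emit_conf=10.0))  # -> emitted, LowQual
```

With both thresholds at 30, as Kurt observes, a QUAL 25 site disappears entirely; lowering only the emit threshold would keep it in the output but flag it.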

  • mhairi · Member

    Hi,
    Not sure if anyone else has reported this, but I have noticed odd behaviour in the AD field from GenotypeGVCFs v3.3.1 when there are more than 6 alternate alleles.

    I get the same message as previously reported:
    WARN 15:53:47,185 ExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at chr12:102792885 has 11 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument

    and when I look at the indel, 6 alternate alleles are listed:
    chr12 102792885 . ATGTG ATGTGTG,ATG,ATGTGTGTG,A,ATGTGTGTGTG,ATGTGTGTGTGTG

    However, the AD field for every sample has counts for 11 ALT alleles instead of the 6 that are listed:

    0/2:10,0,4,0,2,0,0,0,0,0,0,0:16:45:87,105,344,0,240,228,105,344,240,344,45,319,214,319,479,105,344,240,344,319,344,105,344,240,344,319,344,344

    That is, the REF allele has an AD of 10, followed by the ADs of 11 ALT alleles.

    I also have a second example where there are only two ALT alleles:
    chr12 7069425 rs28579154 G C,T

    The AD for the samples at this SNP also seems to have an extra value: it has counts for one REF and 3 ALT alleles, even though there are only two ALT alleles.
    0/1:7,18,0,0:25:99:540,0,123,561,177,738
    1/1:0,19,0,0:19:56:611,56,0,611,56,611

    However, I have also seen plenty of SNPs with two alternate alleles whose AD field is printed correctly.

    Is this a bug, rather than me doing something odd?

    Thanks

    Mhairi
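The mismatch described above can be checked mechanically: with n ALT alleles, a diploid sample should carry n + 1 AD values and (n + 1)(n + 2)/2 PL values. A minimal sketch that takes the ALT column, the FORMAT column and one sample column; `check_sample_field` is a hypothetical helper, and it assumes a diploid sample with plain comma-separated values.

```python
def check_sample_field(alt: str, fmt: str, sample: str) -> list[str]:
    """Report AD/PL length mismatches for one diploid sample in a VCF record."""
    n_alleles = 1 + len(alt.split(","))             # REF plus each listed ALT allele
    expected_pl = n_alleles * (n_alleles + 1) // 2  # number of diploid genotypes
    values = dict(zip(fmt.split(":"), sample.split(":")))

    problems = []
    if "AD" in values:
        n_ad = len(values["AD"].split(","))
        if n_ad != n_alleles:
            problems.append(f"AD has {n_ad} values, expected {n_alleles}")
    if "PL" in values:
        n_pl = len(values["PL"].split(","))
        if n_pl != expected_pl:
            problems.append(f"PL has {n_pl} values, expected {expected_pl}")
    return problems


print(check_sample_field("C,T", "GT:AD:DP:GQ:PL",
                         "0/1:7,18,0,0:25:99:540,0,123,561,177,738"))
# -> ['AD has 4 values, expected 3']
```

On both records quoted above, PL has the expected number of values (28 and 6), while AD has 12 and 4 values instead of 7 and 3, which is consistent with AD alone not being trimmed.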

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @mhairi This is a known bug where ADs are not getting trimmed properly. The fix will be in the next release. If you need the fixed version now, you can use the latest nightly build (see the Downloads page).

  • tommycarstensen · United Kingdom · Member ✭✭✭
    edited July 2014

    @Geraldine_VdAuwera (aka BDFL of the GATK forum), I have a related question. Is it possible to turn off the ExactAFCalc WARN messages? I get one every few seconds when running GenotypeGVCFs on 2000 samples whose gVCFs were generated by HC with the default --max_alternate_alleles of 6. If it's possible, great. If not, you are still great, even though you will probably suggest I write the option myself :) I am going on vacation next week, so you will get a little rest ;) Thanks!

    Here are the REF and ALT columns of the input files after combining the 2000 gVCFs into 10 gVCFs with CombineGVCFs (all GT values ./.), before running GenotypeGVCFs:

    GAC GACACACAC,GACACAC,GACAC,GACACACACACAC,G,GACACACACAC,<NON_REF>
    GACACAC GACACACAC,GACACACACAC,GACACACACACAC,GACAC,GACACACACACACAC,G,<NON_REF>
    GAC GACAC,GACACAC,GACACACAC,G,GACACACACAC,<NON_REF>
    GACACAC GACACACACAC,GACACACAC,GACACACACACAC,GACACACACACACAC,G,GAC,<NON_REF>
    GACACAC GACACACACAC,GACACACACACAC,GACACACAC,GACAC,G,GACACACACACACAC,<NON_REF>
    GACACAC GACACACACACACACAC,GACAC,GACACACAC,GACACACACACAC,GACACACACAC,GACACACACACACAC,G,<NON_REF>
    GACACAC GACACACACACAC,GACACACACAC,GACACACAC,G,GACACACACACACAC,<NON_REF>
    GACACAC GACACACACAC,GACACACACACAC,GACACACAC,GACACACACACACAC,GACAC,GACACACACACACACAC,G,<NON_REF>
    GACACACAC GACACACACACAC,G,GACACACACACACAC,GACACACACAC,GACACACACACACACAC,<NON_REF>
    GAC GACAC,GACACAC,GACACACAC,G,GACACACACAC,<NON_REF>
    

    Here are the first 9 columns of the output after running GenotypeGVCFs (most GT values are now different from ./.):

    20  69506   .   GACACAC GACACACACACAC,GACACACACAC,GACACACAC,GACAC,GACACACACACACAC,G 51892.80    .   AC=170,573,268,19,40,8;AF=0.065,0.219,0.102,7.257e-03,0.015,3.056e-03;AN=2618;BaseQRankSum=0.727;DP=6001;FS=0.929;GQ_MEAN=12.94;GQ_STDDEV=20.49;InbreedingCoeff=0.7018;MLEAC=121,416,200,17,28,7;MLEAF=0.046,0.159,0.076,6.494e-03,0.011,2.674e-03;MQ=55.16;MQ0=0;MQRankSum=-1.980e-01;NCC=677;QD=19.38;ReadPosRankSum=0.727    GT:AD:DP:GQ:PL
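One plausible reason the warning fires even though each input row lists only five to seven ALT alleles besides <NON_REF>: assuming these ten rows are the same site in the ten combined files, they use REF alleles of different lengths (GAC, GACACAC, GACACACAC), so when they are merged every allele has to be re-expressed against the longest REF, and the union grows. The sketch below is a simplified illustration of that padding-and-union step, assuming every REF is a prefix of the longest one; GATK's own merging code handles more cases, and `merge_alt_alleles` is a hypothetical name.

```python
def merge_alt_alleles(records: list[tuple[str, list[str]]]) -> tuple[str, list[str]]:
    """Merge (REF, ALTs) records that start at the same position.

    Assumption: every REF is a prefix of the longest REF. Shorter records are
    padded by appending the missing REF suffix to each of their alleles.
    """
    longest_ref = max((ref for ref, _ in records), key=len)
    merged: list[str] = []
    for ref, alts in records:
        if not longest_ref.startswith(ref):
            raise ValueError(f"{ref} is not a prefix of {longest_ref}")
        suffix = longest_ref[len(ref):]
        for alt in alts:
            if alt.startswith("<"):  # skip symbolic alleles such as <NON_REF>
                continue
            padded = alt + suffix
            # Keep each distinct ALT once; an allele equal to REF is not an ALT.
            if padded != longest_ref and padded not in merged:
                merged.append(padded)
    return longest_ref, merged
```

Applied to just the first two rows above (REF GAC and REF GACACAC), the merged list already contains seven distinct ALT alleles, one more than the default cap.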
    
  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hah, BDFL, eh? I'm honored ;)

    I agree it probably makes sense to suppress that warning for GGVCFs; let me discuss with the group and we'll see if we can get that done.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Actually, you can turn off WARN messages by raising the logging level with -l ERROR (lowercase L, not capital i), which suppresses WARN lines altogether. Of course, this means you won't see any other warnings either...
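If you would rather keep the other warnings, an alternative is to leave the logging level alone and drop only this message from a captured copy of the console output. A minimal sketch, assuming the log lines look like the ExactAFCalc WARN line quoted earlier in this thread; `filter_log` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator

# Matches the ExactAFCalc "at most N alternate alleles" warning quoted in this thread.
ALT_ALLELE_WARNING = re.compile(
    r"\bWARN\b.*ExactAFCalc - this tool is currently set to genotype at most \d+ alternate alleles"
)


def filter_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield every log line except the ExactAFCalc max-alt-alleles warning."""
    for line in lines:
        if not ALT_ALLELE_WARNING.search(line):
            yield line
```

For example, `sys.stdout.writelines(filter_log(open("genotype_gvcfs.log")))` would print everything except those warnings; the file name is illustrative.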

  • Hasani · Germany · Member
    edited August 2014

    Hello,

    I'm receiving the same warning about the number of alternate alleles at some sites, saying that GATK will consider only the top alleles. However, AD does not report all 6 alleles, and only one alternate allele is given. Could you explain why?

    chr10 8096706 . TAA T 282.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.775;DP=249;FS=4.434;MLEAC=1;MLEAF=0.500;MQ=36.99;MQ0=0;MQRankSum=-0.170;QD=1.14;RPA=16,14;RU=A;ReadPosRankSum=0.555;STR GT:AD:DP:GQ:PL 0/1:138,63:213:99:893,0,2742

    Thank you in advance!
    H.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @Hasani, this tool reports at most two alleles (including REF) out of the list of all possible alleles that it considers, since it assumes the organism is diploid. If you want to see all the alleles that were considered, you need to look in the GVCF.
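To make this concrete, here is a simplified single-sample sketch of dropping the alleles that the called genotype does not use and remapping AD and PL to match. It always keeps REF, assumes an unphased, fully called diploid GT such as "0/1", and uses the VCF specification's PL ordering, in which genotype j/k (j <= k) sits at index k(k+1)/2 + j. GATK's own code works across all samples in the callset; `subset_to_called_alleles` is a hypothetical name.

```python
def subset_to_called_alleles(
    ref: str, alts: list[str], gt: str, ad: list[int], pl: list[int]
) -> tuple[list[str], str, list[int], list[int]]:
    """Keep REF plus the ALT alleles used in one unphased diploid GT."""
    alleles = [ref] + list(alts)
    called = [int(i) for i in gt.split("/")]    # e.g. "0/2" -> [0, 2]
    keep = sorted(set([0] + called))            # original indices kept, REF first
    new_index = {old: new for new, old in enumerate(keep)}

    new_alts = [alleles[i] for i in keep[1:]]
    new_ad = [ad[i] for i in keep]

    # Walk the new genotypes in VCF order and fetch each one's old PL entry.
    new_pl = []
    for k_new in range(len(keep)):
        for j_new in range(k_new + 1):
            j, k = keep[j_new], keep[k_new]
            new_pl.append(pl[k * (k + 1) // 2 + j])
    floor = min(new_pl)
    new_pl = [p - floor for p in new_pl]        # renormalise so the best genotype is 0

    new_gt = "/".join(str(new_index[i]) for i in called)
    return new_alts, new_gt, new_ad, new_pl
```

For the 0/1 SNP sample earlier in the thread (REF G, ALT C,T, PL 540,0,123,561,177,738), and assuming its correctly trimmed AD is 7,18,0, this returns ALT ["C"], GT "0/1", AD [7, 18] and PL [540, 0, 123].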

  • afahrenkrog

    Hi,
    I am using HaplotypeCaller and GenotypeGVCFs to genotype a large population and would like to change the -maxAltAlleles option so that more than 6 alleles can be considered. I thought this option would be enabled in the current GATK version (3.2-2), but it still does not seem possible to change it. Will this option be available in the near future?

    Thanks

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @afahrenkrog

    Hi,

    The -maxAltAlleles argument should be enabled in the latest version. What do you mean when you say it is not possible to modify it?

    Thanks,
    Sheila
