Unexpected genotypes in GenotypeGVCF output

rfraserrfraser Guelph, OntarioMember
edited January 2016 in Ask the GATK team

Hello GATK Team,

I have 21 bam files that I ran through HaplotypeCaller in GVCF mode followed by GenotypeGVCF, using Version=3.4-0-g7e26428. I found a few entries that I am having a difficult time understanding.

Here is one of the entries in question (quick note: each sample is from a pool of 3-5 animals, explaining the high ploidy):

chr1   88988835    rs396411987 C   G   226099.87   .   AC=61;AF=0.455;AN=134;BaseQRankSum=-1.176e+00;ClippingRankSum=0.054;DB;DP=29307;FS=0.000;MLEAC=61;MLEAF=0.455;MQ=59.98;MQRankSum=0.603;QD=11.32;ReadPosRankSum=1.30;SOR=0.696   GT:AD:DP:GQ:PL  0/0/0/0/0/1/1/1/1/1:396,449,0,0:845:28:9249,2087,965,413,120,0,28,219,649,1588,7879 0/1/1/1/1/1/1/1/1/1:275,2150,0,0:2425:99:52058,17674,11483,7903,5424,3571,2143,1051,287,0,3240  0/0/0/1/1/1/1/1:626,1134,0,0:1760:99:24237,5532,2600,1118,316,0,197,1292,11237  0/0/1/1/1/1/1/1:225,824,0,0:1049:99:18893,5121,2835,1577,772,257,0,116,3439 ./././././././././.:0,0 ./././././././././.:0,0 ./././././././.:0,0 0/0/0/0/0/0/0/1:824,105,0,0:929:99:1309,0,239,705,1369,2290,3644,6013,20091 0/0/0/0/0/1/1/1:786,499,0,0:1285:99:9169,1197,249,0,139,633,1609,3600,16598 0/0/1/1/1/1:557,1132,0,2:1691:99:24696,4545,1719,433,0,563,9784 0/0/0/0/0/0/0/1:922,140,0,0:1062:99:1825,0,202,685,1401,2410,3908,6544,22289    0/0/0/0/1/1/1/1/1/1:650,828,0,0:1478:26:16909,4072,1966,903,311,26,0,254,906,2400,12402 ./././././././.:0,0 0/0/0/0/0/0/0/0/1/1:846,179,0,0:1025:95:2546,95,0,178,520,1015,1689,2617,3986,6389,20368    0/0/0/0/1/1/1/1:875,986,0,0:1861:99:20237,3745,1410,380,0,136,885,2819,17626    0/0/0/0/0/0/0/0:570,0:570:23:0,23,50,82,120,170,241,361,1800    0/0/0/0/0/1/1/1:798,604,0,0:1402:20:11395,1675,423,0,20,429,1344,3302,16391 0/0/0/0/1/1/1/1:644,730,0,0:1374:96:14884,2778,1049,285,0,96,643,2062,12779 0/0/0/0/0/1/1/1:613,417,0,0:1030:74:7703,1065,243,0,74,433,1174,2710,12915  0/0/0/1/1/1/1/1:239,513,0,0:752:13:11159,2667,1309,604,198,0,13,378,4191    ./././././././.:0,0

So, for the samples with genotypes present, AD is obviously quite high (in the 1000s, generally). It surprised me, then, that some samples don't have information (represented by ./././././././.). I went back to the original sample gvcf files and pulled the entries from a subset that were represented by ./././././././.:

group14.g.vcf:chr1 88988835    rs396411987 C   G,A,   16897.68    .   BaseQRankSum=-0.709;ClippingRankSum=0.843;DB;DP=1729;MLEAC=5,0,0;MLEAF=0.500,0.00,0.00;MQ=59.96;MQRankSum=0.621;ReadPosRankSum=1.987    GT:AD:DP:GQ:SB  0/0/0/0/0/1/1/1/1/1:876,844,2,0:1722:99:443,433,424,422
group15.g.vcf:chr1 88988835    rs396411987 C   G,T,   690.35  .   BaseQRankSum=-1.311;ClippingRankSum=-0.019;DB;DP=1168;MLEAC=1,0,0;MLEAF=0.125,0.00,0.00;MQ=59.98;MQRankSum=0.603;ReadPosRankSum=3.624   GT:AD:DP:GQ:SB  0/0/0/0/0/0/0/1:1082,78,4,0:1164:99:542,540,44,38

A consistent difference between samples that are included and those that are not is the presence of multiple alternative alleles in the excluded samples (G,A and G,T in the above example). Is this the source of my troubles? Is there a way of forcing GATK to include those samples in the VCF, especially given that the support for the third allele seems pretty weak (2 reads out of ~1722 in sample 14 above)? It seems to me that these samples should reasonably be included in the output of GenotypeGVCF.

Apologies if the answer is somewhere in the forums/guide - I did check, but didn't find anything.

Thanks,
Russ

Post edited by rfraser on

Comments

  • rfraserrfraser Guelph, OntarioMember

    Quick follow-up - any thoughts?

    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Sorry for the late response. Are you saying that the group14 and group15 samples do not show up at all in the final VCF? Or, are you saying that they seem to be variant in the GVCF, but they show up as no-call in the final VCF? Can you please post the GVCF record vs the final VCF record together? I am having a hard time lining up the groups in the final VCF and the groups from the GVCF.

    Also, can you please post a screenshot of the original bam file and the bamout file in the region?

    Thanks,
    Sheila

  • rfraserrfraser Guelph, OntarioMember

    Hi Sheila,

    Thanks for the response, and sorry for the lack of clarity.

    Group14 and group15 samples DO show up in the final VCF, but they show up as no-calls. I'm not sure I fully understand your request to post the GVCF record vs the final VCF record together -- I thought that data is what I originally posted. Would the header line from the final VCF be helpful? I did my best to highlight group14 and group15 in the final VCF below (using ***).

    GVCFs

    group14.g.vcf
    chr1   88988835    rs396411987 C   G,A,   16897.68    .   BaseQRankSum=-0.709;ClippingRankSum=0.843;DB;DP=1729;MLEAC=5,0,0;MLEAF=0.500,0.00,0.00;MQ=59.96;MQRankSum=0.621;ReadPosRankSum=1.987    GT:AD:DP:GQ:SB  0/0/0/0/0/1/1/1/1/1:876,844,2,0:1722:99:443,433,424,422
    group15.g.vcf
    chr1   88988835    rs396411987 C   G,T,   690.35  .   BaseQRankSum=-1.311;ClippingRankSum=-0.019;DB;DP=1168;MLEAC=1,0,0;MLEAF=0.125,0.00,0.00;MQ=59.98;MQRankSum=0.603;ReadPosRankSum=3.624   GT:AD:DP:GQ:SB  0/0/0/0/0/0/0/1:1082,78,4,0:1164:99:542,540,44,38

    Final VCF

    #CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  group1  group10 group11 group12 group13 group14 group15 group16 group17 group18 group19 group2  group20 group24 group3  group4  group5  group6  group7  group8  group9
    chr1    88988835    rs396411987 C   G   226099.87   PASS    AC=61;AF=0.455;AN=134;BaseQRankSum=-1.176e+00;ClippingRankSum=0.054;DB;DP=29307;FS=0.000;MLEAC=61;MLEAF=0.455;MQ=59.98;MQRankSum=0.603;QD=11.32;ReadPosRankSum=1.30;SOR=0.696;set=variant   GT:AD:DP:GQ:PL  0/0/0/0/0/1/1/1/1/1:396,449,0,0:845:28:9249,2087,965,413,120,0,28,219,649,1588,7879 0/1/1/1/1/1/1/1/1/1:275,2150,0,0:2425:99:52058,17674,11483,7903,5424,3571,2143,1051,287,0,3240  0/0/0/1/1/1/1/1:626,1134,0,0:1760:99:24237,5532,2600,1118,316,0,197,1292,11237  0/0/1/1/1/1/1/1:225,824,0,0:1049:99:18893,5121,2835,1577,772,257,0,116,3439 ./././././././././.:0,0 ***./././././././././.:0,0  ./././././././.:0,0***  0/0/0/0/0/0/0/1:824,105,0,0:929:99:1309,0,239,705,1369,2290,3644,6013,20091 0/0/0/0/0/1/1/1:786,499,0,0:1285:99:9169,1197,249,0,139,633,1609,3600,16598 0/0/1/1/1/1:557,1132,0,2:1691:99:24696,4545,1719,433,0,563,9784 0/0/0/0/0/0/0/1:922,140,0,0:1062:99:1825,0,202,685,1401,2410,3908,6544,22289    0/0/0/0/1/1/1/1/1/1:650,828,0,0:1478:26:16909,4072,1966,903,311,26,0,254,906,2400,12402 ./././././././.:0,0 0/0/0/0/0/0/0/0/1/1:846,179,0,0:1025:95:2546,95,0,178,520,1015,1689,2617,3986,6389,20368    0/0/0/0/1/1/1/1:875,986,0,0:1861:99:20237,3745,1410,380,0,136,885,2819,17626    0/0/0/0/0/0/0/0:570,0:570:23:0,23,50,82,120,170,241,361,1800    0/0/0/0/0/1/1/1:798,604,0,0:1402:20:11395,1675,423,0,20,429,1344,3302,16391 0/0/0/0/1/1/1/1:644,730,0,0:1374:96:14884,2778,1049,285,0,96,643,2062,12779 0/0/0/0/0/1/1/1:613,417,0,0:1030:74:7703,1065,243,0,74,433,1174,2710,12915  0/0/0/1/1/1/1/1:239,513,0,0:752:13:11159,2667,1309,604,198,0,13,378,4191    ./././././././.:0,0

    Screenshots of BAM and bamout are attached for group14 and group15.

    Thanks for your help. Apologies if I haven't provided the information you requested.

    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Indeed the sample names helped a lot! Thank you.

    Unfortunately, I am not sure what could have happened. But, before I ask you to submit a bug report, can you please try using the latest version of GATK and let us know if you still get the same result? You can just try running on the small region (300 bases before and after) the site of interest.

    Thanks,
    Sheila

  • rfraserrfraser Guelph, OntarioMember

    Hi @Sheila,

    I had the same thought, and ran GenotypeGVCFs from Version=3.5-0-g36282e4 on Date="Mon Jan 11 09:48:34 EST 2016" with the same results. Did you want me to run HaplotypeCaller on them all again, too?

    The bamout files that I posted were generated using the latest version of GATK.

    The only common pattern that I can see is that all of the 'no-call' entries in the Final VCF file come from groups that were multiallelic at this particular locus...

    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Yes, it would be best to re-run HaplotypeCaller and GenotypeGVCFs with the latest version. If you want you can simply run on a small interval like I suggested above and even run on only groups 14 and 15 to save time.

    Thanks,
    Sheila

  • rfraserrfraser Guelph, OntarioMember

    @Sheila,

    Done... Alas, results are the same. I threw in a few of the groups that worked normally just to be sure. Results:

    group14.g.vcf

    ID=HaplotypeCaller,Version=3.5-0-g36282e4,Date="Fri Jan 15 12:13:34 EST 2016"
    
    chr1    88988835    rs396411987 C   G,A,   16897.68    .   BaseQRankSum=-2.311;ClippingRankSum=-1.016;DB;DP=1729;MLEAC=5,0,0;MLEAF=0.500,0.00,0.00;MQRankSum=1.002;RAW_MQ=6215865.00;ReadPosRankSum=0.885  GT:AD:DP:GQ:SB  0/0/0/0/0/1/1/1/1/1:876,844,2,0:1722:99:443,433,424,422

    group15.g.vcf

    ID=HaplotypeCaller,Version=3.5-0-g36282e4,Date="Fri Jan 15 12:14:20 EST 2016"
    
    chr1    88988835    rs396411987 C   G,T,   690.35  .   BaseQRankSum=-1.973;ClippingRankSum=0.464;DB;DP=1168;MLEAC=1,0,0;MLEAF=0.125,0.00,0.00;MQRankSum=-0.689;RAW_MQ=4202561.00;ReadPosRankSum=3.304  GT:AD:DP:GQ:SB  0/0/0/0/0/0/0/1:1082,78,4,0:1164:99:542,540,44,38

    Final VCF

    ID=GenotypeGVCFs,Version=3.5-0-g36282e4,Date="Fri Jan 15 12:17:31 EST 2016"
    
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  group1  group10 group14 group15 group2  group20
    chr1    88988835    rs396411987 C   G   78175.14    .   AC=20;AF=0.667;AN=30;BaseQRankSum=-1.959e+00;ClippingRankSum=0.421;DB;DP=9051;FS=0.000;MLEAC=20;MLEAF=0.667;MQ=59.97;MQRankSum=0.378;QD=16.46;ReadPosRankSum=0.885;SOR=0.699    GT:AD:DP:GQ:PL  0/0/0/0/0/1/1/1/1/1:396,449:845:28:9249,2087,965,413,120,0,28,219,649,1588,7879 0/1/1/1/1/1/1/1/1/1:275,2150:2425:99:52058,17674,11483,7903,5424,3571,2143,1051,287,0,3240  ./././././././././.:0,0 ./././././././.:0,0 0/0/0/0/1/1/1/1/1/1:650,828:1478:26:16909,4072,1966,903,311,26,0,254,906,2400,12402 ./././././././.:0,0

    Thanks,
    Russ

  • rfraserrfraser Guelph, OntarioMember

    Good morning @Sheila,

    Should I submit a bug report with this information? If so, I have preemptively begun preparing the files for submission, but was wondering what would you like me to do regarding submission of a reference genome. I am working with the horse (equCab 2.0), and I suspect you don't want me to transfer the entire 12 Gb to your servers...

    Thanks,
    Russ

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Yes, please submit a bug report, including your reference file. Unfortunately we can't do anything without it and we don't have a horse genome on file. Don't worry about our server, we have a few terabytes free :)

  • rfraserrfraser Guelph, OntarioMember

    Hi @Geraldine_VdAuwera,

    4 snippets of my bam files (2 working, 2 giving error), a dbSNP file, and the equine genome have been uploaded. The file is named "2016_01_19_rfraser.tar.gz".

    Thanks,
    Russ

    Issue · Github
    by Sheila

    Issue Number
    487
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Thanks. I will have a look at this soon.

    -Sheila

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi,

    I submitted a bug report and will keep you updated on the status. You can also keep track of any bugs here.

    -Sheila

  • rfraserrfraser Guelph, OntarioMember

    That's great, thanks Sheila.

  • rfraserrfraser Guelph, OntarioMember

    Hi @Sheila,

    Sorry to pester, but are there any updates on this issue?

    Thanks,
    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    There has been some chatter about this, but no solid plan yet. The issue is that the PL field is missing because there are too many PL values for that particular sample. I will see what I can do to have this fixed soon.

    Thanks for being patient!

    -Sheila

  • rfraserrfraser Guelph, OntarioMember
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    To expand a little bit on Sheila's comment, what's happening under the hood is that due to the combination of high ploidy and the number of alleles, there are more PLs than we want to keep track of, due to a hardcoded limitation that was originally meant to limit computational requirements. We have several options to solve this, including replacing the hard limit by a value settable from the command line which would be the best in my opinion. We're looking into feasibility right now; if that works then we could have a fix for you fairly soon.

  • rfraserrfraser Guelph, OntarioMember
    edited February 2016

    Thanks for the follow-up explanation. It seems to fit well with the observation that the missing genotypes were only in samples that were multiallelic rather than biallelic. The fix sounds reasonable.

    On a related topic, Sheila noted in her summary of this bug that "the AD field contains 4 values even though there are only 3 alleles."

    I had noticed that, too, but had just assumed I was ignorant about something - perhaps the 0 was referring to the <NON-REF> field or something - but I take it that is not the correct answer. Are there any thoughts about this?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Yes the 4th value pertains to the <NON_REF> allele -- it's not visible in the record you posted because it gets mistaken for an html tag, which occasionally trips us up.

  • rfraserrfraser Guelph, OntarioMember

    Hi @Geraldine_VdAuwera & @Sheila,

    I noticed from the bug tracker that this bug has been resolved, with the fix expected to be in GATK v3.6. Is the fix currently in any of the nightly builds? Alternatively, any kind of ETA on v3.6?

    Cheers,
    Russ

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Yes, the fix should be in the recent nightly builds.

  • rfraserrfraser Guelph, OntarioMember

    Hopefully a final question: I browsed the documentation that accompanies the nightly builds but couldn't find the command line argument to set the max # of PLs to consider - could you let me know what it is so I can proceed with some testing?

    Thanks,
    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Sorry, I closed the original issue because of the thread becoming too complicated. I opened a new issue for the feature request to allow users to set the maximum number of PL values, but I did not tag it to be in the bug tracker. This feature has not yet been implemented. Sorry for the confusion.

    I am making the new feature request available in the tracker so you can keep track of it there.

    -Sheila

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @rfraser, good news! The full fix was merged in this morning so it should be available in the nightly builds starting tomorrow. The argument that controls the new behavior is defined as follows:

    * Determines the maximum number of PL values that will be logged in the output.  If the number of genotypes
    * (which is determined by the ploidy and the number of alleles) exceeds the value provided by this argument,
    * then output of all of the PL values will be suppressed.
    */
    @Advanced
    @Argument(fullName = "max_num_PL_values", shortName = "maxNumPLValues", doc = "Maximum number of PL values to output", required = false)
    public int MAX_NUM_PL_VALUES = AFCalculator.MAX_NUM_PL_VALUES_DEFAULT;
    

    Sorry it took a while to get this in -- there were a few complications along the way. Let us know how it works out in your hands.

  • rfraserrfraser Guelph, OntarioMember

    Awesome! Thanks so much. I probably won't have an opportunity to get to it for a week or two, but will update you with results when I do.

    Cheers,
    Russ

  • rfraserrfraser Guelph, OntarioMember
    edited May 2016

    Hi @Geraldine_VdAuwera,

    I've finally had a chance to test out the the new -maxNumPLValues argument but am unfortunately getting the same results as before.

    I am using the same GVCFs as noted above (group14 and group15).

    The command I have used is:

    /usr/lib/jvm/jdk1.8.0/bin/java -Xmx28G -jar ~/java/GATK/nightly2016-05-06-GATK.jar \
    -T GenotypeGVCFs \
    -R ~/genome/genome.fa \
    -D ~/equine/snp/dbSNP/merged_dbSNP_aug2015.vcf \
    -L chr1:88000000-89000000 \
    -stand_call_conf 30 \
    -stand_emit_conf 10 \
    -allSites \
    -maxNumPLValues 10000 \
    -V group12.g.vcf \
    -V:diseased group14.g.vcf \
    -V:diseased group15.g.vcf \
    -o ~/equine/2014_11_24/temp/test.pl-corrected.vcf \
    -log ~/equine/2014_11_24/temp/log.txt \
    -l DEBUG

    And the results:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  group12 group14 group15
    chr1    88988835    rs396411987 C   G   18868.88    .   AC=6;AF=0.750;AN=8;BaseQRankSum=-9.220e-01;ClippingRankSum=0.054;DB;DP=3947;FS=0.000;MLEAC=6;MLEAF=0.750;MQ=59.98;MQRankSum=0.621;QD=17.99;ReadPosRankSum=1.99;SOR=0.702    GT:AD:DP:GQ:PL  0/0/1/1/1/1/1/1:225,824:1049:99:18893,5121,2835,1577,772,257,0,116,3439 ./././././././././.:0,0 ./././././././.:0,0

    I've tried playing around with the maxNumPLValues argument a bit, but nothing has done the trick. Any thoughts?

    Thanks,
    Russ

    Issue · Github
    by Sheila

    Issue Number
    920
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    chandrans
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Thanks for getting back to us. I will have a look soon and check with the team.

    -Sheila

  • rfraserrfraser Guelph, OntarioMember

    Hi @Sheila,

    Any updates? Thanks!

    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Yes! Instead of using -maxNumPLValues 10000 in GenotypeGVCFs, try adding it to your HaplotypeCaller commands. I just did that on your test files, and even when I don't add it to the GenotypeGVCFs command, the output looks good :smiley:

    -Sheila

  • rfraserrfraser Guelph, OntarioMember

    Hi @Sheila,

    Just wanted to let you know that it seems to be working, and to thank you and the GATK Team for implementing the fix!

    Cheers,
    Russ

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @rfraser
    Hi Russ,

    Glad to hear it! :smile:

    I will let the team know you are happy.

    -Sheila

  • apfuentesapfuentes Member

    Hello GATK team,

    I am also analyzing WGS of pooled DNA of a non-model organism. I obtained GVCFs for each BAM file with the HaplotypeCaller -ERC mode (I used -maxNumPLValues 10000 as recommended in this forum) and then obtained a VCF file with GenotypeGVCF for the group of GVCFs (GATK 3.7).

    In the stdout file of the HaplotypeCaller -ERC step I obtained these warning messages:

    There were 2 WARN messages, the first 2 are repeated below.
    WARN  17:24:53,958 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
    WARN  17:47:31,504 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller 
    

    And in the stdout file of the GenotypeGVCFs I got these warning messages:

    There were 5 WARN messages, the first 5 are repeated below.
    WARN  03:59:38,403 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  03:59:38,404 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
    WARN  03:59:38,405 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  03:59:38,636 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs 
    WARN  03:59:40,896 AFCalculator - Maximum allowed number of PLs (100) exceeded for sample 7iA at scaffold776:23036-23037 with 286 possible genotypes. No PLs will be output for these genotypes (which may cause incorrect results in subsequent analyses) unless the --max_num_PL_values argument is increased accordingly. Unless the DEBUG logging level is used, this warning message is output just once per run and further warnings are suppressed.
    

    The problem with the excess of PLs in pooled data is fixed in the HaplotypeCaller -ERC with the -maxNumPLValues (as I did not get a warning message about PLs when I used it), but it appears in the GenotypeGVCFs step (as shown in the 'stdout' file). As far as I understand, -maxNumPLValues is not available for the GenotypeGVCFs, right?

    I wonder if the variants in the VCF file have low confidence as the excess of PLs is not fixed in the GenotypeGVCF step.

    Thanks for any help.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @apfuentes
    Hi,

    Can you also try setting --max_genotype_count to a high number in your command?

    Thanks,
    Sheila

Sign In or Register to comment.