
Is it possible to suppress the NON_REF tag on variant calls?


In GVCF output from HaplotypeCaller, each line contains the <NON_REF> allele, including the lines with explicit variant calls. Is there a simple way to suppress the <NON_REF> allele on variant calls?

Also, what is the reason to have a <NON_REF> allele on a variant where a specific alternate allele is called?

Thanks for your help.


  • shlee Cambridge Member, Broadie ✭✭✭✭✭
    edited February 2017

    Hi @smottarella,

    Please refer to this document and this doc and links therein for more detailed explanations as to why you want to run HaplotypeCaller in reference confidence mode.

  • Thank you for this information. My question is specific to running HaplotypeCaller in reference confidence mode. I still do not understand why the NON_REF allele is present on lines that have explicit alternate alleles. In the case of an explicit alternate allele, why report the confidence that my sample is generically non-reference at this position (the meaning of the NON_REF allele) when that same position has high enough confidence to call a variant explicitly? And again, is there any way to have HaplotypeCaller not return a NON_REF allele on the lines produced in reference confidence mode that already have an explicit alternate allele called, while still providing the NON_REF allele for the blocks that are called as homozygous reference? I'm finding cases where HaplotypeCaller is calling a genotype that points to NON_REF despite explicit alternate alleles being listed. I'd prefer to call only explicit alleles and, in the case of homozygous reference, provide the confidence for NON_REF.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin
    @smottarella When run in GVCF mode, HaplotypeCaller produces a GVCF file, which is an intermediate intended to be further processed by GenotypeGVCFs, as documented in the links above. Once you run this second tool, the NON_REF tags will go away.
  • splaisan Leuven (Belgium) Member ✭✭
    edited September 2018

    running GATK

    I ran GenotypeGVCFs on a merge of 37 single-genome g.vcf files (merged with CombineGVCFs), and many rows containing only "NON_REF", as well as rows mixing "NON_REF" with normal calls, are printed to the output .vcf file.

    How can I get rid of these NON_REF entries in the VCF? (The mixed rows are the main problem: they mean I cannot simply grep out all lines containing NON_REF!)

    Thanks for your help

    3 examples with NON_REF in different forms

    2L      4517    .       A       <NON_REF>       .       .       END=4525        GT:DP:GQ:MIN_DP:PL      ./.:1:3:1:0,3,32        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,33        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,33        ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,34        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,34        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0
    2L      5042    .       C       *,<NON_REF>     .       .       DP=772  GT:AD:DP:GQ:PGT:PID:PL:SB       ./.:0,15,0:15:45:0|1:5038_TCGACAA_T:675,45,0,675,45,675:0,0,12,3        ./.:0,26,0:26:78:0|1:5038_TCGACAA_T:1159,78,0,1159,78,1159:0,0,13,
    13      ./.:0,18,0:18:55:0|1:5038_TCGACAA_T:811,55,0,811,55,811:0,0,14,4        ./.:0,28,0:28:85:0|1:5038_TCGACAA_T:1261,85,0,1261,85,1261:0,0,24,4     ./.:0,20,0:20:60:0|1:5038_TCGACAA_T:900,60,0,900,60,900:0,0,14,6        ./.:0,15,0:15:45:0
    |1:5038_TCGACAA_T:675,45,0,675,45,675:0,0,11,4  ./.:1,24,0:25:99:0|1:5038_TCGACAA_T:1005,0,458,1008,530,1538:1,0,17,7   ./.:0,7,0:7:21:0|1:5038_TCGACAA_T:315,21,0,315,21,315:0,0,4,3   ./.:0,22,0:22:67:0|1:5038_TCGACAA_T:991,67,0,991,67,991:0,
    0,19,3  ./.:0,20,0:20:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,17,3        ./.:0,22,0:22:67:0|1:5038_TCGACAA_T:992,67,0,992,67,992:0,0,18,4        ./.:0,7,0:7:24:0|1:5038_TCGACAA_T:360,24,0,360,24,360:0,0,6,1   ./.:0,30,0:30:90:0|1:5038_
    TCGACAA_T:1350,90,0,1350,90,1350:0,0,23,7       ./.:0,24,0:24:75:0|1:5038_TCGACAA_T:1125,75,0,1125,75,1125:0,0,17,7     ./.:0,21,0:21:64:0|1:5038_TCGACAA_T:946,64,0,946,64,946:0,0,17,4        ./.:0,24,0:24:72:0|1:5038_TCGACAA_T:1069,72,0,1069
    ,72,1069:0,0,18,6       ./.:0,13,0:13:40:0|1:5038_TCGACAA_T:587,40,0,587,40,587:0,0,11,2        ./.:0,25,0:25:75:0|1:5038_TCGACAA_T:1125,75,0,1125,75,1125:0,0,22,3     ./.:0,13,0:13:39:0|1:5038_TCGACAA_T:585,39,0,585,39,585:0,0,9,4 ./.:0,20,0
    :20:61:0|1:5038_TCGACAA_T:901,61,0,901,61,901:0,0,18,2  ./.:0,19,0:19:58:0|1:5038_TCGACAA_T:856,58,0,856,58,856:0,0,15,4        ./.:0,32,0:32:96:0|1:5038_TCGACAA_T:1405,96,0,1405,96,1405:0,0,23,9     ./.:0,21,0:21:64:0|1:5038_TCGACAA_T:946,64
    ,0,946,64,946:0,0,15,6  ./.:1,22,0:23:87:0|1:5038_TCGACAA_T:921,0,87,924,154,1077:1,0,19,3      ./.:1,27,0:28:99:0|1:5038_TCGACAA_T:1131,0,482,1134,564,1698:0,1,15,12  ./.:0,17,0:17:52:0|1:5038_TCGACAA_T:766,52,0,766,52,766:0,0,13,4        ./
    .:0,12,0:12:36:0|1:5038_TCGACAA_T:540,36,0,540,36,540:0,0,9,3   ./.:0,20,0:20:60:0|1:5038_TCGACAA_T:900,60,0,900,60,900:0,0,17,3        ./.:0,20,0:20:66:0|1:5038_TCGACAA_T:990,66,0,990,66,990:0,0,14,6        ./.:0,22,0:22:66:0|1:5038_TCGACAA_
    T:990,66,0,990,66,990:0,0,22,0  ./.:0,21,0:21:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,19,2        ./.:0,21,0:21:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,16,5        ./.:0,27,0:27:81:0|1:5038_TCGACAA_T:1215,81,0,1215,81,1215:0,0,26,
    1       ./.:0,21,0:21:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,17,4        ./.:0,19,0:19:57:0|1:5038_TCGACAA_T:855,57,0,855,57,855:0,0,13,6        ./.:0,24,0:24:72:0|1:5038_TCGACAA_T:1080,72,0,1080,72,1080:0,0,17,7     ./.:0,22,0:22:66:0
    2L      711798  .       A       G,<NON_REF>     .       .       DP=1512;ExcessHet=3.01;RAW_MQ=5443200.00        GT:AD:DP:GQ:PGT:PID:PL:SB       ./.:0,33,0:33:99:0|1:711791_A_G:1512,105,0,1512,105,1512:0,0,13,20      ./.:0,39,0:39:99:0|1:711791_A_G:1747,120,0,1747,120,1747:0,0,11,28        ./.:0,49,0:49:99:0|1:711791_A_G:2238,153,0,2238,153,2238:0,0,27,22      ./.:0,45,0:45:99:0|1:711791_A_G:1944,135,0,1944,135,1944:0,0,22,23      ./.:0,52,0:52:99:0|1:711791_A_G:2326,162,0,2326,162,2326:0,0,24,28        ./.:0,25,0:25:75:0|1:711791_A_G:1118,75,0,1118,75,1118:0,0,12,13        ./.:0,47,0:47:99:0|1:711791_A_G:2110,144,0,2110,144,2110:0,0,20,27      ./.:0,35,0:35:99:0|1:711791_A_G:1502,105,0,1502,105,1502:0,0,16,19      ./.:0,50,0:50:99:0|1:711791_A_G:2213,150,0,2213,150,2213:0,0,22,28        ./.:0,50,0:50:99:0|1:711791_A_G:2304,159,0,2304,159,2304:0,0,31,19      ./.:0,37,0:37:99:0|1:711791_A_G:1687,114,0,1687,114,1687:0,0,20,17      ./.:0,35,0:35:99:0|1:711791_A_G:1570,111,0,1570,111,1570:0,0,17,18        ./.:0,44,0:44:99:0|1:711791_A_G:2009,138,0,2009,138,2009:0,0,21,23      ./.:0,49,0:49:99:0|1:711791_A_G:2195,150,0,2195,150,2195:0,0,16,33      ./.:0,41,0:41:99:0|1:711791_A_G:1867,132,0,1867,132,1867:0,0,18,23        ./.:0,45,0:45:99:0|1:711791_A_G:2029,138,0,2029,138,2029:0,0,23,22      ./.:0,51,0:51:99:0|1:711791_A_G:2293,154,0,2293,154,2293:0,0,23,28      ./.:0,35,0:35:99:0|1:711791_A_G:1618,111,0,1618,111,1618:0,0,15,20      ./.:0,60,0:60:99:0|1:711791_A_G:2670,184,0,2670,184,2670:0,0,24,36        ./.:0,26,0:26:84:0|1:711791_A_G:1243,84,0,1243,84,1243:0,0,4,22 ./.:0,50,0:50:99:0|1:711791_A_G:2243,154,0,2243,154,2243:0,0,22,28      ./.:0,48,0:48:99:0|1:711791_A_G:2196,147,0,2196,147,2196:0,0,28,20        ./.:0,49,0:49:99:0|1:711791_A_G:2225,151,0,2225,151,2225:0,0,31,18      ./.:0,30,0:30:90:0|1:711791_A_G:1339,90,0,1339,90,1339:0,0,9,21 ./.:0,27,0:27:81:0|1:711791_A_G:1197,81,0,1197,81,1197:0,0,12,15 
./.:0,40,0:40:99:0|1:711791_A_G:1759,120,0,1759,120,1759:0,0,15,25       ./.:0,36,0:36:99:0|1:711791_A_G:1590,108,0,1590,108,1590:0,0,20,16      ./.:0,32,0:32:96:0|1:711791_A_G:1436,96,0,1436,96,1436:0,0,14,18        ./.:0,35,0:35:99:0|1:711791_A_G:1574,108,0,1574,108,1574:0,0,17,18        ./.:0,47,0:47:99:0|1:711791_A_G:2083,144,0,2083,144,2083:0,0,27,20      ./.:0,39,0:39:99:0|1:711791_A_G:1748,117,0,1748,117,1748:0,0,18,21      ./.:0,35,0:35:99:0|1:711791_A_G:1522,105,0,1522,105,1522:0,0,18,17        ./.:0,34,0:34:99:0|1:711791_A_G:1557,105,0,1557,105,1557:0,0,19,15      ./.:0,44,0:44:99:0|1:711791_A_G:2002,138,0,2002,138,2002:0,0,18,26      ./.:0,39,0:39:99:0|1:711791_A_G:1814,123,0,1814,123,1814:0,0,14,25      ./.:0,37,0:37:99:0|1:711791_A_G:1685,117,0,1685,117,1685:0,0,20,17        ./.:0,28,0:28:84:0|1:711791_A_G:1215,84,0,1215,84,1215:0,0,11,17
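
The three forms shown above differ only in the ALT column, so one way to tell them apart is to inspect that column rather than grep the whole line. The following is a minimal sketch under stated assumptions: the input is tab-separated VCF text, and the function name `classify_alt` and its labels are made up for illustration. It only reports which records carry <NON_REF>; it does not rewrite them, because removing an ALT allele would also require adjusting the per-sample AD and PL fields.

```shell
#!/bin/sh
# Label each VCF record according to what its ALT column (field 5) contains.
# classify_alt is a hypothetical helper; it reads VCF text on stdin.
classify_alt() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($5, alts, ",")           # ALT alleles are comma-separated
            has_nonref = 0
            for (i = 1; i <= n; i++)
                if (alts[i] == "<NON_REF>") has_nonref = 1
            if (!has_nonref)  label = "no_non_ref"
            else if (n == 1)  label = "non_ref_only"
            else              label = "mixed"
            print $1 ":" $2 "\t" label
        }'
}

# Shortened records: the first two are modelled on the examples above,
# the third is a hypothetical record without <NON_REF>.
printf '2L\t4517\t.\tA\t<NON_REF>\t.\t.\tEND=4525\n2L\t5042\t.\tC\t*,<NON_REF>\t.\t.\tDP=772\n2L\t711798\t.\tA\tG\t.\t.\tDP=1512\n' | classify_alt
```

On these three records the sketch prints non_ref_only, mixed and no_non_ref, one label per position. A file in which most records are non_ref_only or mixed is likely still a GVCF rather than a final callset.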
  • splaisan Leuven (Belgium) Member ✭✭

    Could this be a bug? I looked into a output and it seems OK on that regard.

  • shlee Cambridge Member, Broadie ✭✭✭✭✭
    edited September 2018

    Hi @splaisan, the workflow for joint calling with HaplotypeCaller is as follows:

    1. Run HaplotypeCaller per sample in GVCF mode --> per-sample GVCFs
    2. Collate GVCFs from cohort samples with CombineGVCFs or GenomicsDBImport --> cohort-level GVCF
    3. Genotype the cohort-level GVCF with GenotypeGVCFs --> cohort-level VCF callset

    It appears you are at step 2 of this workflow. After you complete step 3, the <NON_REF> alleles are absorbed by the process and are absent from the cohort-level VCF callset.
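
The three steps can be sketched as a small shell script. This is an illustration under stated assumptions, not an official pipeline: the file names (reference.fasta, <sample>.bam, cohort.g.vcf.gz, cohort.vcf.gz) and the function name `joint_call` are placeholders, step 2 uses CombineGVCFs rather than GenomicsDBImport, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the three-step joint-calling workflow.
# GATK defaults to "echo gatk", so commands are printed, not executed;
# set GATK=gatk to run them. All file names are placeholders.
GATK=${GATK:-echo gatk}
REF=${REF:-reference.fasta}

joint_call() {
    # Arguments: sample names; a file <sample>.bam is assumed for each.
    variants=""
    for s in "$@"; do
        # Step 1: HaplotypeCaller per sample in GVCF mode
        $GATK HaplotypeCaller -R "$REF" -I "$s.bam" -ERC GVCF -O "$s.g.vcf.gz" || return 1
        variants="$variants --variant $s.g.vcf.gz"
    done
    # Step 2: collate the per-sample GVCFs into a cohort-level GVCF
    # ($variants is left unquoted on purpose so it splits into arguments)
    $GATK CombineGVCFs -R "$REF" $variants -O cohort.g.vcf.gz || return 1
    # Step 3: genotype the cohort-level GVCF into the final VCF callset
    $GATK GenotypeGVCFs -R "$REF" --variant cohort.g.vcf.gz -O cohort.vcf.gz
}

joint_call sampleA sampleB
```

With two samples the dry run prints four command lines: two HaplotypeCaller calls, one CombineGVCFs call and one GenotypeGVCFs call. Only the file written by the last command is the cohort-level VCF callset.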

  • Katie United States Member ✭✭

    I am running into a similar problem: when I use GenotypeGVCFs to convert my single-sample gVCF into a VCF file, many of the variant alleles are coded as NON_REF in the final callset. I used GATK for a haploid sample. The correct allele call should be GCACC, but in the VCF it is coded as <NON_REF>.

    The problematic line is here in the gVCF:

    NC_000962.3 338792  .   G   *,GCACC,<NON_REF>   1160.01 .   DP=136;MLEAC=0,0,1;MLEAF=0.00,0.00,1.00;RAW_MQandDP=472359,136   GT:AD:DP:GQ:PL:SB 3:0,0,0,0:0:99:1170,7898,2147483647,0:0,0,0,0

    And here in the VCF file:

    NC_000962.3 338792  .   G   <NON_REF>   1160.01 .   AC=1;AF=1.00;AN=1;DP=136;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=58.93;SOR=0.69   GT:AD:DP:GQ:PL    1:0,0:0:99:1170,0

    My commands are below. Thank you in advance!

    /ifs/labs/andrews/walter/bin/gatk-  --java-options "-Xmx50g" HaplotypeCaller \
    -R ${REF_DIR}${ref} \
    -ploidy 1 \
    -I ${BAMS_DIR}${bam} \
    --intervals NC_000962.3:338780-338800 \
    -ERC GVCF \
    -O test.g.vcf
    /ifs/labs/andrews/walter/bin/gatk-  GenotypeGVCFs \
    -R ${REF_DIR}${ref} \
    --variant test.g.vcf \
    -ploidy 1 \
    --include-non-variant-sites true \
    --intervals NC_000962.3:338780-338800 \
    --output test.vcf
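
For readers comparing the two records above, it can help to translate the haploid GT index into the allele it points to, where index 0 is REF and indices 1 onward follow the order of the ALT column. The sketch below assumes tab-separated VCF text with a single sample in column 10. The helper name `called_allele` is made up for illustration, and the INFO field of the gVCF record is shortened to DP=136.

```shell
#!/bin/sh
# Print the allele that a haploid GT index refers to.
# called_allele is a hypothetical helper; it reads VCF records on stdin
# and looks only at the first sample column (field 10).
called_allele() {
    awk -F '\t' '
        /^#/ { next }                              # skip header lines
        {
            # alleles[1] is REF (GT index 0); ALT alleles follow in order
            n = split($4 "," $5, alleles, ",")
            split($10, sample, ":")                # GT is the first sample subfield
            gt = sample[1]
            if (gt ~ /^[0-9]+$/ && gt + 0 < n)
                print $2 "\tGT=" gt "\t" alleles[gt + 1]
            else
                print $2 "\tGT=" gt "\t(not a single called allele)"
        }'
}

# The gVCF record from this post, with INFO shortened to DP=136
printf 'NC_000962.3\t338792\t.\tG\t*,GCACC,<NON_REF>\t1160.01\t.\tDP=136\tGT:AD:DP:GQ:PL:SB\t3:0,0,0,0:0:99:1170,7898,2147483647,0:0,0,0,0\n' | called_allele
```

Reading the record as posted, GT=3 indexes the third ALT allele, which is <NON_REF>, and the AD and DP values for the sample are all zero. This suggests the NON_REF genotype is already present in the HaplotypeCaller output for this site rather than being introduced by GenotypeGVCFs.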
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @Katie

    Can you please provide the bam snippets from around the positions where alt alleles occur? And also provide the reference file that was used. You can follow this doc to send us these files: https://software.broadinstitute.org/gatk/guide/article?id=1894

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @Katie

    We have not heard from you in 2 business days, so we are now closing this issue. Please write to us if you have any other questions.
