Is it possible to suppress the NON_REF tag on variant calls?

Hello,

In GVCF output from HaplotypeCaller, each line contains the <NON_REF> allele, including the lines with explicit variant calls. Is there a simple way to suppress the <NON_REF> allele on variant calls?

Also, what is the reason for having a <NON_REF> allele on a variant where a specific alternate allele is called?

Thanks for your help.
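To see what "suppressing" <NON_REF> on a variant line would actually involve: the allele cannot simply be deleted from the ALT column, because each sample's AD and PL values are indexed by allele. Below is a minimal Python sketch for one diploid sample. The helper `drop_non_ref` is hypothetical (it is not a GATK tool), and it assumes the VCF-specification genotype ordering, in which genotype j/k (j <= k) sits at PL index k(k+1)/2 + j. It does not recompute GT or GQ, so treat it as an illustration rather than a substitute for GenotypeGVCFs.

```python
from typing import List, Sequence, Tuple

NON_REF = "<NON_REF>"


def genotype_index(j: int, k: int) -> int:
    """PL index of the unordered diploid genotype j/k (VCF ordering)."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


def drop_non_ref(alts: Sequence[str], ad: Sequence[int],
                 pl: Sequence[int]) -> Tuple[List[str], List[int], List[int]]:
    """Hypothetical helper: remove <NON_REF> from one sample's ALT/AD/PL."""
    # Allele 0 is REF; ALT entry i is allele i + 1.
    keep = [0] + [i + 1 for i, a in enumerate(alts) if a != NON_REF]
    new_alts = [alts[i - 1] for i in keep[1:]]
    new_ad = [ad[i] for i in keep]
    # Walk the kept alleles in VCF genotype order and pick up the old PLs.
    new_pl = [pl[genotype_index(keep[a], keep[b])]
              for b in range(len(keep)) for a in range(b + 1)]
    # PLs are normalized so that the most likely genotype has PL 0.
    lowest = min(new_pl)
    return new_alts, new_ad, [p - lowest for p in new_pl]
```

For example, the first sample of the 2L:5042 record posted further down (ALT `*,<NON_REF>`, AD 0,15,0, PL 675,45,0,675,45,675) becomes `(["*"], [0, 15], [675, 45, 0])` with `drop_non_ref(["*", "<NON_REF>"], [0, 15, 0], [675, 45, 0, 675, 45, 675])`.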

Answers

  • shlee, Cambridge (Member, Broadie, Moderator)
    edited February 2017

    Hi @smottarella,

    Please refer to this document and this doc, and the links therein, for more detailed explanations of why you would want to run HaplotypeCaller in reference confidence mode.

  • Thank you for this information. My question is specific to running HaplotypeCaller in reference confidence mode. I still don't understand why the NON_REF allele is present on lines that have explicit alternate alleles. When an explicit alternate allele is called, why report the confidence that my sample is generically not reference at this position (which is what the NON_REF allele means) when that same position has high enough confidence to call a specific variant?

    And again, is there any way to make HaplotypeCaller omit the NON_REF allele on the lines produced in reference confidence mode that already have an explicit alternate allele called, while still providing the NON_REF allele for the blocks called as homozygous reference? I'm finding cases where HaplotypeCaller calls a genotype that points to NON_REF despite explicit alternate alleles being listed. I'd prefer to call only explicit alleles and, in the case of homozygous reference, provide the confidence for NON_REF.
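One way to check which allele a genotype actually points to is to decode the PL field directly. The Python sketch below (function names are hypothetical) lists the diploid genotypes in the VCF specification's PL order and returns the alleles of the genotype with the lowest PL. Applied to a sample from the 2L:5042 record posted later in this thread (alleles C, *, <NON_REF>; PL 675,45,0,675,45,675), it returns ("*", "*"): the genotype points at the spanning-deletion allele, not at <NON_REF>.

```python
from typing import List, Sequence, Tuple


def diploid_genotypes(n_alleles: int) -> List[Tuple[int, int]]:
    """Unordered diploid genotypes j/k in VCF PL order (k outer, j <= k inner)."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def most_likely_genotype(alleles: Sequence[str],
                         pl: Sequence[int]) -> Tuple[str, str]:
    """Return the alleles of the lowest-PL genotype; alleles[0] must be REF."""
    genotypes = diploid_genotypes(len(alleles))
    if len(genotypes) != len(pl):
        raise ValueError("PL length does not match the number of alleles")
    best = min(range(len(pl)), key=lambda i: pl[i])  # first minimum wins ties
    j, k = genotypes[best]
    return alleles[j], alleles[k]
```

If the result contains "<NON_REF>", the sample's best-supported genotype involves an allele other than the ones listed explicitly at that site.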

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    @smottarella When run in GVCF mode, HaplotypeCaller produces a GVCF file, which is an intermediate intended to be further processed by GenotypeGVCFs, as documented in the links above. Once you run this second tool, the <NON_REF> alleles will go away.

  • splaisan, Leuven, Belgium (Member ✭✭)
    edited September 28

    Running GATK 4.0.7.0.

    I ran GenotypeGVCFs on a merge of 37 single-genome g.vcf files (merged with CombineGVCFs), and many records with "NON_REF" as the only ALT allele, as well as mixed records with "NON_REF" alongside normal alternate alleles, are printed to the output .vcf file.

    How can I get rid of these NON_REF alleles in the VCF? (Especially the mixed rows, which mean I cannot simply grep out all lines containing NON_REF!)

    Thanks for your help,
    Stephane

    Three examples with NON_REF in different forms:

    2L      4517    .       A       <NON_REF>       .       .       END=4525        GT:DP:GQ:MIN_DP:PL      ./.:1:3:1:0,3,32        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,33        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,33        ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,34        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,34        ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0
    
    2L      5042    .       C       *,<NON_REF>     .       .       DP=772  GT:AD:DP:GQ:PGT:PID:PL:SB       ./.:0,15,0:15:45:0|1:5038_TCGACAA_T:675,45,0,675,45,675:0,0,12,3        ./.:0,26,0:26:78:0|1:5038_TCGACAA_T:1159,78,0,1159,78,1159:0,0,13,13      ./.:0,18,0:18:55:0|1:5038_TCGACAA_T:811,55,0,811,55,811:0,0,14,4        ./.:0,28,0:28:85:0|1:5038_TCGACAA_T:1261,85,0,1261,85,1261:0,0,24,4     ./.:0,20,0:20:60:0|1:5038_TCGACAA_T:900,60,0,900,60,900:0,0,14,6        ./.:0,15,0:15:45:0|1:5038_TCGACAA_T:675,45,0,675,45,675:0,0,11,4  ./.:1,24,0:25:99:0|1:5038_TCGACAA_T:1005,0,458,1008,530,1538:1,0,17,7   ./.:0,7,0:7:21:0|1:5038_TCGACAA_T:315,21,0,315,21,315:0,0,4,3   ./.:0,22,0:22:67:0|1:5038_TCGACAA_T:991,67,0,991,67,991:0,0,19,3  ./.:0,20,0:20:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,17,3        ./.:0,22,0:22:67:0|1:5038_TCGACAA_T:992,67,0,992,67,992:0,0,18,4        ./.:0,7,0:7:24:0|1:5038_TCGACAA_T:360,24,0,360,24,360:0,0,6,1   ./.:0,30,0:30:90:0|1:5038_TCGACAA_T:1350,90,0,1350,90,1350:0,0,23,7       ./.:0,24,0:24:75:0|1:5038_TCGACAA_T:1125,75,0,1125,75,1125:0,0,17,7     ./.:0,21,0:21:64:0|1:5038_TCGACAA_T:946,64,0,946,64,946:0,0,17,4        ./.:0,24,0:24:72:0|1:5038_TCGACAA_T:1069,72,0,1069,72,1069:0,0,18,6       ./.:0,13,0:13:40:0|1:5038_TCGACAA_T:587,40,0,587,40,587:0,0,11,2        ./.:0,25,0:25:75:0|1:5038_TCGACAA_T:1125,75,0,1125,75,1125:0,0,22,3     ./.:0,13,0:13:39:0|1:5038_TCGACAA_T:585,39,0,585,39,585:0,0,9,4 ./.:0,20,0:20:61:0|1:5038_TCGACAA_T:901,61,0,901,61,901:0,0,18,2  ./.:0,19,0:19:58:0|1:5038_TCGACAA_T:856,58,0,856,58,856:0,0,15,4        ./.:0,32,0:32:96:0|1:5038_TCGACAA_T:1405,96,0,1405,96,1405:0,0,23,9     ./.:0,21,0:21:64:0|1:5038_TCGACAA_T:946,64,0,946,64,946:0,0,15,6  ./.:1,22,0:23:87:0|1:5038_TCGACAA_T:921,0,87,924,154,1077:1,0,19,3      ./.:1,27,0:28:99:0|1:5038_TCGACAA_T:1131,0,482,1134,564,1698:0,1,15,12  ./.:0,17,0:17:52:0|1:5038_TCGACAA_T:766,52,0,766,52,766:0,0,13,4        ./.:0,12,0:12:36:0|1:5038_TCGACAA_T:540,36,0,540,36,540:0,0,9,3   ./.:0,20,0:20:60:0|1:5038_TCGACAA_T:900,60,0,900,60,900:0,0,17,3        ./.:0,20,0:20:66:0|1:5038_TCGACAA_T:990,66,0,990,66,990:0,0,14,6        ./.:0,22,0:22:66:0|1:5038_TCGACAA_T:990,66,0,990,66,990:0,0,22,0  ./.:0,21,0:21:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,19,2        ./.:0,21,0:21:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,16,5        ./.:0,27,0:27:81:0|1:5038_TCGACAA_T:1215,81,0,1215,81,1215:0,0,26,1       ./.:0,21,0:21:63:0|1:5038_TCGACAA_T:945,63,0,945,63,945:0,0,17,4        ./.:0,19,0:19:57:0|1:5038_TCGACAA_T:855,57,0,855,57,855:0,0,13,6        ./.:0,24,0:24:72:0|1:5038_TCGACAA_T:1080,72,0,1080,72,1080:0,0,17,7     ./.:0,22,0:22:66:0|1:5038_TCGACAA_T:955,66,0,955,66,955:0,0,13,9
    
    2L      711798  .       A       G,<NON_REF>     .       .       DP=1512;ExcessHet=3.01;RAW_MQ=5443200.00        GT:AD:DP:GQ:PGT:PID:PL:SB       ./.:0,33,0:33:99:0|1:711791_A_G:1512,105,0,1512,105,1512:0,0,13,20      ./.:0,39,0:39:99:0|1:711791_A_G:1747,120,0,1747,120,1747:0,0,11,28        ./.:0,49,0:49:99:0|1:711791_A_G:2238,153,0,2238,153,2238:0,0,27,22      ./.:0,45,0:45:99:0|1:711791_A_G:1944,135,0,1944,135,1944:0,0,22,23      ./.:0,52,0:52:99:0|1:711791_A_G:2326,162,0,2326,162,2326:0,0,24,28        ./.:0,25,0:25:75:0|1:711791_A_G:1118,75,0,1118,75,1118:0,0,12,13        ./.:0,47,0:47:99:0|1:711791_A_G:2110,144,0,2110,144,2110:0,0,20,27      ./.:0,35,0:35:99:0|1:711791_A_G:1502,105,0,1502,105,1502:0,0,16,19      ./.:0,50,0:50:99:0|1:711791_A_G:2213,150,0,2213,150,2213:0,0,22,28        ./.:0,50,0:50:99:0|1:711791_A_G:2304,159,0,2304,159,2304:0,0,31,19      ./.:0,37,0:37:99:0|1:711791_A_G:1687,114,0,1687,114,1687:0,0,20,17      ./.:0,35,0:35:99:0|1:711791_A_G:1570,111,0,1570,111,1570:0,0,17,18        ./.:0,44,0:44:99:0|1:711791_A_G:2009,138,0,2009,138,2009:0,0,21,23      ./.:0,49,0:49:99:0|1:711791_A_G:2195,150,0,2195,150,2195:0,0,16,33      ./.:0,41,0:41:99:0|1:711791_A_G:1867,132,0,1867,132,1867:0,0,18,23        ./.:0,45,0:45:99:0|1:711791_A_G:2029,138,0,2029,138,2029:0,0,23,22      ./.:0,51,0:51:99:0|1:711791_A_G:2293,154,0,2293,154,2293:0,0,23,28      ./.:0,35,0:35:99:0|1:711791_A_G:1618,111,0,1618,111,1618:0,0,15,20      ./.:0,60,0:60:99:0|1:711791_A_G:2670,184,0,2670,184,2670:0,0,24,36        ./.:0,26,0:26:84:0|1:711791_A_G:1243,84,0,1243,84,1243:0,0,4,22 ./.:0,50,0:50:99:0|1:711791_A_G:2243,154,0,2243,154,2243:0,0,22,28      ./.:0,48,0:48:99:0|1:711791_A_G:2196,147,0,2196,147,2196:0,0,28,20        ./.:0,49,0:49:99:0|1:711791_A_G:2225,151,0,2225,151,2225:0,0,31,18      ./.:0,30,0:30:90:0|1:711791_A_G:1339,90,0,1339,90,1339:0,0,9,21 ./.:0,27,0:27:81:0|1:711791_A_G:1197,81,0,1197,81,1197:0,0,12,15 ./.:0,40,0:40:99:0|1:711791_A_G:1759,120,0,1759,120,1759:0,0,15,25       ./.:0,36,0:36:99:0|1:711791_A_G:1590,108,0,1590,108,1590:0,0,20,16      ./.:0,32,0:32:96:0|1:711791_A_G:1436,96,0,1436,96,1436:0,0,14,18        ./.:0,35,0:35:99:0|1:711791_A_G:1574,108,0,1574,108,1574:0,0,17,18        ./.:0,47,0:47:99:0|1:711791_A_G:2083,144,0,2083,144,2083:0,0,27,20      ./.:0,39,0:39:99:0|1:711791_A_G:1748,117,0,1748,117,1748:0,0,18,21      ./.:0,35,0:35:99:0|1:711791_A_G:1522,105,0,1522,105,1522:0,0,18,17        ./.:0,34,0:34:99:0|1:711791_A_G:1557,105,0,1557,105,1557:0,0,19,15      ./.:0,44,0:44:99:0|1:711791_A_G:2002,138,0,2002,138,2002:0,0,18,26      ./.:0,39,0:39:99:0|1:711791_A_G:1814,123,0,1814,123,1814:0,0,14,25      ./.:0,37,0:37:99:0|1:711791_A_G:1685,117,0,1685,117,1685:0,0,20,17        ./.:0,28,0:28:84:0|1:711791_A_G:1215,84,0,1215,84,1215:0,0,11,17
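Because <NON_REF> appears both alone and next to real alleles, grepping whole lines is too blunt; looking only at the ALT column (the fifth tab-separated field) separates the cases shown above. The following is a minimal Python sketch, assuming tab-separated VCF data lines; `classify_alt` and `count_non_ref` are hypothetical names. Any record classified as "ref_block" or "mixed" means the file still carries GVCF-style <NON_REF> alleles.

```python
from collections import Counter
from typing import Iterable

NON_REF = "<NON_REF>"


def classify_alt(alt_field: str) -> str:
    """Classify a record by its ALT column."""
    alts = alt_field.split(",")
    if NON_REF not in alts:
        return "plain"
    if len(alts) == 1:
        return "ref_block"  # <NON_REF> is the only ALT: a GVCF reference block
    return "mixed"          # real ALT allele(s) plus <NON_REF>


def count_non_ref(lines: Iterable[str]) -> Counter:
    """Tally record types in (G)VCF text, skipping header and blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[classify_alt(fields[4])] += 1
    return counts
```

For a bgzipped file, the lines can be read with `gzip.open(path, "rt")` from the standard library and passed straight to `count_non_ref`.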
    
  • splaisan, Leuven, Belgium (Member ✭✭)

    Could this be a 4.0.7.0 bug? I looked at some 4.0.9.0 output and it seems OK in that regard.

  • shlee, Cambridge (Member, Broadie, Moderator)
    edited September 28

    Hi @splaisan, the workflow for joint calling with HaplotypeCaller is as follows:

    1. Run HaplotypeCaller per sample in GVCF mode --> per-sample GVCFs
    2. Collate GVCFs from cohort samples with CombineGVCFs or GenomicsDBImport --> cohort-level GVCF (CombineGVCFs) or GenomicsDB workspace (GenomicsDBImport)
    3. Genotype the cohort-level GVCF with GenotypeGVCFs --> cohort-level VCF callset

    It appears you are at step 2 of this workflow. After you complete step 3, the <NON_REF> alleles are absorbed by the process and are absent from the cohort-level VCF callset.
